Updated on 2025.06.28

diffusion

Publish Date Title Authors PDF Code
2025-06-24 Quantum Neural Networks for Propensity Score Estimation and Survival Analysis in Observational Biomedical Studies Vojtěch Novák et al. 2506.19973 null
2025-06-18 Origins of Creativity in Attention-Based Diffusion Models Emma Finn et al. 2506.17324 null
2025-06-17 Expressive Score-Based Priors for Distribution Matching with Geometry-Preserving Regularization Ziyu Gong et al. 2506.14607 link
2025-06-12 Variance estimation after matching or re-weighting Xiang Meng et al. 2506.11317 link
2025-06-09 Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels Davide Carbone et al. 2506.07843 null
2025-06-06 Direct Fisher Score Estimation for Likelihood Maximization Sherman Khoo et al. 2506.06542 null
2025-06-03 IGSM: Improved Geometric and Sensitivity Matching for Finetuning Pruned Diffusion Models Caleb Zheng et al. 2506.05398 link
2025-06-05 Learning normalized image densities via dual score matching Florentin Guth et al. 2506.05310 null
2025-06-02 An Introduction to Flow Matching and Diffusion Models Peter Holderrieth et al. 2506.02070 null
2025-05-31 Score Matching With Missing Data Josh Givens et al. 2506.00557 null
2025-05-29 Estimation of Gender Wage Gap in the University of North Carolina System Zihan Zhang et al. 2505.24078 null
2025-05-29 Score-based Generative Modeling for Conditional Independence Testing Yixin Ren et al. 2505.23309 link
2025-05-26 Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage Chenguang Wang et al. 2505.19431 null
2025-05-14 Robust Knowledge Graph Embedding via Denoising Tengwei Song et al. 2505.18171 null
2025-05-22 Learning non-equilibrium diffusions with Schrödinger bridges: from exactly solvable to simulation-free Stephen Y. Zhang et al. 2505.16644 null
2025-05-20 Compositional amortized inference for large-scale hierarchical Bayesian models Jonas Arruda et al. 2505.14429 null
2025-05-20 Extension of Dynamic Network Biomarker using the propensity score method: Simulation of causal effects on variance and correlation coefficient Satoru Shinoda et al. 2505.13846 null
2025-05-19 Score-Based Training for Energy-Based TTS Models Wanli Sun et al. 2505.13771 null
2025-05-16 Approximation and Generalization Abilities of Score-based Neural Network Generative Models for Sub-Gaussian Distributions Guoji Fu et al. 2505.10880 null
2025-05-20 Whitened Score Diffusion: A Structured Prior for Imaging Inverse Problems Jeffrey Alido et al. 2505.10311 link
2025-05-08 Score-based Self-supervised MRI Denoising Jiachen Tu et al. 2505.05631 null
2025-05-08 Graffe: Graph Representation Learning via Diffusion Probabilistic Models Dingshuo Chen et al. 2505.04956 null
2025-05-07 Localized Diffusion Models for High Dimensional Distributions Generation Georg A. Gottwald et al. 2505.04417 null
2025-05-02 Incorporating Inductive Biases to Energy-based Generative Models Yukun Li et al. 2505.01111 null
2025-04-29 Frequency Feature Fusion Graph Network For Depression Diagnosis Via fNIRS Chengkai Yang et al. 2504.21064 null
2025-05-20 Coreset selection for the Sinkhorn divergence and generic smooth divergences Alex Kokot et al. 2504.20194 link
2025-04-27 Generalized Score Matching: Bridging $f$-Divergence and Statistical Estimation Under Correlated Noise Yirong Shen et al. 2504.19288 null
2025-05-17 Score-Based Deterministic Density Sampling Vasily Ilin et al. 2504.18130 null
2025-05-01 Whence Is A Model Fair? Fixing Fairness Bugs via Propensity Score Matching Kewen Peng et al. 2504.17066 null
2025-04-23 Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion Ruixiang Zhang et al. 2504.16431 null
2025-04-22 InstaRevive: One-Step Image Enhancement via Dynamic Score Matching Yixuan Zhu et al. 2504.15513 null
2025-04-16 Generalization through variance: how noise shapes inductive biases in diffusion models John J. Vastola et al. 2504.12532 link
2025-04-15 Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination Mika Setälä et al. 2504.12347 null
2025-04-14 Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems Karthik Elamvazhuthi et al. 2504.09836 null
2025-04-13 Knowledge Independence Breeds Disruption but Limits Recognition Xiaoyao Yu et al. 2504.09589 null
2025-04-07 DDPM Score Matching and Distribution Learning Sinho Chewi et al. 2504.05161 null
2025-04-04 Gaussian Process Tilted Nonparametric Density Estimation using Fisher Divergence Score Matching John Paisley et al. 2504.03485 null
2025-04-02 A Unified Approach to Analysis and Design of Denoising Markov Models Yinuo Ren et al. 2504.01938 null
2025-03-31 Empirical Analysis of Digital Innovations Impact on Corporate ESG Performance: The Mediating Role of GAI Technology Jun Cui et al. 2504.01041 null
2025-03-28 Demographic Factors Associated with Triage Acuity, Admission and Length of Stay During Adult Emergency Department Visits Helena Coggan et al. 2503.22781 link
2025-03-27 Inequality Restricted Minimum Density Power Divergence Estimation in Panel Count Data Udita Goswami et al. 2503.21534 null
2025-03-22 Solving Schrödinger bridge problem via continuous normalizing flow Yang Jing et al. 2503.17829 link
2025-03-21 Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation Sophia Tang et al. 2503.17361 null
2025-03-20 Improving Discriminator Guidance in Diffusion Models Alexandre Verine et al. 2503.16117 null
2025-03-18 Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance Liya Guo et al. 2503.14569 null
2025-03-14 From Denoising Score Matching to Langevin Sampling: A Fine-Grained Error Analysis in the Gaussian Setting Samuel Hurault et al. 2503.11615 null
2025-03-21 Aligning Text to Image in Diffusion Models is Easier Than You Think Jaa-Yeon Lee et al. 2503.08250 link
2025-03-26 Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models Junzhe Li et al. 2503.08120 null
2025-03-11 Computational bottlenecks for denoising diffusions Andrea Montanari et al. 2503.08028 null
2025-03-09 Exponential-polynomial divergence based inference for nondestructive one-shot devices under progressive stress model Shanya Baghel et al. 2503.06414 null
2025-03-09 Causal Discovery and Inference towards Urban Elements and Associated Factors Tao Feng et al. 2503.06395 null
2025-03-04 Exact matching as an alternative to propensity score matching Ekkehard Glimm et al. 2503.02850 null
2025-03-03 FlowDec: A flow-based full-band general audio codec with high perceptual quality Simon Welker et al. 2503.01485 link
2025-03-02 Underdamped Diffusion Bridges with Applications to Sampling Denis Blessing et al. 2503.01006 link
2025-02-27 Stein’s unbiased risk estimate and Hyvärinen’s score matching Sulagna Ghosh et al. 2502.20123 null
2025-03-04 Agnostic calculation of atomic free energies with the descriptor density of states Thomas D Swinburne et al. 2502.18191 link
2025-02-27 A Fokker-Planck-Based Loss Function that Bridges Dynamics with Density Estimation Zhixin Lu et al. 2502.17690 null
2025-02-25 Generalization error bound for denoising score matching under relaxed manifold assumption Konstantin Yakovlev et al. 2502.13662 null
2025-02-18 Score Matching Riemannian Diffusion Means Frederik Möbius Rygaard et al. 2502.13106 null
2025-02-19 X-IL: Exploring the Design Space of Imitation Learning Policies Xiaogang Jia et al. 2502.12330 link
2025-02-14 Dimension-free Score Matching and Time Bootstrapping for Diffusion Models Syamantak Kumar et al. 2502.10354 null
2025-02-12 Concentration Inequalities for the Stochastic Optimization of Unbounded Objectives with Application to Denoising Score Matching Jeremiah Birrell et al. 2502.08628 null
2025-02-07 Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images? Yujin Han et al. 2502.04725 null
2025-02-05 Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization Yu-Han Wu et al. 2502.03435 null
2025-02-04 Generative Modeling on Lie Groups via Euclidean Generalized Score Matching Marco Bertolini et al. 2502.02513 null
2025-02-03 Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning Hanyang Zhao et al. 2502.01819 null
2025-02-05 Weak-to-Strong Diffusion with Reflection Lichen Bai et al. 2502.00473 null
2025-02-01 Denoising Score Matching with Random Features: Insights on Diffusion Models from Precise Learning Curves Anand Jerry George et al. 2502.00336 null
2025-02-13 Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans Christian Wald et al. 2501.16839 null
2025-01-27 EDSep: An Effective Diffusion-Based Method for Speech Source Separation Jinwei Dong et al. 2501.15965 null
2025-01-27 Memorization and Regularization in Generative Diffusion Models Ricardo Baptista et al. 2501.15785 link
2025-01-24 Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation Anish Abhijit Diwan et al. 2501.14856 null
2025-01-23 MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance Wooseok Song et al. 2501.13449 null
2025-01-22 Sequential Change Point Detection via Denoising Score Matching Wenbin Zhou et al. 2501.12667 null
2025-01-13 Likelihood Training of Cascaded Diffusion Models via Hierarchical Volume-preserving Maps Henry Li et al. 2501.06999 link
2025-01-10 Explainable Federated Bayesian Causal Inference and Its Application in Advanced Manufacturing Xiaofeng Xiao et al. 2501.06077 link
2025-01-09 Propensity score matching in semaglutide retrospective studies Elizabeth Mohney et al. 2501.05533 null
2025-01-09 Robust Score Matching Richard Schwank et al. 2501.05105 null
2024-12-28 An analytic theory of creativity in convolutional diffusion models Mason Kamb et al. 2412.20292 null
2024-12-18 Catalysts of Conversation: Examining Interaction Dynamics Between Topic Initiators and Commentors in Alzheimer’s Disease Online Communities Congning Ni et al. 2412.13388 null
2024-12-19 Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation Bofang Jia et al. 2412.09265 null
2024-12-10 Score Change of Variables Stephen Robbins et al. 2412.07904 null
2024-12-10 Score-matching-based Structure Learning for Temporal Data on Networks Hao Chen et al. 2412.07469 null
2024-12-09 Improving Source Extraction with Diffusion and Consistency Models Tornike Karchkhadze et al. 2412.06965 link
2024-12-09 Generative Lines Matching Models Ori Matityahu et al. 2412.06403 null
2024-12-06 Local Curvature Smoothing with Stein’s Identity for Efficient Score Matching Genki Osada et al. 2412.03962 null
2024-12-03 How to Use Diffusion Priors under Sparse Views? Qisen Wang et al. 2412.02225 link
2024-11-29 Pretrained Reversible Generation as Unsupervised Visual Representation Learning Rongkun Xue et al. 2412.01787 null
2024-11-29 Riemannian Denoising Score Matching for Molecular Structure Optimization with Accurate Energy Jeheon Woo et al. 2411.19769 null
2024-11-27 Building Confidence in Deep Generative Protein Design Tianyuan Zheng et al. 2411.18568 link
2024-11-20 Comprehensive Methodology for Sample Augmentation in EEG Biomarker Studies for Alzheimers Risk Classification Veronica Henao Isaza et al. 2411.17717 null
2024-11-14 Propensity Score Matching: Should We Use It in Designing Observational Studies? Fei Wan et al. 2411.09579 null
2024-11-14 Efficiently learning and sampling multimodal distributions with data-based initialization Frederic Koehler et al. 2411.09117 null
2024-11-13 Parameter Inference via Differentiable Diffusion Bridge Importance Sampling Nicklas Boserup et al. 2411.08993 link
2024-11-02 Supervised Score-Based Modeling by Gradient Boosting Changyuan Zhao et al. 2411.01159 null
2024-10-31 TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt Jiahui Yang et al. 2410.21299 null
2024-10-27 Hamiltonian Score Matching and Generative Flows Peter Holderrieth et al. 2410.20470 null
2024-10-25 Dimension reduction via score ratio matching Ricardo Baptista et al. 2410.19990 null
2024-10-23 Semi-Implicit Functional Gradient Flow Shiyue Zhang et al. 2410.17935 null
2024-10-18 Mitigating Embedding Collapse in Diffusion Models for Categorical Data Bac Nguyen et al. 2410.14758 null
2024-10-17 Diffusing States and Matching Scores: A New Framework for Imitation Learning Runzhe Wu et al. 2410.13855 link
2024-10-15 l_inf-approximation of localized distributions Tiangang Cui et al. 2410.11771 null
2024-10-14 High-Dimensional Differential Parameter Inference in Exponential Family using Time Score Matching Daniel J. Williams et al. 2410.10637 link
2024-10-21 On Divergence Measures for Training GFlowNets Tiago da Silva et al. 2410.09355 null
2024-10-11 Linear Convergence of Diffusion Models Under the Manifold Hypothesis Peter Potaptchik et al. 2410.09046 null
2024-10-15 Score Neural Operator: A Generative Model for Learning and Generalizing Across Multiple Probability Distributions Xinyu Liao et al. 2410.08549 null
2024-10-05 Is Score Matching Suitable for Estimating Point Processes? Haoqun Cao et al. 2410.04037 link
2024-10-04 Classification-Denoising Networks Louis Thiry et al. 2410.03505 null
2024-10-02 Equivariant score-based generative models provably learn distributions with symmetries efficiently Ziyu Chen et al. 2410.01244 null
2024-10-01 Generative Precipitation Downscaling using Score-based Diffusion with Wasserstein Regularization Yuhao Liu et al. 2410.00381 null
2024-09-12 Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning Hanyang Zhao et al. 2409.08400 null
2024-09-11 From optimal score matching to optimal sampling Zehao Dou et al. 2409.07032 null
2024-09-02 Highly Accurate Real-space Electron Densities with Neural Networks Lixue Cheng et al. 2409.01306 null
2024-08-29 A Score-Based Density Formula, with Applications in Diffusion Generative Models Gen Li et al. 2408.16765 null
2024-08-27 Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging Yuanhao Li et al. 2408.14843 null
2024-08-26 Evaluating the effectiveness of public policies on COVID-19 containment: A PSM-DID approach Zihan Wang et al. 2408.14108 link
2024-08-22 Variance reduction of diffusion model’s gradients with Taylor approximation-based control variate Paul Jeha et al. 2408.12270 null
2024-08-21 MR Optimized Reconstruction of Simultaneous Multi-Slice Imaging Using Diffusion Model Ting Zhao et al. 2408.08883 null
2024-08-13 A comparison of methods for estimating the average treatment effect on the treated for externally controlled trials Huan Wang et al. 2408.07193 null
2024-08-09 Bootstrap Matching: a robust and efficient correction for non-random A/B test, and its applications Zihao Zheng et al. 2408.05297 null
2024-07-26 Score matching through the roof: linear, nonlinear, and latent variables causal discovery Francesco Montagna et al. 2407.18755 null
2024-07-24 Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference Jian Xu et al. 2407.17033 null
2024-07-22 Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond Silvio Galesso et al. 2407.15739 link
2024-07-23 Score matching for bridges without time-reversals Elizabeth L. Baker et al. 2407.15455 link
2024-08-12 Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes Alexandre Abraham et al. 2407.14861 null
2024-07-12 Learning Distances from Data with Normalizing Flows and Score Matching Peter Sorrenson et al. 2407.09297 null
2024-07-10 What’s the score? Automated Denoising Score Matching for Nonlinear Diffusions Raghav Singhal et al. 2407.07998 null
2024-06-15 Stein’s Method of Moments on the Sphere Adrian Fischer et al. 2407.02299 null
2024-07-01 Learning data efficient coarse-grained molecular dynamics from forces and noise Aleksander E. P. Durumeric et al. 2407.01286 link
2024-07-18 Localizing Anomalies via Multiscale Score Matching Analysis Ahsan Mahmood et al. 2407.00148 link
2024-06-20 A Practical Diffusion Path for Sampling Omar Chehab et al. 2406.14040 null
2024-07-25 Evaluating the design space of diffusion-based generative models Yuqing Wang et al. 2406.12839 null
2024-06-17 Score-fPINN: Fractional Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck-Levy Equations Zheyuan Hu et al. 2406.11676 null
2024-06-13 Operator-informed score matching for Markov diffusion models Zheyang Shen et al. 2406.09084 null
2024-06-13 Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency Maor Dikter et al. 2406.08840 link
2024-06-12 CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models Hyungjin Chung et al. 2406.08070 null
2024-06-11 DualBind: A Dual-Loss Framework for Protein-Ligand Binding Affinity Prediction Meng Liu et al. 2406.07770 null
2024-06-08 Mean-field Chaos Diffusion Models Sungwoo Park et al. 2406.05396 null
2024-06-07 Combinatorial Complex Score-based Diffusion Modelling through Stochastic Differential Equations Adrien Carrel et al. 2406.04916 link
2024-06-07 Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study Chong Zhang et al. 2406.04633 null
2024-06-04 Democratizing Propensity Score Matching Using Web Application Adam Gajtkowski et al. 2406.02743 null
2024-06-03 Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation Mingyuan Zhou et al. 2406.01561 link
2024-05-29 Kernel Semi-Implicit Variational Inference Ziheng Cheng et al. 2405.18997 link
2024-06-06 Simulating infinite-dimensional nonlinear diffusion bridges Gefan Yang et al. 2405.18353 link
2024-05-24 ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching Yumin Zhang et al. 2405.15914 link
2024-05-24 Score-based generative models are provably robust: an uncertainty quantification perspective Nikiforos Mimikos-Stamatopoulos et al. 2405.15754 null
2024-05-24 Nonlinear denoising score matching for enhanced learning of structured distributions Jeremiah Birrell et al. 2405.15625 null
2024-05-18 Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching Xingyu Miao et al. 2405.11252 link
2024-05-17 High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates Janick Weberpals et al. 2405.10925 null
2024-05-15 Response Matching for generating materials and molecules Bingqing Cheng et al. 2405.09057 null
2024-05-08 A score-based particle method for homogeneous Landau equation Yan Huang et al. 2405.05187 null
2024-05-06 Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models Ludwig Winkler et al. 2405.03549 null
2024-05-05 Score-based Generative Priors Guided Model-driven Network for MRI Reconstruction Xiaoyu Qiao et al. 2405.02958 null
2024-05-03 SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning Qian Long et al. 2405.01839 null
2024-04-29 Learning general Gaussian mixtures with efficient score matching Sitan Chen et al. 2404.18893 null
2024-04-24 Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations Kaiwen Xue et al. 2404.15766 link
2024-04-23 Score matching for sub-Riemannian bridge sampling Erlend Grong et al. 2404.15258 null
2024-04-20 A Massive MIMO Sampling Detection Strategy Based on Denoising Diffusion Model Lanxin He et al. 2404.13281 null
2024-04-22 Generative Modelling with High-Order Langevin Dynamics Ziqiang Shi et al. 2404.12814 null
2024-04-16 Efficient Conditional Diffusion Model with Probability Flow Sampling for Image Super-resolution Yutao Yuan et al. 2404.10688 link
2024-04-18 Efficiently Adversarial Examples Generation for Visual-Language Models under Targeted Transfer Scenarios using Diffusion Models Qi Guo et al. 2404.10335 link
2024-04-15 Convergence Analysis of Probability Flow ODE for Score-based Generative Models Daniel Zhengyu Huang et al. 2404.09730 link
2024-03-25 The Impact of Pradhan Mantri Ujjwala Yojana on Indian Households Nabeel Asharaf et al. 2403.17112 null
2024-03-25 Optimal convex $M$-estimation via score matching Oliver Y. Feng et al. 2403.16688 null
2024-03-21 MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection Jakub Micorek et al. 2403.14497 link
2024-03-10 Propensity-score matching analysis in COVID-19-related studies: a method and quality systematic review Chunhui Gu et al. 2403.07023 null
2024-03-10 UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation Kwanyoung Kim et al. 2403.06275 null
2024-03-04 Soft-constrained Schrodinger Bridge: a Stochastic Control Approach Jhanvi Garg et al. 2403.01717 null
2024-03-02 Re-evaluating the impact of hormone replacement therapy on heart disease using match-adaptive randomization inference Samuel D. Pimentel et al. 2403.01330 null
2024-03-02 Training Unbiased Diffusion Models From Biased Dataset Yeongmin Kim et al. 2403.01189 link
2024-03-04 Structure-Guided Adversarial Training of Diffusion Models Ling Yang et al. 2402.17563 null
2024-02-27 Label-Noise Robust Diffusion Models Byeonghu Na et al. 2402.17517 link
2024-02-23 The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling Jiajun Ma et al. 2402.15170 null
2024-02-13 Space-Time Bridge-Diffusion Hamidreza Behjoo et al. 2402.08847 null
2024-02-13 Target Score Matching Valentin De Bortoli et al. 2402.08667 null
2024-02-23 Score-based generative models break the curse of dimensionality in learning a family of sub-Gaussian probability distributions Frank Cole et al. 2402.08082 null
2024-02-12 Optimal score estimation via empirical Bayes smoothing Andre Wibisono et al. 2402.07747 null
2024-02-12 Score-based Diffusion Models via Stochastic Differential Equations – a Technical Tutorial Wenpin Tang et al. 2402.07487 null
2024-02-12 Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck Equations Zheyuan Hu et al. 2402.07465 null
2024-02-09 Particle Denoising Diffusion Sampler Angus Phillips et al. 2402.06320 link
2024-02-09 Iterated Denoising Energy Matching for Sampling from Boltzmann Densities Tara Akhound-Sadegh et al. 2402.06121 link
2024-02-08 Time Series Diffusion in the Frequency Domain Jonathan Crabbé et al. 2402.05933 link
2024-02-06 Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization Fangzhao Zhang et al. 2402.01965 null
2024-02-02 Conditioning non-linear and infinite-dimensional diffusion processes Elizabeth Louise Baker et al. 2402.01434 link
2024-01-29 Domain adaptation strategies for 3D reconstruction of the lumbar spine using real fluoroscopy data Sascha Jecklin et al. 2401.16027 null
2024-02-06 Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization Yinbin Han et al. 2401.15604 null
2024-02-04 Free public transport to the destination: A causal analysis of tourists’ travel mode choice Kevin Blättler et al. 2401.14945 null
2024-01-23 Contractive Diffusion Probabilistic Models Wenpin Tang et al. 2401.13115 null
2024-01-22 ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter Yi-Chiao Wu et al. 2401.12160 null
2024-01-14 Score-matching neural networks for improved multi-band source separation Matt L. Sampson et al. 2401.07313 link
2024-01-04 Bring Metric Functions into Diffusion Models Jie An et al. 2401.02414 null
2024-01-19 Diffusion Model with Perceptual Loss Shanchuan Lin et al. 2401.00110 null
2024-01-04 High-Fidelity Diffusion-based Image Editing Chen Hou et al. 2312.15707 null
2023-12-16 Bayes-Optimal Unsupervised Learning for Channel Estimation in Near-Field Holographic MIMO Wentao Yu et al. 2312.10438 null
2023-12-16 Continuous Diffusion for Mixed-Type Tabular Data Markus Mueller et al. 2312.10431 link
2023-12-14 Noise in the reverse process improves the approximation capabilities of diffusion models Karthik Elamvazhuthi et al. 2312.07851 null
2023-12-11 Adversarial Estimation of Topological Dimension with Harmonic Score Maps Eric Yeats et al. 2312.06869 null
2023-12-09 The New Age of Collusion? An Empirical Study into Airbnb’s Pricing Dynamics and Market Behavior Richeng Piao et al. 2312.05633 null
2023-12-14 Stochastic Optimal Control Matching Carles Domingo-Enrich et al. 2312.02027 link
2023-11-30 SMaRt: Improving GANs with Score Matching Regularity Mengfei Xia et al. 2311.18208 null
2023-11-23 Sample-Efficient Training for Diffusion Shivam Gupta et al. 2311.13745 null
2023-12-02 LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching Yixun Liang et al. 2311.11284 link
2023-11-14 Learning Bayes-Optimal Channel Estimation for Holographic MIMO in Unknown EM Environments Wentao Yu et al. 2311.07908 null
2023-11-10 FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores Daniel Y. Fu et al. 2311.05908 null
2023-10-30 Scaling Riemannian Diffusion Models Aaron Lou et al. 2310.20030 null
2023-10-27 Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling Zhenyu Zhu et al. 2310.18123 null
2023-10-26 Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration Longlin Yu et al. 2310.17153 link
2023-10-25 Discrete Diffusion Language Modeling by Estimating the Ratios of the Data Distribution Aaron Lou et al. 2310.16834 link
2023-10-25 SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process Zichong Li et al. 2310.16336 link
2023-10-25 Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification Zichong Li et al. 2310.16310 null
2023-10-22 Shortcuts for causal discovery of nonlinear models by score matching Francesco Montagna et al. 2310.14246 null
2023-11-14 On propensity score matching with a diverging number of matches Yihui He et al. 2310.14142 link
2023-10-20 Assumption violations in causal discovery and the robustness of score matching Francesco Montagna et al. 2310.13387 link
2023-10-19 Closed-Form Diffusion Models Christopher Scarvelis et al. 2310.12395 null
2023-10-17 Sadness, Anger, or Anxiety: Twitter Users’ Emotional Responses to Toxicity in Public Conversations Ana Aleksandric et al. 2310.11436 null
2023-10-12 Debias the Training of Diffusion Models Hu Yu et al. 2310.08442 link
2023-10-09 Integration-free Training for Spatio-temporal Multimodal Covariate Deep Kernel Point Processes Yixuan Zhang et al. 2310.05485 null
2023-10-06 Generative Diffusion From An Action Principle Akhil Premkumar et al. 2310.04490 null
2023-10-09 Diffusion Random Feature Model Esha Saha et al. 2310.04417 null
2023-10-04 On Memorization in Diffusion Models Xiangming Gu et al. 2310.02664 link
2023-10-03 Stochastic force inference via density estimation Victor Chardès et al. 2310.02366 null
2023-10-01 Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion Dongjun Kim et al. 2310.02279 link
2023-10-03 Sampling Multimodal Distributions with the Vanilla Score: Benefits of Data-Based Initialization Frederic Koehler et al. 2310.01762 null
2023-09-29 EPiC-ly Fast Particle Cloud Generation with Flow-Matching and Diffusion Erik Buhmann et al. 2310.00049 null
2023-09-28 Bayesian Cramér-Rao Bound Estimation with Score-Based Models Evan Scope Crafts et al. 2309.16076 null
2023-09-20 Score Mismatching for Generative Modeling Senmao Ye et al. 2309.11043 link
2023-09-18 Sex-based Disparities in Brain Aging: A Focus on Parkinson’s Disease Iman Beheshti et al. 2309.10069 null
2023-09-18 Single and Few-step Diffusion for Generative Speech Enhancement Bunlong Lay et al. 2309.09677 link
2023-09-06 Matcha-TTS: A fast TTS architecture with conditional flow matching Shivam Mehta et al. 2309.03199 link
2023-08-29 MadSGM: Multivariate Anomaly Detection with Score-based Generative Models Haksoo Lim et al. 2308.15069 null
2023-08-24 Machine Unlearning for Causal Inference Vikas Ramachandra et al. 2308.13559 null
2023-08-22 Expressive probabilistic sampling in recurrent neural networks Shirui Chen et al. 2308.11809 link
2023-08-22 Convergence guarantee for consistency models Junlong Lyu et al. 2308.11449 null
2023-08-31 Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models Heyang Xue et al. 2308.10428 null
2023-08-19 Semi-Implicit Variational Inference via Score Matching Longlin Yu et al. 2308.10014 link
2023-08-07 Equity in Focus: Investigating Gender Disparities in Glioblastoma via Propensity Score Matching Solomon Eshun et al. 2308.03827 null
2023-08-07 A Causal Inference Approach to Eliminate the Impacts of Interfering Factors on Traffic Performance Evaluation Xiaobo Ma et al. 2308.03545 null
2023-08-04 Diffusion probabilistic models enhance variational autoencoder for crystal structure generative modeling Teerachote Pakornchote et al. 2308.02165 null
2023-08-03 Estimating causal quantile exposure response functions via matching Luca Merlo et al. 2308.01628 null
2023-08-01 Causal exposure-response curve estimation with surrogate confounders: a study of air pollution and children’s health in Medicaid claims data Jenny J. Lee et al. 2308.00812 link
2023-07-25 Implicitly Normalized Explicitly Regularized Density Estimation Mark Kozdoba et al. 2307.13763 null
2023-07-20 Analysis of the rate of force development reveals high neuromuscular fatigability in elderly patients with chronic kidney disease Antoine Chatrenet et al. 2307.10691 null
2023-07-15 Variational Inference with Gaussian Score Matching Chirag Modi et al. 2307.07849 link
2023-07-12 Energy Discrepancies: A Score-Independent Loss for Energy-Based Models Tobias Schröder et al. 2307.06431 link
2023-07-07 Simulation-free Schrödinger bridges via score and flow matching Alexander Tong et al. 2307.03672 link
2023-07-02 MissDiff: Training Diffusion Models on Tabular Data with Missing Values Yidong Ouyang et al. 2307.00467 null
2023-06-24 Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching H. J. Terry Suh et al. 2306.14079 null
2023-08-03 Masked Diffusion Models Are Fast and Privacy-Aware Learners Jiachen Lei et al. 2306.11363 link
2023-06-20 Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains Yilong Qin et al. 2306.09332 null
2023-06-15 Fast Training of Diffusion Models with Masked Transformers Hongkai Zheng et al. 2306.09305 link
2023-06-15 Training Diffusion Classifiers with Denoising Assistance Chandramouli Sastry et al. 2306.09192 null
2023-06-23 Image Reconstruction from Sparse Low-Dose CT Data via Score Matching Wenxiang Cong et al. 2306.08610 null
2023-06-13 Inactivated COVID-19 Vaccination did not affect In vitro fertilization (IVF) / Intra-Cytoplasmic Sperm Injection (ICSI) cycle outcomes Qi Wan et al. 2306.07652 null
2023-06-07 ScoreCL: Augmentation-Adaptive Contrastive Learning via Score-Matching Function JinYoung Kim et al. 2306.04175 null
2023-06-05 Machine Learning Force Fields with Data Cost Aware Training Alexander Bukharin et al. 2306.03109 link
2023-06-05 Faster Training of Diffusion Models and Improved Density Estimation via Parallel Score Matching Etrit Haxholli et al. 2306.02658 null

generation

Publish Date Title Authors PDF Code
2025-06-26 SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture Kehan Sui et al. 2506.21478 null
2025-06-26 XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Bowen Chen et al. 2506.21416 null
2025-06-26 GenFlow: Interactive Modular System for Image Generation Duc-Hung Nguyen et al. 2506.21369 null
2025-06-26 ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models Hongbo Liu et al. 2506.21356 null
2025-06-26 HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation Diego Biagini et al. 2506.21287 null
2025-06-26 Video Virtual Try-on with Conditional Diffusion Transformer Inpainter Cheng Zou et al. 2506.21270 null
2025-06-26 BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models Louis Kerner et al. 2506.21209 null
2025-06-26 Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation Ze Wang et al. 2506.21022 null
2025-06-26 HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation Qingyue Jiao et al. 2506.21015 null
2025-06-26 Rethink Sparse Signals for Pose-guided Text-to-image Generation Wenjie Xuan et al. 2506.20983 null
2025-06-25 Video Perception Models for 3D Scene Synthesis Rui Huang et al. 2506.20601 null
2025-06-25 HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling Tobias Vontobel et al. 2506.20452 null
2025-06-25 Med-Art: Diffusion Transformer for 2D Medical Text-to-Image Generation Changlu Guo et al. 2506.20449 null
2025-06-25 EAR: Erasing Concepts from Unified Autoregressive Models Haipeng Fan et al. 2506.20151 null
2025-06-25 BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos Jiahao Lin et al. 2506.20103 null
2025-06-24 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation Xingyang Li et al. 2506.19852 null
2025-06-24 GenHSI: Controllable Generation of Human-Scene Interaction Videos Zekun Li et al. 2506.19840 null
2025-06-24 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution Liangbin Xie et al. 2506.19838 null
2025-06-24 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router Yubo Huang et al. 2506.19833 null
2025-06-24 Varif.ai to Vary and Verify User-Driven Diversity in Scalable Image Generation M. Michelessa et.al. 2506.19644 null
2025-06-24 Stylized Structural Patterns for Improved Neural Network Pre-training Farnood Salehi et.al. 2506.19465 null
2025-06-24 Enhancing Galaxy Classification with U-Net Variational Autoencoders for Image Denoising Sergey Mirzoyan et.al. 2506.19434 null
2025-06-24 SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation Yunsung Chung et.al. 2506.19360 null
2025-06-24 Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation Jintao Rong et.al. 2506.19348 null
2025-06-24 Style Transfer: A Decade Survey Tianshan Zhang et.al. 2506.19278 null
2025-06-23 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory Runjia Li et.al. 2506.18903 null
2025-06-23 From Virtual Games to Real-World Play Wenqiang Sun et.al. 2506.18901 null
2025-06-23 FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation Kaiyi Huang et.al. 2506.18899 null
2025-06-23 MinD: Unified Visual Imagination and Control via Hierarchical World Models Xiaowei Chi et.al. 2506.18897 null
2025-06-23 OmniGen2: Exploration to Advanced Multimodal Generation Chenyuan Wu et.al. 2506.18871 null
2025-06-23 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation Qijun Gan et.al. 2506.18866 null
2025-06-23 TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting Zhongbin Guo et.al. 2506.18862 null
2025-06-23 Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset Zhuowei Chen et.al. 2506.18851 null
2025-06-23 Matrix-Game: Interactive World Foundation Model Yifan Zhang et.al. 2506.18701 null
2025-06-23 RDPO: Real Data Preference Optimization for Physics Consistency Video Generation Wenxu Qian et.al. 2506.18655 null
2025-06-23 Emergent Temporal Correspondences from Video Diffusion Transformers Jisu Nam et.al. 2506.17220 link
2025-06-20 Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Zeyuan Yang et.al. 2506.17218 null
2025-06-20 DreamCube: 3D Panorama Generation via Multi-plane Synchronization Yukun Huang et.al. 2506.17206 null
2025-06-20 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition Jiaqi Li et.al. 2506.17201 null
2025-06-20 The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation Giulia Bertazzini et.al. 2506.17016 null
2025-06-20 AI’s Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario Ciro Beneduce et.al. 2506.16898 null
2025-06-20 Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models Semin Kim et.al. 2506.16853 null
2025-06-20 FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation Fan Yang et.al. 2506.16806 null
2025-06-20 Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation Riccardo Corvi et.al. 2506.16802 null
2025-06-20 PQCAD-DM: Progressive Quantization and Calibration-Assisted Distillation for Extremely Efficient Diffusion Model Beomseok Ko et.al. 2506.16776 null
2025-06-18 Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model Anirud Aggarwal et.al. 2506.15682 link
2025-06-20 Sekai: A Video Dataset towards World Exploration Zhen Li et.al. 2506.15675 null
2025-06-20 Show-o2: Improved Native Unified Multimodal Models Jinheng Xie et.al. 2506.15564 link
2025-06-18 Control and Realism: Best of Both Worlds in Layout-to-Image without Training Bonan Li et.al. 2506.15563 null
2025-06-18 GalaxyGenius: A Mock Galaxy Image Generator for Various Telescopes from Hydrodynamical Simulations Xingchen Zhou et.al. 2506.15060 null
2025-06-17 Frequency-Calibrated Membership Inference Attacks on Medical Image Diffusion Models Xinkai Zhao et.al. 2506.14919 null
2025-06-17 DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization Renjith Prasad et.al. 2506.14903 null
2025-06-17 The Quasi-Radial Field-line Tracing (QRaFT): an Adaptive Segmentation of the Open-Flux Solar Corona Vadim M. Uritsky et.al. 2506.14894 null
2025-06-17 Cost-Aware Routing for Efficient Text-To-Image Generation Qinchan et.al. 2506.14753 null
2025-06-17 Align Your Flow: Scaling Continuous-Time Flow Map Distillation Amirmojtaba Sabour et.al. 2506.14603 null
2025-06-17 Risk Estimation of Knee Osteoarthritis Progression via Predictive Multi-task Modelling from Efficient Diffusion Model using X-ray Images David Butler et.al. 2506.14560 null
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Nikos Spyrou et.al. 2506.14404 null
2025-06-17 Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models Tian Xia et.al. 2506.14399 null
2025-06-17 CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation Jia-Chen Zhang et.al. 2506.14206 null
2025-06-17 DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion Makoto Shing et.al. 2506.14202 null
2025-06-18 VideoMAR: Autoregressive Video Generation with Continuous Tokens Hu Yu et.al. 2506.14168 null
2025-06-16 UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions Zhucun Xue et.al. 2506.13691 null
2025-06-16 Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention Jeonghoon Park et.al. 2506.13298 null
2025-06-16 STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation Jiamin Wang et.al. 2506.13138 null
2025-06-15 iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer Zhelun Shen et.al. 2506.12847 null
2025-06-14 Retrieval Augmented Comic Image Generation Yunhao Shui et.al. 2506.12517 null
2025-06-14 Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity Mohsen Jenadeleh et.al. 2506.12505 null
2025-06-14 Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback Janet Wang et.al. 2506.12323 null
2025-06-13 Exploring the Effectiveness of Deep Features from Domain-Specific Foundation Models in Retinal Image Synthesis Zuzanna Skorniewska et.al. 2506.11753 null
2025-06-13 SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation Xu Wang et.al. 2506.11621 null
2025-06-13 A Watermark for Auto-Regressive Image Generation Models Yihan Wu et.al. 2506.11371 null
2025-06-12 GenWorld: Towards Detecting AI-generated Real-world Simulation Videos Weiliang Chen et.al. 2506.10975 null
2025-06-13 MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning Yuxuan Luo et.al. 2506.10963 null
2025-06-12 The Role of Generative AI in Facilitating Social Interactions: A Scoping Review T. T. J. E. Arets et.al. 2506.10927 null
2025-06-12 M4V: Multi-Modal Mamba for Text-to-Video Generation Jiancheng Huang et.al. 2506.10915 null
2025-06-12 GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning Xiaoyi Bao et.al. 2506.10639 null
2025-06-12 Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models Francisco Caetano et.al. 2506.10634 null
2025-06-12 High-resolution efficient image generation from WiFi CSI using a pretrained latent diffusion model Eshan Ramesh et.al. 2506.10605 null
2025-06-12 Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning Chun-Mei Feng et.al. 2506.10575 null
2025-06-12 Unitary Scrambling and Collapse: A Quantum Diffusion Framework for Generative Modeling Yihua Li et.al. 2506.10571 link
2025-06-12 DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers Lizhen Wang et.al. 2506.10568 null
2025-06-11 PlayerOne: Egocentric World Simulator Yuanpeng Tu et.al. 2506.09995 null
2025-06-11 InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Zhenzhi Wang et.al. 2506.09984 null
2025-06-11 ReSim: Reliable World Simulation for Autonomous Driving Jiazhi Yang et.al. 2506.09981 null
2025-06-11 Canonical Latent Representations in Conditional Diffusion Models Yitao Xu et.al. 2506.09955 null
2025-06-11 HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations Marco Federici et.al. 2506.09932 null
2025-06-11 Only-Style: Stylistic Consistency in Image Generation without Content Leakage Tilemachos Aravanis et.al. 2506.09916 link
2025-06-11 ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models Qin Zhou et.al. 2506.09740 null
2025-06-11 DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning Dongxu Liu et.al. 2506.09644 null
2025-06-12 Consistent Story Generation with Asymmetry Zigzag Sampling Mingxiao Li et.al. 2506.09612 link
2025-06-11 Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression Dingcheng Zhen et.al. 2506.09482 link
2025-06-10 MagCache: Fast Video Generation with Magnitude-Aware Cache Zehong Ma et.al. 2506.09045 link
2025-06-10 Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models Xuanchi Ren et.al. 2506.09042 link
2025-06-10 Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Dianyi Wang et.al. 2506.09040 link
2025-06-10 Diffuse and Disperse: Image Generation with Representation Regularization Runqian Wang et.al. 2506.09027 null
2025-06-11 SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping Jiajun Li et.al. 2506.08908 link
2025-06-10 CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics Shravan Nayak et.al. 2506.08835 null
2025-06-10 FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency Yifei Su et.al. 2506.08822 null
2025-06-10 HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation Ziyao Huang et.al. 2506.08797 null
2025-06-10 Flow Diverse and Efficient: Learning Momentum Flow Matching via Stochastic Velocity Field Sampling Zhiyuan Ma et.al. 2506.08796 null
2025-06-10 MAMBO: High-Resolution Generative Approach for Mammography Images Milica Škipina et.al. 2506.08677 null
2025-06-09 StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets Anh-Quan Cao et.al. 2506.08013 link
2025-06-09 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Xun Huang et.al. 2506.08009 null
2025-06-09 Dreamland: Controllable World Creation with Simulator and Generative Models Sicheng Mo et.al. 2506.08006 null
2025-06-09 Audio-Sync Video Generation with Multi-Stream Temporal Control Shuchen Weng et.al. 2506.08003 null
2025-06-09 MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation Junhao Chen et.al. 2506.07999 null
2025-06-09 Generative Modeling of Weights: Generalization or Memorization? Boya Zeng et.al. 2506.07998 link
2025-06-10 OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Jingjing Chang et.al. 2506.07977 link
2025-06-09 Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces Kevin Rojas et.al. 2506.07903 link
2025-06-09 Video Unlearning via Low-Rank Refusal Vector Simone Facchiano et.al. 2506.07891 null
2025-06-09 Diffusion Counterfactual Generation with Semantic Abduction Rajat Rasal et.al. 2506.07883 link
2025-06-06 STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Jiatao Gu et.al. 2506.06276 null
2025-06-06 GenIR: Generative Visual Feedback for Mental Image Retrieval Diji Yang et.al. 2506.06220 null
2025-06-06 Feedback Guidance of Diffusion Models Koulischer Felix et.al. 2506.06085 null
2025-06-06 Restereo: Diffusion stereo video generation and restoration Xingchang Huang et.al. 2506.06023 null
2025-06-06 Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models Chaoyi Zhu et.al. 2506.06018 link
2025-06-06 Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection Yu Li et.al. 2506.05872 null
2025-06-06 LLIA – Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models Haojie Yu et.al. 2506.05806 null
2025-06-06 Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery Sajjad Abdoli et.al. 2506.05673 null
2025-06-05 UniRes: Universal Image Restoration for Complex Degradations Mo Zhou et.al. 2506.05599 null
2025-06-05 EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh Tao Hu et.al. 2506.05554 null
2025-06-05 ContentV: Efficient Training of Video Generation Models with Limited Compute Wenfeng Lin et.al. 2506.05343 null
2025-06-05 AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model Pingyu Wu et.al. 2506.05289 link
2025-06-05 Aligning Latent Spaces with Flow Priors Yizhuo Li et.al. 2506.05240 null
2025-06-05 PixCell: A generative foundation model for digital histopathology images Srikar Yellapragada et.al. 2506.05127 null
2025-06-05 Membership Inference Attacks on Sequence Models Lorenzo Rossi et.al. 2506.05126 null
2025-06-05 DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models Revant Teotia et.al. 2506.05108 null
2025-06-06 Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers Haosong Liu et.al. 2506.05096 null
2025-06-05 FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation Huihan Wang et.al. 2506.04956 null
2025-06-05 CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx Lukas Picek et.al. 2506.04931 null
2025-06-05 Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking Yu-Feng Chen et.al. 2506.04879 null
2025-06-04 LayerFlow: A Unified Model for Layer-aware Video Generation Sihui Ji et.al. 2506.04228 null
2025-06-04 UNIC: Unified In-Context Video Editing Zixuan Ye et.al. 2506.04216 null
2025-06-05 FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers Xuanhua He et.al. 2506.04213 null
2025-06-04 Image Editing As Programs with Diffusion Models Yujia Hu et.al. 2506.04158 null
2025-06-05 RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors Hicham Eddoubi et.al. 2506.03988 link
2025-06-04 EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation Cheng Zhang et.al. 2506.03652 null
2025-06-04 ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning Feng Han et.al. 2506.03596 link
2025-06-04 DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Ziyi Wu et.al. 2506.03517 null
2025-06-03 Robustness in Both Domains: CLIP Needs a Robust Text Encoder Elias Abad Rocamora et.al. 2506.03355 null
2025-06-03 Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas Austin Silveria et.al. 2506.03275 null
2025-06-03 IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation Yuanze Lin et.al. 2506.03150 null
2025-06-04 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Bin Lin et.al. 2506.03147 null
2025-06-03 Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval Jiwen Yu et.al. 2506.03141 null
2025-06-03 CamCloneMaster: Enabling Reference-based Camera Control for Video Generation Yawen Luo et.al. 2506.03140 null
2025-06-03 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Lu Qiu et.al. 2506.03126 null
2025-06-03 DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation Zhengyao Lv et.al. 2506.03123 null
2025-06-03 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models Chetwin Low et.al. 2506.03099 null
2025-06-03 ORV: 4D Occupancy-centric Robot Video Generation Xiuyu Yang et.al. 2506.03079 link
2025-06-03 EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models Mingzhe Li et.al. 2506.03067 null
2025-06-03 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Pengtao Chen et.al. 2506.03065 null
2025-05-30 ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL Yu Zhang et.al. 2505.24875 null
2025-05-30 MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Bojia Zi et.al. 2505.24873 null
2025-05-30 GenSpace: Benchmarking Spatially-Aware Image Generation Zehan Wang et.al. 2505.24870 null
2025-05-30 Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation Yucheng Zhou et.al. 2505.24787 link
2025-05-30 DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds Jiaxu Zhang et.al. 2505.24733 null
2025-05-30 UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation Yang-Tian Sun et.al. 2505.24521 null
2025-05-30 un$^2$CLIP: Improving CLIP’s Visual Detail Capturing Ability via Inverting unCLIP Yinqi Li et.al. 2505.24517 link
2025-05-30 Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields Md Shahriar Rahim Siddiqui et.al. 2505.24434 null
2025-06-03 Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning Stepan Shabalin et.al. 2505.24360 link
2025-05-30 Category-aware EEG image generation based on wavelet transform and contrast semantic loss Enshang Zhang et.al. 2505.24301 link
2025-05-29 LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers Yusuf Dalva et.al. 2505.23758 null
2025-05-29 MAGREF: Masked Guidance for Any-Reference Video Generation Yufan Deng et.al. 2505.23742 link
2025-05-29 How Animals Dance (When You’re Not Looking) Xiaojuan Wang et.al. 2505.23738 null
2025-05-29 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Tingyu Song et.al. 2505.23693 link
2025-05-29 VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models Xiangdong Zhang et.al. 2505.23656 link
2025-05-29 Inference-time Scaling of Diffusion Models through Classical Search Xiangcheng Zhang et.al. 2505.23614 null
2025-05-29 Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model Qingyu Shi et.al. 2505.23606 link
2025-05-29 R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation Kaijie Chen et.al. 2505.23493 null
2025-05-29 VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation Shi-Xue Zhang et.al. 2505.23484 link
2025-05-29 Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering Sixian Wang et.al. 2505.23343 link
2025-05-28 Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation Zhe Kong et.al. 2505.22647 link
2025-05-28 SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation Dekai Zhu et.al. 2505.22643 null
2025-05-28 Principled Out-of-Distribution Generalization via Simplicity Jiawei Ge et.al. 2505.22622 null
2025-05-28 ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models Dmitrii Sorokin et.al. 2505.22569 null
2025-05-28 PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models Junwen Chen et.al. 2505.22523 null
2025-05-28 ProCrop: Learning Aesthetic Image Cropping from Professional Compositions Ke Zhang et.al. 2505.22490 null
2025-05-28 Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation Jiadong Pan et.al. 2505.22407 null
2025-05-28 PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models Fan Fei et.al. 2505.22394 null
2025-05-28 Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion Kewen Chen et.al. 2505.22360 null
2025-05-28 Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers Weilun Feng et.al. 2505.22167 null
2025-05-27 Frame In-N-Out: Unbounded Controllable Image-to-Video Generation Boyang Wang et.al. 2505.21491 null
2025-05-27 Policy Optimized Text-to-Image Pipeline Design Uri Gadot et.al. 2505.21478 null
2025-05-27 DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction Yiheng Liu et.al. 2505.21473 link
2025-05-27 Dynamic Vision from EEG Brain Recordings: How much does EEG know? Prajwal Singh et.al. 2505.21385 null
2025-05-28 SageAttention2++: A More Efficient Implementation of SageAttention2 Jintao Zhang et.al. 2505.21136 link
2025-05-27 Creativity in LLM-based Multi-Agent Systems: A Survey Yi-Cheng Lin et.al. 2505.21116 null
2025-05-27 Minute-Long Videos with Dual Parallelisms Zeqing Wang et.al. 2505.21070 link
2025-05-27 RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy Aiyue Chen et.al. 2505.21036 null
2025-05-27 OrienText: Surface Oriented Textual Image Generation Shubham Singh Paliwal et.al. 2505.20958 null
2025-05-27 Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models Puwei Lian et.al. 2505.20955 null
2025-05-26 FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities Jin Wang et.al. 2505.20147 null
2025-05-26 Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion Zheqi Lv et.al. 2505.20053 link
2025-05-27 Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM Peng Liu et.al. 2505.19901 null
2025-05-26 StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation Yi Wu et.al. 2505.19874 null
2025-05-26 DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving Wenchao Sun et.al. 2505.19692 link
2025-05-26 TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs Juntong Wang et.al. 2505.19535 null
2025-05-26 Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning Jiyu Hu et.al. 2505.19522 null
2025-05-26 The Role of Video Generation in Enhancing Data-Limited Action Understanding Wei Li et.al. 2505.19495 null
2025-05-26 MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models Hang Hua et.al. 2505.19415 null
2025-05-26 Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals Nate Gillman et.al. 2505.19386 null
2025-05-23 WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions Zizhang Li et.al. 2505.18151 null
2025-05-23 F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles Varun Ajith et.al. 2505.18106 null
2025-05-23 DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation Junhao Chen et.al. 2505.18078 null
2025-05-23 RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration Sudarshan Rajagopalan et.al. 2505.18047 null
2025-05-23 SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain Jiawei Zhou et.al. 2505.17727 null
2025-05-23 FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving Shuang Zeng et.al. 2505.17685 null
2025-05-23 Scaling Image and Video Generation via Test-Time Evolutionary Search Haoran He et.al. 2505.17618 null
2025-05-23 MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation Jihan Yao et.al. 2505.17613 null
2025-05-23 InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO Xueji Fang et.al. 2505.17574 link
2025-05-23 Deeper Diffusion Models Amplify Bias Shahin Hakemi et.al. 2505.17560 null
2025-05-22 GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning Chengqi Duan et.al. 2505.17022 link
2025-05-22 Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO Chengzhuo Tong et.al. 2505.17017 link
2025-05-22 Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On Siqi Wan et.al. 2505.16977 link
2025-05-22 Creatively Upscaling Images with Global-Regional Priors Yurui Qian et.al. 2505.16976 null
2025-05-22 Training-Free Efficient Video Generation via Dynamic Token Carving Yuechen Zhang et.al. 2505.16864 link
2025-05-22 Conditional Panoramic Image Generation via Masked Autoregressive Modeling Chaoyang Wang et.al. 2505.16862 null
2025-05-22 Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts Taewon Kang et.al. 2505.16819 null
2025-05-22 Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation Hongji Yang et.al. 2505.16763 null
2025-05-22 MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM Siwei Meng et.al. 2505.16456 null
2025-05-22 FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design Renjie Wei et.al. 2505.16335 link
2025-05-21 MMaDA: Multimodal Large Diffusion Language Models Ling Yang et.al. 2505.15809 link
2025-05-21 Interspatial Attention for Efficient 4D Human Video Generation Ruizhi Shao et.al. 2505.15800 null
2025-05-21 IA-T2I: Internet-Augmented Text-to-Image Generation Chuanhao Li et.al. 2505.15779 null
2025-05-21 FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion Kazuaki Mishima et.al. 2505.15313 null
2025-05-21 BadSR: Stealthy Label Backdoor Attacks on Image Super-Resolution Ji Guo et.al. 2505.15308 null
2025-05-21 Scaling Diffusion Transformers Efficiently via μP Chenyu Zheng et.al. 2505.15270 link
2025-05-21 AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection Zhipei Xu et.al. 2505.15173 null
2025-05-21 Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation Xinran Wang et.al. 2505.15172 null
2025-05-21 CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation Xinran Wang et.al. 2505.15145 link
2025-05-20 Programmatic Video Prediction Using Large Language Models Hao Tang et.al. 2505.14948 link
2025-05-20 Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers Sucheng Ren et.al. 2505.14687 link
2025-05-20 UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation Rui Tian et.al. 2505.14682 null
2025-05-20 Training-Free Watermarking for Autoregressive Image Generation Yu Tong et.al. 2505.14673 link
2025-05-20 SparC: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling Zhihao Li et.al. 2505.14521 null
2025-05-20 Latent Flow Transformer Yen-Chen Wu et.al. 2505.14513 link
2025-05-20 VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank Tianhe Wu et.al. 2505.14460 link
2025-05-20 Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives Xingxing Weng et.al. 2505.14361 null
2025-05-20 Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization Yuanyuan Chang et.al. 2505.14254 link
2025-05-20 “Haet Bhasha aur Diskrimineshun”: Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs Darpan Aswal et.al. 2505.14226 null
2025-05-20 LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer Changgu Chen et.al. 2505.14167 null
2025-05-19 VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation Huawei Lin et.al. 2505.13439 link
2025-05-19 FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance Dian Shao et.al. 2505.13437 null
2025-05-20 Swin DiT: Diffusion Transformer using Pseudo Shifted Windows Jiafu Wu et.al. 2505.13219 null
2025-05-19 Diffusion Models with Double Guidance: Generate with aggregated datasets Yanfeng Yang et.al. 2505.13213 null
2025-05-19 MAGI-1: Autoregressive Video Generation at Scale Sand.ai et.al. 2505.13211 link
2025-05-19 A Physics-Inspired Optimizer: Velocity Regularized Adam Pranav Vaidhyanathan et.al. 2505.13196 null
2025-05-19 Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model Jonas Brenig et.al. 2505.13152 link
2025-05-19 Accelerate TarFlow Sampling with GS-Jacobi Iteration Ben Liu et.al. 2505.12849 link
2025-05-19 FRAbench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities Shibo Hong et.al. 2505.12795 link
2025-05-19 SounDiT: Geo-Contextual Soundscape-to-Landscape Generation Junbo Wang et.al. 2505.12734 null
2025-05-16 QVGen: Pushing the Limit of Quantized Video Generative Models Yushi Huang et.al. 2505.11497 null
2025-05-16 PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment Dingbang Huang et.al. 2505.11468 null
2025-05-16 GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI Marc-Antoine Fortin et.al. 2505.11445 link
2025-05-16 Face Consistency Benchmark for GenAI Video Michal Podstawski et.al. 2505.11425 null
2025-05-16 DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models Giulia Bertazzini et.al. 2505.11257 null
2025-05-16 Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models Fu-Yun Wang et.al. 2505.11245 link
2025-05-16 CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback Yixin Wan et.al. 2505.11178 null
2025-05-16 One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework Feiran Li et.al. 2505.11131 link
2025-05-16 HSRMamba: Efficient Wavelet Stripe State Space Model for Hyperspectral Image Super-Resolution Baisong Li et.al. 2505.11062 link
2025-05-16 Generative Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges Yuan Zhang et.al. 2505.10993 null
2025-05-15 End-to-End Vision Tokenizer Tuning Wenxuan Wang et.al. 2505.10562 null
2025-05-15 CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs Raman Dutt et.al. 2505.10496 link
2025-05-16 MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Yanbo Ding et.al. 2505.10238 link
2025-05-15 ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars Rui-Yang Ju et.al. 2505.10072 null
2025-05-15 Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis Bingda Tang et.al. 2505.10046 link
2025-05-14 EnerVerse-AC: Envisioning Embodied Environments with Action Condition Yuxin Jiang et.al. 2505.09723 null
2025-05-14 EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models Hu Yue et.al. 2505.09694 link
2025-05-14 Don’t Forget your Inverse DDIM for Image Editing Guillermo Gomez-Trenado et.al. 2505.09571 null
2025-05-14 BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Jiuhai Chen et.al. 2505.09568 link
2025-05-14 Train a Multi-Task Diffusion Policy on RLBench-18 in One Day with One GPU Yutong Hu et.al. 2505.09430 link
2025-05-14 Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis Bingxin Ke et.al. 2505.09358 link
2025-05-14 An Initial Exploration of Default Images in Text-to-Image Generation Hannu Simonen et.al. 2505.09166 null
2025-05-15 Generating time-consistent dynamics with discriminator-guided image diffusion models Philipp Hess et.al. 2505.09089 null
2025-05-13 Generative AI for Autonomous Driving: Frontiers and Opportunities Yuping Wang et.al. 2505.08854 link
2025-05-13 Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models Donghoon Kim et.al. 2505.08622 null
2025-05-13 Symbolically-Guided Visual Plan Inference from Uncurated Video Data Wenyan Yang et.al. 2505.08444 null
2025-05-13 Identifying Memorization of Diffusion Models through p-Laplace Analysis Jonathan Brokman et.al. 2505.08246 link
2025-05-12 Image-Guided Microstructure Optimization using Diffusion Models: Validated with Li-Mn-rich Cathode Precursors Geunho Choi et.al. 2505.07906 null
2025-05-12 DanceGRPO: Unleashing GRPO on Visual Generation Zeyue Xue et.al. 2505.07818 null
2025-05-12 ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models Ozgur Kara et.al. 2505.07652 null
2025-05-12 Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning Bohan Wang et.al. 2505.07538 null
2025-05-12 Addressing degeneracies in latent interpolation for diffusion models Erik Landolsi et.al. 2505.07481 null
2025-05-13 Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model Wei Li et.al. 2505.07449 link
2025-05-12 GAN-based synthetic FDG PET images from T1 brain MRI can serve to improve performance of deep unsupervised anomaly detection models Daria Zotova et.al. 2505.07364 null
2025-05-12 Generative Pre-trained Autoregressive Diffusion Transformer Yuan Zhang et.al. 2505.07344 null
2025-05-12 Metrics that matter: Evaluating image quality metrics for medical image generation Yash Deo et.al. 2505.07175 link
2025-05-11 DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models Junhao Xia et.al. 2505.07057 null
2025-05-11 Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation Md. Naimur Asif Borno et.al. 2505.06995 null
2025-05-09 Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation Dongying Li et.al. 2505.06117 null
2025-05-09 Discovery of the Polar Ring Galaxies with deep learning D. V. Dobrycheva et.al. 2505.05890 null
2025-05-09 Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition Zhiyuan Chen et.al. 2505.05829 link
2025-05-08 InstanceGen: Image Generation with Instance-level Instructions Etai Sella et.al. 2505.05678 link
2025-05-08 A Preliminary Study for GPT-4o on Image Restoration Hao Yang et.al. 2505.05621 link
2025-05-11 Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation Chao Liao et.al. 2505.05472 null
2025-05-08 Normalize Everything: A Preconditioned Magnitude-Preserving Architecture for Diffusion-Based Speech Enhancement Julius Richter et.al. 2505.05216 null
2025-05-12 PIDiff: Image Customization for Personalized Identities with Diffusion Models Jinyu Gu et.al. 2505.05081 null
2025-05-08 T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models Xuyang Guo et.al. 2505.04946 null
2025-05-07 CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation Viacheslav Vasilev et.al. 2505.04851 null
2025-05-07 Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers Divyansh Srivastava et.al. 2505.04718 null
2025-05-08 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Teng Hu et.al. 2505.04512 null
2025-05-08 Defining and Quantifying Creative Behavior in Popular Image Generators Aditi Ramaswamy et.al. 2505.04497 null
2025-05-07 Efficient Flow Matching using Latent Variables Anirban Samaddar et.al. 2505.04486 null
2025-05-07 Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety Variath Madhupal Gautham Nair et.al. 2505.04146 null
2025-05-07 RFNNS: Robust Fixed Neural Network Steganography with Popular Deep Generative Models Yu Cheng et.al. 2505.04116 null
2025-05-06 Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators Will Hawkins et.al. 2505.03859 link
2025-05-06 Revolutionizing Brain Tumor Imaging: Generating Synthetic 3D FA Maps from T1-Weighted MRI using CycleGAN Models Xin Du et.al. 2505.03662 null
2025-05-06 Real-Time Person Image Synthesis Using a Flow Matching Model Jiwoo Jeong et.al. 2505.03562 link
2025-05-06 Safer Prompts: Reducing IP Risk in Visual Generative AI Lena Reissinger et.al. 2505.03338 null
2025-05-06 Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Yibin Wang et.al. 2505.03318 null
2025-05-06 Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights Zhaiming Shen et.al. 2505.03205 null
2025-05-05 Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models Kuofeng Gao et.al. 2505.02824 link
2025-05-06 MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation Mingcheng Li et.al. 2505.02648 null
2025-05-07 Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Xinjie Zhang et.al. 2505.02567 link
2025-05-05 Text to Image Generation and Editing: A Survey Pengfei Yang et.al. 2505.02527 null
2025-05-07 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Inclusion AI et.al. 2505.02471 link
2025-05-04 Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset Jakub Wąsala et.al. 2505.02255 null
2025-05-04 Improving Physical Object State Representation in Text-to-Image Generative Systems Tianle Chen et.al. 2505.02236 link
2025-05-04 DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization Wenchuan Wang et.al. 2505.02192 null
2025-05-06 Regression is all you need for medical image translation Sebastian Rassmann et.al. 2505.02048 link
2025-05-03 PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth Bu Jin et.al. 2505.01729 null
2025-05-02 FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis Jiangtong Tan et.al. 2505.01172 link
2025-05-02 Improving Editability in Image Generation with Layer-wise Memory Daneul Kim et.al. 2505.01079 null
2025-05-01 Controllable Weather Synthesis and Removal with Video Diffusion Models Chih-Hao Lin et.al. 2505.00704 null
2025-05-01 T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Dongzhi Jiang et.al. 2505.00703 link
2025-05-01 JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers Kwon Byung-Ki et.al. 2505.00482 null
2025-05-01 T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation Xuyang Guo et.al. 2505.00337 null
2025-04-30 Direct Motion Models for Assessing Generated Videos Kelsey Allen et.al. 2505.00209 null
2025-04-30 Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis Michal Geyer et.al. 2505.00135 null
2025-04-30 ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction Qihao Liu et.al. 2504.21855 null
2025-04-30 3D Stylization via Large Reconstruction Model Ipek Oztas et.al. 2504.21836 null
2025-04-30 Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields Yixin Gao et.al. 2504.21814 null
2025-04-30 HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation Haiyang Zhou et.al. 2504.21650 link
2025-04-30 Latent Feature-Guided Conditional Diffusion for High-Fidelity Generative Image Semantic Communication Zehao Chen et.al. 2504.21577 null
2025-04-30 Simple Visual Artifact Detection in Sora-Generated Videos Misora Sugiyama et.al. 2504.21334 null
2025-04-30 Text-Conditioned Diffusion Model for High-Fidelity Korean Font Generation Abdul Sami et.al. 2504.21325 null
2025-04-30 Capturing Conditional Dependence via Auto-regressive Diffusion Models Xunpeng Huang et.al. 2504.21314 null
2025-04-30 AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images Yunhao Li et.al. 2504.21308 null
2025-04-30 Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions ZiYi Dong et.al. 2504.21292 null
2025-04-29 YoChameleon: Personalized Vision and Language Generation Thao Nguyen et.al. 2504.20998 null
2025-04-29 TesserAct: Learning 4D Embodied World Models Haoyu Zhen et.al. 2504.20995 null
2025-04-29 DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs Hao Luan et.al. 2504.20754 null
2025-04-29 Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion Zesheng Wang et.al. 2504.20685 null
2025-04-29 Advance Fake Video Detection via Vision Transformers Joy Battocchio et.al. 2504.20669 null
2025-04-30 PixelHacker: Image Inpainting with Structural and Semantic Consistency Ziyang Xu et.al. 2504.20438 null
2025-04-29 Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems Shiqian Zhao et.al. 2504.20376 null
2025-04-29 A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks Khoi Trinh et.al. 2504.20340 null
2025-04-28 Physics-Informed Diffusion Models for SAR Ship Wake Generation from Text Prompts Kamirul Kamirul et.al. 2504.20241 null
2025-04-28 CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition Quynh Phung et.al. 2504.19894 null
2025-04-28 RepText: Rendering Visual Text via Replicating Haofan Wang et.al. 2504.19724 null
2025-04-28 DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer Junpeng Jiang et.al. 2504.19614 null
2025-04-28 Image Generation Method Based on Heat Diffusion Models Pengfei Zhang et.al. 2504.19600 null
2025-04-29 WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution Pietro Bongini et.al. 2504.19595 null
2025-04-28 GenPTW: In-Generation Image Watermarking for Provenance Tracing and Tamper Localization Zhenliang Gan et.al. 2504.19567 null
2025-04-28 Masked Language Prompting for Generative Data Augmentation in Few-shot Fashion Style Recognition Yuki Hirakawa et.al. 2504.19455 null
2025-04-27 Flow Along the K-Amplitude for Generative Modeling Weitao Du et.al. 2504.19353 null
2025-04-26 Predicting Stress in Two-phase Random Materials and Super-Resolution Method for Stress Images by Embedding Physical Information Tengfei Xing et.al. 2504.18854 null
2025-04-26 Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning Yifan Xie et.al. 2504.18810 null
2025-04-25 NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration Haotian Dong et.al. 2504.18448 null
2025-04-25 HepatoGEN: Generating Hepatobiliary Phase MRI with Perceptual and Adversarial Models Jens Hooge et.al. 2504.18405 null
2025-04-24 Fast Autoregressive Models for Continuous Latent Generation Tiankai Hang et.al. 2504.18391 null
2025-04-25 TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation Shintaro Ozaki et.al. 2504.18269 null
2025-04-25 Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding Kun Li et.al. 2504.18204 null
2025-04-25 Diffusion-Driven Universal Model Inversion Attack for Face Recognition Hanrui Wang et.al. 2504.18015 null
2025-04-27 Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Xu Ma et.al. 2504.17789 null
2025-04-24 Dynamic Camera Poses and Where to Find Them Chris Rockwell et.al. 2504.17788 null
2025-04-24 Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields Zhuo He et.al. 2504.17712 null
2025-04-24 STCL:Curriculum learning Strategies for deep learning image steganography models Fengchun Liu et.al. 2504.17609 link
2025-04-24 Text-to-Image Alignment in Denoising-Based Models through Step Selection Paul Grimal et.al. 2504.17525 null
2025-04-24 RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Aviv Slobodkin et.al. 2504.17502 null
2025-04-24 StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies Xu Wang et.al. 2504.17401 null
2025-04-24 DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition Yiyan Xu et.al. 2504.17349 null
2025-04-24 Physics-based super-resolved simulation of 3D elastic wave propagation adopting scalable Diffusion Transformer Hugo Gabrielidis et.al. 2504.17308 null
2025-04-24 Towards Generalized and Training-Free Text-Guided Semantic Manipulation Yu Hong et.al. 2504.17269 null
2025-04-23 BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation Ruotong Wang et.al. 2504.16907 null
2025-04-23 ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance Ying Li et.al. 2504.16464 null
2025-04-23 CLPSTNet: A Progressive Multi-Scale Convolutional Steganography Model Integrating Curriculum Learning Fengchun Liu et.al. 2504.16364 link
2025-04-23 VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models Xuming Hu et.al. 2504.16359 null
2025-04-22 Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching Junn Yong Loo et.al. 2504.16262 null
2025-04-22 Survey of Video Diffusion Models: Foundations, Implementations, and Applications Yimu Wang et.al. 2504.16081 link
2025-04-22 Boosting Generative Image Modeling via Joint Image-Feature Synthesis Theodoros Kouzelis et.al. 2504.16064 null
2025-04-22 Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework Xinyuan Song et.al. 2504.16016 null
2025-04-22 FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation Zebin Yao et.al. 2504.15958 link
2025-04-22 Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning Wang Lin et.al. 2504.15932 null
2025-04-22 DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers Xuyang Zhong et.al. 2504.15827 null
2025-04-22 Satellite to GroundScape – Large-scale Consistent Ground View Generation from Satellite Views Ningli Xu et.al. 2504.15786 null
2025-04-22 DiTPainter: Efficient Video Inpainting with Diffusion Transformers Xian Wu et.al. 2504.15661 null
2025-04-21 Emergence and Evolution of Interpretable Concepts in Diffusion Models Berk Tinaz et.al. 2504.15473 null
2025-04-21 Solving New Tasks by Adapting Internet Video Knowledge Calvin Luo et.al. 2504.15369 null
2025-04-22 LACE: Controlled Image Prompting and Iterative Refinement with GenAI for Professional Visual Art Creators Yenkai Huang et.al. 2504.15189 null
2025-04-21 Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform Xianpan Zhou et.al. 2504.15182 null
2025-04-21 Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration Junyuan Deng et.al. 2504.15159 null
2025-04-21 GIFDL: Generated Image Fluctuation Distortion Learning for Enhancing Steganographic Security Xiangkun Wang et.al. 2504.15139 null
2025-04-22 VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation Mingxia Zhan et.al. 2504.15095 null
2025-04-21 DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation Weijie He et.al. 2504.15032 null
2025-04-21 TWIG: Two-Step Image Generation using Segmentation Masks in Diffusion Models Mazharul Islam Rakib et.al. 2504.14933 null
2025-04-21 Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation Chenjie Cao et.al. 2504.14899 link
2025-04-21 Twin Co-Adaptive Dialogue for Progressive Image Generation Jianhui Wang et.al. 2504.14868 null
2025-04-21 LACE: Exploring Turn-Taking and Parallel Interaction Modes in Human-AI Co-Creation for Iterative Image Generation YenKai Huang et.al. 2504.14827 null
2025-04-18 MLEP: Multi-granularity Local Entropy Patterns for Universal AI-generated Image Detection Lin Yuan et.al. 2504.13726 null
2025-04-18 SupResDiffGAN a new approach for the Super-Resolution task Dawid Kopeć et.al. 2504.13622 null
2025-04-18 U-Shape Mamba: State Space Model for faster diffusion Alex Ergasti et.al. 2504.13499 link
2025-04-18 Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing Joowon Kim et.al. 2504.13490 null
2025-04-18 POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation Evans Xu Han et.al. 2504.13392 null
2025-04-17 SMPL-GPTexture: Dual-View 3D Human Texture Estimation using Text-to-Image Generation Models Mingxiao Tu et.al. 2504.13378 null
2025-04-17 Personalized Text-to-Image Generation with Auto-Regressive Models Kaiyue Sun et.al. 2504.13162 link
2025-04-18 SkyReels-V2: Infinite-length Film Generative Model Guibin Chen et.al. 2504.13074 link
2025-04-17 HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation Wenqi Dong et.al. 2504.13072 null
2025-04-17 ArtistAuditor: Auditing Artist Style Pirate in Text-to-Image Generation Models Linkang Du et.al. 2504.13061 link
2025-04-17 RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins Yao Mu et.al. 2504.13059 null
2025-04-17 SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding Qianqian Sun et.al. 2504.12704 null
2025-04-17 Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Lvmin Zhang et.al. 2504.12626 link
2025-04-17 Prompt-Driven and Training-Free Forgetting Approach and Dataset for Large Language Models Zhenyu Yu et.al. 2504.12574 null
2025-04-16 InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework Jiale Tao et.al. 2504.12395 link
2025-04-16 VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate Zhihang Yuan et.al. 2504.12259 link
2025-04-16 SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction Xia Wang et.al. 2504.12245 null
2025-04-16 Cobra: Efficient Line Art COlorization with BRoAder References Junhao Zhuang et.al. 2504.12240 null
2025-04-16 Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM Zirui Pan et.al. 2504.12048 null
2025-04-16 Instruction-augmented Multimodal Alignment for Image-Text and Element Matching Xinli Yue et.al. 2504.12018 null
2025-04-16 Novel-view X-ray Projection Synthesis through Geometry-Integrated Deep Learning Daiqi Liu et.al. 2504.11953 link
2025-04-16 Mind2Matter: Creating 3D Models from EEG Signals Xia Deng et.al. 2504.11936 link
2025-04-16 The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation Bingjie Gao et.al. 2504.11739 null
2025-04-16 Towards Safe Synthetic Image Generation On the Web: A Multimodal Robust NSFW Defense and Million Scale Dataset Muhammad Shahid Muneer et.al. 2504.11707 link
2025-04-15 Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception Ziqi Pang et.al. 2504.11457 link
2025-04-15 ADT: Tuning Diffusion Models with Adversarial Supervision Dazhong Shen et.al. 2504.11423 null
2025-04-15 VideoPanda: Video Panoramic Diffusion with Multi-view Attention Kevin Xie et.al. 2504.11389 null
2025-04-15 Omni$^2$: Unifying Omnidirectional Image Generation and Editing in an Omni Model Liu Yang et.al. 2504.11379 null
2025-04-16 Seedream 3.0 Technical Report Yu Gao et.al. 2504.11346 null
2025-04-15 Using LLMs as prompt modifier to avoid biases in AI image generators René Peinl et.al. 2504.11104 null
2025-04-15 AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era Chenyang Zhu et.al. 2504.11015 null
2025-04-15 InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation Yukang Lin et.al. 2504.10905 null
2025-04-15 Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models Karan Jain et.al. 2504.10883 null
2025-04-15 OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding Dianbing Xi et.al. 2504.10825 null
2025-04-14 Art3D: Training-Free 3D Generation from Flat-Colored Illustration Xiaoyan Cong et.al. 2504.10466 null
2025-04-14 Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing Taihang Hu et.al. 2504.10434 link
2025-04-14 FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos Rui Chen et.al. 2504.10358 null
2025-04-14 InstructEngine: Instruction-driven Text-to-Image Alignment Xingyu Lu et.al. 2504.10329 null
2025-04-14 VibrantLeaves: A principled parametric image generator for training deep restoration models Raphael Achddou et.al. 2504.10201 link
2025-04-14 GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions Jo-Ku Cheng et.al. 2504.10146 link
2025-04-14 Aligning Anime Video Generation with Human Feedback Bingwen Zhu et.al. 2504.10044 null
2025-04-14 Masked Autoencoder Self Pre-Training for Defect Detection in Microelectronics Nikolai Röhrich et.al. 2504.10021 null
2025-04-14 Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes Huijie Liu et.al. 2504.09948 null
2025-04-14 EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise Chao Liu et.al. 2504.09789 null
2025-04-11 GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Tianwei Xiong et.al. 2504.08736 link
2025-04-11 Generating Fine Details of Entity Interactions Xinyi Gu et.al. 2504.08714 null
2025-04-11 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Team Seawead et.al. 2504.08685 null
2025-04-11 Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization Jialu Li et.al. 2504.08641 null
2025-04-11 Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging Gabriele Lozupone et.al. 2504.08635 link
2025-04-11 Discriminator-Free Direct Preference Optimization for Video Diffusion Haoran Cheng et.al. 2504.08542 null
2025-04-11 On the Design of Diffusion-based Neural Speech Codecs Pietro Foti et.al. 2504.08470 null
2025-04-11 Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion Weiye Chen et.al. 2504.08451 link
2025-04-11 Diffusion Models for Robotic Manipulation: A Survey Rosa Wolf et.al. 2504.08438 null
2025-04-11 MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization Daeun Kim et.al. 2504.08398 null
2025-04-10 PixelFlow: Pixel-Space Generative Models with Flow Shoufa Chen et.al. 2504.07963 link
2025-04-10 Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction Zeren Jiang et.al. 2504.07961 link
2025-04-10 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Zhong-Yu Li et.al. 2504.07960 null
2025-04-10 Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos Rundong Luo et.al. 2504.07940 null
2025-04-10 DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows Mashrur M. Morshed et.al. 2504.07894 null
2025-04-10 Towards Sustainable Creativity Support: An Exploratory Study on Prompt Based Image Generation Daniel Hove Paludan et.al. 2504.07879 null
2025-04-10 Diffusion Transformers for Tabular Data Time Series Generation Fabrizio Garuti et.al. 2504.07566 link
2025-04-10 FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation Linyan Huang et.al. 2504.07405 null
2025-04-10 ID-Booth: Identity-consistent Face Generation with Diffusion Models Darian Tomašević et.al. 2504.07392 link
2025-04-10 Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-Reconstruction Qingchao Jiang et.al. 2504.07382 link
2025-04-09 OmniCaptioner: One Captioner to Rule Them All Yiting Lu et.al. 2504.07089 link
2025-04-09 A Unified Agentic Framework for Evaluating Conditional Image Generation Jifang Wang et.al. 2504.07046 link
2025-04-09 EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation Diljeet Jagpal et.al. 2504.06861 null
2025-04-09 DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation Wangbo Zhao et.al. 2504.06803 link
2025-04-09 A Meaningful Perturbation Metric for Evaluating Explainability Methods Danielle Cohen et.al. 2504.06800 null
2025-04-10 Compass Control: Multi Object Orientation Control for Text-to-Image Generation Rishubh Parihar et.al. 2504.06752 null
2025-04-09 RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism Elia Peruzzo et.al. 2504.06672 null
2025-04-09 Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception Ruotian Peng et.al. 2504.06666 null
2025-04-09 Collision avoidance from monocular vision trained with novel view synthesis Valentin Tordjman–Levavasseur et.al. 2504.06651 null
2025-04-09 PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering Yifan Gao et.al. 2504.06632 null
2025-04-08 Transfer between Modalities with MetaQueries Xichen Pan et.al. 2504.06256 null
2025-04-08 HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance Jiazi Bu et.al. 2504.06232 null
2025-04-08 A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model Jihun Park et.al. 2504.06144 null
2025-04-08 CamContextI2V: Context-aware Controllable Video Generation Luis Denninger et.al. 2504.06022 link
2025-04-08 An Empirical Study of GPT-4o Image Generation Capabilities Sixiang Chen et.al. 2504.05979 link
2025-04-08 Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking Junxi Chen et.al. 2504.05838 link
2025-04-08 Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models Jiahao Chen et.al. 2504.05815 null
2025-04-08 Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling Jaskirat Singh et.al. 2504.05800 null
2025-04-07 Gaussian Mixture Flow Matching Models Hansheng Chen et.al. 2504.05304 link
2025-04-07 One-Minute Video Generation with Test-Time Training Karan Dalal et.al. 2504.05298 null
2025-04-07 Video-Bench: Human-Aligned Video Generation Benchmark Hui Han et.al. 2504.04907 null
2025-04-07 Imagining the Far East: Exploring Perceived Biases in AI-Generated Images of East Asian Women Xingyu Lan et.al. 2504.04865 null
2025-04-07 AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation Xiongbo Lu et.al. 2504.04743 null
2025-04-08 Your Image Generator Is Your New Private Dataset Nicolo Resmini et.al. 2504.04582 null
2025-04-06 Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification Shijian Wang et.al. 2504.04510 null
2025-04-06 UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding Yang Jiao et.al. 2504.04423 link
2025-04-05 SDEIT: Semantic-Driven Electrical Impedance Tomography Dong Liu et.al. 2504.04185 null
2025-04-05 Learning about the Physical World through Analytic Concepts Jianhua Sun et.al. 2504.04170 null
2025-04-04 MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Wulin Xie et.al. 2504.03641 null
2025-04-04 Dynamic Importance in Diffusion U-Net for Enhanced Image Synthesis Xi Wang et.al. 2504.03471 link
2025-04-04 QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning Quanxing Xu et.al. 2504.03337 null
2025-04-04 Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models Xuran Ma et.al. 2504.03140 link
2025-04-03 How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models Pascal Chang et.al. 2504.03072 null
2025-04-03 VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning Xianwei Zhuang et.al. 2504.02949 link
2025-04-03 Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments Chenyu Zhang et.al. 2504.02918 null
2025-04-03 Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Chuning Zhu et.al. 2504.02792 null
2025-04-03 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Zhiyuan Yan et.al. 2504.02782 link
2025-04-03 Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model Shengjun Zhang et.al. 2504.02764 null
2025-04-03 RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models ZhongLi Fang et.al. 2504.02640 null
2025-04-03 Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation Jiwoo Chung et.al. 2504.02612 link
2025-04-04 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Fa-Ting Hong et.al. 2504.02542 link
2025-04-03 ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer Jiayi Gao et.al. 2504.02451 link
2025-04-03 SkyReels-A2: Compose Anything in Video Diffusion Transformers Zhengcong Fei et.al. 2504.02436 link
2025-04-04 MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition Takahiro Shirakawa et.al. 2504.02361 null
2025-04-03 OmniCam: Unified Multimodal Video Generation via Camera Control Xiaoda Yang et.al. 2504.02312 null
2025-04-03 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Hanyang Wang et.al. 2504.01956 null
2025-04-03 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Runhui Huang et.al. 2504.01934 null
2025-04-02 FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs Mothilal Asokan et.al. 2504.01916 link
2025-04-02 Instance Migration Diffusion for Nuclear Instance Segmentation in Pathology Lirui Qi et.al. 2504.01577 null
2025-04-02 High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model Yiyang Shen et.al. 2504.01512 null
2025-04-01 Prompting Forgetting: Unlearning in GANs via Textual Guidance Piyush Nagasubramaniam et.al. 2504.01218 null
2025-04-01 Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models Guy Kaplan et.al. 2504.01137 link
2025-04-01 ShieldGemma 2: Robust and Tractable Image Content Moderation Wenjun Zeng et.al. 2504.01081 null
2025-04-01 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction Junhao Cheng et.al. 2504.01014 link
2025-04-01 MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Siyuan Li et.al. 2504.00999 link
2025-03-31 RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Zhonghan Zhao et.al. 2503.24388 null
2025-03-31 Consistent Subject Generation via Contrastive Instantiated Concepts Lee Hsin-Ying et.al. 2503.24387 null
2025-03-31 Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Shengqiong Wu et.al. 2503.24379 null
2025-03-31 Style Quantization for Data-Efficient GAN Training Jian Wang et.al. 2503.24282 null
2025-03-31 FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics Yixuan Li et.al. 2503.24267 null
2025-03-31 Threats and Opportunities in AI-generated Images for Armed Forces Raphael Meier et.al. 2503.24095 null
2025-04-01 HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation Boyuan Wang et.al. 2503.24026 null
2025-03-31 JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation Fangda Chen et.al. 2503.23951 null
2025-03-31 AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents Jiaxiang Chen et.al. 2503.23948 link
2025-04-01 On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices Bosung Kim et.al. 2503.23796 link
2025-03-28 Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure Frank J. Brooks et.al. 2503.22658 null
2025-03-28 Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model Jangho Park et.al. 2503.22622 null
2025-03-28 EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation Hadrien Reynaud et.al. 2503.22357 null
2025-03-28 Meta-LoRA: Meta-Learning LoRA Components for Domain-Aware ID Personalization Barış Batuhan Topal et.al. 2503.22352 null
2025-03-28 CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving Yishen Ji et.al. 2503.22231 null
2025-03-28 Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces Wonhyeok Choi et.al. 2503.22209 null
2025-03-28 ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation Yunhong Min et.al. 2503.22194 null
2025-03-28 Sell It Before You Make It: Revolutionizing E-Commerce with Personalized AI-Generated Items Jianghao Lin et.al. 2503.22182 null
2025-03-28 An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval Min Cao et.al. 2503.22171 link
2025-03-28 Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis Woojung Han et.al. 2503.22168 null
2025-03-27 VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Chi-Pin Huang et.al. 2503.21781 null
2025-03-27 Optimal Stepsize for Diffusion Sampling Jianning Pei et.al. 2503.21774 link
2025-03-27 Exploring the Evolution of Physics Cognition in Video Generation: A Survey Minghui Lin et.al. 2503.21765 link
2025-03-27 Lumina-Image 2.0: A Unified and Efficient Image Generative Framework Qi Qin et.al. 2503.21758 link
2025-03-27 A Unified Framework for Diffusion Bridge Problems: Flow Matching and Schrödinger Matching into One Minyoung Kim et.al. 2503.21756 null
2025-03-27 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Dian Zheng et.al. 2503.21755 link
2025-03-27 CTRL-O: Language-Controllable Object-Centric Visual Representation Learning Aniket Didolkar et.al. 2503.21747 null
2025-03-27 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models Yuhan Zhang et.al. 2503.21745 null
2025-03-27 Audio-driven Gesture Generation via Deviation Feature in the Latent Space Jiahui Chen et.al. 2503.21616 null
2025-03-27 Zero-Shot Visual Concept Blending Without Text Guidance Hiroya Makino et.al. 2503.21277 link
2025-03-26 High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching Guoqiang Zhang et.al. 2503.20744 null
2025-03-26 RecTable: Fast Modeling Tabular Data with Rectified Flow Masane Fuchi et.al. 2503.20731 link
2025-03-26 BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation Yuyang Peng et.al. 2503.20672 null
2025-03-26 AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports Xiangwen Zhang et.al. 2503.20654 null
2025-03-26 MMGen: Unified Multi-modal Image Generation and Understanding in One Go Jiepeng Wang et.al. 2503.20644 null
2025-03-26 GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving Lloyd Russell et.al. 2503.20523 null
2025-03-26 VPO: Aligning Text-to-Video Generation Models with Prompt Optimization Jiale Cheng et.al. 2503.20491 link
2025-03-26 Wan: Open and Advanced Large-Scale Video Generative Models WanTeam et.al. 2503.20314 link
2025-03-26 Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models Prin Phunyaphibarn et.al. 2503.20240 null
2025-03-26 Video Motion Graphs Haiyang Liu et.al. 2503.20218 null
2025-03-25 FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Xuan Ju et.al. 2503.19907 null
2025-03-25 Scaling Down Text Encoders of Text-to-Image Diffusion Models Lifu Wang et.al. 2503.19897 link
2025-03-25 Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation Tianhao Qi et.al. 2503.19881 null
2025-03-25 AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers Jiazhi Guan et.al. 2503.19824 null
2025-03-25 SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation Jingdan Kang et.al. 2503.19791 link
2025-03-25 Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models Kartik Thakral et.al. 2503.19783 null
2025-03-25 PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models Junhyuk So et.al. 2503.19731 null
2025-03-25 VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models Suhas G Hegde et.al. 2503.19530 null
2025-03-25 Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage Zhengwentai Sun et.al. 2503.19486 null
2025-03-25 AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset Haiyu Zhang et.al. 2503.19462 null
2025-03-25 Aether: Geometric-Aware Unified World Modeling Aether Team et.al. 2503.18945 null
2025-03-24 Video-T1: Test-Time Scaling for Video Generation Fangfu Liu et.al. 2503.18942 null
2025-03-24 Training-free Diffusion Acceleration with Bottleneck Sampling Ye Tian et.al. 2503.18940 null
2025-03-24 SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection Shrikant Malviya et.al. 2503.18812 link
2025-03-24 Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation Qin Wang et.al. 2503.18753 null
2025-03-24 Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings Cong Liu et.al. 2503.18719 null
2025-03-25 AMD-Hummingbird: Towards an Efficient Text-to-Video Model Takashi Isobe et.al. 2503.18559 link
2025-03-24 Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models Bin Li et.al. 2503.18556 null
2025-03-24 EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation Qiang Qu et.al. 2503.18552 null
2025-03-24 Can Text-to-Video Generation help Video-Language Alignment? Luca Zanella et.al. 2503.18507 null
2025-03-21 Position: Interactive Generative Video as Next-Generation Game Engine Jiwen Yu et.al. 2503.17359 null
2025-03-21 Leveraging Text-to-Image Generation for Handling Spurious Correlation Aryan Yazdan Parast et.al. 2503.17226 null
2025-03-21 D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens Panpan Wang et.al. 2503.17155 null
2025-03-21 Halton Scheduler For Masked Generative Image Transformer Victor Besnier et.al. 2503.17076 link
2025-03-21 Zero-Shot Styled Text Image Generation, but Make It Autoregressive Vittorio Pippi et.al. 2503.17074 null
2025-03-21 AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process Junjie Hu et.al. 2503.17029 null
2025-03-21 Enabling Versatile Controls for Video Diffusion Models Xu Zhang et.al. 2503.16983 link
2025-03-21 Multiple Ultrasound Image Generation based on Tuned Alignment of Amplitude Hologram over Spatially non-Uniform Ultrasound Source Keisuke Hasegawa et.al. 2503.16949 null
2025-03-21 Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model Yingying Fan et.al. 2503.16942 null
2025-03-21 When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO Lingfan Zhang et.al. 2503.16921 null
2025-03-20 XAttention: Block Sparse Attention with Antidiagonal Scoring Ruyi Xu et.al. 2503.16428 link
2025-03-20 Tokenize Image as a Set Zigang Geng et.al. 2503.16425 link
2025-03-20 MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance Quanhao Li et.al. 2503.16421 null
2025-03-20 SynCity: Training-Free Generation of 3D Worlds Paul Engstler et.al. 2503.16420 null
2025-03-20 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity Liming Jiang et.al. 2503.16418 link
2025-03-20 VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness SeungJu Cha et.al. 2503.16406 link
2025-03-20 ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos Haolin Yang et.al. 2503.16400 null
2025-03-20 LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images Leyang Wang et.al. 2503.16376 null
2025-03-20 Ultra-Resolution Adaptation with Ease Ruonan Yu et.al. 2503.16322 link
2025-03-20 Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction Ziyao Guo et.al. 2503.16194 null
2025-03-19 Di$\mathtt{[M]}$O: Distilling Masked Diffusion Models into One-step Generator Yuanzhi Zhu et.al. 2503.15457 null
2025-03-19 Temporal Regularization Makes Your Video Generator Stronger Harold Haodong Chen et.al. 2503.15417 null
2025-03-19 Visual Persona: Foundation Model for Full-Body Human Customization Jisu Nam et.al. 2503.15406 null
2025-03-19 TruthLens:A Training-Free Paradigm for DeepFake Detection Ritabrata Chakraborty et.al. 2503.15342 null
2025-03-19 TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models Teng-Fang Hsiao et.al. 2503.15283 null
2025-03-19 LEGION: Learning to Ground and Explain for Synthetic Image Detection Hengrui Kang et.al. 2503.15264 null
2025-03-19 Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization Feifei Li et.al. 2503.15197 null
2025-03-20 VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention Mingzhe Zheng et.al. 2503.15138 null
2025-03-20 Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis Imanol G. Estepa et.al. 2503.15060 null
2025-03-19 FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis Yaofei Duan et.al. 2503.14906 null
2025-03-18 MusicInfuser: Making Video Diffusion Listen and Dance Susung Hong et.al. 2503.14505 null
2025-03-18 Deeply Supervised Flow-Based Generative Models Inkyu Shin et.al. 2503.14494 null
2025-03-18 DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers Minglei Shi et.al. 2503.14487 null
2025-03-18 ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing Yulin Pan et.al. 2503.14482 null
2025-03-18 MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation Hongyu Zhang et.al. 2503.14428 null
2025-03-18 Impossible Videos Zechen Bai et.al. 2503.14378 null
2025-03-18 RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment Chao Wang et.al. 2503.14358 null
2025-03-18 LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models Yu Cheng et.al. 2503.14325 link
2025-03-18 Free-Lunch Color-Texture Disentanglement for Stylized Image Generation Jiang Qin et.al. 2503.14275 null
2025-03-18 Concat-ID: Towards Universal Identity-Preserving Video Synthesis Yong Zhong et.al. 2503.14151 null
2025-03-17 Unified Autoregressive Visual Generation and Understanding with Continuous Tokens Lijie Fan et.al. 2503.13436 null
2025-03-17 BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing Yaowei Li et.al. 2503.13434 null
2025-03-17 MAME: Multidimensional Adaptive Metamer Exploration with Human Perceptual Feedback Mina Kamao et.al. 2503.13212 null
2025-03-17 Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation Yihong Luo et.al. 2503.13070 null
2025-03-17 Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction Zheyuan Liu et.al. 2503.12953 null
2025-03-17 DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode Junjia Huang et.al. 2503.12838 null
2025-03-17 AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations Quang Trung Truong et.al. 2503.12828 null
2025-03-17 GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching Feng Qiao et.al. 2503.12720 link
2025-03-16 UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing Tsu-Jui Fu et.al. 2503.12652 null
2025-03-16 Personalize Anything for Free with Diffusion Transformer Haoran Feng et.al. 2503.12590 null
2025-03-14 ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Jianhong Bai et.al. 2503.11647 null
2025-03-14 HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models Ziqin Zhou et.al. 2503.11513 null
2025-03-14 T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation Seyed Mohammad Hadi Hosseini et.al. 2503.11481 null
2025-03-14 TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Hongxiang Zhao et.al. 2503.11423 null
2025-03-14 Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking Ziyi Wang et.al. 2503.11324 null
2025-03-14 Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model Haoyang Huang et.al. 2503.11251 link
2025-03-14 Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards Zijing Hu et.al. 2503.11240 link
2025-03-14 Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption Du Chen et.al. 2503.11221 null
2025-03-14 Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation Fengchen He et.al. 2503.11213 link
2025-03-14 Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models Shree Singhi et.al. 2503.11195 null
2025-03-13 GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Rongyao Fang et.al. 2503.10639 link
2025-03-13 DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation Chen Chen et.al. 2503.10618 null
2025-03-13 CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models Hao He et.al. 2503.10592 null
2025-03-13 Long Context Tuning for Video Generation Yuwei Guo et.al. 2503.10589 null
2025-03-13 Autoregressive Image Generation with Randomized Parallel Decoding Haopeng Li et.al. 2503.10568 link
2025-03-13 RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models Yijing Lin et.al. 2503.10406 null
2025-03-13 CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Yufan Deng et.al. 2503.10391 null
2025-03-13 ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation Zirun Guo et.al. 2503.10358 null
2025-03-13 Do I look like a cat.n.01 to you? A Taxonomy Image Generation Benchmark Viktor Moskvoretskii et.al. 2503.10357 null
2025-03-13 MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment Hao Zhou et.al. 2503.10287 null
2025-03-12 PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop Chenyu Li et.al. 2503.09595 link
2025-03-12 FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model Jiahao Xia et.al. 2503.09560 null
2025-03-12 PromptMap: An Alternative Interaction Style for AI-Based Image Generation Krzysztof Adamkiewicz et.al. 2503.09436 link
2025-03-12 LHC Triggers using FPGA Image Recognition James Brooke et.al. 2503.09428 null
2025-03-12 Unified Dense Prediction of Video Diffusion Lehan Yang et.al. 2503.09344 null
2025-03-12 Revealing the Implicit Noise-based Imprint of Generative Models Xinghan Li et.al. 2503.09314 null
2025-03-12 Revealing Unintentional Information Leakage in Low-Dimensional Facial Portrait Representations Kathleen Anderson et.al. 2503.09306 link
2025-03-12 UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer Haoxuan Wang et.al. 2503.09277 null
2025-03-12 NAMI: Efficient Image Generation via Progressive Rectified Flow Transformers Yuhang Ma et.al. 2503.09242 null
2025-03-12 Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets Hannah Kniesel et.al. 2503.09221 null
2025-03-11 GarmentCrafter: Progressive Novel View Synthesis for Single-View 3D Garment Reconstruction and Editing Yuanhao Wang et.al. 2503.08678 null
2025-03-11 REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder Yitian Zhang et.al. 2503.08665 null
2025-03-11 Generating Robot Constitutions & Benchmarks for Semantic Safety Pierre Sermanet et.al. 2503.08663 null
2025-03-11 LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Xianfeng Wu et.al. 2503.08619 link
2025-03-11 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling Subin Kim et.al. 2503.08605 null
2025-03-11 Generalizable AI-Generated Image Detection Based on Fractal Self-Similarity in the Spectrum Shengpeng Xiao et.al. 2503.08484 null
2025-03-12 Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens Qingsong Xie et.al. 2503.08377 null
2025-03-11 Robust Latent Matters: Boosting Image Generation with Sampling Error Kai Qiu et.al. 2503.08354 link
2025-03-12 $^R$FLAV: Rolling Flow matching for infinite Audio Video generation Alex Ergasti et.al. 2503.08307 link
2025-03-11 OminiControl2: Efficient Conditioning for Diffusion Transformers Zhenxiong Tan et.al. 2503.08280 link
2025-03-10 V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation Guiwei Zhang et.al. 2503.07493 link
2025-03-10 GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models Ryugo Morita et.al. 2503.07463 null
2025-03-10 AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion Mingzhen Sun et.al. 2503.07418 null
2025-03-10 TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models Ruidong Chen et.al. 2503.07389 link
2025-03-10 Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment Xing Xie et.al. 2503.07334 link
2025-03-10 Automated Movie Generation via Multi-Agent CoT Planning Weijia Wu et.al. 2503.07314 link
2025-03-10 WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation Yuwei Niu et.al. 2503.07265 link
2025-03-10 Effective and Efficient Masked Image Generation Models Zebin You et.al. 2503.07197 link
2025-03-10 NFIG: Autoregressive Image Generation with Next-Frequency Prediction Zhihao Huang et.al. 2503.07076 null
2025-03-10 TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation Victor Shea-Jay Huang et.al. 2503.07050 null
2025-03-07 Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models Zheng Li et.al. 2503.05595 link
2025-03-07 Frequency Autoregressive Image Generation with Continuous Tokens Hu Yu et.al. 2503.05305 null
2025-03-07 MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio Xuenan Xu et.al. 2503.05242 link
2025-03-07 Unified Reward Model for Multimodal Understanding and Generation Yibin Wang et.al. 2503.05236 null
2025-03-07 RecipeGen: A Benchmark for Real-World Recipe Image Generation Ruoxuan Zhang et.al. 2503.05228 null
2025-03-07 Development and Enhancement of Text-to-Image Diffusion Models Rajdeep Roshan Sahu et.al. 2503.05149 null
2025-03-06 Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation Alexey Buzovkin et.al. 2503.04871 link
2025-03-06 FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video Yue Gao et.al. 2503.04720 null
2025-03-06 What Are You Doing? A Closer Look at Controllable Human Video Generation Emanuele Bugliarello et.al. 2503.04666 null
2025-03-08 The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation Aoxiong Yin et.al. 2503.04606 link
2025-03-06 S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting Yecong Wan et.al. 2503.04314 null
2025-03-06 Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models Rui Jiang et.al. 2503.04215 null
2025-03-06 Underlying Semantic Diffusion for Effective and Efficient In-Context Learning Zhong Ji et.al. 2503.04050 null
2025-03-06 DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation Amin Karimi et.al. 2503.04006 null
2025-03-05 GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Xuanchi Ren et.al. 2503.03751 link
2025-03-05 Rethinking Video Tokenization: A Conditioned Diffusion-based Approach Nianzu Yang et.al. 2503.03708 link
2025-03-05 DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance Zhao Yang et.al. 2503.03689 link
2025-03-05 A Generative Approach to High Fidelity 3D Reconstruction from Text Data Venkat Kumar R et.al. 2503.03664 null
2025-03-05 High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights Yuna Kato et.al. 2503.03558 link
2025-03-05 Video Super-Resolution: All You Need is a Video Diffusion Model Zhihao Zhan et.al. 2503.03355 null
2025-03-05 GenColor: Generative Color-Concept Association in Visual Design Yihan Hou et.al. 2503.03236 null
2025-03-05 An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models Binxu Wang et.al. 2503.03206 null
2025-03-05 Find Matching Faces Based On Face Parameters Setu A. Bhatt et.al. 2503.03204 null
2025-03-05 From Architectural Sketch to Conceptual Representation: Using Structure-Aware Diffusion Model to Generate Renderings of School Buildings Zhengyang Wang et.al. 2503.03090 null
2025-03-04 ARINAR: Bi-Level Autoregressive Feature-by-Feature Generative Models Qinyu Zhao et.al. 2503.02883 link
2025-03-04 Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts Marta Skreta et.al. 2503.02819 link
2025-03-04 Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution Ru Ito et.al. 2503.02767 null
2025-03-04 Generative Modeling of Microweather Wind Velocities for Urban Air Mobility Tristan A. Shah et.al. 2503.02690 link
2025-03-04 SPG: Improving Motion Diffusion by Smooth Perturbation Guidance Boseong Jeon et.al. 2503.02577 null
2025-03-04 PVTree: Realistic and Controllable Palm Vein Generation for Recognition Tasks Sheng Shang et.al. 2503.02547 null
2025-03-04 RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification Zhen Yang et.al. 2503.02537 null
2025-03-04 Q&C: When Quantization Meets Cache in Efficient Image Generation Xin Ding et.al. 2503.02508 null
2025-03-04 Teaching Metric Distance to Autoregressive Multimodal Foundational Models Jiwan Chung et.al. 2503.02379 null
2025-03-04 GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning Zhun Mou et.al. 2503.02341 null
2025-02-28 How far can we go with ImageNet for Text-to-Image generation? L. Degeorge et.al. 2502.21318 null
2025-02-28 Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos Zhiyu Tan et.al. 2502.21314 null
2025-03-03 MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing Xueyun Tian et.al. 2502.21291 link
2025-02-28 A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images Zineb Sordo et.al. 2502.21151 null
2025-02-28 Training-free and Adaptive Sparse Attention for Efficient Long Video Generation Yifei Xia et.al. 2502.21079 null
2025-02-28 Synthesizing Individualized Aging Brains in Health and Disease with Generative Models and Parallel Transport Jingru Fu et.al. 2502.21049 link
2025-02-28 DiffBrush:Just Painting the Art by Your Hands Jiaming Chu et.al. 2502.20904 null
2025-02-28 HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models Xiao Wang et.al. 2502.20811 null
2025-02-28 WorldModelBench: Judging Video Generation Models As World Models Dacheng Li et.al. 2502.20694 null
2025-02-28 Diffusion Restoration Adapter for Real-World Image Restoration Hanbang Liang et.al. 2502.20679 null
2025-02-27 FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction Siyu Jiao et.al. 2502.20313 link
2025-02-27 Mobius: Text to Seamless Looping Video Generation via Latent Shift Xiuli Bi et.al. 2502.20307 link
2025-02-27 Attention Distillation: A Unified Approach to Visual Characteristics Transfer Yang Zhou et.al. 2502.20235 link
2025-02-27 Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Liang Chen et.al. 2502.20172 link
2025-02-27 FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Sotiris Anagnostidis et.al. 2502.20126 null
2025-02-27 New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration Xuzheng Yang et.al. 2502.20104 null
2025-02-27 C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation Yuhao Li et.al. 2502.19868 link
2025-02-27 Analyzing CLIP’s Performance Limitations in Multi-Object Scenarios: A Controlled High-Resolution Study Reza Abbasi et.al. 2502.19828 null
2025-02-27 MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery Lianping Yang et.al. 2502.19797 null
2025-02-27 The erasure of intensive livestock farming in text-to-image generative AI Kehan Sheng et.al. 2502.19771 link
2025-02-26 Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making Soobin Park et.al. 2502.18853 null
2025-02-26 Optimal Stochastic Trace Estimation in Generative Modeling Xinyang Liu et.al. 2502.18808 null
2025-02-26 AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools Nathalie Riche et.al. 2502.18736 null
2025-02-25 Investigating Youth AI Auditing Jaemarie Solyst et.al. 2502.18576 null
2025-02-25 ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Yifan Pu et.al. 2502.18364 null
2025-02-25 LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation Pengzhi Li et.al. 2502.18302 null
2025-02-25 Training Consistency Models with Variational Noise Coupling Gianluigi Silvestri et.al. 2502.18197 link
2025-02-25 SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Jintao Zhang et.al. 2502.18137 link
2025-02-26 Bayesian Optimization for Controlled Image Editing via LLMs Chengkun Cai et.al. 2502.18116 null
2025-02-25 Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models Jia Yu et.al. 2502.17951 link
2025-02-25 ASurvey: Spatiotemporal Consistency in Video Generation Zhiyu Yin et.al. 2502.17863 null
2025-02-25 FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks Tanawan Premsri et.al. 2502.17775 link
2025-02-25 Fractal Generative Models Tianhong Li et.al. 2502.17437 link
2025-02-24 X-Dancer: Expressive Music to Human Dance Video Generation Zeyuan Chen et.al. 2502.17414 null
2025-02-24 RELICT: A Replica Detection Framework for Medical Image Generation Orhun Utku Aydin et.al. 2502.17360 link
2025-02-24 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Xiangpeng Yang et.al. 2502.17258 null
2025-02-24 DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Canyu Zhao et.al. 2502.17157 link
2025-02-24 Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions Zhong Li et.al. 2502.17119 link
2025-02-24 Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence Wenzhe Yin et.al. 2502.17028 null
2025-02-24 Autoregressive Image Generation Guided by Chains of Thought Miaomiao Cai et.al. 2502.16965 null
2025-02-24 Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment Suchae Jeong et.al. 2502.16902 null
2025-02-24 A Survey of fMRI to Image Reconstruction Weiyu Guo et.al. 2502.16861 null
2025-02-21 One-step Diffusion Models with $f$-Divergence Distribution Matching Yilun Xu et.al. 2502.15681 null
2025-02-21 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling Florent Bartoccioni et.al. 2502.15672 link
2025-02-21 Soybean pod and seed counting in both outdoor fields and indoor laboratories using unions of deep neural networks Tianyou Jiang et.al. 2502.15286 null
2025-02-21 Unsettling the Hegemony of Intention: Agonistic Image Generation Andre Ye et.al. 2502.15242 null
2025-02-21 FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation Young Beom Woo et.al. 2502.15203 null
2025-02-21 Methods and Trends in Detecting Generated Images: A Comprehensive Review Arpan Mahara et.al. 2502.15176 null
2025-02-20 Hardware-Friendly Static Quantization Method for Video Diffusion Transformers Sanghyun Yi et.al. 2502.15077 null
2025-02-20 Generative Modeling of Individual Behavior at Scale Nabil Omi et.al. 2502.14998 null
2025-02-20 LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection Qingyuan Liu et.al. 2502.14994 null
2025-02-20 Improving the Diffusability of Autoencoders Ivan Skorokhodov et.al. 2502.14831 null
2025-02-20 DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models Hongji Yang et.al. 2502.14779 null
2025-02-20 AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers Wen-Fan Wang et.al. 2502.14747 null
2025-02-20 RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Ke Cao et.al. 2502.14377 null
2025-02-20 Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation Jiayu Yang et.al. 2502.14247 link
2025-02-20 Designing Parameter and Compute Efficient Diffusion Transformers using Distillation Vignesh Sundaresha et.al. 2502.14226 null
2025-02-19 d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining Prasun Roy et.al. 2502.14007 link
2025-02-19 FlexTok: Resampling Images into 1D Token Sequences of Flexible Length Roman Bachmann et.al. 2502.13967 null
2025-02-19 IP-Composer: Semantic Composition of Visual Concepts Sara Dorfman et.al. 2502.13951 null
2025-02-19 MagicGeo: Training-Free Text-Guided Geometric Diagram Generation Junxiao Wang et.al. 2502.13855 null
2025-02-19 Flow-based generative models as iterative algorithms in probability space Yao Xie et.al. 2502.13394 null
2025-02-18 Breaking the bonds of generative artificial intelligence by minimizing the maximum entropy Mattia Miotto et.al. 2502.13287 null
2025-02-18 Personalized Image Generation with Deep Generative Models: A Decade Survey Yuxiang Wei et.al. 2502.13081 link
2025-02-19 LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation Junchen Fu et.al. 2502.12945 null
2025-02-18 Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options Lakshmi Nair et.al. 2502.12929 link
2025-02-18 VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation Xinlong Chen et.al. 2502.12782 link
2025-02-18 3D Shape-to-Image Brownian Bridge Diffusion for Brain MRI Synthesis from Cortical Surfaces Fabian Bongratz et.al. 2502.12742 null
2025-02-18 MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation Sihyun Yu et.al. 2502.12632 null
2025-02-18 CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation Minghao Fu et.al. 2502.12579 link
2025-02-18 DeltaDiff: A Residual-Guided Diffusion Model for Enhanced Image Super-Resolution Chao Yang et.al. 2502.12567 null
2025-02-17 LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities Florian Sestak et.al. 2502.12128 link
2025-02-17 A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond Shreya Shukla et.al. 2502.12048 null
2025-02-17 Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images Negar Kamali et.al. 2502.11989 link
2025-02-17 GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs Yi Fang et.al. 2502.11925 null
2025-02-17 DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation Zhihang Yuan et.al. 2502.11897 link
2025-02-17 Object-Centric Image to Video Generation with Language Guidance Angel Villar-Corrales et.al. 2502.11655 null
2025-02-17 Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation Taeyoung Yun et.al. 2502.11477 link
2025-02-16 MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation Michael Fuest et.al. 2502.11234 null
2025-02-16 Phantom: Subject-consistent video generation via cross-modal alignment Lijie Liu et.al. 2502.11079 null
2025-02-15 Hybrid Deepfake Image Detection: A Comprehensive Dataset-Driven Approach Integrating Convolutional and Attention Mechanisms with Frequency Domain Features Kafi Anan et.al. 2502.10682 null
2025-02-14 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Guoqing Ma et.al. 2502.10248 link
2025-02-14 RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control Teng Li et.al. 2502.10059 null
2025-02-14 ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation Yuxin He et.al. 2502.10028 null
2025-02-13 CellFlow: Simulating Cellular Morphology Changes via Flow Matching Yuhui Zhang et.al. 2502.09775 null
2025-02-13 Designing a Conditional Prior Distribution for Flow-Based Generative Models Noam Issachar et.al. 2502.09611 null
2025-02-13 Redistribute Ensemble Training for Mitigating Memorization in Diffusion Models Xiaoliu Guan et.al. 2502.09434 link
2025-02-13 ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation Rotem Shalev-Arkushin et.al. 2502.09411 null
2025-02-13 When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models Samuel Joseph Amouyal et.al. 2502.09307 null
2025-02-14 GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation Hongyin Zhang et.al. 2502.09268 null
2025-02-13 Sequential Covariance Fitting for InSAR Phase Linking Dana El Hajjar et.al. 2502.09248 null
2025-02-13 Dynamic watermarks in images generated by diffusion models Yunzhuo Chen et.al. 2502.08927 null
2025-02-13 Detecting Malicious Concepts Without Image Generation in AIGC Kun Xu et.al. 2502.08921 null
2025-02-12 HistoSmith: Single-Stage Histology Image-Label Generation via Conditional Latent Diffusion for Enhanced Cell Segmentation and Classification Valentina Vadori et.al. 2502.08754 link
2025-02-12 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Qinghe Wang et.al. 2502.08639 null
2025-02-12 Enhancing Diffusion Models Efficiency by Disentangling Total-Variance and Signal-to-Noise Ratio Khaled Kahouli et.al. 2502.08598 link
2025-02-12 Ultrasound Image Generation using Latent Diffusion Models Benoit Freiche et.al. 2502.08580 null
2025-02-12 BCDDM: Branch-Corrected Denoising Diffusion Model for Black Hole Image Generation Ao liu et.al. 2502.08528 null
2025-02-12 FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis Wonjoon Jin et.al. 2502.08244 null
2025-02-12 Learning Human Skill Generators at Key-Step Levels Yilu Wu et.al. 2502.08234 null
2025-02-12 AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance Zhao Wang et.al. 2502.08189 null
2025-02-12 PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation Ziyan Wang et.al. 2502.08106 null
2025-02-12 ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation Qianrui Teng et.al. 2502.08097 null
2025-02-11 Training-Free Safe Denoisers for Safe Use of Diffusion Models Mingyu Kim et.al. 2502.08011 null
2025-02-11 Direct Ascent Synthesis: Revealing Hidden Generative Capabilities in Discriminative Models Stanislav Fort et.al. 2502.07753 null
2025-02-11 CausalGeD: Blending Causality and Diffusion for Spatial Gene Expression Generation Rabeya Tus Sadia et.al. 2502.07751 null
2025-02-11 Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling Shuhuai Ren et.al. 2502.07737 null
2025-02-11 Magic 1-For-1: Generating One Minute Video Clips within One Minute Hongwei Yi et.al. 2502.07701 link
2025-02-11 SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches Haichuan Lin et.al. 2502.07556 link
2025-02-11 VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation Sixiao Zheng et.al. 2502.07531 null
2025-02-11 Enhance-A-Video: Better Generated Video for Free Yang Luo et.al. 2502.07508 link
2025-02-11 RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation Viacheslav Vasilev et.al. 2502.07455 link
2025-02-11 Optimizing Knowledge Distillation in Transformers: Enabling Multi-Head Attention without Alignment Barriers Zhaodong Bing et.al. 2502.07436 null
2025-02-11 Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos Haowen Gao et.al. 2502.07327 null
2025-02-10 Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT Dongyang Liu et.al. 2502.06782 null
2025-02-10 History-Guided Video Diffusion Kiwhan Song et.al. 2502.06764 null
2025-02-10 Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists Bojia Zi et.al. 2502.06734 null
2025-02-10 TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Yangguang Li et.al. 2502.06608 link
2025-02-10 A Large-scale AI-generated Image Inpainting Benchmark Paschalis Giakoumoglou et.al. 2502.06593 null
2025-02-10 CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers D. She et.al. 2502.06527 null
2025-02-10 Universal Approximation of Visual Autoregressive Transformers Yifang Chen et.al. 2502.06167 null
2025-02-10 Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile Hangliang Ding et.al. 2502.06155 null
2025-02-10 Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models Ce Zhang et.al. 2502.06130 link
2025-02-09 Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization Jiajun Fan et.al. 2502.06061 null
2025-02-07 FlashVideo:Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Shilong Zhang et.al. 2502.05179 link
2025-02-07 QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation Yue Zhao et.al. 2502.05178 null
2025-02-07 Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment Minh-Quan Le et.al. 2502.05153 null
2025-02-07 C2GM: Cascading Conditional Generation of Multi-scale Maps from Remote Sensing Images Constrained by Geographic Features Chenxing Sun et.al. 2502.04991 null
2025-02-07 Cached Multi-Lora Composition for Multi-Concept Image Generation Xiandong Zou et.al. 2502.04923 link
2025-02-07 Goku: Flow Based Video Generative Foundation Models Shoufa Chen et.al. 2502.04896 null
2025-02-07 HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation Qijun Gan et.al. 2502.04847 null
2025-02-07 G2PDiffusion: Genotype-to-Phenotype Prediction with Diffusion Models Mengdi Liu et.al. 2502.04684 null
2025-02-06 Fast Video Generation with Sliding Tile Attention Peiyuan Zhang et.al. 2502.04507 null
2025-02-06 Augmented Conditioning Is Enough For Effective Training Image Generation Jiahui Chen et.al. 2502.04475 null
2025-02-06 HOG-Diff: Higher-Order Guided Diffusion for Graph Generation Yiming Huang et.al. 2502.04308 link
2025-02-06 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Jinbo Xing et.al. 2502.04299 null
2025-02-06 Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression Lirui Wang et.al. 2502.04296 null
2025-02-06 Realistic Image-to-Image Machine Unlearning via Decoupling and Knowledge Retention Ayush K. Varshney et.al. 2502.04260 null
2025-02-06 Multi-fidelity emulator for large-scale 21 cm lightcone images: a few-shot transfer learning approach with generative adversarial network Kangning Diao et.al. 2502.04246 null
2025-02-06 Generative Adversarial Networks Bridging Art and Machine Intelligence Junhao Song et.al. 2502.04116 null
2025-02-06 Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency Shangkun Sun et.al. 2502.04076 link
2025-02-06 UniForm: A Unified Diffusion Transformer for Audio-Video Generation Lei Zhao et.al. 2502.03897 null
2025-02-06 FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing Jinya Sakurai et.al. 2502.03826 null
2025-02-06 DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models Lingshun Kong et.al. 2502.03810 null
2025-02-05 On Fairness of Unified Multimodal Large Language Model for Image Generation Ming Liu et.al. 2502.03429 null
2025-02-05 TruePose: Human-Parsing-guided Attention Diffusion for Full-ID Preserving Pose Transfer Zhihong Xu et.al. 2502.03426 null
2025-02-05 Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation Alexey A. Novikov et.al. 2502.03420 null
2025-02-05 MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent Xinyao Liao et.al. 2502.03207 null
2025-02-05 Poisson Flow Joint Model for Multiphase contrast-enhanced CT Rongjun Ge et.al. 2502.03079 null
2025-02-05 A Survey of Sample-Efficient Deep Learning for Change Detection in Remote Sensing: Tasks, Strategies, and Challenges Lei Ding et.al. 2502.02835 null
2025-02-04 When are Diffusion Priors Helpful in Sparse Reconstruction? A Study with Sparse-view CT Matt Y. Cheung et.al. 2502.02771 null
2025-02-04 Controllable Video Generation with Provable Disentanglement Yifan Shen et.al. 2502.02690 null
2025-02-04 VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Hila Chefer et.al. 2502.02492 null
2025-02-04 On the Guidance of Flow Matching Ruiqi Feng et.al. 2502.02150 link
2025-02-04 IPO: Iterative Preference Optimization for Text-to-Video Generation Xiaomeng Yang et.al. 2502.02088 null
2025-02-03 VILP: Imitation Learning with Latent Video Planning Zhengtong Xu et.al. 2502.01784 link
2025-02-03 Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity Haocheng Xi et.al. 2502.01776 null
2025-02-03 MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation Haibo Tong et.al. 2502.01719 null
2025-02-03 MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation Yiren Song et.al. 2502.01572 null
2025-02-03 Improved Training Technique for Latent Consistency Models Quan Dao et.al. 2502.01441 link
2025-02-03 Assessing the use of Diffusion models for motion artifact correction in brain MRI Paolo Angella et.al. 2502.01418 null
2025-02-04 Compressed Image Generation with Denoising Diffusion Codebook Models Guy Ohayon et.al. 2502.01189 null
2025-01-31 Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search Yuta Oshima et.al. 2501.19252 null
2025-01-31 Ambient Denoising Diffusion Generative Adversarial Networks for Establishing Stochastic Object Models from Noisy Image Data Xichen Xu et.al. 2501.19094 null
2025-01-31 Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations Dahye Kim et.al. 2501.19066 link
2025-01-31 BCAT: A Block Causal Transformer for PDE Foundation Models for Fluid Dynamics Yuxuan Liu et.al. 2501.18972 null
2025-01-31 Distorting Embedding Space for Safety: A Defense Mechanism for Adversarially Robust Diffusion Models Jaesin Ahn et.al. 2501.18877 link
2025-01-31 REG: Rectified Gradient Guidance for Conditional Diffusion Models Zhengqi Gao et.al. 2501.18865 null
2025-01-30 Every Image Listens, Every Image Dances: Music-Driven Image Animation Zhikang Dong et.al. 2501.18801 null
2025-01-30 High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2 Nandakishor M et.al. 2501.18670 null
2025-01-30 Diffusion Autoencoders are Scalable Image Tokenizers Yinbo Chen et.al. 2501.18593 null
2025-01-30 SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Enze Xie et.al. 2501.18427 link
2025-01-30 Simulation of microstructures and machine learning Katja Schladitz et.al. 2501.18313 null
2025-01-30 LLMs can see and hear without any training Kumar Ashutosh et.al. 2501.18096 link
2025-01-29 Generative AI for Vision: A Comprehensive Study of Frameworks and Applications Fouad Bousetouane et.al. 2501.18033 null
2025-01-29 Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling Xiaokang Chen et.al. 2501.17811 link
2025-01-29 A Framework for Generating Realistic Synthetic Tabular Data in a Randomized Controlled Trial Setting Niki Z. Petrakos et.al. 2501.17719 null
2025-01-29 Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment Zixue Zeng et.al. 2501.17690 link
2025-01-28 Text-to-Image Generation for Vocabulary Learning Using the Keyword Method Nuwan T. Attygalle et.al. 2501.17099 null
2025-01-28 DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation Chenguo Lin et.al. 2501.16764 null
2025-01-29 Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion Shengyuan Liu et.al. 2501.16679 link
2025-01-28 Variational Schrödinger Momentum Diffusion Kevin Rojas et.al. 2501.16675 null
2025-01-28 CascadeV: An Implementation of Wurstchen Architecture for Video Generation Wenfeng Lin et.al. 2501.16612 link
2025-01-27 LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation Farzad Farhadzadeh et.al. 2501.16559 null
2025-01-27 RelightVid: Temporal-Consistent Diffusion Model for Video Relighting Ye Fang et.al. 2501.16330 null
2025-01-28 Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation Adil Kaan Akan et.al. 2501.15878 null
2025-01-27 Autonomous Horizon-based Asteroid Navigation With Observability-constrained Maneuvers Aditya Arjun Anibha et.al. 2501.15806 null
2025-01-27 Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? Yunbo Lyu et.al. 2501.15775 null
2025-01-26 Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting Yuxin Zhang et.al. 2501.15641 link
2025-01-26 Comparative clinical evaluation of “memory-efficient” synthetic 3d generative adversarial networks (gan) head-to-head to state of art: results on computed tomography of the chest Mahshid shiri et.al. 2501.15572 null
2025-01-26 “See What I Imagine, Imagine What I See”: Human-AI Co-Creation System for 360$^\circ$ Panoramic Video Generation in VR Yunge Wen et.al. 2501.15456 null
2025-01-26 SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity Zichen Fan et.al. 2501.15448 null
2025-01-26 StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces Kyeongmin Yeo et.al. 2501.15445 null
2025-01-25 Enhancing Intent Understanding for Ambiguous Prompts through Human-Machine Co-Adaptation Yangfan He et.al. 2501.15167 null
2025-01-24 Towards Scalable Topological Regularizers Hiu-Tung Wong et.al. 2501.14641 null
2025-01-24 Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.* Ludovica Schaerf et.al. 2501.14524 null
2025-01-24 PAID: A Framework of Product-Centric Advertising Image Design Hongyu Chen et.al. 2501.14316 null
2025-01-24 VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking Runyi Hu et.al. 2501.14195 link
2025-01-23 Can We Generate Images with CoT? Let’s Verify and Reinforce Image Generation Step by Step Ziyu Guo et.al. 2501.13926 link
2025-01-23 IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models Jiayi Lei et.al. 2501.13920 null
2025-01-23 Improving Video Generation with Human Feedback Jie Liu et.al. 2501.13918 null
2025-01-23 Generating Realistic Forehead-Creases for User Verification via Conditioned Piecewise Polynomial Curves Abhishek Tandon et.al. 2501.13889 link
2025-01-23 A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation Dario Serez et.al. 2501.13718 null
2025-01-24 One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt Tao Liu et.al. 2501.13554 link
2025-01-23 EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion Jiangchuan Wei et.al. 2501.13452 null
2025-01-23 MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize Haohang Xu et.al. 2501.13349 null
2025-01-23 Accelerate High-Quality Diffusion Models with Inner Loop Feedback Matthew Gwilliam et.al. 2501.13107 null
2025-01-22 Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation Akshay Krishnan et.al. 2501.13087 null
2025-01-22 LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation Jiahao Wang et.al. 2501.12976 null
2025-01-22 PreciseCam: Precise Camera Control for Text-to-Image Generation Edurne Bernal-Berdun et.al. 2501.12910 null
2025-01-22 T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation Lijun Li et.al. 2501.12612 link
2025-01-22 GPS as a Control Signal for Image Generation Chao Feng et.al. 2501.12390 null
2025-01-21 Taming Teacher Forcing for Masked Autoregressive Video Generation Deyu Zhou et.al. 2501.12389 null
2025-01-21 Parallel Sequence Modeling via Generalized Spatial Propagation Network Hongjun Wang et.al. 2501.12381 null
2025-01-22 Video Depth Anything: Consistent Depth Estimation for Super-Long Videos Sili Chen et.al. 2501.12375 null
2025-01-21 Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists Thomas F. Eisenmann et.al. 2501.12374 link
2025-01-21 ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions Shiyue Zhang et.al. 2501.12173 link
2025-01-20 Are generative models fair? A study of racial bias in dermatological image generation Miguel López-Pérez et.al. 2501.11752 null
2025-01-20 GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video Zhenliang Ni et.al. 2501.11340 null
2025-01-20 CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation Zheng Chong et.al. 2501.11325 link
2025-01-20 Nested Annealed Training Scheme for Generative Adversarial Networks Chang Wan et.al. 2501.11318 null
2025-01-17 DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency Xiaohui Li et.al. 2501.10110 null
2025-01-17 DiffuEraser: A Diffusion Model for Video Inpainting Xiaowen Li et.al. 2501.10018 link
2025-01-17 RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation Yuefan Cao et.al. 2501.09982 null
2025-01-17 Physics-informed DeepCT: Sinogram Wavelet Decomposition Meets Masked Diffusion Zekun Zhou et.al. 2501.09935 link
2025-01-17 IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment Shangkun Sun et.al. 2501.09927 null
2025-01-16 PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery Shristi Das Biswas et.al. 2501.09826 link
2025-01-16 VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Zhongwei Ren et.al. 2501.09781 null
2025-01-16 Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Philippe Hansen-Estruch et.al. 2501.09755 null
2025-01-16 Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Nanye Ma et.al. 2501.09732 null
2025-01-16 AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation Junjie He et.al. 2501.09503 link
2025-01-16 Dynamic Neural Style Transfer for Artistic Image Generation using VGG19 Kapil Kashyap et.al. 2501.09420 null
2025-01-16 SVIA: A Street View Image Anonymization Framework for Self-Driving Applications Dongyu Liu et.al. 2501.09393 link
2025-01-16 Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse Guangyuan Liu et.al. 2501.09391 null
2025-01-15 Grounding Text-To-Image Diffusion Models For Controlled High-Quality Image Generation Ahmad Süleyman et.al. 2501.09194 null
2025-01-15 Generative diffusion model with inverse renormalization group flows Kanta Masuki et.al. 2501.09064 link
2025-01-15 Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion Jingyuan Chen et.al. 2501.09019 null
2025-01-15 How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias Tosin Fadahunsi et.al. 2501.09014 link
2025-01-15 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Ruixiang Jiang et.al. 2501.09012 link
2025-01-15 RepVideo: Rethinking Cross-Layer Representation for Video Generation Chenyang Si et.al. 2501.08994 null
2025-01-15 Enhanced Multi-Scale Cross-Attention for Person Image Generation Hao Tang et.al. 2501.08900 null
2025-01-15 StereoGen: High-quality Stereo Image Generation from a Single Image Xianqi Wang et.al. 2501.08654 null
2025-01-15 Joint Learning of Depth and Appearance for Portrait Image Animation Xinya Ji et.al. 2501.08649 null
2025-01-15 Watermarking in Diffusion Model: Gaussian Shading with Exact Diffusion Inversion via Coupled Transformations (EDICT) Krishna Panthi et.al. 2501.08604 null
2025-01-15 Comprehensive Subjective and Objective Evaluation Method for Text-generated Video Zelu Qi et.al. 2501.08545 null
2025-01-15 Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers Zhongwang Zhang et.al. 2501.08537 link
2025-01-14 GameFactory: Creating New Games with Generative Interactive Videos Jiwen Yu et.al. 2501.08325 null
2025-01-14 Diffusion Adversarial Post-Training for One-Step Video Generation Shanchuan Lin et.al. 2501.08316 null
2025-01-14 LayerAnimate: Layer-specific Control for Animation Yuxue Yang et.al. 2501.08295 null
2025-01-14 FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors Yabo Zhang et.al. 2501.08225 link
2025-01-14 D$^2$-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models Qian Zeng et.al. 2501.08180 link
2025-01-14 Benchmarking Multimodal Models for Fine-Grained Image Analysis: A Comparative Study Across Diverse Visual Features Evgenii Evstafev et.al. 2501.08170 null
2025-01-13 Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens Dongwon Kim et.al. 2501.07730 null
2025-01-13 BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations Weixi Feng et.al. 2501.07647 null
2025-01-13 Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss Xinyu Zhang et.al. 2501.07563 null
2025-01-13 Boosting Text-To-Image Generation via Multilingual Prompting in Large Multimodal Models Yongyu Mu et.al. 2501.07086 link
2025-01-13 Enhancing Image Generation Fidelity via Progressive Prompts Zhen Xiong et.al. 2501.07070 link
2025-01-13 Detection of AI Deepfake and Fraud in Online Payments Using GAN-Based Models Zong Ke et.al. 2501.07033 null
2025-01-12 Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Michael Toker et.al. 2501.06751 null
2025-01-11 Denoising Diffusion Probabilistic Model for Radio Map Estimation in Generative Wireless Networks Xuanhao Luo et.al. 2501.06604 null
2025-01-11 DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy Wenshu Fan et.al. 2501.06533 link
2025-01-11 Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation Xiaoying Xing et.al. 2501.06481 null
2025-01-11 Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning Maomao Li et.al. 2501.06438 null
2025-01-10 MEt3R: Measuring Multi-View Consistency in Generated Images Mohammad Asim et.al. 2501.06336 null
2025-01-10 Multi-subject Open-set Personalization in Video Generation Tsai-Shien Chen et.al. 2501.06187 null
2025-01-10 VideoAuteur: Towards Long Narrative Video Generation Junfei Xiao et.al. 2501.06173 null
2025-01-10 Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models Sofia Jamil et.al. 2501.05839 link
2025-01-10 EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model Yi He et.al. 2501.05710 null
2025-01-09 Consistent Flow Distillation for Text-to-3D Generation Runjie Yan et.al. 2501.05445 null
2025-01-09 Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces Aniruddha Mahapatra et.al. 2501.05442 null
2025-01-09 Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation Xuyi Meng et.al. 2501.05427 null
2025-01-09 Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation Darius Petermann et.al. 2501.05413 null
2025-01-09 CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models Junha Park et.al. 2501.05359 null
2025-01-09 Patch-GAN Transfer Learning with Reconstructive Models for Cloud Removal Wanli Ma et.al. 2501.05265 null
2025-01-09 3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering Dewei Zhou et.al. 2501.05131 null
2025-01-08 EditAR: Unified Conditional Generation with Autoregressive Models Jiteng Mu et.al. 2501.04699 null
2025-01-08 ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Yuzhou Huang et.al. 2501.04698 null
2025-01-08 On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis Yekun Ke et.al. 2501.04377 null
2025-01-08 Circuit Complexity Bounds for Visual Autoregressive Model Yekun Ke et.al. 2501.04299 null
2025-01-08 LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition Bowen Hao et.al. 2501.04204 null
2025-01-07 HistoryPalette: Supporting Exploration and Reuse of Past Alternatives in Image Generation and Editing Karim Benharrak et.al. 2501.04163 null
2025-01-07 Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers Yuechen Zhang et.al. 2501.03931 link
2025-01-07 Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Zekai Gu et.al. 2501.03847 link
2025-01-07 Motion-Aware Generative Frame Interpolation Guozhen Zhang et.al. 2501.03699 null
2025-01-08 Evaluating Image Caption via Cycle-consistent Text-to-Image Generation Tianyu Cui et.al. 2501.03567 null
2025-01-07 PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models Lingzhi Yuan et.al. 2501.03544 null
2025-01-07 Textualize Visual Prompt for Image Editing via Diffusion Bridge Pengcheng Xu et.al. 2501.03495 null
2025-01-07 SceneBooth: Diffusion-based Framework for Subject-preserved Text-to-Image Generation Shang Chai et.al. 2501.03490 null
2025-01-06 License Plate Images Generation with Diffusion Models Mariia Shpir et.al. 2501.03374 null
2025-01-06 Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation Guy Yariv et.al. 2501.03059 null
2025-01-06 TransPixar: Advancing Text-to-Video Generation with Transparency Luozhou Wang et.al. 2501.03006 link
2025-01-06 Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising Yunlong Yuan et.al. 2501.02741 null
2025-01-06 Artificial Intelligence in Creative Industries: Advances Prior to 2025 Nantheera Anantrasirichai et.al. 2501.02725 null
2025-01-05 GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking Weikang Bian et.al. 2501.02690 null
2025-01-05 Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation Dawei Dai et.al. 2501.02523 link
2025-01-05 ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling Chaojie Mao et.al. 2501.02487 null
2025-01-05 MedSegDiffNCA: Diffusion Models With Neural Cellular Automata for Skin Lesion Segmentation Avni Mittal et.al. 2501.02447 null
2025-01-04 Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey Zongxia Li et.al. 2501.02189 link
2025-01-04 Generating Multimodal Images with GAN: Integrating Text, Image, and Style Chaoyi Tan et.al. 2501.02167 null
2025-01-03 JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing Qili Wang et.al. 2501.01798 link
2025-01-03 Controlling your Attributes in Voice Xuyuan Li et.al. 2501.01674 null
2025-01-02 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427 null
2025-01-03 Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions Xincheng Shuai et.al. 2501.01425 null
2025-01-02 Object-level Visual Prompts for Compositional Image Generation Gaurav Parmar et.al. 2501.01424 null
2025-01-02 On Unifying Video Generation and Camera Pose Estimation Chun-Hao Paul Huang et.al. 2501.01409 null
2025-01-02 ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer Xuyin Qi et.al. 2501.01392 link
2025-01-02 Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement Z. Zhang et.al. 2501.01368 null
2025-01-02 LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge Kyoungkook Kang et.al. 2501.01197 null
2025-01-02 HarmonyIQA: Pioneering Benchmark and Model for Image Harmonization Quality Assessment Zitong Xu et.al. 2501.01116 null
2025-01-02 EliGen: Entity-Level Controlled Image Generation with Regional Attention Hong Zhang et.al. 2501.01097 link
2025-01-01 OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes Sepehr Dehdashtian et.al. 2501.00962 null
2025-01-02 Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation Yuanbo Yang et.al. 2412.21117 null
2024-12-30 Quantum Diffusion Model for Quark and Gluon Jet Generation Mariia Baidachna et.al. 2412.21082 link
2024-12-30 Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model Yifei Huang et.al. 2412.21080 link
2024-12-30 Varformer: Adapting VAR’s Generative Prior for Image Restoration Siyang Wang et.al. 2412.21063 link
2024-12-30 VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation Jiazheng Xu et.al. 2412.21059 link
2024-12-30 ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation Ting Zhang et.al. 2412.20901 null
2024-12-30 VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control Shaojin Wu et.al. 2412.20800 link
2024-12-30 Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling Min Zhang et.al. 2412.20725 null
2024-12-30 HFI: A unified framework for training-free detection and implicit watermarking of latent diffusion model generated images Sungik Choi et.al. 2412.20704 null
2024-12-30 Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis Yousef Yeganeh et.al. 2412.20651 null
2024-12-27 Generative Video Propagation Shaoteng Liu et.al. 2412.19761 null
2024-12-27 VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Tao Wu et.al. 2412.19645 null
2024-12-27 P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision Junjie Hu et.al. 2412.19533 null
2024-12-27 DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT Xiaotao Hu et.al. 2412.19505 link
2024-12-27 Focusing Image Generation to Mitigate Spurious Correlations Xuewei Li et.al. 2412.19457 null
2024-12-25 UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation Lunhao Duan et.al. 2412.18928 null
2024-12-25 Accelerating Diffusion Transformers with Dual Feature Caching Chang Zou et.al. 2412.18911 link
2024-12-25 DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering Ruohong Yang et.al. 2412.18838 null
2024-12-25 DebiasDiff: Debiasing Text-to-image Diffusion Models with Self-discovering Latent Attribute Directions Yilei Jiang et.al. 2412.18810 null
2024-12-25 Protective Perturbations against Unauthorized Data Usage in Diffusion-based Image Generation Sen Peng et.al. 2412.18791 null
2024-12-24 DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers Yuntao Chen et.al. 2412.18607 null
2024-12-24 ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation Hongjie Li et.al. 2412.18600 null
2024-12-24 DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Minghong Cai et.al. 2412.18597 link
2024-12-24 Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models Qice Qin et.al. 2412.18421 null
2024-12-24 Extract Free Dense Misalignment from CLIP JeongYeon Nam et.al. 2412.18404 link
2024-12-24 TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization Yucong Luo et.al. 2412.18185 null
2024-12-24 EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation Shuhao Han et.al. 2412.18150 link
2024-12-24 Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction Xiao Guo et.al. 2412.18149 null
2024-12-24 Ensuring Consistency for In-Image Translation Chengpeng Fu et.al. 2412.18139 null
2024-12-23 Large Motion Video Autoencoding with Cross-modal Video VAE Yazhou Xing et.al. 2412.17805 null
2024-12-23 VidTwin: Video VAE with Decoupled Structure and Dynamics Yuchi Wang et.al. 2412.17726 link
2024-12-23 Personalized Large Vision-Language Models Chau Pham et.al. 2412.17610 null
2024-12-23 FFA Sora, video generation as fundus fluorescein angiography simulator Xinyuan Wu et.al. 2412.17346 null
2024-12-23 Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory Xingyao Li et.al. 2412.17254 null
2024-12-23 Discriminative Image Generation with Diffusion Models for Zero-Shot Learning Dingjie Fu et.al. 2412.17219 null
2024-12-22 Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Enshu Liu et.al. 2412.17153 link
2024-12-22 Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images Dennis Menn et.al. 2412.17109 null
2024-12-22 DreamOmni: Unified Image Generation and Editing Bin Xia et.al. 2412.17098 null
2024-12-22 SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults Jinzhi Wang et.al. 2412.17077 null
2024-12-20 Personalized Representation from Personalized Generation Shobhita Sundaram et.al. 2412.16156 link
2024-12-20 NeRF-To-Real Tester: Neural Radiance Fields as Test Image Generators for Vision of Autonomous Systems Laura Weihl et.al. 2412.16141 null
2024-12-20 CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Songhua Liu et.al. 2412.16112 link
2024-12-20 SafeCFG: Redirecting Harmful Classifier-Free Guidance for Safe Generation Jiadong Pan et.al. 2412.16039 null
2024-12-20 Semi-Supervised Adaptation of Diffusion Models for Handwritten Text Generation Kai Brandenbusch et.al. 2412.15853 null
2024-12-20 DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization Zihan Ding et.al. 2412.15689 null
2024-12-20 PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium Xinzhe Li et.al. 2412.15674 link
2024-12-20 BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images with Conditional Latent Diffusion Models Yifei Sun et.al. 2412.15670 link
2024-12-20 CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training Xiuli Bi et.al. 2412.15646 link
2024-12-20 Stylish and Functional: Guided Interpolation Subject to Physical Constraints Yan-Ying Chen et.al. 2412.15507 null
2024-12-19 Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Qihao Liu et.al. 2412.15213 null
2024-12-19 FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching Sucheng Ren et.al. 2412.15205 link
2024-12-19 AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation Moayed Haji-Ali et.al. 2412.15191 null
2024-12-19 LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation Weijia Shi et.al. 2412.15188 null
2024-12-19 Tiled Diffusion Or Madar et.al. 2412.15185 null
2024-12-19 Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM Yatai Ji et.al. 2412.15156 link
2024-12-19 Parallelized Autoregressive Visual Generation Yuqing Wang et.al. 2412.15119 null
2024-12-19 DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space Mang Ning et.al. 2412.15032 link
2024-12-19 Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations Yucheng Hu et.al. 2412.14803 null
2024-12-19 Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models Keith G. Mills et.al. 2412.14628 null
2024-12-18 E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling Zhihang Yuan et.al. 2412.14170 null
2024-12-18 Autoregressive Video Generation without Vector Quantization Haoge Deng et.al. 2412.14169 link
2024-12-18 FashionComposer: Compositional Fashion Image Generation Sihui Ji et.al. 2412.14168 null
2024-12-18 VideoDPO: Omni-Preference Alignment for Video Diffusion Generation Runtao Liu et.al. 2412.14167 null
2024-12-18 AKiRa: Augmentation Kit on Rays for optical video generation Xi Wang et.al. 2412.14158 null
2024-12-18 SurgSora: Decoupled RGBD-Flow Diffusion Model for Controllable Surgical Video Generation Tong Chen et.al. 2412.14018 null
2024-12-18 Text2Relight: Creative Portrait Relighting with Text Guidance Junuk Cha et.al. 2412.13734 null
2024-12-18 Diffusion models and stochastic quantisation in lattice field theory Gert Aarts et.al. 2412.13704 null
2024-12-18 MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing Chuang Yang et.al. 2412.13684 null
2024-12-18 Self-control: A Better Conditional Mechanism for Masked Autoregressive Model Qiaoying Qu et.al. 2412.13635 null
2024-12-17 MotionBridge: Dynamic Video Inbetweening with Flexible Controls Maham Tanveer et.al. 2412.13190 null
2024-12-17 F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration Lu Liu et.al. 2412.13155 null
2024-12-17 Prompt Augmentation for Self-supervised Text-guided Image Manipulation Rumeysa Bodur et.al. 2412.13081 null
2024-12-17 VidTok: A Versatile and Open-Source Video Tokenizer Anni Tang et.al. 2412.13061 link
2024-12-17 3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation Haoshen Wang et.al. 2412.13059 null
2024-12-17 Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression Ruijie Chen et.al. 2412.12982 null
2024-12-17 Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance Wenhao Sun et.al. 2412.12974 link
2024-12-17 Unsupervised Region-Based Image Editing of Denoising Diffusion Models Zixiang Li et.al. 2412.12912 null
2024-12-17 ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction Zhongjie Duan et.al. 2412.12888 link
2024-12-17 Rethinking Diffusion-Based Image Generators for Fundus Fluorescein Angiography Synthesis on Limited Data Chengzhou Yu et.al. 2412.12778 null
2024-12-16 Causal Diffusion Transformers for Generative Modeling Chaorui Deng et.al. 2412.12095 link
2024-12-16 A LoRA is Worth a Thousand Pictures Chenxi Liu et.al. 2412.12048 null
2024-12-16 InterDyn: Controllable Interactive Dynamics with Video Diffusion Models Rick Akkerman et.al. 2412.11785 null
2024-12-16 Generative Inbetweening through Frame-wise Conditions-Driven Video Generation Tianyi Zhu et.al. 2412.11755 link
2024-12-16 IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation Yiren Song et.al. 2412.11638 null
2024-12-16 VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting Muhammet Furkan Ilaslan et.al. 2412.11621 link
2024-12-16 3D$^2$-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling Zichen Tang et.al. 2412.11599 link
2024-12-16 LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model Xi Wang et.al. 2412.11519 null
2024-12-16 FedCAR: Cross-client Adaptive Re-weighting for Generative Models in Federated Learning Minjun Kim et.al. 2412.11463 link
2024-12-16 Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models Namhyuk Ahn et.al. 2412.11423 null
2024-12-13 OP-LoRA: The Blessing of Dimensionality Piotr Teterwak et.al. 2412.10362 null
2024-12-13 TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation Xingrui Wang et.al. 2412.10275 null
2024-12-13 Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark Yudong Jiang et.al. 2412.10255 link
2024-12-13 Simple Guidance Mechanisms for Discrete Diffusion Models Yair Schiff et.al. 2412.10193 link
2024-12-13 Financial Fine-tuning a Large Time Series Model Xinghong Fu et.al. 2412.09880 link
2024-12-13 LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity Hongjie Wang et.al. 2412.09856 null
2024-12-13 MSC: Multi-Scale Spatio-Temporal Causal Attention for Autoregressive Video Diffusion Xunnong Xu et.al. 2412.09828 null
2024-12-12 Human vs. AI: A Novel Benchmark and a Comparative Study on the Detection of Generated Images and the Impact of Prompts Philipp Moeßner et.al. 2412.09715 link
2024-12-12 Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation Chun-Mei Feng et.al. 2412.09706 link
2024-12-12 Doe-1: Closed-Loop Autonomous Driving with Large World Model Wenzhao Zheng et.al. 2412.09627 link
2024-12-12 OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation Weiqi Li et.al. 2412.09623 null
2024-12-12 LoRACLR: Contrastive Adaptation for Customization of Diffusion Models Enis Simsar et.al. 2412.09622 null
2024-12-12 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Zhuofan Zong et.al. 2412.09618 null
2024-12-12 FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers Yusuf Dalva et.al. 2412.09611 null
2024-12-12 Spectral Image Tokenizer Carlos Esteves et.al. 2412.09607 null
2024-12-12 Owl-1: Omni World Model for Consistent Long Video Generation Yuanhui Huang et.al. 2412.09600 link
2024-12-12 LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors Yabo Chen et.al. 2412.09597 null
2024-12-12 Video Creation by Demonstration Yihong Sun et.al. 2412.09551 null
2024-12-12 UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer Delong Liu et.al. 2412.09389 link
2024-12-11 Fast Prompt Alignment for Text-to-Image Generation Khalil Mrini et.al. 2412.08639 link
2024-12-11 Multimodal Latent Language Modeling with Next-Token Diffusion Yutao Sun et.al. 2412.08635 link
2024-12-11 LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Zejian Li et.al. 2412.08580 link
2024-12-11 Learning Flow Fields in Attention for Controllable Person Image Generation Zijian Zhou et.al. 2412.08486 link
2024-12-11 InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models Min Hou et.al. 2412.08480 link
2024-12-11 CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis Mu Zhang et.al. 2412.08464 null
2024-12-11 Physical Informed Driving World Model Zhuoran Yang et.al. 2412.08410 null
2024-12-11 FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks Chongkai Gao et.al. 2412.08261 null
2024-12-11 VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation Zhiqiang Yuan et.al. 2412.08259 null
2024-12-11 Analyzing and Improving Model Collapse in Rectified Flow Models Huminhao Zhu et.al. 2412.08175 null
2024-12-10 UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Xi Chen et.al. 2412.07774 null
2024-12-10 From Slow Bidirectional to Fast Causal Video Generators Tianwei Yin et.al. 2412.07772 null
2024-12-10 SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Jianhong Bai et.al. 2412.07760 link
2024-12-10 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation Xiao Fu et.al. 2412.07759 null
2024-12-10 Multi-Shot Character Consistency for Text-to-Video Generation Yuval Atzmon et.al. 2412.07750 null
2024-12-10 StyleMaster: Stylize Your Video with Artistic Generation and Translation Zixuan Ye et.al. 2412.07744 null
2024-12-10 STIV: Scalable Text and Image Conditioned Video Generation Zongyu Lin et.al. 2412.07730 null
2024-12-10 ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer Jinyi Hu et.al. 2412.07720 link
2024-12-10 FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Tong Wu et.al. 2412.07674 null
2024-12-10 DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Jianzong Wu et.al. 2412.07589 null
2024-12-09 Visual Lexicon: Rich Image Features in Language Space XuDong Wang et.al. 2412.06774 null
2024-12-09 Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty Meera Hahn et.al. 2412.06771 link
2024-12-09 ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet Andrei-Robert Alexandrescu et.al. 2412.06742 null
2024-12-09 EMOv2: Pushing 5M Vision Model Frontier Jiangning Zhang et.al. 2412.06674 link
2024-12-09 ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Chunwei Wang et.al. 2412.06673 null
2024-12-09 Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion Shuaiting Li et.al. 2412.06661 null
2024-12-09 Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment Kim Sung-Bin et.al. 2412.06209 link
2024-12-09 ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance Yuming Li et.al. 2412.06163 null
2024-12-09 Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters Yuan Wang et.al. 2412.06143 link
2024-12-08 GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis Ashish Goswami et.al. 2412.06089 null
2024-12-06 Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model Lening Wang et.al. 2412.05280 link
2024-12-06 Mind the Time: Temporally-Controlled Multi-Event Video Generation Ziyi Wu et.al. 2412.05263 null
2024-12-06 LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation Donald Shenaj et.al. 2412.05148 link
2024-12-06 The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation Ruoyu Wang et.al. 2412.05101 null
2024-12-06 Noise Matters: Diffusion Model-based Urban Mobility Generation with Collaborative Noise Priors Yuheng Zhang et.al. 2412.05000 null
2024-12-06 Continuous Video Process: Modeling Videos as Continuous Multi-Dimensional Processes for Video Prediction Gaurav Shrivastava et.al. 2412.04929 null
2024-12-06 UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving Rui Chen et.al. 2412.04842 link
2024-12-05 Hidden in the Noise: Two-Stage Robust Watermarking for Images Kasra Arabi et.al. 2412.04653 link
2024-12-05 One Communication Round is All It Needs for Federated Fine-Tuning Foundation Models Ziyao Wang et.al. 2412.04650 null
2024-12-05 Using Diffusion Priors for Video Amodal Segmentation Kaihua Chen et.al. 2412.04623 null
2024-12-05 PaintScene4D: Consistent 4D Scene Generation from Text Prompts Vinayak Gupta et.al. 2412.04471 null
2024-12-05 LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors Yusuf Dalva et.al. 2412.04460 null
2024-12-05 MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation Longtao Zheng et.al. 2412.04448 null
2024-12-05 DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models Yizhuo Li et.al. 2412.04446 null
2024-12-05 GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Kaiyi Huang et.al. 2412.04440 null
2024-12-05 Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Yuying Ge et.al. 2412.04432 link
2024-12-05 The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation Fredrik Carlsson et.al. 2412.04318 null
2024-12-05 T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts Ziwei Huang et.al. 2412.04300 null
2024-12-05 Structure-Aware Stylized Image Synthesis for Robust Medical Image Segmentation Jie Bao et.al. 2412.04296 link
2024-12-05 Instructional Video Generation Yayuan Li et.al. 2412.04189 null
2024-12-04 Navigation World Models Amir Bar et.al. 2412.03572 null
2024-12-04 MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation Zehuan Huang et.al. 2412.03558 null
2024-12-04 Imagine360: Immersive 360 Video Generation from Perspective Anchor Jing Tan et.al. 2412.03552 null
2024-12-04 Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention Hannan Lu et.al. 2412.03520 null
2024-12-04 Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective Neta Shaul et.al. 2412.03487 null
2024-12-04 SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model Yan Li et.al. 2412.03430 null
2024-12-04 Skel3D: Skeleton Guided Novel View Synthesis Aron Fóthi et.al. 2412.03407 null
2024-12-04 Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment Feng He et.al. 2412.03400 null
2024-12-04 DIVE: Taming DINO for Subject-Driven Video Editing Yi Huang et.al. 2412.03347 null
2024-12-04 DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation Qingdong He et.al. 2412.03255 null
2024-12-03 Motion Prompting: Controlling Video Generation with Motion Trajectories Daniel Geng et.al. 2412.02700 null
2024-12-03 Taming Scalable Visual Tokenizer for Autoregressive Image Generation Fengyuan Shi et.al. 2412.02692 link
2024-12-03 FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation Kefan Chen et.al. 2412.02690 null
2024-12-03 SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Viet Nguyen et.al. 2412.02687 null
2024-12-03 AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction Lingteng Qiu et.al. 2412.02684 null
2024-12-03 Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback Hiroki Furuta et.al. 2412.02617 null
2024-12-03 WEM-GAN: Wavelet transform based facial expression manipulation Dongya Sun et.al. 2412.02530 null
2024-12-03 ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation? Leixin Zhang et.al. 2412.02368 link
2024-12-03 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Mingzhe Zheng et.al. 2412.02259 link
2024-12-03 Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models Jungwon Park et.al. 2412.02237 link
2024-11-29 JetFormer: An Autoregressive Generative Model of Raw Images and Text Michael Tschannen et.al. 2411.19722 link
2024-11-29 Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing Wenyi Mo et.al. 2411.19652 link
2024-11-29 QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain Wenfang Sun et.al. 2411.19534 null
2024-11-29 Fleximo: Towards Flexible Text-to-Human Motion Video Generation Yuhang Zhang et.al. 2411.19459 null
2024-11-29 Achromatic single-layer hologram Zhi Li et.al. 2411.19445 null
2024-11-28 AMO Sampler: Enhancing Text Rendering with Overshooting Xixi Hu et.al. 2411.19415 link
2024-11-28 DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models Shwetha Ram et.al. 2411.19390 null
2024-11-28 Trajectory Attention for Fine-grained Video Motion Control Zeqi Xiao et.al. 2411.19324 null
2024-11-28 Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention Huiguo He et.al. 2411.19261 null
2024-11-28 SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation Yuhan Pei et.al. 2411.19182 null
2024-11-27 Diffusion Self-Distillation for Zero-Shot Customized Image Generation Shengqu Cai et.al. 2411.18616 null
2024-11-27 FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion Haosen Yang et.al. 2411.18552 null
2024-11-27 Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models Yiming Wu et.al. 2411.18375 null
2024-11-27 TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models Riza Velioglu et.al. 2411.18350 link
2024-11-27 MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation Haopeng Fang et.al. 2411.18281 null
2024-11-27 Prediction with Action: Visual Policy Learning via Joint Denoising Process Yanjiang Guo et.al. 2411.18179 null
2024-11-27 Type-R: Automatically Retouching Typos for Text-to-Image Generation Wataru Shimoda et.al. 2411.18159 null
2024-11-27 PersonaCraft: Personalized Full-Body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion Gwanghyun Kim et.al. 2411.18068 null
2024-11-27 Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models Shuyang Hao et.al. 2411.18000 null
2024-11-27 Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space Lingxiao Li et.al. 2411.17784 null
2024-11-26 Accelerating Vision Diffusion Transformers with Skip Branches Guanjie Chen et.al. 2411.17616 link
2024-11-26 IMPROVE: Improving Medical Plausibility without Reliance on Human Validation – An Enhanced Prototype-Guided Diffusion Framework Anurag Shandilya et.al. 2411.17535 null
2024-11-26 Identity-Preserving Text-to-Video Generation by Frequency Decomposition Shenghai Yuan et.al. 2411.17440 link
2024-11-26 Image Generation with Multimodule Semantic Feature-Aided Selection for Semantic Communications Chengyang Liang et.al. 2411.17428 null
2024-11-26 Cross-modal Medical Image Generation Based on Pyramid Convolutional Attention Network Fuyou Mao et.al. 2411.17420 null
2024-11-26 AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation Ziyi Xu et.al. 2411.17383 null
2024-11-26 Reward Incremental Learning in Text-to-Image Generation Maorong Wang et.al. 2411.17310 null
2024-11-26 From Graph Diffusion to Graph Classification Jia Jun Cheng Xian et.al. 2411.17236 null
2024-11-26 AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM Jiarui Wang et.al. 2411.17221 link
2024-11-26 cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis Paul Friedrich et.al. 2411.17203 link
2024-11-25 Factorized Visual Tokenization and Generation Zechen Bai et.al. 2411.16681 null
2024-11-25 DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation Zun Wang et.al. 2411.16657 null
2024-11-25 Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric Zhichao Zhang et.al. 2411.16619 null
2024-11-25 Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing Kaifeng Gao et.al. 2411.16375 link
2024-11-25 CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain Jingchao Peng et.al. 2411.16327 null
2024-11-25 Image Generation Diversity Issues and How to Tame Them Mischa Dombrowski et.al. 2411.16171 link
2024-11-25 Text-to-Image Synthesis: A Decade Survey Nonghai Zhang et.al. 2411.16164 null
2024-11-25 Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models Donggeun Ko et.al. 2411.16079 null
2024-11-25 Label-Free Intraoperative Mean-Transition-Time Image Generation Using Statistical Gating and Deep Learning Yan Shi et.al. 2411.16039 null
2024-11-24 PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs Teng Zhou et.al. 2411.15867 link
2024-11-22 VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Daeun Lee et.al. 2411.15115 null
2024-11-22 Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion Samarth N Ramesh et.al. 2411.15113 null
2024-11-22 OminiControl: Minimal and Universal Control for Diffusion Transformer Zhenxiong Tan et.al. 2411.15098 link
2024-11-22 Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation Lakshmikar R. Polamreddy et.al. 2411.15084 link
2024-11-22 HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads Yu Xu et.al. 2411.15034 null
2024-11-22 Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation Dingyuan Shi et.al. 2411.14871 null
2024-11-22 Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation Jeongsol Kim et.al. 2411.14863 null
2024-11-22 Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering Haoyuan Li et.al. 2411.14816 null
2024-11-22 High-Resolution Image Synthesis via Next-Token Prediction Dengsheng Chen et.al. 2411.14808 null
2024-11-22 FairAdapter: Detecting AI-generated Images with Improved Fairness Feng Ding et.al. 2411.14755 link
2024-11-21 StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart Jian Shi et.al. 2411.14295 link
2024-11-21 ComfyGI: Automatic Improvement of Image Generation Workflows Dominik Sobania et.al. 2411.14193 null
2024-11-21 TaQ-DiT: Time-aware Quantization for Diffusion Transformers Xinyan Liu et.al. 2411.14172 null
2024-11-21 MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective Hailang Huang et.al. 2411.14062 link
2024-11-21 Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction Jordan Vice et.al. 2411.13982 null
2024-11-21 On the Fairness, Diversity and Reliability of Text-to-Image Generative Models Jordan Vice et.al. 2411.13981 null
2024-11-21 Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion Jinhong He et.al. 2411.13961 link
2024-11-21 iHQGAN: A Lightweight Invertible Hybrid Quantum-Classical Generative Adversarial Network for Unsupervised Image-to-Image Translation Xue Yang et.al. 2411.13920 link
2024-11-21 Dealing with Synthetic Data Contamination in Online Continual Learning Maorong Wang et.al. 2411.13852 link
2024-11-21 Detecting Human Artifacts from Text-to-Image Models Kaihong Wang et.al. 2411.13842 link
2024-11-20 REDUCIO! Generating 1024 $\times$ 1024 Video within 16 Seconds using Extremely Compressed Motion Latents Rui Tian et.al. 2411.13552 link
2024-11-20 VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Ziqi Huang et.al. 2411.13503 link
2024-11-20 From Prompt Engineering to Prompt Craft Joseph Lindley et.al. 2411.13422 null
2024-11-20 RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation Christoph Reinders et.al. 2411.13150 link
2024-11-20 CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models Naen Xu et.al. 2411.13144 null
2024-11-19 From Text to Pose to Image: Improving Diffusion Model Control and Quality Clément Bonnett et.al. 2411.12872 link
2024-11-19 Towards motion from video diffusion models Paul Janson et.al. 2411.12831 null
2024-11-19 Stylecodes: Encoding Stylistic Information For Image Generation Ciara Rowles et.al. 2411.12811 link
2024-11-19 Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting Haoyu Zhao et.al. 2411.12789 null
2024-11-19 PoM: Efficient Image and Video Generation with the Polynomial Mixer David Picard et.al. 2411.12663 link
2024-11-19 Constant Rate Schedule: Constant-Rate Distributional Change for Efficient Training and Sampling in Diffusion Models Shuntaro Okada et.al. 2411.12188 null
2024-11-19 Enhancing Low Dose Computed Tomography Images Using Consistency Training Techniques Mahmut S. Gokmen et.al. 2411.12181 null
2024-11-18 Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution Brian B. Moser et.al. 2411.12072 link
2024-11-18 Medical Video Generation for Disease Progression Simulation Xu Cao et.al. 2411.11943 null
2024-11-18 SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input Zhen Lv et.al. 2411.11934 null
2024-11-18 Conceptwm: A Diffusion Model Watermark for Concept Protection Liangqi Lei et.al. 2411.11688 null
2024-11-18 A Modular Open Source Framework for Genomic Variant Calling Ankita Vaishnobi Bisoi et.al. 2411.11513 null
2024-11-19 SoK: On the Role and Future of AIGC Watermarking in the Era of Gen-AI Kui Ren et.al. 2411.11478 null
2024-11-18 MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion Dongseok Shim et.al. 2411.11475 null
2024-11-18 Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge Qinglong Cao et.al. 2411.11343 null
2024-11-18 BeautyBank: Encoding Facial Makeup in Latent Space Qianwen Lu et.al. 2411.11231 null
2024-11-17 Enhanced Anime Image Generation Using USE-CMHSA-GAN J. Lu et.al. 2411.11179 null
2024-11-17 Time Step Generating: A Universal Synthesized Deepfake Image Detector Ziyue Zeng et.al. 2411.11016 link
2024-11-17 SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Jintao Zhang et.al. 2411.10958 link
2024-11-16 ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models Vipula Rawte et.al. 2411.10867 null
2024-11-15 M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation Sucheng Ren et.al. 2411.10433 link
2024-11-15 Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding Huming Qiu et.al. 2411.10329 null
2024-11-15 The Unreasonable Effectiveness of Guidance for Diffusion Models Tim Kaiser et.al. 2411.10257 null
2024-11-15 Visual question answering based evaluation metrics for text-to-image generation Mizuki Miyamoto et.al. 2411.10183 null
2024-11-15 CART: Compositional Auto-Regressive Transformer for Image Generation Siddharth Roheda et.al. 2411.10180 null
2024-11-15 Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training Myunsoo Kim et.al. 2411.09998 null
2024-11-15 Content-Aware Preserving Image Generation Giang H. Le et.al. 2411.09871 null
2024-11-14 Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting Yian Wang et.al. 2411.09823 null
2024-11-14 GAN-Based Architecture for Low-dose Computed Tomography Imaging Denoising Yunuo Wang et.al. 2411.09512 null
2024-11-14 Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models Chutian Meng et.al. 2411.09449 null
2024-11-14 Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance Md Fahim Anjum et.al. 2411.09174 null
2024-11-14 VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation Youpeng Wen et.al. 2411.09153 null
2024-11-13 A Survey on Vision Autoregressive Model Kai Jiang et.al. 2411.08666 null
2024-11-13 Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models Chengdong Dong et.al. 2411.08642 null
2024-11-13 I Can Embrace and Avoid Vagueness Myself: Supporting the Design Process by Balancing Vagueness through Text-to-Image Generative AI Myungjin Kim et.al. 2411.08588 null
2024-11-13 EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Xiaofeng Wang et.al. 2411.08380 null
2024-11-13 Physics Informed Distillation for Diffusion Models Joshua Tian Jin Tee et.al. 2411.08378 link
2024-11-13 Motion Control for Enhanced Complex Action Video Generation Qiang Zhou et.al. 2411.08328 null
2024-11-12 Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing Zitao Shuai et.al. 2411.08196 null
2024-11-12 TIPO: Text to Image with Text Presampling for Prompt Optimization Shih-Ying Yeh et.al. 2411.08127 null
2024-11-12 Evaluating the Generation of Spatial Relations in Text and Image Generative Models Shang Hong Sim et.al. 2411.07664 null
2024-11-12 Leveraging Previous Steps: A Training-free Fast Solver for Flow Diffusion Kaiyu Song et.al. 2411.07627 null
2024-11-12 Artificial Intelligence for Biomedical Video Generation Linyuan Li et.al. 2411.07619 null
2024-11-12 GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering Zhihao Liang et.al. 2411.07478 null
2024-11-11 Exploring Variational Autoencoders for Medical Image Generation: A Comprehensive Study Khadija Rais et.al. 2411.07348 null
2024-11-11 Learning from Limited and Imperfect Data Harsh Rangwani et.al. 2411.07229 null
2024-11-11 DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID Nyle Siddiqui et.al. 2411.07205 link
2024-11-11 More Expressive Attention with Negative Weights Ang Lv et.al. 2411.07176 link
2024-11-11 Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models NVIDIA et.al. 2411.07126 null
2024-11-11 Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models Yanchen Wang et.al. 2411.07121 link
2024-11-11 Layout Control and Semantic Guidance with Attention Loss Backward for T2I Diffusion Model Guandong Li et.al. 2411.06692 null
2024-11-11 SeedEdit: Align Image Re-Generation to Image Editing Yichun Shi et.al. 2411.06686 null
2024-11-10 Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement Zhennan Chen et.al. 2411.06558 link
2024-11-10 I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength Wanquan Feng et.al. 2411.06525 null
2024-11-10 DDIM-Driven Coverless Steganography Scheme with Real Key Mingyu Yu et.al. 2411.06486 null
2024-11-08 Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models Jia-Hong Huang et.al. 2411.05706 null
2024-11-08 WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making Zhilong Zhang et.al. 2411.05619 null
2024-11-08 A Nerf-Based Color Consistency Method for Remote Sensing Images Zongcheng Zuo et.al. 2411.05557 null
2024-11-08 Improving image synthesis with diffusion-negative sampling Alakh Desai et.al. 2411.05473 null
2024-11-07 Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model Sheng Cheng et.al. 2411.05079 link
2024-11-07 Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Weixin Liang et.al. 2411.04996 null
2024-11-07 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Koichi Namekata et.al. 2411.04989 null
2024-11-07 AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation Anil Kag et.al. 2411.04967 null
2024-11-07 Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification Mischa Dombrowski et.al. 2411.04956 null
2024-11-07 DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Wenqiang Sun et.al. 2411.04928 null
2024-11-07 StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration Panwen Hu et.al. 2411.04925 null
2024-11-07 MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views Yuedong Chen et.al. 2411.04924 link
2024-11-07 Taming Rectified Flow for Inversion and Editing Jiangshan Wang et.al. 2411.04746 link
2024-11-07 DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning Yuxuan Duan et.al. 2411.04571 link
2024-11-07 BendVLM: Test-Time Debiasing of Vision-Language Embeddings Walter Gerych et.al. 2411.04420 link
2024-11-06 ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks Ziji Shi et.al. 2411.03999 null
2024-11-06 Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation Chihaya Matsuhira et.al. 2411.03595 null
2024-11-05 Enhancing Weakly Supervised Semantic Segmentation for Fibrosis via Controllable Image Generation Zhiling Yue et.al. 2411.03551 null
2024-11-05 DiT4Edit: Diffusion Transformer for Image Editing Kunyu Feng et.al. 2411.03286 null
2024-11-05 On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models Tariq Berrada Ifriqi et.al. 2411.03177 null
2024-11-05 Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising Tao Huang et.al. 2411.03053 null
2024-11-05 Textual Aesthetics in Large Language Models Lingjie Jiang et.al. 2411.02930 link
2024-11-05 Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey Ao Fu et.al. 2411.02914 null
2024-11-05 BrainBits: How Much of the Brain are Generative Reconstruction Methods Using? David Mayo et.al. 2411.02783 null
2024-11-04 TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives Maitreya Patel et.al. 2411.02545 null
2024-11-04 Adaptive Caching for Faster Video Generation with Diffusion Transformers Kumara Kahatapitiya et.al. 2411.02397 null
2024-11-04 Training-free Regional Prompting for Diffusion Transformers Anthony Chen et.al. 2411.02395 link
2024-11-04 How Far is Video Generation from World Model: A Physical Law Perspective Bingyi Kang et.al. 2411.02385 null
2024-11-04 Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models Anjith George et.al. 2411.02188 null
2024-11-03 Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation Zhenbin Wang et.al. 2411.01647 null
2024-11-03 DreamPolish: Domain Score Distillation With Progressive Geometry Generation Yean Cheng et.al. 2411.01602 null
2024-11-03 Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach Qihe Pan et.al. 2411.01545 link
2024-11-03 DPCL-Diff: The Temporal Knowledge Graph Reasoning based on Graph Node Diffusion Model with Dual-Domain Periodic Contrastive Learning Yukun Cao et.al. 2411.01477 null
2024-11-02 Guided Synthesis of Labeled Brain MRI Data Using Latent Diffusion Models for Segmentation of Enlarged Ventricles Tim Ruschke et.al. 2411.01351 null
2024-11-02 Fast and Memory-Efficient Video Diffusion Using Streamlined Inference Zheng Zhan et.al. 2411.01171 link
2024-10-31 Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning Penghui Ruan et.al. 2410.24219 link
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-11-01 In-Context LoRA for Diffusion Transformers Lianghua Huang et.al. 2410.23775 link
2024-10-31 Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization Xiao Guo et.al. 2410.23556 null
2024-10-30 MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts Jie Zhu et.al. 2410.23332 null
2024-10-30 RelationBooth: Towards Relation-Aware Customized Object Generation Qingyu Shi et.al. 2410.23280 null
2024-10-31 SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Yining Hong et.al. 2410.23277 null
2024-10-30 Multi-student Diffusion Distillation for Better One-step Generators Yanke Song et.al. 2410.23274 null
2024-10-30 LumiSculpt: A Consistency Lighting Control Network for Video Generation Yuxin Zhang et.al. 2410.22979 null
2024-10-30 Private Synthetic Text Generation with Diffusion Models Sebastian Ochs et.al. 2410.22971 link
2024-10-30 An Individual Identity-Driven Framework for Animal Re-Identification Yihao Wu et.al. 2410.22927 link
2024-10-30 HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models Shengkai Zhang et.al. 2410.22901 link
2024-10-30 Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images Hanlin Wu et.al. 2410.22830 link
2024-10-30 Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models Arash Marioriyad et.al. 2410.22775 null
2024-10-30 Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots Vincent Guan et.al. 2410.22729 link
2024-10-29 Investigating Memorization in Video Diffusion Models Chen Chen et.al. 2410.21669 null
2024-10-29 Exploring Local Memorization in Diffusion Models via Bright Ending Attention Chen Chen et.al. 2410.21665 null
2024-10-29 Fingerprints of Super Resolution Networks Jeremy Vonderfecht et.al. 2410.21653 null
2024-10-28 Denoising Diffusion Planner: Learning Complex Paths from Low-Quality Demonstrations Michiel Nikken et.al. 2410.21497 link
2024-10-28 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Hanyu Wang et.al. 2410.21264 null
2024-10-28 Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences Zhihao Zhao et.al. 2410.21130 null
2024-10-28 Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models Wenda Li et.al. 2410.21088 link
2024-10-28 Markov spin models for image generation : explicit large deviations with respect to the number of pixels Cecile Monthus et.al. 2410.20906 null
2024-10-28 Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models Weijian Luo et.al. 2410.20898 link
2024-10-28 Murine AI excels at cats and cheese: Structural differences between human and mouse neurons and their implementation in generative AIs Rino Saiga et.al. 2410.20735 null
2024-10-28 CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians Chongjian Ge et.al. 2410.20723 null
2024-10-28 Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient Yintai Ma et.al. 2410.20657 null
2024-10-27 Generator Matching: Generative modeling with arbitrary Markov processes Peter Holderrieth et.al. 2410.20587 null
2024-10-27 ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation Zongyi Li et.al. 2410.20502 null
2024-10-25 FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Zhengyao Lv et.al. 2410.19355 null
2024-10-25 High Resolution Seismic Waveform Generation using Denoising Diffusion Andreas Bergmeister et.al. 2410.19343 null
2024-10-24 Framer: Interactive Frame Interpolation Wen Wang et.al. 2410.18978 null
2024-10-24 Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences Weijian Luo et.al. 2410.18881 null
2024-10-24 Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation Xiaoyu Zhang et.al. 2410.18830 null
2024-10-24 Towards Visual Text Design Transfer Across Languages Yejin Choi et.al. 2410.18823 null
2024-10-24 Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances Shilin Lu et.al. 2410.18775 link
2024-10-24 Ali-AUG: Innovative Approaches to Labeled Data Augmentation using One-Step Diffusion Model Ali Hamza et.al. 2410.18678 null
2024-10-24 FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation Christopher T. H Teo et.al. 2410.18615 null
2024-10-24 FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling Zhengqiang Zhang et.al. 2410.18410 link
2024-10-23 Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing Dongliang Guo et.al. 2410.18267 null
2024-10-23 WorldSimBench: Towards Video Generation Models as World Simulators Yiran Qin et.al. 2410.18072 null
2024-10-23 Scalable Ranked Preference Optimization for Text-to-Image Generation Shyamgopal Karthik et.al. 2410.18013 null
2024-10-23 A Wavelet Diffusion GAN for Image Super-Resolution Lorenzo Aloisi et.al. 2410.17966 null
2024-10-23 TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation Ruicheng Zhang et.al. 2410.17855 null
2024-10-23 VISAGE: Video Synthesis using Action Graphs for Surgery Yousef Yeganeh et.al. 2410.17751 null
2024-10-22 Offline Evaluation of Set-Based Text-to-Image Generation Negar Arabzadeh et.al. 2410.17331 link
2024-10-22 Altogether: Image Captioning via Re-aligning Alt-text Hu Xu et.al. 2410.17251 link
2024-10-22 IdenBAT: Disentangled Representation Learning for Identity-Preserved Brain Age Transformation Junyeong Maeng et.al. 2410.16945 link
2024-10-22 DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization Haowei Zhu et.al. 2410.16942 null
2024-10-22 Hierarchical Clustering for Conditional Diffusion in Image Generation Jorge da Silva Goncalves et.al. 2410.16910 link
2024-10-22 MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model Meng Xu et.al. 2410.16840 null
2024-10-22 Progressive Compositionality In Text-to-Image Generative Models Xu Han et.al. 2410.16719 link
2024-10-22 Dual-Model Defense: Safeguarding Diffusion Models from Membership Inference Attacks through Disjoint Data Splitting Bao Q. Tran et.al. 2410.16657 null
2024-10-21 MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors Honghua Chen et.al. 2410.16272 null
2024-10-21 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors Xi Liu et.al. 2410.16266 null
2024-10-21 Elucidating the design space of language models for image generation Xuantong Liu et.al. 2410.16257 link
2024-10-21 A Framework for Evaluating Predictive Models Using Synthetic Image Covariates and Longitudinal Data Simon Deltadahl et.al. 2410.16177 null
2024-10-21 Continuous Speech Synthesis using per-token Latent Diffusion Arnon Turetzky et.al. 2410.16048 null
2024-10-20 EVA: An Embodied World Model for Future Video Anticipation Xiaowei Chi et.al. 2410.15461 null
2024-10-20 Allegro: Open the Black Box of Commercial-Level Video Generation Model Yuan Zhou et.al. 2410.15458 link
2024-10-20 FrameBridge: Improving Image-to-Video Generation with Bridge Models Yuji Wang et.al. 2410.15371 null
2024-10-19 SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning Zhewei Dai et.al. 2410.14987 link
2024-10-19 Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence Vansh Bansal et.al. 2410.14949 link
2024-10-18 BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities Shaozhe Hao et.al. 2410.14672 link
2024-10-18 FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models Rui Hu et.al. 2410.14429 null
2024-10-18 HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation Bo Cheng et.al. 2410.14324 link
2024-10-18 HYPNOS : Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects Oliverio Theophilus Nathanael et.al. 2410.14265 null
2024-10-18 Text-to-Image Representativity Fairness Evaluation Framework Asma Yamani et.al. 2410.14201 null
2024-10-18 Personalized Image Generation with Large Multimodal Models Yiyan Xu et.al. 2410.14170 link
2024-10-18 Assessing Open-world Forgetting in Generative Image Model Customization Héctor Laria et.al. 2410.14159 null
2024-10-17 Inference of morphology and dynamical state of nearby $Planck$-SZ galaxy clusters with Zernike polynomials Valentina Capalbo et.al. 2410.13929 null
2024-10-17 Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Lijie Fan et.al. 2410.13863 null
2024-10-17 PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Rongyao Fang et.al. 2410.13861 link
2024-10-17 VidPanos: Generative Panoramic Videos from Casual Panning Videos Jingwei Ma et.al. 2410.13832 null
2024-10-17 DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Yujie Wei et.al. 2410.13830 null
2024-10-18 DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Hanbo Cheng et.al. 2410.13726 link
2024-10-17 Movie Gen: A Cast of Media Foundation Models Adam Polyak et.al. 2410.13720 link
2024-10-17 LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning Yiming Shi et.al. 2410.13618 link
2024-10-17 DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation Guosheng Zhao et.al. 2410.13571 null
2024-10-17 MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models Donghao Zhou et.al. 2410.13370 null
2024-10-18 Fundus to Fluorescein Angiography Video Generation as a Retinal Generative Foundation Model Weiyi Zhang et.al. 2410.13242 null
2024-10-16 SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation Jaehong Yoon et.al. 2410.12761 null
2024-10-16 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation Dewei Zhou et.al. 2410.12669 link
2024-10-16 Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation Kathrin Khadra et.al. 2410.12542 null
2024-10-16 Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective Yongxin Zhu et.al. 2410.12490 link
2024-10-16 Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks Pranjali Pathre et.al. 2410.12432 link
2024-10-16 FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization Cheng Yu et.al. 2410.12312 link
2024-10-16 Facing Identity: The Formation and Performance of Identity via Face-Based Artificial Intelligence Technologies Wells Lucas Santo et.al. 2410.12148 null
2024-10-15 On the Effectiveness of Dataset Alignment for Fake Image Detection Anirudh Sundara Rajan et.al. 2410.11835 null
2024-10-15 KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities Hsin-Ping Huang et.al. 2410.11824 null
2024-10-16 Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Zhiyuan Ma et.al. 2410.11795 null
2024-10-15 Generative Image Steganography Based on Point Cloud Zhong Yangjie et.al. 2410.11673 null
2024-10-15 InvSeg: Test-Time Prompt Inversion for Semantic Segmentation Jiayi Lin et.al. 2410.11473 null
2024-10-15 A Simple Approach to Unifying Diffusion-based Conditional Generation Xirui Li et.al. 2410.11439 null
2024-10-15 Evolutionary Retrofitting Mathurin Videau et.al. 2410.11330 null
2024-10-15 Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling Guiyu Zhang et.al. 2410.11236 null
2024-10-14 Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models Jingzhi Bao et.al. 2410.10821 link
2024-10-14 When Does Perceptual Alignment Benefit Vision Representations? Shobhita Sundaram et.al. 2410.10817 null
2024-10-14 LVD-2M: A Long-take Video Dataset with Temporally Dense Captions Tianwei Xiong et.al. 2410.10816 link
2024-10-14 HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Haotian Tang et.al. 2410.10812 link
2024-10-14 Boosting Camera Motion Control for Video Diffusion Transformers Soon Yau Cheong et.al. 2410.10802 null
2024-10-15 MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling Jian Yang et.al. 2410.10798 null
2024-10-14 Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Dejia Xu et.al. 2410.10774 null
2024-10-14 DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships Zhang Wan et.al. 2410.10751 null
2024-10-14 Evaluating SQL Understanding in Large Language Models Ananya Rahaman et.al. 2410.10680 null
2024-10-14 ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection Martin Aubard et.al. 2410.10554 link
2024-10-11 SceneCraft: Layout-Guided 3D Scene Generation Xiuyu Yang et.al. 2410.09049 link
2024-10-11 MiRAGeNews: Multimodal Realistic AI-Generated News Detection Runsheng Huang et.al. 2410.09045 link
2024-10-11 One-shot Generative Domain Adaptation in 3D GANs Ziqiang Li et.al. 2410.08824 link
2024-10-11 Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting Purushothaman Natarajan et.al. 2410.08612 link
2024-10-11 Context-Aware Full Body Anonymization using Text-to-Image Diffusion Models Pascl Zwick et.al. 2410.08551 link
2024-10-11 Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities Abhijay Ghildyal et.al. 2410.08534 null
2024-10-11 Diffusion Models Need Visual Priors for Image Generation Xiaoyu Yue et.al. 2410.08531 null
2024-10-10 Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Jinbin Bai et.al. 2410.08261 link
2024-10-10 Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content Qiuheng Wang et.al. 2410.08260 null
2024-10-10 DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models Xiaoxiao He et.al. 2410.08207 null
2024-10-10 Scaling Laws For Diffusion Transformers Zhengyang Liang et.al. 2410.08184 null
2024-10-10 DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Jiatao Gu et.al. 2410.08159 null
2024-10-10 RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace Pragyan Shrestha et.al. 2410.08152 link
2024-10-10 Progressive Autoregressive Video Diffusion Models Desai Xie et.al. 2410.08151 link
2024-10-10 Generated Bias: Auditing Internal Bias Dynamics of Text-To-Image Generative Models Abhishek Mandal et.al. 2410.07884 null
2024-10-10 MinorityPrompt: Text to Minority Image Generation via Prompt Optimization Soobin Um et.al. 2410.07838 link
2024-10-10 HARIVO: Harnessing Text-to-Image Models for Video Generation Mingi Kwon et.al. 2410.07763 null
2024-10-10 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation Jiahao Cui et.al. 2410.07718 link
2024-10-10 Relational Diffusion Distillation for Efficient Image Generation Weilun Feng et.al. 2410.07679 link
2024-10-09 IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Xinchen Zhang et.al. 2410.07171 link
2024-10-09 Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis Bohan Zeng et.al. 2410.07155 link
2024-10-10 EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models Rui Zhao et.al. 2410.07133 link
2024-10-09 Personalized Visual Instruction Tuning Renjie Pi et.al. 2410.07113 link
2024-10-09 Decouple-Then-Merge: Towards Better Training for Diffusion Models Qianli Ma et.al. 2410.06664 null
2024-10-09 On the Solution of Linearized Inverse Scattering Problems in Near-Field Microwave Imaging by Operator Inversion and Matched Filtering Matthias M. Saurer et.al. 2410.06465 null
2024-10-08 Story-Adapter: A Training-free Iterative Framework for Long Story Visualization Jiawei Mao et.al. 2410.06244 null
2024-10-08 BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way Jiazi Bu et.al. 2410.06241 null
2024-10-08 SD-$π$XL: Generating Low-Resolution Quantized Imagery via Score Distillation Alexandre Binninger et.al. 2410.06236 link
2024-10-08 GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation Chi-Lam Cheang et.al. 2410.06158 null
2024-10-07 The Dawn of Video Generation: Preliminary Explorations with SORA-like Models Ailing Zeng et.al. 2410.05227 null
2024-10-07 Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality Ge Ya et.al. 2410.05203 link
2024-10-07 Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning Ayano Hiranaka et.al. 2410.05116 null
2024-10-07 OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction Leheng Li et.al. 2410.04932 null
2024-10-07 PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing Feng Tian et.al. 2410.04844 link
2024-10-07 ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction Hyungjin Chung et.al. 2410.04721 null
2024-10-06 Realizing Video Summarization from the Path of Language-based Semantic Understanding Kuan-Chen Mu et.al. 2410.04511 null
2024-10-06 Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training Wenbo Li et.al. 2410.04439 null
2024-10-06 Disentangling Regional Primitives for Image Generation Zhengting Chen et.al. 2410.04421 null
2024-10-05 The Visualization JUDGE : Can Multimodal Foundation Models Guide Visualization Design Through Visual Perception? Matthew Berger et.al. 2410.04280 null
2024-10-04 Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features Benyuan Meng et.al. 2410.03558 link
2024-10-04 Dynamic Diffusion Transformer Wangbo Zhao et.al. 2410.03456 link
2024-10-04 Images Speak Volumes: User-Centric Assessment of Image Generation for Accessible Communication Miriam Anschütz et.al. 2410.03430 link
2024-10-04 LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding Doohyuk Jang et.al. 2410.03355 null
2024-10-04 Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization Zichen Miao et.al. 2410.03190 null
2024-10-04 Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach Yaofang Liu et.al. 2410.03160 link
2024-10-04 ECHOPulse: ECG controlled echocardio-grams video generation Yiwei Li et.al. 2410.03143 link
2024-10-03 Revealing the Unseen: Guiding Personalized Diffusion Models to Expose Training Data Xiaoyu Wu et.al. 2410.03039 null
2024-10-03 Loong: Generating Minute-level Long Videos with Autoregressive Language Models Yuqing Wang et.al. 2410.02757 null
2024-10-03 SteerDiff: Steering towards Safe Text-to-Image Diffusion Models Hongxiang Zhang et.al. 2410.02710 null
2024-10-03 ControlAR: Controllable Image Generation with Autoregressive Models Zongming Li et.al. 2410.02705 link
2024-10-03 Grounded Answers for Multi-agent Decision-making Problem through Generative World Model Zeyang Liu et.al. 2410.02664 null
2024-10-03 Event-Customized Image Generation Zhen Wang et.al. 2410.02483 null
2024-10-04 Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation Muzhi Zhu et.al. 2410.02369 link
2024-10-03 SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Jintao Zhang et.al. 2410.02367 link
2024-10-03 Plug-and-Play Controllable Generation for Discrete Masked Models Wei Guo et.al. 2410.02143 null
2024-10-02 EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing Haotian Sun et.al. 2410.02098 null
2024-10-02 DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation Jing He et.al. 2410.02067 null
2024-10-02 Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space Yangming Li et.al. 2410.01796 null
2024-10-02 ImageFolder: Autoregressive Image Generation with Folded Tokens Xiang Li et.al. 2410.01756 link
2024-10-02 ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Rinon Gal et.al. 2410.01731 null
2024-10-02 COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation Mingzhen Sun et.al. 2410.01718 null
2024-10-02 Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding Yao Teng et.al. 2410.01699 link
2024-10-02 Data Extrapolation for Text-to-image Generation on Small Datasets Senmao Ye et.al. 2410.01638 link
2024-10-02 KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models Pouyan Navard et.al. 2410.01595 link
2024-10-02 MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation Mingzhen Sun et.al. 2410.01594 link
2024-10-02 Edge-preserving noise for diffusion models Jente Vandersanden et.al. 2410.01540 null
2024-10-02 Aggregation of Multi Diffusion Models for Enhancing Learned Representations Conghan Yue et.al. 2410.01262 link
2024-09-30 Inverse Painting: Reconstructing The Painting Process Bowei Chen et.al. 2409.20556 null
2024-09-30 All-optical autoencoder machine learning framework using diffractive processors Peijie Feng et.al. 2409.20346 null
2024-09-30 Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs Zicheng Zhang et.al. 2409.20063 null
2024-09-30 Illustrious: an Open Advanced Illustration Model Sang Hyun Park et.al. 2409.19946 null
2024-09-30 MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation Wenchao Chen et.al. 2409.19937 null
2024-09-30 Replace Anyone in Videos Xiang Wang et.al. 2409.19911 link
2024-09-29 OrganiQ: Mitigating Classical Resource Bottlenecks of Quantum Generative Adversarial Networks on NISQ-Era Machines Daniel Silver et.al. 2409.19823 null
2024-09-29 Simple and Fast Distillation of Diffusion Models Zhenyu Zhou et.al. 2409.19681 link
2024-09-29 Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection Yuhang Ma et.al. 2409.19624 null
2024-09-29 Effective Diffusion Transformer Architecture for Image Super-Resolution Kun Cheng et.al. 2409.19589 link
2024-09-27 PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Shaowei Liu et.al. 2409.18964 link
2024-09-27 Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions Iskander Azangulov et.al. 2409.18804 null
2024-09-26 Realistic Evaluation of Model Merging for Compositional Generalization Derek Tam et.al. 2409.18314 link
2024-09-26 Harnessing Wavelet Transformations for Generalizable Deepfake Forgery Detection Lalith Bharadwaj Baru et.al. 2409.18301 link
2024-09-26 Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey Yi Zhang et.al. 2409.18214 link
2024-09-26 FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner Wenliang Zhao et.al. 2409.18128 link
2024-09-26 Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Jing He et.al. 2409.18124 null
2024-09-26 DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models Helin Cao et.al. 2409.18092 null
2024-09-26 Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion Hengrui Gu et.al. 2409.17928 link
2024-09-26 Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation Qihan Huang et.al. 2409.17920 link
2024-09-26 Text Image Generation for Low-Resource Languages with Dual Translation Learning Chihiro Noguchi et.al. 2409.17747 null
2024-09-26 AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status Jinghao Zhang et.al. 2409.17740 null
2024-09-26 Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation Huan Yang et.al. 2409.17674 null
2024-09-26 ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition Shen Li et.al. 2409.17576 null
2024-09-26 Pixel-Space Post-Training of Latent Diffusion Models Christina Zhang et.al. 2409.17565 null
2024-09-25 GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design Phillip Mueller et.al. 2409.17045 null
2024-09-25 Pose-Guided Fine-Grained Sign Language Video Generation Tongkai Shi et.al. 2409.16709 null
2024-09-25 Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation Youngwan Jin et.al. 2409.16706 link
2024-09-25 Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement Yihao Zhou et.al. 2409.16661 null
2024-09-25 ECG-Image-Database: A Dataset of ECG Images with Real-World Imaging and Scanning Artifacts; A Foundation for Computerized ECG Image Digitization and Analysis Matthew A. Reyna et.al. 2409.16612 link
2024-09-24 Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation Homanga Bharadhwaj et.al. 2409.16283 null
2024-09-24 MonoFormer: One Transformer for Both Diffusion and Autoregression Chuyang Zhao et.al. 2409.16280 link
2024-09-24 Label-Augmented Dataset Distillation Seoungyoon Kang et.al. 2409.16239 null
2024-09-24 MaskBit: Embedding-free Image Generation via Bit Tokens Mark Weber et.al. 2409.16211 link
2024-09-26 Enhanced Unsupervised Image-to-Image Translation Using Contrastive Learning and Histogram of Oriented Gradients Wanchen Zhao et.al. 2409.16042 null
2024-09-24 Deep chroma compression of tone-mapped images Xenios Milidonis et.al. 2409.16032 link
2024-09-24 Improvements to SDXL in NovelAI Diffusion V3 Juan Ossa et.al. 2409.15997 null
2024-09-23 Critic Loss for Image Classification Brendan Hogan Rappazzo et.al. 2409.15565 null
2024-09-23 Bayesian computation with generative diffusion models by Multilevel Monte Carlo Abdul-Lateef Haji-Ali et.al. 2409.15511 link
2024-09-23 Revealing an Unattractivity Bias in Mental Reconstruction of Occluded Faces using Generative Image Models Frederik Riedmann et.al. 2409.15443 null
2024-09-18 Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance Jaehoon Joo et.al. 2409.12099 null
2024-09-18 ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation Peiyu Li et.al. 2409.12010 link
2024-09-18 Tracking Any Point with Frame-Event Fusion Network at High Frame Rate Jiaxiong Liu et.al. 2409.11953 null
2024-09-18 Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation Dimitrios Christodoulou et.al. 2409.11904 null
2024-09-18 RaggeDi: Diffusion-based State Estimation of Disordered Rags, Sheets, Towels and Blankets Jikai Ye et.al. 2409.11831 null
2024-09-18 GUNet: A Graph Convolutional Network United Diffusion Model for Stable and Diversity Pose Generation Shuowen Liang et.al. 2409.11689 link
2024-09-17 Using Physics Informed Generative Adversarial Networks to Model 3D porous media Zihan Ren et.al. 2409.11541 null
2024-09-17 OSV: One Step is Enough for High-Quality Image to Video Generation Xiaofeng Mao et.al. 2409.11367 null
2024-09-17 Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Gonzalo Martin Garcia et.al. 2409.11355 link
2024-09-17 OmniGen: Unified Image Generation Shitao Xiao et.al. 2409.11340 link
2024-09-18 The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives Samee Arif et.al. 2409.11261 link
2024-09-17 Improving the Efficiency of Visually Augmented Language Models Paula Ontalvilla et.al. 2409.11148 link
2024-09-17 MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance Debin Meng et.al. 2409.11010 link
2024-09-16 Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models Bingchen Liu et.al. 2409.10695 null
2024-09-16 SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing Qi Qian et.al. 2409.10476 null
2024-09-16 VAE-QWGAN: Improving Quantum GANs for High Resolution Image Generation Aaron Mark Thomas et.al. 2409.10339 null
2024-09-16 On Synthetic Texture Datasets: Challenges, Creation, and Curation Blaine Hoak et.al. 2409.10297 null
2024-09-16 Embodiment-Agnostic Action Planning via Object-Part Scene Flow Weiliang Tang et.al. 2409.10032 null
2024-09-15 GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion Vitor Guizilini et.al. 2409.09896 null
2024-09-15 Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization Haoyuan Sun et.al. 2409.09774 null
2024-09-15 MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection Yaning Zhang et.al. 2409.09724 link
2024-09-15 Finetuning CLIP to Reason about Pairwise Differences Dylan Sam et.al. 2409.09721 link
2024-09-15 E-Commerce Inpainting with Mask Guidance in Controlnet for Reducing Overcompletion Guandong Li et.al. 2409.09681 null
2024-09-13 InstantDrag: Improving Interactivity in Drag-based Image Editing Joonghyuk Shin et.al. 2409.08857 null
2024-09-13 STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment Yong Ren et.al. 2409.08601 null
2024-09-13 Enhancing Privacy in ControlNet and Stable Diffusion via Split Learning Dixi Yao et.al. 2409.08503 null
2024-09-12 Click2Mask: Local Editing with Dynamic Mask Generation Omer Regev et.al. 2409.08272 link
2024-09-12 TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder NaHyeon Park et.al. 2409.08248 link
2024-09-12 IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Yinwei Wu et.al. 2409.08240 null
2024-09-12 High-Frequency Anti-DreamBooth: Robust Defense Against Image Synthesis Takuto Onikubo et.al. 2409.08167 link
2024-09-12 EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance Zicheng Duan et.al. 2409.08091 link
2024-09-12 Scribble-Guided Diffusion for Training-free Text-to-Image Generation Seonho Lee et.al. 2409.08026 link
2024-09-11 DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures Steven Hogue et.al. 2409.07649 null
2024-09-11 Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Haibo Yang et.al. 2409.07452 link
2024-09-11 FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process Yang Luo et.al. 2409.07451 null
2024-09-11 Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy Somayeh Pakdelmoez et.al. 2409.07422 null
2024-09-11 EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion Jian Zhang et.al. 2409.07255 link
2024-09-11 Bio-Eng-LMM AI Assist chatbot: A Comprehensive Tool for Research and Education Ali Forootani et.al. 2409.07110 link
2024-09-10 DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images Taslim Murad et.al. 2409.06694 null
2024-09-10 SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Teng Hu et.al. 2409.06633 null
2024-09-10 PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation Ginger Delmas et.al. 2409.06535 null
2024-09-10 DiffQRCoder: Diffusion-based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement Jia-Wei Liao et.al. 2409.06355 null
2024-09-10 G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer Jinzhi Zhang et.al. 2409.06322 null
2024-09-11 MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control Yining Yao et.al. 2409.06189 null
2024-09-09 SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values Chengwei Sun et.al. 2409.05926 null
2024-09-11 DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation Wei Wu et.al. 2409.05463 null
2024-09-09 CipherDM: Secure Three-Party Inference for Diffusion Model Sampling Xin Zhao et.al. 2409.05414 null
2024-09-09 TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors Yichuan Mo et.al. 2409.05294 link
2024-09-08 Can OOD Object Detectors Learn from Foundation Models? Jiahui Liu et.al. 2409.05162 link
2024-09-07 Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation Jiaxin Cheng et.al. 2409.04847 link
2024-09-07 SpotActor: Training-Free Layout-Controlled Consistent Image Generation Jiahao Wang et.al. 2409.04801 null
2024-09-07 Multi-Conditioned Denoising Diffusion Probabilistic Model (mDDPM) for Medical Image Synthesis Arjun Krishna et.al. 2409.04670 null
2024-09-06 VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Yecheng Wu et.al. 2409.04429 link
2024-09-06 Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Zhuoyan Luo et.al. 2409.04410 link
2024-09-06 Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks Hangcheng Cao et.al. 2409.04133 null
2024-09-06 Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task Jing Wang et.al. 2409.04005 link
2024-09-06 DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes Jianbiao Mei et.al. 2409.04003 link
2024-09-05 ArtiFade: Learning to Generate High-quality Subject from Blemished Images Shuya Yang et.al. 2409.03745 null
2024-09-05 Blended Latent Diffusion under Attention Control for Real-World Video Editing Deyin Liu et.al. 2409.03514 null
2024-09-05 Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks Akshay Jain et.al. 2409.03458 link
2024-09-05 Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities Wei Lu et.al. 2409.03444 link
2024-09-09 RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning Lawrence Yunliang Chen et.al. 2409.03403 null
2024-09-05 Enhancing digital core image resolution using optimal upscaling algorithm: with application to paired SEM images Shaohua You et.al. 2409.03265 null
2024-09-06 HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts Xinyu Liu et.al. 2409.02919 link
2024-09-04 PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation Jun Ling et.al. 2409.02657 null
2024-09-04 Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects Kyungmin Jo et.al. 2409.02653 null
2024-09-05 Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Jianwen Jiang et.al. 2409.02634 null
2024-09-04 StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models Wen Li et.al. 2409.02543 link
2024-09-04 A Learnable Color Correction Matrix for RAW Reconstruction Anqi Liu et.al. 2409.02497 null
2024-09-04 Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing Siyi Chen et.al. 2409.02374 link
2024-09-03 QID$^2$: An Image-Conditioned Diffusion Model for Q-space Up-sampling of DWI Data Zijian Chen et.al. 2409.02309 null
2024-09-03 DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Wenbo Hu et.al. 2409.02095 link
2024-09-03 Probing Noncentrosymmetric 2D Materials by Fourier Space Second Harmonic Imaging Lucas Lafeta et.al. 2409.02071 null
2024-08-30 CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion Yiran Chen et.al. 2408.17424 null
2024-08-30 Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution Yixin Wu et.al. 2408.17285 null
2024-08-30 VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers Juncan Deng et.al. 2408.17131 null
2024-08-30 FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition Chen Hu et.al. 2408.17090 link
2024-08-30 Text-to-Image Generation Via Energy-Based CLIP Roy Ganz et.al. 2408.17046 null
2024-08-30 AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding Yonghui Wang et.al. 2408.16986 link
2024-08-30 Contrastive Learning with Synthetic Positives Dewen Zeng et.al. 2408.16965 link
2024-08-29 STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models Koushik Srivatsan et.al. 2408.16807 link
2024-09-04 CSGO: Content-Style Composition in Text-to-Image Generation Peng Xing et.al. 2408.16766 null
2024-08-29 One-Shot Learning Meets Depth Diffusion in Multi-Object Videos Anisha Jain et.al. 2408.16704 null
2024-08-29 GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models Moreno D’Incà et.al. 2408.16700 link
2024-08-29 DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving Yongjie Fu et.al. 2408.16647 null
2024-08-29 RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model Zhuan Shi et.al. 2408.16634 null
2024-08-29 GRPose: Learning Graph Relations for Human Image Generation with Pose Priors Xiangchen Yin et.al. 2408.16540 link
2024-08-29 Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation Xiaoyu Jin et.al. 2408.16506 null
2024-08-29 Spiking Diffusion Models Jiahang Cao et.al. 2408.16467 link
2024-08-29 ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding Minghang Zheng et.al. 2408.16314 link
2024-08-29 Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation Yanghao Wang et.al. 2408.16266 link
2024-08-28 Disentangled Diffusion Autoencoder for Harmonization of Multi-site Neuroimaging Data Ayodeji Ijishakin et.al. 2408.15890 null
2024-08-28 GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model Yongjie Fu et.al. 2408.15868 null
2024-08-28 Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas Fabio Quattrini et.al. 2408.15660 link
2024-08-28 Hand1000: Generating Realistic Hands from Text with Only 1,000 Images Haozhuo Zhang et.al. 2408.15461 null
2024-08-28 Avoiding Generative Model Writer’s Block With Embedding Nudging Ali Zand et.al. 2408.15450 null
2024-08-27 GenRec: Unifying Video Generation and Recognition with Diffusion Models Zejia Weng et.al. 2408.15241 link
2024-08-27 Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance Weiyi Zhang et.al. 2408.15217 link
2024-08-27 Alfie: Democratising RGBA Image Generation With No $$$ Fabio Quattrini et.al. 2408.14826 link
2024-08-27 Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation Abdelrahman Eldesokey et.al. 2408.14819 null
2024-08-27 CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis Weijia Li et.al. 2408.14765 null
2024-08-27 Sequential-Scanning Dual-Energy CT Imaging Using High Temporal Resolution Image Reconstruction and Error-Compensated Material Basis Image Generation Qiaoxin Li et.al. 2408.14754 null
2024-08-27 Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation Bochao Liu et.al. 2408.14738 null
2024-08-26 DIAGen: Diverse Image Augmentation with Generative Models Tobias Lingenberg et.al. 2408.14584 link
2024-08-26 GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy Peiyan Li et.al. 2408.14368 link
2024-08-26 ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty Xindi Wu et.al. 2408.14339 null
2024-08-26 Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models Chaohua Shi et.al. 2408.14135 null
2024-08-26 SurGen: Text-Guided Diffusion Model for Surgical Video Generation Joseph Cho et.al. 2408.14028 null
2024-08-27 RT-Attack: Jailbreaking Text-to-Image Models via Random Token Sensen Gao et.al. 2408.13896 null
2024-08-25 Prior Learning in Introspective VAEs Ioannis Athanasiadis et.al. 2408.13805 null
2024-08-25 SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting Wenrui Li et.al. 2408.13711 link
2024-08-27 Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing Yitong Yang et.al. 2408.13623 null
2024-08-24 DualAnoDiff: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation Ying Jin et.al. 2408.13509 link
2024-08-24 Explainable Concept Generation through Vision-Language Preference Learning Aditya Taparia et.al. 2408.13438 null
2024-08-23 CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities Tao Wu et.al. 2408.13239 link
2024-08-23 Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation Bonan Li et.al. 2408.13149 null
2024-08-23 G3FA: Geometry-guided GAN for Face Animation Alireza Javanmardi et.al. 2408.13049 null
2024-08-23 EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation Cong Wang et.al. 2408.13005 null
2024-08-22 Unlocking Intrinsic Fairness in Stable Diffusion Eunji Kim et.al. 2408.12692 null
2024-08-22 xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Can Qin et.al. 2408.12590 null
2024-08-22 Real-Time Video Generation with Pyramid Attention Broadcast Xuanlei Zhao et.al. 2408.12588 link
2024-08-25 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Jinheng Xie et.al. 2408.12528 null
2024-08-22 CODE: Confident Ordinary Differential Editing Bastien van Delft et.al. 2408.12418 link
2024-08-22 Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce Ádám Tibor Czapp et.al. 2408.12392 null
2024-08-22 Scalable Autoregressive Image Generation with Mamba Haopeng Li et.al. 2408.12245 link
2024-08-22 MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient Yanzeng Li et.al. 2408.12236 null
2024-08-22 BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking Hanzheng Wang et.al. 2408.12232 null
2024-08-22 DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models Wuchao Li et.al. 2408.12153 null
2024-08-21 Approaching Deep Learning through the Spectral Dynamics of Weights David Yunis et.al. 2408.11804 link
2024-08-21 DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework Zhifei Xie et.al. 2408.11788 null
2024-08-21 Iterative Object Count Optimization for Text-to-image Diffusion Models Oz Zafar et.al. 2408.11721 null
2024-08-21 FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting Liyao Jiang et.al. 2408.11706 null
2024-08-21 TrackGo: A Flexible and Efficient Method for Controllable Video Generation Haitao Zhou et.al. 2408.11475 null
2024-08-21 Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection Jingwei Sun et.al. 2408.11408 link
2024-08-21 Gender Bias Evaluation in Text-to-image Generation: A Survey Yankun Wu et.al. 2408.11358 null
2024-08-21 UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation Xiangyu Zhao et.al. 2408.11305 link
2024-08-20 Compress Guidance in Conditional Diffusion Sampling Anh-Dung Dinh et.al. 2408.11194 null
2024-08-20 MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data Jian Wang et.al. 2408.11135 null
2024-08-20 MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning Haoning Wu et.al. 2408.11001 link
2024-08-20 A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse Zhongliang Guo et.al. 2408.10901 null
2024-08-21 MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration Yanbo Ding et.al. 2408.10605 link
2024-08-20 Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models Cong Wan et.al. 2408.10571 link
2024-08-19 Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation Liu He et.al. 2408.10453 null
2024-08-19 The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks Niyar R Barman et.al. 2408.10446 null
2024-08-19 Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data Tao Yang et.al. 2408.10119 null
2024-08-19 Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation Yunxin Li et.al. 2408.09787 link
2024-08-19 TraDiffusion: Trajectory-Based Training-Free Image Generation Mingrui Wu et.al. 2408.09739 link
2024-08-21 Reconstruct Spine CT from Biplanar X-Rays via Diffusion Learning Zhi Qiao et.al. 2408.09731 null
2024-08-18 AnomalyFactory: Regard Anomaly Generation as Unsupervised Anomaly Localization Ying Zhao et.al. 2408.09533 null
2024-08-18 Deformation-aware GAN for Medical Image Synthesis with Substantially Misaligned Pairs Bowen Xin et.al. 2408.09432 null
2024-08-18 SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama Jing Tang et.al. 2408.09333 link
2024-08-16 PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future Guangyi Wang et.al. 2408.08822 link
2024-08-16 An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation Peiming Guo et.al. 2408.08650 link
2024-08-16 Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness Hefei Mei et.al. 2408.08502 link
2024-08-15 JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Xiaochuang Han et.al. 2408.08459 null
2024-08-15 METR: Image Watermarking with Large Number of Unique Messages Alexander Varlamov et.al. 2408.08340 link
2024-08-15 Can Large Language Models Understand Symbolic Graphics Programs? Zeju Qiu et.al. 2408.08313 null
2024-08-15 Accelerated Image-Aware Generative Diffusion Modeling Tanmay Asthana et.al. 2408.08306 null
2024-08-15 Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding Xiner Li et.al. 2408.08252 link
2024-08-16 FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance Jiasong Feng et.al. 2408.08189 null
2024-08-15 When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding Pingping Zhang et.al. 2408.08093 null
2024-08-15 A Novel Generative Artificial Intelligence Method for Interference Study on Multiplex Brightfield Immunohistochemistry Images Satarupa Mukherjee et.al. 2408.07860 null
2024-08-14 Boosting Unconstrained Face Recognition with Targeted Style Adversary Mohammad Saeed Ebrahimi Saadabadi et.al. 2408.07642 null
2024-08-14 Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving Yuqing Wen et.al. 2408.07605 null
2024-08-14 KIND: Knowledge Integration and Diversion in Diffusion Models Yucheng Xie et.al. 2408.07337 link
2024-08-13 Generative Photomontage Sean J. Liu et.al. 2408.07116 null
2024-08-13 Definition of multispectral camera system parameters to model the asteroid 2001 SN263 Gabriela de Carvalho Assis Goulart et.al. 2408.06886 null
2024-08-13 Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective Ouxiang Li et.al. 2408.06741 link
2024-08-13 DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion Yujia Wu et.al. 2408.06740 null
2024-08-13 DiffSG: A Generative Solver for Network Optimization with Diffusion Model Ruihuai Liang et.al. 2408.06701 link
2024-08-12 Prompt Recovery for Image Generation Models: A Comparative Study of Discrete Optimizers Joshua Nathaniel Williams et.al. 2408.06502 null
2024-08-15 ControlNeXt: Powerful and Efficient Control for Image and Video Generation Bohao Peng et.al. 2408.06070 link
2024-08-10 ZePo: Zero-Shot Portrait Stylization with Faster Sampling Jin Liu et.al. 2408.05492 link
2024-08-10 Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE Yiying Yang et.al. 2408.05477 null
2024-08-10 Artworks Reimagined: Exploring Human-AI Co-Creation through Body Prompting Jonas Oppenlaender et.al. 2408.05476 null
2024-08-10 High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model Weizhi Zhong et.al. 2408.05416 null
2024-08-09 Instruction Tuning-free Visual Token Complement for Multimodal LLMs Dongsheng Wang et.al. 2408.05019 null
2024-08-09 DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting Jihoon Lee et.al. 2408.04962 null
2024-08-08 Deep Learning-based Unsupervised Domain Adaptation via a Unified Model for Prostate Lesion Detection Using Multisite Bi-parametric MRI Datasets Hao Li et.al. 2408.04777 null
2024-08-08 Zero-Shot Uncertainty Quantification using Diffusion Probabilistic Models Dule Shu et.al. 2408.04718 null
2024-08-08 Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics Ruining Li et.al. 2408.04631 null
2024-08-07 ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling William Y. Zhu et.al. 2408.04102 link
2024-08-07 Counterfactuals and Uncertainty-Based Explainable Paradigm for the Automated Detection and Segmentation of Renal Cysts in Computed Tomography Images: A Multi-Center Study Zohaib Salahuddin et.al. 2408.03789 null
2024-08-07 Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model Guoqing Zhu et.al. 2408.03748 link
2024-08-07 Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling Zilyu Ye et.al. 2408.03695 link
2024-08-07 A comparative study of generative adversarial networks for image recognition algorithms based on deep learning and traditional methods Yihao Zhong et.al. 2408.03568 null
2024-08-06 Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey Vu Tuan Truong et.al. 2408.03400 null
2024-08-06 IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts Ciara Rowles et.al. 2408.03209 null
2024-08-06 An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Xingguang Yan et.al. 2408.03178 null
2024-08-06 Iterative CT Reconstruction via Latent Variable Optimization of Shallow Diffusion Models Sho Ozaki et.al. 2408.03156 null
2024-08-06 Multitask and Multimodal Neural Tuning for Large Models Hao Sun et.al. 2408.03001 null
2024-08-06 DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model Yiming Zhong et.al. 2408.02993 link
2024-08-05 Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services Shaopeng Fu et.al. 2408.02814 link
2024-08-05 Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Dongyang Liu et.al. 2408.02657 link
2024-08-05 VidGen-1M: A Large-Scale Dataset for Text-to-video Generation Zhiyu Tan et.al. 2408.02629 null
2024-08-06 ProCreate, Don’t Reproduce! Propulsive Energy Diffusion for Creative Generation Jack Lu et.al. 2408.02226 link
2024-08-04 PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance Aoming Liu et.al. 2408.02157 null
2024-08-04 LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation Dwij Mehta et.al. 2408.02078 null
2024-08-04 Step Saver: Predicting Minimum Denoising Steps for Diffusion Model Image Generation Jean Yu et.al. 2408.02054 null
2024-08-04 Robustness of Watermarking on Text-to-Image Diffusion Models Xiaodong Wu et.al. 2408.02035 null
2024-08-03 SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm Junyan Ye et.al. 2408.01812 null
2024-08-03 A Novel Evaluation Framework for Image2Text Generation Jia-Hong Huang et.al. 2408.01723 null
2024-08-03 Controllable Unlearning for Image-to-Image Generative Models via $\varepsilon$-Constrained Optimization Xiaohua Feng et.al. 2408.01689 null
2024-08-02 VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling Qian Zhang et.al. 2408.01181 link
2024-08-02 PINNs for Medical Image Analysis: A Survey Chayan Banerjee et.al. 2408.01026 null
2024-08-02 EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts Die Chen et.al. 2408.01014 null
2024-08-02 FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation Xiang Gao et.al. 2408.00998 link
2024-08-01 Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention Susung Hong et.al. 2408.00760 link
2024-08-01 Synthetic dual image generation for reduction of labeling efforts in semantic segmentation of micrographs with a customized metric function Matias Oscar Volman Stern et.al. 2408.00707 null
2024-08-01 A new approach for encoding code and assisting code understanding Mengdan Fan et.al. 2408.00521 null
2024-08-01 Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion Manuel Kansy et.al. 2408.00458 null
2024-08-01 Towards Reliable Advertising Image Generation Using Human Feedback Zhenbang Du et.al. 2408.00418 link
2024-08-01 DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving Xuemeng Yang et.al. 2408.00415 null
2024-08-01 On the Limitations and Prospects of Machine Unlearning for Generative AI Shiji Zhou et.al. 2408.00376 null
2024-08-01 Few-shot Defect Image Generation based on Consistency Modeling Qingfeng Shi et.al. 2408.00372 link
2024-08-01 Navigating Text-to-Image Generative Bias across Indic Languages Surbhi Mittal et.al. 2408.00283 null
2024-07-31 WAS: Dataset and Methods for Artistic Text Segmentation Xudong Xie et.al. 2408.00106 link
2024-07-31 Detecting, Explaining, and Mitigating Memorization in Diffusion Models Yuxin Wen et.al. 2407.21720 link
2024-07-31 Tora: Trajectory-oriented Diffusion Transformer for Video Generation Zhenghao Zhang et.al. 2407.21705 link
2024-07-31 Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation Junxuan Yu et.al. 2407.21490 null
2024-07-31 Fine-gained Zero-shot Video Sampling Dengsheng Chen et.al. 2407.21475 null
2024-07-31 Deformable 3D Shape Diffusion Model Dengsheng Chen et.al. 2407.21428 null
2024-07-31 Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model Zhichao Zhang et.al. 2407.21408 null
2024-07-31 Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging Wenhua Wu et.al. 2407.21381 null
2024-07-31 ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images Xilei Zhu et.al. 2407.21363 null
2024-07-30 Adding Multi-modal Controls to Whole-body Human Motion Generation Yuxuan Bian et.al. 2407.21136 link
2024-07-29 Retinex-Diffusion: On Controlling Illumination Conditions in Diffusion Models via Retinex Theory Xiaoyan Xing et.al. 2407.20785 null
2024-07-30 Understanding the Impact of Synchronous, Asynchronous, and Hybrid In-Situ Techniques in Computational Fluid Dynamics Applications Yi Ju et.al. 2407.20717 null
2024-07-30 DocXPand-25k: a large and diverse benchmark dataset for identity documents analysis Julien Lerouge et.al. 2407.20662 link
2024-07-30 Autonomous Improvement of Instruction Following Skills via Foundation Models Zhiyuan Zhou et.al. 2407.20635 link
2024-07-30 EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos Aashish Rai et.al. 2407.20592 null
2024-07-29 Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities Lorenzo Baraldi et.al. 2407.20337 link
2024-07-29 MaskInversion: Localized Embeddings via Optimization of Explainability Maps Walid Bousselham et.al. 2407.20034 null
2024-07-29 Reproducibility Study of “ITI-GEN: Inclusive Text-to-Image Generation” Daniel Gallo Fernández et.al. 2407.19996 link
2024-07-29 FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Yu Lu et.al. 2407.19918 null
2024-07-29 Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture Stefanos Gkikas et.al. 2407.19811 null
2024-07-28 Temporal Feature Matters: A Framework for Diffusion Model Quantization Yushi Huang et.al. 2407.19547 null
2024-07-28 VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary Hanjun Luo et.al. 2407.19524 link
2024-07-28 MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability Buyu Liu et.al. 2407.19468 link
2024-07-28 FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models Changgu Chen et.al. 2407.19453 link
2024-07-28 Perm: A Parametric Representation for Multi-Style 3D Hair Modeling Chengan He et.al. 2407.19451 link
2024-07-27 Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions Ashkan Taghipour et.al. 2407.19205 null
2024-07-26 SHIC: Shape-Image Correspondences with no Keypoint Supervision Aleksandar Shtedritski et.al. 2407.18907 null
2024-07-26 Adversarial Robustification via Text-to-Image Diffusion Models Daewon Choi et.al. 2407.18658 link
2024-07-25 AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild Junho Park et.al. 2407.18034 link
2024-07-25 Guided Latent Slot Diffusion for Object-Centric Learning Krishnakant Singh et.al. 2407.17929 null
2024-07-25 ReCorD: Reasoning and Correcting Diffusion for HOI Generation Jian-Yu Jiang-Lin et.al. 2407.17911 link
2024-07-24 SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency Yiming Xie et.al. 2407.17470 null
2024-07-24 ViPer: Visual Personalization of Generative Models via Individual Preference Learning Sogand Salehi et.al. 2407.17365 null
2024-07-25 LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model Wanggong Yang et.al. 2407.17229 null
2024-07-24 MemBench: Memorized Image Trigger Prompt Dataset for Diffusion Models Chunsan Hong et.al. 2407.17095 link
2024-07-24 An Adaptive Gradient Regularization Method Huixiu Jiang et.al. 2407.16944 null
2024-07-24 Synthetic Trajectory Generation Through Convolutional Neural Networks Jesse Merhi et.al. 2407.16938 link
2024-07-23 Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions Fabio Tosi et.al. 2407.16698 link
2024-07-23 MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Canyu Zhao et.al. 2407.16655 null
2024-07-23 On Differentially Private 3D Medical Image Synthesis with Controllable Latent Diffusion Models Deniz Daum et.al. 2407.16405 link
2024-07-23 Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data Hengyu Fu et.al. 2407.16134 null
2024-07-23 Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos Jiahe Liu et.al. 2407.16124 link
2024-07-22 DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design Zhi Hao Luo et.al. 2407.15723 link
2024-07-22 SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time Stanislav Frolov et.al. 2407.15507 link
2024-07-22 TextureCrop: Enhancing Synthetic Image Detection through Texture-based Cropping Despina Konstantinidou et.al. 2407.15500 link
2024-07-22 DiffX: Guide Your Layout to Cross-Modal Generative Modeling Zeyu Wang et.al. 2407.15488 link
2024-07-22 Text2Place: Affordance-aware Text Guided Human Placement Rishubh Parihar et.al. 2407.15446 null
2024-07-23 BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM Hanjun Luo et.al. 2407.15240 link
2024-07-21 Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative Modelling Junn Yong Loo et.al. 2407.15238 null
2024-07-21 Flow as the Cross-Domain Manipulation Interface Mengda Xu et.al. 2407.15208 null
2024-07-21 The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation Junwen Luo et.al. 2407.15167 null
2024-07-21 Anchored Diffusion for Video Face Reenactment Idan Kligvasser et.al. 2407.15153 null
2024-07-19 T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation Kaiyue Sun et.al. 2407.14505 link
2024-07-19 Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations Decheng Liu et.al. 2407.14367 link
2024-07-19 Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model Kun Zhao et.al. 2407.14326 null
2024-07-19 Unlearning Concepts from Text-to-Video Diffusion Models Shiqi Liu et.al. 2407.14209 null
2024-07-19 Time Series Generative Learning with Application to Brain Imaging Analysis Zhenghao Li et.al. 2407.14003 null
2024-07-18 Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Boyang Deng et.al. 2407.13759 null
2024-07-18 Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models Xiaoyu Zhu et.al. 2407.13642 null
2024-07-18 Training-free Composite Scene Generation for Layout-to-Image Synthesis Jiaqi Liu et.al. 2407.13609 link
2024-07-18 Multi-sentence Video Grounding for Long Video Generation Wei Feng et.al. 2407.13219 null
2024-07-18 Image Inpainting Models are Effective Tools for Instruction-guided Image Editing Xuan Ju et.al. 2407.13139 null
2024-07-19 From Principles to Practices: Lessons Learned from Applying Partnership on AI’s (PAI) Synthetic Media Framework to 11 Use Cases Claire R. Leibowicz et.al. 2407.13025 null
2024-07-17 Denoising Diffusions in Latent Space for Medical Image Segmentation Fahim Ahmed Zaman et.al. 2407.12952 link
2024-07-17 VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Sherwin Bahmani et.al. 2407.12781 null
2024-07-17 Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs Yiqing Shen et.al. 2407.12678 link
2024-07-17 Zero-shot Text-guided Infinite Image Synthesis with LLM guidance Soyeong Kwon et.al. 2407.12642 null
2024-07-17 Towards Understanding Unsafe Video Generation Yan Pang et.al. 2407.12581 link
2024-07-17 The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation Yi Yao et.al. 2407.12579 null
2024-07-17 I2AM: Interpreting Image-to-Image Latent Diffusion Models via Attribution Maps Junseo Park et.al. 2407.12331 null
2024-07-17 Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process Yang Cheng et.al. 2407.12261 null
2024-07-18 Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts Sam Yu-Te Lee et.al. 2407.12192 null
2024-07-16 Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis Haeil Lee et.al. 2407.12173 null
2024-07-16 Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning Yanting Miao et.al. 2407.12164 link
2024-07-16 Efficient Training with Denoised Neural Weights Yifan Gong et.al. 2407.11966 null
2024-07-16 Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model Dominik Winter et.al. 2407.11664 null
2024-07-16 Scaling Diffusion Transformers to 16 Billion Parameters Zhengcong Fei et.al. 2407.11633 link
2024-07-16 DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training Guillermo Jimenez-Perez et.al. 2407.11594 null
2024-07-16 How Control Information Influences Multilingual Text Image Generation and Editing? Boqiang Zhang et.al. 2407.11502 link
2024-07-16 AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models Lei Ren et.al. 2407.11480 null
2024-07-16 Cover-separable Fixed Neural Network Steganography via Deep Generative Models Guobiao Li et.al. 2407.11405 link
2024-07-16 Flatfish Disease Detection Based on Part Segmentation Approach and Disease Image Generation Seo-Bin Hwang et.al. 2407.11348 null
2024-07-16 Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems Yaşar Utku Alçalar et.al. 2407.11288 null
2024-07-15 IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation Yuanhao Zhai et.al. 2407.10937 link
2024-07-15 OPa-Ma: Text Guided Mamba for 360-degree Image Out-painting Penglei Gao et.al. 2407.10923 null
2024-07-16 DataDream: Few-shot Guided Dataset Generation Jae Myung Kim et.al. 2407.10910 link
2024-07-15 Optical Diffusion Models for Image Generation Ilker Oguz et.al. 2407.10897 null
2024-07-15 Physics-Inspired Generative Models in Medical Imaging: A Review Dennis Hein et.al. 2407.10856 null
2024-07-15 An Autonomous Drone Swarm for Detecting and Tracking Anomalies among Dense Vegetation Rakesh John Amala Arokia Nathan et.al. 2407.10754 null
2024-07-15 AccDiffusion: An Accurate Method for Higher-Resolution Image Generation Zhihang Lin et.al. 2407.10738 link
2024-07-15 Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval Youngsun Lim et.al. 2407.10683 null
2024-07-15 Spatio-temporal neural distance fields for conditional generative modeling of the heart Kristine Sørensen et.al. 2407.10663 link
2024-07-15 A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication Jingyi Deng et.al. 2407.10575 null
2024-07-12 FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 Georgios Makridis et.al. 2407.09467 null
2024-07-12 PID: Physics-Informed Diffusion Model for Infrared Image Generation Fangyuan Mao et.al. 2407.09299 link
2024-07-12 Surgical Text-to-Image Generation Chinedu Innocent Nwoye et.al. 2407.09230 null
2024-07-12 DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training Chen Xin et.al. 2407.09174 link
2024-07-12 Machine Apophenia: The Kaleidoscopic Generation of Architectural Images Alexey Tikhonov et.al. 2407.09172 null
2024-07-12 Inference Optimization of Foundation Models on AI Accelerators Youngsuk Park et.al. 2407.09111 null
2024-07-12 Bora: Biomedical Generalist Video Generation Model Weixiang Sun et.al. 2407.08944 null
2024-07-11 SEED-Story: Multimodal Long Story Generation with Large Language Model Shuai Yang et.al. 2407.08683 link
2024-07-11 CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs Leah Chong et.al. 2407.08675 null
2024-07-11 Still-Moving: Customized Video Generation without Customized Video Data Hila Chefer et.al. 2407.08674 null
2024-07-11 A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights Wentao Lei et.al. 2407.08428 link
2024-07-11 E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors Jinxiu Liang et.al. 2407.08231 null
2024-07-10 Generative Image as Action Models Mohit Shridhar et.al. 2407.07875 link
2024-07-10 StoryDiffusion: How to Support UX Storyboarding With Generative-AI Zhaohui Liang et.al. 2407.07672 null
2024-07-10 VEnhancer: Generative Space-Time Enhancement for Video Generation Jingwen He et.al. 2407.07667 null
2024-07-11 Trainable Highly-expressive Activation Functions Irit Chelly et.al. 2407.07564 link
2024-07-10 Video-to-Audio Generation with Hidden Alignment Manjie Xu et.al. 2407.07464 null
2024-07-10 Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis Jian-Qing Zheng et.al. 2407.07295 link
2024-07-09 Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion Yu Cao et.al. 2407.07249 null
2024-07-09 Accelerating Mobile Edge Generation (MEG) by Constrained Learning Xiaoxia Xu et.al. 2407.07245 null
2024-07-09 ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction Shaozhe Hao et.al. 2407.07077 link
2024-07-09 Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation Filipe Lauar et.al. 2407.06950 link
2024-07-09 HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance Guian Fang et.al. 2407.06937 link
2024-07-09 Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning Fanyue Wei et.al. 2407.06642 link
2024-07-09 Mobius: An High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task Yiran Yang et.al. 2407.06617 link
2024-07-09 Sketch-Guided Scene Image Generation Tianyu Zhang et.al. 2407.06469 null
2024-07-08 MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions Xuan Ju et.al. 2407.06358 null
2024-07-08 Dynamics of quantum turbulence in axially rotating thermal counterflow Ritesh Dwivedi et.al. 2407.06311 link
2024-07-08 VIMI: Grounding Video Generation through Multi-modal Instruction Yuwei Fang et.al. 2407.06304 null
2024-07-08 JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation Yu Zeng et.al. 2407.06187 null
2024-07-08 The Tug-of-War Between Deepfake Generation and Detection Hannah Lee et.al. 2407.06174 null
2024-07-08 PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models Jinhua Zhang et.al. 2407.06109 link
2024-07-08 MMIS: Multimodal Dataset for Interior Scene Visual Generation and Recognition Hozaifa Kassab et.al. 2407.05980 null
2024-07-08 T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models Yibo Miao et.al. 2407.05965 null
2024-07-08 3D Vessel Graph Generation Using Denoising Diffusion Chinmay Prabhakar et.al. 2407.05842 link
2024-07-08 GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing Zhenyu Wang et.al. 2407.05600 null
2024-07-08 This&That: Language-Gesture Controlled Video Generation for Robot Planning Boyang Wang et.al. 2407.05530 null
2024-07-07 Diffusion as Sound Propagation: Physics-inspired Model for Ultrasound Image Generation Marina Domínguez et.al. 2407.05428 link
2024-07-07 Enhancing Label-efficient Medical Image Segmentation with Text-guided Diffusion Models Chun-Mei Feng et.al. 2407.05323 null
2024-07-05 PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation Yinghua Yao et.al. 2407.04493 link
2024-07-05 Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator Mehryar Abbasi et.al. 2407.04258 null
2024-07-04 Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion Yutian Zhong et.al. 2407.03992 link
2024-07-04 Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection Federico Girella et.al. 2407.03961 link
2024-07-04 Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations Zhiyang Xu et.al. 2407.03604 null
2024-07-03 BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations Zhantao Yang et.al. 2407.03314 null
2024-07-03 Towards High Resolution Real-Time Optical Flow Particle Image Velocimetry Juan Pimienta et.al. 2407.03057 null
2024-07-03 Robot Shape and Location Retention in Video Generation Using Diffusion Models Peng Wang et.al. 2407.02873 link
2024-07-03 Representation learning with CGAN for casual inference Zhaotian Weng et.al. 2407.02825 null
2024-07-03 Mobile Edge Generation-Enabled Digital Twin: Architecture Design and Research Opportunities Xiaoxia Xu et.al. 2407.02804 link
2024-07-02 OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Kepan Nan et.al. 2407.02371 null
2024-07-04 UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks Jingjing Ren et.al. 2407.02158 null
2024-07-02 SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules Suyi Li et.al. 2407.02031 null
2024-07-04 GVDIFF: Grounded Text-to-Video Generation with Diffusion Models Huanzhang Dou et.al. 2407.01921 null
2024-07-01 Label-free Neural Semantic Image Synthesis Jiayi Wang et.al. 2407.01790 null
2024-06-30 BADM: Batch ADMM for Deep Learning Ouya Wang et.al. 2407.01640 null
2024-07-01 Evaluation of Text-to-Video Generation Models: A Dynamics Perspective Mingxiang Liao et.al. 2407.01094 link
2024-06-30 InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation Haofan Wang et.al. 2407.00788 link
2024-06-30 Chest-Diffusion: A Light-Weight Text-to-Image Model for Report-to-CXR Generation Peng Huang et.al. 2407.00752 null
2024-06-30 LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation Mushui Liu et.al. 2407.00737 null
2024-06-28 Wavelets Are All You Need for Autoregressive Image Generation Wael Mattar et.al. 2406.19997 null
2024-06-28 Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs Sangwon Jeong et.al. 2406.19987 null
2024-06-28 MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance Yuang Zhang et.al. 2406.19680 null
2024-06-28 PopAlign: Population-Level Alignment for Fair Text-to-Image Generation Shufan Li et.al. 2406.19668 link
2024-06-28 Network Bending of Diffusion Models for Audio-Visual Generation Luke Dzwonczyk et.al. 2406.19589 link
2024-06-27 What Matters in Detecting AI-Generated Videos like Sora? Chirui Chang et.al. 2406.19568 null
2024-06-27 Understanding Modality Preferences in Search Clarification Leila Tavakoli et.al. 2406.19546 link
2024-06-27 Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model Jiangtong Tan et.al. 2406.19030 link
2024-06-28 AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation Yanan Sun et.al. 2406.18958 link
2024-06-27 CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation Zuo Zuo et.al. 2406.18941 null
2024-06-26 MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data William Berman et.al. 2406.18790 null
2024-06-26 MultiDiff: Consistent Novel View Synthesis from a Single Image Norman Müller et.al. 2406.18524 null
2024-06-26 ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Shenghai Yuan et.al. 2406.18522 link
2024-06-26 DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance Younghyun Kim et.al. 2406.18459 link
2024-06-25 Text-Animator: Controllable Visual Text Video Generation Lin Liu et.al. 2406.17777 null
2024-06-25 MotionBooth: Motion-Aware Customized Text-to-Video Generation Jianzong Wu et.al. 2406.17758 null
2024-06-25 Detection of Synthetic Face Images: Accuracy, Robustness, Generalization Nela Petrzelkova et.al. 2406.17547 null
2024-06-25 TSynD: Targeted Synthetic Data Generation for Enhanced Medical Image Classification Joshua Niemeijer et.al. 2406.17473 null
2024-06-25 SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing Ruihuang Li et.al. 2406.17396 null
2024-06-25 Semantic Deep Hiding for Robust Unlearnable Examples Ruohan Meng et.al. 2406.17349 null
2024-06-25 Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers Lei Chen et.al. 2406.17343 link
2024-06-25 Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds Hongliang Zeng et.al. 2406.17342 null
2024-06-24 Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation Zhenyi Liao et.al. 2406.17100 link
2024-06-24 FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models Haonan Qiu et.al. 2406.16863 link
2024-06-24 Dreamitate: Real-World Visuomotor Policy Learning via Video Generation Junbang Liang et.al. 2406.16862 null
2024-06-24 DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Yuang Peng et.al. 2406.16855 link
2024-06-24 Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation Katherine M. Collins et.al. 2406.16807 null
2024-06-24 Repulsive Score Distillation for Diverse Sampling of Diffusion Models Nicolas Zilberstein et.al. 2406.16683 link
2024-06-24 EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations Zhiyu Tan et.al. 2406.16562 link
2024-06-24 Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization Yuhang Ma et.al. 2406.16537 link
2024-06-24 ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance Shuwei Shi et.al. 2406.16476 null
2024-06-24 Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models Yichen Sun et.al. 2406.16333 null
2024-06-24 Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement Zhiyuan Chang et.al. 2406.16272 link
2024-06-21 MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Xuan He et.al. 2406.15252 null
2024-06-21 Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors Ali Naseh et.al. 2406.15213 link
2024-06-21 Disability Representations: Finding Biases in Automatic Image Generation Yannis Tevissen et.al. 2406.14993 null
2024-06-21 Latent diffusion models for parameterization and data assimilation of facies-based geomodels Guido Di Federico et.al. 2406.14815 null
2024-06-20 Evaluating Numerical Reasoning in Text-to-Image Models Ivana Kajić et.al. 2406.14774 link
2024-06-20 Holistic Evaluation for Interleaved Text-and-Image Generation Minqian Liu et.al. 2406.14643 null
2024-06-20 Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps Nikita Starodubcev et.al. 2406.14539 null
2024-06-20 Fantastic Copyrighted Beasts and How (Not) to Generate Them Luxi He et.al. 2406.14526 null
2024-06-20 SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset Josef Dai et.al. 2406.14477 link
2024-06-20 Video Generation with Learned Action Prior Meenakshi Sarkar et.al. 2406.14436 null
2024-06-20 ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning Zhongjie Duan et.al. 2406.14130 link
2024-06-19 Splatter a Video: Video Gaussian Representation for Versatile Processing Yang-Tian Sun et.al. 2406.13870 null
2024-06-19 GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation Baiqi Li et.al. 2406.13743 link
2024-06-19 Development of a Dual-Input Neural Model for Detecting AI-Generated Imagery Jonathan Gallagher et.al. 2406.13688 null
2024-06-19 Improving Visual Commonsense in Language Models via Multiple Image Generation Guy Yariv et.al. 2406.13621 link
2024-06-19 What’s Next? Exploring Utilization, Challenges, and Future Directions of AI-Generated Image Tools in Graphic Design Yuying Tang et.al. 2406.13436 null
2024-06-19 AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation Xinyu Hou et.al. 2406.12805 link
2024-06-18 Unmasking the Veil: An Investigation into Concept Ablation for Privacy and Copyright Protection in Images Shivank Garg et.al. 2406.12592 link
2024-06-18 Training Diffusion Models with Federated Learning Matthijs de Goede et.al. 2406.12575 null
2024-06-18 Generative Artificial Intelligence-Guided User Studies: An Application for Air Taxi Services Shengdi Xiao et.al. 2406.12296 null
2024-06-17 ARTIST: Improving the Generation of Text-rich Images by Disentanglement Jianyi Zhang et.al. 2406.12044 null
2024-06-17 Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models Alireza Ganjdanesh et.al. 2406.12042 link
2024-06-17 Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI Robert Hönig et.al. 2406.12027 link
2024-06-17 Decomposed evaluations of geographic disparities in text-to-image models Abhishek Sureddy et.al. 2406.11988 null
2024-06-17 Autoregressive Image Generation without Vector Quantization Tianhong Li et.al. 2406.11838 link
2024-06-17 Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% Lei Zhu et.al. 2406.11837 link
2024-06-17 Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Bingqi Ma et.al. 2406.11831 null
2024-06-17 PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models Fanqing Meng et.al. 2406.11802 link
2024-06-17 Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes Aghiles Kebaili et.al. 2406.11659 null
2024-06-17 GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation Shihao Cai et.al. 2406.11503 link
2024-06-17 Generative Visual Instruction Tuning Jefferson Hernandez et.al. 2406.11262 link
2024-06-17 NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation Niu Guanchen et.al. 2406.11259 null
2024-06-17 Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion Rishab Parthasarathy et.al. 2406.11196 link
2024-06-16 An Analysis on Quantizing Diffusion Transformers Yuewei Yang et.al. 2406.11100 null
2024-06-14 Make It Count: Text-to-Image Generation with an Accurate Number of Objects Lital Binyamin et.al. 2406.10210 null
2024-06-14 Crafting Parts for Expressive Object Composition Harsh Rangwani et.al. 2406.10197 null
2024-06-14 Training-free Camera Control for Video Generation Chen Hou et.al. 2406.10126 null
2024-06-14 High-efficiency generation of vectorial holograms with metasurfaces Tong Liu et.al. 2406.10072 null
2024-06-14 BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval Imanol Miranda et.al. 2406.09952 link
2024-06-14 ControlVAR: Exploring Controllable Visual Autoregressive Modeling Xiang Li et.al. 2406.09750 link
2024-06-13 Turns Out I’m Not Real: Towards Robust Detection of AI-Generated Videos Qingyuan Liu et.al. 2406.09601 null
2024-06-13 You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes Jabez Magomere et.al. 2406.09496 link
2024-06-13 Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Qihao Liu et.al. 2406.09416 link
2024-06-13 An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Duy-Kien Nguyen et.al. 2406.09415 null
2024-06-13 Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs Zijia Zhao et.al. 2406.09367 link
2024-06-13 Understanding Hallucinations in Diffusion Models through Mode Interpolation Sumukh K Aithal et.al. 2406.09358 link
2024-06-13 Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps Shuqing Li et.al. 2406.09313 null
2024-06-13 Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation Yufan Zhou et.al. 2406.09305 null
2024-06-13 StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning Giuseppe Vecchio et.al. 2406.09293 null
2024-06-13 EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts Yucheng Han et.al. 2406.09162 null
2024-06-13 Complex Image-Generative Diffusion Transformer for Audio Denoising Junhui Li et.al. 2406.09161 null
2024-06-13 EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts Zahraa Al Sahili et.al. 2406.09070 null
2024-06-12 Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation Raphael Tang et.al. 2406.08482 null
2024-06-12 What If We Recaption Billions of Web Images with LLaMA-3? Xianhang Li et.al. 2406.08478 null
2024-06-12 PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences Daiwei Chen et.al. 2406.08469 link
2024-06-12 Diffusion Soup: Model Merging for Text-to-Image Diffusion Models Benjamin Biggs et.al. 2406.08431 null
2024-06-12 VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks Jiannan Wu et.al. 2406.08394 link
2024-06-12 FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation Xinzhi Mu et.al. 2406.08392 null
2024-06-12 WMAdapter: Adding WaterMark Control to Latent Diffusion Models Hai Ci et.al. 2406.08337 null
2024-06-12 CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models Hyungjin Chung et.al. 2406.08070 null
2024-06-12 Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models Arman Zarei et.al. 2406.07844 link
2024-06-12 Hierarchical Patch Diffusion Models for High-Resolution Video Generation Ivan Skorokhodov et.al. 2406.07792 null
2024-06-11 Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? Xingyu Fu et.al. 2406.07546 null
2024-06-11 Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance Kuan Heng Lin et.al. 2406.07540 null
2024-06-11 Neural Gaffer: Relighting Any Object via Diffusion Haian Jin et.al. 2406.07520 null
2024-06-11 Instant 3D Human Avatar Generation using Image Diffusion Models Nikos Kolotouros et.al. 2406.07516 null
2024-06-11 Understanding Visual Concepts Across Models Brandon Trabucco et.al. 2406.07506 link
2024-06-11 Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions Renjie Pi et.al. 2406.07502 link
2024-06-12 SPIN: Spacecraft Imagery for Navigation Javier Montalvo et.al. 2406.07500 link
2024-06-11 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models Heng Yu et.al. 2406.07472 null
2024-06-11 Beware of Aliases – Signal Preservation is Crucial for Robust Image Restoration Shashank Agnihotri et.al. 2406.07435 null
2024-06-11 Visual Representation Learning with Stochastic Frame Prediction Huiwon Jang et.al. 2406.07398 null
2024-06-10 Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Peize Sun et.al. 2406.06525 link
2024-06-10 The Effect of Training Dataset Size on Discriminative and Diffusion-Based Speech Enhancement Systems Philippe Gonzalez et.al. 2406.06160 null
2024-06-10 ProcessPainter: Learn Painting Process from Sequence Data Yiren Song et.al. 2406.06062 link
2024-06-09 OmniControlNet: Dual-stage Integration for Conditional Image Generation Yilin Wang et.al. 2406.05871 null
2024-06-09 Unified Text-to-Image Generation and Retrieval Leigang Qu et.al. 2406.05814 null
2024-06-11 MLCM: Multistep Consistency Distillation of Latent Diffusion Model Qingsong Xie et.al. 2406.05768 link
2024-06-09 Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion Ge Ya Luo et.al. 2406.05630 link
2024-06-09 Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models Philip Wootaek Shin et.al. 2406.05602 null
2024-06-08 Medical Vision Generalist: Unifying Medical Imaging Tasks in Context Sucheng Ren et.al. 2406.05565 link
2024-06-08 Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models Minho Park et.al. 2406.05432 link
2024-06-07 CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion Xingrui Wang et.al. 2406.05082 null
2024-06-07 GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications Shakhnaz Akhmedova et.al. 2406.05023 link
2024-06-07 AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation Lianyu Pang et.al. 2406.05000 null
2024-06-07 Zero-Shot Video Editing through Adaptive Sliding Score Distillation Lianghan Zhu et.al. 2406.04888 null
2024-06-07 Online Continual Learning of Video Diffusion Models From a Single Video Stream Jason Yoo et.al. 2406.04814 null
2024-06-07 TEDi Policy: Temporally Entangled Diffusion for Robotic Control Sigmund H. Høeg et.al. 2406.04806 link
2024-06-07 PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction Eduard Poesina et.al. 2406.04746 link
2024-06-07 GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models Diptanu De et.al. 2406.04654 null
2024-06-07 CLoG: Benchmarking Continual Learning of Image Generation Models Haotian Zhang et.al. 2406.04584 link
2024-06-06 Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance Reyhane Askari Hemmat et.al. 2406.04551 null
2024-06-06 Coherent Zero-Shot Visual Instruction Generation Quynh Phung et.al. 2406.04337 null
2024-06-06 BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Yang Sui et.al. 2406.04333 link
2024-06-06 ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Lin Chen et.al. 2406.04325 null
2024-06-06 SF-V: Single Forward Video Generation Model Zhixing Zhang et.al. 2406.04324 link
2024-06-06 VideoTetris: Towards Compositional Text-to-Video Generation Ye Tian et.al. 2406.04277 link
2024-06-06 Diffusion-based image inpainting with internal learning Nicolas Cherel et.al. 2406.04206 link
2024-06-06 Machine Learning-Driven Microwave Imaging for Soil Moisture Estimation near Leaky Pipe Mohammad Ramezaninia et.al. 2406.04193 null
2024-06-06 Quantum Implicit Neural Representations Jiaming Zhao et.al. 2406.03873 link
2024-06-06 Semantic Similarity Score for Measuring Visual Similarity at Semantic Level Senran Fan et.al. 2406.03865 null
2024-06-06 Malware Classification Based on Image Segmentation Wanhu Nie et.al. 2406.03831 link
2024-06-05 Tackling GenAI Copyright Issues: Originality Estimation and Genericization Hiroaki Chiba-Okabe et.al. 2406.03341 link
2024-06-05 Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion Hao Wen et.al. 2406.03184 link
2024-06-05 Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control Jingyun Xue et.al. 2406.03035 null
2024-06-05 Language-guided Detection and Mitigation of Unknown Dataset Bias Zaiying Zhao et.al. 2406.02889 null
2024-06-06 Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter Peng Xing et.al. 2406.02881 null
2024-06-04 Latent Style-based Quantum GAN for high-quality Image Generation Su Yeon Chang et.al. 2406.02668 null
2024-06-04 ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation Tianchen Zhao et.al. 2406.02540 link
2024-06-04 DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering Zhongpai Gao et.al. 2406.02518 null
2024-06-04 V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Cong Wang et.al. 2406.02511 null
2024-06-04 CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation Dejia Xu et.al. 2406.02509 null
2024-06-04 Guiding a Diffusion Model with a Bad Version of Itself Tero Karras et.al. 2406.02507 link
2024-06-04 Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation Jiajun Wang et.al. 2406.02485 link
2024-06-04 Generative Active Learning for Long-tailed Instance Segmentation Muzhi Zhu et.al. 2406.02435 link
2024-06-04 Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation Clement Chadebec et.al. 2406.02347 link
2024-06-04 I4VGen: Image as Stepping Stone for Text-to-Video Generation Xiefan Guo et.al. 2406.02230 null
2024-06-04 The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise Yuanhao Ban et.al. 2406.01970 null
2024-05-31 Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Jiatao Gu et.al. 2405.21048 null
2024-05-31 You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet Zhen Qin et.al. 2405.21022 null
2024-05-31 Amortizing intractable inference in diffusion models for vision, language, and control Siddarth Venkatraman et.al. 2405.20971 link
2024-05-31 Information Theoretic Text-to-Image Alignment Chao Wang et.al. 2405.20759 null
2024-05-31 Diffusion Models Are Innate One-Step Generators Bowen Zheng et.al. 2405.20750 link
2024-05-31 Cyclic image generation using chaotic dynamics Takaya Tanaka et.al. 2405.20717 link
2024-05-31 Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space Yukai Zhang et.al. 2405.20685 null
2024-05-31 4Diffusion: Multi-view Video Diffusion Model for 4D Generation Haiyu Zhang et.al. 2405.20674 null
2024-05-31 Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation Shuzhou Yang et.al. 2405.20669 link
2024-05-31 Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization Yisu Liu et.al. 2405.20584 link
2024-05-30 Improving the Training of Rectified Flows Sangyun Lee et.al. 2405.20320 link
2024-05-30 CV-VAE: A Compatible Video VAE for Latent Generative Video Models Sijie Zhao et.al. 2405.20279 link
2024-05-30 MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Muyao Niu et.al. 2405.20222 link
2024-05-30 Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback Sanghyeon Na et.al. 2405.20216 null
2024-05-30 RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection Zhiyuan He et.al. 2405.20112 null
2024-05-30 Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion Jiangkai Wu et.al. 2405.20032 link
2024-05-30 Mitigating annotation shift in cancer classification using single image generative models Marta Buetas Arcas et.al. 2405.19754 link
2024-05-30 DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark Haoxing Chen et.al. 2405.19707 link
2024-05-30 Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian Wei Sun et.al. 2405.19657 null
2024-05-29 MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter Selection Raman Dutt et.al. 2405.19458 link
2024-05-29 ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning Ruchika Chavhan et.al. 2405.19237 link
2024-05-29 Going beyond compositional generalization, DDPMs can produce zero-shot interpolation Justin Deschenaux et.al. 2405.19201 link
2024-05-29 The ethical situation of DALL-E 2 Eduard Hogea et.al. 2405.19176 null
2024-05-29 Patch-enhanced Mask Encoder Prompt Image Generation Shusong Xu et.al. 2405.19085 null
2024-05-29 EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Jiaqi Xu et.al. 2405.18991 link
2024-05-29 Topological Perspectives on Optimal Multimodal Embedding Spaces Abdul Aziz A. B et.al. 2405.18867 null
2024-05-30 Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation Jiyoon Myung et.al. 2405.18762 null
2024-05-29 T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Jiachen Li et.al. 2405.18750 link
2024-05-29 SketchDeco: Decorating B&W Sketches with Colour Chaitat Utintu et.al. 2405.18716 link
2024-05-28 Scalable Surrogate Verification of Image-based Neural Network Control Systems using Composition and Unrolling Feiyang Cai et.al. 2405.18554 null
2024-05-28 Phased Consistency Model Fu-Yun Wang et.al. 2405.18407 link
2024-05-28 RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives Jaehong Yoon et.al. 2405.18406 link
2024-05-28 VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers Jun Zheng et.al. 2405.18326 null
2024-05-28 Multi-modal Generation via Cross-Modal In-Context Learning Amandeep Kumar et.al. 2405.18304 link
2024-05-28 EG4D: Explicit Generation of 4D Object without Score Distillation Qi Sun et.al. 2405.18132 link
2024-05-28 Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers? Zebin You et.al. 2405.18029 null
2024-05-28 MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling Bowen Zhang et.al. 2405.18003 link
2024-05-28 Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage Detection Zhengji Li et.al. 2405.17905 null
2024-05-28 Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation Akio Hayakawa et.al. 2405.17842 link
2024-05-27 RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance Jiaojiao Fan et.al. 2405.17661 null
2024-05-27 Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Zhengfei Kuang et.al. 2405.17414 null
2024-05-27 Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer Ruizhi Shao et.al. 2405.17405 null
2024-05-27 Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability Shenyuan Gao et.al. 2405.17398 link
2024-05-27 Prompt Optimization with Human Feedback Xiaoqiang Lin et.al. 2405.17346 link
2024-05-28 Controllable Longer Image Animation with Diffusion Models Qiang Wang et.al. 2405.17306 null
2024-05-27 Training-free Editioning of Text-to-Image Models Jinqi Wang et.al. 2405.17069 null
2024-05-27 The Poisson Midpoint Method for Langevin Dynamics: Provably Efficient Discretization for Diffusion Models Saravanan Kandasamy et.al. 2405.17068 null
2024-05-27 Glauber Generative Model: Discrete Diffusion Models via Binary Classification Harshit Varma et.al. 2405.17035 null
2024-05-27 Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation Liang Shi et.al. 2405.16895 null
2024-05-27 Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks Yunqi Zhang et.al. 2405.16860 link
2024-05-24 A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence Ali Kashefi et.al. 2405.15406 link
2024-05-24 Stochastic SR for Gaussian microtextures Emile Pierret et.al. 2405.15399 null
2024-05-24 Challenges and Opportunities in 3D Content Generation Ke Zhao et.al. 2405.15335 null
2024-05-24 Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model Mingyang Yi et.al. 2405.15330 null
2024-05-24 SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance Guibao Shen et.al. 2405.15321 null
2024-05-24 Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient Yongliang Wu et.al. 2405.15304 link
2024-05-24 StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models Chengming Xu et.al. 2405.15287 null
2024-05-24 Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models Yimeng Zhang et.al. 2405.15234 link
2024-05-24 iVideoGPT: Interactive VideoGPTs are Scalable World Models Jialong Wu et.al. 2405.15223 link
2024-05-24 ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models Jingyuan Zhu et.al. 2405.15199 null
2024-05-23 Improved Distribution Matching Distillation for Fast Image Synthesis Tianwei Yin et.al. 2405.14867 link
2024-05-23 Video Diffusion Models are Training-free Motion Interpreter and Controller Zeqi Xiao et.al. 2405.14864 null
2024-05-23 Semantica: An Adaptable Image-Conditioned Diffusion Model Manoj Kumar et.al. 2405.14857 null
2024-05-23 TerDiT: Ternary Diffusion Models with Transformers Xudong Lu et.al. 2405.14854 link
2024-05-23 Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models Katherine Xu et.al. 2405.14828 null
2024-05-24 Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation Hongxu Jiang et.al. 2405.14802 link
2024-05-23 Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy Shengfang Zhai et.al. 2405.14800 link
2024-05-23 RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices Qiaoyi Chen et.al. 2405.14794 null
2024-05-23 OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance Shuheng Ge et.al. 2405.14709 null
2024-05-23 Learning Multi-dimensional Human Preference for Text-to-Image Generation Sixian Zhang et.al. 2405.14705 null
2024-05-21 Personalized Residuals for Concept-Driven Text-to-Image Generation Cusuh Ham et.al. 2405.12978 null
2024-05-21 An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation Zhiyu Tan et.al. 2405.12914 link
2024-05-21 OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models Zhaojian Yu et.al. 2405.12843 link
2024-05-21 DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control Hong Chen et.al. 2405.12796 null
2024-05-21 Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations Antoine Legrand et.al. 2405.12728 null
2024-05-21 CustomText: Customized Textual Image Generation using Diffusion Models Shubham Paliwal et.al. 2405.12531 null
2024-05-20 Diffusion for World Modeling: Visual Details Matter in Atari Eloi Alonso et.al. 2405.12399 link
2024-05-20 Diffusion Models for Generating Ballistic Spacecraft Trajectories Tyler Presser et.al. 2405.11738 link
2024-05-19 URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images Zoey Chen et.al. 2405.11656 null
2024-05-19 FIFO-Diffusion: Generating Infinite Videos from Text without Training Jihwan Kim et.al. 2405.11473 link
2024-05-18 UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers Duo Peng et.al. 2405.11336 null
2024-05-18 On the Trajectory Regularity of ODE-based Diffusion Sampling Defang Chen et.al. 2405.11326 link
2024-05-18 TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation Chengcheng Feng et.al. 2405.11236 null
2024-05-17 Improving face generation quality and prompt following with synthetic captions Michail Tarasiou et.al. 2405.10864 null
2024-05-17 From Sora What We Can See: A Survey of Text-to-Video Generation Rui Sun et.al. 2405.10674 link
2024-05-17 Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image Jianshun Zeng et.al. 2405.10504 null
2024-05-17 Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers Rya Sanovar et.al. 2405.10480 null
2024-05-16 UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models Sahel Sharifymoghaddam et.al. 2405.10311 link
2024-05-16 VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing Binghui Chen et.al. 2405.09985 null
2024-05-16 Chameleon: Mixed-Modal Early-Fusion Foundation Models Chameleon Team et.al. 2405.09818 null
2024-05-16 Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images Memoona Aziz et.al. 2405.09426 null
2024-05-15 DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations Nima Fathi et.al. 2405.09288 link
2024-05-15 Dance Any Beat: Blending Beats with Visuals in Dance Video Generation Xuanchen Wang et.al. 2405.09266 null
2024-05-14 Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Zhimin Li et.al. 2405.08748 link
2024-05-13 The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective Andrew Shin et.al. 2405.08720 null
2024-05-14 Compositional Text-to-Image Generation with Dense Blob Representations Weili Nie et.al. 2405.08246 null
2024-05-13 CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models Nick Stracke et.al. 2405.07913 null
2024-05-13 SAR Image Synthesis with Diffusion Models Denisa Qosja et.al. 2405.07776 null
2024-05-12 Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning Jiarui Wang et.al. 2405.07346 link
2024-05-12 Stable Signature is Unstable: Removing Image Watermark from Diffusion Models Yuepeng Hu et.al. 2405.07145 null
2024-05-12 MAxPrototyper: A Multi-Agent Generation System for Interactive User Interface Prototyping Mingyue Yuan et.al. 2405.07131 null
2024-05-11 Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior Ce Wang et.al. 2405.07044 link
2024-05-11 Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation Shengyuan Liu et.al. 2405.06948 null
2024-05-10 Deep MMD Gradient Flow without adversarial training Alexandre Galashov et.al. 2405.06780 null
2024-05-10 OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation Jinwei Lin et.al. 2405.06547 link
2024-05-10 Controllable Image Generation With Composed Parallel Token Prediction Jamie Stirling et.al. 2405.06535 null
2024-05-10 SketchDream: Sketch-based Text-to-3D Generation and Editing Feng-Lin Liu et.al. 2405.06461 null
2024-05-09 Photonic quantum generative adversarial networks for classical data Tigran Sedrakyan et.al. 2405.06023 link
2024-05-09 Frame Interpolation with Consecutive Brownian Bridge Diffusion Zonglin Lyu et.al. 2405.05953 link
2024-05-09 Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models Zhe Ma et.al. 2405.05846 link
2024-05-10 MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation Yuxiang Wei et.al. 2405.05806 link
2024-05-09 Exploring Text-Guided Single Image Editing for Remote Sensing Images Fangzhou Han et.al. 2405.05769 link
2024-05-09 End-to-End Generative Semantic Communication Powered by Shared Semantic Knowledge Base Shuling Li et.al. 2405.05738 null
2024-05-09 A Survey on Personalized Content Synthesis with Diffusion Models Xulu Zhang et.al. 2405.05538 null
2024-05-08 Cross-Modality Translation with Generative Adversarial Networks to Unveil Alzheimer’s Disease Biomarkers Reihaneh Hassanzadeh et.al. 2405.05462 null
2024-05-08 DrawL: Understanding the Effects of Non-Mainstream Dialects in Prompted Image Generation Joshua N. Williams et.al. 2405.05382 link
2024-05-08 Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo Nayantara Mudur et.al. 2405.05255 link
2024-05-08 Reviewing Intelligent Cinematography: AI research for camera-based video production Adrian Azzarelli et.al. 2405.05039 null
2024-05-08 Discrepancy-based Diffusion Models for Lesion Detection in Brain MRI Keqiang Fan et.al. 2405.04974 null
2024-05-08 FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation Xuehai He et.al. 2405.04834 null
2024-05-07 TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation Hritik Bansal et.al. 2405.04682 link
2024-05-07 TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model Yongming Zhang et.al. 2405.04675 null
2024-05-07 Towards Geographic Inclusion in the Evaluation of Text-to-Image Models Melissa Hall et.al. 2405.04457 null
2024-05-07 Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation Jihyun Kim et.al. 2405.04356 link
2024-05-07 Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation Dogucan Yaman et.al. 2405.04327 null
2024-05-08 Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer Zhuoyi Yang et.al. 2405.04312 link
2024-05-07 Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map Yuxuan Xia et.al. 2405.04290 null
2024-05-07 Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models Fan Bao et.al. 2405.04233 null
2024-05-07 Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models Zhixuan Chu et.al. 2405.04180 link
2024-05-07 Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method Peisong He et.al. 2405.04133 null
2024-05-07 Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model Joo Young Choi et.al. 2405.03958 null
2024-05-06 CCDM: Continuous Conditional Diffusion Models for Image Generation Xin Ding et.al. 2405.03546 link
2024-05-06 Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond Zheng Zhu et.al. 2405.03520 link
2024-05-06 Video Diffusion Models: A Survey Andrew Melnik et.al. 2405.03150 link
2024-05-05 Matten: Video Generation with Mamba-Attention Yu Gao et.al. 2405.03025 null
2024-05-05 Data-Efficient Molecular Generation with Hierarchical Textual Inversion Seojin Kim et.al. 2405.02845 link
2024-05-05 ImageInWords: Unlocking Hyper-Detailed Image Descriptions Roopal Garg et.al. 2405.02793 link
2024-05-04 U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers Yuchuan Tian et.al. 2405.02730 link
2024-05-03 Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification Siqi Yin et.al. 2405.02155 null
2024-05-03 AI-generated art perceptions with GenFrame – an image-generating picture frame Peter Kun et.al. 2405.01901 null
2024-05-03 Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition Yichun Tai et.al. 2405.01872 null
2024-05-02 Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning Rafael Elberg et.al. 2405.01705 link
2024-05-02 StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Yupeng Zhou et.al. 2405.01434 link
2024-05-02 Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration Praveen Kumar Chandaliya et.al. 2405.01273 null
2024-05-02 DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines Ye Tian et.al. 2405.01248 null
2024-05-02 On Mechanistic Knowledge Localization in Text-to-Image Generative Models Samyadeep Basu et.al. 2405.01008 link
2024-05-01 SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models Burak Can Biner et.al. 2405.00878 null
2024-05-01 UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement Ruiquan Ge et.al. 2405.00542 link
2024-05-01 Compressive Sensing Imaging Using Caustic Lens Mask Generated by Periodic Perturbation in a Ripple Tank Doğan Tunca Arık et.al. 2405.00407 null
2024-05-01 Streamlining Image Editing with Layered Diffusion Brushes Peyman Gholami et.al. 2405.00313 null
2024-04-30 DOCCI: Descriptions of Connected and Contrasting Images Yasumasa Onoe et.al. 2404.19753 null
2024-04-30 Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Yunhao Ge et.al. 2404.19752 null
2024-04-30 SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration Yuto Nakashima et.al. 2404.19693 null
2024-04-30 VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization Yuliang Liu et.al. 2404.19652 link
2024-04-30 TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models Teng Zhou et.al. 2404.19475 link
2024-04-30 InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Chanran Kim et.al. 2404.19427 null
2024-04-30 Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model Wentao Lei et.al. 2404.19277 null
2024-05-01 FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills Yongqiang Zhao et.al. 2404.19217 link
2024-04-30 NeRF-Insert: 3D Local Editing with Multimodal Control Signals Benet Oriol Sabat et.al. 2404.19204 null
2024-04-29 DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing Minghao Chen et.al. 2404.18929 null
2024-04-29 TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation Junhao Cheng et.al. 2404.18919 link
2024-04-29 Hide and Seek: How Does Watermarking Impact Face Recognition? Yuguang Yao et.al. 2404.18890 null
2024-04-29 Learning Mixtures of Gaussians Using Diffusion Models Khashayar Gatmiry et.al. 2404.18869 null
2024-04-29 FlexiFilm: Long Video Generation with Flexible Conditions Yichen Ouyang et.al. 2404.18620 link
2024-04-29 Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting Tianyidan Xie et.al. 2404.18598 null
2024-04-29 Autonomous Quality and Hallucination Assessment for Virtual Tissue Staining and Digital Pathology Luzhe Huang et.al. 2404.18458 null
2024-04-29 PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated Images Jiquan Yuan et.al. 2404.18409 link
2024-04-30 Equivalence: An analysis of artists’ roles with Image Generative AI from Conceptual Art perspective through an interactive installation design practice Yixuan Li et.al. 2404.18385 null
2024-04-29 G-Refine: A General Quality Refiner for Text-to-Image Generation Chunyi Li et.al. 2404.18343 link
2024-04-26 Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement Zishu Yao et.al. 2404.17400 link
2024-04-26 Trinity Detector:text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection Jiawei Song et.al. 2404.17254 null
2024-04-26 Synthesizing Iris Images using Generative Adversarial Networks: Survey and Comparative Analysis Shivangi Yadav et.al. 2404.17105 null
2024-04-25 REBEL: Reinforcement Learning via Regressing Relative Rewards Zhaolin Gao et.al. 2404.16767 link
2024-04-27 Denoising: from classical methods to deep CNNs Jean-Eric Campagne et.al. 2404.16617 link
2024-04-25 MuseumMaker: Continual Style Customization without Catastrophic Forgetting Chenxi Liu et.al. 2404.16612 null
2024-04-25 Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models Parul Gupta et.al. 2404.16556 null
2024-04-25 OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images Ye Mao et.al. 2404.16538 link
2024-04-25 Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series Aimi Okabayashi et.al. 2404.16409 link
2024-04-25 TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models Haomiao Ni et.al. 2404.16306 link
2024-04-26 Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model Gehui Chen et.al. 2404.16305 null
2024-04-26 Guardians of the Quantum GAN Archisman Ghosh et.al. 2404.16156 null
2024-04-24 Spinning solar jets explained through the interplay between plasma sheets and vortex columns Sahel Dey et.al. 2404.16096 null
2024-04-23 ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning Weifeng Chen et.al. 2404.15449 null
2024-04-23 GLoD: Composing Global Contexts and Local Details in Image Generation Moyuru Yamada et.al. 2404.15447 null
2024-04-23 ID-Animator: Zero-Shot Identity-Preserving Human Video Generation Xuanhua He et.al. 2404.15275 link
2024-04-23 From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation Zehuan Huang et.al. 2404.15267 link
2024-04-23 Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment Tianwei Zhou et.al. 2404.15163 null
2024-04-23 Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation Xun Wu et.al. 2404.15100 null
2024-04-23 SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models Bo Lin et.al. 2404.14755 null
2024-04-23 FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction Hang Hua et.al. 2404.14715 null
2024-04-22 The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking Yuying Li et.al. 2404.14581 null
2024-04-22 GeoDiffuser: Geometry-Based Image Editing with Diffusion Models Rahul Sajnani et.al. 2404.14403 null
2024-04-22 SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Yuying Ge et.al. 2404.14396 link
2024-04-22 TAVGBench: Benchmarking Text to Audible-Video Generation Yuxin Mao et.al. 2404.14381 link
2024-04-22 MultiBooth: Towards Generating All Your Concepts in an Image from Text Chenyang Zhu et.al. 2404.14239 link
2024-04-22 RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance Chengrui Wang et.al. 2404.13984 null
2024-04-23 Accelerating Image Generation with Sub-path Linear Approximation Model Chen Xu et.al. 2404.13903 null
2024-04-22 Towards Better Text-to-Image Generation Alignment via Attention Modulation Yihang Wu et.al. 2404.13899 null
2024-04-21 Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation Jensen Hwa et.al. 2404.13798 null
2024-04-21 Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control Maria Mihaela Trusca et.al. 2404.13766 null
2024-04-21 Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models Vitali Petsiuk et.al. 2404.13706 null
2024-04-19 PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Tianyuan Zhang et.al. 2404.13026 null
2024-04-19 Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images Santosh et.al. 2404.12908 link
2024-04-19 ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model Dingming Liu et.al. 2404.12903 null
2024-04-19 Generative Modelling with High-Order Langevin Dynamics Ziqiang Shi et.al. 2404.12814 null
2024-04-19 How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples Dren Fazlija et.al. 2404.12653 null
2024-04-18 On the Content Bias in Fréchet Video Distance Songwei Ge et.al. 2404.12391 null
2024-04-18 RoboDreamer: Learning Compositional World Models for Robot Imagination Siyuan Zhou et.al. 2404.12377 null
2024-04-18 AniClipart: Clipart Animation with Text-to-Video Priors Ronghuan Wu et.al. 2404.12347 null
2024-04-18 Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models Israel A. Laurensi et.al. 2404.12260 null
2024-04-18 First 2D electron density measurements using Coherence Imaging Spectroscopy in the MAST-U Super-X divertor N. Lonigro et.al. 2404.12021 null
2024-04-18 ©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model Chao Zhou et.al. 2404.11962 link
2024-04-18 LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights Thibault Castells et.al. 2404.11936 null
2024-04-18 EdgeFusion: On-Device Text-to-Image Generation Thibault Castells et.al. 2404.11925 null
2024-04-18 TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation Tianyi Liang et.al. 2404.11824 link
2024-04-17 On the Scalability of GNNs for Molecular Graphs Maciej Sypetkowski et.al. 2404.11568 null
2024-04-17 MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation Kuan-Chieh et.al. 2404.11565 null
2024-04-17 SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening Yu Zhong et.al. 2404.11537 null
2024-04-17 Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt Zhanjie Zhang et.al. 2404.11474 link
2024-04-17 Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks Eri Hosonuma et.al. 2404.11280 null
2024-04-17 Optical Image-to-Image Translation Using Denoising Diffusion Models: Heterogeneous Change Detection as a Use Case João Gabriel Vinholi et.al. 2404.11243 null
2024-04-17 TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing Sherry X. Chen et.al. 2404.11120 link
2024-04-16 LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? Yuchi Wang et.al. 2404.10763 link
2024-04-16 Adversarial Identity Injection for Semantic Face Image Synthesis Giuseppe Tarollo et.al. 2404.10408 null
2024-04-16 CanvasPic: An Interactive Tool for Freely Generating Facial Images Based on Spatial Layout Jiafu Wei et.al. 2404.10352 null
2024-04-17 OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model Runyi Li et.al. 2404.10312 null
2024-04-16 OneActor: Consistent Character Generation via Cluster-Conditioned Guidance Jiahao Wang et.al. 2404.10267 null
2024-04-16 Diffusion assisted image reconstruction in optoacoustic tomography M. G. González et.al. 2404.10239 null
2024-04-15 MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models Nithin Gopalakrishnan Nair et.al. 2404.09977 null
2024-04-15 Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers Nithin Gopalakrishnan Nair et.al. 2404.09976 null
2024-04-15 Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Han Lin et.al. 2404.09967 null
2024-04-15 Scalable photonic diffractive generators through sampling noises from scattering medium Ziyu Zhan et.al. 2404.09948 null
2024-04-15 Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models Ziwei Luo et.al. 2404.09732 link
2024-04-15 In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation Han Xue et.al. 2404.09633 null
2024-04-15 Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models Peifei Zhu et.al. 2404.09401 null
2024-04-14 DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling Xuening Yuan et.al. 2404.09227 null
2024-04-14 LoopAnimate: Loopable Salient Object Animation Fanyi Wang et.al. 2404.09172 null
2024-04-13 THQA: A Perceptual Quality Assessment Database for Talking Heads Yingjie Zhou et.al. 2404.09003 link
2024-04-13 LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field Jiyang Li et.al. 2404.08966 link
2024-04-13 Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives Yidan Liu et.al. 2404.08926 null
2024-04-12 E3: Ensemble of Expert Embedders for Adapting Synthetic Image Detectors to New Generators Using Limited Data Aref Azizpour et.al. 2404.08814 link
2024-04-12 Semantic Approach to Quantifying the Consistency of Diffusion Model Image Generation Brinnae Bent et.al. 2404.08799 link
2024-04-12 Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts Yang Li et.al. 2404.08341 link
2024-04-11 Latent Guard: a Safety Framework for Text-to-image Generation Runtao Liu et.al. 2404.08031 link
2024-04-11 Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models Mazda Moayeri et.al. 2404.08030 null
2024-04-11 OpenBias: Open-set Bias Detection in Text-to-Image Generative Models Moreno D’Incà et.al. 2404.07990 link
2024-04-11 Taming Stable Diffusion for Text to 360° Panorama Image Generation Cheng Zhang et.al. 2404.07949 link
2024-04-11 Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models – Technical Challenges and Implications for Monitoring and Verification Tuong Vy Nguyen et.al. 2404.07754 null
2024-04-11 Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models Tuomas Kynkäänniemi et.al. 2404.07724 link
2024-04-11 ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation Stanislav Frolov et.al. 2404.07564 null
2024-04-11 CAT: Contrastive Adapter Training for Personalized Image Generation Jae Wan Park et.al. 2404.07554 link
2024-04-10 A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos Suleyman Ozdel et.al. 2404.07351 null
2024-04-10 RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Jaidev Shriram et.al. 2404.07199 null
2024-04-10 A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks Neel Mishra et.al. 2404.07172 link
2024-04-10 Fine color guidance in diffusion models and its application to image compression at extremely low bitrates Tom Bordin et.al. 2404.06865 null
2024-04-10 UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion Junsheng Zhou et.al. 2404.06851 null
2024-04-10 MedRG: Medical Report Grounding with Multi-modal Large Language Model Ke Zou et.al. 2404.06798 null
2024-04-10 Deep Generative Data Assimilation in Multimodal Setting Yongquan Qu et.al. 2404.06665 link
2024-04-09 High Noise Scheduling is a Must Mahmut S. Gokmen et.al. 2404.06353 null
2024-04-09 DiffHarmony: Latent Diffusion Model Meets Image Harmonization Pengfei Zhou et.al. 2404.06139 link
2024-04-09 Tackling Structural Hallucination in Image Translation with Local Diffusion Seunghoi Kim et.al. 2404.05980 link
2024-04-09 StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion Ming Tao et.al. 2404.05979 link
2024-04-08 SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Jing Gu et.al. 2404.05717 null
2024-04-08 MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation Kunpeng Song et.al. 2404.05674 link
2024-04-08 Automatic Controllable Colorization via Imagination Xiaoyan Cong et.al. 2404.05661 null
2024-04-08 UniFL: Improve Stable Diffusion via Unified Feedback Learning Jiacheng Zhang et.al. 2404.05595 null
2024-04-08 Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI Hugo Caselles-Dupré et.al. 2404.05468 null
2024-04-08 Action-conditioned video data improves predictability Meenakshi Sarkar et.al. 2404.05439 null
2024-04-08 Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt Zhiqi Huang et.al. 2404.05331 null
2024-04-08 MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation Jiaxiu Jiang et.al. 2404.05268 link
2024-04-08 Text-to-Image Synthesis for Any Artistic Styles: Advancements in Personalized Artistic Image Generation via Subdivision and Dual Binding Junseo Park et.al. 2404.05256 null
2024-04-08 A secure and private ensemble matcher using multi-vault obfuscated templates Babak Poorebrahim Gilkalaye et.al. 2404.05205 null
2024-04-04 No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Vishaal Udandarao et.al. 2404.04125 link
2024-04-05 3D Facial Expressions through Analysis-by-Neural-Synthesis George Retsinas et.al. 2404.04104 null
2024-04-05 Dynamic Prompt Optimizing for Text-to-Image Generation Wenyi Mo et.al. 2404.04095 link
2024-04-05 Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models Gihyun Kwon et.al. 2404.03913 null
2024-04-04 CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Dongzhi Jiang et.al. 2404.03653 link
2024-04-04 Reference-Based 3D-Aware Image Editing with Triplane Bahri Batuhan Bilecen et.al. 2404.03632 null
2024-04-04 Robust Concept Erasure Using Task Vectors Minh Pham et.al. 2404.03631 null
2024-04-04 Multi Positive Contrastive Learning with Pose-Consistent Generated Images Sho Inayoshi et.al. 2404.03256 null
2024-04-04 Would Deep Generative Models Amplify Bias in Future Models? Tianwei Chen et.al. 2404.03242 null
2024-04-04 Diverse and Tailored Image Generation for Zero-shot Multi-label Classification Kaixin Zhang et.al. 2404.03144 null
2024-04-03 Many-to-many Image Generation with Auto-regressive Diffusion Models Ying Shen et.al. 2404.03109 null
2024-04-03 Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Keyu Tian et.al. 2404.02905 link
2024-04-03 MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment Duygu Ceylan et.al. 2404.02899 null
2024-04-03 On the Scalability of Diffusion-based Text-to-Image Generation Hao Li et.al. 2404.02883 null
2024-04-03 MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation Petru-Daniel Tudosiu et.al. 2404.02790 null
2024-04-03 InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation Haofan Wang et.al. 2404.02733 link
2024-04-03 Model-agnostic Origin Attribution of Generated Images with Few-shot Examples Fengyuan Liu et.al. 2404.02697 link
2024-04-03 Severity Controlled Text-to-Image Generative Model Bias Manipulation Jordan Vice et.al. 2404.02530 null
2024-04-02 Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models Zeyu Yang et.al. 2404.02148 link
2024-04-02 3D Congealing: 3D-Aware Image Alignment in the Wild Yunzhi Zhang et.al. 2404.02125 null
2024-04-02 CameraCtrl: Enabling Camera Control for Text-to-Video Generation Hao He et.al. 2404.02101 link
2024-04-02 Real, fake and synthetic faces – does the coin have three sides? Shahzeb Naeem et.al. 2404.01878 null
2024-04-02 Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model Xu He et.al. 2404.01862 link
2024-04-02 Disentangled Pre-training for Human-Object Interaction Detection Zhuolong Li et.al. 2404.01725 link
2024-04-01 PlayFutures: Imagining Civic Futures with AI and Puppets Supratim Pait et.al. 2404.01527 null
2024-04-01 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data Matthias Gerstgrasser et.al. 2404.01413 null
2024-04-01 Evaluating Text-to-Visual Generation with Image-to-Text Generation Zhiqiu Lin et.al. 2404.01291 link
2024-04-01 Condition-Aware Neural Network for Controlled Image Generation Han Cai et.al. 2404.01143 null
2024-03-29 Benchmarking Counterfactual Image Generation Thomas Melistas et.al. 2403.20287 link
2024-03-29 Motion Inversion for Video Customization Luozhou Wang et.al. 2403.20193 null
2024-03-29 FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models Barbara Toniella Corradini et.al. 2403.20105 null
2024-04-02 FairRAG: Fair Human Generation via Fair Retrieval Augmentation Robik Shrestha et.al. 2403.19964 null
2024-03-28 Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks Pooria Ashrafian et.al. 2403.19880 link
2024-03-28 Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization Yuhang Li et.al. 2403.19866 null
2024-03-28 CLoRA: A Contrastive Approach to Compose Multiple LoRA Models Tuna Han Salih Meral et.al. 2403.19776 null
2024-03-28 Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond Katherine Xu et.al. 2403.19653 link
2024-03-28 GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models Yusuf Dalva et.al. 2403.19645 null
2024-03-28 Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models Ole Hall et.al. 2403.19620 link
2024-03-28 Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model Zhicai Wang et.al. 2403.19600 link
2024-03-28 Frame by Familiar Frame: Understanding Replication in Video Diffusion Models Aimon Rahman et.al. 2403.19593 null
2024-03-28 Imperceptible Protection against Style Imitation from Diffusion Models Namhyuk Ahn et.al. 2403.19254 null
2024-03-28 Synthetic Medical Imaging Generation with Generative Adversarial Networks For Plain Radiographs John R. McNulty et.al. 2403.19107 null
2024-03-28 Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation Yutong He et.al. 2403.19103 null
2024-03-28 Purposeful remixing with generative AI: Constructing designer voice in multimodal composing Xiao Tan et.al. 2403.19095 null
2024-03-27 TextCraftor: Your Text Encoder Can be Image Quality Controller Yanyu Li et.al. 2403.18978 null
2024-03-27 Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching Jannis Chemseddine et.al. 2403.18705 link
2024-03-27 Attention Calibration for Disentangled Text-to-Image Personalization Yanbing Zhang et.al. 2403.18551 link
2024-03-27 DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis Zhongxi Chen et.al. 2403.18471 link
2024-03-27 ECNet: Effective Controllable Text-to-Image Diffusion Models Sicheng Li et.al. 2403.18417 null
2024-03-27 Ship in Sight: Diffusion Models for Ship-Image Super Resolution Luigi Sigillo et.al. 2403.18370 link
2024-03-26 Tutorial on Diffusion Models for Imaging and Vision Stanley H. Chan et.al. 2403.18103 null
2024-03-26 TC4D: Trajectory-Conditioned Text-to-4D Generation Sherwin Bahmani et.al. 2403.17920 null
2024-03-26 Boosting Diffusion Models with Moving Average Sampling in Frequency Domain Yurui Qian et.al. 2403.17870 null
2024-03-26 Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields Rüveyda Yilmaz et.al. 2403.17808 link
2024-03-26 LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection Yunpeng Luo et.al. 2403.17465 link
2024-03-25 DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment Stella Bounareli et.al. 2403.17217 null
2024-03-25 FlashFace: Human Image Personalization with High-fidelity Identity Preservation Shilong Zhang et.al. 2403.17008 null
2024-03-25 TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models Zhongwei Zhang et.al. 2403.17005 null
2024-03-25 SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer Rui Zhu et.al. 2403.17004 null
2024-03-25 Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Omer Dahary et.al. 2403.16990 null
2024-03-25 Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance Jingyuan Zhu et.al. 2403.16954 null
2024-03-25 Iso-Diffusion: Improving Diffusion Probabilistic Models Using the Isotropy of the Additive Gaussian Noise Dilum Fernando et.al. 2403.16790 null
2024-03-25 Multi-Scale Texture Loss for CT denoising with GANs Francesco Di Feola et.al. 2403.16640 link
2024-03-25 SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Yuda Song et.al. 2403.16627 link
2024-03-25 An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models Zizhao Hu et.al. 2403.16530 null
2024-03-25 Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation Sanyam Lakhanpal et.al. 2403.16422 null
2024-03-25 A Survey on Long Video Generation: Challenges, Methods, and Prospects Chengxuan Li et.al. 2403.16407 null
2024-03-25 Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation Yingshan Chang et.al. 2403.16394 link
2024-03-25 FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models Lin Zhao et.al. 2403.16379 null
2024-03-24 Opportunities and challenges in the application of large artificial intelligence models in radiology Liangrui Pan et.al. 2403.16112 null
2024-03-23 Adaptive Super Resolution For One-Shot Talking-Head Generation Luchuan Song et.al. 2403.15944 link
2024-03-23 Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content Zhicheng Du et.al. 2403.15876 link
2024-03-22 DragAPart: Learning a Part-Level Motion Prior for Articulated Objects Ruining Li et.al. 2403.15382 null
2024-03-22 Long-CLIP: Unlocking the Long-Text Capability of CLIP Beichen Zhang et.al. 2403.15378 link
2024-03-22 Controlled Training Data Generation with Diffusion Models Teresa Yeo et.al. 2403.15309 null
2024-03-22 Spectral Motion Alignment for Video Motion Transfer using Diffusion Models Geon Yeong Park et.al. 2403.15249 null
2024-03-22 A Multimodal Approach for Cross-Domain Image Retrieval Lucas Iijima et.al. 2403.15152 null
2024-03-22 MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration Zhichao Wei et.al. 2403.15059 null
2024-03-22 Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning Bumsoo Kim et.al. 2403.15048 null
2024-03-22 CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model Seungdae Han et.al. 2403.14944 link
2024-03-22 Geometric Generative Models based on Morphological Equivariant PDEs and GANs El Hadji S. Diop et.al. 2403.14897 null
2024-03-21 StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Roberto Henschel et.al. 2403.14773 link
2024-03-21 Explorative Inbetweening of Time and Space Haiwen Feng et.al. 2403.14611 null
2024-03-21 DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing Yueru Jia et.al. 2403.14487 link
2024-03-22 AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks Max Ku et.al. 2403.14468 link
2024-03-21 Analysing Diffusion Segmentation for Medical Images Mathias Öttl et.al. 2403.14440 null
2024-03-21 Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation Mathias Öttl et.al. 2403.14429 null
2024-03-21 Enabling Visual Composition and Animation in Unsupervised Video Generation Aram Davtyan et.al. 2403.14368 null
2024-03-21 Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models Pablo Marcos-Manchón et.al. 2403.14291 link
2024-03-21 Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations Xun Lin et.al. 2403.14250 null
2024-03-21 StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN Jongwoo Choi et.al. 2403.14186 link
2024-03-21 Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Sihyun Yu et.al. 2403.14148 null
2024-03-20 Learning from Models and Data for Visual Grounding Ruozhen He et.al. 2403.13804 null
2024-03-20 Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation Fu-Yun Wang et.al. 2403.13745 link
2024-03-20 Step-Calibrated Diffusion for Biomedical Optical Image Restoration Yiwei Lyu et.al. 2403.13680 link
2024-03-20 ReGround: Improving Textual and Spatial Grounding at No Cost Yuseung Lee et.al. 2403.13589 null
2024-03-20 Diversity-aware Channel Pruning for StyleGAN Compression Jiwoo Chung et.al. 2403.13548 link
2024-03-21 IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models Siying Cui et.al. 2403.13535 null
2024-03-20 Deepfake Detection without Deepfakes: Generalization via Synthetic Frequency Patterns Injection Davide Alessandro Coccomini et.al. 2403.13479 link
2024-03-20 S2DM: Sector-Shaped Diffusion Models for Video Generation Haoran Lang et.al. 2403.13408 null
2024-03-20 AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation Jingkun An et.al. 2403.13352 null
2024-03-20 TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation Santosh Sanjeev et.al. 2403.13343 link
2024-03-19 FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis Linjiang Huang et.al. 2403.12963 link
2024-03-19 Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties Efrain Torres-Lomas et.al. 2403.12935 null
2024-03-19 Ultra-High-Resolution Image Synthesis with Pyramid Diffusion Model Jiajie Yang et.al. 2403.12915 link
2024-03-19 How Spammers and Scammers Leverage AI-Generated Images on Facebook for Audience Growth Renee DiResta et.al. 2403.12838 null
2024-03-19 Total Disentanglement of Font Images into Style and Character Class Features Daichi Haraguchi et.al. 2403.12784 null
2024-03-19 AnimateDiff-Lightning: Cross-Model Diffusion Distillation Shanchuan Lin et.al. 2403.12706 null
2024-03-18 Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts Anh Bui et.al. 2403.12326 null
2024-03-18 Synthetic Image Generation in Cyber Influence Operations: An Emergent Threat? Melanie Mathys et.al. 2403.12207 null
2024-03-18 CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility Bojia Zi et.al. 2403.12035 link
2024-03-18 Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Axel Sauer et.al. 2403.12015 null
2024-03-18 Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing Juan Zhang et.al. 2403.11700 null
2024-03-19 Urban Scene Diffusion through Semantic Occupancy Map Junge Zhang et.al. 2403.11697 null
2024-03-18 Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly Detection Julia Wolleb et.al. 2403.11667 link
2024-03-18 QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation Zhizhen Zhou et.al. 2403.11626 null
2024-03-18 CRS-Diff: Controllable Generative Remote Sensing Foundation Model Datao Tang et.al. 2403.11614 link
2024-03-17 StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining Tushar Kataria et.al. 2403.11340 null
2024-03-17 Fast Personalized Text-to-Image Syntheses With Attention Injection Yuxuan Zhang et.al. 2403.11284 null
2024-03-17 Understanding Diffusion Models by Feynman’s Path Integral Yuji Hirono et.al. 2403.11262 null
2024-03-17 The Effects of Generative AI on Design Fixation and Divergent Thinking Samangi Wadinambiarachchi et.al. 2403.11164 null
2024-03-17 CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion Xiaoyu Wu et.al. 2403.11162 null
2024-03-15 Denoising Task Difficulty-based Curriculum for Training Diffusion Models Jin-Young Kim et.al. 2403.10348 null
2024-03-15 DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers Xuanlei Zhao et.al. 2403.10266 link
2024-03-15 Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder Jinseok Kim et.al. 2403.10255 null
2024-03-15 Animate Your Motion: Turning Still Images into Dynamic Videos Mingxiao Li et.al. 2403.10179 null
2024-03-15 SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model Tao Wu et.al. 2403.10044 null
2024-03-14 SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior Huan-ang Gao et.al. 2403.09638 null
2024-03-14 Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Zeyu Liu et.al. 2403.09622 null
2024-03-14 PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation Yuhan Guo et.al. 2403.09615 null
2024-03-14 Counterfactual contrastive learning: robust representations via causal image synthesis Melanie Roschewitz et.al. 2403.09605 link
2024-03-14 Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing Wonjun Kang et.al. 2403.09468 link
2024-03-14 Mitigating attribute amplification in counterfactual image generation Tian Xia et.al. 2403.09422 null
2024-03-14 Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction Hanyu Chen et.al. 2403.09355 null
2024-03-14 Video Editing via Factorized Diffusion Distillation Uriel Singer et.al. 2403.09334 null
2024-03-14 Noise Dimension of GAN: An Image Compression Perspective Ziran Zhu et.al. 2403.09196 null
2024-03-14 Intention-driven Ego-to-Exo Video Generation Hongchen Luo et.al. 2403.09194 null
2024-03-13 VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Enric Corona et.al. 2403.08764 null
2024-03-13 HAIFIT: Human-Centered AI for Fashion Image Translation Jianan Jiang et.al. 2403.08651 link
2024-03-13 Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation Tianyi Chu et.al. 2403.08294 null
2024-03-13 Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts Yue Ma et.al. 2403.08268 link
2024-03-13 Make Me Happier: Evoking Emotions Through Image Diffusion Models Qing Lin et.al. 2403.08255 null
2024-03-12 Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation Shihao Zhao et.al. 2403.07860 link
2024-03-12 Quantifying and Mitigating Privacy Risks for Tabular Generative Models Chaoyi Zhu et.al. 2403.07842 null
2024-03-12 Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model Yuxuan Zhang et.al. 2403.07764 link
2024-03-12 Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings Sahand Sharifzadeh et.al. 2403.07750 null
2024-03-14 Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion Dongyang Li et.al. 2403.07721 link
2024-03-12 SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces Yuta Oshima et.al. 2403.07711 link
2024-03-12 Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation Michael Ogezi et.al. 2403.07605 null
2024-03-12 Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation Likun Li et.al. 2403.07500 null
2024-03-12 Backdoor Attack with Mode Mixture Latent Modification Hongwei Zhang et.al. 2403.07463 null
2024-03-13 DragAnything: Motion Control for Anything using Entity Representation Weijia Wu et.al. 2403.07420 link
2024-03-11 DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation Guosheng Zhao et.al. 2403.06845 null
2024-03-11 Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting Wenting Chen et.al. 2403.06835 null
2024-03-11 Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection Chuangchuang Tan et.al. 2403.06803 link
2024-03-11 FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation Pengchong Qiao et.al. 2403.06775 link
2024-03-11 Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback Adarsh N L et.al. 2403.06735 null
2024-03-11 Active Generation for Image Classification Tao Huang et.al. 2403.06517 link
2024-03-11 Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning Woojung Han et.al. 2403.06516 null
2024-03-11 3D-aware Image Generation and Editing with Multi-modal Conditions Bo Li et.al. 2403.06470 null
2024-03-11 A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos Weixia Zhang et.al. 2403.06421 link
2024-03-11 DivCon: Divide and Conquer for Progressive Text-to-Image Generation Yuhao Jia et.al. 2403.06400 link
2024-03-08 Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapolation Yijiang Li et.al. 2403.05523 null
2024-03-08 VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models Yabo Zhang et.al. 2403.05438 link
2024-03-08 A Data Augmentation Pipeline to Generate Synthetic Labeled Datasets of 3D Echocardiography Images using a GAN Cristiana Tiago et.al. 2403.05384 null
2024-03-08 Fine-tuning a Multiple Instance Learning Feature Extractor with Masked Context Modelling and Knowledge Distillation Juan I. Pisula et.al. 2403.05325 null
2024-03-08 Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation Junyan Wang et.al. 2403.05239 null
2024-03-08 Synthetic Privileged Information Enhances Medical Image Representation Learning Lucas Farndale et.al. 2403.05220 null
2024-03-08 Denoising Autoregressive Representation Learning Yazhe Li et.al. 2403.05196 null
2024-03-08 ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Xiwei Hu et.al. 2403.05135 null
2024-03-08 Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation Joseph Cho et.al. 2403.05131 null
2024-03-08 Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis Muxi Chen et.al. 2403.05125 link
2024-03-07 Photonic probabilistic machine learning using quantum vacuum noise Seou Choi et.al. 2403.04731 null
2024-03-07 PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Junsong Chen et.al. 2403.04692 link
2024-03-07 Pix2Gif: Motion-Guided Diffusion for GIF Generation Hitesh Kandala et.al. 2403.04634 link
2024-03-07 Discriminative Probing and Tuning for Text-to-Image Generation Leigang Qu et.al. 2403.04321 null
2024-03-06 PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement Zhijie Wang et.al. 2403.04014 link
2024-03-06 Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer Naifu Xue et.al. 2403.03736 null
2024-03-06 Seamless Virtual Reality with Integrated Synchronizer and Synthesizer for Autonomous Driving He Li et.al. 2403.03541 null
2024-03-06 NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging Takahiro Shirakawa et.al. 2403.03485 link
2024-03-07 DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network Xiangquan Gui et.al. 2403.03456 null
2024-03-06 Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing Bingyan Liu et.al. 2403.03431 null
2024-03-05 Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Patrick Esser et.al. 2403.03206 null
2024-03-05 Behavior Generation with Latent Actions Seungjae Lee et.al. 2403.03181 link
2024-03-05 Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation Weijie Li et.al. 2403.02827 null
2024-03-05 Bias in Generative AI Mi Zhou et.al. 2403.02726 null
2024-03-04 Transformer for Times Series: an Application to the S&P500 Pierre Brugiere et.al. 2403.02523 null
2024-03-04 NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function Abdullah Nazhat Abdullah et.al. 2403.02411 link
2024-03-05 UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control Xuweiyi Chen et.al. 2403.02332 link
2024-03-04 ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models Jiaxiang Cheng et.al. 2403.02084 link
2024-03-04 ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models Lukas Höllein et.al. 2403.01807 link
2024-03-05 AtomoVideo: High Fidelity Image-to-Video Generation Litong Gong et.al. 2403.01800 null
2024-03-02 Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models Neta Shaul et.al. 2403.01329 null
2024-03-02 SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code Ziniu Hu et.al. 2403.01248 null
2024-03-02 TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion Salaheldin Mohamed et.al. 2403.01212 null
2024-03-01 Improving Android Malware Detection Through Data Augmentation Using Wasserstein Generative Adversarial Networks Kawana Stalin et.al. 2403.00890 null
2024-03-01 Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset Ander Salaberria et.al. 2403.00587 link
2024-03-01 Rethinking cluster-conditioned diffusion models Nikolas Adaloglou et.al. 2403.00570 link
2024-03-01 VisionLLaMA: A Unified LLaMA Interface for Vision Tasks Xiangxiang Chu et.al. 2403.00522 link
2024-03-01 An Ordinal Diffusion Model for Generating Medical Images with Different Severity Levels Shumpei Takezaki et.al. 2403.00452 null
2024-03-01 Abductive Ego-View Accident Video Understanding for Safe Driving Perception Jianwu Fang et.al. 2403.00436 null
2024-02-29 Learning to Find Missing Video Frames with Synthetic Data Augmentation: A General Framework and Application in Generating Thermal Images Using RGB Cameras Mathias Viborg Andersen et.al. 2403.00196 null
2024-02-29 Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Tsai-Shien Chen et.al. 2402.19479 null
2024-02-29 A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation Hanxi Li et.al. 2402.19330 link
2024-02-29 Disentangling representations of retinal images with generative models Sarah Müller et.al. 2402.19186 link
2024-02-29 Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection Christos Koutlis et.al. 2402.19091 link
2024-02-29 WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis Paul Friedrich et.al. 2402.19043 link
2024-02-29 ViewFusion: Towards Multi-View Consistency via Interpolated Denoising Xianghui Yang et.al. 2402.18842 link
2024-02-29 A Quantitative Evaluation of Score Distillation Sampling Based Text-to-3D Xiaohan Fei et.al. 2402.18780 null
2024-02-28 FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes Ziying Pan et.al. 2402.18331 link
2024-02-28 Balancing Act: Distribution-Guided Debiasing in Diffusion Models Rishubh Parihar et.al. 2402.18206 null
2024-02-28 VulMCI : Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation Tao Peng et.al. 2402.18189 link
2024-02-28 Block and Detail: Scaffolding Sketch-to-Image Generation Vishnu Sarukkai et.al. 2402.18116 null
2024-02-28 Context-aware Talking Face Video Generation Meidai Xuanyuan et.al. 2402.18092 null
2024-02-28 Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis Yanzuo Lu et.al. 2402.18078 link
2024-02-27 Structure-Guided Adversarial Training of Diffusion Models Ling Yang et.al. 2402.17563 null
2024-02-27 Diffusion Model-Based Image Editing: A Survey Yi Huang et.al. 2402.17525 link
2024-02-27 EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Linrui Tian et.al. 2402.17485 null
2024-02-27 Sora Generates Videos with Stunning Geometrical Consistency Xuanyi Li et.al. 2402.17403 null
2024-02-27 Accelerating Diffusion Sampling with Optimized Time Steps Shuchen Xue et.al. 2402.17376 link
2024-02-27 Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation Daiqing Li et.al. 2402.17245 null
2024-02-27 Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System Majid Memari et.al. 2402.17204 null
2024-02-27 Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Yixin Liu et.al. 2402.17177 link
2024-02-27 Video as the New Language for Real-World Decision Making Sherry Yang et.al. 2402.17139 null
2024-02-27 Transparent Image Layer Diffusion using Latent Transparency Lvmin Zhang et.al. 2402.17113 link
2024-02-26 Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion Xuantong Liu et.al. 2402.16305 null
2024-02-25 Towards Efficient Quantum Hybrid Diffusion Models Francesca De Falco et.al. 2402.16147 null
2024-02-23 Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Chun-Hsiao Yeh et.al. 2402.15504 link
2024-02-23 BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators Yu Tian et.al. 2402.15218 null
2024-02-23 The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling Jiajun Ma et.al. 2402.15170 null
2024-02-22 Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Willi Menapace et.al. 2402.14797 null
2024-02-22 Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models Yixuan Ren et.al. 2402.14780 null
2024-02-25 Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening Zhenrong Shen et.al. 2402.14707 null
2024-02-22 Visual Hallucinations of Multi-modal Large Language Models Wen Huang et.al. 2402.14683 link
2024-02-22 MVD²: Efficient Multiview 3D Reconstruction for Multiview Diffusion Xin-Yang Zheng et.al. 2402.14253 null
2024-02-21 T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching Zizheng Pan et.al. 2402.14167 link
2024-02-21 SDXL-Lightning: Progressive Adversarial Diffusion Distillation Shanchuan Lin et.al. 2402.13929 null
2024-02-21 SRNDiff: Short-term Rainfall Nowcasting with Condition Diffusion Model Xudong Ling et.al. 2402.13737 link
2024-02-21 Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation Kihong Kim et.al. 2402.13729 null
2024-02-21 Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving Mehdi Azarafza et.al. 2402.13602 link
2024-02-21 Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models Chen Wu et.al. 2402.13490 null
2024-02-20 Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control Denis Lukovnikov et.al. 2402.13404 null
2024-02-20 CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples Jianrui Zhang et.al. 2402.13254 link
2024-02-20 UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing Jianhong Bai et.al. 2402.13185 null
2024-02-20 Neural Network Diffusion Kai Wang et.al. 2402.13144 link
2024-02-20 VGMShield: Mitigating Misuse of Video Generative Models Yan Pang et.al. 2402.13126 link
2024-02-20 Visual Style Prompting with Swapping Self-Attention Jaeseok Jeong et.al. 2402.12974 link
2024-02-20 RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models Xinchen Zhang et.al. 2402.12908 link
2024-02-20 Two-stage Rainfall-Forecasting Diffusion Model XuDong Ling et.al. 2402.12779 link
2024-02-20 MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion Sen Li et.al. 2402.12741 link
2024-02-20 MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction Shitao Tang et.al. 2402.12712 null
2024-02-19 The (R)Evolution of Multimodal Large Language Models: A Survey Davide Caffagni et.al. 2402.12451 null
2024-02-19 Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability Xuelin Qian et.al. 2402.12225 null
2024-02-19 Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation Yi Liu et.al. 2402.12100 null
2024-02-19 DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation Chong Zeng et.al. 2402.11929 link
2024-02-18 SDiT: Spiking Diffusion Model with Transformer Shu Yang et.al. 2402.11588 null
2024-02-18 Visual Concept-driven Image Generation with Text-to-Image Diffusion Model Tanzila Rahman et.al. 2402.11487 null
2024-02-18 Deep learning methods for Hamiltonian parameter estimation and magnetic domain image generation in twisted van der Waals magnets Woo Seok Lee et.al. 2402.11434 null
2024-02-17 TC-DiffRecon: Texture coordination MRI reconstruction method based on diffusion model and modified MF-UNet method Chenyan Zhang et.al. 2402.11274 link
2024-02-16 The Male CEO and the Female Assistant: Probing Gender Biases in Text-To-Image Models Through Paired Stereotype Test Yixin Wan et.al. 2402.11089 null
2024-02-16 Universal Prompt Optimizer for Safe Text-to-Image Generation Zongyu Wu et.al. 2402.10882 link
2024-02-16 Exploring Precision and Recall to assess the quality and diversity of LLMs Le Bronnec Florian et.al. 2402.10693 link
2024-02-16 Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation Lanqing Guo et.al. 2402.10491 link
2024-02-16 UMAIR-FPS: User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style Yan Kang et.al. 2402.10381 link
2024-02-15 Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation Huizhuo Yuan et.al. 2402.10210 null
2024-02-15 Euclid preparation. Measuring detailed galaxy morphologies for Euclid with Machine Learning Euclid Collaboration et.al. 2402.10187 link
2024-02-15 Classification Diffusion Models Shahar Yadin et.al. 2402.10095 null
2024-02-15 Accelerating Parallel Sampling of Diffusion Models Zhiwei Tang et.al. 2402.09970 link
2024-02-15 Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation Junjie Shentu et.al. 2402.09966 link
2024-02-14 Magic-Me: Identity-Specific Video Customized Diffusion Ze Ma et.al. 2402.09368 link
2024-02-14 Switch EMA: A Free Lunch for Better Flatness and Sharpness Siyuan Li et.al. 2402.09240 link
2024-02-14 L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects Yutaro Yamada et.al. 2402.09052 null
2024-02-14 Multi-modality transrectal ultrasound vudei classification for identification of clinically significant prostate cancer Hong Wu et.al. 2402.08987 link
2024-02-13 Towards the Detection of AI-Synthesized Human Face Images Yuhang Lu et.al. 2402.08750 null
2024-02-13 IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation Luke Melas-Kyriazi et.al. 2402.08682 null
2024-02-13 Learning Continuous 3D Words for Text-to-Image Generation Ta-Ying Cheng et.al. 2402.08654 link
2024-02-13 Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models Jason Tang et.al. 2402.08532 null
2024-02-12 Using AI for Wavefront Estimation with the Rubin Observatory Active Optics System John Franklin Crenshaw et.al. 2402.08094 null
2024-02-14 Score-based generative models break the curse of dimensionality in learning a family of sub-Gaussian probability distributions Frank Cole et.al. 2402.08082 null
2024-02-12 Trustworthy SR: Resolving Ambiguity in Image Super-resolution via Diffusion Models and Human Feedback Cansu Korkmaz et.al. 2402.07597 null
2024-02-11 The Aleph & Other Metaphors for Image Generation Gonzalo Ramos et.al. 2402.07104 null
2024-02-10 Disentangled Latent Energy-Based Style Translation: An Image-Level Structural MRI Harmonization Framework Mengqi Wu et.al. 2402.06875 null
2024-02-09 Cardiac ultrasound simulation for autonomous ultrasound navigation Abdoul Aziz Amadou et.al. 2402.06463 null
2024-02-08 Collaborative Control for Geometry-Conditioned PBR Image Generation Shimon Vainer et.al. 2402.05919 null
2024-02-08 Scalable Diffusion Models with State Space Backbone Zhengcong Fei et.al. 2402.05608 link
2024-02-08 Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application Bumsoo Kim et.al. 2402.05448 null
2024-02-08 Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport Jaemoo Choi et.al. 2402.05443 null
2024-02-09 Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models Nicholas Konz et.al. 2402.05210 link
2024-02-07 ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12 Liuqing Chen et.al. 2402.04975 null
2024-02-07 Text2Street: Controllable Text-to-image Generation for Street Views Jinming Su et.al. 2402.04504 null
2024-02-07 ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation Jirayu Burapacheep et.al. 2402.04492 link
2024-02-06 Denoising Diffusion Probabilistic Models in Six Simple Steps Richard E. Turner et.al. 2402.04384 null
2024-02-06 ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation Weiming Ren et.al. 2402.04324 link
2024-02-06 QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning Haoxuan Wang et.al. 2402.03666 link
2024-02-05 Projected Generative Diffusion Models for Constraint Satisfaction Jacob K Christopher et.al. 2402.03559 link
2024-02-05 Assessing the Efficacy of Invisible Watermarks in AI-Generated Medical Images Xiaodan Xing et.al. 2402.03473 null
2024-02-05 Do Diffusion Models Learn Semantically Meaningful and Efficient Representations? Qiyao Liang et.al. 2402.03305 null
2024-02-05 InstanceDiffusion: Instance-level Control for Image Generation Xudong Wang et.al. 2402.03290 link
2024-02-05 Training-Free Consistent Text-to-Image Generation Yoad Tewel et.al. 2402.03286 null
2024-02-05 IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR images Vincent Roca et.al. 2402.03227 link
2024-02-05 Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion Shiyuan Yang et.al. 2402.03162 null
2024-02-05 InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Yiyuan Zhang et.al. 2402.03040 link
2024-02-05 SynthVision – Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data Yudara Kularathne et.al. 2402.02826 null
2024-02-04 DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing Chong Mou et.al. 2402.02583 link
2024-02-04 M³Face: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing Mohammadreza Mofayezi et.al. 2402.02369 null
2024-02-03 DeCoF: Generated Video Detection via Frame Consistency Long Ma et.al. 2402.02085 link
2024-02-02 NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties Jingyuan Sun et.al. 2402.01590 null
2024-02-02 The galactic bubbles of starburst galaxies The influence of galactic large-scale magnetic fields Z. Meliani et.al. 2402.01541 null
2024-02-02 Cross-view Masked Diffusion Transformers for Person Image Synthesis Trung X. Pham et.al. 2402.01516 link
2024-02-02 Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors Dingcheng Yang et.al. 2402.01369 link
2024-02-02 Can MLLMs Perform Text-to-Image In-Context Learning? Yuchen Zeng et.al. 2402.01293 link
2024-02-02 Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion? Cristian Sbrolli et.al. 2402.01241 null
2024-02-01 AI-generated faces free from racial and gender stereotypes Nouar AlDahoul et.al. 2402.01002 link
2024-02-01 Examining the Influence of Digital Phantom Models in Virtual Imaging Trials for Tomographic Breast Imaging Amar Kavuri et.al. 2402.00812 null
2024-02-01 AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Fu-Yun Wang et.al. 2402.00769 link
2024-02-01 DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras Weixing Xie et.al. 2402.00740 null
2024-01-31 SeFi-IDE: Semantic-Fidelity Identity Embedding for Personalized Diffusion-Based Generation Yang Li et.al. 2402.00631 null
2024-02-01 CapHuman: Capture Your Moments in Parallel Universes Chao Liang et.al. 2402.00627 link
2024-02-01 Masked Conditional Diffusion Model for Enhancing Deepfake Detection Tiewen Chen et.al. 2402.00541 null
2024-02-01 High-Quality Medical Image Generation from Free-hand Sketch Quan Huu Cap et.al. 2402.00353 null
2024-02-01 Machine Unlearning for Image-to-Image Generative Models Guihong Li et.al. 2402.00351 link
2024-01-31 Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation Yuanhuiyi Lyu et.al. 2401.17664 null
2024-01-31 Head and Neck Tumor Segmentation from [18F]F-FDG PET/CT Images Based on 3D Diffusion Model Yafei Dong et.al. 2401.17593 null
2024-01-31 Task-Oriented Diffusion Model Compression Geonung Kim et.al. 2401.17547 null
2024-01-31 Fréchet Distance for Offline Evaluation of Information Retrieval Systems with Sparse Labels Negar Arabzadeh et.al. 2401.17543 null
2024-01-30 OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision Bruno Berenguel-Baeta et.al. 2401.17061 link
2024-01-30 Repositioning the Subject within Image Yikai Wang et.al. 2401.16861 link
2024-01-30 X-ray Image Generation as a Method of Performance Prediction for Real-Time Inspection: a Case Study Vladyslav Andriiashen et.al. 2401.16847 link
2024-01-29 Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors Shiyin Dong et.al. 2401.16459 null
2024-01-29 Spatial-Aware Latent Initialization for Controllable Image Generation Wenqiang Sun et.al. 2401.16157 null
2024-01-31 Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You Felix Friedrich et.al. 2401.16092 link
2024-01-29 Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling Xiaoyu Shi et.al. 2401.15977 null
2024-01-29 Diffusion Facial Forgery Detection Harry Cheng et.al. 2401.15859 link
2024-01-29 2L3: Lifting Imperfect Generated 2D Images into Accurate 3D Yizheng Chen et.al. 2401.15841 null
2024-01-28 Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding Jianxiang Lu et.al. 2401.15708 null
2024-01-28 Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation Zhenyu Wang et.al. 2401.15688 null
2024-01-28 IntentTuner: An Interactive Framework for Integrating Human Intents in Fine-tuning Text-to-Image Generative Models Xingchen Zeng et.al. 2401.15559 null
2024-01-27 GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis Jing Hao et.al. 2401.15282 link
2024-01-26 Annotated Hands for Generative Models Yue Yang et.al. 2401.15075 link
2024-01-26 Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support Xiaojun Wu et.al. 2401.14688 link
2024-01-25 Deconstructing Denoising Diffusion Models for Self-Supervised Learning Xinlei Chen et.al. 2401.14404 null
2024-01-25 UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation and Diffusion Models Timo Kapsalis et.al. 2401.14379 null
2024-01-26 Image Synthesis with Graph Conditioning: CLIP-Guided Diffusion Models for Scene Graphs Rameshwar Mishra et.al. 2401.14111 null
2024-01-25 CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion Nisha Huang et.al. 2401.14066 link
2024-01-25 Diffusion-based Data Augmentation for Object Counting Problems Zhen Wang et.al. 2401.13992 null
2024-01-25 Learning to Manipulate Artistic Images Wei Guo et.al. 2401.13976 link
2024-01-25 BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models Senthil Purushwalkam et.al. 2401.13974 link
2024-01-25 A New Image Quality Database for Multiple Industrial Processes Xuanchao Ma et.al. 2401.13956 null
2024-01-25 StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models Yalong Bai et.al. 2401.13942 null
2024-01-24 Research about the Ability of LLM in the Tamper-Detection Area Xinyu Yang et.al. 2401.13504 null
2024-01-24 UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion Wei Li et.al. 2401.13388 null
2024-01-24 Deep Learning for Improved Polyp Detection from Synthetic Narrow-Band Imaging Mathias Ramm Haugland et.al. 2401.13315 null
2024-01-24 Choose Your Diffusion: Efficient and flexible ways to accelerate the diffusion model in fast high energy physics simulation Cheng Jiang et.al. 2401.13162 null
2024-01-23 CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited Data Chandrakanth Gudavalli et.al. 2401.13006 null
2024-01-23 Lumiere: A Space-Time Diffusion Model for Video Generation Omer Bar-Tal et.al. 2401.12945 null
2024-01-23 A Unified Generation-Registration Framework for Improved MR-based CT Synthesis in Proton Therapy Xia Li et.al. 2401.12878 null
2024-01-23 UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators Hengjia Li et.al. 2401.12596 null
2024-01-23 The Neglected Tails of Vision-Language Models Shubham Parashar et.al. 2401.12425 null
2024-01-20 Large-scale Reinforcement Learning for Diffusion Models Yinan Zhang et.al. 2401.12244 null
2024-01-23 Control of OSIRIS-REx OTES Observations using OCAMS TAG Images Kris J. Becker et.al. 2401.12177 null
2024-01-22 Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs Ling Yang et.al. 2401.11708 link
2024-01-21 Text-to-Image Cross-Modal Generation: A Systematic Review Maciej Żelaszczyk et.al. 2401.11631 null
2024-01-21 Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Katherine Crowson et.al. 2401.11605 link
2024-01-19 Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion Zuoyue Li et.al. 2401.10786 null
2024-01-18 Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution Xin Yuan et.al. 2401.10404 null
2024-01-22 Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation Changgu Chen et.al. 2401.10150 null
2024-01-18 DiffusionGPT: LLM-Driven Text-to-Image Generation System Jie Qin et.al. 2401.10061 null
2024-01-18 WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Xiaofeng Wang et.al. 2401.09985 null
2024-01-18 CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects Zhao Wang et.al. 2401.09962 null
2024-01-17 MITS-GAN: Safeguarding Medical Imaging from Tampering with Generative Adversarial Networks Giovanni Pasqualino et.al. 2401.09624 link
2024-01-17 Efficient generative adversarial networks using linear additive-attention Transformers Emilio Morales-Juarez et.al. 2401.09596 link
2024-01-17 Vlogger: Make Your Dream A Vlog Shaobin Zhuang et.al. 2401.09414 link
2024-01-17 UniVG: Towards UNIfied-modal Video Generation Ludan Ruan et.al. 2401.09084 null
2024-01-17 VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models Haoxin Chen et.al. 2401.09047 link
2024-01-16 Fast Dynamic 3D Object Generation from a Single-view Video Zijie Pan et.al. 2401.08742 link
2024-01-16 Fixed Point Diffusion Models Xingjian Bai et.al. 2401.08741 link
2024-01-16 Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks Chenyu Zhang et.al. 2401.08725 link
2024-01-16 Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data Yuhui Zhang et.al. 2401.08567 link
2024-01-16 Instilling Multi-round Thinking to Text-guided Image Generation Lidong Zeng et.al. 2401.08472 null
2024-01-16 Key-point Guided Deformable Image Manipulation Using Diffusion Model Seok-Hwan Oh et.al. 2401.08178 null
2024-01-16 E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning Qiang Qu et.al. 2401.08117 link
2024-01-16 SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation Zhixuan Liu et.al. 2401.08053 null
2024-01-15 Towards A Better Metric for Text-to-Video Generation Jay Zhangjie Wu et.al. 2401.07781 null
2024-01-15 Collaboratively Self-supervised Video Representation Learning for Action Recognition Jie Zhang et.al. 2401.07584 null
2024-01-15 InstantID: Zero-shot Identity-Preserving Generation in Seconds Qixun Wang et.al. 2401.07519 link
2024-01-14 Generation of Synthetic Images for Pedestrian Detection Using a Sequence of GANs Viktor Seib et.al. 2401.07370 null
2024-01-13 Quantum Denoising Diffusion Models Michael Kölle et.al. 2401.07049 null
2024-01-13 Progressive Feature Fusion Network for Enhancing Image Quality Assessment Kaiqun Wu et.al. 2401.06992 null
2024-01-12 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model Qian Wang et.al. 2401.06578 null
2024-01-12 Beyond the Surface: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation Akshita Jha et.al. 2401.06310 link
2024-01-11 Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications Yuwen Xiong et.al. 2401.06197 link
2024-01-10 AI Art is Theft: Labour, Extraction, and Exploitation, Or, On the Dangers of Stochastic Pollocks Trystan S. Goetze et.al. 2401.06178 null
2024-01-11 RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks Partha Ghosh et.al. 2401.06035 null
2024-01-11 EraseDiff: Erasing Data Influence in Diffusion Models Jing Wu et.al. 2401.05779 link
2024-01-11 Learn From Zoom: Decoupled Supervised Contrastive Learning For WCE Image Classification Kunpeng Qiu et.al. 2401.05771 link
2024-01-11 Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation Seung Hyun Lee et.al. 2401.05675 null
2024-01-10 From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage Marcellus Amadeus et.al. 2401.05520 null
2024-01-10 PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Junsong Chen et.al. 2401.05252 link
2024-01-09 Content-Conditioned Generation of Stylized Free hand Sketches Jiajun Liu et.al. 2401.04739 null
2024-01-09 Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks Tanmay Garg et.al. 2401.04647 null
2024-01-09 EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models Jingyuan Yang et.al. 2401.04608 null
2024-01-09 Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models Xuewen Liu et.al. 2401.04585 link
2024-01-09 Let’s Go Shopping (LGS) – Web-Scale Image-Text Dataset for Visual Concept Understanding Yatong Bai et.al. 2401.04575 null
2024-01-09 MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation Weimin Wang et.al. 2401.04468 null
2024-01-09 Vision Reimagined: AI-Powered Breakthroughs in WiFi Indoor Imaging Jianyang Shi et.al. 2401.04317 null
2024-01-07 A Classification of Critical Configurations for any Number of Projective Views Martin Bråtelund et.al. 2401.03450 link
2024-01-05 Latte: Latent Diffusion Transformer for Video Generation Xin Ma et.al. 2401.03048 link
2024-01-05 Dataset of turbulent flow over interacting barchan dunes Jimmy Gabriel Alvarez et.al. 2401.03032 null
2024-01-04 VASE: Object-Centric Appearance and Shape Manipulation of Real Videos Elia Peruzzo et.al. 2401.02473 null
2024-01-04 Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection Yabin Wang et.al. 2401.02335 link
2024-01-04 Bayesian Intrinsic Groupwise Image Registration: Unsupervised Disentanglement of Anatomy and Geometry Xinzhe Luo et.al. 2401.02141 null
2024-01-04 Improving Diffusion-Based Image Synthesis with Context Prediction Ling Yang et.al. 2401.02015 null
2024-01-03 Instruct-Imagen: Image Generation with Multi-modal Instruction Hexiang Hu et.al. 2401.01952 null
2024-01-03 Can We Generate Realistic Hands Only Using Convolution? Mehran Hosseini et.al. 2401.01951 null
2024-01-03 A Vision Check-up for Language Models Pratyusha Sharma et.al. 2401.01862 null
2024-01-03 Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions David Junhao Zhang et.al. 2401.01827 link
2024-01-03 aMUSEd: An Open MUSE Reproduction Suraj Patil et.al. 2401.01808 link
2024-01-03 Few-shot Image Generation via Information Transfer from the Built Geodesic Surface Yuexing Han et.al. 2401.01749 null
2024-01-03 An Edge-Cloud Collaboration Framework for Generative AI Service Provision with Synergetic Big Cloud Model and Small Edge Models Yuqing Tian et.al. 2401.01666 null
2024-01-03 AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI Fanda Fan et.al. 2401.01651 link
2024-01-02 VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics Ammar A. Siddiqui et.al. 2401.01414 null
2024-01-02 VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM Fuchen Long et.al. 2401.01256 link
2024-01-02 Joint Generative Modeling of Scene Graphs and Images via Diffusion Models Bicheng Xu et.al. 2401.01130 null
2024-01-02 SSP: A Simple and Safe automatic Prompt engineering method towards realistic image synthesis on LVM Weijin Cheng et.al. 2401.01128 null
2023-12-31 TrailBlazer: Trajectory Control for Diffusion-Based Video Generation Wan-Duo Kurt Ma et.al. 2401.00896 null
2023-12-30 Improving the Stability of Diffusion Models for Content Consistent Super-Resolution Lingchen Sun et.al. 2401.00877 link
2024-01-01 New Job, New Gender? Measuring the Social Bias in Image Generation Models Wenxuan Wang et.al. 2401.00763 link
2024-01-01 DiffMorph: Text-less Image Morphing with Diffusion Models Shounak Chatterjee et.al. 2401.00739 null
2023-12-31 Generative Model-Driven Synthetic Training Image Generation: An Approach to Cognition in Rail Defect Detection Rahatara Ferdousi et.al. 2401.00393 link
2023-12-30 GAN-GA: A Generative Model based on Genetic Algorithm for Medical Image Generation M. AbdulRazek et.al. 2401.00314 null
2023-12-30 CamPro: Camera-based Anti-Facial Recognition Wenjun Zhu et.al. 2401.00151 link
2023-12-27 RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement Fan Shi et.al. 2312.17274 null
2023-12-28 Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action Jiasen Lu et.al. 2312.17172 link
2023-12-27 Prompt Expansion for Adaptive Text-to-Image Generation Siddhartha Datta et.al. 2312.16720 null
2023-12-27 I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models Xun Guo et.al. 2312.16693 link
2023-12-27 Participatory prompting: a user-centric research method for eliciting AI assistance opportunities in knowledge workflows Advait Sarkar et.al. 2312.16633 null
2023-12-27 A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss Xiao Fang et.al. 2312.16498 link
2023-12-29 PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion Guansong Lu et.al. 2312.16486 null
2023-12-27 Bellman Optimal Step-size Straightening of Flow-Matching Models Bao Nguyen et.al. 2312.16414 link
2023-12-26 SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation Yuxuan Zhang et.al. 2312.16272 link
2023-12-26 One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications Mengyao Lyu et.al. 2312.16145 null
2023-12-26 Semantic Guidance Tuning for Text-To-Image Diffusion Models Hyun Kang et.al. 2312.15964 link
2023-12-26 Cross Initialization for Personalized Text-to-Image Generation Lianyu Pang et.al. 2312.15905 link
2023-12-25 A Recipe for Scaling up Text-to-Video Generation with Text-free Videos Xiang Wang et.al. 2312.15770 null
2023-12-25 High-Fidelity Diffusion-based Image Editing Chen Hou et.al. 2312.15707 null
2023-12-24 Make-A-Character: High Quality Text-to-3D Character Generation within Minutes Jianqiang Ren et.al. 2312.15430 null
2023-12-23 Prompt-Propose-Verify: A Reliable Hand-Object-Interaction Data Generation Framework using Foundational Models Gurusha Juneja et.al. 2312.15247 null
2023-12-22 Generative AI and the History of Architecture Joern Ploennigs et.al. 2312.15106 null
2023-12-22 Emage: Non-Autoregressive Text-to-Image Generation Zhangyin Feng et.al. 2312.14988 null
2023-12-22 VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation Max Ku et.al. 2312.14867 link
2023-12-22 Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks Haz Sameen Shahgir et.al. 2312.14440 link
2023-12-21 Fine-grained Forecasting Models Via Gaussian Process Blurring Effect Sepideh Koohfar et.al. 2312.14280 link
2023-12-21 VCoder: Versatile Vision Encoders for Multimodal Large Language Models Jitesh Jain et.al. 2312.14233 link
2023-12-21 Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation Nina Weng et.al. 2312.14223 link
2023-12-21 VideoPoet: A Large Language Model for Zero-Shot Video Generation Dan Kondratyuk et.al. 2312.14125 null
2023-12-21 Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models Huan Ling et.al. 2312.13763 null
2023-12-21 DreamTuner: Single Image is Enough for Subject-Driven Generation Miao Hua et.al. 2312.13691 null
2023-12-21 Free-Editor: Zero-shot Text-driven 3D Scene Editing Nazmul Karim et.al. 2312.13663 link
2023-12-21 Diff-Oracle: Diffusion Model for Oracle Character Generation with Controllable Styles and Contents Jing Li et.al. 2312.13631 null
2023-12-21 Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting Junwu Zhang et.al. 2312.13271 link
2023-12-20 Conditional Image Generation with Pretrained Generative Model Rajesh Shrestha et.al. 2312.13253 null
2023-12-21 Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation Hongtao Wu et.al. 2312.13139 null
2023-12-20 Quantifying Bias in Text-to-Image Generative Models Jordan Vice et.al. 2312.13053 null
2023-12-20 A self-attention-based differentially private tabular GAN with high data utility Zijian Li et.al. 2312.13031 null
2023-12-20 All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models Seunghoo Hong et.al. 2312.12807 null
2023-12-19 Surf-CDM: Score-Based Surface Cold-Diffusion Model For Medical Image Segmentation Fahim Ahmed Zaman et.al. 2312.12649 null
2023-12-19 RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing Shutong Jin et.al. 2312.12635 null
2023-12-19 On Inference Stability for Diffusion Models Viet Nguyen et.al. 2312.12431 link
2023-12-19 Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models Shweta Mahajan et.al. 2312.12416 null
2023-12-19 Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model Lingjun Zhang et.al. 2312.12232 link
2023-12-19 Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method Jiachun Pan et.al. 2312.12030 link
2023-12-19 Decoupled Textual Embeddings for Customized Image Generation Yufei Cai et.al. 2312.11826 link
2023-12-18 Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics Yesukhei Jagvaral et.al. 2312.11707 null
2023-12-18 SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing Zeyinzi Jiang et.al. 2312.11392 link
2023-12-18 Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model Decheng Liu et.al. 2312.11285 link
2023-12-18 MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising Bingyuan Wang et.al. 2312.10899 null
2023-12-18 The Right Losses for the Right Gains: Improving the Semantic Consistency of Deep Text-to-Image Generation with Distribution-Sensitive Losses Mahmoud Ahmed et.al. 2312.10854 null
2023-12-17 VidToMe: Video Token Merging for Zero-Shot Video Editing Xirui Li et.al. 2312.10656 link
2023-12-17 Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability Jaehui Hwang et.al. 2312.10634 null
2023-12-16 Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection Chuangchuang Tan et.al. 2312.10461 link
2023-12-16 DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content Wentao Wang et.al. 2312.10407 link
2023-12-16 Fusing Conditional Submodular GAN and Programmatic Weak Supervision Kumar Shubham et.al. 2312.10366 link
2023-12-16 Operator-learning-inspired Modeling of Neural Ordinary Differential Equations Woojin Cho et.al. 2312.10274 null
2023-12-15 Rich Human Feedback for Text-to-Image Generation Youwei Liang et.al. 2312.10240 link
2023-12-15 Data-Efficient Multimodal Fusion on a Single GPU Noël Vouitsis et.al. 2312.10144 link
2023-12-15 Data and Approaches for German Text simplification – towards an Accessibility-enhanced Communication Thorben Schomacker et.al. 2312.09966 null
2023-12-14 High-Resolution Maps of Left Atrial Displacements and Strains Estimated with 3D CINE MRI and Unsupervised Neural Networks Christoforos Galazis et.al. 2312.09387 link
2023-12-14 ArchiGuesser – AI Art Architecture Educational Game Joern Ploennigs et.al. 2312.09334 link
2023-12-14 LIME: Localized Image Editing via Attention Regularization in Diffusion Models Enis Simsar et.al. 2312.09256 null
2023-12-14 FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection Hongsuk Choi et.al. 2312.09252 null
2023-12-14 VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation Jinguo Zhu et.al. 2312.09251 link
2023-12-14 Fast Sampling via De-randomization for Discrete Diffusion Models Zixiang Chen et.al. 2312.09193 null
2023-12-14 VideoLCM: Video Latent Consistency Model Xiang Wang et.al. 2312.09109 null
2023-12-13 SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance Yuanyou Xu et.al. 2312.08889 null
2023-12-14 Agent Attention: On the Integration of Softmax and Linear Attention Dongchen Han et.al. 2312.08874 link
2023-12-14 Local Conditional Controlling for Text-to-Image Diffusion Models Yibo Zhao et.al. 2312.08768 link
2023-12-13 A Survey of Generative AI for Intelligent Transportation Systems Huan Yan et.al. 2312.08248 null
2023-12-13 Black-box Membership Inference Attacks against Fine-tuned Diffusion Models Yan Pang et.al. 2312.08207 link
2023-12-13 ρ-Diffusion: A diffusion-based density estimation framework for computational physics Maxwell X. Cai et.al. 2312.08153 link
2023-12-13 Clockwork Diffusion: Efficient Generation With Model-Step Distillation Amirhossein Habibian et.al. 2312.08128 link
2023-12-13 3DGEN: A GAN-based approach for generating novel 3D models from image data Antoine Schnepf et.al. 2312.08094 null
2023-12-13 Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision Shengguang Wu et.al. 2312.08056 null
2023-12-13 AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing Zhiyuan Ma et.al. 2312.08019 link
2023-12-13 Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users Tianxun Zhou et.al. 2312.07854 null
2023-12-13 Stable Rivers: A Case Study in the Application of Text-to-Image Generative Models for Earth Sciences C Kupferschmidt et.al. 2312.07833 null
2023-12-12 FreeInit: Bridging Initialization Gap in Video Diffusion Models Tianxing Wu et.al. 2312.07537 link
2023-12-12 FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition Sicheng Mo et.al. 2312.07536 null
2023-12-12 PEEKABOO: Interactive Video Generation via Masked-Diffusion Yash Jain et.al. 2312.07509 link
2023-12-12 How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation Zhongyi Han et.al. 2312.07424 link
2023-12-12 DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing Kaiwen Zhang et.al. 2312.07409 null
2023-12-12 Learned representation-guided diffusion models for large-image generation Alexandros Graikos et.al. 2312.07330 link
2023-12-12 Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description Mianzhi Pan et.al. 2312.07294 null
2023-12-12 Image Content Generation with Causal Reasoning Xiaochuan Li et.al. 2312.07132 link
2023-12-12 Divide-and-Conquer Attack: Harnessing the Power of LLM to Bypass the Censorship of Text-to-Image Generation Model Yimo Deng et.al. 2312.07130 link
2023-12-11 User Friendly and Adaptable Discriminative AI: Using the Lessons from the Success of LLMs and Image Generation Models Son The Nguyen et.al. 2312.06826 null
2023-12-11 Photorealistic Video Generation with Diffusion Models Agrim Gupta et.al. 2312.06662 null
2023-12-11 ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models Denis Zavadski et.al. 2312.06573 link
2023-12-11 PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization Xu Peng et.al. 2312.06354 null
2023-12-11 Compensation Sampling for Improved Convergence in Diffusion Models Hui Lu et.al. 2312.06285 link
2023-12-11 UIEDP:Underwater Image Enhancement with Diffusion Prior Dazhao Du et.al. 2312.06240 link
2023-12-11 Invariant Representation Learning via Decoupling Style and Spurious Features Ruimeng Li et.al. 2312.06226 null
2023-12-11 Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods Panos Achlioptas et.al. 2312.06116 null
2023-12-10 Correcting Diffusion Generation through Resampling Yujian Liu et.al. 2312.06038 link
2023-12-10 Disentangled Representation Learning for Controllable Person Image Generation Wenju Xu et.al. 2312.05798 null
2023-12-10 AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model Teng Hu et.al. 2312.05767 link
2023-12-08 SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation Thuan Hoang Nguyen et.al. 2312.05239 link
2023-12-08 DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models Mengyang Feng et.al. 2312.05107 null
2023-12-08 SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control Jaskirat Singh et.al. 2312.05039 null
2023-12-08 Synthesizing Traffic Datasets using Graph Neural Networks Daniel Rodriguez-Criado et.al. 2312.05031 link
2023-12-08 UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models Yiming Zhao et.al. 2312.04884 link
2023-12-08 MVDD: Multi-View Depth Diffusion Models Zhen Wang et.al. 2312.04875 null
2023-12-08 RS-Corrector: Correcting the Racial Stereotypes in Latent Diffusion Models Yue Jiang et.al. 2312.04810 null
2023-12-07 ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations Maitreya Patel et.al. 2312.04655 null
2023-12-07 Autoencoding Labeled Interpolator, Inferring Parameters From Image, And Image From Parameters Ali SaraerToosi et.al. 2312.04640 null
2023-12-07 Scaling Laws of Synthetic Images for Model Training … for Now Lijie Fan et.al. 2312.04567 link
2023-12-07 Gen2Det: Generate to Detect Saksham Suri et.al. 2312.04566 null
2023-12-07 GenDeF: Learning Generative Deformation Field for Video Generation Wen Wang et.al. 2312.04561 null
2023-12-07 GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation Shoufa Chen et.al. 2312.04557 null
2023-12-07 Generating Illustrated Instructions Sachit Menon et.al. 2312.04552 link
2023-12-07 Free3D: Consistent Novel View Synthesis without 3D Representation Chuanxia Zheng et.al. 2312.04551 link
2023-12-07 Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation Zhiwu Qing et.al. 2312.04483 link
2023-12-07 PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Zhen Li et.al. 2312.04461 link
2023-12-07 DreamVideo: Composing Your Dream Videos with Customized Subject and Motion Yujie Wei et.al. 2312.04433 link
2023-12-07 Approximate Caching for Efficiently Serving Diffusion Models Shubham Agarwal et.al. 2312.04429 null
2023-12-06 Self-conditioned Image Generation via Generating Representations Tianhong Li et.al. 2312.03701 link
2023-12-06 Memory Triggers: Unveiling Memorization in Text-To-Image Generative Models through Word-Level Duplication Ali Naseh et.al. 2312.03692 null
2023-12-06 MotionCtrl: A Unified and Flexible Motion Controller for Video Generation Zhouxia Wang et.al. 2312.03641 link
2023-12-06 TokenCompose: Grounding Diffusion with Token-level Supervision Zirui Wang et.al. 2312.03626 link
2023-12-06 DiffusionSat: A Generative Foundation Model for Satellite Imagery Samar Khanna et.al. 2312.03606 null
2023-12-06 Context Diffusion: In-Context Aware Image Generation Ivona Najdenkoska et.al. 2312.03584 null
2023-12-06 FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation Olivia Markham et.al. 2312.03540 null
2023-12-06 FRDiff: Feature Reuse for Exquisite Zero-shot Acceleration of Diffusion Models Junhyuk So et.al. 2312.03517 null
2023-12-06 Kandinsky 3.0 Technical Report Vladimir Arkhipkin et.al. 2312.03511 link
2023-12-06 Data-driven Crop Growth Simulation on Time-varying Generated Images using Multi-conditional Generative Adversarial Networks Lukas Drees et.al. 2312.03443 link
2023-12-05 GPT4Point: A Unified Framework for Point-Language Understanding and Generation Zhangyang Qi et.al. 2312.02980 null
2023-12-05 MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures Zhangyang Xiong et.al. 2312.02963 null
2023-12-05 WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation Jiachen Lu et.al. 2312.02934 link
2023-12-05 LivePhoto: Real Image Animation with Text-guided Motion Control Xi Chen et.al. 2312.02928 null
2023-12-05 Fine-grained Controllable Video Generation via Object Appearance and Context Hsin-Ping Huang et.al. 2312.02919 null
2023-12-05 BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models Fengyuan Shi et.al. 2312.02813 link
2023-12-05 Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler Philippe Gonzalez et.al. 2312.02683 null
2023-12-05 FaceStudio: Put Your Face Everywhere in Seconds Yuxuan Yan et.al. 2312.02663 null
2023-12-05 GeNIe: Generative Hard Negative Images Through Diffusion Soroush Abbasi Koohpayegani et.al. 2312.02548 link
2023-12-05 Retrieving Conditions from Reference Images for Diffusion Models Haoran Tang et.al. 2312.02521 null
2023-12-04 Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation Bingxin Ke et.al. 2312.02145 link
2023-12-04 DiffiT: Diffusion Vision Transformers for Image Generation Ali Hatamizadeh et.al. 2312.02139 link
2023-12-04 Style Aligned Image Generation via Shared Attention Amir Hertz et.al. 2312.02133 link
2023-12-04 GIVT: Generative Infinite-Vocabulary Transformers Michael Tschannen et.al. 2312.02116 link
2023-12-04 UniGS: Unified Representation for Image Generation and Segmentation Lu Qi et.al. 2312.01985 link
2023-12-04 InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models Xunguang Wang et.al. 2312.01886 link
2023-12-04 Fully Spiking Denoising Diffusion Implicit Models Ryo Watanabe et.al. 2312.01742 link
2023-12-04 ResEnsemble-DDPM: Residual Denoising Diffusion Probabilistic Models for Ensemble Learning Shi Zhenning et.al. 2312.01682 null
2023-12-03 Diffusion Posterior Sampling for Nonlinear CT Reconstruction Shudong Li et.al. 2312.01464 null
2023-12-03 Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models Shengqu Cai et.al. 2312.01409 null
2023-12-01 VideoBooth: Diffusion-based Video Generation with Image Prompts Yuming Jiang et.al. 2312.00777 null
2023-12-01 StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter Gongye Liu et.al. 2312.00330 link
2023-11-30 S2ST: Image-to-Image Translation in the Seed Space of Latent Diffusion Or Greenberg et.al. 2312.00116 null
2023-11-30 VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models Zhen Xing et.al. 2311.18837 null
2023-11-30 ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models Wenming Weng et.al. 2311.18834 null
2023-11-30 MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation Yanhui Wang et.al. 2311.18829 null
2023-11-30 One-step Diffusion with Distribution Matching Distillation Tianwei Yin et.al. 2311.18828 null
2023-11-30 ElasticDiffusion: Training-free Arbitrary Size Image Generation Moayed Haji-Ali et.al. 2311.18822 link
2023-11-30 CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation Zineng Tang et.al. 2311.18775 null
2023-11-30 CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model Jianhao Zeng et.al. 2311.18405 link
2023-11-30 Situating the social issues of image generation models in the model life cycle: a sociotechnical approach Amelia Katirai et.al. 2311.18345 null
2023-11-30 Diffusion Models Without Attention Jing Nathan Yan et.al. 2311.18257 null
2023-11-30 Few-shot Image Generation via Style Adaptation and Content Preservation Xiaosheng He et.al. 2311.18169 null
2023-11-29 SODA: Bottleneck Diffusion Models for Representation Learning Drew A. Hudson et.al. 2311.17901 null
2023-11-29 Analyzing and Explaining Image Classifiers via Diffusion Guidance Maximilian Augustin et.al. 2311.17833 link
2023-11-29 BAND-2k: Banding Artifact Noticeable Database for Banding Detection and Quality Assessment Zijian Chen et.al. 2311.17752 link
2023-11-29 Fair Text-to-Image Diffusion via Fair Mapping Jia Li et.al. 2311.17695 null
2023-11-29 Query-Relevant Images Jailbreak Large Multi-Modal Models Xin Liu et.al. 2311.17600 link
2023-11-29 Non-Visible Light Data Synthesis and Application: A Case Study for Synthetic Aperture Radar Imagery Zichen Tian et.al. 2311.17486 null
2023-11-29 When StyleGAN Meets Stable Diffusion: a 𝒲+ Adapter for Personalized Image Generation Xiaoming Li et.al. 2311.17461 link
2023-11-29 VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model Haoyu Zhao et.al. 2311.17338 link
2023-11-28 Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation Hang Li et.al. 2311.17216 null
2023-11-28 Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features Niladri Shekhar Dutt et.al. 2311.17024 link
2023-11-28 COLE: A Hierarchical Generation Framework for Graphic Design Peidong Jia et.al. 2311.16974 null
2023-11-28 SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models Yuwei Guo et.al. 2311.16933 null
2023-11-28 Denoising Diffusion Probabilistic Models for Image Inpainting of Cell Distributions in the Human Brain Jan-Oliver Kropp et.al. 2311.16821 null
2023-11-28 Panacea: Panoramic and Controllable Video Generation for Autonomous Driving Yuqing Wen et.al. 2311.16813 null
2023-11-28 Multi-Channel Cross Modal Detection of Synthetic Face Images M. Ibsen et.al. 2311.16773 link
2023-11-28 MotionZero:Exploiting Motion Priors for Zero-shot Text-to-Video Generation Sitong Su et.al. 2311.16635 null
2023-11-28 MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Yang Zhao et.al. 2311.16567 null
2023-11-28 Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks Ye Lin Tun et.al. 2311.16538 link
2023-11-28 Text-Driven Image Editing via Learnable Regions Yuanze Lin et.al. 2311.16432 link
2023-11-27 Self-correcting LLM-controlled Diffusion Models Tsung-Han Wu et.al. 2311.16090 link
2023-11-27 ViT-Lens-2: Gateway to Omni-modal Intelligence Weixian Lei et.al. 2311.16081 link
2023-11-27 Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion Yuanxun Lu et.al. 2311.15980 null
2023-11-27 Tell2Design: A Dataset for Language-Guided Floor Plan Generation Sicong Leng et.al. 2311.15941 link
2023-11-27 Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation Siteng Huang et.al. 2311.15841 null
2023-11-27 FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax Yu Lu et.al. 2311.15813 null
2023-11-27 C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing Avigyan Bhattacharya et.al. 2311.15812 null
2023-11-27 Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation Biao Gong et.al. 2311.15773 null
2023-11-27 Reinforcement Learning from Diffusion Feedback: Q* for Image Search Aboli Marathe et.al. 2311.15648 null
2023-11-27 ET3D: Efficient Text-to-3D Generation via Multi-View Distillation Yiming Chen et.al. 2311.15561 null
2023-11-24 CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization Ruoyu Zhao et.al. 2311.14631 null
2023-11-24 MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation Zhiqi Li et.al. 2311.14494 link
2023-11-24 Decouple Content and Motion for Conditional Image-to-Video Generation Cuifeng Shen et.al. 2311.14294 null
2023-11-24 Paragraph-to-Image Generation with Information-Enriched Diffusion Model Weijia Wu et.al. 2311.14284 link
2023-11-24 Image Super-Resolution with Text Prompt Diffusion Zheng Chen et.al. 2311.14282 link
2023-11-23 ACT: Adversarial Consistency Models Fei Kong et.al. 2311.14097 link
2023-11-22 The Challenges of Image Generation Models in Generating Multi-Component Images Tham Yik Foong et.al. 2311.13620 null
2023-11-22 Guided Flows for Generative Modeling and Decision Making Qinqing Zheng et.al. 2311.13443 null
2023-11-23 LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes Jaeyoung Chung et.al. 2311.13384 null
2023-11-22 Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models Mengyang Feng et.al. 2311.13141 link
2023-11-22 FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline Vladimir Arkhipkin et.al. 2311.13073 link
2023-11-21 GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning Jiaxi Lv et.al. 2311.12631 null
2023-11-20 NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation Shachar Rosenman et.al. 2311.12229 link
2023-11-20 Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models Rohit Gandikota et.al. 2311.12092 link
2023-11-20 Advancing Urban Renewal: An Automated Approach to Generating Historical Arcade Facades with Stable Diffusion Models Zheyuan Kuang et.al. 2311.11590 null
2023-11-19 Data efficient protein backmapping with backbone-to-side chain transformers Shriram Chennakesavalu et.al. 2311.11459 link
2023-11-19 DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model Zhenghao Pan et.al. 2311.11417 link
2023-11-19 A Survey of Emerging Applications of Diffusion Probabilistic Models in MRI Yuheng Fan et.al. 2311.11383 null
2023-11-19 MoVideo: Motion-Aware Video Generation with Diffusion Models Jingyun Liang et.al. 2311.11325 null
2023-11-19 AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort Wen Wang et.al. 2311.11243 link
2023-11-19 GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise Xinhai Li et.al. 2311.11221 null
2023-11-18 Mitigating Exposure Bias in Discriminator Guided Diffusion Models Eleftherios Tsonis et.al. 2311.11164 null
2023-11-18 User-Centric Interactive AI for Distributed Diffusion Model-based AI-Generated Content Hongyang Du et.al. 2311.11094 null
2023-11-18 Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models Xuefeng Gao et.al. 2311.11003 null
2023-11-17 Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning Rohit Girdhar et.al. 2311.10709 null
2023-11-17 SelfEval: Leveraging the discriminative nature of generative models for evaluation Sai Saketh Rambhatla et.al. 2311.10708 null
2023-11-17 Enhancing Object Coherence in Layout-to-Image Synthesis Yibin Wang et.al. 2311.10522 link
2023-11-17 End-to-end autoencoding architecture for the simultaneous generation of medical images and corresponding segmentation masks Aghiles Kebaili et.al. 2311.10472 null
2023-11-17 High-fidelity Person-centric Subject-to-Image Synthesis Yibin Wang et.al. 2311.10329 link
2023-11-16 K-space Cold Diffusion: Learning to Reconstruct Accelerated MRI without Noise Guoyao Shen et.al. 2311.10162 link
2023-11-16 The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Omri Avrahami et.al. 2311.10093 null
2023-11-16 MAM-E: Mammographic synthetic image generation with diffusion models Ricardo Montoya-del-Angel et.al. 2311.09822 link
2023-11-16 DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics Aniket Roy et.al. 2311.09753 null
2023-11-14 UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs Yanwu Xu et.al. 2311.09257 link
2023-11-14 Finding AI-Generated Faces in the Wild Gonzalo J. Aniano Porcile et.al. 2311.08577 null
2023-11-14 Peer is Your Pillar: A Data-unbalanced Conditional GANs for Few-shot Image Generation Ziqiang Li et.al. 2311.08217 null
2023-11-14 Diffusion-based generation of Histopathological Whole Slide Images at a Gigapixel scale Robert Harb et.al. 2311.08199 null
2023-11-14 One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion Minghua Liu et.al. 2311.07885 null
2023-11-13 The Impact of Generative Artificial Intelligence Kaichen Zhang et.al. 2311.07071 null
2023-11-12 IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models Zhaoyuan Yang et.al. 2311.06792 link
2023-11-12 ChatAnything: Facetime Chat with LLM-Enhanced Personas Yilin Zhao et.al. 2311.06772 null
2023-11-12 BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis Tingfeng Cao et.al. 2311.06752 null
2023-11-12 How do Minimum-Norm Shallow Denoisers Look in Function Space? Chen Zeno et.al. 2311.06748 null
2023-11-11 Generative AI for Space-Air-Ground Integrated Networks (SAGIN) Ruichen Zhang et.al. 2311.06523 null
2023-11-10 A Survey of AI Text-to-Image and AI Text-to-Video Generators Aditi Singh et.al. 2311.06329 null
2023-11-09 LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Simian Luo et.al. 2311.05556 link
2023-11-09 L-WaveBlock: A Novel Feature Extractor Leveraging Wavelets for Generative Adversarial Networks Mirat Shah et.al. 2311.05548 null
2023-11-09 ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors Jingwen Chen et.al. 2311.05463 null
2023-11-09 ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image Senthil Purushwalkam et.al. 2311.05230 null
2023-11-08 Image-Based Virtual Try-On: A Survey Dan Song et.al. 2311.04811 link
2023-11-07 Energy-based Calibrated VAE with Test Time Free Lunch Yihong Luo et.al. 2311.04071 link
2023-11-07 MeVGAN: GAN-based Plugin Model for Video Generation with Applications in Colonoscopy Łukasz Struski et.al. 2311.03884 null
2023-11-07 SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation Iman Abbasnejad et.al. 2311.03866 null
2023-11-07 Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models Shengzhe Zhou et.al. 2311.03830 link
2023-11-07 CapST: An Enhanced and Lightweight Method for Deepfake Video Classification Wasim Ahmad et.al. 2311.03782 link
2023-11-07 LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators Allen Roush et.al. 2311.03716 null
2023-11-07 Image Generation and Learning Strategy for Deep Document Forgery Detection Yamato Okamoto et.al. 2311.03650 null
2023-11-06 SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis Hanrong Ye et.al. 2311.03355 null
2023-11-06 Cross-Image Attention for Zero-Shot Appearance Transfer Yuval Alaluf et.al. 2311.03335 null
2023-11-04 From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models Zhuoshi Pan et.al. 2311.02373 link
2023-11-04 Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting Hao Ai et.al. 2311.02343 link
2023-11-03 PRISM: Progressive Restoration for Scene Graph-based Image Manipulation Pavel Jahoda et.al. 2311.02247 null
2023-11-06 RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches Jiayuan Gu et.al. 2311.01977 null
2023-11-03 FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation Yuanxin Liu et.al. 2311.01813 link
2023-11-02 Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation Hadrien Reynaud et.al. 2311.01567 null
2023-11-02 VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning Hong Chen et.al. 2311.00990 null
2023-11-02 Optimal Noise pursuit for Augmenting Text-to-Video Generation Shijie Ma et.al. 2311.00949 null
2023-11-02 The Age of Generative AI and AI-Generated Everything Hongyang Du et.al. 2311.00947 null
2023-11-02 Gaussian Mixture Solvers for Diffusion Models Hanzhong Guo et.al. 2311.00941 link
2023-11-02 Towards High-quality HDR Deghosting with Conditional Diffusion Models Qingsen Yan et.al. 2311.00932 null
2023-11-01 LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Wei-Ge Chen et.al. 2311.00571 null
2023-11-01 fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding Xuelin Qian et.al. 2311.00342 null
2023-11-01 Flooding Regularization for Stable Training of Generative Adversarial Networks Iu Yahiro et.al. 2311.00318 null
2023-10-31 Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion David Marwood et.al. 2311.00056 null
2023-10-31 SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction Xinyuan Chen et.al. 2310.20700 null
2023-10-31 HWD: A Novel Evaluation Score for Styled Handwritten Text Generation Vittorio Pippi et.al. 2310.20316 link
2023-10-31 Machine learning refinement of in situ images acquired by low electron dose LC-TEM Hiroyasu Katsuno et.al. 2310.20279 null
2023-10-31 Beyond U: Making Diffusion Models Faster & Lighter Sergio Calvo-Ordonez et.al. 2310.20092 null
2023-10-30 ‘Person’ == Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion Sourojit Ghosh et.al. 2310.19981 null
2023-10-30 CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models Ziyang Yuan et.al. 2310.19784 null
2023-10-30 Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities Zhengliang Liu et.al. 2310.19626 null
2023-10-30 VideoCrafter1: Open Diffusion Models for High-Quality Video Generation Haoxin Chen et.al. 2310.19512 link
2023-10-30 Few-shot Hybrid Domain Adaptation of Image Generators Hengjia Li et.al. 2310.19378 link
2023-10-30 On Measuring Fairness in Generative Models Christopher T. H. Teo et.al. 2310.19297 null
2023-10-29 FPGAN-Control: A Controllable Fingerprint Generator for Training with Synthetic Data Alon Shoshan et.al. 2310.19024 link
2023-10-30 Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation Jaemin Cho et.al. 2310.18235 null
2023-10-27 Hyper-Skin: A Hyperspectral Dataset for Reconstructing Facial Skin-Spectra from RGB Images Pai Chet Ng et.al. 2310.17911 link
2023-10-27 One Style is All you Need to Generate a Video Sandeep Manandhar et.al. 2310.17835 link
2023-10-26 DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation Yongxin Zhu et.al. 2310.17570 null
2023-10-26 AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors You-Ming Chang et.al. 2310.17419 link
2023-10-26 Exploring the Potential of Generative AI for the World Wide Web Nouar AlDahoul et.al. 2310.17370 null
2023-10-26 Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics Shuai Yang et.al. 2310.17316 link
2023-10-26 Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise Zhenkai Zhang et.al. 2310.17167 null
2023-10-25 Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs Jianwei Fei et.al. 2310.16919 null
2023-10-25 CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Aaron Gokaslan et.al. 2310.16825 link
2023-10-25 Interferometric Neural Networks Arun Sehrawat et.al. 2310.16742 link
2023-10-25 Local Statistics for Generative Image Detection Yung Jer Wong et.al. 2310.16684 null
2023-10-25 A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation Eyal Segalis et.al. 2310.16656 null
2023-10-25 Adapt Anything: Tailor Any Image Classifiers across Domains And Categories Using Text-to-Image Diffusion Models Weijie Chen et.al. 2310.16573 null
2023-10-25 Learning Robust Deep Visual Representations from EEG Brain Recordings Prajwal Singh et.al. 2310.16532 link
2023-10-24 Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights Alokendu Mazumder et.al. 2310.16194 link
2023-10-24 Complex Image Generation SwinTransformer Network for Audio Denoising Youshan Zhang et.al. 2310.16109 link
2023-10-24 RePoseDM: Recurrent Pose Alignment and Gradient Guidance for Pose Guided Image Synthesis Anant Khandelwal et.al. 2310.16074 null
2023-10-24 CVPR 2023 Text Guided Video Editing Competition Jay Zhangjie Wu et.al. 2310.16003 link
2023-10-23 Fast Forward Modelling of Galaxy Spatial and Statistical Distributions Pascale Berner et.al. 2310.15223 null
2023-10-23 FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling Haonan Qiu et.al. 2310.15169 link
2023-10-23 DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design Kevin Lin et.al. 2310.15144 link
2023-10-23 Matryoshka Diffusion Models Jiatao Gu et.al. 2310.15111 link
2023-10-23 ESVAE: An Efficient Spiking Variational Autoencoder with Reparameterizable Poisson Spiking Sampling Qiugang Zhan et.al. 2310.14839 link
2023-10-23 Large Language Models can Share Images, Too! Young-Jun Lee et.al. 2310.14804 link
2023-10-22 A Pytorch Reproduction of Masked Generative Image Transformer Victor Besnier et.al. 2310.14400 link
2023-10-21 Adversarial Image Generation by Spatial Transformation in Perceptual Colorspaces Ayberk Aydin et.al. 2310.13950 link
2023-10-20 Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models Shawn Shan et.al. 2310.13828 null
2023-10-20 Localizing and Editing Knowledge in Text-to-Image Generative Models Samyadeep Basu et.al. 2310.13730 null
2023-10-20 Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation Wenyu Guo et.al. 2310.13361 link
2023-10-20 DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics Kaiwen Zheng et.al. 2310.13268 link
2023-10-19 Conditional Generative Modeling for Images, 3D Animations, and Video Vikram Voleti et.al. 2310.13157 null
2023-10-19 Particle Guidance: non-I.I.D. Diverse Sampling with Diffusion Models Gabriele Corso et.al. 2310.13102 link
2023-10-19 Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning Amey Pasarkar et.al. 2310.12952 link
2023-10-19 STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models Belhal Karimi et.al. 2310.12667 null
2023-10-19 PrivacyGAN: robust generative image privacy Mariia Zameshina et.al. 2310.12590 null
2023-10-19 Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation Mariia Zameshina et.al. 2310.12583 null
2023-10-19 Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping Zijie Pan et.al. 2310.12474 link
2023-10-18 An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning Chen Jin et.al. 2310.12274 link
2023-10-18 Quality Diversity through Human Feedback Li Ding et.al. 2310.12103 link
2023-10-20 Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach Feng Luo et.al. 2310.12004 link
2023-10-17 GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment Dhruba Ghosh et.al. 2310.11513 link
2023-10-18 EvalCrafter: Benchmarking and Evaluating Large Video Generation Models Yaofang Liu et.al. 2310.11440 link
2023-10-17 Elucidating The Design Space of Classifier-Guided Diffusion Generation Jiajun Ma et.al. 2310.11311 link
2023-10-17 BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference Siqi Kou et.al. 2310.11142 link
2023-10-16 LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation Ruiqi Wu et.al. 2310.10769 link
2023-10-18 BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys Yu Gu et.al. 2310.10765 null
2023-10-16 A Survey on Video Diffusion Models Zhen Xing et.al. 2310.10647 link
2023-10-16 LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts Hanan Gani et.al. 2310.10640 link
2023-10-16 ViPE: Visualise Pretty-much Everything Hassan Shahmohammadi et.al. 2310.10543 link
2023-10-16 ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion Jiayu Yang et.al. 2310.10343 link
2023-10-16 Scene Graph Conditioning in Latent Diffusion Frank Fundel et.al. 2310.10338 link
2023-10-16 Evading Detection Actively: Toward Anti-Forensics against Forgery Localization Long Zhuo et.al. 2310.10036 null
2023-10-15 Segment Anything Model for Pedestrian Infrastructure Inventory: Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data Jiahao Xia et.al. 2310.09918 null
2023-10-14 Unified High-binding Watermark for Unconditional Image Generation Models Ruinan Ma et.al. 2310.09479 null
2023-10-13 Making Multimodal Generation Easier: When Diffusion Models Meet LLMs Xiangyu Zhao et.al. 2310.08949 link
2023-10-13 R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation Jiayu Xiao et.al. 2310.08872 null
2023-10-12 SSG2: A new modelling paradigm for semantic segmentation Foivos I. Diakogiannis et.al. 2310.08671 link
2023-10-12 HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion Xian Liu et.al. 2310.08579 null
2023-10-12 MotionDirector: Motion Customization of Text-to-Video Diffusion Models Rui Zhao et.al. 2310.08465 link
2023-10-12 Neural Diffusion Models Grigory Bartosh et.al. 2310.08337 null
2023-10-12 Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting Zijie Chen et.al. 2310.08129 link
2023-10-12 SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing Zijie Wu et.al. 2310.08094 null
2023-10-12 CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity Abdullah Hayajneh et.al. 2310.07969 link
2023-10-13 Generative Modeling with Phase Stochastic Bridges Tianrong Chen et.al. 2310.07805 link
2023-10-11 DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model Xiaofan Li et.al. 2310.07771 link
2023-10-11 ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models Yingqing He et.al. 2310.07702 link
2023-10-11 ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation Bo Peng et.al. 2310.07697 link
2023-10-11 Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models Lai Zeqiang et.al. 2310.07653 link
2023-10-11 Distance-based Weighted Transformer Network for Image Completion Pourya Shamsolmoali et.al. 2310.07440 null
2023-10-11 Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else Hazarapet Tunanyan et.al. 2310.07419 null
2023-10-11 Crowd Counting in Harsh Weather using Image Denoising with Pix2Pix GANs Muhammad Asif Khan et.al. 2310.07245 null
2023-10-11 Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model Shiyuan Yang et.al. 2310.07222 link
2023-10-11 Echocardiography video synthesis from end diastolic semantic map via diffusion model Phi Nguyen Van et.al. 2310.07131 null
2023-10-10 Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images Che Liu et.al. 2310.07027 link
2023-10-10 ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning Alec Helbling et.al. 2310.06968 null
2023-10-10 Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling Huangjie Zheng et.al. 2310.06389 link
2023-10-10 JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling Jingyang Zhang et.al. 2310.06347 null
2023-10-10 Improving Compositional Text-to-image Generation with Large Vision-Language Models Song Wen et.al. 2310.06311 null
2023-10-09 Latent Diffusion Model for DNA Sequence Generation Zehui Li et.al. 2310.06150 link
2023-10-09 A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models Sebastian G. Gruber et.al. 2310.05833 link
2023-10-09 Language Model Beats Diffusion – Tokenizer is Key to Visual Generation Lijun Yu et.al. 2310.05737 link
2023-10-09 Locality-Aware Generalizable Implicit Neural Representation Doyup Lee et.al. 2310.05624 null
2023-10-09 Adaptive Multi-head Contrastive Learning Lei Wang et.al. 2310.05615 link
2023-10-09 A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers Matteo Bastico et.al. 2310.05572 link
2023-10-09 Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers Shiyue Cao et.al. 2310.05400 null
2023-10-08 The Emergence of Reproducibility and Consistency in Diffusion Models Huijie Zhang et.al. 2310.05264 null
2023-10-07 Generative AI May Prefer to Present National-level Characteristics of Cities Based on Stereotypical Geographic Impressions at the Continental Level Shan Ye et.al. 2310.04897 null
2023-10-07 Understanding and Improving Adversarial Attacks on Latent Diffusion Model Boyang Zheng et.al. 2310.04687 link
2023-10-07 X-Transfer: A Transfer Learning-Based Framework for Robust GAN-Generated Fake Image Detection Lei Zhang et.al. 2310.04639 null
2023-10-06 Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference Simian Luo et.al. 2310.04378 link
2023-10-06 Assessing Robustness via Score-Based Adversarial Image Generation Marcel Kollovieh et.al. 2310.04285 null
2023-10-05 Aligning Text-to-Image Diffusion Models with Reward Backpropagation Mihir Prabhudesai et.al. 2310.03739 link
2023-10-05 Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency Tianhong Li et.al. 2310.03734 null
2023-10-06 MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images Yanwu Xu et.al. 2310.03559 link
2023-10-05 Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Anton Razzhigaev et.al. 2310.03502 link
2023-10-04 Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel Paul Hagemann et.al. 2310.03054 link
2023-10-04 Kosmos-G: Generating Images in Context with Multimodal Large Language Models Xichen Pan et.al. 2310.02992 link
2023-10-04 GETAvatar: Generative Textured Meshes for Animatable Human Avatars Xuanmeng Zhang et.al. 2310.02714 null
2023-10-04 ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF Jangho Park et.al. 2310.02712 null
2023-10-03 GenCO: Generating Diverse Solutions to Design Problems with Combinatorial Nature Aaron Ferber et.al. 2310.02442 null
2023-10-03 FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models Yingqian Cui et.al. 2310.02401 null
2023-10-03 MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens Kaizhi Zheng et.al. 2310.02239 link
2023-10-03 Optimizing microlens arrays for incoherent HiLo microscopy Ziao Jiao et.al. 2310.01939 null
2023-10-03 Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation Jun Li et.al. 2310.01819 null
2023-10-02 ImagenHub: Standardizing the evaluation of conditional image generation models Max Ku et.al. 2310.01596 link
2023-10-02 Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Xuan Ju et.al. 2310.01506 link
2023-10-02 Conditional Diffusion Distillation Kangfu Mei et.al. 2310.01407 link
2023-10-02 Trained Latent Space Navigation to Prevent Lack of Photorealism in Generated Images on Style-based Models Takumi Harada et.al. 2310.00936 null
2023-10-02 Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP Zixiang Chen et.al. 2310.00927 null
2023-10-02 RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches Shawn Mathew et.al. 2310.00868 link
2023-10-01 Completing Visual Objects via Bridging Generation and Segmentation Xiang Li et.al. 2310.00808 null
2023-10-02 LLM-grounded Video Diffusion Models Long Lian et.al. 2309.17444 null
2023-09-29 Directly Fine-Tuning Diffusion Models on Differentiable Rewards Kevin Clark et.al. 2309.17400 null
2023-09-29 Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning Zihan Ding et.al. 2309.16984 link
2023-09-29 Leveraging Optimization for Adaptive Attacks on Image Watermarks Nils Lukas et.al. 2309.16952 link
2023-09-29 Denoising Diffusion Bridge Models Linqi Zhou et.al. 2309.16948 link
2023-09-28 CCEdit: Creative and Controllable Video Editing via Diffusion Models Ruoyu Feng et.al. 2309.16496 null
2023-09-28 Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation Guy Yariv et.al. 2309.16429 link
2023-09-28 Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning Albert Mohwald et.al. 2309.16351 link
2023-09-28 OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions Jin Liu et.al. 2309.16148 null
2023-09-27 Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness Valentin Barriere et.al. 2309.15991 null
2023-09-27 Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation David Junhao Zhang et.al. 2309.15818 link
2023-09-27 Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack Xiaoliang Dai et.al. 2309.15807 null
2023-09-27 Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation Xin Yuan et.al. 2309.15726 null
2023-09-27 Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing Kai Wang et.al. 2309.15664 link
2023-09-27 Position and Orientation-Aware One-Shot Learning for Medical Action Recognition from Signal Data Leiyu Xie et.al. 2309.15635 null
2023-09-28 Jointly Training Large Autoregressive Multimodal Models Emanuele Aiello et.al. 2309.15564 null
2023-09-27 Teaching Text-to-Image Models to Communicate Xiaowen Sun et.al. 2309.15516 null
2023-09-27 DreamCom: Finetuning Text-guided Inpainting Model for Image Composition Lingxiao Lu et.al. 2309.15508 null
2023-09-27 Finite Scalar Quantization: VQ-VAE Made Simple Fabian Mentzer et.al. 2309.15505 link
2023-09-27 LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models Yaohui Wang et.al. 2309.15103 link
2023-09-26 Seimei KOOLS-IFU mapping of the gas and dust distributions in Galactic PNe: Unveiling the origin and evolution of Galactic halo PN H4-1 Masaaki Otsuka et.al. 2309.15099 null
2023-09-26 VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning Han Lin et.al. 2309.15091 null
2023-09-26 Navigating Text-To-Image Customization:From LyCORIS Fine-Tuning to Model Evaluation Shin-Ying Yeh et.al. 2309.14859 link
2023-09-26 On quantifying and improving realism of images generated with diffusion Yunzhuo Chen et.al. 2309.14756 null
2023-09-27 Text-to-Image Generation for Abstract Concepts Jiayi Liao et.al. 2309.14623 null
2023-09-25 Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator Hanzhuo Huang et.al. 2309.14494 link
2023-09-25 Chop & Learn: Recognizing and Generating Object-State Compositions Nirat Saini et.al. 2309.14339 null
2023-09-27 Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation Quang Nguyen et.al. 2309.14303 link
2023-09-25 Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments Najmeh Mohammadbagheri et.al. 2309.14267 null
2023-09-25 SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution Zhongjie Ba et.al. 2309.14122 link
2023-09-25 Diverse Semantic Image Editing with Style Codes Hakan Sivuk et.al. 2309.13975 link
2023-09-23 GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER Mingzhen Sun et.al. 2309.13274 link
2023-09-23 Randomize to Generalize: Domain Randomization for Runway FOD Detection Javaria Farooq et.al. 2309.13264 null
2023-09-23 NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation Rui Yu et.al. 2309.13240 null
2023-09-21 POLAR3D: Augmenting NASA’s POLAR Dataset for Data-Driven Lunar Perception and Rover Simulation Bo-Hsun Chen et.al. 2309.12397 link
2023-09-21 TextCLIP: Text-Guided Face Image Generation And Manipulation Without Adversarial Training Xiaozhou You et.al. 2309.11923 null
2023-09-21 PIE: Simulating Disease Progression via Progressive Image Editing Kaizhao Liang et.al. 2309.11745 link
2023-09-24 Latent Diffusion Models for Structural Component Design Ethan Herron et.al. 2309.11601 null
2023-09-20 Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge Manuel Brack et.al. 2309.11575 null
2023-09-20 FreeU: Free Lunch in Diffusion U-Net Chenyang Si et.al. 2309.11497 link
2023-09-20 Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation Hyelin Nam et.al. 2309.11127 null
2023-09-21 Learning End-to-End Channel Coding with Diffusion Models Muah Kim et.al. 2309.10505 null
2023-09-23 AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration Lijiang Li et.al. 2309.10438 link
2023-09-19 Language Guided Adversarial Purification Himanshu Singh et.al. 2309.10348 link
2023-09-18 Multimodal Foundation Models: From Specialists to General-Purpose Assistants Chunyuan Li et.al. 2309.10020 link
2023-09-18 DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving Xiaofeng Wang et.al. 2309.09777 null
2023-09-18 Gradpaint: Gradient-Guided Inpainting with Diffusion Models Asya Grechka et.al. 2309.09614 null
2023-09-18 DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues Kun Pan et.al. 2309.09526 link
2023-09-18 Progressive Text-to-Image Diffusion with Soft Latent Direction YuTeng Ye et.al. 2309.09466 link
2023-09-15 Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models Feihong He et.al. 2309.08251 null
2023-09-15 Talkin’ ‘Bout AI Generation: Copyright and the Generative-AI Supply Chain Katherine Lee et.al. 2309.08133 null
2023-09-15 Increasing diversity of omni-directional images generated from single image using cGAN based on MLPMixer Atsuya Nakata et.al. 2309.08129 link
2023-09-14 Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset Iya Chivileva et.al. 2309.08009 null
2023-09-14 Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models James Burgess et.al. 2309.07986 link
2023-09-14 ALWOD: Active Learning for Weakly-Supervised Object Detection Yuting Wang et.al. 2309.07914 link
2023-09-13 Unbiased Face Synthesis With Diffusion Models: Are We There Yet? Harrison Rosenberg et.al. 2309.07277 link
2023-09-13 MagiCapture: High-Resolution Multi-Concept Portrait Customization Junha Hyung et.al. 2309.06895 null
2023-09-12 InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation Xingchao Liu et.al. 2309.06380 link
2023-09-12 Elucidating the solution space of extended reverse-time SDE for diffusion models Qinpeng Cui et.al. 2309.06169 link
2023-09-12 Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation Ling Huang et.al. 2309.05919 link
2023-09-11 Divergences in Color Perception between Deep Neural Networks and Humans Ethan O. Nadler et.al. 2309.05809 link
2023-09-11 PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models Li Chen et.al. 2309.05793 null
2023-09-11 ITI-GEN: Inclusive Text-to-Image Generation Cheng Zhang et.al. 2309.05569 link
2023-09-11 PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud Chengyu Wang et.al. 2309.05534 null
2023-09-11 Comprehensive analysis of synthetic learning applied to neonatal brain MRI segmentation R Valabregue et.al. 2309.05306 link
2023-09-10 Gender Bias in Multimodal Models: A Transnational Feminist Approach Considering Geographical Region and Culture Abhishek Mandal et.al. 2309.04997 null
2023-09-09 Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video Xiuzhe Wu et.al. 2309.04814 link
2023-09-08 The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion Yujin Jeong et.al. 2309.04509 null
2023-09-08 Create Your World: Lifelong Text-to-Image Diffusion Gan Sun et.al. 2309.04430 null
2023-09-08 MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers Sijia Li et.al. 2309.04372 null
2023-09-08 Sequential Semantic Generative Communication for Progressive Text-to-Image Generation Hyelin Nam et.al. 2309.04287 null
2023-09-08 Robot Localization and Mapping Final Report – Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry Akankshya Kar et.al. 2309.04147 null
2023-09-08 From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models Changming Xiao et.al. 2309.04109 link
2023-09-07 Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis Jiapeng Zhu et.al. 2309.03904 link
2023-09-07 T2IW: Joint Text to Image & Watermark Generation An-An Liu et.al. 2309.03815 null
2023-09-07 Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model Sungwon Hwang et.al. 2309.03550 null
2023-09-07 Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation Jiaxi Gu et.al. 2309.03549 null
2023-09-07 Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation Zhuqiang Lu et.al. 2309.03467 link
2023-09-06 My Art My Choice: Adversarial Protection Against Unruly AI Anthony Rhodes et.al. 2309.03198 null
2023-09-06 Hierarchical-level rain image generative model based on GAN Zhenyuan Liu et.al. 2309.02964 null
2023-09-06 BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network Takashi Shibuya et.al. 2309.02836 link
2023-09-06 Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter Jinglong Wang et.al. 2309.02773 link
2023-09-05 Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning Lili Yu et.al. 2309.02591 null
2023-09-05 Diffusion on the Probability Simplex Griffin Floto et.al. 2309.02530 null
2023-09-05 Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface Atieh Taheri et.al. 2309.02402 null
2023-09-05 Exchanging-based Multimodal Fusion with Transformer Renyu Zhu et.al. 2309.02190 link
2023-09-04 Uncertainty in AI: Evaluating Deep Neural Networks on Out-of-Distribution Images Jamiu Idowu et.al. 2309.01850 null
2023-09-04 StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation Zhouxia Wang et.al. 2309.01770 null
2023-09-04 Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion Ryota Yoshihashi et.al. 2309.01369 null
2023-09-04 Mutual Information Maximizing Quantum Generative Adversarial Network and Its Applications in Finance Mingyu Lee et.al. 2309.01363 null
2023-09-03 Diffusion Models with Deterministic Normalizing Flow Priors Mohsen Zand et.al. 2309.01274 link
2023-09-03 Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection Weijie Wang et.al. 2309.01104 null
2023-09-02 Constrained CycleGAN for Effective Generation of Ultrasound Sector Images of Improved Spatial Resolution Xiaofei Sun et.al. 2309.00995 link
2023-09-02 Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities Shanyuan Liu et.al. 2309.00952 null
2023-09-01 VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation Xin Li et.al. 2309.00398 null
2023-09-01 DiffuGen: Adaptable Approach for Generating Labeled Image Datasets using Stable Diffusion Models Michael Shenoda et.al. 2309.00248 link
2023-09-01 Diffusion Model with Clustering-based Conditioning for Food Image Generation Yue Han et.al. 2309.00199 null
2023-08-31 StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation Yuhan Wang et.al. 2308.16909 link
2023-08-31 Diffusion Models for Interferometric Satellite Aperture Radar Alexandre Tuel et.al. 2308.16847 link
2023-08-31 Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images Cuican Yu et.al. 2308.16758 null
2023-08-31 Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps Miguel Espinosa et.al. 2308.16648 link
2023-08-31 Detecting Out-of-Context Image-Caption Pairs in News: A Counter-Intuitive Method Eivind Moholdt et.al. 2308.16611 link
2023-08-30 Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation Mengping Yang et.al. 2308.16110 link
2023-08-30 Semantic Image Synthesis via Class-Adaptive Cross-Attention Tomaso Fontanini et.al. 2308.16071 null
2023-08-30 Intriguing Properties of Diffusion Models: A Large-Scale Dataset for Evaluating Natural Attack Capability in Text-to-Image Generative Models Takami Sato et.al. 2308.15692 null
2023-08-29 Learning Modulated Transformation in GANs Ceyuan Yang et.al. 2308.15472 link
2023-08-29 IndGIC: Supervised Action Recognition under Low Illumination Jingbo Zeng et.al. 2308.15345 null
2023-08-29 A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information Shuxiao Ma et.al. 2308.15142 null
2023-08-28 Automated Conversion of Music Videos into Lyric Videos Jiaju Ma et.al. 2308.14922 null
2023-08-28 RobustCLEVR: A Benchmark and Framework for Evaluating Robustness in Object-centric Learning Nathan Drenkow et.al. 2308.14899 null
2023-08-28 Identifying and Mitigating the Security Risks of Generative AI Clark Barrett et.al. 2308.14840 null
2023-08-28 MagicAvatar: Multimodal Avatar Generation and Animation Jianfeng Zhang et.al. 2308.14748 null
2023-08-28 Causality-Based Feature Importance Quantifying Methods:PN-FI, PS-FI and PNS-FI Shuxian Du et.al. 2308.14474 null
2023-08-28 Semi-Supervised Semantic Depth Estimation using Symbiotic Transformer and NearFarMix Augmentation Md Awsafur Rahman et.al. 2308.14400 null
2023-08-28 FaceChain: A Playground for Identity-Preserving Portrait Generation Yang Liu et.al. 2308.14256 link
2023-08-28 HoloFusion: Towards Photo-realistic 3D Generative Modeling Animesh Karnewar et.al. 2308.14244 null
2023-08-27 A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy Forough Fazeli-Asl et.al. 2308.14048 null
2023-08-26 Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models Hao Fei et.al. 2308.13812 null
2023-08-26 ORES: Open-vocabulary Responsible Visual Synthesis Minheng Ni et.al. 2308.13785 link
2023-08-25 Residual Denoising Diffusion Models Jiawei Liu et.al. 2308.13712 link
2023-08-25 Is Deep Learning Network Necessary for Image Generation? Chenqiu Zhao et.al. 2308.13612 null
2023-08-25 WorldSmith: Iterative and Expressive Prompting for World Building with a Generative AI Hai Dang et.al. 2308.13355 null
2023-08-25 Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model Xunpeng Yi et.al. 2308.13164 null
2023-08-25 A Survey of Diffusion Based Image Generation Models: Issues and Their Solutions Tianyi Zhang et.al. 2308.13142 null
2023-08-24 Dense Text-to-Image Generation with Attention Modulation Yunji Kim et.al. 2308.12964 link
2023-08-24 APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency Yupu Yao et.al. 2308.12605 null
2023-08-23 Augmenting medical image classifiers with synthetic data from latent diffusion models Luke W. Sagers et.al. 2308.12453 null
2023-08-23 DISGAN: Wavelet-informed Discriminator Guides GAN to MRI Super-resolution with Noise Cleaning Qi Wang et.al. 2308.12084 link
2023-08-23 Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages Jinyi Hu et.al. 2308.12038 link
2023-08-23 Efficient Transfer Learning in Diffusion Models via Adversarial Noise Xiyu Wang et.al. 2308.11948 null
2023-08-23 LFS-GAN: Lifelong Few-Shot Image Generation Juwon Seo et.al. 2308.11917 link
2023-08-23 CoC-GAN: Employing Context Cluster for Unveiling a New Pathway in Image Generation Zihao Wang et.al. 2308.11857 null
2023-08-22 Ceci n’est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings Eugene Bagdasaryan et.al. 2308.11804 link
2023-08-22 StoryBench: A Multifaceted Benchmark for Continuous Story Visualization Emanuele Bugliarello et.al. 2308.11606 link
2023-08-22 Open Set Synthetic Image Source Attribution Shengbang Fang et.al. 2308.11557 null
2023-08-22 Hamiltonian GAN Christine Allen-Blanchette et.al. 2308.11216 null
2023-08-22 MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers Daniel Silver et.al. 2308.11096 null
2023-08-21 Debiasing Counterfactuals In the Presence of Spurious Correlations Amar Kumar et.al. 2308.10984 null
2023-08-21 Sampling From Autoencoders’ Latent Space via Quantization And Probability Mass Function Concepts Aymene Mohammed Bouayed et.al. 2308.10704 null
2023-08-20 Turning Waste into Wealth: Leveraging Low-Quality Samples for Enhancing Continuous Conditional Generative Adversarial Networks Xin Ding et.al. 2308.10273 link
2023-08-20 StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data Yanda Li et.al. 2308.10253 link
2023-08-20 Spiking-Diffusion: Vector Quantized Discrete Diffusion Model with Spiking Neural Networks Mingxuan Liu et.al. 2308.10187 link
2023-08-20 SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation Chengyou Jia et.al. 2308.10156 null
2023-08-19 ASPIRE: Language-Guided Augmentation for Robust Image Classification Sreyan Ghosh et.al. 2308.10103 link
2023-08-19 ControlCom: Controllable Image Composition using Diffusion Model Bo Zhang et.al. 2308.10040 link
2023-08-19 ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval Kaihang Pan et.al. 2308.10025 null
2023-08-19 DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization Xiaoyu Ye et.al. 2308.09889 null
2023-08-18 Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis Shijie Li et.al. 2308.09835 link
2023-08-18 SimDA: Simple Diffusion Adapter for Efficient Video Generation Zhen Xing et.al. 2308.09710 null
2023-08-18 Guide3D: Create 3D Avatars from Text and Image Guidance Yukang Cao et.al. 2308.09705 null
2023-08-18 DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability Runhui Huang et.al. 2308.09306 null
2023-08-18 RFDforFin: Robust Deep Forgery Detection for GAN-generated Fingerprint Images Hui Miao et.al. 2308.09285 null
2023-08-17 Watch Your Steps: Local Image and Scene Editing by Text Instructions Ashkan Mirzaei et.al. 2308.08947 null
2023-08-16 Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment Qi Chen et.al. 2308.08525 link
2023-08-16 Painter: Teaching Auto-regressive Language Models to Draw Sketches Reza Pourreza et.al. 2308.08520 null
2023-08-16 Diff-CAPTCHA: An Image-based CAPTCHA with Security Enhanced by Denoising Diffusion Model Ran Jiang et.al. 2308.08367 null
2023-08-16 Denoising Diffusion Probabilistic Model for Retinal Image Generation and Segmentation Alnur Alimanov et.al. 2308.08339 link
2023-08-18 Dual-Stream Diffusion Net for Text-to-Video Generation Binhui Liu et.al. 2308.08316 null
2023-08-16 Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis Minho Park et.al. 2308.08157 link
2023-08-16 DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory Shengming Yin et.al. 2308.08089 null
2023-08-15 Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training Ximing Xing et.al. 2308.07665 link
2023-08-15 Story Visualization by Online Text Augmentation with Context Memory Daechul Ahn et.al. 2308.07575 link
2023-08-13 Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks David Junhao Zhang et.al. 2308.06739 null
2023-08-13 IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models Hu Ye et.al. 2308.06721 null
2023-08-13 LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts Binbin Yang et.al. 2308.06713 null
2023-08-12 Semantic Communications with Explicit Semantic Base for Image Transmission Yuan Zheng et.al. 2308.06599 null
2023-08-11 White-box Membership Inference Attacks against Diffusion Models Yan Pang et.al. 2308.06405 null
2023-08-15 DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity Melissa Hall et.al. 2308.06198 link
2023-08-11 Improving Joint Speech-Text Representations Without Alignment Cal Peyser et.al. 2308.06125 null
2023-08-11 Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation Yuki Endo et.al. 2308.06027 link
2023-08-10 SAR Target Image Generation Method Using Azimuth-Controllable Generative Adversarial Network Chenwei Wang et.al. 2308.05489 null
2023-08-10 Beyond Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization Hongyang Du et.al. 2308.05384 link
2023-08-09 PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions John Joon Young Chung et.al. 2308.05184 link
2023-08-12 LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation Leigang Qu et.al. 2308.05095 null
2023-08-13 TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design Yifan Gao et.al. 2308.04733 null
2023-08-09 GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization Hao Fang et.al. 2308.04699 link
2023-08-08 DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images Xuechao Zou et.al. 2308.04417 link
2023-08-08 The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings Timothy Merino et.al. 2308.04052 link
2023-08-05 DermoSegDiff: A Boundary-aware Segmentation Diffusion Model for Skin Lesion Delineation Afshin Bozorgpour et.al. 2308.02959 link
2023-08-05 Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation Zijie Wu et.al. 2308.02874 null
2023-08-03 ConceptLab: Creative Generation using Diffusion Prior Constraints Elad Richardson et.al. 2308.02669 link
2023-08-04 Towards Personalized Prompt-Model Retrieval for Generative Recommendation Yuanhe Guo et.al. 2308.02205 link
2023-08-04 SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation Shikun Sun et.al. 2308.02154 null
2023-08-03 Focus on Content not Noise: Improving Image Generation for Nuclei Segmentation by Suppressing Steganography in CycleGAN Jonas Utz et.al. 2308.01769 null
2023-08-07 BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout Kairui Yang et.al. 2308.01661 null
2023-08-03 Interleaving GANs with knowledge graphs to support design creativity for book covers Alexandru Motogna et.al. 2308.01626 link
2023-08-03 Circumventing Concept Erasure Methods For Text-to-Image Generative Models Minh Pham et.al. 2308.01508 link
2023-08-02 Reverse Stable Diffusion: What prompt was used to generate this image? Florinel-Alin Croitoru et.al. 2308.01472 link
2023-08-02 Revisiting DETR Pre-training for Object Detection Yan Ma et.al. 2308.01300 null
2023-08-02 Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation Guojin Zhong et.al. 2308.01147 link
2023-08-01 The Bias Amplification Paradox in Text-to-Image Generation Preethi Seshadri et.al. 2308.00755 link
2023-08-01 Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models Cheng-Yu Hsieh et.al. 2308.00675 null
2023-08-01 Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design Jaechang Ko et.al. 2308.00227 null
2023-08-01 SkullGAN: Synthetic Skull CT Generation with Generative Adversarial Networks Kasra Naftchi-Ardebili et.al. 2308.00206 link
2023-07-28 Testing the Depth of ChatGPT’s Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5’s Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking David Bayani et.al. 2307.16806 null
2023-07-31 DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation Runyang Feng et.al. 2307.16687 null
2023-07-31 Towards General Visual-Linguistic Face Forgery Detection Ke Sun et.al. 2307.16545 null
2023-07-31 BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models Jordan Vice et.al. 2307.16489 link
2023-07-31 HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution Minyi Zhao et.al. 2307.16410 null
2023-07-31 MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text Junchen Zhu et.al. 2307.16371 null
2023-07-30 Mask-guided Data Augmentation for Multiparametric MRI Generation with a Rare Hepatocellular Carcinoma Karen Sanchez et.al. 2307.16314 null
2023-07-30 Stylized Projected GAN: A Novel Architecture for Fast and Realistic Image Generation Md Nurul Muttakin et.al. 2307.16275 null
2023-07-29 HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation Zuyan Liu et.al. 2307.16061 null
2023-07-28 Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search Alexander Chebykin et.al. 2307.15621 link
2023-07-28 RAWIW: RAW Image Watermarking Robust to ISP Pipeline Kang Fu et.al. 2307.15443 null
2023-07-28 Staging E-Commerce Products for Online Advertising using Retrieval Assisted Image Generation Yueh-Ning Ku et.al. 2307.15326 null
2023-07-27 Semantic Image Completion and Enhancement using GANs Priyansh Saxena et.al. 2307.14748 null
2023-07-31 Pre-training Vision Transformers with Very Limited Synthesized Images Ryo Nakamura et.al. 2307.14710 link
2023-07-27 LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement Tao Wang et.al. 2307.14659 link
2023-07-27 EqGAN: Feature Equalization Fusion for Few-shot Image Generation Yingbo Zhou et.al. 2307.14638 null
2023-07-26 Deepfake Image Generation for Improved Brain Tumor Segmentation Roa’a Al-Emaryeen et.al. 2307.14273 null
2023-07-26 Learning Disentangled Discrete Representations David Friede et.al. 2307.14151 link
2023-07-26 VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet Zhihao Hu et.al. 2307.14073 null
2023-07-25 Composite Diffusion whole >= Σparts Vikram Jamwal et.al. 2307.13720
2023-07-25 Fake It Without Making It: Conditioned Face Generation for Accurate 3D Face Shape Estimation Will Rowan et.al. 2307.13639 null
2023-07-25 XDLM: Cross-lingual Diffusion Language Model for Machine Translation Linyao Chen et.al. 2307.13560 null
2023-07-25 Not with my name! Inferring artists’ names of input strings employed by Diffusion Models Roberto Leotta et.al. 2307.13527 link
2023-07-24 A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models Jindong Gu et.al. 2307.12980 link
2023-07-24 Interpolating between Images with Diffusion Models Clinton J. Wang et.al. 2307.12560 null
2023-07-22 Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis Hao Tang et.al. 2307.12084 link
2023-07-21 PartDiff: Image Super-resolution with Partial Diffusion Models Kai Zhao et.al. 2307.11926 null
2023-07-21 UWAT-GAN: Fundus Fluorescein Angiography Synthesis via Ultra-wide-angle Transformation Multi-scale GAN Zhaojie Fang et.al. 2307.11530 link
2023-07-21 Attention Consistency Refined Masked Frequency Forgery Representation for Generalizing Face Forgery Detection Decheng Liu et.al. 2307.11438 link
2023-07-21 Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning Jian Ma et.al. 2307.11410 link
2023-07-20 Diffusion Sampling with Momentum for Mitigating Divergence Artifacts Suttisak Wizadwongsa et.al. 2307.11118 link
2023-07-20 Progressive distillation diffusion for raw music generation Svetlana Pavlova et.al. 2307.10994 null
2023-07-20 Divide & Bind Your Attention for Improved Generative Semantic Nursing Yumeng Li et.al. 2307.10864 link
2023-07-20 Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation Fa-Ting Hong et.al. 2307.09906 link
2023-07-19 Compressive Image Scanning Microscope Ajay Gunalan et.al. 2307.09841 link
2023-07-19 A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images Lydia Abady et.al. 2307.09822 link
2023-07-19 Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline Zhigang Chang et.al. 2307.09821 null
2023-07-19 Text2Layer: Layered Image Generation using Latent Diffusion Model Xinyang Zhang et.al. 2307.09781 null
2023-07-18 AnyDoor: Zero-shot Object-level Image Customization Xi Chen et.al. 2307.09481 link
2023-07-19 Let’s ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation Federico Betti et.al. 2307.09416 null
2023-07-18 Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow Dogucan Yaman et.al. 2307.09368 null
2023-07-18 Augmenting CLIP with Improved Visio-Linguistic Reasoning Samyadeep Basu et.al. 2307.09233 null
2023-07-18 Jean-Luc Picard at Touché 2023: Comparing Image Generation, Stance Detection and Feature Matching for Image Retrieval for Arguments Max Moebius et.al. 2307.09172 null
2023-07-18 Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond Yang Zhao et.al. 2307.08996 null
2023-07-18 PromptCrafter: Crafting Text-to-Image Prompt through Mixed-Initiative Dialogue with LLM Seungho Baek et.al. 2307.08985 null
2023-07-17 Harnessing the Power of AI based Image Generation Model DALLE 2 in Agricultural Settings Ranjan Sapkota et.al. 2307.08789 null
2023-07-17 Diffusion Models Beat GANs on Image Classification Soumik Mukhopadhyay et.al. 2307.08702 null
2023-07-17 Flow Matching in Latent Space Quan Dao et.al. 2307.08698 link
2023-07-17 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning Tri Dao et.al. 2307.08691 link
2023-07-17 Image Captions are Natural Prompts for Text-to-Image Models Shiye Lei et.al. 2307.08526 null
2023-07-17 Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data Kai Katsumata et.al. 2307.08319 null
2023-07-17 Manifold-Guided Sampling in Diffusion Models for Unbiased Image Generation Xingzhe Su et.al. 2307.08199 null
2023-07-16 Planting a SEED of Vision in Large Language Model Yuying Ge et.al. 2307.08041 link
2023-07-15 Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning? Jialu Gao et.al. 2307.07837 null
2023-07-18 Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer Wing-Yin Yu et.al. 2307.07754 link
2023-07-14 GenAssist: Making Image Generation Accessible Mina Huh et.al. 2307.07589 null
2023-07-14 Generative adversarial networks for data-scarce spectral applications Juan José García-Esteban et.al. 2307.07454 null
2023-07-13 InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Yi Wang et.al. 2307.06942 link
2023-07-13 Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation Yingqing He et.al. 2307.06940 link
2023-07-13 Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models Moab Arar et.al. 2307.06925 null
2023-07-13 Improving Nonalcoholic Fatty Liver Disease Classification Performance With Latent Diffusion Models Romain Hardy et.al. 2307.06507 null
2023-07-12 T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation Kaiyi Huang et.al. 2307.06350 link
2023-07-12 Facial Reenactment Through a Personalized Generator Ariel Elazary et.al. 2307.06307 null
2023-07-12 CellGAN: Conditional Cervical Cell Synthesis for Augmenting Cytopathological Image Classification Zhenrong Shen et.al. 2307.06182 link
2023-07-12 Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models Sanghyun Kim et.al. 2307.05977 link
2023-07-12 DiffuseGAE: Controllable and High-fidelity Image Manipulation from Disentangled Representation Yipeng Leng et.al. 2307.05899 null
2023-07-12 Precise Image Generation on Current Noisy Quantum Computing Devices Florian Rehm et.al. 2307.05253 null
2023-07-11 Generative Pretraining in Multimodality Quan Sun et.al. 2307.05222 link
2023-07-11 TIAM – A Metric for Evaluating Alignment in Text-to-Image Generation Paul Grimal et.al. 2307.05134 link
2023-07-11 SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation Zhengxin Lei et.al. 2307.05087 null
2023-07-11 Diffusion idea exploration for art generation Nikhil Verma et.al. 2307.04978 null
2023-07-10 Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback Jaskirat Singh et.al. 2307.04749 null
2023-07-11 DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer Dan Ruta et.al. 2307.04157 null
2023-07-09 Score-based Conditional Generation with Fewer Labeled Data by Self-calibrating Classifier Guidance Paul Kuo-Ming Huang et.al. 2307.04081 null
2023-07-08 Measuring the Success of Diffusion Models at Imitating Human Artists Stephen Casper et.al. 2307.04028 null
2023-07-08 HUMS2023 Data Challenge Result Submission Dhiraj Neupane et.al. 2307.03871 null
2023-07-07 Synthesizing Forestry Images Conditioned on Plant Phenotype Using a Generative Adversarial Network Debasmita Pal et.al. 2307.03789 link
2023-07-07 RGB-D Mapping and Tracking in a Plenoxel Radiance Field Andreas L. Teigen et.al. 2307.03404 link
2023-07-06 VideoGLUE: Video General Understanding Evaluation of Foundation Models Liangzhe Yuan et.al. 2307.03166 link
2023-07-06 On the Cultural Gap in Text-to-Image Generation Bingshuai Liu et.al. 2307.02971 null
2023-07-06 Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback TaeHo Yoon et.al. 2307.02770 link
2023-07-05 Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation Sébastien Lachapelle et.al. 2307.02598 link
2023-07-05 Diffusion Models for Computational Design at the Example of Floor Plans Joern Ploennigs et.al. 2307.02511 link
2023-07-05 Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality Peter Lorenz et.al. 2307.02347 link
2023-07-05 On the Adversarial Robustness of Generative Autoencoders in the Latent Space Mingfei Lu et.al. 2307.02202 null
2023-07-05 Prompting Diffusion Representations for Cross-Domain Semantic Segmentation Rui Gong et.al. 2307.02138 null
2023-07-04 SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Dustin Podell et.al. 2307.01952 link
2023-07-04 A Synthetic Electrocardiogram (ECG) Image Generation Toolbox to Facilitate Deep Learning-Based Scanned ECG Digitization Kshama Kodthalu Shivashankara et.al. 2307.01946 link
2023-07-04 Text + Sketch: Image Compression at Ultra Low Rates Eric Lei et.al. 2307.01944 link
2023-07-04 Generative Artificial Intelligence Consensus in a Trustless Network Edward Kim et.al. 2307.01898 null
2023-07-04 Training Energy-Based Models with Diffusion Contrastive Divergences Weijian Luo et.al. 2307.01668 null
2023-07-04 AdAM: Few-Shot Image Generation via Adaptation-Aware Kernel Modulation Yunqing Zhao et.al. 2307.01465 null
2023-07-03 Squeezing Large-Scale Diffusion Models for Mobile Jiwoong Choi et.al. 2307.01193 null
2023-07-03 MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion Shitao Tang et.al. 2307.01097 link
2023-07-03 DifFSS: Diffusion Model for Few-Shot Semantic Segmentation Weimin Tan et.al. 2307.00773 link
2023-07-02 LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance Linoy Tsaban et.al. 2307.00522 null
2023-07-01 DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation Zhuowei Chen et.al. 2307.00300 null
2023-07-01 AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence Jiarui Wang et.al. 2307.00211 link
2023-06-30 Stay on topic with Classifier-Free Guidance Guillaume Sanchez et.al. 2306.17806 null
2023-06-30 Practical and Asymptotically Exact Conditional Sampling in Diffusion Models Luhuan Wu et.al. 2306.17775 link
2023-06-30 Counting Guidance for High Fidelity Text-to-Image Synthesis Wonjun Kang et.al. 2306.17567 null
2023-06-30 Class-Incremental Learning using Diffusion Model for Distillation and Replay Quentin Jodelet et.al. 2306.17560 null
2023-06-30 DreamDiffusion: Generating High-Quality Images from Brain EEG Signals Yunpeng Bai et.al. 2306.16934 link
2023-06-29 CLIPAG: Towards Generator-Free Text-to-Image Generation Roy Ganz et.al. 2306.16805 null
2023-06-28 Dynamic Path-Controllable Deep Unfolding Network for Compressive Sensing Jiechong Song et.al. 2306.16060 link
2023-06-27 Semi-supervised Multimodal Representation Learning through a Global Workspace Benjamin Devillers et.al. 2306.15711 link
2023-06-26 A Simple and Effective Baseline for Attentional Generative Adversarial Networks Mingyu Jin et.al. 2306.14708 link
2023-06-26 Localized Text-to-Image Generation for Free via Cross Attention Control Yutong He et.al. 2306.14636 null
2023-06-26 A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis Aishwarya Agarwal et.al. 2306.14544 null
2023-06-26 Progressive Energy-Based Cooperative Learning for Multi-Domain Image-to-Image Translation Weinan Song et.al. 2306.14448 null
2023-06-26 Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models Luozhou Wang et.al. 2306.14408 link
2023-06-25 DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data Jingyuan Zhu et.al. 2306.14153 null
2023-06-24 UAlberta at SemEval-2023 Task 1: Context Augmentation and Translation for Multilingual Visual Word Sense Disambiguation Michael Ogezi et.al. 2306.14067 link
2023-06-23 Zero-shot spatial layout conditioning for text-to-image diffusion models Guillaume Couairon et.al. 2306.13754 null

enhancement & editing

Publish Date Title Authors PDF Code
2025-06-26 Wild refitting for black box prediction Martin J. Wainwright et.al. 2506.21460 null
2025-06-26 Controllable 3D Placement of Objects with Scene-Aware Diffusion Models Mohamed Omran et.al. 2506.21446 null
2025-06-26 Learning to See in the Extremely Dark Hai Jiang et.al. 2506.21132 null
2025-06-26 Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling Hansam Cho et.al. 2506.21045 null
2025-06-26 DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing Lingling Cai et.al. 2506.20967 null
2025-06-26 M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization Ju-Hyeon Nam et.al. 2506.20922 null
2025-06-26 FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing Advait Gupta et.al. 2506.20911 null
2025-06-25 EditP23: 3D Editing via Propagation of Image Prompts to Multi-View Roi Bar-On et.al. 2506.20652 null
2025-06-25 TDiR: Transformer based Diffusion for Image Restoration Tasks Abbas Anwar et.al. 2506.20302 null
2025-06-25 Towards Efficient Exemplar Based Image Editing with Multimodal VLMs Avadhoot Jadhav et.al. 2506.20155 null
2025-06-24 A Comparative Study of NAFNet Baselines for Image Restoration Vladislav Esaulov et.al. 2506.19845 null
2025-06-24 SceneCrafter: Controllable Multi-View Driving Scene Editing Zehao Zhu et.al. 2506.19488 null
2025-06-24 NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs Khuram Naveed et.al. 2506.19387 null
2025-06-23 Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models Ilia Beletskii et.al. 2506.19103 null
2025-06-23 Let Your Video Listen to Your Music! Xinyu Zhang et.al. 2506.18881 null
2025-06-25 OmniGen2: Exploration to Advanced Multimodal Generation Chenyuan Wu et.al. 2506.18871 null
2025-06-23 Enhancing Image Restoration Transformer via Adaptive Translation Equivariance JiaKui Hu et.al. 2506.18520 null
2025-06-23 CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing Dinh-Khoi Vo et.al. 2506.18438 null
2025-06-23 BSMamba: Brightness and Semantic Modeling for Long-Range Interaction in Low-Light Image Enhancement Tongshun Zhang et.al. 2506.18346 null
2025-06-23 A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement Muhammad Azeem Aslam et.al. 2506.18323 null
2025-06-23 Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction Han Zhang et.al. 2506.18290 null
2025-06-22 CmFNet: Cross-modal Fusion Network for Weakly-supervised Segmentation of Medical Images Dongdong Meng et.al. 2506.18042 null
2025-06-20 Reversing Flow for Image Restoration Haina Qin et.al. 2506.16961 null
2025-06-20 Visual-Instructed Degradation Diffusion for All-in-One Image Restoration Wenyang Luo et.al. 2506.16960 link
2025-06-20 FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation Fan Yang et.al. 2506.16806 null
2025-06-20 Temperature calibration of surface emissivities with an improved thermal image enhancement network Ning Chu et.al. 2506.16803 null
2025-06-23 RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought Junbo Qiao et.al. 2506.16796 null
2025-06-19 Arch-Router: Aligning LLM Routing with Human Preferences Co Tran et.al. 2506.16655 null
2025-06-19 Integrating Generative Adversarial Networks and Convolutional Neural Networks for Enhanced Traffic Accidents Detection and Analysis Zhenghao Xi et.al. 2506.16186 null
2025-06-19 MoiréXNet: Adaptive Multi-Scale Demoiréing with Linear Attention Test-Time Training and Truncated Flow Matching Prior Liangyan Li et.al. 2506.15929 null
2025-06-18 VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics Josef Kuchař et.al. 2506.15903 null
2025-06-17 Optimization-Based Image Restoration under Implementation Constraints in Optical Analog Circuits Taisei Kato et.al. 2506.14624 null
2025-06-17 Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching Giacomo Meanti et.al. 2506.14605 link
2025-06-17 Exploring Diffusion with Test-Time Training on Efficient Image Restoration Rongchang Lu et.al. 2506.14541 null
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Nikos Spyrou et.al. 2506.14404 null
2025-06-18 DREAM: On hallucinations in AI-generated content for nuclear medicine imaging Menghua Xia et.al. 2506.13995 null
2025-06-15 Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing Zhuoying Li et.al. 2506.13827 null
2025-06-16 Exploiting the Exact Denoising Posterior Score in Training-Free Guidance of Diffusion Models Gregory Bellchambers et.al. 2506.13614 null
2025-06-16 AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing Biao Yang et.al. 2506.13301 null
2025-06-15 ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies Chenglin Wang et.al. 2506.12830 null
2025-06-15 Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution Hang Xu et.al. 2506.12738 null
2025-06-14 An Iterative PDE Based Illumination Restoration Scheme for Image Enhancement Dragos-Patru Covei et.al. 2506.12560 null
2025-06-14 Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts Saemee Choi et.al. 2506.12520 null
2025-06-14 UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers Yuantao Wang et.al. 2506.12324 null
2025-06-13 SphereDrag: Spherical Geometry-Aware Panoramic Image Editing Zhiao Feng et.al. 2506.11863 null
2025-06-12 VINCIE: Unlocking In-context Image Editing from Video Leigang Qu et.al. 2506.10941 null
2025-06-12 Edit360: 2D Image Edits to 3D Assets from Any Angle Junchao Huang et.al. 2506.10507 null
2025-06-11 LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Chenjian Gao et.al. 2506.10082 null
2025-06-11 Text-Aware Image Restoration with Diffusion Models Jaewon Min et.al. 2506.09993 null
2025-06-11 EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits Ron Yosef et.al. 2506.09988 null
2025-06-11 ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models Qin Zhou et.al. 2506.09740 null
2025-06-11 Ming-Omni: A Unified Multimodal Model for Perception and Generation Inclusion AI et.al. 2506.09344 link
2025-06-11 Fine-Grained Spatially Varying Material Selection in Images Julia Guerrero-Viu et.al. 2506.09023 null
2025-06-10 Do Concept Replacement Techniques Really Erase Unacceptable Concepts? Anudeep Das et.al. 2506.08991 null
2025-06-10 RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping Yang Bai et.al. 2506.08632 null
2025-06-09 Highly Compressed Tokenizer Can Generate Without Training L. Lao Beyer et.al. 2506.08257 link
2025-06-09 PairEdit: Learning Semantic Variations for Exemplar-based Image Editing Haoguang Lu et.al. 2506.07992 link
2025-06-09 Diffusion Counterfactual Generation with Semantic Abduction Rajat Rasal et.al. 2506.07883 link
2025-06-09 M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration Yongzhen Wang et.al. 2506.07814 null
2025-06-09 Consistent Video Editing as Flow-Driven Image-to-Video Generation Ge Wang et.al. 2506.07713 null
2025-06-09 DragNeXt: Rethinking Drag-Based Image Editing Yuan Zhou et.al. 2506.07611 null
2025-06-09 Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding Boyu Chen et.al. 2506.07576 null
2025-06-08 Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI Aditya Chakravarty et.al. 2506.07286 null
2025-06-08 TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation Min-Jung Kim et.al. 2506.07205 null
2025-06-08 A PDE-Based Image Restoration Method: Mathematical Analysis and Implementation Dragos-Patru Covei et.al. 2506.07132 null
2025-06-06 A Deep Learning Approach for Facial Attribute Manipulation and Reconstruction in Surveillance and Reconnaissance Anees Nashath Shaik et.al. 2506.06578 null
2025-06-06 Bidirectional Image-Event Guided Low-Light Image Enhancement Zhanwen Liu et.al. 2506.06120 null
2025-06-06 Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models Yifu Qiu et.al. 2506.06006 link
2025-06-06 FADE: Frequency-Aware Diffusion Model Factorization for Video Editing Yixuan Zhu et.al. 2506.05934 link
2025-06-06 NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces Pierluigi Zama Ramirez et.al. 2506.05815 null
2025-06-05 UniRes: Universal Image Restoration for Complex Degradations Mo Zhou et.al. 2506.05599 null
2025-06-05 OpenRR-5k: A Large-Scale Benchmark for Reflection Removal in the Wild Jie Cai et.al. 2506.05482 null
2025-06-05 Towards Reliable Identification of Diffusion-based Image Manipulations Alex Costanzino et.al. 2506.05466 null
2025-06-05 Degradation-Aware Image Enhancement via Vision-Language Classification Jie Cai et.al. 2506.05450 null
2025-06-05 SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training Jianyi Wang et.al. 2506.05301 null
2025-06-06 SeedEdit 3.0: Fast and High-Quality Generative Image Editing Peng Wang et.al. 2506.05083 null
2025-06-05 FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing Guangzhao Li et.al. 2506.05046 null
2025-06-05 Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking Yu-Feng Chen et.al. 2506.04879 null
2025-06-05 Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement Niki Martinel et.al. 2506.04753 null
2025-06-04 A Poisson-Guided Decomposition Network for Extreme Low-Light Image Enhancement Isha Rao et.al. 2506.04470 null
2025-06-04 HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation Hermann Kumbong et.al. 2506.04421 null
2025-06-04 Is Perturbation-Based Image Protection Disruptive to Image Editing? Qiuyu Tang et.al. 2506.04394 null
2025-06-04 UNIC: Unified In-Context Video Editing Zixuan Ye et.al. 2506.04216 null
2025-06-05 FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers Xuanhua He et.al. 2506.04213 null
2025-06-04 Image Editing As Programs with Diffusion Models Yujia Hu et.al. 2506.04158 null
2025-06-04 Joint Video Enhancement with Deblurring, Super-Resolution, and Frame Interpolation Network Giyong Choi et.al. 2506.03892 null
2025-06-03 RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions Bimsara Pathiraja et.al. 2506.03448 null
2025-06-04 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Bin Lin et.al. 2506.03147 null
2025-06-03 ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions Di Chang et.al. 2506.03107 null
2025-06-03 NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results Xiaohong Liu et.al. 2506.02875 null
2025-06-03 ControlMambaIR: Conditional Controls with State-Space Model for Image Restoration Cheng Yang et.al. 2506.02633 null
2025-06-03 DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing Zixiang Li et.al. 2506.02560 null
2025-06-03 RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers Yan Gong et.al. 2506.02528 null
2025-06-04 NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution Marcos V. Conde et.al. 2506.02197 null
2025-06-02 IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout Fei Shen et.al. 2506.01949 null
2025-06-02 RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report Marcos V. Conde et.al. 2506.01947 null
2025-06-04 MedEBench: Revisiting Text-instructed Image Editing on Medical Domain Minghao Liu et.al. 2506.01921 null
2025-05-30 MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Bojia Zi et.al. 2505.24873 null
2025-05-30 RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement Raman Jha et.al. 2505.24705 link
2025-05-30 IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models Hanting Wang et.al. 2505.24406 link
2025-05-30 Boosting All-in-One Image Restoration via Self-Improved Privilege Learning Gang Wu et.al. 2505.24207 link
2025-05-29 Cora: Correspondence-aware image editing using few step diffusion Amirhossein Almohammadi et.al. 2505.23907 null
2025-05-29 LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers Yusuf Dalva et.al. 2505.23758 null
2025-05-29 Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features Ziyong Wang et.al. 2505.23586 null
2025-05-29 Video Editing for Audio-Visual Dubbing Binyamin Manela et.al. 2505.23406 link
2025-05-29 Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging Ping Wang et.al. 2505.23180 link
2025-05-29 FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing Jeongsol Kim et.al. 2505.23145 link
2025-05-29 Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing Tongtong Su et.al. 2505.23134 link
2025-05-29 CURVE: CLIP-Utilized Reinforcement Learning for Visual Image Enhancement via Simple Image Processing Yuka Ogino et.al. 2505.23102 null
2025-05-29 URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration Rui Xu et.al. 2505.23068 link
2025-05-29 Vision-Based Assistive Technologies for People with Cerebral Visual Impairment: A Review and Focus Study Bhanuka Gamage et.al. 2505.22983 null
2025-05-29 EquiReg: Equivariance Regularized Diffusion for Inverse Problems Bahareh Tolooshams et.al. 2505.22973 null
2025-05-28 From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration Junyu Fan et.al. 2505.22284 null
2025-05-28 GL-PGENet: A Parameterized Generation Framework for Robust Document Image Enhancement Zhihong Tang et.al. 2505.22021 null
2025-05-28 Reference-Guided Identity Preserving Face Restoration Mo Zhou et.al. 2505.21905 null
2025-05-28 Broadening Our View: Assistive Technology for Cerebral Visual Impairment Bhanuka Gamage et.al. 2505.21875 null
2025-05-27 BaryIR: Learning Multi-Source Unified Representation in Continuous Barycenter Space for Generalizable All-in-One Image Restoration Xiaole Tang et.al. 2505.21637 null
2025-05-27 Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion Yang Yang et.al. 2505.21593 null
2025-05-27 Imago Obscura: An Image Privacy AI Co-pilot to Enable Identification and Mitigation of Risks Kyzyl Monteiro et.al. 2505.20916 null
2025-05-28 See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction Yuan Wu et.al. 2505.20641 link
2025-05-27 InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling Xiaoxiao Jiang et.al. 2505.20600 null
2025-05-28 PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy Shuhao Guan et.al. 2505.20429 null
2025-05-26 What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models Lorenzo Baraldi et.al. 2505.20405 null
2025-05-26 ImgEdit: A Unified Image Editing Dataset and Benchmark Yang Ye et.al. 2505.20275 link
2025-05-26 Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement Afrah Shaahid et.al. 2505.19895 null
2025-05-26 StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation Yi Wu et.al. 2505.19874 null
2025-05-26 A Regularization-Guided Equivariant Approach for Image Restoration Yulu Bai et.al. 2505.19799 link
2025-05-26 TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs Juntong Wang et.al. 2505.19535 null
2025-05-25 Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions Chenrui Ma et.al. 2505.19352 null
2025-05-25 Improving Novel view synthesis of 360 $^\circ$ Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images Guangan Chen et.al. 2505.19264 link
2025-05-25 Benchmarking Laparoscopic Surgical Image Restoration and Beyond Jialun Pei et.al. 2505.19161 link
2025-05-25 SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation Shenggan Cheng et.al. 2505.19151 null
2025-05-25 MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection Shuyu Wang et.al. 2505.19149 null
2025-05-25 Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition Xiaoyang Liu et.al. 2505.19120 link
2025-05-23 RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration Sudarshan Rajagopalan et.al. 2505.18047 null
2025-05-23 DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval Yuxin Yang et.al. 2505.17796 null
2025-05-23 R-Genie: Reasoning-Guided Generative Image Editing Dong Zhang et.al. 2505.17768 null
2025-05-23 MODEM: A Morton-Order Degradation Estimation Mechanism for Adverse Weather Image Recovery Hainuo Wang et.al. 2505.17581 link
2025-05-23 Dual Ascent Diffusion for Inverse Problems Minseo Kim et.al. 2505.17353 null
2025-05-22 Forward-only Diffusion Probabilistic Models Ziwei Luo et.al. 2505.16733 link
2025-05-22 KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Yongliang Wu et.al. 2505.16707 null
2025-05-22 Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration Yuetong Liu et.al. 2505.16479 null
2025-05-22 NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment Shuhao Han et.al. 2505.16314 null
2025-05-26 Understanding Generative AI Capabilities in Everyday Image Editing Tasks Mohammad Reza Taesiri et.al. 2505.16181 null
2025-05-22 Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey Liyan Wang et.al. 2505.16161 link
2025-05-22 Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention Yuang Ai et.al. 2505.16157 null
2025-05-21 FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models Zhen Sun et.al. 2505.15644 link
2025-05-22 Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives Yisi Luo et.al. 2505.15222 link
2025-05-21 AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection Zhipei Xu et.al. 2505.15173 null
2025-05-20 UHD Image Dehazing via anDehazeFormer with Atmospheric-aware KV Cache Pu Wang et.al. 2505.14010 null
2025-05-19 Adaptive Image Restoration for Video Surveillance: A Real-Time Approach Muhammad Awais Amin et.al. 2505.13130 null
2025-05-19 LatentINDIGO: An INN-Guided Latent Diffusion Algorithm for Image Restoration Di You et.al. 2505.12935 null
2025-05-19 Towards a Universal Image Degradation Model via Content-Degradation Disentanglement Wenbo Yang et.al. 2505.12860 null
2025-05-19 Degradation-Aware Feature Perturbation for All-in-One Image Restoration Xiangpeng Tian et.al. 2505.12630 link
2025-05-20 DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model Siwei Xia et.al. 2505.12427 link
2025-05-18 Trustworthy Image Super-Resolution via Generative Pseudoinverse Andreas Floros et.al. 2505.12375 link
2025-05-18 PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement ZhanFeng Feng et.al. 2505.12266 link
2025-05-18 From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations Yuzhi Li et.al. 2505.12237 null
2025-05-20 CompBench: Benchmarking Complex Instruction-guided Image Editing Bohan Jia et.al. 2505.12200 null
2025-05-16 X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models Valentina Bazyleva et.al. 2505.11753 null
2025-05-16 GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing Yusu Qian et.al. 2505.11493 null
2025-05-16 Diff-Unfolding: A Model-Based Score Learning Framework for Inverse Problems Yuanhao Wang et.al. 2505.11393 null
2025-05-16 Entropy-Driven Genetic Optimization for Deep-Feature-Guided Low-Light Image Enhancement Nirjhor Datta et.al. 2505.11246 link
2025-05-15 torchmfbd: a flexible multi-object multi-frame blind deconvolution code A. Asensio Ramos et.al. 2505.10639 link
2025-05-19 Super-Resolution Generative Adversarial Networks based Video Enhancement Kağan ÇETİN et.al. 2505.10589 null
2025-05-15 3D-Fixup: Advancing Photo Editing with 3D Priors Yen-Chi Cheng et.al. 2505.10566 null
2025-05-14 Don’t Forget your Inverse DDIM for Image Editing Guillermo Gomez-Trenado et.al. 2505.09571 null
2025-05-14 PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement Tong Li et.al. 2505.09196 null
2025-05-15 IntrinsicEdit: Precise generative image manipulation in intrinsic space Linjie Lyu et.al. 2505.08889 null
2025-05-13 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations Petrus H. Zwart et.al. 2505.08176 null
2025-05-12 Image Restoration via Integration of Optimal Control Techniques and the Hamilton-Jacobi-Bellman Equation Dragos-Patru Covei et.al. 2505.07699 null
2025-05-12 Generalizable Pancreas Segmentation via a Dual Self-Supervised Learning Framework Jun Li et.al. 2505.07165 null
2025-05-11 DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models Junhao Xia et.al. 2505.07057 null
2025-05-10 UnfoldIR: Rethinking Deep Unfolding Network in Illumination Degradation Image Restoration Chunming He et.al. 2505.06683 null
2025-05-10 Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach Minting Pan et.al. 2505.06482 null
2025-05-09 MonetGPT: Solving Puzzles Enhances MLLMs’ Image Retouching Skills Niladri Shekhar Dutt et.al. 2505.06176 null
2025-05-09 A review of advancements in low-light image enhancement using deep learning Fangxue Liu et.al. 2505.05759 null
2025-05-08 Semantic Style Transfer for Enhancing Animal Facial Landmark Detection Anadil Hussein et.al. 2505.05640 null
2025-05-08 A Preliminary Study for GPT-4o on Image Restoration Hao Yang et.al. 2505.05621 link
2025-05-08 SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation Yonwoo Choi et.al. 2505.05475 link
2025-05-11 Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation Chao Liao et.al. 2505.05472 null
2025-05-08 EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution Haizhen Xie et.al. 2505.05209 null
2025-05-12 MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models Hongyang Zhu et.al. 2505.05101 null
2025-05-08 ADNP-15: An Open-Source Histopathological Dataset for Neuritic Plaque Segmentation in Human Brain Whole Slide Images with Frequency Domain Image Enhancement for Stain Normalization Chenxi Zhao et.al. 2505.05041 null
2025-05-08 GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing Tong Wang et.al. 2505.04915 null
2025-05-07 Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers Divyansh Srivastava et.al. 2505.04718 null
2025-05-07 Multi-turn Consistent Image Editing Zijun Zhou et.al. 2505.04320 null
2025-05-07 TS-Diff: Two-Stage Diffusion Model for Low-Light RAW Image Enhancement Yi Li et.al. 2505.04281 link
2025-05-07 Regional chemical potential analysis for material surfaces Masahiro Fukuda et.al. 2505.04053 null
2025-05-04 Video Forgery Detection for Surveillance Cameras: A Review Noor B. Tayfor et.al. 2505.03832 null
2025-05-06 DDaTR: Dynamic Difference-aware Temporal Residual Network for Longitudinal Radiology Report Generation Shanshan Song et.al. 2505.03401 link
2025-05-05 NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results Nikolay Safonov et.al. 2505.03007 link
2025-05-07 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Inclusion AI et.al. 2505.02471 link
2025-05-05 MSFNet-CPD: Multi-Scale Cross-Modal Fusion Network for Crop Pest Detection Jiaqi Zhang et.al. 2505.02441 link
2025-05-05 SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing Ming Li et.al. 2505.02370 link
2025-05-04 HiLLIE: Human-in-the-Loop Training for Low-Light Image Enhancement Xiaorui Zhao et.al. 2505.02134 null
2025-05-03 ImageR: Enhancing Bug Report Clarity by Screenshots Xuchen Tan et.al. 2505.01925 null
2025-05-03 Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement Haofan Wu et.al. 2505.01831 null
2025-05-02 Deblurring fission fragment mass distributions Pierre Nzabahimana et.al. 2505.01294 null
2025-05-02 RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement Kui Jiang et.al. 2505.01224 link
2025-05-02 Improving Editability in Image Generation with Layer-wise Memory Daneul Kim et.al. 2505.01079 null
2025-05-02 A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories Ziqi Ding et.al. 2505.01067 null
2025-05-02 Photoshop Batch Rendering Using Actions for Stylistic Video Editing Tessa De La Fuente et.al. 2505.01001 null
2025-05-01 InstructAttribute: Fine-grained Object Attributes editing with Instruction Xingxi Yin et.al. 2505.00751 null
2025-05-01 Controllable Weather Synthesis and Removal with Video Diffusion Models Chih-Hao Lin et.al. 2505.00704 null
2025-05-01 GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution Aditya Arora et.al. 2505.00687 null
2025-05-01 Towards Scalable Human-aligned Benchmark for Text-guided Image Editing Suho Ryu et.al. 2505.00502 link
2025-04-30 DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration Hebaixu Wang et.al. 2504.21487 link
2025-04-30 VR-FuseNet: A Fusion of Heterogeneous Fundus Data and Explainable Deep Network for Diabetic Retinopathy Classification Shamim Rahim Refat et.al. 2504.21464 null
2025-04-29 In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Zechuan Zhang et.al. 2504.20690 null
2025-04-30 PixelHacker: Image Inpainting with Structural and Semantic Consistency Ziyang Xu et.al. 2504.20438 null
2025-04-27 FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement Kangbiao Shi et.al. 2504.19295 null
2025-04-27 Marine Snow Removal Using Internally Generated Pseudo Ground Truth Alexandra Malyugina et.al. 2504.19289 null
2025-04-27 Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting Xiaofeng Jin et.al. 2504.19261 null
2025-04-27 CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes Tuan Nguyen et.al. 2504.19212 null
2025-04-27 Adaptive Dual-domain Learning for Underwater Image Enhancement Lingtao Peng et.al. 2504.19198 link
2025-04-27 DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning Jialang Lu et.al. 2504.19127 null
2025-04-26 REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models Gal Almog et.al. 2504.18989 link
2025-04-24 DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing Aniruddha Bala et.al. 2504.17894 null
2025-04-24 VEU-Bench: Towards Comprehensive Understanding of Video Editing Bozheng Li et.al. 2504.17828 null
2025-04-24 Dual Prompting Image Restoration with Diffusion Transformers Dehong Kong et.al. 2504.17825 null
2025-04-28 Step1X-Edit: A Practical Framework for General Image Editing Shiyu Liu et.al. 2504.17761 link
2025-04-24 DPMambaIR:All-in-One Image Restoration via Degradation-Aware Prompt State Space Model Zhanwen Liu et.al. 2504.17732 null
2025-04-24 Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields Zhuo He et.al. 2504.17712 null
2025-04-24 Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems Jaegang Jo et.al. 2504.17368 null
2025-04-24 I-INR: Iterative Implicit Neural Representations Ali Haider et.al. 2504.17364 null
2025-04-24 Enhancing Variational Autoencoders with Smooth Robust Latent Encoding Hyomin Lee et.al. 2504.17219 null
2025-04-23 RouteWinFormer: A Route-Window Transformer for Middle-range Attention in Image Restoration Qifan Li et.al. 2504.16637 null
2025-04-23 Cross Paradigm Representation and Alignment Transformer for Image Deraining Shun Zou et.al. 2504.16455 null
2025-04-22 Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework Xinyuan Song et.al. 2504.16016 null
2025-04-22 Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models Dasol Jeong et.al. 2504.15723 null
2025-04-24 Vidi: Large Multimodal Models for Video Understanding and Editing Vidi Team et.al. 2504.15681 null
2025-04-22 AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization Jinda Lu et.al. 2504.15619 null
2025-04-22 SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking Yunfeng Li et.al. 2504.15609 link
2025-04-22 InstaRevive: One-Step Image Enhancement via Dynamic Score Matching Yixuan Zhu et.al. 2504.15513 null
2025-04-21 MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World Ankit Dhiman et.al. 2504.15397 null
2025-04-21 Plug-and-Play Versatile Compressed Video Enhancement Huimin Zeng et.al. 2504.15380 null
2025-04-21 Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration Junyuan Deng et.al. 2504.15159 null
2025-04-21 Structure-guided Diffusion Transformer for Low-Light Image Enhancement Xiangchen Yin et.al. 2504.15054 null
2025-04-21 Distribution-aware Dataset Distillation for Efficient Image Restoration Zhuoran Zheng et.al. 2504.14826 null
2025-04-20 MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation Siyi Jiao et.al. 2504.14606 null
2025-04-19 Visual Prompting for One-shot Controllable Video Editing without Inversion Zhengbo Zhang et.al. 2504.14335 null
2025-04-19 Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation Bin Ren et.al. 2504.14249 null
2025-04-19 PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling Alara Dirik et.al. 2504.14219 null
2025-04-18 Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design Wei Dong et.al. 2504.14075 link
2025-04-18 Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation Fulvio Sanguigni et.al. 2504.14011 null
2025-04-18 Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing Joowon Kim et.al. 2504.13490 null
2025-04-18 Circular Image Deturbulence using Quasi-conformal Geometry Chu Chen et.al. 2504.13432 null
2025-04-17 Image Editing with Diffusion Models: A Survey Jia Wang et.al. 2504.13226 null
2025-04-17 $\texttt{Complex-Edit}$ : CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark Siwei Yang et.al. 2504.13143 null
2025-04-17 UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models Guanlong Jiao et.al. 2504.13109 null
2025-04-17 Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval WonJun Moon et.al. 2504.13035 null
2025-04-17 Image-Editing Specialists: An RLAIF Approach for Diffusion Models Elior Benarous et.al. 2504.12833 link
2025-04-17 Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal Inzamamul Alam et.al. 2504.12809 link
2025-04-17 SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding Qianqian Sun et.al. 2504.12704 null
2025-04-17 AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting Xin Su et.al. 2504.12605 null
2025-04-16 Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling Zhihua Wang et.al. 2504.12204 link
2025-04-16 Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline Joanne Lin et.al. 2504.12169 null
2025-04-16 Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging Tristan S. W. Stevens et.al. 2504.12154 null
2025-04-17 DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency Mengshi Qi et.al. 2504.12080 link
2025-04-17 Understanding Attention Mechanism in Video Diffusion Models Bingyan Liu et.al. 2504.12027 null
2025-04-16 Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach Lvpan Cai et.al. 2504.11922 link
2025-04-16 Learning Physics-Informed Color-Aware Transforms for Low-Light Image Enhancement Xingxing Yang et.al. 2504.11896 null
2025-04-16 HyperKING: Quantum-Classical Generative Adversarial Networks for Hyperspectral Image Restoration Chia-Hsiang Lin et.al. 2504.11782 null
2025-04-15 Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain Pengcheng Zheng et.al. 2504.11286 null
2025-04-15 UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques Pedro Diaz-Garcia et.al. 2504.11063 null
2025-04-15 AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent Pu Wang et.al. 2504.10978 null
2025-04-15 An Efficient and Mixed Heterogeneous Model for Image Restoration Yubin Gu et.al. 2504.10967 link
2025-04-14 Enhancing Image Restoration through Learning Context-Rich and Detail-Accurate Features Hu Gao et.al. 2504.10558 link
2025-04-14 Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing Taihang Hu et.al. 2504.10434 link
2025-04-14 PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems Maud Biquard et.al. 2504.10375 null
2025-04-14 Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis Kaiwen Zheng et.al. 2504.10351 null
2025-04-14 Analysis of Attention in Video Diffusion Transformers Yuxin Wen et.al. 2504.10317 null
2025-04-14 VibrantLeaves: A principled parametric image generator for training deep restoration models Raphael Achddou et.al. 2504.10201 link
2025-04-14 Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction Yucheng Lu et.al. 2504.10080 null
2025-04-14 Progressive Transfer Learning for Multi-Pass Fundus Image Restoration Uyen Phan et.al. 2504.10025 null
2025-04-14 Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration Gang Wu et.al. 2504.09973 link
2025-04-13 SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow Kenan Tang et.al. 2504.09697 link
2025-04-13 CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models Pooja Guhan et.al. 2504.09472 null
2025-04-11 ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration Yongsheng Yu et.al. 2504.08591 null
2025-04-11 CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model Ruohao Zhan et.al. 2504.08259 null
2025-04-11 VL-UR: Vision-Language-guided Universal Restoration of Images Degraded by Adverse Weather Conditions Ziyan Liu et.al. 2504.08219 null
2025-04-10 POEM: Precise Object-level Editing via MLLM control Marco Schouten et.al. 2504.08111 null
2025-04-10 Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement Daniel Torres et.al. 2504.07810 null
2025-04-10 Learning Universal Features for Generalizable Image Forgery Localization Hengrun Zhao et.al. 2504.07462 link
2025-04-10 Synthetic CT Generation from Time-of-Flight Non-Attenutaion-Corrected PET for Whole-Body PET Attenuation Correction Weijie Chen et.al. 2504.07450 null
2025-04-10 Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing Chenxi Sun et.al. 2504.07424 null
2025-04-09 Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model Yingjie Zhou et.al. 2504.07148 null
2025-04-08 VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing Juan Luis Gonzalez Bello et.al. 2504.07146 null
2025-04-09 FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution Gene Chou et.al. 2504.07093 link
2025-04-09 Rethinking LayerNorm in Image Restoration Transformers MinKyu Lee et.al. 2504.06629 null
2025-04-08 AstroClearNet: Deep image prior for multi-frame astronomical image restoration Yashil Sukurdeep et.al. 2504.06463 null
2025-04-08 Transfer between Modalities with MetaQueries Xichen Pan et.al. 2504.06256 null
2025-04-08 Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model Qi Mao et.al. 2504.05594 null
2025-04-08 TAPNext: Tracking Any Point (TAP) as Next Token Prediction Artem Zholus et.al. 2504.05579 null
2025-04-07 CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models Kavana Venkatesh et.al. 2504.05306 null
2025-04-07 Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion Xingyu Hu et.al. 2504.05164 null
2025-04-07 DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration Jiamei Xiong et.al. 2504.05135 null
2025-04-08 Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision Yuandong Pu et.al. 2504.04903 null
2025-04-07 Content-Aware Transformer for All-in-one Image Restoration Gang Wu et.al. 2504.04869 link
2025-04-07 Inland Waterway Object Detection in Multi-environment: Dataset and Approach Shanshan Wang et.al. 2504.04835 null
2025-04-07 Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing Hui Liu et.al. 2504.04784 null
2025-04-05 JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration Yunlong Lin et.al. 2504.04158 null
2025-04-07 MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Wulin Xie et.al. 2504.03641 null
2025-04-04 Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal Yuyang Hu et.al. 2504.03607 null
2025-04-04 Finding the Reflection Point: Unpadding Images to Remove Data Augmentation Artifacts in Large Open Source Image Datasets for Machine Learning Lucas Choi et.al. 2504.03168 null
2025-04-04 Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices Yang He et.al. 2504.03155 null
2025-04-03 How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models Pascal Chang et.al. 2504.03072 null
2025-04-03 VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning Xianwei Zhuang et.al. 2504.02949 link
2025-04-03 Concept Lancet: Image Editing with Compositional Representation Transplant Jinqi Luo et.al. 2504.02828 null
2025-04-03 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Zhiyuan Yan et.al. 2504.02782 link
2025-04-03 RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models ZhongLi Fang et.al. 2504.02640 null
2025-04-03 Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement Hesong Li et.al. 2504.02555 link
2025-04-03 HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement Hantang Li et.al. 2504.02373 null
2025-04-03 Brightness Perceiving for Recursive Low-Light Image Enhancement Haodian Wang et.al. 2504.02362 link
2025-04-03 SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW Masakazu Yoshimura et.al. 2504.02345 null
2025-04-02 FreSca: Unveiling the Scaling Space in Diffusion Models Chao Huang et.al. 2504.02154 null
2025-04-03 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Runhui Huang et.al. 2504.01934 null
2025-04-02 A Diffusion-Based Framework for Occluded Object Movement Zheng-Peng Duan et.al. 2504.01873 null
2025-04-02 Bridge the Gap between SNN and ANN for Image Restoration Xin Su et.al. 2504.01755 null
2025-04-01 Deconver: A Deconvolutional Network for Medical Image Segmentation Pooya Ashtari et.al. 2504.00302 link
2025-03-31 InstructRestore: Region-Customized Image Restoration with Human Instructions Shuaizheng Liu et.al. 2503.24357 link
2025-03-31 AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents Jiaxiang Chen et.al. 2503.23948 link
2025-03-31 Training-Free Text-Guided Image Editing with Visual Autoregressive Model Yufei Wang et.al. 2503.23897 link
2025-03-31 3D Dental Model Segmentation with Geometrical Boundary Preserving Shufan Xi et.al. 2503.23702 link
2025-03-30 Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging Amar Kumar et.al. 2503.23618 null
2025-03-30 ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 Tianming Liang et.al. 2503.23509 link
2025-03-30 SketchVideo: Sketch-based Video Generation and Editing Feng-Lin Liu et.al. 2503.23284 null
2025-03-29 A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery Pengyu Chen et.al. 2503.23200 null
2025-03-29 FreeInv: Free Lunch for Improving DDIM Inversion Yuxiang Bao et.al. 2503.23035 null
2025-03-29 indiSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy Ashesh Ashesh et.al. 2503.22983 null
2025-03-28 RELD: Regularization by Latent Diffusion Models for Image Restoration Pasquale Cascarano et.al. 2503.22563 null
2025-03-28 Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance Haijie Yang et.al. 2503.22225 null
2025-03-27 Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration Yujie Chen et.al. 2503.21970 null
2025-03-28 LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing Achint Soni et.al. 2503.21541 link
2025-03-27 Invert2Restore: Zero-Shot Degradation-Blind Image Restoration Hamadi Chihaoui et.al. 2503.21486 null
2025-03-27 Diffusion Image Prior Hamadi Chihaoui et.al. 2503.21410 null
2025-03-26 Synthetic Video Enhances Physical Fidelity in Video Synthesis Qi Zhao et.al. 2503.20822 null
2025-03-26 Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising Yan-Bo Lin et.al. 2503.20782 null
2025-03-26 Underwater Image Enhancement by Convolutional Spiking Neural Networks Vidya Sudevan et.al. 2503.20485 link
2025-03-26 EditCLIP: Representation Learning for Image Editing Qian Wang et.al. 2503.20318 link
2025-03-26 Wan: Open and Advanced Large-Scale Video Generative Models WanTeam et.al. 2503.20314 link
2025-03-26 InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction Yuhui Wu et.al. 2503.20287 link
2025-03-26 Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration Shihao Zhou et.al. 2503.20174 null
2025-03-25 FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model Jun Zhou et.al. 2503.19839 null
2025-03-25 LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark Dataset Manjushree Aithal et.al. 2503.19804 null
2025-03-24 FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing Yufan Ren et.al. 2503.19191 null
2025-03-24 LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment Haoran Wang et.al. 2503.18640 null
2025-03-25 Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning Sherry X. Chen et.al. 2503.18406 link
2025-03-24 Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance Sicong Feng et.al. 2503.18386 null
2025-03-24 MaSS13K: A Matting-level Semantic Segmentation Benchmark Chenxi Xie et.al. 2503.18364 link
2025-03-23 Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance Harang Ju et.al. 2503.18238 link
2025-03-25 Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods Yuzhi Li et.al. 2503.17975 null
2025-03-23 Deep Learning Assisted Denoising of Experimental Micrographs Owais Ahmad et.al. 2503.17945 null
2025-03-23 Cross-Domain Underwater Image Enhancement Guided by No-Reference Image Quality Assessment: A Transfer Learning Approach Zhi Zhang et.al. 2503.17937 null
2025-03-23 Cat-AIR: Content and Task-Aware All-in-One Image Restoration Jiachen Jiang et.al. 2503.17915 null
2025-03-23 What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images Dongheng Lin et.al. 2503.17899 null
2025-03-21 HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks Maria Pilligua et.al. 2503.17276 null
2025-03-21 Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks Haijin Zeng et.al. 2503.16930 null
2025-03-21 DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics Yihan Hu et.al. 2503.16795 null
2025-03-20 Efficient Bayesian Computation Using Plug-and-Play Priors for Poisson Inverse Problems Teresa Klatzer et.al. 2503.16222 null
2025-03-20 FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing Tianyi Wei et.al. 2503.16153 null
2025-03-20 Single Image Iterative Subject-driven Generation and Editing Yair Shpitzer et.al. 2503.16025 link
2025-03-20 DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration Suraj Singh et.al. 2503.15984 null
2025-03-21 UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations Debabrata Mandal et.al. 2503.15868 null
2025-03-23 Multi-focal Conditioned Latent Diffusion for Person Image Synthesis Jiaqi Liu et.al. 2503.15686 link
2025-03-19 Image Restoration Models with Optimal Transport and Total Variation Regularization Weijia Huang et.al. 2503.14947 null
2025-03-18 ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing Yulin Pan et.al. 2503.14482 null
2025-03-18 SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model Yucheng Mao et.al. 2503.14463 null
2025-03-19 VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation Shoubin Yu et.al. 2503.14350 null
2025-03-18 Towards properties of adversarial image perturbations Egor Kuznetsov et.al. 2503.14111 null
2025-03-18 Intra and Inter Parser-Prompted Transformers for Effective Image Restoration Cong Wang et.al. 2503.14037 link
2025-03-18 TarPro: Targeted Protection against Malicious Image Editing Kaixin Shen et.al. 2503.13994 null
2025-03-17 FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models Minghan Li et.al. 2503.13684 null
2025-03-17 Unified Autoregressive Visual Generation and Understanding with Continuous Tokens Lijie Fan et.al. 2503.13436 null
2025-03-17 Edit Transfer: Learning Image Editing via Vision In-Context Relations Lan Chen et.al. 2503.13327 null
2025-03-17 From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective Chen Zhao et.al. 2503.13165 null
2025-03-17 GIFT: Generated Indoor video frames for Texture-less point tracking Jianzheng Huang et.al. 2503.12944 null
2025-03-17 DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode Junjia Huang et.al. 2503.12838 null
2025-03-17 Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion Yidi Liu et.al. 2503.12764 null
2025-03-16 UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing Tsu-Jui Fu et.al. 2503.12652 null
2025-03-16 Personalize Anything for Free with Diffusion Transformer Haoran Feng et.al. 2503.12590 null
2025-03-16 DPF-Net: Physical Imaging Model Embedded Data-Driven Underwater Image Enhancement Han Mei et.al. 2503.12470 link
2025-03-16 Pathology Image Restoration via Mixture of Prompts Jiangdong Cai et.al. 2503.12399 link
2025-03-14 RASA: Replace Anyone, Say Anything – A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing Tianrui Pan et.al. 2503.11571 null
2025-03-14 Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption Du Chen et.al. 2503.11221 null
2025-03-14 Multi-Stage Generative Upscaler: Reconstructing Football Broadcast Images via Diffusion Models Luca Martini et.al. 2503.11181 null
2025-03-14 Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement Yini Li et.al. 2503.11175 link
2025-03-14 LUSD: Localized Update Score Distillation for Text-Guided Image Editing Worameth Chinchuthakun et.al. 2503.11054 link
2025-03-14 InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences Hongkai Zheng et.al. 2503.11043 null
2025-03-14 V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes Yanming Zhang et.al. 2503.10634 null
2025-03-13 CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing Advait Gupta et.al. 2503.10613 link
2025-03-13 EEdit : Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing Zexuan Yan et.al. 2503.10270 link
2025-03-13 Hybrid Agents for Image Restoration Bingchen Li et.al. 2503.10120 null
2025-03-13 MoEdit: On Learning Quantity Perception for Multi-object Image Editing Yanfeng Li et.al. 2503.10112 link
2025-03-13 Dream-IF: Dynamic Relative EnhAnceMent for Image Fusion Xingxin Xu et.al. 2503.10109 null
2025-03-14 On the Limitations of Vision-Language Models in Understanding Image Transforms Ahmad Mustafa Anis et.al. 2503.09837 null
2025-03-12 Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space Yifan Zhou et.al. 2503.09419 link
2025-03-12 Multi-Agent Image Restoration Xu Jiang et.al. 2503.09403 null
2025-03-12 MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration Zhehui Wu et.al. 2503.09131 link
2025-03-12 InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images Jiun Tian Hoe et.al. 2503.09130 null
2025-03-12 Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal Rongxin Liao et.al. 2503.09013 link
2025-03-11 QUIET-SR: Quantum Image Enhancement Transformer for Single Image Super-Resolution Siddhant Dutta et.al. 2503.08759 null
2025-03-12 OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting Yongsheng Yu et.al. 2503.08677 null
2025-03-13 Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models Armando Fortes et.al. 2503.08434 null
2025-03-11 PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net Jun Yin et.al. 2503.08276 null
2025-03-11 Aligning Text to Image in Diffusion Models is Easier Than You Think Jaa-Yeon Lee et.al. 2503.08250 link
2025-03-11 TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement Miao Zhang et.al. 2503.08168 null
2025-03-11 Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features Hanbyul Lee et.al. 2503.08148 null
2025-03-11 ObjectMover: Generative Object Movement with Video Prior Xin Yu et.al. 2503.08037 null
2025-03-11 Deep Perceptual Enhancement for Medical Image Analysis S M A Sharif et.al. 2503.08027 link
2025-03-11 CAD-VAE: Leveraging Correlation-Aware Latents for Comprehensive Fair Disentanglement Chenrui Ma et.al. 2503.07938 null
2025-03-10 Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model Lixue Gong et.al. 2503.07703 null
2025-03-10 GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts Minwen Liao et.al. 2503.07417 null
2025-03-11 Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios Chenglu Pan et.al. 2503.07232 null
2025-03-10 TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation Victor Shea-Jay Huang et.al. 2503.07050 null
2025-03-10 MERLION: Marine ExploRation with Language guIded Online iNformative Visual Sampling and Enhancement Shrutika Vishal Thengane et.al. 2503.06953 link
2025-03-10 Interactive Tumor Progression Modeling via Sketch-Based Image Editing Gexin Huang et.al. 2503.06809 null
2025-03-09 Consistent Image Layout Editing with Diffusion Models Tao Xia et.al. 2503.06419 null
2025-03-08 Get In Video: Add Anything You Want to the Video Shaobin Zhuang et.al. 2503.06268 null
2025-03-08 X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation Jian Ma et.al. 2503.06134 link
2025-03-10 VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Yuxuan Bian et.al. 2503.05639 link
2025-03-07 Towards Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients Niklas Penzel et.al. 2503.05424 null
2025-03-06 Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models Rui Jiang et.al. 2503.04215 null
2025-03-05 GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors Yaopei Zeng et.al. 2503.03944 null
2025-03-05 An Adaptive Underwater Image Enhancement Framework via Multi-Domain Fusion and Color Compensation Yuezhe Tian et.al. 2503.03640 null
2025-03-03 Hyperspectral Image Restoration and Super-resolution with Physics-Aware Deep Learning for Biomedical Applications Yuchen Xiang et.al. 2503.02908 null
2025-03-04 ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement Xuejian Guo et.al. 2503.02484 link
2025-03-04 Semantic Prior Distillation with Vision Foundation Model for Enhanced Rapid Bone Scintigraphy Image Restoration Pengchen Liang et.al. 2503.02321 null
2025-03-04 h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform Toan Nguyen et.al. 2503.02187 link
2025-03-03 MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting Mojtaba Safari et.al. 2503.01576 link
2025-03-03 Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions Zihan Shen et.al. 2503.01339 null
2025-03-03 Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual Chong Wang et.al. 2503.01288 link
2025-03-03 VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors Juil Koo et.al. 2503.01107 null
2025-03-01 Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement Aupendu Kar et.al. 2503.00642 link
2025-03-01 GenVDM: Generating Vector Displacement Maps From a Single Image Yuezhi Yang et.al. 2503.00605 null
2025-03-01 Flow Matching for Medical Image Synthesis: Bridging the Gap Between Speed and Quality Milad Yazdani et.al. 2503.00266 link
2025-02-28 SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events Yunfan Lu et.al. 2502.21120 null
2025-02-28 Diffusion Restoration Adapter for Real-World Image Restoration Hanbang Liang et.al. 2502.20679 null
2025-02-27 Tight Inversion: Image-Conditioned Inversion for Real Image Editing Edo Kadosh et.al. 2502.20376 null
2025-02-28 HVI: A New Color Space for Low-light Image Enhancement Qingsen Yan et.al. 2502.20272 link
2025-02-27 Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps Tianxiao Gao et.al. 2502.20054 null
2025-02-27 Identity-preserving Distillation Sampling by Fixed-Point Iterator SeonHwa Kim et.al. 2502.19930 null
2025-02-27 Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement Nan An et.al. 2502.19867 null
2025-02-26 ILACS-LGOT: A Multi-Layer Contrast Enhancement Approach for Palm-Vein Images Kaveen Perera et.al. 2502.19456 null
2025-02-26 Self-supervised conformal prediction for uncertainty quantification in Poisson imaging problems Bernardin Tamo Amougou et.al. 2502.19194 null
2025-02-26 Multi-level Attention-guided Graph Neural Network for Image Restoration Jiatao Jiang et.al. 2502.19181 null
2025-02-27 RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images Yuhan Tang et.al. 2502.19153 null
2025-02-26 Dynamic Degradation Decomposition Network for All-in-One Image Restoration Huiqiang Wang et.al. 2502.19068 null
2025-02-25 Spatial Analysis of Neuromuscular Junctions Activation in Three-Dimensional Histology-based Muscle Reconstructions Alessandro Ascani Orsini et.al. 2502.18646 link
2025-02-25 Application of Attention Mechanism with Bidirectional Long Short-Term Memory (BiLSTM) and CNN for Human Conflict Detection using Computer Vision Erick da Silva Farias et.al. 2502.18555 null
2025-02-26 Bayesian Optimization for Controlled Image Editing via LLMs Chengkun Cai et.al. 2502.18116 null
2025-02-25 KV-Edit: Training-Free Image Editing for Precise Background Preservation Tianrui Zhu et.al. 2502.17363 link
2025-02-24 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Xiangpeng Yang et.al. 2502.17258 null
2025-02-24 Splitting Regularized Wasserstein Proximal Algorithms for Nonsmooth Sampling Problems Fuqun Han et.al. 2502.16773 link
2025-02-22 DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation Yuxuan Xiong et.al. 2502.16302 null
2025-02-21 Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement Uche A. Nnolim et.al. 2502.15986 null
2025-02-21 LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement Namrah Siddiqua et.al. 2502.15186 null
2025-02-21 Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization Ach Khozaimi et.al. 2502.15156 null
2025-02-20 Reinforcement Learning for Ultrasound Image Analysis A Comprehensive Review of Advances and Applications Maha Ezzelarab et.al. 2502.14995 null
2025-02-23 PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Shijie Huang et.al. 2502.14397 link
2025-02-20 EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement Wenhui Zhu et.al. 2502.14260 null
2025-02-19 RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior Ching-Hua Lee et.al. 2502.13574 null
2025-02-18 AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks Ming Xie et.al. 2502.11158 null
2025-02-14 PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control Kunal Swami et.al. 2502.10258 null
2025-02-14 VideoDiff: Human-AI Video Co-Creation with Alternatives Mina Huh et.al. 2502.10190 null
2025-02-14 Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training Rodrigo Santos et.al. 2502.10064 null
2025-02-19 Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal Jinpei Guo et.al. 2502.09873 link
2025-02-13 Source function from two-particle correlation function through entropy-regularized Richardson-Lucy deblurring C. K. Tam et.al. 2502.09478 null
2025-02-14 SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment Tica Lin et.al. 2502.08621 null
2025-02-19 MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers Ao Li et.al. 2502.07856 null
2025-02-13 Visual-based spatial audio generation system for multi-speaker environments Xiaojing Liu et.al. 2502.07538 null
2025-02-11 Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems Ai Chen et.al. 2502.07351 link
2025-02-10 Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists Bojia Zi et.al. 2502.06734 null
2025-02-10 Predictive Red Teaming: Breaking Policies Without Breaking Robots Anirudha Majumdar et.al. 2502.06575 null
2025-02-10 UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis Zemin Yang et.al. 2502.06324 null
2025-02-09 A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement Muhammad Turab et.al. 2502.05995 null
2025-02-11 UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control Kaizhen Zhu et.al. 2502.05749 link
2025-02-08 AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection Shuheng Zhang et.al. 2502.05433 null
2025-02-07 Self-supervised Conformal Prediction for Uncertainty Quantification in Imaging Problems Jasper M. Everink et.al. 2502.05127 null
2025-02-07 Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition S Sreehari et.al. 2502.04680 null
2025-02-05 Lost in Edits? A λ-Compass for AIGC Provenance Wenhao You et.al. 2502.04364 null
2025-02-06 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Jinbo Xing et.al. 2502.04299 null
2025-02-06 PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models Aleksandar Cvejic et.al. 2502.04050 null
2025-02-06 DICE: Distilling Classifier-Free Guidance into Text Embeddings Zhenyu Zhou et.al. 2502.03726 null
2025-02-05 All-in-One Image Compression and Restoration Huimin Zeng et.al. 2502.03649 link
2025-02-05 REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations Peter Sushko et.al. 2502.03629 null
2025-02-05 Efficient Image Restoration via Latent Consistency Flow Matching Elad Cohen et.al. 2502.03500 null
2025-02-04 Blind Visible Watermark Removal with Morphological Dilation Preston K. Robinette et.al. 2502.02676 null
2025-02-04 Exploring the latent space of diffusion models directly through singular value decomposition Li Wang et.al. 2502.02225 null
2025-02-04 EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues Rohit Girmaji et.al. 2502.02172 null
2025-02-03 Human Body Restoration with One-Step Diffusion Model and A New Benchmark Jue Gong et.al. 2502.01411 null
2025-02-04 Compressed Image Generation with Denoising Diffusion Codebook Models Guy Ohayon et.al. 2502.01189 null
2025-02-01 A framework for river connectivity classification using temporal image processing and attention based neural networks Timothy James Becker et.al. 2502.00474 null
2025-02-01 Shape from Semantics: 3D Shape Generation from Multi-View Semantics Liangchen Li et.al. 2502.00360 null
2025-01-30 DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models Ruofan Liang et.al. 2501.18590 null
2025-01-30 Integrating Spatial and Frequency Information for Under-Display Camera Image Restoration Kyusu Ahn et.al. 2501.18517 null
2025-01-31 MatIR: A Hybrid Mamba-Transformer Image Restoration Model Juan Wen et.al. 2501.18401 link
2025-01-29 Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment Zixue Zeng et.al. 2501.17690 link
2025-01-28 Text-to-Image Generation for Vocabulary Learning Using the Keyword Method Nuwan T. Attygalle et.al. 2501.17099 null
2025-01-27 Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration Long Peng et.al. 2501.16583 null
2025-01-27 UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images Tatiana Taís Schein et.al. 2501.16211 link
2025-01-27 CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference Zhengyang Lu et.al. 2501.15852 link
2025-01-26 Universal Image Restoration Pre-training via Degradation Classification JiaKui Hu et.al. 2501.15510 link
2025-01-24 MATCHA: Towards Matching Anything Fei Xue et.al. 2501.14945 null
2025-01-24 Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders Zaheer Ahmad et.al. 2501.14709 null
2025-01-24 Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.* Ludovica Schaerf et.al. 2501.14524 null
2025-01-24 Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement Guoxi Huang et.al. 2501.14265 link
2025-01-24 CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image Xiaojun Tang et.al. 2501.14264 null
2025-01-23 INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration Di You et.al. 2501.14014 null
2025-01-23 IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models Jiayi Lei et.al. 2501.13920 null
2025-01-23 Binary Diffusion Probabilistic Model Vitaliy Kinakh et.al. 2501.13915 null
2025-01-23 Where Do You Go? Pedestrian Trajectory Prediction using Scene Features Mohammad Ali Rezaei et.al. 2501.13848 null
2025-01-23 Training-Free Consistency Pipeline for Fashion Repose Potito Aghilar et.al. 2501.13692 null
2025-01-22 UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior I-Hsiang Chen et.al. 2501.13134 null
2025-01-22 Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects Louis Aberdeen et.al. 2501.13009 null
2025-01-22 UniUIR: Considering Underwater Image Restoration as An All-in-One Learner Xu Zhang et.al. 2501.12981 null
2025-01-22 FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration Ruicheng Zhang et.al. 2501.12832 link
2025-01-21 Slot-BERT: Self-supervised Object Discovery in Surgical Video Guiqiu Liao et.al. 2501.12477 null
2025-01-21 Regressor-Guided Image Editing Regulates Emotional Response to Reduce Online Engagement Christoph Gebhardt et.al. 2501.12289 null
2025-01-21 Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping Hongxu Yang et.al. 2501.12245 null
2025-01-21 DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains Junyu Xia et.al. 2501.12235 null
2025-01-21 Exploring Temporally-Aware Features for Point Tracking Inès Hyeonsu Kim et.al. 2501.12218 link
2025-01-21 Proxies for Distortion and Consistency with Applications for Real-World Image Restoration Sean Man et.al. 2501.12102 null
2025-01-20 SILO: Solving Inverse Problems with Latent Operators Ron Raphaeli et.al. 2501.11746 null
2025-01-20 PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents Kanika Goswami et.al. 2501.11233 null
2025-01-19 Counteracting temporal attacks in Video Copy Detection Katarzyna Fojcik et.al. 2501.11171 null
2025-01-17 DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration Huiyun Cao et.al. 2501.10325 null
2025-01-17 IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment Shangkun Sun et.al. 2501.09927 null
2025-01-16 PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery Shristi Das Biswas et.al. 2501.09826 link
2025-01-16 FLOL: Fast Baselines for Real-World Low-Light Enhancement Juan C. Benito et.al. 2501.09718 link
2025-01-16 Soft Knowledge Distillation with Multi-Dimensional Cross-Net Attention for Image Restoration Models Compression Yongheng Zhang et.al. 2501.09321 null
2025-01-16 Knowledge Distillation for Image Restoration : Simultaneous Learning from Degraded and Clean Images Yongheng Zhang et.al. 2501.09268 null
2025-01-14 AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring Sanjida Afrin Mou et.al. 2501.08266 link
2025-01-14 FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors Yabo Zhang et.al. 2501.08225 link
2025-01-13 SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing Varun Biyyala et.al. 2501.07554 link
2025-01-13 IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion Tharun Anand et.al. 2501.07530 null
2025-01-11 Natural Language Supervision for Low-light Image Enhancement Jiahui Tang et.al. 2501.06546 null
2025-01-11 Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning Maomao Li et.al. 2501.06438 null
2025-01-10 Underwater Image Enhancement using Generative Adversarial Networks: A Survey Kancharagunta Kishan Babu et.al. 2501.06273 null
2025-01-10 Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs Dabing Cheng et.al. 2501.05884 null
2025-01-09 Bit-depth color recovery via off-the-shelf super-resolution models Xuanshuo Fu et.al. 2501.05611 null
2025-01-09 HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction Shaurya Singh Rathore et.al. 2501.05195 null
2025-01-09 IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation Qi Chen et.al. 2501.04995 link
2025-01-08 Color Correction Meets Cross-Spectral Refinement: A Distribution-Aware Diffusion for Underwater Image Restoration Laibin Chang et.al. 2501.04740 null
2025-01-08 EditAR: Unified Conditional Generation with Autoregressive Models Jiteng Mu et.al. 2501.04699 null
2025-01-08 Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion Yangfan He et.al. 2501.04606 link
2025-01-08 FrontierNet: Learning Visual Cues to Explore Boyang Sun et.al. 2501.04597 link
2025-01-08 MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration Zhi Jin et.al. 2501.04486 link
2025-01-08 Edit as You See: Image-guided Video Editing via Masked Motion Modeling Zhi-Lin Huang et.al. 2501.04325 null
2025-01-08 Recognition-Oriented Low-Light Image Enhancement based on Global and Pixelwise Optimization Seitaro Ono et.al. 2501.04210 null
2025-01-07 Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications L. Berlyand et.al. 2501.04182 null
2025-01-07 Convergent Primal-Dual Plug-and-Play Image Restoration: A General Algorithm and Applications Yodai Suzuki et.al. 2501.03780 link
2025-01-07 Materialist: Physically Based Editing Using Single-Image Inverse Rendering Lezhong Wang et.al. 2501.03717 link
2025-01-07 Exploring Optimal Latent Trajetory for Zero-shot Image Editing Maomao Li et.al. 2501.03631 null
2025-01-07 Textualize Visual Prompt for Image Editing via Diffusion Bridge Pengcheng Xu et.al. 2501.03495 null
2025-01-06 ImageMM: Joint multi-frame image restoration and super-resolution Yashil Sukurdeep et.al. 2501.03002 null
2025-01-06 Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis Xiaojiao Guo et.al. 2501.02701 link
2025-01-03 JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing Qili Wang et.al. 2501.01798 link
2025-01-03 Conditional Consistency Guided Image Translation and Enhancement Amil Bhagat et.al. 2501.01223 link
2025-01-02 Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion Dong Zhang et.al. 2501.01114 null
2025-01-01 Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model Chenyang Liu et.al. 2501.00895 null
2024-12-31 SoundBrush: Sound as a Brush for Visual Scene Editing Kim Sung-Bin et.al. 2501.00645 null
2025-01-02 Edicho: Consistent Image Editing in the Wild Qingyan Bai et.al. 2412.21079 link
2024-12-30 Varformer: Adapting VAR’s Generative Prior for Image Restoration Siyang Wang et.al. 2412.21063 link
2024-12-30 Low-Light Image Enhancement via Generative Perceptual Priors Han Zhou et.al. 2412.20916 link
2024-12-29 Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) Tomer Garber et.al. 2412.20596 link
2024-12-28 Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems Wen-Dong Jiang et.al. 2412.20201 null
2024-12-28 UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity Jingbo Lin et.al. 2412.20157 link
2024-12-28 MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration Boyun Li et.al. 2412.20066 link
2024-12-28 MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion Zechao Zhan et.al. 2412.20062 null
2024-12-28 An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models Yuang Wang et.al. 2412.19992 null
2024-12-28 MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation Haoyu Zheng et.al. 2412.19978 null
2024-12-27 Generative Adversarial Network on Motion-Blur Image Restoration Zhengdong Li et.al. 2412.19479 null
2024-12-27 DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes Yiyuan Liang et.al. 2412.19458 link
2024-12-25 DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images Enbo Huang et.al. 2412.18797 null
2024-12-24 DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Minghong Cai et.al. 2412.18597 link
2024-12-24 Underwater Image Restoration via Polymorphic Large Kernel CNNs Xiaojiao Guo et.al. 2412.18459 link
2024-12-24 Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models Qice Qin et.al. 2412.18421 null
2024-12-24 UNet--: Memory-Efficient and Feature-Enhanced Network Architecture based on U-Net with Reduced Skip-Connections Lingxiao Yin et.al. 2412.18276 null
2024-12-24 SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos Zhen Zhang et.al. 2412.18214 link
2024-12-23 The Superposition of Diffusion Models Using the Itô Density Estimator Marta Skreta et.al. 2412.17762 null
2024-12-21 Optoelectronic generative adversarial networks Jumin Qiu et.al. 2412.16672 link
2024-12-21 Rethinking Model Redundancy for Low-light Image Enhancement Tong Li et.al. 2412.16459 null
2024-12-20 Mapping the Mind of an Instruction-based Image Editing using SMILE Zeinab Dehghani et.al. 2412.16277 link
2024-12-20 SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild Jannik Elsäßer et.al. 2412.16147 null
2024-12-20 NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images Yue Guo et.al. 2412.15890 null
2024-12-20 Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation Aiwen Jiang et.al. 2412.15845 link
2024-12-20 Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance Hyunsoo Lee et.al. 2412.15798 null
2024-12-19 Efficient Neural Network Encoding for 3D Color Lookup Tables Vahid Zehtab et.al. 2412.15438 link
2024-12-19 UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency Enis Simsar et.al. 2412.15216 null
2024-12-19 Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model Minglong Xue et.al. 2412.14630 link
2024-12-19 Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion Jixuan He et.al. 2412.14462 link
2024-12-18 Personalized Generative Low-light Image Denoising and Enhancement Xijun Wang et.al. 2412.14327 null
2024-12-18 Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing Le-Anh Tran et.al. 2412.14220 link
2024-12-18 Fed-AugMix: Balancing Privacy and Utility via Data Augmentation Haoyang Li et.al. 2412.13818 null
2024-12-18 Text2Relight: Creative Portrait Relighting with Text Guidance Junuk Cha et.al. 2412.13734 null
2024-12-18 VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement Chen Zhao et.al. 2412.13655 link
2024-12-18 DarkIR: Robust Low-Light Image Restoration Daniel Feijoo et.al. 2412.13443 link
2024-12-18 Zero-Shot Low Light Image Enhancement with Diffusion Prior Joshua Cho et.al. 2412.13401 link
2024-12-17 MotionBridge: Dynamic Video Inbetweening with Flexible Controls Maham Tanveer et.al. 2412.13190 null
2024-12-17 Prompt Augmentation for Self-supervised Text-guided Image Manipulation Rumeysa Bodur et.al. 2412.13081 null
2024-12-17 Unsupervised Region-Based Image Editing of Denoising Diffusion Models Zixiang Li et.al. 2412.12912 null
2024-12-17 MIVE: New Design and Benchmark for Multi-Instance Video Editing Samuel Teodoro et.al. 2412.12877 null
2024-12-17 Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration Xinlong Cheng et.al. 2412.12550 null
2024-12-17 Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy Aditya Ganeshan et.al. 2412.12463 null
2024-12-16 Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD) Ki-Hwan Oh et.al. 2412.12238 link
2024-12-16 Re-Attentional Controllable Video Diffusion Editing Yuanzhi Wang et.al. 2412.11710 link
2024-12-15 Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing Jiancheng Huang et.al. 2412.11152 null
2024-12-15 Towards Context-aware Convolutional Network for Image Restoration Fangwei Hao et.al. 2412.11008 null
2024-12-14 Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification Yucong Meng et.al. 2412.10776 null
2024-12-16 BrushEdit: All-In-One Image Inpainting and Editing Yaowei Li et.al. 2412.10316 null
2024-12-13 Learning Complex Non-Rigid Image Edits from Multimodal Conditioning Nikolai Warner et.al. 2412.10219 null
2024-12-16 Matrix Completion via Residual Spectral Matching Ziyuan Chen et.al. 2412.10005 null
2024-12-12 Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG Kavana Venkatesh et.al. 2412.09614 null
2024-12-12 FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers Yusuf Dalva et.al. 2412.09611 null
2024-12-12 Video Seal: Open and Efficient Video Watermarking Pierre Fernandez et.al. 2412.09492 link
2024-12-12 OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs Yuanzhi Zhu et.al. 2412.09465 link
2024-12-13 Are Conditional Latent Diffusion Models Effective for Image Restoration? Yunchen Yuan et.al. 2412.09324 null
2024-12-12 Text-Video Multi-Grained Integration for Video Moment Montage Zhihui Yin et.al. 2412.09276 null
2024-12-12 ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring Zhongbao Yang et.al. 2412.09193 null
2024-12-12 Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration Yunshuai Zhou et.al. 2412.08939 link
2024-12-11 Convergence Analysis of a Proximal Stochastic Denoising Regularization Algorithm Marien Renaud et.al. 2412.08262 null
2024-12-10 Leveraging Content and Context Cues for Low-Light Image Enhancement Igor Morawski et.al. 2412.07693 link
2024-12-10 Analytical-Heuristic Modeling and Optimization for Low-Light Image Enhancement Axel Martinez et.al. 2412.07659 null
2024-12-10 Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE) Tu Vo et.al. 2412.07527 null
2024-12-10 Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring Yuzhi Zhao et.al. 2412.07256 link
2024-12-10 EchoIR: Advancing Image Restoration with Echo Upsampling and Bi-Level Optimization Yuhan He et.al. 2412.07225 null
2024-12-10 A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing Yujie Feng et.al. 2412.07195 null
2024-12-09 InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention Howard Zhang et.al. 2412.06753 null
2024-12-09 PrEditor3D: Fast and Precise 3D Shape Editing Ziya Erkoç et.al. 2412.06592 null
2024-12-09 MoViE: Mobile Diffusion for Video Editing Adil Karjauv et.al. 2412.06578 null
2024-12-09 EchoSim4D: A Proof-of-Concept Gamified XR Echocardiography Training Simulator for Neonates using 4D Ultrasound Volume Deepthy Rose Jose et.al. 2412.06271 null
2024-12-08 GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis Ashish Goswami et.al. 2412.06089 null
2024-12-08 Enhanced 3D Generation by 2D Editing Haoran Li et.al. 2412.05929 null
2024-12-07 Enhancing Sample Generation of Diffusion Models using Noise Level Correction Abulikemu Abuduweili et.al. 2412.05488 null
2024-12-06 Equivariant Denoisers for Image Restoration Marien Renaud et.al. 2412.05343 null
2024-12-06 ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration Chi-Wei Hsiao et.al. 2412.05043 null
2024-12-09 Video Decomposition Prior: A Methodology to Decompose Videos into Layers Gaurav Shrivastava et.al. 2412.04930 null
2024-12-06 Addressing Attribute Leakages in Diffusion-based Image Editing without Training Sunung Mun et.al. 2412.04715 null
2024-12-05 Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise Brayan Monroy et.al. 2412.04648 link
2024-12-05 MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers Byeonghyeon Lee et.al. 2412.04591 null
2024-12-05 Action-based image editing guided by human instructions Maria Mihaela Trusca et.al. 2412.04558 null
2024-12-05 SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion Trong-Tung Nguyen et.al. 2412.04301 null
2024-12-05 HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing Jinbin Bai et.al. 2412.04280 link
2024-12-05 Deep priors for satellite image restoration with accurate uncertainties Biquard Maud et.al. 2412.04130 null
2024-12-05 Blind Underwater Image Restoration using Co-Operational Regressor Networks Ozer Can Devecioglu et.al. 2412.03995 null
2024-12-05 INRetouch: Context Aware Implicit Neural Representation for Photography Retouching Omar Elezabi et.al. 2412.03848 null
2024-12-05 LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model Yuan Xue et.al. 2412.03841 null
2024-12-05 Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration Yuzhen Du et.al. 2412.03814 null
2024-12-05 EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM Quang Nguyen et.al. 2412.03809 null
2024-12-04 DIVE: Taming DINO for Subject-Driven Video Editing Yi Huang et.al. 2412.03347 null
2024-12-04 Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges Minghao Shao et.al. 2412.03220 null
2024-12-04 Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution Jiahua Xiao et.al. 2412.02960 null
2024-12-03 Motion Prompting: Controlling Video Generation with Motion Trajectories Daniel Geng et.al. 2412.02700 null
2024-12-03 MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis Tianyu Wang et.al. 2412.02635 null
2024-12-04 GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing Khawar Islam et.al. 2412.02366 null
2024-12-03 Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models Jungwon Park et.al. 2412.02237 link
2024-12-03 OmniCreator: Self-Supervised Unified Generation with Universal Editing Haodong Chen et.al. 2412.02114 null
2024-12-03 Relaxed and Inertial Nonlinear Forward-Backward with Momentum Fernando Roldán et.al. 2412.02045 link
2024-12-02 CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion Kai He et.al. 2412.01792 null
2024-12-02 OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking Xuanyu Zhang et.al. 2412.01615 null
2024-12-02 Learning Adaptive Lighting via Channel-Aware Guidance Qirui Yang et.al. 2412.01493 null
2024-12-02 Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond MD Raqib Khan et.al. 2412.01456 link
2024-11-29 Self-Supervised Denoiser Framework Emilien Valat et.al. 2411.19593 null
2024-11-28 Trajectory Attention for Fine-grained Video Motion Control Zeqi Xiao et.al. 2411.19324 null
2024-11-28 LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair Xue Song et.al. 2411.19156 null
2024-11-28 Descriptions of women are longer than that of men: An analysis of gender portrayal prompts in Stable Diffusion Yan Asadchy et.al. 2411.18994 null
2024-11-27 Hierarchical Information Flow for Generalized Efficient Image Restoration Yawei Li et.al. 2411.18588 null
2024-11-27 Complexity Experts are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir et.al. 2411.18466 null
2024-11-27 Adaptive Blind All-in-One Image Restoration David Serrano-Lozano et.al. 2411.18412 link
2024-11-29 HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning Zengxi Zhang et.al. 2411.18296 link
2024-11-27 TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution Linwei Dong et.al. 2411.18263 link
2024-11-27 Prediction with Action: Visual Policy Learning via Joint Denoising Process Yanjiang Guo et.al. 2411.18179 null
2024-11-26 Generative Image Layer Decomposition with Visual Effects Jinrui Yang et.al. 2411.17864 null
2024-11-26 Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation Sudarshan Rajagopalan et.al. 2411.17814 null
2024-11-26 GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration Sudarshan Rajagopalan et.al. 2411.17687 null
2024-11-26 VideoDirector: Precise Video Editing via Text-to-Video Models Yukun Wang et.al. 2411.17592 null
2024-11-26 Puzzle Similarity: A Perceptually-guided No-Reference Metric for Artifact Detection in 3D Scene Reconstructions Nicolai Hermann et.al. 2411.17489 null
2024-11-26 InsightEdit: Towards Better Instruction Following for Image Editing Yingjing Xu et.al. 2411.17323 null
2024-11-26 MLI-NeRF: Multi-Light Intrinsic-Aware Neural Radiance Fields Yixiong Yang et.al. 2411.17235 link
2024-11-26 MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers Ruoxi Zhu et.al. 2411.17226 link
2024-11-26 DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting Yicheng Yang et.al. 2411.17223 link
2024-11-25 Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing Hanhui Wang et.al. 2411.16832 link
2024-11-25 Pathways on the Image Manifold: Image Editing via Video Generation Noam Rotstein et.al. 2411.16819 null
2024-11-25 Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding Yubin Gu et.al. 2411.16217 null
2024-11-25 U2NeRF: Unsupervised Underwater Image Restoration and Neural Radiance Fields Vinayak Gupta et.al. 2411.16172 null
2024-11-24 PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation Chia-Ming Lee et.al. 2411.15922 link
2024-11-24 Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing Pengcheng Xu et.al. 2411.15843 null
2024-11-24 MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking Chunhui Zhang et.al. 2411.15761 link
2024-11-24 LTCF-Net: A Transformer-Enhanced Dual-Channel Fourier Framework for Low-Light Image Restoration Gaojing Zhang et.al. 2411.15740 null
2024-11-24 AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea Qifan Yu et.al. 2411.15738 null
2024-11-23 Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator Chaehun Shin et.al. 2411.15466 null
2024-11-22 Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration Darshan Thaker et.al. 2411.15295 null
2024-11-22 HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads Yu Xu et.al. 2411.15034 null
2024-11-22 Benchmarking the Robustness of Optical Flow Estimation to Corruptions Zhonghua Yi et.al. 2411.14865 link
2024-11-22 AI Tailoring: Evaluating Influence of Image Features on Fashion Product Popularity Xiaomin Li et.al. 2411.14737 null
2024-11-22 TrojanEdit: Backdooring Text-Based Image Editing Models Ji Guo et.al. 2411.14681 null
2024-11-21 Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection Ali Awad et.al. 2411.14626 link
2024-11-21 Stable Flow: Vital Layers for Training-Free Image Editing Omri Avrahami et.al. 2411.14430 link
2024-11-21 Guided MRI Reconstruction via Schrödinger Bridge Yue Wang et.al. 2411.14269 null
2024-11-21 Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion Jinhong He et.al. 2411.13961 link
2024-11-21 GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter Aniruddha Bala et.al. 2411.13794 null
2024-11-20 Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms Matthieu Kowalski et.al. 2411.13276 null
2024-11-20 Open-World Amodal Appearance Completion Jiayang Ao et.al. 2411.13019 null
2024-11-19 Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution Yang Zou et.al. 2411.12530 link
2024-11-19 Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models Jun Xiao et.al. 2411.12450 null
2024-11-19 Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images Zheng Gong et.al. 2411.12278 null
2024-11-16 GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding Yue Zhou et.al. 2411.11904 link
2024-11-18 Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment Zhendong Liu et.al. 2411.11543 null
2024-11-17 Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method Yan Zheng et.al. 2411.11135 null
2024-11-17 StableV2V: Stabilizing Shape Consistency in Video-to-Video Editing Chang Liu et.al. 2411.11045 link
2024-11-19 TSFormer: A Robust Framework for Efficient UHD Image Restoration Xin Su et.al. 2411.10951 null
2024-11-16 AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations Jiawei Mao et.al. 2411.10708 null
2024-11-16 Underwater Image Enhancement with Cascaded Contrastive Learning Yi Liu et.al. 2411.10682 link
2024-11-15 OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models Mathis Koroglu et.al. 2411.10501 null
2024-11-15 Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence Guodong Sun et.al. 2411.10321 null
2024-11-15 ColorEdit: Training-free Image-Guided Color editing with diffusion model Xingxi Yin et.al. 2411.10232 null
2024-11-14 MagicQuill: An Intelligent Interactive Image Editing System Zichen Liu et.al. 2411.09703 link
2024-11-13 A Survey on Vision Autoregressive Model Kai Jiang et.al. 2411.08666 null
2024-11-12 Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing Zitao Shuai et.al. 2411.08196 null
2024-11-12 CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising Linxuan Li et.al. 2411.07930 link
2024-11-12 Joint multi-dimensional dynamic attention and transformer for general image restoration Huan Zhang et.al. 2411.07893 link
2024-11-12 All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model Yuanbo Wen et.al. 2411.07445 null
2024-11-12 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Yoad Tewel et.al. 2411.07232 null
2024-11-11 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Cong Wei et.al. 2411.07199 null
2024-11-11 Multi-scale Frequency Enhancement Network for Blind Image Deblurring Yawen Xiang et.al. 2411.06893 null
2024-11-11 SeedEdit: Align Image Re-Generation to Image Editing Yichun Shi et.al. 2411.06686 null
2024-11-10 Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration Chen Wu et.al. 2411.06456 null
2024-11-08 A Modular Conditional Diffusion Framework for Image Reconstruction Magauiya Zhussip et.al. 2411.05993 null
2024-11-08 UnDIVE: Generalized Underwater Video Enhancement Using Generative Priors Suhas Srinath et.al. 2411.05886 link
2024-11-07 A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model Panwen Hu et.al. 2411.04942 null
2024-11-07 Taming Rectified Flow for Inversion and Editing Jiangshan Wang et.al. 2411.04746 link
2024-11-06 Multi-Reward as Condition for Instruction-based Image Editing Xin Gu et.al. 2411.04713 null
2024-11-06 ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models Ashutosh Srivastava et.al. 2411.03982 null
2024-11-05 CrowdGenUI: Enhancing LLM-Based UI Widget Generation with a Crowdsourced Preference Library Yimeng Liu et.al. 2411.03477 null
2024-11-07 DiT4Edit: Diffusion Transformer for Image Editing Kunyu Feng et.al. 2411.03286 null
2024-11-05 Local Lesion Generation is Effective for Capsule Endoscopy Image Data Augmentation in a Limited Data Setting Adrian B. Chłopowiec et.al. 2411.03098 null
2024-11-05 ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing Yuka Ogino et.al. 2411.02799 null
2024-11-04 AutoVFX: Physically Realistic Video Editing from Natural Language Instructions Hao-Yu Hsu et.al. 2411.02394 null
2024-11-04 DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability Bo Gao et.al. 2411.01819 null
2024-11-03 Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration Xiaole Tang et.al. 2411.01656 link
2024-11-03 Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach Qihe Pan et.al. 2411.01545 link
2024-11-03 TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement Xuanzhao Dong et.al. 2411.01403 link
2024-11-02 Medical X-Ray Image Enhancement Using Global Contrast-Limited Adaptive Histogram Equalization Sohrab Namazi Nia et.al. 2411.01373 null
2024-11-01 Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing Naufal Suryanto et.al. 2411.00425 link
2024-10-31 Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes Shaohua Liu et.al. 2411.00239 null
2024-10-31 Chasing Better Deep Image Priors between Over- and Under-parameterization Qiming Wu et.al. 2410.24187 link
2024-10-31 Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation Yihang Zhou et.al. 2410.23962 null
2024-10-31 Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data Yucun Hou et.al. 2410.23628 null
2024-10-31 MS-Glance: Non-semantic context vectors and the applications in supervising image reconstruction Ziqi Gao et.al. 2410.23577 link
2024-10-31 Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization Xiao Guo et.al. 2410.23556 null
2024-10-30 EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models Shangquan Sun et.al. 2410.22959 link
2024-10-30 Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement Sahil Ali Akbar et.al. 2410.21946 link
2024-10-25 ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting Takuma Nishimura et.al. 2410.21310 null
2024-10-27 Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement Junhao Tan et.al. 2410.20314 link
2024-10-27 Deep Learning, Machine Learning – Digital Signal and Image Processing: From Theory to Application Weiche Hsieh et.al. 2410.20304 null
2024-10-24 HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision Burak Ercan et.al. 2410.19164 null
2024-10-24 Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances Shilin Lu et.al. 2410.18775 link
2024-10-28 Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing Haonan Lin et.al. 2410.18756 null
2024-10-29 DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Yuang Ai et.al. 2410.18666 link
2024-10-23 TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation Ruicheng Zhang et.al. 2410.17855 null
2024-10-23 DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection Qingpeng Li et.al. 2410.17822 link
2024-10-23 An Intelligent Agentic System for Complex Image Restoration Problems Kaiwen Zhu et.al. 2410.17809 link
2024-10-23 A variational approach to nonlocal image restoration flows Harsh Prasad et.al. 2410.17649 null
2024-10-23 Diffusion Priors for Variational Likelihood Estimation and Image Denoising Jun Cheng et.al. 2410.17521 link
2024-10-20 LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration Yuang Ai et.al. 2410.15385 link
2024-10-19 A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends Junjun Jiang et.al. 2410.15067 link
2024-10-19 Attack as Defense: Run-time Backdoor Implantation for Image Content Protection Haichuan Zhang et.al. 2410.14966 link
2024-10-18 ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing Jimin Dai et.al. 2410.14247 null
2024-10-17 MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks Xinxin Liu et.al. 2410.14089 null
2024-10-17 Movie Gen: A Cast of Media Foundation Models Adam Polyak et.al. 2410.13720 link
2024-10-17 Generative Location Modeling for Spatially Aware Object Insertion Jooyeol Yun et.al. 2410.13564 null
2024-10-16 AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing DuoSheng Chen et.al. 2410.12696 link
2024-10-16 Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing Mingce Guo et.al. 2410.12526 null
2024-10-16 Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks Pranjali Pathre et.al. 2410.12432 link
2024-10-16 Towards Flexible and Efficient Diffusion Low Light Enhancer Guanzhou Lan et.al. 2410.12346 null
2024-10-16 Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond Pengwei Liang et.al. 2410.12274 null
2024-10-15 Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos Zhouxia Wang et.al. 2410.11828 null
2024-10-15 SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing Zhiyuan Zhang et.al. 2410.11815 null
2024-10-15 RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation Anton Antonov et.al. 2410.11722 link
2024-10-15 Augmentation-Driven Metric for Balancing Preservation and Modification in Text-Guided Image Editing Yoonjeon Kim et.al. 2410.11374 null
2024-10-14 Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits Xuhui Kang et.al. 2410.11013 null
2024-10-14 Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Litu Rout et.al. 2410.10792 null
2024-10-14 Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing Kejie Wang et.al. 2410.10496 link
2024-10-13 TextMaster: Universal Controllable Text Edit Aoqiang Wang et.al. 2410.09879 null
2024-10-13 LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond Md Tanvir Islam et.al. 2410.09831 link
2024-10-14 LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection Mingjia Li et.al. 2410.08810 link
2024-10-11 Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers Jin Cao et.al. 2410.08688 link
2024-10-11 Natural Language Induced Adversarial Images Xiaopei Zhu et.al. 2410.08620 link
2024-10-10 TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration Hsing-Hua Wang et.al. 2410.08177 link
2024-10-10 RNA: Video Editing with ROI-based Neural Atlas Jaekyeong Lee et.al. 2410.07600 null
2024-10-09 BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models Fangyikang Wang et.al. 2410.07273 null
2024-10-09 InstantIR: Blind Image Restoration with Instant Generative Reference Jen-Yuan Huang et.al. 2410.06551 null
2024-10-08 PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM Stefan Stefanache et.al. 2410.05710 link
2024-10-08 DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing June Suk Choi et.al. 2410.05694 link
2024-10-08 ReFIR: Grounding Large Restoration Models with Retrieval Augmentation Hang Guo et.al. 2410.05601 link
2024-10-07 GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting Yukang Cao et.al. 2410.05259 null
2024-10-07 PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing Feng Tian et.al. 2410.04844 link
2024-10-07 Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration Zhiyu Zhu et.al. 2410.04811 link
2024-10-06 Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli Valentyn Piskovskyi et.al. 2410.04497 null
2024-10-06 SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems Ismail Alkhouri et.al. 2410.04479 link
2024-10-08 IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis Shitong Shao et.al. 2410.04171 link
2024-10-05 Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model Keda Tao et.al. 2410.04161 null
2024-10-04 Diffusion State-Guided Projected Gradient for Inverse Problems Rayhan Zirvi et.al. 2410.03463 link
2024-10-04 Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing Ziqi Jiang et.al. 2410.03097 null
2024-10-03 PnP-Flow: Plug-and-Play Image Restoration with Flow Matching Ségolène Martin et.al. 2410.02423 link
2024-10-03 Can Capacitive Touch Images Enhance Mobile Keyboard Decoding? Piyawat Lertvittayakumjorn et.al. 2410.02264 link
2024-10-02 Posterior sampling via Langevin dynamics based on generative priors Vishal Purohit et.al. 2410.02078 null
2024-10-02 Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust Asher J. Hancock et.al. 2410.01971 null
2024-10-02 MiraGe: Editable 2D Images using Gaussian Splatting Joanna Waczyńska et.al. 2410.01521 link
2024-10-01 Three-Operator Splitting Method with Two-Step Inertial Extrapolation Olaniyi S. Iyiola et.al. 2410.01099 null
2024-10-01 GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer Youngho Yoon et.al. 2410.00672 link
2024-10-01 Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation Yunnan Wang et.al. 2410.00447 null
2024-10-01 Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration Guy Ohayon et.al. 2410.00418 link
2024-10-01 GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction Zaid Ilyas et.al. 2410.00380 null
2024-09-30 A Survey on Diffusion Models for Inverse Problems Giannis Daras et.al. 2410.00083 null
2024-09-30 FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing Lingling Cai et.al. 2409.20500 null
2024-09-30 UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation Cheng Zhang et.al. 2409.20197 link
2024-09-29 Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation Xiaofeng Cong et.al. 2409.19685 link
2024-09-28 Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration Chu-Jie Qin et.al. 2409.19403 link
2024-09-28 PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution Song Zhang et.al. 2409.19269 link
2024-09-27 Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors Yunlong Lin et.al. 2409.18899 null
2024-09-27 Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models Nguyen Gia Bach et.al. 2409.18476 link
2024-09-27 SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement Yunkui Pang et.al. 2409.18355 link
2024-09-26 Toward Efficient Deep Blind RAW Image Restoration Marcos V. Conde et.al. 2409.18204 link
2024-09-26 FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner Wenliang Zhao et.al. 2409.18128 link
2024-09-26 FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction Runze He et.al. 2409.18071 null
2024-09-26 Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs Qinpeng Cui et.al. 2409.17778 link
2024-09-26 MIO: A Foundation Model on Multimodal Tokens Zekun Wang et.al. 2409.17692 link
2024-09-26 Learning Quantized Adaptive Conditions for Diffusion Models Yuchen Liang et.al. 2409.17487 null
2024-09-25 Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement Yihao Zhou et.al. 2409.16661 null
2024-09-25 Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement Guanlin Li et.al. 2409.16604 link
2024-09-24 Proactive Schemes: A Survey of Adversarial Attacks for Social Good Vishal Asnani et.al. 2409.16491 null
2024-09-24 Liger at W.M. Keck Observatory: imager structural analysis, fabrication, and characterization plan James Wiley et.al. 2409.16263 null
2024-09-23 PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Weifeng Lin et.al. 2409.15278 link
2024-09-23 LoVA: Long-form Video-to-Audio Generation Xin Cheng et.al. 2409.15157 null
2024-09-23 Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP Zeliang Zhang et.al. 2409.15035 null
2024-09-23 ControlEdit: A MultiModal Local Clothing Image Editing Method Di Cheng et.al. 2409.14720 link
2024-09-23 Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections Ankit Dhiman et.al. 2409.14677 link
2024-09-22 Low-Light Enhancement Effect on Classification and Detection: An Empirical Study Xu Wu et.al. 2409.14461 null
2024-09-22 Quantitative and Qualitative Evaluation of NLM and Wavelet Methods in Image Enhancement Cameron Khanpour et.al. 2409.14334 null
2024-09-20 Colorful Diffuse Intrinsic Image Decomposition in the Wild Chris Careaga et.al. 2409.13690 link
2024-09-20 A Bottom-Up Approach to Class-Agnostic Image Segmentation Sebastian Dille et.al. 2409.13687 null
2024-09-18 Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution Peng Wang et.al. 2409.12191 link
2024-09-18 Denoising diffusion models for high-resolution microscopy image restoration Pamela Osuna-Vargas et.al. 2409.12078 null
2024-09-18 InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models Yan Zheng et.al. 2409.11734 null
2024-09-17 Ultrasound Image Enhancement with the Variance of Diffusion Models Yuxin Zhang et.al. 2409.11380 link
2024-09-17 OmniGen: Unified Image Generation Shitao Xiao et.al. 2409.11340 link
2024-09-17 MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance Debin Meng et.al. 2409.11010 link
2024-09-17 CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement Xuanzhao Dong et.al. 2409.10966 link
2024-09-16 SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing Qi Qian et.al. 2409.10476 null
2024-09-16 Taming Diffusion Models for Image Restoration: A Review Ziwei Luo et.al. 2409.10353 null
2024-09-15 Underwater Image Enhancement via Dehazing and Color Restoration Chengqin Wu et.al. 2409.09779 null
2024-09-15 EditBoard: Towards A Comprehensive Evaluation Benchmark for Text-based Video Editing Models Yupeng Chen et.al. 2409.09668 link
2024-09-15 TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfer Zihan Su et.al. 2409.09610 link
2024-09-13 InstantDrag: Improving Interactivity in Drag-based Image Editing Joonghyuk Shin et.al. 2409.08857 null
2024-09-13 Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori Jinhong He et.al. 2409.08585 null
2024-09-12 Click2Mask: Local Editing with Dynamic Mask Generation Omer Regev et.al. 2409.08272 link
2024-09-12 Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement Vamsi Krishna Vasa et.al. 2409.07862 null
2024-09-12 Quaternion Nuclear Norm minus Frobenius Norm Minimization for color image reconstruction Yu Guo et.al. 2409.07797 null
2024-09-11 FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process Yang Luo et.al. 2409.07451 null
2024-09-11 Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement Xianmin Chen et.al. 2409.07040 link
2024-09-11 PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening RuoCheng Wu et.al. 2409.06980 null
2024-09-10 Face Mask Removal with Region-attentive Face Inpainting Minmin Yang et.al. 2409.06845 link
2024-09-10 Modeling Image Tone Dichotomy with the Power Function Axel Martinez et.al. 2409.06764 null
2024-09-10 GeoCalib: Learning Single-image Calibration with Geometric Optimization Alexander Veicht et.al. 2409.06704 link
2024-09-10 Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer Li Ke et.al. 2409.06590 null
2024-09-09 NeIn: Telling What You Don’t Want Nhat-Tan Bui et.al. 2409.06481 null
2024-09-10 Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models Siyu Zhai et.al. 2409.06420 null
2024-09-10 Multi-Weather Image Restoration via Histogram-Based Transformer Feature Enhancement Yang Wen et.al. 2409.06334 null
2024-09-10 AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration Hongyi Cai et.al. 2409.06206 null
2024-09-09 MemoVis: A GenAI-Powered Tool for Creating Companion Reference Images for 3D Design Feedback Chen Chen et.al. 2409.06082 null
2024-09-09 Rethinking the Atmospheric Scattering-driven Attention via Channel and Gamma Correction Priors for Low-Light Image Enhancement Shyang-En Weng et.al. 2409.05274 link
2024-09-07 Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation Jiaxin Cheng et.al. 2409.04847 link
2024-09-07 Power Line Aerial Image Restoration under dverse Weather: Datasets and Baselines Sai Yang et.al. 2409.04812 link
2024-09-06 Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior Charlesquin Kemajou Mbakam et.al. 2409.04384 null
2024-09-06 RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement Hao Luo et.al. 2409.04363 link
2024-09-05 Blended Latent Diffusion under Attention Control for Real-World Video Editing Deyin Liu et.al. 2409.03514 null
2024-09-05 Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration Pei Wang et.al. 2409.03455 null
2024-09-05 KAN See In the Dark Aoxiang Ning et.al. 2409.03404 link
2024-09-05 Multiple weather images restoration using the task transformer and adaptive mixup strategy Yang Wen et.al. 2409.03249 null
2024-09-05 Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem Qiwen Zhu et.al. 2409.03179 link
2024-09-04 Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering Peng Wang et.al. 2409.02426 link
2024-09-04 Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing Siyi Chen et.al. 2409.02374 link
2024-09-03 Unveiling Deep Shadows: A Survey on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning Xiaowei Hu et.al. 2409.02108 link
2024-09-03 Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models Jiaqi Xu et.al. 2409.02101 link
2024-09-03 F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring Subhajit Paul et.al. 2409.02056 null
2024-09-03 AllWeatherNet:Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions Chenghao Qian et.al. 2409.02045 link
2024-09-03 Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement Kun Zhou et.al. 2409.01641 link
2024-09-03 GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting Zixuan Guo et.al. 2409.01581 null
2024-09-02 Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets Ishan Rajendrakumar Dave et.al. 2409.01445 null
2024-09-02 Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Vadim Titov et.al. 2409.01322 link
2024-08-30 Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method Yuji Lin et.al. 2408.17339 link
2024-08-30 Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL Haiyang Zhao et.al. 2408.17060 null
2024-08-29 GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content Lebin Zhou et.al. 2408.16866 null
2024-09-02 A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising Shuaiyu Yuan et.al. 2408.16481 null
2024-08-29 What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer Chaeyeon Chung et.al. 2408.16450 link
2024-08-29 Enhanced Control for Diffusion Bridge in Image Restoration Conghan Yue et.al. 2408.16303 link
2024-08-29 EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More Kanghao Chen et.al. 2408.16254 null
2024-08-29 LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement Ye Yu et.al. 2408.16235 link
2024-08-28 Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration Xu Zhang et.al. 2408.15994 null
2024-08-27 A Preliminary Exploration Towards General Image Restoration Xiangtao Kong et.al. 2408.15143 null
2024-08-27 Towards Real-world Event-guided Low-light Video Enhancement and Deblurring Taewoo Kim et.al. 2408.14916 link
2024-08-26 GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy Peiyan Li et.al. 2408.14368 link
2024-08-26 I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing Yiwei Ma et.al. 2408.14180 link
2024-08-26 Image Provenance Analysis via Graph Encoding with Vision Transformer Keyang Zhang et.al. 2408.14170 null
2024-08-27 Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing Yitong Yang et.al. 2408.13623 null
2024-08-24 CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track Jinming Chai et.al. 2408.13582 null
2024-08-23 Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing Zitao Shuai et.al. 2408.13335 null
2024-08-23 O-Mamba: O-shape State-Space Model for Underwater Image Enhancement Chenyu Dong et.al. 2408.12816 link
2024-08-22 FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing Jue Wang et.al. 2408.12429 link
2024-08-22 CODE: Confident Ordinary Differential Editing Bastien van Delft et.al. 2408.12418 link
2024-08-22 Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement Lingyu Zhu et.al. 2408.12316 link
2024-08-21 Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models Chun-Yen Shih et.al. 2408.11810 null
2024-08-23 AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion Yunfang Niu et.al. 2408.11553 link
2024-08-21 E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment Shangkun Sun et.al. 2408.11481 link
2024-08-21 OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal Qiao Mo et.al. 2408.11480 link
2024-08-21 Taming Generative Diffusion for Universal Blind Image Restoration Siwei Tu et.al. 2408.11287 null
2024-08-20 Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement Satoshi Kosugi et.al. 2408.11055 link
2024-08-20 Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos Dennis Fedorishin et.al. 2408.10998 null
2024-08-20 SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement Linlin Hu et.al. 2408.10934 null
2024-08-20 A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse Zhongliang Guo et.al. 2408.10901 null
2024-08-20 UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement Yingtie Lei et.al. 2408.10653 link
2024-08-19 Multi-Scale Representation Learning for Image Restoration with State-Space Model Yuhong He et.al. 2408.10145 null
2024-08-19 ARMADA: Attribute-Based Multimodal Data Augmentation Xiaomeng Jin et.al. 2408.10086 null
2024-08-19 Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration Alik Pramanick et.al. 2408.09912 link
2024-08-19 ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement Eashan Adhikarla et.al. 2408.09650 link
2024-08-17 Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration Xin Lin et.al. 2408.09241 link
2024-08-16 Language-Driven Interactive Shadow Detection Hongqiu Wang et.al. 2408.08543 link
2024-08-16 Achieving Complex Image Edits via Function Aggregation with Diffusion Models Mohammadreza Samadi et.al. 2408.08495 null
2024-08-16 DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer’s Case Studies Mohammad Hossein Najafi et.al. 2408.08489 null
2024-08-14 TurboEdit: Instant text-based image editing Zongze Wu et.al. 2408.08332 null
2024-08-15 Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks Jiawei Wu et.al. 2408.08149 link
2024-08-15 HAIR: Hypernetworks-based All-in-One Image Restoration Jin Cao et.al. 2408.08091 link
2024-08-14 DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency Xiaojing Zhong et.al. 2408.07481 null
2024-08-14 GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval Zechen Bai et.al. 2408.07249 null
2024-08-13 Review Learning: Advancing All-in-One Ultra-High-Definition Image Restoration Training Method Xin Su et.al. 2408.06709 null
2024-08-13 EditScribe: Non-Visual Image Editing with Natural Language Verification Loops Ruei-Che Chang et.al. 2408.06632 null
2024-08-12 Wavelet based inpainting detection Barglazan Adrian-Alin et.al. 2408.06429 null
2024-08-12 Latent Disentanglement for Low Light Image Enhancement Zhihao Zheng et.al. 2408.06245 null
2024-08-10 Greedy randomized block Kaczmarz method for matrix equation AXB=C and its applications in color image restoration Wenli Wang et.al. 2408.05444 null
2024-08-09 Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models Qirui Jiao et.al. 2408.04594 link
2024-08-08 Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration Ziran Zhang et.al. 2408.04227 null
2024-08-08 MultiColor: Image Colorization by Learning from Multiple Color Spaces Xiangcheng Du et.al. 2408.04172 null
2024-08-06 FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning Zhi Chen et.al. 2408.03355 null
2024-08-05 Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models Tongtong Feng et.al. 2408.02408 null
2024-08-05 Dense Feature Interaction Network for Image Inpainting Localization Ye Yao et.al. 2408.02191 null
2024-08-03 SAT3D: Image-driven Semantic Attribute Transfer in 3D Zhijun Zhai et.al. 2408.01664 null
2024-08-02 Multi-task SAR Image Processing via GAN-based Unsupervised Manipulation Xuran Hu et.al. 2408.01553 null
2024-08-02 Underwater Object Detection Enhancement via Channel Stabilization Muhammad Ali et.al. 2408.01293 link
2024-08-02 Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement Wenbin Zou et.al. 2408.01276 link
2024-08-02 Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration Donwon Park et.al. 2408.01099 null
2024-08-01 TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models Gilad Deutch et.al. 2408.00735 null
2024-08-01 A Prior Embedding-Driven Architecture for Long Distance Blind Iris Recognition Qi Xiong et.al. 2408.00210 null
2024-07-31 Hyper-parameter tuning for text guided image editing Shiwen Zhang et.al. 2407.21703 link
2024-07-31 Fine-gained Zero-shot Video Sampling Dengsheng Chen et.al. 2407.21475 null
2024-07-31 Generalized Tampered Scene Text Detection in the era of Generative AI Chenfan Qu et.al. 2407.21422 link
2024-07-30 UniProcessor: A Text-induced Unified Low-level Image Processor Huiyu Duan et.al. 2407.20928 link
2024-07-27 Inverse Problems with Diffusion Models: A MAP Estimation Perspective Sai bharath chandra Gutha et.al. 2407.20784 link
2024-07-29 Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing Ekaterina Iakovleva et.al. 2407.20232 null
2024-07-29 FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Yu Lu et.al. 2407.19918 null
2024-07-29 ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement Ezequiel Perez-Zarate et.al. 2407.19708 link
2024-07-31 Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint Song Zhang et.al. 2407.19248 null
2024-07-27 Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration Xiaoyan Yu et.al. 2407.19139 link
2024-07-26 Floating No More: Object-Ground Reconstruction from a Single Image Yunze Man et.al. 2407.18914 null
2024-07-26 PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis Sohyeong Kim et.al. 2407.18695 null
2024-07-26 Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner Pengxiang Cai et.al. 2407.18656 null
2024-07-26 LookupForensics: A Large-Scale Multi-Task Dataset for Multi-Phase Image-Based Fact Verification Shuhan Cui et.al. 2407.18614 null
2024-07-26 Dilated Strip Attention Network for Image Restoration Fangwei Hao et.al. 2407.18613 null
2024-07-25 RegionDrag: Fast Region-Based Image Editing with Diffusion Models Jingyi Lu et.al. 2407.18247 null
2024-07-25 RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models Haoyu Chen et.al. 2407.18035 null
2024-07-25 Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography Kailai Zhou et.al. 2407.17996 link
2024-07-25 FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing Gwanhyeong Koo et.al. 2407.17850 link
2024-07-25 Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing Pengfei Jiang et.al. 2407.17847 link
2024-07-25 DragText: Rethinking Text Embedding in Point-based Image Editing Gayoon Choi et.al. 2407.17843 link
2024-07-23 S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks Neha A S et.al. 2407.17587 null
2024-07-23 PyBench: Evaluating LLM Agent on various real-world coding tasks Yaolun Zhang et.al. 2407.16732 link
2024-07-23 DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors Zizheng Yan et.al. 2407.16260 null
2024-07-23 CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction Liang Zhao et.al. 2407.16204 null
2024-07-23 Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems Sojin Lee et.al. 2407.16125 link
2024-07-21 MedEdit: Counterfactual Diffusion-based Image Editing on Brain MRI Malek Ben Alaya et.al. 2407.15270 null
2024-07-21 Assessing Sample Quality via the Latent Space of Generative Models Jingyi Xu et.al. 2407.15171 link
2024-07-20 Deep Learning CT Image Restoration using System Blur and Noise Models Yijie Yuan et.al. 2407.14983 null
2024-07-23 AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement Yunlong Lin et.al. 2407.14900 null
2024-07-20 Dual High-Order Total Variation Model for Underwater Image Restoration Yuemei Li et.al. 2407.14868 link
2024-07-20 Text-based Talking Video Editing with Cascaded Conditional Diffusion Bo Han et.al. 2407.14841 null
2024-07-19 Adaptive Frequency Enhancement Network for Single Image Deraining Fei Yan et.al. 2407.14292 link
2024-07-18 BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models Moon Ye-Bin et.al. 2407.13442 null
2024-07-18 Any Image Restoration with Efficient Automatic Degradation Adaptation Bin Ren et.al. 2407.13372 link
2024-07-18 Multi-sentence Video Grounding for Long Video Generation Wei Feng et.al. 2407.13219 null
2024-07-19 Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking Zhiyuan Ma et.al. 2407.13188 null
2024-07-18 Training-Free Large Model Priors for Multiple-in-One Image Restoration Xuanhua He et.al. 2407.13181 null
2024-07-18 Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement Eashan Adhikarla et.al. 2407.13170 null
2024-07-18 Image Inpainting Models are Effective Tools for Instruction-guided Image Editing Xuan Ju et.al. 2407.13139 null
2024-07-21 HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration Shuchang Zhang et.al. 2407.13120 link
2024-07-17 Zero-shot Text-guided Infinite Image Synthesis with LLM guidance Soyeong Kwon et.al. 2407.12642 null
2024-07-17 Rethinking the Architecture Design for Efficient Generic Event Boundary Detection Ziwei Zheng et.al. 2407.12622 link
2024-07-17 Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations Tomáš Chobola et.al. 2407.12511 link
2024-07-17 GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval Han Zhou et.al. 2407.12431 link
2024-07-17 Sphere Window: Challenges and Opportunities of 360° Video in Collaborative Design Workshops Wo Meijer et.al. 2407.12407 null
2024-07-17 GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity Shuo Cao et.al. 2407.12273 null
2024-07-16 Efficient Training with Denoised Neural Weights Yifan Gong et.al. 2407.11966 null
2024-07-16 TGIF: Text-Guided Inpainting Forgery Dataset Hannes Mareen et.al. 2407.11566 link
2024-07-16 Haze-Aware Attention Network for Single-Image Dehazing Lihan Tong et.al. 2407.11505 null
2024-07-14 Restore-RWKV: Efficient and Effective Medical Image Restoration with RWKV Zhiwen Yang et.al. 2407.11087 link
2024-07-15 InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models Nirat Saini et.al. 2407.10958 null
2024-07-15 In-Loop Filtering via Trained Look-Up Tables Zhuoyuan Li et.al. 2407.10926 null
2024-07-15 MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration Yulin Ren et.al. 2407.10833 null
2024-07-15 Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval Youngsun Lim et.al. 2407.10683 null
2024-07-14 Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models Qinyu Yang et.al. 2407.10285 link
2024-07-14 Restoring Images in Adverse Weather Conditions via Histogram Transformer Shangquan Sun et.al. 2407.10172 link
2024-07-13 NamedCurves: Learned Image Enhancement via Color Naming David Serrano-Lozano et.al. 2407.09892 link
2024-07-12 Region Attention Transformer for Medical Image Restoration Zhiwen Yang et.al. 2407.09268 link
2024-07-12 Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration Hu Gao et.al. 2407.08950 link
2024-07-12 LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models Hai Jiang et.al. 2407.08939 link
2024-07-11 Single-Image Shadow Removal Using Deep Learning: A Comprehensive Survey Laniqng Guo et.al. 2407.08865 link
2024-07-11 Haar Nuclear Norms with Applications to Remote Sensing Imagery Restoration Shuang Xu et.al. 2407.08509 null
2024-07-11 ERD: Exponential Retinex decomposition based on weak space and hybrid nonconvex regularization and its denoising application Wenjing Lu et.al. 2407.08498 null
2024-07-12 Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending Delong Wu et.al. 2407.08457 null
2024-07-10 Generative Image as Action Models Mohit Shridhar et.al. 2407.07875 link
2024-07-10 Aging-Resistant Wideband Precoding in 5G and Beyond Using 3D Convolutional Neural Networks Alejandro Villena-Rodriguez et.al. 2407.07434 null
2024-07-10 CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement Wei Wang et.al. 2407.07056 null
2024-07-10 Asymmetric Mask Scheme for Self-Supervised Real Image Denoising Xiangyu Liao et.al. 2407.06514 link
2024-07-08 Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN Jiacheng Su et.al. 2407.05577 null
2024-07-07 Image-Conditional Diffusion Transformer for Underwater Image Enhancement Xingyang Nie et.al. 2407.05389 null
2024-07-07 UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Haozhe Zhao et.al. 2407.05282 link
2024-07-07 Multi-scale Conditional Generative Modeling for Microscopic Image Restoration Luzhe Huang et.al. 2407.05259 null
2024-07-06 Robust Skin Color Driven Privacy Preserving Face Recognition via Function Secret Sharing Dong Han et.al. 2407.05045 null
2024-07-05 On a nonlinear nonlocal reaction-diffusion system applied to image restoration Yuhang Li et.al. 2407.04347 null
2024-07-05 A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation Dazhao Du et.al. 2407.04230 link
2024-07-04 DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts Zheng-Peng Duan et.al. 2407.03757 null
2024-07-04 Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration Yuhong Zhang et.al. 2407.03636 null
2024-07-04 MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration Yuhong Zhang et.al. 2407.03635 null
2024-07-03 BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement Ruirui Lin et.al. 2407.03535 null
2024-07-03 Learning Action and Reasoning-Centric Image Editing from Videos and Simulations Benno Krojer et.al. 2407.03471 link
2024-07-02 Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model Cong Cao et.al. 2407.01960 null
2024-06-30 Learning Frequency-Aware Dynamic Transformers for All-In-One Image Restoration Zenglin Shi et.al. 2407.01636 null
2024-07-01 Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing Bingliang Zhang et.al. 2407.01521 link
2024-07-01 DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models Chang-Han Yeh et.al. 2407.01519 link
2024-07-01 Unrolling Plug-and-Play Gradient Graph Laplacian Regularizer for Image Restoration Jianghe Cai et.al. 2407.01469 null
2024-07-01 Blind Inversion using Latent Diffusion Priors Weimin Bai et.al. 2407.01027 null
2024-06-30 Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation Yuchuan Tian et.al. 2407.00676 link
2024-06-28 Transformer-based Image and Video Inpainting: Current Challenges and Future Directions Omar Elharrouss et.al. 2407.00226 null
2024-06-28 Network Bending of Diffusion Models for Audio-Visual Generation Luke Dzwonczyk et.al. 2406.19589 link
2024-06-27 BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection Yang Song et.al. 2406.19048 null
2024-06-27 Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model Jiangtong Tan et.al. 2406.19030 link
2024-06-26 IDA-UIE: An Iterative Framework for Deep Network-based Degradation Aware Underwater Image Enhancement Pranjali Singh et.al. 2406.18628 null
2024-06-26 Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration Kang Liao et.al. 2406.18516 link
2024-06-26 ConStyle v2: A Strong Prompter for All-in-One Image Restoration Dongqi Fan et.al. 2406.18242 link
2024-06-26 MFDNet: Multi-Frequency Deflare Network for Efficient Nighttime Flare Removal Yiguo Jiang et.al. 2406.18079 link
2024-06-25 LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing Aoyang Liu et.al. 2406.17236 null
2024-06-24 GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization Yirui Chen et.al. 2406.16531 link
2024-06-24 DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution Aiwen Jiang et.al. 2406.16477 link
2024-06-22 Quality-guided Skin Tone Enhancement for Portrait Photography Shiqi Gao et.al. 2406.15848 null
2024-06-22 MVOC: a training-free multiple video object composition method with diffusion models Wei Wang et.al. 2406.15829 link
2024-06-21 LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement Haodong Yang et.al. 2406.14973 null
2024-06-20 A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models Xincheng Shuai et.al. 2406.14555 link
2024-06-26 Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps Nikita Starodubcev et.al. 2406.14539 null
2024-06-20 V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data Rotem Shalev-Arkushin et.al. 2406.14510 null
2024-06-19 EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy Long Bai et.al. 2406.13705 link
2024-06-22 Ultra-High-Definition Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution Liyan Wang et.al. 2406.13607 link
2024-06-19 WaterMono: Teacher-Guided Anomaly Masking and Enhancement Boosting for Robust Underwater Self-Supervised Monocular Depth Estimation Yilin Ding et.al. 2406.13344 link
2024-06-19 ECAFormer: Low-light Image Enhancement using Cross Attention Yudi Ruan et.al. 2406.13281 link
2024-06-19 Diffusion Model-based FOD Restoration from High Distortion in dMRI Shuo Huang et.al. 2406.13209 null
2024-06-18 VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing Jing Gu et.al. 2406.12831 null
2024-06-18 Restorer: Solving Multiple Image Restoration Tasks with One Set of Parameters Jiawei Mao et.al. 2406.12587 link
2024-06-17 Generative Visual Instruction Tuning Jefferson Hernandez et.al. 2406.11262 link
2024-06-16 Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields Yixiong Yang et.al. 2406.11077 null
2024-06-16 Enhancing Supermarket Robot Interaction: A Multi-Level LLM Conversational Interface for Handling Diverse Customer Intents Chandran Nandkumar et.al. 2406.11047 null
2024-06-15 Fast Unsupervised Tensor Restoration via Low-rank Deconvolution David Reixach et.al. 2406.10679 null
2024-06-15 The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Denis Bobkov et.al. 2406.10601 link
2024-06-14 VideoGUI: A Benchmark for GUI Automation from Instructional Videos Kevin Qinghong Lin et.al. 2406.10227 null
2024-06-14 InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning Tiancheng Li et.al. 2406.09973 null
2024-06-14 RSEND: Retinex-based Squeeze and Excitation Network with Dark Region Detection for Efficient Low Light Image Enhancement Jingcheng Li et.al. 2406.09656 null
2024-06-13 DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer Wei-Ting Chen et.al. 2406.09622 null
2024-06-13 Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion Linzhan Mou et.al. 2406.09402 null
2024-06-13 Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior Baiang Li et.al. 2406.09389 link
2024-06-13 CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models Yigit Ekin et.al. 2406.09368 link
2024-06-13 Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation Yufan Zhou et.al. 2406.09305 null
2024-06-13 Preserving Identity with Variational Score for General-purpose 3D Editing Duong H. Le et.al. 2406.08953 null
2024-06-13 Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation Jingyuan Xia et.al. 2406.08896 link
2024-06-13 COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing Jiangshan Wang et.al. 2406.08850 link
2024-06-12 LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach Maria Pilligua et.al. 2406.08610 link
2024-06-12 PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement Wei-Tung Lin et.al. 2406.08444 link
2024-06-12 DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor Juncheng Wu et.al. 2406.08377 link
2024-06-12 2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation Zhensong Xu et.al. 2406.08192 null
2024-06-12 One-Step Effective Diffusion Network for Real-World Image Super-Resolution Rongyuan Wu et.al. 2406.08177 link
2024-06-12 From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization Ziran Zhang et.al. 2406.08090 null
2024-06-12 CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models Hyungjin Chung et.al. 2406.08070 null
2024-06-12 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement Sasidhar Alavala et.al. 2406.08048 null
2024-06-12 DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera Senyan Xu et.al. 2406.07951 link
2024-06-11 HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness Zihui Xue et.al. 2406.07754 null
2024-06-11 Zero-shot Image Editing with Reference Imitation Xi Chen et.al. 2406.07547 null
2024-06-11 Beware of Aliases – Signal Preservation is Crucial for Robust Image Restoration Shashank Agnihotri et.al. 2406.07435 null
2024-06-11 Missingness-resilient Video-enhanced Multimodal Disfluency Detection Payal Mohapatra et.al. 2406.06964 link
2024-06-11 Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems Jiawei Zhang et.al. 2406.06959 link
2024-06-10 NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Ting-Hsuan Chen et.al. 2406.06523 link
2024-06-10 FRAG: Frequency Adapting Group for Diffusion Video Editing Sunjae Yoon et.al. 2406.06044 link
2024-06-09 PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction Shangyu Chen et.al. 2406.05641 null
2024-06-08 Training-Free Robust Interactive Video Object Segmentation Xiaoli Wei et.al. 2406.05485 null
2024-06-07 Optimal Eye Surgeon: Finding Image Priors through Sparse Generators at Initialization Avrajit Ghosh et.al. 2406.05288 link
2024-06-07 Research on Tumors Segmentation based on Image Enhancement Method Danyi Huang et.al. 2406.05170 null
2024-06-10 GenHeld: Generating and Editing Handheld Objects Chaerin Min et.al. 2406.05059 link
2024-06-07 Zero-Shot Video Editing through Adaptive Sliding Score Distillation Lianghan Zhu et.al. 2406.04888 null
2024-06-07 Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior Tanvir Mahmud et.al. 2406.04873 link
2024-06-06 GenAI Arena: An Open Evaluation Platform for Generative Models Dongfu Jiang et.al. 2406.04485 null
2024-06-06 Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning Amandeep Kumar et.al. 2406.04413 link
2024-06-06 Diffusion-based image inpainting with internal learning Nicolas Cherel et.al. 2406.04206 link
2024-06-06 LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression Junhui Li et.al. 2406.03961 link
2024-06-06 JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits Minzhou Pan et.al. 2406.03720 link
2024-06-06 Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting Inkyu Shin et.al. 2406.02541 null
2024-06-04 Deep Block Proximal Linearised Minimisation Algorithm for Non-convex Inverse Problems Chaoyan Huang et.al. 2406.02458 null
2024-06-03 DiffUHaul: A Training-Free Method for Object Dragging in Images Omri Avrahami et.al. 2406.01594 null
2024-06-03 CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment Daekyu Kwon et.al. 2406.01020 null
2024-06-03 MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models Mingzhen Huang et.al. 2406.00985 null
2024-06-03 Assessing the Adversarial Security of Perceptual Hashing Algorithms Jordan Madden et.al. 2406.00918 link
2024-06-02 Invisible Backdoor Attacks on Diffusion Models Sen Li et.al. 2406.00816 link
2024-06-02 Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior Yukai Shi et.al. 2406.00632 link
2024-06-02 Correlation Matching Transformation Transformers for UHD Image Restoration Cong Wang et.al. 2406.00629 link
2024-06-01 FlowIE: Efficient Image Enhancement via Rectified Flow Yixuan Zhu et.al. 2406.00508 link
2024-05-31 Learning Gaze-aware Compositional GAN Nerea Aranjuelo et.al. 2405.20643 link
2024-05-30 MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion Shuyuan Tu et.al. 2405.20325 link
2024-05-30 Sharing Key Semantics in Transformer Makes Efficient Image Restoration Bin Ren et.al. 2405.20008 link
2024-05-30 All-In-One Medical Image Restoration via Task-Adaptive Routing Zhiwen Yang et.al. 2405.19769 link
2024-05-30 Streaming Video Diffusion: Online Video Editing with Diffusion Models Feng Chen et.al. 2405.19726 link
2024-05-30 Text Guided Image Editing with Automatic Concept Locating and Forgetting Jia Li et.al. 2405.19708 null
2024-05-30 A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning Xiaofeng Cong et.al. 2405.19684 null
2024-05-30 Creating Language-driven Spatial Variations of Icon Images Xianghao Xu et.al. 2405.19636 null
2024-05-29 Blind Image Restoration via Fast Diffusion Inversion Hamadi Chihaoui et.al. 2405.19572 link
2024-05-29 VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for Agriculture Heesup Yun et.al. 2405.19413 null
2024-05-28 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting Qihang Zhang et.al. 2405.18424 null
2024-05-28 RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives Jaehong Yoon et.al. 2405.18406 link
2024-05-29 Color Shift Estimation-and-Correction for Image Enhancement Yiyu Li et.al. 2405.17725 link
2024-05-27 Fast Samplers for Inverse Problems in Iterative Refinement Models Kushagra Pandey et.al. 2405.17673 link
2024-05-27 Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection Gihyun Kwon et.al. 2405.16823 null
2024-05-27 TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing Xinyu Zhang et.al. 2405.16803 null
2024-05-27 PromptFix: You Prompt and We Fix the Photo Yongsheng Yu et.al. 2405.16785 link
2024-05-26 I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models Wenqi Ouyang et.al. 2405.16537 null
2024-05-26 Looks Too Good To Be True: An Information-Theoretic Analysis of Hallucinations in Generative Restoration Models Regev Cohen et.al. 2405.16475 null
2024-05-28 Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation Jinlin Liu et.al. 2405.16393 null
2024-05-25 LEAST: “Local” text-conditioned image style transfer Silky Singh et.al. 2405.16330 link
2024-05-25 ModelLock: Locking Your Model With a Spell Yifeng Gao et.al. 2405.16285 null
2024-05-25 Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination Shelly Golan et.al. 2405.16260 link
2024-05-25 Underwater Image Enhancement by Diffusion Model with Customized CLIP-Classifier Shuaixin Liu et.al. 2405.16214 link
2024-05-24 FastDrag: Manipulate Anything in One Step Xuanjia Zhao et.al. 2405.15769 link
2024-05-24 Hierarchical Uncertainty Exploration via Feedforward Posterior Trees Elias Nehme et.al. 2405.15719 null
2024-05-24 Low-Light Video Enhancement via Spatial-Temporal Consistent Illumination and Reflection Decomposition Xiaogang Xu et.al. 2405.15660 null
2024-05-24 Efficient Degradation-aware Any Image Restoration Eduard Zamfir et.al. 2405.15475 null
2024-05-24 Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features Lichuan Ji et.al. 2405.15343 null
2024-05-24 Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion Aoxue Li et.al. 2405.15313 null
2024-05-24 Blaze3DM: Marry Triplane Representation with Diffusion for 3D Medical Inverse Problem Solving Jia He et.al. 2405.15241 null
2024-05-23 EditWorld: Simulating World Dynamics for Instruction-Following Image Editing Ling Yang et.al. 2405.14785 link
2024-05-23 TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing Teng Xu et.al. 2405.14455 null
2024-05-23 Efficient Visual State Space Model for Image Deblurring Lingshun Kong et.al. 2405.14343 link
2024-05-22 ReVideo: Remake a Video with Motion and Content Control Chong Mou et.al. 2405.13865 null
2024-05-22 Perceptual Fairness in Image Restoration Guy Ohayon et.al. 2405.13805 null
2024-05-22 Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data Tarun Kalluri et.al. 2405.13779 null
2024-05-22 InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos Yujun Shi et.al. 2405.13722 link
2024-05-21 DARK: Denoising, Amplification, Restoration Kit Zhuoheng Li et.al. 2405.12891 link
2024-05-21 Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image Zerui Zhang et.al. 2405.12872 link
2024-05-21 EmoEdit: Evoking Emotions through Image Manipulation Jingyuan Yang et.al. 2405.12661 null
2024-05-21 Customize Your Own Paired Data via Few-shot Way Jinshu Chen et.al. 2405.12490 null
2024-05-20 Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices Nathaniel Cohen et.al. 2405.12211 link
2024-05-20 A New Cross-Space Total Variation Regularization Model for Color Image Restoration with Quaternion Blur Operator Zhigang Jia et.al. 2405.12114 null
2024-05-19 Verification technology for finger vein biometric George Kumi Kyeremeh et.al. 2405.11540 null
2024-05-19 Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement Igor Morawski et.al. 2405.11478 null
2024-05-19 Emphasizing Crucial Features for Efficient Image Restoration Hu Gao et.al. 2405.11468 link
2024-05-18 ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing Ying Jin et.al. 2405.11190 null
2024-05-17 A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model Mingxiang Fu et.al. 2405.10890 null
2024-05-17 LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion Tong Chen et.al. 2405.10550 link
2024-05-16 RSDehamba: Lightweight Vision Mamba for Remote Sensing Satellite Image Dehazing Huiling Zhou et.al. 2405.10030 null
2024-05-16 NTIRE 2024 Restore Any Image Model (RAIM) in the Wild Challenge Jie Liang et.al. 2405.09923 null
2024-05-15 Inference in higher-order undirected graphical models and binary polynomial optimization Aida Khajavirad et.al. 2405.09727 null
2024-05-15 Illumination Histogram Consistency Metric for Quantitative Assessment of Video Sequences Long Chen et.al. 2405.09716 link
2024-05-15 RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content Tianhao Peng et.al. 2405.08621 null
2024-05-14 WaterMamba: Visual State Space Model for Underwater Image Enhancement Meisheng Guan et.al. 2405.08419 null
2024-05-14 Palette-based Color Transfer between Images Chenlei Lv et.al. 2405.08263 null
2024-05-13 FRRffusion: Unveiling Authenticity with Diffusion-Based Face Retouching Reversal Fengchuang Xing et.al. 2405.07582 link
2024-05-09 Diag2Diag: Multi modal super resolution for physics discovery with application to fusion Azarakhsh Jalalvand et.al. 2405.05908 null
2024-05-09 DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation Sitian Shen et.al. 2405.05800 null
2024-05-09 Exploring Text-Guided Single Image Editing for Remote Sensing Images Fangzhou Han et.al. 2405.05769 link
2024-05-09 RPBG: Towards Robust Neural Point-based Graphics in the Wild Qingtian Zhu et.al. 2405.05663 link
2024-05-08 Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection Zhaoxiang Zhang et.al. 2405.04782 null
2024-05-07 Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing Yi Zuo et.al. 2405.04496 null
2024-05-07 DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks Jiaxin Zhang et.al. 2405.04408 link
2024-05-07 SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing Yuying Ge et.al. 2405.04007 link
2024-05-06 Low-light Object Detection Pengpeng Li et.al. 2405.03519 null
2024-05-06 Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement Jiesong Bai et.al. 2405.03349 link
2024-05-06 Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance Xunchu Zhou et.al. 2405.03333 link
2024-05-05 Residual-Conditioned Optimal Transport: Towards Structure-preserving Unpaired and Paired Image Restoration Xiaole Tang et.al. 2405.02843 link
2024-05-04 Deep Image Restoration For Image Anti-Forensics Eren Tahir et.al. 2405.02751 link
2024-05-06 SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising Guanyiman Fu et.al. 2405.01726 link
2024-05-02 LocInv: Localization-aware Inversion for Text-Guided Image Editing Chuanming Tang et.al. 2405.01496 link
2024-05-01 SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models Burak Can Biner et.al. 2405.00878 null
2024-05-01 TexSliders: Diffusion-Based Texture Editing in CLIP Space Julia Guerrero-Viu et.al. 2405.00672 null
2024-05-01 Streamlining Image Editing with Layered Diffusion Brushes Peyman Gholami et.al. 2405.00313 null
2024-04-27 Remote Sensing Image Enhancement through Spatiotemporal Filtering Hessah Albanwan et.al. 2404.18950 null
2024-04-29 Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle Jungwoo Lee et.al. 2404.18395 null
2024-04-29 Reconstructing Satellites in 3D from Amateur Telescope Images Zhiming Chang et.al. 2404.18394 null
2024-04-28 Paint by Inpaint: Learning to Add Image Objects by Removing Them First Navve Wasserman et.al. 2404.18212 link
2024-04-27 DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images Maria Mihaela Trusca et.al. 2404.18020 link
2024-04-27 FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder Zheng Cheng et.al. 2404.17936 null
2024-05-02 Underwater Variable Zoom: Depth-Guided Perception Network for Underwater Image Enhancement Zhixiong Huang et.al. 2404.17883 link
2024-04-26 Inhomogeneous illuminated image enhancement under extremely low visibility condition Libang Chen et.al. 2404.17503 null
2024-04-26 Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model Zhenghong Li et.al. 2404.17484 null
2024-04-26 PromptCIR: Blind Compressed Image Restoration with Prompt Learning Bingchen Li et.al. 2404.17433 link
2024-04-26 One-Shot Image Restoration Deborah Pereg et.al. 2404.17426 null
2024-04-26 Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement Zishu Yao et.al. 2404.17400 link
2024-04-25 V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection Xuanyu Zhang et.al. 2404.16824 null
2024-04-25 NTIRE 2024 Quality Assessment of AI-Generated Content Challenge Xiaohong Liu et.al. 2404.16687 null
2024-04-25 AudioScenic: Audio-Driven Video Scene Editing Kaixin Shen et.al. 2404.16581 null
2024-04-24 Editable Image Elements for Controllable Synthesis Jiteng Mu et.al. 2404.16029 null
2024-04-26 A Survey on Visual Mamba Hanwei Zhang et.al. 2404.15956 null
2024-04-26 A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution Zhixiong Yang et.al. 2404.15620 link
2024-04-22 UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement Yaofeng Xie et.al. 2404.14542 link
2024-04-22 GeoDiffuser: Geometry-Based Image Editing with Diffusion Models Rahul Sajnani et.al. 2404.14403 null
2024-04-22 NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results Xiaoning Liu et.al. 2404.14248 link
2024-04-22 Face2Face: Label-driven Facial Retouching Restoration Guanhua Zhao et.al. 2404.14177 null
2024-04-22 Text in the Dark: Extremely Low-Light Text Image Enhancement Che-Tsung Lin et.al. 2404.14135 link
2024-04-22 CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task Kangzhen Yang et.al. 2404.14132 link
2024-04-22 MambaUIE&SR: Unraveling the Ocean’s Secrets with Only 2.8 FLOPs Zhihao Chen et.al. 2404.13884 link
2024-04-23 LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation Haoyu Zheng et.al. 2404.13558 null
2024-04-24 Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition Genggeng Chen et.al. 2404.13537 link
2024-04-20 PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition Xi Fang et.al. 2404.13299 null
2024-04-19 On-board classification of underwater images using hybrid classical-quantum CNN based method Sreeraj Rajan Warrier et.al. 2404.13130 null
2024-04-18 GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models Sai Sree Harsha et.al. 2404.12541 null
2024-04-18 AstroSat observations of interacting galaxies NGC 7469 and IC 5283 Abhinna Sundar Samantaray et.al. 2404.12527 null
2024-04-18 Lazy Diffusion Transformer for Interactive Image Editing Yotam Nitzan et.al. 2404.12382 null
2024-04-18 Customizing Text-to-Image Diffusion with Camera Viewpoint Control Nupur Kumari et.al. 2404.12333 null
2024-04-18 StyleBooth: Image Style Editing with Multimodal Instruction Zhen Han et.al. 2404.12154 link
2024-04-18 Improving the perception of visual fiducial markers in the field using Adaptive Active Exposure Control Ziang Ren et.al. 2404.12055 null
2024-04-18 FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models Wei Wu et.al. 2404.11895 link
2024-04-17 CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration Rui Deng et.al. 2404.11778 null
2024-04-17 AdaIR: Exploiting Underlying Similarities of Image Restoration Tasks with Adapters Hao-Wei Chen et.al. 2404.11475 null
2024-04-17 TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing Sherry X. Chen et.al. 2404.11120 link
2024-04-16 Improving Bracket Image Restoration and Enhancement with Flow-guided Alignment and Enhanced Feature Aggregation Wenjie Lin et.al. 2404.10358 null
2024-04-16 Referring Flexible Image Restoration Runwei Guan et.al. 2404.10342 link
2024-04-17 OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model Runyi Li et.al. 2404.10312 null
2024-04-15 Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets Dai Quoc Tran et.al. 2404.10078 link
2024-04-15 HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Mude Hui et.al. 2404.09990 null
2024-04-15 Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Han Lin et.al. 2404.09967 null
2024-04-15 The Problem Of Image Super-Resolution, Denoising And Some Image Restoration Methods In Deep Learning Models Ngoc-Giau Pham et.al. 2404.09817 null
2024-04-15 Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement Wenyi Lian et.al. 2404.09735 link
2024-04-15 Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models Ziwei Luo et.al. 2404.09732 link
2024-04-15 Real-world Instance-specific Image Goal Navigation for Service Robots: Bridging the Domain Gap with Contrastive Learning Taichi Sakaguchi et.al. 2404.09645 null
2024-04-13 BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection Jian Zhang et.al. 2404.08979 null
2024-04-13 Seeing Text in the Dark: Algorithm and Benchmark Chengpei Xu et.al. 2404.08965 null
2024-04-11 S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing Guangzhi Wang et.al. 2404.08111 null
2024-04-11 TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising Junyi Li et.al. 2404.07846 link
2024-04-11 Joint Conditional Diffusion Model for Image Restoration with Mixed Degradations Yufeng Yue et.al. 2404.07770 null
2024-04-11 Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method Tashmoy Ghosh et.al. 2404.07649 null
2024-04-10 Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models Yasi Zhang et.al. 2404.07389 null
2024-04-10 Unfolding ADMM for Enhanced Subspace Clustering of Hyperspectral Images Xianlu Li et.al. 2404.07112 link
2024-04-08 NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement Giordano Cicchetti et.al. 2404.05669 link
2024-04-08 Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models Saman Motamed et.al. 2404.05519 null
2024-04-08 Comparative Analysis of Image Enhancement Techniques for Brain Tumor Segmentation: Contrast, Histogram, and Hybrid Approaches Shoffan Saifullah et.al. 2404.05341 null
2024-04-08 CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement Xu Wu et.al. 2404.05253 null
2024-04-07 STAIC regularization for spatio-temporal image reconstruction Deepak G Skariah et.al. 2404.05070 null
2024-04-07 AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment Yuanfeng Xu et.al. 2404.04946 null
2024-04-07 ByteEdit: Boost, Comply and Accelerate Generative Image Editing Yuxi Ren et.al. 2404.04860 null
2024-04-09 Empowering Image Recovery: A Multi-Attention Approach Juan Wen et.al. 2404.04617 null
2024-04-05 ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing Alec Helbling et.al. 2404.04376 link
2024-04-05 Physics-Inspired Synthesized Underwater Image Dataset Reina Kaneko et.al. 2404.03998 link
2024-04-04 DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior Yiming Zhang et.al. 2404.03642 null
2024-04-04 Reference-Based 3D-Aware Image Editing with Triplane Bahri Batuhan Bilecen et.al. 2404.03632 null
2024-04-04 DI-Retinex: Digital-Imaging Retinex Theory for Low-Light Image Enhancement Shangquan Sun et.al. 2404.03327 null
2024-04-03 Deep Image Composition Meets Image Forgery Eren Tahir et.al. 2404.02897 link
2024-04-03 MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation Petru-Daniel Tudosiu et.al. 2404.02790 null
2024-04-02 Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration Akshay Dudhane et.al. 2404.02154 link
2024-04-02 3D Congealing: 3D-Aware Image Alignment in the Wild Yunzhi Zhang et.al. 2404.02125 null
2024-04-02 Specularity Factorization for Low-Light Enhancement Saurabh Saini et.al. 2404.01998 null
2024-04-02 Fashion Style Editing with Generative Human Prior Chaerin Kong et.al. 2404.01984 null
2024-04-03 RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement Tatiana Gaintseva et.al. 2404.01889 link
2024-04-01 An image speaks a thousand words, but can everyone listen? On translating images for cultural relevance Simran Khanuja et.al. 2404.01247 link
2024-04-01 Uncovering the Text Embedding in Text-to-Image Diffusion Models Hu Yu et.al. 2404.01154 null
2024-04-01 CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment Hyeongmin Lee et.al. 2404.01123 null
2024-04-01 Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach Guoqiang Liang et.al. 2404.00834 null
2024-03-31 GAMA-IR: Global Additive Multidimensional Averaging for Fast Image Restoration Youssef Mansour et.al. 2404.00807 null
2024-03-29 Binarized Low-light Raw Video Enhancement Gengchen Zhang et.al. 2403.19944 link
2024-03-28 GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models Yusuf Dalva et.al. 2403.19645 null
2024-03-28 Burst Super-Resolution with Diffusion Models for Improving Perceptual Quality Kyotaro Tokoro et.al. 2403.19428 link
2024-03-28 Taming Lookup Tables for Efficient Image Retouching Sidi Yang et.al. 2403.19238 link
2024-03-28 A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement Junjie Wen et.al. 2403.19079 null
2024-03-27 TextCraftor: Your Text Encoder Can be Image Quality Controller Yanyu Li et.al. 2403.18978 null
2024-03-27 ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Daniel Winter et.al. 2403.18818 null
2024-03-27 Towards Image Ambient Lighting Normalization Florin-Alexandru Vasluianu et.al. 2403.18730 link
2024-03-27 InstructBrush: Learning Attention-based Instruction Optimization for Image Editing Ruoyu Zhao et.al. 2403.18660 null
2024-03-28 FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing Trong-Tung Nguyen et.al. 2403.18605 null
2024-03-26 Bidirectional Consistency Models Liangchen Li et.al. 2403.18035 link
2024-03-26 Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models Mohammad Shahab Sepehri et.al. 2403.17902 null
2024-03-26 ExpressEdit: Video Editing with Natural Language and Sketching Bekzat Tilekbay et.al. 2403.17693 null
2024-03-26 SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder Dihan Zheng et.al. 2403.17502 link
2024-03-26 Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching Shohei Enomoto et.al. 2403.17423 null
2024-03-26 Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance Donghoon Ahn et.al. 2403.17377 link
2024-03-25 Residual Dense Swin Transformer for Continuous Depth-Independent Ultrasound Imaging Jintong Hu et.al. 2403.16384 link
2024-03-25 Distilling Semantic Priors from SAM to Efficient Image Restoration Models Quan Zhang et.al. 2403.16368 null
2024-03-24 EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing Xiangpeng Yang et.al. 2403.16111 null
2024-03-24 Edit3K: Universal Representation Learning for Video Editing Components Xin Gu et.al. 2403.16048 null
2024-03-23 Graph Image Prior for Unsupervised Dynamic MRI Reconstruction Zhongsen Li et.al. 2403.15770 link
2024-03-22 MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis Mai A. Shaaban et.al. 2403.15585 link
2024-03-22 Latent Neural Cellular Automata for Resource-Efficient Image Restoration Andrea Menta et.al. 2403.15525 null
2024-03-22 Medical Image Data Provenance for Medical Cyber-Physical System Vijay Kumar et.al. 2403.15522 null
2024-03-21 Osmosis: RGBD Diffusion Prior for Underwater Image Restoration Opher Bar Nathan et.al. 2403.14837 null
2024-03-25 Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing Alberto Baldrati et.al. 2403.14828 link
2024-03-21 StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Roberto Henschel et.al. 2403.14773 link
2024-03-22 Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion Xiang Fan et.al. 2403.14617 null
2024-03-21 AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation Yuning Cui et.al. 2403.14614 link
2024-03-21 ReNoise: Real Image Inversion Through Iterative Noising Daniel Garibi et.al. 2403.14602 null
2024-03-21 DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing Yueru Jia et.al. 2403.14487 link
2024-03-22 AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks Max Ku et.al. 2403.14468 link
2024-03-20 Step-Calibrated Diffusion for Biomedical Optical Image Restoration Yiwei Lyu et.al. 2403.13680 link
2024-03-20 Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing Hangeol Chang et.al. 2403.13551 link
2024-03-20 A multilevel framework for accelerating uSARA in radio-interferometric imaging Guillaume Lauga et.al. 2403.13385 null
2024-03-22 Mora: Enabling Generalist Video Generation via A Multi-Agent Framework Zhengqing Yuan et.al. 2403.13248 link
2024-03-19 Multispectral Image Restoration by Generalized Opponent Transformation Total Variation Zhantao Ma et.al. 2403.12770 null
2024-03-19 LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing Yazeed Alharbi et.al. 2403.12585 null
2024-03-19 Generalized Consistency Trajectory Models for Image Manipulation Beomsu Kim et.al. 2403.12510 link
2024-03-18 Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Axel Sauer et.al. 2403.12015 null
2024-03-18 DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing Hyeonho Jeong et.al. 2403.12002 null
2024-03-18 LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model Runhui Huang et.al. 2403.11929 null
2024-03-18 View-Consistent 3D Editing with Gaussian Splatting Yuxuan Wang et.al. 2403.11868 null
2024-03-18 EffiVED:Efficient Video Editing via Text-instruction Diffusion Models Zhenghao Zhang et.al. 2403.11568 link
2024-03-18 End-To-End Underwater Video Enhancement: Dataset and Model Dazhao Du et.al. 2403.11506 link
2024-03-18 Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors Ruicheng Wang et.al. 2403.11503 null
2024-03-18 CasSR: Activating Image Power for Real-World Image Super-Resolution Haolan Chen et.al. 2403.11451 null
2024-03-18 VmambaIR: Visual State Space Model for Image Restoration Yuan Shi et.al. 2403.11423 link
2024-03-18 DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation Jeongsol Kim et.al. 2403.11415 link
2024-03-18 Divide-and-Conquer Posterior Sampling for Denoising Diffusion Priors Yazid Janati et.al. 2403.11407 link
2024-03-17 Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model Dian Zheng et.al. 2403.11157 link
2024-03-17 Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models Ruibin Li et.al. 2403.11105 link
2024-03-16 A Spectrum-based Image Denoising Method with Edge Feature Enhancement Peter Luvton et.al. 2403.11036 null
2024-03-15 How Powerful Potential of Attention on Image Restoration? Cong Wang et.al. 2403.10336 null
2024-03-15 BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution Feng Li et.al. 2403.10211 link
2024-03-15 E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance Tianrui Huang et.al. 2403.10133 null
2024-03-15 PQDynamicISP: Dynamically Controlled Image Signal Processor for Any Image Sensors Pursuing Perceptual Quality Masakazu Yoshimura et.al. 2403.10091 null
2024-03-15 ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images Xiangtian Xue et.al. 2403.10004 null
2024-03-14 Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing Wonjun Kang et.al. 2403.09468 link
2024-03-14 Video Editing via Factorized Diffusion Distillation Uriel Singer et.al. 2403.09334 null
2024-03-14 D-YOLO a robust framework for object detection in adverse weather conditions Zihan Chu et.al. 2403.09233 null
2024-03-13 7T MRI Synthesization from 3T Acquisitions Qiming Cui et.al. 2403.08979 link
2024-03-13 FogGuard: guarding YOLO against fog using perceptual loss Soheil Gharatappeh et.al. 2403.08939 link
2024-03-13 DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation Minbin Huang et.al. 2403.08857 null
2024-03-13 VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Enric Corona et.al. 2403.08764 null
2024-03-13 iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer Dinh-Khoi Vo et.al. 2403.08746 link
2024-03-13 Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models trained on Corrupted Data Asad Aali et.al. 2403.08728 link
2024-03-13 Make Me Happier: Evoking Emotions Through Image Diffusion Models Qing Lin et.al. 2403.08255 null
2024-03-12 Pix2Pix-OnTheFly: Leveraging LLMs for Instruction-Guided Image Editing Rodrigo Santos et.al. 2403.08004 null
2024-03-12 Multiple Latent Space Mapping for Compressed Dark Image Enhancement Yi Zeng et.al. 2403.07622 null
2024-03-12 Imagine a dragon made of seaweed: How images enhance learning in Wikipedia Anita Silva et.al. 2403.07613 null
2024-03-12 NightHaze: Nighttime Image Dehazing via Self-Prior Learning Beibei Lin et.al. 2403.07408 null
2024-03-12 Efficient Diffusion Model for Image Restoration by Residual Shifting Zongsheng Yue et.al. 2403.07319 link
2024-03-12 Continual All-in-One Adverse Weather Removal with Knowledge Replay on a Unified Network Structure De Cheng et.al. 2403.07292 link
2024-03-11 Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions Lan Wang et.al. 2403.07198 null
2024-03-11 DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation Guosheng Zhao et.al. 2403.06845 null
2024-03-11 Boosting Image Restoration via Priors from Pre-trained Models Xiaogang Xu et.al. 2403.06793 null
2024-03-11 Comparison of No-Reference Image Quality Models via MAP Estimation in Diffusion Latents Weixia Zhang et.al. 2403.06406 null
2024-03-10 FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing Youyuan Zhang et.al. 2403.06269 null
2024-03-10 Textureless Object Recognition: An Edge-based Approach Frincy Clement et.al. 2403.06107 null
2024-03-10 Universal Debiased Editing for Fair Medical Image Classification Ruinan Jin et.al. 2403.06104 link
2024-03-10 Reframe Anything: LLM Agent for Open World Video Reframing Jiawang Cao et.al. 2403.06070 null
2024-03-10 Implicit Image-to-Image Schrodinger Bridge for CT Super-Resolution and Denoising Yuang Wang et.al. 2403.06069 link
2024-03-12 Decoupled Data Consistency with Diffusion Purification for Image Restoration Xiang Li et.al. 2403.06054 link
2024-03-09 Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration Jingyun Xue et.al. 2403.05906 null
2024-03-08 InstructGIE: Towards Generalizable Image Editing Zichong Meng et.al. 2403.05018 null
2024-03-07 An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control Aosong Feng et.al. 2403.04880 link
2024-03-07 FriendNet: Detection-Friendly Dehazing Network Yihua Fan et.al. 2403.04443 link
2024-03-07 StableDrag: Stable Dragging for Point-based Image Editing Yutao Cui et.al. 2403.04437 null
2024-03-07 Image enhancement algorithm for absorption imaging Pengcheng Zheng et.al. 2403.04240 null
2024-03-06 Low-Dose CT Image Reconstruction by Fine-Tuning a UNet Pretrained for Gaussian Denoising for the Downstream Task of Image Enhancement Tim Selig et.al. 2403.03551 null
2024-03-06 Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing Bingyan Liu et.al. 2403.03431 null
2024-03-05 Doubly Abductive Counterfactual Inference for Text-based Image Editing Xue Song et.al. 2403.02981 link
2024-03-05 Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement Jinhong He et.al. 2403.02879 null
2024-03-05 Speckle Noise Reduction in Ultrasound Images using Denoising Auto-encoder with Skip Connection Suraj Bhute et.al. 2403.02750 null
2024-03-04 A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement Ruirui Lin et.al. 2403.02408 null
2024-03-03 Learning A Physical-aware Diffusion Model Based on Transformer for Underwater Image Enhancement Chen Zhao et.al. 2403.01497 link
2024-03-02 Extrapolated Plug-and-Play Three-Operator Splitting Methods for Nonconvex Optimization with Applications to Image Restoration Zhongming Wu et.al. 2403.01144 link
2024-03-02 Edge-guided Low-light Image Enhancement with Inertial Bregman Alternating Linearized Minimization Chaoyan Huang et.al. 2403.01142 null
2024-03-01 LoMOE: Localized Multi-Object Editing via Multi-Diffusion Goirik Chakrabarty et.al. 2403.00437 null
2024-03-01 ChartReformer: Natural Language-Driven Chart Image Editing Pengyu Yan et.al. 2403.00209 link
2024-02-28 Misalignment-Robust Frequency Distribution Loss for Image Transformation Zhangkai Ni et.al. 2402.18192 link
2024-02-28 A Lightweight Low-Light Image Enhancement Network via Channel Prior and Gamma Correction Shyang-En Weng et.al. 2402.18147 null
2024-02-28 Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction Koki Maeda et.al. 2402.17969 null
2024-02-26 Randomized Algorithms for Solving Singular Value Decomposition Problems with Matlab Toolbox Xiaowen Li et.al. 2402.17794 null
2024-02-27 Diffusion Model-Based Image Editing: A Survey Yi Huang et.al. 2402.17525 link
2024-02-27 Learning Exposure Correction in Dynamic Scenes Jin Liu et.al. 2402.17296 link
2024-02-25 Diffusion Posterior Proximal Sampling for Image Restoration Hongjie Wu et.al. 2402.16907 link
2024-02-26 Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing Ling Yang et.al. 2402.16627 link
2024-02-25 ARIN: Adaptive Resampling and Instance Normalization for Robust Blind Inpainting of Dunhuang Cave Paintings Alexander Schmidt et.al. 2402.16188 null
2024-02-25 An Image Enhancement Method for Improving Small Intestinal Villi Clarity Shaojie Zhang et.al. 2402.15977 null
2024-02-24 Sandwich GAN: Image Reconstruction from Phase Mask based Anti-dazzle Imaging Xiaopeng Peng et.al. 2402.15919 null
2024-02-24 HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models Li Pang et.al. 2402.15865 link
2024-02-24 IRConStyle: Image Restoration Framework Using Contrastive Learning and Style Transfer Dongqi Fan et.al. 2402.15784 link
2024-02-23 MambaIR: A Simple Baseline for Image Restoration with State-Space Model Hang Guo et.al. 2402.15648 link
2024-02-26 LLMBind: A Unified Modality-Task Integration Framework Bin Zhu et.al. 2402.14891 link
2024-02-22 Consolidating Attention Features for Multi-view Image Editing Or Patashnik et.al. 2402.14792 null
2024-02-22 Place Anything into Any Video Ziling Liu et.al. 2402.14316 null
2024-02-21 Adversarial Purification and Fine-tuning for Robust UDC Image Restoration Zhenbo Song et.al. 2402.13629 null
2024-02-22 UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing Jianhong Bai et.al. 2402.13185 null
2024-02-21 Robust-Wide: Robust Watermarking against Instruction-driven Image Editing Runyi Hu et.al. 2402.12688 link
2024-02-19 Integrating kNN with Foundation Models for Adaptable and Privacy-Aware Image Classification Sebastian Doerrich et.al. 2402.12500 link
2024-02-19 Human Video Translation via Query Warping Haiming Zhu et.al. 2402.12099 null
2024-02-08 Text2Data: Low-Resource Data Generation with Textual Control Shiyu Wang et.al. 2402.10941 null
2024-02-15 LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing Bryan Wang et.al. 2402.10294 null
2024-02-15 Seed Optimization with Frozen Generator for Superior Zero-shot Low-light Enhancement Yuxuan Gu et.al. 2402.09694 null
2024-02-14 DestripeCycleGAN: Stripe Simulation CycleGAN for Unsupervised Infrared Image Destriping Shiqi Yang et.al. 2402.09101 null
2024-02-05 Point and Instruct: Enabling Precise Image Editing by Unifying Direct Manipulation and Text Instructions Alec Helbling et.al. 2402.07925 null
2024-02-12 Tutorial: Shaping the Spatial Correlations of Entangled Photon Pairs Patrick Cameron et.al. 2402.07667 null
2024-02-10 Gyroscope-Assisted Motion Deblurring Network Simin Luan et.al. 2402.06854 link
2024-02-08 You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement Yixu Feng et.al. 2402.05809 link
2024-02-08 Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application Bumsoo Kim et.al. 2402.05448 null
2024-02-08 Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model Junghun Cha et.al. 2402.05350 null
2024-02-07 Noise Map Guidance: Inversion with Spatial Context for Real Image Editing Hansam Cho et.al. 2402.04625 link
2024-02-07 Troublemaker Learning for Low-Light Image Enhancement Yinghao Song et.al. 2402.04584 link
2024-02-14 U-shaped Vision Mamba for Single Image Dehazing Zhuoran Zheng et.al. 2402.04139 link
2024-02-08 Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction Shijun Liang et.al. 2402.04097 null
2024-02-05 Rethinking RGB Color Representation for Image Restoration Models Jaerin Lee et.al. 2402.03399 null
2024-02-05 Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing Yan Shu et.al. 2402.03082 link
2024-02-05 InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Yiyuan Zhang et.al. 2402.03040 link
2024-02-05 Knowledge-driven deep learning for fast MR imaging: undersampled MR image reconstruction from supervised to un-supervised learning Shanshan Wang et.al. 2402.02704 null
2024-02-04 Key-Graph Transformer for Image Restoration Bin Ren et.al. 2402.02634 null
2024-02-04 DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing Chong Mou et.al. 2402.02583 link
2024-02-04 Exploring Intrinsic Properties of Medical Images for Self-Supervised Binary Semantic Segmentation Pranav Singh et.al. 2402.02367 null
2024-02-04 Video Editing for Video Retrieval Bin Zhu et.al. 2402.02335 null
2024-02-03 S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation Yurui Chen et.al. 2402.02112 null
2024-02-03 BVI-Lowlight: Fully Registered Benchmark Dataset for Low-Light Video Enhancement Nantheera Anantrasirichai et.al. 2402.01970 link
2024-02-02 LIR: Efficient Degradation Removal for Lightweight Image Restoration Dongqi Fan et.al. 2402.01368 link
2024-01-31 Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators Daniel Geng et.al. 2401.18085 null
2024-01-31 Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models Kyungsung Lee et.al. 2401.17629 null
2024-01-31 Task-Oriented Diffusion Model Compression Geonung Kim et.al. 2401.17547 null
2024-01-30 Anything in Any Scene: Photorealistic Video Object Insertion Chen Bai et.al. 2401.17509 null
2024-01-30 LATENTPATCH: A Non-Parametric Approach for Face Generation and Editing Benjamin Samuth et.al. 2401.16830 null
2024-01-31 High-Quality Image Restoration Following Human Instructions Marcos V. Conde et.al. 2401.16468 link
2024-01-30 Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation Zhenyu Wang et.al. 2401.15688 null
2024-01-28 CPDM: Content-Preserving Diffusion Model for Underwater Image Enhancement Xiaowen Shi et.al. 2401.15649 null
2024-01-28 UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image Restoration Nachuan Ma et.al. 2401.15647 null
2024-01-26 CascadedGaze: Efficiency in Global Context Extraction for Image Restoration Amirhosein Ghasemabadi et.al. 2401.15235 link
2024-01-30 LYT-Net: Lightweight YUV Transformer-based Network for Low-Light Image Enhancement A. Brateanu et.al. 2401.15204 link
2024-01-25 Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Tianhe Ren et.al. 2401.14159 link
2024-01-30 CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion Nisha Huang et.al. 2401.14066 link
2024-01-24 Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild Fanghua Yu et.al. 2401.13627 null
2024-01-29 Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval Dezhao Luo et.al. 2401.13329 null
2024-01-24 Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration Yimin Xu et.al. 2401.13221 link
2024-01-23 CCA: Collaborative Competitive Agents for Image Editing Tiankai Hang et.al. 2401.13011 link
2024-01-23 CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited Data Chandrakanth Gudavalli et.al. 2401.13006 null
2024-01-23 Lumiere: A Space-Time Diffusion Model for Video Generation Omer Bar-Tal et.al. 2401.12945 null
2024-01-21 Text-to-Image Cross-Modal Generation: A Systematic Review Maciej Żelaszczyk et.al. 2401.11631 null
2024-01-21 LLMRA: Multi-modal Large Language Model based Restoration Assistant Xiaoyu Jin et.al. 2401.11401 null
2024-01-19 MixNet: Towards Effective and Efficient UHD Low-Light Image Enhancement Chen Wu et.al. 2401.10666 link
2024-01-18 M3BUNet: Mobile Mean Max UNet for Pancreas Segmentation on CT-Scans Juwita juwita et.al. 2401.10419 null
2024-01-18 Edit One for All: Interactive Batch Image Editing Thao Nguyen et.al. 2401.10219 null
2024-01-18 WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Xiaofeng Wang et.al. 2401.09985 null
2024-01-20 Boosting Few-Shot Semantic Segmentation Via Segment Anything Model Chen-Bin Feng et.al. 2401.09826 null
2024-01-18 Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing Gwanhyeong Koo et.al. 2401.09794 null
2024-01-16 Deep Linear Array Pushbroom Image Restoration: A Degradation Pipeline and Jitter-Aware Restoration Network Zida Chen et.al. 2401.08171 link
2024-01-15 Low-light Stereo Image Enhancement and De-noising in the Low-frequency Information Enhanced Image Space Minghua Zhao et.al. 2401.07753 link
2024-01-15 Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks Siyu Zou et.al. 2401.07709 link
2024-01-13 Exploring Adversarial Attacks against Latent Diffusion Model from the Perspective of Adversarial Transferability Junxi Chen et.al. 2401.07087 null
2024-01-12 LiDAR Depth Map Guided Image Compression Model Alessandro Gnutti et.al. 2401.06517 null
2024-01-12 RotationDrag: Point-based Image Editing with Rotated Diffusion Features Minxing Luo et.al. 2401.06442 link
2024-01-11 E²GAN: Efficient Training of Efficient GANs for Image-to-Image Translation Yifan Gong et.al. 2401.06127 null
2024-01-11 Object-Centric Diffusion for Efficient Video Editing Kumara Kahatapitiya et.al. 2401.05735 null
2024-01-10 Content-Aware Depth-Adaptive Image Restoration Tom Richard Vargis et.al. 2401.05049 null
2024-01-10 Structure-focused Neurodegeneration Convolutional Neural Network for Modeling and Classification of Alzheimer’s Disease Simisola Odimayo et.al. 2401.03922 link
2024-01-08 Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion Minglong Xue et.al. 2401.03788 link
2024-01-07 SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing Songyan Chen et.al. 2401.03433 link
2024-01-07 Towards Effective Multiple-in-One Image Restoration: A Sequential and Prompt Learning Strategy Xiangtao Kong et.al. 2401.03379 link
2024-01-06 MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond Yupei Lin et.al. 2401.03221 null
2024-01-05 Generating Non-Stationary Textures using Self-Rectification Yang Zhou et.al. 2401.02847 link
2024-01-05 Analysis of a wavelet frame based two-scale model for enhanced edges Bin Dong et.al. 2401.02688 null
2024-01-05 FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF Hao Zhang et.al. 2401.02616 link
2024-01-04 VASE: Object-Centric Appearance and Shape Manipulation of Real Videos Elia Peruzzo et.al. 2401.02473 null
2024-01-04 Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain Xuanhua He et.al. 2401.02161 link
2024-01-04 Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance Jiacheng Wang et.al. 2401.02126 link
2024-01-03 Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions David Junhao Zhang et.al. 2401.01827 link
2024-01-03 AttentionLut: Attention Fusion-based Canonical Polyadic LUT for Real-time Image Enhancement Kang Fu et.al. 2401.01569 null
2024-01-01 Bracketing is All You Need: Unifying Image Restoration and Enhancement Tasks with Multi-Exposure Images Zhilu Zhang et.al. 2401.00766 link
2024-01-01 From Covert Hiding to Visual Editing: Robust Generative Video Steganography Xueying Mao et.al. 2401.00652 null
2023-12-31 UGPNet: Universal Generative Prior for Image Restoration Hwayoon Lee et.al. 2401.00370 null
2024-01-02 USFM: A Universal Ultrasound Foundation Model Generalized to Tasks and Organs towards Label Efficient Image Analysis Jing Jiao et.al. 2401.00153 null
2023-12-30 CamPro: Camera-based Anti-Facial Recognition Wenjun Zhu et.al. 2401.00151 link
2023-12-28 Improving Image Restoration through Removing Degradations in Textual Representations Jingbo Lin et.al. 2312.17334 link
2023-12-28 Personalized Restoration via Dual-Pivot Tuning Pradyumna Chari et.al. 2312.17234 null
2023-12-28 Restoration by Generation with Constrained Priors Zheng Ding et.al. 2312.17161 null
2023-12-29 DarkShot: Lighting Dark Images with Low-Compute and High-Quality Jiazhang Zheng et.al. 2312.16805 null
2023-12-28 ZONE: Zero-Shot Instruction-Guided Local Editing Shanglin Li et.al. 2312.16794 link
2023-12-27 Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation Rongyu Zhang et.al. 2312.16610 null
2023-12-27 Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance Tomer Garber et.al. 2312.16519 link
2023-12-27 A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss Xiao Fang et.al. 2312.16498 link
2023-12-30 A Survey on Super Resolution for video Enhancement Using GAN Ankush Maity et.al. 2312.16471 null
2023-12-27 Learn From Orientation Prior for Radiograph Super-Resolution: Orientation Operator Transformer Yongsong Huang et.al. 2312.16455 null
2023-12-26 Geometric-Aware Low-Light Image and Video Enhancement via Depth Guidance Yingqi Lin et.al. 2312.15855 null
2023-12-25 High-Fidelity Diffusion-based Image Editing Chen Hou et.al. 2312.15707 null
2023-12-25 Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration Jiahong Fu et.al. 2312.15701 link
2023-12-25 MuLA-GAN: Multi-Level Attention GAN for Enhanced Underwater Visibility Ahsan Baidar Bakht et.al. 2312.15633 null
2023-12-24 Perception-Distortion Balanced Super-Resolution: A Multi-Objective Optimization Perspective Lingchen Sun et.al. 2312.15408 link
2023-12-23 Revealing Shadows: Low-Light Image Enhancement Using Self-Calibrated Illumination Farzaneh Koohestani et.al. 2312.15199 null
2023-12-22 UniHuman: A Unified Model for Editing Human Images in the Wild Nannan Li et.al. 2312.14985 link
2023-12-22 Tuning-Free Inversion-Enhanced Control for Consistent Image Editing Xiaoyue Duan et.al. 2312.14611 null
2023-12-22 StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors Wanchao Su et.al. 2312.14389 null
2023-12-22 Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation Chaowei Fang et.al. 2312.14387 null
2023-12-22 Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal Yicheng Leng et.al. 2312.14383 null
2023-12-20 Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis Bichen Wu et.al. 2312.13834 null
2023-12-22 AppAgent: Multimodal Agents as Smartphone Users Chi Zhang et.al. 2312.13771 link
2023-12-21 HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks Hai Zhang et.al. 2312.13537 link
2023-12-20 Texture Matching GAN for CT Image Enhancement Madhuri Nagare et.al. 2312.13422 null
2023-12-20 ClassLIE: Structure- and Illumination-Adaptive Classification for Low-Light Image Enhancement Zixiang Wei et.al. 2312.13265 null
2023-12-21 RadEdit: stress-testing biomedical vision models via diffusion image editing Fernando Pérez-García et.al. 2312.12865 null
2023-12-20 ReCo-Diff: Explore Retinex-Based Condition Strategy in Diffusion Model for Low-Light Image Enhancement Yuhui Wu et.al. 2312.12826 null
2023-12-21 RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing Shutong Jin et.al. 2312.12635 null
2023-12-19 Fixed-point Inversion for Text-to-image diffusion models Barak Meiri et.al. 2312.12540 link
2023-12-19 Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion Fan Zhang et.al. 2312.12471 link
2023-12-19 MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers Haoyu Ma et.al. 2312.12468 null
2023-12-18 Ultrasound Image Enhancement using CycleGAN and Perceptual Loss Shreeram Athreya et.al. 2312.11748 link
2023-12-18 TIP: Text-Driven Image Processing with Semantic and Restoration Instructions Chenyang Qi et.al. 2312.11595 null
2023-12-18 Warping the Residuals for Image Editing with StyleGAN Ahmet Burak Yildirim et.al. 2312.11422 null
2023-12-18 MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance Qi Mao et.al. 2312.11396 null
2023-12-18 CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update Zhi Gao et.al. 2312.10908 null
2023-12-17 Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models Nikita Starodubcev et.al. 2312.10835 link
2023-12-17 Latent Space Editing in Transformer-Based Flow Matching Vincent Tao Hu et.al. 2312.10825 null
2023-12-17 Bengali License Plate Recognition: Unveiling Clarity with CNN and GFP-GAN Noushin Afrin et.al. 2312.10701 link
2023-12-19 VidToMe: Video Token Merging for Zero-Shot Video Editing Xirui Li et.al. 2312.10656 link
2023-12-16 Image Restoration Through Generalized Ornstein-Uhlenbeck Bridge Conghan Yue et.al. 2312.10299 link
2023-12-15 Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation Qin Guo et.al. 2312.10113 link
2023-12-15 Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement Xiaofeng Zhang et.al. 2312.10109 link
2023-12-15 A Case Study of Image Enhancement Algorithms’ Effectiveness of Improving Neural Networks’ Performance on Adverse Images Jonathan Sanderson et.al. 2312.09509 null
2023-12-15 System Integration of Xilinx DPU and HDMI for Real-Time inference in PYNQ Environment with Image Enhancement Jonathan Sanderson et.al. 2312.09506 null
2023-12-15 Image Deblurring using GAN Zhengdong Li et.al. 2312.09496 null
2023-12-14 LIME: Localized Image Editing via Attention Regularization in Diffusion Models Enis Simsar et.al. 2312.09256 null
2023-12-14 SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds Minghao Chen et.al. 2312.09246 null
2023-12-14 Guided Image Restoration via Simultaneous Feature and Image Guided Fusion Xinyi Liu et.al. 2312.08853 null
2023-12-14 VQCNIR: Clearer Night Image Restoration with Vector-Quantized Codebook Wenbin Zou et.al. 2312.08606 link
2023-12-13 A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing Gwilherm Lesné et.al. 2312.08256 link
2023-12-13 EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset Peiqi Duan et.al. 2312.08220 null
2023-12-13 Clockwork Diffusion: Efficient Generation With Model-Step Distillation Amirhossein Habibian et.al. 2312.08128 link
2023-12-13 AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing Zhiyuan Ma et.al. 2312.08019 link
2023-12-13 CoIE: Chain-of-Instruct Editing for Multi-Attribute Face Manipulation Zhenduo Zhang et.al. 2312.07879 null
2023-12-13 Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements Gaurav Shrivastava et.al. 2312.07835 null
2023-12-12 Uncertainty Visualization via Low-Dimensional Posterior Projections Omer Yair et.al. 2312.07804 link
2023-12-12 Hyper-Restormer: A General Hyperspectral Image Restoration Transformer for Remote Sensing Imaging Yo-Yu Lai et.al. 2312.07016 null
2023-12-12 DGNet: Dynamic Gradient-guided Network with Noise Suppression for Underwater Image Enhancement Jingchun Zhou et.al. 2312.06999 null
2023-12-12 IA2U: A Transfer Plugin with Multi-Prior for In-Air Model to Underwater Jingchun Zhou et.al. 2312.06955 null
2023-12-12 WaterHE-NeRF: Water-ray Tracing Neural Radiance Fields for Underwater Scene Reconstruction Jingchun Zhou et.al. 2312.06946 null
2023-12-11 SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models Yuzhou Huang et.al. 2312.06739 link
2023-12-11 Learning to See Low-Light Images via Feature Domain Adaptation Qirui Yang et.al. 2312.06723 null
2023-12-10 Neutral Editing Framework for Diffusion-based Video Editing Sunjae Yoon et.al. 2312.06708 null
2023-12-11 UIEDP:Underwater Image Enhancement with Diffusion Prior Dazhao Du et.al. 2312.06240 link
2023-12-11 DisControlFace: Disentangled Control for Personalized Facial Image Editing Haozhe Jia et.al. 2312.06193 null
2023-12-11 Textual Prompt Guided Image Restoration Qiuhai Yan et.al. 2312.06162 link
2023-12-10 A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing Maomao Li et.al. 2312.05856 link
2023-12-09 BARET : Balanced Attention based Real image Editing driven by Target-text Inversion Yuming Qiao et.al. 2312.05482 null
2023-12-08 NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models Yusuf Dalva et.al. 2312.05390 null
2023-12-08 Learning 3D Particle-based Simulators from RGB-D Videos William F. Whitney et.al. 2312.05359 null
2023-12-08 Fine Dense Alignment of Image Bursts through Camera Pose and Depth Estimation Bruno Lecouat et.al. 2312.05190 null
2023-12-08 Prompt-In-Prompt Learning for Universal Image Restoration Zilong Li et.al. 2312.05038 link
2023-12-08 Decoupling Degradation and Content Processing for Adverse Weather Image Restoration Xi Wang et.al. 2312.05006 null
2023-12-07 Inversion-Free Image Editing with Natural Language Sihan Xu et.al. 2312.04965 link
2023-12-07 GenDeF: Learning Generative Deformation Field for Video Generation Wen Wang et.al. 2312.04561 null
2023-12-07 RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models Ozgur Kara et.al. 2312.04524 link
2023-12-07 Ricci-Notation Tensor Framework for Model-Based Approaches to Imaging Dileepan Joseph et.al. 2312.04018 link
2023-12-06 A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement Risab Biswas et.al. 2312.03946 link
2023-12-06 FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability Linze Li et.al. 2312.03775 null
2023-12-05 DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing Shao-Yu Chang et.al. 2312.03772 null
2023-12-07 Intrinsic Harmonization for Illumination-Aware Compositing Chris Careaga et.al. 2312.03698 link
2023-12-06 Training Neural Networks on RAW and HDR Images for Restoration Tasks Lei Luo et.al. 2312.03640 link
2023-12-06 Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention Jianjin Xu et.al. 2312.03556 null
2023-12-05 MagicStick: Controllable Video Editing via Control Handle Transformations Yue Ma et.al. 2312.03047 link
2023-12-05 Drag-A-Video: Non-rigid Video Editing with Point-based Interaction Yao Teng et.al. 2312.02936 null
2023-12-05 Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration Yuang Ai et.al. 2312.02918 null
2023-12-05 BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models Fengyuan Shi et.al. 2312.02813 link
2023-12-05 Deep-learning-driven end-to-end metalens imaging Joonhyuk Seo et.al. 2312.02669 link
2023-12-05 GeNIe: Generative Hard Negative Images Through Diffusion Soroush Abbasi Koohpayegani et.al. 2312.02548 link
2023-12-05 SAVE: Protagonist Diversification with Structure Agnostic Video Editing Yeji Song et.al. 2312.02503 null
2023-12-04 Peer attention enhances student learning Songlin Xu et.al. 2312.02358 link
2023-12-05 VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence Yuchao Gu et.al. 2312.02087 null
2023-12-04 SRTransGAN: Image Super-Resolution using Transformer based Generative Adversarial Network Neeraj Baghel et.al. 2312.01999 null
2023-12-05 Multi-task Image Restoration Guided By Robust DINO Features Xin Lin et.al. 2312.01677 null
2023-12-04 Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training Runze He et.al. 2312.01663 null
2023-12-03 T3D: Towards 3D Medical Image Understanding through Vision-Language Pre-training Che Liu et.al. 2312.01529 null
2023-12-03 Enhancing and Adapting in the Clinic: Source-free Unsupervised Domain Adaptation for Medical Image Enhancement Heng Li et.al. 2312.01338 link
2023-12-03 An Augmented Lagrangian Primal-Dual Semismooth Newton Method for Multi-Block Composite Optimization Zhanwang Deng et.al. 2312.01273 null
2023-12-02 Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation Zhipeng Du et.al. 2312.01220 link
2023-12-02 Taming Latent Diffusion Models to See in the Dark Qiang Wen et.al. 2312.01027 null
2023-12-01 Zero-Shot Video Question Answering with Procedural Programs Rohan Choudhury et.al. 2312.00937 null
2023-12-01 Adversarial Score Distillation: When score distillation meets GAN Min Wei et.al. 2312.00739 link
2023-11-30 Advancements and Trends in Ultra-High-Resolution Image Processing: An Overview Zhuoran Zheng et.al. 2312.00250 null
2023-11-30 VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models Zhen Xing et.al. 2311.18837 null
2023-11-30 MotionEditor: Editing Video Motion via Content-Aware Diffusion Shuyuan Tu et.al. 2311.18830 link
2023-11-30 Motion-Conditioned Image Animation for Video Editing Wilson Yan et.al. 2311.18827 null
2023-11-30 Is Underwater Image Enhancement All Object Detectors Need? Yudong Wang et.al. 2311.18814 link
2023-11-30 Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing Hyelin Nam et.al. 2311.18608 null
2023-11-30 ZeST-NeRF: Using temporal aggregation for Zero-Shot Temporal NeRFs Violeta Menéndez González et.al. 2311.18491 null
2023-11-30 Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis Zipeng Qi et.al. 2311.18435 null
2023-11-30 On Exact Inversion of DPM-Solvers Seongmin Hong et.al. 2311.18387 null
2023-11-30 A Novel Variational Approach for Multiphoton Microscopy Image Restoration: from PSF Estimation to 3D Deconvolution Julien Ajdenbaum et.al. 2311.18386 null
2023-11-30 HiPA: Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation Yifan Zhang et.al. 2311.18158 null
2023-11-29 Variational Bayes image restoration with compressive autoencoders Maud Biquard et.al. 2311.17744 null
2023-11-29 Improving Stability during Upsampling – on the Importance of Spatial Context Shashank Agnihotri et.al. 2311.17524 null
2023-11-29 VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model Haoyu Zhao et.al. 2311.17338 link
2023-11-28 Optimisation-Based Multi-Modal Semantic Image Editing Bowen Li et.al. 2311.16882 null
2023-11-28 Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration Chen Zhao et.al. 2311.16845 link
2023-11-28 Decomposer: Semi-supervised Learning of Image Restoration and Image Decomposition Boris Meinardus et.al. 2311.16829 null
2023-11-28 LEDITS++: Limitless Image Editing using Text-to-Image Models Manuel Brack et.al. 2311.16711 null
2023-11-28 Full-resolution MLPs Empower Medical Dense Prediction Mingyuan Meng et.al. 2311.16707 link
2023-11-28 MotionZero:Exploiting Motion Priors for Zero-shot Text-to-Video Generation Sitong Su et.al. 2311.16635 null
2023-11-27 LLMGA: Multimodal Large Language Model based Generation Assistant Bin Xia et.al. 2311.16500 link
2023-11-28 Text-Driven Image Editing via Learnable Regions Yuanze Lin et.al. 2311.16432 link
2023-11-27 Joint Deep Image Restoration and Unsupervised Quality Assessment Hakan Emre Gedik et.al. 2311.16372 null
2023-11-27 Self-correcting LLM-controlled Diffusion Models Tsung-Han Wu et.al. 2311.16090 link
2023-11-27 Real Time GAZED: Online Shot Selection and Editing of Virtual Cameras from Wide-Angle Monocular Video Recordings Sudheer Achary et.al. 2311.15581 null
2023-11-26 FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration Zihao Zou et.al. 2311.15445 null
2023-11-26 Sketch Video Synthesis Yudian Zheng et.al. 2311.15306 link
2023-11-24 Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion Minshan Xie et.al. 2311.14343 null
2023-11-23 A New Benchmark and Model for Challenging Image Manipulation Detection Zhenfei Zhang et.al. 2311.14218 link
2023-11-23 Posterior Distillation Sampling Juil Koo et.al. 2311.13831 null
2023-11-22 Retargeting Visual Data with Deformation Fields Tim Elsner et.al. 2311.13297 null
2023-11-20 PanBench: Towards High-Resolution and High-Performance Pansharpening Shiying Wang et.al. 2311.12083 null
2023-11-19 EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models Ruoxi Chen et.al. 2311.12066 null
2023-11-20 Cut-and-Paste: Subject-Driven Video Editing with Attention Control Zhichao Zuo et.al. 2311.11697 null
2023-11-20 Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement Yanyan Wei et.al. 2311.11695 null
2023-11-20 Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model Chunming He et.al. 2311.11638 link
2023-11-20 Deep Equilibrium Diffusion Restoration with Parallel Sampling Jiezhang Cao et.al. 2311.11600 link
2023-11-19 On the Noise Scheduling for Generating Plausible Designs with Diffusion Models Jiajie Fan et.al. 2311.11207 null
2023-11-17 Astronomical Images Quality Assessment with Automated Machine Learning Olivier Parisot et.al. 2311.10617 null
2023-11-16 K-space Cold Diffusion: Learning to Reconstruct Accelerated MRI without Noise Guoyao Shen et.al. 2311.10162 link
2023-11-16 Emu Edit: Precise Image Editing via Recognition and Generation Tasks Shelly Sheynin et.al. 2311.10089 null
2023-11-15 FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier Zhongjie Duan et.al. 2311.09265 link
2023-11-14 The Perception-Robustness Tradeoff in Deterministic Image Restoration Guy Ohayon et.al. 2311.09253 null
2023-11-15 Progressive Feedback-Enhanced Transformer for Image Forgery Localization Haochen Zhu et.al. 2311.08910 link
2023-11-14 Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation Zhihang Zhong et.al. 2311.08007 link
2023-11-09 Dynamic Association Learning of Self-Attention and Convolution in Image Restoration Kui Jiang et.al. 2311.05147 null
2023-11-08 LuminanceL1Loss: A loss function which measures percieved brightness and colour differences Dominic De Jonge et.al. 2311.04614 null
2023-11-11 Learning the What and How of Annotation in Video Object Segmentation Thanos Delatolas et.al. 2311.04414 null
2023-11-07 Energy-based Calibrated VAE with Test Time Free Lunch Yihong Luo et.al. 2311.04071 link
2023-11-07 CLIP Guided Image-perceptive Prompt Learning for Image Enhancement Zinuo Li et.al. 2311.03943 null
2023-11-07 Constrained Regularization by Denoising with Automatic Parameter Selection Pasquale Cascarano et.al. 2311.03819 null
2023-11-06 Pelvic floor MRI segmentation based on semi-supervised deep learning Jianwei Zuo et.al. 2311.03105 null
2023-11-06 A New Extrapolation Economy Cascadic Multigrid Method for Image Restoration Problems Zhaoteng Chu et.al. 2311.03010 null
2023-11-06 Zero-Shot Enhancement of Low-Light Image Based on Retinex Decomposition Wenchao Li et.al. 2311.02995 link
2023-11-08 Deep Image Semantic Communication Model for Artificial Intelligent Internet of Things Li Ping Qian et.al. 2311.02926 link
2023-11-03 Cascadic Tensor Multigrid Method and Economic Cascadic Tensor Multigrid Method for Image Restoration Problems Ziqi Yan et.al. 2311.01924 null
2023-11-02 The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing Shen Nie et.al. 2311.01410 null
2023-11-02 Convergent plug-and-play with proximal denoiser and unconstrained regularization parameter Samuel Hurault et.al. 2311.01216 null
2023-11-03 On Manipulating Scene Text in the Wild with Diffusion Models Joshua Santoso et.al. 2311.00734 link
2023-11-01 fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding Xuelin Qian et.al. 2311.00342 null
2023-11-01 RAUNE-Net: A Residual and Attention-Driven Underwater Image Enhancement Method Wangzhen Peng et.al. 2311.00246 link
2023-11-01 Consistent Video-to-Video Transfer Using Synthetic Dataset Jiaxin Cheng et.al. 2311.00213 link
2023-10-31 Image Restoration with Point Spread Function Regularization and Active Learning Peng Jia et.al. 2311.00186 null
2023-10-31 Navigating the Complex Landscape of Shock Filter Cahn-Hilliard Equation: From Regularized to Young Measure Solutions Darko Mitrovic et.al. 2310.20383 null
2023-10-31 Low-Dose CT Image Enhancement Using Deep Learning A. Demir et.al. 2310.20265 null
2023-10-31 UWFormer: Underwater Image Enhancement via a Semi-Supervised Multi-Scale Transformer Xuhang Chen et.al. 2310.20210 link
2023-10-30 IterInv: Iterative Inversion for Pixel-Level T2I Models Chuanming Tang et.al. 2310.19540 link
2023-10-29 Learning to Follow Object-Centric Image Editing Instructions Faithfully Tuhin Chakrabarty et.al. 2310.19145 link
2023-10-28 PrObeD: Proactive Object Detection Wrapper Vishal Asnani et.al. 2310.18788 null
2023-10-27 Always Clear Days: Degradation Type and Severity Aware All-In-One Adverse Weather Removal Yu-Wei Chen et.al. 2310.18293 link
2023-10-27 DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF Chaowei Liu et.al. 2310.17910 null
2023-10-27 Global Structure-Aware Diffusion Process for Low-Light Image Enhancement Jinhui Hou et.al. 2310.17577 link
2023-10-26 AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors You-Ming Chang et.al. 2310.17419 link
2023-10-25 Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models Tianyi Lu et.al. 2310.16400 link
2023-10-24 From Posterior Sampling to Meaningful Diversity in Image Restoration Noa Cohen et.al. 2310.16047 null
2023-10-24 CVPR 2023 Text Guided Video Editing Competition Jay Zhangjie Wu et.al. 2310.16003 link
2023-10-26 Integrating View Conditions for Image Synthesis Jinbin Bai et.al. 2310.16002 link
2023-10-19 Neural Degradation Representation Learning for All-In-One Image Restoration Mingde Yao et.al. 2310.12848 link
2023-10-18 Object-aware Inversion and Reassembly for Image Editing Zhen Yang et.al. 2310.12149 link
2023-10-18 A Comparative Study of Image Restoration Networks for General Backbone Network Design Xiangyu Chen et.al. 2310.11881 link
2023-10-16 LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation Ruiqi Wu et.al. 2310.10769 link
2023-10-21 BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys Yu Gu et.al. 2310.10765 null
2023-10-16 A Survey on Video Diffusion Models Zhen Xing et.al. 2310.10647 link
2023-10-16 Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models Kevin Black et.al. 2310.10639 link
2023-10-16 DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing Jia-Wei Liu et.al. 2310.10624 null
2023-10-16 Unifying Image Processing as Visual Prompting Question Answering Yihao Liu et.al. 2310.10513 null
2023-10-17 AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion Yitong Jiang et.al. 2310.10123 null
2023-10-15 ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context Binglun Wang et.al. 2310.09965 null
2023-10-15 LOVECon: Text-driven Training-Free Long Video Editing with ControlNet Zhenyi Liao et.al. 2310.09711 link
2023-10-14 Dimma: Semi-supervised Low Light Image Enhancement with Adaptive Dimming Wojciech Kozłowski et.al. 2310.09633 link
2023-10-13 Image Cropping under Design Constraints Takumi Nishiyasu et.al. 2310.08892 null
2023-10-12 DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing Yueming Lyu et.al. 2310.08785 link
2023-10-12 Frequency-Aware Re-Parameterization for Over-Fitting Based Image Compression Yun Ye et.al. 2310.08068 null
2023-10-11 Uncovering Hidden Connections: Iterative Tracking and Reasoning for Video-grounded Dialog Haoyu Zhang et.al. 2310.07259 link
2023-10-10 Tweedie Moment Projected Diffusions For Inverse Problems Benjamin Boys et.al. 2310.06721 null
2023-10-10 Improving Compositional Text-to-image Generation with Large Vision-Language Models Song Wen et.al. 2310.06311 null
2023-10-10 Three-Dimensional Medical Image Fusion with Deformable Cross-Attention Lin Liu et.al. 2310.06291 null
2023-10-09 FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing Yuren Cong et.al. 2310.05922 null
2023-10-09 Dipole-Spread Function Engineering for 6D Super-Resolution Microscopy Tingting Wu et.al. 2310.05810 null
2023-10-08 ITRE: Low-light Image Enhancement Based on Illumination Transmission Ratio Estimation Yu Wang et.al. 2310.05158 null
2023-10-07 Combining UPerNet and ConvNeXt for Contrails Identification to reduce Global Warming Zhenkuan Wang et.al. 2310.04808 link
2023-10-06 Towards A Robust Group-level Emotion Recognition via Uncertainty-Aware Learning Qing Zhu et.al. 2310.04306 null
2023-10-06 Degradation-Aware Self-Attention Based Transformer for Blind Image Super-Resolution Qingguo Liu et.al. 2310.04180 link
2023-10-04 ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF Jangho Park et.al. 2310.02712 null
2023-10-04 Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis Han Zhang et.al. 2310.02641 null
2023-10-03 EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods Samyadeep Basu et.al. 2310.02426 null
2023-10-03 Leveraging Classic Deconvolution and Feature Extraction in Zero-Shot Image Restoration Tomáš Chobola et.al. 2310.02097 link
2023-10-02 ImagenHub: Standardizing the evaluation of conditional image generation models Max Ku et.al. 2310.01596 link
2023-10-02 Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Xuan Ju et.al. 2310.01506 link
2023-10-02 Conditional Diffusion Distillation Kangfu Mei et.al. 2310.01407 link
2023-10-02 Sequential Data Generation with Groupwise Diffusion Process Sangyun Lee et.al. 2310.01400 null
2023-10-02 A Restoration Network as an Implicit Prior Yuyang Hu et.al. 2310.01391 null
2023-10-02 Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models Hyeonho Jeong et.al. 2310.01107 link
2023-10-02 Controlling Vision-Language Models for Universal Image Restoration Ziwei Luo et.al. 2310.01018 link
2023-10-02 JPEG Information Regularized Deep Image Prior for Denoising Tsukasa Takagi et.al. 2310.00894 null
2023-10-01 Exchange means change: an unsupervised single-temporal change detection framework based on intra- and inter-image patch exchange Hongruixuan Chen et.al. 2310.00689 link
2023-09-29 Guiding Instruction-based Image Editing via Multimodal Large Language Models Tsu-Jui Fu et.al. 2309.17102 link
2023-09-29 Denoising Diffusion Bridge Models Linqi Zhou et.al. 2309.16948 link
2023-09-28 KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing Jiancheng Huang et.al. 2309.16608 null
2023-09-28 CCEdit: Creative and Controllable Video Editing via Diffusion Models Ruoyu Feng et.al. 2309.16496 null
2023-09-28 Joint Correcting and Refinement for Balanced Low-Light Image Enhancement Nana Yu et.al. 2309.16128 link
2023-09-27 Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness Valentin Barriere et.al. 2309.15991 null
2023-09-27 Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing Kai Wang et.al. 2309.15664 link
2023-09-27 Guided Frequency Loss for Image Restoration Bilel Benjdiraa et.al. 2309.15563 null
2023-09-27 Uncertainty Quantification via Neural Posterior Principal Components Elias Nehme et.al. 2309.15533 null
2023-09-27 VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning Yanan Wang et.al. 2309.15494 null
2023-09-27 Survey on Deep Face Restoration: From Non-blind to Blind and Beyond Wenjie Li et.al. 2309.15490 link
2023-09-26 FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing Songyan Chen et.al. 2309.14934 null
2023-09-26 Image Denoising via Style Disentanglement Jingwei Niu et.al. 2309.14755 null
2023-09-26 Bootstrap Diffusion Model Curve Estimation for High Resolution Low-Light Image Enhancement Jiancheng Huang et.al. 2309.14709 null
2023-09-25 Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments Najmeh Mohammadbagheri et.al. 2309.14267 null
2023-09-25 Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time Cheng-Hung Chan et.al. 2309.14022 null
2023-09-25 Diverse Semantic Image Editing with Style Codes Hakan Sivuk et.al. 2309.13975 link
2023-09-25 In-Domain GAN Inversion for Faithful Reconstruction and Editability Jiapeng Zhu et.al. 2309.13956 null
2023-09-25 Adversarial Attacks on Video Object Segmentation with Hard Region Discovery Ping Li et.al. 2309.13857 null
2023-09-21 License Plate Super-Resolution Using Diffusion Models Sawsan AlHalawani et.al. 2309.12506 null
2023-09-21 Multimodal Transformers for Wireless Communications: A Case Study in Beam Prediction Yu Tian et.al. 2309.11811 link
2023-09-21 PIE: Simulating Disease Progression via Progressive Image Editing Kaizhao Liang et.al. 2309.11745 link
2023-09-21 Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal Xiao Feng Zhang et.al. 2309.11715 null
2023-09-19 Local Lipschitz continuity for energy integrals with slow growth and lower order terms Michela Eleuteri et.al. 2309.10727 null
2023-09-19 Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image Denoising Yujin Wang et.al. 2309.10714 null
2023-09-19 Forgedit: Text Guided Image Editing via Learning and Forgetting Shiwen Zhang et.al. 2309.10556 link
2023-09-16 AOSR-Net: All-in-One Sandstorm Removal Network Yazhong Si et.al. 2309.08838 null
2023-09-16 Dual-Camera Joint Deblurring-Denoising Shayan Shekarforoush et.al. 2309.08826 null
2023-09-15 Double Domain Guided Real-Time Low-Light Image Enhancement for Ultra-High-Definition Transportation Surveillance Jingxiang Qu et.al. 2309.08382 link
2023-09-14 A Multi-scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing Yujie Feng et.al. 2309.07524 null
2023-09-13 FAIR: Frequency-aware Image Restoration for Industrial Visual Anomaly Detection Tongkun Liu et.al. 2309.07068 link
2023-09-13 DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision Xiangchen Yin et.al. 2309.06941 null
2023-09-13 Improving Deep Learning-based Defect Detection on Window Frames with Image Processing Strategies Jorge Vasquez et.al. 2309.06731 null
2023-09-12 Can we predict the Most Replayed data of video streaming platforms? Alessandro Duico et.al. 2309.06102 link
2023-09-12 Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration Gang Wu et.al. 2309.06023 link
2023-09-12 Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos Sarthak Bhagat et.al. 2309.05943 null
2023-09-11 PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud Chengyu Wang et.al. 2309.05534 null
2023-09-11 HAT: Hybrid Attention Transformer for Image Restoration Xiangyu Chen et.al. 2309.05239 link
2023-09-10 Effective Real Image Editing with Accelerated Iterative Diffusion Inversion Zhihong Pan et.al. 2309.04907 null
2023-09-09 UnitModule: A Lightweight Joint Image Enhancement Module for Underwater Object Detection Zhuoyan Liu et.al. 2309.04708 null
2023-09-08 MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers Sijia Li et.al. 2309.04372 null
2023-09-08 Toward Sufficient Spatial-Frequency Interaction for Gradient-aware Underwater Image Enhancement Chen Zhao et.al. 2309.04089 link
2023-09-07 Feature Enhancer Segmentation Network (FES-Net) for Vessel Segmentation Tariq M. Khan et.al. 2309.03535 null
2023-09-07 Underwater Image Enhancement by Transformer-based Diffusion Model with Non-uniform Sampling for Skip Strategy Yi Tang et.al. 2309.03445 link
2023-09-06 SLiMe: Segment Like Me Aliasghar Khani et.al. 2309.03179 link
2023-09-06 Prompt-based All-in-One Image Restoration using CNNs and Transformer Hu Gao et.al. 2309.03063 link
2023-09-05 Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning Lili Yu et.al. 2309.02591 null
2023-09-05 SAM-Deblur: Let Segment Anything Boost Image Deblurring Siwei Li et.al. 2309.02270 link
2023-09-05 Advanced Underwater Image Restoration in Complex Illumination Conditions Yifan Song et.al. 2309.02217 null
2023-09-05 Empowering Low-Light Image Enhancer through Customized Learnable Priors Naishan Zheng et.al. 2309.01958 link
2023-09-07 Implicit Neural Image Stitching With Enhanced and Blended Feature Reconstruction Minsu Kim et.al. 2309.01409 link
2023-09-04 Memory augment is All You Need for image restoration Xiao Feng Zhang et.al. 2309.01377 link
2023-09-04 Restoration Guarantee of Image Inpainting via Low Rank Patch Matrix Completion Jian-Feng Cai et.al. 2309.01328 null
2023-09-03 Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction Xiaoke Shang et.al. 2309.01183 null
2023-09-03 Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception Zengxi Zhang et.al. 2309.01102 null
2023-09-02 MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation Hanshu Yan et.al. 2309.00908 null
2023-09-02 A Generic Fundus Image Enhancement Network Boosted by Frequency Self-supervised Representation Learning Heng Li et.al. 2309.00885 link
2023-09-01 Iterative Multi-granular Image Editing using Diffusion Models K J Joseph et.al. 2309.00613 null
2023-08-31 Robust GAN inversion Egor Sevriugov et.al. 2308.16510 null
2023-08-30 Feature Attention Network (FA-Net): A Deep-Learning Based Approach for Underwater Single Image Enhancement Muhammad Hamza et.al. 2308.15868 null
2023-08-30 Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models Zhanbo Feng et.al. 2308.15854 link
2023-08-31 Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement Basit Alawode et.al. 2308.15816 link
2023-08-29 IndGIC: Supervised Action Recognition under Low Illumination Jingbo Zeng et.al. 2308.15345 null
2023-08-29 DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior Xinqi Lin et.al. 2308.15070 link
2023-08-28 Copy-Paste Image Augmentation with Poisson Image Editing for Ultrasound Instance Segmentation Learning Wei-Hsiang Shen et.al. 2308.14772 null
2023-08-28 MagicEdit: High-Fidelity and Temporally Coherent Video Editing Jun Hao Liew et.al. 2308.14749 null
2023-08-28 1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation Tao Zhang et.al. 2308.14392 link
2023-08-28 MetaWeather: Few-Shot Weather-Degraded Image Restoration via Degradation Pattern Matching Youngrae Kim et.al. 2308.14334 link
2023-08-27 Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection Xin Feng et.al. 2308.14061 link
2023-08-26 Generalized Lightness Adaptation with Channel Selective Normalization Mingde Yao et.al. 2308.13783 link
2023-08-25 Residual Denoising Diffusion Models Jiawei Liu et.al. 2308.13712 link
2023-08-25 Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions Yibo Wang et.al. 2308.13178 null
2023-08-25 Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model Xunpeng Yi et.al. 2308.13164 null
2023-08-26 CDAN: Convolutional Dense Attention-guided Network for Low-light Image Enhancement Hossein Shakibania et.al. 2308.12902 link
2023-08-24 MOFA: A Model Simplification Roadmap for Image Restoration on Mobile Devices Xiangyu Chen et.al. 2308.12494 link
2023-08-23 Synergistic Multiscale Detail Refinement via Intrinsic Supervision for Underwater Image Enhancement Dehuan Zhang et.al. 2308.11932 link
2023-08-21 EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints Yutao Chen et.al. 2308.10648 null
2023-08-24 Patternshop: Editing Point Patterns by Image Manipulation Xingchang Huang et.al. 2308.10517 null
2023-08-20 Blind Face Restoration for Under-Display Camera via Dictionary Guided Transformer Jingfan Tan et.al. 2308.10196 null
2023-08-22 WMFormer++: Nested Transformer for Visible Watermark Removal via Implict Joint Learning Dongjian Huo et.al. 2308.10195 null
2023-08-19 ASPIRE: Language-Guided Augmentation for Robust Image Classification Sreyan Ghosh et.al. 2308.10103 link
2023-08-19 Semantic-Human: Neural Rendering of Humans from Monocular Video with Human Parsing Jie Zhang et.al. 2308.09894 null
2023-08-18 Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis Jonathon Luiten et.al. 2308.09713 link
2023-08-18 SimDA: Simple Diffusion Adapter for Efficient Video Generation Zhen Xing et.al. 2308.09710 null
2023-08-18 StableVideo: Text-driven Consistency-aware Diffusion Video Editing Wenhao Chai et.al. 2308.09592 link
2023-08-18 Diffusion Models for Image Restoration and Enhancement – A Comprehensive Survey Xin Li et.al. 2308.09388 link
2023-08-18 DiffLLE: Diffusion-guided Domain Calibration for Unsupervised Low-light Image Enhancement Shuzhou Yang et.al. 2308.09279 null
2023-08-17 Edit Temporal-Consistent Videos with Image Diffusion Model Yuanzhi Wang et.al. 2308.09091 link
2023-08-17 Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration Liyan Wang et.al. 2308.08730 link
2023-08-16 Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network Yinglong Wang et.al. 2308.08220 null
2023-08-21 Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement Jianyu Wen et.al. 2308.08197 link
2023-08-15 Geometry of the Visual Cortex with Applications to Image Inpainting and Enhancement Francesco Ballerin et.al. 2308.07652 link
2023-08-14 Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation Alexander Martin et.al. 2308.07316 link
2023-08-13 FastLLVE: Real-Time Low-Light Video Enhancement with Intensity-Aware Lookup Table Wenhao Li et.al. 2308.06749 link
2023-08-12 Tiny and Efficient Model for the Edge Detection Generalization Xavier Soria et.al. 2308.06468 link
2023-08-11 DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models Weijia Wu et.al. 2308.06160 link
2023-08-10 Is there progress in activity progress prediction? Frans de Boer et.al. 2308.05533 link
2023-08-10 A Generalized Physical-knowledge-guided Dynamic Model for Underwater Image Enhancement Pan Mu et.al. 2308.05447 link
2023-08-10 TrainFors: A Large Benchmark Training Dataset for Image Manipulation Detection and Localization Soumyaroop Nandi et.al. 2308.05264 null
2023-08-09 Transmission and Color-guided Network for Underwater Image Enhancement Pan Mu et.al. 2308.04892 null
2023-08-09 A Forensic Methodology for Detecting Image Manipulations Jiwon Lee et.al. 2308.04723 link
2023-08-08 Under-Display Camera Image Restoration with Scattering Effect Binbin Song et.al. 2308.04163 link
2023-08-06 Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction Xiaohong Fan et.al. 2308.03807 link
2023-08-06 PNN: From proximal algorithms to robust unfolded image denoising networks and Plug-and-Play methods Hoang Trieu Vy Le et.al. 2308.03139 null
2023-08-06 NNVISR: Bring Neural Network Video Interpolation and Super Resolution into Video Processing Framework Yuan Tong et.al. 2308.03121 link
2023-08-06 FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information Chenxi Wang et.al. 2308.03033 link
2023-08-06 Brighten-and-Colorize: A Decoupled Network for Customized Low-Light Image Enhancement Chenxi Wang et.al. 2308.03029 null
2023-08-06 All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation Cheng Zhang et.al. 2308.03021 null
2023-08-06 Recurrent Spike-based Image Restoration under General Illumination Lin Zhu et.al. 2308.03018 link
2023-08-05 Dual Degradation-Inspired Deep Unfolding Network for Low-Light Image Enhancement Huake Wang et.al. 2308.02776 null
2023-08-04 CTP-Net: Character Texture Perception Network for Document Image Forgery Localization Xin Liao et.al. 2308.02158 null
2023-08-03 A Multidimensional Analysis of Social Biases in Vision Transformers Jannik Brinkmann et.al. 2308.01948 null
2023-08-02 WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond Zengxi Zhang et.al. 2308.00931 null
2023-08-02 ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation Yasheng Sun et.al. 2308.00906 null
2023-08-01 Decomposition Ascribed Synergistic Learning for Unified Image Restoration Jinghao Zhang et.al. 2308.00759 null
2023-08-01 Context-Aware Talking-Head Video Editing Songlin Yang et.al. 2308.00462 null
2023-08-01 Space Debris: Are Deep Learning-based Image Enhancements part of the Solution? Michele Jamrozik et.al. 2308.00408 null
2023-07-28 Benchmarking Anomaly Detection System on various Jetson Edge Devices Hoang Viet Pham et.al. 2307.16834 link
2023-07-31 From Generation to Suppression: Towards Effective Irregular Glow Removal for Nighttime Visibility Enhancement Wanyu Wu et.al. 2307.16783 null
2023-07-30 RealityCanvas: Augmented Reality Sketching for Embedded and Responsive Scribble Animation Effects Zhijie Xia et.al. 2307.16116 link
2023-07-27 Fast Dust Sand Image Enhancement Based on Color Correction and New Membership Function Ali Hakem Alsaeedi et.al. 2307.15230 null
2023-07-27 The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation Lingdong Kong et.al. 2307.15061 link
2023-07-27 Meta-Processing: A robust framework for multi-tasks seismic processing Shijun Cheng et.al. 2307.14851 link
2023-07-27 Semantic Image Completion and Enhancement using GANs Priyansh Saxena et.al. 2307.14748 null
2023-07-27 LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement Tao Wang et.al. 2307.14659 link
2023-07-26 SuperInpaint: Learning Detail-Enhanced Attentional Implicit Representation for Super-resolutional Image Inpainting Canyu Zhang et.al. 2307.14489 null
2023-08-01 Phenotype-preserving metric design for high-content image reconstruction by generative inpainting Vaibhav Sharma et.al. 2307.14436 link
2023-07-26 Visual Instruction Inversion: Image Editing via Visual Prompting Thao Nguyen et.al. 2307.14331 link
2023-07-25 On the unreasonable vulnerability of transformers for image restoration – and an easy fix Shashank Agnihotri et.al. 2307.13856 null
2023-07-24 Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry Yong-Hyun Park et.al. 2307.12868 link
2023-07-24 A Theoretically Guaranteed Quaternion Weighted Schatten p-norm Minimization Method for Color Image Restoration Qing-Hua Zhang et.al. 2307.12656 link
2023-07-25 TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition Shilin Lu et.al. 2307.12493 link
2023-07-22 Real-Time Neural Video Recovery and Enhancement on Mobile Devices Zhaoyuan He et.al. 2307.12152 null
2023-07-21 Physics-Aware Semi-Supervised Underwater Image Enhancement Hao Qi et.al. 2307.11470 null
2023-07-20 OBJECT 3DIT: Language-guided 3D-aware Image Editing Oscar Michel et.al. 2307.11073 null
2023-07-20 Lighting up NeRF via Unsupervised Decomposition and Enhancement Haoyuan Wang et.al. 2307.10664 link
2023-07-20 Physics-Driven Turbulence Image Restoration with Stochastic Refinement Ajay Jaiswal et.al. 2307.10603 link
2023-07-23 TokenFlow: Consistent Diffusion Features for Consistent Video Editing Michal Geyer et.al. 2307.10373 null
2023-07-19 Text2Layer: Layered Image Generation using Latent Diffusion Model Xinyang Zhang et.al. 2307.09781 null
2023-07-19 NTIRE 2023 Quality Assessment of Video Enhancement Challenge Xiaohong Liu et.al. 2307.09729 null
2023-07-18 Division Gets Better: Learning Brightness-Aware and Detail-Sensitive Representations for Low-Light Image Enhancement Huake Wang et.al. 2307.09104 null
2023-07-18 Unleashing the Imagination of Text: A Novel Framework for Text-to-image Person Retrieval via Exploring the Power of Words Delong Liu et.al. 2307.09059 link
2023-07-18 Soft-IntroVAE for Continuous Latent space Image Super-Resolution Zhi-Song Liu et.al. 2307.09008 null
2023-07-18 Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond Yang Zhao et.al. 2307.08996 null
2023-07-18 Revisiting Latent Space of GAN Inversion for Real Image Editing Kai Katsumata et.al. 2307.08995 null
2023-07-18 CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing Ahmet Canberk Baykal et.al. 2307.08397 null
2023-07-16 LUCYD: A Feature-Driven Richardson-Lucy Deconvolution Network Tomáš Chobola et.al. 2307.07998 link
2023-07-15 Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning? Jialu Gao et.al. 2307.07837 null
2023-07-15 HQG-Net: Unpaired Medical Image Enhancement with High-Quality Guidance Chunming He et.al. 2307.07829 null
2023-07-15 ExposureDiffusion: Learning to Expose for Low-light Image Enhancement Yufei Wang et.al. 2307.07710 link
2023-07-15 DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration Yuanshuo Cheng et.al. 2307.07688 null
2023-07-15 INVE: Interactive Neural Video Editing Jiahui Huang et.al. 2307.07663 null
2023-07-08 Face Image Quality Enhancement Study for Face Recognition Iqbal Nouyed et.al. 2307.05534 null
2023-07-11 Bio-Inspired Night Image Enhancement Based on Contrast Enhancement and Denoising Xinyi Bai et.al. 2307.05447 null
2023-07-10 FreeDrag: Point Tracking is Not You Need for Interactive Point-based Image Editing Pengyang Ling et.al. 2307.04684 link
2023-07-11 DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer Dan Ruta et.al. 2307.04157 null
2023-07-12 Latent Graph Attention for Enhanced Spatial Context Ayush Singh et.al. 2307.04149 null
2023-07-09 Enhancing Low-Light Images Using Infrared-Encoded Images Shulin Tian et.al. 2307.04122 null
2023-07-07 Joint Perceptual Learning for Enhancement and Object Detection in Underwater Scenarios Chenping Fu et.al. 2307.03536 null
2023-07-06 UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering Triet M. Thai et.al. 2307.02783 null
2023-07-05 LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion Long Bai et.al. 2307.02452 link
2023-07-05 DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models Chong Mou et.al. 2307.02421 link
2023-07-05 Generative Adversarial Networks for Dental Patient Identity Protection in Orthodontic Educational Imaging Mingchuan Tian et.al. 2307.02019 null
2023-07-04 Augment Features Beyond Color for Domain Generalized Segmentation Qiyu Sun et.al. 2307.01703 null
2023-07-02 LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance Linoy Tsaban et.al. 2307.00522 null
2023-06-29 FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude Feng Liu et.al. 2306.17206 null
2023-06-28 PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing Wenjing Huang et.al. 2306.16894 link
2023-06-29 Low-Light Enhancement in the Frequency Domain Hao Chen et.al. 2306.16782 null
2023-06-27 Cutting-Edge Techniques for Depth Map Super-Resolution Ryan Peterson et.al. 2306.15244 null
2023-06-27 DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing Yujun Shi et.al. 2306.14435 link
2023-07-01 Faster Segment Anything: Towards Lightweight SAM for Mobile Applications Chaoning Zhang et.al. 2306.14289 link
2023-06-25 Diffusion Model Based Low-Light Image Enhancement for Space Satellite Yiman Zhu et.al. 2306.14227 null
2023-06-25 A Gated Cross-domain Collaborative Network for Underwater Object Detection Linhui Dai et.al. 2306.14141 link
2023-06-23 ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration Jiaqi Ma et.al. 2306.13653 link
2023-06-23 Augmenting Sports Videos with VisCommentator Zhutian Chen et.al. 2306.13491 null
2023-06-22 PromptIR: Prompting for All-in-One Blind Image Restoration Vaishnav Potlapalli et.al. 2306.13090 link
2023-06-22 Continuous Layout Editing of Single Images with Diffusion Models Zhiyuan Zhang et.al. 2306.13078 null
2023-06-22 Restoration of the JPEG Maximum Lossy Compressed Face Images with Hourglass Block based on Early Stopping Discriminator Jongwook Si et.al. 2306.12757 null