CV Arxiv Daily

Updated on 2025.10.15

diffusion

Publish Date	Title	Authors	PDF	Code
2025-07-16	MADI: Masking-Augmented Diffusion with Inference-Time Scaling for Visual Editing	Shreya Kadambi et.al.	2507.13401	null
2025-07-11	Beyond Scores: Proximal Diffusion Models	Zhenghan Fang et.al.	2507.08956	null
2025-07-10	Fisher Score Matching for Simulation-Based Forecasting and Inference	Ce Sui et.al.	2507.07833	null
2025-07-09	SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN	Luca Mariotti et.al.	2507.06895	null
2025-07-11	A Malliavin calculus approach to score functions in diffusion generative models	Ehsan Mirafzali et.al.	2507.05550	null
2025-07-07	Generalization bounds for score-based generative models: a synthetic proof	Arthur Stéphanovitch et.al.	2507.04794	null
2025-07-04	Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis	Tyler Farghly et.al.	2507.03756	null
2025-07-04	Nonparametric regression for cost-effectiveness analyses with observational data – a tutorial	Jonas Esser et.al.	2507.03511	null
2025-07-02	BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation	Rishal Aggarwal et.al.	2507.00846	null
2025-06-29	Autoregressive Denoising Score Matching is a Good Video Anomaly Detector	Hanwen Zhang et.al.	2506.23282	null
2025-06-27	Score-Based Model for Low-Rank Tensor Recovery	Zhengyun Cheng et.al.	2506.22295	null
2025-06-24	Quantum Neural Networks for Propensity Score Estimation and Survival Analysis in Observational Biomedical Studies	Vojtěch Novák et.al.	2506.19973	null
2025-06-18	Origins of Creativity in Attention-Based Diffusion Models	Emma Finn et.al.	2506.17324	null
2025-06-17	Expressive Score-Based Priors for Distribution Matching with Geometry-Preserving Regularization	Ziyu Gong et.al.	2506.14607	link
2025-06-12	Variance estimation after matching or re-weighting	Xiang Meng et.al.	2506.11317	link
2025-06-09	Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels	Davide Carbone et.al.	2506.07843	null
2025-06-06	Direct Fisher Score Estimation for Likelihood Maximization	Sherman Khoo et.al.	2506.06542	null
2025-06-03	IGSM: Improved Geometric and Sensitivity Matching for Finetuning Pruned Diffusion Models	Caleb Zheng et.al.	2506.05398	link
2025-06-05	Learning normalized image densities via dual score matching	Florentin Guth et.al.	2506.05310	null
2025-06-02	An Introduction to Flow Matching and Diffusion Models	Peter Holderrieth et.al.	2506.02070	null
2025-05-31	Score Matching With Missing Data	Josh Givens et.al.	2506.00557	null
2025-05-29	Estimation of Gender Wage Gap in the University of North Carolina System	Zihan Zhang et.al.	2505.24078	null
2025-05-29	Score-based Generative Modeling for Conditional Independence Testing	Yixin Ren et.al.	2505.23309	link
2025-05-26	Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage	Chenguang Wang et.al.	2505.19431	null
2025-05-14	Robust Knowledge Graph Embedding via Denoising	Tengwei Song et.al.	2505.18171	null
2025-05-22	Learning non-equilibrium diffusions with Schrödinger bridges: from exactly solvable to simulation-free	Stephen Y. Zhang et.al.	2505.16644	null
2025-05-20	Compositional amortized inference for large-scale hierarchical Bayesian models	Jonas Arruda et.al.	2505.14429	null
2025-05-20	Extension of Dynamic Network Biomarker using the propensity score method: Simulation of causal effects on variance and correlation coefficient	Satoru Shinoda et.al.	2505.13846	null
2025-05-19	Score-Based Training for Energy-Based TTS Models	Wanli Sun et.al.	2505.13771	null
2025-05-16	Approximation and Generalization Abilities of Score-based Neural Network Generative Models for Sub-Gaussian Distributions	Guoji Fu et.al.	2505.10880	null
2025-05-20	Whitened Score Diffusion: A Structured Prior for Imaging Inverse Problems	Jeffrey Alido et.al.	2505.10311	link
2025-05-08	Score-based Self-supervised MRI Denoising	Jiachen Tu et.al.	2505.05631	null
2025-05-08	Graffe: Graph Representation Learning via Diffusion Probabilistic Models	Dingshuo Chen et.al.	2505.04956	null
2025-05-07	Localized Diffusion Models for High Dimensional Distributions Generation	Georg A. Gottwald et.al.	2505.04417	null
2025-05-02	Incorporating Inductive Biases to Energy-based Generative Models	Yukun Li et.al.	2505.01111	null
2025-04-29	Frequency Feature Fusion Graph Network For Depression Diagnosis Via fNIRS	Chengkai Yang et.al.	2504.21064	null
2025-05-20	Coreset selection for the Sinkhorn divergence and generic smooth divergences	Alex Kokot et.al.	2504.20194	link
2025-04-27	Generalized Score Matching: Bridging $f$-Divergence and Statistical Estimation Under Correlated Noise	Yirong Shen et.al.	2504.19288	null
2025-05-17	Score-Based Deterministic Density Sampling	Vasily Ilin et.al.	2504.18130	null
2025-05-01	Whence Is A Model Fair? Fixing Fairness Bugs via Propensity Score Matching	Kewen Peng et.al.	2504.17066	null
2025-04-23	Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion	Ruixiang Zhang et.al.	2504.16431	null
2025-04-22	InstaRevive: One-Step Image Enhancement via Dynamic Score Matching	Yixuan Zhu et.al.	2504.15513	null
2025-04-16	Generalization through variance: how noise shapes inductive biases in diffusion models	John J. Vastola et.al.	2504.12532	link
2025-04-15	Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination	Mika Setälä et.al.	2504.12347	null
2025-04-14	Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems	Karthik Elamvazhuthi et.al.	2504.09836	null
2025-04-13	Knowledge Independence Breeds Disruption but Limits Recognition	Xiaoyao Yu et.al.	2504.09589	null
2025-04-07	DDPM Score Matching and Distribution Learning	Sinho Chewi et.al.	2504.05161	null
2025-04-04	Gaussian Process Tilted Nonparametric Density Estimation using Fisher Divergence Score Matching	John Paisley et.al.	2504.03485	null
2025-04-02	A Unified Approach to Analysis and Design of Denoising Markov Models	Yinuo Ren et.al.	2504.01938	null
2025-03-31	Empirical Analysis of Digital Innovations Impact on Corporate ESG Performance: The Mediating Role of GAI Technology	Jun Cui et.al.	2504.01041	null
2025-03-28	Demographic Factors Associated with Triage Acuity, Admission and Length of Stay During Adult Emergency Department Visits	Helena Coggan et.al.	2503.22781	link
2025-03-27	Inequality Restricted Minimum Density Power Divergence Estimation in Panel Count Data	Udita Goswami et.al.	2503.21534	null
2025-03-22	Solving Schrödinger bridge problem via continuous normalizing flow	Yang Jing et.al.	2503.17829	link
2025-03-21	Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation	Sophia Tang et.al.	2503.17361	null
2025-03-20	Improving Discriminator Guidance in Diffusion Models	Alexandre Verine et.al.	2503.16117	null
2025-03-18	Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance	Liya Guo et.al.	2503.14569	null
2025-03-14	From Denoising Score Matching to Langevin Sampling: A Fine-Grained Error Analysis in the Gaussian Setting	Samuel Hurault et.al.	2503.11615	null
2025-03-21	Aligning Text to Image in Diffusion Models is Easier Than You Think	Jaa-Yeon Lee et.al.	2503.08250	link
2025-03-26	Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models	Junzhe Li et.al.	2503.08120	null
2025-03-11	Computational bottlenecks for denoising diffusions	Andrea Montanari et.al.	2503.08028	null
2025-03-09	Exponential-polynomial divergence based inference for nondestructive one-shot devices under progressive stress model	Shanya Baghel et.al.	2503.06414	null
2025-03-09	Causal Discovery and Inference towards Urban Elements and Associated Factors	Tao Feng et.al.	2503.06395	null
2025-03-04	Exact matching as an alternative to propensity score matching	Ekkehard Glimm et.al.	2503.02850	null
2025-03-03	FlowDec: A flow-based full-band general audio codec with high perceptual quality	Simon Welker et.al.	2503.01485	link
2025-03-02	Underdamped Diffusion Bridges with Applications to Sampling	Denis Blessing et.al.	2503.01006	link
2025-02-27	Stein’s unbiased risk estimate and Hyvärinen’s score matching	Sulagna Ghosh et.al.	2502.20123	null
2025-03-04	Agnostic calculation of atomic free energies with the descriptor density of states	Thomas D Swinburne et.al.	2502.18191	link
2025-02-27	A Fokker-Planck-Based Loss Function that Bridges Dynamics with Density Estimation	Zhixin Lu et.al.	2502.17690	null
2025-02-25	Generalization error bound for denoising score matching under relaxed manifold assumption	Konstantin Yakovlev et.al.	2502.13662	null
2025-02-18	Score Matching Riemannian Diffusion Means	Frederik Möbius Rygaard et.al.	2502.13106	null
2025-02-19	X-IL: Exploring the Design Space of Imitation Learning Policies	Xiaogang Jia et.al.	2502.12330	link
2025-02-14	Dimension-free Score Matching and Time Bootstrapping for Diffusion Models	Syamantak Kumar et.al.	2502.10354	null
2025-02-12	Concentration Inequalities for the Stochastic Optimization of Unbounded Objectives with Application to Denoising Score Matching	Jeremiah Birrell et.al.	2502.08628	null
2025-02-07	Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?	Yujin Han et.al.	2502.04725	null
2025-02-05	Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization	Yu-Han Wu et.al.	2502.03435	null
2025-02-04	Generative Modeling on Lie Groups via Euclidean Generalized Score Matching	Marco Bertolini et.al.	2502.02513	null
2025-02-03	Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning	Hanyang Zhao et.al.	2502.01819	null
2025-02-05	Weak-to-Strong Diffusion with Reflection	Lichen Bai et.al.	2502.00473	null
2025-02-01	Denoising Score Matching with Random Features: Insights on Diffusion Models from Precise Learning Curves	Anand Jerry George et.al.	2502.00336	null
2025-02-13	Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans	Christian Wald et.al.	2501.16839	null
2025-01-27	EDSep: An Effective Diffusion-Based Method for Speech Source Separation	Jinwei Dong et.al.	2501.15965	null
2025-01-27	Memorization and Regularization in Generative Diffusion Models	Ricardo Baptista et.al.	2501.15785	link
2025-01-24	Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation	Anish Abhijit Diwan et.al.	2501.14856	null
2025-01-23	MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance	Wooseok Song et.al.	2501.13449	null
2025-01-22	Sequential Change Point Detection via Denoising Score Matching	Wenbin Zhou et.al.	2501.12667	null
2025-01-13	Likelihood Training of Cascaded Diffusion Models via Hierarchical Volume-preserving Maps	Henry Li et.al.	2501.06999	link
2025-01-10	Explainable Federated Bayesian Causal Inference and Its Application in Advanced Manufacturing	Xiaofeng Xiao et.al.	2501.06077	link
2025-01-09	Propensity score matching in semaglutide retrospective studies	Elizabeth Mohney et.al.	2501.05533	null
2025-01-09	Robust Score Matching	Richard Schwank et.al.	2501.05105	null
2024-12-28	An analytic theory of creativity in convolutional diffusion models	Mason Kamb et.al.	2412.20292	null
2024-12-18	Catalysts of Conversation: Examining Interaction Dynamics Between Topic Initiators and Commentors in Alzheimer’s Disease Online Communities	Congning Ni et.al.	2412.13388	null
2024-12-19	Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation	Bofang Jia et.al.	2412.09265	null
2024-12-10	Score Change of Variables	Stephen Robbins et.al.	2412.07904	null
2024-12-10	Score-matching-based Structure Learning for Temporal Data on Networks	Hao Chen et.al.	2412.07469	null
2024-12-09	Improving Source Extraction with Diffusion and Consistency Models	Tornike Karchkhadze et.al.	2412.06965	link
2024-12-09	Generative Lines Matching Models	Ori Matityahu et.al.	2412.06403	null
2024-12-06	Local Curvature Smoothing with Stein’s Identity for Efficient Score Matching	Genki Osada et.al.	2412.03962	null
2024-12-03	How to Use Diffusion Priors under Sparse Views?	Qisen Wang et.al.	2412.02225	link
2024-11-29	Pretrained Reversible Generation as Unsupervised Visual Representation Learning	Rongkun Xue et.al.	2412.01787	null
2024-11-29	Riemannian Denoising Score Matching for Molecular Structure Optimization with Accurate Energy	Jeheon Woo et.al.	2411.19769	null
2024-11-27	Building Confidence in Deep Generative Protein Design	Tianyuan Zheng et.al.	2411.18568	link
2024-11-20	Comprehensive Methodology for Sample Augmentation in EEG Biomarker Studies for Alzheimers Risk Classification	Veronica Henao Isaza et.al.	2411.17717	null
2024-11-14	Propensity Score Matching: Should We Use It in Designing Observational Studies?	Fei Wan et.al.	2411.09579	null
2024-11-14	Efficiently learning and sampling multimodal distributions with data-based initialization	Frederic Koehler et.al.	2411.09117	null
2024-11-13	Parameter Inference via Differentiable Diffusion Bridge Importance Sampling	Nicklas Boserup et.al.	2411.08993	link
2024-11-02	Supervised Score-Based Modeling by Gradient Boosting	Changyuan Zhao et.al.	2411.01159	null
2024-10-31	TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt	Jiahui Yang et.al.	2410.21299	null
2024-10-27	Hamiltonian Score Matching and Generative Flows	Peter Holderrieth et.al.	2410.20470	null
2024-10-25	Dimension reduction via score ratio matching	Ricardo Baptista et.al.	2410.19990	null
2024-10-23	Semi-Implicit Functional Gradient Flow	Shiyue Zhang et.al.	2410.17935	null
2024-10-18	Mitigating Embedding Collapse in Diffusion Models for Categorical Data	Bac Nguyen et.al.	2410.14758	null
2024-10-17	Diffusing States and Matching Scores: A New Framework for Imitation Learning	Runzhe Wu et.al.	2410.13855	link
2024-10-15	$l_\infty$-approximation of localized distributions	Tiangang Cui et.al.	2410.11771	null
2024-10-14	High-Dimensional Differential Parameter Inference in Exponential Family using Time Score Matching	Daniel J. Williams et.al.	2410.10637	link
2024-10-21	On Divergence Measures for Training GFlowNets	Tiago da Silva et.al.	2410.09355	null
2024-10-11	Linear Convergence of Diffusion Models Under the Manifold Hypothesis	Peter Potaptchik et.al.	2410.09046	null
2024-10-15	Score Neural Operator: A Generative Model for Learning and Generalizing Across Multiple Probability Distributions	Xinyu Liao et.al.	2410.08549	null
2024-10-05	Is Score Matching Suitable for Estimating Point Processes?	Haoqun Cao et.al.	2410.04037	link
2024-10-04	Classification-Denoising Networks	Louis Thiry et.al.	2410.03505	null
2024-10-02	Equivariant score-based generative models provably learn distributions with symmetries efficiently	Ziyu Chen et.al.	2410.01244	null
2024-10-01	Generative Precipitation Downscaling using Score-based Diffusion with Wasserstein Regularization	Yuhao Liu et.al.	2410.00381	null
2024-09-12	Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning	Hanyang Zhao et.al.	2409.08400	null
2024-09-11	From optimal score matching to optimal sampling	Zehao Dou et.al.	2409.07032	null
2024-09-02	Highly Accurate Real-space Electron Densities with Neural Networks	Lixue Cheng et.al.	2409.01306	null
2024-08-29	A Score-Based Density Formula, with Applications in Diffusion Generative Models	Gen Li et.al.	2408.16765	null
2024-08-27	Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging	Yuanhao Li et.al.	2408.14843	null
2024-08-26	Evaluating the effectiveness of public policies on COVID-19 containment: A PSM-DID approach	Zihan Wang et.al.	2408.14108	link
2024-08-22	Variance reduction of diffusion model’s gradients with Taylor approximation-based control variate	Paul Jeha et.al.	2408.12270	null
2024-08-21	MR Optimized Reconstruction of Simultaneous Multi-Slice Imaging Using Diffusion Model	Ting Zhao et.al.	2408.08883	null
2024-08-13	A comparison of methods for estimating the average treatment effect on the treated for externally controlled trials	Huan Wang et.al.	2408.07193	null
2024-08-09	Bootstrap Matching: a robust and efficient correction for non-random A/B test, and its applications	Zihao Zheng et.al.	2408.05297	null
2024-07-26	Score matching through the roof: linear, nonlinear, and latent variables causal discovery	Francesco Montagna et.al.	2407.18755	null
2024-07-24	Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference	Jian Xu et.al.	2407.17033	null
2024-07-22	Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond	Silvio Galesso et.al.	2407.15739	link
2024-07-23	Score matching for bridges without time-reversals	Elizabeth L. Baker et.al.	2407.15455	link
2024-08-12	Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes	Alexandre Abraham et.al.	2407.14861	null
2024-07-12	Learning Distances from Data with Normalizing Flows and Score Matching	Peter Sorrenson et.al.	2407.09297	null
2024-07-10	What’s the score? Automated Denoising Score Matching for Nonlinear Diffusions	Raghav Singhal et.al.	2407.07998	null
2024-06-15	Stein’s Method of Moments on the Sphere	Adrian Fischer et.al.	2407.02299	null
2024-07-01	Learning data efficient coarse-grained molecular dynamics from forces and noise	Aleksander E. P. Durumeric et.al.	2407.01286	link
2024-07-18	Localizing Anomalies via Multiscale Score Matching Analysis	Ahsan Mahmood et.al.	2407.00148	link
2024-06-20	A Practical Diffusion Path for Sampling	Omar Chehab et.al.	2406.14040	null
2024-07-25	Evaluating the design space of diffusion-based generative models	Yuqing Wang et.al.	2406.12839	null
2024-06-17	Score-fPINN: Fractional Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck-Levy Equations	Zheyuan Hu et.al.	2406.11676	null
2024-06-13	Operator-informed score matching for Markov diffusion models	Zheyang Shen et.al.	2406.09084	null
2024-06-13	Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency	Maor Dikter et.al.	2406.08840	link
2024-06-12	CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models	Hyungjin Chung et.al.	2406.08070	null
2024-06-11	DualBind: A Dual-Loss Framework for Protein-Ligand Binding Affinity Prediction	Meng Liu et.al.	2406.07770	null
2024-06-08	Mean-field Chaos Diffusion Models	Sungwoo Park et.al.	2406.05396	null
2024-06-07	Combinatorial Complex Score-based Diffusion Modelling through Stochastic Differential Equations	Adrien Carrel et.al.	2406.04916	link
2024-06-07	Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study	Chong Zhang et.al.	2406.04633	null
2024-06-04	Democratizing Propensity Score Matching Using Web Application	Adam Gajtkowski et.al.	2406.02743	null
2024-06-03	Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation	Mingyuan Zhou et.al.	2406.01561	link
2024-05-29	Kernel Semi-Implicit Variational Inference	Ziheng Cheng et.al.	2405.18997	link
2024-06-06	Simulating infinite-dimensional nonlinear diffusion bridges	Gefan Yang et.al.	2405.18353	link
2024-05-24	ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching	Yumin Zhang et.al.	2405.15914	link
2024-05-24	Score-based generative models are provably robust: an uncertainty quantification perspective	Nikiforos Mimikos-Stamatopoulos et.al.	2405.15754	null
2024-05-24	Nonlinear denoising score matching for enhanced learning of structured distributions	Jeremiah Birrell et.al.	2405.15625	null
2024-05-18	Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching	Xingyu Miao et.al.	2405.11252	link
2024-05-17	High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates	Janick Weberpals et.al.	2405.10925	null
2024-05-15	Response Matching for generating materials and molecules	Bingqing Cheng et.al.	2405.09057	null
2024-05-08	A score-based particle method for homogeneous Landau equation	Yan Huang et.al.	2405.05187	null
2024-05-06	Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models	Ludwig Winkler et.al.	2405.03549	null
2024-05-05	Score-based Generative Priors Guided Model-driven Network for MRI Reconstruction	Xiaoyu Qiao et.al.	2405.02958	null
2024-05-03	SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning	Qian Long et.al.	2405.01839	null
2024-04-29	Learning general Gaussian mixtures with efficient score matching	Sitan Chen et.al.	2404.18893	null
2024-04-24	Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations	Kaiwen Xue et.al.	2404.15766	link
2024-04-23	Score matching for sub-Riemannian bridge sampling	Erlend Grong et.al.	2404.15258	null
2024-04-20	A Massive MIMO Sampling Detection Strategy Based on Denoising Diffusion Model	Lanxin He et.al.	2404.13281	null
2024-04-22	Generative Modelling with High-Order Langevin Dynamics	Ziqiang Shi et.al.	2404.12814	null
2024-04-16	Efficient Conditional Diffusion Model with Probability Flow Sampling for Image Super-resolution	Yutao Yuan et.al.	2404.10688	link
2024-04-18	Efficiently Adversarial Examples Generation for Visual-Language Models under Targeted Transfer Scenarios using Diffusion Models	Qi Guo et.al.	2404.10335	link
2024-04-15	Convergence Analysis of Probability Flow ODE for Score-based Generative Models	Daniel Zhengyu Huang et.al.	2404.09730	link
2024-03-25	The Impact of Pradhan Mantri Ujjwala Yojana on Indian Households	Nabeel Asharaf et.al.	2403.17112	null
2024-03-25	Optimal convex $M$-estimation via score matching	Oliver Y. Feng et.al.	2403.16688	null
2024-03-21	MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection	Jakub Micorek et.al.	2403.14497	link
2024-03-10	Propensity-score matching analysis in COVID-19-related studies: a method and quality systematic review	Chunhui Gu et.al.	2403.07023	null
2024-03-10	UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation	Kwanyoung Kim et.al.	2403.06275	null
2024-03-04	Soft-constrained Schrodinger Bridge: a Stochastic Control Approach	Jhanvi Garg et.al.	2403.01717	null
2024-03-02	Re-evaluating the impact of hormone replacement therapy on heart disease using match-adaptive randomization inference	Samuel D. Pimentel et.al.	2403.01330	null
2024-03-02	Training Unbiased Diffusion Models From Biased Dataset	Yeongmin Kim et.al.	2403.01189	link
2024-03-04	Structure-Guided Adversarial Training of Diffusion Models	Ling Yang et.al.	2402.17563	null
2024-02-27	Label-Noise Robust Diffusion Models	Byeonghu Na et.al.	2402.17517	link
2024-02-23	The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling	Jiajun Ma et.al.	2402.15170	null
2024-02-13	Space-Time Bridge-Diffusion	Hamidreza Behjoo et.al.	2402.08847	null
2024-02-13	Target Score Matching	Valentin De Bortoli et.al.	2402.08667	null
2024-02-23	Score-based generative models break the curse of dimensionality in learning a family of sub-Gaussian probability distributions	Frank Cole et.al.	2402.08082	null
2024-02-12	Optimal score estimation via empirical Bayes smoothing	Andre Wibisono et.al.	2402.07747	null
2024-02-12	Score-based Diffusion Models via Stochastic Differential Equations – a Technical Tutorial	Wenpin Tang et.al.	2402.07487	null
2024-02-12	Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck Equations	Zheyuan Hu et.al.	2402.07465	null
2024-02-09	Particle Denoising Diffusion Sampler	Angus Phillips et.al.	2402.06320	link
2024-02-09	Iterated Denoising Energy Matching for Sampling from Boltzmann Densities	Tara Akhound-Sadegh et.al.	2402.06121	link
2024-02-08	Time Series Diffusion in the Frequency Domain	Jonathan Crabbé et.al.	2402.05933	link
2024-02-06	Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization	Fangzhao Zhang et.al.	2402.01965	null
2024-02-02	Conditioning non-linear and infinite-dimensional diffusion processes	Elizabeth Louise Baker et.al.	2402.01434	link
2024-01-29	Domain adaptation strategies for 3D reconstruction of the lumbar spine using real fluoroscopy data	Sascha Jecklin et.al.	2401.16027	null
2024-02-06	Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization	Yinbin Han et.al.	2401.15604	null
2024-02-04	Free public transport to the destination: A causal analysis of tourists’ travel mode choice	Kevin Blättler et.al.	2401.14945	null
2024-01-23	Contractive Diffusion Probabilistic Models	Wenpin Tang et.al.	2401.13115	null
2024-01-22	ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter	Yi-Chiao Wu et.al.	2401.12160	null
2024-01-14	Score-matching neural networks for improved multi-band source separation	Matt L. Sampson et.al.	2401.07313	link
2024-01-04	Bring Metric Functions into Diffusion Models	Jie An et.al.	2401.02414	null
2024-01-19	Diffusion Model with Perceptual Loss	Shanchuan Lin et.al.	2401.00110	null
2024-01-04	High-Fidelity Diffusion-based Image Editing	Chen Hou et.al.	2312.15707	null
2023-12-16	Bayes-Optimal Unsupervised Learning for Channel Estimation in Near-Field Holographic MIMO	Wentao Yu et.al.	2312.10438	null
2023-12-16	Continuous Diffusion for Mixed-Type Tabular Data	Markus Mueller et.al.	2312.10431	link
2023-12-14	Noise in the reverse process improves the approximation capabilities of diffusion models	Karthik Elamvazhuthi et.al.	2312.07851	null
2023-12-11	Adversarial Estimation of Topological Dimension with Harmonic Score Maps	Eric Yeats et.al.	2312.06869	null
2023-12-09	The New Age of Collusion? An Empirical Study into Airbnb’s Pricing Dynamics and Market Behavior	Richeng Piao et.al.	2312.05633	null
2023-12-14	Stochastic Optimal Control Matching	Carles Domingo-Enrich et.al.	2312.02027	link
2023-11-30	SMaRt: Improving GANs with Score Matching Regularity	Mengfei Xia et.al.	2311.18208	null
2023-11-23	Sample-Efficient Training for Diffusion	Shivam Gupta et.al.	2311.13745	null
2023-12-02	LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching	Yixun Liang et.al.	2311.11284	link
2023-11-14	Learning Bayes-Optimal Channel Estimation for Holographic MIMO in Unknown EM Environments	Wentao Yu et.al.	2311.07908	null
2023-11-10	FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores	Daniel Y. Fu et.al.	2311.05908	null
2023-10-30	Scaling Riemannian Diffusion Models	Aaron Lou et.al.	2310.20030	null
2023-10-27	Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling	Zhenyu Zhu et.al.	2310.18123	null
2023-10-26	Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration	Longlin Yu et.al.	2310.17153	link
2023-10-25	Discrete Diffusion Language Modeling by Estimating the Ratios of the Data Distribution	Aaron Lou et.al.	2310.16834	link
2023-10-25	SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process	Zichong Li et.al.	2310.16336	link
2023-10-25	Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification	Zichong Li et.al.	2310.16310	null
2023-10-22	Shortcuts for causal discovery of nonlinear models by score matching	Francesco Montagna et.al.	2310.14246	null
2023-11-14	On propensity score matching with a diverging number of matches	Yihui He et.al.	2310.14142	link
2023-10-20	Assumption violations in causal discovery and the robustness of score matching	Francesco Montagna et.al.	2310.13387	link
2023-10-19	Closed-Form Diffusion Models	Christopher Scarvelis et.al.	2310.12395	null
2023-10-17	Sadness, Anger, or Anxiety: Twitter Users’ Emotional Responses to Toxicity in Public Conversations	Ana Aleksandric et.al.	2310.11436	null
2023-10-12	Debias the Training of Diffusion Models	Hu Yu et.al.	2310.08442	link
2023-10-09	Integration-free Training for Spatio-temporal Multimodal Covariate Deep Kernel Point Processes	Yixuan Zhang et.al.	2310.05485	null
2023-10-06	Generative Diffusion From An Action Principle	Akhil Premkumar et.al.	2310.04490	null
2023-10-09	Diffusion Random Feature Model	Esha Saha et.al.	2310.04417	null
2023-10-04	On Memorization in Diffusion Models	Xiangming Gu et.al.	2310.02664	link
2023-10-03	Stochastic force inference via density estimation	Victor Chardès et.al.	2310.02366	null
2023-10-01	Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion	Dongjun Kim et.al.	2310.02279	link
2023-10-03	Sampling Multimodal Distributions with the Vanilla Score: Benefits of Data-Based Initialization	Frederic Koehler et.al.	2310.01762	null
2023-09-29	EPiC-ly Fast Particle Cloud Generation with Flow-Matching and Diffusion	Erik Buhmann et.al.	2310.00049	null
2023-09-28	Bayesian Cramér-Rao Bound Estimation with Score-Based Models	Evan Scope Crafts et.al.	2309.16076	null
2023-09-20	Score Mismatching for Generative Modeling	Senmao Ye et.al.	2309.11043	link
2023-09-18	Sex-based Disparities in Brain Aging: A Focus on Parkinson’s Disease	Iman Beheshti et.al.	2309.10069	null
2023-09-18	Single and Few-step Diffusion for Generative Speech Enhancement	Bunlong Lay et.al.	2309.09677	link
2023-09-06	Matcha-TTS: A fast TTS architecture with conditional flow matching	Shivam Mehta et.al.	2309.03199	link
2023-08-29	MadSGM: Multivariate Anomaly Detection with Score-based Generative Models	Haksoo Lim et.al.	2308.15069	null
2023-08-24	Machine Unlearning for Causal Inference	Vikas Ramachandra et.al.	2308.13559	null
2023-08-22	Expressive probabilistic sampling in recurrent neural networks	Shirui Chen et.al.	2308.11809	link
2023-08-22	Convergence guarantee for consistency models	Junlong Lyu et.al.	2308.11449	null
2023-08-31	Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models	Heyang Xue et.al.	2308.10428	null
2023-08-19	Semi-Implicit Variational Inference via Score Matching	Longlin Yu et.al.	2308.10014	link
2023-08-07	Equity in Focus: Investigating Gender Disparities in Glioblastoma via Propensity Score Matching	Solomon Eshun et.al.	2308.03827	null
2023-08-07	A Causal Inference Approach to Eliminate the Impacts of Interfering Factors on Traffic Performance Evaluation	Xiaobo Ma et.al.	2308.03545	null
2023-08-04	Diffusion probabilistic models enhance variational autoencoder for crystal structure generative modeling	Teerachote Pakornchote et.al.	2308.02165	null
2023-08-03	Estimating causal quantile exposure response functions via matching	Luca Merlo et.al.	2308.01628	null
2023-08-01	Causal exposure-response curve estimation with surrogate confounders: a study of air pollution and children’s health in Medicaid claims data	Jenny J. Lee et.al.	2308.00812	link
2023-07-25	Implicitly Normalized Explicitly Regularized Density Estimation	Mark Kozdoba et.al.	2307.13763	null
2023-07-20	Analysis of the rate of force development reveals high neuromuscular fatigability in elderly patients with chronic kidney disease	Antoine Chatrenet et.al.	2307.10691	null
2023-07-15	Variational Inference with Gaussian Score Matching	Chirag Modi et.al.	2307.07849	link
2023-07-12	Energy Discrepancies: A Score-Independent Loss for Energy-Based Models	Tobias Schröder et.al.	2307.06431	link
2023-07-07	Simulation-free Schrödinger bridges via score and flow matching	Alexander Tong et.al.	2307.03672	link
2023-07-02	MissDiff: Training Diffusion Models on Tabular Data with Missing Values	Yidong Ouyang et.al.	2307.00467	null
2023-06-24	Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching	H. J. Terry Suh et.al.	2306.14079	null
2023-08-03	Masked Diffusion Models Are Fast and Privacy-Aware Learners	Jiachen Lei et.al.	2306.11363	link
2023-06-20	Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains	Yilong Qin et.al.	2306.09332	null
2023-06-15	Fast Training of Diffusion Models with Masked Transformers	Hongkai Zheng et.al.	2306.09305	link
2023-06-15	Training Diffusion Classifiers with Denoising Assistance	Chandramouli Sastry et.al.	2306.09192	null
2023-06-23	Image Reconstruction from Sparse Low-Dose CT Data via Score Matching	Wenxiang Cong et.al.	2306.08610	null
2023-06-13	Inactivated COVID-19 Vaccination did not affect In vitro fertilization (IVF) / Intra-Cytoplasmic Sperm Injection (ICSI) cycle outcomes	Qi Wan et.al.	2306.07652	null
2023-06-07	ScoreCL: Augmentation-Adaptive Contrastive Learning via Score-Matching Function	JinYoung Kim et.al.	2306.04175	null
2023-06-05	Machine Learning Force Fields with Data Cost Aware Training	Alexander Bukharin et.al.	2306.03109	link
2023-06-05	Faster Training of Diffusion Models and Improved Density Estimation via Parallel Score Matching	Etrit Haxholli et.al.	2306.02658	null

generation

Publish Date	Title	Authors	PDF	Code
2025-07-23	Yume: An Interactive World Generation Model	Xiaofeng Mao et.al.	2507.17744	null
2025-07-23	Attention (as Discrete-Time Markov) Chains	Yotam Erel et.al.	2507.17657	null
2025-07-23	Dual-branch Prompting for Multimodal Machine Translation	Jie Wang et.al.	2507.17588	null
2025-07-23	An h-space Based Adversarial Attack for Protection Against Few-shot Personalization	Xide Xu et.al.	2507.17554	null
2025-07-23	Unsupervised anomaly detection using Bayesian flow networks: application to brain FDG PET in the context of Alzheimer’s disease	Hugues Roy et.al.	2507.17486	null
2025-07-23	EndoGen: Conditional Autoregressive Endoscopic Video Generation	Xinyu Liu et.al.	2507.17388	null
2025-07-23	PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image	Hyeongjin Nam et.al.	2507.17332	null
2025-07-22	AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation	Nima Fathi et.al.	2507.16940	null
2025-07-22	Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed	Antoni Kowalczuk et.al.	2507.16880	null
2025-07-22	DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling	Boheng Li et.al.	2507.16329	null
2025-07-22	MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation	Yanchen Liu et.al.	2507.16310	null
2025-07-22	Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning	Boheng Li et.al.	2507.16302	null
2025-07-22	Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective	Seunghyeon Kim et.al.	2507.16254	null
2025-07-22	Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling	Chao Zhou et.al.	2507.16240	null
2025-07-22	A Human-Centered Approach to Identifying Promises, Risks, & Challenges of Text-to-Image Generative AI in Radiology	Katelyn Morrison et.al.	2507.16207	null
2025-07-22	LSSGen: Leveraging Latent Space Scaling in Flow and Diffusion for Efficient Text to Image Generation	Jyun-Ze Tang et.al.	2507.16154	null
2025-07-22	PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation	Yaofang Liu et.al.	2507.16116	null
2025-07-21	Improving Personalized Image Generation through Social Context Feedback	Parul Gupta et.al.	2507.16095	null
2025-07-21	Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models	Enes Sanli et.al.	2507.15824	null
2025-07-21	TokensGen: Harnessing Condensed Tokens for Long Video Generation	Wenqi Ouyang et.al.	2507.15728	null
2025-07-21	A Practical Investigation of Spatially-Controlled Image Generation with Transformers	Guoxuan Xia et.al.	2507.15724	null
2025-07-21	SustainDiffusion: Optimising the Social and Environmental Sustainability of Stable Diffusion Models	Giordano d’Aloisio et.al.	2507.15663	null
2025-07-21	CylinderPlane: Nested Cylinder Representation for 3D-aware Image Generation	Ru Jia et.al.	2507.15606	null
2025-07-21	Conditional Video Generation for High-Efficiency Video Compression	Fangqiu Yi et.al.	2507.15269	null
2025-07-21	Improving Joint Embedding Predictive Architecture with Diffusion Noise	Yuping Qiu et.al.	2507.15216	null
2025-07-20	Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR	Peirong Zhang et.al.	2507.15085	null
2025-07-20	Paired Image Generation with Diffusion-Guided Diffusion Models	Haoxuan Zhang et.al.	2507.14833	null
2025-07-19	BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM	Haiquan Wen et.al.	2507.14632	null
2025-07-18	PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement	Sofia Jamil et.al.	2507.13708	null
2025-07-17	$\nabla$ NABLA: Neighborhood Adaptive Block-Level Attention	Dmitrii Mikhailov et.al.	2507.13546	null
2025-07-17	“PhyWorldBench”: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models	Jing Gu et.al.	2507.13428	null
2025-07-17	Taming Diffusion Transformer for Real-Time Mobile Video Generation	Yushu Wu et.al.	2507.13343	null
2025-07-17	FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization	Chuancheng Shi et.al.	2507.13311	null
2025-07-17	Leveraging Pre-Trained Visual Models for AI-Generated Video Detection	Keerthi Veeramachaneni et.al.	2507.13224	null
2025-07-17	fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting	Alicia Durrer et.al.	2507.13146	null
2025-07-17	Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation	Yi Xin et.al.	2507.13032	null
2025-07-17	A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints	Youssef Tawfilis et.al.	2507.12979	null
2025-07-17	LoViC: Efficient Long Video Generation with Context Compression	Jiaxiu Jiang et.al.	2507.12952	null
2025-07-17	DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization	Dongyeun Lee et.al.	2507.12933	null
2025-07-17	Local Representative Token Guided Merging for Text-to-Image Generation	Min-Jeong Lee et.al.	2507.12771	null
2025-07-17	World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving	Yanchen Guan et.al.	2507.12762	null
2025-07-17	Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models	Samuel Lavoie et.al.	2507.12318	null
2025-07-16	FADE: Adversarial Concept Erasure in Flow Models	Zixuan Fu et.al.	2507.12283	null
2025-07-16	DeepShade: Enable Shade Simulation by Text-conditioned Image Generation	Longchao Da et.al.	2507.12103	null
2025-07-16	ID-EA: Identity-driven Text Enhancement and Adaptation with Textual Inversion for Personalized Text-to-Image Generation	Hyun-Jun Jin et.al.	2507.11990	null
2025-07-16	RaDL: Relation-aware Disentangled Learning for Multi-Instance Text-to-Image Generation	Geon Park et.al.	2507.11947	null
2025-07-16	Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement	Shuichiro Nishigori et.al.	2507.11925	null
2025-07-15	CharaConsist: Fine-Grained Consistent Character Generation	Mengyu Wang et.al.	2507.11533	null
2025-07-15	CATVis: Context-Aware Thought Visualization	Tariq Mehmood et.al.	2507.11522	null
2025-07-15	Implementing Adaptations for Vision AutoRegressive Model	Kaif Shaikh et.al.	2507.11441	null
2025-07-15	MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection	Guanghao Wu et.al.	2507.11252	null
2025-07-15	NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models	X. Feng et.al.	2507.11245	null
2025-07-14	Spatial Reasoners for Continuous Variables in Any Domain	Bart Pogodzinski et.al.	2507.10768	null
2025-07-15	Text Embedding Knows How to Quantize Text-Guided Diffusion Models	Hongjae Lee et.al.	2507.10340	null
2025-07-14	From Wardrobe to Canvas: Wardrobe Polyptych LoRA for Part-level Controllable Human Image Generation	Jeongho Kim et.al.	2507.10217	null
2025-07-14	Latent Diffusion Models with Masked AutoEncoders	Junho Lee et.al.	2507.09984	null
2025-07-14	Counterfactual Visual Explanation via Causally-Guided Adversarial Steering	Yiran Qiao et.al.	2507.09881	null
2025-07-13	Brain Stroke Detection and Classification Using CT Imaging with Transformer Models and Explainable AI	Shomukh Qari et.al.	2507.09630	null
2025-07-13	Demystifying Flux Architecture	Or Greenberg et.al.	2507.09595	null
2025-07-13	MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models	Haozhe Zhao et.al.	2507.09574	null
2025-07-12	AlphaVAE: Unified End-to-End RGBA Image Reconstruction and Generation with Alpha-Aware Representation Learning	Zile Wang et.al.	2507.09308	null
2025-07-12	$I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting	Zhimin Liao et.al.	2507.09144	null
2025-07-12	Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning	Yiyang Chen et.al.	2507.09102	null
2025-07-11	Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective	Hangjie Yuan et.al.	2507.08801	null
2025-07-11	Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation	Liu He et.al.	2507.08513	null
2025-07-11	Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation	Anlin Zheng et.al.	2507.08441	null
2025-07-11	Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers	Wongi Jeong et.al.	2507.08422	null
2025-07-11	Subject-Consistent and Pose-Diverse Text-to-Image Generation	Zhanxin Gao et.al.	2507.08396	null
2025-07-11	From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning	Sen Wang et.al.	2507.08380	null
2025-07-10	Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling	Haoyu Wu et.al.	2507.07982	null
2025-07-10	Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions	Longfei Li et.al.	2507.07978	null
2025-07-10	Scaling RL to Long Videos	Yukang Chen et.al.	2507.07966	null
2025-07-11	T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates	Zhitao Wang et.al.	2507.07633	null
2025-07-10	Digital Salon: An AI and Physics-Driven Tool for 3D Hair Grooming and Simulation	Chengan He et.al.	2507.07387	null
2025-07-09	Scale leads to compositional generalization	Florian Redhardt et.al.	2507.07207	null
2025-07-09	A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality	Mohamed Elmoghany et.al.	2507.07202	null
2025-07-09	Interpretable EEG-to-Image Generation with Semantic Prompts	Arshak Rezvani et.al.	2507.07157	null
2025-07-09	Evaluating Attribute Confusion in Fashion Text-to-Image Generation	Ziyue Liu et.al.	2507.07079	null
2025-07-10	Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting	Fei Teng et.al.	2507.06971	null
2025-07-09	Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation	Tao Feng et.al.	2507.06830	null
2025-07-09	Democratizing High-Fidelity Co-Speech Gesture Video Generation	Xu Yang et.al.	2507.06812	null
2025-07-09	PromptTea: Let Prompts Tell TeaCache the Optimal Threshold	Zishen Huang et.al.	2507.06739	null
2025-07-09	Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution	Yonghyun Park et.al.	2507.06547	null
2025-07-10	Concept Unlearning by Modeling Key Steps of Diffusion Process	Chaoshuo Zhang et.al.	2507.06526	null
2025-07-09	FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation	Liqiang Jing et.al.	2507.06523	null
2025-07-08	FedPhD: Federated Pruning with Hierarchical Learning of Diffusion Models	Qianyu Long et.al.	2507.06449	null
2025-07-08	NeoBabel: A Multilingual Open Tower for Visual Generation	Mohammad Mahdi Derakhshani et.al.	2507.06137	null
2025-07-08	Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions	Jaewan Park et.al.	2507.06133	null
2025-07-09	Omni-Video: Democratizing Unified Video Understanding and Generation	Zhiyu Tan et.al.	2507.06119	null
2025-07-08	Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval	Haiwen Li et.al.	2507.05970	null
2025-07-09	Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation	Zhenghao Zhang et.al.	2507.05963	null
2025-07-08	MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos	Rongsheng Wang et.al.	2507.05675	null
2025-07-08	DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation	Young Hun Kim et.al.	2507.05627	null
2025-07-08	AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework	Suoxiang Zhang et.al.	2507.05621	null
2025-07-08	Model-free Optical Processors using In Situ Reinforcement Learning with Proximal Policy Optimization	Yuhang Li et.al.	2507.05583	null
2025-07-08	SingLoRA: Low Rank Adaptation Using a Single Matrix	David Bensaïd et.al.	2507.05566	null
2025-07-07	SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model	Chun Xie et.al.	2507.05148	null
2025-07-07	ICAS: Detecting Training Data from Autoregressive Image Generative Models	Hongyao Yu et.al.	2507.05068	null
2025-07-07	AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics	Jan Carreras Boada et.al.	2507.05063	null
2025-07-07	Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning	Ricardo Cardoso et.al.	2507.05029	null
2025-07-07	DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer	Yecheng Wu et.al.	2507.04947	null
2025-07-07	Taming the Tri-Space Tension: ARC-Guided Hallucination Modeling and Control for Text-to-Image Generation	Jianjiang Yang et.al.	2507.04946	null
2025-07-07	Leveraging Self-Supervised Features for Efficient Flooded Region Identification in UAV Aerial Images	Dibyabha Deb et.al.	2507.04915	null
2025-07-07	HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding	Yuxuan Cai et.al.	2507.04909	null
2025-07-07	Efficacy of Image Similarity as a Metric for Augmenting Small Dataset Retinal Image Segmentation	Thomas Wallace et.al.	2507.04862	null
2025-07-07	Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning	Jiayun Hu et.al.	2507.04758	null
2025-07-03	RefTok: Reference-Based Tokenization for Video Generation	Xiang Fan et.al.	2507.02862	null
2025-07-03	Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching	Xin Zhou et.al.	2507.02860	null
2025-07-03	AnyI2V: Animating Any Conditional Image with Motion Control	Ziye Li et.al.	2507.02857	null
2025-07-03	RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation	Liheng Zhang et.al.	2507.02792	null
2025-07-03	FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models	Yuxuan Wang et.al.	2507.02714	null
2025-07-04	UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation	Qin Guo et.al.	2507.02713	null
2025-07-03	Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation	François Rozet et.al.	2507.02608	null
2025-07-03	AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models	Chenhao Xue et.al.	2507.02598	null
2025-07-03	Holistic Tokenizer for Autoregressive Image Generation	Anlin Zheng et.al.	2507.02358	null
2025-07-02	Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation	Zhuoyang Zhang et.al.	2507.01957	null
2025-07-02	How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks	Rahul Ramachandran et.al.	2507.01955	null
2025-07-02	LongAnimation: Long Animation Generation with Dynamic Global-Local Memory	Nan Chen et.al.	2507.01945	null
2025-07-02	FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization	Peng Zheng et.al.	2507.01792	null
2025-07-02	Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective	Yuxin Mao et.al.	2507.01652	null
2025-07-02	Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think	Ge Wu et.al.	2507.01467	null
2025-07-02	DiffMark: Diffusion-based Robust Watermark Against Deepfakes	Chen Sun et.al.	2507.01428	null
2025-07-02	SD-Acc: Accelerating Stable Diffusion through Phase-aware Sampling and Hardware Co-Optimizations	Zhican Wang et.al.	2507.01309	null
2025-07-02	LLM-based Realistic Safety-Critical Driving Video Generation	Yongjie Fu et.al.	2507.01264	null
2025-07-02	AIGVE-MACS: Unified Multi-Aspect Commenting and Scoring Model for AI-Generated Video Evaluation	Xiao Liu et.al.	2507.01255	null
2025-06-30	Calligrapher: Freestyle Text Image Customization	Yue Ma et.al.	2506.24123	null
2025-06-30	Epona: Autoregressive Diffusion World Model for Autonomous Driving	Kaiwen Zhang et.al.	2506.24113	null
2025-06-30	Navigating with Annealing Guidance Scale in Diffusion Space	Shai Yehezkel et.al.	2506.24108	null
2025-06-30	Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention	Wonwoong Cho et.al.	2506.24085	null
2025-06-30	World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation	Haonan Chen et.al.	2506.23919	null
2025-06-30	VMoBA: Mixture-of-Block Attention for Video Diffusion Models	Jianzong Wu et.al.	2506.23858	null
2025-06-30	RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment	Jianing Jin et.al.	2506.23852	null
2025-06-30	Radioactive Watermarks in Diffusion and Autoregressive Image Generative Models	Michel Meintz et.al.	2506.23731	null
2025-06-30	SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation	Shuai Tan et.al.	2506.23690	null
2025-06-30	A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement	Gaozheng Pei et.al.	2506.23676	null
2025-06-27	RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation	Liudi Yang et.al.	2506.22007	null
2025-06-27	CERBERUS: Crack Evaluation & Recognition Benchmark for Engineering Reliability & Urban Stability	Justin Reinman et.al.	2506.21909	null
2025-06-27	On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling	Stanley Wu et.al.	2506.21874	null
2025-06-27	TaleForge: Interactive Multimodal System for Personalized Story Creation	Minh-Loi Nguyen et.al.	2506.21832	null
2025-06-26	BASS. XLIV. Morphological preferences of local hard X-ray selected AGN	Miguel Parra Tello et.al.	2506.21800	null
2025-06-26	Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis	Chenqiu Zhao et.al.	2506.21731	null
2025-06-26	$\textrm{ODE}_t\left(\textrm{ODE}_l\right)$: Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling	Denis Gudovskiy et.al.	2506.21714	null
2025-06-26	TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation	Hakan Çapuk et.al.	2506.21681	null
2025-06-26	SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture	Kehan Sui et.al.	2506.21478	null
2025-06-26	XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation	Bowen Chen et.al.	2506.21416	null
2025-06-26	GenFlow: Interactive Modular System for Image Generation	Duc-Hung Nguyen et.al.	2506.21369	null
2025-06-27	ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models	Hongbo Liu et.al.	2506.21356	null
2025-06-26	HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation	Diego Biagini et.al.	2506.21287	null
2025-06-26	Video Virtual Try-on with Conditional Diffusion Transformer Inpainter	Cheng Zou et.al.	2506.21270	null
2025-06-26	BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models	Louis Kerner et.al.	2506.21209	null
2025-06-26	Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation	Ze Wang et.al.	2506.21022	null
2025-06-26	HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation	Qingyue Jiao et.al.	2506.21015	null
2025-06-26	Rethink Sparse Signals for Pose-guided Text-to-image Generation	Wenjie Xuan et.al.	2506.20983	null
2025-06-25	Video Perception Models for 3D Scene Synthesis	Rui Huang et.al.	2506.20601	null
2025-06-25	HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling	Tobias Vontobel et.al.	2506.20452	null
2025-06-25	Med-Art: Diffusion Transformer for 2D Medical Text-to-Image Generation	Changlu Guo et.al.	2506.20449	null
2025-06-25	EAR: Erasing Concepts from Unified Autoregressive Models	Haipeng Fan et.al.	2506.20151	null
2025-06-25	BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos	Jiahao Lin et.al.	2506.20103	null
2025-06-24	Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation	Xingyang Li et.al.	2506.19852	null
2025-06-24	GenHSI: Controllable Generation of Human-Scene Interaction Videos	Zekun Li et.al.	2506.19840	null
2025-06-24	SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution	Liangbin Xie et.al.	2506.19838	null
2025-06-24	Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router	Yubo Huang et.al.	2506.19833	null
2025-06-24	Varif.ai to Vary and Verify User-Driven Diversity in Scalable Image Generation	M. Michelessa et.al.	2506.19644	null
2025-06-24	Stylized Structural Patterns for Improved Neural Network Pre-training	Farnood Salehi et.al.	2506.19465	null
2025-06-24	Enhancing Galaxy Classification with U-Net Variational Autoencoders for Image Denoising	Sergey Mirzoyan et.al.	2506.19434	null
2025-06-24	SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation	Yunsung Chung et.al.	2506.19360	null
2025-06-24	Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation	Jintao Rong et.al.	2506.19348	null
2025-06-24	Style Transfer: A Decade Survey	Tianshan Zhang et.al.	2506.19278	null
2025-06-23	VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory	Runjia Li et.al.	2506.18903	null
2025-06-23	From Virtual Games to Real-World Play	Wenqiang Sun et.al.	2506.18901	null
2025-06-23	FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation	Kaiyi Huang et.al.	2506.18899	null
2025-06-23	MinD: Unified Visual Imagination and Control via Hierarchical World Models	Xiaowei Chi et.al.	2506.18897	null
2025-06-23	OmniGen2: Exploration to Advanced Multimodal Generation	Chenyuan Wu et.al.	2506.18871	null
2025-06-23	OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation	Qijun Gan et.al.	2506.18866	null
2025-06-23	TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting	Zhongbin Guo et.al.	2506.18862	null
2025-06-23	Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset	Zhuowei Chen et.al.	2506.18851	null
2025-06-23	Matrix-Game: Interactive World Foundation Model	Yifan Zhang et.al.	2506.18701	null
2025-06-23	RDPO: Real Data Preference Optimization for Physics Consistency Video Generation	Wenxu Qian et.al.	2506.18655	null
2025-06-23	Emergent Temporal Correspondences from Video Diffusion Transformers	Jisu Nam et.al.	2506.17220	link
2025-06-20	Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens	Zeyuan Yang et.al.	2506.17218	link
2025-06-20	DreamCube: 3D Panorama Generation via Multi-plane Synchronization	Yukun Huang et.al.	2506.17206	null
2025-06-20	Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition	Jiaqi Li et.al.	2506.17201	null
2025-06-20	The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation	Giulia Bertazzini et.al.	2506.17016	null
2025-06-20	AI’s Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario	Ciro Beneduce et.al.	2506.16898	null
2025-06-20	Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models	Semin Kim et.al.	2506.16853	link
2025-06-20	FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation	Fan Yang et.al.	2506.16806	null
2025-06-20	Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation	Riccardo Corvi et.al.	2506.16802	null
2025-06-20	PQCAD-DM: Progressive Quantization and Calibration-Assisted Distillation for Extremely Efficient Diffusion Model	Beomseok Ko et.al.	2506.16776	null
2025-06-18	Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model	Anirud Aggarwal et.al.	2506.15682	link
2025-06-20	Sekai: A Video Dataset towards World Exploration	Zhen Li et.al.	2506.15675	null
2025-06-20	Show-o2: Improved Native Unified Multimodal Models	Jinheng Xie et.al.	2506.15564	link
2025-06-18	Control and Realism: Best of Both Worlds in Layout-to-Image without Training	Bonan Li et.al.	2506.15563	null
2025-06-18	GalaxyGenius: A Mock Galaxy Image Generator for Various Telescopes from Hydrodynamical Simulations	Xingchen Zhou et.al.	2506.15060	link
2025-06-17	Frequency-Calibrated Membership Inference Attacks on Medical Image Diffusion Models	Xinkai Zhao et.al.	2506.14919	null
2025-06-17	DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization	Renjith Prasad et.al.	2506.14903	null
2025-06-17	The Quasi-Radial Field-line Tracing (QRaFT): an Adaptive Segmentation of the Open-Flux Solar Corona	Vadim M. Uritsky et.al.	2506.14894	null
2025-06-17	Cost-Aware Routing for Efficient Text-To-Image Generation	Qinchan et.al.	2506.14753	null
2025-06-17	Align Your Flow: Scaling Continuous-Time Flow Map Distillation	Amirmojtaba Sabour et.al.	2506.14603	null
2025-06-17	Risk Estimation of Knee Osteoarthritis Progression via Predictive Multi-task Modelling from Efficient Diffusion Model using X-ray Images	David Butler et.al.	2506.14560	null
2025-06-17	Causally Steered Diffusion for Automated Video Counterfactual Generation	Nikos Spyrou et.al.	2506.14404	link
2025-06-17	Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models	Tian Xia et.al.	2506.14399	null
2025-06-17	CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation	Jia-Chen Zhang et.al.	2506.14206	null
2025-06-17	DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion	Makoto Shing et.al.	2506.14202	null
2025-06-18	VideoMAR: Autoregressive Video Generation with Continuous Tokens	Hu Yu et.al.	2506.14168	null
2025-06-16	UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions	Zhucun Xue et.al.	2506.13691	null
2025-06-16	Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention	Jeonghoon Park et.al.	2506.13298	null
2025-06-16	STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation	Jiamin Wang et.al.	2506.13138	null
2025-06-15	iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer	Zhelun Shen et.al.	2506.12847	null
2025-06-14	Retrieval Augmented Comic Image Generation	Yunhao Shui et.al.	2506.12517	null
2025-06-14	Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity	Mohsen Jenadeleh et.al.	2506.12505	null
2025-06-14	Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback	Janet Wang et.al.	2506.12323	null
2025-06-13	Exploring the Effectiveness of Deep Features from Domain-Specific Foundation Models in Retinal Image Synthesis	Zuzanna Skorniewska et.al.	2506.11753	null
2025-06-13	SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation	Xu Wang et.al.	2506.11621	null
2025-06-13	A Watermark for Auto-Regressive Image Generation Models	Yihan Wu et.al.	2506.11371	null
2025-06-12	GenWorld: Towards Detecting AI-generated Real-world Simulation Videos	Weiliang Chen et.al.	2506.10975	null
2025-06-13	MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning	Yuxuan Luo et.al.	2506.10963	null
2025-06-12	The Role of Generative AI in Facilitating Social Interactions: A Scoping Review	T. T. J. E. Arets et.al.	2506.10927	null
2025-06-12	M4V: Multi-Modal Mamba for Text-to-Video Generation	Jiancheng Huang et.al.	2506.10915	null
2025-06-12	GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning	Xiaoyi Bao et.al.	2506.10639	null
2025-06-12	Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models	Francisco Caetano et.al.	2506.10634	null
2025-06-12	High-resolution efficient image generation from WiFi CSI using a pretrained latent diffusion model	Eshan Ramesh et.al.	2506.10605	null
2025-06-12	Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning	Chun-Mei Feng et.al.	2506.10575	null
2025-06-12	Unitary Scrambling and Collapse: A Quantum Diffusion Framework for Generative Modeling	Yihua Li et.al.	2506.10571	link
2025-06-12	DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers	Lizhen Wang et.al.	2506.10568	null
2025-06-11	PlayerOne: Egocentric World Simulator	Yuanpeng Tu et.al.	2506.09995	null
2025-06-11	InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions	Zhenzhi Wang et.al.	2506.09984	null
2025-06-11	ReSim: Reliable World Simulation for Autonomous Driving	Jiazhi Yang et.al.	2506.09981	null
2025-06-11	Canonical Latent Representations in Conditional Diffusion Models	Yitao Xu et.al.	2506.09955	null
2025-06-11	HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations	Marco Federici et.al.	2506.09932	null
2025-06-11	Only-Style: Stylistic Consistency in Image Generation without Content Leakage	Tilemachos Aravanis et.al.	2506.09916	link
2025-06-11	ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models	Qin Zhou et.al.	2506.09740	null
2025-06-11	DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning	Dongxu Liu et.al.	2506.09644	null
2025-06-12	Consistent Story Generation with Asymmetry Zigzag Sampling	Mingxiao Li et.al.	2506.09612	link
2025-06-11	Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression	Dingcheng Zhen et.al.	2506.09482	link
2025-06-10	MagCache: Fast Video Generation with Magnitude-Aware Cache	Zehong Ma et.al.	2506.09045	link
2025-06-10	Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models	Xuanchi Ren et.al.	2506.09042	link
2025-06-10	Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better	Dianyi Wang et.al.	2506.09040	link
2025-06-10	Diffuse and Disperse: Image Generation with Representation Regularization	Runqian Wang et.al.	2506.09027	null
2025-06-11	SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping	Jiajun Li et.al.	2506.08908	link
2025-06-10	CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics	Shravan Nayak et.al.	2506.08835	null
2025-06-10	FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency	Yifei Su et.al.	2506.08822	null
2025-06-10	HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation	Ziyao Huang et.al.	2506.08797	null
2025-06-10	Flow Diverse and Efficient: Learning Momentum Flow Matching via Stochastic Velocity Field Sampling	Zhiyuan Ma et.al.	2506.08796	null
2025-06-10	MAMBO: High-Resolution Generative Approach for Mammography Images	Milica Škipina et.al.	2506.08677	null
2025-06-09	StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets	Anh-Quan Cao et.al.	2506.08013	link
2025-06-09	Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion	Xun Huang et.al.	2506.08009	null
2025-06-09	Dreamland: Controllable World Creation with Simulator and Generative Models	Sicheng Mo et.al.	2506.08006	null
2025-06-09	Audio-Sync Video Generation with Multi-Stream Temporal Control	Shuchen Weng et.al.	2506.08003	null
2025-06-09	MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation	Junhao Chen et.al.	2506.07999	null
2025-06-09	Generative Modeling of Weights: Generalization or Memorization?	Boya Zeng et.al.	2506.07998	link
2025-06-10	OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation	Jingjing Chang et.al.	2506.07977	link
2025-06-09	Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces	Kevin Rojas et.al.	2506.07903	link
2025-06-09	Video Unlearning via Low-Rank Refusal Vector	Simone Facchiano et.al.	2506.07891	null
2025-06-09	Diffusion Counterfactual Generation with Semantic Abduction	Rajat Rasal et.al.	2506.07883	link
2025-06-06	STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis	Jiatao Gu et.al.	2506.06276	null
2025-06-06	GenIR: Generative Visual Feedback for Mental Image Retrieval	Diji Yang et.al.	2506.06220	null
2025-06-06	Feedback Guidance of Diffusion Models	Koulischer Felix et.al.	2506.06085	null
2025-06-06	Restereo: Diffusion stereo video generation and restoration	Xingchang Huang et.al.	2506.06023	null
2025-06-06	Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models	Chaoyi Zhu et.al.	2506.06018	link
2025-06-06	Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection	Yu Li et.al.	2506.05872	null
2025-06-06	LLIA – Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models	Haojie Yu et.al.	2506.05806	null
2025-06-06	Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery	Sajjad Abdoli et.al.	2506.05673	null
2025-06-05	UniRes: Universal Image Restoration for Complex Degradations	Mo Zhou et.al.	2506.05599	null
2025-06-05	EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh	Tao Hu et.al.	2506.05554	null
2025-06-05	ContentV: Efficient Training of Video Generation Models with Limited Compute	Wenfeng Lin et.al.	2506.05343	null
2025-06-05	AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model	Pingyu Wu et.al.	2506.05289	link
2025-06-05	Aligning Latent Spaces with Flow Priors	Yizhuo Li et.al.	2506.05240	null
2025-06-05	PixCell: A generative foundation model for digital histopathology images	Srikar Yellapragada et.al.	2506.05127	null
2025-06-05	Membership Inference Attacks on Sequence Models	Lorenzo Rossi et.al.	2506.05126	null
2025-06-05	DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models	Revant Teotia et.al.	2506.05108	null
2025-06-06	Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers	Haosong Liu et.al.	2506.05096	null
2025-06-05	FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation	Huihan Wang et.al.	2506.04956	link
2025-06-05	CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx	Lukas Picek et.al.	2506.04931	null
2025-06-05	Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking	Yu-Feng Chen et.al.	2506.04879	link
2025-06-04	LayerFlow: A Unified Model for Layer-aware Video Generation	Sihui Ji et.al.	2506.04228	null
2025-06-04	UNIC: Unified In-Context Video Editing	Zixuan Ye et.al.	2506.04216	null
2025-06-05	FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers	Xuanhua He et.al.	2506.04213	null
2025-06-04	Image Editing As Programs with Diffusion Models	Yujia Hu et.al.	2506.04158	null
2025-06-05	RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors	Hicham Eddoubi et.al.	2506.03988	link
2025-06-04	EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation	Cheng Zhang et.al.	2506.03652	null
2025-06-04	ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning	Feng Han et.al.	2506.03596	link
2025-06-04	DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models	Ziyi Wu et.al.	2506.03517	null
2025-06-03	Robustness in Both Domains: CLIP Needs a Robust Text Encoder	Elias Abad Rocamora et.al.	2506.03355	null
2025-06-03	Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas	Austin Silveria et.al.	2506.03275	null
2025-06-03	IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation	Yuanze Lin et.al.	2506.03150	null
2025-06-04	UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	Bin Lin et.al.	2506.03147	null
2025-06-03	Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval	Jiwen Yu et.al.	2506.03141	null
2025-06-03	CamCloneMaster: Enabling Reference-based Camera Control for Video Generation	Yawen Luo et.al.	2506.03140	null
2025-06-03	AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation	Lu Qiu et.al.	2506.03126	null
2025-06-03	DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation	Zhengyao Lv et.al.	2506.03123	null
2025-06-03	TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models	Chetwin Low et.al.	2506.03099	null
2025-06-03	ORV: 4D Occupancy-centric Robot Video Generation	Xiuyu Yang et.al.	2506.03079	link
2025-06-03	EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models	Mingzhe Li et.al.	2506.03067	null
2025-06-03	Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers	Pengtao Chen et.al.	2506.03065	null
2025-05-30	ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL	Yu Zhang et.al.	2505.24875	null
2025-05-30	MiniMax-Remover: Taming Bad Noise Helps Video Object Removal	Bojia Zi et.al.	2505.24873	null
2025-05-30	GenSpace: Benchmarking Spatially-Aware Image Generation	Zehan Wang et.al.	2505.24870	null
2025-05-30	Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation	Yucheng Zhou et.al.	2505.24787	link
2025-05-30	DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds	Jiaxu Zhang et.al.	2505.24733	null
2025-05-30	UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation	Yang-Tian Sun et.al.	2505.24521	null
2025-05-30	un$^2$CLIP: Improving CLIP’s Visual Detail Capturing Ability via Inverting unCLIP	Yinqi Li et.al.	2505.24517	link
2025-05-30	Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields	Md Shahriar Rahim Siddiqui et.al.	2505.24434	null
2025-06-03	Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning	Stepan Shabalin et.al.	2505.24360	link
2025-05-30	Category-aware EEG image generation based on wavelet transform and contrast semantic loss	Enshang Zhang et.al.	2505.24301	link
2025-05-29	LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers	Yusuf Dalva et.al.	2505.23758	null
2025-05-29	MAGREF: Masked Guidance for Any-Reference Video Generation	Yufan Deng et.al.	2505.23742	link
2025-05-29	How Animals Dance (When You’re Not Looking)	Xiaojuan Wang et.al.	2505.23738	null
2025-05-29	VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos	Tingyu Song et.al.	2505.23693	link
2025-05-29	VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models	Xiangdong Zhang et.al.	2505.23656	link
2025-05-29	Inference-time Scaling of Diffusion Models through Classical Search	Xiangcheng Zhang et.al.	2505.23614	null
2025-05-29	Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model	Qingyu Shi et.al.	2505.23606	link
2025-05-29	R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation	Kaijie Chen et.al.	2505.23493	null
2025-05-29	VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation	Shi-Xue Zhang et.al.	2505.23484	link
2025-05-29	Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering	Sixian Wang et.al.	2505.23343	link
2025-05-28	Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation	Zhe Kong et.al.	2505.22647	link
2025-05-28	SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation	Dekai Zhu et.al.	2505.22643	null
2025-05-28	Principled Out-of-Distribution Generalization via Simplicity	Jiawei Ge et.al.	2505.22622	null
2025-05-28	ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models	Dmitrii Sorokin et.al.	2505.22569	null
2025-05-28	PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models	Junwen Chen et.al.	2505.22523	null
2025-05-28	ProCrop: Learning Aesthetic Image Cropping from Professional Compositions	Ke Zhang et.al.	2505.22490	null
2025-05-28	Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation	Jiadong Pan et.al.	2505.22407	null
2025-05-28	PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models	Fan Fei et.al.	2505.22394	null
2025-05-28	Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion	Kewen Chen et.al.	2505.22360	null
2025-05-28	Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers	Weilun Feng et.al.	2505.22167	null
2025-05-27	Frame In-N-Out: Unbounded Controllable Image-to-Video Generation	Boyang Wang et.al.	2505.21491	null
2025-05-27	Policy Optimized Text-to-Image Pipeline Design	Uri Gadot et.al.	2505.21478	null
2025-05-27	DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction	Yiheng Liu et.al.	2505.21473	link
2025-05-27	Dynamic Vision from EEG Brain Recordings: How much does EEG know?	Prajwal Singh et.al.	2505.21385	null
2025-05-28	SageAttention2++: A More Efficient Implementation of SageAttention2	Jintao Zhang et.al.	2505.21136	link
2025-05-27	Creativity in LLM-based Multi-Agent Systems: A Survey	Yi-Cheng Lin et.al.	2505.21116	null
2025-05-27	Minute-Long Videos with Dual Parallelisms	Zeqing Wang et.al.	2505.21070	link
2025-05-27	RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy	Aiyue Chen et.al.	2505.21036	null
2025-05-27	OrienText: Surface Oriented Textual Image Generation	Shubham Singh Paliwal et.al.	2505.20958	null
2025-05-27	Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models	Puwei Lian et.al.	2505.20955	null
2025-05-26	FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities	Jin Wang et.al.	2505.20147	null
2025-05-26	Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion	Zheqi Lv et.al.	2505.20053	link
2025-05-27	Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM	Peng Liu et.al.	2505.19901	null
2025-05-26	StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation	Yi Wu et.al.	2505.19874	null
2025-05-26	DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving	Wenchao Sun et.al.	2505.19692	link
2025-05-26	TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs	Juntong Wang et.al.	2505.19535	null
2025-05-26	Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning	Jiyu Hu et.al.	2505.19522	null
2025-05-26	The Role of Video Generation in Enhancing Data-Limited Action Understanding	Wei Li et.al.	2505.19495	null
2025-05-26	MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models	Hang Hua et.al.	2505.19415	null
2025-05-26	Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals	Nate Gillman et.al.	2505.19386	null
2025-05-23	WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions	Zizhang Li et.al.	2505.18151	null
2025-05-23	F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles	Varun Ajith et.al.	2505.18106	null
2025-05-23	DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation	Junhao Chen et.al.	2505.18078	null
2025-05-23	RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration	Sudarshan Rajagopalan et.al.	2505.18047	null
2025-05-23	SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain	Jiawei Zhou et.al.	2505.17727	null
2025-05-23	FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving	Shuang Zeng et.al.	2505.17685	null
2025-05-23	Scaling Image and Video Generation via Test-Time Evolutionary Search	Haoran He et.al.	2505.17618	null
2025-05-23	MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation	Jihan Yao et.al.	2505.17613	null
2025-05-23	InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO	Xueji Fang et.al.	2505.17574	link
2025-05-23	Deeper Diffusion Models Amplify Bias	Shahin Hakemi et.al.	2505.17560	null
2025-05-22	GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning	Chengqi Duan et.al.	2505.17022	link
2025-05-22	Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO	Chengzhuo Tong et.al.	2505.17017	link
2025-05-22	Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On	Siqi Wan et.al.	2505.16977	link
2025-05-22	Creatively Upscaling Images with Global-Regional Priors	Yurui Qian et.al.	2505.16976	null
2025-05-22	Training-Free Efficient Video Generation via Dynamic Token Carving	Yuechen Zhang et.al.	2505.16864	link
2025-05-22	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Chaoyang Wang et.al.	2505.16862	null
2025-05-22	Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts	Taewon Kang et.al.	2505.16819	null
2025-05-22	Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation	Hongji Yang et.al.	2505.16763	null
2025-05-22	MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM	Siwei Meng et.al.	2505.16456	null
2025-05-22	FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design	Renjie Wei et.al.	2505.16335	link
2025-05-21	MMaDA: Multimodal Large Diffusion Language Models	Ling Yang et.al.	2505.15809	link
2025-05-21	Interspatial Attention for Efficient 4D Human Video Generation	Ruizhi Shao et.al.	2505.15800	null
2025-05-21	IA-T2I: Internet-Augmented Text-to-Image Generation	Chuanhao Li et.al.	2505.15779	null
2025-05-21	FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion	Kazuaki Mishima et.al.	2505.15313	null
2025-05-21	BadSR: Stealthy Label Backdoor Attacks on Image Super-Resolution	Ji Guo et.al.	2505.15308	null
2025-05-21	Scaling Diffusion Transformers Efficiently via $μ$P	Chenyu Zheng et.al.	2505.15270	link
2025-05-21	AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection	Zhipei Xu et.al.	2505.15173	null
2025-05-21	Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation	Xinran Wang et.al.	2505.15172	null
2025-05-21	CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation	Xinran Wang et.al.	2505.15145	link
2025-05-20	Programmatic Video Prediction Using Large Language Models	Hao Tang et.al.	2505.14948	link
2025-05-20	Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers	Sucheng Ren et.al.	2505.14687	link
2025-05-20	UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation	Rui Tian et.al.	2505.14682	null
2025-05-20	Training-Free Watermarking for Autoregressive Image Generation	Yu Tong et.al.	2505.14673	link
2025-05-20	SparC: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling	Zhihao Li et.al.	2505.14521	null
2025-05-20	Latent Flow Transformer	Yen-Chen Wu et.al.	2505.14513	link
2025-05-20	VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank	Tianhe Wu et.al.	2505.14460	link
2025-05-20	Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives	Xingxing Weng et.al.	2505.14361	null
2025-05-20	Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization	Yuanyuan Chang et.al.	2505.14254	link
2025-05-20	“Haet Bhasha aur Diskrimineshun”: Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs	Darpan Aswal et.al.	2505.14226	null
2025-05-20	LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer	Changgu Chen et.al.	2505.14167	null
2025-05-19	VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation	Huawei Lin et.al.	2505.13439	link
2025-05-19	FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance	Dian Shao et.al.	2505.13437	null
2025-05-20	Swin DiT: Diffusion Transformer using Pseudo Shifted Windows	Jiafu Wu et.al.	2505.13219	null
2025-05-19	Diffusion Models with Double Guidance: Generate with aggregated datasets	Yanfeng Yang et.al.	2505.13213	null
2025-05-19	MAGI-1: Autoregressive Video Generation at Scale	Sand.ai et.al.	2505.13211	link
2025-05-19	A Physics-Inspired Optimizer: Velocity Regularized Adam	Pranav Vaidhyanathan et.al.	2505.13196	null
2025-05-19	Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model	Jonas Brenig et.al.	2505.13152	link
2025-05-19	Accelerate TarFlow Sampling with GS-Jacobi Iteration	Ben Liu et.al.	2505.12849	link
2025-05-19	FRAbench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities	Shibo Hong et.al.	2505.12795	link
2025-05-19	SounDiT: Geo-Contextual Soundscape-to-Landscape Generation	Junbo Wang et.al.	2505.12734	null
2025-05-16	QVGen: Pushing the Limit of Quantized Video Generative Models	Yushi Huang et.al.	2505.11497	null
2025-05-16	PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment	Dingbang Huang et.al.	2505.11468	null
2025-05-16	GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI	Marc-Antoine Fortin et.al.	2505.11445	link
2025-05-16	Face Consistency Benchmark for GenAI Video	Michal Podstawski et.al.	2505.11425	null
2025-05-16	DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models	Giulia Bertazzini et.al.	2505.11257	null
2025-05-16	Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models	Fu-Yun Wang et.al.	2505.11245	link
2025-05-16	CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback	Yixin Wan et.al.	2505.11178	null
2025-05-16	One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework	Feiran Li et.al.	2505.11131	link
2025-05-16	HSRMamba: Efficient Wavelet Stripe State Space Model for Hyperspectral Image Super-Resolution	Baisong Li et.al.	2505.11062	link
2025-05-16	Generative Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges	Yuan Zhang et.al.	2505.10993	null
2025-05-15	End-to-End Vision Tokenizer Tuning	Wenxuan Wang et.al.	2505.10562	null
2025-05-15	CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs	Raman Dutt et.al.	2505.10496	link
2025-05-16	MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation	Yanbo Ding et.al.	2505.10238	link
2025-05-15	ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars	Rui-Yang Ju et.al.	2505.10072	null
2025-05-15	Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis	Bingda Tang et.al.	2505.10046	link
2025-05-14	EnerVerse-AC: Envisioning Embodied Environments with Action Condition	Yuxin Jiang et.al.	2505.09723	null
2025-05-14	EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models	Hu Yue et.al.	2505.09694	link
2025-05-14	Don’t Forget your Inverse DDIM for Image Editing	Guillermo Gomez-Trenado et.al.	2505.09571	null
2025-05-14	BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset	Jiuhai Chen et.al.	2505.09568	link
2025-05-14	Train a Multi-Task Diffusion Policy on RLBench-18 in One Day with One GPU	Yutong Hu et.al.	2505.09430	link
2025-05-14	Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis	Bingxin Ke et.al.	2505.09358	link
2025-05-14	An Initial Exploration of Default Images in Text-to-Image Generation	Hannu Simonen et.al.	2505.09166	null
2025-05-15	Generating time-consistent dynamics with discriminator-guided image diffusion models	Philipp Hess et.al.	2505.09089	null
2025-05-13	Generative AI for Autonomous Driving: Frontiers and Opportunities	Yuping Wang et.al.	2505.08854	link
2025-05-13	Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models	Donghoon Kim et.al.	2505.08622	null
2025-05-13	Symbolically-Guided Visual Plan Inference from Uncurated Video Data	Wenyan Yang et.al.	2505.08444	null
2025-05-13	Identifying Memorization of Diffusion Models through p-Laplace Analysis	Jonathan Brokman et.al.	2505.08246	link
2025-05-12	Image-Guided Microstructure Optimization using Diffusion Models: Validated with Li-Mn-rich Cathode Precursors	Geunho Choi et.al.	2505.07906	null
2025-05-12	DanceGRPO: Unleashing GRPO on Visual Generation	Zeyue Xue et.al.	2505.07818	null
2025-05-12	ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models	Ozgur Kara et.al.	2505.07652	null
2025-05-12	Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning	Bohan Wang et.al.	2505.07538	null
2025-05-12	Addressing degeneracies in latent interpolation for diffusion models	Erik Landolsi et.al.	2505.07481	null
2025-05-13	Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model	Wei Li et.al.	2505.07449	link
2025-05-12	GAN-based synthetic FDG PET images from T1 brain MRI can serve to improve performance of deep unsupervised anomaly detection models	Daria Zotova et.al.	2505.07364	null
2025-05-12	Generative Pre-trained Autoregressive Diffusion Transformer	Yuan Zhang et.al.	2505.07344	null
2025-05-12	Metrics that matter: Evaluating image quality metrics for medical image generation	Yash Deo et.al.	2505.07175	link
2025-05-11	DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models	Junhao Xia et.al.	2505.07057	null
2025-05-11	Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation	Md. Naimur Asif Borno et.al.	2505.06995	null
2025-05-09	Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation	Dongying Li et.al.	2505.06117	null
2025-05-09	Discovery of the Polar Ring Galaxies with deep learning	D. V. Dobrycheva et.al.	2505.05890	null
2025-05-09	Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition	Zhiyuan Chen et.al.	2505.05829	link
2025-05-08	InstanceGen: Image Generation with Instance-level Instructions	Etai Sella et.al.	2505.05678	link
2025-05-08	A Preliminary Study for GPT-4o on Image Restoration	Hao Yang et.al.	2505.05621	link
2025-05-11	Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation	Chao Liao et.al.	2505.05472	null
2025-05-08	Normalize Everything: A Preconditioned Magnitude-Preserving Architecture for Diffusion-Based Speech Enhancement	Julius Richter et.al.	2505.05216	null
2025-05-12	PIDiff: Image Customization for Personalized Identities with Diffusion Models	Jinyu Gu et.al.	2505.05081	null
2025-05-08	T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models	Xuyang Guo et.al.	2505.04946	null
2025-05-07	CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation	Viacheslav Vasilev et.al.	2505.04851	null
2025-05-07	Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers	Divyansh Srivastava et.al.	2505.04718	null
2025-05-08	HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation	Teng Hu et.al.	2505.04512	null
2025-05-08	Defining and Quantifying Creative Behavior in Popular Image Generators	Aditi Ramaswamy et.al.	2505.04497	null
2025-05-07	Efficient Flow Matching using Latent Variables	Anirban Samaddar et.al.	2505.04486	null
2025-05-07	Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety	Variath Madhupal Gautham Nair et.al.	2505.04146	null
2025-05-07	RFNNS: Robust Fixed Neural Network Steganography with Popular Deep Generative Models	Yu Cheng et.al.	2505.04116	null
2025-05-06	Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators	Will Hawkins et.al.	2505.03859	link
2025-05-06	Revolutionizing Brain Tumor Imaging: Generating Synthetic 3D FA Maps from T1-Weighted MRI using CycleGAN Models	Xin Du et.al.	2505.03662	null
2025-05-06	Real-Time Person Image Synthesis Using a Flow Matching Model	Jiwoo Jeong et.al.	2505.03562	link
2025-05-06	Safer Prompts: Reducing IP Risk in Visual Generative AI	Lena Reissinger et.al.	2505.03338	null
2025-05-06	Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning	Yibin Wang et.al.	2505.03318	null
2025-05-06	Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights	Zhaiming Shen et.al.	2505.03205	null
2025-05-05	Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models	Kuofeng Gao et.al.	2505.02824	link
2025-05-06	MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation	Mingcheng Li et.al.	2505.02648	null
2025-05-07	Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities	Xinjie Zhang et.al.	2505.02567	link
2025-05-05	Text to Image Generation and Editing: A Survey	Pengfei Yang et.al.	2505.02527	null
2025-05-07	Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction	Inclusion AI et.al.	2505.02471	link
2025-05-04	Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset	Jakub Wąsala et.al.	2505.02255	null
2025-05-04	Improving Physical Object State Representation in Text-to-Image Generative Systems	Tianle Chen et.al.	2505.02236	link
2025-05-04	DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization	Wenchuan Wang et.al.	2505.02192	null
2025-05-06	Regression is all you need for medical image translation	Sebastian Rassmann et.al.	2505.02048	link
2025-05-03	PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth	Bu Jin et.al.	2505.01729	null
2025-05-02	FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis	Jiangtong Tan et.al.	2505.01172	link
2025-05-02	Improving Editability in Image Generation with Layer-wise Memory	Daneul Kim et.al.	2505.01079	null
2025-05-01	Controllable Weather Synthesis and Removal with Video Diffusion Models	Chih-Hao Lin et.al.	2505.00704	null
2025-05-01	T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT	Dongzhi Jiang et.al.	2505.00703	link
2025-05-01	JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers	Kwon Byung-Ki et.al.	2505.00482	link
2025-05-01	T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation	Xuyang Guo et.al.	2505.00337	null
2025-04-30	Direct Motion Models for Assessing Generated Videos	Kelsey Allen et.al.	2505.00209	null
2025-04-30	Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis	Michal Geyer et.al.	2505.00135	null
2025-04-30	ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction	Qihao Liu et.al.	2504.21855	null
2025-04-30	3D Stylization via Large Reconstruction Model	Ipek Oztas et.al.	2504.21836	null
2025-04-30	Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields	Yixin Gao et.al.	2504.21814	null
2025-04-30	HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation	Haiyang Zhou et.al.	2504.21650	link
2025-04-30	Latent Feature-Guided Conditional Diffusion for High-Fidelity Generative Image Semantic Communication	Zehao Chen et.al.	2504.21577	null
2025-04-30	Simple Visual Artifact Detection in Sora-Generated Videos	Misora Sugiyama et.al.	2504.21334	null
2025-04-30	Text-Conditioned Diffusion Model for High-Fidelity Korean Font Generation	Abdul Sami et.al.	2504.21325	null
2025-04-30	Capturing Conditional Dependence via Auto-regressive Diffusion Models	Xunpeng Huang et.al.	2504.21314	null
2025-04-30	AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images	Yunhao Li et.al.	2504.21308	null
2025-04-30	Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions	ZiYi Dong et.al.	2504.21292	null
2025-04-29	YoChameleon: Personalized Vision and Language Generation	Thao Nguyen et.al.	2504.20998	null
2025-04-29	TesserAct: Learning 4D Embodied World Models	Haoyu Zhen et.al.	2504.20995	null
2025-04-29	DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs	Hao Luan et.al.	2504.20754	null
2025-04-29	Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion	Zesheng Wang et.al.	2504.20685	null
2025-04-29	Advance Fake Video Detection via Vision Transformers	Joy Battocchio et.al.	2504.20669	null
2025-04-30	PixelHacker: Image Inpainting with Structural and Semantic Consistency	Ziyang Xu et.al.	2504.20438	null
2025-04-29	Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems	Shiqian Zhao et.al.	2504.20376	null
2025-04-29	A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks	Khoi Trinh et.al.	2504.20340	null
2025-04-28	Physics-Informed Diffusion Models for SAR Ship Wake Generation from Text Prompts	Kamirul Kamirul et.al.	2504.20241	null
2025-04-28	CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition	Quynh Phung et.al.	2504.19894	null
2025-04-28	RepText: Rendering Visual Text via Replicating	Haofan Wang et.al.	2504.19724	null
2025-04-28	DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer	Junpeng Jiang et.al.	2504.19614	null
2025-04-28	Image Generation Method Based on Heat Diffusion Models	Pengfei Zhang et.al.	2504.19600	null
2025-04-29	WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution	Pietro Bongini et.al.	2504.19595	null
2025-04-28	GenPTW: In-Generation Image Watermarking for Provenance Tracing and Tamper Localization	Zhenliang Gan et.al.	2504.19567	null
2025-04-28	Masked Language Prompting for Generative Data Augmentation in Few-shot Fashion Style Recognition	Yuki Hirakawa et.al.	2504.19455	null
2025-04-27	Flow Along the K-Amplitude for Generative Modeling	Weitao Du et.al.	2504.19353	null
2025-04-26	Predicting Stress in Two-phase Random Materials and Super-Resolution Method for Stress Images by Embedding Physical Information	Tengfei Xing et.al.	2504.18854	null
2025-04-26	Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning	Yifan Xie et.al.	2504.18810	null
2025-04-25	NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration	Haotian Dong et.al.	2504.18448	null
2025-04-25	HepatoGEN: Generating Hepatobiliary Phase MRI with Perceptual and Adversarial Models	Jens Hooge et.al.	2504.18405	null
2025-04-24	Fast Autoregressive Models for Continuous Latent Generation	Tiankai Hang et.al.	2504.18391	null
2025-04-25	TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation	Shintaro Ozaki et.al.	2504.18269	null
2025-04-25	Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding	Kun Li et.al.	2504.18204	null
2025-04-25	Diffusion-Driven Universal Model Inversion Attack for Face Recognition	Hanrui Wang et.al.	2504.18015	null
2025-04-27	Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models	Xu Ma et.al.	2504.17789	null
2025-04-24	Dynamic Camera Poses and Where to Find Them	Chris Rockwell et.al.	2504.17788	null
2025-04-24	Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields	Zhuo He et.al.	2504.17712	null
2025-04-24	STCL: Curriculum learning Strategies for deep learning image steganography models	Fengchun Liu et.al.	2504.17609	link
2025-04-24	Text-to-Image Alignment in Denoising-Based Models through Step Selection	Paul Grimal et.al.	2504.17525	null
2025-04-24	RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation	Aviv Slobodkin et.al.	2504.17502	null
2025-04-24	StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies	Xu Wang et.al.	2504.17401	null
2025-04-24	DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition	Yiyan Xu et.al.	2504.17349	null
2025-04-24	Physics-based super-resolved simulation of 3D elastic wave propagation adopting scalable Diffusion Transformer	Hugo Gabrielidis et.al.	2504.17308	null
2025-04-24	Towards Generalized and Training-Free Text-Guided Semantic Manipulation	Yu Hong et.al.	2504.17269	null
2025-04-23	BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation	Ruotong Wang et.al.	2504.16907	null
2025-04-23	ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance	Ying Li et.al.	2504.16464	null
2025-04-23	CLPSTNet: A Progressive Multi-Scale Convolutional Steganography Model Integrating Curriculum Learning	Fengchun Liu et.al.	2504.16364	link
2025-04-23	VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models	Xuming Hu et.al.	2504.16359	null
2025-04-22	Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching	Junn Yong Loo et.al.	2504.16262	null
2025-04-22	Survey of Video Diffusion Models: Foundations, Implementations, and Applications	Yimu Wang et.al.	2504.16081	link
2025-04-22	Boosting Generative Image Modeling via Joint Image-Feature Synthesis	Theodoros Kouzelis et.al.	2504.16064	null
2025-04-22	Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework	Xinyuan Song et.al.	2504.16016	null
2025-04-22	FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation	Zebin Yao et.al.	2504.15958	link
2025-04-22	Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning	Wang Lin et.al.	2504.15932	null
2025-04-22	DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers	Xuyang Zhong et.al.	2504.15827	null
2025-04-22	Satellite to GroundScape – Large-scale Consistent Ground View Generation from Satellite Views	Ningli Xu et.al.	2504.15786	null
2025-04-22	DiTPainter: Efficient Video Inpainting with Diffusion Transformers	Xian Wu et.al.	2504.15661	null
2025-04-21	Emergence and Evolution of Interpretable Concepts in Diffusion Models	Berk Tinaz et.al.	2504.15473	null
2025-04-21	Solving New Tasks by Adapting Internet Video Knowledge	Calvin Luo et.al.	2504.15369	null
2025-04-22	LACE: Controlled Image Prompting and Iterative Refinement with GenAI for Professional Visual Art Creators	Yenkai Huang et.al.	2504.15189	null
2025-04-21	Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform	Xianpan Zhou et.al.	2504.15182	null
2025-04-21	Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration	Junyuan Deng et.al.	2504.15159	null
2025-04-21	GIFDL: Generated Image Fluctuation Distortion Learning for Enhancing Steganographic Security	Xiangkun Wang et.al.	2504.15139	null
2025-04-22	VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation	Mingxia Zhan et.al.	2504.15095	null
2025-04-21	DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation	Weijie He et.al.	2504.15032	null
2025-04-21	TWIG: Two-Step Image Generation using Segmentation Masks in Diffusion Models	Mazharul Islam Rakib et.al.	2504.14933	null
2025-04-21	Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation	Chenjie Cao et.al.	2504.14899	link
2025-04-21	Twin Co-Adaptive Dialogue for Progressive Image Generation	Jianhui Wang et.al.	2504.14868	null
2025-04-21	LACE: Exploring Turn-Taking and Parallel Interaction Modes in Human-AI Co-Creation for Iterative Image Generation	YenKai Huang et.al.	2504.14827	null
2025-04-18	MLEP: Multi-granularity Local Entropy Patterns for Universal AI-generated Image Detection	Lin Yuan et.al.	2504.13726	null
2025-04-18	SupResDiffGAN a new approach for the Super-Resolution task	Dawid Kopeć et.al.	2504.13622	null
2025-04-18	U-Shape Mamba: State Space Model for faster diffusion	Alex Ergasti et.al.	2504.13499	link
2025-04-18	Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing	Joowon Kim et.al.	2504.13490	null
2025-04-18	POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation	Evans Xu Han et.al.	2504.13392	null
2025-04-17	SMPL-GPTexture: Dual-View 3D Human Texture Estimation using Text-to-Image Generation Models	Mingxiao Tu et.al.	2504.13378	null
2025-04-17	Personalized Text-to-Image Generation with Auto-Regressive Models	Kaiyue Sun et.al.	2504.13162	link
2025-04-18	SkyReels-V2: Infinite-length Film Generative Model	Guibin Chen et.al.	2504.13074	link
2025-04-17	HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation	Wenqi Dong et.al.	2504.13072	null
2025-04-17	ArtistAuditor: Auditing Artist Style Pirate in Text-to-Image Generation Models	Linkang Du et.al.	2504.13061	link
2025-04-17	RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins	Yao Mu et.al.	2504.13059	null
2025-04-17	SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding	Qianqian Sun et.al.	2504.12704	null
2025-04-17	Packing Input Frame Context in Next-Frame Prediction Models for Video Generation	Lvmin Zhang et.al.	2504.12626	link
2025-04-17	Prompt-Driven and Training-Free Forgetting Approach and Dataset for Large Language Models	Zhenyu Yu et.al.	2504.12574	null
2025-04-16	InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework	Jiale Tao et.al.	2504.12395	link
2025-04-16	VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate	Zhihang Yuan et.al.	2504.12259	link
2025-04-16	SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction	Xia Wang et.al.	2504.12245	null
2025-04-16	Cobra: Efficient Line Art COlorization with BRoAder References	Junhao Zhuang et.al.	2504.12240	null
2025-04-16	Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM	Zirui Pan et.al.	2504.12048	null
2025-04-16	Instruction-augmented Multimodal Alignment for Image-Text and Element Matching	Xinli Yue et.al.	2504.12018	null
2025-04-16	Novel-view X-ray Projection Synthesis through Geometry-Integrated Deep Learning	Daiqi Liu et.al.	2504.11953	link
2025-04-16	Mind2Matter: Creating 3D Models from EEG Signals	Xia Deng et.al.	2504.11936	link
2025-04-16	The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation	Bingjie Gao et.al.	2504.11739	null
2025-04-16	Towards Safe Synthetic Image Generation On the Web: A Multimodal Robust NSFW Defense and Million Scale Dataset	Muhammad Shahid Muneer et.al.	2504.11707	link
2025-04-15	Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception	Ziqi Pang et.al.	2504.11457	link
2025-04-15	ADT: Tuning Diffusion Models with Adversarial Supervision	Dazhong Shen et.al.	2504.11423	null
2025-04-15	VideoPanda: Video Panoramic Diffusion with Multi-view Attention	Kevin Xie et.al.	2504.11389	null
2025-04-15	Omni$^2$: Unifying Omnidirectional Image Generation and Editing in an Omni Model	Liu Yang et.al.	2504.11379	null
2025-04-16	Seedream 3.0 Technical Report	Yu Gao et.al.	2504.11346	null
2025-04-15	Using LLMs as prompt modifier to avoid biases in AI image generators	René Peinl et.al.	2504.11104	null
2025-04-15	AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era	Chenyang Zhu et.al.	2504.11015	null
2025-04-15	InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation	Yukang Lin et.al.	2504.10905	null
2025-04-15	Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models	Karan Jain et.al.	2504.10883	null
2025-04-15	OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding	Dianbing Xi et.al.	2504.10825	null
2025-04-14	Art3D: Training-Free 3D Generation from Flat-Colored Illustration	Xiaoyan Cong et.al.	2504.10466	null
2025-04-14	Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing	Taihang Hu et.al.	2504.10434	link
2025-04-14	FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos	Rui Chen et.al.	2504.10358	null
2025-04-14	InstructEngine: Instruction-driven Text-to-Image Alignment	Xingyu Lu et.al.	2504.10329	null
2025-04-14	VibrantLeaves: A principled parametric image generator for training deep restoration models	Raphael Achddou et.al.	2504.10201	link
2025-04-14	GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions	Jo-Ku Cheng et.al.	2504.10146	link
2025-04-14	Aligning Anime Video Generation with Human Feedback	Bingwen Zhu et.al.	2504.10044	null
2025-04-14	Masked Autoencoder Self Pre-Training for Defect Detection in Microelectronics	Nikolai Röhrich et.al.	2504.10021	null
2025-04-14	Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes	Huijie Liu et.al.	2504.09948	null
2025-04-14	EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise	Chao Liu et.al.	2504.09789	null
2025-04-11	GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation	Tianwei Xiong et.al.	2504.08736	link
2025-04-11	Generating Fine Details of Entity Interactions	Xinyi Gu et.al.	2504.08714	null
2025-04-11	Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model	Team Seawead et.al.	2504.08685	null
2025-04-11	Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization	Jialu Li et.al.	2504.08641	null
2025-04-11	Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging	Gabriele Lozupone et.al.	2504.08635	link
2025-04-11	Discriminator-Free Direct Preference Optimization for Video Diffusion	Haoran Cheng et.al.	2504.08542	null
2025-04-11	On the Design of Diffusion-based Neural Speech Codecs	Pietro Foti et.al.	2504.08470	null
2025-04-11	Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion	Weiye Chen et.al.	2504.08451	link
2025-04-11	Diffusion Models for Robotic Manipulation: A Survey	Rosa Wolf et.al.	2504.08438	null
2025-04-11	MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization	Daeun Kim et.al.	2504.08398	null
2025-04-10	PixelFlow: Pixel-Space Generative Models with Flow	Shoufa Chen et.al.	2504.07963	link
2025-04-10	Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction	Zeren Jiang et.al.	2504.07961	link
2025-04-10	VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning	Zhong-Yu Li et.al.	2504.07960	null
2025-04-10	Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos	Rundong Luo et.al.	2504.07940	null
2025-04-10	DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows	Mashrur M. Morshed et.al.	2504.07894	null
2025-04-10	Towards Sustainable Creativity Support: An Exploratory Study on Prompt Based Image Generation	Daniel Hove Paludan et.al.	2504.07879	null
2025-04-10	Diffusion Transformers for Tabular Data Time Series Generation	Fabrizio Garuti et.al.	2504.07566	link
2025-04-10	FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation	Linyan Huang et.al.	2504.07405	null
2025-04-10	ID-Booth: Identity-consistent Face Generation with Diffusion Models	Darian Tomašević et.al.	2504.07392	link
2025-04-10	Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-Reconstruction	Qingchao Jiang et.al.	2504.07382	link
2025-04-09	OmniCaptioner: One Captioner to Rule Them All	Yiting Lu et.al.	2504.07089	link
2025-04-09	A Unified Agentic Framework for Evaluating Conditional Image Generation	Jifang Wang et.al.	2504.07046	link
2025-04-09	EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation	Diljeet Jagpal et.al.	2504.06861	null
2025-04-09	DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation	Wangbo Zhao et.al.	2504.06803	link
2025-04-09	A Meaningful Perturbation Metric for Evaluating Explainability Methods	Danielle Cohen et.al.	2504.06800	null
2025-04-10	Compass Control: Multi Object Orientation Control for Text-to-Image Generation	Rishubh Parihar et.al.	2504.06752	null
2025-04-09	RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism	Elia Peruzzo et.al.	2504.06672	null
2025-04-09	Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception	Ruotian Peng et.al.	2504.06666	null
2025-04-09	Collision avoidance from monocular vision trained with novel view synthesis	Valentin Tordjman–Levavasseur et.al.	2504.06651	null
2025-04-09	PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering	Yifan Gao et.al.	2504.06632	null
2025-04-08	Transfer between Modalities with MetaQueries	Xichen Pan et.al.	2504.06256	null
2025-04-08	HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance	Jiazi Bu et.al.	2504.06232	null
2025-04-08	A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model	Jihun Park et.al.	2504.06144	null
2025-04-08	CamContextI2V: Context-aware Controllable Video Generation	Luis Denninger et.al.	2504.06022	link
2025-04-08	An Empirical Study of GPT-4o Image Generation Capabilities	Sixiang Chen et.al.	2504.05979	link
2025-04-08	Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking	Junxi Chen et.al.	2504.05838	link
2025-04-08	Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models	Jiahao Chen et.al.	2504.05815	null
2025-04-08	Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling	Jaskirat Singh et.al.	2504.05800	null
2025-04-07	Gaussian Mixture Flow Matching Models	Hansheng Chen et.al.	2504.05304	link
2025-04-07	One-Minute Video Generation with Test-Time Training	Karan Dalal et.al.	2504.05298	null
2025-04-07	Video-Bench: Human-Aligned Video Generation Benchmark	Hui Han et.al.	2504.04907	null
2025-04-07	Imagining the Far East: Exploring Perceived Biases in AI-Generated Images of East Asian Women	Xingyu Lan et.al.	2504.04865	null
2025-04-07	AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation	Xiongbo Lu et.al.	2504.04743	null
2025-04-08	Your Image Generator Is Your New Private Dataset	Nicolo Resmini et.al.	2504.04582	null
2025-04-06	Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification	Shijian Wang et.al.	2504.04510	null
2025-04-06	UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding	Yang Jiao et.al.	2504.04423	link
2025-04-05	SDEIT: Semantic-Driven Electrical Impedance Tomography	Dong Liu et.al.	2504.04185	null
2025-04-05	Learning about the Physical World through Analytic Concepts	Jianhua Sun et.al.	2504.04170	null
2025-04-04	MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models	Wulin Xie et.al.	2504.03641	null
2025-04-04	Dynamic Importance in Diffusion U-Net for Enhanced Image Synthesis	Xi Wang et.al.	2504.03471	link
2025-04-04	QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning	Quanxing Xu et.al.	2504.03337	null
2025-04-04	Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models	Xuran Ma et.al.	2504.03140	link
2025-04-03	How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models	Pascal Chang et.al.	2504.03072	null
2025-04-03	VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning	Xianwei Zhuang et.al.	2504.02949	link
2025-04-03	Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments	Chenyu Zhang et.al.	2504.02918	null
2025-04-03	Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets	Chuning Zhu et.al.	2504.02792	null
2025-04-03	GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation	Zhiyuan Yan et.al.	2504.02782	link
2025-04-03	Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model	Shengjun Zhang et.al.	2504.02764	null
2025-04-03	RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models	ZhongLi Fang et.al.	2504.02640	null
2025-04-03	Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation	Jiwoo Chung et.al.	2504.02612	link
2025-04-04	Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation	Fa-Ting Hong et.al.	2504.02542	link
2025-04-03	ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer	Jiayi Gao et.al.	2504.02451	link
2025-04-03	SkyReels-A2: Compose Anything in Video Diffusion Transformers	Zhengcong Fei et.al.	2504.02436	link
2025-04-04	MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition	Takahiro Shirakawa et.al.	2504.02361	null
2025-04-03	OmniCam: Unified Multimodal Video Generation via Camera Control	Xiaoda Yang et.al.	2504.02312	null
2025-04-03	VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step	Hanyang Wang et.al.	2504.01956	null
2025-04-03	ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement	Runhui Huang et.al.	2504.01934	null
2025-04-02	FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs	Mothilal Asokan et.al.	2504.01916	link
2025-04-02	Instance Migration Diffusion for Nuclear Instance Segmentation in Pathology	Lirui Qi et.al.	2504.01577	null
2025-04-02	High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model	Yiyang Shen et.al.	2504.01512	null
2025-04-01	Prompting Forgetting: Unlearning in GANs via Textual Guidance	Piyush Nagasubramaniam et.al.	2504.01218	null
2025-04-01	Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models	Guy Kaplan et.al.	2504.01137	link
2025-04-01	ShieldGemma 2: Robust and Tractable Image Content Moderation	Wenjun Zeng et.al.	2504.01081	null
2025-04-01	AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction	Junhao Cheng et.al.	2504.01014	link
2025-04-01	MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization	Siyuan Li et.al.	2504.00999	link
2025-03-31	RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy	Zhonghan Zhao et.al.	2503.24388	null
2025-03-31	Consistent Subject Generation via Contrastive Instantiated Concepts	Lee Hsin-Ying et.al.	2503.24387	null
2025-03-31	Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation	Shengqiong Wu et.al.	2503.24379	null
2025-03-31	Style Quantization for Data-Efficient GAN Training	Jian Wang et.al.	2503.24282	null
2025-03-31	FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics	Yixuan Li et.al.	2503.24267	null
2025-03-31	Threats and Opportunities in AI-generated Images for Armed Forces	Raphael Meier et.al.	2503.24095	null
2025-04-01	HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation	Boyuan Wang et.al.	2503.24026	null
2025-03-31	JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation	Fangda Chen et.al.	2503.23951	null
2025-03-31	AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents	Jiaxiang Chen et.al.	2503.23948	link
2025-04-01	On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices	Bosung Kim et.al.	2503.23796	link
2025-03-28	Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure	Frank J. Brooks et.al.	2503.22658	null
2025-03-28	Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model	Jangho Park et.al.	2503.22622	null
2025-03-28	EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation	Hadrien Reynaud et.al.	2503.22357	null
2025-03-28	Meta-LoRA: Meta-Learning LoRA Components for Domain-Aware ID Personalization	Barış Batuhan Topal et.al.	2503.22352	null
2025-03-28	CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving	Yishen Ji et.al.	2503.22231	null
2025-03-28	Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces	Wonhyeok Choi et.al.	2503.22209	null
2025-03-28	ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation	Yunhong Min et.al.	2503.22194	null
2025-03-28	Sell It Before You Make It: Revolutionizing E-Commerce with Personalized AI-Generated Items	Jianghao Lin et.al.	2503.22182	null
2025-03-28	An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval	Min Cao et.al.	2503.22171	link
2025-03-28	Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis	Woojung Han et.al.	2503.22168	null
2025-03-27	VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models	Chi-Pin Huang et.al.	2503.21781	null
2025-03-27	Optimal Stepsize for Diffusion Sampling	Jianning Pei et.al.	2503.21774	link
2025-03-27	Exploring the Evolution of Physics Cognition in Video Generation: A Survey	Minghui Lin et.al.	2503.21765	link
2025-03-27	Lumina-Image 2.0: A Unified and Efficient Image Generative Framework	Qi Qin et.al.	2503.21758	link
2025-03-27	A Unified Framework for Diffusion Bridge Problems: Flow Matching and Schrödinger Matching into One	Minyoung Kim et.al.	2503.21756	null
2025-03-27	VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness	Dian Zheng et.al.	2503.21755	link
2025-03-27	CTRL-O: Language-Controllable Object-Centric Visual Representation Learning	Aniket Didolkar et.al.	2503.21747	null
2025-03-27	3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models	Yuhan Zhang et.al.	2503.21745	null
2025-03-27	Audio-driven Gesture Generation via Deviation Feature in the Latent Space	Jiahui Chen et.al.	2503.21616	null
2025-03-27	Zero-Shot Visual Concept Blending Without Text Guidance	Hiroya Makino et.al.	2503.21277	link
2025-03-26	High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching	Guoqiang Zhang et.al.	2503.20744	null
2025-03-26	RecTable: Fast Modeling Tabular Data with Rectified Flow	Masane Fuchi et.al.	2503.20731	link
2025-03-26	BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation	Yuyang Peng et.al.	2503.20672	null
2025-03-26	AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports	Xiangwen Zhang et.al.	2503.20654	null
2025-03-26	MMGen: Unified Multi-modal Image Generation and Understanding in One Go	Jiepeng Wang et.al.	2503.20644	null
2025-03-26	GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving	Lloyd Russell et.al.	2503.20523	null
2025-03-26	VPO: Aligning Text-to-Video Generation Models with Prompt Optimization	Jiale Cheng et.al.	2503.20491	link
2025-03-26	Wan: Open and Advanced Large-Scale Video Generative Models	WanTeam et.al.	2503.20314	link
2025-03-26	Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models	Prin Phunyaphibarn et.al.	2503.20240	null
2025-03-26	Video Motion Graphs	Haiyang Liu et.al.	2503.20218	null
2025-03-25	FullDiT: Multi-Task Video Generative Foundation Model with Full Attention	Xuan Ju et.al.	2503.19907	null
2025-03-25	Scaling Down Text Encoders of Text-to-Image Diffusion Models	Lifu Wang et.al.	2503.19897	link
2025-03-25	Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation	Tianhao Qi et.al.	2503.19881	null
2025-03-25	AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers	Jiazhi Guan et.al.	2503.19824	null
2025-03-25	SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation	Jingdan Kang et.al.	2503.19791	link
2025-03-25	Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models	Kartik Thakral et.al.	2503.19783	null
2025-03-25	PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models	Junhyuk So et.al.	2503.19731	null
2025-03-25	VectorFit: Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models	Suhas G Hegde et.al.	2503.19530	null
2025-03-25	Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage	Zhengwentai Sun et.al.	2503.19486	null
2025-03-25	AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset	Haiyu Zhang et.al.	2503.19462	null
2025-03-25	Aether: Geometric-Aware Unified World Modeling	Aether Team et.al.	2503.18945	null
2025-03-24	Video-T1: Test-Time Scaling for Video Generation	Fangfu Liu et.al.	2503.18942	null
2025-03-24	Training-free Diffusion Acceleration with Bottleneck Sampling	Ye Tian et.al.	2503.18940	null
2025-03-24	SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection	Shrikant Malviya et.al.	2503.18812	link
2025-03-24	Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation	Qin Wang et.al.	2503.18753	null
2025-03-24	Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings	Cong Liu et.al.	2503.18719	null
2025-03-25	AMD-Hummingbird: Towards an Efficient Text-to-Video Model	Takashi Isobe et.al.	2503.18559	link
2025-03-24	Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models	Bin Li et.al.	2503.18556	null
2025-03-24	EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation	Qiang Qu et.al.	2503.18552	null
2025-03-24	Can Text-to-Video Generation help Video-Language Alignment?	Luca Zanella et.al.	2503.18507	null
2025-03-21	Position: Interactive Generative Video as Next-Generation Game Engine	Jiwen Yu et.al.	2503.17359	null
2025-03-21	Leveraging Text-to-Image Generation for Handling Spurious Correlation	Aryan Yazdan Parast et.al.	2503.17226	null
2025-03-21	D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens	Panpan Wang et.al.	2503.17155	null
2025-03-21	Halton Scheduler For Masked Generative Image Transformer	Victor Besnier et.al.	2503.17076	link
2025-03-21	Zero-Shot Styled Text Image Generation, but Make It Autoregressive	Vittorio Pippi et.al.	2503.17074	null
2025-03-21	AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process	Junjie Hu et.al.	2503.17029	null
2025-03-21	Enabling Versatile Controls for Video Diffusion Models	Xu Zhang et.al.	2503.16983	link
2025-03-21	Multiple Ultrasound Image Generation based on Tuned Alignment of Amplitude Hologram over Spatially non-Uniform Ultrasound Source	Keisuke Hasegawa et.al.	2503.16949	null
2025-03-21	Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model	Yingying Fan et.al.	2503.16942	null
2025-03-21	When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO	Lingfan Zhang et.al.	2503.16921	null
2025-03-20	XAttention: Block Sparse Attention with Antidiagonal Scoring	Ruyi Xu et.al.	2503.16428	link
2025-03-20	Tokenize Image as a Set	Zigang Geng et.al.	2503.16425	link
2025-03-20	MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance	Quanhao Li et.al.	2503.16421	null
2025-03-20	SynCity: Training-Free Generation of 3D Worlds	Paul Engstler et.al.	2503.16420	null
2025-03-20	InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity	Liming Jiang et.al.	2503.16418	link
2025-03-20	VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness	SeungJu Cha et.al.	2503.16406	link
2025-03-20	ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos	Haolin Yang et.al.	2503.16400	null
2025-03-20	LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images	Leyang Wang et.al.	2503.16376	null
2025-03-20	Ultra-Resolution Adaptation with Ease	Ruonan Yu et.al.	2503.16322	link
2025-03-20	Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction	Ziyao Guo et.al.	2503.16194	null
2025-03-19	Di$\mathtt{[M]}$O: Distilling Masked Diffusion Models into One-step Generator	Yuanzhi Zhu et.al.	2503.15457	null
2025-03-19	Temporal Regularization Makes Your Video Generator Stronger	Harold Haodong Chen et.al.	2503.15417	null
2025-03-19	Visual Persona: Foundation Model for Full-Body Human Customization	Jisu Nam et.al.	2503.15406	null
2025-03-19	TruthLens: A Training-Free Paradigm for DeepFake Detection	Ritabrata Chakraborty et.al.	2503.15342	null
2025-03-19	TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models	Teng-Fang Hsiao et.al.	2503.15283	null
2025-03-19	LEGION: Learning to Ground and Explain for Synthetic Image Detection	Hengrui Kang et.al.	2503.15264	null
2025-03-19	Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization	Feifei Li et.al.	2503.15197	null
2025-03-20	VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention	Mingzhe Zheng et.al.	2503.15138	null
2025-03-20	Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis	Imanol G. Estepa et.al.	2503.15060	null
2025-03-19	FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis	Yaofei Duan et.al.	2503.14906	null
2025-03-18	MusicInfuser: Making Video Diffusion Listen and Dance	Susung Hong et.al.	2503.14505	null
2025-03-18	Deeply Supervised Flow-Based Generative Models	Inkyu Shin et.al.	2503.14494	null
2025-03-18	DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers	Minglei Shi et.al.	2503.14487	null
2025-03-18	ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing	Yulin Pan et.al.	2503.14482	null
2025-03-18	MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation	Hongyu Zhang et.al.	2503.14428	null
2025-03-18	Impossible Videos	Zechen Bai et.al.	2503.14378	null
2025-03-18	RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment	Chao Wang et.al.	2503.14358	null
2025-03-18	LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models	Yu Cheng et.al.	2503.14325	link
2025-03-18	Free-Lunch Color-Texture Disentanglement for Stylized Image Generation	Jiang Qin et.al.	2503.14275	null
2025-03-18	Concat-ID: Towards Universal Identity-Preserving Video Synthesis	Yong Zhong et.al.	2503.14151	null
2025-03-17	Unified Autoregressive Visual Generation and Understanding with Continuous Tokens	Lijie Fan et.al.	2503.13436	null
2025-03-17	BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing	Yaowei Li et.al.	2503.13434	null
2025-03-17	MAME: Multidimensional Adaptive Metamer Exploration with Human Perceptual Feedback	Mina Kamao et.al.	2503.13212	null
2025-03-17	Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation	Yihong Luo et.al.	2503.13070	null
2025-03-17	Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction	Zheyuan Liu et.al.	2503.12953	null
2025-03-17	DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode	Junjia Huang et.al.	2503.12838	null
2025-03-17	AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations	Quang Trung Truong et.al.	2503.12828	null
2025-03-17	GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching	Feng Qiao et.al.	2503.12720	link
2025-03-16	UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing	Tsu-Jui Fu et.al.	2503.12652	null
2025-03-16	Personalize Anything for Free with Diffusion Transformer	Haoran Feng et.al.	2503.12590	null
2025-03-14	ReCamMaster: Camera-Controlled Generative Rendering from A Single Video	Jianhong Bai et.al.	2503.11647	null
2025-03-14	HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models	Ziqin Zhou et.al.	2503.11513	null
2025-03-14	T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation	Seyed Mohammad Hadi Hosseini et.al.	2503.11481	null
2025-03-14	TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation	Hongxiang Zhao et.al.	2503.11423	null
2025-03-14	Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking	Ziyi Wang et.al.	2503.11324	null
2025-03-14	Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model	Haoyang Huang et.al.	2503.11251	link
2025-03-14	Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards	Zijing Hu et.al.	2503.11240	link
2025-03-14	Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption	Du Chen et.al.	2503.11221	null
2025-03-14	Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation	Fengchen He et.al.	2503.11213	link
2025-03-14	Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models	Shree Singhi et.al.	2503.11195	null
2025-03-13	GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing	Rongyao Fang et.al.	2503.10639	link
2025-03-13	DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation	Chen Chen et.al.	2503.10618	null
2025-03-13	CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models	Hao He et.al.	2503.10592	null
2025-03-13	Long Context Tuning for Video Generation	Yuwei Guo et.al.	2503.10589	null
2025-03-13	Autoregressive Image Generation with Randomized Parallel Decoding	Haopeng Li et.al.	2503.10568	link
2025-03-13	RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models	Yijing Lin et.al.	2503.10406	null
2025-03-13	CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance	Yufan Deng et.al.	2503.10391	null
2025-03-13	ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation	Zirun Guo et.al.	2503.10358	null
2025-03-13	Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark	Viktor Moskvoretskii et.al.	2503.10357	null
2025-03-13	MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment	Hao Zhou et.al.	2503.10287	null
2025-03-12	PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop	Chenyu Li et.al.	2503.09595	link
2025-03-12	FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model	Jiahao Xia et.al.	2503.09560	null
2025-03-12	PromptMap: An Alternative Interaction Style for AI-Based Image Generation	Krzysztof Adamkiewicz et.al.	2503.09436	link
2025-03-12	LHC Triggers using FPGA Image Recognition	James Brooke et.al.	2503.09428	null
2025-03-12	Unified Dense Prediction of Video Diffusion	Lehan Yang et.al.	2503.09344	null
2025-03-12	Revealing the Implicit Noise-based Imprint of Generative Models	Xinghan Li et.al.	2503.09314	null
2025-03-12	Revealing Unintentional Information Leakage in Low-Dimensional Facial Portrait Representations	Kathleen Anderson et.al.	2503.09306	link
2025-03-12	UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer	Haoxuan Wang et.al.	2503.09277	null
2025-03-12	NAMI: Efficient Image Generation via Progressive Rectified Flow Transformers	Yuhang Ma et.al.	2503.09242	null
2025-03-12	Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets	Hannah Kniesel et.al.	2503.09221	null
2025-03-11	GarmentCrafter: Progressive Novel View Synthesis for Single-View 3D Garment Reconstruction and Editing	Yuanhao Wang et.al.	2503.08678	null
2025-03-11	REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder	Yitian Zhang et.al.	2503.08665	null
2025-03-11	Generating Robot Constitutions & Benchmarks for Semantic Safety	Pierre Sermanet et.al.	2503.08663	null
2025-03-11	LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization	Xianfeng Wu et.al.	2503.08619	link
2025-03-11	Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling	Subin Kim et.al.	2503.08605	null
2025-03-11	Generalizable AI-Generated Image Detection Based on Fractal Self-Similarity in the Spectrum	Shengpeng Xiao et.al.	2503.08484	null
2025-03-12	Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens	Qingsong Xie et.al.	2503.08377	null
2025-03-11	Robust Latent Matters: Boosting Image Generation with Sampling Error	Kai Qiu et.al.	2503.08354	link
2025-03-12	$^R$FLAV: Rolling Flow matching for infinite Audio Video generation	Alex Ergasti et.al.	2503.08307	link
2025-03-11	OminiControl2: Efficient Conditioning for Diffusion Transformers	Zhenxiong Tan et.al.	2503.08280	link
2025-03-10	V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation	Guiwei Zhang et.al.	2503.07493	link
2025-03-10	GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models	Ryugo Morita et.al.	2503.07463	null
2025-03-10	AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion	Mingzhen Sun et.al.	2503.07418	null
2025-03-10	TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models	Ruidong Chen et.al.	2503.07389	link
2025-03-10	Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment	Xing Xie et.al.	2503.07334	link
2025-03-10	Automated Movie Generation via Multi-Agent CoT Planning	Weijia Wu et.al.	2503.07314	link
2025-03-10	WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation	Yuwei Niu et.al.	2503.07265	link
2025-03-10	Effective and Efficient Masked Image Generation Models	Zebin You et.al.	2503.07197	link
2025-03-10	NFIG: Autoregressive Image Generation with Next-Frequency Prediction	Zhihao Huang et.al.	2503.07076	null
2025-03-10	TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation	Victor Shea-Jay Huang et.al.	2503.07050	null
2025-03-07	Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models	Zheng Li et.al.	2503.05595	link
2025-03-07	Frequency Autoregressive Image Generation with Continuous Tokens	Hu Yu et.al.	2503.05305	null
2025-03-07	MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio	Xuenan Xu et.al.	2503.05242	link
2025-03-07	Unified Reward Model for Multimodal Understanding and Generation	Yibin Wang et.al.	2503.05236	null
2025-03-07	RecipeGen: A Benchmark for Real-World Recipe Image Generation	Ruoxuan Zhang et.al.	2503.05228	null
2025-03-07	Development and Enhancement of Text-to-Image Diffusion Models	Rajdeep Roshan Sahu et.al.	2503.05149	null
2025-03-06	Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation	Alexey Buzovkin et.al.	2503.04871	link
2025-03-06	FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video	Yue Gao et.al.	2503.04720	null
2025-03-06	What Are You Doing? A Closer Look at Controllable Human Video Generation	Emanuele Bugliarello et.al.	2503.04666	null
2025-03-08	The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation	Aoxiong Yin et.al.	2503.04606	link
2025-03-06	S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting	Yecong Wan et.al.	2503.04314	null
2025-03-06	Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models	Rui Jiang et.al.	2503.04215	null
2025-03-06	Underlying Semantic Diffusion for Effective and Efficient In-Context Learning	Zhong Ji et.al.	2503.04050	null
2025-03-06	DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation	Amin Karimi et.al.	2503.04006	null
2025-03-05	GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control	Xuanchi Ren et.al.	2503.03751	link
2025-03-05	Rethinking Video Tokenization: A Conditioned Diffusion-based Approach	Nianzu Yang et.al.	2503.03708	link
2025-03-05	DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	Zhao Yang et.al.	2503.03689	link
2025-03-05	A Generative Approach to High Fidelity 3D Reconstruction from Text Data	Venkat Kumar R et.al.	2503.03664	null
2025-03-05	High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights	Yuna Kato et.al.	2503.03558	link
2025-03-05	Video Super-Resolution: All You Need is a Video Diffusion Model	Zhihao Zhan et.al.	2503.03355	null
2025-03-05	GenColor: Generative Color-Concept Association in Visual Design	Yihan Hou et.al.	2503.03236	null
2025-03-05	An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models	Binxu Wang et.al.	2503.03206	null
2025-03-05	Find Matching Faces Based On Face Parameters	Setu A. Bhatt et.al.	2503.03204	null
2025-03-05	From Architectural Sketch to Conceptual Representation: Using Structure-Aware Diffusion Model to Generate Renderings of School Buildings	Zhengyang Wang et.al.	2503.03090	null
2025-03-04	ARINAR: Bi-Level Autoregressive Feature-by-Feature Generative Models	Qinyu Zhao et.al.	2503.02883	link
2025-03-04	Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts	Marta Skreta et.al.	2503.02819	link
2025-03-04	Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution	Ru Ito et.al.	2503.02767	null
2025-03-04	Generative Modeling of Microweather Wind Velocities for Urban Air Mobility	Tristan A. Shah et.al.	2503.02690	link
2025-03-04	SPG: Improving Motion Diffusion by Smooth Perturbation Guidance	Boseong Jeon et.al.	2503.02577	null
2025-03-04	PVTree: Realistic and Controllable Palm Vein Generation for Recognition Tasks	Sheng Shang et.al.	2503.02547	null
2025-03-04	RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification	Zhen Yang et.al.	2503.02537	null
2025-03-04	Q&C: When Quantization Meets Cache in Efficient Image Generation	Xin Ding et.al.	2503.02508	null
2025-03-04	Teaching Metric Distance to Autoregressive Multimodal Foundational Models	Jiwan Chung et.al.	2503.02379	null
2025-03-04	GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning	Zhun Mou et.al.	2503.02341	null
2025-02-28	How far can we go with ImageNet for Text-to-Image generation?	L. Degeorge et.al.	2502.21318	null
2025-02-28	Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos	Zhiyu Tan et.al.	2502.21314	null
2025-03-03	MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing	Xueyun Tian et.al.	2502.21291	link
2025-02-28	A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images	Zineb Sordo et.al.	2502.21151	null
2025-02-28	Training-free and Adaptive Sparse Attention for Efficient Long Video Generation	Yifei Xia et.al.	2502.21079	null
2025-02-28	Synthesizing Individualized Aging Brains in Health and Disease with Generative Models and Parallel Transport	Jingru Fu et.al.	2502.21049	link
2025-02-28	DiffBrush: Just Painting the Art by Your Hands	Jiaming Chu et.al.	2502.20904	null
2025-02-28	HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models	Xiao Wang et.al.	2502.20811	null
2025-02-28	WorldModelBench: Judging Video Generation Models As World Models	Dacheng Li et.al.	2502.20694	null
2025-02-28	Diffusion Restoration Adapter for Real-World Image Restoration	Hanbang Liang et.al.	2502.20679	null
2025-02-27	FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction	Siyu Jiao et.al.	2502.20313	link
2025-02-27	Mobius: Text to Seamless Looping Video Generation via Latent Shift	Xiuli Bi et.al.	2502.20307	link
2025-02-27	Attention Distillation: A Unified Approach to Visual Characteristics Transfer	Yang Zhou et.al.	2502.20235	link
2025-02-27	Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think	Liang Chen et.al.	2502.20172	link
2025-02-27	FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute	Sotiris Anagnostidis et.al.	2502.20126	null
2025-02-27	New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration	Xuzheng Yang et.al.	2502.20104	null
2025-02-27	C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation	Yuhao Li et.al.	2502.19868	link
2025-02-27	Analyzing CLIP’s Performance Limitations in Multi-Object Scenarios: A Controlled High-Resolution Study	Reza Abbasi et.al.	2502.19828	null
2025-02-27	MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery	Lianping Yang et.al.	2502.19797	null
2025-02-27	The erasure of intensive livestock farming in text-to-image generative AI	Kehan Sheng et.al.	2502.19771	link
2025-02-26	Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making	Soobin Park et.al.	2502.18853	null
2025-02-26	Optimal Stochastic Trace Estimation in Generative Modeling	Xinyang Liu et.al.	2502.18808	null
2025-02-26	AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools	Nathalie Riche et.al.	2502.18736	null
2025-02-25	Investigating Youth AI Auditing	Jaemarie Solyst et.al.	2502.18576	null
2025-02-25	ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation	Yifan Pu et.al.	2502.18364	null
2025-02-25	LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation	Pengzhi Li et.al.	2502.18302	null
2025-02-25	Training Consistency Models with Variational Noise Coupling	Gianluigi Silvestri et.al.	2502.18197	link
2025-02-25	SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference	Jintao Zhang et.al.	2502.18137	link
2025-02-26	Bayesian Optimization for Controlled Image Editing via LLMs	Chengkun Cai et.al.	2502.18116	null
2025-02-25	Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models	Jia Yu et.al.	2502.17951	link
2025-02-25	A Survey: Spatiotemporal Consistency in Video Generation	Zhiyu Yin et.al.	2502.17863	null
2025-02-25	FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks	Tanawan Premsri et.al.	2502.17775	link
2025-02-25	Fractal Generative Models	Tianhong Li et.al.	2502.17437	link
2025-02-24	X-Dancer: Expressive Music to Human Dance Video Generation	Zeyuan Chen et.al.	2502.17414	null
2025-02-24	RELICT: A Replica Detection Framework for Medical Image Generation	Orhun Utku Aydin et.al.	2502.17360	link
2025-02-24	VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing	Xiangpeng Yang et.al.	2502.17258	null
2025-02-24	DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks	Canyu Zhao et.al.	2502.17157	link
2025-02-24	Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions	Zhong Li et.al.	2502.17119	link
2025-02-24	Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence	Wenzhe Yin et.al.	2502.17028	null
2025-02-24	Autoregressive Image Generation Guided by Chains of Thought	Miaomiao Cai et.al.	2502.16965	null
2025-02-24	Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment	Suchae Jeong et.al.	2502.16902	null
2025-02-24	A Survey of fMRI to Image Reconstruction	Weiyu Guo et.al.	2502.16861	null
2025-02-21	One-step Diffusion Models with $f$-Divergence Distribution Matching	Yilun Xu et.al.	2502.15681	null
2025-02-21	VaViM and VaVAM: Autonomous Driving through Video Generative Modeling	Florent Bartoccioni et.al.	2502.15672	link
2025-02-21	Soybean pod and seed counting in both outdoor fields and indoor laboratories using unions of deep neural networks	Tianyou Jiang et.al.	2502.15286	null
2025-02-21	Unsettling the Hegemony of Intention: Agonistic Image Generation	Andre Ye et.al.	2502.15242	null
2025-02-21	FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation	Young Beom Woo et.al.	2502.15203	null
2025-02-21	Methods and Trends in Detecting Generated Images: A Comprehensive Review	Arpan Mahara et.al.	2502.15176	null
2025-02-20	Hardware-Friendly Static Quantization Method for Video Diffusion Transformers	Sanghyun Yi et.al.	2502.15077	null
2025-02-20	Generative Modeling of Individual Behavior at Scale	Nabil Omi et.al.	2502.14998	null
2025-02-20	LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection	Qingyuan Liu et.al.	2502.14994	null
2025-02-20	Improving the Diffusability of Autoencoders	Ivan Skorokhodov et.al.	2502.14831	null
2025-02-20	DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models	Hongji Yang et.al.	2502.14779	null
2025-02-20	AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers	Wen-Fan Wang et.al.	2502.14747	null
2025-02-20	RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers	Ke Cao et.al.	2502.14377	null
2025-02-20	Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation	Jiayu Yang et.al.	2502.14247	link
2025-02-20	Designing Parameter and Compute Efficient Diffusion Transformers using Distillation	Vignesh Sundaresha et.al.	2502.14226	null
2025-02-19	d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining	Prasun Roy et.al.	2502.14007	link
2025-02-19	FlexTok: Resampling Images into 1D Token Sequences of Flexible Length	Roman Bachmann et.al.	2502.13967	null
2025-02-19	IP-Composer: Semantic Composition of Visual Concepts	Sara Dorfman et.al.	2502.13951	null
2025-02-19	MagicGeo: Training-Free Text-Guided Geometric Diagram Generation	Junxiao Wang et.al.	2502.13855	null
2025-02-19	Flow-based generative models as iterative algorithms in probability space	Yao Xie et.al.	2502.13394	null
2025-02-18	Breaking the bonds of generative artificial intelligence by minimizing the maximum entropy	Mattia Miotto et.al.	2502.13287	null
2025-02-18	Personalized Image Generation with Deep Generative Models: A Decade Survey	Yuxiang Wei et.al.	2502.13081	link
2025-02-19	LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation	Junchen Fu et.al.	2502.12945	null
2025-02-18	Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options	Lakshmi Nair et.al.	2502.12929	link
2025-02-18	VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation	Xinlong Chen et.al.	2502.12782	link
2025-02-18	3D Shape-to-Image Brownian Bridge Diffusion for Brain MRI Synthesis from Cortical Surfaces	Fabian Bongratz et.al.	2502.12742	null
2025-02-18	MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation	Sihyun Yu et.al.	2502.12632	null
2025-02-18	CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation	Minghao Fu et.al.	2502.12579	link
2025-02-18	DeltaDiff: A Residual-Guided Diffusion Model for Enhanced Image Super-Resolution	Chao Yang et.al.	2502.12567	null
2025-02-17	LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities	Florian Sestak et.al.	2502.12128	link
2025-02-17	A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond	Shreya Shukla et.al.	2502.12048	null
2025-02-17	Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images	Negar Kamali et.al.	2502.11989	link
2025-02-17	GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs	Yi Fang et.al.	2502.11925	null
2025-02-17	DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation	Zhihang Yuan et.al.	2502.11897	link
2025-02-17	Object-Centric Image to Video Generation with Language Guidance	Angel Villar-Corrales et.al.	2502.11655	null
2025-02-17	Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation	Taeyoung Yun et.al.	2502.11477	link
2025-02-16	MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation	Michael Fuest et.al.	2502.11234	null
2025-02-16	Phantom: Subject-consistent video generation via cross-modal alignment	Lijie Liu et.al.	2502.11079	null
2025-02-15	Hybrid Deepfake Image Detection: A Comprehensive Dataset-Driven Approach Integrating Convolutional and Attention Mechanisms with Frequency Domain Features	Kafi Anan et.al.	2502.10682	null
2025-02-14	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	Guoqing Ma et.al.	2502.10248	link
2025-02-14	RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control	Teng Li et.al.	2502.10059	null
2025-02-14	ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation	Yuxin He et.al.	2502.10028	null
2025-02-13	CellFlow: Simulating Cellular Morphology Changes via Flow Matching	Yuhui Zhang et.al.	2502.09775	null
2025-02-13	Designing a Conditional Prior Distribution for Flow-Based Generative Models	Noam Issachar et.al.	2502.09611	null
2025-02-13	Redistribute Ensemble Training for Mitigating Memorization in Diffusion Models	Xiaoliu Guan et.al.	2502.09434	link
2025-02-13	ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation	Rotem Shalev-Arkushin et.al.	2502.09411	null
2025-02-13	When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models	Samuel Joseph Amouyal et.al.	2502.09307	null
2025-02-14	GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation	Hongyin Zhang et.al.	2502.09268	null
2025-02-13	Sequential Covariance Fitting for InSAR Phase Linking	Dana El Hajjar et.al.	2502.09248	null
2025-02-13	Dynamic watermarks in images generated by diffusion models	Yunzhuo Chen et.al.	2502.08927	null
2025-02-13	Detecting Malicious Concepts Without Image Generation in AIGC	Kun Xu et.al.	2502.08921	null
2025-02-12	HistoSmith: Single-Stage Histology Image-Label Generation via Conditional Latent Diffusion for Enhanced Cell Segmentation and Classification	Valentina Vadori et.al.	2502.08754	link
2025-02-12	CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation	Qinghe Wang et.al.	2502.08639	null
2025-02-12	Enhancing Diffusion Models Efficiency by Disentangling Total-Variance and Signal-to-Noise Ratio	Khaled Kahouli et.al.	2502.08598	link
2025-02-12	Ultrasound Image Generation using Latent Diffusion Models	Benoit Freiche et.al.	2502.08580	null
2025-02-12	BCDDM: Branch-Corrected Denoising Diffusion Model for Black Hole Image Generation	Ao Liu et.al.	2502.08528	null
2025-02-12	FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis	Wonjoon Jin et.al.	2502.08244	null
2025-02-12	Learning Human Skill Generators at Key-Step Levels	Yilu Wu et.al.	2502.08234	null
2025-02-12	AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance	Zhao Wang et.al.	2502.08189	null
2025-02-12	PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation	Ziyan Wang et.al.	2502.08106	null
2025-02-12	ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation	Qianrui Teng et.al.	2502.08097	null
2025-02-11	Training-Free Safe Denoisers for Safe Use of Diffusion Models	Mingyu Kim et.al.	2502.08011	null
2025-02-11	Direct Ascent Synthesis: Revealing Hidden Generative Capabilities in Discriminative Models	Stanislav Fort et.al.	2502.07753	null
2025-02-11	CausalGeD: Blending Causality and Diffusion for Spatial Gene Expression Generation	Rabeya Tus Sadia et.al.	2502.07751	null
2025-02-11	Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling	Shuhuai Ren et.al.	2502.07737	null
2025-02-11	Magic 1-For-1: Generating One Minute Video Clips within One Minute	Hongwei Yi et.al.	2502.07701	link
2025-02-11	SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches	Haichuan Lin et.al.	2502.07556	link
2025-02-11	VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation	Sixiao Zheng et.al.	2502.07531	null
2025-02-11	Enhance-A-Video: Better Generated Video for Free	Yang Luo et.al.	2502.07508	link
2025-02-11	RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation	Viacheslav Vasilev et.al.	2502.07455	link
2025-02-11	Optimizing Knowledge Distillation in Transformers: Enabling Multi-Head Attention without Alignment Barriers	Zhaodong Bing et.al.	2502.07436	null
2025-02-11	Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos	Haowen Gao et.al.	2502.07327	null
2025-02-10	Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT	Dongyang Liu et.al.	2502.06782	null
2025-02-10	History-Guided Video Diffusion	Kiwhan Song et.al.	2502.06764	null
2025-02-10	Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists	Bojia Zi et.al.	2502.06734	null
2025-02-10	TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models	Yangguang Li et.al.	2502.06608	link
2025-02-10	A Large-scale AI-generated Image Inpainting Benchmark	Paschalis Giakoumoglou et.al.	2502.06593	null
2025-02-10	CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers	D. She et.al.	2502.06527	null
2025-02-10	Universal Approximation of Visual Autoregressive Transformers	Yifang Chen et.al.	2502.06167	null
2025-02-10	Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile	Hangliang Ding et.al.	2502.06155	null
2025-02-10	Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models	Ce Zhang et.al.	2502.06130	link
2025-02-09	Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization	Jiajun Fan et.al.	2502.06061	null
2025-02-07	FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation	Shilong Zhang et.al.	2502.05179	link
2025-02-07	QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation	Yue Zhao et.al.	2502.05178	null
2025-02-07	Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment	Minh-Quan Le et.al.	2502.05153	null
2025-02-07	C2GM: Cascading Conditional Generation of Multi-scale Maps from Remote Sensing Images Constrained by Geographic Features	Chenxing Sun et.al.	2502.04991	null
2025-02-07	Cached Multi-Lora Composition for Multi-Concept Image Generation	Xiandong Zou et.al.	2502.04923	link
2025-02-07	Goku: Flow Based Video Generative Foundation Models	Shoufa Chen et.al.	2502.04896	null
2025-02-07	HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation	Qijun Gan et.al.	2502.04847	null
2025-02-07	G2PDiffusion: Genotype-to-Phenotype Prediction with Diffusion Models	Mengdi Liu et.al.	2502.04684	null
2025-02-06	Fast Video Generation with Sliding Tile Attention	Peiyuan Zhang et.al.	2502.04507	null
2025-02-06	Augmented Conditioning Is Enough For Effective Training Image Generation	Jiahui Chen et.al.	2502.04475	null
2025-02-06	HOG-Diff: Higher-Order Guided Diffusion for Graph Generation	Yiming Huang et.al.	2502.04308	link
2025-02-06	MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation	Jinbo Xing et.al.	2502.04299	null
2025-02-06	Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression	Lirui Wang et.al.	2502.04296	null
2025-02-06	Realistic Image-to-Image Machine Unlearning via Decoupling and Knowledge Retention	Ayush K. Varshney et.al.	2502.04260	null
2025-02-06	Multi-fidelity emulator for large-scale 21 cm lightcone images: a few-shot transfer learning approach with generative adversarial network	Kangning Diao et.al.	2502.04246	null
2025-02-06	Generative Adversarial Networks Bridging Art and Machine Intelligence	Junhao Song et.al.	2502.04116	null
2025-02-06	Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency	Shangkun Sun et.al.	2502.04076	link
2025-02-06	UniForm: A Unified Diffusion Transformer for Audio-Video Generation	Lei Zhao et.al.	2502.03897	null
2025-02-06	FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing	Jinya Sakurai et.al.	2502.03826	null
2025-02-06	DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models	Lingshun Kong et.al.	2502.03810	null
2025-02-05	On Fairness of Unified Multimodal Large Language Model for Image Generation	Ming Liu et.al.	2502.03429	null
2025-02-05	TruePose: Human-Parsing-guided Attention Diffusion for Full-ID Preserving Pose Transfer	Zhihong Xu et.al.	2502.03426	null
2025-02-05	Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation	Alexey A. Novikov et.al.	2502.03420	null
2025-02-05	MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent	Xinyao Liao et.al.	2502.03207	null
2025-02-05	Poisson Flow Joint Model for Multiphase contrast-enhanced CT	Rongjun Ge et.al.	2502.03079	null
2025-02-05	A Survey of Sample-Efficient Deep Learning for Change Detection in Remote Sensing: Tasks, Strategies, and Challenges	Lei Ding et.al.	2502.02835	null
2025-02-04	When are Diffusion Priors Helpful in Sparse Reconstruction? A Study with Sparse-view CT	Matt Y. Cheung et.al.	2502.02771	null
2025-02-04	Controllable Video Generation with Provable Disentanglement	Yifan Shen et.al.	2502.02690	null
2025-02-04	VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models	Hila Chefer et.al.	2502.02492	null
2025-02-04	On the Guidance of Flow Matching	Ruiqi Feng et.al.	2502.02150	link
2025-02-04	IPO: Iterative Preference Optimization for Text-to-Video Generation	Xiaomeng Yang et.al.	2502.02088	null
2025-02-03	VILP: Imitation Learning with Latent Video Planning	Zhengtong Xu et.al.	2502.01784	link
2025-02-03	Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity	Haocheng Xi et.al.	2502.01776	null
2025-02-03	MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation	Haibo Tong et.al.	2502.01719	null
2025-02-03	MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation	Yiren Song et.al.	2502.01572	null
2025-02-03	Improved Training Technique for Latent Consistency Models	Quan Dao et.al.	2502.01441	link
2025-02-03	Assessing the use of Diffusion models for motion artifact correction in brain MRI	Paolo Angella et.al.	2502.01418	null
2025-02-04	Compressed Image Generation with Denoising Diffusion Codebook Models	Guy Ohayon et.al.	2502.01189	null
2025-01-31	Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search	Yuta Oshima et.al.	2501.19252	null
2025-01-31	Ambient Denoising Diffusion Generative Adversarial Networks for Establishing Stochastic Object Models from Noisy Image Data	Xichen Xu et.al.	2501.19094	null
2025-01-31	Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations	Dahye Kim et.al.	2501.19066	link
2025-01-31	BCAT: A Block Causal Transformer for PDE Foundation Models for Fluid Dynamics	Yuxuan Liu et.al.	2501.18972	null
2025-01-31	Distorting Embedding Space for Safety: A Defense Mechanism for Adversarially Robust Diffusion Models	Jaesin Ahn et.al.	2501.18877	link
2025-01-31	REG: Rectified Gradient Guidance for Conditional Diffusion Models	Zhengqi Gao et.al.	2501.18865	null
2025-01-30	Every Image Listens, Every Image Dances: Music-Driven Image Animation	Zhikang Dong et.al.	2501.18801	null
2025-01-30	High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2	Nandakishor M et.al.	2501.18670	null
2025-01-30	Diffusion Autoencoders are Scalable Image Tokenizers	Yinbo Chen et.al.	2501.18593	null
2025-01-30	SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer	Enze Xie et.al.	2501.18427	link
2025-01-30	Simulation of microstructures and machine learning	Katja Schladitz et.al.	2501.18313	null
2025-01-30	LLMs can see and hear without any training	Kumar Ashutosh et.al.	2501.18096	link
2025-01-29	Generative AI for Vision: A Comprehensive Study of Frameworks and Applications	Fouad Bousetouane et.al.	2501.18033	null
2025-01-29	Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling	Xiaokang Chen et.al.	2501.17811	link
2025-01-29	A Framework for Generating Realistic Synthetic Tabular Data in a Randomized Controlled Trial Setting	Niki Z. Petrakos et.al.	2501.17719	null
2025-01-29	Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment	Zixue Zeng et.al.	2501.17690	link
2025-01-28	Text-to-Image Generation for Vocabulary Learning Using the Keyword Method	Nuwan T. Attygalle et.al.	2501.17099	null
2025-01-28	DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation	Chenguo Lin et.al.	2501.16764	null
2025-01-29	Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion	Shengyuan Liu et.al.	2501.16679	link
2025-01-28	Variational Schrödinger Momentum Diffusion	Kevin Rojas et.al.	2501.16675	null
2025-01-28	CascadeV: An Implementation of Wurstchen Architecture for Video Generation	Wenfeng Lin et.al.	2501.16612	link
2025-01-27	LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation	Farzad Farhadzadeh et.al.	2501.16559	null
2025-01-27	RelightVid: Temporal-Consistent Diffusion Model for Video Relighting	Ye Fang et.al.	2501.16330	null
2025-01-28	Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation	Adil Kaan Akan et.al.	2501.15878	null
2025-01-27	Autonomous Horizon-based Asteroid Navigation With Observability-constrained Maneuvers	Aditya Arjun Anibha et.al.	2501.15806	null
2025-01-27	Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models?	Yunbo Lyu et.al.	2501.15775	null
2025-01-26	Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting	Yuxin Zhang et.al.	2501.15641	link
2025-01-26	Comparative clinical evaluation of “memory-efficient” synthetic 3d generative adversarial networks (gan) head-to-head to state of art: results on computed tomography of the chest	Mahshid Shiri et.al.	2501.15572	null
2025-01-26	“See What I Imagine, Imagine What I See”: Human-AI Co-Creation System for 360$^\circ$ Panoramic Video Generation in VR	Yunge Wen et.al.	2501.15456	null
2025-01-26	SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity	Zichen Fan et.al.	2501.15448	null
2025-01-26	StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces	Kyeongmin Yeo et.al.	2501.15445	null
2025-01-25	Enhancing Intent Understanding for Ambiguous Prompts through Human-Machine Co-Adaptation	Yangfan He et.al.	2501.15167	null
2025-01-24	Towards Scalable Topological Regularizers	Hiu-Tung Wong et.al.	2501.14641	null
2025-01-24	Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.*	Ludovica Schaerf et.al.	2501.14524	null
2025-01-24	PAID: A Framework of Product-Centric Advertising Image Design	Hongyu Chen et.al.	2501.14316	null
2025-01-24	VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking	Runyi Hu et.al.	2501.14195	link
2025-01-23	Can We Generate Images with CoT? Let’s Verify and Reinforce Image Generation Step by Step	Ziyu Guo et.al.	2501.13926	link
2025-01-23	IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models	Jiayi Lei et.al.	2501.13920	null
2025-01-23	Improving Video Generation with Human Feedback	Jie Liu et.al.	2501.13918	null
2025-01-23	Generating Realistic Forehead-Creases for User Verification via Conditioned Piecewise Polynomial Curves	Abhishek Tandon et.al.	2501.13889	link
2025-01-23	A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation	Dario Serez et.al.	2501.13718	null
2025-01-24	One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt	Tao Liu et.al.	2501.13554	link
2025-01-23	EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion	Jiangchuan Wei et.al.	2501.13452	null
2025-01-23	MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize	Haohang Xu et.al.	2501.13349	null
2025-01-23	Accelerate High-Quality Diffusion Models with Inner Loop Feedback	Matthew Gwilliam et.al.	2501.13107	null
2025-01-22	Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation	Akshay Krishnan et.al.	2501.13087	null
2025-01-22	LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation	Jiahao Wang et.al.	2501.12976	null
2025-01-22	PreciseCam: Precise Camera Control for Text-to-Image Generation	Edurne Bernal-Berdun et.al.	2501.12910	null
2025-01-22	T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation	Lijun Li et.al.	2501.12612	link
2025-01-22	GPS as a Control Signal for Image Generation	Chao Feng et.al.	2501.12390	null
2025-01-21	Taming Teacher Forcing for Masked Autoregressive Video Generation	Deyu Zhou et.al.	2501.12389	null
2025-01-21	Parallel Sequence Modeling via Generalized Spatial Propagation Network	Hongjun Wang et.al.	2501.12381	null
2025-01-22	Video Depth Anything: Consistent Depth Estimation for Super-Long Videos	Sili Chen et.al.	2501.12375	null
2025-01-21	Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists	Thomas F. Eisenmann et.al.	2501.12374	link
2025-01-21	ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions	Shiyue Zhang et.al.	2501.12173	link
2025-01-20	Are generative models fair? A study of racial bias in dermatological image generation	Miguel López-Pérez et.al.	2501.11752	null
2025-01-20	GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video	Zhenliang Ni et.al.	2501.11340	null
2025-01-20	CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation	Zheng Chong et.al.	2501.11325	link
2025-01-20	Nested Annealed Training Scheme for Generative Adversarial Networks	Chang Wan et.al.	2501.11318	null
2025-01-17	DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency	Xiaohui Li et.al.	2501.10110	null
2025-01-17	DiffuEraser: A Diffusion Model for Video Inpainting	Xiaowen Li et.al.	2501.10018	link
2025-01-17	RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation	Yuefan Cao et.al.	2501.09982	null
2025-01-17	Physics-informed DeepCT: Sinogram Wavelet Decomposition Meets Masked Diffusion	Zekun Zhou et.al.	2501.09935	link
2025-01-17	IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment	Shangkun Sun et.al.	2501.09927	null
2025-01-16	PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery	Shristi Das Biswas et.al.	2501.09826	link
2025-01-16	VideoWorld: Exploring Knowledge Learning from Unlabeled Videos	Zhongwei Ren et.al.	2501.09781	null
2025-01-16	Learnings from Scaling Visual Tokenizers for Reconstruction and Generation	Philippe Hansen-Estruch et.al.	2501.09755	null
2025-01-16	Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps	Nanye Ma et.al.	2501.09732	null
2025-01-16	AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation	Junjie He et.al.	2501.09503	link
2025-01-16	Dynamic Neural Style Transfer for Artistic Image Generation using VGG19	Kapil Kashyap et.al.	2501.09420	null
2025-01-16	SVIA: A Street View Image Anonymization Framework for Self-Driving Applications	Dongyu Liu et.al.	2501.09393	link
2025-01-16	Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse	Guangyuan Liu et.al.	2501.09391	null
2025-01-15	Grounding Text-To-Image Diffusion Models For Controlled High-Quality Image Generation	Ahmad Süleyman et.al.	2501.09194	null
2025-01-15	Generative diffusion model with inverse renormalization group flows	Kanta Masuki et.al.	2501.09064	link
2025-01-15	Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion	Jingyuan Chen et.al.	2501.09019	null
2025-01-15	How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias	Tosin Fadahunsi et.al.	2501.09014	link
2025-01-15	Multimodal LLMs Can Reason about Aesthetics in Zero-Shot	Ruixiang Jiang et.al.	2501.09012	link
2025-01-15	RepVideo: Rethinking Cross-Layer Representation for Video Generation	Chenyang Si et.al.	2501.08994	null
2025-01-15	Enhanced Multi-Scale Cross-Attention for Person Image Generation	Hao Tang et.al.	2501.08900	null
2025-01-15	StereoGen: High-quality Stereo Image Generation from a Single Image	Xianqi Wang et.al.	2501.08654	null
2025-01-15	Joint Learning of Depth and Appearance for Portrait Image Animation	Xinya Ji et.al.	2501.08649	null
2025-01-15	Watermarking in Diffusion Model: Gaussian Shading with Exact Diffusion Inversion via Coupled Transformations (EDICT)	Krishna Panthi et.al.	2501.08604	null
2025-01-15	Comprehensive Subjective and Objective Evaluation Method for Text-generated Video	Zelu Qi et.al.	2501.08545	null
2025-01-15	Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers	Zhongwang Zhang et.al.	2501.08537	link
2025-01-14	GameFactory: Creating New Games with Generative Interactive Videos	Jiwen Yu et.al.	2501.08325	null
2025-01-14	Diffusion Adversarial Post-Training for One-Step Video Generation	Shanchuan Lin et.al.	2501.08316	null
2025-01-14	LayerAnimate: Layer-specific Control for Animation	Yuxue Yang et.al.	2501.08295	null
2025-01-14	FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors	Yabo Zhang et.al.	2501.08225	link
2025-01-14	D$^2$-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models	Qian Zeng et.al.	2501.08180	link
2025-01-14	Benchmarking Multimodal Models for Fine-Grained Image Analysis: A Comparative Study Across Diverse Visual Features	Evgenii Evstafev et.al.	2501.08170	null
2025-01-13	Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens	Dongwon Kim et.al.	2501.07730	null
2025-01-13	BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations	Weixi Feng et.al.	2501.07647	null
2025-01-13	Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss	Xinyu Zhang et.al.	2501.07563	null
2025-01-13	Boosting Text-To-Image Generation via Multilingual Prompting in Large Multimodal Models	Yongyu Mu et.al.	2501.07086	link
2025-01-13	Enhancing Image Generation Fidelity via Progressive Prompts	Zhen Xiong et.al.	2501.07070	link
2025-01-13	Detection of AI Deepfake and Fraud in Online Payments Using GAN-Based Models	Zong Ke et.al.	2501.07033	null
2025-01-12	Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models	Michael Toker et.al.	2501.06751	null
2025-01-11	Denoising Diffusion Probabilistic Model for Radio Map Estimation in Generative Wireless Networks	Xuanhao Luo et.al.	2501.06604	null
2025-01-11	DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy	Wenshu Fan et.al.	2501.06533	link
2025-01-11	Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation	Xiaoying Xing et.al.	2501.06481	null
2025-01-11	Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning	Maomao Li et.al.	2501.06438	null
2025-01-10	MEt3R: Measuring Multi-View Consistency in Generated Images	Mohammad Asim et.al.	2501.06336	null
2025-01-10	Multi-subject Open-set Personalization in Video Generation	Tsai-Shien Chen et.al.	2501.06187	null
2025-01-10	VideoAuteur: Towards Long Narrative Video Generation	Junfei Xiao et.al.	2501.06173	null
2025-01-10	Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models	Sofia Jamil et.al.	2501.05839	link
2025-01-10	EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model	Yi He et.al.	2501.05710	null
2025-01-09	Consistent Flow Distillation for Text-to-3D Generation	Runjie Yan et.al.	2501.05445	null
2025-01-09	Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces	Aniruddha Mahapatra et.al.	2501.05442	null
2025-01-09	Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation	Xuyi Meng et.al.	2501.05427	null
2025-01-09	Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation	Darius Petermann et.al.	2501.05413	null
2025-01-09	CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models	Junha Park et.al.	2501.05359	null
2025-01-09	Patch-GAN Transfer Learning with Reconstructive Models for Cloud Removal	Wanli Ma et.al.	2501.05265	null
2025-01-09	3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering	Dewei Zhou et.al.	2501.05131	null
2025-01-08	EditAR: Unified Conditional Generation with Autoregressive Models	Jiteng Mu et.al.	2501.04699	null
2025-01-08	ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning	Yuzhou Huang et.al.	2501.04698	null
2025-01-08	On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis	Yekun Ke et.al.	2501.04377	null
2025-01-08	Circuit Complexity Bounds for Visual Autoregressive Model	Yekun Ke et.al.	2501.04299	null
2025-01-08	LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition	Bowen Hao et.al.	2501.04204	null
2025-01-07	HistoryPalette: Supporting Exploration and Reuse of Past Alternatives in Image Generation and Editing	Karim Benharrak et.al.	2501.04163	null
2025-01-07	Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers	Yuechen Zhang et.al.	2501.03931	link
2025-01-07	Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control	Zekai Gu et.al.	2501.03847	link
2025-01-07	Motion-Aware Generative Frame Interpolation	Guozhen Zhang et.al.	2501.03699	null
2025-01-08	Evaluating Image Caption via Cycle-consistent Text-to-Image Generation	Tianyu Cui et.al.	2501.03567	null
2025-01-07	PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models	Lingzhi Yuan et.al.	2501.03544	null
2025-01-07	Textualize Visual Prompt for Image Editing via Diffusion Bridge	Pengcheng Xu et.al.	2501.03495	null
2025-01-07	SceneBooth: Diffusion-based Framework for Subject-preserved Text-to-Image Generation	Shang Chai et.al.	2501.03490	null
2025-01-06	License Plate Images Generation with Diffusion Models	Mariia Shpir et.al.	2501.03374	null
2025-01-06	Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation	Guy Yariv et.al.	2501.03059	null
2025-01-06	TransPixar: Advancing Text-to-Video Generation with Transparency	Luozhou Wang et.al.	2501.03006	link
2025-01-06	Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising	Yunlong Yuan et.al.	2501.02741	null
2025-01-06	Artificial Intelligence in Creative Industries: Advances Prior to 2025	Nantheera Anantrasirichai et.al.	2501.02725	null
2025-01-05	GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking	Weikang Bian et.al.	2501.02690	null
2025-01-05	Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation	Dawei Dai et.al.	2501.02523	link
2025-01-05	ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling	Chaojie Mao et.al.	2501.02487	null
2025-01-05	MedSegDiffNCA: Diffusion Models With Neural Cellular Automata for Skin Lesion Segmentation	Avni Mittal et.al.	2501.02447	null
2025-01-04	Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey	Zongxia Li et.al.	2501.02189	link
2025-01-04	Generating Multimodal Images with GAN: Integrating Text, Image, and Style	Chaoyi Tan et.al.	2501.02167	null
2025-01-03	JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing	Qili Wang et.al.	2501.01798	link
2025-01-03	Controlling your Attributes in Voice	Xuyuan Li et.al.	2501.01674	null
2025-01-02	VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control	Yuanpeng Tu et.al.	2501.01427	null
2025-01-03	Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions	Xincheng Shuai et.al.	2501.01425	null
2025-01-02	Object-level Visual Prompts for Compositional Image Generation	Gaurav Parmar et.al.	2501.01424	null
2025-01-02	On Unifying Video Generation and Camera Pose Estimation	Chun-Hao Paul Huang et.al.	2501.01409	null
2025-01-02	ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer	Xuyin Qi et.al.	2501.01392	link
2025-01-02	Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement	Z. Zhang et.al.	2501.01368	null
2025-01-02	LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge	Kyoungkook Kang et.al.	2501.01197	null
2025-01-02	HarmonyIQA: Pioneering Benchmark and Model for Image Harmonization Quality Assessment	Zitong Xu et.al.	2501.01116	null
2025-01-02	EliGen: Entity-Level Controlled Image Generation with Regional Attention	Hong Zhang et.al.	2501.01097	link
2025-01-01	OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes	Sepehr Dehdashtian et.al.	2501.00962	null
2025-01-02	Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation	Yuanbo Yang et.al.	2412.21117	null
2024-12-30	Quantum Diffusion Model for Quark and Gluon Jet Generation	Mariia Baidachna et.al.	2412.21082	link
2024-12-30	Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model	Yifei Huang et.al.	2412.21080	link
2024-12-30	Varformer: Adapting VAR’s Generative Prior for Image Restoration	Siyang Wang et.al.	2412.21063	link
2024-12-30	VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation	Jiazheng Xu et.al.	2412.21059	link
2024-12-30	ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation	Ting Zhang et.al.	2412.20901	null
2024-12-30	VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control	Shaojin Wu et.al.	2412.20800	link
2024-12-30	Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling	Min Zhang et.al.	2412.20725	null
2024-12-30	HFI: A unified framework for training-free detection and implicit watermarking of latent diffusion model generated images	Sungik Choi et.al.	2412.20704	null
2024-12-30	Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis	Yousef Yeganeh et.al.	2412.20651	null
2024-12-27	Generative Video Propagation	Shaoteng Liu et.al.	2412.19761	null
2024-12-27	VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models	Tao Wu et.al.	2412.19645	null
2024-12-27	P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision	Junjie Hu et.al.	2412.19533	null
2024-12-27	DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT	Xiaotao Hu et.al.	2412.19505	link
2024-12-27	Focusing Image Generation to Mitigate Spurious Correlations	Xuewei Li et.al.	2412.19457	null
2024-12-25	UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation	Lunhao Duan et.al.	2412.18928	null
2024-12-25	Accelerating Diffusion Transformers with Dual Feature Caching	Chang Zou et.al.	2412.18911	link
2024-12-25	DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering	Ruohong Yang et.al.	2412.18838	null
2024-12-25	DebiasDiff: Debiasing Text-to-image Diffusion Models with Self-discovering Latent Attribute Directions	Yilei Jiang et.al.	2412.18810	null
2024-12-25	Protective Perturbations against Unauthorized Data Usage in Diffusion-based Image Generation	Sen Peng et.al.	2412.18791	null
2024-12-24	DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers	Yuntao Chen et.al.	2412.18607	null
2024-12-24	ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation	Hongjie Li et.al.	2412.18600	null
2024-12-24	DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation	Minghong Cai et.al.	2412.18597	link
2024-12-24	Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models	Qice Qin et.al.	2412.18421	null
2024-12-24	Extract Free Dense Misalignment from CLIP	JeongYeon Nam et.al.	2412.18404	link
2024-12-24	TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization	Yucong Luo et.al.	2412.18185	null
2024-12-24	EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation	Shuhao Han et.al.	2412.18150	link
2024-12-24	Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction	Xiao Guo et.al.	2412.18149	null
2024-12-24	Ensuring Consistency for In-Image Translation	Chengpeng Fu et.al.	2412.18139	null
2024-12-23	Large Motion Video Autoencoding with Cross-modal Video VAE	Yazhou Xing et.al.	2412.17805	null
2024-12-23	VidTwin: Video VAE with Decoupled Structure and Dynamics	Yuchi Wang et.al.	2412.17726	link
2024-12-23	Personalized Large Vision-Language Models	Chau Pham et.al.	2412.17610	null
2024-12-23	FFA Sora, video generation as fundus fluorescein angiography simulator	Xinyuan Wu et.al.	2412.17346	null
2024-12-23	Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory	Xingyao Li et.al.	2412.17254	null
2024-12-23	Discriminative Image Generation with Diffusion Models for Zero-Shot Learning	Dingjie Fu et.al.	2412.17219	null
2024-12-22	Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching	Enshu Liu et.al.	2412.17153	link
2024-12-22	Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images	Dennis Menn et.al.	2412.17109	null
2024-12-22	DreamOmni: Unified Image Generation and Editing	Bin Xia et.al.	2412.17098	null
2024-12-22	SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults	Jinzhi Wang et.al.	2412.17077	null
2024-12-20	Personalized Representation from Personalized Generation	Shobhita Sundaram et.al.	2412.16156	link
2024-12-20	NeRF-To-Real Tester: Neural Radiance Fields as Test Image Generators for Vision of Autonomous Systems	Laura Weihl et.al.	2412.16141	null
2024-12-20	CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up	Songhua Liu et.al.	2412.16112	link
2024-12-20	SafeCFG: Redirecting Harmful Classifier-Free Guidance for Safe Generation	Jiadong Pan et.al.	2412.16039	null
2024-12-20	Semi-Supervised Adaptation of Diffusion Models for Handwritten Text Generation	Kai Brandenbusch et.al.	2412.15853	null
2024-12-20	DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization	Zihan Ding et.al.	2412.15689	null
2024-12-20	PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium	Xinzhe Li et.al.	2412.15674	link
2024-12-20	BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images with Conditional Latent Diffusion Models	Yifei Sun et.al.	2412.15670	link
2024-12-20	CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training	Xiuli Bi et.al.	2412.15646	link
2024-12-20	Stylish and Functional: Guided Interpolation Subject to Physical Constraints	Yan-Ying Chen et.al.	2412.15507	null
2024-12-19	Flowing from Words to Pixels: A Framework for Cross-Modality Evolution	Qihao Liu et.al.	2412.15213	null
2024-12-19	FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching	Sucheng Ren et.al.	2412.15205	link
2024-12-19	AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation	Moayed Haji-Ali et.al.	2412.15191	null
2024-12-19	LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation	Weijia Shi et.al.	2412.15188	null
2024-12-19	Tiled Diffusion	Or Madar et.al.	2412.15185	null
2024-12-19	Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM	Yatai Ji et.al.	2412.15156	link
2024-12-19	Parallelized Autoregressive Visual Generation	Yuqing Wang et.al.	2412.15119	null
2024-12-19	DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space	Mang Ning et.al.	2412.15032	link
2024-12-19	Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations	Yucheng Hu et.al.	2412.14803	null
2024-12-19	Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models	Keith G. Mills et.al.	2412.14628	null
2024-12-18	E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling	Zhihang Yuan et.al.	2412.14170	null
2024-12-18	Autoregressive Video Generation without Vector Quantization	Haoge Deng et.al.	2412.14169	link
2024-12-18	FashionComposer: Compositional Fashion Image Generation	Sihui Ji et.al.	2412.14168	null
2024-12-18	VideoDPO: Omni-Preference Alignment for Video Diffusion Generation	Runtao Liu et.al.	2412.14167	null
2024-12-18	AKiRa: Augmentation Kit on Rays for optical video generation	Xi Wang et.al.	2412.14158	null
2024-12-18	SurgSora: Decoupled RGBD-Flow Diffusion Model for Controllable Surgical Video Generation	Tong Chen et.al.	2412.14018	null
2024-12-18	Text2Relight: Creative Portrait Relighting with Text Guidance	Junuk Cha et.al.	2412.13734	null
2024-12-18	Diffusion models and stochastic quantisation in lattice field theory	Gert Aarts et.al.	2412.13704	null
2024-12-18	MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing	Chuang Yang et.al.	2412.13684	null
2024-12-18	Self-control: A Better Conditional Mechanism for Masked Autoregressive Model	Qiaoying Qu et.al.	2412.13635	null
2024-12-17	MotionBridge: Dynamic Video Inbetweening with Flexible Controls	Maham Tanveer et.al.	2412.13190	null
2024-12-17	F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration	Lu Liu et.al.	2412.13155	null
2024-12-17	Prompt Augmentation for Self-supervised Text-guided Image Manipulation	Rumeysa Bodur et.al.	2412.13081	null
2024-12-17	VidTok: A Versatile and Open-Source Video Tokenizer	Anni Tang et.al.	2412.13061	link
2024-12-17	3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation	Haoshen Wang et.al.	2412.13059	null
2024-12-17	Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression	Ruijie Chen et.al.	2412.12982	null
2024-12-17	Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance	Wenhao Sun et.al.	2412.12974	link
2024-12-17	Unsupervised Region-Based Image Editing of Denoising Diffusion Models	Zixiang Li et.al.	2412.12912	null
2024-12-17	ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction	Zhongjie Duan et.al.	2412.12888	link
2024-12-17	Rethinking Diffusion-Based Image Generators for Fundus Fluorescein Angiography Synthesis on Limited Data	Chengzhou Yu et.al.	2412.12778	null
2024-12-16	Causal Diffusion Transformers for Generative Modeling	Chaorui Deng et.al.	2412.12095	link
2024-12-16	A LoRA is Worth a Thousand Pictures	Chenxi Liu et.al.	2412.12048	null
2024-12-16	InterDyn: Controllable Interactive Dynamics with Video Diffusion Models	Rick Akkerman et.al.	2412.11785	null
2024-12-16	Generative Inbetweening through Frame-wise Conditions-Driven Video Generation	Tianyi Zhu et.al.	2412.11755	link
2024-12-16	IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation	Yiren Song et.al.	2412.11638	null
2024-12-16	VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting	Muhammet Furkan Ilaslan et.al.	2412.11621	link
2024-12-16	3D$^2$-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling	Zichen Tang et.al.	2412.11599	link
2024-12-16	LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model	Xi Wang et.al.	2412.11519	null
2024-12-16	FedCAR: Cross-client Adaptive Re-weighting for Generative Models in Federated Learning	Minjun Kim et.al.	2412.11463	link
2024-12-16	Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models	Namhyuk Ahn et.al.	2412.11423	null
2024-12-13	OP-LoRA: The Blessing of Dimensionality	Piotr Teterwak et.al.	2412.10362	null
2024-12-13	TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation	Xingrui Wang et.al.	2412.10275	null
2024-12-13	Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark	Yudong Jiang et.al.	2412.10255	link
2024-12-13	Simple Guidance Mechanisms for Discrete Diffusion Models	Yair Schiff et.al.	2412.10193	link
2024-12-13	Financial Fine-tuning a Large Time Series Model	Xinghong Fu et.al.	2412.09880	link
2024-12-13	LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity	Hongjie Wang et.al.	2412.09856	null
2024-12-13	MSC: Multi-Scale Spatio-Temporal Causal Attention for Autoregressive Video Diffusion	Xunnong Xu et.al.	2412.09828	null
2024-12-12	Human vs. AI: A Novel Benchmark and a Comparative Study on the Detection of Generated Images and the Impact of Prompts	Philipp Moeßner et.al.	2412.09715	link
2024-12-12	Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation	Chun-Mei Feng et.al.	2412.09706	link
2024-12-12	Doe-1: Closed-Loop Autonomous Driving with Large World Model	Wenzhao Zheng et.al.	2412.09627	link
2024-12-12	OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation	Weiqi Li et.al.	2412.09623	null
2024-12-12	LoRACLR: Contrastive Adaptation for Customization of Diffusion Models	Enis Simsar et.al.	2412.09622	null
2024-12-12	EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Zhuofan Zong et.al.	2412.09618	null
2024-12-12	FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers	Yusuf Dalva et.al.	2412.09611	null
2024-12-12	Spectral Image Tokenizer	Carlos Esteves et.al.	2412.09607	null
2024-12-12	Owl-1: Omni World Model for Consistent Long Video Generation	Yuanhui Huang et.al.	2412.09600	link
2024-12-12	LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors	Yabo Chen et.al.	2412.09597	null
2024-12-12	Video Creation by Demonstration	Yihong Sun et.al.	2412.09551	null
2024-12-12	UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer	Delong Liu et.al.	2412.09389	link
2024-12-11	Fast Prompt Alignment for Text-to-Image Generation	Khalil Mrini et.al.	2412.08639	link
2024-12-11	Multimodal Latent Language Modeling with Next-Token Diffusion	Yutao Sun et.al.	2412.08635	link
2024-12-11	LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations	Zejian Li et.al.	2412.08580	link
2024-12-11	Learning Flow Fields in Attention for Controllable Person Image Generation	Zijian Zhou et.al.	2412.08486	link
2024-12-11	InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models	Min Hou et.al.	2412.08480	link
2024-12-11	CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis	Mu Zhang et.al.	2412.08464	null
2024-12-11	Physical Informed Driving World Model	Zhuoran Yang et.al.	2412.08410	null
2024-12-11	FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks	Chongkai Gao et.al.	2412.08261	null
2024-12-11	VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation	Zhiqiang Yuan et.al.	2412.08259	null
2024-12-11	Analyzing and Improving Model Collapse in Rectified Flow Models	Huminhao Zhu et.al.	2412.08175	null
2024-12-10	UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics	Xi Chen et.al.	2412.07774	null
2024-12-10	From Slow Bidirectional to Fast Causal Video Generators	Tianwei Yin et.al.	2412.07772	null
2024-12-10	SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints	Jianhong Bai et.al.	2412.07760	link
2024-12-10	3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation	Xiao Fu et.al.	2412.07759	null
2024-12-10	Multi-Shot Character Consistency for Text-to-Video Generation	Yuval Atzmon et.al.	2412.07750	null
2024-12-10	StyleMaster: Stylize Your Video with Artistic Generation and Translation	Zixuan Ye et.al.	2412.07744	null
2024-12-10	STIV: Scalable Text and Image Conditioned Video Generation	Zongyu Lin et.al.	2412.07730	null
2024-12-10	ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer	Jinyi Hu et.al.	2412.07720	link
2024-12-10	FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models	Tong Wu et.al.	2412.07674	null
2024-12-10	DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation	Jianzong Wu et.al.	2412.07589	null
2024-12-09	Visual Lexicon: Rich Image Features in Language Space	XuDong Wang et.al.	2412.06774	null
2024-12-09	Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty	Meera Hahn et.al.	2412.06771	link
2024-12-09	ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet	Andrei-Robert Alexandrescu et.al.	2412.06742	null
2024-12-09	EMOv2: Pushing 5M Vision Model Frontier	Jiangning Zhang et.al.	2412.06674	link
2024-12-09	ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance	Chunwei Wang et.al.	2412.06673	null
2024-12-09	Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion	Shuaiting Li et.al.	2412.06661	null
2024-12-09	Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment	Kim Sung-Bin et.al.	2412.06209	link
2024-12-09	ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance	Yuming Li et.al.	2412.06163	null
2024-12-09	Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters	Yuan Wang et.al.	2412.06143	link
2024-12-08	GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis	Ashish Goswami et.al.	2412.06089	null
2024-12-06	Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model	Lening Wang et.al.	2412.05280	link
2024-12-06	Mind the Time: Temporally-Controlled Multi-Event Video Generation	Ziyi Wu et.al.	2412.05263	null
2024-12-06	LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation	Donald Shenaj et.al.	2412.05148	link
2024-12-06	The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation	Ruoyu Wang et.al.	2412.05101	null
2024-12-06	Noise Matters: Diffusion Model-based Urban Mobility Generation with Collaborative Noise Priors	Yuheng Zhang et.al.	2412.05000	null
2024-12-06	Continuous Video Process: Modeling Videos as Continuous Multi-Dimensional Processes for Video Prediction	Gaurav Shrivastava et.al.	2412.04929	null
2024-12-06	UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving	Rui Chen et.al.	2412.04842	link
2024-12-05	Hidden in the Noise: Two-Stage Robust Watermarking for Images	Kasra Arabi et.al.	2412.04653	link
2024-12-05	One Communication Round is All It Needs for Federated Fine-Tuning Foundation Models	Ziyao Wang et.al.	2412.04650	null
2024-12-05	Using Diffusion Priors for Video Amodal Segmentation	Kaihua Chen et.al.	2412.04623	null
2024-12-05	PaintScene4D: Consistent 4D Scene Generation from Text Prompts	Vinayak Gupta et.al.	2412.04471	null
2024-12-05	LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors	Yusuf Dalva et.al.	2412.04460	null
2024-12-05	MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation	Longtao Zheng et.al.	2412.04448	null
2024-12-05	DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models	Yizhuo Li et.al.	2412.04446	null
2024-12-05	GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration	Kaiyi Huang et.al.	2412.04440	null
2024-12-05	Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation	Yuying Ge et.al.	2412.04432	link
2024-12-05	The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation	Fredrik Carlsson et.al.	2412.04318	null
2024-12-05	T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts	Ziwei Huang et.al.	2412.04300	null
2024-12-05	Structure-Aware Stylized Image Synthesis for Robust Medical Image Segmentation	Jie Bao et.al.	2412.04296	link
2024-12-05	Instructional Video Generation	Yayuan Li et.al.	2412.04189	null
2024-12-04	Navigation World Models	Amir Bar et.al.	2412.03572	null
2024-12-04	MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation	Zehuan Huang et.al.	2412.03558	null
2024-12-04	Imagine360: Immersive 360 Video Generation from Perspective Anchor	Jing Tan et.al.	2412.03552	null
2024-12-04	Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention	Hannan Lu et.al.	2412.03520	null
2024-12-04	Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective	Neta Shaul et.al.	2412.03487	null
2024-12-04	SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model	Yan Li et.al.	2412.03430	null
2024-12-04	Skel3D: Skeleton Guided Novel View Synthesis	Aron Fóthi et.al.	2412.03407	null
2024-12-04	Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment	Feng He et.al.	2412.03400	null
2024-12-04	DIVE: Taming DINO for Subject-Driven Video Editing	Yi Huang et.al.	2412.03347	null
2024-12-04	DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation	Qingdong He et.al.	2412.03255	null
2024-12-03	Motion Prompting: Controlling Video Generation with Motion Trajectories	Daniel Geng et.al.	2412.02700	null
2024-12-03	Taming Scalable Visual Tokenizer for Autoregressive Image Generation	Fengyuan Shi et.al.	2412.02692	link
2024-12-03	FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation	Kefan Chen et.al.	2412.02690	null
2024-12-03	SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance	Viet Nguyen et.al.	2412.02687	null
2024-12-03	AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction	Lingteng Qiu et.al.	2412.02684	null
2024-12-03	Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback	Hiroki Furuta et.al.	2412.02617	null
2024-12-03	WEM-GAN: Wavelet transform based facial expression manipulation	Dongya Sun et.al.	2412.02530	null
2024-12-03	ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?	Leixin Zhang et.al.	2412.02368	link
2024-12-03	VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation	Mingzhe Zheng et.al.	2412.02259	link
2024-12-03	Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models	Jungwon Park et.al.	2412.02237	link
2024-11-29	JetFormer: An Autoregressive Generative Model of Raw Images and Text	Michael Tschannen et.al.	2411.19722	link
2024-11-29	Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing	Wenyi Mo et.al.	2411.19652	link
2024-11-29	QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain	Wenfang Sun et.al.	2411.19534	null
2024-11-29	Fleximo: Towards Flexible Text-to-Human Motion Video Generation	Yuhang Zhang et.al.	2411.19459	null
2024-11-29	Achromatic single-layer hologram	Zhi Li et.al.	2411.19445	null
2024-11-28	AMO Sampler: Enhancing Text Rendering with Overshooting	Xixi Hu et.al.	2411.19415	link
2024-11-28	DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models	Shwetha Ram et.al.	2411.19390	null
2024-11-28	Trajectory Attention for Fine-grained Video Motion Control	Zeqi Xiao et.al.	2411.19324	null
2024-11-28	Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention	Huiguo He et.al.	2411.19261	null
2024-11-28	SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation	Yuhan Pei et.al.	2411.19182	null
2024-11-27	Diffusion Self-Distillation for Zero-Shot Customized Image Generation	Shengqu Cai et.al.	2411.18616	null
2024-11-27	FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion	Haosen Yang et.al.	2411.18552	null
2024-11-27	Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models	Yiming Wu et.al.	2411.18375	null
2024-11-27	TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models	Riza Velioglu et.al.	2411.18350	link
2024-11-27	MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation	Haopeng Fang et.al.	2411.18281	null
2024-11-27	Prediction with Action: Visual Policy Learning via Joint Denoising Process	Yanjiang Guo et.al.	2411.18179	null
2024-11-27	Type-R: Automatically Retouching Typos for Text-to-Image Generation	Wataru Shimoda et.al.	2411.18159	null
2024-11-27	PersonaCraft: Personalized Full-Body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion	Gwanghyun Kim et.al.	2411.18068	null
2024-11-27	Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models	Shuyang Hao et.al.	2411.18000	null
2024-11-27	Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space	Lingxiao Li et.al.	2411.17784	null
2024-11-26	Accelerating Vision Diffusion Transformers with Skip Branches	Guanjie Chen et.al.	2411.17616	link
2024-11-26	IMPROVE: Improving Medical Plausibility without Reliance on Human Validation – An Enhanced Prototype-Guided Diffusion Framework	Anurag Shandilya et.al.	2411.17535	null
2024-11-26	Identity-Preserving Text-to-Video Generation by Frequency Decomposition	Shenghai Yuan et.al.	2411.17440	link
2024-11-26	Image Generation with Multimodule Semantic Feature-Aided Selection for Semantic Communications	Chengyang Liang et.al.	2411.17428	null
2024-11-26	Cross-modal Medical Image Generation Based on Pyramid Convolutional Attention Network	Fuyou Mao et.al.	2411.17420	null
2024-11-26	AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation	Ziyi Xu et.al.	2411.17383	null
2024-11-26	Reward Incremental Learning in Text-to-Image Generation	Maorong Wang et.al.	2411.17310	null
2024-11-26	From Graph Diffusion to Graph Classification	Jia Jun Cheng Xian et.al.	2411.17236	null
2024-11-26	AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM	Jiarui Wang et.al.	2411.17221	link
2024-11-26	cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis	Paul Friedrich et.al.	2411.17203	link
2024-11-25	Factorized Visual Tokenization and Generation	Zechen Bai et.al.	2411.16681	null
2024-11-25	DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation	Zun Wang et.al.	2411.16657	null
2024-11-25	Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric	Zhichao Zhang et.al.	2411.16619	null
2024-11-25	Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing	Kaifeng Gao et.al.	2411.16375	link
2024-11-25	CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain	Jingchao Peng et.al.	2411.16327	null
2024-11-25	Image Generation Diversity Issues and How to Tame Them	Mischa Dombrowski et.al.	2411.16171	link
2024-11-25	Text-to-Image Synthesis: A Decade Survey	Nonghai Zhang et.al.	2411.16164	null
2024-11-25	Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models	Donggeun Ko et.al.	2411.16079	null
2024-11-25	Label-Free Intraoperative Mean-Transition-Time Image Generation Using Statistical Gating and Deep Learning	Yan Shi et.al.	2411.16039	null
2024-11-24	PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs	Teng Zhou et.al.	2411.15867	link
2024-11-22	VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement	Daeun Lee et.al.	2411.15115	null
2024-11-22	Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion	Samarth N Ramesh et.al.	2411.15113	null
2024-11-22	OminiControl: Minimal and Universal Control for Diffusion Transformer	Zhenxiong Tan et.al.	2411.15098	link
2024-11-22	Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation	Lakshmikar R. Polamreddy et.al.	2411.15084	link
2024-11-22	HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads	Yu Xu et.al.	2411.15034	null
2024-11-22	Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation	Dingyuan Shi et.al.	2411.14871	null
2024-11-22	Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation	Jeongsol Kim et.al.	2411.14863	null
2024-11-22	Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering	Haoyuan Li et.al.	2411.14816	null
2024-11-22	High-Resolution Image Synthesis via Next-Token Prediction	Dengsheng Chen et.al.	2411.14808	null
2024-11-22	FairAdapter: Detecting AI-generated Images with Improved Fairness	Feng Ding et.al.	2411.14755	link
2024-11-21	StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart	Jian Shi et.al.	2411.14295	link
2024-11-21	ComfyGI: Automatic Improvement of Image Generation Workflows	Dominik Sobania et.al.	2411.14193	null
2024-11-21	TaQ-DiT: Time-aware Quantization for Diffusion Transformers	Xinyan Liu et.al.	2411.14172	null
2024-11-21	MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective	Hailang Huang et.al.	2411.14062	link
2024-11-21	Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction	Jordan Vice et.al.	2411.13982	null
2024-11-21	On the Fairness, Diversity and Reliability of Text-to-Image Generative Models	Jordan Vice et.al.	2411.13981	null
2024-11-21	Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion	Jinhong He et.al.	2411.13961	link
2024-11-21	iHQGAN: A Lightweight Invertible Hybrid Quantum-Classical Generative Adversarial Network for Unsupervised Image-to-Image Translation	Xue Yang et.al.	2411.13920	link
2024-11-21	Dealing with Synthetic Data Contamination in Online Continual Learning	Maorong Wang et.al.	2411.13852	link
2024-11-21	Detecting Human Artifacts from Text-to-Image Models	Kaihong Wang et.al.	2411.13842	link
2024-11-20	REDUCIO! Generating 1024 $\times$ 1024 Video within 16 Seconds using Extremely Compressed Motion Latents	Rui Tian et.al.	2411.13552	link
2024-11-20	VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models	Ziqi Huang et.al.	2411.13503	link
2024-11-20	From Prompt Engineering to Prompt Craft	Joseph Lindley et.al.	2411.13422	null
2024-11-20	RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation	Christoph Reinders et.al.	2411.13150	link
2024-11-20	CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models	Naen Xu et.al.	2411.13144	null
2024-11-19	From Text to Pose to Image: Improving Diffusion Model Control and Quality	Clément Bonnett et.al.	2411.12872	link
2024-11-19	Towards motion from video diffusion models	Paul Janson et.al.	2411.12831	null
2024-11-19	Stylecodes: Encoding Stylistic Information For Image Generation	Ciara Rowles et.al.	2411.12811	link
2024-11-19	Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting	Haoyu Zhao et.al.	2411.12789	null
2024-11-19	PoM: Efficient Image and Video Generation with the Polynomial Mixer	David Picard et.al.	2411.12663	link
2024-11-19	Constant Rate Schedule: Constant-Rate Distributional Change for Efficient Training and Sampling in Diffusion Models	Shuntaro Okada et.al.	2411.12188	null
2024-11-19	Enhancing Low Dose Computed Tomography Images Using Consistency Training Techniques	Mahmut S. Gokmen et.al.	2411.12181	null
2024-11-18	Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution	Brian B. Moser et.al.	2411.12072	link
2024-11-18	Medical Video Generation for Disease Progression Simulation	Xu Cao et.al.	2411.11943	null
2024-11-18	SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input	Zhen Lv et.al.	2411.11934	null
2024-11-18	Conceptwm: A Diffusion Model Watermark for Concept Protection	Liangqi Lei et.al.	2411.11688	null
2024-11-18	A Modular Open Source Framework for Genomic Variant Calling	Ankita Vaishnobi Bisoi et.al.	2411.11513	null
2024-11-19	SoK: On the Role and Future of AIGC Watermarking in the Era of Gen-AI	Kui Ren et.al.	2411.11478	null
2024-11-18	MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion	Dongseok Shim et.al.	2411.11475	null
2024-11-18	Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge	Qinglong Cao et.al.	2411.11343	null
2024-11-18	BeautyBank: Encoding Facial Makeup in Latent Space	Qianwen Lu et.al.	2411.11231	null
2024-11-17	Enhanced Anime Image Generation Using USE-CMHSA-GAN	J. Lu et.al.	2411.11179	null
2024-11-17	Time Step Generating: A Universal Synthesized Deepfake Image Detector	Ziyue Zeng et.al.	2411.11016	link
2024-11-17	SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration	Jintao Zhang et.al.	2411.10958	link
2024-11-16	ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models	Vipula Rawte et.al.	2411.10867	null
2024-11-15	M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation	Sucheng Ren et.al.	2411.10433	link
2024-11-15	Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding	Huming Qiu et.al.	2411.10329	null
2024-11-15	The Unreasonable Effectiveness of Guidance for Diffusion Models	Tim Kaiser et.al.	2411.10257	null
2024-11-15	Visual question answering based evaluation metrics for text-to-image generation	Mizuki Miyamoto et.al.	2411.10183	null
2024-11-15	CART: Compositional Auto-Regressive Transformer for Image Generation	Siddharth Roheda et.al.	2411.10180	null
2024-11-15	Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training	Myunsoo Kim et.al.	2411.09998	null
2024-11-15	Content-Aware Preserving Image Generation	Giang H. Le et.al.	2411.09871	null
2024-11-14	Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting	Yian Wang et.al.	2411.09823	null
2024-11-14	GAN-Based Architecture for Low-dose Computed Tomography Imaging Denoising	Yunuo Wang et.al.	2411.09512	null
2024-11-14	Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models	Chutian Meng et.al.	2411.09449	null
2024-11-14	Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance	Md Fahim Anjum et.al.	2411.09174	null
2024-11-14	VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation	Youpeng Wen et.al.	2411.09153	null
2024-11-13	A Survey on Vision Autoregressive Model	Kai Jiang et.al.	2411.08666	null
2024-11-13	Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models	Chengdong Dong et.al.	2411.08642	null
2024-11-13	I Can Embrace and Avoid Vagueness Myself: Supporting the Design Process by Balancing Vagueness through Text-to-Image Generative AI	Myungjin Kim et.al.	2411.08588	null
2024-11-13	EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation	Xiaofeng Wang et.al.	2411.08380	null
2024-11-13	Physics Informed Distillation for Diffusion Models	Joshua Tian Jin Tee et.al.	2411.08378	link
2024-11-13	Motion Control for Enhanced Complex Action Video Generation	Qiang Zhou et.al.	2411.08328	null
2024-11-12	Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing	Zitao Shuai et.al.	2411.08196	null
2024-11-12	TIPO: Text to Image with Text Presampling for Prompt Optimization	Shih-Ying Yeh et.al.	2411.08127	null
2024-11-12	Evaluating the Generation of Spatial Relations in Text and Image Generative Models	Shang Hong Sim et.al.	2411.07664	null
2024-11-12	Leveraging Previous Steps: A Training-free Fast Solver for Flow Diffusion	Kaiyu Song et.al.	2411.07627	null
2024-11-12	Artificial Intelligence for Biomedical Video Generation	Linyuan Li et.al.	2411.07619	null
2024-11-12	GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering	Zhihao Liang et.al.	2411.07478	null
2024-11-11	Exploring Variational Autoencoders for Medical Image Generation: A Comprehensive Study	Khadija Rais et.al.	2411.07348	null
2024-11-11	Learning from Limited and Imperfect Data	Harsh Rangwani et.al.	2411.07229	null
2024-11-11	DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID	Nyle Siddiqui et.al.	2411.07205	link
2024-11-11	More Expressive Attention with Negative Weights	Ang Lv et.al.	2411.07176	link
2024-11-11	Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models	NVIDIA et.al.	2411.07126	null
2024-11-11	Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models	Yanchen Wang et.al.	2411.07121	link
2024-11-11	Layout Control and Semantic Guidance with Attention Loss Backward for T2I Diffusion Model	Guandong Li et.al.	2411.06692	null
2024-11-11	SeedEdit: Align Image Re-Generation to Image Editing	Yichun Shi et.al.	2411.06686	null
2024-11-10	Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement	Zhennan Chen et.al.	2411.06558	link
2024-11-10	I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength	Wanquan Feng et.al.	2411.06525	null
2024-11-10	DDIM-Driven Coverless Steganography Scheme with Real Key	Mingyu Yu et.al.	2411.06486	null
2024-11-08	Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models	Jia-Hong Huang et.al.	2411.05706	null
2024-11-08	WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making	Zhilong Zhang et.al.	2411.05619	null
2024-11-08	A Nerf-Based Color Consistency Method for Remote Sensing Images	Zongcheng Zuo et.al.	2411.05557	null
2024-11-08	Improving image synthesis with diffusion-negative sampling	Alakh Desai et.al.	2411.05473	null
2024-11-07	Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model	Sheng Cheng et.al.	2411.05079	link
2024-11-07	Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models	Weixin Liang et.al.	2411.04996	null
2024-11-07	SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation	Koichi Namekata et.al.	2411.04989	null
2024-11-07	AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation	Anil Kag et.al.	2411.04967	null
2024-11-07	Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification	Mischa Dombrowski et.al.	2411.04956	null
2024-11-07	DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion	Wenqiang Sun et.al.	2411.04928	null
2024-11-07	StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration	Panwen Hu et.al.	2411.04925	null
2024-11-07	MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views	Yuedong Chen et.al.	2411.04924	link
2024-11-07	Taming Rectified Flow for Inversion and Editing	Jiangshan Wang et.al.	2411.04746	link
2024-11-07	DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning	Yuxuan Duan et.al.	2411.04571	link
2024-11-07	BendVLM: Test-Time Debiasing of Vision-Language Embeddings	Walter Gerych et.al.	2411.04420	link
2024-11-06	ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks	Ziji Shi et.al.	2411.03999	null
2024-11-06	Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation	Chihaya Matsuhira et.al.	2411.03595	null
2024-11-05	Enhancing Weakly Supervised Semantic Segmentation for Fibrosis via Controllable Image Generation	Zhiling Yue et.al.	2411.03551	null
2024-11-05	DiT4Edit: Diffusion Transformer for Image Editing	Kunyu Feng et.al.	2411.03286	null
2024-11-05	On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models	Tariq Berrada Ifriqi et.al.	2411.03177	null
2024-11-05	Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising	Tao Huang et.al.	2411.03053	null
2024-11-05	Textual Aesthetics in Large Language Models	Lingjie Jiang et.al.	2411.02930	link
2024-11-05	Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey	Ao Fu et.al.	2411.02914	null
2024-11-05	BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?	David Mayo et.al.	2411.02783	null
2024-11-04	TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives	Maitreya Patel et.al.	2411.02545	null
2024-11-04	Adaptive Caching for Faster Video Generation with Diffusion Transformers	Kumara Kahatapitiya et.al.	2411.02397	null
2024-11-04	Training-free Regional Prompting for Diffusion Transformers	Anthony Chen et.al.	2411.02395	link
2024-11-04	How Far is Video Generation from World Model: A Physical Law Perspective	Bingyi Kang et.al.	2411.02385	null
2024-11-04	Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models	Anjith George et.al.	2411.02188	null
2024-11-03	Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation	Zhenbin Wang et.al.	2411.01647	null
2024-11-03	DreamPolish: Domain Score Distillation With Progressive Geometry Generation	Yean Cheng et.al.	2411.01602	null
2024-11-03	Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach	Qihe Pan et.al.	2411.01545	link
2024-11-03	DPCL-Diff: The Temporal Knowledge Graph Reasoning based on Graph Node Diffusion Model with Dual-Domain Periodic Contrastive Learning	Yukun Cao et.al.	2411.01477	null
2024-11-02	Guided Synthesis of Labeled Brain MRI Data Using Latent Diffusion Models for Segmentation of Enlarged Ventricles	Tim Ruschke et.al.	2411.01351	null
2024-11-02	Fast and Memory-Efficient Video Diffusion Using Streamlined Inference	Zheng Zhan et.al.	2411.01171	link
2024-10-31	Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning	Penghui Ruan et.al.	2410.24219	link
2024-10-31	Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts	Xiang Deng et.al.	2410.23836	null
2024-11-01	In-Context LoRA for Diffusion Transformers	Lianghua Huang et.al.	2410.23775	link
2024-10-31	Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization	Xiao Guo et.al.	2410.23556	null
2024-10-30	MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts	Jie Zhu et.al.	2410.23332	null
2024-10-30	RelationBooth: Towards Relation-Aware Customized Object Generation	Qingyu Shi et.al.	2410.23280	null
2024-10-31	SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation	Yining Hong et.al.	2410.23277	null
2024-10-30	Multi-student Diffusion Distillation for Better One-step Generators	Yanke Song et.al.	2410.23274	null
2024-10-30	LumiSculpt: A Consistency Lighting Control Network for Video Generation	Yuxin Zhang et.al.	2410.22979	null
2024-10-30	Private Synthetic Text Generation with Diffusion Models	Sebastian Ochs et.al.	2410.22971	link
2024-10-30	An Individual Identity-Driven Framework for Animal Re-Identification	Yihao Wu et.al.	2410.22927	link
2024-10-30	HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models	Shengkai Zhang et.al.	2410.22901	link
2024-10-30	Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images	Hanlin Wu et.al.	2410.22830	link
2024-10-30	Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models	Arash Marioriyad et.al.	2410.22775	null
2024-10-30	Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots	Vincent Guan et.al.	2410.22729	link
2024-10-29	Investigating Memorization in Video Diffusion Models	Chen Chen et.al.	2410.21669	null
2024-10-29	Exploring Local Memorization in Diffusion Models via Bright Ending Attention	Chen Chen et.al.	2410.21665	null
2024-10-29	Fingerprints of Super Resolution Networks	Jeremy Vonderfecht et.al.	2410.21653	null
2024-10-28	Denoising Diffusion Planner: Learning Complex Paths from Low-Quality Demonstrations	Michiel Nikken et.al.	2410.21497	link
2024-10-28	LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior	Hanyu Wang et.al.	2410.21264	null
2024-10-28	Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences	Zhihao Zhao et.al.	2410.21130	null
2024-10-28	Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models	Wenda Li et.al.	2410.21088	link
2024-10-28	Markov spin models for image generation: explicit large deviations with respect to the number of pixels	Cecile Monthus et.al.	2410.20906	null
2024-10-28	Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models	Weijian Luo et.al.	2410.20898	link
2024-10-28	Murine AI excels at cats and cheese: Structural differences between human and mouse neurons and their implementation in generative AIs	Rino Saiga et.al.	2410.20735	null
2024-10-28	CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians	Chongjian Ge et.al.	2410.20723	null
2024-10-28	Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient	Yintai Ma et.al.	2410.20657	null
2024-10-27	Generator Matching: Generative modeling with arbitrary Markov processes	Peter Holderrieth et.al.	2410.20587	null
2024-10-27	ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation	Zongyi Li et.al.	2410.20502	null
2024-10-25	FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality	Zhengyao Lv et.al.	2410.19355	null
2024-10-25	High Resolution Seismic Waveform Generation using Denoising Diffusion	Andreas Bergmeister et.al.	2410.19343	null
2024-10-24	Framer: Interactive Frame Interpolation	Wen Wang et.al.	2410.18978	null
2024-10-24	Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences	Weijian Luo et.al.	2410.18881	null
2024-10-24	Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation	Xiaoyu Zhang et.al.	2410.18830	null
2024-10-24	Towards Visual Text Design Transfer Across Languages	Yejin Choi et.al.	2410.18823	null
2024-10-24	Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances	Shilin Lu et.al.	2410.18775	link
2024-10-24	Ali-AUG: Innovative Approaches to Labeled Data Augmentation using One-Step Diffusion Model	Ali Hamza et.al.	2410.18678	null
2024-10-24	FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation	Christopher T. H. Teo et.al.	2410.18615	null
2024-10-24	FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling	Zhengqiang Zhang et.al.	2410.18410	link
2024-10-23	Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing	Dongliang Guo et.al.	2410.18267	null
2024-10-23	WorldSimBench: Towards Video Generation Models as World Simulators	Yiran Qin et.al.	2410.18072	null
2024-10-23	Scalable Ranked Preference Optimization for Text-to-Image Generation	Shyamgopal Karthik et.al.	2410.18013	null
2024-10-23	A Wavelet Diffusion GAN for Image Super-Resolution	Lorenzo Aloisi et.al.	2410.17966	null
2024-10-23	TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation	Ruicheng Zhang et.al.	2410.17855	null
2024-10-23	VISAGE: Video Synthesis using Action Graphs for Surgery	Yousef Yeganeh et.al.	2410.17751	null
2024-10-22	Offline Evaluation of Set-Based Text-to-Image Generation	Negar Arabzadeh et.al.	2410.17331	link
2024-10-22	Altogether: Image Captioning via Re-aligning Alt-text	Hu Xu et.al.	2410.17251	link
2024-10-22	IdenBAT: Disentangled Representation Learning for Identity-Preserved Brain Age Transformation	Junyeong Maeng et.al.	2410.16945	link
2024-10-22	DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization	Haowei Zhu et.al.	2410.16942	null
2024-10-22	Hierarchical Clustering for Conditional Diffusion in Image Generation	Jorge da Silva Goncalves et.al.	2410.16910	link
2024-10-22	MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model	Meng Xu et.al.	2410.16840	null
2024-10-22	Progressive Compositionality In Text-to-Image Generative Models	Xu Han et.al.	2410.16719	link
2024-10-22	Dual-Model Defense: Safeguarding Diffusion Models from Membership Inference Attacks through Disjoint Data Splitting	Bao Q. Tran et.al.	2410.16657	null
2024-10-21	MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors	Honghua Chen et.al.	2410.16272	null
2024-10-21	3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors	Xi Liu et.al.	2410.16266	null
2024-10-21	Elucidating the design space of language models for image generation	Xuantong Liu et.al.	2410.16257	link
2024-10-21	A Framework for Evaluating Predictive Models Using Synthetic Image Covariates and Longitudinal Data	Simon Deltadahl et.al.	2410.16177	null
2024-10-21	Continuous Speech Synthesis using per-token Latent Diffusion	Arnon Turetzky et.al.	2410.16048	null
2024-10-20	EVA: An Embodied World Model for Future Video Anticipation	Xiaowei Chi et.al.	2410.15461	null
2024-10-20	Allegro: Open the Black Box of Commercial-Level Video Generation Model	Yuan Zhou et.al.	2410.15458	link
2024-10-20	FrameBridge: Improving Image-to-Video Generation with Bridge Models	Yuji Wang et.al.	2410.15371	null
2024-10-19	SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning	Zhewei Dai et.al.	2410.14987	link
2024-10-19	Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence	Vansh Bansal et.al.	2410.14949	link
2024-10-18	BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities	Shaozhe Hao et.al.	2410.14672	link
2024-10-18	FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models	Rui Hu et.al.	2410.14429	null
2024-10-18	HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation	Bo Cheng et.al.	2410.14324	link
2024-10-18	HYPNOS: Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects	Oliverio Theophilus Nathanael et.al.	2410.14265	null
2024-10-18	Text-to-Image Representativity Fairness Evaluation Framework	Asma Yamani et.al.	2410.14201	null
2024-10-18	Personalized Image Generation with Large Multimodal Models	Yiyan Xu et.al.	2410.14170	link
2024-10-18	Assessing Open-world Forgetting in Generative Image Model Customization	Héctor Laria et.al.	2410.14159	null
2024-10-17	Inference of morphology and dynamical state of nearby $Planck$-SZ galaxy clusters with Zernike polynomials	Valentina Capalbo et.al.	2410.13929	null
2024-10-17	Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens	Lijie Fan et.al.	2410.13863	null
2024-10-17	PUMA: Empowering Unified MLLM with Multi-granular Visual Generation	Rongyao Fang et.al.	2410.13861	link
2024-10-17	VidPanos: Generative Panoramic Videos from Casual Panning Videos	Jingwei Ma et.al.	2410.13832	null
2024-10-17	DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control	Yujie Wei et.al.	2410.13830	null
2024-10-18	DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation	Hanbo Cheng et.al.	2410.13726	link
2024-10-17	Movie Gen: A Cast of Media Foundation Models	Adam Polyak et.al.	2410.13720	link
2024-10-17	LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning	Yiming Shi et.al.	2410.13618	link
2024-10-17	DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation	Guosheng Zhao et.al.	2410.13571	null
2024-10-17	MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models	Donghao Zhou et.al.	2410.13370	null
2024-10-18	Fundus to Fluorescein Angiography Video Generation as a Retinal Generative Foundation Model	Weiyi Zhang et.al.	2410.13242	null
2024-10-16	SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation	Jaehong Yoon et.al.	2410.12761	null
2024-10-16	3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation	Dewei Zhou et.al.	2410.12669	link
2024-10-16	Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation	Kathrin Khadra et.al.	2410.12542	null
2024-10-16	Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective	Yongxin Zhu et.al.	2410.12490	link
2024-10-16	Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks	Pranjali Pathre et.al.	2410.12432	link
2024-10-16	FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization	Cheng Yu et.al.	2410.12312	link
2024-10-16	Facing Identity: The Formation and Performance of Identity via Face-Based Artificial Intelligence Technologies	Wells Lucas Santo et.al.	2410.12148	null
2024-10-15	On the Effectiveness of Dataset Alignment for Fake Image Detection	Anirudh Sundara Rajan et.al.	2410.11835	null
2024-10-15	KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities	Hsin-Ping Huang et.al.	2410.11824	null
2024-10-16	Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices	Zhiyuan Ma et.al.	2410.11795	null
2024-10-15	Generative Image Steganography Based on Point Cloud	Zhong Yangjie et.al.	2410.11673	null
2024-10-15	InvSeg: Test-Time Prompt Inversion for Semantic Segmentation	Jiayi Lin et.al.	2410.11473	null
2024-10-15	A Simple Approach to Unifying Diffusion-based Conditional Generation	Xirui Li et.al.	2410.11439	null
2024-10-15	Evolutionary Retrofitting	Mathurin Videau et.al.	2410.11330	null
2024-10-15	Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling	Guiyu Zhang et.al.	2410.11236	null
2024-10-14	Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models	Jingzhi Bao et.al.	2410.10821	link
2024-10-14	When Does Perceptual Alignment Benefit Vision Representations?	Shobhita Sundaram et.al.	2410.10817	null
2024-10-14	LVD-2M: A Long-take Video Dataset with Temporally Dense Captions	Tianwei Xiong et.al.	2410.10816	link
2024-10-14	HART: Efficient Visual Generation with Hybrid Autoregressive Transformer	Haotian Tang et.al.	2410.10812	link
2024-10-14	Boosting Camera Motion Control for Video Diffusion Transformers	Soon Yau Cheong et.al.	2410.10802	null
2024-10-15	MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling	Jian Yang et.al.	2410.10798	null
2024-10-14	Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention	Dejia Xu et.al.	2410.10774	null
2024-10-14	DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships	Zhang Wan et.al.	2410.10751	null
2024-10-14	Evaluating SQL Understanding in Large Language Models	Ananya Rahaman et.al.	2410.10680	null
2024-10-14	ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection	Martin Aubard et.al.	2410.10554	link
2024-10-11	SceneCraft: Layout-Guided 3D Scene Generation	Xiuyu Yang et.al.	2410.09049	link
2024-10-11	MiRAGeNews: Multimodal Realistic AI-Generated News Detection	Runsheng Huang et.al.	2410.09045	link
2024-10-11	One-shot Generative Domain Adaptation in 3D GANs	Ziqiang Li et.al.	2410.08824	link
2024-10-11	Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting	Purushothaman Natarajan et.al.	2410.08612	link
2024-10-11	Context-Aware Full Body Anonymization using Text-to-Image Diffusion Models	Pascal Zwick et.al.	2410.08551	link
2024-10-11	Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities	Abhijay Ghildyal et.al.	2410.08534	null
2024-10-11	Diffusion Models Need Visual Priors for Image Generation	Xiaoyu Yue et.al.	2410.08531	null
2024-10-10	Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis	Jinbin Bai et.al.	2410.08261	link
2024-10-10	Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content	Qiuheng Wang et.al.	2410.08260	null
2024-10-10	DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models	Xiaoxiao He et.al.	2410.08207	null
2024-10-10	Scaling Laws For Diffusion Transformers	Zhengyang Liang et.al.	2410.08184	null
2024-10-10	DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation	Jiatao Gu et.al.	2410.08159	null
2024-10-10	RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace	Pragyan Shrestha et.al.	2410.08152	link
2024-10-10	Progressive Autoregressive Video Diffusion Models	Desai Xie et.al.	2410.08151	link
2024-10-10	Generated Bias: Auditing Internal Bias Dynamics of Text-To-Image Generative Models	Abhishek Mandal et.al.	2410.07884	null
2024-10-10	MinorityPrompt: Text to Minority Image Generation via Prompt Optimization	Soobin Um et.al.	2410.07838	link
2024-10-10	HARIVO: Harnessing Text-to-Image Models for Video Generation	Mingi Kwon et.al.	2410.07763	null
2024-10-10	Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation	Jiahao Cui et.al.	2410.07718	link
2024-10-10	Relational Diffusion Distillation for Efficient Image Generation	Weilun Feng et.al.	2410.07679	link
2024-10-09	IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation	Xinchen Zhang et.al.	2410.07171	link
2024-10-09	Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis	Bohan Zeng et.al.	2410.07155	link
2024-10-10	EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models	Rui Zhao et.al.	2410.07133	link
2024-10-09	Personalized Visual Instruction Tuning	Renjie Pi et.al.	2410.07113	link
2024-10-09	Decouple-Then-Merge: Towards Better Training for Diffusion Models	Qianli Ma et.al.	2410.06664	null
2024-10-09	On the Solution of Linearized Inverse Scattering Problems in Near-Field Microwave Imaging by Operator Inversion and Matched Filtering	Matthias M. Saurer et.al.	2410.06465	null
2024-10-08	Story-Adapter: A Training-free Iterative Framework for Long Story Visualization	Jiawei Mao et.al.	2410.06244	null
2024-10-08	BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way	Jiazi Bu et.al.	2410.06241	null
2024-10-08	SD-$\pi$XL: Generating Low-Resolution Quantized Imagery via Score Distillation	Alexandre Binninger et.al.	2410.06236	link
2024-10-08	GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation	Chi-Lam Cheang et.al.	2410.06158	null
2024-10-07	The Dawn of Video Generation: Preliminary Explorations with SORA-like Models	Ailing Zeng et.al.	2410.05227	null
2024-10-07	Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality	Ge Ya et.al.	2410.05203	link
2024-10-07	Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning	Ayano Hiranaka et.al.	2410.05116	null
2024-10-07	OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction	Leheng Li et.al.	2410.04932	null
2024-10-07	PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing	Feng Tian et.al.	2410.04844	link
2024-10-07	ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction	Hyungjin Chung et.al.	2410.04721	null
2024-10-06	Realizing Video Summarization from the Path of Language-based Semantic Understanding	Kuan-Chen Mu et.al.	2410.04511	null
2024-10-06	Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training	Wenbo Li et.al.	2410.04439	null
2024-10-06	Disentangling Regional Primitives for Image Generation	Zhengting Chen et.al.	2410.04421	null
2024-10-05	The Visualization JUDGE: Can Multimodal Foundation Models Guide Visualization Design Through Visual Perception?	Matthew Berger et.al.	2410.04280	null
2024-10-04	Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features	Benyuan Meng et.al.	2410.03558	link
2024-10-04	Dynamic Diffusion Transformer	Wangbo Zhao et.al.	2410.03456	link
2024-10-04	Images Speak Volumes: User-Centric Assessment of Image Generation for Accessible Communication	Miriam Anschütz et.al.	2410.03430	link
2024-10-04	LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding	Doohyuk Jang et.al.	2410.03355	null
2024-10-04	Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization	Zichen Miao et.al.	2410.03190	null
2024-10-04	Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach	Yaofang Liu et.al.	2410.03160	link
2024-10-04	ECHOPulse: ECG controlled echocardio-grams video generation	Yiwei Li et.al.	2410.03143	link
2024-10-03	Revealing the Unseen: Guiding Personalized Diffusion Models to Expose Training Data	Xiaoyu Wu et.al.	2410.03039	null
2024-10-03	Loong: Generating Minute-level Long Videos with Autoregressive Language Models	Yuqing Wang et.al.	2410.02757	null
2024-10-03	SteerDiff: Steering towards Safe Text-to-Image Diffusion Models	Hongxiang Zhang et.al.	2410.02710	null
2024-10-03	ControlAR: Controllable Image Generation with Autoregressive Models	Zongming Li et.al.	2410.02705	link
2024-10-03	Grounded Answers for Multi-agent Decision-making Problem through Generative World Model	Zeyang Liu et.al.	2410.02664	null
2024-10-03	Event-Customized Image Generation	Zhen Wang et.al.	2410.02483	null
2024-10-04	Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation	Muzhi Zhu et.al.	2410.02369	link
2024-10-03	SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration	Jintao Zhang et.al.	2410.02367	link
2024-10-03	Plug-and-Play Controllable Generation for Discrete Masked Models	Wei Guo et.al.	2410.02143	null
2024-10-02	EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing	Haotian Sun et.al.	2410.02098	null
2024-10-02	DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation	Jing He et.al.	2410.02067	null
2024-10-02	Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space	Yangming Li et.al.	2410.01796	null
2024-10-02	ImageFolder: Autoregressive Image Generation with Folded Tokens	Xiang Li et.al.	2410.01756	link
2024-10-02	ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation	Rinon Gal et.al.	2410.01731	null
2024-10-02	COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation	Mingzhen Sun et.al.	2410.01718	null
2024-10-02	Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding	Yao Teng et.al.	2410.01699	link
2024-10-02	Data Extrapolation for Text-to-image Generation on Small Datasets	Senmao Ye et.al.	2410.01638	link
2024-10-02	KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models	Pouyan Navard et.al.	2410.01595	link
2024-10-02	MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation	Mingzhen Sun et.al.	2410.01594	link
2024-10-02	Edge-preserving noise for diffusion models	Jente Vandersanden et.al.	2410.01540	null
2024-10-02	Aggregation of Multi Diffusion Models for Enhancing Learned Representations	Conghan Yue et.al.	2410.01262	link
2024-09-30	Inverse Painting: Reconstructing The Painting Process	Bowei Chen et.al.	2409.20556	null
2024-09-30	All-optical autoencoder machine learning framework using diffractive processors	Peijie Feng et.al.	2409.20346	null
2024-09-30	Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Zicheng Zhang et.al.	2409.20063	null
2024-09-30	Illustrious: an Open Advanced Illustration Model	Sang Hyun Park et.al.	2409.19946	null
2024-09-30	MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation	Wenchao Chen et.al.	2409.19937	null
2024-09-30	Replace Anyone in Videos	Xiang Wang et.al.	2409.19911	link
2024-09-29	OrganiQ: Mitigating Classical Resource Bottlenecks of Quantum Generative Adversarial Networks on NISQ-Era Machines	Daniel Silver et.al.	2409.19823	null
2024-09-29	Simple and Fast Distillation of Diffusion Models	Zhenyu Zhou et.al.	2409.19681	link
2024-09-29	Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection	Yuhang Ma et.al.	2409.19624	null
2024-09-29	Effective Diffusion Transformer Architecture for Image Super-Resolution	Kun Cheng et.al.	2409.19589	link
2024-09-27	PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation	Shaowei Liu et.al.	2409.18964	link
2024-09-27	Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions	Iskander Azangulov et.al.	2409.18804	null
2024-09-26	Realistic Evaluation of Model Merging for Compositional Generalization	Derek Tam et.al.	2409.18314	link
2024-09-26	Harnessing Wavelet Transformations for Generalizable Deepfake Forgery Detection	Lalith Bharadwaj Baru et.al.	2409.18301	link
2024-09-26	Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey	Yi Zhang et.al.	2409.18214	link
2024-09-26	FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner	Wenliang Zhao et.al.	2409.18128	link
2024-09-26	Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction	Jing He et.al.	2409.18124	null
2024-09-26	DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models	Helin Cao et.al.	2409.18092	null
2024-09-26	Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion	Hengrui Gu et.al.	2409.17928	link
2024-09-26	Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation	Qihan Huang et.al.	2409.17920	link
2024-09-26	Text Image Generation for Low-Resource Languages with Dual Translation Learning	Chihiro Noguchi et.al.	2409.17747	null
2024-09-26	AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status	Jinghao Zhang et.al.	2409.17740	null
2024-09-26	Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation	Huan Yang et.al.	2409.17674	null
2024-09-26	ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition	Shen Li et.al.	2409.17576	null
2024-09-26	Pixel-Space Post-Training of Latent Diffusion Models	Christina Zhang et.al.	2409.17565	null
2024-09-25	GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design	Phillip Mueller et.al.	2409.17045	null
2024-09-25	Pose-Guided Fine-Grained Sign Language Video Generation	Tongkai Shi et.al.	2409.16709	null
2024-09-25	Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation	Youngwan Jin et.al.	2409.16706	link
2024-09-25	Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement	Yihao Zhou et.al.	2409.16661	null
2024-09-25	ECG-Image-Database: A Dataset of ECG Images with Real-World Imaging and Scanning Artifacts; A Foundation for Computerized ECG Image Digitization and Analysis	Matthew A. Reyna et.al.	2409.16612	link
2024-09-24	Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation	Homanga Bharadhwaj et.al.	2409.16283	null
2024-09-24	MonoFormer: One Transformer for Both Diffusion and Autoregression	Chuyang Zhao et.al.	2409.16280	link
2024-09-24	Label-Augmented Dataset Distillation	Seoungyoon Kang et.al.	2409.16239	null
2024-09-24	MaskBit: Embedding-free Image Generation via Bit Tokens	Mark Weber et.al.	2409.16211	link
2024-09-26	Enhanced Unsupervised Image-to-Image Translation Using Contrastive Learning and Histogram of Oriented Gradients	Wanchen Zhao et.al.	2409.16042	null
2024-09-24	Deep chroma compression of tone-mapped images	Xenios Milidonis et.al.	2409.16032	link
2024-09-24	Improvements to SDXL in NovelAI Diffusion V3	Juan Ossa et.al.	2409.15997	null
2024-09-23	Critic Loss for Image Classification	Brendan Hogan Rappazzo et.al.	2409.15565	null
2024-09-23	Bayesian computation with generative diffusion models by Multilevel Monte Carlo	Abdul-Lateef Haji-Ali et.al.	2409.15511	link
2024-09-23	Revealing an Unattractivity Bias in Mental Reconstruction of Occluded Faces using Generative Image Models	Frederik Riedmann et.al.	2409.15443	null
2024-09-18	Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance	Jaehoon Joo et.al.	2409.12099	null
2024-09-18	ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation	Peiyu Li et.al.	2409.12010	link
2024-09-18	Tracking Any Point with Frame-Event Fusion Network at High Frame Rate	Jiaxiong Liu et.al.	2409.11953	null
2024-09-18	Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation	Dimitrios Christodoulou et.al.	2409.11904	null
2024-09-18	RaggeDi: Diffusion-based State Estimation of Disordered Rags, Sheets, Towels and Blankets	Jikai Ye et.al.	2409.11831	null
2024-09-18	GUNet: A Graph Convolutional Network United Diffusion Model for Stable and Diversity Pose Generation	Shuowen Liang et.al.	2409.11689	link
2024-09-17	Using Physics Informed Generative Adversarial Networks to Model 3D porous media	Zihan Ren et.al.	2409.11541	null
2024-09-17	OSV: One Step is Enough for High-Quality Image to Video Generation	Xiaofeng Mao et.al.	2409.11367	null
2024-09-17	Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think	Gonzalo Martin Garcia et.al.	2409.11355	link
2024-09-17	OmniGen: Unified Image Generation	Shitao Xiao et.al.	2409.11340	link
2024-09-18	The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives	Samee Arif et.al.	2409.11261	link
2024-09-17	Improving the Efficiency of Visually Augmented Language Models	Paula Ontalvilla et.al.	2409.11148	link
2024-09-17	MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance	Debin Meng et.al.	2409.11010	link
2024-09-16	Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models	Bingchen Liu et.al.	2409.10695	null
2024-09-16	SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing	Qi Qian et.al.	2409.10476	null
2024-09-16	VAE-QWGAN: Improving Quantum GANs for High Resolution Image Generation	Aaron Mark Thomas et.al.	2409.10339	null
2024-09-16	On Synthetic Texture Datasets: Challenges, Creation, and Curation	Blaine Hoak et.al.	2409.10297	null
2024-09-16	Embodiment-Agnostic Action Planning via Object-Part Scene Flow	Weiliang Tang et.al.	2409.10032	null
2024-09-15	GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion	Vitor Guizilini et.al.	2409.09896	null
2024-09-15	Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization	Haoyuan Sun et.al.	2409.09774	null
2024-09-15	MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection	Yaning Zhang et.al.	2409.09724	link
2024-09-15	Finetuning CLIP to Reason about Pairwise Differences	Dylan Sam et.al.	2409.09721	link
2024-09-15	E-Commerce Inpainting with Mask Guidance in Controlnet for Reducing Overcompletion	Guandong Li et.al.	2409.09681	null
2024-09-13	InstantDrag: Improving Interactivity in Drag-based Image Editing	Joonghyuk Shin et.al.	2409.08857	null
2024-09-13	STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment	Yong Ren et.al.	2409.08601	null
2024-09-13	Enhancing Privacy in ControlNet and Stable Diffusion via Split Learning	Dixi Yao et.al.	2409.08503	null
2024-09-12	Click2Mask: Local Editing with Dynamic Mask Generation	Omer Regev et.al.	2409.08272	link
2024-09-12	TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder	NaHyeon Park et.al.	2409.08248	link
2024-09-12	IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation	Yinwei Wu et.al.	2409.08240	null
2024-09-12	High-Frequency Anti-DreamBooth: Robust Defense Against Image Synthesis	Takuto Onikubo et.al.	2409.08167	link
2024-09-12	EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance	Zicheng Duan et.al.	2409.08091	link
2024-09-12	Scribble-Guided Diffusion for Training-free Text-to-Image Generation	Seonho Lee et.al.	2409.08026	link
2024-09-11	DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures	Steven Hogue et.al.	2409.07649	null
2024-09-11	Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models	Haibo Yang et.al.	2409.07452	link
2024-09-11	FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process	Yang Luo et.al.	2409.07451	null
2024-09-11	Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy	Somayeh Pakdelmoez et.al.	2409.07422	null
2024-09-11	EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion	Jian Zhang et.al.	2409.07255	link
2024-09-11	Bio-Eng-LMM AI Assist chatbot: A Comprehensive Tool for Research and Education	Ali Forootani et.al.	2409.07110	link
2024-09-10	DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images	Taslim Murad et.al.	2409.06694	null
2024-09-10	SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation	Teng Hu et.al.	2409.06633	null
2024-09-10	PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation	Ginger Delmas et.al.	2409.06535	null
2024-09-10	DiffQRCoder: Diffusion-based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement	Jia-Wei Liao et.al.	2409.06355	null
2024-09-10	G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer	Jinzhi Zhang et.al.	2409.06322	null
2024-09-11	MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control	Yining Yao et.al.	2409.06189	null
2024-09-09	SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values	Chengwei Sun et.al.	2409.05926	null
2024-09-11	DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation	Wei Wu et.al.	2409.05463	null
2024-09-09	CipherDM: Secure Three-Party Inference for Diffusion Model Sampling	Xin Zhao et.al.	2409.05414	null
2024-09-09	TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors	Yichuan Mo et.al.	2409.05294	link
2024-09-08	Can OOD Object Detectors Learn from Foundation Models?	Jiahui Liu et.al.	2409.05162	link
2024-09-07	Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation	Jiaxin Cheng et.al.	2409.04847	link
2024-09-07	SpotActor: Training-Free Layout-Controlled Consistent Image Generation	Jiahao Wang et.al.	2409.04801	null
2024-09-07	Multi-Conditioned Denoising Diffusion Probabilistic Model (mDDPM) for Medical Image Synthesis	Arjun Krishna et.al.	2409.04670	null
2024-09-06	VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation	Yecheng Wu et.al.	2409.04429	link
2024-09-06	Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation	Zhuoyan Luo et.al.	2409.04410	link
2024-09-06	Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks	Hangcheng Cao et.al.	2409.04133	null
2024-09-06	Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task	Jing Wang et.al.	2409.04005	link
2024-09-06	DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes	Jianbiao Mei et.al.	2409.04003	link
2024-09-05	ArtiFade: Learning to Generate High-quality Subject from Blemished Images	Shuya Yang et.al.	2409.03745	null
2024-09-05	Blended Latent Diffusion under Attention Control for Real-World Video Editing	Deyin Liu et.al.	2409.03514	null
2024-09-05	Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks	Akshay Jain et.al.	2409.03458	link
2024-09-05	Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities	Wei Lu et.al.	2409.03444	link
2024-09-09	RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning	Lawrence Yunliang Chen et.al.	2409.03403	null
2024-09-05	Enhancing digital core image resolution using optimal upscaling algorithm: with application to paired SEM images	Shaohua You et.al.	2409.03265	null
2024-09-06	HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts	Xinyu Liu et.al.	2409.02919	link
2024-09-04	PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation	Jun Ling et.al.	2409.02657	null
2024-09-04	Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects	Kyungmin Jo et.al.	2409.02653	null
2024-09-05	Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency	Jianwen Jiang et.al.	2409.02634	null
2024-09-04	StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models	Wen Li et.al.	2409.02543	link
2024-09-04	A Learnable Color Correction Matrix for RAW Reconstruction	Anqi Liu et.al.	2409.02497	null
2024-09-04	Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing	Siyi Chen et.al.	2409.02374	link
2024-09-03	QID$^2$: An Image-Conditioned Diffusion Model for Q-space Up-sampling of DWI Data	Zijian Chen et.al.	2409.02309	null
2024-09-03	DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos	Wenbo Hu et.al.	2409.02095	link
2024-09-03	Probing Noncentrosymmetric 2D Materials by Fourier Space Second Harmonic Imaging	Lucas Lafeta et.al.	2409.02071	null
2024-08-30	CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion	Yiran Chen et.al.	2408.17424	null
2024-08-30	Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution	Yixin Wu et.al.	2408.17285	null
2024-08-30	VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers	Juncan Deng et.al.	2408.17131	null
2024-08-30	FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition	Chen Hu et.al.	2408.17090	link
2024-08-30	Text-to-Image Generation Via Energy-Based CLIP	Roy Ganz et.al.	2408.17046	null
2024-08-30	AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding	Yonghui Wang et.al.	2408.16986	link
2024-08-30	Contrastive Learning with Synthetic Positives	Dewen Zeng et.al.	2408.16965	link
2024-08-29	STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models	Koushik Srivatsan et.al.	2408.16807	link
2024-09-04	CSGO: Content-Style Composition in Text-to-Image Generation	Peng Xing et.al.	2408.16766	null
2024-08-29	One-Shot Learning Meets Depth Diffusion in Multi-Object Videos	Anisha Jain et.al.	2408.16704	null
2024-08-29	GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models	Moreno D’Incà et.al.	2408.16700	link
2024-08-29	DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving	Yongjie Fu et.al.	2408.16647	null
2024-08-29	RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model	Zhuan Shi et.al.	2408.16634	null
2024-08-29	GRPose: Learning Graph Relations for Human Image Generation with Pose Priors	Xiangchen Yin et.al.	2408.16540	link
2024-08-29	Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation	Xiaoyu Jin et.al.	2408.16506	null
2024-08-29	Spiking Diffusion Models	Jiahang Cao et.al.	2408.16467	link
2024-08-29	ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding	Minghang Zheng et.al.	2408.16314	link
2024-08-29	Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation	Yanghao Wang et.al.	2408.16266	link
2024-08-28	Disentangled Diffusion Autoencoder for Harmonization of Multi-site Neuroimaging Data	Ayodeji Ijishakin et.al.	2408.15890	null
2024-08-28	GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model	Yongjie Fu et.al.	2408.15868	null
2024-08-28	Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas	Fabio Quattrini et.al.	2408.15660	link
2024-08-28	Hand1000: Generating Realistic Hands from Text with Only 1,000 Images	Haozhuo Zhang et.al.	2408.15461	null
2024-08-28	Avoiding Generative Model Writer’s Block With Embedding Nudging	Ali Zand et.al.	2408.15450	null
2024-08-27	GenRec: Unifying Video Generation and Recognition with Diffusion Models	Zejia Weng et.al.	2408.15241	link
2024-08-27	Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance	Weiyi Zhang et.al.	2408.15217	link
2024-08-27	Alfie: Democratising RGBA Image Generation With No $$$	Fabio Quattrini et.al.	2408.14826	link
2024-08-27	Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation	Abdelrahman Eldesokey et.al.	2408.14819	null
2024-08-27	CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis	Weijia Li et.al.	2408.14765	null
2024-08-27	Sequential-Scanning Dual-Energy CT Imaging Using High Temporal Resolution Image Reconstruction and Error-Compensated Material Basis Image Generation	Qiaoxin Li et.al.	2408.14754	null
2024-08-27	Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation	Bochao Liu et.al.	2408.14738	null
2024-08-26	DIAGen: Diverse Image Augmentation with Generative Models	Tobias Lingenberg et.al.	2408.14584	link
2024-08-26	GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy	Peiyan Li et.al.	2408.14368	link
2024-08-26	ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty	Xindi Wu et.al.	2408.14339	null
2024-08-26	Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models	Chaohua Shi et.al.	2408.14135	null
2024-08-26	SurGen: Text-Guided Diffusion Model for Surgical Video Generation	Joseph Cho et.al.	2408.14028	null
2024-08-27	RT-Attack: Jailbreaking Text-to-Image Models via Random Token	Sensen Gao et.al.	2408.13896	null
2024-08-25	Prior Learning in Introspective VAEs	Ioannis Athanasiadis et.al.	2408.13805	null
2024-08-25	SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting	Wenrui Li et.al.	2408.13711	link
2024-08-27	Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing	Yitong Yang et.al.	2408.13623	null
2024-08-24	DualAnoDiff: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation	Ying Jin et.al.	2408.13509	link
2024-08-24	Explainable Concept Generation through Vision-Language Preference Learning	Aditya Taparia et.al.	2408.13438	null
2024-08-23	CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities	Tao Wu et.al.	2408.13239	link
2024-08-23	Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation	Bonan Li et.al.	2408.13149	null
2024-08-23	G3FA: Geometry-guided GAN for Face Animation	Alireza Javanmardi et.al.	2408.13049	null
2024-08-23	EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation	Cong Wang et.al.	2408.13005	null
2024-08-22	Unlocking Intrinsic Fairness in Stable Diffusion	Eunji Kim et.al.	2408.12692	null
2024-08-22	xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations	Can Qin et.al.	2408.12590	null
2024-08-22	Real-Time Video Generation with Pyramid Attention Broadcast	Xuanlei Zhao et.al.	2408.12588	link
2024-08-25	Show-o: One Single Transformer to Unify Multimodal Understanding and Generation	Jinheng Xie et.al.	2408.12528	null
2024-08-22	CODE: Confident Ordinary Differential Editing	Bastien van Delft et.al.	2408.12418	link
2024-08-22	Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce	Ádám Tibor Czapp et.al.	2408.12392	null
2024-08-22	Scalable Autoregressive Image Generation with Mamba	Haopeng Li et.al.	2408.12245	link
2024-08-22	MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient	Yanzeng Li et.al.	2408.12236	null
2024-08-22	BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking	Hanzheng Wang et.al.	2408.12232	null
2024-08-22	DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models	Wuchao Li et.al.	2408.12153	null
2024-08-21	Approaching Deep Learning through the Spectral Dynamics of Weights	David Yunis et.al.	2408.11804	link
2024-08-21	DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework	Zhifei Xie et.al.	2408.11788	null
2024-08-21	Iterative Object Count Optimization for Text-to-image Diffusion Models	Oz Zafar et.al.	2408.11721	null
2024-08-21	FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting	Liyao Jiang et.al.	2408.11706	null
2024-08-21	TrackGo: A Flexible and Efficient Method for Controllable Video Generation	Haitao Zhou et.al.	2408.11475	null
2024-08-21	Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection	Jingwei Sun et.al.	2408.11408	link
2024-08-21	Gender Bias Evaluation in Text-to-image Generation: A Survey	Yankun Wu et.al.	2408.11358	null
2024-08-21	UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation	Xiangyu Zhao et.al.	2408.11305	link
2024-08-20	Compress Guidance in Conditional Diffusion Sampling	Anh-Dung Dinh et.al.	2408.11194	null
2024-08-20	MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data	Jian Wang et.al.	2408.11135	null
2024-08-20	MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning	Haoning Wu et.al.	2408.11001	link
2024-08-20	A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse	Zhongliang Guo et.al.	2408.10901	null
2024-08-21	MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration	Yanbo Ding et.al.	2408.10605	link
2024-08-20	Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models	Cong Wan et.al.	2408.10571	link
2024-08-19	Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation	Liu He et.al.	2408.10453	null
2024-08-19	The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks	Niyar R Barman et.al.	2408.10446	null
2024-08-19	Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data	Tao Yang et.al.	2408.10119	null
2024-08-19	Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation	Yunxin Li et.al.	2408.09787	link
2024-08-19	TraDiffusion: Trajectory-Based Training-Free Image Generation	Mingrui Wu et.al.	2408.09739	link
2024-08-21	Reconstruct Spine CT from Biplanar X-Rays via Diffusion Learning	Zhi Qiao et.al.	2408.09731	null
2024-08-18	AnomalyFactory: Regard Anomaly Generation as Unsupervised Anomaly Localization	Ying Zhao et.al.	2408.09533	null
2024-08-18	Deformation-aware GAN for Medical Image Synthesis with Substantially Misaligned Pairs	Bowen Xin et.al.	2408.09432	null
2024-08-18	SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama	Jing Tang et.al.	2408.09333	link
2024-08-16	PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future	Guangyi Wang et.al.	2408.08822	link
2024-08-16	An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation	Peiming Guo et.al.	2408.08650	link
2024-08-16	Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness	Hefei Mei et.al.	2408.08502	link
2024-08-15	JPEG-LM: LLMs as Image Generators with Canonical Codec Representations	Xiaochuang Han et.al.	2408.08459	null
2024-08-15	METR: Image Watermarking with Large Number of Unique Messages	Alexander Varlamov et.al.	2408.08340	link
2024-08-15	Can Large Language Models Understand Symbolic Graphics Programs?	Zeju Qiu et.al.	2408.08313	null
2024-08-15	Accelerated Image-Aware Generative Diffusion Modeling	Tanmay Asthana et.al.	2408.08306	null
2024-08-15	Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding	Xiner Li et.al.	2408.08252	link
2024-08-16	FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance	Jiasong Feng et.al.	2408.08189	null
2024-08-15	When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding	Pingping Zhang et.al.	2408.08093	null
2024-08-15	A Novel Generative Artificial Intelligence Method for Interference Study on Multiplex Brightfield Immunohistochemistry Images	Satarupa Mukherjee et.al.	2408.07860	null
2024-08-14	Boosting Unconstrained Face Recognition with Targeted Style Adversary	Mohammad Saeed Ebrahimi Saadabadi et.al.	2408.07642	null
2024-08-14	Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving	Yuqing Wen et.al.	2408.07605	null
2024-08-14	KIND: Knowledge Integration and Diversion in Diffusion Models	Yucheng Xie et.al.	2408.07337	link
2024-08-13	Generative Photomontage	Sean J. Liu et.al.	2408.07116	null
2024-08-13	Definition of multispectral camera system parameters to model the asteroid 2001 SN263	Gabriela de Carvalho Assis Goulart et.al.	2408.06886	null
2024-08-13	Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective	Ouxiang Li et.al.	2408.06741	link
2024-08-13	DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion	Yujia Wu et.al.	2408.06740	null
2024-08-13	DiffSG: A Generative Solver for Network Optimization with Diffusion Model	Ruihuai Liang et.al.	2408.06701	link
2024-08-12	Prompt Recovery for Image Generation Models: A Comparative Study of Discrete Optimizers	Joshua Nathaniel Williams et.al.	2408.06502	null
2024-08-15	ControlNeXt: Powerful and Efficient Control for Image and Video Generation	Bohao Peng et.al.	2408.06070	link
2024-08-10	ZePo: Zero-Shot Portrait Stylization with Faster Sampling	Jin Liu et.al.	2408.05492	link
2024-08-10	Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE	Yiying Yang et.al.	2408.05477	null
2024-08-10	Artworks Reimagined: Exploring Human-AI Co-Creation through Body Prompting	Jonas Oppenlaender et.al.	2408.05476	null
2024-08-10	High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model	Weizhi Zhong et.al.	2408.05416	null
2024-08-09	Instruction Tuning-free Visual Token Complement for Multimodal LLMs	Dongsheng Wang et.al.	2408.05019	null
2024-08-09	DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting	Jihoon Lee et.al.	2408.04962	null
2024-08-08	Deep Learning-based Unsupervised Domain Adaptation via a Unified Model for Prostate Lesion Detection Using Multisite Bi-parametric MRI Datasets	Hao Li et.al.	2408.04777	null
2024-08-08	Zero-Shot Uncertainty Quantification using Diffusion Probabilistic Models	Dule Shu et.al.	2408.04718	null
2024-08-08	Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics	Ruining Li et.al.	2408.04631	null
2024-08-07	ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling	William Y. Zhu et.al.	2408.04102	link
2024-08-07	Counterfactuals and Uncertainty-Based Explainable Paradigm for the Automated Detection and Segmentation of Renal Cysts in Computed Tomography Images: A Multi-Center Study	Zohaib Salahuddin et.al.	2408.03789	null
2024-08-07	Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model	Guoqing Zhu et.al.	2408.03748	link
2024-08-07	Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling	Zilyu Ye et.al.	2408.03695	link
2024-08-07	A comparative study of generative adversarial networks for image recognition algorithms based on deep learning and traditional methods	Yihao Zhong et.al.	2408.03568	null
2024-08-06	Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey	Vu Tuan Truong et.al.	2408.03400	null
2024-08-06	IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts	Ciara Rowles et.al.	2408.03209	null
2024-08-06	An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion	Xingguang Yan et.al.	2408.03178	null
2024-08-06	Iterative CT Reconstruction via Latent Variable Optimization of Shallow Diffusion Models	Sho Ozaki et.al.	2408.03156	null
2024-08-06	Multitask and Multimodal Neural Tuning for Large Models	Hao Sun et.al.	2408.03001	null
2024-08-06	DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model	Yiming Zhong et.al.	2408.02993	link
2024-08-05	Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services	Shaopeng Fu et.al.	2408.02814	link
2024-08-05	Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining	Dongyang Liu et.al.	2408.02657	link
2024-08-05	VidGen-1M: A Large-Scale Dataset for Text-to-video Generation	Zhiyu Tan et.al.	2408.02629	null
2024-08-06	ProCreate, Don’t Reproduce! Propulsive Energy Diffusion for Creative Generation	Jack Lu et.al.	2408.02226	link
2024-08-04	PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance	Aoming Liu et.al.	2408.02157	null
2024-08-04	LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation	Dwij Mehta et.al.	2408.02078	null
2024-08-04	Step Saver: Predicting Minimum Denoising Steps for Diffusion Model Image Generation	Jean Yu et.al.	2408.02054	null
2024-08-04	Robustness of Watermarking on Text-to-Image Diffusion Models	Xiaodong Wu et.al.	2408.02035	null
2024-08-03	SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm	Junyan Ye et.al.	2408.01812	null
2024-08-03	A Novel Evaluation Framework for Image2Text Generation	Jia-Hong Huang et.al.	2408.01723	null
2024-08-03	Controllable Unlearning for Image-to-Image Generative Models via $\varepsilon$-Constrained Optimization	Xiaohua Feng et.al.	2408.01689	null
2024-08-02	VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling	Qian Zhang et.al.	2408.01181	link
2024-08-02	PINNs for Medical Image Analysis: A Survey	Chayan Banerjee et.al.	2408.01026	null
2024-08-02	EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts	Die Chen et.al.	2408.01014	null
2024-08-02	FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation	Xiang Gao et.al.	2408.00998	link
2024-08-01	Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention	Susung Hong et.al.	2408.00760	link
2024-08-01	Synthetic dual image generation for reduction of labeling efforts in semantic segmentation of micrographs with a customized metric function	Matias Oscar Volman Stern et.al.	2408.00707	null
2024-08-01	A new approach for encoding code and assisting code understanding	Mengdan Fan et.al.	2408.00521	null
2024-08-01	Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion	Manuel Kansy et.al.	2408.00458	null
2024-08-01	Towards Reliable Advertising Image Generation Using Human Feedback	Zhenbang Du et.al.	2408.00418	link
2024-08-01	DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving	Xuemeng Yang et.al.	2408.00415	null
2024-08-01	On the Limitations and Prospects of Machine Unlearning for Generative AI	Shiji Zhou et.al.	2408.00376	null
2024-08-01	Few-shot Defect Image Generation based on Consistency Modeling	Qingfeng Shi et.al.	2408.00372	link
2024-08-01	Navigating Text-to-Image Generative Bias across Indic Languages	Surbhi Mittal et.al.	2408.00283	null
2024-07-31	WAS: Dataset and Methods for Artistic Text Segmentation	Xudong Xie et.al.	2408.00106	link
2024-07-31	Detecting, Explaining, and Mitigating Memorization in Diffusion Models	Yuxin Wen et.al.	2407.21720	link
2024-07-31	Tora: Trajectory-oriented Diffusion Transformer for Video Generation	Zhenghao Zhang et.al.	2407.21705	link
2024-07-31	Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation	Junxuan Yu et.al.	2407.21490	null
2024-07-31	Fine-gained Zero-shot Video Sampling	Dengsheng Chen et.al.	2407.21475	null
2024-07-31	Deformable 3D Shape Diffusion Model	Dengsheng Chen et.al.	2407.21428	null
2024-07-31	Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model	Zhichao Zhang et.al.	2407.21408	null
2024-07-31	Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging	Wenhua Wu et.al.	2407.21381	null
2024-07-31	ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images	Xilei Zhu et.al.	2407.21363	null
2024-07-30	Adding Multi-modal Controls to Whole-body Human Motion Generation	Yuxuan Bian et.al.	2407.21136	link
2024-07-29	Retinex-Diffusion: On Controlling Illumination Conditions in Diffusion Models via Retinex Theory	Xiaoyan Xing et.al.	2407.20785	null
2024-07-30	Understanding the Impact of Synchronous, Asynchronous, and Hybrid In-Situ Techniques in Computational Fluid Dynamics Applications	Yi Ju et.al.	2407.20717	null
2024-07-30	DocXPand-25k: a large and diverse benchmark dataset for identity documents analysis	Julien Lerouge et.al.	2407.20662	link
2024-07-30	Autonomous Improvement of Instruction Following Skills via Foundation Models	Zhiyuan Zhou et.al.	2407.20635	link
2024-07-30	EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos	Aashish Rai et.al.	2407.20592	null
2024-07-29	Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities	Lorenzo Baraldi et.al.	2407.20337	link
2024-07-29	MaskInversion: Localized Embeddings via Optimization of Explainability Maps	Walid Bousselham et.al.	2407.20034	null
2024-07-29	Reproducibility Study of “ITI-GEN: Inclusive Text-to-Image Generation”	Daniel Gallo Fernández et.al.	2407.19996	link
2024-07-29	FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention	Yu Lu et.al.	2407.19918	null
2024-07-29	Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture	Stefanos Gkikas et.al.	2407.19811	null
2024-07-28	Temporal Feature Matters: A Framework for Diffusion Model Quantization	Yushi Huang et.al.	2407.19547	null
2024-07-28	VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary	Hanjun Luo et.al.	2407.19524	link
2024-07-28	MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability	Buyu Liu et.al.	2407.19468	link
2024-07-28	FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models	Changgu Chen et.al.	2407.19453	link
2024-07-28	Perm: A Parametric Representation for Multi-Style 3D Hair Modeling	Chengan He et.al.	2407.19451	link
2024-07-27	Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions	Ashkan Taghipour et.al.	2407.19205	null
2024-07-26	SHIC: Shape-Image Correspondences with no Keypoint Supervision	Aleksandar Shtedritski et.al.	2407.18907	null
2024-07-26	Adversarial Robustification via Text-to-Image Diffusion Models	Daewon Choi et.al.	2407.18658	link
2024-07-25	AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild	Junho Park et.al.	2407.18034	link
2024-07-25	Guided Latent Slot Diffusion for Object-Centric Learning	Krishnakant Singh et.al.	2407.17929	null
2024-07-25	ReCorD: Reasoning and Correcting Diffusion for HOI Generation	Jian-Yu Jiang-Lin et.al.	2407.17911	link
2024-07-24	SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency	Yiming Xie et.al.	2407.17470	null
2024-07-24	ViPer: Visual Personalization of Generative Models via Individual Preference Learning	Sogand Salehi et.al.	2407.17365	null
2024-07-25	LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model	Wanggong Yang et.al.	2407.17229	null
2024-07-24	MemBench: Memorized Image Trigger Prompt Dataset for Diffusion Models	Chunsan Hong et.al.	2407.17095	link
2024-07-24	An Adaptive Gradient Regularization Method	Huixiu Jiang et.al.	2407.16944	null
2024-07-24	Synthetic Trajectory Generation Through Convolutional Neural Networks	Jesse Merhi et.al.	2407.16938	link
2024-07-23	Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions	Fabio Tosi et.al.	2407.16698	link
2024-07-23	MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence	Canyu Zhao et.al.	2407.16655	null
2024-07-23	On Differentially Private 3D Medical Image Synthesis with Controllable Latent Diffusion Models	Deniz Daum et.al.	2407.16405	link
2024-07-23	Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data	Hengyu Fu et.al.	2407.16134	null
2024-07-23	Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos	Jiahe Liu et.al.	2407.16124	link
2024-07-22	DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design	Zhi Hao Luo et.al.	2407.15723	link
2024-07-22	SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time	Stanislav Frolov et.al.	2407.15507	link
2024-07-22	TextureCrop: Enhancing Synthetic Image Detection through Texture-based Cropping	Despina Konstantinidou et.al.	2407.15500	link
2024-07-22	DiffX: Guide Your Layout to Cross-Modal Generative Modeling	Zeyu Wang et.al.	2407.15488	link
2024-07-22	Text2Place: Affordance-aware Text Guided Human Placement	Rishubh Parihar et.al.	2407.15446	null
2024-07-23	BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM	Hanjun Luo et.al.	2407.15240	link
2024-07-21	Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative Modelling	Junn Yong Loo et.al.	2407.15238	null
2024-07-21	Flow as the Cross-Domain Manipulation Interface	Mengda Xu et.al.	2407.15208	null
2024-07-21	The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation	Junwen Luo et.al.	2407.15167	null
2024-07-21	Anchored Diffusion for Video Face Reenactment	Idan Kligvasser et.al.	2407.15153	null
2024-07-19	T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation	Kaiyue Sun et.al.	2407.14505	link
2024-07-19	Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations	Decheng Liu et.al.	2407.14367	link
2024-07-19	Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model	Kun Zhao et.al.	2407.14326	null
2024-07-19	Unlearning Concepts from Text-to-Video Diffusion Models	Shiqi Liu et.al.	2407.14209	null
2024-07-19	Time Series Generative Learning with Application to Brain Imaging Analysis	Zhenghao Li et.al.	2407.14003	null
2024-07-18	Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion	Boyang Deng et.al.	2407.13759	null
2024-07-18	Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models	Xiaoyu Zhu et.al.	2407.13642	null
2024-07-18	Training-free Composite Scene Generation for Layout-to-Image Synthesis	Jiaqi Liu et.al.	2407.13609	link
2024-07-18	Multi-sentence Video Grounding for Long Video Generation	Wei Feng et.al.	2407.13219	null
2024-07-18	Image Inpainting Models are Effective Tools for Instruction-guided Image Editing	Xuan Ju et.al.	2407.13139	null
2024-07-19	From Principles to Practices: Lessons Learned from Applying Partnership on AI’s (PAI) Synthetic Media Framework to 11 Use Cases	Claire R. Leibowicz et.al.	2407.13025	null
2024-07-17	Denoising Diffusions in Latent Space for Medical Image Segmentation	Fahim Ahmed Zaman et.al.	2407.12952	link
2024-07-17	VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control	Sherwin Bahmani et.al.	2407.12781	null
2024-07-17	Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs	Yiqing Shen et.al.	2407.12678	link
2024-07-17	Zero-shot Text-guided Infinite Image Synthesis with LLM guidance	Soyeong Kwon et.al.	2407.12642	null
2024-07-17	Towards Understanding Unsafe Video Generation	Yan Pang et.al.	2407.12581	link
2024-07-17	The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation	Yi Yao et.al.	2407.12579	null
2024-07-17	I2AM: Interpreting Image-to-Image Latent Diffusion Models via Attribution Maps	Junseo Park et.al.	2407.12331	null
2024-07-17	Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process	Yang Cheng et.al.	2407.12261	null
2024-07-18	Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts	Sam Yu-Te Lee et.al.	2407.12192	null
2024-07-16	Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis	Haeil Lee et.al.	2407.12173	null
2024-07-16	Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning	Yanting Miao et.al.	2407.12164	link
2024-07-16	Efficient Training with Denoised Neural Weights	Yifan Gong et.al.	2407.11966	null
2024-07-16	Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model	Dominik Winter et.al.	2407.11664	null
2024-07-16	Scaling Diffusion Transformers to 16 Billion Parameters	Zhengcong Fei et.al.	2407.11633	link
2024-07-16	DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training	Guillermo Jimenez-Perez et.al.	2407.11594	null
2024-07-16	How Control Information Influences Multilingual Text Image Generation and Editing?	Boqiang Zhang et.al.	2407.11502	link
2024-07-16	AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models	Lei Ren et.al.	2407.11480	null
2024-07-16	Cover-separable Fixed Neural Network Steganography via Deep Generative Models	Guobiao Li et.al.	2407.11405	link
2024-07-16	Flatfish Disease Detection Based on Part Segmentation Approach and Disease Image Generation	Seo-Bin Hwang et.al.	2407.11348	null
2024-07-16	Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems	Yaşar Utku Alçalar et.al.	2407.11288	null
2024-07-15	IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation	Yuanhao Zhai et.al.	2407.10937	link
2024-07-15	OPa-Ma: Text Guided Mamba for 360-degree Image Out-painting	Penglei Gao et.al.	2407.10923	null
2024-07-16	DataDream: Few-shot Guided Dataset Generation	Jae Myung Kim et.al.	2407.10910	link
2024-07-15	Optical Diffusion Models for Image Generation	Ilker Oguz et.al.	2407.10897	null
2024-07-15	Physics-Inspired Generative Models in Medical Imaging: A Review	Dennis Hein et.al.	2407.10856	null
2024-07-15	An Autonomous Drone Swarm for Detecting and Tracking Anomalies among Dense Vegetation	Rakesh John Amala Arokia Nathan et.al.	2407.10754	null
2024-07-15	AccDiffusion: An Accurate Method for Higher-Resolution Image Generation	Zhihang Lin et.al.	2407.10738	link
2024-07-15	Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval	Youngsun Lim et.al.	2407.10683	null
2024-07-15	Spatio-temporal neural distance fields for conditional generative modeling of the heart	Kristine Sørensen et.al.	2407.10663	link
2024-07-15	A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication	Jingyi Deng et.al.	2407.10575	null
2024-07-12	FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3	Georgios Makridis et.al.	2407.09467	null
2024-07-12	PID: Physics-Informed Diffusion Model for Infrared Image Generation	Fangyuan Mao et.al.	2407.09299	link
2024-07-12	Surgical Text-to-Image Generation	Chinedu Innocent Nwoye et.al.	2407.09230	null
2024-07-12	DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training	Chen Xin et.al.	2407.09174	link
2024-07-12	Machine Apophenia: The Kaleidoscopic Generation of Architectural Images	Alexey Tikhonov et.al.	2407.09172	null
2024-07-12	Inference Optimization of Foundation Models on AI Accelerators	Youngsuk Park et.al.	2407.09111	null
2024-07-12	Bora: Biomedical Generalist Video Generation Model	Weixiang Sun et.al.	2407.08944	null
2024-07-11	SEED-Story: Multimodal Long Story Generation with Large Language Model	Shuai Yang et.al.	2407.08683	link
2024-07-11	CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs	Leah Chong et.al.	2407.08675	null
2024-07-11	Still-Moving: Customized Video Generation without Customized Video Data	Hila Chefer et.al.	2407.08674	null
2024-07-11	A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights	Wentao Lei et.al.	2407.08428	link
2024-07-11	E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors	Jinxiu Liang et.al.	2407.08231	null
2024-07-10	Generative Image as Action Models	Mohit Shridhar et.al.	2407.07875	link
2024-07-10	StoryDiffusion: How to Support UX Storyboarding With Generative-AI	Zhaohui Liang et.al.	2407.07672	null
2024-07-10	VEnhancer: Generative Space-Time Enhancement for Video Generation	Jingwen He et.al.	2407.07667	null
2024-07-11	Trainable Highly-expressive Activation Functions	Irit Chelly et.al.	2407.07564	link
2024-07-10	Video-to-Audio Generation with Hidden Alignment	Manjie Xu et.al.	2407.07464	null
2024-07-10	Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis	Jian-Qing Zheng et.al.	2407.07295	link
2024-07-09	Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion	Yu Cao et.al.	2407.07249	null
2024-07-09	Accelerating Mobile Edge Generation (MEG) by Constrained Learning	Xiaoxia Xu et.al.	2407.07245	null
2024-07-09	ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction	Shaozhe Hao et.al.	2407.07077	link
2024-07-09	Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation	Filipe Lauar et.al.	2407.06950	link
2024-07-09	HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance	Guian Fang et.al.	2407.06937	link
2024-07-09	Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning	Fanyue Wei et.al.	2407.06642	link
2024-07-09	Mobius: An High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task	Yiran Yang et.al.	2407.06617	link
2024-07-09	Sketch-Guided Scene Image Generation	Tianyu Zhang et.al.	2407.06469	null
2024-07-08	MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions	Xuan Ju et.al.	2407.06358	null
2024-07-08	Dynamics of quantum turbulence in axially rotating thermal counterflow	Ritesh Dwivedi et.al.	2407.06311	link
2024-07-08	VIMI: Grounding Video Generation through Multi-modal Instruction	Yuwei Fang et.al.	2407.06304	null
2024-07-08	JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation	Yu Zeng et.al.	2407.06187	null
2024-07-08	The Tug-of-War Between Deepfake Generation and Detection	Hannah Lee et.al.	2407.06174	null
2024-07-08	PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models	Jinhua Zhang et.al.	2407.06109	link
2024-07-08	MMIS: Multimodal Dataset for Interior Scene Visual Generation and Recognition	Hozaifa Kassab et.al.	2407.05980	null
2024-07-08	T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models	Yibo Miao et.al.	2407.05965	null
2024-07-08	3D Vessel Graph Generation Using Denoising Diffusion	Chinmay Prabhakar et.al.	2407.05842	link
2024-07-08	GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing	Zhenyu Wang et.al.	2407.05600	null
2024-07-08	This&That: Language-Gesture Controlled Video Generation for Robot Planning	Boyang Wang et.al.	2407.05530	null
2024-07-07	Diffusion as Sound Propagation: Physics-inspired Model for Ultrasound Image Generation	Marina Domínguez et.al.	2407.05428	link
2024-07-07	Enhancing Label-efficient Medical Image Segmentation with Text-guided Diffusion Models	Chun-Mei Feng et.al.	2407.05323	null
2024-07-05	PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation	Yinghua Yao et.al.	2407.04493	link
2024-07-05	Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator	Mehryar Abbasi et.al.	2407.04258	null
2024-07-04	Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion	Yutian Zhong et.al.	2407.03992	link
2024-07-04	Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection	Federico Girella et.al.	2407.03961	link
2024-07-04	Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations	Zhiyang Xu et.al.	2407.03604	null
2024-07-03	BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations	Zhantao Yang et.al.	2407.03314	null
2024-07-03	Towards High Resolution Real-Time Optical Flow Particle Image Velocimetry	Juan Pimienta et.al.	2407.03057	null
2024-07-03	Robot Shape and Location Retention in Video Generation Using Diffusion Models	Peng Wang et.al.	2407.02873	link
2024-07-03	Representation learning with CGAN for casual inference	Zhaotian Weng et.al.	2407.02825	null
2024-07-03	Mobile Edge Generation-Enabled Digital Twin: Architecture Design and Research Opportunities	Xiaoxia Xu et.al.	2407.02804	link
2024-07-02	OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation	Kepan Nan et.al.	2407.02371	null
2024-07-04	UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks	Jingjing Ren et.al.	2407.02158	null
2024-07-02	SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules	Suyi Li et.al.	2407.02031	null
2024-07-04	GVDIFF: Grounded Text-to-Video Generation with Diffusion Models	Huanzhang Dou et.al.	2407.01921	null
2024-07-01	Label-free Neural Semantic Image Synthesis	Jiayi Wang et.al.	2407.01790	null
2024-06-30	BADM: Batch ADMM for Deep Learning	Ouya Wang et.al.	2407.01640	null
2024-07-01	Evaluation of Text-to-Video Generation Models: A Dynamics Perspective	Mingxiang Liao et.al.	2407.01094	link
2024-06-30	InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation	Haofan Wang et.al.	2407.00788	link
2024-06-30	Chest-Diffusion: A Light-Weight Text-to-Image Model for Report-to-CXR Generation	Peng Huang et.al.	2407.00752	null
2024-06-30	LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation	Mushui Liu et.al.	2407.00737	null
2024-06-28	Wavelets Are All You Need for Autoregressive Image Generation	Wael Mattar et.al.	2406.19997	null
2024-06-28	Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs	Sangwon Jeong et.al.	2406.19987	null
2024-06-28	MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance	Yuang Zhang et.al.	2406.19680	null
2024-06-28	PopAlign: Population-Level Alignment for Fair Text-to-Image Generation	Shufan Li et.al.	2406.19668	link
2024-06-28	Network Bending of Diffusion Models for Audio-Visual Generation	Luke Dzwonczyk et.al.	2406.19589	link
2024-06-27	What Matters in Detecting AI-Generated Videos like Sora?	Chirui Chang et.al.	2406.19568	null
2024-06-27	Understanding Modality Preferences in Search Clarification	Leila Tavakoli et.al.	2406.19546	link
2024-06-27	Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model	Jiangtong Tan et.al.	2406.19030	link
2024-06-28	AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation	Yanan Sun et.al.	2406.18958	link
2024-06-27	CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation	Zuo Zuo et.al.	2406.18941	null
2024-06-26	MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data	William Berman et.al.	2406.18790	null
2024-06-26	MultiDiff: Consistent Novel View Synthesis from a Single Image	Norman Müller et.al.	2406.18524	null
2024-06-26	ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation	Shenghai Yuan et.al.	2406.18522	link
2024-06-26	DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance	Younghyun Kim et.al.	2406.18459	link
2024-06-25	Text-Animator: Controllable Visual Text Video Generation	Lin Liu et.al.	2406.17777	null
2024-06-25	MotionBooth: Motion-Aware Customized Text-to-Video Generation	Jianzong Wu et.al.	2406.17758	null
2024-06-25	Detection of Synthetic Face Images: Accuracy, Robustness, Generalization	Nela Petrzelkova et.al.	2406.17547	null
2024-06-25	TSynD: Targeted Synthetic Data Generation for Enhanced Medical Image Classification	Joshua Niemeijer et.al.	2406.17473	null
2024-06-25	SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing	Ruihuang Li et.al.	2406.17396	null
2024-06-25	Semantic Deep Hiding for Robust Unlearnable Examples	Ruohan Meng et.al.	2406.17349	null
2024-06-25	Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers	Lei Chen et.al.	2406.17343	link
2024-06-25	Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds	Hongliang Zeng et.al.	2406.17342	null
2024-06-24	Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation	Zhenyi Liao et.al.	2406.17100	link
2024-06-24	FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models	Haonan Qiu et.al.	2406.16863	link
2024-06-24	Dreamitate: Real-World Visuomotor Policy Learning via Video Generation	Junbang Liang et.al.	2406.16862	null
2024-06-24	DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation	Yuang Peng et.al.	2406.16855	link
2024-06-24	Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation	Katherine M. Collins et.al.	2406.16807	null
2024-06-24	Repulsive Score Distillation for Diverse Sampling of Diffusion Models	Nicolas Zilberstein et.al.	2406.16683	link
2024-06-24	EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations	Zhiyu Tan et.al.	2406.16562	link
2024-06-24	Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization	Yuhang Ma et.al.	2406.16537	link
2024-06-24	ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance	Shuwei Shi et.al.	2406.16476	null
2024-06-24	Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models	Yichen Sun et.al.	2406.16333	null
2024-06-24	Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement	Zhiyuan Chang et.al.	2406.16272	link
2024-06-21	MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation	Xuan He et.al.	2406.15252	null
2024-06-21	Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors	Ali Naseh et.al.	2406.15213	link
2024-06-21	Disability Representations: Finding Biases in Automatic Image Generation	Yannis Tevissen et.al.	2406.14993	null
2024-06-21	Latent diffusion models for parameterization and data assimilation of facies-based geomodels	Guido Di Federico et.al.	2406.14815	null
2024-06-20	Evaluating Numerical Reasoning in Text-to-Image Models	Ivana Kajić et.al.	2406.14774	link
2024-06-20	Holistic Evaluation for Interleaved Text-and-Image Generation	Minqian Liu et.al.	2406.14643	null
2024-06-20	Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps	Nikita Starodubcev et.al.	2406.14539	null
2024-06-20	Fantastic Copyrighted Beasts and How (Not) to Generate Them	Luxi He et.al.	2406.14526	null
2024-06-20	SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset	Josef Dai et.al.	2406.14477	link
2024-06-20	Video Generation with Learned Action Prior	Meenakshi Sarkar et.al.	2406.14436	null
2024-06-20	ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning	Zhongjie Duan et.al.	2406.14130	link
2024-06-19	Splatter a Video: Video Gaussian Representation for Versatile Processing	Yang-Tian Sun et.al.	2406.13870	null
2024-06-19	GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation	Baiqi Li et.al.	2406.13743	link
2024-06-19	Development of a Dual-Input Neural Model for Detecting AI-Generated Imagery	Jonathan Gallagher et.al.	2406.13688	null
2024-06-19	Improving Visual Commonsense in Language Models via Multiple Image Generation	Guy Yariv et.al.	2406.13621	link
2024-06-19	What’s Next? Exploring Utilization, Challenges, and Future Directions of AI-Generated Image Tools in Graphic Design	Yuying Tang et.al.	2406.13436	null
2024-06-19	AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation	Xinyu Hou et.al.	2406.12805	link
2024-06-18	Unmasking the Veil: An Investigation into Concept Ablation for Privacy and Copyright Protection in Images	Shivank Garg et.al.	2406.12592	link
2024-06-18	Training Diffusion Models with Federated Learning	Matthijs de Goede et.al.	2406.12575	null
2024-06-18	Generative Artificial Intelligence-Guided User Studies: An Application for Air Taxi Services	Shengdi Xiao et.al.	2406.12296	null
2024-06-17	ARTIST: Improving the Generation of Text-rich Images by Disentanglement	Jianyi Zhang et.al.	2406.12044	null
2024-06-17	Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models	Alireza Ganjdanesh et.al.	2406.12042	link
2024-06-17	Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI	Robert Hönig et.al.	2406.12027	link
2024-06-17	Decomposed evaluations of geographic disparities in text-to-image models	Abhishek Sureddy et.al.	2406.11988	null
2024-06-17	Autoregressive Image Generation without Vector Quantization	Tianhong Li et.al.	2406.11838	link
2024-06-17	Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%	Lei Zhu et.al.	2406.11837	link
2024-06-17	Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models	Bingqi Ma et.al.	2406.11831	null
2024-06-17	PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models	Fanqing Meng et.al.	2406.11802	link
2024-06-17	Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes	Aghiles Kebaili et.al.	2406.11659	null
2024-06-17	GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation	Shihao Cai et.al.	2406.11503	link
2024-06-17	Generative Visual Instruction Tuning	Jefferson Hernandez et.al.	2406.11262	link
2024-06-17	NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation	Niu Guanchen et.al.	2406.11259	null
2024-06-17	Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion	Rishab Parthasarathy et.al.	2406.11196	link
2024-06-16	An Analysis on Quantizing Diffusion Transformers	Yuewei Yang et.al.	2406.11100	null
2024-06-14	Make It Count: Text-to-Image Generation with an Accurate Number of Objects	Lital Binyamin et.al.	2406.10210	null
2024-06-14	Crafting Parts for Expressive Object Composition	Harsh Rangwani et.al.	2406.10197	null
2024-06-14	Training-free Camera Control for Video Generation	Chen Hou et.al.	2406.10126	null
2024-06-14	High-efficiency generation of vectorial holograms with metasurfaces	Tong Liu et.al.	2406.10072	null
2024-06-14	BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval	Imanol Miranda et.al.	2406.09952	link
2024-06-14	ControlVAR: Exploring Controllable Visual Autoregressive Modeling	Xiang Li et.al.	2406.09750	link
2024-06-13	Turns Out I’m Not Real: Towards Robust Detection of AI-Generated Videos	Qingyuan Liu et.al.	2406.09601	null
2024-06-13	You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes	Jabez Magomere et.al.	2406.09496	link
2024-06-13	Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models	Qihao Liu et.al.	2406.09416	link
2024-06-13	An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels	Duy-Kien Nguyen et.al.	2406.09415	null
2024-06-13	Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs	Zijia Zhao et.al.	2406.09367	link
2024-06-13	Understanding Hallucinations in Diffusion Models through Mode Interpolation	Sumukh K Aithal et.al.	2406.09358	link
2024-06-13	Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps	Shuqing Li et.al.	2406.09313	null
2024-06-13	Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation	Yufan Zhou et.al.	2406.09305	null
2024-06-13	StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning	Giuseppe Vecchio et.al.	2406.09293	null
2024-06-13	EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts	Yucheng Han et.al.	2406.09162	null
2024-06-13	Complex Image-Generative Diffusion Transformer for Audio Denoising	Junhui Li et.al.	2406.09161	null
2024-06-13	EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts	Zahraa Al Sahili et.al.	2406.09070	null
2024-06-12	Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation	Raphael Tang et.al.	2406.08482	null
2024-06-12	What If We Recaption Billions of Web Images with LLaMA-3?	Xianhang Li et.al.	2406.08478	null
2024-06-12	PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences	Daiwei Chen et.al.	2406.08469	link
2024-06-12	Diffusion Soup: Model Merging for Text-to-Image Diffusion Models	Benjamin Biggs et.al.	2406.08431	null
2024-06-12	VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks	Jiannan Wu et.al.	2406.08394	link
2024-06-12	FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation	Xinzhi Mu et.al.	2406.08392	null
2024-06-12	WMAdapter: Adding WaterMark Control to Latent Diffusion Models	Hai Ci et.al.	2406.08337	null
2024-06-12	CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models	Hyungjin Chung et.al.	2406.08070	null
2024-06-12	Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models	Arman Zarei et.al.	2406.07844	link
2024-06-12	Hierarchical Patch Diffusion Models for High-Resolution Video Generation	Ivan Skorokhodov et.al.	2406.07792	null
2024-06-11	Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?	Xingyu Fu et.al.	2406.07546	null
2024-06-11	Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance	Kuan Heng Lin et.al.	2406.07540	null
2024-06-11	Neural Gaffer: Relighting Any Object via Diffusion	Haian Jin et.al.	2406.07520	null
2024-06-11	Instant 3D Human Avatar Generation using Image Diffusion Models	Nikos Kolotouros et.al.	2406.07516	null
2024-06-11	Understanding Visual Concepts Across Models	Brandon Trabucco et.al.	2406.07506	link
2024-06-11	Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions	Renjie Pi et.al.	2406.07502	link
2024-06-12	SPIN: Spacecraft Imagery for Navigation	Javier Montalvo et.al.	2406.07500	link
2024-06-11	4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models	Heng Yu et.al.	2406.07472	null
2024-06-11	Beware of Aliases – Signal Preservation is Crucial for Robust Image Restoration	Shashank Agnihotri et.al.	2406.07435	null
2024-06-11	Visual Representation Learning with Stochastic Frame Prediction	Huiwon Jang et.al.	2406.07398	null
2024-06-10	Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation	Peize Sun et.al.	2406.06525	link
2024-06-10	The Effect of Training Dataset Size on Discriminative and Diffusion-Based Speech Enhancement Systems	Philippe Gonzalez et.al.	2406.06160	null
2024-06-10	ProcessPainter: Learn Painting Process from Sequence Data	Yiren Song et.al.	2406.06062	link
2024-06-09	OmniControlNet: Dual-stage Integration for Conditional Image Generation	Yilin Wang et.al.	2406.05871	null
2024-06-09	Unified Text-to-Image Generation and Retrieval	Leigang Qu et.al.	2406.05814	null
2024-06-11	MLCM: Multistep Consistency Distillation of Latent Diffusion Model	Qingsong Xie et.al.	2406.05768	link
2024-06-09	Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion	Ge Ya Luo et.al.	2406.05630	link
2024-06-09	Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models	Philip Wootaek Shin et.al.	2406.05602	null
2024-06-08	Medical Vision Generalist: Unifying Medical Imaging Tasks in Context	Sucheng Ren et.al.	2406.05565	link
2024-06-08	Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models	Minho Park et.al.	2406.05432	link
2024-06-07	CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion	Xingrui Wang et.al.	2406.05082	null
2024-06-07	GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications	Shakhnaz Akhmedova et.al.	2406.05023	link
2024-06-07	AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation	Lianyu Pang et.al.	2406.05000	null
2024-06-07	Zero-Shot Video Editing through Adaptive Sliding Score Distillation	Lianghan Zhu et.al.	2406.04888	null
2024-06-07	Online Continual Learning of Video Diffusion Models From a Single Video Stream	Jason Yoo et.al.	2406.04814	null
2024-06-07	TEDi Policy: Temporally Entangled Diffusion for Robotic Control	Sigmund H. Høeg et.al.	2406.04806	link
2024-06-07	PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction	Eduard Poesina et.al.	2406.04746	link
2024-06-07	GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models	Diptanu De et.al.	2406.04654	null
2024-06-07	CLoG: Benchmarking Continual Learning of Image Generation Models	Haotian Zhang et.al.	2406.04584	link
2024-06-06	Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance	Reyhane Askari Hemmat et.al.	2406.04551	null
2024-06-06	Coherent Zero-Shot Visual Instruction Generation	Quynh Phung et.al.	2406.04337	null
2024-06-06	BitsFusion: 1.99 bits Weight Quantization of Diffusion Model	Yang Sui et.al.	2406.04333	link
2024-06-06	ShareGPT4Video: Improving Video Understanding and Generation with Better Captions	Lin Chen et.al.	2406.04325	null
2024-06-06	SF-V: Single Forward Video Generation Model	Zhixing Zhang et.al.	2406.04324	link
2024-06-06	VideoTetris: Towards Compositional Text-to-Video Generation	Ye Tian et.al.	2406.04277	link
2024-06-06	Diffusion-based image inpainting with internal learning	Nicolas Cherel et.al.	2406.04206	link
2024-06-06	Machine Learning-Driven Microwave Imaging for Soil Moisture Estimation near Leaky Pipe	Mohammad Ramezaninia et.al.	2406.04193	null
2024-06-06	Quantum Implicit Neural Representations	Jiaming Zhao et.al.	2406.03873	link
2024-06-06	Semantic Similarity Score for Measuring Visual Similarity at Semantic Level	Senran Fan et.al.	2406.03865	null
2024-06-06	Malware Classification Based on Image Segmentation	Wanhu Nie et.al.	2406.03831	link
2024-06-05	Tackling GenAI Copyright Issues: Originality Estimation and Genericization	Hiroaki Chiba-Okabe et.al.	2406.03341	link
2024-06-05	Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion	Hao Wen et.al.	2406.03184	link
2024-06-05	Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control	Jingyun Xue et.al.	2406.03035	null
2024-06-05	Language-guided Detection and Mitigation of Unknown Dataset Bias	Zaiying Zhao et.al.	2406.02889	null
2024-06-06	Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter	Peng Xing et.al.	2406.02881	null
2024-06-04	Latent Style-based Quantum GAN for high-quality Image Generation	Su Yeon Chang et.al.	2406.02668	null
2024-06-04	ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation	Tianchen Zhao et.al.	2406.02540	link
2024-06-04	DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering	Zhongpai Gao et.al.	2406.02518	null
2024-06-04	V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation	Cong Wang et.al.	2406.02511	null
2024-06-04	CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation	Dejia Xu et.al.	2406.02509	null
2024-06-04	Guiding a Diffusion Model with a Bad Version of Itself	Tero Karras et.al.	2406.02507	link
2024-06-04	Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation	Jiajun Wang et.al.	2406.02485	link
2024-06-04	Generative Active Learning for Long-tailed Instance Segmentation	Muzhi Zhu et.al.	2406.02435	link
2024-06-04	Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation	Clement Chadebec et.al.	2406.02347	link
2024-06-04	I4VGen: Image as Stepping Stone for Text-to-Video Generation	Xiefan Guo et.al.	2406.02230	null
2024-06-04	The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise	Yuanhao Ban et.al.	2406.01970	null
2024-05-31	Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling	Jiatao Gu et.al.	2405.21048	null
2024-05-31	You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet	Zhen Qin et.al.	2405.21022	null
2024-05-31	Amortizing intractable inference in diffusion models for vision, language, and control	Siddarth Venkatraman et.al.	2405.20971	link
2024-05-31	Information Theoretic Text-to-Image Alignment	Chao Wang et.al.	2405.20759	null
2024-05-31	Diffusion Models Are Innate One-Step Generators	Bowen Zheng et.al.	2405.20750	link
2024-05-31	Cyclic image generation using chaotic dynamics	Takaya Tanaka et.al.	2405.20717	link
2024-05-31	Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space	Yukai Zhang et.al.	2405.20685	null
2024-05-31	4Diffusion: Multi-view Video Diffusion Model for 4D Generation	Haiyu Zhang et.al.	2405.20674	null
2024-05-31	Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation	Shuzhou Yang et.al.	2405.20669	link
2024-05-31	Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization	Yisu Liu et.al.	2405.20584	link
2024-05-30	Improving the Training of Rectified Flows	Sangyun Lee et.al.	2405.20320	link
2024-05-30	CV-VAE: A Compatible Video VAE for Latent Generative Video Models	Sijie Zhao et.al.	2405.20279	link
2024-05-30	MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model	Muyao Niu et.al.	2405.20222	link
2024-05-30	Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback	Sanghyeon Na et.al.	2405.20216	null
2024-05-30	RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection	Zhiyuan He et.al.	2405.20112	null
2024-05-30	Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion	Jiangkai Wu et.al.	2405.20032	link
2024-05-30	Mitigating annotation shift in cancer classification using single image generative models	Marta Buetas Arcas et.al.	2405.19754	link
2024-05-30	DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark	Haoxing Chen et.al.	2405.19707	link
2024-05-30	Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian	Wei Sun et.al.	2405.19657	null
2024-05-29	MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter Selection	Raman Dutt et.al.	2405.19458	link
2024-05-29	ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning	Ruchika Chavhan et.al.	2405.19237	link
2024-05-29	Going beyond compositional generalization, DDPMs can produce zero-shot interpolation	Justin Deschenaux et.al.	2405.19201	link
2024-05-29	The ethical situation of DALL-E 2	Eduard Hogea et.al.	2405.19176	null
2024-05-29	Patch-enhanced Mask Encoder Prompt Image Generation	Shusong Xu et.al.	2405.19085	null
2024-05-29	EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture	Jiaqi Xu et.al.	2405.18991	link
2024-05-29	Topological Perspectives on Optimal Multimodal Embedding Spaces	Abdul Aziz A. B et.al.	2405.18867	null
2024-05-30	Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation	Jiyoon Myung et.al.	2405.18762	null
2024-05-29	T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback	Jiachen Li et.al.	2405.18750	link
2024-05-29	SketchDeco: Decorating B&W Sketches with Colour	Chaitat Utintu et.al.	2405.18716	link
2024-05-28	Scalable Surrogate Verification of Image-based Neural Network Control Systems using Composition and Unrolling	Feiyang Cai et.al.	2405.18554	null
2024-05-28	Phased Consistency Model	Fu-Yun Wang et.al.	2405.18407	link
2024-05-28	RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives	Jaehong Yoon et.al.	2405.18406	link
2024-05-28	VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers	Jun Zheng et.al.	2405.18326	null
2024-05-28	Multi-modal Generation via Cross-Modal In-Context Learning	Amandeep Kumar et.al.	2405.18304	link
2024-05-28	EG4D: Explicit Generation of 4D Object without Score Distillation	Qi Sun et.al.	2405.18132	link
2024-05-28	Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers?	Zebin You et.al.	2405.18029	null
2024-05-28	MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling	Bowen Zhang et.al.	2405.18003	link
2024-05-28	Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage Detection	Zhengji Li et.al.	2405.17905	null
2024-05-28	Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation	Akio Hayakawa et.al.	2405.17842	link
2024-05-27	RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance	Jiaojiao Fan et.al.	2405.17661	null
2024-05-27	Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control	Zhengfei Kuang et.al.	2405.17414	null
2024-05-27	Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer	Ruizhi Shao et.al.	2405.17405	null
2024-05-27	Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability	Shenyuan Gao et.al.	2405.17398	link
2024-05-27	Prompt Optimization with Human Feedback	Xiaoqiang Lin et.al.	2405.17346	link
2024-05-28	Controllable Longer Image Animation with Diffusion Models	Qiang Wang et.al.	2405.17306	null
2024-05-27	Training-free Editioning of Text-to-Image Models	Jinqi Wang et.al.	2405.17069	null
2024-05-27	The Poisson Midpoint Method for Langevin Dynamics: Provably Efficient Discretization for Diffusion Models	Saravanan Kandasamy et.al.	2405.17068	null
2024-05-27	Glauber Generative Model: Discrete Diffusion Models via Binary Classification	Harshit Varma et.al.	2405.17035	null
2024-05-27	Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation	Liang Shi et.al.	2405.16895	null
2024-05-27	Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks	Yunqi Zhang et.al.	2405.16860	link
2024-05-24	A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence	Ali Kashefi et.al.	2405.15406	link
2024-05-24	Stochastic SR for Gaussian microtextures	Emile Pierret et.al.	2405.15399	null
2024-05-24	Challenges and Opportunities in 3D Content Generation	Ke Zhao et.al.	2405.15335	null
2024-05-24	Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model	Mingyang Yi et.al.	2405.15330	null
2024-05-24	SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance	Guibao Shen et.al.	2405.15321	null
2024-05-24	Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient	Yongliang Wu et.al.	2405.15304	link
2024-05-24	StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models	Chengming Xu et.al.	2405.15287	null
2024-05-24	Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models	Yimeng Zhang et.al.	2405.15234	link
2024-05-24	iVideoGPT: Interactive VideoGPTs are Scalable World Models	Jialong Wu et.al.	2405.15223	link
2024-05-24	ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models	Jingyuan Zhu et.al.	2405.15199	null
2024-05-23	Improved Distribution Matching Distillation for Fast Image Synthesis	Tianwei Yin et.al.	2405.14867	link
2024-05-23	Video Diffusion Models are Training-free Motion Interpreter and Controller	Zeqi Xiao et.al.	2405.14864	null
2024-05-23	Semantica: An Adaptable Image-Conditioned Diffusion Model	Manoj Kumar et.al.	2405.14857	null
2024-05-23	TerDiT: Ternary Diffusion Models with Transformers	Xudong Lu et.al.	2405.14854	link
2024-05-23	Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models	Katherine Xu et.al.	2405.14828	null
2024-05-24	Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation	Hongxu Jiang et.al.	2405.14802	link
2024-05-23	Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy	Shengfang Zhai et.al.	2405.14800	link
2024-05-23	RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices	Qiaoyi Chen et.al.	2405.14794	null
2024-05-23	OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance	Shuheng Ge et.al.	2405.14709	null
2024-05-23	Learning Multi-dimensional Human Preference for Text-to-Image Generation	Sixian Zhang et.al.	2405.14705	null
2024-05-21	Personalized Residuals for Concept-Driven Text-to-Image Generation	Cusuh Ham et.al.	2405.12978	null
2024-05-21	An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation	Zhiyu Tan et.al.	2405.12914	link
2024-05-21	OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models	Zhaojian Yu et.al.	2405.12843	link
2024-05-21	DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control	Hong Chen et.al.	2405.12796	null
2024-05-21	Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations	Antoine Legrand et.al.	2405.12728	null
2024-05-21	CustomText: Customized Textual Image Generation using Diffusion Models	Shubham Paliwal et.al.	2405.12531	null
2024-05-20	Diffusion for World Modeling: Visual Details Matter in Atari	Eloi Alonso et.al.	2405.12399	link
2024-05-20	Diffusion Models for Generating Ballistic Spacecraft Trajectories	Tyler Presser et.al.	2405.11738	link
2024-05-19	URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images	Zoey Chen et.al.	2405.11656	null
2024-05-19	FIFO-Diffusion: Generating Infinite Videos from Text without Training	Jihwan Kim et.al.	2405.11473	link
2024-05-18	UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers	Duo Peng et.al.	2405.11336	null
2024-05-18	On the Trajectory Regularity of ODE-based Diffusion Sampling	Defang Chen et.al.	2405.11326	link
2024-05-18	TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation	Chengcheng Feng et.al.	2405.11236	null
2024-05-17	Improving face generation quality and prompt following with synthetic captions	Michail Tarasiou et.al.	2405.10864	null
2024-05-17	From Sora What We Can See: A Survey of Text-to-Video Generation	Rui Sun et.al.	2405.10674	link
2024-05-17	Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image	Jianshun Zeng et.al.	2405.10504	null
2024-05-17	Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers	Rya Sanovar et.al.	2405.10480	null
2024-05-16	UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models	Sahel Sharifymoghaddam et.al.	2405.10311	link
2024-05-16	VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing	Binghui Chen et.al.	2405.09985	null
2024-05-16	Chameleon: Mixed-Modal Early-Fusion Foundation Models	Chameleon Team et.al.	2405.09818	null
2024-05-16	Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images	Memoona Aziz et.al.	2405.09426	null
2024-05-15	DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations	Nima Fathi et.al.	2405.09288	link
2024-05-15	Dance Any Beat: Blending Beats with Visuals in Dance Video Generation	Xuanchen Wang et.al.	2405.09266	null
2024-05-14	Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding	Zhimin Li et.al.	2405.08748	link
2024-05-13	The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective	Andrew Shin et.al.	2405.08720	null
2024-05-14	Compositional Text-to-Image Generation with Dense Blob Representations	Weili Nie et.al.	2405.08246	null
2024-05-13	CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models	Nick Stracke et.al.	2405.07913	null
2024-05-13	SAR Image Synthesis with Diffusion Models	Denisa Qosja et.al.	2405.07776	null
2024-05-12	Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning	Jiarui Wang et.al.	2405.07346	link
2024-05-12	Stable Signature is Unstable: Removing Image Watermark from Diffusion Models	Yuepeng Hu et.al.	2405.07145	null
2024-05-12	MAxPrototyper: A Multi-Agent Generation System for Interactive User Interface Prototyping	Mingyue Yuan et.al.	2405.07131	null
2024-05-11	Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior	Ce Wang et.al.	2405.07044	link
2024-05-11	Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation	Shengyuan Liu et.al.	2405.06948	null
2024-05-10	Deep MMD Gradient Flow without adversarial training	Alexandre Galashov et.al.	2405.06780	null
2024-05-10	OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation	Jinwei Lin et.al.	2405.06547	link
2024-05-10	Controllable Image Generation With Composed Parallel Token Prediction	Jamie Stirling et.al.	2405.06535	null
2024-05-10	SketchDream: Sketch-based Text-to-3D Generation and Editing	Feng-Lin Liu et.al.	2405.06461	null
2024-05-09	Photonic quantum generative adversarial networks for classical data	Tigran Sedrakyan et.al.	2405.06023	link
2024-05-09	Frame Interpolation with Consecutive Brownian Bridge Diffusion	Zonglin Lyu et.al.	2405.05953	link
2024-05-09	Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models	Zhe Ma et.al.	2405.05846	link
2024-05-10	MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation	Yuxiang Wei et.al.	2405.05806	link
2024-05-09	Exploring Text-Guided Single Image Editing for Remote Sensing Images	Fangzhou Han et.al.	2405.05769	link
2024-05-09	End-to-End Generative Semantic Communication Powered by Shared Semantic Knowledge Base	Shuling Li et.al.	2405.05738	null
2024-05-09	A Survey on Personalized Content Synthesis with Diffusion Models	Xulu Zhang et.al.	2405.05538	null
2024-05-08	Cross-Modality Translation with Generative Adversarial Networks to Unveil Alzheimer’s Disease Biomarkers	Reihaneh Hassanzadeh et.al.	2405.05462	null
2024-05-08	DrawL: Understanding the Effects of Non-Mainstream Dialects in Prompted Image Generation	Joshua N. Williams et.al.	2405.05382	link
2024-05-08	Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo	Nayantara Mudur et.al.	2405.05255	link
2024-05-08	Reviewing Intelligent Cinematography: AI research for camera-based video production	Adrian Azzarelli et.al.	2405.05039	null
2024-05-08	Discrepancy-based Diffusion Models for Lesion Detection in Brain MRI	Keqiang Fan et.al.	2405.04974	null
2024-05-08	FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation	Xuehai He et.al.	2405.04834	null
2024-05-07	TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation	Hritik Bansal et.al.	2405.04682	link
2024-05-07	TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model	Yongming Zhang et.al.	2405.04675	null
2024-05-07	Towards Geographic Inclusion in the Evaluation of Text-to-Image Models	Melissa Hall et.al.	2405.04457	null
2024-05-07	Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation	Jihyun Kim et.al.	2405.04356	link
2024-05-07	Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation	Dogucan Yaman et.al.	2405.04327	null
2024-05-08	Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer	Zhuoyi Yang et.al.	2405.04312	link
2024-05-07	Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map	Yuxuan Xia et.al.	2405.04290	null
2024-05-07	Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models	Fan Bao et.al.	2405.04233	null
2024-05-07	Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models	Zhixuan Chu et.al.	2405.04180	link
2024-05-07	Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method	Peisong He et.al.	2405.04133	null
2024-05-07	Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model	Joo Young Choi et.al.	2405.03958	null
2024-05-06	CCDM: Continuous Conditional Diffusion Models for Image Generation	Xin Ding et.al.	2405.03546	link
2024-05-06	Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond	Zheng Zhu et.al.	2405.03520	link
2024-05-06	Video Diffusion Models: A Survey	Andrew Melnik et.al.	2405.03150	link
2024-05-05	Matten: Video Generation with Mamba-Attention	Yu Gao et.al.	2405.03025	null
2024-05-05	Data-Efficient Molecular Generation with Hierarchical Textual Inversion	Seojin Kim et.al.	2405.02845	link
2024-05-05	ImageInWords: Unlocking Hyper-Detailed Image Descriptions	Roopal Garg et.al.	2405.02793	link
2024-05-04	U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers	Yuchuan Tian et.al.	2405.02730	link
2024-05-03	Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification	Siqi Yin et.al.	2405.02155	null
2024-05-03	AI-generated art perceptions with GenFrame – an image-generating picture frame	Peter Kun et.al.	2405.01901	null
2024-05-03	Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition	Yichun Tai et.al.	2405.01872	null
2024-05-02	Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning	Rafael Elberg et.al.	2405.01705	link
2024-05-02	StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation	Yupeng Zhou et.al.	2405.01434	link
2024-05-02	Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration	Praveen Kumar Chandaliya et.al.	2405.01273	null
2024-05-02	DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines	Ye Tian et.al.	2405.01248	null
2024-05-02	On Mechanistic Knowledge Localization in Text-to-Image Generative Models	Samyadeep Basu et.al.	2405.01008	link
2024-05-01	SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models	Burak Can Biner et.al.	2405.00878	null
2024-05-01	UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement	Ruiquan Ge et.al.	2405.00542	link
2024-05-01	Compressive Sensing Imaging Using Caustic Lens Mask Generated by Periodic Perturbation in a Ripple Tank	Doğan Tunca Arık et.al.	2405.00407	null
2024-05-01	Streamlining Image Editing with Layered Diffusion Brushes	Peyman Gholami et.al.	2405.00313	null
2024-04-30	DOCCI: Descriptions of Connected and Contrasting Images	Yasumasa Onoe et.al.	2404.19753	null
2024-04-30	Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Yunhao Ge et.al.	2404.19752	null
2024-04-30	SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration	Yuto Nakashima et.al.	2404.19693	null
2024-04-30	VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization	Yuliang Liu et.al.	2404.19652	link
2024-04-30	TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models	Teng Zhou et.al.	2404.19475	link
2024-04-30	InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation	Chanran Kim et.al.	2404.19427	null
2024-04-30	Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model	Wentao Lei et.al.	2404.19277	null
2024-05-01	FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills	Yongqiang Zhao et.al.	2404.19217	link
2024-04-30	NeRF-Insert: 3D Local Editing with Multimodal Control Signals	Benet Oriol Sabat et.al.	2404.19204	null
2024-04-29	DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing	Minghao Chen et.al.	2404.18929	null
2024-04-29	TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation	Junhao Cheng et.al.	2404.18919	link
2024-04-29	Hide and Seek: How Does Watermarking Impact Face Recognition?	Yuguang Yao et.al.	2404.18890	null
2024-04-29	Learning Mixtures of Gaussians Using Diffusion Models	Khashayar Gatmiry et.al.	2404.18869	null
2024-04-29	FlexiFilm: Long Video Generation with Flexible Conditions	Yichen Ouyang et.al.	2404.18620	link
2024-04-29	Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting	Tianyidan Xie et.al.	2404.18598	null
2024-04-29	Autonomous Quality and Hallucination Assessment for Virtual Tissue Staining and Digital Pathology	Luzhe Huang et.al.	2404.18458	null
2024-04-29	PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated Images	Jiquan Yuan et.al.	2404.18409	link
2024-04-30	Equivalence: An analysis of artists’ roles with Image Generative AI from Conceptual Art perspective through an interactive installation design practice	Yixuan Li et.al.	2404.18385	null
2024-04-29	G-Refine: A General Quality Refiner for Text-to-Image Generation	Chunyi Li et.al.	2404.18343	link
2024-04-26	Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement	Zishu Yao et.al.	2404.17400	link
2024-04-26	Trinity Detector: text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection	Jiawei Song et.al.	2404.17254	null
2024-04-26	Synthesizing Iris Images using Generative Adversarial Networks: Survey and Comparative Analysis	Shivangi Yadav et.al.	2404.17105	null
2024-04-25	REBEL: Reinforcement Learning via Regressing Relative Rewards	Zhaolin Gao et.al.	2404.16767	link
2024-04-27	Denoising: from classical methods to deep CNNs	Jean-Eric Campagne et.al.	2404.16617	link
2024-04-25	MuseumMaker: Continual Style Customization without Catastrophic Forgetting	Chenxi Liu et.al.	2404.16612	null
2024-04-25	Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models	Parul Gupta et.al.	2404.16556	null
2024-04-25	OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images	Ye Mao et.al.	2404.16538	link
2024-04-25	Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series	Aimi Okabayashi et.al.	2404.16409	link
2024-04-25	TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models	Haomiao Ni et.al.	2404.16306	link
2024-04-26	Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model	Gehui Chen et.al.	2404.16305	null
2024-04-26	Guardians of the Quantum GAN	Archisman Ghosh et.al.	2404.16156	null
2024-04-24	Spinning solar jets explained through the interplay between plasma sheets and vortex columns	Sahel Dey et.al.	2404.16096	null
2024-04-23	ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning	Weifeng Chen et.al.	2404.15449	null
2024-04-23	GLoD: Composing Global Contexts and Local Details in Image Generation	Moyuru Yamada et.al.	2404.15447	null
2024-04-23	ID-Animator: Zero-Shot Identity-Preserving Human Video Generation	Xuanhua He et.al.	2404.15275	link
2024-04-23	From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation	Zehuan Huang et.al.	2404.15267	link
2024-04-23	Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment	Tianwei Zhou et.al.	2404.15163	null
2024-04-23	Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation	Xun Wu et.al.	2404.15100	null
2024-04-23	SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models	Bo Lin et.al.	2404.14755	null
2024-04-23	FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction	Hang Hua et.al.	2404.14715	null
2024-04-22	The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking	Yuying Li et.al.	2404.14581	null
2024-04-22	GeoDiffuser: Geometry-Based Image Editing with Diffusion Models	Rahul Sajnani et.al.	2404.14403	null
2024-04-22	SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation	Yuying Ge et.al.	2404.14396	link
2024-04-22	TAVGBench: Benchmarking Text to Audible-Video Generation	Yuxin Mao et.al.	2404.14381	link
2024-04-22	MultiBooth: Towards Generating All Your Concepts in an Image from Text	Chenyang Zhu et.al.	2404.14239	link
2024-04-22	RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance	Chengrui Wang et.al.	2404.13984	null
2024-04-23	Accelerating Image Generation with Sub-path Linear Approximation Model	Chen Xu et.al.	2404.13903	null
2024-04-22	Towards Better Text-to-Image Generation Alignment via Attention Modulation	Yihang Wu et.al.	2404.13899	null
2024-04-21	Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation	Jensen Hwa et.al.	2404.13798	null
2024-04-21	Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control	Maria Mihaela Trusca et.al.	2404.13766	null
2024-04-21	Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models	Vitali Petsiuk et.al.	2404.13706	null
2024-04-19	PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation	Tianyuan Zhang et.al.	2404.13026	null
2024-04-19	Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images	Santosh et.al.	2404.12908	link
2024-04-19	ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model	Dingming Liu et.al.	2404.12903	null
2024-04-19	Generative Modelling with High-Order Langevin Dynamics	Ziqiang Shi et.al.	2404.12814	null
2024-04-19	How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples	Dren Fazlija et.al.	2404.12653	null
2024-04-18	On the Content Bias in Fréchet Video Distance	Songwei Ge et.al.	2404.12391	null
2024-04-18	RoboDreamer: Learning Compositional World Models for Robot Imagination	Siyuan Zhou et.al.	2404.12377	null
2024-04-18	AniClipart: Clipart Animation with Text-to-Video Priors	Ronghuan Wu et.al.	2404.12347	null
2024-04-18	Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models	Israel A. Laurensi et.al.	2404.12260	null
2024-04-18	First 2D electron density measurements using Coherence Imaging Spectroscopy in the MAST-U Super-X divertor	N. Lonigro et.al.	2404.12021	null
2024-04-18	©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model	Chao Zhou et.al.	2404.11962	link
2024-04-18	LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights	Thibault Castells et.al.	2404.11936	null
2024-04-18	EdgeFusion: On-Device Text-to-Image Generation	Thibault Castells et.al.	2404.11925	null
2024-04-18	TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation	Tianyi Liang et.al.	2404.11824	link
2024-04-17	On the Scalability of GNNs for Molecular Graphs	Maciej Sypetkowski et.al.	2404.11568	null
2024-04-17	MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation	Kuan-Chieh et.al.	2404.11565	null
2024-04-17	SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening	Yu Zhong et.al.	2404.11537	null
2024-04-17	Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt	Zhanjie Zhang et.al.	2404.11474	link
2024-04-17	Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks	Eri Hosonuma et.al.	2404.11280	null
2024-04-17	Optical Image-to-Image Translation Using Denoising Diffusion Models: Heterogeneous Change Detection as a Use Case	João Gabriel Vinholi et.al.	2404.11243	null
2024-04-17	TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing	Sherry X. Chen et.al.	2404.11120	link
2024-04-16	LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?	Yuchi Wang et.al.	2404.10763	link
2024-04-16	Adversarial Identity Injection for Semantic Face Image Synthesis	Giuseppe Tarollo et.al.	2404.10408	null
2024-04-16	CanvasPic: An Interactive Tool for Freely Generating Facial Images Based on Spatial Layout	Jiafu Wei et.al.	2404.10352	null
2024-04-17	OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model	Runyi Li et.al.	2404.10312	null
2024-04-16	OneActor: Consistent Character Generation via Cluster-Conditioned Guidance	Jiahao Wang et.al.	2404.10267	null
2024-04-16	Diffusion assisted image reconstruction in optoacoustic tomography	M. G. González et.al.	2404.10239	null
2024-04-15	MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models	Nithin Gopalakrishnan Nair et.al.	2404.09977	null
2024-04-15	Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers	Nithin Gopalakrishnan Nair et.al.	2404.09976	null
2024-04-15	Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model	Han Lin et.al.	2404.09967	null
2024-04-15	Scalable photonic diffractive generators through sampling noises from scattering medium	Ziyu Zhan et.al.	2404.09948	null
2024-04-15	Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models	Ziwei Luo et.al.	2404.09732	link
2024-04-15	In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation	Han Xue et.al.	2404.09633	null
2024-04-15	Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models	Peifei Zhu et.al.	2404.09401	null
2024-04-14	DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling	Xuening Yuan et.al.	2404.09227	null
2024-04-14	LoopAnimate: Loopable Salient Object Animation	Fanyi Wang et.al.	2404.09172	null
2024-04-13	THQA: A Perceptual Quality Assessment Database for Talking Heads	Yingjie Zhou et.al.	2404.09003	link
2024-04-13	LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field	Jiyang Li et.al.	2404.08966	link
2024-04-13	Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives	Yidan Liu et.al.	2404.08926	null
2024-04-12	E3: Ensemble of Expert Embedders for Adapting Synthetic Image Detectors to New Generators Using Limited Data	Aref Azizpour et.al.	2404.08814	link
2024-04-12	Semantic Approach to Quantifying the Consistency of Diffusion Model Image Generation	Brinnae Bent et.al.	2404.08799	link
2024-04-12	Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts	Yang Li et.al.	2404.08341	link
2024-04-11	Latent Guard: a Safety Framework for Text-to-image Generation	Runtao Liu et.al.	2404.08031	link
2024-04-11	Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models	Mazda Moayeri et.al.	2404.08030	null
2024-04-11	OpenBias: Open-set Bias Detection in Text-to-Image Generative Models	Moreno D’Incà et.al.	2404.07990	link
2024-04-11	Taming Stable Diffusion for Text to 360° Panorama Image Generation	Cheng Zhang et.al.	2404.07949	link
2024-04-11	Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models – Technical Challenges and Implications for Monitoring and Verification	Tuong Vy Nguyen et.al.	2404.07754	null
2024-04-11	Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models	Tuomas Kynkäänniemi et.al.	2404.07724	link
2024-04-11	ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation	Stanislav Frolov et.al.	2404.07564	null
2024-04-11	CAT: Contrastive Adapter Training for Personalized Image Generation	Jae Wan Park et.al.	2404.07554	link
2024-04-10	A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos	Suleyman Ozdel et.al.	2404.07351	null
2024-04-10	RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion	Jaidev Shriram et.al.	2404.07199	null
2024-04-10	A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks	Neel Mishra et.al.	2404.07172	link
2024-04-10	Fine color guidance in diffusion models and its application to image compression at extremely low bitrates	Tom Bordin et.al.	2404.06865	null
2024-04-10	UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion	Junsheng Zhou et.al.	2404.06851	null
2024-04-10	MedRG: Medical Report Grounding with Multi-modal Large Language Model	Ke Zou et.al.	2404.06798	null
2024-04-10	Deep Generative Data Assimilation in Multimodal Setting	Yongquan Qu et.al.	2404.06665	link
2024-04-09	High Noise Scheduling is a Must	Mahmut S. Gokmen et.al.	2404.06353	null
2024-04-09	DiffHarmony: Latent Diffusion Model Meets Image Harmonization	Pengfei Zhou et.al.	2404.06139	link
2024-04-09	Tackling Structural Hallucination in Image Translation with Local Diffusion	Seunghoi Kim et.al.	2404.05980	link
2024-04-09	StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion	Ming Tao et.al.	2404.05979	link
2024-04-08	SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing	Jing Gu et.al.	2404.05717	null
2024-04-08	MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation	Kunpeng Song et.al.	2404.05674	link
2024-04-08	Automatic Controllable Colorization via Imagination	Xiaoyan Cong et.al.	2404.05661	null
2024-04-08	UniFL: Improve Stable Diffusion via Unified Feedback Learning	Jiacheng Zhang et.al.	2404.05595	null
2024-04-08	Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI	Hugo Caselles-Dupré et.al.	2404.05468	null
2024-04-08	Action-conditioned video data improves predictability	Meenakshi Sarkar et.al.	2404.05439	null
2024-04-08	Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt	Zhiqi Huang et.al.	2404.05331	null
2024-04-08	MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation	Jiaxiu Jiang et.al.	2404.05268	link
2024-04-08	Text-to-Image Synthesis for Any Artistic Styles: Advancements in Personalized Artistic Image Generation via Subdivision and Dual Binding	Junseo Park et.al.	2404.05256	null
2024-04-08	A secure and private ensemble matcher using multi-vault obfuscated templates	Babak Poorebrahim Gilkalaye et.al.	2404.05205	null
2024-04-04	No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance	Vishaal Udandarao et.al.	2404.04125	link
2024-04-05	3D Facial Expressions through Analysis-by-Neural-Synthesis	George Retsinas et.al.	2404.04104	null
2024-04-05	Dynamic Prompt Optimizing for Text-to-Image Generation	Wenyi Mo et.al.	2404.04095	link
2024-04-05	Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models	Gihyun Kwon et.al.	2404.03913	null
2024-04-04	CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching	Dongzhi Jiang et.al.	2404.03653	link
2024-04-04	Reference-Based 3D-Aware Image Editing with Triplane	Bahri Batuhan Bilecen et.al.	2404.03632	null
2024-04-04	Robust Concept Erasure Using Task Vectors	Minh Pham et.al.	2404.03631	null
2024-04-04	Multi Positive Contrastive Learning with Pose-Consistent Generated Images	Sho Inayoshi et.al.	2404.03256	null
2024-04-04	Would Deep Generative Models Amplify Bias in Future Models?	Tianwei Chen et.al.	2404.03242	null
2024-04-04	Diverse and Tailored Image Generation for Zero-shot Multi-label Classification	Kaixin Zhang et.al.	2404.03144	null
2024-04-03	Many-to-many Image Generation with Auto-regressive Diffusion Models	Ying Shen et.al.	2404.03109	null
2024-04-03	Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction	Keyu Tian et.al.	2404.02905	link
2024-04-03	MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment	Duygu Ceylan et.al.	2404.02899	null
2024-04-03	On the Scalability of Diffusion-based Text-to-Image Generation	Hao Li et.al.	2404.02883	null
2024-04-03	MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation	Petru-Daniel Tudosiu et.al.	2404.02790	null
2024-04-03	InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation	Haofan Wang et.al.	2404.02733	link
2024-04-03	Model-agnostic Origin Attribution of Generated Images with Few-shot Examples	Fengyuan Liu et.al.	2404.02697	link
2024-04-03	Severity Controlled Text-to-Image Generative Model Bias Manipulation	Jordan Vice et.al.	2404.02530	null
2024-04-02	Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models	Zeyu Yang et.al.	2404.02148	link
2024-04-02	3D Congealing: 3D-Aware Image Alignment in the Wild	Yunzhi Zhang et.al.	2404.02125	null
2024-04-02	CameraCtrl: Enabling Camera Control for Text-to-Video Generation	Hao He et.al.	2404.02101	link
2024-04-02	Real, fake and synthetic faces – does the coin have three sides?	Shahzeb Naeem et.al.	2404.01878	null
2024-04-02	Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model	Xu He et.al.	2404.01862	link
2024-04-02	Disentangled Pre-training for Human-Object Interaction Detection	Zhuolong Li et.al.	2404.01725	link
2024-04-01	PlayFutures: Imagining Civic Futures with AI and Puppets	Supratim Pait et.al.	2404.01527	null
2024-04-01	Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data	Matthias Gerstgrasser et.al.	2404.01413	null
2024-04-01	Evaluating Text-to-Visual Generation with Image-to-Text Generation	Zhiqiu Lin et.al.	2404.01291	link
2024-04-01	Condition-Aware Neural Network for Controlled Image Generation	Han Cai et.al.	2404.01143	null
2024-03-29	Benchmarking Counterfactual Image Generation	Thomas Melistas et.al.	2403.20287	link
2024-03-29	Motion Inversion for Video Customization	Luozhou Wang et.al.	2403.20193	null
2024-03-29	FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models	Barbara Toniella Corradini et.al.	2403.20105	null
2024-04-02	FairRAG: Fair Human Generation via Fair Retrieval Augmentation	Robik Shrestha et.al.	2403.19964	null
2024-03-28	Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks	Pooria Ashrafian et.al.	2403.19880	link
2024-03-28	Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization	Yuhang Li et.al.	2403.19866	null
2024-03-28	CLoRA: A Contrastive Approach to Compose Multiple LoRA Models	Tuna Han Salih Meral et.al.	2403.19776	null
2024-03-28	Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond	Katherine Xu et.al.	2403.19653	link
2024-03-28	GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models	Yusuf Dalva et.al.	2403.19645	null
2024-03-28	Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models	Ole Hall et.al.	2403.19620	link
2024-03-28	Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model	Zhicai Wang et.al.	2403.19600	link
2024-03-28	Frame by Familiar Frame: Understanding Replication in Video Diffusion Models	Aimon Rahman et.al.	2403.19593	null
2024-03-28	Imperceptible Protection against Style Imitation from Diffusion Models	Namhyuk Ahn et.al.	2403.19254	null
2024-03-28	Synthetic Medical Imaging Generation with Generative Adversarial Networks For Plain Radiographs	John R. McNulty et.al.	2403.19107	null
2024-03-28	Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation	Yutong He et.al.	2403.19103	null
2024-03-28	Purposeful remixing with generative AI: Constructing designer voice in multimodal composing	Xiao Tan et.al.	2403.19095	null
2024-03-27	TextCraftor: Your Text Encoder Can be Image Quality Controller	Yanyu Li et.al.	2403.18978	null
2024-03-27	Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching	Jannis Chemseddine et.al.	2403.18705	link
2024-03-27	Attention Calibration for Disentangled Text-to-Image Personalization	Yanbing Zhang et.al.	2403.18551	link
2024-03-27	DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis	Zhongxi Chen et.al.	2403.18471	link
2024-03-27	ECNet: Effective Controllable Text-to-Image Diffusion Models	Sicheng Li et.al.	2403.18417	null
2024-03-27	Ship in Sight: Diffusion Models for Ship-Image Super Resolution	Luigi Sigillo et.al.	2403.18370	link
2024-03-26	Tutorial on Diffusion Models for Imaging and Vision	Stanley H. Chan et.al.	2403.18103	null
2024-03-26	TC4D: Trajectory-Conditioned Text-to-4D Generation	Sherwin Bahmani et.al.	2403.17920	null
2024-03-26	Boosting Diffusion Models with Moving Average Sampling in Frequency Domain	Yurui Qian et.al.	2403.17870	null
2024-03-26	Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields	Rüveyda Yilmaz et.al.	2403.17808	link
2024-03-26	LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection	Yunpeng Luo et.al.	2403.17465	link
2024-03-25	DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment	Stella Bounareli et.al.	2403.17217	null
2024-03-25	FlashFace: Human Image Personalization with High-fidelity Identity Preservation	Shilong Zhang et.al.	2403.17008	null
2024-03-25	TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models	Zhongwei Zhang et.al.	2403.17005	null
2024-03-25	SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer	Rui Zhu et.al.	2403.17004	null
2024-03-25	Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation	Omer Dahary et.al.	2403.16990	null
2024-03-25	Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance	Jingyuan Zhu et.al.	2403.16954	null
2024-03-25	Iso-Diffusion: Improving Diffusion Probabilistic Models Using the Isotropy of the Additive Gaussian Noise	Dilum Fernando et.al.	2403.16790	null
2024-03-25	Multi-Scale Texture Loss for CT denoising with GANs	Francesco Di Feola et.al.	2403.16640	link
2024-03-25	SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions	Yuda Song et.al.	2403.16627	link
2024-03-25	An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models	Zizhao Hu et.al.	2403.16530	null
2024-03-25	Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation	Sanyam Lakhanpal et.al.	2403.16422	null
2024-03-25	A Survey on Long Video Generation: Challenges, Methods, and Prospects	Chengxuan Li et.al.	2403.16407	null
2024-03-25	Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation	Yingshan Chang et.al.	2403.16394	link
2024-03-25	FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models	Lin Zhao et.al.	2403.16379	null
2024-03-24	Opportunities and challenges in the application of large artificial intelligence models in radiology	Liangrui Pan et.al.	2403.16112	null
2024-03-23	Adaptive Super Resolution For One-Shot Talking-Head Generation	Luchuan Song et.al.	2403.15944	link
2024-03-23	Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content	Zhicheng Du et.al.	2403.15876	link
2024-03-22	DragAPart: Learning a Part-Level Motion Prior for Articulated Objects	Ruining Li et.al.	2403.15382	null
2024-03-22	Long-CLIP: Unlocking the Long-Text Capability of CLIP	Beichen Zhang et.al.	2403.15378	link
2024-03-22	Controlled Training Data Generation with Diffusion Models	Teresa Yeo et.al.	2403.15309	null
2024-03-22	Spectral Motion Alignment for Video Motion Transfer using Diffusion Models	Geon Yeong Park et.al.	2403.15249	null
2024-03-22	A Multimodal Approach for Cross-Domain Image Retrieval	Lucas Iijima et.al.	2403.15152	null
2024-03-22	MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration	Zhichao Wei et.al.	2403.15059	null
2024-03-22	Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning	Bumsoo Kim et.al.	2403.15048	null
2024-03-22	CLIP-VQDiffusion : Language Free Training of Text To Image generation using CLIP and vector quantized diffusion model	Seungdae Han et.al.	2403.14944	link
2024-03-22	Geometric Generative Models based on Morphological Equivariant PDEs and GANs	El Hadji S. Diop et.al.	2403.14897	null
2024-03-21	StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text	Roberto Henschel et.al.	2403.14773	link
2024-03-21	Explorative Inbetweening of Time and Space	Haiwen Feng et.al.	2403.14611	null
2024-03-21	DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing	Yueru Jia et.al.	2403.14487	link
2024-03-22	AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks	Max Ku et.al.	2403.14468	link
2024-03-21	Analysing Diffusion Segmentation for Medical Images	Mathias Öttl et.al.	2403.14440	null
2024-03-21	Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation	Mathias Öttl et.al.	2403.14429	null
2024-03-21	Enabling Visual Composition and Animation in Unsupervised Video Generation	Aram Davtyan et.al.	2403.14368	null
2024-03-21	Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models	Pablo Marcos-Manchón et.al.	2403.14291	link
2024-03-21	Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations	Xun Lin et.al.	2403.14250	null
2024-03-21	StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN	Jongwoo Choi et.al.	2403.14186	link
2024-03-21	Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition	Sihyun Yu et.al.	2403.14148	null
2024-03-20	Learning from Models and Data for Visual Grounding	Ruozhen He et.al.	2403.13804	null
2024-03-20	Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation	Fu-Yun Wang et.al.	2403.13745	link
2024-03-20	Step-Calibrated Diffusion for Biomedical Optical Image Restoration	Yiwei Lyu et.al.	2403.13680	link
2024-03-20	ReGround: Improving Textual and Spatial Grounding at No Cost	Yuseung Lee et.al.	2403.13589	null
2024-03-20	Diversity-aware Channel Pruning for StyleGAN Compression	Jiwoo Chung et.al.	2403.13548	link
2024-03-21	IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models	Siying Cui et.al.	2403.13535	null
2024-03-20	Deepfake Detection without Deepfakes: Generalization via Synthetic Frequency Patterns Injection	Davide Alessandro Coccomini et.al.	2403.13479	link
2024-03-20	S2DM: Sector-Shaped Diffusion Models for Video Generation	Haoran Lang et.al.	2403.13408	null
2024-03-20	AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation	Jingkun An et.al.	2403.13352	null
2024-03-20	TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation	Santosh Sanjeev et.al.	2403.13343	link
2024-03-19	FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis	Linjiang Huang et.al.	2403.12963	link
2024-03-19	Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties	Efrain Torres-Lomas et.al.	2403.12935	null
2024-03-19	Ultra-High-Resolution Image Synthesis with Pyramid Diffusion Model	Jiajie Yang et.al.	2403.12915	link
2024-03-19	How Spammers and Scammers Leverage AI-Generated Images on Facebook for Audience Growth	Renee DiResta et.al.	2403.12838	null
2024-03-19	Total Disentanglement of Font Images into Style and Character Class Features	Daichi Haraguchi et.al.	2403.12784	null
2024-03-19	AnimateDiff-Lightning: Cross-Model Diffusion Distillation	Shanchuan Lin et.al.	2403.12706	null
2024-03-18	Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts	Anh Bui et.al.	2403.12326	null
2024-03-18	Synthetic Image Generation in Cyber Influence Operations: An Emergent Threat?	Melanie Mathys et.al.	2403.12207	null
2024-03-18	CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility	Bojia Zi et.al.	2403.12035	link
2024-03-18	Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation	Axel Sauer et.al.	2403.12015	null
2024-03-18	Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing	Juan Zhang et.al.	2403.11700	null
2024-03-19	Urban Scene Diffusion through Semantic Occupancy Map	Junge Zhang et.al.	2403.11697	null
2024-03-18	Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly Detection	Julia Wolleb et.al.	2403.11667	link
2024-03-18	QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation	Zhizhen Zhou et.al.	2403.11626	null
2024-03-18	CRS-Diff: Controllable Generative Remote Sensing Foundation Model	Datao Tang et.al.	2403.11614	link
2024-03-17	StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining	Tushar Kataria et.al.	2403.11340	null
2024-03-17	Fast Personalized Text-to-Image Syntheses With Attention Injection	Yuxuan Zhang et.al.	2403.11284	null
2024-03-17	Understanding Diffusion Models by Feynman’s Path Integral	Yuji Hirono et.al.	2403.11262	null
2024-03-17	The Effects of Generative AI on Design Fixation and Divergent Thinking	Samangi Wadinambiarachchi et.al.	2403.11164	null
2024-03-17	CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion	Xiaoyu Wu et.al.	2403.11162	null
2024-03-15	Denoising Task Difficulty-based Curriculum for Training Diffusion Models	Jin-Young Kim et.al.	2403.10348	null
2024-03-15	DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers	Xuanlei Zhao et.al.	2403.10266	link
2024-03-15	Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder	Jinseok Kim et.al.	2403.10255	null
2024-03-15	Animate Your Motion: Turning Still Images into Dynamic Videos	Mingxiao Li et.al.	2403.10179	null
2024-03-15	SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model	Tao Wu et.al.	2403.10044	null
2024-03-14	SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior	Huan-ang Gao et.al.	2403.09638	null
2024-03-14	Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering	Zeyu Liu et.al.	2403.09622	null
2024-03-14	PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation	Yuhan Guo et.al.	2403.09615	null
2024-03-14	Counterfactual contrastive learning: robust representations via causal image synthesis	Melanie Roschewitz et.al.	2403.09605	link
2024-03-14	Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing	Wonjun Kang et.al.	2403.09468	link
2024-03-14	Mitigating attribute amplification in counterfactual image generation	Tian Xia et.al.	2403.09422	null
2024-03-14	Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction	Hanyu Chen et.al.	2403.09355	null
2024-03-14	Video Editing via Factorized Diffusion Distillation	Uriel Singer et.al.	2403.09334	null
2024-03-14	Noise Dimension of GAN: An Image Compression Perspective	Ziran Zhu et.al.	2403.09196	null
2024-03-14	Intention-driven Ego-to-Exo Video Generation	Hongchen Luo et.al.	2403.09194	null
2024-03-13	VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis	Enric Corona et.al.	2403.08764	null
2024-03-13	HAIFIT: Human-Centered AI for Fashion Image Translation	Jianan Jiang et.al.	2403.08651	link
2024-03-13	Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation	Tianyi Chu et.al.	2403.08294	null
2024-03-13	Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts	Yue Ma et.al.	2403.08268	link
2024-03-13	Make Me Happier: Evoking Emotions Through Image Diffusion Models	Qing Lin et.al.	2403.08255	null
2024-03-12	Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation	Shihao Zhao et.al.	2403.07860	link
2024-03-12	Quantifying and Mitigating Privacy Risks for Tabular Generative Models	Chaoyi Zhu et.al.	2403.07842	null
2024-03-12	Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model	Yuxuan Zhang et.al.	2403.07764	link
2024-03-12	Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings	Sahand Sharifzadeh et.al.	2403.07750	null
2024-03-14	Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion	Dongyang Li et.al.	2403.07721	link
2024-03-12	SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces	Yuta Oshima et.al.	2403.07711	link
2024-03-12	Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation	Michael Ogezi et.al.	2403.07605	null
2024-03-12	Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation	Likun Li et.al.	2403.07500	null
2024-03-12	Backdoor Attack with Mode Mixture Latent Modification	Hongwei Zhang et.al.	2403.07463	null
2024-03-13	DragAnything: Motion Control for Anything using Entity Representation	Weijia Wu et.al.	2403.07420	link
2024-03-11	DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation	Guosheng Zhao et.al.	2403.06845	null
2024-03-11	Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting	Wenting Chen et.al.	2403.06835	null
2024-03-11	Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection	Chuangchuang Tan et.al.	2403.06803	link
2024-03-11	FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation	Pengchong Qiao et.al.	2403.06775	link
2024-03-11	Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback	Adarsh N L et.al.	2403.06735	null
2024-03-11	Active Generation for Image Classification	Tao Huang et.al.	2403.06517	link
2024-03-11	Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning	Woojung Han et.al.	2403.06516	null
2024-03-11	3D-aware Image Generation and Editing with Multi-modal Conditions	Bo Li et.al.	2403.06470	null
2024-03-11	A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos	Weixia Zhang et.al.	2403.06421	link
2024-03-11	DivCon: Divide and Conquer for Progressive Text-to-Image Generation	Yuhao Jia et.al.	2403.06400	link
2024-03-08	Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapola	Yijiang Li et.al.	2403.05523	null
2024-03-08	VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models	Yabo Zhang et.al.	2403.05438	link
2024-03-08	A Data Augmentation Pipeline to Generate Synthetic Labeled Datasets of 3D Echocardiography Images using a GAN	Cristiana Tiago et.al.	2403.05384	null
2024-03-08	Fine-tuning a Multiple Instance Learning Feature Extractor with Masked Context Modelling and Knowledge Distillation	Juan I. Pisula et.al.	2403.05325	null
2024-03-08	Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation	Junyan Wang et.al.	2403.05239	null
2024-03-08	Synthetic Privileged Information Enhances Medical Image Representation Learning	Lucas Farndale et.al.	2403.05220	null
2024-03-08	Denoising Autoregressive Representation Learning	Yazhe Li et.al.	2403.05196	null
2024-03-08	ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment	Xiwei Hu et.al.	2403.05135	null
2024-03-08	Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation	Joseph Cho et.al.	2403.05131	null
2024-03-08	Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis	Muxi Chen et.al.	2403.05125	link
2024-03-07	Photonic probabilistic machine learning using quantum vacuum noise	Seou Choi et.al.	2403.04731	null
2024-03-07	PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation	Junsong Chen et.al.	2403.04692	link
2024-03-07	Pix2Gif: Motion-Guided Diffusion for GIF Generation	Hitesh Kandala et.al.	2403.04634	link
2024-03-07	Discriminative Probing and Tuning for Text-to-Image Generation	Leigang Qu et.al.	2403.04321	null
2024-03-06	PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement	Zhijie Wang et.al.	2403.04014	link
2024-03-06	Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer	Naifu Xue et.al.	2403.03736	null
2024-03-06	Seamless Virtual Reality with Integrated Synchronizer and Synthesizer for Autonomous Driving	He Li et.al.	2403.03541	null
2024-03-06	NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging	Takahiro Shirakawa et.al.	2403.03485	link
2024-03-07	DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network	Xiangquan Gui et.al.	2403.03456	null
2024-03-06	Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing	Bingyan Liu et.al.	2403.03431	null
2024-03-05	Scaling Rectified Flow Transformers for High-Resolution Image Synthesis	Patrick Esser et.al.	2403.03206	null
2024-03-05	Behavior Generation with Latent Actions	Seungjae Lee et.al.	2403.03181	link
2024-03-05	Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation	Weijie Li et.al.	2403.02827	null
2024-03-05	Bias in Generative AI	Mi Zhou et.al.	2403.02726	null
2024-03-04	Transformer for Times Series: an Application to the S&P500	Pierre Brugiere et.al.	2403.02523	null
2024-03-04	NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function	Abdullah Nazhat Abdullah et.al.	2403.02411	link
2024-03-05	UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control	Xuweiyi Chen et.al.	2403.02332	link
2024-03-04	ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models	Jiaxiang Cheng et.al.	2403.02084	link
2024-03-04	ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models	Lukas Höllein et.al.	2403.01807	link
2024-03-05	AtomoVideo: High Fidelity Image-to-Video Generation	Litong Gong et.al.	2403.01800	null
2024-03-02	Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models	Neta Shaul et.al.	2403.01329	null
2024-03-02	SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code	Ziniu Hu et.al.	2403.01248	null
2024-03-02	TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion	Salaheldin Mohamed et.al.	2403.01212	null
2024-03-01	Improving Android Malware Detection Through Data Augmentation Using Wasserstein Generative Adversarial Networks	Kawana Stalin et.al.	2403.00890	null
2024-03-01	Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset	Ander Salaberria et.al.	2403.00587	link
2024-03-01	Rethinking cluster-conditioned diffusion models	Nikolas Adaloglou et.al.	2403.00570	link
2024-03-01	VisionLLaMA: A Unified LLaMA Interface for Vision Tasks	Xiangxiang Chu et.al.	2403.00522	link
2024-03-01	An Ordinal Diffusion Model for Generating Medical Images with Different Severity Levels	Shumpei Takezaki et.al.	2403.00452	null
2024-03-01	Abductive Ego-View Accident Video Understanding for Safe Driving Perception	Jianwu Fang et.al.	2403.00436	null
2024-02-29	Learning to Find Missing Video Frames with Synthetic Data Augmentation: A General Framework and Application in Generating Thermal Images Using RGB Cameras	Mathias Viborg Andersen et.al.	2403.00196	null
2024-02-29	Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers	Tsai-Shien Chen et.al.	2402.19479	null
2024-02-29	A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation	Hanxi Li et.al.	2402.19330	link
2024-02-29	Disentangling representations of retinal images with generative models	Sarah Müller et.al.	2402.19186	link
2024-02-29	Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection	Christos Koutlis et.al.	2402.19091	link
2024-02-29	WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis	Paul Friedrich et.al.	2402.19043	link
2024-02-29	ViewFusion: Towards Multi-View Consistency via Interpolated Denoising	Xianghui Yang et.al.	2402.18842	link
2024-02-29	A Quantitative Evaluation of Score Distillation Sampling Based Text-to-3D	Xiaohan Fei et.al.	2402.18780	null
2024-02-28	FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes	Ziying Pan et.al.	2402.18331	link
2024-02-28	Balancing Act: Distribution-Guided Debiasing in Diffusion Models	Rishubh Parihar et.al.	2402.18206	null
2024-02-28	VulMCI : Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation	Tao Peng et.al.	2402.18189	link
2024-02-28	Block and Detail: Scaffolding Sketch-to-Image Generation	Vishnu Sarukkai et.al.	2402.18116	null
2024-02-28	Context-aware Talking Face Video Generation	Meidai Xuanyuan et.al.	2402.18092	null
2024-02-28	Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis	Yanzuo Lu et.al.	2402.18078	link
2024-02-27	Structure-Guided Adversarial Training of Diffusion Models	Ling Yang et.al.	2402.17563	null
2024-02-27	Diffusion Model-Based Image Editing: A Survey	Yi Huang et.al.	2402.17525	link
2024-02-27	EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions	Linrui Tian et.al.	2402.17485	null
2024-02-27	Sora Generates Videos with Stunning Geometrical Consistency	Xuanyi Li et.al.	2402.17403	null
2024-02-27	Accelerating Diffusion Sampling with Optimized Time Steps	Shuchen Xue et.al.	2402.17376	link
2024-02-27	Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation	Daiqing Li et.al.	2402.17245	null
2024-02-27	Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System	Majid Memari et.al.	2402.17204	null
2024-02-27	Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models	Yixin Liu et.al.	2402.17177	link
2024-02-27	Video as the New Language for Real-World Decision Making	Sherry Yang et.al.	2402.17139	null
2024-02-27	Transparent Image Layer Diffusion using Latent Transparency	Lvmin Zhang et.al.	2402.17113	link
2024-02-26	Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion	Xuantong Liu et.al.	2402.16305	null
2024-02-25	Towards Efficient Quantum Hybrid Diffusion Models	Francesca De Falco et.al.	2402.16147	null
2024-02-23	Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition	Chun-Hsiao Yeh et.al.	2402.15504	link
2024-02-23	BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators	Yu Tian et.al.	2402.15218	null
2024-02-23	The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling	Jiajun Ma et.al.	2402.15170	null
2024-02-22	Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis	Willi Menapace et.al.	2402.14797	null
2024-02-22	Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models	Yixuan Ren et.al.	2402.14780	null
2024-02-25	Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening	Zhenrong Shen et.al.	2402.14707	null
2024-02-22	Visual Hallucinations of Multi-modal Large Language Models	Wen Huang et.al.	2402.14683	link
2024-02-22	MVD$^2$: Efficient Multiview 3D Reconstruction for Multiview Diffusion	Xin-Yang Zheng et.al.	2402.14253	null
2024-02-21	T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching	Zizheng Pan et.al.	2402.14167	link
2024-02-21	SDXL-Lightning: Progressive Adversarial Diffusion Distillation	Shanchuan Lin et.al.	2402.13929	null
2024-02-21	SRNDiff: Short-term Rainfall Nowcasting with Condition Diffusion Model	Xudong Ling et.al.	2402.13737	link
2024-02-21	Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation	Kihong Kim et.al.	2402.13729	null
2024-02-21	Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving	Mehdi Azarafza et.al.	2402.13602	link
2024-02-21	Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models	Chen Wu et.al.	2402.13490	null
2024-02-20	Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control	Denis Lukovnikov et.al.	2402.13404	null
2024-02-20	CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples	Jianrui Zhang et.al.	2402.13254	link
2024-02-20	UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing	Jianhong Bai et.al.	2402.13185	null
2024-02-20	Neural Network Diffusion	Kai Wang et.al.	2402.13144	link
2024-02-20	VGMShield: Mitigating Misuse of Video Generative Models	Yan Pang et.al.	2402.13126	link
2024-02-20	Visual Style Prompting with Swapping Self-Attention	Jaeseok Jeong et.al.	2402.12974	link
2024-02-20	RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models	Xinchen Zhang et.al.	2402.12908	link
2024-02-20	Two-stage Rainfall-Forecasting Diffusion Model	XuDong Ling et.al.	2402.12779	link
2024-02-20	MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion	Sen Li et.al.	2402.12741	link
2024-02-20	MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction	Shitao Tang et.al.	2402.12712	null
2024-02-19	The (R)Evolution of Multimodal Large Language Models: A Survey	Davide Caffagni et.al.	2402.12451	null
2024-02-19	Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability	Xuelin Qian et.al.	2402.12225	null
2024-02-19	Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation	Yi Liu et.al.	2402.12100	null
2024-02-19	DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation	Chong Zeng et.al.	2402.11929	link
2024-02-18	SDiT: Spiking Diffusion Model with Transformer	Shu Yang et.al.	2402.11588	null
2024-02-18	Visual Concept-driven Image Generation with Text-to-Image Diffusion Model	Tanzila Rahman et.al.	2402.11487	null
2024-02-18	Deep learning methods for Hamiltonian parameter estimation and magnetic domain image generation in twisted van der Waals magnets	Woo Seok Lee et.al.	2402.11434	null
2024-02-17	TC-DiffRecon: Texture coordination MRI reconstruction method based on diffusion model and modified MF-UNet method	Chenyan Zhang et.al.	2402.11274	link
2024-02-16	The Male CEO and the Female Assistant: Probing Gender Biases in Text-To-Image Models Through Paired Stereotype Test	Yixin Wan et.al.	2402.11089	null
2024-02-16	Universal Prompt Optimizer for Safe Text-to-Image Generation	Zongyu Wu et.al.	2402.10882	link
2024-02-16	Exploring Precision and Recall to assess the quality and diversity of LLMs	Le Bronnec Florian et.al.	2402.10693	link
2024-02-16	Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation	Lanqing Guo et.al.	2402.10491	link
2024-02-16	UMAIR-FPS: User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style	Yan Kang et.al.	2402.10381	link
2024-02-15	Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation	Huizhuo Yuan et.al.	2402.10210	null
2024-02-15	Euclid preparation. Measuring detailed galaxy morphologies for Euclid with Machine Learning	Euclid Collaboration et.al.	2402.10187	link
2024-02-15	Classification Diffusion Models	Shahar Yadin et.al.	2402.10095	null
2024-02-15	Accelerating Parallel Sampling of Diffusion Models	Zhiwei Tang et.al.	2402.09970	link
2024-02-15	Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation	Junjie Shentu et.al.	2402.09966	link
2024-02-14	Magic-Me: Identity-Specific Video Customized Diffusion	Ze Ma et.al.	2402.09368	link
2024-02-14	Switch EMA: A Free Lunch for Better Flatness and Sharpness	Siyuan Li et.al.	2402.09240	link
2024-02-14	L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects	Yutaro Yamada et.al.	2402.09052	null
2024-02-14	Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer	Hong Wu et.al.	2402.08987	link
2024-02-13	Towards the Detection of AI-Synthesized Human Face Images	Yuhang Lu et.al.	2402.08750	null
2024-02-13	IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation	Luke Melas-Kyriazi et.al.	2402.08682	null
2024-02-13	Learning Continuous 3D Words for Text-to-Image Generation	Ta-Ying Cheng et.al.	2402.08654	link
2024-02-13	Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models	Jason Tang et.al.	2402.08532	null
2024-02-12	Using AI for Wavefront Estimation with the Rubin Observatory Active Optics System	John Franklin Crenshaw et.al.	2402.08094	null
2024-02-14	Score-based generative models break the curse of dimensionality in learning a family of sub-Gaussian probability distributions	Frank Cole et.al.	2402.08082	null
2024-02-12	Trustworthy SR: Resolving Ambiguity in Image Super-resolution via Diffusion Models and Human Feedback	Cansu Korkmaz et.al.	2402.07597	null
2024-02-11	The Aleph & Other Metaphors for Image Generation	Gonzalo Ramos et.al.	2402.07104	null
2024-02-10	Disentangled Latent Energy-Based Style Translation: An Image-Level Structural MRI Harmonization Framework	Mengqi Wu et.al.	2402.06875	null
2024-02-09	Cardiac ultrasound simulation for autonomous ultrasound navigation	Abdoul Aziz Amadou et.al.	2402.06463	null
2024-02-08	Collaborative Control for Geometry-Conditioned PBR Image Generation	Shimon Vainer et.al.	2402.05919	null
2024-02-08	Scalable Diffusion Models with State Space Backbone	Zhengcong Fei et.al.	2402.05608	link
2024-02-08	Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application	Bumsoo Kim et.al.	2402.05448	null
2024-02-08	Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport	Jaemoo Choi et.al.	2402.05443	null
2024-02-09	Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models	Nicholas Konz et.al.	2402.05210	link
2024-02-07	ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12	Liuqing Chen et.al.	2402.04975	null
2024-02-07	Text2Street: Controllable Text-to-image Generation for Street Views	Jinming Su et.al.	2402.04504	null
2024-02-07	ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation	Jirayu Burapacheep et.al.	2402.04492	link
2024-02-06	Denoising Diffusion Probabilistic Models in Six Simple Steps	Richard E. Turner et.al.	2402.04384	null
2024-02-06	ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation	Weiming Ren et.al.	2402.04324	link
2024-02-06	QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning	Haoxuan Wang et.al.	2402.03666	link
2024-02-05	Projected Generative Diffusion Models for Constraint Satisfaction	Jacob K Christopher et.al.	2402.03559	link
2024-02-05	Assessing the Efficacy of Invisible Watermarks in AI-Generated Medical Images	Xiaodan Xing et.al.	2402.03473	null
2024-02-05	Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?	Qiyao Liang et.al.	2402.03305	null
2024-02-05	InstanceDiffusion: Instance-level Control for Image Generation	Xudong Wang et.al.	2402.03290	link
2024-02-05	Training-Free Consistent Text-to-Image Generation	Yoad Tewel et.al.	2402.03286	null
2024-02-05	IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR images	Vincent Roca et.al.	2402.03227	link
2024-02-05	Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion	Shiyuan Yang et.al.	2402.03162	null
2024-02-05	InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions	Yiyuan Zhang et.al.	2402.03040	link
2024-02-05	SynthVision – Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data	Yudara Kularathne et.al.	2402.02826	null
2024-02-04	DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing	Chong Mou et.al.	2402.02583	link
2024-02-04	M$^3$Face: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing	Mohammadreza Mofayezi et.al.	2402.02369	null
2024-02-03	DeCoF: Generated Video Detection via Frame Consistency	Long Ma et.al.	2402.02085	link
2024-02-02	NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities	Jingyuan Sun et.al.	2402.01590	null
2024-02-02	The galactic bubbles of starburst galaxies: The influence of galactic large-scale magnetic fields	Z. Meliani et.al.	2402.01541	null
2024-02-02	Cross-view Masked Diffusion Transformers for Person Image Synthesis	Trung X. Pham et.al.	2402.01516	link
2024-02-02	Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors	Dingcheng Yang et.al.	2402.01369	link
2024-02-02	Can MLLMs Perform Text-to-Image In-Context Learning?	Yuchen Zeng et.al.	2402.01293	link
2024-02-02	Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?	Cristian Sbrolli et.al.	2402.01241	null
2024-02-01	AI-generated faces free from racial and gender stereotypes	Nouar AlDahoul et.al.	2402.01002	link
2024-02-01	Examining the Influence of Digital Phantom Models in Virtual Imaging Trials for Tomographic Breast Imaging	Amar Kavuri et.al.	2402.00812	null
2024-02-01	AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning	Fu-Yun Wang et.al.	2402.00769	link
2024-02-01	DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras	Weixing Xie et.al.	2402.00740	null
2024-01-31	SeFi-IDE: Semantic-Fidelity Identity Embedding for Personalized Diffusion-Based Generation	Yang Li et.al.	2402.00631	null
2024-02-01	CapHuman: Capture Your Moments in Parallel Universes	Chao Liang et.al.	2402.00627	link
2024-02-01	Masked Conditional Diffusion Model for Enhancing Deepfake Detection	Tiewen Chen et.al.	2402.00541	null
2024-02-01	High-Quality Medical Image Generation from Free-hand Sketch	Quan Huu Cap et.al.	2402.00353	null
2024-02-01	Machine Unlearning for Image-to-Image Generative Models	Guihong Li et.al.	2402.00351	link
2024-01-31	Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation	Yuanhuiyi Lyu et.al.	2401.17664	null
2024-01-31	Head and Neck Tumor Segmentation from [18F]F-FDG PET/CT Images Based on 3D Diffusion Model	Yafei Dong et.al.	2401.17593	null
2024-01-31	Task-Oriented Diffusion Model Compression	Geonung Kim et.al.	2401.17547	null
2024-01-31	Fréchet Distance for Offline Evaluation of Information Retrieval Systems with Sparse Labels	Negar Arabzadeh et.al.	2401.17543	null
2024-01-30	OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision	Bruno Berenguel-Baeta et.al.	2401.17061	link
2024-01-30	Repositioning the Subject within Image	Yikai Wang et.al.	2401.16861	link
2024-01-30	X-ray Image Generation as a Method of Performance Prediction for Real-Time Inspection: a Case Study	Vladyslav Andriiashen et.al.	2401.16847	link
2024-01-29	Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors	Shiyin Dong et.al.	2401.16459	null
2024-01-29	Spatial-Aware Latent Initialization for Controllable Image Generation	Wenqiang Sun et.al.	2401.16157	null
2024-01-31	Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You	Felix Friedrich et.al.	2401.16092	link
2024-01-29	Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling	Xiaoyu Shi et.al.	2401.15977	null
2024-01-29	Diffusion Facial Forgery Detection	Harry Cheng et.al.	2401.15859	link
2024-01-29	2L3: Lifting Imperfect Generated 2D Images into Accurate 3D	Yizheng Chen et.al.	2401.15841	null
2024-01-28	Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding	Jianxiang Lu et.al.	2401.15708	null
2024-01-28	Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation	Zhenyu Wang et.al.	2401.15688	null
2024-01-28	IntentTuner: An Interactive Framework for Integrating Human Intents in Fine-tuning Text-to-Image Generative Models	Xingchen Zeng et.al.	2401.15559	null
2024-01-27	GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis	Jing Hao et.al.	2401.15282	link
2024-01-26	Annotated Hands for Generative Models	Yue Yang et.al.	2401.15075	link
2024-01-26	Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support	Xiaojun Wu et.al.	2401.14688	link
2024-01-25	Deconstructing Denoising Diffusion Models for Self-Supervised Learning	Xinlei Chen et.al.	2401.14404	null
2024-01-25	UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation and Diffusion Models	Timo Kapsalis et.al.	2401.14379	null
2024-01-26	Image Synthesis with Graph Conditioning: CLIP-Guided Diffusion Models for Scene Graphs	Rameshwar Mishra et.al.	2401.14111	null
2024-01-25	CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion	Nisha Huang et.al.	2401.14066	link
2024-01-25	Diffusion-based Data Augmentation for Object Counting Problems	Zhen Wang et.al.	2401.13992	null
2024-01-25	Learning to Manipulate Artistic Images	Wei Guo et.al.	2401.13976	link
2024-01-25	BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models	Senthil Purushwalkam et.al.	2401.13974	link
2024-01-25	A New Image Quality Database for Multiple Industrial Processes	Xuanchao Ma et.al.	2401.13956	null
2024-01-25	StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models	Yalong Bai et.al.	2401.13942	null
2024-01-24	Research about the Ability of LLM in the Tamper-Detection Area	Xinyu Yang et.al.	2401.13504	null
2024-01-24	UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion	Wei Li et.al.	2401.13388	null
2024-01-24	Deep Learning for Improved Polyp Detection from Synthetic Narrow-Band Imaging	Mathias Ramm Haugland et.al.	2401.13315	null
2024-01-24	Choose Your Diffusion: Efficient and flexible ways to accelerate the diffusion model in fast high energy physics simulation	Cheng Jiang et.al.	2401.13162	null
2024-01-23	CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited Data	Chandrakanth Gudavalli et.al.	2401.13006	null
2024-01-23	Lumiere: A Space-Time Diffusion Model for Video Generation	Omer Bar-Tal et.al.	2401.12945	null
2024-01-23	A Unified Generation-Registration Framework for Improved MR-based CT Synthesis in Proton Therapy	Xia Li et.al.	2401.12878	null
2024-01-23	UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators	Hengjia Li et.al.	2401.12596	null
2024-01-23	The Neglected Tails of Vision-Language Models	Shubham Parashar et.al.	2401.12425	null
2024-01-20	Large-scale Reinforcement Learning for Diffusion Models	Yinan Zhang et.al.	2401.12244	null
2024-01-23	Control of OSIRIS-REx OTES Observations using OCAMS TAG Images	Kris J. Becker et.al.	2401.12177	null
2024-01-22	Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs	Ling Yang et.al.	2401.11708	link
2024-01-21	Text-to-Image Cross-Modal Generation: A Systematic Review	Maciej Żelaszczyk et.al.	2401.11631	null
2024-01-21	Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers	Katherine Crowson et.al.	2401.11605	link
2024-01-19	Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion	Zuoyue Li et.al.	2401.10786	null
2024-01-18	Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution	Xin Yuan et.al.	2401.10404	null
2024-01-22	Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation	Changgu Chen et.al.	2401.10150	null
2024-01-18	DiffusionGPT: LLM-Driven Text-to-Image Generation System	Jie Qin et.al.	2401.10061	null
2024-01-18	WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens	Xiaofeng Wang et.al.	2401.09985	null
2024-01-18	CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects	Zhao Wang et.al.	2401.09962	null
2024-01-17	MITS-GAN: Safeguarding Medical Imaging from Tampering with Generative Adversarial Networks	Giovanni Pasqualino et.al.	2401.09624	link
2024-01-17	Efficient generative adversarial networks using linear additive-attention Transformers	Emilio Morales-Juarez et.al.	2401.09596	link
2024-01-17	Vlogger: Make Your Dream A Vlog	Shaobin Zhuang et.al.	2401.09414	link
2024-01-17	UniVG: Towards UNIfied-modal Video Generation	Ludan Ruan et.al.	2401.09084	null
2024-01-17	VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models	Haoxin Chen et.al.	2401.09047	link
2024-01-16	Fast Dynamic 3D Object Generation from a Single-view Video	Zijie Pan et.al.	2401.08742	link
2024-01-16	Fixed Point Diffusion Models	Xingjian Bai et.al.	2401.08741	link
2024-01-16	Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks	Chenyu Zhang et.al.	2401.08725	link
2024-01-16	Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data	Yuhui Zhang et.al.	2401.08567	link
2024-01-16	Instilling Multi-round Thinking to Text-guided Image Generation	Lidong Zeng et.al.	2401.08472	null
2024-01-16	Key-point Guided Deformable Image Manipulation Using Diffusion Model	Seok-Hwan Oh et.al.	2401.08178	null
2024-01-16	E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning	Qiang Qu et.al.	2401.08117	link
2024-01-16	SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation	Zhixuan Liu et.al.	2401.08053	null
2024-01-15	Towards A Better Metric for Text-to-Video Generation	Jay Zhangjie Wu et.al.	2401.07781	null
2024-01-15	Collaboratively Self-supervised Video Representation Learning for Action Recognition	Jie Zhang et.al.	2401.07584	null
2024-01-15	InstantID: Zero-shot Identity-Preserving Generation in Seconds	Qixun Wang et.al.	2401.07519	link
2024-01-14	Generation of Synthetic Images for Pedestrian Detection Using a Sequence of GANs	Viktor Seib et.al.	2401.07370	null
2024-01-13	Quantum Denoising Diffusion Models	Michael Kölle et.al.	2401.07049	null
2024-01-13	Progressive Feature Fusion Network for Enhancing Image Quality Assessment	Kaiqun Wu et.al.	2401.06992	null
2024-01-12	360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model	Qian Wang et.al.	2401.06578	null
2024-01-12	Beyond the Surface: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation	Akshita Jha et.al.	2401.06310	link
2024-01-11	Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications	Yuwen Xiong et.al.	2401.06197	link
2024-01-10	AI Art is Theft: Labour, Extraction, and Exploitation, Or, On the Dangers of Stochastic Pollocks	Trystan S. Goetze et.al.	2401.06178	null
2024-01-11	RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks	Partha Ghosh et.al.	2401.06035	null
2024-01-11	EraseDiff: Erasing Data Influence in Diffusion Models	Jing Wu et.al.	2401.05779	link
2024-01-11	Learn From Zoom: Decoupled Supervised Contrastive Learning For WCE Image Classification	Kunpeng Qiu et.al.	2401.05771	link
2024-01-11	Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation	Seung Hyun Lee et.al.	2401.05675	null
2024-01-10	From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage	Marcellus Amadeus et.al.	2401.05520	null
2024-01-10	PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models	Junsong Chen et.al.	2401.05252	link
2024-01-09	Content-Conditioned Generation of Stylized Free hand Sketches	Jiajun Liu et.al.	2401.04739	null
2024-01-09	Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks	Tanmay Garg et.al.	2401.04647	null
2024-01-09	EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models	Jingyuan Yang et.al.	2401.04608	null
2024-01-09	Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models	Xuewen Liu et.al.	2401.04585	link
2024-01-09	Let’s Go Shopping (LGS) – Web-Scale Image-Text Dataset for Visual Concept Understanding	Yatong Bai et.al.	2401.04575	null
2024-01-09	MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation	Weimin Wang et.al.	2401.04468	null
2024-01-09	Vision Reimagined: AI-Powered Breakthroughs in WiFi Indoor Imaging	Jianyang Shi et.al.	2401.04317	null
2024-01-07	A Classification of Critical Configurations for any Number of Projective Views	Martin Bråtelund et.al.	2401.03450	link
2024-01-05	Latte: Latent Diffusion Transformer for Video Generation	Xin Ma et.al.	2401.03048	link
2024-01-05	Dataset of turbulent flow over interacting barchan dunes	Jimmy Gabriel Alvarez et.al.	2401.03032	null
2024-01-04	VASE: Object-Centric Appearance and Shape Manipulation of Real Videos	Elia Peruzzo et.al.	2401.02473	null
2024-01-04	Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection	Yabin Wang et.al.	2401.02335	link
2024-01-04	Bayesian Intrinsic Groupwise Image Registration: Unsupervised Disentanglement of Anatomy and Geometry	Xinzhe Luo et.al.	2401.02141	null
2024-01-04	Improving Diffusion-Based Image Synthesis with Context Prediction	Ling Yang et.al.	2401.02015	null
2024-01-03	Instruct-Imagen: Image Generation with Multi-modal Instruction	Hexiang Hu et.al.	2401.01952	null
2024-01-03	Can We Generate Realistic Hands Only Using Convolution?	Mehran Hosseini et.al.	2401.01951	null
2024-01-03	A Vision Check-up for Language Models	Pratyusha Sharma et.al.	2401.01862	null
2024-01-03	Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions	David Junhao Zhang et.al.	2401.01827	link
2024-01-03	aMUSEd: An Open MUSE Reproduction	Suraj Patil et.al.	2401.01808	link
2024-01-03	Few-shot Image Generation via Information Transfer from the Built Geodesic Surface	Yuexing Han et.al.	2401.01749	null
2024-01-03	An Edge-Cloud Collaboration Framework for Generative AI Service Provision with Synergetic Big Cloud Model and Small Edge Models	Yuqing Tian et.al.	2401.01666	null
2024-01-03	AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI	Fanda Fan et.al.	2401.01651	link
2024-01-02	VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics	Ammar A. Siddiqui et.al.	2401.01414	null
2024-01-02	VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM	Fuchen Long et.al.	2401.01256	link
2024-01-02	Joint Generative Modeling of Scene Graphs and Images via Diffusion Models	Bicheng Xu et.al.	2401.01130	null
2024-01-02	SSP: A Simple and Safe automatic Prompt engineering method towards realistic image synthesis on LVM	Weijin Cheng et.al.	2401.01128	null
2023-12-31	TrailBlazer: Trajectory Control for Diffusion-Based Video Generation	Wan-Duo Kurt Ma et.al.	2401.00896	null
2023-12-30	Improving the Stability of Diffusion Models for Content Consistent Super-Resolution	Lingchen Sun et.al.	2401.00877	link
2024-01-01	New Job, New Gender? Measuring the Social Bias in Image Generation Models	Wenxuan Wang et.al.	2401.00763	link
2024-01-01	DiffMorph: Text-less Image Morphing with Diffusion Models	Shounak Chatterjee et.al.	2401.00739	null
2023-12-31	Generative Model-Driven Synthetic Training Image Generation: An Approach to Cognition in Rail Defect Detection	Rahatara Ferdousi et.al.	2401.00393	link
2023-12-30	GAN-GA: A Generative Model based on Genetic Algorithm for Medical Image Generation	M. AbdulRazek et.al.	2401.00314	null
2023-12-30	CamPro: Camera-based Anti-Facial Recognition	Wenjun Zhu et.al.	2401.00151	link
2023-12-27	RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement	Fan Shi et.al.	2312.17274	null
2023-12-28	Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action	Jiasen Lu et.al.	2312.17172	link
2023-12-27	Prompt Expansion for Adaptive Text-to-Image Generation	Siddhartha Datta et.al.	2312.16720	null
2023-12-27	I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models	Xun Guo et.al.	2312.16693	link
2023-12-27	Participatory prompting: a user-centric research method for eliciting AI assistance opportunities in knowledge workflows	Advait Sarkar et.al.	2312.16633	null
2023-12-27	A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss	Xiao Fang et.al.	2312.16498	link
2023-12-29	PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion	Guansong Lu et.al.	2312.16486	null
2023-12-27	Bellman Optimal Step-size Straightening of Flow-Matching Models	Bao Nguyen et.al.	2312.16414	link
2023-12-26	SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation	Yuxuan Zhang et.al.	2312.16272	link
2023-12-26	One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications	Mengyao Lyu et.al.	2312.16145	null
2023-12-26	Semantic Guidance Tuning for Text-To-Image Diffusion Models	Hyun Kang et.al.	2312.15964	link
2023-12-26	Cross Initialization for Personalized Text-to-Image Generation	Lianyu Pang et.al.	2312.15905	link
2023-12-25	A Recipe for Scaling up Text-to-Video Generation with Text-free Videos	Xiang Wang et.al.	2312.15770	null
2023-12-25	High-Fidelity Diffusion-based Image Editing	Chen Hou et.al.	2312.15707	null
2023-12-24	Make-A-Character: High Quality Text-to-3D Character Generation within Minutes	Jianqiang Ren et.al.	2312.15430	null
2023-12-23	Prompt-Propose-Verify: A Reliable Hand-Object-Interaction Data Generation Framework using Foundational Models	Gurusha Juneja et.al.	2312.15247	null
2023-12-22	Generative AI and the History of Architecture	Joern Ploennigs et.al.	2312.15106	null
2023-12-22	Emage: Non-Autoregressive Text-to-Image Generation	Zhangyin Feng et.al.	2312.14988	null
2023-12-22	VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation	Max Ku et.al.	2312.14867	link
2023-12-22	Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks	Haz Sameen Shahgir et.al.	2312.14440	link
2023-12-21	Fine-grained Forecasting Models Via Gaussian Process Blurring Effect	Sepideh Koohfar et.al.	2312.14280	link
2023-12-21	VCoder: Versatile Vision Encoders for Multimodal Large Language Models	Jitesh Jain et.al.	2312.14233	link
2023-12-21	Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation	Nina Weng et.al.	2312.14223	link
2023-12-21	VideoPoet: A Large Language Model for Zero-Shot Video Generation	Dan Kondratyuk et.al.	2312.14125	null
2023-12-21	Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models	Huan Ling et.al.	2312.13763	null
2023-12-21	DreamTuner: Single Image is Enough for Subject-Driven Generation	Miao Hua et.al.	2312.13691	null
2023-12-21	Free-Editor: Zero-shot Text-driven 3D Scene Editing	Nazmul Karim et.al.	2312.13663	link
2023-12-21	Diff-Oracle: Diffusion Model for Oracle Character Generation with Controllable Styles and Contents	Jing Li et.al.	2312.13631	null
2023-12-21	Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting	Junwu Zhang et.al.	2312.13271	link
2023-12-20	Conditional Image Generation with Pretrained Generative Model	Rajesh Shrestha et.al.	2312.13253	null
2023-12-21	Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation	Hongtao Wu et.al.	2312.13139	null
2023-12-20	Quantifying Bias in Text-to-Image Generative Models	Jordan Vice et.al.	2312.13053	null
2023-12-20	A self-attention-based differentially private tabular GAN with high data utility	Zijian Li et.al.	2312.13031	null
2023-12-20	All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models	Seunghoo Hong et.al.	2312.12807	null
2023-12-19	Surf-CDM: Score-Based Surface Cold-Diffusion Model For Medical Image Segmentation	Fahim Ahmed Zaman et.al.	2312.12649	null
2023-12-19	RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing	Shutong Jin et.al.	2312.12635	null
2023-12-19	On Inference Stability for Diffusion Models	Viet Nguyen et.al.	2312.12431	link
2023-12-19	Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models	Shweta Mahajan et.al.	2312.12416	null
2023-12-19	Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model	Lingjun Zhang et.al.	2312.12232	link
2023-12-19	Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method	Jiachun Pan et.al.	2312.12030	link
2023-12-19	Decoupled Textual Embeddings for Customized Image Generation	Yufei Cai et.al.	2312.11826	link
2023-12-18	Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics	Yesukhei Jagvaral et.al.	2312.11707	null
2023-12-18	SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing	Zeyinzi Jiang et.al.	2312.11392	link
2023-12-18	Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model	Decheng Liu et.al.	2312.11285	link
2023-12-18	MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising	Bingyuan Wang et.al.	2312.10899	null
2023-12-18	The Right Losses for the Right Gains: Improving the Semantic Consistency of Deep Text-to-Image Generation with Distribution-Sensitive Losses	Mahmoud Ahmed et.al.	2312.10854	null
2023-12-17	VidToMe: Video Token Merging for Zero-Shot Video Editing	Xirui Li et.al.	2312.10656	link
2023-12-17	Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability	Jaehui Hwang et.al.	2312.10634	null
2023-12-16	Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection	Chuangchuang Tan et.al.	2312.10461	link
2023-12-16	DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content	Wentao Wang et.al.	2312.10407	link
2023-12-16	Fusing Conditional Submodular GAN and Programmatic Weak Supervision	Kumar Shubham et.al.	2312.10366	link
2023-12-16	Operator-learning-inspired Modeling of Neural Ordinary Differential Equations	Woojin Cho et.al.	2312.10274	null
2023-12-15	Rich Human Feedback for Text-to-Image Generation	Youwei Liang et.al.	2312.10240	link
2023-12-15	Data-Efficient Multimodal Fusion on a Single GPU	Noël Vouitsis et.al.	2312.10144	link
2023-12-15	Data and Approaches for German Text simplification – towards an Accessibility-enhanced Communication	Thorben Schomacker et.al.	2312.09966	null
2023-12-14	High-Resolution Maps of Left Atrial Displacements and Strains Estimated with 3D CINE MRI and Unsupervised Neural Networks	Christoforos Galazis et.al.	2312.09387	link
2023-12-14	ArchiGuesser – AI Art Architecture Educational Game	Joern Ploennigs et.al.	2312.09334	link
2023-12-14	LIME: Localized Image Editing via Attention Regularization in Diffusion Models	Enis Simsar et.al.	2312.09256	null
2023-12-14	FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection	Hongsuk Choi et.al.	2312.09252	null
2023-12-14	VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation	Jinguo Zhu et.al.	2312.09251	link
2023-12-14	Fast Sampling via De-randomization for Discrete Diffusion Models	Zixiang Chen et.al.	2312.09193	null
2023-12-14	VideoLCM: Video Latent Consistency Model	Xiang Wang et.al.	2312.09109	null
2023-12-13	SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance	Yuanyou Xu et.al.	2312.08889	null
2023-12-14	Agent Attention: On the Integration of Softmax and Linear Attention	Dongchen Han et.al.	2312.08874	link
2023-12-14	Local Conditional Controlling for Text-to-Image Diffusion Models	Yibo Zhao et.al.	2312.08768	link
2023-12-13	A Survey of Generative AI for Intelligent Transportation Systems	Huan Yan et.al.	2312.08248	null
2023-12-13	Black-box Membership Inference Attacks against Fine-tuned Diffusion Models	Yan Pang et.al.	2312.08207	link
2023-12-13	$ρ$-Diffusion: A diffusion-based density estimation framework for computational physics	Maxwell X. Cai et.al.	2312.08153	link
2023-12-13	Clockwork Diffusion: Efficient Generation With Model-Step Distillation	Amirhossein Habibian et.al.	2312.08128	link
2023-12-13	3DGEN: A GAN-based approach for generating novel 3D models from image data	Antoine Schnepf et.al.	2312.08094	null
2023-12-13	Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision	Shengguang Wu et.al.	2312.08056	null
2023-12-13	AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing	Zhiyuan Ma et.al.	2312.08019	link
2023-12-13	Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users	Tianxun Zhou et.al.	2312.07854	null
2023-12-13	Stable Rivers: A Case Study in the Application of Text-to-Image Generative Models for Earth Sciences	C Kupferschmidt et.al.	2312.07833	null
2023-12-12	FreeInit: Bridging Initialization Gap in Video Diffusion Models	Tianxing Wu et.al.	2312.07537	link
2023-12-12	FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition	Sicheng Mo et.al.	2312.07536	null
2023-12-12	PEEKABOO: Interactive Video Generation via Masked-Diffusion	Yash Jain et.al.	2312.07509	link
2023-12-12	How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation	Zhongyi Han et.al.	2312.07424	link
2023-12-12	DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing	Kaiwen Zhang et.al.	2312.07409	null
2023-12-12	Learned representation-guided diffusion models for large-image generation	Alexandros Graikos et.al.	2312.07330	link
2023-12-12	Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description	Mianzhi Pan et.al.	2312.07294	null
2023-12-12	Image Content Generation with Causal Reasoning	Xiaochuan Li et.al.	2312.07132	link
2023-12-12	Divide-and-Conquer Attack: Harnessing the Power of LLM to Bypass the Censorship of Text-to-Image Generation Model	Yimo Deng et.al.	2312.07130	link
2023-12-11	User Friendly and Adaptable Discriminative AI: Using the Lessons from the Success of LLMs and Image Generation Models	Son The Nguyen et.al.	2312.06826	null
2023-12-11	Photorealistic Video Generation with Diffusion Models	Agrim Gupta et.al.	2312.06662	null
2023-12-11	ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models	Denis Zavadski et.al.	2312.06573	link
2023-12-11	PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization	Xu Peng et.al.	2312.06354	null
2023-12-11	Compensation Sampling for Improved Convergence in Diffusion Models	Hui Lu et.al.	2312.06285	link
2023-12-11	UIEDP: Underwater Image Enhancement with Diffusion Prior	Dazhao Du et.al.	2312.06240	link
2023-12-11	Invariant Representation Learning via Decoupling Style and Spurious Features	Ruimeng Li et.al.	2312.06226	null
2023-12-11	Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods	Panos Achlioptas et.al.	2312.06116	null
2023-12-10	Correcting Diffusion Generation through Resampling	Yujian Liu et.al.	2312.06038	link
2023-12-10	Disentangled Representation Learning for Controllable Person Image Generation	Wenju Xu et.al.	2312.05798	null
2023-12-10	AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model	Teng Hu et.al.	2312.05767	link
2023-12-08	SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation	Thuan Hoang Nguyen et.al.	2312.05239	link
2023-12-08	DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models	Mengyang Feng et.al.	2312.05107	null
2023-12-08	SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control	Jaskirat Singh et.al.	2312.05039	null
2023-12-08	Synthesizing Traffic Datasets using Graph Neural Networks	Daniel Rodriguez-Criado et.al.	2312.05031	link
2023-12-08	UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models	Yiming Zhao et.al.	2312.04884	link
2023-12-08	MVDD: Multi-View Depth Diffusion Models	Zhen Wang et.al.	2312.04875	null
2023-12-08	RS-Corrector: Correcting the Racial Stereotypes in Latent Diffusion Models	Yue Jiang et.al.	2312.04810	null
2023-12-07	ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations	Maitreya Patel et.al.	2312.04655	null
2023-12-07	Autoencoding Labeled Interpolator, Inferring Parameters From Image, And Image From Parameters	Ali SaraerToosi et.al.	2312.04640	null
2023-12-07	Scaling Laws of Synthetic Images for Model Training … for Now	Lijie Fan et.al.	2312.04567	link
2023-12-07	Gen2Det: Generate to Detect	Saksham Suri et.al.	2312.04566	null
2023-12-07	GenDeF: Learning Generative Deformation Field for Video Generation	Wen Wang et.al.	2312.04561	null
2023-12-07	GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation	Shoufa Chen et.al.	2312.04557	null
2023-12-07	Generating Illustrated Instructions	Sachit Menon et.al.	2312.04552	link
2023-12-07	Free3D: Consistent Novel View Synthesis without 3D Representation	Chuanxia Zheng et.al.	2312.04551	link
2023-12-07	Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation	Zhiwu Qing et.al.	2312.04483	link
2023-12-07	PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding	Zhen Li et.al.	2312.04461	link
2023-12-07	DreamVideo: Composing Your Dream Videos with Customized Subject and Motion	Yujie Wei et.al.	2312.04433	link
2023-12-07	Approximate Caching for Efficiently Serving Diffusion Models	Shubham Agarwal et.al.	2312.04429	null
2023-12-06	Self-conditioned Image Generation via Generating Representations	Tianhong Li et.al.	2312.03701	link
2023-12-06	Memory Triggers: Unveiling Memorization in Text-To-Image Generative Models through Word-Level Duplication	Ali Naseh et.al.	2312.03692	null
2023-12-06	MotionCtrl: A Unified and Flexible Motion Controller for Video Generation	Zhouxia Wang et.al.	2312.03641	link
2023-12-06	TokenCompose: Grounding Diffusion with Token-level Supervision	Zirui Wang et.al.	2312.03626	link
2023-12-06	DiffusionSat: A Generative Foundation Model for Satellite Imagery	Samar Khanna et.al.	2312.03606	null
2023-12-06	Context Diffusion: In-Context Aware Image Generation	Ivona Najdenkoska et.al.	2312.03584	null
2023-12-06	FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation	Olivia Markham et.al.	2312.03540	null
2023-12-06	FRDiff: Feature Reuse for Exquisite Zero-shot Acceleration of Diffusion Models	Junhyuk So et.al.	2312.03517	null
2023-12-06	Kandinsky 3.0 Technical Report	Vladimir Arkhipkin et.al.	2312.03511	link
2023-12-06	Data-driven Crop Growth Simulation on Time-varying Generated Images using Multi-conditional Generative Adversarial Networks	Lukas Drees et.al.	2312.03443	link
2023-12-05	GPT4Point: A Unified Framework for Point-Language Understanding and Generation	Zhangyang Qi et.al.	2312.02980	null
2023-12-05	MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures	Zhangyang Xiong et.al.	2312.02963	null
2023-12-05	WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation	Jiachen Lu et.al.	2312.02934	link
2023-12-05	LivePhoto: Real Image Animation with Text-guided Motion Control	Xi Chen et.al.	2312.02928	null
2023-12-05	Fine-grained Controllable Video Generation via Object Appearance and Context	Hsin-Ping Huang et.al.	2312.02919	null
2023-12-05	BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models	Fengyuan Shi et.al.	2312.02813	link
2023-12-05	Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler	Philippe Gonzalez et.al.	2312.02683	null
2023-12-05	FaceStudio: Put Your Face Everywhere in Seconds	Yuxuan Yan et.al.	2312.02663	null
2023-12-05	GeNIe: Generative Hard Negative Images Through Diffusion	Soroush Abbasi Koohpayegani et.al.	2312.02548	link
2023-12-05	Retrieving Conditions from Reference Images for Diffusion Models	Haoran Tang et.al.	2312.02521	null
2023-12-04	Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation	Bingxin Ke et.al.	2312.02145	link
2023-12-04	DiffiT: Diffusion Vision Transformers for Image Generation	Ali Hatamizadeh et.al.	2312.02139	link
2023-12-04	Style Aligned Image Generation via Shared Attention	Amir Hertz et.al.	2312.02133	link
2023-12-04	GIVT: Generative Infinite-Vocabulary Transformers	Michael Tschannen et.al.	2312.02116	link
2023-12-04	UniGS: Unified Representation for Image Generation and Segmentation	Lu Qi et.al.	2312.01985	link
2023-12-04	InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models	Xunguang Wang et.al.	2312.01886	link
2023-12-04	Fully Spiking Denoising Diffusion Implicit Models	Ryo Watanabe et.al.	2312.01742	link
2023-12-04	ResEnsemble-DDPM: Residual Denoising Diffusion Probabilistic Models for Ensemble Learning	Shi Zhenning et.al.	2312.01682	null
2023-12-03	Diffusion Posterior Sampling for Nonlinear CT Reconstruction	Shudong Li et.al.	2312.01464	null
2023-12-03	Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models	Shengqu Cai et.al.	2312.01409	null
2023-12-01	VideoBooth: Diffusion-based Video Generation with Image Prompts	Yuming Jiang et.al.	2312.00777	null
2023-12-01	StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter	Gongye Liu et.al.	2312.00330	link
2023-11-30	S2ST: Image-to-Image Translation in the Seed Space of Latent Diffusion	Or Greenberg et.al.	2312.00116	null
2023-11-30	VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models	Zhen Xing et.al.	2311.18837	null
2023-11-30	ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models	Wenming Weng et.al.	2311.18834	null
2023-11-30	MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation	Yanhui Wang et.al.	2311.18829	null
2023-11-30	One-step Diffusion with Distribution Matching Distillation	Tianwei Yin et.al.	2311.18828	null
2023-11-30	ElasticDiffusion: Training-free Arbitrary Size Image Generation	Moayed Haji-Ali et.al.	2311.18822	link
2023-11-30	CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation	Zineng Tang et.al.	2311.18775	null
2023-11-30	CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model	Jianhao Zeng et.al.	2311.18405	link
2023-11-30	Situating the social issues of image generation models in the model life cycle: a sociotechnical approach	Amelia Katirai et.al.	2311.18345	null
2023-11-30	Diffusion Models Without Attention	Jing Nathan Yan et.al.	2311.18257	null
2023-11-30	Few-shot Image Generation via Style Adaptation and Content Preservation	Xiaosheng He et.al.	2311.18169	null
2023-11-29	SODA: Bottleneck Diffusion Models for Representation Learning	Drew A. Hudson et.al.	2311.17901	null
2023-11-29	Analyzing and Explaining Image Classifiers via Diffusion Guidance	Maximilian Augustin et.al.	2311.17833	link
2023-11-29	BAND-2k: Banding Artifact Noticeable Database for Banding Detection and Quality Assessment	Zijian Chen et.al.	2311.17752	link
2023-11-29	Fair Text-to-Image Diffusion via Fair Mapping	Jia Li et.al.	2311.17695	null
2023-11-29	Query-Relevant Images Jailbreak Large Multi-Modal Models	Xin Liu et.al.	2311.17600	link
2023-11-29	Non-Visible Light Data Synthesis and Application: A Case Study for Synthetic Aperture Radar Imagery	Zichen Tian et.al.	2311.17486	null
2023-11-29	When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation	Xiaoming Li et.al.	2311.17461	link
2023-11-29	VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model	Haoyu Zhao et.al.	2311.17338	link
2023-11-28	Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation	Hang Li et.al.	2311.17216	null
2023-11-28	Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features	Niladri Shekhar Dutt et.al.	2311.17024	link
2023-11-28	COLE: A Hierarchical Generation Framework for Graphic Design	Peidong Jia et.al.	2311.16974	null
2023-11-28	SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models	Yuwei Guo et.al.	2311.16933	null
2023-11-28	Denoising Diffusion Probabilistic Models for Image Inpainting of Cell Distributions in the Human Brain	Jan-Oliver Kropp et.al.	2311.16821	null
2023-11-28	Panacea: Panoramic and Controllable Video Generation for Autonomous Driving	Yuqing Wen et.al.	2311.16813	null
2023-11-28	Multi-Channel Cross Modal Detection of Synthetic Face Images	M. Ibsen et.al.	2311.16773	link
2023-11-28	MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation	Sitong Su et.al.	2311.16635	null
2023-11-28	MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices	Yang Zhao et.al.	2311.16567	null
2023-11-28	Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks	Ye Lin Tun et.al.	2311.16538	link
2023-11-28	Text-Driven Image Editing via Learnable Regions	Yuanze Lin et.al.	2311.16432	link
2023-11-27	Self-correcting LLM-controlled Diffusion Models	Tsung-Han Wu et.al.	2311.16090	link
2023-11-27	ViT-Lens-2: Gateway to Omni-modal Intelligence	Weixian Lei et.al.	2311.16081	link
2023-11-27	Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion	Yuanxun Lu et.al.	2311.15980	null
2023-11-27	Tell2Design: A Dataset for Language-Guided Floor Plan Generation	Sicong Leng et.al.	2311.15941	link
2023-11-27	Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation	Siteng Huang et.al.	2311.15841	null
2023-11-27	FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax	Yu Lu et.al.	2311.15813	null
2023-11-27	C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing	Avigyan Bhattacharya et.al.	2311.15812	null
2023-11-27	Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation	Biao Gong et.al.	2311.15773	null
2023-11-27	Reinforcement Learning from Diffusion Feedback: Q* for Image Search	Aboli Marathe et.al.	2311.15648	null
2023-11-27	ET3D: Efficient Text-to-3D Generation via Multi-View Distillation	Yiming Chen et.al.	2311.15561	null
2023-11-24	CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization	Ruoyu Zhao et.al.	2311.14631	null
2023-11-24	MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation	Zhiqi Li et.al.	2311.14494	link
2023-11-24	Decouple Content and Motion for Conditional Image-to-Video Generation	Cuifeng Shen et.al.	2311.14294	null
2023-11-24	Paragraph-to-Image Generation with Information-Enriched Diffusion Model	Weijia Wu et.al.	2311.14284	link
2023-11-24	Image Super-Resolution with Text Prompt Diffusion	Zheng Chen et.al.	2311.14282	link
2023-11-23	ACT: Adversarial Consistency Models	Fei Kong et.al.	2311.14097	link
2023-11-22	The Challenges of Image Generation Models in Generating Multi-Component Images	Tham Yik Foong et.al.	2311.13620	null
2023-11-22	Guided Flows for Generative Modeling and Decision Making	Qinqing Zheng et.al.	2311.13443	null
2023-11-23	LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes	Jaeyoung Chung et.al.	2311.13384	null
2023-11-22	Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models	Mengyang Feng et.al.	2311.13141	link
2023-11-22	FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline	Vladimir Arkhipkin et.al.	2311.13073	link
2023-11-21	GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning	Jiaxi Lv et.al.	2311.12631	null
2023-11-20	NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation	Shachar Rosenman et.al.	2311.12229	link
2023-11-20	Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models	Rohit Gandikota et.al.	2311.12092	link
2023-11-20	Advancing Urban Renewal: An Automated Approach to Generating Historical Arcade Facades with Stable Diffusion Models	Zheyuan Kuang et.al.	2311.11590	null
2023-11-19	Data efficient protein backmapping with backbone-to-side chain transformers	Shriram Chennakesavalu et.al.	2311.11459	link
2023-11-19	DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model	Zhenghao Pan et.al.	2311.11417	link
2023-11-19	A Survey of Emerging Applications of Diffusion Probabilistic Models in MRI	Yuheng Fan et.al.	2311.11383	null
2023-11-19	MoVideo: Motion-Aware Video Generation with Diffusion Models	Jingyun Liang et.al.	2311.11325	null
2023-11-19	AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort	Wen Wang et.al.	2311.11243	link
2023-11-19	GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise	Xinhai Li et.al.	2311.11221	null
2023-11-18	Mitigating Exposure Bias in Discriminator Guided Diffusion Models	Eleftherios Tsonis et.al.	2311.11164	null
2023-11-18	User-Centric Interactive AI for Distributed Diffusion Model-based AI-Generated Content	Hongyang Du et.al.	2311.11094	null
2023-11-18	Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models	Xuefeng Gao et.al.	2311.11003	null
2023-11-17	Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning	Rohit Girdhar et.al.	2311.10709	null
2023-11-17	SelfEval: Leveraging the discriminative nature of generative models for evaluation	Sai Saketh Rambhatla et.al.	2311.10708	null
2023-11-17	Enhancing Object Coherence in Layout-to-Image Synthesis	Yibin Wang et.al.	2311.10522	link
2023-11-17	End-to-end autoencoding architecture for the simultaneous generation of medical images and corresponding segmentation masks	Aghiles Kebaili et.al.	2311.10472	null
2023-11-17	High-fidelity Person-centric Subject-to-Image Synthesis	Yibin Wang et.al.	2311.10329	link
2023-11-16	K-space Cold Diffusion: Learning to Reconstruct Accelerated MRI without Noise	Guoyao Shen et.al.	2311.10162	link
2023-11-16	The Chosen One: Consistent Characters in Text-to-Image Diffusion Models	Omri Avrahami et.al.	2311.10093	null
2023-11-16	MAM-E: Mammographic synthetic image generation with diffusion models	Ricardo Montoya-del-Angel et.al.	2311.09822	link
2023-11-16	DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics	Aniket Roy et.al.	2311.09753	null
2023-11-14	UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs	Yanwu Xu et.al.	2311.09257	link
2023-11-14	Finding AI-Generated Faces in the Wild	Gonzalo J. Aniano Porcile et.al.	2311.08577	null
2023-11-14	Peer is Your Pillar: A Data-unbalanced Conditional GANs for Few-shot Image Generation	Ziqiang Li et.al.	2311.08217	null
2023-11-14	Diffusion-based generation of Histopathological Whole Slide Images at a Gigapixel scale	Robert Harb et.al.	2311.08199	null
2023-11-14	One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion	Minghua Liu et.al.	2311.07885	null
2023-11-13	The Impact of Generative Artificial Intelligence	Kaichen Zhang et.al.	2311.07071	null
2023-11-12	IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models	Zhaoyuan Yang et.al.	2311.06792	link
2023-11-12	ChatAnything: Facetime Chat with LLM-Enhanced Personas	Yilin Zhao et.al.	2311.06772	null
2023-11-12	BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis	Tingfeng Cao et.al.	2311.06752	null
2023-11-12	How do Minimum-Norm Shallow Denoisers Look in Function Space?	Chen Zeno et.al.	2311.06748	null
2023-11-11	Generative AI for Space-Air-Ground Integrated Networks (SAGIN)	Ruichen Zhang et.al.	2311.06523	null
2023-11-10	A Survey of AI Text-to-Image and AI Text-to-Video Generators	Aditi Singh et.al.	2311.06329	null
2023-11-09	LCM-LoRA: A Universal Stable-Diffusion Acceleration Module	Simian Luo et.al.	2311.05556	link
2023-11-09	L-WaveBlock: A Novel Feature Extractor Leveraging Wavelets for Generative Adversarial Networks	Mirat Shah et.al.	2311.05548	null
2023-11-09	ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors	Jingwen Chen et.al.	2311.05463	null
2023-11-09	ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image	Senthil Purushwalkam et.al.	2311.05230	null
2023-11-08	Image-Based Virtual Try-On: A Survey	Dan Song et.al.	2311.04811	link
2023-11-07	Energy-based Calibrated VAE with Test Time Free Lunch	Yihong Luo et.al.	2311.04071	link
2023-11-07	MeVGAN: GAN-based Plugin Model for Video Generation with Applications in Colonoscopy	Łukasz Struski et.al.	2311.03884	null
2023-11-07	SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation	Iman Abbasnejad et.al.	2311.03866	null
2023-11-07	Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models	Shengzhe Zhou et.al.	2311.03830	link
2023-11-07	CapST: An Enhanced and Lightweight Method for Deepfake Video Classification	Wasim Ahmad et.al.	2311.03782	link
2023-11-07	LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators	Allen Roush et.al.	2311.03716	null
2023-11-07	Image Generation and Learning Strategy for Deep Document Forgery Detection	Yamato Okamoto et.al.	2311.03650	null
2023-11-06	SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis	Hanrong Ye et.al.	2311.03355	null
2023-11-06	Cross-Image Attention for Zero-Shot Appearance Transfer	Yuval Alaluf et.al.	2311.03335	null
2023-11-04	From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models	Zhuoshi Pan et.al.	2311.02373	link
2023-11-04	Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting	Hao Ai et.al.	2311.02343	link
2023-11-03	PRISM: Progressive Restoration for Scene Graph-based Image Manipulation	Pavel Jahoda et.al.	2311.02247	null
2023-11-06	RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches	Jiayuan Gu et.al.	2311.01977	null
2023-11-03	FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation	Yuanxin Liu et.al.	2311.01813	link
2023-11-02	Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation	Hadrien Reynaud et.al.	2311.01567	null
2023-11-02	VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning	Hong Chen et.al.	2311.00990	null
2023-11-02	Optimal Noise pursuit for Augmenting Text-to-Video Generation	Shijie Ma et.al.	2311.00949	null
2023-11-02	The Age of Generative AI and AI-Generated Everything	Hongyang Du et.al.	2311.00947	null
2023-11-02	Gaussian Mixture Solvers for Diffusion Models	Hanzhong Guo et.al.	2311.00941	link
2023-11-02	Towards High-quality HDR Deghosting with Conditional Diffusion Models	Qingsen Yan et.al.	2311.00932	null
2023-11-01	LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing	Wei-Ge Chen et.al.	2311.00571	null
2023-11-01	fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding	Xuelin Qian et.al.	2311.00342	null
2023-11-01	Flooding Regularization for Stable Training of Generative Adversarial Networks	Iu Yahiro et.al.	2311.00318	null
2023-10-31	Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion	David Marwood et.al.	2311.00056	null
2023-10-31	SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction	Xinyuan Chen et.al.	2310.20700	null
2023-10-31	HWD: A Novel Evaluation Score for Styled Handwritten Text Generation	Vittorio Pippi et.al.	2310.20316	link
2023-10-31	Machine learning refinement of in situ images acquired by low electron dose LC-TEM	Hiroyasu Katsuno et.al.	2310.20279	null
2023-10-31	Beyond U: Making Diffusion Models Faster & Lighter	Sergio Calvo-Ordonez et.al.	2310.20092	null
2023-10-30	‘Person’ == Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion	Sourojit Ghosh et.al.	2310.19981	null
2023-10-30	CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models	Ziyang Yuan et.al.	2310.19784	null
2023-10-30	Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities	Zhengliang Liu et.al.	2310.19626	null
2023-10-30	VideoCrafter1: Open Diffusion Models for High-Quality Video Generation	Haoxin Chen et.al.	2310.19512	link
2023-10-30	Few-shot Hybrid Domain Adaptation of Image Generators	Hengjia Li et.al.	2310.19378	link
2023-10-30	On Measuring Fairness in Generative Models	Christopher T. H. Teo et.al.	2310.19297	null
2023-10-29	FPGAN-Control: A Controllable Fingerprint Generator for Training with Synthetic Data	Alon Shoshan et.al.	2310.19024	link
2023-10-30	Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation	Jaemin Cho et.al.	2310.18235	null
2023-10-27	Hyper-Skin: A Hyperspectral Dataset for Reconstructing Facial Skin-Spectra from RGB Images	Pai Chet Ng et.al.	2310.17911	link
2023-10-27	One Style is All you Need to Generate a Video	Sandeep Manandhar et.al.	2310.17835	link
2023-10-26	DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation	Yongxin Zhu et.al.	2310.17570	null
2023-10-26	AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors	You-Ming Chang et.al.	2310.17419	link
2023-10-26	Exploring the Potential of Generative AI for the World Wide Web	Nouar AlDahoul et.al.	2310.17370	null
2023-10-26	Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics	Shuai Yang et.al.	2310.17316	link
2023-10-26	Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise	Zhenkai Zhang et.al.	2310.17167	null
2023-10-25	Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs	Jianwei Fei et.al.	2310.16919	null
2023-10-25	CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images	Aaron Gokaslan et.al.	2310.16825	link
2023-10-25	Interferometric Neural Networks	Arun Sehrawat et.al.	2310.16742	link
2023-10-25	Local Statistics for Generative Image Detection	Yung Jer Wong et.al.	2310.16684	null
2023-10-25	A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation	Eyal Segalis et.al.	2310.16656	null
2023-10-25	Adapt Anything: Tailor Any Image Classifiers across Domains And Categories Using Text-to-Image Diffusion Models	Weijie Chen et.al.	2310.16573	null
2023-10-25	Learning Robust Deep Visual Representations from EEG Brain Recordings	Prajwal Singh et.al.	2310.16532	link
2023-10-24	Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights	Alokendu Mazumder et.al.	2310.16194	link
2023-10-24	Complex Image Generation SwinTransformer Network for Audio Denoising	Youshan Zhang et.al.	2310.16109	link
2023-10-24	RePoseDM: Recurrent Pose Alignment and Gradient Guidance for Pose Guided Image Synthesis	Anant Khandelwal et.al.	2310.16074	null
2023-10-24	CVPR 2023 Text Guided Video Editing Competition	Jay Zhangjie Wu et.al.	2310.16003	link
2023-10-23	Fast Forward Modelling of Galaxy Spatial and Statistical Distributions	Pascale Berner et.al.	2310.15223	null
2023-10-23	FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling	Haonan Qiu et.al.	2310.15169	link
2023-10-23	DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design	Kevin Lin et.al.	2310.15144	link
2023-10-23	Matryoshka Diffusion Models	Jiatao Gu et.al.	2310.15111	link
2023-10-23	ESVAE: An Efficient Spiking Variational Autoencoder with Reparameterizable Poisson Spiking Sampling	Qiugang Zhan et.al.	2310.14839	link
2023-10-23	Large Language Models can Share Images, Too!	Young-Jun Lee et.al.	2310.14804	link
2023-10-22	A Pytorch Reproduction of Masked Generative Image Transformer	Victor Besnier et.al.	2310.14400	link
2023-10-21	Adversarial Image Generation by Spatial Transformation in Perceptual Colorspaces	Ayberk Aydin et.al.	2310.13950	link
2023-10-20	Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models	Shawn Shan et.al.	2310.13828	null
2023-10-20	Localizing and Editing Knowledge in Text-to-Image Generative Models	Samyadeep Basu et.al.	2310.13730	null
2023-10-20	Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation	Wenyu Guo et.al.	2310.13361	link
2023-10-20	DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics	Kaiwen Zheng et.al.	2310.13268	link
2023-10-19	Conditional Generative Modeling for Images, 3D Animations, and Video	Vikram Voleti et.al.	2310.13157	null
2023-10-19	Particle Guidance: non-I.I.D. Diverse Sampling with Diffusion Models	Gabriele Corso et.al.	2310.13102	link
2023-10-19	Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning	Amey Pasarkar et.al.	2310.12952	link
2023-10-19	STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models	Belhal Karimi et.al.	2310.12667	null
2023-10-19	PrivacyGAN: robust generative image privacy	Mariia Zameshina et.al.	2310.12590	null
2023-10-19	Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation	Mariia Zameshina et.al.	2310.12583	null
2023-10-19	Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping	Zijie Pan et.al.	2310.12474	link
2023-10-18	An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning	Chen Jin et.al.	2310.12274	link
2023-10-18	Quality Diversity through Human Feedback	Li Ding et.al.	2310.12103	link
2023-10-20	Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach	Feng Luo et.al.	2310.12004	link
2023-10-17	GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment	Dhruba Ghosh et.al.	2310.11513	link
2023-10-18	EvalCrafter: Benchmarking and Evaluating Large Video Generation Models	Yaofang Liu et.al.	2310.11440	link
2023-10-17	Elucidating The Design Space of Classifier-Guided Diffusion Generation	Jiajun Ma et.al.	2310.11311	link
2023-10-17	BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference	Siqi Kou et.al.	2310.11142	link
2023-10-16	LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation	Ruiqi Wu et.al.	2310.10769	link
2023-10-18	BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys	Yu Gu et.al.	2310.10765	null
2023-10-16	A Survey on Video Diffusion Models	Zhen Xing et.al.	2310.10647	link
2023-10-16	LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts	Hanan Gani et.al.	2310.10640	link
2023-10-16	ViPE: Visualise Pretty-much Everything	Hassan Shahmohammadi et.al.	2310.10543	link
2023-10-16	ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion	Jiayu Yang et.al.	2310.10343	link
2023-10-16	Scene Graph Conditioning in Latent Diffusion	Frank Fundel et.al.	2310.10338	link
2023-10-16	Evading Detection Actively: Toward Anti-Forensics against Forgery Localization	Long Zhuo et.al.	2310.10036	null
2023-10-15	Segment Anything Model for Pedestrian Infrastructure Inventory: Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data	Jiahao Xia et.al.	2310.09918	null
2023-10-14	Unified High-binding Watermark for Unconditional Image Generation Models	Ruinan Ma et.al.	2310.09479	null
2023-10-13	Making Multimodal Generation Easier: When Diffusion Models Meet LLMs	Xiangyu Zhao et.al.	2310.08949	link
2023-10-13	R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation	Jiayu Xiao et.al.	2310.08872	null
2023-10-12	SSG2: A new modelling paradigm for semantic segmentation	Foivos I. Diakogiannis et.al.	2310.08671	link
2023-10-12	HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion	Xian Liu et.al.	2310.08579	null
2023-10-12	MotionDirector: Motion Customization of Text-to-Video Diffusion Models	Rui Zhao et.al.	2310.08465	link
2023-10-12	Neural Diffusion Models	Grigory Bartosh et.al.	2310.08337	null
2023-10-12	Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting	Zijie Chen et.al.	2310.08129	link
2023-10-12	SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing	Zijie Wu et.al.	2310.08094	null
2023-10-12	CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity	Abdullah Hayajneh et.al.	2310.07969	link
2023-10-13	Generative Modeling with Phase Stochastic Bridges	Tianrong Chen et.al.	2310.07805	link
2023-10-11	DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model	Xiaofan Li et.al.	2310.07771	link
2023-10-11	ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models	Yingqing He et.al.	2310.07702	link
2023-10-11	ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation	Bo Peng et.al.	2310.07697	link
2023-10-11	Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models	Lai Zeqiang et.al.	2310.07653	link
2023-10-11	Distance-based Weighted Transformer Network for Image Completion	Pourya Shamsolmoali et.al.	2310.07440	null
2023-10-11	Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else	Hazarapet Tunanyan et.al.	2310.07419	null
2023-10-11	Crowd Counting in Harsh Weather using Image Denoising with Pix2Pix GANs	Muhammad Asif Khan et.al.	2310.07245	null
2023-10-11	Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model	Shiyuan Yang et.al.	2310.07222	link
2023-10-11	Echocardiography video synthesis from end diastolic semantic map via diffusion model	Phi Nguyen Van et.al.	2310.07131	null
2023-10-10	Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images	Che Liu et.al.	2310.07027	link
2023-10-10	ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning	Alec Helbling et.al.	2310.06968	null
2023-10-10	Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling	Huangjie Zheng et.al.	2310.06389	link
2023-10-10	JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling	Jingyang Zhang et.al.	2310.06347	null
2023-10-10	Improving Compositional Text-to-image Generation with Large Vision-Language Models	Song Wen et.al.	2310.06311	null
2023-10-09	Latent Diffusion Model for DNA Sequence Generation	Zehui Li et.al.	2310.06150	link
2023-10-09	A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models	Sebastian G. Gruber et.al.	2310.05833	link
2023-10-09	Language Model Beats Diffusion – Tokenizer is Key to Visual Generation	Lijun Yu et.al.	2310.05737	link
2023-10-09	Locality-Aware Generalizable Implicit Neural Representation	Doyup Lee et.al.	2310.05624	null
2023-10-09	Adaptive Multi-head Contrastive Learning	Lei Wang et.al.	2310.05615	link
2023-10-09	A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers	Matteo Bastico et.al.	2310.05572	link
2023-10-09	Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers	Shiyue Cao et.al.	2310.05400	null
2023-10-08	The Emergence of Reproducibility and Consistency in Diffusion Models	Huijie Zhang et.al.	2310.05264	null
2023-10-07	Generative AI May Prefer to Present National-level Characteristics of Cities Based on Stereotypical Geographic Impressions at the Continental Level	Shan Ye et.al.	2310.04897	null
2023-10-07	Understanding and Improving Adversarial Attacks on Latent Diffusion Model	Boyang Zheng et.al.	2310.04687	link
2023-10-07	X-Transfer: A Transfer Learning-Based Framework for Robust GAN-Generated Fake Image Detection	Lei Zhang et.al.	2310.04639	null
2023-10-06	Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference	Simian Luo et.al.	2310.04378	link
2023-10-06	Assessing Robustness via Score-Based Adversarial Image Generation	Marcel Kollovieh et.al.	2310.04285	null
2023-10-05	Aligning Text-to-Image Diffusion Models with Reward Backpropagation	Mihir Prabhudesai et.al.	2310.03739	link
2023-10-05	Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency	Tianhong Li et.al.	2310.03734	null
2023-10-06	MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images	Yanwu Xu et.al.	2310.03559	link
2023-10-05	Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion	Anton Razzhigaev et.al.	2310.03502	link
2023-10-04	Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel	Paul Hagemann et.al.	2310.03054	link
2023-10-04	Kosmos-G: Generating Images in Context with Multimodal Large Language Models	Xichen Pan et.al.	2310.02992	link
2023-10-04	GETAvatar: Generative Textured Meshes for Animatable Human Avatars	Xuanmeng Zhang et.al.	2310.02714	null
2023-10-04	ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF	Jangho Park et.al.	2310.02712	null
2023-10-03	GenCO: Generating Diverse Solutions to Design Problems with Combinatorial Nature	Aaron Ferber et.al.	2310.02442	null
2023-10-03	FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models	Yingqian Cui et.al.	2310.02401	null
2023-10-03	MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens	Kaizhi Zheng et.al.	2310.02239	link
2023-10-03	Optimizing microlens arrays for incoherent HiLo microscopy	Ziao Jiao et.al.	2310.01939	null
2023-10-03	Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation	Jun Li et.al.	2310.01819	null
2023-10-02	ImagenHub: Standardizing the evaluation of conditional image generation models	Max Ku et.al.	2310.01596	link
2023-10-02	Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code	Xuan Ju et.al.	2310.01506	link
2023-10-02	Conditional Diffusion Distillation	Kangfu Mei et.al.	2310.01407	link
2023-10-02	Trained Latent Space Navigation to Prevent Lack of Photorealism in Generated Images on Style-based Models	Takumi Harada et.al.	2310.00936	null
2023-10-02	Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP	Zixiang Chen et.al.	2310.00927	null
2023-10-02	RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches	Shawn Mathew et.al.	2310.00868	link
2023-10-01	Completing Visual Objects via Bridging Generation and Segmentation	Xiang Li et.al.	2310.00808	null
2023-10-02	LLM-grounded Video Diffusion Models	Long Lian et.al.	2309.17444	null
2023-09-29	Directly Fine-Tuning Diffusion Models on Differentiable Rewards	Kevin Clark et.al.	2309.17400	null
2023-09-29	Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning	Zihan Ding et.al.	2309.16984	link
2023-09-29	Leveraging Optimization for Adaptive Attacks on Image Watermarks	Nils Lukas et.al.	2309.16952	link
2023-09-29	Denoising Diffusion Bridge Models	Linqi Zhou et.al.	2309.16948	link
2023-09-28	CCEdit: Creative and Controllable Video Editing via Diffusion Models	Ruoyu Feng et.al.	2309.16496	null
2023-09-28	Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation	Guy Yariv et.al.	2309.16429	link
2023-09-28	Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning	Albert Mohwald et.al.	2309.16351	link
2023-09-28	OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions	Jin Liu et.al.	2309.16148	null
2023-09-27	Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness	Valentin Barriere et.al.	2309.15991	null
2023-09-27	Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation	David Junhao Zhang et.al.	2309.15818	link
2023-09-27	Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack	Xiaoliang Dai et.al.	2309.15807	null
2023-09-27	Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation	Xin Yuan et.al.	2309.15726	null
2023-09-27	Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing	Kai Wang et.al.	2309.15664	link
2023-09-27	Position and Orientation-Aware One-Shot Learning for Medical Action Recognition from Signal Data	Leiyu Xie et.al.	2309.15635	null
2023-09-28	Jointly Training Large Autoregressive Multimodal Models	Emanuele Aiello et.al.	2309.15564	null
2023-09-27	Teaching Text-to-Image Models to Communicate	Xiaowen Sun et.al.	2309.15516	null
2023-09-27	DreamCom: Finetuning Text-guided Inpainting Model for Image Composition	Lingxiao Lu et.al.	2309.15508	null
2023-09-27	Finite Scalar Quantization: VQ-VAE Made Simple	Fabian Mentzer et.al.	2309.15505	link
2023-09-27	LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models	Yaohui Wang et.al.	2309.15103	link
2023-09-26	Seimei KOOLS-IFU mapping of the gas and dust distributions in Galactic PNe: Unveiling the origin and evolution of Galactic halo PN H4-1	Masaaki Otsuka et.al.	2309.15099	null
2023-09-26	VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning	Han Lin et.al.	2309.15091	null
2023-09-26	Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation	Shin-Ying Yeh et.al.	2309.14859	link
2023-09-26	On quantifying and improving realism of images generated with diffusion	Yunzhuo Chen et.al.	2309.14756	null
2023-09-27	Text-to-Image Generation for Abstract Concepts	Jiayi Liao et.al.	2309.14623	null
2023-09-25	Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator	Hanzhuo Huang et.al.	2309.14494	link
2023-09-25	Chop & Learn: Recognizing and Generating Object-State Compositions	Nirat Saini et.al.	2309.14339	null
2023-09-27	Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation	Quang Nguyen et.al.	2309.14303	link
2023-09-25	Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments	Najmeh Mohammadbagheri et.al.	2309.14267	null
2023-09-25	SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution	Zhongjie Ba et.al.	2309.14122	link
2023-09-25	Diverse Semantic Image Editing with Style Codes	Hakan Sivuk et.al.	2309.13975	link
2023-09-23	GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER	Mingzhen Sun et.al.	2309.13274	link
2023-09-23	Randomize to Generalize: Domain Randomization for Runway FOD Detection	Javaria Farooq et.al.	2309.13264	null
2023-09-23	NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation	Rui Yu et.al.	2309.13240	null
2023-09-21	POLAR3D: Augmenting NASA’s POLAR Dataset for Data-Driven Lunar Perception and Rover Simulation	Bo-Hsun Chen et.al.	2309.12397	link
2023-09-21	TextCLIP: Text-Guided Face Image Generation And Manipulation Without Adversarial Training	Xiaozhou You et.al.	2309.11923	null
2023-09-21	PIE: Simulating Disease Progression via Progressive Image Editing	Kaizhao Liang et.al.	2309.11745	link
2023-09-24	Latent Diffusion Models for Structural Component Design	Ethan Herron et.al.	2309.11601	null
2023-09-20	Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge	Manuel Brack et.al.	2309.11575	null
2023-09-20	FreeU: Free Lunch in Diffusion U-Net	Chenyang Si et.al.	2309.11497	link
2023-09-20	Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation	Hyelin Nam et.al.	2309.11127	null
2023-09-21	Learning End-to-End Channel Coding with Diffusion Models	Muah Kim et.al.	2309.10505	null
2023-09-23	AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration	Lijiang Li et.al.	2309.10438	link
2023-09-19	Language Guided Adversarial Purification	Himanshu Singh et.al.	2309.10348	link
2023-09-18	Multimodal Foundation Models: From Specialists to General-Purpose Assistants	Chunyuan Li et.al.	2309.10020	link
2023-09-18	DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving	Xiaofeng Wang et.al.	2309.09777	null
2023-09-18	Gradpaint: Gradient-Guided Inpainting with Diffusion Models	Asya Grechka et.al.	2309.09614	null
2023-09-18	DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues	Kun Pan et.al.	2309.09526	link
2023-09-18	Progressive Text-to-Image Diffusion with Soft Latent Direction	YuTeng Ye et.al.	2309.09466	link
2023-09-15	Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models	Feihong He et.al.	2309.08251	null
2023-09-15	Talkin’ ‘Bout AI Generation: Copyright and the Generative-AI Supply Chain	Katherine Lee et.al.	2309.08133	null
2023-09-15	Increasing diversity of omni-directional images generated from single image using cGAN based on MLPMixer	Atsuya Nakata et.al.	2309.08129	link
2023-09-14	Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset	Iya Chivileva et.al.	2309.08009	null
2023-09-14	Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models	James Burgess et.al.	2309.07986	link
2023-09-14	ALWOD: Active Learning for Weakly-Supervised Object Detection	Yuting Wang et.al.	2309.07914	link
2023-09-13	Unbiased Face Synthesis With Diffusion Models: Are We There Yet?	Harrison Rosenberg et.al.	2309.07277	link
2023-09-13	MagiCapture: High-Resolution Multi-Concept Portrait Customization	Junha Hyung et.al.	2309.06895	null
2023-09-12	InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation	Xingchao Liu et.al.	2309.06380	link
2023-09-12	Elucidating the solution space of extended reverse-time SDE for diffusion models	Qinpeng Cui et.al.	2309.06169	link
2023-09-12	Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation	Ling Huang et.al.	2309.05919	link
2023-09-11	Divergences in Color Perception between Deep Neural Networks and Humans	Ethan O. Nadler et.al.	2309.05809	link
2023-09-11	PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models	Li Chen et.al.	2309.05793	null
2023-09-11	ITI-GEN: Inclusive Text-to-Image Generation	Cheng Zhang et.al.	2309.05569	link
2023-09-11	PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud	Chengyu Wang et.al.	2309.05534	null
2023-09-11	Comprehensive analysis of synthetic learning applied to neonatal brain MRI segmentation	R Valabregue et.al.	2309.05306	link
2023-09-10	Gender Bias in Multimodal Models: A Transnational Feminist Approach Considering Geographical Region and Culture	Abhishek Mandal et.al.	2309.04997	null
2023-09-09	Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video	Xiuzhe Wu et.al.	2309.04814	link
2023-09-08	The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion	Yujin Jeong et.al.	2309.04509	null
2023-09-08	Create Your World: Lifelong Text-to-Image Diffusion	Gan Sun et.al.	2309.04430	null
2023-09-08	MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers	Sijia Li et.al.	2309.04372	null
2023-09-08	Sequential Semantic Generative Communication for Progressive Text-to-Image Generation	Hyelin Nam et.al.	2309.04287	null
2023-09-08	Robot Localization and Mapping Final Report – Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry	Akankshya Kar et.al.	2309.04147	null
2023-09-08	From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models	Changming Xiao et.al.	2309.04109	link
2023-09-07	Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis	Jiapeng Zhu et.al.	2309.03904	link
2023-09-07	T2IW: Joint Text to Image & Watermark Generation	An-An Liu et.al.	2309.03815	null
2023-09-07	Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model	Sungwon Hwang et.al.	2309.03550	null
2023-09-07	Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation	Jiaxi Gu et.al.	2309.03549	null
2023-09-07	Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation	Zhuqiang Lu et.al.	2309.03467	link
2023-09-06	My Art My Choice: Adversarial Protection Against Unruly AI	Anthony Rhodes et.al.	2309.03198	null
2023-09-06	Hierarchical-level rain image generative model based on GAN	Zhenyuan Liu et.al.	2309.02964	null
2023-09-06	BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network	Takashi Shibuya et.al.	2309.02836	link
2023-09-06	Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter	Jinglong Wang et.al.	2309.02773	link
2023-09-05	Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning	Lili Yu et.al.	2309.02591	null
2023-09-05	Diffusion on the Probability Simplex	Griffin Floto et.al.	2309.02530	null
2023-09-05	Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface	Atieh Taheri et.al.	2309.02402	null
2023-09-05	Exchanging-based Multimodal Fusion with Transformer	Renyu Zhu et.al.	2309.02190	link
2023-09-04	Uncertainty in AI: Evaluating Deep Neural Networks on Out-of-Distribution Images	Jamiu Idowu et.al.	2309.01850	null
2023-09-04	StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation	Zhouxia Wang et.al.	2309.01770	null
2023-09-04	Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion	Ryota Yoshihashi et.al.	2309.01369	null
2023-09-04	Mutual Information Maximizing Quantum Generative Adversarial Network and Its Applications in Finance	Mingyu Lee et.al.	2309.01363	null
2023-09-03	Diffusion Models with Deterministic Normalizing Flow Priors	Mohsen Zand et.al.	2309.01274	link
2023-09-03	Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection	Weijie Wang et.al.	2309.01104	null
2023-09-02	Constrained CycleGAN for Effective Generation of Ultrasound Sector Images of Improved Spatial Resolution	Xiaofei Sun et.al.	2309.00995	link
2023-09-02	Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities	Shanyuan Liu et.al.	2309.00952	null
2023-09-01	VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation	Xin Li et.al.	2309.00398	null
2023-09-01	DiffuGen: Adaptable Approach for Generating Labeled Image Datasets using Stable Diffusion Models	Michael Shenoda et.al.	2309.00248	link
2023-09-01	Diffusion Model with Clustering-based Conditioning for Food Image Generation	Yue Han et.al.	2309.00199	null
2023-08-31	StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation	Yuhan Wang et.al.	2308.16909	link
2023-08-31	Diffusion Models for Interferometric Satellite Aperture Radar	Alexandre Tuel et.al.	2308.16847	link
2023-08-31	Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images	Cuican Yu et.al.	2308.16758	null
2023-08-31	Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps	Miguel Espinosa et.al.	2308.16648	link
2023-08-31	Detecting Out-of-Context Image-Caption Pairs in News: A Counter-Intuitive Method	Eivind Moholdt et.al.	2308.16611	link
2023-08-30	Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation	Mengping Yang et.al.	2308.16110	link
2023-08-30	Semantic Image Synthesis via Class-Adaptive Cross-Attention	Tomaso Fontanini et.al.	2308.16071	null
2023-08-30	Intriguing Properties of Diffusion Models: A Large-Scale Dataset for Evaluating Natural Attack Capability in Text-to-Image Generative Models	Takami Sato et.al.	2308.15692	null
2023-08-29	Learning Modulated Transformation in GANs	Ceyuan Yang et.al.	2308.15472	link
2023-08-29	IndGIC: Supervised Action Recognition under Low Illumination	Jingbo Zeng et.al.	2308.15345	null
2023-08-29	A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information	Shuxiao Ma et.al.	2308.15142	null
2023-08-28	Automated Conversion of Music Videos into Lyric Videos	Jiaju Ma et.al.	2308.14922	null
2023-08-28	RobustCLEVR: A Benchmark and Framework for Evaluating Robustness in Object-centric Learning	Nathan Drenkow et.al.	2308.14899	null
2023-08-28	Identifying and Mitigating the Security Risks of Generative AI	Clark Barrett et.al.	2308.14840	null
2023-08-28	MagicAvatar: Multimodal Avatar Generation and Animation	Jianfeng Zhang et.al.	2308.14748	null
2023-08-28	Causality-Based Feature Importance Quantifying Methods: PN-FI, PS-FI and PNS-FI	Shuxian Du et.al.	2308.14474	null
2023-08-28	Semi-Supervised Semantic Depth Estimation using Symbiotic Transformer and NearFarMix Augmentation	Md Awsafur Rahman et.al.	2308.14400	null
2023-08-28	FaceChain: A Playground for Identity-Preserving Portrait Generation	Yang Liu et.al.	2308.14256	link
2023-08-28	HoloFusion: Towards Photo-realistic 3D Generative Modeling	Animesh Karnewar et.al.	2308.14244	null
2023-08-27	A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy	Forough Fazeli-Asl et.al.	2308.14048	null
2023-08-26	Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models	Hao Fei et.al.	2308.13812	null
2023-08-26	ORES: Open-vocabulary Responsible Visual Synthesis	Minheng Ni et.al.	2308.13785	link
2023-08-25	Residual Denoising Diffusion Models	Jiawei Liu et.al.	2308.13712	link
2023-08-25	Is Deep Learning Network Necessary for Image Generation?	Chenqiu Zhao et.al.	2308.13612	null
2023-08-25	WorldSmith: Iterative and Expressive Prompting for World Building with a Generative AI	Hai Dang et.al.	2308.13355	null
2023-08-25	Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model	Xunpeng Yi et.al.	2308.13164	null
2023-08-25	A Survey of Diffusion Based Image Generation Models: Issues and Their Solutions	Tianyi Zhang et.al.	2308.13142	null
2023-08-24	Dense Text-to-Image Generation with Attention Modulation	Yunji Kim et.al.	2308.12964	link
2023-08-24	APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency	Yupu Yao et.al.	2308.12605	null
2023-08-23	Augmenting medical image classifiers with synthetic data from latent diffusion models	Luke W. Sagers et.al.	2308.12453	null
2023-08-23	DISGAN: Wavelet-informed Discriminator Guides GAN to MRI Super-resolution with Noise Cleaning	Qi Wang et.al.	2308.12084	link
2023-08-23	Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages	Jinyi Hu et.al.	2308.12038	link
2023-08-23	Efficient Transfer Learning in Diffusion Models via Adversarial Noise	Xiyu Wang et.al.	2308.11948	null
2023-08-23	LFS-GAN: Lifelong Few-Shot Image Generation	Juwon Seo et.al.	2308.11917	link
2023-08-23	CoC-GAN: Employing Context Cluster for Unveiling a New Pathway in Image Generation	Zihao Wang et.al.	2308.11857	null
2023-08-22	Ceci n’est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings	Eugene Bagdasaryan et.al.	2308.11804	link
2023-08-22	StoryBench: A Multifaceted Benchmark for Continuous Story Visualization	Emanuele Bugliarello et.al.	2308.11606	link
2023-08-22	Open Set Synthetic Image Source Attribution	Shengbang Fang et.al.	2308.11557	null
2023-08-22	Hamiltonian GAN	Christine Allen-Blanchette et.al.	2308.11216	null
2023-08-22	MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers	Daniel Silver et.al.	2308.11096	null
2023-08-21	Debiasing Counterfactuals In the Presence of Spurious Correlations	Amar Kumar et.al.	2308.10984	null
2023-08-21	Sampling From Autoencoders’ Latent Space via Quantization And Probability Mass Function Concepts	Aymene Mohammed Bouayed et.al.	2308.10704	null
2023-08-20	Turning Waste into Wealth: Leveraging Low-Quality Samples for Enhancing Continuous Conditional Generative Adversarial Networks	Xin Ding et.al.	2308.10273	link
2023-08-20	StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data	Yanda Li et.al.	2308.10253	link
2023-08-20	Spiking-Diffusion: Vector Quantized Discrete Diffusion Model with Spiking Neural Networks	Mingxuan Liu et.al.	2308.10187	link
2023-08-20	SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation	Chengyou Jia et.al.	2308.10156	null
2023-08-19	ASPIRE: Language-Guided Augmentation for Robust Image Classification	Sreyan Ghosh et.al.	2308.10103	link
2023-08-19	ControlCom: Controllable Image Composition using Diffusion Model	Bo Zhang et.al.	2308.10040	link
2023-08-19	ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval	Kaihang Pan et.al.	2308.10025	null
2023-08-19	DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization	Xiaoyu Ye et.al.	2308.09889	null
2023-08-18	Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis	Shijie Li et.al.	2308.09835	link
2023-08-18	SimDA: Simple Diffusion Adapter for Efficient Video Generation	Zhen Xing et.al.	2308.09710	null
2023-08-18	Guide3D: Create 3D Avatars from Text and Image Guidance	Yukang Cao et.al.	2308.09705	null
2023-08-18	DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability	Runhui Huang et.al.	2308.09306	null
2023-08-18	RFDforFin: Robust Deep Forgery Detection for GAN-generated Fingerprint Images	Hui Miao et.al.	2308.09285	null
2023-08-17	Watch Your Steps: Local Image and Scene Editing by Text Instructions	Ashkan Mirzaei et.al.	2308.08947	null
2023-08-16	Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment	Qi Chen et.al.	2308.08525	link
2023-08-16	Painter: Teaching Auto-regressive Language Models to Draw Sketches	Reza Pourreza et.al.	2308.08520	null
2023-08-16	Diff-CAPTCHA: An Image-based CAPTCHA with Security Enhanced by Denoising Diffusion Model	Ran Jiang et.al.	2308.08367	null
2023-08-16	Denoising Diffusion Probabilistic Model for Retinal Image Generation and Segmentation	Alnur Alimanov et.al.	2308.08339	link
2023-08-18	Dual-Stream Diffusion Net for Text-to-Video Generation	Binhui Liu et.al.	2308.08316	null
2023-08-16	Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis	Minho Park et.al.	2308.08157	link
2023-08-16	DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory	Shengming Yin et.al.	2308.08089	null
2023-08-15	Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training	Ximing Xing et.al.	2308.07665	link
2023-08-15	Story Visualization by Online Text Augmentation with Context Memory	Daechul Ahn et.al.	2308.07575	link
2023-08-13	Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks	David Junhao Zhang et.al.	2308.06739	null
2023-08-13	IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models	Hu Ye et.al.	2308.06721	null
2023-08-13	LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts	Binbin Yang et.al.	2308.06713	null
2023-08-12	Semantic Communications with Explicit Semantic Base for Image Transmission	Yuan Zheng et.al.	2308.06599	null
2023-08-11	White-box Membership Inference Attacks against Diffusion Models	Yan Pang et.al.	2308.06405	null
2023-08-15	DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity	Melissa Hall et.al.	2308.06198	link
2023-08-11	Improving Joint Speech-Text Representations Without Alignment	Cal Peyser et.al.	2308.06125	null
2023-08-11	Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation	Yuki Endo et.al.	2308.06027	link
2023-08-10	SAR Target Image Generation Method Using Azimuth-Controllable Generative Adversarial Network	Chenwei Wang et.al.	2308.05489	null
2023-08-10	Beyond Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization	Hongyang Du et.al.	2308.05384	link
2023-08-09	PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions	John Joon Young Chung et.al.	2308.05184	link
2023-08-12	LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation	Leigang Qu et.al.	2308.05095	null
2023-08-13	TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design	Yifan Gao et.al.	2308.04733	null
2023-08-09	GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization	Hao Fang et.al.	2308.04699	link
2023-08-08	DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images	Xuechao Zou et.al.	2308.04417	link
2023-08-08	The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings	Timothy Merino et.al.	2308.04052	link
2023-08-05	DermoSegDiff: A Boundary-aware Segmentation Diffusion Model for Skin Lesion Delineation	Afshin Bozorgpour et.al.	2308.02959	link
2023-08-05	Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation	Zijie Wu et.al.	2308.02874	null
2023-08-03	ConceptLab: Creative Generation using Diffusion Prior Constraints	Elad Richardson et.al.	2308.02669	link
2023-08-04	Towards Personalized Prompt-Model Retrieval for Generative Recommendation	Yuanhe Guo et.al.	2308.02205	link
2023-08-04	SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation	Shikun Sun et.al.	2308.02154	null
2023-08-03	Focus on Content not Noise: Improving Image Generation for Nuclei Segmentation by Suppressing Steganography in CycleGAN	Jonas Utz et.al.	2308.01769	null
2023-08-07	BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout	Kairui Yang et.al.	2308.01661	null
2023-08-03	Interleaving GANs with knowledge graphs to support design creativity for book covers	Alexandru Motogna et.al.	2308.01626	link
2023-08-03	Circumventing Concept Erasure Methods For Text-to-Image Generative Models	Minh Pham et.al.	2308.01508	link
2023-08-02	Reverse Stable Diffusion: What prompt was used to generate this image?	Florinel-Alin Croitoru et.al.	2308.01472	link
2023-08-02	Revisiting DETR Pre-training for Object Detection	Yan Ma et.al.	2308.01300	null
2023-08-02	Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation	Guojin Zhong et.al.	2308.01147	link
2023-08-01	The Bias Amplification Paradox in Text-to-Image Generation	Preethi Seshadri et.al.	2308.00755	link
2023-08-01	Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models	Cheng-Yu Hsieh et.al.	2308.00675	null
2023-08-01	Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design	Jaechang Ko et.al.	2308.00227	null
2023-08-01	SkullGAN: Synthetic Skull CT Generation with Generative Adversarial Networks	Kasra Naftchi-Ardebili et.al.	2308.00206	link
2023-07-28	Testing the Depth of ChatGPT’s Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5’s Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking	David Bayani et.al.	2307.16806	null
2023-07-31	DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation	Runyang Feng et.al.	2307.16687	null
2023-07-31	Towards General Visual-Linguistic Face Forgery Detection	Ke Sun et.al.	2307.16545	null
2023-07-31	BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models	Jordan Vice et.al.	2307.16489	link
2023-07-31	HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution	Minyi Zhao et.al.	2307.16410	null
2023-07-31	MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text	Junchen Zhu et.al.	2307.16371	null
2023-07-30	Mask-guided Data Augmentation for Multiparametric MRI Generation with a Rare Hepatocellular Carcinoma	Karen Sanchez et.al.	2307.16314	null
2023-07-30	Stylized Projected GAN: A Novel Architecture for Fast and Realistic Image Generation	Md Nurul Muttakin et.al.	2307.16275	null
2023-07-29	HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation	Zuyan Liu et.al.	2307.16061	null
2023-07-28	Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search	Alexander Chebykin et.al.	2307.15621	link
2023-07-28	RAWIW: RAW Image Watermarking Robust to ISP Pipeline	Kang Fu et.al.	2307.15443	null
2023-07-28	Staging E-Commerce Products for Online Advertising using Retrieval Assisted Image Generation	Yueh-Ning Ku et.al.	2307.15326	null
2023-07-27	Semantic Image Completion and Enhancement using GANs	Priyansh Saxena et.al.	2307.14748	null
2023-07-31	Pre-training Vision Transformers with Very Limited Synthesized Images	Ryo Nakamura et.al.	2307.14710	link
2023-07-27	LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement	Tao Wang et.al.	2307.14659	link
2023-07-27	EqGAN: Feature Equalization Fusion for Few-shot Image Generation	Yingbo Zhou et.al.	2307.14638	null
2023-07-26	Deepfake Image Generation for Improved Brain Tumor Segmentation	Roa’a Al-Emaryeen et.al.	2307.14273	null
2023-07-26	Learning Disentangled Discrete Representations	David Friede et.al.	2307.14151	link
2023-07-26	VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet	Zhihao Hu et.al.	2307.14073	null
2023-07-25	Composite Diffusion | whole >= Σparts	Vikram Jamwal et.al.	2307.13720	
2023-07-25	Fake It Without Making It: Conditioned Face Generation for Accurate 3D Face Shape Estimation	Will Rowan et.al.	2307.13639	null
2023-07-25	XDLM: Cross-lingual Diffusion Language Model for Machine Translation	Linyao Chen et.al.	2307.13560	null
2023-07-25	Not with my name! Inferring artists’ names of input strings employed by Diffusion Models	Roberto Leotta et.al.	2307.13527	link
2023-07-24	A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models	Jindong Gu et.al.	2307.12980	link
2023-07-24	Interpolating between Images with Diffusion Models	Clinton J. Wang et.al.	2307.12560	null
2023-07-22	Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis	Hao Tang et.al.	2307.12084	link
2023-07-21	PartDiff: Image Super-resolution with Partial Diffusion Models	Kai Zhao et.al.	2307.11926	null
2023-07-21	UWAT-GAN: Fundus Fluorescein Angiography Synthesis via Ultra-wide-angle Transformation Multi-scale GAN	Zhaojie Fang et.al.	2307.11530	link
2023-07-21	Attention Consistency Refined Masked Frequency Forgery Representation for Generalizing Face Forgery Detection	Decheng Liu et.al.	2307.11438	link
2023-07-21	Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning	Jian Ma et.al.	2307.11410	link
2023-07-20	Diffusion Sampling with Momentum for Mitigating Divergence Artifacts	Suttisak Wizadwongsa et.al.	2307.11118	link
2023-07-20	Progressive distillation diffusion for raw music generation	Svetlana Pavlova et.al.	2307.10994	null
2023-07-20	Divide & Bind Your Attention for Improved Generative Semantic Nursing	Yumeng Li et.al.	2307.10864	link
2023-07-20	Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation	Fa-Ting Hong et.al.	2307.09906	link
2023-07-19	Compressive Image Scanning Microscope	Ajay Gunalan et.al.	2307.09841	link
2023-07-19	A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images	Lydia Abady et.al.	2307.09822	link
2023-07-19	Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline	Zhigang Chang et.al.	2307.09821	null
2023-07-19	Text2Layer: Layered Image Generation using Latent Diffusion Model	Xinyang Zhang et.al.	2307.09781	null
2023-07-18	AnyDoor: Zero-shot Object-level Image Customization	Xi Chen et.al.	2307.09481	link
2023-07-19	Let’s ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation	Federico Betti et.al.	2307.09416	null
2023-07-18	Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow	Dogucan Yaman et.al.	2307.09368	null
2023-07-18	Augmenting CLIP with Improved Visio-Linguistic Reasoning	Samyadeep Basu et.al.	2307.09233	null
2023-07-18	Jean-Luc Picard at Touché 2023: Comparing Image Generation, Stance Detection and Feature Matching for Image Retrieval for Arguments	Max Moebius et.al.	2307.09172	null
2023-07-18	Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond	Yang Zhao et.al.	2307.08996	null
2023-07-18	PromptCrafter: Crafting Text-to-Image Prompt through Mixed-Initiative Dialogue with LLM	Seungho Baek et.al.	2307.08985	null
2023-07-17	Harnessing the Power of AI based Image Generation Model DALLE 2 in Agricultural Settings	Ranjan Sapkota et.al.	2307.08789	null
2023-07-17	Diffusion Models Beat GANs on Image Classification	Soumik Mukhopadhyay et.al.	2307.08702	null
2023-07-17	Flow Matching in Latent Space	Quan Dao et.al.	2307.08698	link
2023-07-17	FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning	Tri Dao et.al.	2307.08691	link
2023-07-17	Image Captions are Natural Prompts for Text-to-Image Models	Shiye Lei et.al.	2307.08526	null
2023-07-17	Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data	Kai Katsumata et.al.	2307.08319	null
2023-07-17	Manifold-Guided Sampling in Diffusion Models for Unbiased Image Generation	Xingzhe Su et.al.	2307.08199	null
2023-07-16	Planting a SEED of Vision in Large Language Model	Yuying Ge et.al.	2307.08041	link
2023-07-15	Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?	Jialu Gao et.al.	2307.07837	null
2023-07-18	Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer	Wing-Yin Yu et.al.	2307.07754	link
2023-07-14	GenAssist: Making Image Generation Accessible	Mina Huh et.al.	2307.07589	null
2023-07-14	Generative adversarial networks for data-scarce spectral applications	Juan José García-Esteban et.al.	2307.07454	null
2023-07-13	InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation	Yi Wang et.al.	2307.06942	link
2023-07-13	Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation	Yingqing He et.al.	2307.06940	link
2023-07-13	Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models	Moab Arar et.al.	2307.06925	null
2023-07-13	Improving Nonalcoholic Fatty Liver Disease Classification Performance With Latent Diffusion Models	Romain Hardy et.al.	2307.06507	null
2023-07-12	T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation	Kaiyi Huang et.al.	2307.06350	link
2023-07-12	Facial Reenactment Through a Personalized Generator	Ariel Elazary et.al.	2307.06307	null
2023-07-12	CellGAN: Conditional Cervical Cell Synthesis for Augmenting Cytopathological Image Classification	Zhenrong Shen et.al.	2307.06182	link
2023-07-12	Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models	Sanghyun Kim et.al.	2307.05977	link
2023-07-12	DiffuseGAE: Controllable and High-fidelity Image Manipulation from Disentangled Representation	Yipeng Leng et.al.	2307.05899	null
2023-07-12	Precise Image Generation on Current Noisy Quantum Computing Devices	Florian Rehm et.al.	2307.05253	null
2023-07-11	Generative Pretraining in Multimodality	Quan Sun et.al.	2307.05222	link
2023-07-11	TIAM – A Metric for Evaluating Alignment in Text-to-Image Generation	Paul Grimal et.al.	2307.05134	link
2023-07-11	SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation	Zhengxin Lei et.al.	2307.05087	null
2023-07-11	Diffusion idea exploration for art generation	Nikhil Verma et.al.	2307.04978	null
2023-07-10	Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback	Jaskirat Singh et.al.	2307.04749	null
2023-07-11	DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer	Dan Ruta et.al.	2307.04157	null
2023-07-09	Score-based Conditional Generation with Fewer Labeled Data by Self-calibrating Classifier Guidance	Paul Kuo-Ming Huang et.al.	2307.04081	null
2023-07-08	Measuring the Success of Diffusion Models at Imitating Human Artists	Stephen Casper et.al.	2307.04028	null
2023-07-08	HUMS2023 Data Challenge Result Submission	Dhiraj Neupane et.al.	2307.03871	null
2023-07-07	Synthesizing Forestry Images Conditioned on Plant Phenotype Using a Generative Adversarial Network	Debasmita Pal et.al.	2307.03789	link
2023-07-07	RGB-D Mapping and Tracking in a Plenoxel Radiance Field	Andreas L. Teigen et.al.	2307.03404	link
2023-07-06	VideoGLUE: Video General Understanding Evaluation of Foundation Models	Liangzhe Yuan et.al.	2307.03166	link
2023-07-06	On the Cultural Gap in Text-to-Image Generation	Bingshuai Liu et.al.	2307.02971	null
2023-07-06	Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback	TaeHo Yoon et.al.	2307.02770	link
2023-07-05	Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation	Sébastien Lachapelle et.al.	2307.02598	link
2023-07-05	Diffusion Models for Computational Design at the Example of Floor Plans	Joern Ploennigs et.al.	2307.02511	link
2023-07-05	Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality	Peter Lorenz et.al.	2307.02347	link
2023-07-05	On the Adversarial Robustness of Generative Autoencoders in the Latent Space	Mingfei Lu et.al.	2307.02202	null
2023-07-05	Prompting Diffusion Representations for Cross-Domain Semantic Segmentation	Rui Gong et.al.	2307.02138	null
2023-07-04	SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis	Dustin Podell et.al.	2307.01952	link
2023-07-04	A Synthetic Electrocardiogram (ECG) Image Generation Toolbox to Facilitate Deep Learning-Based Scanned ECG Digitization	Kshama Kodthalu Shivashankara et.al.	2307.01946	link
2023-07-04	Text + Sketch: Image Compression at Ultra Low Rates	Eric Lei et.al.	2307.01944	link
2023-07-04	Generative Artificial Intelligence Consensus in a Trustless Network	Edward Kim et.al.	2307.01898	null
2023-07-04	Training Energy-Based Models with Diffusion Contrastive Divergences	Weijian Luo et.al.	2307.01668	null
2023-07-04	AdAM: Few-Shot Image Generation via Adaptation-Aware Kernel Modulation	Yunqing Zhao et.al.	2307.01465	null
2023-07-03	Squeezing Large-Scale Diffusion Models for Mobile	Jiwoong Choi et.al.	2307.01193	null
2023-07-03	MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion	Shitao Tang et.al.	2307.01097	link
2023-07-03	DifFSS: Diffusion Model for Few-Shot Semantic Segmentation	Weimin Tan et.al.	2307.00773	link
2023-07-02	LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance	Linoy Tsaban et.al.	2307.00522	null
2023-07-01	DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation	Zhuowei Chen et.al.	2307.00300	null
2023-07-01	AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence	Jiarui Wang et.al.	2307.00211	link
2023-06-30	Stay on topic with Classifier-Free Guidance	Guillaume Sanchez et.al.	2306.17806	null
2023-06-30	Practical and Asymptotically Exact Conditional Sampling in Diffusion Models	Luhuan Wu et.al.	2306.17775	link
2023-06-30	Counting Guidance for High Fidelity Text-to-Image Synthesis	Wonjun Kang et.al.	2306.17567	null
2023-06-30	Class-Incremental Learning using Diffusion Model for Distillation and Replay	Quentin Jodelet et.al.	2306.17560	null
2023-06-30	DreamDiffusion: Generating High-Quality Images from Brain EEG Signals	Yunpeng Bai et.al.	2306.16934	link
2023-06-29	CLIPAG: Towards Generator-Free Text-to-Image Generation	Roy Ganz et.al.	2306.16805	null
2023-06-28	Dynamic Path-Controllable Deep Unfolding Network for Compressive Sensing	Jiechong Song et.al.	2306.16060	link
2023-06-27	Semi-supervised Multimodal Representation Learning through a Global Workspace	Benjamin Devillers et.al.	2306.15711	link
2023-06-26	A Simple and Effective Baseline for Attentional Generative Adversarial Networks	Mingyu Jin et.al.	2306.14708	link
2023-06-26	Localized Text-to-Image Generation for Free via Cross Attention Control	Yutong He et.al.	2306.14636	null
2023-06-26	A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis	Aishwarya Agarwal et.al.	2306.14544	null
2023-06-26	Progressive Energy-Based Cooperative Learning for Multi-Domain Image-to-Image Translation	Weinan Song et.al.	2306.14448	null
2023-06-26	Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models	Luozhou Wang et.al.	2306.14408	link
2023-06-25	DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data	Jingyuan Zhu et.al.	2306.14153	null
2023-06-24	UAlberta at SemEval-2023 Task 1: Context Augmentation and Translation for Multilingual Visual Word Sense Disambiguation	Michael Ogezi et.al.	2306.14067	link
2023-06-23	Zero-shot spatial layout conditioning for text-to-image diffusion models	Guillaume Couairon et.al.	2306.13754	null

enhancement & editing

Publish Date	Title	Authors	PDF	Code
2025-07-23	DFDNet: Dynamic Frequency-Guided De-Flare Network	Minglong Xue et.al.	2507.17489	null
2025-07-23	PolarAnything: Diffusion-based Polarimetric Image Synthesis	Kailong Zhang et.al.	2507.17268	null
2025-07-23	UNICE: Training A Universal Image Contrast Enhancer	Ruodai Cui et.al.	2507.17157	null
2025-07-22	ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement	Kahim Wong et.al.	2507.16397	null
2025-07-22	Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling	Chao Zhou et.al.	2507.16240	null
2025-07-22	LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs	Zitong Xu et.al.	2507.16193	null
2025-07-21	SAIGFormer: A Spatially-Adaptive Illumination-Guided Network for Low-Light Image Enhancement	Hanting Li et.al.	2507.15520	null
2025-07-20	EBA-AI: Ethics-Guided Bias-Aware AI for Efficient Underwater Image Enhancement and Coral Reef Monitoring	Lyes Saad Saoud et.al.	2507.15036	null
2025-07-20	Light Future: Multimodal Action Frame Prediction via InstructPix2Pix	Zesen Zhong et.al.	2507.14809	null
2025-07-20	Exploring Scalable Unified Modeling for General Low-Level Vision	Xiangyu Chen et.al.	2507.14801	null
2025-07-18	NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining	Maksim Kuprashevich et.al.	2507.14119	null
2025-07-18	Moodifier: MLLM-Enhanced Emotion-Driven Image Editing	Jiarong Ye et.al.	2507.14024	null
2025-07-18	Global Modeling Matters: A Fast, Lightweight and Effective Baseline for Efficient Image Restoration	Xingyu Jiang et.al.	2507.13663	null
2025-07-16	Wavelet-based Decoupling Framework for low-light Stereo Image Enhancement	Shuangli Du et.al.	2507.12188	null
2025-07-16	Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement	Junyu Lou et.al.	2507.12135	null
2025-07-16	Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints	Jiahao Xia et.al.	2507.11985	null
2025-07-16	A Spatial-Physics Informed Model for 3D Spiral Sample Scanned by SQUID Microscopy	J. Senthilnath et.al.	2507.11853	null
2025-07-14	Expert Operational GANS: Towards Real-Color Underwater Image Restoration	Ozer Can Devecioglu et.al.	2507.11562	null
2025-07-15	EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing	Vassilis Sioros et.al.	2507.11096	null
2025-07-14	Sparse Fine-Tuning of Transformers for Generative Tasks	Wei Chen et.al.	2507.10855	null
2025-07-14	CWNet: Causal Wavelet Network for Low-Light Image Enhancement	Tongshun Zhang et.al.	2507.10689	null
2025-07-14	RefSTAR: Blind Facial Image Restoration with Reference Selection, Transfer, and Reconstruction	Zhicun Yin et.al.	2507.10470	null
2025-07-14	On a class of forward-backward reaction-diffusion systems with local and nonlocal coupling for image restoration	Yihui Tong et.al.	2507.10393	null
2025-07-14	LayLens: Improving Deepfake Understanding through Simplified Explanations	Abhijeet Narang et.al.	2507.10066	null
2025-07-13	A New Wireless Image Transmission System Using Code Index Modulation and Image Enhancement for High-Rate Next Generation Networks	Burak Ahmet Ozden et.al.	2507.09713	null
2025-07-11	FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields	Gwanhyeong Koo et.al.	2507.08285	null
2025-07-11	Single-Step Latent Diffusion for Underwater Image Restoration	Jiayi Wu et.al.	2507.07878	null
2025-07-10	IRAF-SLAM: An Illumination-Robust and Adaptive Feature-Culling Front-End for Visual SLAM in Challenging Environments	Thanh Nguyen Canh et.al.	2507.07752	null
2025-07-10	Degradation-Agnostic Statistical Facial Feature Transformation for Blind Face Restoration in Adverse Weather Conditions	Chang-Hwan Son et.al.	2507.07464	null
2025-07-09	ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation	Sherry X. Chen et.al.	2507.07317	null
2025-07-09	HVI-CIDNet+: Beyond Extreme Darkness for Low-Light Image Enhancement	Qingsen Yan et.al.	2507.06814	null
2025-07-09	Enhancing Diffusion Model Stability for Image Restoration via Gradient Management	Hongjie Wu et.al.	2507.06656	null
2025-07-08	2D Instance Editing in 3D Space	Yuhuan Xie et.al.	2507.05819	null
2025-07-08	Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration	Yuyang Hu et.al.	2507.05604	null
2025-07-07	Simulating Refractive Distortions and Weather-Induced Artifacts for Resource-Constrained Autonomous Perception	Moseli Mots’oehli et.al.	2507.05536	null
2025-07-07	Neural-Driven Image Editing	Pengfei Zhou et.al.	2507.05397	null
2025-07-07	Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing	Chun-Hsiao Yeh et.al.	2507.05259	null
2025-07-07	S$^2$Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control	Xudong Liu et.al.	2507.04584	null
2025-07-06	Towards Lightest Low-Light Image Enhancement Architecture for Mobile Devices	Guangrui Bai et.al.	2507.04277	null
2025-07-06	Quick Bypass Mechanism of Zero-Shot Diffusion-Based Image Restoration	Yu-Shan Tai et.al.	2507.04207	null
2025-07-05	EdgeSRIE: A hybrid deep learning framework for real-time speckle reduction and image enhancement on portable ultrasound systems	Hyunwoo Cho et.al.	2507.03937	null
2025-07-04	Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images	Yuran Dong et.al.	2507.03402	null
2025-07-04	LACONIC: A 3D Layout Adapter for Controllable Image Creation	Léopold Maillard et.al.	2507.03257	null
2025-07-03	From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding	Xiangfeng Wang et.al.	2507.02790	null
2025-07-03	IGDNet: Zero-Shot Robust Underexposed Image Enhancement via Illumination-Guided and Denoising	Hailong Yan et.al.	2507.02445	null
2025-07-03	MAC-Lookup: Multi-Axis Conditional Lookup Model for Underwater Image Enhancement	Fanghai Yi et.al.	2507.02270	null
2025-07-03	SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement	Zeyu Lei et.al.	2507.02252	null
2025-07-02	Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning	Qingdong He et.al.	2507.01908	null
2025-07-02	MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices	Hailong Yan et.al.	2507.01838	null
2025-07-02	ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation	Jimyeong Kim et.al.	2507.01496	null
2025-07-02	QC-OT: Optimal Transport with Quasiconformal Mapping	Yuping Lv et.al.	2507.01456	null
2025-07-02	DocShaDiffusion: Diffusion Model in Latent Space for Document Image Shadow Removal	Wenjie Liu et.al.	2507.01422	null
2025-07-04	LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling	Huaqiu Li et.al.	2507.00790	null
2025-07-01	Laplace-Mamba: Laplace Frequency Prior-Guided Mamba-CNN Fusion Network for Image Dehazing	Yongzhen Wang et.al.	2507.00501	null
2025-06-30	A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement	Gaozheng Pei et.al.	2506.23676	null
2025-06-30	Oneta: Multi-Style Image Enhancement Using Eigentransformation Functions	Jiwon Kim et.al.	2506.23547	null
2025-06-30	TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity	Yuzhuo Chen et.al.	2506.23484	null
2025-06-29	OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions	Yuanhao Cai et.al.	2506.23361	null
2025-06-29	Layer Decomposition and Morphological Reconstruction for Task-Oriented Infrared Image Enhancement	Siyuan Chai et.al.	2506.23353	null
2025-06-29	Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis	Lei-lei Li et.al.	2506.23263	null
2025-06-29	Double-Diffusion: Diffusion Conditioned Diffusion Probabilistic Model For Air Quality Prediction	Hanlin Dong et.al.	2506.23053	null
2025-07-01	Ovis-U1 Technical Report	Guo-Hua Wang et.al.	2506.23044	null
2025-06-28	Towards Explainable Bilingual Multimodal Misinformation Detection and Localization	Yiwei He et.al.	2506.22930	null
2025-06-28	STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing	Junsung Lee et.al.	2506.22868	null
2025-06-27	Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy	Yuhao Liu et.al.	2506.22432	null
2025-06-27	EAMamba: Efficient All-Around Vision State Space Model for Image Restoration	Yu-Cheng Lin et.al.	2506.22246	null
2025-06-27	ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning	Ming Zhao et.al.	2506.22216	null
2025-06-27	GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles	Mengyi Shan et.al.	2506.21839	null
2025-06-26	Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration	Xin Lu et.al.	2506.21722	null
2025-06-26	Wild refitting for black box prediction	Martin J. Wainwright et.al.	2506.21460	null
2025-06-26	Controllable 3D Placement of Objects with Scene-Aware Diffusion Models	Mohamed Omran et.al.	2506.21446	null
2025-06-26	Learning to See in the Extremely Dark	Hai Jiang et.al.	2506.21132	null
2025-06-26	Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling	Hansam Cho et.al.	2506.21045	null
2025-06-27	DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing	Lingling Cai et.al.	2506.20967	null
2025-06-26	M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization	Ju-Hyeon Nam et.al.	2506.20922	null
2025-06-26	FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing	Advait Gupta et.al.	2506.20911	null
2025-06-25	EditP23: 3D Editing via Propagation of Image Prompts to Multi-View	Roi Bar-On et.al.	2506.20652	null
2025-06-25	TDiR: Transformer based Diffusion for Image Restoration Tasks	Abbas Anwar et.al.	2506.20302	null
2025-06-25	Towards Efficient Exemplar Based Image Editing with Multimodal VLMs	Avadhoot Jadhav et.al.	2506.20155	null
2025-06-24	A Comparative Study of NAFNet Baselines for Image Restoration	Vladislav Esaulov et.al.	2506.19845	null
2025-06-24	SceneCrafter: Controllable Multi-View Driving Scene Editing	Zehao Zhu et.al.	2506.19488	null
2025-06-24	NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs	Khuram Naveed et.al.	2506.19387	null
2025-06-23	Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models	Ilia Beletskii et.al.	2506.19103	null
2025-06-23	Let Your Video Listen to Your Music!	Xinyu Zhang et.al.	2506.18881	null
2025-06-25	OmniGen2: Exploration to Advanced Multimodal Generation	Chenyuan Wu et.al.	2506.18871	null
2025-06-23	Enhancing Image Restoration Transformer via Adaptive Translation Equivariance	JiaKui Hu et.al.	2506.18520	null
2025-06-23	CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing	Dinh-Khoi Vo et.al.	2506.18438	null
2025-06-23	BSMamba: Brightness and Semantic Modeling for Long-Range Interaction in Low-Light Image Enhancement	Tongshun Zhang et.al.	2506.18346	null
2025-06-23	A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement	Muhammad Azeem Aslam et.al.	2506.18323	null
2025-06-23	Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction	Han Zhang et.al.	2506.18290	null
2025-06-22	CmFNet: Cross-modal Fusion Network for Weakly-supervised Segmentation of Medical Images	Dongdong Meng et.al.	2506.18042	null
2025-06-20	Reversing Flow for Image Restoration	Haina Qin et.al.	2506.16961	null
2025-06-20	Visual-Instructed Degradation Diffusion for All-in-One Image Restoration	Wenyang Luo et.al.	2506.16960	link
2025-06-20	FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation	Fan Yang et.al.	2506.16806	null
2025-06-20	Temperature calibration of surface emissivities with an improved thermal image enhancement network	Ning Chu et.al.	2506.16803	null
2025-06-23	RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought	Junbo Qiao et.al.	2506.16796	link
2025-06-19	Arch-Router: Aligning LLM Routing with Human Preferences	Co Tran et.al.	2506.16655	null
2025-06-19	Integrating Generative Adversarial Networks and Convolutional Neural Networks for Enhanced Traffic Accidents Detection and Analysis	Zhenghao Xi et.al.	2506.16186	null
2025-06-19	MoiréXNet: Adaptive Multi-Scale Demoiréing with Linear Attention Test-Time Training and Truncated Flow Matching Prior	Liangyan Li et.al.	2506.15929	null
2025-06-18	VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics	Josef Kuchař et.al.	2506.15903	null
2025-06-17	Optimization-Based Image Restoration under Implementation Constraints in Optical Analog Circuits	Taisei Kato et.al.	2506.14624	null
2025-06-17	Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching	Giacomo Meanti et.al.	2506.14605	link
2025-06-17	Exploring Diffusion with Test-Time Training on Efficient Image Restoration	Rongchang Lu et.al.	2506.14541	null
2025-06-17	Causally Steered Diffusion for Automated Video Counterfactual Generation	Nikos Spyrou et.al.	2506.14404	link
2025-06-18	DREAM: On hallucinations in AI-generated content for nuclear medicine imaging	Menghua Xia et.al.	2506.13995	null
2025-06-15	Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing	Zhuoying Li et.al.	2506.13827	null
2025-06-16	Exploiting the Exact Denoising Posterior Score in Training-Free Guidance of Diffusion Models	Gregory Bellchambers et.al.	2506.13614	null
2025-06-16	AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing	Biao Yang et.al.	2506.13301	null
2025-06-15	ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies	Chenglin Wang et.al.	2506.12830	null
2025-06-15	Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution	Hang Xu et.al.	2506.12738	null
2025-06-14	An Iterative PDE Based Illumination Restoration Scheme for Image Enhancement	Dragos-Patru Covei et.al.	2506.12560	null
2025-06-14	Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts	Saemee Choi et.al.	2506.12520	null
2025-06-14	UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers	Yuantao Wang et.al.	2506.12324	null
2025-06-13	SphereDrag: Spherical Geometry-Aware Panoramic Image Editing	Zhiao Feng et.al.	2506.11863	null
2025-06-12	VINCIE: Unlocking In-context Image Editing from Video	Leigang Qu et.al.	2506.10941	null
2025-06-12	Edit360: 2D Image Edits to 3D Assets from Any Angle	Junchao Huang et.al.	2506.10507	null
2025-06-11	LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning	Chenjian Gao et.al.	2506.10082	null
2025-06-11	Text-Aware Image Restoration with Diffusion Models	Jaewon Min et.al.	2506.09993	null
2025-06-11	EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits	Ron Yosef et.al.	2506.09988	null
2025-06-11	ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models	Qin Zhou et.al.	2506.09740	null
2025-06-11	Ming-Omni: A Unified Multimodal Model for Perception and Generation	Inclusion AI et.al.	2506.09344	link
2025-06-11	Fine-Grained Spatially Varying Material Selection in Images	Julia Guerrero-Viu et.al.	2506.09023	null
2025-06-10	Do Concept Replacement Techniques Really Erase Unacceptable Concepts?	Anudeep Das et.al.	2506.08991	null
2025-06-10	RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping	Yang Bai et.al.	2506.08632	null
2025-06-09	Highly Compressed Tokenizer Can Generate Without Training	L. Lao Beyer et.al.	2506.08257	link
2025-06-09	PairEdit: Learning Semantic Variations for Exemplar-based Image Editing	Haoguang Lu et.al.	2506.07992	link
2025-06-09	Diffusion Counterfactual Generation with Semantic Abduction	Rajat Rasal et.al.	2506.07883	link
2025-06-09	M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration	Yongzhen Wang et.al.	2506.07814	null
2025-06-09	Consistent Video Editing as Flow-Driven Image-to-Video Generation	Ge Wang et.al.	2506.07713	null
2025-06-09	DragNeXt: Rethinking Drag-Based Image Editing	Yuan Zhou et.al.	2506.07611	null
2025-06-09	Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding	Boyu Chen et.al.	2506.07576	null
2025-06-08	Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI	Aditya Chakravarty et.al.	2506.07286	null
2025-06-08	TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation	Min-Jung Kim et.al.	2506.07205	null
2025-06-08	A PDE-Based Image Restoration Method: Mathematical Analysis and Implementation	Dragos-Patru Covei et.al.	2506.07132	null
2025-06-06	A Deep Learning Approach for Facial Attribute Manipulation and Reconstruction in Surveillance and Reconnaissance	Anees Nashath Shaik et.al.	2506.06578	null
2025-06-06	Bidirectional Image-Event Guided Low-Light Image Enhancement	Zhanwen Liu et.al.	2506.06120	null
2025-06-06	Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models	Yifu Qiu et.al.	2506.06006	link
2025-06-06	FADE: Frequency-Aware Diffusion Model Factorization for Video Editing	Yixuan Zhu et.al.	2506.05934	link
2025-06-06	NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces	Pierluigi Zama Ramirez et.al.	2506.05815	null
2025-06-05	UniRes: Universal Image Restoration for Complex Degradations	Mo Zhou et.al.	2506.05599	null
2025-06-05	OpenRR-5k: A Large-Scale Benchmark for Reflection Removal in the Wild	Jie Cai et.al.	2506.05482	null
2025-06-05	Towards Reliable Identification of Diffusion-based Image Manipulations	Alex Costanzino et.al.	2506.05466	null
2025-06-05	Degradation-Aware Image Enhancement via Vision-Language Classification	Jie Cai et.al.	2506.05450	null
2025-06-05	SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training	Jianyi Wang et.al.	2506.05301	null
2025-06-06	SeedEdit 3.0: Fast and High-Quality Generative Image Editing	Peng Wang et.al.	2506.05083	null
2025-06-05	FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing	Guangzhao Li et.al.	2506.05046	null
2025-06-05	Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking	Yu-Feng Chen et.al.	2506.04879	link
2025-06-05	Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement	Niki Martinel et.al.	2506.04753	null
2025-06-04	A Poisson-Guided Decomposition Network for Extreme Low-Light Image Enhancement	Isha Rao et.al.	2506.04470	null
2025-06-04	HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation	Hermann Kumbong et.al.	2506.04421	null
2025-06-04	Is Perturbation-Based Image Protection Disruptive to Image Editing?	Qiuyu Tang et.al.	2506.04394	null
2025-06-04	UNIC: Unified In-Context Video Editing	Zixuan Ye et.al.	2506.04216	null
2025-06-05	FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers	Xuanhua He et.al.	2506.04213	null
2025-06-04	Image Editing As Programs with Diffusion Models	Yujia Hu et.al.	2506.04158	null
2025-06-04	Joint Video Enhancement with Deblurring, Super-Resolution, and Frame Interpolation Network	Giyong Choi et.al.	2506.03892	null
2025-06-03	RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions	Bimsara Pathiraja et.al.	2506.03448	null
2025-06-04	UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	Bin Lin et.al.	2506.03147	null
2025-06-03	ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions	Di Chang et.al.	2506.03107	null
2025-06-03	NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results	Xiaohong Liu et.al.	2506.02875	null
2025-06-03	ControlMambaIR: Conditional Controls with State-Space Model for Image Restoration	Cheng Yang et.al.	2506.02633	null
2025-06-03	DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing	Zixiang Li et.al.	2506.02560	null
2025-06-03	RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers	Yan Gong et.al.	2506.02528	null
2025-06-04	NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution	Marcos V. Conde et.al.	2506.02197	null
2025-06-02	IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout	Fei Shen et.al.	2506.01949	null
2025-06-02	RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report	Marcos V. Conde et.al.	2506.01947	null
2025-06-04	MedEBench: Revisiting Text-instructed Image Editing on Medical Domain	Minghao Liu et.al.	2506.01921	null
2025-05-30	MiniMax-Remover: Taming Bad Noise Helps Video Object Removal	Bojia Zi et.al.	2505.24873	null
2025-05-30	RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement	Raman Jha et.al.	2505.24705	link
2025-05-30	IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models	Hanting Wang et.al.	2505.24406	link
2025-05-30	Boosting All-in-One Image Restoration via Self-Improved Privilege Learning	Gang Wu et.al.	2505.24207	link
2025-05-29	Cora: Correspondence-aware image editing using few step diffusion	Amirhossein Almohammadi et.al.	2505.23907	null
2025-05-29	LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers	Yusuf Dalva et.al.	2505.23758	null
2025-05-29	Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features	Ziyong Wang et.al.	2505.23586	null
2025-05-29	Video Editing for Audio-Visual Dubbing	Binyamin Manela et.al.	2505.23406	link
2025-05-29	Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging	Ping Wang et.al.	2505.23180	link
2025-05-29	FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing	Jeongsol Kim et.al.	2505.23145	link
2025-05-29	Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing	Tongtong Su et.al.	2505.23134	link
2025-05-29	CURVE: CLIP-Utilized Reinforcement Learning for Visual Image Enhancement via Simple Image Processing	Yuka Ogino et.al.	2505.23102	null
2025-05-29	URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration	Rui Xu et.al.	2505.23068	link
2025-05-29	Vision-Based Assistive Technologies for People with Cerebral Visual Impairment: A Review and Focus Study	Bhanuka Gamage et.al.	2505.22983	null
2025-05-29	EquiReg: Equivariance Regularized Diffusion for Inverse Problems	Bahareh Tolooshams et.al.	2505.22973	null
2025-05-28	From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration	Junyu Fan et.al.	2505.22284	null
2025-05-28	GL-PGENet: A Parameterized Generation Framework for Robust Document Image Enhancement	Zhihong Tang et.al.	2505.22021	null
2025-05-28	Reference-Guided Identity Preserving Face Restoration	Mo Zhou et.al.	2505.21905	null
2025-05-28	Broadening Our View: Assistive Technology for Cerebral Visual Impairment	Bhanuka Gamage et.al.	2505.21875	null
2025-05-27	BaryIR: Learning Multi-Source Unified Representation in Continuous Barycenter Space for Generalizable All-in-One Image Restoration	Xiaole Tang et.al.	2505.21637	null
2025-05-27	Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion	Yang Yang et.al.	2505.21593	null
2025-05-27	Imago Obscura: An Image Privacy AI Co-pilot to Enable Identification and Mitigation of Risks	Kyzyl Monteiro et.al.	2505.20916	null
2025-05-28	See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction	Yuan Wu et.al.	2505.20641	link
2025-05-27	InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling	Xiaoxiao Jiang et.al.	2505.20600	null
2025-05-28	PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy	Shuhao Guan et.al.	2505.20429	null
2025-05-26	What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models	Lorenzo Baraldi et.al.	2505.20405	null
2025-05-26	ImgEdit: A Unified Image Editing Dataset and Benchmark	Yang Ye et.al.	2505.20275	link
2025-05-26	Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement	Afrah Shaahid et.al.	2505.19895	null
2025-05-26	StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation	Yi Wu et.al.	2505.19874	null
2025-05-26	A Regularization-Guided Equivariant Approach for Image Restoration	Yulu Bai et.al.	2505.19799	link
2025-05-26	TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs	Juntong Wang et.al.	2505.19535	null
2025-05-25	Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions	Chenrui Ma et.al.	2505.19352	null
2025-05-25	Improving Novel view synthesis of 360$^\circ$ Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images	Guangan Chen et.al.	2505.19264	link
2025-05-25	Benchmarking Laparoscopic Surgical Image Restoration and Beyond	Jialun Pei et.al.	2505.19161	link
2025-05-25	SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation	Shenggan Cheng et.al.	2505.19151	null
2025-05-25	MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection	Shuyu Wang et.al.	2505.19149	null
2025-05-25	Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition	Xiaoyang Liu et.al.	2505.19120	link
2025-05-23	RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration	Sudarshan Rajagopalan et.al.	2505.18047	null
2025-05-23	DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval	Yuxin Yang et.al.	2505.17796	null
2025-05-23	R-Genie: Reasoning-Guided Generative Image Editing	Dong Zhang et.al.	2505.17768	null
2025-05-23	MODEM: A Morton-Order Degradation Estimation Mechanism for Adverse Weather Image Recovery	Hainuo Wang et.al.	2505.17581	link
2025-05-23	Dual Ascent Diffusion for Inverse Problems	Minseo Kim et.al.	2505.17353	null
2025-05-22	Forward-only Diffusion Probabilistic Models	Ziwei Luo et.al.	2505.16733	link
2025-05-22	KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models	Yongliang Wu et.al.	2505.16707	null
2025-05-22	Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration	Yuetong Liu et.al.	2505.16479	null
2025-05-22	NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment	Shuhao Han et.al.	2505.16314	null
2025-05-26	Understanding Generative AI Capabilities in Everyday Image Editing Tasks	Mohammad Reza Taesiri et.al.	2505.16181	null
2025-05-22	Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey	Liyan Wang et.al.	2505.16161	link
2025-05-22	Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention	Yuang Ai et.al.	2505.16157	null
2025-05-21	FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models	Zhen Sun et.al.	2505.15644	link
2025-05-22	Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives	Yisi Luo et.al.	2505.15222	link
2025-05-21	AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection	Zhipei Xu et.al.	2505.15173	null
2025-05-20	UHD Image Dehazing via anDehazeFormer with Atmospheric-aware KV Cache	Pu Wang et.al.	2505.14010	null
2025-05-19	Adaptive Image Restoration for Video Surveillance: A Real-Time Approach	Muhammad Awais Amin et.al.	2505.13130	null
2025-05-19	LatentINDIGO: An INN-Guided Latent Diffusion Algorithm for Image Restoration	Di You et.al.	2505.12935	null
2025-05-19	Towards a Universal Image Degradation Model via Content-Degradation Disentanglement	Wenbo Yang et.al.	2505.12860	null
2025-05-19	Degradation-Aware Feature Perturbation for All-in-One Image Restoration	Xiangpeng Tian et.al.	2505.12630	link
2025-05-20	DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model	Siwei Xia et.al.	2505.12427	link
2025-05-18	Trustworthy Image Super-Resolution via Generative Pseudoinverse	Andreas Floros et.al.	2505.12375	link
2025-05-18	PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement	ZhanFeng Feng et.al.	2505.12266	link
2025-05-18	From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations	Yuzhi Li et.al.	2505.12237	null
2025-05-20	CompBench: Benchmarking Complex Instruction-guided Image Editing	Bohan Jia et.al.	2505.12200	null
2025-05-16	X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models	Valentina Bazyleva et.al.	2505.11753	null
2025-05-16	GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing	Yusu Qian et.al.	2505.11493	null
2025-05-16	Diff-Unfolding: A Model-Based Score Learning Framework for Inverse Problems	Yuanhao Wang et.al.	2505.11393	null
2025-05-16	Entropy-Driven Genetic Optimization for Deep-Feature-Guided Low-Light Image Enhancement	Nirjhor Datta et.al.	2505.11246	link
2025-05-15	torchmfbd: a flexible multi-object multi-frame blind deconvolution code	A. Asensio Ramos et.al.	2505.10639	link
2025-05-19	Super-Resolution Generative Adversarial Networks based Video Enhancement	Kağan ÇETİN et.al.	2505.10589	null
2025-05-15	3D-Fixup: Advancing Photo Editing with 3D Priors	Yen-Chi Cheng et.al.	2505.10566	null
2025-05-14	Don’t Forget your Inverse DDIM for Image Editing	Guillermo Gomez-Trenado et.al.	2505.09571	null
2025-05-14	PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement	Tong Li et.al.	2505.09196	null
2025-05-15	IntrinsicEdit: Precise generative image manipulation in intrinsic space	Linjie Lyu et.al.	2505.08889	null
2025-05-13	Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations	Petrus H. Zwart et.al.	2505.08176	null
2025-05-12	Image Restoration via Integration of Optimal Control Techniques and the Hamilton-Jacobi-Bellman Equation	Dragos-Patru Covei et.al.	2505.07699	null
2025-05-12	Generalizable Pancreas Segmentation via a Dual Self-Supervised Learning Framework	Jun Li et.al.	2505.07165	null
2025-05-11	DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models	Junhao Xia et.al.	2505.07057	null
2025-05-10	UnfoldIR: Rethinking Deep Unfolding Network in Illumination Degradation Image Restoration	Chunming He et.al.	2505.06683	null
2025-05-10	Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach	Minting Pan et.al.	2505.06482	null
2025-05-09	MonetGPT: Solving Puzzles Enhances MLLMs’ Image Retouching Skills	Niladri Shekhar Dutt et.al.	2505.06176	null
2025-05-09	A review of advancements in low-light image enhancement using deep learning	Fangxue Liu et.al.	2505.05759	null
2025-05-08	Semantic Style Transfer for Enhancing Animal Facial Landmark Detection	Anadil Hussein et.al.	2505.05640	null
2025-05-08	A Preliminary Study for GPT-4o on Image Restoration	Hao Yang et.al.	2505.05621	link
2025-05-08	SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation	Yonwoo Choi et.al.	2505.05475	link
2025-05-11	Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation	Chao Liao et.al.	2505.05472	null
2025-05-08	EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution	Haizhen Xie et.al.	2505.05209	null
2025-05-12	MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models	Hongyang Zhu et.al.	2505.05101	null
2025-05-08	ADNP-15: An Open-Source Histopathological Dataset for Neuritic Plaque Segmentation in Human Brain Whole Slide Images with Frequency Domain Image Enhancement for Stain Normalization	Chenxi Zhao et.al.	2505.05041	null
2025-05-08	GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing	Tong Wang et.al.	2505.04915	null
2025-05-07	Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers	Divyansh Srivastava et.al.	2505.04718	null
2025-05-07	Multi-turn Consistent Image Editing	Zijun Zhou et.al.	2505.04320	null
2025-05-07	TS-Diff: Two-Stage Diffusion Model for Low-Light RAW Image Enhancement	Yi Li et.al.	2505.04281	link
2025-05-07	Regional chemical potential analysis for material surfaces	Masahiro Fukuda et.al.	2505.04053	null
2025-05-04	Video Forgery Detection for Surveillance Cameras: A Review	Noor B. Tayfor et.al.	2505.03832	null
2025-05-06	DDaTR: Dynamic Difference-aware Temporal Residual Network for Longitudinal Radiology Report Generation	Shanshan Song et.al.	2505.03401	link
2025-05-05	NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results	Nikolay Safonov et.al.	2505.03007	link
2025-05-07	Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction	Inclusion AI et.al.	2505.02471	link
2025-05-05	MSFNet-CPD: Multi-Scale Cross-Modal Fusion Network for Crop Pest Detection	Jiaqi Zhang et.al.	2505.02441	link
2025-05-05	SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing	Ming Li et.al.	2505.02370	link
2025-05-04	HiLLIE: Human-in-the-Loop Training for Low-Light Image Enhancement	Xiaorui Zhao et.al.	2505.02134	null
2025-05-03	ImageR: Enhancing Bug Report Clarity by Screenshots	Xuchen Tan et.al.	2505.01925	null
2025-05-03	Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement	Haofan Wu et.al.	2505.01831	null
2025-05-02	Deblurring fission fragment mass distributions	Pierre Nzabahimana et.al.	2505.01294	null
2025-05-02	RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement	Kui Jiang et.al.	2505.01224	link
2025-05-02	Improving Editability in Image Generation with Layer-wise Memory	Daneul Kim et.al.	2505.01079	null
2025-05-02	A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories	Ziqi Ding et.al.	2505.01067	null
2025-05-02	Photoshop Batch Rendering Using Actions for Stylistic Video Editing	Tessa De La Fuente et.al.	2505.01001	null
2025-05-01	InstructAttribute: Fine-grained Object Attributes editing with Instruction	Xingxi Yin et.al.	2505.00751	null
2025-05-01	Controllable Weather Synthesis and Removal with Video Diffusion Models	Chih-Hao Lin et.al.	2505.00704	null
2025-05-01	GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution	Aditya Arora et.al.	2505.00687	null
2025-05-01	Towards Scalable Human-aligned Benchmark for Text-guided Image Editing	Suho Ryu et.al.	2505.00502	link
2025-04-30	DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration	Hebaixu Wang et.al.	2504.21487	link
2025-04-30	VR-FuseNet: A Fusion of Heterogeneous Fundus Data and Explainable Deep Network for Diabetic Retinopathy Classification	Shamim Rahim Refat et.al.	2504.21464	null
2025-04-29	In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer	Zechuan Zhang et.al.	2504.20690	null
2025-04-30	PixelHacker: Image Inpainting with Structural and Semantic Consistency	Ziyang Xu et.al.	2504.20438	null
2025-04-27	FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement	Kangbiao Shi et.al.	2504.19295	null
2025-04-27	Marine Snow Removal Using Internally Generated Pseudo Ground Truth	Alexandra Malyugina et.al.	2504.19289	null
2025-04-27	Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting	Xiaofeng Jin et.al.	2504.19261	null
2025-04-27	CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes	Tuan Nguyen et.al.	2504.19212	null
2025-04-27	Adaptive Dual-domain Learning for Underwater Image Enhancement	Lingtao Peng et.al.	2504.19198	link
2025-04-27	DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning	Jialang Lu et.al.	2504.19127	null
2025-04-26	REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models	Gal Almog et.al.	2504.18989	link
2025-04-24	DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing	Aniruddha Bala et.al.	2504.17894	null
2025-04-24	VEU-Bench: Towards Comprehensive Understanding of Video Editing	Bozheng Li et.al.	2504.17828	null
2025-04-24	Dual Prompting Image Restoration with Diffusion Transformers	Dehong Kong et.al.	2504.17825	null
2025-04-28	Step1X-Edit: A Practical Framework for General Image Editing	Shiyu Liu et.al.	2504.17761	link
2025-04-24	DPMambaIR: All-in-One Image Restoration via Degradation-Aware Prompt State Space Model	Zhanwen Liu et.al.	2504.17732	null
2025-04-24	Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields	Zhuo He et.al.	2504.17712	null
2025-04-24	Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems	Jaegang Jo et.al.	2504.17368	null
2025-04-24	I-INR: Iterative Implicit Neural Representations	Ali Haider et.al.	2504.17364	null
2025-04-24	Enhancing Variational Autoencoders with Smooth Robust Latent Encoding	Hyomin Lee et.al.	2504.17219	null
2025-04-23	RouteWinFormer: A Route-Window Transformer for Middle-range Attention in Image Restoration	Qifan Li et.al.	2504.16637	null
2025-04-23	Cross Paradigm Representation and Alignment Transformer for Image Deraining	Shun Zou et.al.	2504.16455	null
2025-04-22	Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework	Xinyuan Song et.al.	2504.16016	null
2025-04-22	Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models	Dasol Jeong et.al.	2504.15723	null
2025-04-24	Vidi: Large Multimodal Models for Video Understanding and Editing	Vidi Team et.al.	2504.15681	null
2025-04-22	AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization	Jinda Lu et.al.	2504.15619	null
2025-04-22	SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking	Yunfeng Li et.al.	2504.15609	link
2025-04-22	InstaRevive: One-Step Image Enhancement via Dynamic Score Matching	Yixuan Zhu et.al.	2504.15513	null
2025-04-21	MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World	Ankit Dhiman et.al.	2504.15397	null
2025-04-21	Plug-and-Play Versatile Compressed Video Enhancement	Huimin Zeng et.al.	2504.15380	null
2025-04-21	Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration	Junyuan Deng et.al.	2504.15159	null
2025-04-21	Structure-guided Diffusion Transformer for Low-Light Image Enhancement	Xiangchen Yin et.al.	2504.15054	null
2025-04-21	Distribution-aware Dataset Distillation for Efficient Image Restoration	Zhuoran Zheng et.al.	2504.14826	null
2025-04-20	MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation	Siyi Jiao et.al.	2504.14606	null
2025-04-19	Visual Prompting for One-shot Controllable Video Editing without Inversion	Zhengbo Zhang et.al.	2504.14335	null
2025-04-19	Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation	Bin Ren et.al.	2504.14249	null
2025-04-19	PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling	Alara Dirik et.al.	2504.14219	null
2025-04-18	Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design	Wei Dong et.al.	2504.14075	link
2025-04-18	Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation	Fulvio Sanguigni et.al.	2504.14011	null
2025-04-18	Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing	Joowon Kim et.al.	2504.13490	null
2025-04-18	Circular Image Deturbulence using Quasi-conformal Geometry	Chu Chen et.al.	2504.13432	null
2025-04-17	Image Editing with Diffusion Models: A Survey	Jia Wang et.al.	2504.13226	null
2025-04-17	$\texttt{Complex-Edit}$: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark	Siwei Yang et.al.	2504.13143	null
2025-04-17	UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models	Guanlong Jiao et.al.	2504.13109	null
2025-04-17	Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval	WonJun Moon et.al.	2504.13035	null
2025-04-17	Image-Editing Specialists: An RLAIF Approach for Diffusion Models	Elior Benarous et.al.	2504.12833	link
2025-04-17	Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal	Inzamamul Alam et.al.	2504.12809	link
2025-04-17	SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding	Qianqian Sun et.al.	2504.12704	null
2025-04-17	AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting	Xin Su et.al.	2504.12605	null
2025-04-16	Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling	Zhihua Wang et.al.	2504.12204	link
2025-04-16	Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline	Joanne Lin et.al.	2504.12169	null
2025-04-16	Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging	Tristan S. W. Stevens et.al.	2504.12154	null
2025-04-17	DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency	Mengshi Qi et.al.	2504.12080	link
2025-04-17	Understanding Attention Mechanism in Video Diffusion Models	Bingyan Liu et.al.	2504.12027	null
2025-04-16	Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach	Lvpan Cai et.al.	2504.11922	link
2025-04-16	Learning Physics-Informed Color-Aware Transforms for Low-Light Image Enhancement	Xingxing Yang et.al.	2504.11896	null
2025-04-16	HyperKING: Quantum-Classical Generative Adversarial Networks for Hyperspectral Image Restoration	Chia-Hsiang Lin et.al.	2504.11782	null
2025-04-15	Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain	Pengcheng Zheng et.al.	2504.11286	null
2025-04-15	UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques	Pedro Diaz-Garcia et.al.	2504.11063	null
2025-04-15	AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent	Pu Wang et.al.	2504.10978	null
2025-04-15	An Efficient and Mixed Heterogeneous Model for Image Restoration	Yubin Gu et.al.	2504.10967	link
2025-04-14	Enhancing Image Restoration through Learning Context-Rich and Detail-Accurate Features	Hu Gao et.al.	2504.10558	link
2025-04-14	Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing	Taihang Hu et.al.	2504.10434	link
2025-04-14	PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems	Maud Biquard et.al.	2504.10375	null
2025-04-14	Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis	Kaiwen Zheng et.al.	2504.10351	null
2025-04-14	Analysis of Attention in Video Diffusion Transformers	Yuxin Wen et.al.	2504.10317	null
2025-04-14	VibrantLeaves: A principled parametric image generator for training deep restoration models	Raphael Achddou et.al.	2504.10201	link
2025-04-14	Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction	Yucheng Lu et.al.	2504.10080	null
2025-04-14	Progressive Transfer Learning for Multi-Pass Fundus Image Restoration	Uyen Phan et.al.	2504.10025	null
2025-04-14	Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration	Gang Wu et.al.	2504.09973	link
2025-04-13	SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow	Kenan Tang et.al.	2504.09697	link
2025-04-13	CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models	Pooja Guhan et.al.	2504.09472	null
2025-04-11	ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration	Yongsheng Yu et.al.	2504.08591	null
2025-04-11	CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model	Ruohao Zhan et.al.	2504.08259	null
2025-04-11	VL-UR: Vision-Language-guided Universal Restoration of Images Degraded by Adverse Weather Conditions	Ziyan Liu et.al.	2504.08219	null
2025-04-10	POEM: Precise Object-level Editing via MLLM control	Marco Schouten et.al.	2504.08111	null
2025-04-10	Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement	Daniel Torres et.al.	2504.07810	null
2025-04-10	Learning Universal Features for Generalizable Image Forgery Localization	Hengrun Zhao et.al.	2504.07462	link
2025-04-10	Synthetic CT Generation from Time-of-Flight Non-Attenuation-Corrected PET for Whole-Body PET Attenuation Correction	Weijie Chen et.al.	2504.07450	null
2025-04-10	Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing	Chenxi Sun et.al.	2504.07424	null
2025-04-09	Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model	Yingjie Zhou et.al.	2504.07148	null
2025-04-08	VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing	Juan Luis Gonzalez Bello et.al.	2504.07146	null
2025-04-09	FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution	Gene Chou et.al.	2504.07093	link
2025-04-09	Rethinking LayerNorm in Image Restoration Transformers	MinKyu Lee et.al.	2504.06629	null
2025-04-08	AstroClearNet: Deep image prior for multi-frame astronomical image restoration	Yashil Sukurdeep et.al.	2504.06463	null
2025-04-08	Transfer between Modalities with MetaQueries	Xichen Pan et.al.	2504.06256	null
2025-04-08	Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model	Qi Mao et.al.	2504.05594	null
2025-04-08	TAPNext: Tracking Any Point (TAP) as Next Token Prediction	Artem Zholus et.al.	2504.05579	null
2025-04-07	CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models	Kavana Venkatesh et.al.	2504.05306	null
2025-04-07	Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion	Xingyu Hu et.al.	2504.05164	null
2025-04-07	DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration	Jiamei Xiong et.al.	2504.05135	null
2025-04-08	Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision	Yuandong Pu et.al.	2504.04903	null
2025-04-07	Content-Aware Transformer for All-in-one Image Restoration	Gang Wu et.al.	2504.04869	link
2025-04-07	Inland Waterway Object Detection in Multi-environment: Dataset and Approach	Shanshan Wang et.al.	2504.04835	null
2025-04-07	Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing	Hui Liu et.al.	2504.04784	null
2025-04-05	JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration	Yunlong Lin et.al.	2504.04158	null
2025-04-07	MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models	Wulin Xie et.al.	2504.03641	null
2025-04-04	Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal	Yuyang Hu et.al.	2504.03607	null
2025-04-04	Finding the Reflection Point: Unpadding Images to Remove Data Augmentation Artifacts in Large Open Source Image Datasets for Machine Learning	Lucas Choi et.al.	2504.03168	null
2025-04-04	Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices	Yang He et.al.	2504.03155	null
2025-04-03	How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models	Pascal Chang et.al.	2504.03072	null
2025-04-03	VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning	Xianwei Zhuang et.al.	2504.02949	link
2025-04-03	Concept Lancet: Image Editing with Compositional Representation Transplant	Jinqi Luo et.al.	2504.02828	null
2025-04-03	GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation	Zhiyuan Yan et.al.	2504.02782	link
2025-04-03	RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models	ZhongLi Fang et.al.	2504.02640	null
2025-04-03	Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement	Hesong Li et.al.	2504.02555	link
2025-04-03	HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement	Hantang Li et.al.	2504.02373	null
2025-04-03	Brightness Perceiving for Recursive Low-Light Image Enhancement	Haodian Wang et.al.	2504.02362	link
2025-04-03	SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW	Masakazu Yoshimura et.al.	2504.02345	null
2025-04-02	FreSca: Unveiling the Scaling Space in Diffusion Models	Chao Huang et.al.	2504.02154	null
2025-04-03	ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement	Runhui Huang et.al.	2504.01934	null
2025-04-02	A Diffusion-Based Framework for Occluded Object Movement	Zheng-Peng Duan et.al.	2504.01873	null
2025-04-02	Bridge the Gap between SNN and ANN for Image Restoration	Xin Su et.al.	2504.01755	null
2025-04-01	Deconver: A Deconvolutional Network for Medical Image Segmentation	Pooya Ashtari et.al.	2504.00302	link
2025-03-31	InstructRestore: Region-Customized Image Restoration with Human Instructions	Shuaizheng Liu et.al.	2503.24357	link
2025-03-31	AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents	Jiaxiang Chen et.al.	2503.23948	link
2025-03-31	Training-Free Text-Guided Image Editing with Visual Autoregressive Model	Yufei Wang et.al.	2503.23897	link
2025-03-31	3D Dental Model Segmentation with Geometrical Boundary Preserving	Shufan Xi et.al.	2503.23702	link
2025-03-30	Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging	Amar Kumar et.al.	2503.23618	null
2025-03-30	ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025	Tianming Liang et.al.	2503.23509	link
2025-03-30	SketchVideo: Sketch-based Video Generation and Editing	Feng-Lin Liu et.al.	2503.23284	null
2025-03-29	A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery	Pengyu Chen et.al.	2503.23200	null
2025-03-29	FreeInv: Free Lunch for Improving DDIM Inversion	Yuxiang Bao et.al.	2503.23035	null
2025-03-29	indiSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy	Ashesh Ashesh et.al.	2503.22983	null
2025-03-28	RELD: Regularization by Latent Diffusion Models for Image Restoration	Pasquale Cascarano et.al.	2503.22563	null
2025-03-28	Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance	Haijie Yang et.al.	2503.22225	null
2025-03-27	Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration	Yujie Chen et.al.	2503.21970	null
2025-03-28	LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing	Achint Soni et.al.	2503.21541	link
2025-03-27	Invert2Restore: Zero-Shot Degradation-Blind Image Restoration	Hamadi Chihaoui et.al.	2503.21486	null
2025-03-27	Diffusion Image Prior	Hamadi Chihaoui et.al.	2503.21410	null
2025-03-26	Synthetic Video Enhances Physical Fidelity in Video Synthesis	Qi Zhao et.al.	2503.20822	null
2025-03-26	Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising	Yan-Bo Lin et.al.	2503.20782	null
2025-03-26	Underwater Image Enhancement by Convolutional Spiking Neural Networks	Vidya Sudevan et.al.	2503.20485	link
2025-03-26	EditCLIP: Representation Learning for Image Editing	Qian Wang et.al.	2503.20318	link
2025-03-26	Wan: Open and Advanced Large-Scale Video Generative Models	WanTeam et.al.	2503.20314	link
2025-03-26	InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction	Yuhui Wu et.al.	2503.20287	link
2025-03-26	Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration	Shihao Zhou et.al.	2503.20174	null
2025-03-25	FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model	Jun Zhou et.al.	2503.19839	null
2025-03-25	LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark Dataset	Manjushree Aithal et.al.	2503.19804	null
2025-03-24	FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing	Yufan Ren et.al.	2503.19191	null
2025-03-24	LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment	Haoran Wang et.al.	2503.18640	null
2025-03-25	Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning	Sherry X. Chen et.al.	2503.18406	link
2025-03-24	Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance	Sicong Feng et.al.	2503.18386	null
2025-03-24	MaSS13K: A Matting-level Semantic Segmentation Benchmark	Chenxi Xie et.al.	2503.18364	link
2025-03-23	Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance	Harang Ju et.al.	2503.18238	link
2025-03-25	Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods	Yuzhi Li et.al.	2503.17975	null
2025-03-23	Deep Learning Assisted Denoising of Experimental Micrographs	Owais Ahmad et.al.	2503.17945	null
2025-03-23	Cross-Domain Underwater Image Enhancement Guided by No-Reference Image Quality Assessment: A Transfer Learning Approach	Zhi Zhang et.al.	2503.17937	null
2025-03-23	Cat-AIR: Content and Task-Aware All-in-One Image Restoration	Jiachen Jiang et.al.	2503.17915	null
2025-03-23	What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images	Dongheng Lin et.al.	2503.17899	null
2025-03-21	HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks	Maria Pilligua et.al.	2503.17276	null
2025-03-21	Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks	Haijin Zeng et.al.	2503.16930	null
2025-03-21	DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics	Yihan Hu et.al.	2503.16795	null
2025-03-20	Efficient Bayesian Computation Using Plug-and-Play Priors for Poisson Inverse Problems	Teresa Klatzer et.al.	2503.16222	null
2025-03-20	FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing	Tianyi Wei et.al.	2503.16153	null
2025-03-20	Single Image Iterative Subject-driven Generation and Editing	Yair Shpitzer et.al.	2503.16025	link
2025-03-20	DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration	Suraj Singh et.al.	2503.15984	null
2025-03-21	UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations	Debabrata Mandal et.al.	2503.15868	null
2025-03-23	Multi-focal Conditioned Latent Diffusion for Person Image Synthesis	Jiaqi Liu et.al.	2503.15686	link
2025-03-19	Image Restoration Models with Optimal Transport and Total Variation Regularization	Weijia Huang et.al.	2503.14947	null
2025-03-18	ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing	Yulin Pan et.al.	2503.14482	null
2025-03-18	SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model	Yucheng Mao et.al.	2503.14463	null
2025-03-19	VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation	Shoubin Yu et.al.	2503.14350	null
2025-03-18	Towards properties of adversarial image perturbations	Egor Kuznetsov et.al.	2503.14111	null
2025-03-18	Intra and Inter Parser-Prompted Transformers for Effective Image Restoration	Cong Wang et.al.	2503.14037	link
2025-03-18	TarPro: Targeted Protection against Malicious Image Editing	Kaixin Shen et.al.	2503.13994	null
2025-03-17	FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models	Minghan Li et.al.	2503.13684	null
2025-03-17	Unified Autoregressive Visual Generation and Understanding with Continuous Tokens	Lijie Fan et.al.	2503.13436	null
2025-03-17	Edit Transfer: Learning Image Editing via Vision In-Context Relations	Lan Chen et.al.	2503.13327	null
2025-03-17	From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective	Chen Zhao et.al.	2503.13165	null
2025-03-17	GIFT: Generated Indoor video frames for Texture-less point tracking	Jianzheng Huang et.al.	2503.12944	null
2025-03-17	DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model	Junjia Huang et.al.	2503.12838	null
2025-03-17	Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion	Yidi Liu et.al.	2503.12764	null
2025-03-16	UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing	Tsu-Jui Fu et.al.	2503.12652	null
2025-03-16	Personalize Anything for Free with Diffusion Transformer	Haoran Feng et.al.	2503.12590	null
2025-03-16	DPF-Net: Physical Imaging Model Embedded Data-Driven Underwater Image Enhancement	Han Mei et.al.	2503.12470	link
2025-03-16	Pathology Image Restoration via Mixture of Prompts	Jiangdong Cai et.al.	2503.12399	link
2025-03-14	RASA: Replace Anyone, Say Anything – A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing	Tianrui Pan et.al.	2503.11571	null
2025-03-14	Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption	Du Chen et.al.	2503.11221	null
2025-03-14	Multi-Stage Generative Upscaler: Reconstructing Football Broadcast Images via Diffusion Models	Luca Martini et.al.	2503.11181	null
2025-03-14	Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement	Yini Li et.al.	2503.11175	link
2025-03-14	LUSD: Localized Update Score Distillation for Text-Guided Image Editing	Worameth Chinchuthakun et.al.	2503.11054	link
2025-03-14	InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences	Hongkai Zheng et.al.	2503.11043	null
2025-03-14	V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes	Yanming Zhang et.al.	2503.10634	null
2025-03-13	CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing	Advait Gupta et.al.	2503.10613	link
2025-03-13	EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing	Zexuan Yan et.al.	2503.10270	link
2025-03-13	Hybrid Agents for Image Restoration	Bingchen Li et.al.	2503.10120	null
2025-03-13	MoEdit: On Learning Quantity Perception for Multi-object Image Editing	Yanfeng Li et.al.	2503.10112	link
2025-03-13	Dream-IF: Dynamic Relative EnhAnceMent for Image Fusion	Xingxin Xu et.al.	2503.10109	null
2025-03-14	On the Limitations of Vision-Language Models in Understanding Image Transforms	Ahmad Mustafa Anis et.al.	2503.09837	null
2025-03-12	Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space	Yifan Zhou et.al.	2503.09419	link
2025-03-12	Multi-Agent Image Restoration	Xu Jiang et.al.	2503.09403	null
2025-03-12	MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration	Zhehui Wu et.al.	2503.09131	link
2025-03-12	InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images	Jiun Tian Hoe et.al.	2503.09130	null
2025-03-12	Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal	Rongxin Liao et.al.	2503.09013	link
2025-03-11	QUIET-SR: Quantum Image Enhancement Transformer for Single Image Super-Resolution	Siddhant Dutta et.al.	2503.08759	null
2025-03-12	OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting	Yongsheng Yu et.al.	2503.08677	null
2025-03-13	Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models	Armando Fortes et.al.	2503.08434	null
2025-03-11	PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net	Jun Yin et.al.	2503.08276	null
2025-03-11	Aligning Text to Image in Diffusion Models is Easier Than You Think	Jaa-Yeon Lee et.al.	2503.08250	link
2025-03-11	TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement	Miao Zhang et.al.	2503.08168	null
2025-03-11	Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features	Hanbyul Lee et.al.	2503.08148	null
2025-03-11	ObjectMover: Generative Object Movement with Video Prior	Xin Yu et.al.	2503.08037	null
2025-03-11	Deep Perceptual Enhancement for Medical Image Analysis	S M A Sharif et.al.	2503.08027	link
2025-03-11	CAD-VAE: Leveraging Correlation-Aware Latents for Comprehensive Fair Disentanglement	Chenrui Ma et.al.	2503.07938	null
2025-03-10	Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model	Lixue Gong et.al.	2503.07703	null
2025-03-10	GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts	Minwen Liao et.al.	2503.07417	null
2025-03-11	Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios	Chenglu Pan et.al.	2503.07232	null
2025-03-10	TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation	Victor Shea-Jay Huang et.al.	2503.07050	null
2025-03-10	MERLION: Marine ExploRation with Language guIded Online iNformative Visual Sampling and Enhancement	Shrutika Vishal Thengane et.al.	2503.06953	link
2025-03-10	Interactive Tumor Progression Modeling via Sketch-Based Image Editing	Gexin Huang et.al.	2503.06809	null
2025-03-09	Consistent Image Layout Editing with Diffusion Models	Tao Xia et.al.	2503.06419	null
2025-03-08	Get In Video: Add Anything You Want to the Video	Shaobin Zhuang et.al.	2503.06268	null
2025-03-08	X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation	Jian Ma et.al.	2503.06134	link
2025-03-10	VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control	Yuxuan Bian et.al.	2503.05639	link
2025-03-07	Towards Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients	Niklas Penzel et.al.	2503.05424	null
2025-03-06	Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models	Rui Jiang et.al.	2503.04215	null
2025-03-05	GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors	Yaopei Zeng et.al.	2503.03944	null
2025-03-05	An Adaptive Underwater Image Enhancement Framework via Multi-Domain Fusion and Color Compensation	Yuezhe Tian et.al.	2503.03640	null
2025-03-03	Hyperspectral Image Restoration and Super-resolution with Physics-Aware Deep Learning for Biomedical Applications	Yuchen Xiang et.al.	2503.02908	null
2025-03-04	ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement	Xuejian Guo et.al.	2503.02484	link
2025-03-04	Semantic Prior Distillation with Vision Foundation Model for Enhanced Rapid Bone Scintigraphy Image Restoration	Pengchen Liang et.al.	2503.02321	null
2025-03-04	h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform	Toan Nguyen et.al.	2503.02187	link
2025-03-03	MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting	Mojtaba Safari et.al.	2503.01576	link
2025-03-03	Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions	Zihan Shen et.al.	2503.01339	null
2025-03-03	Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual	Chong Wang et.al.	2503.01288	link
2025-03-03	VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors	Juil Koo et.al.	2503.01107	null
2025-03-01	Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement	Aupendu Kar et.al.	2503.00642	link
2025-03-01	GenVDM: Generating Vector Displacement Maps From a Single Image	Yuezhi Yang et.al.	2503.00605	null
2025-03-01	Flow Matching for Medical Image Synthesis: Bridging the Gap Between Speed and Quality	Milad Yazdani et.al.	2503.00266	link
2025-02-28	SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events	Yunfan Lu et.al.	2502.21120	null
2025-02-28	Diffusion Restoration Adapter for Real-World Image Restoration	Hanbang Liang et.al.	2502.20679	null
2025-02-27	Tight Inversion: Image-Conditioned Inversion for Real Image Editing	Edo Kadosh et.al.	2502.20376	null
2025-02-28	HVI: A New Color Space for Low-light Image Enhancement	Qingsen Yan et.al.	2502.20272	link
2025-02-27	Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps	Tianxiao Gao et.al.	2502.20054	null
2025-02-27	Identity-preserving Distillation Sampling by Fixed-Point Iterator	SeonHwa Kim et.al.	2502.19930	null
2025-02-27	Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement	Nan An et.al.	2502.19867	null
2025-02-26	ILACS-LGOT: A Multi-Layer Contrast Enhancement Approach for Palm-Vein Images	Kaveen Perera et.al.	2502.19456	null
2025-02-26	Self-supervised conformal prediction for uncertainty quantification in Poisson imaging problems	Bernardin Tamo Amougou et.al.	2502.19194	null
2025-02-26	Multi-level Attention-guided Graph Neural Network for Image Restoration	Jiatao Jiang et.al.	2502.19181	null
2025-02-27	RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images	Yuhan Tang et.al.	2502.19153	null
2025-02-26	Dynamic Degradation Decomposition Network for All-in-One Image Restoration	Huiqiang Wang et.al.	2502.19068	null
2025-02-25	Spatial Analysis of Neuromuscular Junctions Activation in Three-Dimensional Histology-based Muscle Reconstructions	Alessandro Ascani Orsini et.al.	2502.18646	link
2025-02-25	Application of Attention Mechanism with Bidirectional Long Short-Term Memory (BiLSTM) and CNN for Human Conflict Detection using Computer Vision	Erick da Silva Farias et.al.	2502.18555	null
2025-02-26	Bayesian Optimization for Controlled Image Editing via LLMs	Chengkun Cai et.al.	2502.18116	null
2025-02-25	KV-Edit: Training-Free Image Editing for Precise Background Preservation	Tianrui Zhu et.al.	2502.17363	link
2025-02-24	VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing	Xiangpeng Yang et.al.	2502.17258	null
2025-02-24	Splitting Regularized Wasserstein Proximal Algorithms for Nonsmooth Sampling Problems	Fuqun Han et.al.	2502.16773	link
2025-02-22	DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation	Yuxuan Xiong et.al.	2502.16302	null
2025-02-21	Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement	Uche A. Nnolim et.al.	2502.15986	null
2025-02-21	LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement	Namrah Siddiqua et.al.	2502.15186	null
2025-02-21	Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization	Ach Khozaimi et.al.	2502.15156	null
2025-02-20	Reinforcement Learning for Ultrasound Image Analysis: A Comprehensive Review of Advances and Applications	Maha Ezzelarab et.al.	2502.14995	null
2025-02-23	PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data	Shijie Huang et.al.	2502.14397	link
2025-02-20	EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement	Wenhui Zhu et.al.	2502.14260	null
2025-02-19	RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior	Ching-Hua Lee et.al.	2502.13574	null
2025-02-18	AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks	Ming Xie et.al.	2502.11158	null
2025-02-14	PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control	Kunal Swami et.al.	2502.10258	null
2025-02-14	VideoDiff: Human-AI Video Co-Creation with Alternatives	Mina Huh et.al.	2502.10190	null
2025-02-14	Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training	Rodrigo Santos et.al.	2502.10064	null
2025-02-19	Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal	Jinpei Guo et.al.	2502.09873	link
2025-02-13	Source function from two-particle correlation function through entropy-regularized Richardson-Lucy deblurring	C. K. Tam et.al.	2502.09478	null
2025-02-14	SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment	Tica Lin et.al.	2502.08621	null
2025-02-19	MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers	Ao Li et.al.	2502.07856	null
2025-02-13	Visual-based spatial audio generation system for multi-speaker environments	Xiaojing Liu et.al.	2502.07538	null
2025-02-11	Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems	Ai Chen et.al.	2502.07351	link
2025-02-10	Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists	Bojia Zi et.al.	2502.06734	null
2025-02-10	Predictive Red Teaming: Breaking Policies Without Breaking Robots	Anirudha Majumdar et.al.	2502.06575	null
2025-02-10	UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis	Zemin Yang et.al.	2502.06324	null
2025-02-09	A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement	Muhammad Turab et.al.	2502.05995	null
2025-02-11	UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control	Kaizhen Zhu et.al.	2502.05749	link
2025-02-08	AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection	Shuheng Zhang et.al.	2502.05433	null
2025-02-07	Self-supervised Conformal Prediction for Uncertainty Quantification in Imaging Problems	Jasper M. Everink et.al.	2502.05127	null
2025-02-07	Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition	S Sreehari et.al.	2502.04680	null
2025-02-05	Lost in Edits? A $\lambda$-Compass for AIGC Provenance	Wenhao You et.al.	2502.04364	null
2025-02-06	MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation	Jinbo Xing et.al.	2502.04299	null
2025-02-06	PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models	Aleksandar Cvejic et.al.	2502.04050	null
2025-02-06	DICE: Distilling Classifier-Free Guidance into Text Embeddings	Zhenyu Zhou et.al.	2502.03726	null
2025-02-05	All-in-One Image Compression and Restoration	Huimin Zeng et.al.	2502.03649	link
2025-02-05	REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations	Peter Sushko et.al.	2502.03629	null
2025-02-05	Efficient Image Restoration via Latent Consistency Flow Matching	Elad Cohen et.al.	2502.03500	null
2025-02-04	Blind Visible Watermark Removal with Morphological Dilation	Preston K. Robinette et.al.	2502.02676	null
2025-02-04	Exploring the latent space of diffusion models directly through singular value decomposition	Li Wang et.al.	2502.02225	null
2025-02-04	EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues	Rohit Girmaji et.al.	2502.02172	null
2025-02-03	Human Body Restoration with One-Step Diffusion Model and A New Benchmark	Jue Gong et.al.	2502.01411	null
2025-02-04	Compressed Image Generation with Denoising Diffusion Codebook Models	Guy Ohayon et.al.	2502.01189	null
2025-02-01	A framework for river connectivity classification using temporal image processing and attention based neural networks	Timothy James Becker et.al.	2502.00474	null
2025-02-01	Shape from Semantics: 3D Shape Generation from Multi-View Semantics	Liangchen Li et.al.	2502.00360	null
2025-01-30	DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models	Ruofan Liang et.al.	2501.18590	null
2025-01-30	Integrating Spatial and Frequency Information for Under-Display Camera Image Restoration	Kyusu Ahn et.al.	2501.18517	null
2025-01-31	MatIR: A Hybrid Mamba-Transformer Image Restoration Model	Juan Wen et.al.	2501.18401	link
2025-01-29	Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment	Zixue Zeng et.al.	2501.17690	link
2025-01-28	Text-to-Image Generation for Vocabulary Learning Using the Keyword Method	Nuwan T. Attygalle et.al.	2501.17099	null
2025-01-27	Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration	Long Peng et.al.	2501.16583	null
2025-01-27	UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images	Tatiana Taís Schein et.al.	2501.16211	link
2025-01-27	CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference	Zhengyang Lu et.al.	2501.15852	link
2025-01-26	Universal Image Restoration Pre-training via Degradation Classification	JiaKui Hu et.al.	2501.15510	link
2025-01-24	MATCHA: Towards Matching Anything	Fei Xue et.al.	2501.14945	null
2025-01-24	Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders	Zaheer Ahmad et.al.	2501.14709	null
2025-01-24	Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.*	Ludovica Schaerf et.al.	2501.14524	null
2025-01-24	Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement	Guoxi Huang et.al.	2501.14265	link
2025-01-24	CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image	Xiaojun Tang et.al.	2501.14264	null
2025-01-23	INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration	Di You et.al.	2501.14014	null
2025-01-23	IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models	Jiayi Lei et.al.	2501.13920	null
2025-01-23	Binary Diffusion Probabilistic Model	Vitaliy Kinakh et.al.	2501.13915	null
2025-01-23	Where Do You Go? Pedestrian Trajectory Prediction using Scene Features	Mohammad Ali Rezaei et.al.	2501.13848	null
2025-01-23	Training-Free Consistency Pipeline for Fashion Repose	Potito Aghilar et.al.	2501.13692	null
2025-01-22	UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior	I-Hsiang Chen et.al.	2501.13134	null
2025-01-22	Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects	Louis Aberdeen et.al.	2501.13009	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration	Ruicheng Zhang et.al.	2501.12832	link
2025-01-21	Slot-BERT: Self-supervised Object Discovery in Surgical Video	Guiqiu Liao et.al.	2501.12477	null
2025-01-21	Regressor-Guided Image Editing Regulates Emotional Response to Reduce Online Engagement	Christoph Gebhardt et.al.	2501.12289	null
2025-01-21	Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping	Hongxu Yang et.al.	2501.12245	null
2025-01-21	DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains	Junyu Xia et.al.	2501.12235	null
2025-01-21	Exploring Temporally-Aware Features for Point Tracking	Inès Hyeonsu Kim et.al.	2501.12218	link
2025-01-21	Proxies for Distortion and Consistency with Applications for Real-World Image Restoration	Sean Man et.al.	2501.12102	null
2025-01-20	SILO: Solving Inverse Problems with Latent Operators	Ron Raphaeli et.al.	2501.11746	null
2025-01-20	PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents	Kanika Goswami et.al.	2501.11233	null
2025-01-19	Counteracting temporal attacks in Video Copy Detection	Katarzyna Fojcik et.al.	2501.11171	null
2025-01-17	DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration	Huiyun Cao et.al.	2501.10325	null
2025-01-17	IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment	Shangkun Sun et.al.	2501.09927	null
2025-01-16	PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery	Shristi Das Biswas et.al.	2501.09826	link
2025-01-16	FLOL: Fast Baselines for Real-World Low-Light Enhancement	Juan C. Benito et.al.	2501.09718	link
2025-01-16	Soft Knowledge Distillation with Multi-Dimensional Cross-Net Attention for Image Restoration Models Compression	Yongheng Zhang et.al.	2501.09321	null
2025-01-16	Knowledge Distillation for Image Restoration: Simultaneous Learning from Degraded and Clean Images	Yongheng Zhang et.al.	2501.09268	null
2025-01-14	AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring	Sanjida Afrin Mou et.al.	2501.08266	link
2025-01-14	FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors	Yabo Zhang et.al.	2501.08225	link
2025-01-13	SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing	Varun Biyyala et.al.	2501.07554	link
2025-01-13	IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion	Tharun Anand et.al.	2501.07530	null
2025-01-11	Natural Language Supervision for Low-light Image Enhancement	Jiahui Tang et.al.	2501.06546	null
2025-01-11	Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning	Maomao Li et.al.	2501.06438	null
2025-01-10	Underwater Image Enhancement using Generative Adversarial Networks: A Survey	Kancharagunta Kishan Babu et.al.	2501.06273	null
2025-01-10	Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs	Dabing Cheng et.al.	2501.05884	null
2025-01-09	Bit-depth color recovery via off-the-shelf super-resolution models	Xuanshuo Fu et.al.	2501.05611	null
2025-01-09	HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction	Shaurya Singh Rathore et.al.	2501.05195	null
2025-01-09	IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation	Qi Chen et.al.	2501.04995	link
2025-01-08	Color Correction Meets Cross-Spectral Refinement: A Distribution-Aware Diffusion for Underwater Image Restoration	Laibin Chang et.al.	2501.04740	null
2025-01-08	EditAR: Unified Conditional Generation with Autoregressive Models	Jiteng Mu et.al.	2501.04699	null
2025-01-08	Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion	Yangfan He et.al.	2501.04606	link
2025-01-08	FrontierNet: Learning Visual Cues to Explore	Boyang Sun et.al.	2501.04597	link
2025-01-08	MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration	Zhi Jin et.al.	2501.04486	link
2025-01-08	Edit as You See: Image-guided Video Editing via Masked Motion Modeling	Zhi-Lin Huang et.al.	2501.04325	null
2025-01-08	Recognition-Oriented Low-Light Image Enhancement based on Global and Pixelwise Optimization	Seitaro Ono et.al.	2501.04210	null
2025-01-07	Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications	L. Berlyand et.al.	2501.04182	null
2025-01-07	Convergent Primal-Dual Plug-and-Play Image Restoration: A General Algorithm and Applications	Yodai Suzuki et.al.	2501.03780	link
2025-01-07	Materialist: Physically Based Editing Using Single-Image Inverse Rendering	Lezhong Wang et.al.	2501.03717	link
2025-01-07	Exploring Optimal Latent Trajectory for Zero-shot Image Editing	Maomao Li et.al.	2501.03631	null
2025-01-07	Textualize Visual Prompt for Image Editing via Diffusion Bridge	Pengcheng Xu et.al.	2501.03495	null
2025-01-06	ImageMM: Joint multi-frame image restoration and super-resolution	Yashil Sukurdeep et.al.	2501.03002	null
2025-01-06	Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis	Xiaojiao Guo et.al.	2501.02701	link
2025-01-03	JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing	Qili Wang et.al.	2501.01798	link
2025-01-03	Conditional Consistency Guided Image Translation and Enhancement	Amil Bhagat et.al.	2501.01223	link
2025-01-02	Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion	Dong Zhang et.al.	2501.01114	null
2025-01-01	Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model	Chenyang Liu et.al.	2501.00895	null
2024-12-31	SoundBrush: Sound as a Brush for Visual Scene Editing	Kim Sung-Bin et.al.	2501.00645	null
2025-01-02	Edicho: Consistent Image Editing in the Wild	Qingyan Bai et.al.	2412.21079	link
2024-12-30	Varformer: Adapting VAR’s Generative Prior for Image Restoration	Siyang Wang et.al.	2412.21063	link
2024-12-30	Low-Light Image Enhancement via Generative Perceptual Priors	Han Zhou et.al.	2412.20916	link
2024-12-29	Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)	Tomer Garber et.al.	2412.20596	link
2024-12-28	Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems	Wen-Dong Jiang et.al.	2412.20201	null
2024-12-28	UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity	Jingbo Lin et.al.	2412.20157	link
2024-12-28	MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration	Boyun Li et.al.	2412.20066	link
2024-12-28	MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion	Zechao Zhan et.al.	2412.20062	null
2024-12-28	An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models	Yuang Wang et.al.	2412.19992	null
2024-12-28	MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation	Haoyu Zheng et.al.	2412.19978	null
2024-12-27	Generative Adversarial Network on Motion-Blur Image Restoration	Zhengdong Li et.al.	2412.19479	null
2024-12-27	DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes	Yiyuan Liang et.al.	2412.19458	link
2024-12-25	DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images	Enbo Huang et.al.	2412.18797	null
2024-12-24	DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation	Minghong Cai et.al.	2412.18597	link
2024-12-24	Underwater Image Restoration via Polymorphic Large Kernel CNNs	Xiaojiao Guo et.al.	2412.18459	link
2024-12-24	Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models	Qice Qin et.al.	2412.18421	null
2024-12-24	UNet--: Memory-Efficient and Feature-Enhanced Network Architecture based on U-Net with Reduced Skip-Connections	Lingxiao Yin et.al.	2412.18276	null
2024-12-24	SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos	Zhen Zhang et.al.	2412.18214	link
2024-12-23	The Superposition of Diffusion Models Using the Itô Density Estimator	Marta Skreta et.al.	2412.17762	null
2024-12-21	Optoelectronic generative adversarial networks	Jumin Qiu et.al.	2412.16672	link
2024-12-21	Rethinking Model Redundancy for Low-light Image Enhancement	Tong Li et.al.	2412.16459	null
2024-12-20	Mapping the Mind of an Instruction-based Image Editing using SMILE	Zeinab Dehghani et.al.	2412.16277	link
2024-12-20	SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild	Jannik Elsäßer et.al.	2412.16147	null
2024-12-20	NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images	Yue Guo et.al.	2412.15890	null
2024-12-20	Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation	Aiwen Jiang et.al.	2412.15845	link
2024-12-20	Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance	Hyunsoo Lee et.al.	2412.15798	null
2024-12-19	Efficient Neural Network Encoding for 3D Color Lookup Tables	Vahid Zehtab et.al.	2412.15438	link
2024-12-19	UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency	Enis Simsar et.al.	2412.15216	null
2024-12-19	Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model	Minglong Xue et.al.	2412.14630	link
2024-12-19	Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion	Jixuan He et.al.	2412.14462	link
2024-12-18	Personalized Generative Low-light Image Denoising and Enhancement	Xijun Wang et.al.	2412.14327	null
2024-12-18	Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing	Le-Anh Tran et.al.	2412.14220	link
2024-12-18	Fed-AugMix: Balancing Privacy and Utility via Data Augmentation	Haoyang Li et.al.	2412.13818	null
2024-12-18	Text2Relight: Creative Portrait Relighting with Text Guidance	Junuk Cha et.al.	2412.13734	null
2024-12-18	VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement	Chen Zhao et.al.	2412.13655	link
2024-12-18	DarkIR: Robust Low-Light Image Restoration	Daniel Feijoo et.al.	2412.13443	link
2024-12-18	Zero-Shot Low Light Image Enhancement with Diffusion Prior	Joshua Cho et.al.	2412.13401	link
2024-12-17	MotionBridge: Dynamic Video Inbetweening with Flexible Controls	Maham Tanveer et.al.	2412.13190	null
2024-12-17	Prompt Augmentation for Self-supervised Text-guided Image Manipulation	Rumeysa Bodur et.al.	2412.13081	null
2024-12-17	Unsupervised Region-Based Image Editing of Denoising Diffusion Models	Zixiang Li et.al.	2412.12912	null
2024-12-17	MIVE: New Design and Benchmark for Multi-Instance Video Editing	Samuel Teodoro et.al.	2412.12877	null
2024-12-17	Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration	Xinlong Cheng et.al.	2412.12550	null
2024-12-17	Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy	Aditya Ganeshan et.al.	2412.12463	null
2024-12-16	Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD)	Ki-Hwan Oh et.al.	2412.12238	link
2024-12-16	Re-Attentional Controllable Video Diffusion Editing	Yuanzhi Wang et.al.	2412.11710	link
2024-12-15	Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing	Jiancheng Huang et.al.	2412.11152	null
2024-12-15	Towards Context-aware Convolutional Network for Image Restoration	Fangwei Hao et.al.	2412.11008	null
2024-12-14	Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification	Yucong Meng et.al.	2412.10776	null
2024-12-16	BrushEdit: All-In-One Image Inpainting and Editing	Yaowei Li et.al.	2412.10316	null
2024-12-13	Learning Complex Non-Rigid Image Edits from Multimodal Conditioning	Nikolai Warner et.al.	2412.10219	null
2024-12-16	Matrix Completion via Residual Spectral Matching	Ziyuan Chen et.al.	2412.10005	null
2024-12-12	Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG	Kavana Venkatesh et.al.	2412.09614	null
2024-12-12	FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers	Yusuf Dalva et.al.	2412.09611	null
2024-12-12	Video Seal: Open and Efficient Video Watermarking	Pierre Fernandez et.al.	2412.09492	link
2024-12-12	OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs	Yuanzhi Zhu et.al.	2412.09465	link
2024-12-13	Are Conditional Latent Diffusion Models Effective for Image Restoration?	Yunchen Yuan et.al.	2412.09324	null
2024-12-12	Text-Video Multi-Grained Integration for Video Moment Montage	Zhihui Yin et.al.	2412.09276	null
2024-12-12	ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring	Zhongbao Yang et.al.	2412.09193	null
2024-12-12	Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration	Yunshuai Zhou et.al.	2412.08939	link
2024-12-11	Convergence Analysis of a Proximal Stochastic Denoising Regularization Algorithm	Marien Renaud et.al.	2412.08262	null
2024-12-10	Leveraging Content and Context Cues for Low-Light Image Enhancement	Igor Morawski et.al.	2412.07693	link
2024-12-10	Analytical-Heuristic Modeling and Optimization for Low-Light Image Enhancement	Axel Martinez et.al.	2412.07659	null
2024-12-10	Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE)	Tu Vo et.al.	2412.07527	null
2024-12-10	Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring	Yuzhi Zhao et.al.	2412.07256	link
2024-12-10	EchoIR: Advancing Image Restoration with Echo Upsampling and Bi-Level Optimization	Yuhan He et.al.	2412.07225	null
2024-12-10	A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing	Yujie Feng et.al.	2412.07195	null
2024-12-09	InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention	Howard Zhang et.al.	2412.06753	null
2024-12-09	PrEditor3D: Fast and Precise 3D Shape Editing	Ziya Erkoç et.al.	2412.06592	null
2024-12-09	MoViE: Mobile Diffusion for Video Editing	Adil Karjauv et.al.	2412.06578	null
2024-12-09	EchoSim4D: A Proof-of-Concept Gamified XR Echocardiography Training Simulator for Neonates using 4D Ultrasound Volume	Deepthy Rose Jose et.al.	2412.06271	null
2024-12-08	GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis	Ashish Goswami et.al.	2412.06089	null
2024-12-08	Enhanced 3D Generation by 2D Editing	Haoran Li et.al.	2412.05929	null
2024-12-07	Enhancing Sample Generation of Diffusion Models using Noise Level Correction	Abulikemu Abuduweili et.al.	2412.05488	null
2024-12-06	Equivariant Denoisers for Image Restoration	Marien Renaud et.al.	2412.05343	null
2024-12-06	ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration	Chi-Wei Hsiao et.al.	2412.05043	null
2024-12-09	Video Decomposition Prior: A Methodology to Decompose Videos into Layers	Gaurav Shrivastava et.al.	2412.04930	null
2024-12-06	Addressing Attribute Leakages in Diffusion-based Image Editing without Training	Sunung Mun et.al.	2412.04715	null
2024-12-05	Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise	Brayan Monroy et.al.	2412.04648	link
2024-12-05	MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers	Byeonghyeon Lee et.al.	2412.04591	null
2024-12-05	Action-based image editing guided by human instructions	Maria Mihaela Trusca et.al.	2412.04558	null
2024-12-05	SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion	Trong-Tung Nguyen et.al.	2412.04301	null
2024-12-05	HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing	Jinbin Bai et.al.	2412.04280	link
2024-12-05	Deep priors for satellite image restoration with accurate uncertainties	Biquard Maud et.al.	2412.04130	null
2024-12-05	Blind Underwater Image Restoration using Co-Operational Regressor Networks	Ozer Can Devecioglu et.al.	2412.03995	null
2024-12-05	INRetouch: Context Aware Implicit Neural Representation for Photography Retouching	Omar Elezabi et.al.	2412.03848	null
2024-12-05	LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model	Yuan Xue et.al.	2412.03841	null
2024-12-05	Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration	Yuzhen Du et.al.	2412.03814	null
2024-12-05	EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM	Quang Nguyen et.al.	2412.03809	null
2024-12-04	DIVE: Taming DINO for Subject-Driven Video Editing	Yi Huang et.al.	2412.03347	null
2024-12-04	Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges	Minghao Shao et.al.	2412.03220	null
2024-12-04	Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution	Jiahua Xiao et.al.	2412.02960	null
2024-12-03	Motion Prompting: Controlling Video Generation with Motion Trajectories	Daniel Geng et.al.	2412.02700	null
2024-12-03	MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis	Tianyu Wang et.al.	2412.02635	null
2024-12-04	GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing	Khawar Islam et.al.	2412.02366	null
2024-12-03	Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models	Jungwon Park et.al.	2412.02237	link
2024-12-03	OmniCreator: Self-Supervised Unified Generation with Universal Editing	Haodong Chen et.al.	2412.02114	null
2024-12-03	Relaxed and Inertial Nonlinear Forward-Backward with Momentum	Fernando Roldán et.al.	2412.02045	link
2024-12-02	CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion	Kai He et.al.	2412.01792	null
2024-12-02	OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking	Xuanyu Zhang et.al.	2412.01615	null
2024-12-02	Learning Adaptive Lighting via Channel-Aware Guidance	Qirui Yang et.al.	2412.01493	null
2024-12-02	Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond	MD Raqib Khan et.al.	2412.01456	link
2024-11-29	Self-Supervised Denoiser Framework	Emilien Valat et.al.	2411.19593	null
2024-11-28	Trajectory Attention for Fine-grained Video Motion Control	Zeqi Xiao et.al.	2411.19324	null
2024-11-28	LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair	Xue Song et.al.	2411.19156	null
2024-11-28	Descriptions of women are longer than that of men: An analysis of gender portrayal prompts in Stable Diffusion	Yan Asadchy et.al.	2411.18994	null
2024-11-27	Hierarchical Information Flow for Generalized Efficient Image Restoration	Yawei Li et.al.	2411.18588	null
2024-11-27	Complexity Experts are Task-Discriminative Learners for Any Image Restoration	Eduard Zamfir et.al.	2411.18466	null
2024-11-27	Adaptive Blind All-in-One Image Restoration	David Serrano-Lozano et.al.	2411.18412	link
2024-11-29	HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning	Zengxi Zhang et.al.	2411.18296	link
2024-11-27	TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution	Linwei Dong et.al.	2411.18263	link
2024-11-27	Prediction with Action: Visual Policy Learning via Joint Denoising Process	Yanjiang Guo et.al.	2411.18179	null
2024-11-26	Generative Image Layer Decomposition with Visual Effects	Jinrui Yang et.al.	2411.17864	null
2024-11-26	Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation	Sudarshan Rajagopalan et.al.	2411.17814	null
2024-11-26	GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration	Sudarshan Rajagopalan et.al.	2411.17687	null
2024-11-26	VideoDirector: Precise Video Editing via Text-to-Video Models	Yukun Wang et.al.	2411.17592	null
2024-11-26	Puzzle Similarity: A Perceptually-guided No-Reference Metric for Artifact Detection in 3D Scene Reconstructions	Nicolai Hermann et.al.	2411.17489	null
2024-11-26	InsightEdit: Towards Better Instruction Following for Image Editing	Yingjing Xu et.al.	2411.17323	null
2024-11-26	MLI-NeRF: Multi-Light Intrinsic-Aware Neural Radiance Fields	Yixiong Yang et.al.	2411.17235	link
2024-11-26	MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers	Ruoxi Zhu et.al.	2411.17226	link
2024-11-26	DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting	Yicheng Yang et.al.	2411.17223	link
2024-11-25	Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing	Hanhui Wang et.al.	2411.16832	link
2024-11-25	Pathways on the Image Manifold: Image Editing via Video Generation	Noam Rotstein et.al.	2411.16819	null
2024-11-25	Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding	Yubin Gu et.al.	2411.16217	null
2024-11-25	U2NeRF: Unsupervised Underwater Image Restoration and Neural Radiance Fields	Vinayak Gupta et.al.	2411.16172	null
2024-11-24	PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation	Chia-Ming Lee et.al.	2411.15922	link
2024-11-24	Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing	Pengcheng Xu et.al.	2411.15843	null
2024-11-24	MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking	Chunhui Zhang et.al.	2411.15761	link
2024-11-24	LTCF-Net: A Transformer-Enhanced Dual-Channel Fourier Framework for Low-Light Image Restoration	Gaojing Zhang et.al.	2411.15740	null
2024-11-24	AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea	Qifan Yu et.al.	2411.15738	null
2024-11-23	Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator	Chaehun Shin et.al.	2411.15466	null
2024-11-22	Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration	Darshan Thaker et.al.	2411.15295	null
2024-11-22	HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads	Yu Xu et.al.	2411.15034	null
2024-11-22	Benchmarking the Robustness of Optical Flow Estimation to Corruptions	Zhonghua Yi et.al.	2411.14865	link
2024-11-22	AI Tailoring: Evaluating Influence of Image Features on Fashion Product Popularity	Xiaomin Li et.al.	2411.14737	null
2024-11-22	TrojanEdit: Backdooring Text-Based Image Editing Models	Ji Guo et.al.	2411.14681	null
2024-11-21	Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection	Ali Awad et.al.	2411.14626	link
2024-11-21	Stable Flow: Vital Layers for Training-Free Image Editing	Omri Avrahami et.al.	2411.14430	link
2024-11-21	Guided MRI Reconstruction via Schrödinger Bridge	Yue Wang et.al.	2411.14269	null
2024-11-21	Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion	Jinhong He et.al.	2411.13961	link
2024-11-21	GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter	Aniruddha Bala et.al.	2411.13794	null
2024-11-20	Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms	Matthieu Kowalski et.al.	2411.13276	null
2024-11-20	Open-World Amodal Appearance Completion	Jiayang Ao et.al.	2411.13019	null
2024-11-19	Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution	Yang Zou et.al.	2411.12530	link
2024-11-19	Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models	Jun Xiao et.al.	2411.12450	null
2024-11-19	Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images	Zheng Gong et.al.	2411.12278	null
2024-11-16	GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding	Yue Zhou et.al.	2411.11904	link
2024-11-18	Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment	Zhendong Liu et.al.	2411.11543	null
2024-11-17	Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method	Yan Zheng et.al.	2411.11135	null
2024-11-17	StableV2V: Stabilizing Shape Consistency in Video-to-Video Editing	Chang Liu et.al.	2411.11045	link
2024-11-19	TSFormer: A Robust Framework for Efficient UHD Image Restoration	Xin Su et.al.	2411.10951	null
2024-11-16	AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations	Jiawei Mao et.al.	2411.10708	null
2024-11-16	Underwater Image Enhancement with Cascaded Contrastive Learning	Yi Liu et.al.	2411.10682	link
2024-11-15	OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models	Mathis Koroglu et.al.	2411.10501	null
2024-11-15	Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence	Guodong Sun et.al.	2411.10321	null
2024-11-15	ColorEdit: Training-free Image-Guided Color editing with diffusion model	Xingxi Yin et.al.	2411.10232	null
2024-11-14	MagicQuill: An Intelligent Interactive Image Editing System	Zichen Liu et.al.	2411.09703	link
2024-11-13	A Survey on Vision Autoregressive Model	Kai Jiang et.al.	2411.08666	null
2024-11-12	Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing	Zitao Shuai et.al.	2411.08196	null
2024-11-12	CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising	Linxuan Li et.al.	2411.07930	link
2024-11-12	Joint multi-dimensional dynamic attention and transformer for general image restoration	Huan Zhang et.al.	2411.07893	link
2024-11-12	All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model	Yuanbo Wen et.al.	2411.07445	null
2024-11-12	Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models	Yoad Tewel et.al.	2411.07232	null
2024-11-11	OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision	Cong Wei et.al.	2411.07199	null
2024-11-11	Multi-scale Frequency Enhancement Network for Blind Image Deblurring	Yawen Xiang et.al.	2411.06893	null
2024-11-11	SeedEdit: Align Image Re-Generation to Image Editing	Yichun Shi et.al.	2411.06686	null
2024-11-10	Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration	Chen Wu et.al.	2411.06456	null
2024-11-08	A Modular Conditional Diffusion Framework for Image Reconstruction	Magauiya Zhussip et.al.	2411.05993	null
2024-11-08	UnDIVE: Generalized Underwater Video Enhancement Using Generative Priors	Suhas Srinath et.al.	2411.05886	link
2024-11-07	A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model	Panwen Hu et.al.	2411.04942	null
2024-11-07	Taming Rectified Flow for Inversion and Editing	Jiangshan Wang et.al.	2411.04746	link
2024-11-06	Multi-Reward as Condition for Instruction-based Image Editing	Xin Gu et.al.	2411.04713	null
2024-11-06	ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models	Ashutosh Srivastava et.al.	2411.03982	null
2024-11-05	CrowdGenUI: Enhancing LLM-Based UI Widget Generation with a Crowdsourced Preference Library	Yimeng Liu et.al.	2411.03477	null
2024-11-07	DiT4Edit: Diffusion Transformer for Image Editing	Kunyu Feng et.al.	2411.03286	null
2024-11-05	Local Lesion Generation is Effective for Capsule Endoscopy Image Data Augmentation in a Limited Data Setting	Adrian B. Chłopowiec et.al.	2411.03098	null
2024-11-05	ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing	Yuka Ogino et.al.	2411.02799	null
2024-11-04	AutoVFX: Physically Realistic Video Editing from Natural Language Instructions	Hao-Yu Hsu et.al.	2411.02394	null
2024-11-04	DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability	Bo Gao et.al.	2411.01819	null
2024-11-03	Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration	Xiaole Tang et.al.	2411.01656	link
2024-11-03	Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach	Qihe Pan et.al.	2411.01545	link
2024-11-03	TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement	Xuanzhao Dong et.al.	2411.01403	link
2024-11-02	Medical X-Ray Image Enhancement Using Global Contrast-Limited Adaptive Histogram Equalization	Sohrab Namazi Nia et.al.	2411.01373	null
2024-11-01	Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing	Naufal Suryanto et.al.	2411.00425	link
2024-10-31	Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes	Shaohua Liu et.al.	2411.00239	null
2024-10-31	Chasing Better Deep Image Priors between Over- and Under-parameterization	Qiming Wu et.al.	2410.24187	link
2024-10-31	Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation	Yihang Zhou et.al.	2410.23962	null
2024-10-31	Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data	Yucun Hou et.al.	2410.23628	null
2024-10-31	MS-Glance: Non-semantic context vectors and the applications in supervising image reconstruction	Ziqi Gao et.al.	2410.23577	link
2024-10-31	Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization	Xiao Guo et.al.	2410.23556	null
2024-10-30	EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models	Shangquan Sun et.al.	2410.22959	link
2024-10-30	Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement	Sahil Ali Akbar et.al.	2410.21946	link
2024-10-25	ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting	Takuma Nishimura et.al.	2410.21310	null
2024-10-27	Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement	Junhao Tan et.al.	2410.20314	link
2024-10-27	Deep Learning, Machine Learning – Digital Signal and Image Processing: From Theory to Application	Weiche Hsieh et.al.	2410.20304	null
2024-10-24	HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision	Burak Ercan et.al.	2410.19164	null
2024-10-24	Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances	Shilin Lu et.al.	2410.18775	link
2024-10-28	Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing	Haonan Lin et.al.	2410.18756	null
2024-10-29	DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation	Yuang Ai et.al.	2410.18666	link
2024-10-23	TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation	Ruicheng Zhang et.al.	2410.17855	null
2024-10-23	DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection	Qingpeng Li et.al.	2410.17822	link
2024-10-23	An Intelligent Agentic System for Complex Image Restoration Problems	Kaiwen Zhu et.al.	2410.17809	link
2024-10-23	A variational approach to nonlocal image restoration flows	Harsh Prasad et.al.	2410.17649	null
2024-10-23	Diffusion Priors for Variational Likelihood Estimation and Image Denoising	Jun Cheng et.al.	2410.17521	link
2024-10-20	LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration	Yuang Ai et.al.	2410.15385	link
2024-10-19	A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends	Junjun Jiang et.al.	2410.15067	link
2024-10-19	Attack as Defense: Run-time Backdoor Implantation for Image Content Protection	Haichuan Zhang et.al.	2410.14966	link
2024-10-18	ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing	Jimin Dai et.al.	2410.14247	null
2024-10-17	MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks	Xinxin Liu et.al.	2410.14089	null
2024-10-17	Movie Gen: A Cast of Media Foundation Models	Adam Polyak et.al.	2410.13720	link
2024-10-17	Generative Location Modeling for Spatially Aware Object Insertion	Jooyeol Yun et.al.	2410.13564	null
2024-10-16	AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing	DuoSheng Chen et.al.	2410.12696	link
2024-10-16	Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing	Mingce Guo et.al.	2410.12526	null
2024-10-16	Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks	Pranjali Pathre et.al.	2410.12432	link
2024-10-16	Towards Flexible and Efficient Diffusion Low Light Enhancer	Guanzhou Lan et.al.	2410.12346	null
2024-10-16	Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond	Pengwei Liang et.al.	2410.12274	null
2024-10-15	Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos	Zhouxia Wang et.al.	2410.11828	null
2024-10-15	SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing	Zhiyuan Zhang et.al.	2410.11815	null
2024-10-15	RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation	Anton Antonov et.al.	2410.11722	link
2024-10-15	Augmentation-Driven Metric for Balancing Preservation and Modification in Text-Guided Image Editing	Yoonjeon Kim et.al.	2410.11374	null
2024-10-14	Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits	Xuhui Kang et.al.	2410.11013	null
2024-10-14	Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations	Litu Rout et.al.	2410.10792	null
2024-10-14	Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing	Kejie Wang et.al.	2410.10496	link
2024-10-13	TextMaster: Universal Controllable Text Edit	Aoqiang Wang et.al.	2410.09879	null
2024-10-13	LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond	Md Tanvir Islam et.al.	2410.09831	link
2024-10-14	LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection	Mingjia Li et.al.	2410.08810	link
2024-10-11	Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers	Jin Cao et.al.	2410.08688	link
2024-10-11	Natural Language Induced Adversarial Images	Xiaopei Zhu et.al.	2410.08620	link
2024-10-10	TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration	Hsing-Hua Wang et.al.	2410.08177	link
2024-10-10	RNA: Video Editing with ROI-based Neural Atlas	Jaekyeong Lee et.al.	2410.07600	null
2024-10-09	BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models	Fangyikang Wang et.al.	2410.07273	null
2024-10-09	InstantIR: Blind Image Restoration with Instant Generative Reference	Jen-Yuan Huang et.al.	2410.06551	null
2024-10-08	PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM	Stefan Stefanache et.al.	2410.05710	link
2024-10-08	DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing	June Suk Choi et.al.	2410.05694	link
2024-10-08	ReFIR: Grounding Large Restoration Models with Retrieval Augmentation	Hang Guo et.al.	2410.05601	link
2024-10-07	GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting	Yukang Cao et.al.	2410.05259	null
2024-10-07	PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing	Feng Tian et.al.	2410.04844	link
2024-10-07	Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration	Zhiyu Zhu et.al.	2410.04811	link
2024-10-06	Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli	Valentyn Piskovskyi et.al.	2410.04497	null
2024-10-06	SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems	Ismail Alkhouri et.al.	2410.04479	link
2024-10-08	IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis	Shitong Shao et.al.	2410.04171	link
2024-10-05	Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model	Keda Tao et.al.	2410.04161	null
2024-10-04	Diffusion State-Guided Projected Gradient for Inverse Problems	Rayhan Zirvi et.al.	2410.03463	link
2024-10-04	Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing	Ziqi Jiang et.al.	2410.03097	null
2024-10-03	PnP-Flow: Plug-and-Play Image Restoration with Flow Matching	Ségolène Martin et.al.	2410.02423	link
2024-10-03	Can Capacitive Touch Images Enhance Mobile Keyboard Decoding?	Piyawat Lertvittayakumjorn et.al.	2410.02264	link
2024-10-02	Posterior sampling via Langevin dynamics based on generative priors	Vishal Purohit et.al.	2410.02078	null
2024-10-02	Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust	Asher J. Hancock et.al.	2410.01971	null
2024-10-02	MiraGe: Editable 2D Images using Gaussian Splatting	Joanna Waczyńska et.al.	2410.01521	link
2024-10-01	Three-Operator Splitting Method with Two-Step Inertial Extrapolation	Olaniyi S. Iyiola et.al.	2410.01099	null
2024-10-01	GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer	Youngho Yoon et.al.	2410.00672	link
2024-10-01	Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation	Yunnan Wang et.al.	2410.00447	null
2024-10-01	Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration	Guy Ohayon et.al.	2410.00418	link
2024-10-01	GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction	Zaid Ilyas et.al.	2410.00380	null
2024-09-30	A Survey on Diffusion Models for Inverse Problems	Giannis Daras et.al.	2410.00083	null
2024-09-30	FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing	Lingling Cai et.al.	2409.20500	null
2024-09-30	UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation	Cheng Zhang et.al.	2409.20197	link
2024-09-29	Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation	Xiaofeng Cong et.al.	2409.19685	link
2024-09-28	Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration	Chu-Jie Qin et.al.	2409.19403	link
2024-09-28	PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution	Song Zhang et.al.	2409.19269	link
2024-09-27	Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors	Yunlong Lin et.al.	2409.18899	null
2024-09-27	Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models	Nguyen Gia Bach et.al.	2409.18476	link
2024-09-27	SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement	Yunkui Pang et.al.	2409.18355	link
2024-09-26	Toward Efficient Deep Blind RAW Image Restoration	Marcos V. Conde et.al.	2409.18204	link
2024-09-26	FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner	Wenliang Zhao et.al.	2409.18128	link
2024-09-26	FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction	Runze He et.al.	2409.18071	null
2024-09-26	Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs	Qinpeng Cui et.al.	2409.17778	link
2024-09-26	MIO: A Foundation Model on Multimodal Tokens	Zekun Wang et.al.	2409.17692	link
2024-09-26	Learning Quantized Adaptive Conditions for Diffusion Models	Yuchen Liang et.al.	2409.17487	null
2024-09-25	Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement	Yihao Zhou et.al.	2409.16661	null
2024-09-25	Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement	Guanlin Li et.al.	2409.16604	link
2024-09-24	Proactive Schemes: A Survey of Adversarial Attacks for Social Good	Vishal Asnani et.al.	2409.16491	null
2024-09-24	Liger at W.M. Keck Observatory: imager structural analysis, fabrication, and characterization plan	James Wiley et.al.	2409.16263	null
2024-09-23	PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions	Weifeng Lin et.al.	2409.15278	link
2024-09-23	LoVA: Long-form Video-to-Audio Generation	Xin Cheng et.al.	2409.15157	null
2024-09-23	Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP	Zeliang Zhang et.al.	2409.15035	null
2024-09-23	ControlEdit: A MultiModal Local Clothing Image Editing Method	Di Cheng et.al.	2409.14720	link
2024-09-23	Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections	Ankit Dhiman et.al.	2409.14677	link
2024-09-22	Low-Light Enhancement Effect on Classification and Detection: An Empirical Study	Xu Wu et.al.	2409.14461	null
2024-09-22	Quantitative and Qualitative Evaluation of NLM and Wavelet Methods in Image Enhancement	Cameron Khanpour et.al.	2409.14334	null
2024-09-20	Colorful Diffuse Intrinsic Image Decomposition in the Wild	Chris Careaga et.al.	2409.13690	link
2024-09-20	A Bottom-Up Approach to Class-Agnostic Image Segmentation	Sebastian Dille et.al.	2409.13687	null
2024-09-18	Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution	Peng Wang et.al.	2409.12191	link
2024-09-18	Denoising diffusion models for high-resolution microscopy image restoration	Pamela Osuna-Vargas et.al.	2409.12078	null
2024-09-18	InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models	Yan Zheng et.al.	2409.11734	null
2024-09-17	Ultrasound Image Enhancement with the Variance of Diffusion Models	Yuxin Zhang et.al.	2409.11380	link
2024-09-17	OmniGen: Unified Image Generation	Shitao Xiao et.al.	2409.11340	link
2024-09-17	MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance	Debin Meng et.al.	2409.11010	link
2024-09-17	CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement	Xuanzhao Dong et.al.	2409.10966	link
2024-09-16	SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing	Qi Qian et.al.	2409.10476	null
2024-09-16	Taming Diffusion Models for Image Restoration: A Review	Ziwei Luo et.al.	2409.10353	null
2024-09-15	Underwater Image Enhancement via Dehazing and Color Restoration	Chengqin Wu et.al.	2409.09779	null
2024-09-15	EditBoard: Towards A Comprehensive Evaluation Benchmark for Text-based Video Editing Models	Yupeng Chen et.al.	2409.09668	link
2024-09-15	TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfer	Zihan Su et.al.	2409.09610	link
2024-09-13	InstantDrag: Improving Interactivity in Drag-based Image Editing	Joonghyuk Shin et.al.	2409.08857	null
2024-09-13	Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori	Jinhong He et.al.	2409.08585	null
2024-09-12	Click2Mask: Local Editing with Dynamic Mask Generation	Omer Regev et.al.	2409.08272	link
2024-09-12	Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement	Vamsi Krishna Vasa et.al.	2409.07862	null
2024-09-12	Quaternion Nuclear Norm minus Frobenius Norm Minimization for color image reconstruction	Yu Guo et.al.	2409.07797	null
2024-09-11	FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process	Yang Luo et.al.	2409.07451	null
2024-09-11	Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement	Xianmin Chen et.al.	2409.07040	link
2024-09-11	PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening	RuoCheng Wu et.al.	2409.06980	null
2024-09-10	Face Mask Removal with Region-attentive Face Inpainting	Minmin Yang et.al.	2409.06845	link
2024-09-10	Modeling Image Tone Dichotomy with the Power Function	Axel Martinez et.al.	2409.06764	null
2024-09-10	GeoCalib: Learning Single-image Calibration with Geometric Optimization	Alexander Veicht et.al.	2409.06704	link
2024-09-10	Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer	Li Ke et.al.	2409.06590	null
2024-09-09	NeIn: Telling What You Don’t Want	Nhat-Tan Bui et.al.	2409.06481	null
2024-09-10	Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models	Siyu Zhai et.al.	2409.06420	null
2024-09-10	Multi-Weather Image Restoration via Histogram-Based Transformer Feature Enhancement	Yang Wen et.al.	2409.06334	null
2024-09-10	AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration	Hongyi Cai et.al.	2409.06206	null
2024-09-09	MemoVis: A GenAI-Powered Tool for Creating Companion Reference Images for 3D Design Feedback	Chen Chen et.al.	2409.06082	null
2024-09-09	Rethinking the Atmospheric Scattering-driven Attention via Channel and Gamma Correction Priors for Low-Light Image Enhancement	Shyang-En Weng et.al.	2409.05274	link
2024-09-07	Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation	Jiaxin Cheng et.al.	2409.04847	link
2024-09-07	Power Line Aerial Image Restoration under Adverse Weather: Datasets and Baselines	Sai Yang et.al.	2409.04812	link
2024-09-06	Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior	Charlesquin Kemajou Mbakam et.al.	2409.04384	null
2024-09-06	RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement	Hao Luo et.al.	2409.04363	link
2024-09-05	Blended Latent Diffusion under Attention Control for Real-World Video Editing	Deyin Liu et.al.	2409.03514	null
2024-09-05	Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration	Pei Wang et.al.	2409.03455	null
2024-09-05	KAN See In the Dark	Aoxiang Ning et.al.	2409.03404	link
2024-09-05	Multiple weather images restoration using the task transformer and adaptive mixup strategy	Yang Wen et.al.	2409.03249	null
2024-09-05	Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem	Qiwen Zhu et.al.	2409.03179	link
2024-09-04	Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering	Peng Wang et.al.	2409.02426	link
2024-09-04	Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing	Siyi Chen et.al.	2409.02374	link
2024-09-03	Unveiling Deep Shadows: A Survey on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning	Xiaowei Hu et.al.	2409.02108	link
2024-09-03	Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models	Jiaqi Xu et.al.	2409.02101	link
2024-09-03	F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring	Subhajit Paul et.al.	2409.02056	null
2024-09-03	AllWeatherNet: Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions	Chenghao Qian et.al.	2409.02045	link
2024-09-03	Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement	Kun Zhou et.al.	2409.01641	link
2024-09-03	GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting	Zixuan Guo et.al.	2409.01581	null
2024-09-02	Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets	Ishan Rajendrakumar Dave et.al.	2409.01445	null
2024-09-02	Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing	Vadim Titov et.al.	2409.01322	link
2024-08-30	Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method	Yuji Lin et.al.	2408.17339	link
2024-08-30	Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL	Haiyang Zhao et.al.	2408.17060	null
2024-08-29	GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content	Lebin Zhou et.al.	2408.16866	null
2024-09-02	A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising	Shuaiyu Yuan et.al.	2408.16481	null
2024-08-29	What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer	Chaeyeon Chung et.al.	2408.16450	link
2024-08-29	Enhanced Control for Diffusion Bridge in Image Restoration	Conghan Yue et.al.	2408.16303	link
2024-08-29	EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More	Kanghao Chen et.al.	2408.16254	null
2024-08-29	LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement	Ye Yu et.al.	2408.16235	link
2024-08-28	Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration	Xu Zhang et.al.	2408.15994	null
2024-08-27	A Preliminary Exploration Towards General Image Restoration	Xiangtao Kong et.al.	2408.15143	null
2024-08-27	Towards Real-world Event-guided Low-light Video Enhancement and Deblurring	Taewoo Kim et.al.	2408.14916	link
2024-08-26	GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy	Peiyan Li et.al.	2408.14368	link
2024-08-26	I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing	Yiwei Ma et.al.	2408.14180	link
2024-08-26	Image Provenance Analysis via Graph Encoding with Vision Transformer	Keyang Zhang et.al.	2408.14170	null
2024-08-27	Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing	Yitong Yang et.al.	2408.13623	null
2024-08-24	CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track	Jinming Chai et.al.	2408.13582	null
2024-08-23	Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing	Zitao Shuai et.al.	2408.13335	null
2024-08-23	O-Mamba: O-shape State-Space Model for Underwater Image Enhancement	Chenyu Dong et.al.	2408.12816	link
2024-08-22	FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing	Jue Wang et.al.	2408.12429	link
2024-08-22	CODE: Confident Ordinary Differential Editing	Bastien van Delft et.al.	2408.12418	link
2024-08-22	Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement	Lingyu Zhu et.al.	2408.12316	link
2024-08-21	Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models	Chun-Yen Shih et.al.	2408.11810	null
2024-08-23	AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion	Yunfang Niu et.al.	2408.11553	link
2024-08-21	E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment	Shangkun Sun et.al.	2408.11481	link
2024-08-21	OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal	Qiao Mo et.al.	2408.11480	link
2024-08-21	Taming Generative Diffusion for Universal Blind Image Restoration	Siwei Tu et.al.	2408.11287	null
2024-08-20	Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement	Satoshi Kosugi et.al.	2408.11055	link
2024-08-20	Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos	Dennis Fedorishin et.al.	2408.10998	null
2024-08-20	SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement	Linlin Hu et.al.	2408.10934	null
2024-08-20	A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse	Zhongliang Guo et.al.	2408.10901	null
2024-08-20	UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement	Yingtie Lei et.al.	2408.10653	link
2024-08-19	Multi-Scale Representation Learning for Image Restoration with State-Space Model	Yuhong He et.al.	2408.10145	null
2024-08-19	ARMADA: Attribute-Based Multimodal Data Augmentation	Xiaomeng Jin et.al.	2408.10086	null
2024-08-19	Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration	Alik Pramanick et.al.	2408.09912	link
2024-08-19	ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement	Eashan Adhikarla et.al.	2408.09650	link
2024-08-17	Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration	Xin Lin et.al.	2408.09241	link
2024-08-16	Language-Driven Interactive Shadow Detection	Hongqiu Wang et.al.	2408.08543	link
2024-08-16	Achieving Complex Image Edits via Function Aggregation with Diffusion Models	Mohammadreza Samadi et.al.	2408.08495	null
2024-08-16	DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer’s Case Studies	Mohammad Hossein Najafi et.al.	2408.08489	null
2024-08-14	TurboEdit: Instant text-based image editing	Zongze Wu et.al.	2408.08332	null
2024-08-15	Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks	Jiawei Wu et.al.	2408.08149	link
2024-08-15	HAIR: Hypernetworks-based All-in-One Image Restoration	Jin Cao et.al.	2408.08091	link
2024-08-14	DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency	Xiaojing Zhong et.al.	2408.07481	null
2024-08-14	GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval	Zechen Bai et.al.	2408.07249	null
2024-08-13	Review Learning: Advancing All-in-One Ultra-High-Definition Image Restoration Training Method	Xin Su et.al.	2408.06709	null
2024-08-13	EditScribe: Non-Visual Image Editing with Natural Language Verification Loops	Ruei-Che Chang et.al.	2408.06632	null
2024-08-12	Wavelet based inpainting detection	Barglazan Adrian-Alin et.al.	2408.06429	null
2024-08-12	Latent Disentanglement for Low Light Image Enhancement	Zhihao Zheng et.al.	2408.06245	null
2024-08-10	Greedy randomized block Kaczmarz method for matrix equation AXB=C and its applications in color image restoration	Wenli Wang et.al.	2408.05444	null
2024-08-09	Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models	Qirui Jiao et.al.	2408.04594	link
2024-08-08	Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration	Ziran Zhang et.al.	2408.04227	null
2024-08-08	MultiColor: Image Colorization by Learning from Multiple Color Spaces	Xiangcheng Du et.al.	2408.04172	null
2024-08-06	FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning	Zhi Chen et.al.	2408.03355	null
2024-08-05	Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models	Tongtong Feng et.al.	2408.02408	null
2024-08-05	Dense Feature Interaction Network for Image Inpainting Localization	Ye Yao et.al.	2408.02191	null
2024-08-03	SAT3D: Image-driven Semantic Attribute Transfer in 3D	Zhijun Zhai et.al.	2408.01664	null
2024-08-02	Multi-task SAR Image Processing via GAN-based Unsupervised Manipulation	Xuran Hu et.al.	2408.01553	null
2024-08-02	Underwater Object Detection Enhancement via Channel Stabilization	Muhammad Ali et.al.	2408.01293	link
2024-08-02	Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement	Wenbin Zou et.al.	2408.01276	link
2024-08-02	Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration	Donwon Park et.al.	2408.01099	null
2024-08-01	TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models	Gilad Deutch et.al.	2408.00735	null
2024-08-01	A Prior Embedding-Driven Architecture for Long Distance Blind Iris Recognition	Qi Xiong et.al.	2408.00210	null
2024-07-31	Hyper-parameter tuning for text guided image editing	Shiwen Zhang et.al.	2407.21703	link
2024-07-31	Fine-gained Zero-shot Video Sampling	Dengsheng Chen et.al.	2407.21475	null
2024-07-31	Generalized Tampered Scene Text Detection in the era of Generative AI	Chenfan Qu et.al.	2407.21422	link
2024-07-30	UniProcessor: A Text-induced Unified Low-level Image Processor	Huiyu Duan et.al.	2407.20928	link
2024-07-27	Inverse Problems with Diffusion Models: A MAP Estimation Perspective	Sai bharath chandra Gutha et.al.	2407.20784	link
2024-07-29	Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing	Ekaterina Iakovleva et.al.	2407.20232	null
2024-07-29	FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention	Yu Lu et.al.	2407.19918	null
2024-07-29	ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement	Ezequiel Perez-Zarate et.al.	2407.19708	link
2024-07-31	Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint	Song Zhang et.al.	2407.19248	null
2024-07-27	Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration	Xiaoyan Yu et.al.	2407.19139	link
2024-07-26	Floating No More: Object-Ground Reconstruction from a Single Image	Yunze Man et.al.	2407.18914	null
2024-07-26	PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis	Sohyeong Kim et.al.	2407.18695	null
2024-07-26	Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner	Pengxiang Cai et.al.	2407.18656	null
2024-07-26	LookupForensics: A Large-Scale Multi-Task Dataset for Multi-Phase Image-Based Fact Verification	Shuhan Cui et.al.	2407.18614	null
2024-07-26	Dilated Strip Attention Network for Image Restoration	Fangwei Hao et.al.	2407.18613	null
2024-07-25	RegionDrag: Fast Region-Based Image Editing with Diffusion Models	Jingyi Lu et.al.	2407.18247	null
2024-07-25	RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models	Haoyu Chen et.al.	2407.18035	null
2024-07-25	Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography	Kailai Zhou et.al.	2407.17996	link
2024-07-25	FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing	Gwanhyeong Koo et.al.	2407.17850	link
2024-07-25	Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing	Pengfei Jiang et.al.	2407.17847	link
2024-07-25	DragText: Rethinking Text Embedding in Point-based Image Editing	Gayoon Choi et.al.	2407.17843	link
2024-07-23	S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks	Neha A S et.al.	2407.17587	null
2024-07-23	PyBench: Evaluating LLM Agent on various real-world coding tasks	Yaolun Zhang et.al.	2407.16732	link
2024-07-23	DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors	Zizheng Yan et.al.	2407.16260	null
2024-07-23	CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction	Liang Zhao et.al.	2407.16204	null
2024-07-23	Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems	Sojin Lee et.al.	2407.16125	link
2024-07-21	MedEdit: Counterfactual Diffusion-based Image Editing on Brain MRI	Malek Ben Alaya et.al.	2407.15270	null
2024-07-21	Assessing Sample Quality via the Latent Space of Generative Models	Jingyi Xu et.al.	2407.15171	link
2024-07-20	Deep Learning CT Image Restoration using System Blur and Noise Models	Yijie Yuan et.al.	2407.14983	null
2024-07-23	AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement	Yunlong Lin et.al.	2407.14900	null
2024-07-20	Dual High-Order Total Variation Model for Underwater Image Restoration	Yuemei Li et.al.	2407.14868	link
2024-07-20	Text-based Talking Video Editing with Cascaded Conditional Diffusion	Bo Han et.al.	2407.14841	null
2024-07-19	Adaptive Frequency Enhancement Network for Single Image Deraining	Fei Yan et.al.	2407.14292	link
2024-07-18	BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models	Moon Ye-Bin et.al.	2407.13442	null
2024-07-18	Any Image Restoration with Efficient Automatic Degradation Adaptation	Bin Ren et.al.	2407.13372	link
2024-07-18	Multi-sentence Video Grounding for Long Video Generation	Wei Feng et.al.	2407.13219	null
2024-07-19	Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking	Zhiyuan Ma et.al.	2407.13188	null
2024-07-18	Training-Free Large Model Priors for Multiple-in-One Image Restoration	Xuanhua He et.al.	2407.13181	null
2024-07-18	Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement	Eashan Adhikarla et.al.	2407.13170	null
2024-07-18	Image Inpainting Models are Effective Tools for Instruction-guided Image Editing	Xuan Ju et.al.	2407.13139	null
2024-07-21	HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration	Shuchang Zhang et.al.	2407.13120	link
2024-07-17	Zero-shot Text-guided Infinite Image Synthesis with LLM guidance	Soyeong Kwon et.al.	2407.12642	null
2024-07-17	Rethinking the Architecture Design for Efficient Generic Event Boundary Detection	Ziwei Zheng et.al.	2407.12622	link
2024-07-17	Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations	Tomáš Chobola et.al.	2407.12511	link
2024-07-17	GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval	Han Zhou et.al.	2407.12431	link
2024-07-17	Sphere Window: Challenges and Opportunities of 360° Video in Collaborative Design Workshops	Wo Meijer et.al.	2407.12407	null
2024-07-17	GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity	Shuo Cao et.al.	2407.12273	null
2024-07-16	Efficient Training with Denoised Neural Weights	Yifan Gong et.al.	2407.11966	null
2024-07-16	TGIF: Text-Guided Inpainting Forgery Dataset	Hannes Mareen et.al.	2407.11566	link
2024-07-16	Haze-Aware Attention Network for Single-Image Dehazing	Lihan Tong et.al.	2407.11505	null
2024-07-14	Restore-RWKV: Efficient and Effective Medical Image Restoration with RWKV	Zhiwen Yang et.al.	2407.11087	link
2024-07-15	InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models	Nirat Saini et.al.	2407.10958	null
2024-07-15	In-Loop Filtering via Trained Look-Up Tables	Zhuoyuan Li et.al.	2407.10926	null
2024-07-15	MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration	Yulin Ren et.al.	2407.10833	null
2024-07-15	Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval	Youngsun Lim et.al.	2407.10683	null
2024-07-14	Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models	Qinyu Yang et.al.	2407.10285	link
2024-07-14	Restoring Images in Adverse Weather Conditions via Histogram Transformer	Shangquan Sun et.al.	2407.10172	link
2024-07-13	NamedCurves: Learned Image Enhancement via Color Naming	David Serrano-Lozano et.al.	2407.09892	link
2024-07-12	Region Attention Transformer for Medical Image Restoration	Zhiwen Yang et.al.	2407.09268	link
2024-07-12	Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration	Hu Gao et.al.	2407.08950	link
2024-07-12	LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models	Hai Jiang et.al.	2407.08939	link
2024-07-11	Single-Image Shadow Removal Using Deep Learning: A Comprehensive Survey	Lanqing Guo et.al.	2407.08865	link
2024-07-11	Haar Nuclear Norms with Applications to Remote Sensing Imagery Restoration	Shuang Xu et.al.	2407.08509	null
2024-07-11	ERD: Exponential Retinex decomposition based on weak space and hybrid nonconvex regularization and its denoising application	Wenjing Lu et.al.	2407.08498	null
2024-07-12	Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending	Delong Wu et.al.	2407.08457	null
2024-07-10	Generative Image as Action Models	Mohit Shridhar et.al.	2407.07875	link
2024-07-10	Aging-Resistant Wideband Precoding in 5G and Beyond Using 3D Convolutional Neural Networks	Alejandro Villena-Rodriguez et.al.	2407.07434	null
2024-07-10	CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement	Wei Wang et.al.	2407.07056	null
2024-07-10	Asymmetric Mask Scheme for Self-Supervised Real Image Denoising	Xiangyu Liao et.al.	2407.06514	link
2024-07-08	Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN	Jiacheng Su et.al.	2407.05577	null
2024-07-07	Image-Conditional Diffusion Transformer for Underwater Image Enhancement	Xingyang Nie et.al.	2407.05389	null
2024-07-07	UltraEdit: Instruction-based Fine-Grained Image Editing at Scale	Haozhe Zhao et.al.	2407.05282	link
2024-07-07	Multi-scale Conditional Generative Modeling for Microscopic Image Restoration	Luzhe Huang et.al.	2407.05259	null
2024-07-06	Robust Skin Color Driven Privacy Preserving Face Recognition via Function Secret Sharing	Dong Han et.al.	2407.05045	null
2024-07-05	On a nonlinear nonlocal reaction-diffusion system applied to image restoration	Yuhang Li et.al.	2407.04347	null
2024-07-05	A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation	Dazhao Du et.al.	2407.04230	link
2024-07-04	DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts	Zheng-Peng Duan et.al.	2407.03757	null
2024-07-04	Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration	Yuhong Zhang et.al.	2407.03636	null
2024-07-04	MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration	Yuhong Zhang et.al.	2407.03635	null
2024-07-03	BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement	Ruirui Lin et.al.	2407.03535	null
2024-07-03	Learning Action and Reasoning-Centric Image Editing from Videos and Simulations	Benno Krojer et.al.	2407.03471	link
2024-07-02	Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model	Cong Cao et.al.	2407.01960	null
2024-06-30	Learning Frequency-Aware Dynamic Transformers for All-In-One Image Restoration	Zenglin Shi et.al.	2407.01636	null
2024-07-01	Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing	Bingliang Zhang et.al.	2407.01521	link
2024-07-01	DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models	Chang-Han Yeh et.al.	2407.01519	link
2024-07-01	Unrolling Plug-and-Play Gradient Graph Laplacian Regularizer for Image Restoration	Jianghe Cai et.al.	2407.01469	null
2024-07-01	Blind Inversion using Latent Diffusion Priors	Weimin Bai et.al.	2407.01027	null
2024-06-30	Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation	Yuchuan Tian et.al.	2407.00676	link
2024-06-28	Transformer-based Image and Video Inpainting: Current Challenges and Future Directions	Omar Elharrouss et.al.	2407.00226	null
2024-06-28	Network Bending of Diffusion Models for Audio-Visual Generation	Luke Dzwonczyk et.al.	2406.19589	link
2024-06-27	BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection	Yang Song et.al.	2406.19048	null
2024-06-27	Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model	Jiangtong Tan et.al.	2406.19030	link
2024-06-26	IDA-UIE: An Iterative Framework for Deep Network-based Degradation Aware Underwater Image Enhancement	Pranjali Singh et.al.	2406.18628	null
2024-06-26	Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration	Kang Liao et.al.	2406.18516	link
2024-06-26	ConStyle v2: A Strong Prompter for All-in-One Image Restoration	Dongqi Fan et.al.	2406.18242	link
2024-06-26	MFDNet: Multi-Frequency Deflare Network for Efficient Nighttime Flare Removal	Yiguo Jiang et.al.	2406.18079	link
2024-06-25	LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing	Aoyang Liu et.al.	2406.17236	null
2024-06-24	GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization	Yirui Chen et.al.	2406.16531	link
2024-06-24	DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution	Aiwen Jiang et.al.	2406.16477	link
2024-06-22	Quality-guided Skin Tone Enhancement for Portrait Photography	Shiqi Gao et.al.	2406.15848	null
2024-06-22	MVOC: a training-free multiple video object composition method with diffusion models	Wei Wang et.al.	2406.15829	link
2024-06-21	LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement	Haodong Yang et.al.	2406.14973	null
2024-06-20	A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models	Xincheng Shuai et.al.	2406.14555	link
2024-06-26	Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps	Nikita Starodubcev et.al.	2406.14539	null
2024-06-20	V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data	Rotem Shalev-Arkushin et.al.	2406.14510	null
2024-06-19	EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy	Long Bai et.al.	2406.13705	link
2024-06-22	Ultra-High-Definition Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution	Liyan Wang et.al.	2406.13607	link
2024-06-19	WaterMono: Teacher-Guided Anomaly Masking and Enhancement Boosting for Robust Underwater Self-Supervised Monocular Depth Estimation	Yilin Ding et.al.	2406.13344	link
2024-06-19	ECAFormer: Low-light Image Enhancement using Cross Attention	Yudi Ruan et.al.	2406.13281	link
2024-06-19	Diffusion Model-based FOD Restoration from High Distortion in dMRI	Shuo Huang et.al.	2406.13209	null
2024-06-18	VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing	Jing Gu et.al.	2406.12831	null
2024-06-18	Restorer: Solving Multiple Image Restoration Tasks with One Set of Parameters	Jiawei Mao et.al.	2406.12587	link
2024-06-17	Generative Visual Instruction Tuning	Jefferson Hernandez et.al.	2406.11262	link
2024-06-16	Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields	Yixiong Yang et.al.	2406.11077	null
2024-06-16	Enhancing Supermarket Robot Interaction: A Multi-Level LLM Conversational Interface for Handling Diverse Customer Intents	Chandran Nandkumar et.al.	2406.11047	null
2024-06-15	Fast Unsupervised Tensor Restoration via Low-rank Deconvolution	David Reixach et.al.	2406.10679	null
2024-06-15	The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing	Denis Bobkov et.al.	2406.10601	link
2024-06-14	VideoGUI: A Benchmark for GUI Automation from Instructional Videos	Kevin Qinghong Lin et.al.	2406.10227	null
2024-06-14	InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning	Tiancheng Li et.al.	2406.09973	null
2024-06-14	RSEND: Retinex-based Squeeze and Excitation Network with Dark Region Detection for Efficient Low Light Image Enhancement	Jingcheng Li et.al.	2406.09656	null
2024-06-13	DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer	Wei-Ting Chen et.al.	2406.09622	null
2024-06-13	Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion	Linzhan Mou et.al.	2406.09402	null
2024-06-13	Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior	Baiang Li et.al.	2406.09389	link
2024-06-13	CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models	Yigit Ekin et.al.	2406.09368	link
2024-06-13	Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation	Yufan Zhou et.al.	2406.09305	null
2024-06-13	Preserving Identity with Variational Score for General-purpose 3D Editing	Duong H. Le et.al.	2406.08953	null
2024-06-13	Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation	Jingyuan Xia et.al.	2406.08896	link
2024-06-13	COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing	Jiangshan Wang et.al.	2406.08850	link
2024-06-12	LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach	Maria Pilligua et.al.	2406.08610	link
2024-06-12	PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement	Wei-Tung Lin et.al.	2406.08444	link
2024-06-12	DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor	Juncheng Wu et.al.	2406.08377	link
2024-06-12	2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation	Zhensong Xu et.al.	2406.08192	null
2024-06-12	One-Step Effective Diffusion Network for Real-World Image Super-Resolution	Rongyuan Wu et.al.	2406.08177	link
2024-06-12	From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization	Ziran Zhang et.al.	2406.08090	null
2024-06-12	CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models	Hyungjin Chung et.al.	2406.08070	null
2024-06-12	3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement	Sasidhar Alavala et.al.	2406.08048	null
2024-06-12	DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera	Senyan Xu et.al.	2406.07951	link
2024-06-11	HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness	Zihui Xue et.al.	2406.07754	null
2024-06-11	Zero-shot Image Editing with Reference Imitation	Xi Chen et.al.	2406.07547	null
2024-06-11	Beware of Aliases – Signal Preservation is Crucial for Robust Image Restoration	Shashank Agnihotri et.al.	2406.07435	null
2024-06-11	Missingness-resilient Video-enhanced Multimodal Disfluency Detection	Payal Mohapatra et.al.	2406.06964	link
2024-06-11	Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems	Jiawei Zhang et.al.	2406.06959	link
2024-06-10	NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing	Ting-Hsuan Chen et.al.	2406.06523	link
2024-06-10	FRAG: Frequency Adapting Group for Diffusion Video Editing	Sunjae Yoon et.al.	2406.06044	link
2024-06-09	PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction	Shangyu Chen et.al.	2406.05641	null
2024-06-08	Training-Free Robust Interactive Video Object Segmentation	Xiaoli Wei et.al.	2406.05485	null
2024-06-07	Optimal Eye Surgeon: Finding Image Priors through Sparse Generators at Initialization	Avrajit Ghosh et.al.	2406.05288	link
2024-06-07	Research on Tumors Segmentation based on Image Enhancement Method	Danyi Huang et.al.	2406.05170	null
2024-06-10	GenHeld: Generating and Editing Handheld Objects	Chaerin Min et.al.	2406.05059	link
2024-06-07	Zero-Shot Video Editing through Adaptive Sliding Score Distillation	Lianghan Zhu et.al.	2406.04888	null
2024-06-07	Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior	Tanvir Mahmud et.al.	2406.04873	link
2024-06-06	GenAI Arena: An Open Evaluation Platform for Generative Models	Dongfu Jiang et.al.	2406.04485	null
2024-06-06	Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning	Amandeep Kumar et.al.	2406.04413	link
2024-06-06	Diffusion-based image inpainting with internal learning	Nicolas Cherel et.al.	2406.04206	link
2024-06-06	LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression	Junhui Li et.al.	2406.03961	link
2024-06-06	JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits	Minzhou Pan et.al.	2406.03720	link
2024-06-06	Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting	Inkyu Shin et.al.	2406.02541	null
2024-06-04	Deep Block Proximal Linearised Minimisation Algorithm for Non-convex Inverse Problems	Chaoyan Huang et.al.	2406.02458	null
2024-06-03	DiffUHaul: A Training-Free Method for Object Dragging in Images	Omri Avrahami et.al.	2406.01594	null
2024-06-03	CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment	Daekyu Kwon et.al.	2406.01020	null
2024-06-03	MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models	Mingzhen Huang et.al.	2406.00985	null
2024-06-03	Assessing the Adversarial Security of Perceptual Hashing Algorithms	Jordan Madden et.al.	2406.00918	link
2024-06-02	Invisible Backdoor Attacks on Diffusion Models	Sen Li et.al.	2406.00816	link
2024-06-02	Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior	Yukai Shi et.al.	2406.00632	link
2024-06-02	Correlation Matching Transformation Transformers for UHD Image Restoration	Cong Wang et.al.	2406.00629	link
2024-06-01	FlowIE: Efficient Image Enhancement via Rectified Flow	Yixuan Zhu et.al.	2406.00508	link
2024-05-31	Learning Gaze-aware Compositional GAN	Nerea Aranjuelo et.al.	2405.20643	link
2024-05-30	MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion	Shuyuan Tu et.al.	2405.20325	link
2024-05-30	Sharing Key Semantics in Transformer Makes Efficient Image Restoration	Bin Ren et.al.	2405.20008	link
2024-05-30	All-In-One Medical Image Restoration via Task-Adaptive Routing	Zhiwen Yang et.al.	2405.19769	link
2024-05-30	Streaming Video Diffusion: Online Video Editing with Diffusion Models	Feng Chen et.al.	2405.19726	link
2024-05-30	Text Guided Image Editing with Automatic Concept Locating and Forgetting	Jia Li et.al.	2405.19708	null
2024-05-30	A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning	Xiaofeng Cong et.al.	2405.19684	null
2024-05-30	Creating Language-driven Spatial Variations of Icon Images	Xianghao Xu et.al.	2405.19636	null
2024-05-29	Blind Image Restoration via Fast Diffusion Inversion	Hamadi Chihaoui et.al.	2405.19572	link
2024-05-29	VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for Agriculture	Heesup Yun et.al.	2405.19413	null
2024-05-28	3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting	Qihang Zhang et.al.	2405.18424	null
2024-05-28	RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives	Jaehong Yoon et.al.	2405.18406	link
2024-05-29	Color Shift Estimation-and-Correction for Image Enhancement	Yiyu Li et.al.	2405.17725	link
2024-05-27	Fast Samplers for Inverse Problems in Iterative Refinement Models	Kushagra Pandey et.al.	2405.17673	link
2024-05-27	Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection	Gihyun Kwon et.al.	2405.16823	null
2024-05-27	TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing	Xinyu Zhang et.al.	2405.16803	null
2024-05-27	PromptFix: You Prompt and We Fix the Photo	Yongsheng Yu et.al.	2405.16785	link
2024-05-26	I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models	Wenqi Ouyang et.al.	2405.16537	null
2024-05-26	Looks Too Good To Be True: An Information-Theoretic Analysis of Hallucinations in Generative Restoration Models	Regev Cohen et.al.	2405.16475	null
2024-05-28	Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation	Jinlin Liu et.al.	2405.16393	null
2024-05-25	LEAST: “Local” text-conditioned image style transfer	Silky Singh et.al.	2405.16330	link
2024-05-25	ModelLock: Locking Your Model With a Spell	Yifeng Gao et.al.	2405.16285	null
2024-05-25	Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination	Shelly Golan et.al.	2405.16260	link
2024-05-25	Underwater Image Enhancement by Diffusion Model with Customized CLIP-Classifier	Shuaixin Liu et.al.	2405.16214	link
2024-05-24	FastDrag: Manipulate Anything in One Step	Xuanjia Zhao et.al.	2405.15769	link
2024-05-24	Hierarchical Uncertainty Exploration via Feedforward Posterior Trees	Elias Nehme et.al.	2405.15719	null
2024-05-24	Low-Light Video Enhancement via Spatial-Temporal Consistent Illumination and Reflection Decomposition	Xiaogang Xu et.al.	2405.15660	null
2024-05-24	Efficient Degradation-aware Any Image Restoration	Eduard Zamfir et.al.	2405.15475	null
2024-05-24	Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features	Lichuan Ji et.al.	2405.15343	null
2024-05-24	Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion	Aoxue Li et.al.	2405.15313	null
2024-05-24	Blaze3DM: Marry Triplane Representation with Diffusion for 3D Medical Inverse Problem Solving	Jia He et.al.	2405.15241	null
2024-05-23	EditWorld: Simulating World Dynamics for Instruction-Following Image Editing	Ling Yang et.al.	2405.14785	link
2024-05-23	TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing	Teng Xu et.al.	2405.14455	null
2024-05-23	Efficient Visual State Space Model for Image Deblurring	Lingshun Kong et.al.	2405.14343	link
2024-05-22	ReVideo: Remake a Video with Motion and Content Control	Chong Mou et.al.	2405.13865	null
2024-05-22	Perceptual Fairness in Image Restoration	Guy Ohayon et.al.	2405.13805	null
2024-05-22	Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data	Tarun Kalluri et.al.	2405.13779	null
2024-05-22	InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos	Yujun Shi et.al.	2405.13722	link
2024-05-21	DARK: Denoising, Amplification, Restoration Kit	Zhuoheng Li et.al.	2405.12891	link
2024-05-21	Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image	Zerui Zhang et.al.	2405.12872	link
2024-05-21	EmoEdit: Evoking Emotions through Image Manipulation	Jingyuan Yang et.al.	2405.12661	null
2024-05-21	Customize Your Own Paired Data via Few-shot Way	Jinshu Chen et.al.	2405.12490	null
2024-05-20	Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices	Nathaniel Cohen et.al.	2405.12211	link
2024-05-20	A New Cross-Space Total Variation Regularization Model for Color Image Restoration with Quaternion Blur Operator	Zhigang Jia et.al.	2405.12114	null
2024-05-19	Verification technology for finger vein biometric	George Kumi Kyeremeh et.al.	2405.11540	null
2024-05-19	Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement	Igor Morawski et.al.	2405.11478	null
2024-05-19	Emphasizing Crucial Features for Efficient Image Restoration	Hu Gao et.al.	2405.11468	link
2024-05-18	ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing	Ying Jin et.al.	2405.11190	null
2024-05-17	A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model	Mingxiang Fu et.al.	2405.10890	null
2024-05-17	LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion	Tong Chen et.al.	2405.10550	link
2024-05-16	RSDehamba: Lightweight Vision Mamba for Remote Sensing Satellite Image Dehazing	Huiling Zhou et.al.	2405.10030	null
2024-05-16	NTIRE 2024 Restore Any Image Model (RAIM) in the Wild Challenge	Jie Liang et.al.	2405.09923	null
2024-05-15	Inference in higher-order undirected graphical models and binary polynomial optimization	Aida Khajavirad et.al.	2405.09727	null
2024-05-15	Illumination Histogram Consistency Metric for Quantitative Assessment of Video Sequences	Long Chen et.al.	2405.09716	link
2024-05-15	RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content	Tianhao Peng et.al.	2405.08621	null
2024-05-14	WaterMamba: Visual State Space Model for Underwater Image Enhancement	Meisheng Guan et.al.	2405.08419	null
2024-05-14	Palette-based Color Transfer between Images	Chenlei Lv et.al.	2405.08263	null
2024-05-13	FRRffusion: Unveiling Authenticity with Diffusion-Based Face Retouching Reversal	Fengchuang Xing et.al.	2405.07582	link
2024-05-09	Diag2Diag: Multi modal super resolution for physics discovery with application to fusion	Azarakhsh Jalalvand et.al.	2405.05908	null
2024-05-09	DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation	Sitian Shen et.al.	2405.05800	null
2024-05-09	Exploring Text-Guided Single Image Editing for Remote Sensing Images	Fangzhou Han et.al.	2405.05769	link
2024-05-09	RPBG: Towards Robust Neural Point-based Graphics in the Wild	Qingtian Zhu et.al.	2405.05663	link
2024-05-08	Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection	Zhaoxiang Zhang et.al.	2405.04782	null
2024-05-07	Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing	Yi Zuo et.al.	2405.04496	null
2024-05-07	DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks	Jiaxin Zhang et.al.	2405.04408	link
2024-05-07	SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing	Yuying Ge et.al.	2405.04007	link
2024-05-06	Low-light Object Detection	Pengpeng Li et.al.	2405.03519	null
2024-05-06	Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement	Jiesong Bai et.al.	2405.03349	link
2024-05-06	Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance	Xunchu Zhou et.al.	2405.03333	link
2024-05-05	Residual-Conditioned Optimal Transport: Towards Structure-preserving Unpaired and Paired Image Restoration	Xiaole Tang et.al.	2405.02843	link
2024-05-04	Deep Image Restoration For Image Anti-Forensics	Eren Tahir et.al.	2405.02751	link
2024-05-06	SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising	Guanyiman Fu et.al.	2405.01726	link
2024-05-02	LocInv: Localization-aware Inversion for Text-Guided Image Editing	Chuanming Tang et.al.	2405.01496	link
2024-05-01	SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models	Burak Can Biner et.al.	2405.00878	null
2024-05-01	TexSliders: Diffusion-Based Texture Editing in CLIP Space	Julia Guerrero-Viu et.al.	2405.00672	null
2024-05-01	Streamlining Image Editing with Layered Diffusion Brushes	Peyman Gholami et.al.	2405.00313	null
2024-04-27	Remote Sensing Image Enhancement through Spatiotemporal Filtering	Hessah Albanwan et.al.	2404.18950	null
2024-04-29	Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle	Jungwoo Lee et.al.	2404.18395	null
2024-04-29	Reconstructing Satellites in 3D from Amateur Telescope Images	Zhiming Chang et.al.	2404.18394	null
2024-04-28	Paint by Inpaint: Learning to Add Image Objects by Removing Them First	Navve Wasserman et.al.	2404.18212	link
2024-04-27	DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images	Maria Mihaela Trusca et.al.	2404.18020	link
2024-04-27	FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder	Zheng Cheng et.al.	2404.17936	null
2024-05-02	Underwater Variable Zoom: Depth-Guided Perception Network for Underwater Image Enhancement	Zhixiong Huang et.al.	2404.17883	link
2024-04-26	Inhomogeneous illuminated image enhancement under extremely low visibility condition	Libang Chen et.al.	2404.17503	null
2024-04-26	Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model	Zhenghong Li et.al.	2404.17484	null
2024-04-26	PromptCIR: Blind Compressed Image Restoration with Prompt Learning	Bingchen Li et.al.	2404.17433	link
2024-04-26	One-Shot Image Restoration	Deborah Pereg et.al.	2404.17426	null
2024-04-26	Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement	Zishu Yao et.al.	2404.17400	link
2024-04-25	V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection	Xuanyu Zhang et.al.	2404.16824	null
2024-04-25	NTIRE 2024 Quality Assessment of AI-Generated Content Challenge	Xiaohong Liu et.al.	2404.16687	null
2024-04-25	AudioScenic: Audio-Driven Video Scene Editing	Kaixin Shen et.al.	2404.16581	null
2024-04-24	Editable Image Elements for Controllable Synthesis	Jiteng Mu et.al.	2404.16029	null
2024-04-26	A Survey on Visual Mamba	Hanwei Zhang et.al.	2404.15956	null
2024-04-26	A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution	Zhixiong Yang et.al.	2404.15620	link
2024-04-22	UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement	Yaofeng Xie et.al.	2404.14542	link
2024-04-22	GeoDiffuser: Geometry-Based Image Editing with Diffusion Models	Rahul Sajnani et.al.	2404.14403	null
2024-04-22	NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results	Xiaoning Liu et.al.	2404.14248	link
2024-04-22	Face2Face: Label-driven Facial Retouching Restoration	Guanhua Zhao et.al.	2404.14177	null
2024-04-22	Text in the Dark: Extremely Low-Light Text Image Enhancement	Che-Tsung Lin et.al.	2404.14135	link
2024-04-22	CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task	Kangzhen Yang et.al.	2404.14132	link
2024-04-22	MambaUIE&SR: Unraveling the Ocean’s Secrets with Only 2.8 FLOPs	Zhihao Chen et.al.	2404.13884	link
2024-04-23	LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation	Haoyu Zheng et.al.	2404.13558	null
2024-04-24	Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition	Genggeng Chen et.al.	2404.13537	link
2024-04-20	PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition	Xi Fang et.al.	2404.13299	null
2024-04-19	On-board classification of underwater images using hybrid classical-quantum CNN based method	Sreeraj Rajan Warrier et.al.	2404.13130	null
2024-04-18	GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models	Sai Sree Harsha et.al.	2404.12541	null
2024-04-18	AstroSat observations of interacting galaxies NGC 7469 and IC 5283	Abhinna Sundar Samantaray et.al.	2404.12527	null
2024-04-18	Lazy Diffusion Transformer for Interactive Image Editing	Yotam Nitzan et.al.	2404.12382	null
2024-04-18	Customizing Text-to-Image Diffusion with Camera Viewpoint Control	Nupur Kumari et.al.	2404.12333	null
2024-04-18	StyleBooth: Image Style Editing with Multimodal Instruction	Zhen Han et.al.	2404.12154	link
2024-04-18	Improving the perception of visual fiducial markers in the field using Adaptive Active Exposure Control	Ziang Ren et.al.	2404.12055	null
2024-04-18	FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models	Wei Wu et.al.	2404.11895	link
2024-04-17	CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration	Rui Deng et.al.	2404.11778	null
2024-04-17	AdaIR: Exploiting Underlying Similarities of Image Restoration Tasks with Adapters	Hao-Wei Chen et.al.	2404.11475	null
2024-04-17	TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing	Sherry X. Chen et.al.	2404.11120	link
2024-04-16	Improving Bracket Image Restoration and Enhancement with Flow-guided Alignment and Enhanced Feature Aggregation	Wenjie Lin et.al.	2404.10358	null
2024-04-16	Referring Flexible Image Restoration	Runwei Guan et.al.	2404.10342	link
2024-04-17	OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model	Runyi Li et.al.	2404.10312	null
2024-04-15	Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets	Dai Quoc Tran et.al.	2404.10078	link
2024-04-15	HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing	Mude Hui et.al.	2404.09990	null
2024-04-15	Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model	Han Lin et.al.	2404.09967	null
2024-04-15	The Problem Of Image Super-Resolution, Denoising And Some Image Restoration Methods In Deep Learning Models	Ngoc-Giau Pham et.al.	2404.09817	null
2024-04-15	Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement	Wenyi Lian et.al.	2404.09735	link
2024-04-15	Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models	Ziwei Luo et.al.	2404.09732	link
2024-04-15	Real-world Instance-specific Image Goal Navigation for Service Robots: Bridging the Domain Gap with Contrastive Learning	Taichi Sakaguchi et.al.	2404.09645	null
2024-04-13	BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection	Jian Zhang et.al.	2404.08979	null
2024-04-13	Seeing Text in the Dark: Algorithm and Benchmark	Chengpei Xu et.al.	2404.08965	null
2024-04-11	S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing	Guangzhi Wang et.al.	2404.08111	null
2024-04-11	TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising	Junyi Li et.al.	2404.07846	link
2024-04-11	Joint Conditional Diffusion Model for Image Restoration with Mixed Degradations	Yufeng Yue et.al.	2404.07770	null
2024-04-11	Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method	Tashmoy Ghosh et.al.	2404.07649	null
2024-04-10	Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models	Yasi Zhang et.al.	2404.07389	null
2024-04-10	Unfolding ADMM for Enhanced Subspace Clustering of Hyperspectral Images	Xianlu Li et.al.	2404.07112	link
2024-04-08	NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement	Giordano Cicchetti et.al.	2404.05669	link
2024-04-08	Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models	Saman Motamed et.al.	2404.05519	null
2024-04-08	Comparative Analysis of Image Enhancement Techniques for Brain Tumor Segmentation: Contrast, Histogram, and Hybrid Approaches	Shoffan Saifullah et.al.	2404.05341	null
2024-04-08	CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement	Xu Wu et.al.	2404.05253	null
2024-04-07	STAIC regularization for spatio-temporal image reconstruction	Deepak G Skariah et.al.	2404.05070	null
2024-04-07	AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment	Yuanfeng Xu et.al.	2404.04946	null
2024-04-07	ByteEdit: Boost, Comply and Accelerate Generative Image Editing	Yuxi Ren et.al.	2404.04860	null
2024-04-09	Empowering Image Recovery: A Multi-Attention Approach	Juan Wen et.al.	2404.04617	null
2024-04-05	ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing	Alec Helbling et.al.	2404.04376	link
2024-04-05	Physics-Inspired Synthesized Underwater Image Dataset	Reina Kaneko et.al.	2404.03998	link
2024-04-04	DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior	Yiming Zhang et.al.	2404.03642	null
2024-04-04	Reference-Based 3D-Aware Image Editing with Triplane	Bahri Batuhan Bilecen et.al.	2404.03632	null
2024-04-04	DI-Retinex: Digital-Imaging Retinex Theory for Low-Light Image Enhancement	Shangquan Sun et.al.	2404.03327	null
2024-04-03	Deep Image Composition Meets Image Forgery	Eren Tahir et.al.	2404.02897	link
2024-04-03	MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation	Petru-Daniel Tudosiu et.al.	2404.02790	null
2024-04-02	Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration	Akshay Dudhane et.al.	2404.02154	link
2024-04-02	3D Congealing: 3D-Aware Image Alignment in the Wild	Yunzhi Zhang et.al.	2404.02125	null
2024-04-02	Specularity Factorization for Low-Light Enhancement	Saurabh Saini et.al.	2404.01998	null
2024-04-02	Fashion Style Editing with Generative Human Prior	Chaerin Kong et.al.	2404.01984	null
2024-04-03	RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement	Tatiana Gaintseva et.al.	2404.01889	link
2024-04-01	An image speaks a thousand words, but can everyone listen? On translating images for cultural relevance	Simran Khanuja et.al.	2404.01247	link
2024-04-01	Uncovering the Text Embedding in Text-to-Image Diffusion Models	Hu Yu et.al.	2404.01154	null
2024-04-01	CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment	Hyeongmin Lee et.al.	2404.01123	null
2024-04-01	Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach	Guoqiang Liang et.al.	2404.00834	null
2024-03-31	GAMA-IR: Global Additive Multidimensional Averaging for Fast Image Restoration	Youssef Mansour et.al.	2404.00807	null
2024-03-29	Binarized Low-light Raw Video Enhancement	Gengchen Zhang et.al.	2403.19944	link
2024-03-28	GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models	Yusuf Dalva et.al.	2403.19645	null
2024-03-28	Burst Super-Resolution with Diffusion Models for Improving Perceptual Quality	Kyotaro Tokoro et.al.	2403.19428	link
2024-03-28	Taming Lookup Tables for Efficient Image Retouching	Sidi Yang et.al.	2403.19238	link
2024-03-28	A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement	Junjie Wen et.al.	2403.19079	null
2024-03-27	TextCraftor: Your Text Encoder Can be Image Quality Controller	Yanyu Li et.al.	2403.18978	null
2024-03-27	ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion	Daniel Winter et.al.	2403.18818	null
2024-03-27	Towards Image Ambient Lighting Normalization	Florin-Alexandru Vasluianu et.al.	2403.18730	link
2024-03-27	InstructBrush: Learning Attention-based Instruction Optimization for Image Editing	Ruoyu Zhao et.al.	2403.18660	null
2024-03-28	FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing	Trong-Tung Nguyen et.al.	2403.18605	null
2024-03-26	Bidirectional Consistency Models	Liangchen Li et.al.	2403.18035	link
2024-03-26	Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models	Mohammad Shahab Sepehri et.al.	2403.17902	null
2024-03-26	ExpressEdit: Video Editing with Natural Language and Sketching	Bekzat Tilekbay et.al.	2403.17693	null
2024-03-26	SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder	Dihan Zheng et.al.	2403.17502	link
2024-03-26	Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching	Shohei Enomoto et.al.	2403.17423	null
2024-03-26	Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance	Donghoon Ahn et.al.	2403.17377	link
2024-03-25	Residual Dense Swin Transformer for Continuous Depth-Independent Ultrasound Imaging	Jintong Hu et.al.	2403.16384	link
2024-03-25	Distilling Semantic Priors from SAM to Efficient Image Restoration Models	Quan Zhang et.al.	2403.16368	null
2024-03-24	EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing	Xiangpeng Yang et.al.	2403.16111	null
2024-03-24	Edit3K: Universal Representation Learning for Video Editing Components	Xin Gu et.al.	2403.16048	null
2024-03-23	Graph Image Prior for Unsupervised Dynamic MRI Reconstruction	Zhongsen Li et.al.	2403.15770	link
2024-03-22	MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis	Mai A. Shaaban et.al.	2403.15585	link
2024-03-22	Latent Neural Cellular Automata for Resource-Efficient Image Restoration	Andrea Menta et.al.	2403.15525	null
2024-03-22	Medical Image Data Provenance for Medical Cyber-Physical System	Vijay Kumar et.al.	2403.15522	null
2024-03-21	Osmosis: RGBD Diffusion Prior for Underwater Image Restoration	Opher Bar Nathan et.al.	2403.14837	null
2024-03-25	Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing	Alberto Baldrati et.al.	2403.14828	link
2024-03-21	StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text	Roberto Henschel et.al.	2403.14773	link
2024-03-22	Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion	Xiang Fan et.al.	2403.14617	null
2024-03-21	AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation	Yuning Cui et.al.	2403.14614	link
2024-03-21	ReNoise: Real Image Inversion Through Iterative Noising	Daniel Garibi et.al.	2403.14602	null
2024-03-21	DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing	Yueru Jia et.al.	2403.14487	link
2024-03-22	AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks	Max Ku et.al.	2403.14468	link
2024-03-20	Step-Calibrated Diffusion for Biomedical Optical Image Restoration	Yiwei Lyu et.al.	2403.13680	link
2024-03-20	Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing	Hangeol Chang et.al.	2403.13551	link
2024-03-20	A multilevel framework for accelerating uSARA in radio-interferometric imaging	Guillaume Lauga et.al.	2403.13385	null
2024-03-22	Mora: Enabling Generalist Video Generation via A Multi-Agent Framework	Zhengqing Yuan et.al.	2403.13248	link
2024-03-19	Multispectral Image Restoration by Generalized Opponent Transformation Total Variation	Zhantao Ma et.al.	2403.12770	null
2024-03-19	LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing	Yazeed Alharbi et.al.	2403.12585	null
2024-03-19	Generalized Consistency Trajectory Models for Image Manipulation	Beomsu Kim et.al.	2403.12510	link
2024-03-18	Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation	Axel Sauer et.al.	2403.12015	null
2024-03-18	DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing	Hyeonho Jeong et.al.	2403.12002	null
2024-03-18	LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model	Runhui Huang et.al.	2403.11929	null
2024-03-18	View-Consistent 3D Editing with Gaussian Splatting	Yuxuan Wang et.al.	2403.11868	null
2024-03-18	EffiVED: Efficient Video Editing via Text-instruction Diffusion Models	Zhenghao Zhang et.al.	2403.11568	link
2024-03-18	End-To-End Underwater Video Enhancement: Dataset and Model	Dazhao Du et.al.	2403.11506	link
2024-03-18	Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors	Ruicheng Wang et.al.	2403.11503	null
2024-03-18	CasSR: Activating Image Power for Real-World Image Super-Resolution	Haolan Chen et.al.	2403.11451	null
2024-03-18	VmambaIR: Visual State Space Model for Image Restoration	Yuan Shi et.al.	2403.11423	link
2024-03-18	DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation	Jeongsol Kim et.al.	2403.11415	link
2024-03-18	Divide-and-Conquer Posterior Sampling for Denoising Diffusion Priors	Yazid Janati et.al.	2403.11407	link
2024-03-17	Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model	Dian Zheng et.al.	2403.11157	link
2024-03-17	Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models	Ruibin Li et.al.	2403.11105	link
2024-03-16	A Spectrum-based Image Denoising Method with Edge Feature Enhancement	Peter Luvton et.al.	2403.11036	null
2024-03-15	How Powerful Potential of Attention on Image Restoration?	Cong Wang et.al.	2403.10336	null
2024-03-15	BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution	Feng Li et.al.	2403.10211	link
2024-03-15	E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance	Tianrui Huang et.al.	2403.10133	null
2024-03-15	PQDynamicISP: Dynamically Controlled Image Signal Processor for Any Image Sensors Pursuing Perceptual Quality	Masakazu Yoshimura et.al.	2403.10091	null
2024-03-15	ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images	Xiangtian Xue et.al.	2403.10004	null
2024-03-14	Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing	Wonjun Kang et.al.	2403.09468	link
2024-03-14	Video Editing via Factorized Diffusion Distillation	Uriel Singer et.al.	2403.09334	null
2024-03-14	D-YOLO a robust framework for object detection in adverse weather conditions	Zihan Chu et.al.	2403.09233	null
2024-03-13	7T MRI Synthesization from 3T Acquisitions	Qiming Cui et.al.	2403.08979	link
2024-03-13	FogGuard: guarding YOLO against fog using perceptual loss	Soheil Gharatappeh et.al.	2403.08939	link
2024-03-13	DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation	Minbin Huang et.al.	2403.08857	null
2024-03-13	VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis	Enric Corona et.al.	2403.08764	null
2024-03-13	iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer	Dinh-Khoi Vo et.al.	2403.08746	link
2024-03-13	Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models trained on Corrupted Data	Asad Aali et.al.	2403.08728	link
2024-03-13	Make Me Happier: Evoking Emotions Through Image Diffusion Models	Qing Lin et.al.	2403.08255	null
2024-03-12	Pix2Pix-OnTheFly: Leveraging LLMs for Instruction-Guided Image Editing	Rodrigo Santos et.al.	2403.08004	null
2024-03-12	Multiple Latent Space Mapping for Compressed Dark Image Enhancement	Yi Zeng et.al.	2403.07622	null
2024-03-12	Imagine a dragon made of seaweed: How images enhance learning in Wikipedia	Anita Silva et.al.	2403.07613	null
2024-03-12	NightHaze: Nighttime Image Dehazing via Self-Prior Learning	Beibei Lin et.al.	2403.07408	null
2024-03-12	Efficient Diffusion Model for Image Restoration by Residual Shifting	Zongsheng Yue et.al.	2403.07319	link
2024-03-12	Continual All-in-One Adverse Weather Removal with Knowledge Replay on a Unified Network Structure	De Cheng et.al.	2403.07292	link
2024-03-11	Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions	Lan Wang et.al.	2403.07198	null
2024-03-11	DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation	Guosheng Zhao et.al.	2403.06845	null
2024-03-11	Boosting Image Restoration via Priors from Pre-trained Models	Xiaogang Xu et.al.	2403.06793	null
2024-03-11	Comparison of No-Reference Image Quality Models via MAP Estimation in Diffusion Latents	Weixia Zhang et.al.	2403.06406	null
2024-03-10	FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing	Youyuan Zhang et.al.	2403.06269	null
2024-03-10	Textureless Object Recognition: An Edge-based Approach	Frincy Clement et.al.	2403.06107	null
2024-03-10	Universal Debiased Editing for Fair Medical Image Classification	Ruinan Jin et.al.	2403.06104	link
2024-03-10	Reframe Anything: LLM Agent for Open World Video Reframing	Jiawang Cao et.al.	2403.06070	null
2024-03-10	Implicit Image-to-Image Schrodinger Bridge for CT Super-Resolution and Denoising	Yuang Wang et.al.	2403.06069	link
2024-03-12	Decoupled Data Consistency with Diffusion Purification for Image Restoration	Xiang Li et.al.	2403.06054	link
2024-03-09	Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration	Jingyun Xue et.al.	2403.05906	null
2024-03-08	InstructGIE: Towards Generalizable Image Editing	Zichong Meng et.al.	2403.05018	null
2024-03-07	An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control	Aosong Feng et.al.	2403.04880	link
2024-03-07	FriendNet: Detection-Friendly Dehazing Network	Yihua Fan et.al.	2403.04443	link
2024-03-07	StableDrag: Stable Dragging for Point-based Image Editing	Yutao Cui et.al.	2403.04437	null
2024-03-07	Image enhancement algorithm for absorption imaging	Pengcheng Zheng et.al.	2403.04240	null
2024-03-06	Low-Dose CT Image Reconstruction by Fine-Tuning a UNet Pretrained for Gaussian Denoising for the Downstream Task of Image Enhancement	Tim Selig et.al.	2403.03551	null
2024-03-06	Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing	Bingyan Liu et.al.	2403.03431	null
2024-03-05	Doubly Abductive Counterfactual Inference for Text-based Image Editing	Xue Song et.al.	2403.02981	link
2024-03-05	Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement	Jinhong He et.al.	2403.02879	null
2024-03-05	Speckle Noise Reduction in Ultrasound Images using Denoising Auto-encoder with Skip Connection	Suraj Bhute et.al.	2403.02750	null
2024-03-04	A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement	Ruirui Lin et.al.	2403.02408	null
2024-03-03	Learning A Physical-aware Diffusion Model Based on Transformer for Underwater Image Enhancement	Chen Zhao et.al.	2403.01497	link
2024-03-02	Extrapolated Plug-and-Play Three-Operator Splitting Methods for Nonconvex Optimization with Applications to Image Restoration	Zhongming Wu et.al.	2403.01144	link
2024-03-02	Edge-guided Low-light Image Enhancement with Inertial Bregman Alternating Linearized Minimization	Chaoyan Huang et.al.	2403.01142	null
2024-03-01	LoMOE: Localized Multi-Object Editing via Multi-Diffusion	Goirik Chakrabarty et.al.	2403.00437	null
2024-03-01	ChartReformer: Natural Language-Driven Chart Image Editing	Pengyu Yan et.al.	2403.00209	link
2024-02-28	Misalignment-Robust Frequency Distribution Loss for Image Transformation	Zhangkai Ni et.al.	2402.18192	link
2024-02-28	A Lightweight Low-Light Image Enhancement Network via Channel Prior and Gamma Correction	Shyang-En Weng et.al.	2402.18147	null
2024-02-28	Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction	Koki Maeda et.al.	2402.17969	null
2024-02-26	Randomized Algorithms for Solving Singular Value Decomposition Problems with Matlab Toolbox	Xiaowen Li et.al.	2402.17794	null
2024-02-27	Diffusion Model-Based Image Editing: A Survey	Yi Huang et.al.	2402.17525	link
2024-02-27	Learning Exposure Correction in Dynamic Scenes	Jin Liu et.al.	2402.17296	link
2024-02-25	Diffusion Posterior Proximal Sampling for Image Restoration	Hongjie Wu et.al.	2402.16907	link
2024-02-26	Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing	Ling Yang et.al.	2402.16627	link
2024-02-25	ARIN: Adaptive Resampling and Instance Normalization for Robust Blind Inpainting of Dunhuang Cave Paintings	Alexander Schmidt et.al.	2402.16188	null
2024-02-25	An Image Enhancement Method for Improving Small Intestinal Villi Clarity	Shaojie Zhang et.al.	2402.15977	null
2024-02-24	Sandwich GAN: Image Reconstruction from Phase Mask based Anti-dazzle Imaging	Xiaopeng Peng et.al.	2402.15919	null
2024-02-24	HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models	Li Pang et.al.	2402.15865	link
2024-02-24	IRConStyle: Image Restoration Framework Using Contrastive Learning and Style Transfer	Dongqi Fan et.al.	2402.15784	link
2024-02-23	MambaIR: A Simple Baseline for Image Restoration with State-Space Model	Hang Guo et.al.	2402.15648	link
2024-02-26	LLMBind: A Unified Modality-Task Integration Framework	Bin Zhu et.al.	2402.14891	link
2024-02-22	Consolidating Attention Features for Multi-view Image Editing	Or Patashnik et.al.	2402.14792	null
2024-02-22	Place Anything into Any Video	Ziling Liu et.al.	2402.14316	null
2024-02-21	Adversarial Purification and Fine-tuning for Robust UDC Image Restoration	Zhenbo Song et.al.	2402.13629	null
2024-02-22	UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing	Jianhong Bai et.al.	2402.13185	null
2024-02-21	Robust-Wide: Robust Watermarking against Instruction-driven Image Editing	Runyi Hu et.al.	2402.12688	link
2024-02-19	Integrating kNN with Foundation Models for Adaptable and Privacy-Aware Image Classification	Sebastian Doerrich et.al.	2402.12500	link
2024-02-19	Human Video Translation via Query Warping	Haiming Zhu et.al.	2402.12099	null
2024-02-08	Text2Data: Low-Resource Data Generation with Textual Control	Shiyu Wang et.al.	2402.10941	null
2024-02-15	LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing	Bryan Wang et.al.	2402.10294	null
2024-02-15	Seed Optimization with Frozen Generator for Superior Zero-shot Low-light Enhancement	Yuxuan Gu et.al.	2402.09694	null
2024-02-14	DestripeCycleGAN: Stripe Simulation CycleGAN for Unsupervised Infrared Image Destriping	Shiqi Yang et.al.	2402.09101	null
2024-02-05	Point and Instruct: Enabling Precise Image Editing by Unifying Direct Manipulation and Text Instructions	Alec Helbling et.al.	2402.07925	null
2024-02-12	Tutorial: Shaping the Spatial Correlations of Entangled Photon Pairs	Patrick Cameron et.al.	2402.07667	null
2024-02-10	Gyroscope-Assisted Motion Deblurring Network	Simin Luan et.al.	2402.06854	link
2024-02-08	You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement	Yixu Feng et.al.	2402.05809	link
2024-02-08	Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application	Bumsoo Kim et.al.	2402.05448	null
2024-02-08	Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model	Junghun Cha et.al.	2402.05350	null
2024-02-07	Noise Map Guidance: Inversion with Spatial Context for Real Image Editing	Hansam Cho et.al.	2402.04625	link
2024-02-07	Troublemaker Learning for Low-Light Image Enhancement	Yinghao Song et.al.	2402.04584	link
2024-02-14	U-shaped Vision Mamba for Single Image Dehazing	Zhuoran Zheng et.al.	2402.04139	link
2024-02-08	Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction	Shijun Liang et.al.	2402.04097	null
2024-02-05	Rethinking RGB Color Representation for Image Restoration Models	Jaerin Lee et.al.	2402.03399	null
2024-02-05	Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing	Yan Shu et.al.	2402.03082	link
2024-02-05	InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions	Yiyuan Zhang et.al.	2402.03040	link
2024-02-05	Knowledge-driven deep learning for fast MR imaging: undersampled MR image reconstruction from supervised to un-supervised learning	Shanshan Wang et.al.	2402.02704	null
2024-02-04	Key-Graph Transformer for Image Restoration	Bin Ren et.al.	2402.02634	null
2024-02-04	DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing	Chong Mou et.al.	2402.02583	link
2024-02-04	Exploring Intrinsic Properties of Medical Images for Self-Supervised Binary Semantic Segmentation	Pranav Singh et.al.	2402.02367	null
2024-02-04	Video Editing for Video Retrieval	Bin Zhu et.al.	2402.02335	null
2024-02-03	S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation	Yurui Chen et.al.	2402.02112	null
2024-02-03	BVI-Lowlight: Fully Registered Benchmark Dataset for Low-Light Video Enhancement	Nantheera Anantrasirichai et.al.	2402.01970	link
2024-02-02	LIR: Efficient Degradation Removal for Lightweight Image Restoration	Dongqi Fan et.al.	2402.01368	link
2024-01-31	Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators	Daniel Geng et.al.	2401.18085	null
2024-01-31	Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models	Kyungsung Lee et.al.	2401.17629	null
2024-01-31	Task-Oriented Diffusion Model Compression	Geonung Kim et.al.	2401.17547	null
2024-01-30	Anything in Any Scene: Photorealistic Video Object Insertion	Chen Bai et.al.	2401.17509	null
2024-01-30	LATENTPATCH: A Non-Parametric Approach for Face Generation and Editing	Benjamin Samuth et.al.	2401.16830	null
2024-01-31	High-Quality Image Restoration Following Human Instructions	Marcos V. Conde et.al.	2401.16468	link
2024-01-30	Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation	Zhenyu Wang et.al.	2401.15688	null
2024-01-28	CPDM: Content-Preserving Diffusion Model for Underwater Image Enhancement	Xiaowen Shi et.al.	2401.15649	null
2024-01-28	UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image Restoration	Nachuan Ma et.al.	2401.15647	null
2024-01-26	CascadedGaze: Efficiency in Global Context Extraction for Image Restoration	Amirhosein Ghasemabadi et.al.	2401.15235	link
2024-01-30	LYT-Net: Lightweight YUV Transformer-based Network for Low-Light Image Enhancement	A. Brateanu et.al.	2401.15204	link
2024-01-25	Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks	Tianhe Ren et.al.	2401.14159	link
2024-01-30	CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion	Nisha Huang et.al.	2401.14066	link
2024-01-24	Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild	Fanghua Yu et.al.	2401.13627	null
2024-01-29	Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval	Dezhao Luo et.al.	2401.13329	null
2024-01-24	Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration	Yimin Xu et.al.	2401.13221	link
2024-01-23	CCA: Collaborative Competitive Agents for Image Editing	Tiankai Hang et.al.	2401.13011	link
2024-01-23	CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited Data	Chandrakanth Gudavalli et.al.	2401.13006	null
2024-01-23	Lumiere: A Space-Time Diffusion Model for Video Generation	Omer Bar-Tal et.al.	2401.12945	null
2024-01-21	Text-to-Image Cross-Modal Generation: A Systematic Review	Maciej Żelaszczyk et.al.	2401.11631	null
2024-01-21	LLMRA: Multi-modal Large Language Model based Restoration Assistant	Xiaoyu Jin et.al.	2401.11401	null
2024-01-19	MixNet: Towards Effective and Efficient UHD Low-Light Image Enhancement	Chen Wu et.al.	2401.10666	link
2024-01-18	M3BUNet: Mobile Mean Max UNet for Pancreas Segmentation on CT-Scans	Juwita juwita et.al.	2401.10419	null
2024-01-18	Edit One for All: Interactive Batch Image Editing	Thao Nguyen et.al.	2401.10219	null
2024-01-18	WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens	Xiaofeng Wang et.al.	2401.09985	null
2024-01-20	Boosting Few-Shot Semantic Segmentation Via Segment Anything Model	Chen-Bin Feng et.al.	2401.09826	null
2024-01-18	Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing	Gwanhyeong Koo et.al.	2401.09794	null
2024-01-16	Deep Linear Array Pushbroom Image Restoration: A Degradation Pipeline and Jitter-Aware Restoration Network	Zida Chen et.al.	2401.08171	link
2024-01-15	Low-light Stereo Image Enhancement and De-noising in the Low-frequency Information Enhanced Image Space	Minghua Zhao et.al.	2401.07753	link
2024-01-15	Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks	Siyu Zou et.al.	2401.07709	link
2024-01-13	Exploring Adversarial Attacks against Latent Diffusion Model from the Perspective of Adversarial Transferability	Junxi Chen et.al.	2401.07087	null
2024-01-12	LiDAR Depth Map Guided Image Compression Model	Alessandro Gnutti et.al.	2401.06517	null
2024-01-12	RotationDrag: Point-based Image Editing with Rotated Diffusion Features	Minxing Luo et.al.	2401.06442	link
2024-01-11	E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation	Yifan Gong et.al.	2401.06127	null
2024-01-11	Object-Centric Diffusion for Efficient Video Editing	Kumara Kahatapitiya et.al.	2401.05735	null
2024-01-10	Content-Aware Depth-Adaptive Image Restoration	Tom Richard Vargis et.al.	2401.05049	null
2024-01-10	Structure-focused Neurodegeneration Convolutional Neural Network for Modeling and Classification of Alzheimer’s Disease	Simisola Odimayo et.al.	2401.03922	link
2024-01-08	Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion	Minglong Xue et.al.	2401.03788	link
2024-01-07	SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing	Songyan Chen et.al.	2401.03433	link
2024-01-07	Towards Effective Multiple-in-One Image Restoration: A Sequential and Prompt Learning Strategy	Xiangtao Kong et.al.	2401.03379	link
2024-01-06	MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond	Yupei Lin et.al.	2401.03221	null
2024-01-05	Generating Non-Stationary Textures using Self-Rectification	Yang Zhou et.al.	2401.02847	link
2024-01-05	Analysis of a wavelet frame based two-scale model for enhanced edges	Bin Dong et.al.	2401.02688	null
2024-01-05	FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF	Hao Zhang et.al.	2401.02616	link
2024-01-04	VASE: Object-Centric Appearance and Shape Manipulation of Real Videos	Elia Peruzzo et.al.	2401.02473	null
2024-01-04	Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain	Xuanhua He et.al.	2401.02161	link
2024-01-04	Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance	Jiacheng Wang et.al.	2401.02126	link
2024-01-03	Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions	David Junhao Zhang et.al.	2401.01827	link
2024-01-03	AttentionLut: Attention Fusion-based Canonical Polyadic LUT for Real-time Image Enhancement	Kang Fu et.al.	2401.01569	null
2024-01-01	Bracketing is All You Need: Unifying Image Restoration and Enhancement Tasks with Multi-Exposure Images	Zhilu Zhang et.al.	2401.00766	link
2024-01-01	From Covert Hiding to Visual Editing: Robust Generative Video Steganography	Xueying Mao et.al.	2401.00652	null
2023-12-31	UGPNet: Universal Generative Prior for Image Restoration	Hwayoon Lee et.al.	2401.00370	null
2024-01-02	USFM: A Universal Ultrasound Foundation Model Generalized to Tasks and Organs towards Label Efficient Image Analysis	Jing Jiao et.al.	2401.00153	null
2023-12-30	CamPro: Camera-based Anti-Facial Recognition	Wenjun Zhu et.al.	2401.00151	link
2023-12-28	Improving Image Restoration through Removing Degradations in Textual Representations	Jingbo Lin et.al.	2312.17334	link
2023-12-28	Personalized Restoration via Dual-Pivot Tuning	Pradyumna Chari et.al.	2312.17234	null
2023-12-28	Restoration by Generation with Constrained Priors	Zheng Ding et.al.	2312.17161	null
2023-12-29	DarkShot: Lighting Dark Images with Low-Compute and High-Quality	Jiazhang Zheng et.al.	2312.16805	null
2023-12-28	ZONE: Zero-Shot Instruction-Guided Local Editing	Shanglin Li et.al.	2312.16794	link
2023-12-27	Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation	Rongyu Zhang et.al.	2312.16610	null
2023-12-27	Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance	Tomer Garber et.al.	2312.16519	link
2023-12-27	A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss	Xiao Fang et.al.	2312.16498	link
2023-12-30	A Survey on Super Resolution for video Enhancement Using GAN	Ankush Maity et.al.	2312.16471	null
2023-12-27	Learn From Orientation Prior for Radiograph Super-Resolution: Orientation Operator Transformer	Yongsong Huang et.al.	2312.16455	null
2023-12-26	Geometric-Aware Low-Light Image and Video Enhancement via Depth Guidance	Yingqi Lin et.al.	2312.15855	null
2023-12-25	High-Fidelity Diffusion-based Image Editing	Chen Hou et.al.	2312.15707	null
2023-12-25	Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration	Jiahong Fu et.al.	2312.15701	link
2023-12-25	MuLA-GAN: Multi-Level Attention GAN for Enhanced Underwater Visibility	Ahsan Baidar Bakht et.al.	2312.15633	null
2023-12-24	Perception-Distortion Balanced Super-Resolution: A Multi-Objective Optimization Perspective	Lingchen Sun et.al.	2312.15408	link
2023-12-23	Revealing Shadows: Low-Light Image Enhancement Using Self-Calibrated Illumination	Farzaneh Koohestani et.al.	2312.15199	null
2023-12-22	UniHuman: A Unified Model for Editing Human Images in the Wild	Nannan Li et.al.	2312.14985	link
2023-12-22	Tuning-Free Inversion-Enhanced Control for Consistent Image Editing	Xiaoyue Duan et.al.	2312.14611	null
2023-12-22	StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors	Wanchao Su et.al.	2312.14389	null
2023-12-22	Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation	Chaowei Fang et.al.	2312.14387	null
2023-12-22	Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal	Yicheng Leng et.al.	2312.14383	null
2023-12-20	Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis	Bichen Wu et.al.	2312.13834	null
2023-12-22	AppAgent: Multimodal Agents as Smartphone Users	Chi Zhang et.al.	2312.13771	link
2023-12-21	HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks	Hai Zhang et.al.	2312.13537	link
2023-12-20	Texture Matching GAN for CT Image Enhancement	Madhuri Nagare et.al.	2312.13422	null
2023-12-20	ClassLIE: Structure- and Illumination-Adaptive Classification for Low-Light Image Enhancement	Zixiang Wei et.al.	2312.13265	null
2023-12-21	RadEdit: stress-testing biomedical vision models via diffusion image editing	Fernando Pérez-García et.al.	2312.12865	null
2023-12-20	ReCo-Diff: Explore Retinex-Based Condition Strategy in Diffusion Model for Low-Light Image Enhancement	Yuhui Wu et.al.	2312.12826	null
2023-12-21	RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing	Shutong Jin et.al.	2312.12635	null
2023-12-19	Fixed-point Inversion for Text-to-image diffusion models	Barak Meiri et.al.	2312.12540	link
2023-12-19	Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion	Fan Zhang et.al.	2312.12471	link
2023-12-19	MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers	Haoyu Ma et.al.	2312.12468	null
2023-12-18	Ultrasound Image Enhancement using CycleGAN and Perceptual Loss	Shreeram Athreya et.al.	2312.11748	link
2023-12-18	TIP: Text-Driven Image Processing with Semantic and Restoration Instructions	Chenyang Qi et.al.	2312.11595	null
2023-12-18	Warping the Residuals for Image Editing with StyleGAN	Ahmet Burak Yildirim et.al.	2312.11422	null
2023-12-18	MAG-Edit: Localized Image Editing in Complex Scenarios via $\underline{M}$ask-Based $\underline{A}$ttention-Adjusted $\underline{G}$uidance	Qi Mao et.al.	2312.11396	null
2023-12-18	CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update	Zhi Gao et.al.	2312.10908	null
2023-12-17	Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models	Nikita Starodubcev et.al.	2312.10835	link
2023-12-17	Latent Space Editing in Transformer-Based Flow Matching	Vincent Tao Hu et.al.	2312.10825	null
2023-12-17	Bengali License Plate Recognition: Unveiling Clarity with CNN and GFP-GAN	Noushin Afrin et.al.	2312.10701	link
2023-12-19	VidToMe: Video Token Merging for Zero-Shot Video Editing	Xirui Li et.al.	2312.10656	link
2023-12-16	Image Restoration Through Generalized Ornstein-Uhlenbeck Bridge	Conghan Yue et.al.	2312.10299	link
2023-12-15	Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation	Qin Guo et.al.	2312.10113	link
2023-12-15	Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement	Xiaofeng Zhang et.al.	2312.10109	link
2023-12-15	A Case Study of Image Enhancement Algorithms’ Effectiveness of Improving Neural Networks’ Performance on Adverse Images	Jonathan Sanderson et.al.	2312.09509	null
2023-12-15	System Integration of Xilinx DPU and HDMI for Real-Time inference in PYNQ Environment with Image Enhancement	Jonathan Sanderson et.al.	2312.09506	null
2023-12-15	Image Deblurring using GAN	Zhengdong Li et.al.	2312.09496	null
2023-12-14	LIME: Localized Image Editing via Attention Regularization in Diffusion Models	Enis Simsar et.al.	2312.09256	null
2023-12-14	SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds	Minghao Chen et.al.	2312.09246	null
2023-12-14	Guided Image Restoration via Simultaneous Feature and Image Guided Fusion	Xinyi Liu et.al.	2312.08853	null
2023-12-14	VQCNIR: Clearer Night Image Restoration with Vector-Quantized Codebook	Wenbin Zou et.al.	2312.08606	link
2023-12-13	A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing	Gwilherm Lesné et.al.	2312.08256	link
2023-12-13	EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset	Peiqi Duan et.al.	2312.08220	null
2023-12-13	Clockwork Diffusion: Efficient Generation With Model-Step Distillation	Amirhossein Habibian et.al.	2312.08128	link
2023-12-13	AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing	Zhiyuan Ma et.al.	2312.08019	link
2023-12-13	CoIE: Chain-of-Instruct Editing for Multi-Attribute Face Manipulation	Zhenduo Zhang et.al.	2312.07879	null
2023-12-13	Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements	Gaurav Shrivastava et.al.	2312.07835	null
2023-12-12	Uncertainty Visualization via Low-Dimensional Posterior Projections	Omer Yair et.al.	2312.07804	link
2023-12-12	Hyper-Restormer: A General Hyperspectral Image Restoration Transformer for Remote Sensing Imaging	Yo-Yu Lai et.al.	2312.07016	null
2023-12-12	DGNet: Dynamic Gradient-guided Network with Noise Suppression for Underwater Image Enhancement	Jingchun Zhou et.al.	2312.06999	null
2023-12-12	IA2U: A Transfer Plugin with Multi-Prior for In-Air Model to Underwater	Jingchun Zhou et.al.	2312.06955	null
2023-12-12	WaterHE-NeRF: Water-ray Tracing Neural Radiance Fields for Underwater Scene Reconstruction	Jingchun Zhou et.al.	2312.06946	null
2023-12-11	SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models	Yuzhou Huang et.al.	2312.06739	link
2023-12-11	Learning to See Low-Light Images via Feature Domain Adaptation	Qirui Yang et.al.	2312.06723	null
2023-12-10	Neutral Editing Framework for Diffusion-based Video Editing	Sunjae Yoon et.al.	2312.06708	null
2023-12-11	UIEDP: Underwater Image Enhancement with Diffusion Prior	Dazhao Du et.al.	2312.06240	link
2023-12-11	DisControlFace: Disentangled Control for Personalized Facial Image Editing	Haozhe Jia et.al.	2312.06193	null
2023-12-11	Textual Prompt Guided Image Restoration	Qiuhai Yan et.al.	2312.06162	link
2023-12-10	A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing	Maomao Li et.al.	2312.05856	link
2023-12-09	BARET: Balanced Attention based Real image Editing driven by Target-text Inversion	Yuming Qiao et.al.	2312.05482	null
2023-12-08	NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models	Yusuf Dalva et.al.	2312.05390	null
2023-12-08	Learning 3D Particle-based Simulators from RGB-D Videos	William F. Whitney et.al.	2312.05359	null
2023-12-08	Fine Dense Alignment of Image Bursts through Camera Pose and Depth Estimation	Bruno Lecouat et.al.	2312.05190	null
2023-12-08	Prompt-In-Prompt Learning for Universal Image Restoration	Zilong Li et.al.	2312.05038	link
2023-12-08	Decoupling Degradation and Content Processing for Adverse Weather Image Restoration	Xi Wang et.al.	2312.05006	null
2023-12-07	Inversion-Free Image Editing with Natural Language	Sihan Xu et.al.	2312.04965	link
2023-12-07	GenDeF: Learning Generative Deformation Field for Video Generation	Wen Wang et.al.	2312.04561	null
2023-12-07	RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models	Ozgur Kara et.al.	2312.04524	link
2023-12-07	Ricci-Notation Tensor Framework for Model-Based Approaches to Imaging	Dileepan Joseph et.al.	2312.04018	link
2023-12-06	A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement	Risab Biswas et.al.	2312.03946	link
2023-12-06	FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability	Linze Li et.al.	2312.03775	null
2023-12-05	DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing	Shao-Yu Chang et.al.	2312.03772	null
2023-12-07	Intrinsic Harmonization for Illumination-Aware Compositing	Chris Careaga et.al.	2312.03698	link
2023-12-06	Training Neural Networks on RAW and HDR Images for Restoration Tasks	Lei Luo et.al.	2312.03640	link
2023-12-06	Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention	Jianjin Xu et.al.	2312.03556	null
2023-12-05	MagicStick: Controllable Video Editing via Control Handle Transformations	Yue Ma et.al.	2312.03047	link
2023-12-05	Drag-A-Video: Non-rigid Video Editing with Point-based Interaction	Yao Teng et.al.	2312.02936	null
2023-12-05	Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration	Yuang Ai et.al.	2312.02918	null
2023-12-05	BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models	Fengyuan Shi et.al.	2312.02813	link
2023-12-05	Deep-learning-driven end-to-end metalens imaging	Joonhyuk Seo et.al.	2312.02669	link
2023-12-05	GeNIe: Generative Hard Negative Images Through Diffusion	Soroush Abbasi Koohpayegani et.al.	2312.02548	link
2023-12-05	SAVE: Protagonist Diversification with Structure Agnostic Video Editing	Yeji Song et.al.	2312.02503	null
2023-12-04	Peer attention enhances student learning	Songlin Xu et.al.	2312.02358	link
2023-12-05	VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence	Yuchao Gu et.al.	2312.02087	null
2023-12-04	SRTransGAN: Image Super-Resolution using Transformer based Generative Adversarial Network	Neeraj Baghel et.al.	2312.01999	null
2023-12-05	Multi-task Image Restoration Guided By Robust DINO Features	Xin Lin et.al.	2312.01677	null
2023-12-04	Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training	Runze He et.al.	2312.01663	null
2023-12-03	T3D: Towards 3D Medical Image Understanding through Vision-Language Pre-training	Che Liu et.al.	2312.01529	null
2023-12-03	Enhancing and Adapting in the Clinic: Source-free Unsupervised Domain Adaptation for Medical Image Enhancement	Heng Li et.al.	2312.01338	link
2023-12-03	An Augmented Lagrangian Primal-Dual Semismooth Newton Method for Multi-Block Composite Optimization	Zhanwang Deng et.al.	2312.01273	null
2023-12-02	Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation	Zhipeng Du et.al.	2312.01220	link
2023-12-02	Taming Latent Diffusion Models to See in the Dark	Qiang Wen et.al.	2312.01027	null
2023-12-01	Zero-Shot Video Question Answering with Procedural Programs	Rohan Choudhury et.al.	2312.00937	null
2023-12-01	Adversarial Score Distillation: When score distillation meets GAN	Min Wei et.al.	2312.00739	link
2023-11-30	Advancements and Trends in Ultra-High-Resolution Image Processing: An Overview	Zhuoran Zheng et.al.	2312.00250	null
2023-11-30	VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models	Zhen Xing et.al.	2311.18837	null
2023-11-30	MotionEditor: Editing Video Motion via Content-Aware Diffusion	Shuyuan Tu et.al.	2311.18830	link
2023-11-30	Motion-Conditioned Image Animation for Video Editing	Wilson Yan et.al.	2311.18827	null
2023-11-30	Is Underwater Image Enhancement All Object Detectors Need?	Yudong Wang et.al.	2311.18814	link
2023-11-30	Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing	Hyelin Nam et.al.	2311.18608	null
2023-11-30	ZeST-NeRF: Using temporal aggregation for Zero-Shot Temporal NeRFs	Violeta Menéndez González et.al.	2311.18491	null
2023-11-30	Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis	Zipeng Qi et.al.	2311.18435	null
2023-11-30	On Exact Inversion of DPM-Solvers	Seongmin Hong et.al.	2311.18387	null
2023-11-30	A Novel Variational Approach for Multiphoton Microscopy Image Restoration: from PSF Estimation to 3D Deconvolution	Julien Ajdenbaum et.al.	2311.18386	null
2023-11-30	HiPA: Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation	Yifan Zhang et.al.	2311.18158	null
2023-11-29	Variational Bayes image restoration with compressive autoencoders	Maud Biquard et.al.	2311.17744	null
2023-11-29	Improving Stability during Upsampling – on the Importance of Spatial Context	Shashank Agnihotri et.al.	2311.17524	null
2023-11-29	VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model	Haoyu Zhao et.al.	2311.17338	link
2023-11-28	Optimisation-Based Multi-Modal Semantic Image Editing	Bowen Li et.al.	2311.16882	null
2023-11-28	Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration	Chen Zhao et.al.	2311.16845	link
2023-11-28	Decomposer: Semi-supervised Learning of Image Restoration and Image Decomposition	Boris Meinardus et.al.	2311.16829	null
2023-11-28	LEDITS++: Limitless Image Editing using Text-to-Image Models	Manuel Brack et.al.	2311.16711	null
2023-11-28	Full-resolution MLPs Empower Medical Dense Prediction	Mingyuan Meng et.al.	2311.16707	link
2023-11-28	MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation	Sitong Su et.al.	2311.16635	null
2023-11-27	LLMGA: Multimodal Large Language Model based Generation Assistant	Bin Xia et.al.	2311.16500	link
2023-11-28	Text-Driven Image Editing via Learnable Regions	Yuanze Lin et.al.	2311.16432	link
2023-11-27	Joint Deep Image Restoration and Unsupervised Quality Assessment	Hakan Emre Gedik et.al.	2311.16372	null
2023-11-27	Self-correcting LLM-controlled Diffusion Models	Tsung-Han Wu et.al.	2311.16090	link
2023-11-27	Real Time GAZED: Online Shot Selection and Editing of Virtual Cameras from Wide-Angle Monocular Video Recordings	Sudheer Achary et.al.	2311.15581	null
2023-11-26	FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration	Zihao Zou et.al.	2311.15445	null
2023-11-26	Sketch Video Synthesis	Yudian Zheng et.al.	2311.15306	link
2023-11-24	Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion	Minshan Xie et.al.	2311.14343	null
2023-11-23	A New Benchmark and Model for Challenging Image Manipulation Detection	Zhenfei Zhang et.al.	2311.14218	link
2023-11-23	Posterior Distillation Sampling	Juil Koo et.al.	2311.13831	null
2023-11-22	Retargeting Visual Data with Deformation Fields	Tim Elsner et.al.	2311.13297	null
2023-11-20	PanBench: Towards High-Resolution and High-Performance Pansharpening	Shiying Wang et.al.	2311.12083	null
2023-11-19	EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models	Ruoxi Chen et.al.	2311.12066	null
2023-11-20	Cut-and-Paste: Subject-Driven Video Editing with Attention Control	Zhichao Zuo et.al.	2311.11697	null
2023-11-20	Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement	Yanyan Wei et.al.	2311.11695	null
2023-11-20	Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model	Chunming He et.al.	2311.11638	link
2023-11-20	Deep Equilibrium Diffusion Restoration with Parallel Sampling	Jiezhang Cao et.al.	2311.11600	link
2023-11-19	On the Noise Scheduling for Generating Plausible Designs with Diffusion Models	Jiajie Fan et.al.	2311.11207	null
2023-11-17	Astronomical Images Quality Assessment with Automated Machine Learning	Olivier Parisot et.al.	2311.10617	null
2023-11-16	K-space Cold Diffusion: Learning to Reconstruct Accelerated MRI without Noise	Guoyao Shen et.al.	2311.10162	link
2023-11-16	Emu Edit: Precise Image Editing via Recognition and Generation Tasks	Shelly Sheynin et.al.	2311.10089	null
2023-11-15	FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier	Zhongjie Duan et.al.	2311.09265	link
2023-11-14	The Perception-Robustness Tradeoff in Deterministic Image Restoration	Guy Ohayon et.al.	2311.09253	null
2023-11-15	Progressive Feedback-Enhanced Transformer for Image Forgery Localization	Haochen Zhu et.al.	2311.08910	link
2023-11-14	Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation	Zhihang Zhong et.al.	2311.08007	link
2023-11-09	Dynamic Association Learning of Self-Attention and Convolution in Image Restoration	Kui Jiang et.al.	2311.05147	null
2023-11-08	LuminanceL1Loss: A loss function which measures percieved brightness and colour differences	Dominic De Jonge et.al.	2311.04614	null
2023-11-11	Learning the What and How of Annotation in Video Object Segmentation	Thanos Delatolas et.al.	2311.04414	null
2023-11-07	Energy-based Calibrated VAE with Test Time Free Lunch	Yihong Luo et.al.	2311.04071	link
2023-11-07	CLIP Guided Image-perceptive Prompt Learning for Image Enhancement	Zinuo Li et.al.	2311.03943	null
2023-11-07	Constrained Regularization by Denoising with Automatic Parameter Selection	Pasquale Cascarano et.al.	2311.03819	null
2023-11-06	Pelvic floor MRI segmentation based on semi-supervised deep learning	Jianwei Zuo et.al.	2311.03105	null
2023-11-06	A New Extrapolation Economy Cascadic Multigrid Method for Image Restoration Problems	Zhaoteng Chu et.al.	2311.03010	null
2023-11-06	Zero-Shot Enhancement of Low-Light Image Based on Retinex Decomposition	Wenchao Li et.al.	2311.02995	link
2023-11-08	Deep Image Semantic Communication Model for Artificial Intelligent Internet of Things	Li Ping Qian et.al.	2311.02926	link
2023-11-03	Cascadic Tensor Multigrid Method and Economic Cascadic Tensor Multigrid Method for Image Restoration Problems	Ziqi Yan et.al.	2311.01924	null
2023-11-02	The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing	Shen Nie et.al.	2311.01410	null
2023-11-02	Convergent plug-and-play with proximal denoiser and unconstrained regularization parameter	Samuel Hurault et.al.	2311.01216	null
2023-11-03	On Manipulating Scene Text in the Wild with Diffusion Models	Joshua Santoso et.al.	2311.00734	link
2023-11-01	fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding	Xuelin Qian et.al.	2311.00342	null
2023-11-01	RAUNE-Net: A Residual and Attention-Driven Underwater Image Enhancement Method	Wangzhen Peng et.al.	2311.00246	link
2023-11-01	Consistent Video-to-Video Transfer Using Synthetic Dataset	Jiaxin Cheng et.al.	2311.00213	link
2023-10-31	Image Restoration with Point Spread Function Regularization and Active Learning	Peng Jia et.al.	2311.00186	null
2023-10-31	Navigating the Complex Landscape of Shock Filter Cahn-Hilliard Equation: From Regularized to Young Measure Solutions	Darko Mitrovic et.al.	2310.20383	null
2023-10-31	Low-Dose CT Image Enhancement Using Deep Learning	A. Demir et.al.	2310.20265	null
2023-10-31	UWFormer: Underwater Image Enhancement via a Semi-Supervised Multi-Scale Transformer	Xuhang Chen et.al.	2310.20210	link
2023-10-30	IterInv: Iterative Inversion for Pixel-Level T2I Models	Chuanming Tang et.al.	2310.19540	link
2023-10-29	Learning to Follow Object-Centric Image Editing Instructions Faithfully	Tuhin Chakrabarty et.al.	2310.19145	link
2023-10-28	PrObeD: Proactive Object Detection Wrapper	Vishal Asnani et.al.	2310.18788	null
2023-10-27	Always Clear Days: Degradation Type and Severity Aware All-In-One Adverse Weather Removal	Yu-Wei Chen et.al.	2310.18293	link
2023-10-27	DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF	Chaowei Liu et.al.	2310.17910	null
2023-10-27	Global Structure-Aware Diffusion Process for Low-Light Image Enhancement	Jinhui Hou et.al.	2310.17577	link
2023-10-26	AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors	You-Ming Chang et.al.	2310.17419	link
2023-10-25	Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models	Tianyi Lu et.al.	2310.16400	link
2023-10-24	From Posterior Sampling to Meaningful Diversity in Image Restoration	Noa Cohen et.al.	2310.16047	null
2023-10-24	CVPR 2023 Text Guided Video Editing Competition	Jay Zhangjie Wu et.al.	2310.16003	link
2023-10-26	Integrating View Conditions for Image Synthesis	Jinbin Bai et.al.	2310.16002	link
2023-10-19	Neural Degradation Representation Learning for All-In-One Image Restoration	Mingde Yao et.al.	2310.12848	link
2023-10-18	Object-aware Inversion and Reassembly for Image Editing	Zhen Yang et.al.	2310.12149	link
2023-10-18	A Comparative Study of Image Restoration Networks for General Backbone Network Design	Xiangyu Chen et.al.	2310.11881	link
2023-10-16	LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation	Ruiqi Wu et.al.	2310.10769	link
2023-10-21	BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys	Yu Gu et.al.	2310.10765	null
2023-10-16	A Survey on Video Diffusion Models	Zhen Xing et.al.	2310.10647	link
2023-10-16	Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models	Kevin Black et.al.	2310.10639	link
2023-10-16	DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing	Jia-Wei Liu et.al.	2310.10624	null
2023-10-16	Unifying Image Processing as Visual Prompting Question Answering	Yihao Liu et.al.	2310.10513	null
2023-10-17	AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion	Yitong Jiang et.al.	2310.10123	null
2023-10-15	ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context	Binglun Wang et.al.	2310.09965	null
2023-10-15	LOVECon: Text-driven Training-Free Long Video Editing with ControlNet	Zhenyi Liao et.al.	2310.09711	link
2023-10-14	Dimma: Semi-supervised Low Light Image Enhancement with Adaptive Dimming	Wojciech Kozłowski et.al.	2310.09633	link
2023-10-13	Image Cropping under Design Constraints	Takumi Nishiyasu et.al.	2310.08892	null
2023-10-12	DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing	Yueming Lyu et.al.	2310.08785	link
2023-10-12	Frequency-Aware Re-Parameterization for Over-Fitting Based Image Compression	Yun Ye et.al.	2310.08068	null
2023-10-11	Uncovering Hidden Connections: Iterative Tracking and Reasoning for Video-grounded Dialog	Haoyu Zhang et.al.	2310.07259	link
2023-10-10	Tweedie Moment Projected Diffusions For Inverse Problems	Benjamin Boys et.al.	2310.06721	null
2023-10-10	Improving Compositional Text-to-image Generation with Large Vision-Language Models	Song Wen et.al.	2310.06311	null
2023-10-10	Three-Dimensional Medical Image Fusion with Deformable Cross-Attention	Lin Liu et.al.	2310.06291	null
2023-10-09	FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing	Yuren Cong et.al.	2310.05922	null
2023-10-09	Dipole-Spread Function Engineering for 6D Super-Resolution Microscopy	Tingting Wu et.al.	2310.05810	null
2023-10-08	ITRE: Low-light Image Enhancement Based on Illumination Transmission Ratio Estimation	Yu Wang et.al.	2310.05158	null
2023-10-07	Combining UPerNet and ConvNeXt for Contrails Identification to reduce Global Warming	Zhenkuan Wang et.al.	2310.04808	link
2023-10-06	Towards A Robust Group-level Emotion Recognition via Uncertainty-Aware Learning	Qing Zhu et.al.	2310.04306	null
2023-10-06	Degradation-Aware Self-Attention Based Transformer for Blind Image Super-Resolution	Qingguo Liu et.al.	2310.04180	link
2023-10-04	ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF	Jangho Park et.al.	2310.02712	null
2023-10-04	Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis	Han Zhang et.al.	2310.02641	null
2023-10-03	EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods	Samyadeep Basu et.al.	2310.02426	null
2023-10-03	Leveraging Classic Deconvolution and Feature Extraction in Zero-Shot Image Restoration	Tomáš Chobola et.al.	2310.02097	link
2023-10-02	ImagenHub: Standardizing the evaluation of conditional image generation models	Max Ku et.al.	2310.01596	link
2023-10-02	Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code	Xuan Ju et.al.	2310.01506	link
2023-10-02	Conditional Diffusion Distillation	Kangfu Mei et.al.	2310.01407	link
2023-10-02	Sequential Data Generation with Groupwise Diffusion Process	Sangyun Lee et.al.	2310.01400	null
2023-10-02	A Restoration Network as an Implicit Prior	Yuyang Hu et.al.	2310.01391	null
2023-10-02	Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models	Hyeonho Jeong et.al.	2310.01107	link
2023-10-02	Controlling Vision-Language Models for Universal Image Restoration	Ziwei Luo et.al.	2310.01018	link
2023-10-02	JPEG Information Regularized Deep Image Prior for Denoising	Tsukasa Takagi et.al.	2310.00894	null
2023-10-01	Exchange means change: an unsupervised single-temporal change detection framework based on intra- and inter-image patch exchange	Hongruixuan Chen et.al.	2310.00689	link
2023-09-29	Guiding Instruction-based Image Editing via Multimodal Large Language Models	Tsu-Jui Fu et.al.	2309.17102	link
2023-09-29	Denoising Diffusion Bridge Models	Linqi Zhou et.al.	2309.16948	link
2023-09-28	KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing	Jiancheng Huang et.al.	2309.16608	null
2023-09-28	CCEdit: Creative and Controllable Video Editing via Diffusion Models	Ruoyu Feng et.al.	2309.16496	null
2023-09-28	Joint Correcting and Refinement for Balanced Low-Light Image Enhancement	Nana Yu et.al.	2309.16128	link
2023-09-27	Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness	Valentin Barriere et.al.	2309.15991	null
2023-09-27	Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing	Kai Wang et.al.	2309.15664	link
2023-09-27	Guided Frequency Loss for Image Restoration	Bilel Benjdiraa et.al.	2309.15563	null
2023-09-27	Uncertainty Quantification via Neural Posterior Principal Components	Elias Nehme et.al.	2309.15533	null
2023-09-27	VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning	Yanan Wang et.al.	2309.15494	null
2023-09-27	Survey on Deep Face Restoration: From Non-blind to Blind and Beyond	Wenjie Li et.al.	2309.15490	link
2023-09-26	FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing	Songyan Chen et.al.	2309.14934	null
2023-09-26	Image Denoising via Style Disentanglement	Jingwei Niu et.al.	2309.14755	null
2023-09-26	Bootstrap Diffusion Model Curve Estimation for High Resolution Low-Light Image Enhancement	Jiancheng Huang et.al.	2309.14709	null
2023-09-25	Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments	Najmeh Mohammadbagheri et.al.	2309.14267	null
2023-09-25	Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time	Cheng-Hung Chan et.al.	2309.14022	null
2023-09-25	Diverse Semantic Image Editing with Style Codes	Hakan Sivuk et.al.	2309.13975	link
2023-09-25	In-Domain GAN Inversion for Faithful Reconstruction and Editability	Jiapeng Zhu et.al.	2309.13956	null
2023-09-25	Adversarial Attacks on Video Object Segmentation with Hard Region Discovery	Ping Li et.al.	2309.13857	null
2023-09-21	License Plate Super-Resolution Using Diffusion Models	Sawsan AlHalawani et.al.	2309.12506	null
2023-09-21	Multimodal Transformers for Wireless Communications: A Case Study in Beam Prediction	Yu Tian et.al.	2309.11811	link
2023-09-21	PIE: Simulating Disease Progression via Progressive Image Editing	Kaizhao Liang et.al.	2309.11745	link
2023-09-21	Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal	Xiao Feng Zhang et.al.	2309.11715	null
2023-09-19	Local Lipschitz continuity for energy integrals with slow growth and lower order terms	Michela Eleuteri et.al.	2309.10727	null
2023-09-19	Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image Denoising	Yujin Wang et.al.	2309.10714	null
2023-09-19	Forgedit: Text Guided Image Editing via Learning and Forgetting	Shiwen Zhang et.al.	2309.10556	link
2023-09-16	AOSR-Net: All-in-One Sandstorm Removal Network	Yazhong Si et.al.	2309.08838	null
2023-09-16	Dual-Camera Joint Deblurring-Denoising	Shayan Shekarforoush et.al.	2309.08826	null
2023-09-15	Double Domain Guided Real-Time Low-Light Image Enhancement for Ultra-High-Definition Transportation Surveillance	Jingxiang Qu et.al.	2309.08382	link
2023-09-14	A Multi-scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing	Yujie Feng et.al.	2309.07524	null
2023-09-13	FAIR: Frequency-aware Image Restoration for Industrial Visual Anomaly Detection	Tongkun Liu et.al.	2309.07068	link
2023-09-13	DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision	Xiangchen Yin et.al.	2309.06941	null
2023-09-13	Improving Deep Learning-based Defect Detection on Window Frames with Image Processing Strategies	Jorge Vasquez et.al.	2309.06731	null
2023-09-12	Can we predict the Most Replayed data of video streaming platforms?	Alessandro Duico et.al.	2309.06102	link
2023-09-12	Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration	Gang Wu et.al.	2309.06023	link
2023-09-12	Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos	Sarthak Bhagat et.al.	2309.05943	null
2023-09-11	PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud	Chengyu Wang et.al.	2309.05534	null
2023-09-11	HAT: Hybrid Attention Transformer for Image Restoration	Xiangyu Chen et.al.	2309.05239	link
2023-09-10	Effective Real Image Editing with Accelerated Iterative Diffusion Inversion	Zhihong Pan et.al.	2309.04907	null
2023-09-09	UnitModule: A Lightweight Joint Image Enhancement Module for Underwater Object Detection	Zhuoyan Liu et.al.	2309.04708	null
2023-09-08	MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers	Sijia Li et.al.	2309.04372	null
2023-09-08	Toward Sufficient Spatial-Frequency Interaction for Gradient-aware Underwater Image Enhancement	Chen Zhao et.al.	2309.04089	link
2023-09-07	Feature Enhancer Segmentation Network (FES-Net) for Vessel Segmentation	Tariq M. Khan et.al.	2309.03535	null
2023-09-07	Underwater Image Enhancement by Transformer-based Diffusion Model with Non-uniform Sampling for Skip Strategy	Yi Tang et.al.	2309.03445	link
2023-09-06	SLiMe: Segment Like Me	Aliasghar Khani et.al.	2309.03179	link
2023-09-06	Prompt-based All-in-One Image Restoration using CNNs and Transformer	Hu Gao et.al.	2309.03063	link
2023-09-05	Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning	Lili Yu et.al.	2309.02591	null
2023-09-05	SAM-Deblur: Let Segment Anything Boost Image Deblurring	Siwei Li et.al.	2309.02270	link
2023-09-05	Advanced Underwater Image Restoration in Complex Illumination Conditions	Yifan Song et.al.	2309.02217	null
2023-09-05	Empowering Low-Light Image Enhancer through Customized Learnable Priors	Naishan Zheng et.al.	2309.01958	link
2023-09-07	Implicit Neural Image Stitching With Enhanced and Blended Feature Reconstruction	Minsu Kim et.al.	2309.01409	link
2023-09-04	Memory augment is All You Need for image restoration	Xiao Feng Zhang et.al.	2309.01377	link
2023-09-04	Restoration Guarantee of Image Inpainting via Low Rank Patch Matrix Completion	Jian-Feng Cai et.al.	2309.01328	null
2023-09-03	Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction	Xiaoke Shang et.al.	2309.01183	null
2023-09-03	Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception	Zengxi Zhang et.al.	2309.01102	null
2023-09-02	MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation	Hanshu Yan et.al.	2309.00908	null
2023-09-02	A Generic Fundus Image Enhancement Network Boosted by Frequency Self-supervised Representation Learning	Heng Li et.al.	2309.00885	link
2023-09-01	Iterative Multi-granular Image Editing using Diffusion Models	K J Joseph et.al.	2309.00613	null
2023-08-31	Robust GAN inversion	Egor Sevriugov et.al.	2308.16510	null
2023-08-30	Feature Attention Network (FA-Net): A Deep-Learning Based Approach for Underwater Single Image Enhancement	Muhammad Hamza et.al.	2308.15868	null
2023-08-30	Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models	Zhanbo Feng et.al.	2308.15854	link
2023-08-31	Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement	Basit Alawode et.al.	2308.15816	link
2023-08-29	IndGIC: Supervised Action Recognition under Low Illumination	Jingbo Zeng et.al.	2308.15345	null
2023-08-29	DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior	Xinqi Lin et.al.	2308.15070	link
2023-08-28	Copy-Paste Image Augmentation with Poisson Image Editing for Ultrasound Instance Segmentation Learning	Wei-Hsiang Shen et.al.	2308.14772	null
2023-08-28	MagicEdit: High-Fidelity and Temporally Coherent Video Editing	Jun Hao Liew et.al.	2308.14749	null
2023-08-28	1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation	Tao Zhang et.al.	2308.14392	link
2023-08-28	MetaWeather: Few-Shot Weather-Degraded Image Restoration via Degradation Pattern Matching	Youngrae Kim et.al.	2308.14334	link
2023-08-27	Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection	Xin Feng et.al.	2308.14061	link
2023-08-26	Generalized Lightness Adaptation with Channel Selective Normalization	Mingde Yao et.al.	2308.13783	link
2023-08-25	Residual Denoising Diffusion Models	Jiawei Liu et.al.	2308.13712	link
2023-08-25	Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions	Yibo Wang et.al.	2308.13178	null
2023-08-25	Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model	Xunpeng Yi et.al.	2308.13164	null
2023-08-26	CDAN: Convolutional Dense Attention-guided Network for Low-light Image Enhancement	Hossein Shakibania et.al.	2308.12902	link
2023-08-24	MOFA: A Model Simplification Roadmap for Image Restoration on Mobile Devices	Xiangyu Chen et.al.	2308.12494	link
2023-08-23	Synergistic Multiscale Detail Refinement via Intrinsic Supervision for Underwater Image Enhancement	Dehuan Zhang et.al.	2308.11932	link
2023-08-21	EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints	Yutao Chen et.al.	2308.10648	null
2023-08-24	Patternshop: Editing Point Patterns by Image Manipulation	Xingchang Huang et.al.	2308.10517	null
2023-08-20	Blind Face Restoration for Under-Display Camera via Dictionary Guided Transformer	Jingfan Tan et.al.	2308.10196	null
2023-08-22	WMFormer++: Nested Transformer for Visible Watermark Removal via Implict Joint Learning	Dongjian Huo et.al.	2308.10195	null
2023-08-19	ASPIRE: Language-Guided Augmentation for Robust Image Classification	Sreyan Ghosh et.al.	2308.10103	link
2023-08-19	Semantic-Human: Neural Rendering of Humans from Monocular Video with Human Parsing	Jie Zhang et.al.	2308.09894	null
2023-08-18	Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis	Jonathon Luiten et.al.	2308.09713	link
2023-08-18	SimDA: Simple Diffusion Adapter for Efficient Video Generation	Zhen Xing et.al.	2308.09710	null
2023-08-18	StableVideo: Text-driven Consistency-aware Diffusion Video Editing	Wenhao Chai et.al.	2308.09592	link
2023-08-18	Diffusion Models for Image Restoration and Enhancement – A Comprehensive Survey	Xin Li et.al.	2308.09388	link
2023-08-18	DiffLLE: Diffusion-guided Domain Calibration for Unsupervised Low-light Image Enhancement	Shuzhou Yang et.al.	2308.09279	null
2023-08-17	Edit Temporal-Consistent Videos with Image Diffusion Model	Yuanzhi Wang et.al.	2308.09091	link
2023-08-17	Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration	Liyan Wang et.al.	2308.08730	link
2023-08-16	Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network	Yinglong Wang et.al.	2308.08220	null
2023-08-21	Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement	Jianyu Wen et.al.	2308.08197	link
2023-08-15	Geometry of the Visual Cortex with Applications to Image Inpainting and Enhancement	Francesco Ballerin et.al.	2308.07652	link
2023-08-14	Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation	Alexander Martin et.al.	2308.07316	link
2023-08-13	FastLLVE: Real-Time Low-Light Video Enhancement with Intensity-Aware Lookup Table	Wenhao Li et.al.	2308.06749	link
2023-08-12	Tiny and Efficient Model for the Edge Detection Generalization	Xavier Soria et.al.	2308.06468	link
2023-08-11	DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models	Weijia Wu et.al.	2308.06160	link
2023-08-10	Is there progress in activity progress prediction?	Frans de Boer et.al.	2308.05533	link
2023-08-10	A Generalized Physical-knowledge-guided Dynamic Model for Underwater Image Enhancement	Pan Mu et.al.	2308.05447	link
2023-08-10	TrainFors: A Large Benchmark Training Dataset for Image Manipulation Detection and Localization	Soumyaroop Nandi et.al.	2308.05264	null
2023-08-09	Transmission and Color-guided Network for Underwater Image Enhancement	Pan Mu et.al.	2308.04892	null
2023-08-09	A Forensic Methodology for Detecting Image Manipulations	Jiwon Lee et.al.	2308.04723	link
2023-08-08	Under-Display Camera Image Restoration with Scattering Effect	Binbin Song et.al.	2308.04163	link
2023-08-06	Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction	Xiaohong Fan et.al.	2308.03807	link
2023-08-06	PNN: From proximal algorithms to robust unfolded image denoising networks and Plug-and-Play methods	Hoang Trieu Vy Le et.al.	2308.03139	null
2023-08-06	NNVISR: Bring Neural Network Video Interpolation and Super Resolution into Video Processing Framework	Yuan Tong et.al.	2308.03121	link
2023-08-06	FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information	Chenxi Wang et.al.	2308.03033	link
2023-08-06	Brighten-and-Colorize: A Decoupled Network for Customized Low-Light Image Enhancement	Chenxi Wang et.al.	2308.03029	null
2023-08-06	All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation	Cheng Zhang et.al.	2308.03021	null
2023-08-06	Recurrent Spike-based Image Restoration under General Illumination	Lin Zhu et.al.	2308.03018	link
2023-08-05	Dual Degradation-Inspired Deep Unfolding Network for Low-Light Image Enhancement	Huake Wang et.al.	2308.02776	null
2023-08-04	CTP-Net: Character Texture Perception Network for Document Image Forgery Localization	Xin Liao et.al.	2308.02158	null
2023-08-03	A Multidimensional Analysis of Social Biases in Vision Transformers	Jannik Brinkmann et.al.	2308.01948	null
2023-08-02	WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond	Zengxi Zhang et.al.	2308.00931	null
2023-08-02	ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation	Yasheng Sun et.al.	2308.00906	null
2023-08-01	Decomposition Ascribed Synergistic Learning for Unified Image Restoration	Jinghao Zhang et.al.	2308.00759	null
2023-08-01	Context-Aware Talking-Head Video Editing	Songlin Yang et.al.	2308.00462	null
2023-08-01	Space Debris: Are Deep Learning-based Image Enhancements part of the Solution?	Michele Jamrozik et.al.	2308.00408	null
2023-07-28	Benchmarking Anomaly Detection System on various Jetson Edge Devices	Hoang Viet Pham et.al.	2307.16834	link
2023-07-31	From Generation to Suppression: Towards Effective Irregular Glow Removal for Nighttime Visibility Enhancement	Wanyu Wu et.al.	2307.16783	null
2023-07-30	RealityCanvas: Augmented Reality Sketching for Embedded and Responsive Scribble Animation Effects	Zhijie Xia et.al.	2307.16116	link
2023-07-27	Fast Dust Sand Image Enhancement Based on Color Correction and New Membership Function	Ali Hakem Alsaeedi et.al.	2307.15230	null
2023-07-27	The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation	Lingdong Kong et.al.	2307.15061	link
2023-07-27	Meta-Processing: A robust framework for multi-tasks seismic processing	Shijun Cheng et.al.	2307.14851	link
2023-07-27	Semantic Image Completion and Enhancement using GANs	Priyansh Saxena et.al.	2307.14748	null
2023-07-27	LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement	Tao Wang et.al.	2307.14659	link
2023-07-26	SuperInpaint: Learning Detail-Enhanced Attentional Implicit Representation for Super-resolutional Image Inpainting	Canyu Zhang et.al.	2307.14489	null
2023-08-01	Phenotype-preserving metric design for high-content image reconstruction by generative inpainting	Vaibhav Sharma et.al.	2307.14436	link
2023-07-26	Visual Instruction Inversion: Image Editing via Visual Prompting	Thao Nguyen et.al.	2307.14331	link
2023-07-25	On the unreasonable vulnerability of transformers for image restoration – and an easy fix	Shashank Agnihotri et.al.	2307.13856	null
2023-07-24	Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry	Yong-Hyun Park et.al.	2307.12868	link
2023-07-24	A Theoretically Guaranteed Quaternion Weighted Schatten p-norm Minimization Method for Color Image Restoration	Qing-Hua Zhang et.al.	2307.12656	link
2023-07-25	TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition	Shilin Lu et.al.	2307.12493	link
2023-07-22	Real-Time Neural Video Recovery and Enhancement on Mobile Devices	Zhaoyuan He et.al.	2307.12152	null
2023-07-21	Physics-Aware Semi-Supervised Underwater Image Enhancement	Hao Qi et.al.	2307.11470	null
2023-07-20	OBJECT 3DIT: Language-guided 3D-aware Image Editing	Oscar Michel et.al.	2307.11073	null
2023-07-20	Lighting up NeRF via Unsupervised Decomposition and Enhancement	Haoyuan Wang et.al.	2307.10664	link
2023-07-20	Physics-Driven Turbulence Image Restoration with Stochastic Refinement	Ajay Jaiswal et.al.	2307.10603	link
2023-07-23	TokenFlow: Consistent Diffusion Features for Consistent Video Editing	Michal Geyer et.al.	2307.10373	null
2023-07-19	Text2Layer: Layered Image Generation using Latent Diffusion Model	Xinyang Zhang et.al.	2307.09781	null
2023-07-19	NTIRE 2023 Quality Assessment of Video Enhancement Challenge	Xiaohong Liu et.al.	2307.09729	null
2023-07-18	Division Gets Better: Learning Brightness-Aware and Detail-Sensitive Representations for Low-Light Image Enhancement	Huake Wang et.al.	2307.09104	null
2023-07-18	Unleashing the Imagination of Text: A Novel Framework for Text-to-image Person Retrieval via Exploring the Power of Words	Delong Liu et.al.	2307.09059	link
2023-07-18	Soft-IntroVAE for Continuous Latent space Image Super-Resolution	Zhi-Song Liu et.al.	2307.09008	null
2023-07-18	Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond	Yang Zhao et.al.	2307.08996	null
2023-07-18	Revisiting Latent Space of GAN Inversion for Real Image Editing	Kai Katsumata et.al.	2307.08995	null
2023-07-18	CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing	Ahmet Canberk Baykal et.al.	2307.08397	null
2023-07-16	LUCYD: A Feature-Driven Richardson-Lucy Deconvolution Network	Tomáš Chobola et.al.	2307.07998	link
2023-07-15	Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?	Jialu Gao et.al.	2307.07837	null
2023-07-15	HQG-Net: Unpaired Medical Image Enhancement with High-Quality Guidance	Chunming He et.al.	2307.07829	null
2023-07-15	ExposureDiffusion: Learning to Expose for Low-light Image Enhancement	Yufei Wang et.al.	2307.07710	link
2023-07-15	DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration	Yuanshuo Cheng et.al.	2307.07688	null
2023-07-15	INVE: Interactive Neural Video Editing	Jiahui Huang et.al.	2307.07663	null
2023-07-08	Face Image Quality Enhancement Study for Face Recognition	Iqbal Nouyed et.al.	2307.05534	null
2023-07-11	Bio-Inspired Night Image Enhancement Based on Contrast Enhancement and Denoising	Xinyi Bai et.al.	2307.05447	null
2023-07-10	FreeDrag: Point Tracking is Not You Need for Interactive Point-based Image Editing	Pengyang Ling et.al.	2307.04684	link
2023-07-11	DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer	Dan Ruta et.al.	2307.04157	null
2023-07-12	Latent Graph Attention for Enhanced Spatial Context	Ayush Singh et.al.	2307.04149	null
2023-07-09	Enhancing Low-Light Images Using Infrared-Encoded Images	Shulin Tian et.al.	2307.04122	null
2023-07-07	Joint Perceptual Learning for Enhancement and Object Detection in Underwater Scenarios	Chenping Fu et.al.	2307.03536	null
2023-07-06	UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering	Triet M. Thai et.al.	2307.02783	null
2023-07-05	LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion	Long Bai et.al.	2307.02452	link
2023-07-05	DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models	Chong Mou et.al.	2307.02421	link
2023-07-05	Generative Adversarial Networks for Dental Patient Identity Protection in Orthodontic Educational Imaging	Mingchuan Tian et.al.	2307.02019	null
2023-07-04	Augment Features Beyond Color for Domain Generalized Segmentation	Qiyu Sun et.al.	2307.01703	null
2023-07-02	LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance	Linoy Tsaban et.al.	2307.00522	null
2023-06-29	FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude	Feng Liu et.al.	2306.17206	null
2023-06-28	PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing	Wenjing Huang et.al.	2306.16894	link
2023-06-29	Low-Light Enhancement in the Frequency Domain	Hao Chen et.al.	2306.16782	null
2023-06-27	Cutting-Edge Techniques for Depth Map Super-Resolution	Ryan Peterson et.al.	2306.15244	null
2023-06-27	DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing	Yujun Shi et.al.	2306.14435	link
2023-07-01	Faster Segment Anything: Towards Lightweight SAM for Mobile Applications	Chaoning Zhang et.al.	2306.14289	link
2023-06-25	Diffusion Model Based Low-Light Image Enhancement for Space Satellite	Yiman Zhu et.al.	2306.14227	null
2023-06-25	A Gated Cross-domain Collaborative Network for Underwater Object Detection	Linhui Dai et.al.	2306.14141	link
2023-06-23	ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration	Jiaqi Ma et.al.	2306.13653	link
2023-06-23	Augmenting Sports Videos with VisCommentator	Zhutian Chen et.al.	2306.13491	null
2023-06-22	PromptIR: Prompting for All-in-One Blind Image Restoration	Vaishnav Potlapalli et.al.	2306.13090	link
2023-06-22	Continuous Layout Editing of Single Images with Diffusion Models	Zhiyuan Zhang et.al.	2306.13078	null
2023-06-22	Restoration of the JPEG Maximum Lossy Compressed Face Images with Hourglass Block based on Early Stopping Discriminator	Jongwook Si et.al.	2306.12757	null