2025-06-26 |
SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture |
Kehan Sui et.al. |
2506.21478 |
null |
2025-06-26 |
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation |
Bowen Chen et.al. |
2506.21416 |
null |
2025-06-26 |
GenFlow: Interactive Modular System for Image Generation |
Duc-Hung Nguyen et.al. |
2506.21369 |
null |
2025-06-26 |
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models |
Hongbo Liu et.al. |
2506.21356 |
null |
2025-06-26 |
HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation |
Diego Biagini et.al. |
2506.21287 |
null |
2025-06-26 |
Video Virtual Try-on with Conditional Diffusion Transformer Inpainter |
Cheng Zou et.al. |
2506.21270 |
null |
2025-06-26 |
BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models |
Louis Kerner et.al. |
2506.21209 |
null |
2025-06-26 |
Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation |
Ze Wang et.al. |
2506.21022 |
null |
2025-06-26 |
HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation |
Qingyue Jiao et.al. |
2506.21015 |
null |
2025-06-26 |
Rethink Sparse Signals for Pose-guided Text-to-image Generation |
Wenjie Xuan et.al. |
2506.20983 |
null |
2025-06-25 |
Video Perception Models for 3D Scene Synthesis |
Rui Huang et.al. |
2506.20601 |
null |
2025-06-25 |
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling |
Tobias Vontobel et.al. |
2506.20452 |
null |
2025-06-25 |
Med-Art: Diffusion Transformer for 2D Medical Text-to-Image Generation |
Changlu Guo et.al. |
2506.20449 |
null |
2025-06-25 |
EAR: Erasing Concepts from Unified Autoregressive Models |
Haipeng Fan et.al. |
2506.20151 |
null |
2025-06-25 |
BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos |
Jiahao Lin et.al. |
2506.20103 |
null |
2025-06-24 |
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation |
Xingyang Li et.al. |
2506.19852 |
null |
2025-06-24 |
GenHSI: Controllable Generation of Human-Scene Interaction Videos |
Zekun Li et.al. |
2506.19840 |
null |
2025-06-24 |
SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution |
Liangbin Xie et.al. |
2506.19838 |
null |
2025-06-24 |
Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router |
Yubo Huang et.al. |
2506.19833 |
null |
2025-06-24 |
Varif.ai to Vary and Verify User-Driven Diversity in Scalable Image Generation |
M. Michelessa et.al. |
2506.19644 |
null |
2025-06-24 |
Stylized Structural Patterns for Improved Neural Network Pre-training |
Farnood Salehi et.al. |
2506.19465 |
null |
2025-06-24 |
Enhancing Galaxy Classification with U-Net Variational Autoencoders for Image Denoising |
Sergey Mirzoyan et.al. |
2506.19434 |
null |
2025-06-24 |
SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation |
Yunsung Chung et.al. |
2506.19360 |
null |
2025-06-24 |
Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation |
Jintao Rong et.al. |
2506.19348 |
null |
2025-06-24 |
Style Transfer: A Decade Survey |
Tianshan Zhang et.al. |
2506.19278 |
null |
2025-06-23 |
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory |
Runjia Li et.al. |
2506.18903 |
null |
2025-06-23 |
From Virtual Games to Real-World Play |
Wenqiang Sun et.al. |
2506.18901 |
null |
2025-06-23 |
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation |
Kaiyi Huang et.al. |
2506.18899 |
null |
2025-06-23 |
MinD: Unified Visual Imagination and Control via Hierarchical World Models |
Xiaowei Chi et.al. |
2506.18897 |
null |
2025-06-23 |
OmniGen2: Exploration to Advanced Multimodal Generation |
Chenyuan Wu et.al. |
2506.18871 |
null |
2025-06-23 |
OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation |
Qijun Gan et.al. |
2506.18866 |
null |
2025-06-23 |
TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting |
Zhongbin Guo et.al. |
2506.18862 |
null |
2025-06-23 |
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset |
Zhuowei Chen et.al. |
2506.18851 |
null |
2025-06-23 |
Matrix-Game: Interactive World Foundation Model |
Yifan Zhang et.al. |
2506.18701 |
null |
2025-06-23 |
RDPO: Real Data Preference Optimization for Physics Consistency Video Generation |
Wenxu Qian et.al. |
2506.18655 |
null |
2025-06-23 |
Emergent Temporal Correspondences from Video Diffusion Transformers |
Jisu Nam et.al. |
2506.17220 |
link |
2025-06-20 |
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens |
Zeyuan Yang et.al. |
2506.17218 |
null |
2025-06-20 |
DreamCube: 3D Panorama Generation via Multi-plane Synchronization |
Yukun Huang et.al. |
2506.17206 |
null |
2025-06-20 |
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition |
Jiaqi Li et.al. |
2506.17201 |
null |
2025-06-20 |
The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation |
Giulia Bertazzini et.al. |
2506.17016 |
null |
2025-06-20 |
AI’s Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario |
Ciro Beneduce et.al. |
2506.16898 |
null |
2025-06-20 |
Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models |
Semin Kim et.al. |
2506.16853 |
null |
2025-06-20 |
FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation |
Fan Yang et.al. |
2506.16806 |
null |
2025-06-20 |
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation |
Riccardo Corvi et.al. |
2506.16802 |
null |
2025-06-20 |
PQCAD-DM: Progressive Quantization and Calibration-Assisted Distillation for Extremely Efficient Diffusion Model |
Beomseok Ko et.al. |
2506.16776 |
null |
2025-06-18 |
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model |
Anirud Aggarwal et.al. |
2506.15682 |
link |
2025-06-20 |
Sekai: A Video Dataset towards World Exploration |
Zhen Li et.al. |
2506.15675 |
null |
2025-06-20 |
Show-o2: Improved Native Unified Multimodal Models |
Jinheng Xie et.al. |
2506.15564 |
link |
2025-06-18 |
Control and Realism: Best of Both Worlds in Layout-to-Image without Training |
Bonan Li et.al. |
2506.15563 |
null |
2025-06-18 |
GalaxyGenius: A Mock Galaxy Image Generator for Various Telescopes from Hydrodynamical Simulations |
Xingchen Zhou et.al. |
2506.15060 |
null |
2025-06-17 |
Frequency-Calibrated Membership Inference Attacks on Medical Image Diffusion Models |
Xinkai Zhao et.al. |
2506.14919 |
null |
2025-06-17 |
DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization |
Renjith Prasad et.al. |
2506.14903 |
null |
2025-06-17 |
The Quasi-Radial Field-line Tracing (QRaFT): an Adaptive Segmentation of the Open-Flux Solar Corona |
Vadim M. Uritsky et.al. |
2506.14894 |
null |
2025-06-17 |
Cost-Aware Routing for Efficient Text-To-Image Generation |
Qinchan et.al. |
2506.14753 |
null |
2025-06-17 |
Align Your Flow: Scaling Continuous-Time Flow Map Distillation |
Amirmojtaba Sabour et.al. |
2506.14603 |
null |
2025-06-17 |
Risk Estimation of Knee Osteoarthritis Progression via Predictive Multi-task Modelling from Efficient Diffusion Model using X-ray Images |
David Butler et.al. |
2506.14560 |
null |
2025-06-17 |
Causally Steered Diffusion for Automated Video Counterfactual Generation |
Nikos Spyrou et.al. |
2506.14404 |
null |
2025-06-17 |
Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models |
Tian Xia et.al. |
2506.14399 |
null |
2025-06-17 |
CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation |
Jia-Chen Zhang et.al. |
2506.14206 |
null |
2025-06-17 |
DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion |
Makoto Shing et.al. |
2506.14202 |
null |
2025-06-18 |
VideoMAR: Autoregressive Video Generation with Continuous Tokens |
Hu Yu et.al. |
2506.14168 |
null |
2025-06-16 |
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions |
Zhucun Xue et.al. |
2506.13691 |
null |
2025-06-16 |
Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention |
Jeonghoon Park et.al. |
2506.13298 |
null |
2025-06-16 |
STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation |
Jiamin Wang et.al. |
2506.13138 |
null |
2025-06-15 |
iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer |
Zhelun Shen et.al. |
2506.12847 |
null |
2025-06-14 |
Retrieval Augmented Comic Image Generation |
Yunhao Shui et.al. |
2506.12517 |
null |
2025-06-14 |
Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity |
Mohsen Jenadeleh et.al. |
2506.12505 |
null |
2025-06-14 |
Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback |
Janet Wang et.al. |
2506.12323 |
null |
2025-06-13 |
Exploring the Effectiveness of Deep Features from Domain-Specific Foundation Models in Retinal Image Synthesis |
Zuzanna Skorniewska et.al. |
2506.11753 |
null |
2025-06-13 |
SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation |
Xu Wang et.al. |
2506.11621 |
null |
2025-06-13 |
A Watermark for Auto-Regressive Image Generation Models |
Yihan Wu et.al. |
2506.11371 |
null |
2025-06-12 |
GenWorld: Towards Detecting AI-generated Real-world Simulation Videos |
Weiliang Chen et.al. |
2506.10975 |
null |
2025-06-13 |
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning |
Yuxuan Luo et.al. |
2506.10963 |
null |
2025-06-12 |
The Role of Generative AI in Facilitating Social Interactions: A Scoping Review |
T. T. J. E. Arets et.al. |
2506.10927 |
null |
2025-06-12 |
M4V: Multi-Modal Mamba for Text-to-Video Generation |
Jiancheng Huang et.al. |
2506.10915 |
null |
2025-06-12 |
GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning |
Xiaoyi Bao et.al. |
2506.10639 |
null |
2025-06-12 |
Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models |
Francisco Caetano et.al. |
2506.10634 |
null |
2025-06-12 |
High-resolution efficient image generation from WiFi CSI using a pretrained latent diffusion model |
Eshan Ramesh et.al. |
2506.10605 |
null |
2025-06-12 |
Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning |
Chun-Mei Feng et.al. |
2506.10575 |
null |
2025-06-12 |
Unitary Scrambling and Collapse: A Quantum Diffusion Framework for Generative Modeling |
Yihua Li et.al. |
2506.10571 |
link |
2025-06-12 |
DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers |
Lizhen Wang et.al. |
2506.10568 |
null |
2025-06-11 |
PlayerOne: Egocentric World Simulator |
Yuanpeng Tu et.al. |
2506.09995 |
null |
2025-06-11 |
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions |
Zhenzhi Wang et.al. |
2506.09984 |
null |
2025-06-11 |
ReSim: Reliable World Simulation for Autonomous Driving |
Jiazhi Yang et.al. |
2506.09981 |
null |
2025-06-11 |
Canonical Latent Representations in Conditional Diffusion Models |
Yitao Xu et.al. |
2506.09955 |
null |
2025-06-11 |
HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations |
Marco Federici et.al. |
2506.09932 |
null |
2025-06-11 |
Only-Style: Stylistic Consistency in Image Generation without Content Leakage |
Tilemachos Aravanis et.al. |
2506.09916 |
link |
2025-06-11 |
ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models |
Qin Zhou et.al. |
2506.09740 |
null |
2025-06-11 |
DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning |
Dongxu Liu et.al. |
2506.09644 |
null |
2025-06-12 |
Consistent Story Generation with Asymmetry Zigzag Sampling |
Mingxiao Li et.al. |
2506.09612 |
link |
2025-06-11 |
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression |
Dingcheng Zhen et.al. |
2506.09482 |
link |
2025-06-10 |
MagCache: Fast Video Generation with Magnitude-Aware Cache |
Zehong Ma et.al. |
2506.09045 |
link |
2025-06-10 |
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models |
Xuanchi Ren et.al. |
2506.09042 |
link |
2025-06-10 |
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better |
Dianyi Wang et.al. |
2506.09040 |
link |
2025-06-10 |
Diffuse and Disperse: Image Generation with Representation Regularization |
Runqian Wang et.al. |
2506.09027 |
null |
2025-06-11 |
SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping |
Jiajun Li et.al. |
2506.08908 |
link |
2025-06-10 |
CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics |
Shravan Nayak et.al. |
2506.08835 |
null |
2025-06-10 |
FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency |
Yifei Su et.al. |
2506.08822 |
null |
2025-06-10 |
HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation |
Ziyao Huang et.al. |
2506.08797 |
null |
2025-06-10 |
Flow Diverse and Efficient: Learning Momentum Flow Matching via Stochastic Velocity Field Sampling |
Zhiyuan Ma et.al. |
2506.08796 |
null |
2025-06-10 |
MAMBO: High-Resolution Generative Approach for Mammography Images |
Milica Škipina et.al. |
2506.08677 |
null |
2025-06-09 |
StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets |
Anh-Quan Cao et.al. |
2506.08013 |
link |
2025-06-09 |
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion |
Xun Huang et.al. |
2506.08009 |
null |
2025-06-09 |
Dreamland: Controllable World Creation with Simulator and Generative Models |
Sicheng Mo et.al. |
2506.08006 |
null |
2025-06-09 |
Audio-Sync Video Generation with Multi-Stream Temporal Control |
Shuchen Weng et.al. |
2506.08003 |
null |
2025-06-09 |
MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation |
Junhao Chen et.al. |
2506.07999 |
null |
2025-06-09 |
Generative Modeling of Weights: Generalization or Memorization? |
Boya Zeng et.al. |
2506.07998 |
link |
2025-06-10 |
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation |
Jingjing Chang et.al. |
2506.07977 |
link |
2025-06-09 |
Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces |
Kevin Rojas et.al. |
2506.07903 |
link |
2025-06-09 |
Video Unlearning via Low-Rank Refusal Vector |
Simone Facchiano et.al. |
2506.07891 |
null |
2025-06-09 |
Diffusion Counterfactual Generation with Semantic Abduction |
Rajat Rasal et.al. |
2506.07883 |
link |
2025-06-06 |
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis |
Jiatao Gu et.al. |
2506.06276 |
null |
2025-06-06 |
GenIR: Generative Visual Feedback for Mental Image Retrieval |
Diji Yang et.al. |
2506.06220 |
null |
2025-06-06 |
Feedback Guidance of Diffusion Models |
Koulischer Felix et.al. |
2506.06085 |
null |
2025-06-06 |
Restereo: Diffusion stereo video generation and restoration |
Xingchang Huang et.al. |
2506.06023 |
null |
2025-06-06 |
Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models |
Chaoyi Zhu et.al. |
2506.06018 |
link |
2025-06-06 |
Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection |
Yu Li et.al. |
2506.05872 |
null |
2025-06-06 |
LLIA – Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models |
Haojie Yu et.al. |
2506.05806 |
null |
2025-06-06 |
Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery |
Sajjad Abdoli et.al. |
2506.05673 |
null |
2025-06-05 |
UniRes: Universal Image Restoration for Complex Degradations |
Mo Zhou et.al. |
2506.05599 |
null |
2025-06-05 |
EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh |
Tao Hu et.al. |
2506.05554 |
null |
2025-06-05 |
ContentV: Efficient Training of Video Generation Models with Limited Compute |
Wenfeng Lin et.al. |
2506.05343 |
null |
2025-06-05 |
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model |
Pingyu Wu et.al. |
2506.05289 |
link |
2025-06-05 |
Aligning Latent Spaces with Flow Priors |
Yizhuo Li et.al. |
2506.05240 |
null |
2025-06-05 |
PixCell: A generative foundation model for digital histopathology images |
Srikar Yellapragada et.al. |
2506.05127 |
null |
2025-06-05 |
Membership Inference Attacks on Sequence Models |
Lorenzo Rossi et.al. |
2506.05126 |
null |
2025-06-05 |
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models |
Revant Teotia et.al. |
2506.05108 |
null |
2025-06-06 |
Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers |
Haosong Liu et.al. |
2506.05096 |
null |
2025-06-05 |
FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation |
Huihan Wang et.al. |
2506.04956 |
null |
2025-06-05 |
CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx |
Lukas Picek et.al. |
2506.04931 |
null |
2025-06-05 |
Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking |
Yu-Feng Chen et.al. |
2506.04879 |
null |
2025-06-04 |
LayerFlow: A Unified Model for Layer-aware Video Generation |
Sihui Ji et.al. |
2506.04228 |
null |
2025-06-04 |
UNIC: Unified In-Context Video Editing |
Zixuan Ye et.al. |
2506.04216 |
null |
2025-06-05 |
FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers |
Xuanhua He et.al. |
2506.04213 |
null |
2025-06-04 |
Image Editing As Programs with Diffusion Models |
Yujia Hu et.al. |
2506.04158 |
null |
2025-06-05 |
RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors |
Hicham Eddoubi et.al. |
2506.03988 |
link |
2025-06-04 |
EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation |
Cheng Zhang et.al. |
2506.03652 |
null |
2025-06-04 |
ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning |
Feng Han et.al. |
2506.03596 |
link |
2025-06-04 |
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models |
Ziyi Wu et.al. |
2506.03517 |
null |
2025-06-03 |
Robustness in Both Domains: CLIP Needs a Robust Text Encoder |
Elias Abad Rocamora et.al. |
2506.03355 |
null |
2025-06-03 |
Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas |
Austin Silveria et.al. |
2506.03275 |
null |
2025-06-03 |
IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation |
Yuanze Lin et.al. |
2506.03150 |
null |
2025-06-04 |
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation |
Bin Lin et.al. |
2506.03147 |
null |
2025-06-03 |
Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval |
Jiwen Yu et.al. |
2506.03141 |
null |
2025-06-03 |
CamCloneMaster: Enabling Reference-based Camera Control for Video Generation |
Yawen Luo et.al. |
2506.03140 |
null |
2025-06-03 |
AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation |
Lu Qiu et.al. |
2506.03126 |
null |
2025-06-03 |
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation |
Zhengyao Lv et.al. |
2506.03123 |
null |
2025-06-03 |
TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models |
Chetwin Low et.al. |
2506.03099 |
null |
2025-06-03 |
ORV: 4D Occupancy-centric Robot Video Generation |
Xiuyu Yang et.al. |
2506.03079 |
link |
2025-06-03 |
EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models |
Mingzhe Li et.al. |
2506.03067 |
null |
2025-06-03 |
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers |
Pengtao Chen et.al. |
2506.03065 |
null |
2025-05-30 |
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL |
Yu Zhang et.al. |
2505.24875 |
null |
2025-05-30 |
MiniMax-Remover: Taming Bad Noise Helps Video Object Removal |
Bojia Zi et.al. |
2505.24873 |
null |
2025-05-30 |
GenSpace: Benchmarking Spatially-Aware Image Generation |
Zehan Wang et.al. |
2505.24870 |
null |
2025-05-30 |
Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation |
Yucheng Zhou et.al. |
2505.24787 |
link |
2025-05-30 |
DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds |
Jiaxu Zhang et.al. |
2505.24733 |
null |
2025-05-30 |
UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation |
Yang-Tian Sun et.al. |
2505.24521 |
null |
2025-05-30 |
un$^2$CLIP: Improving CLIP’s Visual Detail Capturing Ability via Inverting unCLIP |
Yinqi Li et.al. |
2505.24517 |
link |
2025-05-30 |
Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields |
Md Shahriar Rahim Siddiqui et.al. |
2505.24434 |
null |
2025-06-03 |
Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning |
Stepan Shabalin et.al. |
2505.24360 |
link |
2025-05-30 |
Category-aware EEG image generation based on wavelet transform and contrast semantic loss |
Enshang Zhang et.al. |
2505.24301 |
link |
2025-05-29 |
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers |
Yusuf Dalva et.al. |
2505.23758 |
null |
2025-05-29 |
MAGREF: Masked Guidance for Any-Reference Video Generation |
Yufan Deng et.al. |
2505.23742 |
link |
2025-05-29 |
How Animals Dance (When You’re Not Looking) |
Xiaojuan Wang et.al. |
2505.23738 |
null |
2025-05-29 |
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos |
Tingyu Song et.al. |
2505.23693 |
link |
2025-05-29 |
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models |
Xiangdong Zhang et.al. |
2505.23656 |
link |
2025-05-29 |
Inference-time Scaling of Diffusion Models through Classical Search |
Xiangcheng Zhang et.al. |
2505.23614 |
null |
2025-05-29 |
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model |
Qingyu Shi et.al. |
2505.23606 |
link |
2025-05-29 |
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation |
Kaijie Chen et.al. |
2505.23493 |
null |
2025-05-29 |
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation |
Shi-Xue Zhang et.al. |
2505.23484 |
link |
2025-05-29 |
Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering |
Sixian Wang et.al. |
2505.23343 |
link |
2025-05-28 |
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation |
Zhe Kong et.al. |
2505.22647 |
link |
2025-05-28 |
SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation |
Dekai Zhu et.al. |
2505.22643 |
null |
2025-05-28 |
Principled Out-of-Distribution Generalization via Simplicity |
Jiawei Ge et.al. |
2505.22622 |
null |
2025-05-28 |
ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models |
Dmitrii Sorokin et.al. |
2505.22569 |
null |
2025-05-28 |
PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models |
Junwen Chen et.al. |
2505.22523 |
null |
2025-05-28 |
ProCrop: Learning Aesthetic Image Cropping from Professional Compositions |
Ke Zhang et.al. |
2505.22490 |
null |
2025-05-28 |
Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation |
Jiadong Pan et.al. |
2505.22407 |
null |
2025-05-28 |
PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models |
Fan Fei et.al. |
2505.22394 |
null |
2025-05-28 |
Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion |
Kewen Chen et.al. |
2505.22360 |
null |
2025-05-28 |
Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers |
Weilun Feng et.al. |
2505.22167 |
null |
2025-05-27 |
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation |
Boyang Wang et.al. |
2505.21491 |
null |
2025-05-27 |
Policy Optimized Text-to-Image Pipeline Design |
Uri Gadot et.al. |
2505.21478 |
null |
2025-05-27 |
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction |
Yiheng Liu et.al. |
2505.21473 |
link |
2025-05-27 |
Dynamic Vision from EEG Brain Recordings: How much does EEG know? |
Prajwal Singh et.al. |
2505.21385 |
null |
2025-05-28 |
SageAttention2++: A More Efficient Implementation of SageAttention2 |
Jintao Zhang et.al. |
2505.21136 |
link |
2025-05-27 |
Creativity in LLM-based Multi-Agent Systems: A Survey |
Yi-Cheng Lin et.al. |
2505.21116 |
null |
2025-05-27 |
Minute-Long Videos with Dual Parallelisms |
Zeqing Wang et.al. |
2505.21070 |
link |
2025-05-27 |
RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy |
Aiyue Chen et.al. |
2505.21036 |
null |
2025-05-27 |
OrienText: Surface Oriented Textual Image Generation |
Shubham Singh Paliwal et.al. |
2505.20958 |
null |
2025-05-27 |
Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models |
Puwei Lian et.al. |
2505.20955 |
null |
2025-05-26 |
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities |
Jin Wang et.al. |
2505.20147 |
null |
2025-05-26 |
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion |
Zheqi Lv et.al. |
2505.20053 |
link |
2025-05-27 |
Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM |
Peng Liu et.al. |
2505.19901 |
null |
2025-05-26 |
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation |
Yi Wu et.al. |
2505.19874 |
null |
2025-05-26 |
DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving |
Wenchao Sun et.al. |
2505.19692 |
link |
2025-05-26 |
TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs |
Juntong Wang et.al. |
2505.19535 |
null |
2025-05-26 |
Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning |
Jiyu Hu et.al. |
2505.19522 |
null |
2025-05-26 |
The Role of Video Generation in Enhancing Data-Limited Action Understanding |
Wei Li et.al. |
2505.19495 |
null |
2025-05-26 |
MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models |
Hang Hua et.al. |
2505.19415 |
null |
2025-05-26 |
Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals |
Nate Gillman et.al. |
2505.19386 |
null |
2025-05-23 |
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions |
Zizhang Li et.al. |
2505.18151 |
null |
2025-05-23 |
F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles |
Varun Ajith et.al. |
2505.18106 |
null |
2025-05-23 |
DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation |
Junhao Chen et.al. |
2505.18078 |
null |
2025-05-23 |
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration |
Sudarshan Rajagopalan et.al. |
2505.18047 |
null |
2025-05-23 |
SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain |
Jiawei Zhou et.al. |
2505.17727 |
null |
2025-05-23 |
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving |
Shuang Zeng et.al. |
2505.17685 |
null |
2025-05-23 |
Scaling Image and Video Generation via Test-Time Evolutionary Search |
Haoran He et.al. |
2505.17618 |
null |
2025-05-23 |
MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation |
Jihan Yao et.al. |
2505.17613 |
null |
2025-05-23 |
InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO |
Xueji Fang et.al. |
2505.17574 |
link |
2025-05-23 |
Deeper Diffusion Models Amplify Bias |
Shahin Hakemi et.al. |
2505.17560 |
null |
2025-05-22 |
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning |
Chengqi Duan et.al. |
2505.17022 |
link |
2025-05-22 |
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO |
Chengzhuo Tong et.al. |
2505.17017 |
link |
2025-05-22 |
Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On |
Siqi Wan et.al. |
2505.16977 |
link |
2025-05-22 |
Creatively Upscaling Images with Global-Regional Priors |
Yurui Qian et.al. |
2505.16976 |
null |
2025-05-22 |
Training-Free Efficient Video Generation via Dynamic Token Carving |
Yuechen Zhang et.al. |
2505.16864 |
link |
2025-05-22 |
Conditional Panoramic Image Generation via Masked Autoregressive Modeling |
Chaoyang Wang et.al. |
2505.16862 |
null |
2025-05-22 |
Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts |
Taewon Kang et.al. |
2505.16819 |
null |
2025-05-22 |
Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation |
Hongji Yang et.al. |
2505.16763 |
null |
2025-05-22 |
MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM |
Siwei Meng et.al. |
2505.16456 |
null |
2025-05-22 |
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design |
Renjie Wei et.al. |
2505.16335 |
link |
2025-05-21 |
MMaDA: Multimodal Large Diffusion Language Models |
Ling Yang et.al. |
2505.15809 |
link |
2025-05-21 |
Interspatial Attention for Efficient 4D Human Video Generation |
Ruizhi Shao et.al. |
2505.15800 |
null |
2025-05-21 |
IA-T2I: Internet-Augmented Text-to-Image Generation |
Chuanhao Li et.al. |
2505.15779 |
null |
2025-05-21 |
FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion |
Kazuaki Mishima et.al. |
2505.15313 |
null |
2025-05-21 |
BadSR: Stealthy Label Backdoor Attacks on Image Super-Resolution |
Ji Guo et.al. |
2505.15308 |
null |
2025-05-21 |
Scaling Diffusion Transformers Efficiently via $μ$P |
Chenyu Zheng et.al. |
2505.15270 |
link |
2025-05-21 |
AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection |
Zhipei Xu et.al. |
2505.15173 |
null |
2025-05-21 |
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation |
Xinran Wang et.al. |
2505.15172 |
null |
2025-05-21 |
CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation |
Xinran Wang et.al. |
2505.15145 |
link |
2025-05-20 |
Programmatic Video Prediction Using Large Language Models |
Hao Tang et.al. |
2505.14948 |
link |
2025-05-20 |
Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers |
Sucheng Ren et.al. |
2505.14687 |
link |
2025-05-20 |
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation |
Rui Tian et.al. |
2505.14682 |
null |
2025-05-20 |
Training-Free Watermarking for Autoregressive Image Generation |
Yu Tong et.al. |
2505.14673 |
link |
2025-05-20 |
SparC: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling |
Zhihao Li et.al. |
2505.14521 |
null |
2025-05-20 |
Latent Flow Transformer |
Yen-Chen Wu et.al. |
2505.14513 |
link |
2025-05-20 |
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank |
Tianhe Wu et.al. |
2505.14460 |
link |
2025-05-20 |
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives |
Xingxing Weng et.al. |
2505.14361 |
null |
2025-05-20 |
Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization |
Yuanyuan Chang et.al. |
2505.14254 |
link |
2025-05-20 |
“Haet Bhasha aur Diskrimineshun”: Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs |
Darpan Aswal et.al. |
2505.14226 |
null |
2025-05-20 |
LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer |
Changgu Chen et.al. |
2505.14167 |
null |
2025-05-19 |
VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation |
Huawei Lin et.al. |
2505.13439 |
link |
2025-05-19 |
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance |
Dian Shao et.al. |
2505.13437 |
null |
2025-05-20 |
Swin DiT: Diffusion Transformer using Pseudo Shifted Windows |
Jiafu Wu et.al. |
2505.13219 |
null |
2025-05-19 |
Diffusion Models with Double Guidance: Generate with aggregated datasets |
Yanfeng Yang et.al. |
2505.13213 |
null |
2025-05-19 |
MAGI-1: Autoregressive Video Generation at Scale |
Sand.ai et.al. |
2505.13211 |
link |
2025-05-19 |
A Physics-Inspired Optimizer: Velocity Regularized Adam |
Pranav Vaidhyanathan et.al. |
2505.13196 |
null |
2025-05-19 |
Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model |
Jonas Brenig et.al. |
2505.13152 |
link |
2025-05-19 |
Accelerate TarFlow Sampling with GS-Jacobi Iteration |
Ben Liu et.al. |
2505.12849 |
link |
2025-05-19 |
FRAbench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities |
Shibo Hong et.al. |
2505.12795 |
link |
2025-05-19 |
SounDiT: Geo-Contextual Soundscape-to-Landscape Generation |
Junbo Wang et.al. |
2505.12734 |
null |
2025-05-16 |
QVGen: Pushing the Limit of Quantized Video Generative Models |
Yushi Huang et.al. |
2505.11497 |
null |
2025-05-16 |
PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment |
Dingbang Huang et.al. |
2505.11468 |
null |
2025-05-16 |
GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI |
Marc-Antoine Fortin et.al. |
2505.11445 |
link |
2025-05-16 |
Face Consistency Benchmark for GenAI Video |
Michal Podstawski et.al. |
2505.11425 |
null |
2025-05-16 |
DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models |
Giulia Bertazzini et.al. |
2505.11257 |
null |
2025-05-16 |
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models |
Fu-Yun Wang et.al. |
2505.11245 |
link |
2025-05-16 |
CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback |
Yixin Wan et.al. |
2505.11178 |
null |
2025-05-16 |
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework |
Feiran Li et.al. |
2505.11131 |
link |
2025-05-16 |
HSRMamba: Efficient Wavelet Stripe State Space Model for Hyperspectral Image Super-Resolution |
Baisong Li et.al. |
2505.11062 |
link |
2025-05-16 |
Generative Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges |
Yuan Zhang et.al. |
2505.10993 |
null |
2025-05-15 |
End-to-End Vision Tokenizer Tuning |
Wenxuan Wang et.al. |
2505.10562 |
null |
2025-05-15 |
CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs |
Raman Dutt et.al. |
2505.10496 |
link |
2025-05-16 |
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation |
Yanbo Ding et.al. |
2505.10238 |
link |
2025-05-15 |
ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars |
Rui-Yang Ju et.al. |
2505.10072 |
null |
2025-05-15 |
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis |
Bingda Tang et.al. |
2505.10046 |
link |
2025-05-14 |
EnerVerse-AC: Envisioning Embodied Environments with Action Condition |
Yuxin Jiang et.al. |
2505.09723 |
null |
2025-05-14 |
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models |
Hu Yue et.al. |
2505.09694 |
link |
2025-05-14 |
Don’t Forget your Inverse DDIM for Image Editing |
Guillermo Gomez-Trenado et.al. |
2505.09571 |
null |
2025-05-14 |
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset |
Jiuhai Chen et.al. |
2505.09568 |
link |
2025-05-14 |
Train a Multi-Task Diffusion Policy on RLBench-18 in One Day with One GPU |
Yutong Hu et.al. |
2505.09430 |
link |
2025-05-14 |
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis |
Bingxin Ke et.al. |
2505.09358 |
link |
2025-05-14 |
An Initial Exploration of Default Images in Text-to-Image Generation |
Hannu Simonen et.al. |
2505.09166 |
null |
2025-05-15 |
Generating time-consistent dynamics with discriminator-guided image diffusion models |
Philipp Hess et.al. |
2505.09089 |
null |
2025-05-13 |
Generative AI for Autonomous Driving: Frontiers and Opportunities |
Yuping Wang et.al. |
2505.08854 |
link |
2025-05-13 |
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models |
Donghoon Kim et.al. |
2505.08622 |
null |
2025-05-13 |
Symbolically-Guided Visual Plan Inference from Uncurated Video Data |
Wenyan Yang et.al. |
2505.08444 |
null |
2025-05-13 |
Identifying Memorization of Diffusion Models through p-Laplace Analysis |
Jonathan Brokman et.al. |
2505.08246 |
link |
2025-05-12 |
Image-Guided Microstructure Optimization using Diffusion Models: Validated with Li-Mn-rich Cathode Precursors |
Geunho Choi et.al. |
2505.07906 |
null |
2025-05-12 |
DanceGRPO: Unleashing GRPO on Visual Generation |
Zeyue Xue et.al. |
2505.07818 |
null |
2025-05-12 |
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models |
Ozgur Kara et.al. |
2505.07652 |
null |
2025-05-12 |
Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning |
Bohan Wang et.al. |
2505.07538 |
null |
2025-05-12 |
Addressing degeneracies in latent interpolation for diffusion models |
Erik Landolsi et.al. |
2505.07481 |
null |
2025-05-13 |
Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model |
Wei Li et.al. |
2505.07449 |
link |
2025-05-12 |
GAN-based synthetic FDG PET images from T1 brain MRI can serve to improve performance of deep unsupervised anomaly detection models |
Daria Zotova et.al. |
2505.07364 |
null |
2025-05-12 |
Generative Pre-trained Autoregressive Diffusion Transformer |
Yuan Zhang et.al. |
2505.07344 |
null |
2025-05-12 |
Metrics that matter: Evaluating image quality metrics for medical image generation |
Yash Deo et.al. |
2505.07175 |
link |
2025-05-11 |
DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models |
Junhao Xia et.al. |
2505.07057 |
null |
2025-05-11 |
Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation |
Md. Naimur Asif Borno et.al. |
2505.06995 |
null |
2025-05-09 |
Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation |
Dongying Li et.al. |
2505.06117 |
null |
2025-05-09 |
Discovery of the Polar Ring Galaxies with deep learning |
D. V. Dobrycheva et.al. |
2505.05890 |
null |
2025-05-09 |
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition |
Zhiyuan Chen et.al. |
2505.05829 |
link |
2025-05-08 |
InstanceGen: Image Generation with Instance-level Instructions |
Etai Sella et.al. |
2505.05678 |
link |
2025-05-08 |
A Preliminary Study for GPT-4o on Image Restoration |
Hao Yang et.al. |
2505.05621 |
link |
2025-05-11 |
Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation |
Chao Liao et.al. |
2505.05472 |
null |
2025-05-08 |
Normalize Everything: A Preconditioned Magnitude-Preserving Architecture for Diffusion-Based Speech Enhancement |
Julius Richter et.al. |
2505.05216 |
null |
2025-05-12 |
PIDiff: Image Customization for Personalized Identities with Diffusion Models |
Jinyu Gu et.al. |
2505.05081 |
null |
2025-05-08 |
T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models |
Xuyang Guo et.al. |
2505.04946 |
null |
2025-05-07 |
CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation |
Viacheslav Vasilev et.al. |
2505.04851 |
null |
2025-05-07 |
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers |
Divyansh Srivastava et.al. |
2505.04718 |
null |
2025-05-08 |
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation |
Teng Hu et.al. |
2505.04512 |
null |
2025-05-08 |
Defining and Quantifying Creative Behavior in Popular Image Generators |
Aditi Ramaswamy et.al. |
2505.04497 |
null |
2025-05-07 |
Efficient Flow Matching using Latent Variables |
Anirban Samaddar et.al. |
2505.04486 |
null |
2025-05-07 |
Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety |
Variath Madhupal Gautham Nair et.al. |
2505.04146 |
null |
2025-05-07 |
RFNNS: Robust Fixed Neural Network Steganography with Popular Deep Generative Models |
Yu Cheng et.al. |
2505.04116 |
null |
2025-05-06 |
Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators |
Will Hawkins et.al. |
2505.03859 |
link |
2025-05-06 |
Revolutionizing Brain Tumor Imaging: Generating Synthetic 3D FA Maps from T1-Weighted MRI using CycleGAN Models |
Xin Du et.al. |
2505.03662 |
null |
2025-05-06 |
Real-Time Person Image Synthesis Using a Flow Matching Model |
Jiwoo Jeong et.al. |
2505.03562 |
link |
2025-05-06 |
Safer Prompts: Reducing IP Risk in Visual Generative AI |
Lena Reissinger et.al. |
2505.03338 |
null |
2025-05-06 |
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning |
Yibin Wang et.al. |
2505.03318 |
null |
2025-05-06 |
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights |
Zhaiming Shen et.al. |
2505.03205 |
null |
2025-05-05 |
Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models |
Kuofeng Gao et.al. |
2505.02824 |
link |
2025-05-06 |
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation |
Mingcheng Li et.al. |
2505.02648 |
null |
2025-05-07 |
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities |
Xinjie Zhang et.al. |
2505.02567 |
link |
2025-05-05 |
Text to Image Generation and Editing: A Survey |
Pengfei Yang et.al. |
2505.02527 |
null |
2025-05-07 |
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction |
Inclusion AI et.al. |
2505.02471 |
link |
2025-05-04 |
Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset |
Jakub Wąsala et.al. |
2505.02255 |
null |
2025-05-04 |
Improving Physical Object State Representation in Text-to-Image Generative Systems |
Tianle Chen et.al. |
2505.02236 |
link |
2025-05-04 |
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization |
Wenchuan Wang et.al. |
2505.02192 |
null |
2025-05-06 |
Regression is all you need for medical image translation |
Sebastian Rassmann et.al. |
2505.02048 |
link |
2025-05-03 |
PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth |
Bu Jin et.al. |
2505.01729 |
null |
2025-05-02 |
FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis |
Jiangtong Tan et.al. |
2505.01172 |
link |
2025-05-02 |
Improving Editability in Image Generation with Layer-wise Memory |
Daneul Kim et.al. |
2505.01079 |
null |
2025-05-01 |
Controllable Weather Synthesis and Removal with Video Diffusion Models |
Chih-Hao Lin et.al. |
2505.00704 |
null |
2025-05-01 |
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT |
Dongzhi Jiang et.al. |
2505.00703 |
link |
2025-05-01 |
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers |
Kwon Byung-Ki et.al. |
2505.00482 |
null |
2025-05-01 |
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation |
Xuyang Guo et.al. |
2505.00337 |
null |
2025-04-30 |
Direct Motion Models for Assessing Generated Videos |
Kelsey Allen et.al. |
2505.00209 |
null |
2025-04-30 |
Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis |
Michal Geyer et.al. |
2505.00135 |
null |
2025-04-30 |
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction |
Qihao Liu et.al. |
2504.21855 |
null |
2025-04-30 |
3D Stylization via Large Reconstruction Model |
Ipek Oztas et.al. |
2504.21836 |
null |
2025-04-30 |
Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields |
Yixin Gao et.al. |
2504.21814 |
null |
2025-04-30 |
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation |
Haiyang Zhou et.al. |
2504.21650 |
link |
2025-04-30 |
Latent Feature-Guided Conditional Diffusion for High-Fidelity Generative Image Semantic Communication |
Zehao Chen et.al. |
2504.21577 |
null |
2025-04-30 |
Simple Visual Artifact Detection in Sora-Generated Videos |
Misora Sugiyama et.al. |
2504.21334 |
null |
2025-04-30 |
Text-Conditioned Diffusion Model for High-Fidelity Korean Font Generation |
Abdul Sami et.al. |
2504.21325 |
null |
2025-04-30 |
Capturing Conditional Dependence via Auto-regressive Diffusion Models |
Xunpeng Huang et.al. |
2504.21314 |
null |
2025-04-30 |
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images |
Yunhao Li et.al. |
2504.21308 |
null |
2025-04-30 |
Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions |
ZiYi Dong et.al. |
2504.21292 |
null |
2025-04-29 |
YoChameleon: Personalized Vision and Language Generation |
Thao Nguyen et.al. |
2504.20998 |
null |
2025-04-29 |
TesserAct: Learning 4D Embodied World Models |
Haoyu Zhen et.al. |
2504.20995 |
null |
2025-04-29 |
DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs |
Hao Luan et.al. |
2504.20754 |
null |
2025-04-29 |
Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion |
Zesheng Wang et.al. |
2504.20685 |
null |
2025-04-29 |
Advance Fake Video Detection via Vision Transformers |
Joy Battocchio et.al. |
2504.20669 |
null |
2025-04-30 |
PixelHacker: Image Inpainting with Structural and Semantic Consistency |
Ziyang Xu et.al. |
2504.20438 |
null |
2025-04-29 |
Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems |
Shiqian Zhao et.al. |
2504.20376 |
null |
2025-04-29 |
A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks |
Khoi Trinh et.al. |
2504.20340 |
null |
2025-04-28 |
Physics-Informed Diffusion Models for SAR Ship Wake Generation from Text Prompts |
Kamirul Kamirul et.al. |
2504.20241 |
null |
2025-04-28 |
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition |
Quynh Phung et.al. |
2504.19894 |
null |
2025-04-28 |
RepText: Rendering Visual Text via Replicating |
Haofan Wang et.al. |
2504.19724 |
null |
2025-04-28 |
DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer |
Junpeng Jiang et.al. |
2504.19614 |
null |
2025-04-28 |
Image Generation Method Based on Heat Diffusion Models |
Pengfei Zhang et.al. |
2504.19600 |
null |
2025-04-29 |
WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution |
Pietro Bongini et.al. |
2504.19595 |
null |
2025-04-28 |
GenPTW: In-Generation Image Watermarking for Provenance Tracing and Tamper Localization |
Zhenliang Gan et.al. |
2504.19567 |
null |
2025-04-28 |
Masked Language Prompting for Generative Data Augmentation in Few-shot Fashion Style Recognition |
Yuki Hirakawa et.al. |
2504.19455 |
null |
2025-04-27 |
Flow Along the K-Amplitude for Generative Modeling |
Weitao Du et.al. |
2504.19353 |
null |
2025-04-26 |
Predicting Stress in Two-phase Random Materials and Super-Resolution Method for Stress Images by Embedding Physical Information |
Tengfei Xing et.al. |
2504.18854 |
null |
2025-04-26 |
Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning |
Yifan Xie et.al. |
2504.18810 |
null |
2025-04-25 |
NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration |
Haotian Dong et.al. |
2504.18448 |
null |
2025-04-25 |
HepatoGEN: Generating Hepatobiliary Phase MRI with Perceptual and Adversarial Models |
Jens Hooge et.al. |
2504.18405 |
null |
2025-04-24 |
Fast Autoregressive Models for Continuous Latent Generation |
Tiankai Hang et.al. |
2504.18391 |
null |
2025-04-25 |
TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation |
Shintaro Ozaki et.al. |
2504.18269 |
null |
2025-04-25 |
Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding |
Kun Li et.al. |
2504.18204 |
null |
2025-04-25 |
Diffusion-Driven Universal Model Inversion Attack for Face Recognition |
Hanrui Wang et.al. |
2504.18015 |
null |
2025-04-27 |
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models |
Xu Ma et.al. |
2504.17789 |
null |
2025-04-24 |
Dynamic Camera Poses and Where to Find Them |
Chris Rockwell et.al. |
2504.17788 |
null |
2025-04-24 |
Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields |
Zhuo He et.al. |
2504.17712 |
null |
2025-04-24 |
STCL: Curriculum learning Strategies for deep learning image steganography models |
Fengchun Liu et.al. |
2504.17609 |
link |
2025-04-24 |
Text-to-Image Alignment in Denoising-Based Models through Step Selection |
Paul Grimal et.al. |
2504.17525 |
null |
2025-04-24 |
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation |
Aviv Slobodkin et.al. |
2504.17502 |
null |
2025-04-24 |
StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies |
Xu Wang et.al. |
2504.17401 |
null |
2025-04-24 |
DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition |
Yiyan Xu et.al. |
2504.17349 |
null |
2025-04-24 |
Physics-based super-resolved simulation of 3D elastic wave propagation adopting scalable Diffusion Transformer |
Hugo Gabrielidis et.al. |
2504.17308 |
null |
2025-04-24 |
Towards Generalized and Training-Free Text-Guided Semantic Manipulation |
Yu Hong et.al. |
2504.17269 |
null |
2025-04-23 |
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation |
Ruotong Wang et.al. |
2504.16907 |
null |
2025-04-23 |
ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance |
Ying Li et.al. |
2504.16464 |
null |
2025-04-23 |
CLPSTNet: A Progressive Multi-Scale Convolutional Steganography Model Integrating Curriculum Learning |
Fengchun Liu et.al. |
2504.16364 |
link |
2025-04-23 |
VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models |
Xuming Hu et.al. |
2504.16359 |
null |
2025-04-22 |
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching |
Junn Yong Loo et.al. |
2504.16262 |
null |
2025-04-22 |
Survey of Video Diffusion Models: Foundations, Implementations, and Applications |
Yimu Wang et.al. |
2504.16081 |
link |
2025-04-22 |
Boosting Generative Image Modeling via Joint Image-Feature Synthesis |
Theodoros Kouzelis et.al. |
2504.16064 |
null |
2025-04-22 |
Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework |
Xinyuan Song et.al. |
2504.16016 |
null |
2025-04-22 |
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation |
Zebin Yao et.al. |
2504.15958 |
link |
2025-04-22 |
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning |
Wang Lin et.al. |
2504.15932 |
null |
2025-04-22 |
DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers |
Xuyang Zhong et.al. |
2504.15827 |
null |
2025-04-22 |
Satellite to GroundScape – Large-scale Consistent Ground View Generation from Satellite Views |
Ningli Xu et.al. |
2504.15786 |
null |
2025-04-22 |
DiTPainter: Efficient Video Inpainting with Diffusion Transformers |
Xian Wu et.al. |
2504.15661 |
null |
2025-04-21 |
Emergence and Evolution of Interpretable Concepts in Diffusion Models |
Berk Tinaz et.al. |
2504.15473 |
null |
2025-04-21 |
Solving New Tasks by Adapting Internet Video Knowledge |
Calvin Luo et.al. |
2504.15369 |
null |
2025-04-22 |
LACE: Controlled Image Prompting and Iterative Refinement with GenAI for Professional Visual Art Creators |
Yenkai Huang et.al. |
2504.15189 |
null |
2025-04-21 |
Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform |
Xianpan Zhou et.al. |
2504.15182 |
null |
2025-04-21 |
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration |
Junyuan Deng et.al. |
2504.15159 |
null |
2025-04-21 |
GIFDL: Generated Image Fluctuation Distortion Learning for Enhancing Steganographic Security |
Xiangkun Wang et.al. |
2504.15139 |
null |
2025-04-22 |
VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation |
Mingxia Zhan et.al. |
2504.15095 |
null |
2025-04-21 |
DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation |
Weijie He et.al. |
2504.15032 |
null |
2025-04-21 |
TWIG: Two-Step Image Generation using Segmentation Masks in Diffusion Models |
Mazharul Islam Rakib et.al. |
2504.14933 |
null |
2025-04-21 |
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation |
Chenjie Cao et.al. |
2504.14899 |
link |
2025-04-21 |
Twin Co-Adaptive Dialogue for Progressive Image Generation |
Jianhui Wang et.al. |
2504.14868 |
null |
2025-04-21 |
LACE: Exploring Turn-Taking and Parallel Interaction Modes in Human-AI Co-Creation for Iterative Image Generation |
YenKai Huang et.al. |
2504.14827 |
null |
2025-04-18 |
MLEP: Multi-granularity Local Entropy Patterns for Universal AI-generated Image Detection |
Lin Yuan et.al. |
2504.13726 |
null |
2025-04-18 |
SupResDiffGAN a new approach for the Super-Resolution task |
Dawid Kopeć et.al. |
2504.13622 |
null |
2025-04-18 |
U-Shape Mamba: State Space Model for faster diffusion |
Alex Ergasti et.al. |
2504.13499 |
link |
2025-04-18 |
Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing |
Joowon Kim et.al. |
2504.13490 |
null |
2025-04-18 |
POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation |
Evans Xu Han et.al. |
2504.13392 |
null |
2025-04-17 |
SMPL-GPTexture: Dual-View 3D Human Texture Estimation using Text-to-Image Generation Models |
Mingxiao Tu et.al. |
2504.13378 |
null |
2025-04-17 |
Personalized Text-to-Image Generation with Auto-Regressive Models |
Kaiyue Sun et.al. |
2504.13162 |
link |
2025-04-18 |
SkyReels-V2: Infinite-length Film Generative Model |
Guibin Chen et.al. |
2504.13074 |
link |
2025-04-17 |
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation |
Wenqi Dong et.al. |
2504.13072 |
null |
2025-04-17 |
ArtistAuditor: Auditing Artist Style Pirate in Text-to-Image Generation Models |
Linkang Du et.al. |
2504.13061 |
link |
2025-04-17 |
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins |
Yao Mu et.al. |
2504.13059 |
null |
2025-04-17 |
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding |
Qianqian Sun et.al. |
2504.12704 |
null |
2025-04-17 |
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation |
Lvmin Zhang et.al. |
2504.12626 |
link |
2025-04-17 |
Prompt-Driven and Training-Free Forgetting Approach and Dataset for Large Language Models |
Zhenyu Yu et.al. |
2504.12574 |
null |
2025-04-16 |
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework |
Jiale Tao et.al. |
2504.12395 |
link |
2025-04-16 |
VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate |
Zhihang Yuan et.al. |
2504.12259 |
link |
2025-04-16 |
SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction |
Xia Wang et.al. |
2504.12245 |
null |
2025-04-16 |
Cobra: Efficient Line Art COlorization with BRoAder References |
Junhao Zhuang et.al. |
2504.12240 |
null |
2025-04-16 |
Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM |
Zirui Pan et.al. |
2504.12048 |
null |
2025-04-16 |
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching |
Xinli Yue et.al. |
2504.12018 |
null |
2025-04-16 |
Novel-view X-ray Projection Synthesis through Geometry-Integrated Deep Learning |
Daiqi Liu et.al. |
2504.11953 |
link |
2025-04-16 |
Mind2Matter: Creating 3D Models from EEG Signals |
Xia Deng et.al. |
2504.11936 |
link |
2025-04-16 |
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation |
Bingjie Gao et.al. |
2504.11739 |
null |
2025-04-16 |
Towards Safe Synthetic Image Generation On the Web: A Multimodal Robust NSFW Defense and Million Scale Dataset |
Muhammad Shahid Muneer et.al. |
2504.11707 |
link |
2025-04-15 |
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception |
Ziqi Pang et.al. |
2504.11457 |
link |
2025-04-15 |
ADT: Tuning Diffusion Models with Adversarial Supervision |
Dazhong Shen et.al. |
2504.11423 |
null |
2025-04-15 |
VideoPanda: Video Panoramic Diffusion with Multi-view Attention |
Kevin Xie et.al. |
2504.11389 |
null |
2025-04-15 |
Omni$^2$: Unifying Omnidirectional Image Generation and Editing in an Omni Model |
Liu Yang et.al. |
2504.11379 |
null |
2025-04-16 |
Seedream 3.0 Technical Report |
Yu Gao et.al. |
2504.11346 |
null |
2025-04-15 |
Using LLMs as prompt modifier to avoid biases in AI image generators |
René Peinl et.al. |
2504.11104 |
null |
2025-04-15 |
AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era |
Chenyang Zhu et.al. |
2504.11015 |
null |
2025-04-15 |
InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation |
Yukang Lin et.al. |
2504.10905 |
null |
2025-04-15 |
Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models |
Karan Jain et.al. |
2504.10883 |
null |
2025-04-15 |
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding |
Dianbing Xi et.al. |
2504.10825 |
null |
2025-04-14 |
Art3D: Training-Free 3D Generation from Flat-Colored Illustration |
Xiaoyan Cong et.al. |
2504.10466 |
null |
2025-04-14 |
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing |
Taihang Hu et.al. |
2504.10434 |
link |
2025-04-14 |
FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos |
Rui Chen et.al. |
2504.10358 |
null |
2025-04-14 |
InstructEngine: Instruction-driven Text-to-Image Alignment |
Xingyu Lu et.al. |
2504.10329 |
null |
2025-04-14 |
VibrantLeaves: A principled parametric image generator for training deep restoration models |
Raphael Achddou et.al. |
2504.10201 |
link |
2025-04-14 |
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions |
Jo-Ku Cheng et.al. |
2504.10146 |
link |
2025-04-14 |
Aligning Anime Video Generation with Human Feedback |
Bingwen Zhu et.al. |
2504.10044 |
null |
2025-04-14 |
Masked Autoencoder Self Pre-Training for Defect Detection in Microelectronics |
Nikolai Röhrich et.al. |
2504.10021 |
null |
2025-04-14 |
Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes |
Huijie Liu et.al. |
2504.09948 |
null |
2025-04-14 |
EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise |
Chao Liu et.al. |
2504.09789 |
null |
2025-04-11 |
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation |
Tianwei Xiong et.al. |
2504.08736 |
link |
2025-04-11 |
Generating Fine Details of Entity Interactions |
Xinyi Gu et.al. |
2504.08714 |
null |
2025-04-11 |
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model |
Team Seawead et.al. |
2504.08685 |
null |
2025-04-11 |
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization |
Jialu Li et.al. |
2504.08641 |
null |
2025-04-11 |
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging |
Gabriele Lozupone et.al. |
2504.08635 |
link |
2025-04-11 |
Discriminator-Free Direct Preference Optimization for Video Diffusion |
Haoran Cheng et.al. |
2504.08542 |
null |
2025-04-11 |
On the Design of Diffusion-based Neural Speech Codecs |
Pietro Foti et.al. |
2504.08470 |
null |
2025-04-11 |
Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion |
Weiye Chen et.al. |
2504.08451 |
link |
2025-04-11 |
Diffusion Models for Robotic Manipulation: A Survey |
Rosa Wolf et.al. |
2504.08438 |
null |
2025-04-11 |
MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization |
Daeun Kim et.al. |
2504.08398 |
null |
2025-04-10 |
PixelFlow: Pixel-Space Generative Models with Flow |
Shoufa Chen et.al. |
2504.07963 |
link |
2025-04-10 |
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction |
Zeren Jiang et.al. |
2504.07961 |
link |
2025-04-10 |
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning |
Zhong-Yu Li et.al. |
2504.07960 |
null |
2025-04-10 |
Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos |
Rundong Luo et.al. |
2504.07940 |
null |
2025-04-10 |
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows |
Mashrur M. Morshed et.al. |
2504.07894 |
null |
2025-04-10 |
Towards Sustainable Creativity Support: An Exploratory Study on Prompt Based Image Generation |
Daniel Hove Paludan et.al. |
2504.07879 |
null |
2025-04-10 |
Diffusion Transformers for Tabular Data Time Series Generation |
Fabrizio Garuti et.al. |
2504.07566 |
link |
2025-04-10 |
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation |
Linyan Huang et.al. |
2504.07405 |
null |
2025-04-10 |
ID-Booth: Identity-consistent Face Generation with Diffusion Models |
Darian Tomašević et.al. |
2504.07392 |
link |
2025-04-10 |
Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-Reconstruction |
Qingchao Jiang et.al. |
2504.07382 |
link |
2025-04-09 |
OmniCaptioner: One Captioner to Rule Them All |
Yiting Lu et.al. |
2504.07089 |
link |
2025-04-09 |
A Unified Agentic Framework for Evaluating Conditional Image Generation |
Jifang Wang et.al. |
2504.07046 |
link |
2025-04-09 |
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation |
Diljeet Jagpal et.al. |
2504.06861 |
null |
2025-04-09 |
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation |
Wangbo Zhao et.al. |
2504.06803 |
link |
2025-04-09 |
A Meaningful Perturbation Metric for Evaluating Explainability Methods |
Danielle Cohen et.al. |
2504.06800 |
null |
2025-04-10 |
Compass Control: Multi Object Orientation Control for Text-to-Image Generation |
Rishubh Parihar et.al. |
2504.06752 |
null |
2025-04-09 |
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism |
Elia Peruzzo et.al. |
2504.06672 |
null |
2025-04-09 |
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception |
Ruotian Peng et.al. |
2504.06666 |
null |
2025-04-09 |
Collision avoidance from monocular vision trained with novel view synthesis |
Valentin Tordjman-Levavasseur et.al. |
2504.06651 |
null |
2025-04-09 |
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering |
Yifan Gao et.al. |
2504.06632 |
null |
2025-04-08 |
Transfer between Modalities with MetaQueries |
Xichen Pan et.al. |
2504.06256 |
null |
2025-04-08 |
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance |
Jiazi Bu et.al. |
2504.06232 |
null |
2025-04-08 |
A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model |
Jihun Park et.al. |
2504.06144 |
null |
2025-04-08 |
CamContextI2V: Context-aware Controllable Video Generation |
Luis Denninger et.al. |
2504.06022 |
link |
2025-04-08 |
An Empirical Study of GPT-4o Image Generation Capabilities |
Sixiang Chen et.al. |
2504.05979 |
link |
2025-04-08 |
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking |
Junxi Chen et.al. |
2504.05838 |
link |
2025-04-08 |
Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models |
Jiahao Chen et.al. |
2504.05815 |
null |
2025-04-08 |
Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling |
Jaskirat Singh et.al. |
2504.05800 |
null |
2025-04-07 |
Gaussian Mixture Flow Matching Models |
Hansheng Chen et.al. |
2504.05304 |
link |
2025-04-07 |
One-Minute Video Generation with Test-Time Training |
Karan Dalal et.al. |
2504.05298 |
null |
2025-04-07 |
Video-Bench: Human-Aligned Video Generation Benchmark |
Hui Han et.al. |
2504.04907 |
null |
2025-04-07 |
Imagining the Far East: Exploring Perceived Biases in AI-Generated Images of East Asian Women |
Xingyu Lan et.al. |
2504.04865 |
null |
2025-04-07 |
AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation |
Xiongbo Lu et.al. |
2504.04743 |
null |
2025-04-08 |
Your Image Generator Is Your New Private Dataset |
Nicolo Resmini et.al. |
2504.04582 |
null |
2025-04-06 |
Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification |
Shijian Wang et.al. |
2504.04510 |
null |
2025-04-06 |
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding |
Yang Jiao et.al. |
2504.04423 |
link |
2025-04-05 |
SDEIT: Semantic-Driven Electrical Impedance Tomography |
Dong Liu et.al. |
2504.04185 |
null |
2025-04-05 |
Learning about the Physical World through Analytic Concepts |
Jianhua Sun et.al. |
2504.04170 |
null |
2025-04-04 |
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models |
Wulin Xie et.al. |
2504.03641 |
null |
2025-04-04 |
Dynamic Importance in Diffusion U-Net for Enhanced Image Synthesis |
Xi Wang et.al. |
2504.03471 |
link |
2025-04-04 |
QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning |
Quanxing Xu et.al. |
2504.03337 |
null |
2025-04-04 |
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models |
Xuran Ma et.al. |
2504.03140 |
link |
2025-04-03 |
How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models |
Pascal Chang et.al. |
2504.03072 |
null |
2025-04-03 |
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning |
Xianwei Zhuang et.al. |
2504.02949 |
link |
2025-04-03 |
Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments |
Chenyu Zhang et.al. |
2504.02918 |
null |
2025-04-03 |
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets |
Chuning Zhu et.al. |
2504.02792 |
null |
2025-04-03 |
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation |
Zhiyuan Yan et.al. |
2504.02782 |
link |
2025-04-03 |
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model |
Shengjun Zhang et.al. |
2504.02764 |
null |
2025-04-03 |
RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models |
ZhongLi Fang et.al. |
2504.02640 |
null |
2025-04-03 |
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation |
Jiwoo Chung et.al. |
2504.02612 |
link |
2025-04-04 |
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation |
Fa-Ting Hong et.al. |
2504.02542 |
link |
2025-04-03 |
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer |
Jiayi Gao et.al. |
2504.02451 |
link |
2025-04-03 |
SkyReels-A2: Compose Anything in Video Diffusion Transformers |
Zhengcong Fei et.al. |
2504.02436 |
link |
2025-04-04 |
MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition |
Takahiro Shirakawa et.al. |
2504.02361 |
null |
2025-04-03 |
OmniCam: Unified Multimodal Video Generation via Camera Control |
Xiaoda Yang et.al. |
2504.02312 |
null |
2025-04-03 |
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step |
Hanyang Wang et.al. |
2504.01956 |
null |
2025-04-03 |
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement |
Runhui Huang et.al. |
2504.01934 |
null |
2025-04-02 |
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs |
Mothilal Asokan et.al. |
2504.01916 |
link |
2025-04-02 |
Instance Migration Diffusion for Nuclear Instance Segmentation in Pathology |
Lirui Qi et.al. |
2504.01577 |
null |
2025-04-02 |
High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model |
Yiyang Shen et.al. |
2504.01512 |
null |
2025-04-01 |
Prompting Forgetting: Unlearning in GANs via Textual Guidance |
Piyush Nagasubramaniam et.al. |
2504.01218 |
null |
2025-04-01 |
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models |
Guy Kaplan et.al. |
2504.01137 |
link |
2025-04-01 |
ShieldGemma 2: Robust and Tractable Image Content Moderation |
Wenjun Zeng et.al. |
2504.01081 |
null |
2025-04-01 |
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction |
Junhao Cheng et.al. |
2504.01014 |
link |
2025-04-01 |
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization |
Siyuan Li et.al. |
2504.00999 |
link |
2025-03-31 |
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy |
Zhonghan Zhao et.al. |
2503.24388 |
null |
2025-03-31 |
Consistent Subject Generation via Contrastive Instantiated Concepts |
Lee Hsin-Ying et.al. |
2503.24387 |
null |
2025-03-31 |
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation |
Shengqiong Wu et.al. |
2503.24379 |
null |
2025-03-31 |
Style Quantization for Data-Efficient GAN Training |
Jian Wang et.al. |
2503.24282 |
null |
2025-03-31 |
FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics |
Yixuan Li et.al. |
2503.24267 |
null |
2025-03-31 |
Threats and Opportunities in AI-generated Images for Armed Forces |
Raphael Meier et.al. |
2503.24095 |
null |
2025-04-01 |
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation |
Boyuan Wang et.al. |
2503.24026 |
null |
2025-03-31 |
JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation |
Fangda Chen et.al. |
2503.23951 |
null |
2025-03-31 |
AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents |
Jiaxiang Chen et.al. |
2503.23948 |
link |
2025-04-01 |
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices |
Bosung Kim et.al. |
2503.23796 |
link |
2025-03-28 |
Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure |
Frank J. Brooks et.al. |
2503.22658 |
null |
2025-03-28 |
Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model |
Jangho Park et.al. |
2503.22622 |
null |
2025-03-28 |
EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation |
Hadrien Reynaud et.al. |
2503.22357 |
null |
2025-03-28 |
Meta-LoRA: Meta-Learning LoRA Components for Domain-Aware ID Personalization |
Barış Batuhan Topal et.al. |
2503.22352 |
null |
2025-03-28 |
CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving |
Yishen Ji et.al. |
2503.22231 |
null |
2025-03-28 |
Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces |
Wonhyeok Choi et.al. |
2503.22209 |
null |
2025-03-28 |
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation |
Yunhong Min et.al. |
2503.22194 |
null |
2025-03-28 |
Sell It Before You Make It: Revolutionizing E-Commerce with Personalized AI-Generated Items |
Jianghao Lin et.al. |
2503.22182 |
null |
2025-03-28 |
An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval |
Min Cao et.al. |
2503.22171 |
link |
2025-03-28 |
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis |
Woojung Han et.al. |
2503.22168 |
null |
2025-03-27 |
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models |
Chi-Pin Huang et.al. |
2503.21781 |
null |
2025-03-27 |
Optimal Stepsize for Diffusion Sampling |
Jianning Pei et.al. |
2503.21774 |
link |
2025-03-27 |
Exploring the Evolution of Physics Cognition in Video Generation: A Survey |
Minghui Lin et.al. |
2503.21765 |
link |
2025-03-27 |
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework |
Qi Qin et.al. |
2503.21758 |
link |
2025-03-27 |
A Unified Framework for Diffusion Bridge Problems: Flow Matching and Schrödinger Matching into One |
Minyoung Kim et.al. |
2503.21756 |
null |
2025-03-27 |
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness |
Dian Zheng et.al. |
2503.21755 |
link |
2025-03-27 |
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning |
Aniket Didolkar et.al. |
2503.21747 |
null |
2025-03-27 |
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models |
Yuhan Zhang et.al. |
2503.21745 |
null |
2025-03-27 |
Audio-driven Gesture Generation via Deviation Feature in the Latent Space |
Jiahui Chen et.al. |
2503.21616 |
null |
2025-03-27 |
Zero-Shot Visual Concept Blending Without Text Guidance |
Hiroya Makino et.al. |
2503.21277 |
link |
2025-03-26 |
High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching |
Guoqiang Zhang et.al. |
2503.20744 |
null |
2025-03-26 |
RecTable: Fast Modeling Tabular Data with Rectified Flow |
Masane Fuchi et.al. |
2503.20731 |
link |
2025-03-26 |
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation |
Yuyang Peng et.al. |
2503.20672 |
null |
2025-03-26 |
AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports |
Xiangwen Zhang et.al. |
2503.20654 |
null |
2025-03-26 |
MMGen: Unified Multi-modal Image Generation and Understanding in One Go |
Jiepeng Wang et.al. |
2503.20644 |
null |
2025-03-26 |
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving |
Lloyd Russell et.al. |
2503.20523 |
null |
2025-03-26 |
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization |
Jiale Cheng et.al. |
2503.20491 |
link |
2025-03-26 |
Wan: Open and Advanced Large-Scale Video Generative Models |
WanTeam et.al. |
2503.20314 |
link |
2025-03-26 |
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models |
Prin Phunyaphibarn et.al. |
2503.20240 |
null |
2025-03-26 |
Video Motion Graphs |
Haiyang Liu et.al. |
2503.20218 |
null |
2025-03-25 |
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention |
Xuan Ju et.al. |
2503.19907 |
null |
2025-03-25 |
Scaling Down Text Encoders of Text-to-Image Diffusion Models |
Lifu Wang et.al. |
2503.19897 |
link |
2025-03-25 |
Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation |
Tianhao Qi et.al. |
2503.19881 |
null |
2025-03-25 |
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers |
Jiazhi Guan et.al. |
2503.19824 |
null |
2025-03-25 |
SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation |
Jingdan Kang et.al. |
2503.19791 |
link |
2025-03-25 |
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models |
Kartik Thakral et.al. |
2503.19783 |
null |
2025-03-25 |
PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models |
Junhyuk So et.al. |
2503.19731 |
null |
2025-03-25 |
VectorFit: Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models |
Suhas G Hegde et.al. |
2503.19530 |
null |
2025-03-25 |
Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage |
Zhengwentai Sun et.al. |
2503.19486 |
null |
2025-03-25 |
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset |
Haiyu Zhang et.al. |
2503.19462 |
null |
2025-03-25 |
Aether: Geometric-Aware Unified World Modeling |
Aether Team et.al. |
2503.18945 |
null |
2025-03-24 |
Video-T1: Test-Time Scaling for Video Generation |
Fangfu Liu et.al. |
2503.18942 |
null |
2025-03-24 |
Training-free Diffusion Acceleration with Bottleneck Sampling |
Ye Tian et.al. |
2503.18940 |
null |
2025-03-24 |
SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection |
Shrikant Malviya et.al. |
2503.18812 |
link |
2025-03-24 |
Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation |
Qin Wang et.al. |
2503.18753 |
null |
2025-03-24 |
Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings |
Cong Liu et.al. |
2503.18719 |
null |
2025-03-25 |
AMD-Hummingbird: Towards an Efficient Text-to-Video Model |
Takashi Isobe et.al. |
2503.18559 |
link |
2025-03-24 |
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models |
Bin Li et.al. |
2503.18556 |
null |
2025-03-24 |
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation |
Qiang Qu et.al. |
2503.18552 |
null |
2025-03-24 |
Can Text-to-Video Generation help Video-Language Alignment? |
Luca Zanella et.al. |
2503.18507 |
null |
2025-03-21 |
Position: Interactive Generative Video as Next-Generation Game Engine |
Jiwen Yu et.al. |
2503.17359 |
null |
2025-03-21 |
Leveraging Text-to-Image Generation for Handling Spurious Correlation |
Aryan Yazdan Parast et.al. |
2503.17226 |
null |
2025-03-21 |
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens |
Panpan Wang et.al. |
2503.17155 |
null |
2025-03-21 |
Halton Scheduler For Masked Generative Image Transformer |
Victor Besnier et.al. |
2503.17076 |
link |
2025-03-21 |
Zero-Shot Styled Text Image Generation, but Make It Autoregressive |
Vittorio Pippi et.al. |
2503.17074 |
null |
2025-03-21 |
AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process |
Junjie Hu et.al. |
2503.17029 |
null |
2025-03-21 |
Enabling Versatile Controls for Video Diffusion Models |
Xu Zhang et.al. |
2503.16983 |
link |
2025-03-21 |
Multiple Ultrasound Image Generation based on Tuned Alignment of Amplitude Hologram over Spatially non-Uniform Ultrasound Source |
Keisuke Hasegawa et.al. |
2503.16949 |
null |
2025-03-21 |
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model |
Yingying Fan et.al. |
2503.16942 |
null |
2025-03-21 |
When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO |
Lingfan Zhang et.al. |
2503.16921 |
null |
2025-03-20 |
XAttention: Block Sparse Attention with Antidiagonal Scoring |
Ruyi Xu et.al. |
2503.16428 |
link |
2025-03-20 |
Tokenize Image as a Set |
Zigang Geng et.al. |
2503.16425 |
link |
2025-03-20 |
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance |
Quanhao Li et.al. |
2503.16421 |
null |
2025-03-20 |
SynCity: Training-Free Generation of 3D Worlds |
Paul Engstler et.al. |
2503.16420 |
null |
2025-03-20 |
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity |
Liming Jiang et.al. |
2503.16418 |
link |
2025-03-20 |
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness |
SeungJu Cha et.al. |
2503.16406 |
link |
2025-03-20 |
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos |
Haolin Yang et.al. |
2503.16400 |
null |
2025-03-20 |
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images |
Leyang Wang et.al. |
2503.16376 |
null |
2025-03-20 |
Ultra-Resolution Adaptation with Ease |
Ruonan Yu et.al. |
2503.16322 |
link |
2025-03-20 |
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction |
Ziyao Guo et.al. |
2503.16194 |
null |
2025-03-19 |
Di$\mathtt{[M]}$O: Distilling Masked Diffusion Models into One-step Generator |
Yuanzhi Zhu et.al. |
2503.15457 |
null |
2025-03-19 |
Temporal Regularization Makes Your Video Generator Stronger |
Harold Haodong Chen et.al. |
2503.15417 |
null |
2025-03-19 |
Visual Persona: Foundation Model for Full-Body Human Customization |
Jisu Nam et.al. |
2503.15406 |
null |
2025-03-19 |
TruthLens: A Training-Free Paradigm for DeepFake Detection |
Ritabrata Chakraborty et.al. |
2503.15342 |
null |
2025-03-19 |
TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models |
Teng-Fang Hsiao et.al. |
2503.15283 |
null |
2025-03-19 |
LEGION: Learning to Ground and Explain for Synthetic Image Detection |
Hengrui Kang et.al. |
2503.15264 |
null |
2025-03-19 |
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization |
Feifei Li et.al. |
2503.15197 |
null |
2025-03-20 |
VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention |
Mingzhe Zheng et.al. |
2503.15138 |
null |
2025-03-20 |
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis |
Imanol G. Estepa et.al. |
2503.15060 |
null |
2025-03-19 |
FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis |
Yaofei Duan et.al. |
2503.14906 |
null |
2025-03-18 |
MusicInfuser: Making Video Diffusion Listen and Dance |
Susung Hong et.al. |
2503.14505 |
null |
2025-03-18 |
Deeply Supervised Flow-Based Generative Models |
Inkyu Shin et.al. |
2503.14494 |
null |
2025-03-18 |
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers |
Minglei Shi et.al. |
2503.14487 |
null |
2025-03-18 |
ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing |
Yulin Pan et.al. |
2503.14482 |
null |
2025-03-18 |
MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation |
Hongyu Zhang et.al. |
2503.14428 |
null |
2025-03-18 |
Impossible Videos |
Zechen Bai et.al. |
2503.14378 |
null |
2025-03-18 |
RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment |
Chao Wang et.al. |
2503.14358 |
null |
2025-03-18 |
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models |
Yu Cheng et.al. |
2503.14325 |
link |
2025-03-18 |
Free-Lunch Color-Texture Disentanglement for Stylized Image Generation |
Jiang Qin et.al. |
2503.14275 |
null |
2025-03-18 |
Concat-ID: Towards Universal Identity-Preserving Video Synthesis |
Yong Zhong et.al. |
2503.14151 |
null |
2025-03-17 |
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens |
Lijie Fan et.al. |
2503.13436 |
null |
2025-03-17 |
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing |
Yaowei Li et.al. |
2503.13434 |
null |
2025-03-17 |
MAME: Multidimensional Adaptive Metamer Exploration with Human Perceptual Feedback |
Mina Kamao et.al. |
2503.13212 |
null |
2025-03-17 |
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation |
Yihong Luo et.al. |
2503.13070 |
null |
2025-03-17 |
Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction |
Zheyuan Liu et.al. |
2503.12953 |
null |
2025-03-17 |
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode |
Junjia Huang et.al. |
2503.12838 |
null |
2025-03-17 |
AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations |
Quang Trung Truong et.al. |
2503.12828 |
null |
2025-03-17 |
GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching |
Feng Qiao et.al. |
2503.12720 |
link |
2025-03-16 |
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing |
Tsu-Jui Fu et.al. |
2503.12652 |
null |
2025-03-16 |
Personalize Anything for Free with Diffusion Transformer |
Haoran Feng et.al. |
2503.12590 |
null |
2025-03-14 |
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video |
Jianhong Bai et.al. |
2503.11647 |
null |
2025-03-14 |
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models |
Ziqin Zhou et.al. |
2503.11513 |
null |
2025-03-14 |
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation |
Seyed Mohammad Hadi Hosseini et.al. |
2503.11481 |
null |
2025-03-14 |
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation |
Hongxiang Zhao et.al. |
2503.11423 |
null |
2025-03-14 |
Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking |
Ziyi Wang et.al. |
2503.11324 |
null |
2025-03-14 |
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model |
Haoyang Huang et.al. |
2503.11251 |
link |
2025-03-14 |
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards |
Zijing Hu et.al. |
2503.11240 |
link |
2025-03-14 |
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption |
Du Chen et.al. |
2503.11221 |
null |
2025-03-14 |
Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation |
Fengchen He et.al. |
2503.11213 |
link |
2025-03-14 |
Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models |
Shree Singhi et.al. |
2503.11195 |
null |
2025-03-13 |
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing |
Rongyao Fang et.al. |
2503.10639 |
link |
2025-03-13 |
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation |
Chen Chen et.al. |
2503.10618 |
null |
2025-03-13 |
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models |
Hao He et.al. |
2503.10592 |
null |
2025-03-13 |
Long Context Tuning for Video Generation |
Yuwei Guo et.al. |
2503.10589 |
null |
2025-03-13 |
Autoregressive Image Generation with Randomized Parallel Decoding |
Haopeng Li et.al. |
2503.10568 |
link |
2025-03-13 |
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models |
Yijing Lin et.al. |
2503.10406 |
null |
2025-03-13 |
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance |
Yufan Deng et.al. |
2503.10391 |
null |
2025-03-13 |
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation |
Zirun Guo et.al. |
2503.10358 |
null |
2025-03-13 |
Do I look like a cat.n.01 to you? A Taxonomy Image Generation Benchmark |
Viktor Moskvoretskii et.al. |
2503.10357 |
null |
2025-03-13 |
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment |
Hao Zhou et.al. |
2503.10287 |
null |
2025-03-12 |
PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop |
Chenyu Li et.al. |
2503.09595 |
link |
2025-03-12 |
FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model |
Jiahao Xia et.al. |
2503.09560 |
null |
2025-03-12 |
PromptMap: An Alternative Interaction Style for AI-Based Image Generation |
Krzysztof Adamkiewicz et.al. |
2503.09436 |
link |
2025-03-12 |
LHC Triggers using FPGA Image Recognition |
James Brooke et.al. |
2503.09428 |
null |
2025-03-12 |
Unified Dense Prediction of Video Diffusion |
Lehan Yang et.al. |
2503.09344 |
null |
2025-03-12 |
Revealing the Implicit Noise-based Imprint of Generative Models |
Xinghan Li et.al. |
2503.09314 |
null |
2025-03-12 |
Revealing Unintentional Information Leakage in Low-Dimensional Facial Portrait Representations |
Kathleen Anderson et.al. |
2503.09306 |
link |
2025-03-12 |
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer |
Haoxuan Wang et.al. |
2503.09277 |
null |
2025-03-12 |
NAMI: Efficient Image Generation via Progressive Rectified Flow Transformers |
Yuhang Ma et.al. |
2503.09242 |
null |
2025-03-12 |
Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets |
Hannah Kniesel et.al. |
2503.09221 |
null |
2025-03-11 |
GarmentCrafter: Progressive Novel View Synthesis for Single-View 3D Garment Reconstruction and Editing |
Yuanhao Wang et.al. |
2503.08678 |
null |
2025-03-11 |
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder |
Yitian Zhang et.al. |
2503.08665 |
null |
2025-03-11 |
Generating Robot Constitutions & Benchmarks for Semantic Safety |
Pierre Sermanet et.al. |
2503.08663 |
null |
2025-03-11 |
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization |
Xianfeng Wu et.al. |
2503.08619 |
link |
2025-03-11 |
Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling |
Subin Kim et.al. |
2503.08605 |
null |
2025-03-11 |
Generalizable AI-Generated Image Detection Based on Fractal Self-Similarity in the Spectrum |
Shengpeng Xiao et.al. |
2503.08484 |
null |
2025-03-12 |
Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens |
Qingsong Xie et.al. |
2503.08377 |
null |
2025-03-11 |
Robust Latent Matters: Boosting Image Generation with Sampling Error |
Kai Qiu et.al. |
2503.08354 |
link |
2025-03-12 |
$^R$FLAV: Rolling Flow matching for infinite Audio Video generation |
Alex Ergasti et.al. |
2503.08307 |
link |
2025-03-11 |
OminiControl2: Efficient Conditioning for Diffusion Transformers |
Zhenxiong Tan et.al. |
2503.08280 |
link |
2025-03-10 |
V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation |
Guiwei Zhang et.al. |
2503.07493 |
link |
2025-03-10 |
GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models |
Ryugo Morita et.al. |
2503.07463 |
null |
2025-03-10 |
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion |
Mingzhen Sun et.al. |
2503.07418 |
null |
2025-03-10 |
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models |
Ruidong Chen et.al. |
2503.07389 |
link |
2025-03-10 |
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment |
Xing Xie et.al. |
2503.07334 |
link |
2025-03-10 |
Automated Movie Generation via Multi-Agent CoT Planning |
Weijia Wu et.al. |
2503.07314 |
link |
2025-03-10 |
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation |
Yuwei Niu et.al. |
2503.07265 |
link |
2025-03-10 |
Effective and Efficient Masked Image Generation Models |
Zebin You et.al. |
2503.07197 |
link |
2025-03-10 |
NFIG: Autoregressive Image Generation with Next-Frequency Prediction |
Zhihao Huang et.al. |
2503.07076 |
null |
2025-03-10 |
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation |
Victor Shea-Jay Huang et.al. |
2503.07050 |
null |
2025-03-07 |
Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models |
Zheng Li et.al. |
2503.05595 |
link |
2025-03-07 |
Frequency Autoregressive Image Generation with Continuous Tokens |
Hu Yu et.al. |
2503.05305 |
null |
2025-03-07 |
MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio |
Xuenan Xu et.al. |
2503.05242 |
link |
2025-03-07 |
Unified Reward Model for Multimodal Understanding and Generation |
Yibin Wang et.al. |
2503.05236 |
null |
2025-03-07 |
RecipeGen: A Benchmark for Real-World Recipe Image Generation |
Ruoxuan Zhang et.al. |
2503.05228 |
null |
2025-03-07 |
Development and Enhancement of Text-to-Image Diffusion Models |
Rajdeep Roshan Sahu et.al. |
2503.05149 |
null |
2025-03-06 |
Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation |
Alexey Buzovkin et.al. |
2503.04871 |
link |
2025-03-06 |
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video |
Yue Gao et.al. |
2503.04720 |
null |
2025-03-06 |
What Are You Doing? A Closer Look at Controllable Human Video Generation |
Emanuele Bugliarello et.al. |
2503.04666 |
null |
2025-03-08 |
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation |
Aoxiong Yin et.al. |
2503.04606 |
link |
2025-03-06 |
S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting |
Yecong Wan et.al. |
2503.04314 |
null |
2025-03-06 |
Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models |
Rui Jiang et.al. |
2503.04215 |
null |
2025-03-06 |
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning |
Zhong Ji et.al. |
2503.04050 |
null |
2025-03-06 |
DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation |
Amin Karimi et.al. |
2503.04006 |
null |
2025-03-05 |
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control |
Xuanchi Ren et.al. |
2503.03751 |
link |
2025-03-05 |
Rethinking Video Tokenization: A Conditioned Diffusion-based Approach |
Nianzu Yang et.al. |
2503.03708 |
link |
2025-03-05 |
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance |
Zhao Yang et.al. |
2503.03689 |
link |
2025-03-05 |
A Generative Approach to High Fidelity 3D Reconstruction from Text Data |
Venkat Kumar R et.al. |
2503.03664 |
null |
2025-03-05 |
High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights |
Yuna Kato et.al. |
2503.03558 |
link |
2025-03-05 |
Video Super-Resolution: All You Need is a Video Diffusion Model |
Zhihao Zhan et.al. |
2503.03355 |
null |
2025-03-05 |
GenColor: Generative Color-Concept Association in Visual Design |
Yihan Hou et.al. |
2503.03236 |
null |
2025-03-05 |
An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models |
Binxu Wang et.al. |
2503.03206 |
null |
2025-03-05 |
Find Matching Faces Based On Face Parameters |
Setu A. Bhatt et.al. |
2503.03204 |
null |
2025-03-05 |
From Architectural Sketch to Conceptual Representation: Using Structure-Aware Diffusion Model to Generate Renderings of School Buildings |
Zhengyang Wang et.al. |
2503.03090 |
null |
2025-03-04 |
ARINAR: Bi-Level Autoregressive Feature-by-Feature Generative Models |
Qinyu Zhao et.al. |
2503.02883 |
link |
2025-03-04 |
Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts |
Marta Skreta et.al. |
2503.02819 |
link |
2025-03-04 |
Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution |
Ru Ito et.al. |
2503.02767 |
null |
2025-03-04 |
Generative Modeling of Microweather Wind Velocities for Urban Air Mobility |
Tristan A. Shah et.al. |
2503.02690 |
link |
2025-03-04 |
SPG: Improving Motion Diffusion by Smooth Perturbation Guidance |
Boseong Jeon et.al. |
2503.02577 |
null |
2025-03-04 |
PVTree: Realistic and Controllable Palm Vein Generation for Recognition Tasks |
Sheng Shang et.al. |
2503.02547 |
null |
2025-03-04 |
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification |
Zhen Yang et.al. |
2503.02537 |
null |
2025-03-04 |
Q&C: When Quantization Meets Cache in Efficient Image Generation |
Xin Ding et.al. |
2503.02508 |
null |
2025-03-04 |
Teaching Metric Distance to Autoregressive Multimodal Foundational Models |
Jiwan Chung et.al. |
2503.02379 |
null |
2025-03-04 |
GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning |
Zhun Mou et.al. |
2503.02341 |
null |
2025-02-28 |
How far can we go with ImageNet for Text-to-Image generation? |
L. Degeorge et.al. |
2502.21318 |
null |
2025-02-28 |
Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos |
Zhiyu Tan et.al. |
2502.21314 |
null |
2025-03-03 |
MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing |
Xueyun Tian et.al. |
2502.21291 |
link |
2025-02-28 |
A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images |
Zineb Sordo et.al. |
2502.21151 |
null |
2025-02-28 |
Training-free and Adaptive Sparse Attention for Efficient Long Video Generation |
Yifei Xia et.al. |
2502.21079 |
null |
2025-02-28 |
Synthesizing Individualized Aging Brains in Health and Disease with Generative Models and Parallel Transport |
Jingru Fu et.al. |
2502.21049 |
link |
2025-02-28 |
DiffBrush: Just Painting the Art by Your Hands |
Jiaming Chu et.al. |
2502.20904 |
null |
2025-02-28 |
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models |
Xiao Wang et.al. |
2502.20811 |
null |
2025-02-28 |
WorldModelBench: Judging Video Generation Models As World Models |
Dacheng Li et.al. |
2502.20694 |
null |
2025-02-28 |
Diffusion Restoration Adapter for Real-World Image Restoration |
Hanbang Liang et.al. |
2502.20679 |
null |
2025-02-27 |
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction |
Siyu Jiao et.al. |
2502.20313 |
link |
2025-02-27 |
Mobius: Text to Seamless Looping Video Generation via Latent Shift |
Xiuli Bi et.al. |
2502.20307 |
link |
2025-02-27 |
Attention Distillation: A Unified Approach to Visual Characteristics Transfer |
Yang Zhou et.al. |
2502.20235 |
link |
2025-02-27 |
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think |
Liang Chen et.al. |
2502.20172 |
link |
2025-02-27 |
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute |
Sotiris Anagnostidis et.al. |
2502.20126 |
null |
2025-02-27 |
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration |
Xuzheng Yang et.al. |
2502.20104 |
null |
2025-02-27 |
C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation |
Yuhao Li et.al. |
2502.19868 |
link |
2025-02-27 |
Analyzing CLIP’s Performance Limitations in Multi-Object Scenarios: A Controlled High-Resolution Study |
Reza Abbasi et.al. |
2502.19828 |
null |
2025-02-27 |
MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery |
Lianping Yang et.al. |
2502.19797 |
null |
2025-02-27 |
The erasure of intensive livestock farming in text-to-image generative AI |
Kehan Sheng et.al. |
2502.19771 |
link |
2025-02-26 |
Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making |
Soobin Park et.al. |
2502.18853 |
null |
2025-02-26 |
Optimal Stochastic Trace Estimation in Generative Modeling |
Xinyang Liu et.al. |
2502.18808 |
null |
2025-02-26 |
AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools |
Nathalie Riche et.al. |
2502.18736 |
null |
2025-02-25 |
Investigating Youth AI Auditing |
Jaemarie Solyst et.al. |
2502.18576 |
null |
2025-02-25 |
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation |
Yifan Pu et.al. |
2502.18364 |
null |
2025-02-25 |
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation |
Pengzhi Li et.al. |
2502.18302 |
null |
2025-02-25 |
Training Consistency Models with Variational Noise Coupling |
Gianluigi Silvestri et.al. |
2502.18197 |
link |
2025-02-25 |
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference |
Jintao Zhang et.al. |
2502.18137 |
link |
2025-02-26 |
Bayesian Optimization for Controlled Image Editing via LLMs |
Chengkun Cai et.al. |
2502.18116 |
null |
2025-02-25 |
Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models |
Jia Yu et.al. |
2502.17951 |
link |
2025-02-25 |
ASurvey: Spatiotemporal Consistency in Video Generation |
Zhiyu Yin et.al. |
2502.17863 |
null |
2025-02-25 |
FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks |
Tanawan Premsri et.al. |
2502.17775 |
link |
2025-02-25 |
Fractal Generative Models |
Tianhong Li et.al. |
2502.17437 |
link |
2025-02-24 |
X-Dancer: Expressive Music to Human Dance Video Generation |
Zeyuan Chen et.al. |
2502.17414 |
null |
2025-02-24 |
RELICT: A Replica Detection Framework for Medical Image Generation |
Orhun Utku Aydin et.al. |
2502.17360 |
link |
2025-02-24 |
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing |
Xiangpeng Yang et.al. |
2502.17258 |
null |
2025-02-24 |
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks |
Canyu Zhao et.al. |
2502.17157 |
link |
2025-02-24 |
Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions |
Zhong Li et.al. |
2502.17119 |
link |
2025-02-24 |
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence |
Wenzhe Yin et.al. |
2502.17028 |
null |
2025-02-24 |
Autoregressive Image Generation Guided by Chains of Thought |
Miaomiao Cai et.al. |
2502.16965 |
null |
2025-02-24 |
Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment |
Suchae Jeong et.al. |
2502.16902 |
null |
2025-02-24 |
A Survey of fMRI to Image Reconstruction |
Weiyu Guo et.al. |
2502.16861 |
null |
2025-02-21 |
One-step Diffusion Models with $f$-Divergence Distribution Matching |
Yilun Xu et.al. |
2502.15681 |
null |
2025-02-21 |
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling |
Florent Bartoccioni et.al. |
2502.15672 |
link |
2025-02-21 |
Soybean pod and seed counting in both outdoor fields and indoor laboratories using unions of deep neural networks |
Tianyou Jiang et.al. |
2502.15286 |
null |
2025-02-21 |
Unsettling the Hegemony of Intention: Agonistic Image Generation |
Andre Ye et.al. |
2502.15242 |
null |
2025-02-21 |
FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation |
Young Beom Woo et.al. |
2502.15203 |
null |
2025-02-21 |
Methods and Trends in Detecting Generated Images: A Comprehensive Review |
Arpan Mahara et.al. |
2502.15176 |
null |
2025-02-20 |
Hardware-Friendly Static Quantization Method for Video Diffusion Transformers |
Sanghyun Yi et.al. |
2502.15077 |
null |
2025-02-20 |
Generative Modeling of Individual Behavior at Scale |
Nabil Omi et.al. |
2502.14998 |
null |
2025-02-20 |
LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection |
Qingyuan Liu et.al. |
2502.14994 |
null |
2025-02-20 |
Improving the Diffusability of Autoencoders |
Ivan Skorokhodov et.al. |
2502.14831 |
null |
2025-02-20 |
DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models |
Hongji Yang et.al. |
2502.14779 |
null |
2025-02-20 |
AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers |
Wen-Fan Wang et.al. |
2502.14747 |
null |
2025-02-20 |
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers |
Ke Cao et.al. |
2502.14377 |
null |
2025-02-20 |
Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation |
Jiayu Yang et.al. |
2502.14247 |
link |
2025-02-20 |
Designing Parameter and Compute Efficient Diffusion Transformers using Distillation |
Vignesh Sundaresha et.al. |
2502.14226 |
null |
2025-02-19 |
d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining |
Prasun Roy et.al. |
2502.14007 |
link |
2025-02-19 |
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length |
Roman Bachmann et.al. |
2502.13967 |
null |
2025-02-19 |
IP-Composer: Semantic Composition of Visual Concepts |
Sara Dorfman et.al. |
2502.13951 |
null |
2025-02-19 |
MagicGeo: Training-Free Text-Guided Geometric Diagram Generation |
Junxiao Wang et.al. |
2502.13855 |
null |
2025-02-19 |
Flow-based generative models as iterative algorithms in probability space |
Yao Xie et.al. |
2502.13394 |
null |
2025-02-18 |
Breaking the bonds of generative artificial intelligence by minimizing the maximum entropy |
Mattia Miotto et.al. |
2502.13287 |
null |
2025-02-18 |
Personalized Image Generation with Deep Generative Models: A Decade Survey |
Yuxiang Wei et.al. |
2502.13081 |
link |
2025-02-19 |
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation |
Junchen Fu et.al. |
2502.12945 |
null |
2025-02-18 |
Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options |
Lakshmi Nair et.al. |
2502.12929 |
link |
2025-02-18 |
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation |
Xinlong Chen et.al. |
2502.12782 |
link |
2025-02-18 |
3D Shape-to-Image Brownian Bridge Diffusion for Brain MRI Synthesis from Cortical Surfaces |
Fabian Bongratz et.al. |
2502.12742 |
null |
2025-02-18 |
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation |
Sihyun Yu et.al. |
2502.12632 |
null |
2025-02-18 |
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation |
Minghao Fu et.al. |
2502.12579 |
link |
2025-02-18 |
DeltaDiff: A Residual-Guided Diffusion Model for Enhanced Image Super-Resolution |
Chao Yang et.al. |
2502.12567 |
null |
2025-02-17 |
LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities |
Florian Sestak et.al. |
2502.12128 |
link |
2025-02-17 |
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond |
Shreya Shukla et.al. |
2502.12048 |
null |
2025-02-17 |
Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images |
Negar Kamali et.al. |
2502.11989 |
link |
2025-02-17 |
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs |
Yi Fang et.al. |
2502.11925 |
null |
2025-02-17 |
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation |
Zhihang Yuan et.al. |
2502.11897 |
link |
2025-02-17 |
Object-Centric Image to Video Generation with Language Guidance |
Angel Villar-Corrales et.al. |
2502.11655 |
null |
2025-02-17 |
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation |
Taeyoung Yun et.al. |
2502.11477 |
link |
2025-02-16 |
MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation |
Michael Fuest et.al. |
2502.11234 |
null |
2025-02-16 |
Phantom: Subject-consistent video generation via cross-modal alignment |
Lijie Liu et.al. |
2502.11079 |
null |
2025-02-15 |
Hybrid Deepfake Image Detection: A Comprehensive Dataset-Driven Approach Integrating Convolutional and Attention Mechanisms with Frequency Domain Features |
Kafi Anan et.al. |
2502.10682 |
null |
2025-02-14 |
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model |
Guoqing Ma et.al. |
2502.10248 |
link |
2025-02-14 |
RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control |
Teng Li et.al. |
2502.10059 |
null |
2025-02-14 |
ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation |
Yuxin He et.al. |
2502.10028 |
null |
2025-02-13 |
CellFlow: Simulating Cellular Morphology Changes via Flow Matching |
Yuhui Zhang et.al. |
2502.09775 |
null |
2025-02-13 |
Designing a Conditional Prior Distribution for Flow-Based Generative Models |
Noam Issachar et.al. |
2502.09611 |
null |
2025-02-13 |
Redistribute Ensemble Training for Mitigating Memorization in Diffusion Models |
Xiaoliu Guan et.al. |
2502.09434 |
link |
2025-02-13 |
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation |
Rotem Shalev-Arkushin et.al. |
2502.09411 |
null |
2025-02-13 |
When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models |
Samuel Joseph Amouyal et.al. |
2502.09307 |
null |
2025-02-14 |
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation |
Hongyin Zhang et.al. |
2502.09268 |
null |
2025-02-13 |
Sequential Covariance Fitting for InSAR Phase Linking |
Dana El Hajjar et.al. |
2502.09248 |
null |
2025-02-13 |
Dynamic watermarks in images generated by diffusion models |
Yunzhuo Chen et.al. |
2502.08927 |
null |
2025-02-13 |
Detecting Malicious Concepts Without Image Generation in AIGC |
Kun Xu et.al. |
2502.08921 |
null |
2025-02-12 |
HistoSmith: Single-Stage Histology Image-Label Generation via Conditional Latent Diffusion for Enhanced Cell Segmentation and Classification |
Valentina Vadori et.al. |
2502.08754 |
link |
2025-02-12 |
CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation |
Qinghe Wang et.al. |
2502.08639 |
null |
2025-02-12 |
Enhancing Diffusion Models Efficiency by Disentangling Total-Variance and Signal-to-Noise Ratio |
Khaled Kahouli et.al. |
2502.08598 |
link |
2025-02-12 |
Ultrasound Image Generation using Latent Diffusion Models |
Benoit Freiche et.al. |
2502.08580 |
null |
2025-02-12 |
BCDDM: Branch-Corrected Denoising Diffusion Model for Black Hole Image Generation |
Ao Liu et.al. |
2502.08528 |
null |
2025-02-12 |
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis |
Wonjoon Jin et.al. |
2502.08244 |
null |
2025-02-12 |
Learning Human Skill Generators at Key-Step Levels |
Yilu Wu et.al. |
2502.08234 |
null |
2025-02-12 |
AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance |
Zhao Wang et.al. |
2502.08189 |
null |
2025-02-12 |
PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation |
Ziyan Wang et.al. |
2502.08106 |
null |
2025-02-12 |
ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation |
Qianrui Teng et.al. |
2502.08097 |
null |
2025-02-11 |
Training-Free Safe Denoisers for Safe Use of Diffusion Models |
Mingyu Kim et.al. |
2502.08011 |
null |
2025-02-11 |
Direct Ascent Synthesis: Revealing Hidden Generative Capabilities in Discriminative Models |
Stanislav Fort et.al. |
2502.07753 |
null |
2025-02-11 |
CausalGeD: Blending Causality and Diffusion for Spatial Gene Expression Generation |
Rabeya Tus Sadia et.al. |
2502.07751 |
null |
2025-02-11 |
Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling |
Shuhuai Ren et.al. |
2502.07737 |
null |
2025-02-11 |
Magic 1-For-1: Generating One Minute Video Clips within One Minute |
Hongwei Yi et.al. |
2502.07701 |
link |
2025-02-11 |
SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches |
Haichuan Lin et.al. |
2502.07556 |
link |
2025-02-11 |
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation |
Sixiao Zheng et.al. |
2502.07531 |
null |
2025-02-11 |
Enhance-A-Video: Better Generated Video for Free |
Yang Luo et.al. |
2502.07508 |
link |
2025-02-11 |
RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation |
Viacheslav Vasilev et.al. |
2502.07455 |
link |
2025-02-11 |
Optimizing Knowledge Distillation in Transformers: Enabling Multi-Head Attention without Alignment Barriers |
Zhaodong Bing et.al. |
2502.07436 |
null |
2025-02-11 |
Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos |
Haowen Gao et.al. |
2502.07327 |
null |
2025-02-10 |
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT |
Dongyang Liu et.al. |
2502.06782 |
null |
2025-02-10 |
History-Guided Video Diffusion |
Kiwhan Song et.al. |
2502.06764 |
null |
2025-02-10 |
Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists |
Bojia Zi et.al. |
2502.06734 |
null |
2025-02-10 |
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models |
Yangguang Li et.al. |
2502.06608 |
link |
2025-02-10 |
A Large-scale AI-generated Image Inpainting Benchmark |
Paschalis Giakoumoglou et.al. |
2502.06593 |
null |
2025-02-10 |
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers |
D. She et.al. |
2502.06527 |
null |
2025-02-10 |
Universal Approximation of Visual Autoregressive Transformers |
Yifang Chen et.al. |
2502.06167 |
null |
2025-02-10 |
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile |
Hangliang Ding et.al. |
2502.06155 |
null |
2025-02-10 |
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models |
Ce Zhang et.al. |
2502.06130 |
link |
2025-02-09 |
Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization |
Jiajun Fan et.al. |
2502.06061 |
null |
2025-02-07 |
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation |
Shilong Zhang et.al. |
2502.05179 |
link |
2025-02-07 |
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation |
Yue Zhao et.al. |
2502.05178 |
null |
2025-02-07 |
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment |
Minh-Quan Le et.al. |
2502.05153 |
null |
2025-02-07 |
C2GM: Cascading Conditional Generation of Multi-scale Maps from Remote Sensing Images Constrained by Geographic Features |
Chenxing Sun et.al. |
2502.04991 |
null |
2025-02-07 |
Cached Multi-Lora Composition for Multi-Concept Image Generation |
Xiandong Zou et.al. |
2502.04923 |
link |
2025-02-07 |
Goku: Flow Based Video Generative Foundation Models |
Shoufa Chen et.al. |
2502.04896 |
null |
2025-02-07 |
HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation |
Qijun Gan et.al. |
2502.04847 |
null |
2025-02-07 |
G2PDiffusion: Genotype-to-Phenotype Prediction with Diffusion Models |
Mengdi Liu et.al. |
2502.04684 |
null |
2025-02-06 |
Fast Video Generation with Sliding Tile Attention |
Peiyuan Zhang et.al. |
2502.04507 |
null |
2025-02-06 |
Augmented Conditioning Is Enough For Effective Training Image Generation |
Jiahui Chen et.al. |
2502.04475 |
null |
2025-02-06 |
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation |
Yiming Huang et.al. |
2502.04308 |
link |
2025-02-06 |
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation |
Jinbo Xing et.al. |
2502.04299 |
null |
2025-02-06 |
Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression |
Lirui Wang et.al. |
2502.04296 |
null |
2025-02-06 |
Realistic Image-to-Image Machine Unlearning via Decoupling and Knowledge Retention |
Ayush K. Varshney et.al. |
2502.04260 |
null |
2025-02-06 |
Multi-fidelity emulator for large-scale 21 cm lightcone images: a few-shot transfer learning approach with generative adversarial network |
Kangning Diao et.al. |
2502.04246 |
null |
2025-02-06 |
Generative Adversarial Networks Bridging Art and Machine Intelligence |
Junhao Song et.al. |
2502.04116 |
null |
2025-02-06 |
Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency |
Shangkun Sun et.al. |
2502.04076 |
link |
2025-02-06 |
UniForm: A Unified Diffusion Transformer for Audio-Video Generation |
Lei Zhao et.al. |
2502.03897 |
null |
2025-02-06 |
FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing |
Jinya Sakurai et.al. |
2502.03826 |
null |
2025-02-06 |
DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models |
Lingshun Kong et.al. |
2502.03810 |
null |
2025-02-05 |
On Fairness of Unified Multimodal Large Language Model for Image Generation |
Ming Liu et.al. |
2502.03429 |
null |
2025-02-05 |
TruePose: Human-Parsing-guided Attention Diffusion for Full-ID Preserving Pose Transfer |
Zhihong Xu et.al. |
2502.03426 |
null |
2025-02-05 |
Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation |
Alexey A. Novikov et.al. |
2502.03420 |
null |
2025-02-05 |
MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent |
Xinyao Liao et.al. |
2502.03207 |
null |
2025-02-05 |
Poisson Flow Joint Model for Multiphase contrast-enhanced CT |
Rongjun Ge et.al. |
2502.03079 |
null |
2025-02-05 |
A Survey of Sample-Efficient Deep Learning for Change Detection in Remote Sensing: Tasks, Strategies, and Challenges |
Lei Ding et.al. |
2502.02835 |
null |
2025-02-04 |
When are Diffusion Priors Helpful in Sparse Reconstruction? A Study with Sparse-view CT |
Matt Y. Cheung et.al. |
2502.02771 |
null |
2025-02-04 |
Controllable Video Generation with Provable Disentanglement |
Yifan Shen et.al. |
2502.02690 |
null |
2025-02-04 |
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models |
Hila Chefer et.al. |
2502.02492 |
null |
2025-02-04 |
On the Guidance of Flow Matching |
Ruiqi Feng et.al. |
2502.02150 |
link |
2025-02-04 |
IPO: Iterative Preference Optimization for Text-to-Video Generation |
Xiaomeng Yang et.al. |
2502.02088 |
null |
2025-02-03 |
VILP: Imitation Learning with Latent Video Planning |
Zhengtong Xu et.al. |
2502.01784 |
link |
2025-02-03 |
Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity |
Haocheng Xi et.al. |
2502.01776 |
null |
2025-02-03 |
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation |
Haibo Tong et.al. |
2502.01719 |
null |
2025-02-03 |
MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation |
Yiren Song et.al. |
2502.01572 |
null |
2025-02-03 |
Improved Training Technique for Latent Consistency Models |
Quan Dao et.al. |
2502.01441 |
link |
2025-02-03 |
Assessing the use of Diffusion models for motion artifact correction in brain MRI |
Paolo Angella et.al. |
2502.01418 |
null |
2025-02-04 |
Compressed Image Generation with Denoising Diffusion Codebook Models |
Guy Ohayon et.al. |
2502.01189 |
null |
2025-01-31 |
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search |
Yuta Oshima et.al. |
2501.19252 |
null |
2025-01-31 |
Ambient Denoising Diffusion Generative Adversarial Networks for Establishing Stochastic Object Models from Noisy Image Data |
Xichen Xu et.al. |
2501.19094 |
null |
2025-01-31 |
Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations |
Dahye Kim et.al. |
2501.19066 |
link |
2025-01-31 |
BCAT: A Block Causal Transformer for PDE Foundation Models for Fluid Dynamics |
Yuxuan Liu et.al. |
2501.18972 |
null |
2025-01-31 |
Distorting Embedding Space for Safety: A Defense Mechanism for Adversarially Robust Diffusion Models |
Jaesin Ahn et.al. |
2501.18877 |
link |
2025-01-31 |
REG: Rectified Gradient Guidance for Conditional Diffusion Models |
Zhengqi Gao et.al. |
2501.18865 |
null |
2025-01-30 |
Every Image Listens, Every Image Dances: Music-Driven Image Animation |
Zhikang Dong et.al. |
2501.18801 |
null |
2025-01-30 |
High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2 |
Nandakishor M et.al. |
2501.18670 |
null |
2025-01-30 |
Diffusion Autoencoders are Scalable Image Tokenizers |
Yinbo Chen et.al. |
2501.18593 |
null |
2025-01-30 |
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer |
Enze Xie et.al. |
2501.18427 |
link |
2025-01-30 |
Simulation of microstructures and machine learning |
Katja Schladitz et.al. |
2501.18313 |
null |
2025-01-30 |
LLMs can see and hear without any training |
Kumar Ashutosh et.al. |
2501.18096 |
link |
2025-01-29 |
Generative AI for Vision: A Comprehensive Study of Frameworks and Applications |
Fouad Bousetouane et.al. |
2501.18033 |
null |
2025-01-29 |
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling |
Xiaokang Chen et.al. |
2501.17811 |
link |
2025-01-29 |
A Framework for Generating Realistic Synthetic Tabular Data in a Randomized Controlled Trial Setting |
Niki Z. Petrakos et.al. |
2501.17719 |
null |
2025-01-29 |
Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment |
Zixue Zeng et.al. |
2501.17690 |
link |
2025-01-28 |
Text-to-Image Generation for Vocabulary Learning Using the Keyword Method |
Nuwan T. Attygalle et.al. |
2501.17099 |
null |
2025-01-28 |
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation |
Chenguo Lin et.al. |
2501.16764 |
null |
2025-01-29 |
Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion |
Shengyuan Liu et.al. |
2501.16679 |
link |
2025-01-28 |
Variational Schrödinger Momentum Diffusion |
Kevin Rojas et.al. |
2501.16675 |
null |
2025-01-28 |
CascadeV: An Implementation of Wurstchen Architecture for Video Generation |
Wenfeng Lin et.al. |
2501.16612 |
link |
2025-01-27 |
LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation |
Farzad Farhadzadeh et.al. |
2501.16559 |
null |
2025-01-27 |
RelightVid: Temporal-Consistent Diffusion Model for Video Relighting |
Ye Fang et.al. |
2501.16330 |
null |
2025-01-28 |
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation |
Adil Kaan Akan et.al. |
2501.15878 |
null |
2025-01-27 |
Autonomous Horizon-based Asteroid Navigation With Observability-constrained Maneuvers |
Aditya Arjun Anibha et.al. |
2501.15806 |
null |
2025-01-27 |
Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? |
Yunbo Lyu et.al. |
2501.15775 |
null |
2025-01-26 |
Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting |
Yuxin Zhang et.al. |
2501.15641 |
link |
2025-01-26 |
Comparative clinical evaluation of “memory-efficient” synthetic 3D generative adversarial networks (GAN) head-to-head to state of art: results on computed tomography of the chest |
Mahshid Shiri et.al. |
2501.15572 |
null |
2025-01-26 |
“See What I Imagine, Imagine What I See”: Human-AI Co-Creation System for 360$^\circ$ Panoramic Video Generation in VR |
Yunge Wen et.al. |
2501.15456 |
null |
2025-01-26 |
SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity |
Zichen Fan et.al. |
2501.15448 |
null |
2025-01-26 |
StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces |
Kyeongmin Yeo et.al. |
2501.15445 |
null |
2025-01-25 |
Enhancing Intent Understanding for Ambiguous Prompts through Human-Machine Co-Adaptation |
Yangfan He et.al. |
2501.15167 |
null |
2025-01-24 |
Towards Scalable Topological Regularizers |
Hiu-Tung Wong et.al. |
2501.14641 |
null |
2025-01-24 |
Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.* |
Ludovica Schaerf et.al. |
2501.14524 |
null |
2025-01-24 |
PAID: A Framework of Product-Centric Advertising Image Design |
Hongyu Chen et.al. |
2501.14316 |
null |
2025-01-24 |
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking |
Runyi Hu et.al. |
2501.14195 |
link |
2025-01-23 |
Can We Generate Images with CoT? Let’s Verify and Reinforce Image Generation Step by Step |
Ziyu Guo et.al. |
2501.13926 |
link |
2025-01-23 |
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models |
Jiayi Lei et.al. |
2501.13920 |
null |
2025-01-23 |
Improving Video Generation with Human Feedback |
Jie Liu et.al. |
2501.13918 |
null |
2025-01-23 |
Generating Realistic Forehead-Creases for User Verification via Conditioned Piecewise Polynomial Curves |
Abhishek Tandon et.al. |
2501.13889 |
link |
2025-01-23 |
A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation |
Dario Serez et.al. |
2501.13718 |
null |
2025-01-24 |
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt |
Tao Liu et.al. |
2501.13554 |
link |
2025-01-23 |
EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion |
Jiangchuan Wei et.al. |
2501.13452 |
null |
2025-01-23 |
MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize |
Haohang Xu et.al. |
2501.13349 |
null |
2025-01-23 |
Accelerate High-Quality Diffusion Models with Inner Loop Feedback |
Matthew Gwilliam et.al. |
2501.13107 |
null |
2025-01-22 |
Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation |
Akshay Krishnan et.al. |
2501.13087 |
null |
2025-01-22 |
LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation |
Jiahao Wang et.al. |
2501.12976 |
null |
2025-01-22 |
PreciseCam: Precise Camera Control for Text-to-Image Generation |
Edurne Bernal-Berdun et.al. |
2501.12910 |
null |
2025-01-22 |
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation |
Lijun Li et.al. |
2501.12612 |
link |
2025-01-22 |
GPS as a Control Signal for Image Generation |
Chao Feng et.al. |
2501.12390 |
null |
2025-01-21 |
Taming Teacher Forcing for Masked Autoregressive Video Generation |
Deyu Zhou et.al. |
2501.12389 |
null |
2025-01-21 |
Parallel Sequence Modeling via Generalized Spatial Propagation Network |
Hongjun Wang et.al. |
2501.12381 |
null |
2025-01-22 |
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos |
Sili Chen et.al. |
2501.12375 |
null |
2025-01-21 |
Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists |
Thomas F. Eisenmann et.al. |
2501.12374 |
link |
2025-01-21 |
ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions |
Shiyue Zhang et.al. |
2501.12173 |
link |
2025-01-20 |
Are generative models fair? A study of racial bias in dermatological image generation |
Miguel López-Pérez et.al. |
2501.11752 |
null |
2025-01-20 |
GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video |
Zhenliang Ni et.al. |
2501.11340 |
null |
2025-01-20 |
CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation |
Zheng Chong et.al. |
2501.11325 |
link |
2025-01-20 |
Nested Annealed Training Scheme for Generative Adversarial Networks |
Chang Wan et.al. |
2501.11318 |
null |
2025-01-17 |
DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency |
Xiaohui Li et.al. |
2501.10110 |
null |
2025-01-17 |
DiffuEraser: A Diffusion Model for Video Inpainting |
Xiaowen Li et.al. |
2501.10018 |
link |
2025-01-17 |
RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation |
Yuefan Cao et.al. |
2501.09982 |
null |
2025-01-17 |
Physics-informed DeepCT: Sinogram Wavelet Decomposition Meets Masked Diffusion |
Zekun Zhou et.al. |
2501.09935 |
link |
2025-01-17 |
IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment |
Shangkun Sun et.al. |
2501.09927 |
null |
2025-01-16 |
PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery |
Shristi Das Biswas et.al. |
2501.09826 |
link |
2025-01-16 |
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos |
Zhongwei Ren et.al. |
2501.09781 |
null |
2025-01-16 |
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation |
Philippe Hansen-Estruch et.al. |
2501.09755 |
null |
2025-01-16 |
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps |
Nanye Ma et.al. |
2501.09732 |
null |
2025-01-16 |
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation |
Junjie He et.al. |
2501.09503 |
link |
2025-01-16 |
Dynamic Neural Style Transfer for Artistic Image Generation using VGG19 |
Kapil Kashyap et.al. |
2501.09420 |
null |
2025-01-16 |
SVIA: A Street View Image Anonymization Framework for Self-Driving Applications |
Dongyu Liu et.al. |
2501.09393 |
link |
2025-01-16 |
Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse |
Guangyuan Liu et.al. |
2501.09391 |
null |
2025-01-15 |
Grounding Text-To-Image Diffusion Models For Controlled High-Quality Image Generation |
Ahmad Süleyman et.al. |
2501.09194 |
null |
2025-01-15 |
Generative diffusion model with inverse renormalization group flows |
Kanta Masuki et.al. |
2501.09064 |
link |
2025-01-15 |
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion |
Jingyuan Chen et.al. |
2501.09019 |
null |
2025-01-15 |
How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias |
Tosin Fadahunsi et.al. |
2501.09014 |
link |
2025-01-15 |
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot |
Ruixiang Jiang et.al. |
2501.09012 |
link |
2025-01-15 |
RepVideo: Rethinking Cross-Layer Representation for Video Generation |
Chenyang Si et.al. |
2501.08994 |
null |
2025-01-15 |
Enhanced Multi-Scale Cross-Attention for Person Image Generation |
Hao Tang et.al. |
2501.08900 |
null |
2025-01-15 |
StereoGen: High-quality Stereo Image Generation from a Single Image |
Xianqi Wang et.al. |
2501.08654 |
null |
2025-01-15 |
Joint Learning of Depth and Appearance for Portrait Image Animation |
Xinya Ji et.al. |
2501.08649 |
null |
2025-01-15 |
Watermarking in Diffusion Model: Gaussian Shading with Exact Diffusion Inversion via Coupled Transformations (EDICT) |
Krishna Panthi et.al. |
2501.08604 |
null |
2025-01-15 |
Comprehensive Subjective and Objective Evaluation Method for Text-generated Video |
Zelu Qi et.al. |
2501.08545 |
null |
2025-01-15 |
Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers |
Zhongwang Zhang et.al. |
2501.08537 |
link |
2025-01-14 |
GameFactory: Creating New Games with Generative Interactive Videos |
Jiwen Yu et.al. |
2501.08325 |
null |
2025-01-14 |
Diffusion Adversarial Post-Training for One-Step Video Generation |
Shanchuan Lin et.al. |
2501.08316 |
null |
2025-01-14 |
LayerAnimate: Layer-specific Control for Animation |
Yuxue Yang et.al. |
2501.08295 |
null |
2025-01-14 |
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors |
Yabo Zhang et.al. |
2501.08225 |
link |
2025-01-14 |
D$^2$-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models |
Qian Zeng et.al. |
2501.08180 |
link |
2025-01-14 |
Benchmarking Multimodal Models for Fine-Grained Image Analysis: A Comparative Study Across Diverse Visual Features |
Evgenii Evstafev et.al. |
2501.08170 |
null |
2025-01-13 |
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens |
Dongwon Kim et.al. |
2501.07730 |
null |
2025-01-13 |
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations |
Weixi Feng et.al. |
2501.07647 |
null |
2025-01-13 |
Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss |
Xinyu Zhang et.al. |
2501.07563 |
null |
2025-01-13 |
Boosting Text-To-Image Generation via Multilingual Prompting in Large Multimodal Models |
Yongyu Mu et.al. |
2501.07086 |
link |
2025-01-13 |
Enhancing Image Generation Fidelity via Progressive Prompts |
Zhen Xiong et.al. |
2501.07070 |
link |
2025-01-13 |
Detection of AI Deepfake and Fraud in Online Payments Using GAN-Based Models |
Zong Ke et.al. |
2501.07033 |
null |
2025-01-12 |
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models |
Michael Toker et.al. |
2501.06751 |
null |
2025-01-11 |
Denoising Diffusion Probabilistic Model for Radio Map Estimation in Generative Wireless Networks |
Xuanhao Luo et.al. |
2501.06604 |
null |
2025-01-11 |
DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy |
Wenshu Fan et.al. |
2501.06533 |
link |
2025-01-11 |
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation |
Xiaoying Xing et.al. |
2501.06481 |
null |
2025-01-11 |
Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning |
Maomao Li et.al. |
2501.06438 |
null |
2025-01-10 |
MEt3R: Measuring Multi-View Consistency in Generated Images |
Mohammad Asim et.al. |
2501.06336 |
null |
2025-01-10 |
Multi-subject Open-set Personalization in Video Generation |
Tsai-Shien Chen et.al. |
2501.06187 |
null |
2025-01-10 |
VideoAuteur: Towards Long Narrative Video Generation |
Junfei Xiao et.al. |
2501.06173 |
null |
2025-01-10 |
Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models |
Sofia Jamil et.al. |
2501.05839 |
link |
2025-01-10 |
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model |
Yi He et.al. |
2501.05710 |
null |
2025-01-09 |
Consistent Flow Distillation for Text-to-3D Generation |
Runjie Yan et.al. |
2501.05445 |
null |
2025-01-09 |
Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces |
Aniruddha Mahapatra et.al. |
2501.05442 |
null |
2025-01-09 |
Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation |
Xuyi Meng et.al. |
2501.05427 |
null |
2025-01-09 |
Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation |
Darius Petermann et.al. |
2501.05413 |
null |
2025-01-09 |
CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models |
Junha Park et.al. |
2501.05359 |
null |
2025-01-09 |
Patch-GAN Transfer Learning with Reconstructive Models for Cloud Removal |
Wanli Ma et.al. |
2501.05265 |
null |
2025-01-09 |
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering |
Dewei Zhou et.al. |
2501.05131 |
null |
2025-01-08 |
EditAR: Unified Conditional Generation with Autoregressive Models |
Jiteng Mu et.al. |
2501.04699 |
null |
2025-01-08 |
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning |
Yuzhou Huang et.al. |
2501.04698 |
null |
2025-01-08 |
On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis |
Yekun Ke et.al. |
2501.04377 |
null |
2025-01-08 |
Circuit Complexity Bounds for Visual Autoregressive Model |
Yekun Ke et.al. |
2501.04299 |
null |
2025-01-08 |
LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition |
Bowen Hao et.al. |
2501.04204 |
null |
2025-01-07 |
HistoryPalette: Supporting Exploration and Reuse of Past Alternatives in Image Generation and Editing |
Karim Benharrak et.al. |
2501.04163 |
null |
2025-01-07 |
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers |
Yuechen Zhang et.al. |
2501.03931 |
link |
2025-01-07 |
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control |
Zekai Gu et.al. |
2501.03847 |
link |
2025-01-07 |
Motion-Aware Generative Frame Interpolation |
Guozhen Zhang et.al. |
2501.03699 |
null |
2025-01-08 |
Evaluating Image Caption via Cycle-consistent Text-to-Image Generation |
Tianyu Cui et.al. |
2501.03567 |
null |
2025-01-07 |
PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models |
Lingzhi Yuan et.al. |
2501.03544 |
null |
2025-01-07 |
Textualize Visual Prompt for Image Editing via Diffusion Bridge |
Pengcheng Xu et.al. |
2501.03495 |
null |
2025-01-07 |
SceneBooth: Diffusion-based Framework for Subject-preserved Text-to-Image Generation |
Shang Chai et.al. |
2501.03490 |
null |
2025-01-06 |
License Plate Images Generation with Diffusion Models |
Mariia Shpir et.al. |
2501.03374 |
null |
2025-01-06 |
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation |
Guy Yariv et.al. |
2501.03059 |
null |
2025-01-06 |
TransPixar: Advancing Text-to-Video Generation with Transparency |
Luozhou Wang et.al. |
2501.03006 |
link |
2025-01-06 |
Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising |
Yunlong Yuan et.al. |
2501.02741 |
null |
2025-01-06 |
Artificial Intelligence in Creative Industries: Advances Prior to 2025 |
Nantheera Anantrasirichai et.al. |
2501.02725 |
null |
2025-01-05 |
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking |
Weikang Bian et.al. |
2501.02690 |
null |
2025-01-05 |
Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation |
Dawei Dai et.al. |
2501.02523 |
link |
2025-01-05 |
ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling |
Chaojie Mao et.al. |
2501.02487 |
null |
2025-01-05 |
MedSegDiffNCA: Diffusion Models With Neural Cellular Automata for Skin Lesion Segmentation |
Avni Mittal et.al. |
2501.02447 |
null |
2025-01-04 |
Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey |
Zongxia Li et.al. |
2501.02189 |
link |
2025-01-04 |
Generating Multimodal Images with GAN: Integrating Text, Image, and Style |
Chaoyi Tan et.al. |
2501.02167 |
null |
2025-01-03 |
JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing |
Qili Wang et.al. |
2501.01798 |
link |
2025-01-03 |
Controlling your Attributes in Voice |
Xuyuan Li et.al. |
2501.01674 |
null |
2025-01-02 |
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control |
Yuanpeng Tu et.al. |
2501.01427 |
null |
2025-01-03 |
Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions |
Xincheng Shuai et.al. |
2501.01425 |
null |
2025-01-02 |
Object-level Visual Prompts for Compositional Image Generation |
Gaurav Parmar et.al. |
2501.01424 |
null |
2025-01-02 |
On Unifying Video Generation and Camera Pose Estimation |
Chun-Hao Paul Huang et.al. |
2501.01409 |
null |
2025-01-02 |
ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer |
Xuyin Qi et.al. |
2501.01392 |
link |
2025-01-02 |
Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement |
Z. Zhang et.al. |
2501.01368 |
null |
2025-01-02 |
LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge |
Kyoungkook Kang et.al. |
2501.01197 |
null |
2025-01-02 |
HarmonyIQA: Pioneering Benchmark and Model for Image Harmonization Quality Assessment |
Zitong Xu et.al. |
2501.01116 |
null |
2025-01-02 |
EliGen: Entity-Level Controlled Image Generation with Regional Attention |
Hong Zhang et.al. |
2501.01097 |
link |
2025-01-01 |
OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes |
Sepehr Dehdashtian et.al. |
2501.00962 |
null |
2025-01-02 |
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation |
Yuanbo Yang et.al. |
2412.21117 |
null |
2024-12-30 |
Quantum Diffusion Model for Quark and Gluon Jet Generation |
Mariia Baidachna et.al. |
2412.21082 |
link |
2024-12-30 |
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model |
Yifei Huang et.al. |
2412.21080 |
link |
2024-12-30 |
Varformer: Adapting VAR’s Generative Prior for Image Restoration |
Siyang Wang et.al. |
2412.21063 |
link |
2024-12-30 |
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation |
Jiazheng Xu et.al. |
2412.21059 |
link |
2024-12-30 |
ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation |
Ting Zhang et.al. |
2412.20901 |
null |
2024-12-30 |
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control |
Shaojin Wu et.al. |
2412.20800 |
link |
2024-12-30 |
Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling |
Min Zhang et.al. |
2412.20725 |
null |
2024-12-30 |
HFI: A unified framework for training-free detection and implicit watermarking of latent diffusion model generated images |
Sungik Choi et.al. |
2412.20704 |
null |
2024-12-30 |
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis |
Yousef Yeganeh et.al. |
2412.20651 |
null |
2024-12-27 |
Generative Video Propagation |
Shaoteng Liu et.al. |
2412.19761 |
null |
2024-12-27 |
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models |
Tao Wu et.al. |
2412.19645 |
null |
2024-12-27 |
P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision |
Junjie Hu et.al. |
2412.19533 |
null |
2024-12-27 |
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT |
Xiaotao Hu et.al. |
2412.19505 |
link |
2024-12-27 |
Focusing Image Generation to Mitigate Spurious Correlations |
Xuewei Li et.al. |
2412.19457 |
null |
2024-12-25 |
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation |
Lunhao Duan et.al. |
2412.18928 |
null |
2024-12-25 |
Accelerating Diffusion Transformers with Dual Feature Caching |
Chang Zou et.al. |
2412.18911 |
link |
2024-12-25 |
DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering |
Ruohong Yang et.al. |
2412.18838 |
null |
2024-12-25 |
DebiasDiff: Debiasing Text-to-image Diffusion Models with Self-discovering Latent Attribute Directions |
Yilei Jiang et.al. |
2412.18810 |
null |
2024-12-25 |
Protective Perturbations against Unauthorized Data Usage in Diffusion-based Image Generation |
Sen Peng et.al. |
2412.18791 |
null |
2024-12-24 |
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers |
Yuntao Chen et.al. |
2412.18607 |
null |
2024-12-24 |
ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation |
Hongjie Li et.al. |
2412.18600 |
null |
2024-12-24 |
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation |
Minghong Cai et.al. |
2412.18597 |
link |
2024-12-24 |
Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models |
Qice Qin et.al. |
2412.18421 |
null |
2024-12-24 |
Extract Free Dense Misalignment from CLIP |
JeongYeon Nam et.al. |
2412.18404 |
link |
2024-12-24 |
TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization |
Yucong Luo et.al. |
2412.18185 |
null |
2024-12-24 |
EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation |
Shuhao Han et.al. |
2412.18150 |
link |
2024-12-24 |
Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction |
Xiao Guo et.al. |
2412.18149 |
null |
2024-12-24 |
Ensuring Consistency for In-Image Translation |
Chengpeng Fu et.al. |
2412.18139 |
null |
2024-12-23 |
Large Motion Video Autoencoding with Cross-modal Video VAE |
Yazhou Xing et.al. |
2412.17805 |
null |
2024-12-23 |
VidTwin: Video VAE with Decoupled Structure and Dynamics |
Yuchi Wang et.al. |
2412.17726 |
link |
2024-12-23 |
Personalized Large Vision-Language Models |
Chau Pham et.al. |
2412.17610 |
null |
2024-12-23 |
FFA Sora, video generation as fundus fluorescein angiography simulator |
Xinyuan Wu et.al. |
2412.17346 |
null |
2024-12-23 |
Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory |
Xingyao Li et.al. |
2412.17254 |
null |
2024-12-23 |
Discriminative Image Generation with Diffusion Models for Zero-Shot Learning |
Dingjie Fu et.al. |
2412.17219 |
null |
2024-12-22 |
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching |
Enshu Liu et.al. |
2412.17153 |
link |
2024-12-22 |
Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images |
Dennis Menn et.al. |
2412.17109 |
null |
2024-12-22 |
DreamOmni: Unified Image Generation and Editing |
Bin Xia et.al. |
2412.17098 |
null |
2024-12-22 |
SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults |
Jinzhi Wang et.al. |
2412.17077 |
null |
2024-12-20 |
Personalized Representation from Personalized Generation |
Shobhita Sundaram et.al. |
2412.16156 |
link |
2024-12-20 |
NeRF-To-Real Tester: Neural Radiance Fields as Test Image Generators for Vision of Autonomous Systems |
Laura Weihl et.al. |
2412.16141 |
null |
2024-12-20 |
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up |
Songhua Liu et.al. |
2412.16112 |
link |
2024-12-20 |
SafeCFG: Redirecting Harmful Classifier-Free Guidance for Safe Generation |
Jiadong Pan et.al. |
2412.16039 |
null |
2024-12-20 |
Semi-Supervised Adaptation of Diffusion Models for Handwritten Text Generation |
Kai Brandenbusch et.al. |
2412.15853 |
null |
2024-12-20 |
DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization |
Zihan Ding et.al. |
2412.15689 |
null |
2024-12-20 |
PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium |
Xinzhe Li et.al. |
2412.15674 |
link |
2024-12-20 |
BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images with Conditional Latent Diffusion Models |
Yifei Sun et.al. |
2412.15670 |
link |
2024-12-20 |
CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training |
Xiuli Bi et.al. |
2412.15646 |
link |
2024-12-20 |
Stylish and Functional: Guided Interpolation Subject to Physical Constraints |
Yan-Ying Chen et.al. |
2412.15507 |
null |
2024-12-19 |
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution |
Qihao Liu et.al. |
2412.15213 |
null |
2024-12-19 |
FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching |
Sucheng Ren et.al. |
2412.15205 |
link |
2024-12-19 |
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation |
Moayed Haji-Ali et.al. |
2412.15191 |
null |
2024-12-19 |
LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation |
Weijia Shi et.al. |
2412.15188 |
null |
2024-12-19 |
Tiled Diffusion |
Or Madar et.al. |
2412.15185 |
null |
2024-12-19 |
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM |
Yatai Ji et.al. |
2412.15156 |
link |
2024-12-19 |
Parallelized Autoregressive Visual Generation |
Yuqing Wang et.al. |
2412.15119 |
null |
2024-12-19 |
DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space |
Mang Ning et.al. |
2412.15032 |
link |
2024-12-19 |
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations |
Yucheng Hu et.al. |
2412.14803 |
null |
2024-12-19 |
Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models |
Keith G. Mills et.al. |
2412.14628 |
null |
2024-12-18 |
E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling |
Zhihang Yuan et.al. |
2412.14170 |
null |
2024-12-18 |
Autoregressive Video Generation without Vector Quantization |
Haoge Deng et.al. |
2412.14169 |
link |
2024-12-18 |
FashionComposer: Compositional Fashion Image Generation |
Sihui Ji et.al. |
2412.14168 |
null |
2024-12-18 |
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation |
Runtao Liu et.al. |
2412.14167 |
null |
2024-12-18 |
AKiRa: Augmentation Kit on Rays for optical video generation |
Xi Wang et.al. |
2412.14158 |
null |
2024-12-18 |
SurgSora: Decoupled RGBD-Flow Diffusion Model for Controllable Surgical Video Generation |
Tong Chen et.al. |
2412.14018 |
null |
2024-12-18 |
Text2Relight: Creative Portrait Relighting with Text Guidance |
Junuk Cha et.al. |
2412.13734 |
null |
2024-12-18 |
Diffusion models and stochastic quantisation in lattice field theory |
Gert Aarts et.al. |
2412.13704 |
null |
2024-12-18 |
MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing |
Chuang Yang et.al. |
2412.13684 |
null |
2024-12-18 |
Self-control: A Better Conditional Mechanism for Masked Autoregressive Model |
Qiaoying Qu et.al. |
2412.13635 |
null |
2024-12-17 |
MotionBridge: Dynamic Video Inbetweening with Flexible Controls |
Maham Tanveer et.al. |
2412.13190 |
null |
2024-12-17 |
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration |
Lu Liu et.al. |
2412.13155 |
null |
2024-12-17 |
Prompt Augmentation for Self-supervised Text-guided Image Manipulation |
Rumeysa Bodur et.al. |
2412.13081 |
null |
2024-12-17 |
VidTok: A Versatile and Open-Source Video Tokenizer |
Anni Tang et.al. |
2412.13061 |
link |
2024-12-17 |
3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation |
Haoshen Wang et.al. |
2412.13059 |
null |
2024-12-17 |
Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression |
Ruijie Chen et.al. |
2412.12982 |
null |
2024-12-17 |
Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance |
Wenhao Sun et.al. |
2412.12974 |
link |
2024-12-17 |
Unsupervised Region-Based Image Editing of Denoising Diffusion Models |
Zixiang Li et.al. |
2412.12912 |
null |
2024-12-17 |
ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction |
Zhongjie Duan et.al. |
2412.12888 |
link |
2024-12-17 |
Rethinking Diffusion-Based Image Generators for Fundus Fluorescein Angiography Synthesis on Limited Data |
Chengzhou Yu et.al. |
2412.12778 |
null |
2024-12-16 |
Causal Diffusion Transformers for Generative Modeling |
Chaorui Deng et.al. |
2412.12095 |
link |
2024-12-16 |
A LoRA is Worth a Thousand Pictures |
Chenxi Liu et.al. |
2412.12048 |
null |
2024-12-16 |
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models |
Rick Akkerman et.al. |
2412.11785 |
null |
2024-12-16 |
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation |
Tianyi Zhu et.al. |
2412.11755 |
link |
2024-12-16 |
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation |
Yiren Song et.al. |
2412.11638 |
null |
2024-12-16 |
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting |
Muhammet Furkan Ilaslan et.al. |
2412.11621 |
link |
2024-12-16 |
3D$^2$-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling |
Zichen Tang et.al. |
2412.11599 |
link |
2024-12-16 |
LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model |
Xi Wang et.al. |
2412.11519 |
null |
2024-12-16 |
FedCAR: Cross-client Adaptive Re-weighting for Generative Models in Federated Learning |
Minjun Kim et.al. |
2412.11463 |
link |
2024-12-16 |
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models |
Namhyuk Ahn et.al. |
2412.11423 |
null |
2024-12-13 |
OP-LoRA: The Blessing of Dimensionality |
Piotr Teterwak et.al. |
2412.10362 |
null |
2024-12-13 |
TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation |
Xingrui Wang et.al. |
2412.10275 |
null |
2024-12-13 |
Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark |
Yudong Jiang et.al. |
2412.10255 |
link |
2024-12-13 |
Simple Guidance Mechanisms for Discrete Diffusion Models |
Yair Schiff et.al. |
2412.10193 |
link |
2024-12-13 |
Financial Fine-tuning a Large Time Series Model |
Xinghong Fu et.al. |
2412.09880 |
link |
2024-12-13 |
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity |
Hongjie Wang et.al. |
2412.09856 |
null |
2024-12-13 |
MSC: Multi-Scale Spatio-Temporal Causal Attention for Autoregressive Video Diffusion |
Xunnong Xu et.al. |
2412.09828 |
null |
2024-12-12 |
Human vs. AI: A Novel Benchmark and a Comparative Study on the Detection of Generated Images and the Impact of Prompts |
Philipp Moeßner et.al. |
2412.09715 |
link |
2024-12-12 |
Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation |
Chun-Mei Feng et.al. |
2412.09706 |
link |
2024-12-12 |
Doe-1: Closed-Loop Autonomous Driving with Large World Model |
Wenzhao Zheng et.al. |
2412.09627 |
link |
2024-12-12 |
OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation |
Weiqi Li et.al. |
2412.09623 |
null |
2024-12-12 |
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models |
Enis Simsar et.al. |
2412.09622 |
null |
2024-12-12 |
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM |
Zhuofan Zong et.al. |
2412.09618 |
null |
2024-12-12 |
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers |
Yusuf Dalva et.al. |
2412.09611 |
null |
2024-12-12 |
Spectral Image Tokenizer |
Carlos Esteves et.al. |
2412.09607 |
null |
2024-12-12 |
Owl-1: Omni World Model for Consistent Long Video Generation |
Yuanhui Huang et.al. |
2412.09600 |
link |
2024-12-12 |
LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors |
Yabo Chen et.al. |
2412.09597 |
null |
2024-12-12 |
Video Creation by Demonstration |
Yihong Sun et.al. |
2412.09551 |
null |
2024-12-12 |
UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer |
Delong Liu et.al. |
2412.09389 |
link |
2024-12-11 |
Fast Prompt Alignment for Text-to-Image Generation |
Khalil Mrini et.al. |
2412.08639 |
link |
2024-12-11 |
Multimodal Latent Language Modeling with Next-Token Diffusion |
Yutao Sun et.al. |
2412.08635 |
link |
2024-12-11 |
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations |
Zejian Li et.al. |
2412.08580 |
link |
2024-12-11 |
Learning Flow Fields in Attention for Controllable Person Image Generation |
Zijian Zhou et.al. |
2412.08486 |
link |
2024-12-11 |
InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models |
Min Hou et.al. |
2412.08480 |
link |
2024-12-11 |
CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis |
Mu Zhang et.al. |
2412.08464 |
null |
2024-12-11 |
Physical Informed Driving World Model |
Zhuoran Yang et.al. |
2412.08410 |
null |
2024-12-11 |
FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks |
Chongkai Gao et.al. |
2412.08261 |
null |
2024-12-11 |
VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation |
Zhiqiang Yuan et.al. |
2412.08259 |
null |
2024-12-11 |
Analyzing and Improving Model Collapse in Rectified Flow Models |
Huminhao Zhu et.al. |
2412.08175 |
null |
2024-12-10 |
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics |
Xi Chen et.al. |
2412.07774 |
null |
2024-12-10 |
From Slow Bidirectional to Fast Causal Video Generators |
Tianwei Yin et.al. |
2412.07772 |
null |
2024-12-10 |
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints |
Jianhong Bai et.al. |
2412.07760 |
link |
2024-12-10 |
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation |
Xiao Fu et.al. |
2412.07759 |
null |
2024-12-10 |
Multi-Shot Character Consistency for Text-to-Video Generation |
Yuval Atzmon et.al. |
2412.07750 |
null |
2024-12-10 |
StyleMaster: Stylize Your Video with Artistic Generation and Translation |
Zixuan Ye et.al. |
2412.07744 |
null |
2024-12-10 |
STIV: Scalable Text and Image Conditioned Video Generation |
Zongyu Lin et.al. |
2412.07730 |
null |
2024-12-10 |
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer |
Jinyi Hu et.al. |
2412.07720 |
link |
2024-12-10 |
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models |
Tong Wu et.al. |
2412.07674 |
null |
2024-12-10 |
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation |
Jianzong Wu et.al. |
2412.07589 |
null |
2024-12-09 |
Visual Lexicon: Rich Image Features in Language Space |
XuDong Wang et.al. |
2412.06774 |
null |
2024-12-09 |
Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty |
Meera Hahn et.al. |
2412.06771 |
link |
2024-12-09 |
ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet |
Andrei-Robert Alexandrescu et.al. |
2412.06742 |
null |
2024-12-09 |
EMOv2: Pushing 5M Vision Model Frontier |
Jiangning Zhang et.al. |
2412.06674 |
link |
2024-12-09 |
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance |
Chunwei Wang et.al. |
2412.06673 |
null |
2024-12-09 |
Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion |
Shuaiting Li et.al. |
2412.06661 |
null |
2024-12-09 |
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment |
Kim Sung-Bin et.al. |
2412.06209 |
link |
2024-12-09 |
ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance |
Yuming Li et.al. |
2412.06163 |
null |
2024-12-09 |
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters |
Yuan Wang et.al. |
2412.06143 |
link |
2024-12-08 |
GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis |
Ashish Goswami et.al. |
2412.06089 |
null |
2024-12-06 |
Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model |
Lening Wang et.al. |
2412.05280 |
link |
2024-12-06 |
Mind the Time: Temporally-Controlled Multi-Event Video Generation |
Ziyi Wu et.al. |
2412.05263 |
null |
2024-12-06 |
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation |
Donald Shenaj et.al. |
2412.05148 |
link |
2024-12-06 |
The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation |
Ruoyu Wang et.al. |
2412.05101 |
null |
2024-12-06 |
Noise Matters: Diffusion Model-based Urban Mobility Generation with Collaborative Noise Priors |
Yuheng Zhang et.al. |
2412.05000 |
null |
2024-12-06 |
Continuous Video Process: Modeling Videos as Continuous Multi-Dimensional Processes for Video Prediction |
Gaurav Shrivastava et.al. |
2412.04929 |
null |
2024-12-06 |
UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving |
Rui Chen et.al. |
2412.04842 |
link |
2024-12-05 |
Hidden in the Noise: Two-Stage Robust Watermarking for Images |
Kasra Arabi et.al. |
2412.04653 |
link |
2024-12-05 |
One Communication Round is All It Needs for Federated Fine-Tuning Foundation Models |
Ziyao Wang et.al. |
2412.04650 |
null |
2024-12-05 |
Using Diffusion Priors for Video Amodal Segmentation |
Kaihua Chen et.al. |
2412.04623 |
null |
2024-12-05 |
PaintScene4D: Consistent 4D Scene Generation from Text Prompts |
Vinayak Gupta et.al. |
2412.04471 |
null |
2024-12-05 |
LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors |
Yusuf Dalva et.al. |
2412.04460 |
null |
2024-12-05 |
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation |
Longtao Zheng et.al. |
2412.04448 |
null |
2024-12-05 |
DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models |
Yizhuo Li et.al. |
2412.04446 |
null |
2024-12-05 |
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration |
Kaiyi Huang et.al. |
2412.04440 |
null |
2024-12-05 |
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation |
Yuying Ge et.al. |
2412.04432 |
link |
2024-12-05 |
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation |
Fredrik Carlsson et.al. |
2412.04318 |
null |
2024-12-05 |
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts |
Ziwei Huang et.al. |
2412.04300 |
null |
2024-12-05 |
Structure-Aware Stylized Image Synthesis for Robust Medical Image Segmentation |
Jie Bao et.al. |
2412.04296 |
link |
2024-12-05 |
Instructional Video Generation |
Yayuan Li et.al. |
2412.04189 |
null |
2024-12-04 |
Navigation World Models |
Amir Bar et.al. |
2412.03572 |
null |
2024-12-04 |
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation |
Zehuan Huang et.al. |
2412.03558 |
null |
2024-12-04 |
Imagine360: Immersive 360 Video Generation from Perspective Anchor |
Jing Tan et.al. |
2412.03552 |
null |
2024-12-04 |
Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention |
Hannan Lu et.al. |
2412.03520 |
null |
2024-12-04 |
Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective |
Neta Shaul et.al. |
2412.03487 |
null |
2024-12-04 |
SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model |
Yan Li et.al. |
2412.03430 |
null |
2024-12-04 |
Skel3D: Skeleton Guided Novel View Synthesis |
Aron Fóthi et.al. |
2412.03407 |
null |
2024-12-04 |
Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment |
Feng He et.al. |
2412.03400 |
null |
2024-12-04 |
DIVE: Taming DINO for Subject-Driven Video Editing |
Yi Huang et.al. |
2412.03347 |
null |
2024-12-04 |
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation |
Qingdong He et.al. |
2412.03255 |
null |
2024-12-03 |
Motion Prompting: Controlling Video Generation with Motion Trajectories |
Daniel Geng et.al. |
2412.02700 |
null |
2024-12-03 |
Taming Scalable Visual Tokenizer for Autoregressive Image Generation |
Fengyuan Shi et.al. |
2412.02692 |
link |
2024-12-03 |
FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation |
Kefan Chen et.al. |
2412.02690 |
null |
2024-12-03 |
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance |
Viet Nguyen et.al. |
2412.02687 |
null |
2024-12-03 |
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction |
Lingteng Qiu et.al. |
2412.02684 |
null |
2024-12-03 |
Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback |
Hiroki Furuta et.al. |
2412.02617 |
null |
2024-12-03 |
WEM-GAN: Wavelet transform based facial expression manipulation |
Dongya Sun et.al. |
2412.02530 |
null |
2024-12-03 |
ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation? |
Leixin Zhang et.al. |
2412.02368 |
link |
2024-12-03 |
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation |
Mingzhe Zheng et.al. |
2412.02259 |
link |
2024-12-03 |
Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models |
Jungwon Park et.al. |
2412.02237 |
link |
2024-11-29 |
JetFormer: An Autoregressive Generative Model of Raw Images and Text |
Michael Tschannen et.al. |
2411.19722 |
link |
2024-11-29 |
Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing |
Wenyi Mo et.al. |
2411.19652 |
link |
2024-11-29 |
QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain |
Wenfang Sun et.al. |
2411.19534 |
null |
2024-11-29 |
Fleximo: Towards Flexible Text-to-Human Motion Video Generation |
Yuhang Zhang et.al. |
2411.19459 |
null |
2024-11-29 |
Achromatic single-layer hologram |
Zhi Li et.al. |
2411.19445 |
null |
2024-11-28 |
AMO Sampler: Enhancing Text Rendering with Overshooting |
Xixi Hu et.al. |
2411.19415 |
link |
2024-11-28 |
DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models |
Shwetha Ram et.al. |
2411.19390 |
null |
2024-11-28 |
Trajectory Attention for Fine-grained Video Motion Control |
Zeqi Xiao et.al. |
2411.19324 |
null |
2024-11-28 |
Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention |
Huiguo He et.al. |
2411.19261 |
null |
2024-11-28 |
SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation |
Yuhan Pei et.al. |
2411.19182 |
null |
2024-11-27 |
Diffusion Self-Distillation for Zero-Shot Customized Image Generation |
Shengqu Cai et.al. |
2411.18616 |
null |
2024-11-27 |
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion |
Haosen Yang et.al. |
2411.18552 |
null |
2024-11-27 |
Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models |
Yiming Wu et.al. |
2411.18375 |
null |
2024-11-27 |
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models |
Riza Velioglu et.al. |
2411.18350 |
link |
2024-11-27 |
MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation |
Haopeng Fang et.al. |
2411.18281 |
null |
2024-11-27 |
Prediction with Action: Visual Policy Learning via Joint Denoising Process |
Yanjiang Guo et.al. |
2411.18179 |
null |
2024-11-27 |
Type-R: Automatically Retouching Typos for Text-to-Image Generation |
Wataru Shimoda et.al. |
2411.18159 |
null |
2024-11-27 |
PersonaCraft: Personalized Full-Body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion |
Gwanghyun Kim et.al. |
2411.18068 |
null |
2024-11-27 |
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models |
Shuyang Hao et.al. |
2411.18000 |
null |
2024-11-27 |
Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space |
Lingxiao Li et.al. |
2411.17784 |
null |
2024-11-26 |
Accelerating Vision Diffusion Transformers with Skip Branches |
Guanjie Chen et.al. |
2411.17616 |
link |
2024-11-26 |
IMPROVE: Improving Medical Plausibility without Reliance on Human Validation – An Enhanced Prototype-Guided Diffusion Framework |
Anurag Shandilya et.al. |
2411.17535 |
null |
2024-11-26 |
Identity-Preserving Text-to-Video Generation by Frequency Decomposition |
Shenghai Yuan et.al. |
2411.17440 |
link |
2024-11-26 |
Image Generation with Multimodule Semantic Feature-Aided Selection for Semantic Communications |
Chengyang Liang et.al. |
2411.17428 |
null |
2024-11-26 |
Cross-modal Medical Image Generation Based on Pyramid Convolutional Attention Network |
Fuyou Mao et.al. |
2411.17420 |
null |
2024-11-26 |
AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation |
Ziyi Xu et.al. |
2411.17383 |
null |
2024-11-26 |
Reward Incremental Learning in Text-to-Image Generation |
Maorong Wang et.al. |
2411.17310 |
null |
2024-11-26 |
From Graph Diffusion to Graph Classification |
Jia Jun Cheng Xian et.al. |
2411.17236 |
null |
2024-11-26 |
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM |
Jiarui Wang et.al. |
2411.17221 |
link |
2024-11-26 |
cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis |
Paul Friedrich et.al. |
2411.17203 |
link |
2024-11-25 |
Factorized Visual Tokenization and Generation |
Zechen Bai et.al. |
2411.16681 |
null |
2024-11-25 |
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation |
Zun Wang et.al. |
2411.16657 |
null |
2024-11-25 |
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric |
Zhichao Zhang et.al. |
2411.16619 |
null |
2024-11-25 |
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing |
Kaifeng Gao et.al. |
2411.16375 |
link |
2024-11-25 |
CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain |
Jingchao Peng et.al. |
2411.16327 |
null |
2024-11-25 |
Image Generation Diversity Issues and How to Tame Them |
Mischa Dombrowski et.al. |
2411.16171 |
link |
2024-11-25 |
Text-to-Image Synthesis: A Decade Survey |
Nonghai Zhang et.al. |
2411.16164 |
null |
2024-11-25 |
Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models |
Donggeun Ko et.al. |
2411.16079 |
null |
2024-11-25 |
Label-Free Intraoperative Mean-Transition-Time Image Generation Using Statistical Gating and Deep Learning |
Yan Shi et.al. |
2411.16039 |
null |
2024-11-24 |
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs |
Teng Zhou et.al. |
2411.15867 |
link |
2024-11-22 |
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement |
Daeun Lee et.al. |
2411.15115 |
null |
2024-11-22 |
Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion |
Samarth N Ramesh et.al. |
2411.15113 |
null |
2024-11-22 |
OminiControl: Minimal and Universal Control for Diffusion Transformer |
Zhenxiong Tan et.al. |
2411.15098 |
link |
2024-11-22 |
Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation |
Lakshmikar R. Polamreddy et.al. |
2411.15084 |
link |
2024-11-22 |
HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads |
Yu Xu et.al. |
2411.15034 |
null |
2024-11-22 |
Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation |
Dingyuan Shi et.al. |
2411.14871 |
null |
2024-11-22 |
Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation |
Jeongsol Kim et.al. |
2411.14863 |
null |
2024-11-22 |
Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering |
Haoyuan Li et.al. |
2411.14816 |
null |
2024-11-22 |
High-Resolution Image Synthesis via Next-Token Prediction |
Dengsheng Chen et.al. |
2411.14808 |
null |
2024-11-22 |
FairAdapter: Detecting AI-generated Images with Improved Fairness |
Feng Ding et.al. |
2411.14755 |
link |
2024-11-21 |
StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart |
Jian Shi et.al. |
2411.14295 |
link |
2024-11-21 |
ComfyGI: Automatic Improvement of Image Generation Workflows |
Dominik Sobania et.al. |
2411.14193 |
null |
2024-11-21 |
TaQ-DiT: Time-aware Quantization for Diffusion Transformers |
Xinyan Liu et.al. |
2411.14172 |
null |
2024-11-21 |
MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective |
Hailang Huang et.al. |
2411.14062 |
link |
2024-11-21 |
Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction |
Jordan Vice et.al. |
2411.13982 |
null |
2024-11-21 |
On the Fairness, Diversity and Reliability of Text-to-Image Generative Models |
Jordan Vice et.al. |
2411.13981 |
null |
2024-11-21 |
Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion |
Jinhong He et.al. |
2411.13961 |
link |
2024-11-21 |
iHQGAN: A Lightweight Invertible Hybrid Quantum-Classical Generative Adversarial Network for Unsupervised Image-to-Image Translation |
Xue Yang et.al. |
2411.13920 |
link |
2024-11-21 |
Dealing with Synthetic Data Contamination in Online Continual Learning |
Maorong Wang et.al. |
2411.13852 |
link |
2024-11-21 |
Detecting Human Artifacts from Text-to-Image Models |
Kaihong Wang et.al. |
2411.13842 |
link |
2024-11-20 |
REDUCIO! Generating 1024 $\times$ 1024 Video within 16 Seconds using Extremely Compressed Motion Latents |
Rui Tian et.al. |
2411.13552 |
link |
2024-11-20 |
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models |
Ziqi Huang et.al. |
2411.13503 |
link |
2024-11-20 |
From Prompt Engineering to Prompt Craft |
Joseph Lindley et.al. |
2411.13422 |
null |
2024-11-20 |
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation |
Christoph Reinders et.al. |
2411.13150 |
link |
2024-11-20 |
CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models |
Naen Xu et.al. |
2411.13144 |
null |
2024-11-19 |
From Text to Pose to Image: Improving Diffusion Model Control and Quality |
Clément Bonnett et.al. |
2411.12872 |
link |
2024-11-19 |
Towards motion from video diffusion models |
Paul Janson et.al. |
2411.12831 |
null |
2024-11-19 |
Stylecodes: Encoding Stylistic Information For Image Generation |
Ciara Rowles et.al. |
2411.12811 |
link |
2024-11-19 |
Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting |
Haoyu Zhao et.al. |
2411.12789 |
null |
2024-11-19 |
PoM: Efficient Image and Video Generation with the Polynomial Mixer |
David Picard et.al. |
2411.12663 |
link |
2024-11-19 |
Constant Rate Schedule: Constant-Rate Distributional Change for Efficient Training and Sampling in Diffusion Models |
Shuntaro Okada et.al. |
2411.12188 |
null |
2024-11-19 |
Enhancing Low Dose Computed Tomography Images Using Consistency Training Techniques |
Mahmut S. Gokmen et.al. |
2411.12181 |
null |
2024-11-18 |
Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution |
Brian B. Moser et.al. |
2411.12072 |
link |
2024-11-18 |
Medical Video Generation for Disease Progression Simulation |
Xu Cao et.al. |
2411.11943 |
null |
2024-11-18 |
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input |
Zhen Lv et.al. |
2411.11934 |
null |
2024-11-18 |
Conceptwm: A Diffusion Model Watermark for Concept Protection |
Liangqi Lei et.al. |
2411.11688 |
null |
2024-11-18 |
A Modular Open Source Framework for Genomic Variant Calling |
Ankita Vaishnobi Bisoi et.al. |
2411.11513 |
null |
2024-11-19 |
SoK: On the Role and Future of AIGC Watermarking in the Era of Gen-AI |
Kui Ren et.al. |
2411.11478 |
null |
2024-11-18 |
MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion |
Dongseok Shim et.al. |
2411.11475 |
null |
2024-11-18 |
Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge |
Qinglong Cao et.al. |
2411.11343 |
null |
2024-11-18 |
BeautyBank: Encoding Facial Makeup in Latent Space |
Qianwen Lu et.al. |
2411.11231 |
null |
2024-11-17 |
Enhanced Anime Image Generation Using USE-CMHSA-GAN |
J. Lu et.al. |
2411.11179 |
null |
2024-11-17 |
Time Step Generating: A Universal Synthesized Deepfake Image Detector |
Ziyue Zeng et.al. |
2411.11016 |
link |
2024-11-17 |
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration |
Jintao Zhang et.al. |
2411.10958 |
link |
2024-11-16 |
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models |
Vipula Rawte et.al. |
2411.10867 |
null |
2024-11-15 |
M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation |
Sucheng Ren et.al. |
2411.10433 |
link |
2024-11-15 |
Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding |
Huming Qiu et.al. |
2411.10329 |
null |
2024-11-15 |
The Unreasonable Effectiveness of Guidance for Diffusion Models |
Tim Kaiser et.al. |
2411.10257 |
null |
2024-11-15 |
Visual question answering based evaluation metrics for text-to-image generation |
Mizuki Miyamoto et.al. |
2411.10183 |
null |
2024-11-15 |
CART: Compositional Auto-Regressive Transformer for Image Generation |
Siddharth Roheda et.al. |
2411.10180 |
null |
2024-11-15 |
Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training |
Myunsoo Kim et.al. |
2411.09998 |
null |
2024-11-15 |
Content-Aware Preserving Image Generation |
Giang H. Le et.al. |
2411.09871 |
null |
2024-11-14 |
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting |
Yian Wang et.al. |
2411.09823 |
null |
2024-11-14 |
GAN-Based Architecture for Low-dose Computed Tomography Imaging Denoising |
Yunuo Wang et.al. |
2411.09512 |
null |
2024-11-14 |
Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models |
Chutian Meng et.al. |
2411.09449 |
null |
2024-11-14 |
Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance |
Md Fahim Anjum et.al. |
2411.09174 |
null |
2024-11-14 |
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation |
Youpeng Wen et.al. |
2411.09153 |
null |
2024-11-13 |
A Survey on Vision Autoregressive Model |
Kai Jiang et.al. |
2411.08666 |
null |
2024-11-13 |
Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models |
Chengdong Dong et.al. |
2411.08642 |
null |
2024-11-13 |
I Can Embrace and Avoid Vagueness Myself: Supporting the Design Process by Balancing Vagueness through Text-to-Image Generative AI |
Myungjin Kim et.al. |
2411.08588 |
null |
2024-11-13 |
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation |
Xiaofeng Wang et.al. |
2411.08380 |
null |
2024-11-13 |
Physics Informed Distillation for Diffusion Models |
Joshua Tian Jin Tee et.al. |
2411.08378 |
link |
2024-11-13 |
Motion Control for Enhanced Complex Action Video Generation |
Qiang Zhou et.al. |
2411.08328 |
null |
2024-11-12 |
Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing |
Zitao Shuai et.al. |
2411.08196 |
null |
2024-11-12 |
TIPO: Text to Image with Text Presampling for Prompt Optimization |
Shih-Ying Yeh et.al. |
2411.08127 |
null |
2024-11-12 |
Evaluating the Generation of Spatial Relations in Text and Image Generative Models |
Shang Hong Sim et.al. |
2411.07664 |
null |
2024-11-12 |
Leveraging Previous Steps: A Training-free Fast Solver for Flow Diffusion |
Kaiyu Song et.al. |
2411.07627 |
null |
2024-11-12 |
Artificial Intelligence for Biomedical Video Generation |
Linyuan Li et.al. |
2411.07619 |
null |
2024-11-12 |
GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering |
Zhihao Liang et.al. |
2411.07478 |
null |
2024-11-11 |
Exploring Variational Autoencoders for Medical Image Generation: A Comprehensive Study |
Khadija Rais et.al. |
2411.07348 |
null |
2024-11-11 |
Learning from Limited and Imperfect Data |
Harsh Rangwani et.al. |
2411.07229 |
null |
2024-11-11 |
DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID |
Nyle Siddiqui et.al. |
2411.07205 |
link |
2024-11-11 |
More Expressive Attention with Negative Weights |
Ang Lv et.al. |
2411.07176 |
link |
2024-11-11 |
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models |
NVIDIA et.al. |
2411.07126 |
null |
2024-11-11 |
Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models |
Yanchen Wang et.al. |
2411.07121 |
link |
2024-11-11 |
Layout Control and Semantic Guidance with Attention Loss Backward for T2I Diffusion Model |
Guandong Li et.al. |
2411.06692 |
null |
2024-11-11 |
SeedEdit: Align Image Re-Generation to Image Editing |
Yichun Shi et.al. |
2411.06686 |
null |
2024-11-10 |
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement |
Zhennan Chen et.al. |
2411.06558 |
link |
2024-11-10 |
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength |
Wanquan Feng et.al. |
2411.06525 |
null |
2024-11-10 |
DDIM-Driven Coverless Steganography Scheme with Real Key |
Mingyu Yu et.al. |
2411.06486 |
null |
2024-11-08 |
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models |
Jia-Hong Huang et.al. |
2411.05706 |
null |
2024-11-08 |
WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making |
Zhilong Zhang et.al. |
2411.05619 |
null |
2024-11-08 |
A Nerf-Based Color Consistency Method for Remote Sensing Images |
Zongcheng Zuo et.al. |
2411.05557 |
null |
2024-11-08 |
Improving image synthesis with diffusion-negative sampling |
Alakh Desai et.al. |
2411.05473 |
null |
2024-11-07 |
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model |
Sheng Cheng et.al. |
2411.05079 |
link |
2024-11-07 |
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models |
Weixin Liang et.al. |
2411.04996 |
null |
2024-11-07 |
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation |
Koichi Namekata et.al. |
2411.04989 |
null |
2024-11-07 |
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation |
Anil Kag et.al. |
2411.04967 |
null |
2024-11-07 |
Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification |
Mischa Dombrowski et.al. |
2411.04956 |
null |
2024-11-07 |
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion |
Wenqiang Sun et.al. |
2411.04928 |
null |
2024-11-07 |
StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration |
Panwen Hu et.al. |
2411.04925 |
null |
2024-11-07 |
MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views |
Yuedong Chen et.al. |
2411.04924 |
link |
2024-11-07 |
Taming Rectified Flow for Inversion and Editing |
Jiangshan Wang et.al. |
2411.04746 |
link |
2024-11-07 |
DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning |
Yuxuan Duan et.al. |
2411.04571 |
link |
2024-11-07 |
BendVLM: Test-Time Debiasing of Vision-Language Embeddings |
Walter Gerych et.al. |
2411.04420 |
link |
2024-11-06 |
ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks |
Ziji Shi et.al. |
2411.03999 |
null |
2024-11-06 |
Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation |
Chihaya Matsuhira et.al. |
2411.03595 |
null |
2024-11-05 |
Enhancing Weakly Supervised Semantic Segmentation for Fibrosis via Controllable Image Generation |
Zhiling Yue et.al. |
2411.03551 |
null |
2024-11-05 |
DiT4Edit: Diffusion Transformer for Image Editing |
Kunyu Feng et.al. |
2411.03286 |
null |
2024-11-05 |
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models |
Tariq Berrada Ifriqi et.al. |
2411.03177 |
null |
2024-11-05 |
Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising |
Tao Huang et.al. |
2411.03053 |
null |
2024-11-05 |
Textual Aesthetics in Large Language Models |
Lingjie Jiang et.al. |
2411.02930 |
link |
2024-11-05 |
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey |
Ao Fu et.al. |
2411.02914 |
null |
2024-11-05 |
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using? |
David Mayo et.al. |
2411.02783 |
null |
2024-11-04 |
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives |
Maitreya Patel et.al. |
2411.02545 |
null |
2024-11-04 |
Adaptive Caching for Faster Video Generation with Diffusion Transformers |
Kumara Kahatapitiya et.al. |
2411.02397 |
null |
2024-11-04 |
Training-free Regional Prompting for Diffusion Transformers |
Anthony Chen et.al. |
2411.02395 |
link |
2024-11-04 |
How Far is Video Generation from World Model: A Physical Law Perspective |
Bingyi Kang et.al. |
2411.02385 |
null |
2024-11-04 |
Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models |
Anjith George et.al. |
2411.02188 |
null |
2024-11-03 |
Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation |
Zhenbin Wang et.al. |
2411.01647 |
null |
2024-11-03 |
DreamPolish: Domain Score Distillation With Progressive Geometry Generation |
Yean Cheng et.al. |
2411.01602 |
null |
2024-11-03 |
Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach |
Qihe Pan et.al. |
2411.01545 |
link |
2024-11-03 |
DPCL-Diff: The Temporal Knowledge Graph Reasoning based on Graph Node Diffusion Model with Dual-Domain Periodic Contrastive Learning |
Yukun Cao et.al. |
2411.01477 |
null |
2024-11-02 |
Guided Synthesis of Labeled Brain MRI Data Using Latent Diffusion Models for Segmentation of Enlarged Ventricles |
Tim Ruschke et.al. |
2411.01351 |
null |
2024-11-02 |
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference |
Zheng Zhan et.al. |
2411.01171 |
link |
2024-10-31 |
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning |
Penghui Ruan et.al. |
2410.24219 |
link |
2024-10-31 |
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts |
Xiang Deng et.al. |
2410.23836 |
null |
2024-11-01 |
In-Context LoRA for Diffusion Transformers |
Lianghua Huang et.al. |
2410.23775 |
link |
2024-10-31 |
Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization |
Xiao Guo et.al. |
2410.23556 |
null |
2024-10-30 |
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts |
Jie Zhu et.al. |
2410.23332 |
null |
2024-10-30 |
RelationBooth: Towards Relation-Aware Customized Object Generation |
Qingyu Shi et.al. |
2410.23280 |
null |
2024-10-31 |
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation |
Yining Hong et.al. |
2410.23277 |
null |
2024-10-30 |
Multi-student Diffusion Distillation for Better One-step Generators |
Yanke Song et.al. |
2410.23274 |
null |
2024-10-30 |
LumiSculpt: A Consistency Lighting Control Network for Video Generation |
Yuxin Zhang et.al. |
2410.22979 |
null |
2024-10-30 |
Private Synthetic Text Generation with Diffusion Models |
Sebastian Ochs et.al. |
2410.22971 |
link |
2024-10-30 |
An Individual Identity-Driven Framework for Animal Re-Identification |
Yihao Wu et.al. |
2410.22927 |
link |
2024-10-30 |
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models |
Shengkai Zhang et.al. |
2410.22901 |
link |
2024-10-30 |
Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images |
Hanlin Wu et.al. |
2410.22830 |
link |
2024-10-30 |
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models |
Arash Marioriyad et.al. |
2410.22775 |
null |
2024-10-30 |
Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots |
Vincent Guan et.al. |
2410.22729 |
link |
2024-10-29 |
Investigating Memorization in Video Diffusion Models |
Chen Chen et.al. |
2410.21669 |
null |
2024-10-29 |
Exploring Local Memorization in Diffusion Models via Bright Ending Attention |
Chen Chen et.al. |
2410.21665 |
null |
2024-10-29 |
Fingerprints of Super Resolution Networks |
Jeremy Vonderfecht et.al. |
2410.21653 |
null |
2024-10-28 |
Denoising Diffusion Planner: Learning Complex Paths from Low-Quality Demonstrations |
Michiel Nikken et.al. |
2410.21497 |
link |
2024-10-28 |
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior |
Hanyu Wang et.al. |
2410.21264 |
null |
2024-10-28 |
Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences |
Zhihao Zhao et.al. |
2410.21130 |
null |
2024-10-28 |
Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models |
Wenda Li et.al. |
2410.21088 |
link |
2024-10-28 |
Markov spin models for image generation: explicit large deviations with respect to the number of pixels |
Cecile Monthus et.al. |
2410.20906 |
null |
2024-10-28 |
Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models |
Weijian Luo et.al. |
2410.20898 |
link |
2024-10-28 |
Murine AI excels at cats and cheese: Structural differences between human and mouse neurons and their implementation in generative AIs |
Rino Saiga et.al. |
2410.20735 |
null |
2024-10-28 |
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians |
Chongjian Ge et.al. |
2410.20723 |
null |
2024-10-28 |
Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient |
Yintai Ma et.al. |
2410.20657 |
null |
2024-10-27 |
Generator Matching: Generative modeling with arbitrary Markov processes |
Peter Holderrieth et.al. |
2410.20587 |
null |
2024-10-27 |
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation |
Zongyi Li et.al. |
2410.20502 |
null |
2024-10-25 |
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality |
Zhengyao Lv et.al. |
2410.19355 |
null |
2024-10-25 |
High Resolution Seismic Waveform Generation using Denoising Diffusion |
Andreas Bergmeister et.al. |
2410.19343 |
null |
2024-10-24 |
Framer: Interactive Frame Interpolation |
Wen Wang et.al. |
2410.18978 |
null |
2024-10-24 |
Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences |
Weijian Luo et.al. |
2410.18881 |
null |
2024-10-24 |
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation |
Xiaoyu Zhang et.al. |
2410.18830 |
null |
2024-10-24 |
Towards Visual Text Design Transfer Across Languages |
Yejin Choi et.al. |
2410.18823 |
null |
2024-10-24 |
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances |
Shilin Lu et.al. |
2410.18775 |
link |
2024-10-24 |
Ali-AUG: Innovative Approaches to Labeled Data Augmentation using One-Step Diffusion Model |
Ali Hamza et.al. |
2410.18678 |
null |
2024-10-24 |
FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation |
Christopher T. H Teo et.al. |
2410.18615 |
null |
2024-10-24 |
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling |
Zhengqiang Zhang et.al. |
2410.18410 |
link |
2024-10-23 |
Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing |
Dongliang Guo et.al. |
2410.18267 |
null |
2024-10-23 |
WorldSimBench: Towards Video Generation Models as World Simulators |
Yiran Qin et.al. |
2410.18072 |
null |
2024-10-23 |
Scalable Ranked Preference Optimization for Text-to-Image Generation |
Shyamgopal Karthik et.al. |
2410.18013 |
null |
2024-10-23 |
A Wavelet Diffusion GAN for Image Super-Resolution |
Lorenzo Aloisi et.al. |
2410.17966 |
null |
2024-10-23 |
TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation |
Ruicheng Zhang et.al. |
2410.17855 |
null |
2024-10-23 |
VISAGE: Video Synthesis using Action Graphs for Surgery |
Yousef Yeganeh et.al. |
2410.17751 |
null |
2024-10-22 |
Offline Evaluation of Set-Based Text-to-Image Generation |
Negar Arabzadeh et.al. |
2410.17331 |
link |
2024-10-22 |
Altogether: Image Captioning via Re-aligning Alt-text |
Hu Xu et.al. |
2410.17251 |
link |
2024-10-22 |
IdenBAT: Disentangled Representation Learning for Identity-Preserved Brain Age Transformation |
Junyeong Maeng et.al. |
2410.16945 |
link |
2024-10-22 |
DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization |
Haowei Zhu et.al. |
2410.16942 |
null |
2024-10-22 |
Hierarchical Clustering for Conditional Diffusion in Image Generation |
Jorge da Silva Goncalves et.al. |
2410.16910 |
link |
2024-10-22 |
MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model |
Meng Xu et.al. |
2410.16840 |
null |
2024-10-22 |
Progressive Compositionality In Text-to-Image Generative Models |
Xu Han et.al. |
2410.16719 |
link |
2024-10-22 |
Dual-Model Defense: Safeguarding Diffusion Models from Membership Inference Attacks through Disjoint Data Splitting |
Bao Q. Tran et.al. |
2410.16657 |
null |
2024-10-21 |
MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors |
Honghua Chen et.al. |
2410.16272 |
null |
2024-10-21 |
3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors |
Xi Liu et.al. |
2410.16266 |
null |
2024-10-21 |
Elucidating the design space of language models for image generation |
Xuantong Liu et.al. |
2410.16257 |
link |
2024-10-21 |
A Framework for Evaluating Predictive Models Using Synthetic Image Covariates and Longitudinal Data |
Simon Deltadahl et.al. |
2410.16177 |
null |
2024-10-21 |
Continuous Speech Synthesis using per-token Latent Diffusion |
Arnon Turetzky et.al. |
2410.16048 |
null |
2024-10-20 |
EVA: An Embodied World Model for Future Video Anticipation |
Xiaowei Chi et.al. |
2410.15461 |
null |
2024-10-20 |
Allegro: Open the Black Box of Commercial-Level Video Generation Model |
Yuan Zhou et.al. |
2410.15458 |
link |
2024-10-20 |
FrameBridge: Improving Image-to-Video Generation with Bridge Models |
Yuji Wang et.al. |
2410.15371 |
null |
2024-10-19 |
SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning |
Zhewei Dai et.al. |
2410.14987 |
link |
2024-10-19 |
Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence |
Vansh Bansal et.al. |
2410.14949 |
link |
2024-10-18 |
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities |
Shaozhe Hao et.al. |
2410.14672 |
link |
2024-10-18 |
FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models |
Rui Hu et.al. |
2410.14429 |
null |
2024-10-18 |
HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation |
Bo Cheng et.al. |
2410.14324 |
link |
2024-10-18 |
HYPNOS: Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects |
Oliverio Theophilus Nathanael et.al. |
2410.14265 |
null |
2024-10-18 |
Text-to-Image Representativity Fairness Evaluation Framework |
Asma Yamani et.al. |
2410.14201 |
null |
2024-10-18 |
Personalized Image Generation with Large Multimodal Models |
Yiyan Xu et.al. |
2410.14170 |
link |
2024-10-18 |
Assessing Open-world Forgetting in Generative Image Model Customization |
Héctor Laria et.al. |
2410.14159 |
null |
2024-10-17 |
Inference of morphology and dynamical state of nearby $Planck$-SZ galaxy clusters with Zernike polynomials |
Valentina Capalbo et.al. |
2410.13929 |
null |
2024-10-17 |
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens |
Lijie Fan et.al. |
2410.13863 |
null |
2024-10-17 |
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation |
Rongyao Fang et.al. |
2410.13861 |
link |
2024-10-17 |
VidPanos: Generative Panoramic Videos from Casual Panning Videos |
Jingwei Ma et.al. |
2410.13832 |
null |
2024-10-17 |
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control |
Yujie Wei et.al. |
2410.13830 |
null |
2024-10-18 |
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation |
Hanbo Cheng et.al. |
2410.13726 |
link |
2024-10-17 |
Movie Gen: A Cast of Media Foundation Models |
Adam Polyak et.al. |
2410.13720 |
link |
2024-10-17 |
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning |
Yiming Shi et.al. |
2410.13618 |
link |
2024-10-17 |
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation |
Guosheng Zhao et.al. |
2410.13571 |
null |
2024-10-17 |
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models |
Donghao Zhou et.al. |
2410.13370 |
null |
2024-10-18 |
Fundus to Fluorescein Angiography Video Generation as a Retinal Generative Foundation Model |
Weiyi Zhang et.al. |
2410.13242 |
null |
2024-10-16 |
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation |
Jaehong Yoon et.al. |
2410.12761 |
null |
2024-10-16 |
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation |
Dewei Zhou et.al. |
2410.12669 |
link |
2024-10-16 |
Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation |
Kathrin Khadra et.al. |
2410.12542 |
null |
2024-10-16 |
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective |
Yongxin Zhu et.al. |
2410.12490 |
link |
2024-10-16 |
Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks |
Pranjali Pathre et.al. |
2410.12432 |
link |
2024-10-16 |
FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization |
Cheng Yu et.al. |
2410.12312 |
link |
2024-10-16 |
Facing Identity: The Formation and Performance of Identity via Face-Based Artificial Intelligence Technologies |
Wells Lucas Santo et.al. |
2410.12148 |
null |
2024-10-15 |
On the Effectiveness of Dataset Alignment for Fake Image Detection |
Anirudh Sundara Rajan et.al. |
2410.11835 |
null |
2024-10-15 |
KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities |
Hsin-Ping Huang et.al. |
2410.11824 |
null |
2024-10-16 |
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices |
Zhiyuan Ma et.al. |
2410.11795 |
null |
2024-10-15 |
Generative Image Steganography Based on Point Cloud |
Zhong Yangjie et.al. |
2410.11673 |
null |
2024-10-15 |
InvSeg: Test-Time Prompt Inversion for Semantic Segmentation |
Jiayi Lin et.al. |
2410.11473 |
null |
2024-10-15 |
A Simple Approach to Unifying Diffusion-based Conditional Generation |
Xirui Li et.al. |
2410.11439 |
null |
2024-10-15 |
Evolutionary Retrofitting |
Mathurin Videau et.al. |
2410.11330 |
null |
2024-10-15 |
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling |
Guiyu Zhang et.al. |
2410.11236 |
null |
2024-10-14 |
Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models |
Jingzhi Bao et.al. |
2410.10821 |
link |
2024-10-14 |
When Does Perceptual Alignment Benefit Vision Representations? |
Shobhita Sundaram et.al. |
2410.10817 |
null |
2024-10-14 |
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions |
Tianwei Xiong et.al. |
2410.10816 |
link |
2024-10-14 |
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer |
Haotian Tang et.al. |
2410.10812 |
link |
2024-10-14 |
Boosting Camera Motion Control for Video Diffusion Transformers |
Soon Yau Cheong et.al. |
2410.10802 |
null |
2024-10-15 |
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling |
Jian Yang et.al. |
2410.10798 |
null |
2024-10-14 |
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention |
Dejia Xu et.al. |
2410.10774 |
null |
2024-10-14 |
DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships |
Zhang Wan et.al. |
2410.10751 |
null |
2024-10-14 |
Evaluating SQL Understanding in Large Language Models |
Ananya Rahaman et.al. |
2410.10680 |
null |
2024-10-14 |
ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection |
Martin Aubard et.al. |
2410.10554 |
link |
2024-10-11 |
SceneCraft: Layout-Guided 3D Scene Generation |
Xiuyu Yang et.al. |
2410.09049 |
link |
2024-10-11 |
MiRAGeNews: Multimodal Realistic AI-Generated News Detection |
Runsheng Huang et.al. |
2410.09045 |
link |
2024-10-11 |
One-shot Generative Domain Adaptation in 3D GANs |
Ziqiang Li et.al. |
2410.08824 |
link |
2024-10-11 |
Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting |
Purushothaman Natarajan et.al. |
2410.08612 |
link |
2024-10-11 |
Context-Aware Full Body Anonymization using Text-to-Image Diffusion Models |
Pascal Zwick et.al. |
2410.08551 |
link |
2024-10-11 |
Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities |
Abhijay Ghildyal et.al. |
2410.08534 |
null |
2024-10-11 |
Diffusion Models Need Visual Priors for Image Generation |
Xiaoyu Yue et.al. |
2410.08531 |
null |
2024-10-10 |
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis |
Jinbin Bai et.al. |
2410.08261 |
link |
2024-10-10 |
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content |
Qiuheng Wang et.al. |
2410.08260 |
null |
2024-10-10 |
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models |
Xiaoxiao He et.al. |
2410.08207 |
null |
2024-10-10 |
Scaling Laws For Diffusion Transformers |
Zhengyang Liang et.al. |
2410.08184 |
null |
2024-10-10 |
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation |
Jiatao Gu et.al. |
2410.08159 |
null |
2024-10-10 |
RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace |
Pragyan Shrestha et.al. |
2410.08152 |
link |
2024-10-10 |
Progressive Autoregressive Video Diffusion Models |
Desai Xie et.al. |
2410.08151 |
link |
2024-10-10 |
Generated Bias: Auditing Internal Bias Dynamics of Text-To-Image Generative Models |
Abhishek Mandal et.al. |
2410.07884 |
null |
2024-10-10 |
MinorityPrompt: Text to Minority Image Generation via Prompt Optimization |
Soobin Um et.al. |
2410.07838 |
link |
2024-10-10 |
HARIVO: Harnessing Text-to-Image Models for Video Generation |
Mingi Kwon et.al. |
2410.07763 |
null |
2024-10-10 |
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation |
Jiahao Cui et.al. |
2410.07718 |
link |
2024-10-10 |
Relational Diffusion Distillation for Efficient Image Generation |
Weilun Feng et.al. |
2410.07679 |
link |
2024-10-09 |
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation |
Xinchen Zhang et.al. |
2410.07171 |
link |
2024-10-09 |
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis |
Bohan Zeng et.al. |
2410.07155 |
link |
2024-10-10 |
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models |
Rui Zhao et.al. |
2410.07133 |
link |
2024-10-09 |
Personalized Visual Instruction Tuning |
Renjie Pi et.al. |
2410.07113 |
link |
2024-10-09 |
Decouple-Then-Merge: Towards Better Training for Diffusion Models |
Qianli Ma et.al. |
2410.06664 |
null |
2024-10-09 |
On the Solution of Linearized Inverse Scattering Problems in Near-Field Microwave Imaging by Operator Inversion and Matched Filtering |
Matthias M. Saurer et.al. |
2410.06465 |
null |
2024-10-08 |
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization |
Jiawei Mao et.al. |
2410.06244 |
null |
2024-10-08 |
BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way |
Jiazi Bu et.al. |
2410.06241 |
null |
2024-10-08 |
SD-$π$XL: Generating Low-Resolution Quantized Imagery via Score Distillation |
Alexandre Binninger et.al. |
2410.06236 |
link |
2024-10-08 |
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation |
Chi-Lam Cheang et.al. |
2410.06158 |
null |
2024-10-07 |
The Dawn of Video Generation: Preliminary Explorations with SORA-like Models |
Ailing Zeng et.al. |
2410.05227 |
null |
2024-10-07 |
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality |
Ge Ya et.al. |
2410.05203 |
link |
2024-10-07 |
Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning |
Ayano Hiranaka et.al. |
2410.05116 |
null |
2024-10-07 |
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction |
Leheng Li et.al. |
2410.04932 |
null |
2024-10-07 |
PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing |
Feng Tian et.al. |
2410.04844 |
link |
2024-10-07 |
ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction |
Hyungjin Chung et.al. |
2410.04721 |
null |
2024-10-06 |
Realizing Video Summarization from the Path of Language-based Semantic Understanding |
Kuan-Chen Mu et.al. |
2410.04511 |
null |
2024-10-06 |
Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training |
Wenbo Li et.al. |
2410.04439 |
null |
2024-10-06 |
Disentangling Regional Primitives for Image Generation |
Zhengting Chen et.al. |
2410.04421 |
null |
2024-10-05 |
The Visualization JUDGE: Can Multimodal Foundation Models Guide Visualization Design Through Visual Perception? |
Matthew Berger et.al. |
2410.04280 |
null |
2024-10-04 |
Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features |
Benyuan Meng et.al. |
2410.03558 |
link |
2024-10-04 |
Dynamic Diffusion Transformer |
Wangbo Zhao et.al. |
2410.03456 |
link |
2024-10-04 |
Images Speak Volumes: User-Centric Assessment of Image Generation for Accessible Communication |
Miriam Anschütz et.al. |
2410.03430 |
link |
2024-10-04 |
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding |
Doohyuk Jang et.al. |
2410.03355 |
null |
2024-10-04 |
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization |
Zichen Miao et.al. |
2410.03190 |
null |
2024-10-04 |
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach |
Yaofang Liu et.al. |
2410.03160 |
link |
2024-10-04 |
ECHOPulse: ECG controlled echocardio-grams video generation |
Yiwei Li et.al. |
2410.03143 |
link |
2024-10-03 |
Revealing the Unseen: Guiding Personalized Diffusion Models to Expose Training Data |
Xiaoyu Wu et.al. |
2410.03039 |
null |
2024-10-03 |
Loong: Generating Minute-level Long Videos with Autoregressive Language Models |
Yuqing Wang et.al. |
2410.02757 |
null |
2024-10-03 |
SteerDiff: Steering towards Safe Text-to-Image Diffusion Models |
Hongxiang Zhang et.al. |
2410.02710 |
null |
2024-10-03 |
ControlAR: Controllable Image Generation with Autoregressive Models |
Zongming Li et.al. |
2410.02705 |
link |
2024-10-03 |
Grounded Answers for Multi-agent Decision-making Problem through Generative World Model |
Zeyang Liu et.al. |
2410.02664 |
null |
2024-10-03 |
Event-Customized Image Generation |
Zhen Wang et.al. |
2410.02483 |
null |
2024-10-04 |
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation |
Muzhi Zhu et.al. |
2410.02369 |
link |
2024-10-03 |
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration |
Jintao Zhang et.al. |
2410.02367 |
link |
2024-10-03 |
Plug-and-Play Controllable Generation for Discrete Masked Models |
Wei Guo et.al. |
2410.02143 |
null |
2024-10-02 |
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing |
Haotian Sun et.al. |
2410.02098 |
null |
2024-10-02 |
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation |
Jing He et.al. |
2410.02067 |
null |
2024-10-02 |
Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space |
Yangming Li et.al. |
2410.01796 |
null |
2024-10-02 |
ImageFolder: Autoregressive Image Generation with Folded Tokens |
Xiang Li et.al. |
2410.01756 |
link |
2024-10-02 |
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation |
Rinon Gal et.al. |
2410.01731 |
null |
2024-10-02 |
COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation |
Mingzhen Sun et.al. |
2410.01718 |
null |
2024-10-02 |
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding |
Yao Teng et.al. |
2410.01699 |
link |
2024-10-02 |
Data Extrapolation for Text-to-image Generation on Small Datasets |
Senmao Ye et.al. |
2410.01638 |
link |
2024-10-02 |
KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models |
Pouyan Navard et.al. |
2410.01595 |
link |
2024-10-02 |
MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation |
Mingzhen Sun et.al. |
2410.01594 |
link |
2024-10-02 |
Edge-preserving noise for diffusion models |
Jente Vandersanden et.al. |
2410.01540 |
null |
2024-10-02 |
Aggregation of Multi Diffusion Models for Enhancing Learned Representations |
Conghan Yue et.al. |
2410.01262 |
link |
2024-09-30 |
Inverse Painting: Reconstructing The Painting Process |
Bowei Chen et.al. |
2409.20556 |
null |
2024-09-30 |
All-optical autoencoder machine learning framework using diffractive processors |
Peijie Feng et.al. |
2409.20346 |
null |
2024-09-30 |
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs |
Zicheng Zhang et.al. |
2409.20063 |
null |
2024-09-30 |
Illustrious: an Open Advanced Illustration Model |
Sang Hyun Park et.al. |
2409.19946 |
null |
2024-09-30 |
MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation |
Wenchao Chen et.al. |
2409.19937 |
null |
2024-09-30 |
Replace Anyone in Videos |
Xiang Wang et.al. |
2409.19911 |
link |
2024-09-29 |
OrganiQ: Mitigating Classical Resource Bottlenecks of Quantum Generative Adversarial Networks on NISQ-Era Machines |
Daniel Silver et.al. |
2409.19823 |
null |
2024-09-29 |
Simple and Fast Distillation of Diffusion Models |
Zhenyu Zhou et.al. |
2409.19681 |
link |
2024-09-29 |
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection |
Yuhang Ma et.al. |
2409.19624 |
null |
2024-09-29 |
Effective Diffusion Transformer Architecture for Image Super-Resolution |
Kun Cheng et.al. |
2409.19589 |
link |
2024-09-27 |
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation |
Shaowei Liu et.al. |
2409.18964 |
link |
2024-09-27 |
Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions |
Iskander Azangulov et.al. |
2409.18804 |
null |
2024-09-26 |
Realistic Evaluation of Model Merging for Compositional Generalization |
Derek Tam et.al. |
2409.18314 |
link |
2024-09-26 |
Harnessing Wavelet Transformations for Generalizable Deepfake Forgery Detection |
Lalith Bharadwaj Baru et.al. |
2409.18301 |
link |
2024-09-26 |
Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey |
Yi Zhang et.al. |
2409.18214 |
link |
2024-09-26 |
FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner |
Wenliang Zhao et.al. |
2409.18128 |
link |
2024-09-26 |
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction |
Jing He et.al. |
2409.18124 |
null |
2024-09-26 |
DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models |
Helin Cao et.al. |
2409.18092 |
null |
2024-09-26 |
Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion |
Hengrui Gu et.al. |
2409.17928 |
link |
2024-09-26 |
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation |
Qihan Huang et.al. |
2409.17920 |
link |
2024-09-26 |
Text Image Generation for Low-Resource Languages with Dual Translation Learning |
Chihiro Noguchi et.al. |
2409.17747 |
null |
2024-09-26 |
AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status |
Jinghao Zhang et.al. |
2409.17740 |
null |
2024-09-26 |
Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation |
Huan Yang et.al. |
2409.17674 |
null |
2024-09-26 |
ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition |
Shen Li et.al. |
2409.17576 |
null |
2024-09-26 |
Pixel-Space Post-Training of Latent Diffusion Models |
Christina Zhang et.al. |
2409.17565 |
null |
2024-09-25 |
GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design |
Phillip Mueller et.al. |
2409.17045 |
null |
2024-09-25 |
Pose-Guided Fine-Grained Sign Language Video Generation |
Tongkai Shi et.al. |
2409.16709 |
null |
2024-09-25 |
Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation |
Youngwan Jin et.al. |
2409.16706 |
link |
2024-09-25 |
Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement |
Yihao Zhou et.al. |
2409.16661 |
null |
2024-09-25 |
ECG-Image-Database: A Dataset of ECG Images with Real-World Imaging and Scanning Artifacts; A Foundation for Computerized ECG Image Digitization and Analysis |
Matthew A. Reyna et.al. |
2409.16612 |
link |
2024-09-24 |
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation |
Homanga Bharadhwaj et.al. |
2409.16283 |
null |
2024-09-24 |
MonoFormer: One Transformer for Both Diffusion and Autoregression |
Chuyang Zhao et.al. |
2409.16280 |
link |
2024-09-24 |
Label-Augmented Dataset Distillation |
Seoungyoon Kang et.al. |
2409.16239 |
null |
2024-09-24 |
MaskBit: Embedding-free Image Generation via Bit Tokens |
Mark Weber et.al. |
2409.16211 |
link |
2024-09-26 |
Enhanced Unsupervised Image-to-Image Translation Using Contrastive Learning and Histogram of Oriented Gradients |
Wanchen Zhao et.al. |
2409.16042 |
null |
2024-09-24 |
Deep chroma compression of tone-mapped images |
Xenios Milidonis et.al. |
2409.16032 |
link |
2024-09-24 |
Improvements to SDXL in NovelAI Diffusion V3 |
Juan Ossa et.al. |
2409.15997 |
null |
2024-09-23 |
Critic Loss for Image Classification |
Brendan Hogan Rappazzo et.al. |
2409.15565 |
null |
2024-09-23 |
Bayesian computation with generative diffusion models by Multilevel Monte Carlo |
Abdul-Lateef Haji-Ali et.al. |
2409.15511 |
link |
2024-09-23 |
Revealing an Unattractivity Bias in Mental Reconstruction of Occluded Faces using Generative Image Models |
Frederik Riedmann et.al. |
2409.15443 |
null |
2024-09-18 |
Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance |
Jaehoon Joo et.al. |
2409.12099 |
null |
2024-09-18 |
ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation |
Peiyu Li et.al. |
2409.12010 |
link |
2024-09-18 |
Tracking Any Point with Frame-Event Fusion Network at High Frame Rate |
Jiaxiong Liu et.al. |
2409.11953 |
null |
2024-09-18 |
Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation |
Dimitrios Christodoulou et.al. |
2409.11904 |
null |
2024-09-18 |
RaggeDi: Diffusion-based State Estimation of Disordered Rags, Sheets, Towels and Blankets |
Jikai Ye et.al. |
2409.11831 |
null |
2024-09-18 |
GUNet: A Graph Convolutional Network United Diffusion Model for Stable and Diversity Pose Generation |
Shuowen Liang et.al. |
2409.11689 |
link |
2024-09-17 |
Using Physics Informed Generative Adversarial Networks to Model 3D porous media |
Zihan Ren et.al. |
2409.11541 |
null |
2024-09-17 |
OSV: One Step is Enough for High-Quality Image to Video Generation |
Xiaofeng Mao et.al. |
2409.11367 |
null |
2024-09-17 |
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think |
Gonzalo Martin Garcia et.al. |
2409.11355 |
link |
2024-09-17 |
OmniGen: Unified Image Generation |
Shitao Xiao et.al. |
2409.11340 |
link |
2024-09-18 |
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives |
Samee Arif et.al. |
2409.11261 |
link |
2024-09-17 |
Improving the Efficiency of Visually Augmented Language Models |
Paula Ontalvilla et.al. |
2409.11148 |
link |
2024-09-17 |
MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance |
Debin Meng et.al. |
2409.11010 |
link |
2024-09-16 |
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models |
Bingchen Liu et.al. |
2409.10695 |
null |
2024-09-16 |
SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing |
Qi Qian et.al. |
2409.10476 |
null |
2024-09-16 |
VAE-QWGAN: Improving Quantum GANs for High Resolution Image Generation |
Aaron Mark Thomas et.al. |
2409.10339 |
null |
2024-09-16 |
On Synthetic Texture Datasets: Challenges, Creation, and Curation |
Blaine Hoak et.al. |
2409.10297 |
null |
2024-09-16 |
Embodiment-Agnostic Action Planning via Object-Part Scene Flow |
Weiliang Tang et.al. |
2409.10032 |
null |
2024-09-15 |
GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion |
Vitor Guizilini et.al. |
2409.09896 |
null |
2024-09-15 |
Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization |
Haoyuan Sun et.al. |
2409.09774 |
null |
2024-09-15 |
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection |
Yaning Zhang et.al. |
2409.09724 |
link |
2024-09-15 |
Finetuning CLIP to Reason about Pairwise Differences |
Dylan Sam et.al. |
2409.09721 |
link |
2024-09-15 |
E-Commerce Inpainting with Mask Guidance in Controlnet for Reducing Overcompletion |
Guandong Li et.al. |
2409.09681 |
null |
2024-09-13 |
InstantDrag: Improving Interactivity in Drag-based Image Editing |
Joonghyuk Shin et.al. |
2409.08857 |
null |
2024-09-13 |
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment |
Yong Ren et.al. |
2409.08601 |
null |
2024-09-13 |
Enhancing Privacy in ControlNet and Stable Diffusion via Split Learning |
Dixi Yao et.al. |
2409.08503 |
null |
2024-09-12 |
Click2Mask: Local Editing with Dynamic Mask Generation |
Omer Regev et.al. |
2409.08272 |
link |
2024-09-12 |
TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder |
NaHyeon Park et.al. |
2409.08248 |
link |
2024-09-12 |
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation |
Yinwei Wu et.al. |
2409.08240 |
null |
2024-09-12 |
High-Frequency Anti-DreamBooth: Robust Defense Against Image Synthesis |
Takuto Onikubo et.al. |
2409.08167 |
link |
2024-09-12 |
EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance |
Zicheng Duan et.al. |
2409.08091 |
link |
2024-09-12 |
Scribble-Guided Diffusion for Training-free Text-to-Image Generation |
Seonho Lee et.al. |
2409.08026 |
link |
2024-09-11 |
DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures |
Steven Hogue et.al. |
2409.07649 |
null |
2024-09-11 |
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models |
Haibo Yang et.al. |
2409.07452 |
link |
2024-09-11 |
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process |
Yang Luo et.al. |
2409.07451 |
null |
2024-09-11 |
Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy |
Somayeh Pakdelmoez et.al. |
2409.07422 |
null |
2024-09-11 |
EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion |
Jian Zhang et.al. |
2409.07255 |
link |
2024-09-11 |
Bio-Eng-LMM AI Assist chatbot: A Comprehensive Tool for Research and Education |
Ali Forootani et.al. |
2409.07110 |
link |
2024-09-10 |
DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images |
Taslim Murad et.al. |
2409.06694 |
null |
2024-09-10 |
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation |
Teng Hu et.al. |
2409.06633 |
null |
2024-09-10 |
PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation |
Ginger Delmas et.al. |
2409.06535 |
null |
2024-09-10 |
DiffQRCoder: Diffusion-based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement |
Jia-Wei Liao et.al. |
2409.06355 |
null |
2024-09-10 |
G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer |
Jinzhi Zhang et.al. |
2409.06322 |
null |
2024-09-11 |
MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control |
Yining Yao et.al. |
2409.06189 |
null |
2024-09-09 |
SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values |
Chengwei Sun et.al. |
2409.05926 |
null |
2024-09-11 |
DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation |
Wei Wu et.al. |
2409.05463 |
null |
2024-09-09 |
CipherDM: Secure Three-Party Inference for Diffusion Model Sampling |
Xin Zhao et.al. |
2409.05414 |
null |
2024-09-09 |
TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors |
Yichuan Mo et.al. |
2409.05294 |
link |
2024-09-08 |
Can OOD Object Detectors Learn from Foundation Models? |
Jiahui Liu et.al. |
2409.05162 |
link |
2024-09-07 |
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation |
Jiaxin Cheng et.al. |
2409.04847 |
link |
2024-09-07 |
SpotActor: Training-Free Layout-Controlled Consistent Image Generation |
Jiahao Wang et.al. |
2409.04801 |
null |
2024-09-07 |
Multi-Conditioned Denoising Diffusion Probabilistic Model (mDDPM) for Medical Image Synthesis |
Arjun Krishna et.al. |
2409.04670 |
null |
2024-09-06 |
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation |
Yecheng Wu et.al. |
2409.04429 |
link |
2024-09-06 |
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation |
Zhuoyan Luo et.al. |
2409.04410 |
link |
2024-09-06 |
Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks |
Hangcheng Cao et.al. |
2409.04133 |
null |
2024-09-06 |
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task |
Jing Wang et.al. |
2409.04005 |
link |
2024-09-06 |
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes |
Jianbiao Mei et.al. |
2409.04003 |
link |
2024-09-05 |
ArtiFade: Learning to Generate High-quality Subject from Blemished Images |
Shuya Yang et.al. |
2409.03745 |
null |
2024-09-05 |
Blended Latent Diffusion under Attention Control for Real-World Video Editing |
Deyin Liu et.al. |
2409.03514 |
null |
2024-09-05 |
Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks |
Akshay Jain et.al. |
2409.03458 |
link |
2024-09-05 |
Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities |
Wei Lu et.al. |
2409.03444 |
link |
2024-09-09 |
RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning |
Lawrence Yunliang Chen et.al. |
2409.03403 |
null |
2024-09-05 |
Enhancing digital core image resolution using optimal upscaling algorithm: with application to paired SEM images |
Shaohua You et.al. |
2409.03265 |
null |
2024-09-06 |
HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts |
Xinyu Liu et.al. |
2409.02919 |
link |
2024-09-04 |
PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation |
Jun Ling et.al. |
2409.02657 |
null |
2024-09-04 |
Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects |
Kyungmin Jo et.al. |
2409.02653 |
null |
2024-09-05 |
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency |
Jianwen Jiang et.al. |
2409.02634 |
null |
2024-09-04 |
StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models |
Wen Li et.al. |
2409.02543 |
link |
2024-09-04 |
A Learnable Color Correction Matrix for RAW Reconstruction |
Anqi Liu et.al. |
2409.02497 |
null |
2024-09-04 |
Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing |
Siyi Chen et.al. |
2409.02374 |
link |
2024-09-03 |
QID$^2$: An Image-Conditioned Diffusion Model for Q-space Up-sampling of DWI Data |
Zijian Chen et.al. |
2409.02309 |
null |
2024-09-03 |
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos |
Wenbo Hu et.al. |
2409.02095 |
link |
2024-09-03 |
Probing Noncentrosymmetric 2D Materials by Fourier Space Second Harmonic Imaging |
Lucas Lafeta et.al. |
2409.02071 |
null |
2024-08-30 |
CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion |
Yiran Chen et.al. |
2408.17424 |
null |
2024-08-30 |
Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution |
Yixin Wu et.al. |
2408.17285 |
null |
2024-08-30 |
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers |
Juncan Deng et.al. |
2408.17131 |
null |
2024-08-30 |
FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition |
Chen Hu et.al. |
2408.17090 |
link |
2024-08-30 |
Text-to-Image Generation Via Energy-Based CLIP |
Roy Ganz et.al. |
2408.17046 |
null |
2024-08-30 |
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding |
Yonghui Wang et.al. |
2408.16986 |
link |
2024-08-30 |
Contrastive Learning with Synthetic Positives |
Dewen Zeng et.al. |
2408.16965 |
link |
2024-08-29 |
STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models |
Koushik Srivatsan et.al. |
2408.16807 |
link |
2024-09-04 |
CSGO: Content-Style Composition in Text-to-Image Generation |
Peng Xing et.al. |
2408.16766 |
null |
2024-08-29 |
One-Shot Learning Meets Depth Diffusion in Multi-Object Videos |
Anisha Jain et.al. |
2408.16704 |
null |
2024-08-29 |
GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models |
Moreno D’Incà et.al. |
2408.16700 |
link |
2024-08-29 |
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving |
Yongjie Fu et.al. |
2408.16647 |
null |
2024-08-29 |
RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model |
Zhuan Shi et.al. |
2408.16634 |
null |
2024-08-29 |
GRPose: Learning Graph Relations for Human Image Generation with Pose Priors |
Xiangchen Yin et.al. |
2408.16540 |
link |
2024-08-29 |
Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation |
Xiaoyu Jin et.al. |
2408.16506 |
null |
2024-08-29 |
Spiking Diffusion Models |
Jiahang Cao et.al. |
2408.16467 |
link |
2024-08-29 |
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding |
Minghang Zheng et.al. |
2408.16314 |
link |
2024-08-29 |
Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation |
Yanghao Wang et.al. |
2408.16266 |
link |
2024-08-28 |
Disentangled Diffusion Autoencoder for Harmonization of Multi-site Neuroimaging Data |
Ayodeji Ijishakin et.al. |
2408.15890 |
null |
2024-08-28 |
GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model |
Yongjie Fu et.al. |
2408.15868 |
null |
2024-08-28 |
Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas |
Fabio Quattrini et.al. |
2408.15660 |
link |
2024-08-28 |
Hand1000: Generating Realistic Hands from Text with Only 1,000 Images |
Haozhuo Zhang et.al. |
2408.15461 |
null |
2024-08-28 |
Avoiding Generative Model Writer’s Block With Embedding Nudging |
Ali Zand et.al. |
2408.15450 |
null |
2024-08-27 |
GenRec: Unifying Video Generation and Recognition with Diffusion Models |
Zejia Weng et.al. |
2408.15241 |
link |
2024-08-27 |
Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance |
Weiyi Zhang et.al. |
2408.15217 |
link |
2024-08-27 |
Alfie: Democratising RGBA Image Generation With No $$$ |
Fabio Quattrini et.al. |
2408.14826 |
link |
2024-08-27 |
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation |
Abdelrahman Eldesokey et.al. |
2408.14819 |
null |
2024-08-27 |
CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis |
Weijia Li et.al. |
2408.14765 |
null |
2024-08-27 |
Sequential-Scanning Dual-Energy CT Imaging Using High Temporal Resolution Image Reconstruction and Error-Compensated Material Basis Image Generation |
Qiaoxin Li et.al. |
2408.14754 |
null |
2024-08-27 |
Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation |
Bochao Liu et.al. |
2408.14738 |
null |
2024-08-26 |
DIAGen: Diverse Image Augmentation with Generative Models |
Tobias Lingenberg et.al. |
2408.14584 |
link |
2024-08-26 |
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy |
Peiyan Li et.al. |
2408.14368 |
link |
2024-08-26 |
ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty |
Xindi Wu et.al. |
2408.14339 |
null |
2024-08-26 |
Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models |
Chaohua Shi et.al. |
2408.14135 |
null |
2024-08-26 |
SurGen: Text-Guided Diffusion Model for Surgical Video Generation |
Joseph Cho et.al. |
2408.14028 |
null |
2024-08-27 |
RT-Attack: Jailbreaking Text-to-Image Models via Random Token |
Sensen Gao et.al. |
2408.13896 |
null |
2024-08-25 |
Prior Learning in Introspective VAEs |
Ioannis Athanasiadis et.al. |
2408.13805 |
null |
2024-08-25 |
SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting |
Wenrui Li et.al. |
2408.13711 |
link |
2024-08-27 |
Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing |
Yitong Yang et.al. |
2408.13623 |
null |
2024-08-24 |
DualAnoDiff: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation |
Ying Jin et.al. |
2408.13509 |
link |
2024-08-24 |
Explainable Concept Generation through Vision-Language Preference Learning |
Aditya Taparia et.al. |
2408.13438 |
null |
2024-08-23 |
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities |
Tao Wu et.al. |
2408.13239 |
link |
2024-08-23 |
Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation |
Bonan Li et.al. |
2408.13149 |
null |
2024-08-23 |
G3FA: Geometry-guided GAN for Face Animation |
Alireza Javanmardi et.al. |
2408.13049 |
null |
2024-08-23 |
EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation |
Cong Wang et.al. |
2408.13005 |
null |
2024-08-22 |
Unlocking Intrinsic Fairness in Stable Diffusion |
Eunji Kim et.al. |
2408.12692 |
null |
2024-08-22 |
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations |
Can Qin et.al. |
2408.12590 |
null |
2024-08-22 |
Real-Time Video Generation with Pyramid Attention Broadcast |
Xuanlei Zhao et.al. |
2408.12588 |
link |
2024-08-25 |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation |
Jinheng Xie et.al. |
2408.12528 |
null |
2024-08-22 |
CODE: Confident Ordinary Differential Editing |
Bastien van Delft et.al. |
2408.12418 |
link |
2024-08-22 |
Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce |
Ádám Tibor Czapp et.al. |
2408.12392 |
null |
2024-08-22 |
Scalable Autoregressive Image Generation with Mamba |
Haopeng Li et.al. |
2408.12245 |
link |
2024-08-22 |
MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient |
Yanzeng Li et.al. |
2408.12236 |
null |
2024-08-22 |
BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking |
Hanzheng Wang et.al. |
2408.12232 |
null |
2024-08-22 |
DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models |
Wuchao Li et.al. |
2408.12153 |
null |
2024-08-21 |
Approaching Deep Learning through the Spectral Dynamics of Weights |
David Yunis et.al. |
2408.11804 |
link |
2024-08-21 |
DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework |
Zhifei Xie et.al. |
2408.11788 |
null |
2024-08-21 |
Iterative Object Count Optimization for Text-to-image Diffusion Models |
Oz Zafar et.al. |
2408.11721 |
null |
2024-08-21 |
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting |
Liyao Jiang et.al. |
2408.11706 |
null |
2024-08-21 |
TrackGo: A Flexible and Efficient Method for Controllable Video Generation |
Haitao Zhou et.al. |
2408.11475 |
null |
2024-08-21 |
Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection |
Jingwei Sun et.al. |
2408.11408 |
link |
2024-08-21 |
Gender Bias Evaluation in Text-to-image Generation: A Survey |
Yankun Wu et.al. |
2408.11358 |
null |
2024-08-21 |
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation |
Xiangyu Zhao et.al. |
2408.11305 |
link |
2024-08-20 |
Compress Guidance in Conditional Diffusion Sampling |
Anh-Dung Dinh et.al. |
2408.11194 |
null |
2024-08-20 |
MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data |
Jian Wang et.al. |
2408.11135 |
null |
2024-08-20 |
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning |
Haoning Wu et.al. |
2408.11001 |
link |
2024-08-20 |
A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse |
Zhongliang Guo et.al. |
2408.10901 |
null |
2024-08-21 |
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration |
Yanbo Ding et.al. |
2408.10605 |
link |
2024-08-20 |
Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models |
Cong Wan et.al. |
2408.10571 |
link |
2024-08-19 |
Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation |
Liu He et.al. |
2408.10453 |
null |
2024-08-19 |
The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks |
Niyar R Barman et.al. |
2408.10446 |
null |
2024-08-19 |
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data |
Tao Yang et.al. |
2408.10119 |
null |
2024-08-19 |
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation |
Yunxin Li et.al. |
2408.09787 |
link |
2024-08-19 |
TraDiffusion: Trajectory-Based Training-Free Image Generation |
Mingrui Wu et.al. |
2408.09739 |
link |
2024-08-21 |
Reconstruct Spine CT from Biplanar X-Rays via Diffusion Learning |
Zhi Qiao et.al. |
2408.09731 |
null |
2024-08-18 |
AnomalyFactory: Regard Anomaly Generation as Unsupervised Anomaly Localization |
Ying Zhao et.al. |
2408.09533 |
null |
2024-08-18 |
Deformation-aware GAN for Medical Image Synthesis with Substantially Misaligned Pairs |
Bowen Xin et.al. |
2408.09432 |
null |
2024-08-18 |
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama |
Jing Tang et.al. |
2408.09333 |
link |
2024-08-16 |
PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future |
Guangyi Wang et.al. |
2408.08822 |
link |
2024-08-16 |
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation |
Peiming Guo et.al. |
2408.08650 |
link |
2024-08-16 |
Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness |
Hefei Mei et.al. |
2408.08502 |
link |
2024-08-15 |
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations |
Xiaochuang Han et.al. |
2408.08459 |
null |
2024-08-15 |
METR: Image Watermarking with Large Number of Unique Messages |
Alexander Varlamov et.al. |
2408.08340 |
link |
2024-08-15 |
Can Large Language Models Understand Symbolic Graphics Programs? |
Zeju Qiu et.al. |
2408.08313 |
null |
2024-08-15 |
Accelerated Image-Aware Generative Diffusion Modeling |
Tanmay Asthana et.al. |
2408.08306 |
null |
2024-08-15 |
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding |
Xiner Li et.al. |
2408.08252 |
link |
2024-08-16 |
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance |
Jiasong Feng et.al. |
2408.08189 |
null |
2024-08-15 |
When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding |
Pingping Zhang et.al. |
2408.08093 |
null |
2024-08-15 |
A Novel Generative Artificial Intelligence Method for Interference Study on Multiplex Brightfield Immunohistochemistry Images |
Satarupa Mukherjee et.al. |
2408.07860 |
null |
2024-08-14 |
Boosting Unconstrained Face Recognition with Targeted Style Adversary |
Mohammad Saeed Ebrahimi Saadabadi et.al. |
2408.07642 |
null |
2024-08-14 |
Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving |
Yuqing Wen et.al. |
2408.07605 |
null |
2024-08-14 |
KIND: Knowledge Integration and Diversion in Diffusion Models |
Yucheng Xie et.al. |
2408.07337 |
link |
2024-08-13 |
Generative Photomontage |
Sean J. Liu et.al. |
2408.07116 |
null |
2024-08-13 |
Definition of multispectral camera system parameters to model the asteroid 2001 SN263 |
Gabriela de Carvalho Assis Goulart et.al. |
2408.06886 |
null |
2024-08-13 |
Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective |
Ouxiang Li et.al. |
2408.06741 |
link |
2024-08-13 |
DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion |
Yujia Wu et.al. |
2408.06740 |
null |
2024-08-13 |
DiffSG: A Generative Solver for Network Optimization with Diffusion Model |
Ruihuai Liang et.al. |
2408.06701 |
link |
2024-08-12 |
Prompt Recovery for Image Generation Models: A Comparative Study of Discrete Optimizers |
Joshua Nathaniel Williams et.al. |
2408.06502 |
null |
2024-08-15 |
ControlNeXt: Powerful and Efficient Control for Image and Video Generation |
Bohao Peng et.al. |
2408.06070 |
link |
2024-08-10 |
ZePo: Zero-Shot Portrait Stylization with Faster Sampling |
Jin Liu et.al. |
2408.05492 |
link |
2024-08-10 |
Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE |
Yiying Yang et.al. |
2408.05477 |
null |
2024-08-10 |
Artworks Reimagined: Exploring Human-AI Co-Creation through Body Prompting |
Jonas Oppenlaender et.al. |
2408.05476 |
null |
2024-08-10 |
High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model |
Weizhi Zhong et.al. |
2408.05416 |
null |
2024-08-09 |
Instruction Tuning-free Visual Token Complement for Multimodal LLMs |
Dongsheng Wang et.al. |
2408.05019 |
null |
2024-08-09 |
DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting |
Jihoon Lee et.al. |
2408.04962 |
null |
2024-08-08 |
Deep Learning-based Unsupervised Domain Adaptation via a Unified Model for Prostate Lesion Detection Using Multisite Bi-parametric MRI Datasets |
Hao Li et.al. |
2408.04777 |
null |
2024-08-08 |
Zero-Shot Uncertainty Quantification using Diffusion Probabilistic Models |
Dule Shu et.al. |
2408.04718 |
null |
2024-08-08 |
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics |
Ruining Li et.al. |
2408.04631 |
null |
2024-08-07 |
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling |
William Y. Zhu et.al. |
2408.04102 |
link |
2024-08-07 |
Counterfactuals and Uncertainty-Based Explainable Paradigm for the Automated Detection and Segmentation of Renal Cysts in Computed Tomography Images: A Multi-Center Study |
Zohaib Salahuddin et.al. |
2408.03789 |
null |
2024-08-07 |
Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model |
Guoqing Zhu et.al. |
2408.03748 |
link |
2024-08-07 |
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling |
Zilyu Ye et.al. |
2408.03695 |
link |
2024-08-07 |
A comparative study of generative adversarial networks for image recognition algorithms based on deep learning and traditional methods |
Yihao Zhong et.al. |
2408.03568 |
null |
2024-08-06 |
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey |
Vu Tuan Truong et.al. |
2408.03400 |
null |
2024-08-06 |
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts |
Ciara Rowles et.al. |
2408.03209 |
null |
2024-08-06 |
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion |
Xingguang Yan et.al. |
2408.03178 |
null |
2024-08-06 |
Iterative CT Reconstruction via Latent Variable Optimization of Shallow Diffusion Models |
Sho Ozaki et.al. |
2408.03156 |
null |
2024-08-06 |
Multitask and Multimodal Neural Tuning for Large Models |
Hao Sun et.al. |
2408.03001 |
null |
2024-08-06 |
DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model |
Yiming Zhong et.al. |
2408.02993 |
link |
2024-08-05 |
Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services |
Shaopeng Fu et.al. |
2408.02814 |
link |
2024-08-05 |
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining |
Dongyang Liu et.al. |
2408.02657 |
link |
2024-08-05 |
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation |
Zhiyu Tan et.al. |
2408.02629 |
null |
2024-08-06 |
ProCreate, Don’t Reproduce! Propulsive Energy Diffusion for Creative Generation |
Jack Lu et.al. |
2408.02226 |
link |
2024-08-04 |
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance |
Aoming Liu et.al. |
2408.02157 |
null |
2024-08-04 |
LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation |
Dwij Mehta et.al. |
2408.02078 |
null |
2024-08-04 |
Step Saver: Predicting Minimum Denoising Steps for Diffusion Model Image Generation |
Jean Yu et.al. |
2408.02054 |
null |
2024-08-04 |
Robustness of Watermarking on Text-to-Image Diffusion Models |
Xiaodong Wu et.al. |
2408.02035 |
null |
2024-08-03 |
SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm |
Junyan Ye et.al. |
2408.01812 |
null |
2024-08-03 |
A Novel Evaluation Framework for Image2Text Generation |
Jia-Hong Huang et.al. |
2408.01723 |
null |
2024-08-03 |
Controllable Unlearning for Image-to-Image Generative Models via $\varepsilon$-Constrained Optimization |
Xiaohua Feng et.al. |
2408.01689 |
null |
2024-08-02 |
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling |
Qian Zhang et.al. |
2408.01181 |
link |
2024-08-02 |
PINNs for Medical Image Analysis: A Survey |
Chayan Banerjee et.al. |
2408.01026 |
null |
2024-08-02 |
EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts |
Die Chen et.al. |
2408.01014 |
null |
2024-08-02 |
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation |
Xiang Gao et.al. |
2408.00998 |
link |
2024-08-01 |
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention |
Susung Hong et.al. |
2408.00760 |
link |
2024-08-01 |
Synthetic dual image generation for reduction of labeling efforts in semantic segmentation of micrographs with a customized metric function |
Matias Oscar Volman Stern et.al. |
2408.00707 |
null |
2024-08-01 |
A new approach for encoding code and assisting code understanding |
Mengdan Fan et.al. |
2408.00521 |
null |
2024-08-01 |
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion |
Manuel Kansy et.al. |
2408.00458 |
null |
2024-08-01 |
Towards Reliable Advertising Image Generation Using Human Feedback |
Zhenbang Du et.al. |
2408.00418 |
link |
2024-08-01 |
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving |
Xuemeng Yang et.al. |
2408.00415 |
null |
2024-08-01 |
On the Limitations and Prospects of Machine Unlearning for Generative AI |
Shiji Zhou et.al. |
2408.00376 |
null |
2024-08-01 |
Few-shot Defect Image Generation based on Consistency Modeling |
Qingfeng Shi et.al. |
2408.00372 |
link |
2024-08-01 |
Navigating Text-to-Image Generative Bias across Indic Languages |
Surbhi Mittal et.al. |
2408.00283 |
null |
2024-07-31 |
WAS: Dataset and Methods for Artistic Text Segmentation |
Xudong Xie et.al. |
2408.00106 |
link |
2024-07-31 |
Detecting, Explaining, and Mitigating Memorization in Diffusion Models |
Yuxin Wen et.al. |
2407.21720 |
link |
2024-07-31 |
Tora: Trajectory-oriented Diffusion Transformer for Video Generation |
Zhenghao Zhang et.al. |
2407.21705 |
link |
2024-07-31 |
Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation |
Junxuan Yu et.al. |
2407.21490 |
null |
2024-07-31 |
Fine-gained Zero-shot Video Sampling |
Dengsheng Chen et.al. |
2407.21475 |
null |
2024-07-31 |
Deformable 3D Shape Diffusion Model |
Dengsheng Chen et.al. |
2407.21428 |
null |
2024-07-31 |
Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model |
Zhichao Zhang et.al. |
2407.21408 |
null |
2024-07-31 |
Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging |
Wenhua Wu et.al. |
2407.21381 |
null |
2024-07-31 |
ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images |
Xilei Zhu et.al. |
2407.21363 |
null |
2024-07-30 |
Adding Multi-modal Controls to Whole-body Human Motion Generation |
Yuxuan Bian et.al. |
2407.21136 |
link |
2024-07-29 |
Retinex-Diffusion: On Controlling Illumination Conditions in Diffusion Models via Retinex Theory |
Xiaoyan Xing et.al. |
2407.20785 |
null |
2024-07-30 |
Understanding the Impact of Synchronous, Asynchronous, and Hybrid In-Situ Techniques in Computational Fluid Dynamics Applications |
Yi Ju et.al. |
2407.20717 |
null |
2024-07-30 |
DocXPand-25k: a large and diverse benchmark dataset for identity documents analysis |
Julien Lerouge et.al. |
2407.20662 |
link |
2024-07-30 |
Autonomous Improvement of Instruction Following Skills via Foundation Models |
Zhiyuan Zhou et.al. |
2407.20635 |
link |
2024-07-30 |
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos |
Aashish Rai et.al. |
2407.20592 |
null |
2024-07-29 |
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities |
Lorenzo Baraldi et.al. |
2407.20337 |
link |
2024-07-29 |
MaskInversion: Localized Embeddings via Optimization of Explainability Maps |
Walid Bousselham et.al. |
2407.20034 |
null |
2024-07-29 |
Reproducibility Study of “ITI-GEN: Inclusive Text-to-Image Generation” |
Daniel Gallo Fernández et.al. |
2407.19996 |
link |
2024-07-29 |
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention |
Yu Lu et.al. |
2407.19918 |
null |
2024-07-29 |
Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture |
Stefanos Gkikas et.al. |
2407.19811 |
null |
2024-07-28 |
Temporal Feature Matters: A Framework for Diffusion Model Quantization |
Yushi Huang et.al. |
2407.19547 |
null |
2024-07-28 |
VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary |
Hanjun Luo et.al. |
2407.19524 |
link |
2024-07-28 |
MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability |
Buyu Liu et.al. |
2407.19468 |
link |
2024-07-28 |
FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models |
Changgu Chen et.al. |
2407.19453 |
link |
2024-07-28 |
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling |
Chengan He et.al. |
2407.19451 |
link |
2024-07-27 |
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions |
Ashkan Taghipour et.al. |
2407.19205 |
null |
2024-07-26 |
SHIC: Shape-Image Correspondences with no Keypoint Supervision |
Aleksandar Shtedritski et.al. |
2407.18907 |
null |
2024-07-26 |
Adversarial Robustification via Text-to-Image Diffusion Models |
Daewon Choi et.al. |
2407.18658 |
link |
2024-07-25 |
AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild |
Junho Park et.al. |
2407.18034 |
link |
2024-07-25 |
Guided Latent Slot Diffusion for Object-Centric Learning |
Krishnakant Singh et.al. |
2407.17929 |
null |
2024-07-25 |
ReCorD: Reasoning and Correcting Diffusion for HOI Generation |
Jian-Yu Jiang-Lin et.al. |
2407.17911 |
link |
2024-07-24 |
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency |
Yiming Xie et.al. |
2407.17470 |
null |
2024-07-24 |
ViPer: Visual Personalization of Generative Models via Individual Preference Learning |
Sogand Salehi et.al. |
2407.17365 |
null |
2024-07-25 |
LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model |
Wanggong Yang et.al. |
2407.17229 |
null |
2024-07-24 |
MemBench: Memorized Image Trigger Prompt Dataset for Diffusion Models |
Chunsan Hong et.al. |
2407.17095 |
link |
2024-07-24 |
An Adaptive Gradient Regularization Method |
Huixiu Jiang et.al. |
2407.16944 |
null |
2024-07-24 |
Synthetic Trajectory Generation Through Convolutional Neural Networks |
Jesse Merhi et.al. |
2407.16938 |
link |
2024-07-23 |
Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions |
Fabio Tosi et.al. |
2407.16698 |
link |
2024-07-23 |
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence |
Canyu Zhao et.al. |
2407.16655 |
null |
2024-07-23 |
On Differentially Private 3D Medical Image Synthesis with Controllable Latent Diffusion Models |
Deniz Daum et.al. |
2407.16405 |
link |
2024-07-23 |
Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data |
Hengyu Fu et.al. |
2407.16134 |
null |
2024-07-23 |
Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos |
Jiahe Liu et.al. |
2407.16124 |
link |
2024-07-22 |
DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design |
Zhi Hao Luo et.al. |
2407.15723 |
link |
2024-07-22 |
SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time |
Stanislav Frolov et.al. |
2407.15507 |
link |
2024-07-22 |
TextureCrop: Enhancing Synthetic Image Detection through Texture-based Cropping |
Despina Konstantinidou et.al. |
2407.15500 |
link |
2024-07-22 |
DiffX: Guide Your Layout to Cross-Modal Generative Modeling |
Zeyu Wang et.al. |
2407.15488 |
link |
2024-07-22 |
Text2Place: Affordance-aware Text Guided Human Placement |
Rishubh Parihar et.al. |
2407.15446 |
null |
2024-07-23 |
BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM |
Hanjun Luo et.al. |
2407.15240 |
link |
2024-07-21 |
Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative Modelling |
Junn Yong Loo et.al. |
2407.15238 |
null |
2024-07-21 |
Flow as the Cross-Domain Manipulation Interface |
Mengda Xu et.al. |
2407.15208 |
null |
2024-07-21 |
The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation |
Junwen Luo et.al. |
2407.15167 |
null |
2024-07-21 |
Anchored Diffusion for Video Face Reenactment |
Idan Kligvasser et.al. |
2407.15153 |
null |
2024-07-19 |
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation |
Kaiyue Sun et.al. |
2407.14505 |
link |
2024-07-19 |
Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations |
Decheng Liu et.al. |
2407.14367 |
link |
2024-07-19 |
Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model |
Kun Zhao et.al. |
2407.14326 |
null |
2024-07-19 |
Unlearning Concepts from Text-to-Video Diffusion Models |
Shiqi Liu et.al. |
2407.14209 |
null |
2024-07-19 |
Time Series Generative Learning with Application to Brain Imaging Analysis |
Zhenghao Li et.al. |
2407.14003 |
null |
2024-07-18 |
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion |
Boyang Deng et.al. |
2407.13759 |
null |
2024-07-18 |
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models |
Xiaoyu Zhu et.al. |
2407.13642 |
null |
2024-07-18 |
Training-free Composite Scene Generation for Layout-to-Image Synthesis |
Jiaqi Liu et.al. |
2407.13609 |
link |
2024-07-18 |
Multi-sentence Video Grounding for Long Video Generation |
Wei Feng et.al. |
2407.13219 |
null |
2024-07-18 |
Image Inpainting Models are Effective Tools for Instruction-guided Image Editing |
Xuan Ju et.al. |
2407.13139 |
null |
2024-07-19 |
From Principles to Practices: Lessons Learned from Applying Partnership on AI’s (PAI) Synthetic Media Framework to 11 Use Cases |
Claire R. Leibowicz et.al. |
2407.13025 |
null |
2024-07-17 |
Denoising Diffusions in Latent Space for Medical Image Segmentation |
Fahim Ahmed Zaman et.al. |
2407.12952 |
link |
2024-07-17 |
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control |
Sherwin Bahmani et.al. |
2407.12781 |
null |
2024-07-17 |
Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs |
Yiqing Shen et.al. |
2407.12678 |
link |
2024-07-17 |
Zero-shot Text-guided Infinite Image Synthesis with LLM guidance |
Soyeong Kwon et.al. |
2407.12642 |
null |
2024-07-17 |
Towards Understanding Unsafe Video Generation |
Yan Pang et.al. |
2407.12581 |
link |
2024-07-17 |
The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation |
Yi Yao et.al. |
2407.12579 |
null |
2024-07-17 |
I2AM: Interpreting Image-to-Image Latent Diffusion Models via Attribution Maps |
Junseo Park et.al. |
2407.12331 |
null |
2024-07-17 |
Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process |
Yang Cheng et.al. |
2407.12261 |
null |
2024-07-18 |
Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts |
Sam Yu-Te Lee et.al. |
2407.12192 |
null |
2024-07-16 |
Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis |
Haeil Lee et.al. |
2407.12173 |
null |
2024-07-16 |
Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning |
Yanting Miao et.al. |
2407.12164 |
link |
2024-07-16 |
Efficient Training with Denoised Neural Weights |
Yifan Gong et.al. |
2407.11966 |
null |
2024-07-16 |
Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model |
Dominik Winter et.al. |
2407.11664 |
null |
2024-07-16 |
Scaling Diffusion Transformers to 16 Billion Parameters |
Zhengcong Fei et.al. |
2407.11633 |
link |
2024-07-16 |
DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training |
Guillermo Jimenez-Perez et.al. |
2407.11594 |
null |
2024-07-16 |
How Control Information Influences Multilingual Text Image Generation and Editing? |
Boqiang Zhang et.al. |
2407.11502 |
link |
2024-07-16 |
AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models |
Lei Ren et.al. |
2407.11480 |
null |
2024-07-16 |
Cover-separable Fixed Neural Network Steganography via Deep Generative Models |
Guobiao Li et.al. |
2407.11405 |
link |
2024-07-16 |
Flatfish Disease Detection Based on Part Segmentation Approach and Disease Image Generation |
Seo-Bin Hwang et.al. |
2407.11348 |
null |
2024-07-16 |
Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems |
Yaşar Utku Alçalar et.al. |
2407.11288 |
null |
2024-07-15 |
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation |
Yuanhao Zhai et.al. |
2407.10937 |
link |
2024-07-15 |
OPa-Ma: Text Guided Mamba for 360-degree Image Out-painting |
Penglei Gao et.al. |
2407.10923 |
null |
2024-07-16 |
DataDream: Few-shot Guided Dataset Generation |
Jae Myung Kim et.al. |
2407.10910 |
link |
2024-07-15 |
Optical Diffusion Models for Image Generation |
Ilker Oguz et.al. |
2407.10897 |
null |
2024-07-15 |
Physics-Inspired Generative Models in Medical Imaging: A Review |
Dennis Hein et.al. |
2407.10856 |
null |
2024-07-15 |
An Autonomous Drone Swarm for Detecting and Tracking Anomalies among Dense Vegetation |
Rakesh John Amala Arokia Nathan et.al. |
2407.10754 |
null |
2024-07-15 |
AccDiffusion: An Accurate Method for Higher-Resolution Image Generation |
Zhihang Lin et.al. |
2407.10738 |
link |
2024-07-15 |
Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval |
Youngsun Lim et.al. |
2407.10683 |
null |
2024-07-15 |
Spatio-temporal neural distance fields for conditional generative modeling of the heart |
Kristine Sørensen et.al. |
2407.10663 |
link |
2024-07-15 |
A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication |
Jingyi Deng et.al. |
2407.10575 |
null |
2024-07-12 |
FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 |
Georgios Makridis et.al. |
2407.09467 |
null |
2024-07-12 |
PID: Physics-Informed Diffusion Model for Infrared Image Generation |
Fangyuan Mao et.al. |
2407.09299 |
link |
2024-07-12 |
Surgical Text-to-Image Generation |
Chinedu Innocent Nwoye et.al. |
2407.09230 |
null |
2024-07-12 |
DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training |
Chen Xin et.al. |
2407.09174 |
link |
2024-07-12 |
Machine Apophenia: The Kaleidoscopic Generation of Architectural Images |
Alexey Tikhonov et.al. |
2407.09172 |
null |
2024-07-12 |
Inference Optimization of Foundation Models on AI Accelerators |
Youngsuk Park et.al. |
2407.09111 |
null |
2024-07-12 |
Bora: Biomedical Generalist Video Generation Model |
Weixiang Sun et.al. |
2407.08944 |
null |
2024-07-11 |
SEED-Story: Multimodal Long Story Generation with Large Language Model |
Shuai Yang et.al. |
2407.08683 |
link |
2024-07-11 |
CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs |
Leah Chong et.al. |
2407.08675 |
null |
2024-07-11 |
Still-Moving: Customized Video Generation without Customized Video Data |
Hila Chefer et.al. |
2407.08674 |
null |
2024-07-11 |
A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights |
Wentao Lei et.al. |
2407.08428 |
link |
2024-07-11 |
E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors |
Jinxiu Liang et.al. |
2407.08231 |
null |
2024-07-10 |
Generative Image as Action Models |
Mohit Shridhar et.al. |
2407.07875 |
link |
2024-07-10 |
StoryDiffusion: How to Support UX Storyboarding With Generative-AI |
Zhaohui Liang et.al. |
2407.07672 |
null |
2024-07-10 |
VEnhancer: Generative Space-Time Enhancement for Video Generation |
Jingwen He et.al. |
2407.07667 |
null |
2024-07-11 |
Trainable Highly-expressive Activation Functions |
Irit Chelly et.al. |
2407.07564 |
link |
2024-07-10 |
Video-to-Audio Generation with Hidden Alignment |
Manjie Xu et.al. |
2407.07464 |
null |
2024-07-10 |
Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis |
Jian-Qing Zheng et.al. |
2407.07295 |
link |
2024-07-09 |
Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion |
Yu Cao et.al. |
2407.07249 |
null |
2024-07-09 |
Accelerating Mobile Edge Generation (MEG) by Constrained Learning |
Xiaoxia Xu et.al. |
2407.07245 |
null |
2024-07-09 |
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction |
Shaozhe Hao et.al. |
2407.07077 |
link |
2024-07-09 |
Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation |
Filipe Lauar et.al. |
2407.06950 |
link |
2024-07-09 |
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance |
Guian Fang et.al. |
2407.06937 |
link |
2024-07-09 |
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning |
Fanyue Wei et.al. |
2407.06642 |
link |
2024-07-09 |
Mobius: An High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task |
Yiran Yang et.al. |
2407.06617 |
link |
2024-07-09 |
Sketch-Guided Scene Image Generation |
Tianyu Zhang et.al. |
2407.06469 |
null |
2024-07-08 |
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions |
Xuan Ju et.al. |
2407.06358 |
null |
2024-07-08 |
Dynamics of quantum turbulence in axially rotating thermal counterflow |
Ritesh Dwivedi et.al. |
2407.06311 |
link |
2024-07-08 |
VIMI: Grounding Video Generation through Multi-modal Instruction |
Yuwei Fang et.al. |
2407.06304 |
null |
2024-07-08 |
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation |
Yu Zeng et.al. |
2407.06187 |
null |
2024-07-08 |
The Tug-of-War Between Deepfake Generation and Detection |
Hannah Lee et.al. |
2407.06174 |
null |
2024-07-08 |
PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models |
Jinhua Zhang et.al. |
2407.06109 |
link |
2024-07-08 |
MMIS: Multimodal Dataset for Interior Scene Visual Generation and Recognition |
Hozaifa Kassab et.al. |
2407.05980 |
null |
2024-07-08 |
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models |
Yibo Miao et.al. |
2407.05965 |
null |
2024-07-08 |
3D Vessel Graph Generation Using Denoising Diffusion |
Chinmay Prabhakar et.al. |
2407.05842 |
link |
2024-07-08 |
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing |
Zhenyu Wang et.al. |
2407.05600 |
null |
2024-07-08 |
This&That: Language-Gesture Controlled Video Generation for Robot Planning |
Boyang Wang et.al. |
2407.05530 |
null |
2024-07-07 |
Diffusion as Sound Propagation: Physics-inspired Model for Ultrasound Image Generation |
Marina Domínguez et.al. |
2407.05428 |
link |
2024-07-07 |
Enhancing Label-efficient Medical Image Segmentation with Text-guided Diffusion Models |
Chun-Mei Feng et.al. |
2407.05323 |
null |
2024-07-05 |
PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation |
Yinghua Yao et.al. |
2407.04493 |
link |
2024-07-05 |
Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator |
Mehryar Abbasi et.al. |
2407.04258 |
null |
2024-07-04 |
Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion |
Yutian Zhong et.al. |
2407.03992 |
link |
2024-07-04 |
Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection |
Federico Girella et.al. |
2407.03961 |
link |
2024-07-04 |
Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations |
Zhiyang Xu et.al. |
2407.03604 |
null |
2024-07-03 |
BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations |
Zhantao Yang et.al. |
2407.03314 |
null |
2024-07-03 |
Towards High Resolution Real-Time Optical Flow Particle Image Velocimetry |
Juan Pimienta et.al. |
2407.03057 |
null |
2024-07-03 |
Robot Shape and Location Retention in Video Generation Using Diffusion Models |
Peng Wang et.al. |
2407.02873 |
link |
2024-07-03 |
Representation learning with CGAN for casual inference |
Zhaotian Weng et.al. |
2407.02825 |
null |
2024-07-03 |
Mobile Edge Generation-Enabled Digital Twin: Architecture Design and Research Opportunities |
Xiaoxia Xu et.al. |
2407.02804 |
link |
2024-07-02 |
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation |
Kepan Nan et.al. |
2407.02371 |
null |
2024-07-04 |
UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks |
Jingjing Ren et.al. |
2407.02158 |
null |
2024-07-02 |
SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules |
Suyi Li et.al. |
2407.02031 |
null |
2024-07-04 |
GVDIFF: Grounded Text-to-Video Generation with Diffusion Models |
Huanzhang Dou et.al. |
2407.01921 |
null |
2024-07-01 |
Label-free Neural Semantic Image Synthesis |
Jiayi Wang et.al. |
2407.01790 |
null |
2024-06-30 |
BADM: Batch ADMM for Deep Learning |
Ouya Wang et.al. |
2407.01640 |
null |
2024-07-01 |
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective |
Mingxiang Liao et.al. |
2407.01094 |
link |
2024-06-30 |
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation |
Haofan Wang et.al. |
2407.00788 |
link |
2024-06-30 |
Chest-Diffusion: A Light-Weight Text-to-Image Model for Report-to-CXR Generation |
Peng Huang et.al. |
2407.00752 |
null |
2024-06-30 |
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation |
Mushui Liu et.al. |
2407.00737 |
null |
2024-06-28 |
Wavelets Are All You Need for Autoregressive Image Generation |
Wael Mattar et.al. |
2406.19997 |
null |
2024-06-28 |
Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs |
Sangwon Jeong et.al. |
2406.19987 |
null |
2024-06-28 |
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance |
Yuang Zhang et.al. |
2406.19680 |
null |
2024-06-28 |
PopAlign: Population-Level Alignment for Fair Text-to-Image Generation |
Shufan Li et.al. |
2406.19668 |
link |
2024-06-28 |
Network Bending of Diffusion Models for Audio-Visual Generation |
Luke Dzwonczyk et.al. |
2406.19589 |
link |
2024-06-27 |
What Matters in Detecting AI-Generated Videos like Sora? |
Chirui Chang et.al. |
2406.19568 |
null |
2024-06-27 |
Understanding Modality Preferences in Search Clarification |
Leila Tavakoli et.al. |
2406.19546 |
link |
2024-06-27 |
Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model |
Jiangtong Tan et.al. |
2406.19030 |
link |
2024-06-28 |
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation |
Yanan Sun et.al. |
2406.18958 |
link |
2024-06-27 |
CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation |
Zuo Zuo et.al. |
2406.18941 |
null |
2024-06-26 |
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data |
William Berman et.al. |
2406.18790 |
null |
2024-06-26 |
MultiDiff: Consistent Novel View Synthesis from a Single Image |
Norman Müller et.al. |
2406.18524 |
null |
2024-06-26 |
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation |
Shenghai Yuan et.al. |
2406.18522 |
link |
2024-06-26 |
DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance |
Younghyun Kim et.al. |
2406.18459 |
link |
2024-06-25 |
Text-Animator: Controllable Visual Text Video Generation |
Lin Liu et.al. |
2406.17777 |
null |
2024-06-25 |
MotionBooth: Motion-Aware Customized Text-to-Video Generation |
Jianzong Wu et.al. |
2406.17758 |
null |
2024-06-25 |
Detection of Synthetic Face Images: Accuracy, Robustness, Generalization |
Nela Petrzelkova et.al. |
2406.17547 |
null |
2024-06-25 |
TSynD: Targeted Synthetic Data Generation for Enhanced Medical Image Classification |
Joshua Niemeijer et.al. |
2406.17473 |
null |
2024-06-25 |
SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing |
Ruihuang Li et.al. |
2406.17396 |
null |
2024-06-25 |
Semantic Deep Hiding for Robust Unlearnable Examples |
Ruohan Meng et.al. |
2406.17349 |
null |
2024-06-25 |
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers |
Lei Chen et.al. |
2406.17343 |
link |
2024-06-25 |
Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds |
Hongliang Zeng et.al. |
2406.17342 |
null |
2024-06-24 |
Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation |
Zhenyi Liao et.al. |
2406.17100 |
link |
2024-06-24 |
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models |
Haonan Qiu et.al. |
2406.16863 |
link |
2024-06-24 |
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation |
Junbang Liang et.al. |
2406.16862 |
null |
2024-06-24 |
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation |
Yuang Peng et.al. |
2406.16855 |
link |
2024-06-24 |
Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation |
Katherine M. Collins et.al. |
2406.16807 |
null |
2024-06-24 |
Repulsive Score Distillation for Diverse Sampling of Diffusion Models |
Nicolas Zilberstein et.al. |
2406.16683 |
link |
2024-06-24 |
EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations |
Zhiyu Tan et.al. |
2406.16562 |
link |
2024-06-24 |
Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization |
Yuhang Ma et.al. |
2406.16537 |
link |
2024-06-24 |
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance |
Shuwei Shi et.al. |
2406.16476 |
null |
2024-06-24 |
Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models |
Yichen Sun et.al. |
2406.16333 |
null |
2024-06-24 |
Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement |
Zhiyuan Chang et.al. |
2406.16272 |
link |
2024-06-21 |
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation |
Xuan He et.al. |
2406.15252 |
null |
2024-06-21 |
Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors |
Ali Naseh et.al. |
2406.15213 |
link |
2024-06-21 |
Disability Representations: Finding Biases in Automatic Image Generation |
Yannis Tevissen et.al. |
2406.14993 |
null |
2024-06-21 |
Latent diffusion models for parameterization and data assimilation of facies-based geomodels |
Guido Di Federico et.al. |
2406.14815 |
null |
2024-06-20 |
Evaluating Numerical Reasoning in Text-to-Image Models |
Ivana Kajić et.al. |
2406.14774 |
link |
2024-06-20 |
Holistic Evaluation for Interleaved Text-and-Image Generation |
Minqian Liu et.al. |
2406.14643 |
null |
2024-06-20 |
Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps |
Nikita Starodubcev et.al. |
2406.14539 |
null |
2024-06-20 |
Fantastic Copyrighted Beasts and How (Not) to Generate Them |
Luxi He et.al. |
2406.14526 |
null |
2024-06-20 |
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset |
Josef Dai et.al. |
2406.14477 |
link |
2024-06-20 |
Video Generation with Learned Action Prior |
Meenakshi Sarkar et.al. |
2406.14436 |
null |
2024-06-20 |
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning |
Zhongjie Duan et.al. |
2406.14130 |
link |
2024-06-19 |
Splatter a Video: Video Gaussian Representation for Versatile Processing |
Yang-Tian Sun et.al. |
2406.13870 |
null |
2024-06-19 |
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation |
Baiqi Li et.al. |
2406.13743 |
link |
2024-06-19 |
Development of a Dual-Input Neural Model for Detecting AI-Generated Imagery |
Jonathan Gallagher et.al. |
2406.13688 |
null |
2024-06-19 |
Improving Visual Commonsense in Language Models via Multiple Image Generation |
Guy Yariv et.al. |
2406.13621 |
link |
2024-06-19 |
What’s Next? Exploring Utilization, Challenges, and Future Directions of AI-Generated Image Tools in Graphic Design |
Yuying Tang et.al. |
2406.13436 |
null |
2024-06-19 |
AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation |
Xinyu Hou et.al. |
2406.12805 |
link |
2024-06-18 |
Unmasking the Veil: An Investigation into Concept Ablation for Privacy and Copyright Protection in Images |
Shivank Garg et.al. |
2406.12592 |
link |
2024-06-18 |
Training Diffusion Models with Federated Learning |
Matthijs de Goede et.al. |
2406.12575 |
null |
2024-06-18 |
Generative Artificial Intelligence-Guided User Studies: An Application for Air Taxi Services |
Shengdi Xiao et.al. |
2406.12296 |
null |
2024-06-17 |
ARTIST: Improving the Generation of Text-rich Images by Disentanglement |
Jianyi Zhang et.al. |
2406.12044 |
null |
2024-06-17 |
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models |
Alireza Ganjdanesh et.al. |
2406.12042 |
link |
2024-06-17 |
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI |
Robert Hönig et.al. |
2406.12027 |
link |
2024-06-17 |
Decomposed evaluations of geographic disparities in text-to-image models |
Abhishek Sureddy et.al. |
2406.11988 |
null |
2024-06-17 |
Autoregressive Image Generation without Vector Quantization |
Tianhong Li et.al. |
2406.11838 |
link |
2024-06-17 |
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% |
Lei Zhu et.al. |
2406.11837 |
link |
2024-06-17 |
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models |
Bingqi Ma et.al. |
2406.11831 |
null |
2024-06-17 |
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models |
Fanqing Meng et.al. |
2406.11802 |
link |
2024-06-17 |
Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes |
Aghiles Kebaili et.al. |
2406.11659 |
null |
2024-06-17 |
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation |
Shihao Cai et.al. |
2406.11503 |
link |
2024-06-17 |
Generative Visual Instruction Tuning |
Jefferson Hernandez et.al. |
2406.11262 |
link |
2024-06-17 |
NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation |
Niu Guanchen et.al. |
2406.11259 |
null |
2024-06-17 |
Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion |
Rishab Parthasarathy et.al. |
2406.11196 |
link |
2024-06-16 |
An Analysis on Quantizing Diffusion Transformers |
Yuewei Yang et.al. |
2406.11100 |
null |
2024-06-14 |
Make It Count: Text-to-Image Generation with an Accurate Number of Objects |
Lital Binyamin et.al. |
2406.10210 |
null |
2024-06-14 |
Crafting Parts for Expressive Object Composition |
Harsh Rangwani et.al. |
2406.10197 |
null |
2024-06-14 |
Training-free Camera Control for Video Generation |
Chen Hou et.al. |
2406.10126 |
null |
2024-06-14 |
High-efficiency generation of vectorial holograms with metasurfaces |
Tong Liu et.al. |
2406.10072 |
null |
2024-06-14 |
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval |
Imanol Miranda et.al. |
2406.09952 |
link |
2024-06-14 |
ControlVAR: Exploring Controllable Visual Autoregressive Modeling |
Xiang Li et.al. |
2406.09750 |
link |
2024-06-13 |
Turns Out I’m Not Real: Towards Robust Detection of AI-Generated Videos |
Qingyuan Liu et.al. |
2406.09601 |
null |
2024-06-13 |
You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes |
Jabez Magomere et.al. |
2406.09496 |
link |
2024-06-13 |
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models |
Qihao Liu et.al. |
2406.09416 |
link |
2024-06-13 |
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels |
Duy-Kien Nguyen et.al. |
2406.09415 |
null |
2024-06-13 |
Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs |
Zijia Zhao et.al. |
2406.09367 |
link |
2024-06-13 |
Understanding Hallucinations in Diffusion Models through Mode Interpolation |
Sumukh K Aithal et.al. |
2406.09358 |
link |
2024-06-13 |
Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps |
Shuqing Li et.al. |
2406.09313 |
null |
2024-06-13 |
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation |
Yufan Zhou et.al. |
2406.09305 |
null |
2024-06-13 |
StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning |
Giuseppe Vecchio et.al. |
2406.09293 |
null |
2024-06-13 |
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts |
Yucheng Han et.al. |
2406.09162 |
null |
2024-06-13 |
Complex Image-Generative Diffusion Transformer for Audio Denoising |
Junhui Li et.al. |
2406.09161 |
null |
2024-06-13 |
EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts |
Zahraa Al Sahili et.al. |
2406.09070 |
null |
2024-06-12 |
Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation |
Raphael Tang et.al. |
2406.08482 |
null |
2024-06-12 |
What If We Recaption Billions of Web Images with LLaMA-3? |
Xianhang Li et.al. |
2406.08478 |
null |
2024-06-12 |
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences |
Daiwei Chen et.al. |
2406.08469 |
link |
2024-06-12 |
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models |
Benjamin Biggs et.al. |
2406.08431 |
null |
2024-06-12 |
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks |
Jiannan Wu et.al. |
2406.08394 |
link |
2024-06-12 |
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation |
Xinzhi Mu et.al. |
2406.08392 |
null |
2024-06-12 |
WMAdapter: Adding WaterMark Control to Latent Diffusion Models |
Hai Ci et.al. |
2406.08337 |
null |
2024-06-12 |
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models |
Hyungjin Chung et.al. |
2406.08070 |
null |
2024-06-12 |
Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models |
Arman Zarei et.al. |
2406.07844 |
link |
2024-06-12 |
Hierarchical Patch Diffusion Models for High-Resolution Video Generation |
Ivan Skorokhodov et.al. |
2406.07792 |
null |
2024-06-11 |
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? |
Xingyu Fu et.al. |
2406.07546 |
null |
2024-06-11 |
Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance |
Kuan Heng Lin et.al. |
2406.07540 |
null |
2024-06-11 |
Neural Gaffer: Relighting Any Object via Diffusion |
Haian Jin et.al. |
2406.07520 |
null |
2024-06-11 |
Instant 3D Human Avatar Generation using Image Diffusion Models |
Nikos Kolotouros et.al. |
2406.07516 |
null |
2024-06-11 |
Understanding Visual Concepts Across Models |
Brandon Trabucco et.al. |
2406.07506 |
link |
2024-06-11 |
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions |
Renjie Pi et.al. |
2406.07502 |
link |
2024-06-12 |
SPIN: Spacecraft Imagery for Navigation |
Javier Montalvo et.al. |
2406.07500 |
link |
2024-06-11 |
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models |
Heng Yu et.al. |
2406.07472 |
null |
2024-06-11 |
Beware of Aliases – Signal Preservation is Crucial for Robust Image Restoration |
Shashank Agnihotri et.al. |
2406.07435 |
null |
2024-06-11 |
Visual Representation Learning with Stochastic Frame Prediction |
Huiwon Jang et.al. |
2406.07398 |
null |
2024-06-10 |
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation |
Peize Sun et.al. |
2406.06525 |
link |
2024-06-10 |
The Effect of Training Dataset Size on Discriminative and Diffusion-Based Speech Enhancement Systems |
Philippe Gonzalez et.al. |
2406.06160 |
null |
2024-06-10 |
ProcessPainter: Learn Painting Process from Sequence Data |
Yiren Song et.al. |
2406.06062 |
link |
2024-06-09 |
OmniControlNet: Dual-stage Integration for Conditional Image Generation |
Yilin Wang et.al. |
2406.05871 |
null |
2024-06-09 |
Unified Text-to-Image Generation and Retrieval |
Leigang Qu et.al. |
2406.05814 |
null |
2024-06-11 |
MLCM: Multistep Consistency Distillation of Latent Diffusion Model |
Qingsong Xie et.al. |
2406.05768 |
link |
2024-06-09 |
Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion |
Ge Ya Luo et.al. |
2406.05630 |
link |
2024-06-09 |
Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models |
Philip Wootaek Shin et.al. |
2406.05602 |
null |
2024-06-08 |
Medical Vision Generalist: Unifying Medical Imaging Tasks in Context |
Sucheng Ren et.al. |
2406.05565 |
link |
2024-06-08 |
Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models |
Minho Park et.al. |
2406.05432 |
link |
2024-06-07 |
CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion |
Xingrui Wang et.al. |
2406.05082 |
null |
2024-06-07 |
GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications |
Shakhnaz Akhmedova et.al. |
2406.05023 |
link |
2024-06-07 |
AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation |
Lianyu Pang et.al. |
2406.05000 |
null |
2024-06-07 |
Zero-Shot Video Editing through Adaptive Sliding Score Distillation |
Lianghan Zhu et.al. |
2406.04888 |
null |
2024-06-07 |
Online Continual Learning of Video Diffusion Models From a Single Video Stream |
Jason Yoo et.al. |
2406.04814 |
null |
2024-06-07 |
TEDi Policy: Temporally Entangled Diffusion for Robotic Control |
Sigmund H. Høeg et.al. |
2406.04806 |
link |
2024-06-07 |
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction |
Eduard Poesina et.al. |
2406.04746 |
link |
2024-06-07 |
GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models |
Diptanu De et.al. |
2406.04654 |
null |
2024-06-07 |
CLoG: Benchmarking Continual Learning of Image Generation Models |
Haotian Zhang et.al. |
2406.04584 |
link |
2024-06-06 |
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance |
Reyhane Askari Hemmat et.al. |
2406.04551 |
null |
2024-06-06 |
Coherent Zero-Shot Visual Instruction Generation |
Quynh Phung et.al. |
2406.04337 |
null |
2024-06-06 |
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model |
Yang Sui et.al. |
2406.04333 |
link |
2024-06-06 |
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions |
Lin Chen et.al. |
2406.04325 |
null |
2024-06-06 |
SF-V: Single Forward Video Generation Model |
Zhixing Zhang et.al. |
2406.04324 |
link |
2024-06-06 |
VideoTetris: Towards Compositional Text-to-Video Generation |
Ye Tian et.al. |
2406.04277 |
link |
2024-06-06 |
Diffusion-based image inpainting with internal learning |
Nicolas Cherel et.al. |
2406.04206 |
link |
2024-06-06 |
Machine Learning-Driven Microwave Imaging for Soil Moisture Estimation near Leaky Pipe |
Mohammad Ramezaninia et.al. |
2406.04193 |
null |
2024-06-06 |
Quantum Implicit Neural Representations |
Jiaming Zhao et.al. |
2406.03873 |
link |
2024-06-06 |
Semantic Similarity Score for Measuring Visual Similarity at Semantic Level |
Senran Fan et.al. |
2406.03865 |
null |
2024-06-06 |
Malware Classification Based on Image Segmentation |
Wanhu Nie et.al. |
2406.03831 |
link |
2024-06-05 |
Tackling GenAI Copyright Issues: Originality Estimation and Genericization |
Hiroaki Chiba-Okabe et.al. |
2406.03341 |
link |
2024-06-05 |
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion |
Hao Wen et.al. |
2406.03184 |
link |
2024-06-05 |
Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control |
Jingyun Xue et.al. |
2406.03035 |
null |
2024-06-05 |
Language-guided Detection and Mitigation of Unknown Dataset Bias |
Zaiying Zhao et.al. |
2406.02889 |
null |
2024-06-06 |
Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter |
Peng Xing et.al. |
2406.02881 |
null |
2024-06-04 |
Latent Style-based Quantum GAN for high-quality Image Generation |
Su Yeon Chang et.al. |
2406.02668 |
null |
2024-06-04 |
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation |
Tianchen Zhao et.al. |
2406.02540 |
link |
2024-06-04 |
DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering |
Zhongpai Gao et.al. |
2406.02518 |
null |
2024-06-04 |
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation |
Cong Wang et.al. |
2406.02511 |
null |
2024-06-04 |
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation |
Dejia Xu et.al. |
2406.02509 |
null |
2024-06-04 |
Guiding a Diffusion Model with a Bad Version of Itself |
Tero Karras et.al. |
2406.02507 |
link |
2024-06-04 |
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation |
Jiajun Wang et.al. |
2406.02485 |
link |
2024-06-04 |
Generative Active Learning for Long-tailed Instance Segmentation |
Muzhi Zhu et.al. |
2406.02435 |
link |
2024-06-04 |
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation |
Clement Chadebec et.al. |
2406.02347 |
link |
2024-06-04 |
I4VGen: Image as Stepping Stone for Text-to-Video Generation |
Xiefan Guo et.al. |
2406.02230 |
null |
2024-06-04 |
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise |
Yuanhao Ban et.al. |
2406.01970 |
null |
2024-05-31 |
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling |
Jiatao Gu et.al. |
2405.21048 |
null |
2024-05-31 |
You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet |
Zhen Qin et.al. |
2405.21022 |
null |
2024-05-31 |
Amortizing intractable inference in diffusion models for vision, language, and control |
Siddarth Venkatraman et.al. |
2405.20971 |
link |
2024-05-31 |
Information Theoretic Text-to-Image Alignment |
Chao Wang et.al. |
2405.20759 |
null |
2024-05-31 |
Diffusion Models Are Innate One-Step Generators |
Bowen Zheng et.al. |
2405.20750 |
link |
2024-05-31 |
Cyclic image generation using chaotic dynamics |
Takaya Tanaka et.al. |
2405.20717 |
link |
2024-05-31 |
Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space |
Yukai Zhang et.al. |
2405.20685 |
null |
2024-05-31 |
4Diffusion: Multi-view Video Diffusion Model for 4D Generation |
Haiyu Zhang et.al. |
2405.20674 |
null |
2024-05-31 |
Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation |
Shuzhou Yang et.al. |
2405.20669 |
link |
2024-05-31 |
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization |
Yisu Liu et.al. |
2405.20584 |
link |
2024-05-30 |
Improving the Training of Rectified Flows |
Sangyun Lee et.al. |
2405.20320 |
link |
2024-05-30 |
CV-VAE: A Compatible Video VAE for Latent Generative Video Models |
Sijie Zhao et.al. |
2405.20279 |
link |
2024-05-30 |
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model |
Muyao Niu et.al. |
2405.20222 |
link |
2024-05-30 |
Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback |
Sanghyeon Na et.al. |
2405.20216 |
null |
2024-05-30 |
RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection |
Zhiyuan He et.al. |
2405.20112 |
null |
2024-05-30 |
Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion |
Jiangkai Wu et.al. |
2405.20032 |
link |
2024-05-30 |
Mitigating annotation shift in cancer classification using single image generative models |
Marta Buetas Arcas et.al. |
2405.19754 |
link |
2024-05-30 |
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark |
Haoxing Chen et.al. |
2405.19707 |
link |
2024-05-30 |
Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian |
Wei Sun et.al. |
2405.19657 |
null |
2024-05-29 |
MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter Selection |
Raman Dutt et.al. |
2405.19458 |
link |
2024-05-29 |
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning |
Ruchika Chavhan et.al. |
2405.19237 |
link |
2024-05-29 |
Going beyond compositional generalization, DDPMs can produce zero-shot interpolation |
Justin Deschenaux et.al. |
2405.19201 |
link |
2024-05-29 |
The ethical situation of DALL-E 2 |
Eduard Hogea et.al. |
2405.19176 |
null |
2024-05-29 |
Patch-enhanced Mask Encoder Prompt Image Generation |
Shusong Xu et.al. |
2405.19085 |
null |
2024-05-29 |
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture |
Jiaqi Xu et.al. |
2405.18991 |
link |
2024-05-29 |
Topological Perspectives on Optimal Multimodal Embedding Spaces |
Abdul Aziz A. B et.al. |
2405.18867 |
null |
2024-05-30 |
Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation |
Jiyoon Myung et.al. |
2405.18762 |
null |
2024-05-29 |
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback |
Jiachen Li et.al. |
2405.18750 |
link |
2024-05-29 |
SketchDeco: Decorating B&W Sketches with Colour |
Chaitat Utintu et.al. |
2405.18716 |
link |
2024-05-28 |
Scalable Surrogate Verification of Image-based Neural Network Control Systems using Composition and Unrolling |
Feiyang Cai et.al. |
2405.18554 |
null |
2024-05-28 |
Phased Consistency Model |
Fu-Yun Wang et.al. |
2405.18407 |
link |
2024-05-28 |
RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives |
Jaehong Yoon et.al. |
2405.18406 |
link |
2024-05-28 |
VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers |
Jun Zheng et.al. |
2405.18326 |
null |
2024-05-28 |
Multi-modal Generation via Cross-Modal In-Context Learning |
Amandeep Kumar et.al. |
2405.18304 |
link |
2024-05-28 |
EG4D: Explicit Generation of 4D Object without Score Distillation |
Qi Sun et.al. |
2405.18132 |
link |
2024-05-28 |
Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers? |
Zebin You et.al. |
2405.18029 |
null |
2024-05-28 |
MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling |
Bowen Zhang et.al. |
2405.18003 |
link |
2024-05-28 |
Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage Detection |
Zhengji Li et.al. |
2405.17905 |
null |
2024-05-28 |
Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation |
Akio Hayakawa et.al. |
2405.17842 |
link |
2024-05-27 |
RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance |
Jiaojiao Fan et.al. |
2405.17661 |
null |
2024-05-27 |
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control |
Zhengfei Kuang et.al. |
2405.17414 |
null |
2024-05-27 |
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer |
Ruizhi Shao et.al. |
2405.17405 |
null |
2024-05-27 |
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability |
Shenyuan Gao et.al. |
2405.17398 |
link |
2024-05-27 |
Prompt Optimization with Human Feedback |
Xiaoqiang Lin et.al. |
2405.17346 |
link |
2024-05-28 |
Controllable Longer Image Animation with Diffusion Models |
Qiang Wang et.al. |
2405.17306 |
null |
2024-05-27 |
Training-free Editioning of Text-to-Image Models |
Jinqi Wang et.al. |
2405.17069 |
null |
2024-05-27 |
The Poisson Midpoint Method for Langevin Dynamics: Provably Efficient Discretization for Diffusion Models |
Saravanan Kandasamy et.al. |
2405.17068 |
null |
2024-05-27 |
Glauber Generative Model: Discrete Diffusion Models via Binary Classification |
Harshit Varma et.al. |
2405.17035 |
null |
2024-05-27 |
Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation |
Liang Shi et.al. |
2405.16895 |
null |
2024-05-27 |
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks |
Yunqi Zhang et.al. |
2405.16860 |
link |
2024-05-24 |
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence |
Ali Kashefi et.al. |
2405.15406 |
link |
2024-05-24 |
Stochastic SR for Gaussian microtextures |
Emile Pierret et.al. |
2405.15399 |
null |
2024-05-24 |
Challenges and Opportunities in 3D Content Generation |
Ke Zhao et.al. |
2405.15335 |
null |
2024-05-24 |
Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model |
Mingyang Yi et.al. |
2405.15330 |
null |
2024-05-24 |
SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance |
Guibao Shen et.al. |
2405.15321 |
null |
2024-05-24 |
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient |
Yongliang Wu et.al. |
2405.15304 |
link |
2024-05-24 |
StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models |
Chengming Xu et.al. |
2405.15287 |
null |
2024-05-24 |
Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models |
Yimeng Zhang et.al. |
2405.15234 |
link |
2024-05-24 |
iVideoGPT: Interactive VideoGPTs are Scalable World Models |
Jialong Wu et.al. |
2405.15223 |
link |
2024-05-24 |
ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models |
Jingyuan Zhu et.al. |
2405.15199 |
null |
2024-05-23 |
Improved Distribution Matching Distillation for Fast Image Synthesis |
Tianwei Yin et.al. |
2405.14867 |
link |
2024-05-23 |
Video Diffusion Models are Training-free Motion Interpreter and Controller |
Zeqi Xiao et.al. |
2405.14864 |
null |
2024-05-23 |
Semantica: An Adaptable Image-Conditioned Diffusion Model |
Manoj Kumar et.al. |
2405.14857 |
null |
2024-05-23 |
TerDiT: Ternary Diffusion Models with Transformers |
Xudong Lu et.al. |
2405.14854 |
link |
2024-05-23 |
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models |
Katherine Xu et.al. |
2405.14828 |
null |
2024-05-24 |
Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation |
Hongxu Jiang et.al. |
2405.14802 |
link |
2024-05-23 |
Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy |
Shengfang Zhai et.al. |
2405.14800 |
link |
2024-05-23 |
RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices |
Qiaoyi Chen et.al. |
2405.14794 |
null |
2024-05-23 |
OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance |
Shuheng Ge et.al. |
2405.14709 |
null |
2024-05-23 |
Learning Multi-dimensional Human Preference for Text-to-Image Generation |
Sixian Zhang et.al. |
2405.14705 |
null |
2024-05-21 |
Personalized Residuals for Concept-Driven Text-to-Image Generation |
Cusuh Ham et.al. |
2405.12978 |
null |
2024-05-21 |
An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation |
Zhiyu Tan et.al. |
2405.12914 |
link |
2024-05-21 |
OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models |
Zhaojian Yu et.al. |
2405.12843 |
link |
2024-05-21 |
DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control |
Hong Chen et.al. |
2405.12796 |
null |
2024-05-21 |
Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations |
Antoine Legrand et.al. |
2405.12728 |
null |
2024-05-21 |
CustomText: Customized Textual Image Generation using Diffusion Models |
Shubham Paliwal et.al. |
2405.12531 |
null |
2024-05-20 |
Diffusion for World Modeling: Visual Details Matter in Atari |
Eloi Alonso et.al. |
2405.12399 |
link |
2024-05-20 |
Diffusion Models for Generating Ballistic Spacecraft Trajectories |
Tyler Presser et.al. |
2405.11738 |
link |
2024-05-19 |
URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images |
Zoey Chen et.al. |
2405.11656 |
null |
2024-05-19 |
FIFO-Diffusion: Generating Infinite Videos from Text without Training |
Jihwan Kim et.al. |
2405.11473 |
link |
2024-05-18 |
UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers |
Duo Peng et.al. |
2405.11336 |
null |
2024-05-18 |
On the Trajectory Regularity of ODE-based Diffusion Sampling |
Defang Chen et.al. |
2405.11326 |
link |
2024-05-18 |
TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation |
Chengcheng Feng et.al. |
2405.11236 |
null |
2024-05-17 |
Improving face generation quality and prompt following with synthetic captions |
Michail Tarasiou et.al. |
2405.10864 |
null |
2024-05-17 |
From Sora What We Can See: A Survey of Text-to-Video Generation |
Rui Sun et.al. |
2405.10674 |
link |
2024-05-17 |
Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image |
Jianshun Zeng et.al. |
2405.10504 |
null |
2024-05-17 |
Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers |
Rya Sanovar et.al. |
2405.10480 |
null |
2024-05-16 |
UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models |
Sahel Sharifymoghaddam et.al. |
2405.10311 |
link |
2024-05-16 |
VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing |
Binghui Chen et.al. |
2405.09985 |
null |
2024-05-16 |
Chameleon: Mixed-Modal Early-Fusion Foundation Models |
Chameleon Team et.al. |
2405.09818 |
null |
2024-05-16 |
Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images |
Memoona Aziz et.al. |
2405.09426 |
null |
2024-05-15 |
DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations |
Nima Fathi et.al. |
2405.09288 |
link |
2024-05-15 |
Dance Any Beat: Blending Beats with Visuals in Dance Video Generation |
Xuanchen Wang et.al. |
2405.09266 |
null |
2024-05-14 |
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding |
Zhimin Li et.al. |
2405.08748 |
link |
2024-05-13 |
The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective |
Andrew Shin et.al. |
2405.08720 |
null |
2024-05-14 |
Compositional Text-to-Image Generation with Dense Blob Representations |
Weili Nie et.al. |
2405.08246 |
null |
2024-05-13 |
CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models |
Nick Stracke et.al. |
2405.07913 |
null |
2024-05-13 |
SAR Image Synthesis with Diffusion Models |
Denisa Qosja et.al. |
2405.07776 |
null |
2024-05-12 |
Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning |
Jiarui Wang et.al. |
2405.07346 |
link |
2024-05-12 |
Stable Signature is Unstable: Removing Image Watermark from Diffusion Models |
Yuepeng Hu et.al. |
2405.07145 |
null |
2024-05-12 |
MAxPrototyper: A Multi-Agent Generation System for Interactive User Interface Prototyping |
Mingyue Yuan et.al. |
2405.07131 |
null |
2024-05-11 |
Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior |
Ce Wang et.al. |
2405.07044 |
link |
2024-05-11 |
Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation |
Shengyuan Liu et.al. |
2405.06948 |
null |
2024-05-10 |
Deep MMD Gradient Flow without adversarial training |
Alexandre Galashov et.al. |
2405.06780 |
null |
2024-05-10 |
OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation |
Jinwei Lin et.al. |
2405.06547 |
link |
2024-05-10 |
Controllable Image Generation With Composed Parallel Token Prediction |
Jamie Stirling et.al. |
2405.06535 |
null |
2024-05-10 |
SketchDream: Sketch-based Text-to-3D Generation and Editing |
Feng-Lin Liu et.al. |
2405.06461 |
null |
2024-05-09 |
Photonic quantum generative adversarial networks for classical data |
Tigran Sedrakyan et.al. |
2405.06023 |
link |
2024-05-09 |
Frame Interpolation with Consecutive Brownian Bridge Diffusion |
Zonglin Lyu et.al. |
2405.05953 |
link |
2024-05-09 |
Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models |
Zhe Ma et.al. |
2405.05846 |
link |
2024-05-10 |
MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation |
Yuxiang Wei et.al. |
2405.05806 |
link |
2024-05-09 |
Exploring Text-Guided Single Image Editing for Remote Sensing Images |
Fangzhou Han et.al. |
2405.05769 |
link |
2024-05-09 |
End-to-End Generative Semantic Communication Powered by Shared Semantic Knowledge Base |
Shuling Li et.al. |
2405.05738 |
null |
2024-05-09 |
A Survey on Personalized Content Synthesis with Diffusion Models |
Xulu Zhang et.al. |
2405.05538 |
null |
2024-05-08 |
Cross-Modality Translation with Generative Adversarial Networks to Unveil Alzheimer’s Disease Biomarkers |
Reihaneh Hassanzadeh et.al. |
2405.05462 |
null |
2024-05-08 |
DrawL: Understanding the Effects of Non-Mainstream Dialects in Prompted Image Generation |
Joshua N. Williams et.al. |
2405.05382 |
link |
2024-05-08 |
Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo |
Nayantara Mudur et.al. |
2405.05255 |
link |
2024-05-08 |
Reviewing Intelligent Cinematography: AI research for camera-based video production |
Adrian Azzarelli et.al. |
2405.05039 |
null |
2024-05-08 |
Discrepancy-based Diffusion Models for Lesion Detection in Brain MRI |
Keqiang Fan et.al. |
2405.04974 |
null |
2024-05-08 |
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation |
Xuehai He et.al. |
2405.04834 |
null |
2024-05-07 |
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation |
Hritik Bansal et.al. |
2405.04682 |
link |
2024-05-07 |
TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model |
Yongming Zhang et.al. |
2405.04675 |
null |
2024-05-07 |
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models |
Melissa Hall et.al. |
2405.04457 |
null |
2024-05-07 |
Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation |
Jihyun Kim et.al. |
2405.04356 |
link |
2024-05-07 |
Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation |
Dogucan Yaman et.al. |
2405.04327 |
null |
2024-05-08 |
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer |
Zhuoyi Yang et.al. |
2405.04312 |
link |
2024-05-07 |
Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map |
Yuxuan Xia et.al. |
2405.04290 |
null |
2024-05-07 |
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models |
Fan Bao et.al. |
2405.04233 |
null |
2024-05-07 |
Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models |
Zhixuan Chu et.al. |
2405.04180 |
link |
2024-05-07 |
Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method |
Peisong He et.al. |
2405.04133 |
null |
2024-05-07 |
Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model |
Joo Young Choi et.al. |
2405.03958 |
null |
2024-05-06 |
CCDM: Continuous Conditional Diffusion Models for Image Generation |
Xin Ding et.al. |
2405.03546 |
link |
2024-05-06 |
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond |
Zheng Zhu et.al. |
2405.03520 |
link |
2024-05-06 |
Video Diffusion Models: A Survey |
Andrew Melnik et.al. |
2405.03150 |
link |
2024-05-05 |
Matten: Video Generation with Mamba-Attention |
Yu Gao et.al. |
2405.03025 |
null |
2024-05-05 |
Data-Efficient Molecular Generation with Hierarchical Textual Inversion |
Seojin Kim et.al. |
2405.02845 |
link |
2024-05-05 |
ImageInWords: Unlocking Hyper-Detailed Image Descriptions |
Roopal Garg et.al. |
2405.02793 |
link |
2024-05-04 |
U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers |
Yuchuan Tian et.al. |
2405.02730 |
link |
2024-05-03 |
Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification |
Siqi Yin et.al. |
2405.02155 |
null |
2024-05-03 |
AI-generated art perceptions with GenFrame – an image-generating picture frame |
Peter Kun et.al. |
2405.01901 |
null |
2024-05-03 |
Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition |
Yichun Tai et.al. |
2405.01872 |
null |
2024-05-02 |
Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning |
Rafael Elberg et.al. |
2405.01705 |
link |
2024-05-02 |
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation |
Yupeng Zhou et.al. |
2405.01434 |
link |
2024-05-02 |
Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration |
Praveen Kumar Chandaliya et.al. |
2405.01273 |
null |
2024-05-02 |
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines |
Ye Tian et.al. |
2405.01248 |
null |
2024-05-02 |
On Mechanistic Knowledge Localization in Text-to-Image Generative Models |
Samyadeep Basu et.al. |
2405.01008 |
link |
2024-05-01 |
SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models |
Burak Can Biner et.al. |
2405.00878 |
null |
2024-05-01 |
UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement |
Ruiquan Ge et.al. |
2405.00542 |
link |
2024-05-01 |
Compressive Sensing Imaging Using Caustic Lens Mask Generated by Periodic Perturbation in a Ripple Tank |
Doğan Tunca Arık et.al. |
2405.00407 |
null |
2024-05-01 |
Streamlining Image Editing with Layered Diffusion Brushes |
Peyman Gholami et.al. |
2405.00313 |
null |
2024-04-30 |
DOCCI: Descriptions of Connected and Contrasting Images |
Yasumasa Onoe et.al. |
2404.19753 |
null |
2024-04-30 |
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation |
Yunhao Ge et.al. |
2404.19752 |
null |
2024-04-30 |
SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration |
Yuto Nakashima et.al. |
2404.19693 |
null |
2024-04-30 |
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization |
Yuliang Liu et.al. |
2404.19652 |
link |
2024-04-30 |
TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models |
Teng Zhou et.al. |
2404.19475 |
link |
2024-04-30 |
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation |
Chanran Kim et.al. |
2404.19427 |
null |
2024-04-30 |
Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model |
Wentao Lei et.al. |
2404.19277 |
null |
2024-05-01 |
FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills |
Yongqiang Zhao et.al. |
2404.19217 |
link |
2024-04-30 |
NeRF-Insert: 3D Local Editing with Multimodal Control Signals |
Benet Oriol Sabat et.al. |
2404.19204 |
null |
2024-04-29 |
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing |
Minghao Chen et.al. |
2404.18929 |
null |
2024-04-29 |
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation |
Junhao Cheng et.al. |
2404.18919 |
link |
2024-04-29 |
Hide and Seek: How Does Watermarking Impact Face Recognition? |
Yuguang Yao et.al. |
2404.18890 |
null |
2024-04-29 |
Learning Mixtures of Gaussians Using Diffusion Models |
Khashayar Gatmiry et.al. |
2404.18869 |
null |
2024-04-29 |
FlexiFilm: Long Video Generation with Flexible Conditions |
Yichen Ouyang et.al. |
2404.18620 |
link |
2024-04-29 |
Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting |
Tianyidan Xie et.al. |
2404.18598 |
null |
2024-04-29 |
Autonomous Quality and Hallucination Assessment for Virtual Tissue Staining and Digital Pathology |
Luzhe Huang et.al. |
2404.18458 |
null |
2024-04-29 |
PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated Images |
Jiquan Yuan et.al. |
2404.18409 |
link |
2024-04-30 |
Equivalence: An analysis of artists’ roles with Image Generative AI from Conceptual Art perspective through an interactive installation design practice |
Yixuan Li et.al. |
2404.18385 |
null |
2024-04-29 |
G-Refine: A General Quality Refiner for Text-to-Image Generation |
Chunyi Li et.al. |
2404.18343 |
link |
2024-04-26 |
Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement |
Zishu Yao et.al. |
2404.17400 |
link |
2024-04-26 |
Trinity Detector:text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection |
Jiawei Song et.al. |
2404.17254 |
null |
2024-04-26 |
Synthesizing Iris Images using Generative Adversarial Networks: Survey and Comparative Analysis |
Shivangi Yadav et.al. |
2404.17105 |
null |
2024-04-25 |
REBEL: Reinforcement Learning via Regressing Relative Rewards |
Zhaolin Gao et.al. |
2404.16767 |
link |
2024-04-27 |
Denoising: from classical methods to deep CNNs |
Jean-Eric Campagne et.al. |
2404.16617 |
link |
2024-04-25 |
MuseumMaker: Continual Style Customization without Catastrophic Forgetting |
Chenxi Liu et.al. |
2404.16612 |
null |
2024-04-25 |
Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models |
Parul Gupta et.al. |
2404.16556 |
null |
2024-04-25 |
OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images |
Ye Mao et.al. |
2404.16538 |
link |
2024-04-25 |
Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series |
Aimi Okabayashi et.al. |
2404.16409 |
link |
2024-04-25 |
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models |
Haomiao Ni et.al. |
2404.16306 |
link |
2024-04-26 |
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model |
Gehui Chen et.al. |
2404.16305 |
null |
2024-04-26 |
Guardians of the Quantum GAN |
Archisman Ghosh et.al. |
2404.16156 |
null |
2024-04-24 |
Spinning solar jets explained through the interplay between plasma sheets and vortex columns |
Sahel Dey et.al. |
2404.16096 |
null |
2024-04-23 |
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning |
Weifeng Chen et.al. |
2404.15449 |
null |
2024-04-23 |
GLoD: Composing Global Contexts and Local Details in Image Generation |
Moyuru Yamada et.al. |
2404.15447 |
null |
2024-04-23 |
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation |
Xuanhua He et.al. |
2404.15275 |
link |
2024-04-23 |
From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation |
Zehuan Huang et.al. |
2404.15267 |
link |
2024-04-23 |
Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment |
Tianwei Zhou et.al. |
2404.15163 |
null |
2024-04-23 |
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation |
Xun Wu et.al. |
2404.15100 |
null |
2024-04-23 |
SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models |
Bo Lin et.al. |
2404.14755 |
null |
2024-04-23 |
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction |
Hang Hua et.al. |
2404.14715 |
null |
2024-04-22 |
The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking |
Yuying Li et.al. |
2404.14581 |
null |
2024-04-22 |
GeoDiffuser: Geometry-Based Image Editing with Diffusion Models |
Rahul Sajnani et.al. |
2404.14403 |
null |
2024-04-22 |
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation |
Yuying Ge et.al. |
2404.14396 |
link |
2024-04-22 |
TAVGBench: Benchmarking Text to Audible-Video Generation |
Yuxin Mao et.al. |
2404.14381 |
link |
2024-04-22 |
MultiBooth: Towards Generating All Your Concepts in an Image from Text |
Chenyang Zhu et.al. |
2404.14239 |
link |
2024-04-22 |
RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance |
Chengrui Wang et.al. |
2404.13984 |
null |
2024-04-23 |
Accelerating Image Generation with Sub-path Linear Approximation Model |
Chen Xu et.al. |
2404.13903 |
null |
2024-04-22 |
Towards Better Text-to-Image Generation Alignment via Attention Modulation |
Yihang Wu et.al. |
2404.13899 |
null |
2024-04-21 |
Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation |
Jensen Hwa et.al. |
2404.13798 |
null |
2024-04-21 |
Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control |
Maria Mihaela Trusca et.al. |
2404.13766 |
null |
2024-04-21 |
Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models |
Vitali Petsiuk et.al. |
2404.13706 |
null |
2024-04-19 |
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation |
Tianyuan Zhang et.al. |
2404.13026 |
null |
2024-04-19 |
Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images |
Santosh et.al. |
2404.12908 |
link |
2024-04-19 |
ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model |
Dingming Liu et.al. |
2404.12903 |
null |
2024-04-19 |
Generative Modelling with High-Order Langevin Dynamics |
Ziqiang Shi et.al. |
2404.12814 |
null |
2024-04-19 |
How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples |
Dren Fazlija et.al. |
2404.12653 |
null |
2024-04-18 |
On the Content Bias in Fréchet Video Distance |
Songwei Ge et.al. |
2404.12391 |
null |
2024-04-18 |
RoboDreamer: Learning Compositional World Models for Robot Imagination |
Siyuan Zhou et.al. |
2404.12377 |
null |
2024-04-18 |
AniClipart: Clipart Animation with Text-to-Video Priors |
Ronghuan Wu et.al. |
2404.12347 |
null |
2024-04-18 |
Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models |
Israel A. Laurensi et.al. |
2404.12260 |
null |
2024-04-18 |
First 2D electron density measurements using Coherence Imaging Spectroscopy in the MAST-U Super-X divertor |
N. Lonigro et.al. |
2404.12021 |
null |
2024-04-18 |
©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model |
Chao Zhou et.al. |
2404.11962 |
link |
2024-04-18 |
LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights |
Thibault Castells et.al. |
2404.11936 |
null |
2024-04-18 |
EdgeFusion: On-Device Text-to-Image Generation |
Thibault Castells et.al. |
2404.11925 |
null |
2024-04-18 |
TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation |
Tianyi Liang et.al. |
2404.11824 |
link |
2024-04-17 |
On the Scalability of GNNs for Molecular Graphs |
Maciej Sypetkowski et.al. |
2404.11568 |
null |
2024-04-17 |
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation |
Kuan-Chieh et.al. |
2404.11565 |
null |
2024-04-17 |
SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening |
Yu Zhong et.al. |
2404.11537 |
null |
2024-04-17 |
Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt |
Zhanjie Zhang et.al. |
2404.11474 |
link |
2024-04-17 |
Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks |
Eri Hosonuma et.al. |
2404.11280 |
null |
2024-04-17 |
Optical Image-to-Image Translation Using Denoising Diffusion Models: Heterogeneous Change Detection as a Use Case |
João Gabriel Vinholi et.al. |
2404.11243 |
null |
2024-04-17 |
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing |
Sherry X. Chen et.al. |
2404.11120 |
link |
2024-04-16 |
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? |
Yuchi Wang et.al. |
2404.10763 |
link |
2024-04-16 |
Adversarial Identity Injection for Semantic Face Image Synthesis |
Giuseppe Tarollo et.al. |
2404.10408 |
null |
2024-04-16 |
CanvasPic: An Interactive Tool for Freely Generating Facial Images Based on Spatial Layout |
Jiafu Wei et.al. |
2404.10352 |
null |
2024-04-17 |
OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model |
Runyi Li et.al. |
2404.10312 |
null |
2024-04-16 |
OneActor: Consistent Character Generation via Cluster-Conditioned Guidance |
Jiahao Wang et.al. |
2404.10267 |
null |
2024-04-16 |
Diffusion assisted image reconstruction in optoacoustic tomography |
M. G. González et.al. |
2404.10239 |
null |
2024-04-15 |
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models |
Nithin Gopalakrishnan Nair et.al. |
2404.09977 |
null |
2024-04-15 |
Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers |
Nithin Gopalakrishnan Nair et.al. |
2404.09976 |
null |
2024-04-15 |
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model |
Han Lin et.al. |
2404.09967 |
null |
2024-04-15 |
Scalable photonic diffractive generators through sampling noises from scattering medium |
Ziyu Zhan et.al. |
2404.09948 |
null |
2024-04-15 |
Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models |
Ziwei Luo et.al. |
2404.09732 |
link |
2024-04-15 |
In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation |
Han Xue et.al. |
2404.09633 |
null |
2024-04-15 |
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models |
Peifei Zhu et.al. |
2404.09401 |
null |
2024-04-14 |
DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling |
Xuening Yuan et.al. |
2404.09227 |
null |
2024-04-14 |
LoopAnimate: Loopable Salient Object Animation |
Fanyi Wang et.al. |
2404.09172 |
null |
2024-04-13 |
THQA: A Perceptual Quality Assessment Database for Talking Heads |
Yingjie Zhou et.al. |
2404.09003 |
link |
2024-04-13 |
LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field |
Jiyang Li et.al. |
2404.08966 |
link |
2024-04-13 |
Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives |
Yidan Liu et.al. |
2404.08926 |
null |
2024-04-12 |
E3: Ensemble of Expert Embedders for Adapting Synthetic Image Detectors to New Generators Using Limited Data |
Aref Azizpour et.al. |
2404.08814 |
link |
2024-04-12 |
Semantic Approach to Quantifying the Consistency of Diffusion Model Image Generation |
Brinnae Bent et.al. |
2404.08799 |
link |
2024-04-12 |
Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts |
Yang Li et.al. |
2404.08341 |
link |
2024-04-11 |
Latent Guard: a Safety Framework for Text-to-image Generation |
Runtao Liu et.al. |
2404.08031 |
link |
2024-04-11 |
Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models |
Mazda Moayeri et.al. |
2404.08030 |
null |
2024-04-11 |
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models |
Moreno D’Incà et.al. |
2404.07990 |
link |
2024-04-11 |
Taming Stable Diffusion for Text to 360° Panorama Image Generation |
Cheng Zhang et.al. |
2404.07949 |
link |
2024-04-11 |
Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models – Technical Challenges and Implications for Monitoring and Verification |
Tuong Vy Nguyen et.al. |
2404.07754 |
null |
2024-04-11 |
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models |
Tuomas Kynkäänniemi et.al. |
2404.07724 |
link |
2024-04-11 |
ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation |
Stanislav Frolov et.al. |
2404.07564 |
null |
2024-04-11 |
CAT: Contrastive Adapter Training for Personalized Image Generation |
Jae Wan Park et.al. |
2404.07554 |
link |
2024-04-10 |
A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos |
Suleyman Ozdel et.al. |
2404.07351 |
null |
2024-04-10 |
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion |
Jaidev Shriram et.al. |
2404.07199 |
null |
2024-04-10 |
A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks |
Neel Mishra et.al. |
2404.07172 |
link |
2024-04-10 |
Fine color guidance in diffusion models and its application to image compression at extremely low bitrates |
Tom Bordin et.al. |
2404.06865 |
null |
2024-04-10 |
UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion |
Junsheng Zhou et.al. |
2404.06851 |
null |
2024-04-10 |
MedRG: Medical Report Grounding with Multi-modal Large Language Model |
Ke Zou et.al. |
2404.06798 |
null |
2024-04-10 |
Deep Generative Data Assimilation in Multimodal Setting |
Yongquan Qu et.al. |
2404.06665 |
link |
2024-04-09 |
High Noise Scheduling is a Must |
Mahmut S. Gokmen et.al. |
2404.06353 |
null |
2024-04-09 |
DiffHarmony: Latent Diffusion Model Meets Image Harmonization |
Pengfei Zhou et.al. |
2404.06139 |
link |
2024-04-09 |
Tackling Structural Hallucination in Image Translation with Local Diffusion |
Seunghoi Kim et.al. |
2404.05980 |
link |
2024-04-09 |
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion |
Ming Tao et.al. |
2404.05979 |
link |
2024-04-08 |
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing |
Jing Gu et.al. |
2404.05717 |
null |
2024-04-08 |
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation |
Kunpeng Song et.al. |
2404.05674 |
link |
2024-04-08 |
Automatic Controllable Colorization via Imagination |
Xiaoyan Cong et.al. |
2404.05661 |
null |
2024-04-08 |
UniFL: Improve Stable Diffusion via Unified Feedback Learning |
Jiacheng Zhang et.al. |
2404.05595 |
null |
2024-04-08 |
Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI |
Hugo Caselles-Dupré et.al. |
2404.05468 |
null |
2024-04-08 |
Action-conditioned video data improves predictability |
Meenakshi Sarkar et.al. |
2404.05439 |
null |
2024-04-08 |
Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt |
Zhiqi Huang et.al. |
2404.05331 |
null |
2024-04-08 |
MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation |
Jiaxiu Jiang et.al. |
2404.05268 |
link |
2024-04-08 |
Text-to-Image Synthesis for Any Artistic Styles: Advancements in Personalized Artistic Image Generation via Subdivision and Dual Binding |
Junseo Park et.al. |
2404.05256 |
null |
2024-04-08 |
A secure and private ensemble matcher using multi-vault obfuscated templates |
Babak Poorebrahim Gilkalaye et.al. |
2404.05205 |
null |
2024-04-04 |
No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance |
Vishaal Udandarao et.al. |
2404.04125 |
link |
2024-04-05 |
3D Facial Expressions through Analysis-by-Neural-Synthesis |
George Retsinas et.al. |
2404.04104 |
null |
2024-04-05 |
Dynamic Prompt Optimizing for Text-to-Image Generation |
Wenyi Mo et.al. |
2404.04095 |
link |
2024-04-05 |
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models |
Gihyun Kwon et.al. |
2404.03913 |
null |
2024-04-04 |
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching |
Dongzhi Jiang et.al. |
2404.03653 |
link |
2024-04-04 |
Reference-Based 3D-Aware Image Editing with Triplane |
Bahri Batuhan Bilecen et.al. |
2404.03632 |
null |
2024-04-04 |
Robust Concept Erasure Using Task Vectors |
Minh Pham et.al. |
2404.03631 |
null |
2024-04-04 |
Multi Positive Contrastive Learning with Pose-Consistent Generated Images |
Sho Inayoshi et.al. |
2404.03256 |
null |
2024-04-04 |
Would Deep Generative Models Amplify Bias in Future Models? |
Tianwei Chen et.al. |
2404.03242 |
null |
2024-04-04 |
Diverse and Tailored Image Generation for Zero-shot Multi-label Classification |
Kaixin Zhang et.al. |
2404.03144 |
null |
2024-04-03 |
Many-to-many Image Generation with Auto-regressive Diffusion Models |
Ying Shen et.al. |
2404.03109 |
null |
2024-04-03 |
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction |
Keyu Tian et.al. |
2404.02905 |
link |
2024-04-03 |
MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment |
Duygu Ceylan et.al. |
2404.02899 |
null |
2024-04-03 |
On the Scalability of Diffusion-based Text-to-Image Generation |
Hao Li et.al. |
2404.02883 |
null |
2024-04-03 |
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation |
Petru-Daniel Tudosiu et.al. |
2404.02790 |
null |
2024-04-03 |
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation |
Haofan Wang et.al. |
2404.02733 |
link |
2024-04-03 |
Model-agnostic Origin Attribution of Generated Images with Few-shot Examples |
Fengyuan Liu et.al. |
2404.02697 |
link |
2024-04-03 |
Severity Controlled Text-to-Image Generative Model Bias Manipulation |
Jordan Vice et.al. |
2404.02530 |
null |
2024-04-02 |
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models |
Zeyu Yang et.al. |
2404.02148 |
link |
2024-04-02 |
3D Congealing: 3D-Aware Image Alignment in the Wild |
Yunzhi Zhang et.al. |
2404.02125 |
null |
2024-04-02 |
CameraCtrl: Enabling Camera Control for Text-to-Video Generation |
Hao He et.al. |
2404.02101 |
link |
2024-04-02 |
Real, fake and synthetic faces – does the coin have three sides? |
Shahzeb Naeem et.al. |
2404.01878 |
null |
2024-04-02 |
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model |
Xu He et.al. |
2404.01862 |
link |
2024-04-02 |
Disentangled Pre-training for Human-Object Interaction Detection |
Zhuolong Li et.al. |
2404.01725 |
link |
2024-04-01 |
PlayFutures: Imagining Civic Futures with AI and Puppets |
Supratim Pait et.al. |
2404.01527 |
null |
2024-04-01 |
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data |
Matthias Gerstgrasser et.al. |
2404.01413 |
null |
2024-04-01 |
Evaluating Text-to-Visual Generation with Image-to-Text Generation |
Zhiqiu Lin et.al. |
2404.01291 |
link |
2024-04-01 |
Condition-Aware Neural Network for Controlled Image Generation |
Han Cai et.al. |
2404.01143 |
null |
2024-03-29 |
Benchmarking Counterfactual Image Generation |
Thomas Melistas et.al. |
2403.20287 |
link |
2024-03-29 |
Motion Inversion for Video Customization |
Luozhou Wang et.al. |
2403.20193 |
null |
2024-03-29 |
FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models |
Barbara Toniella Corradini et.al. |
2403.20105 |
null |
2024-04-02 |
FairRAG: Fair Human Generation via Fair Retrieval Augmentation |
Robik Shrestha et.al. |
2403.19964 |
null |
2024-03-28 |
Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks |
Pooria Ashrafian et.al. |
2403.19880 |
link |
2024-03-28 |
Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization |
Yuhang Li et.al. |
2403.19866 |
null |
2024-03-28 |
CLoRA: A Contrastive Approach to Compose Multiple LoRA Models |
Tuna Han Salih Meral et.al. |
2403.19776 |
null |
2024-03-28 |
Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond |
Katherine Xu et.al. |
2403.19653 |
link |
2024-03-28 |
GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models |
Yusuf Dalva et.al. |
2403.19645 |
null |
2024-03-28 |
Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models |
Ole Hall et.al. |
2403.19620 |
link |
2024-03-28 |
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model |
Zhicai Wang et.al. |
2403.19600 |
link |
2024-03-28 |
Frame by Familiar Frame: Understanding Replication in Video Diffusion Models |
Aimon Rahman et.al. |
2403.19593 |
null |
2024-03-28 |
Imperceptible Protection against Style Imitation from Diffusion Models |
Namhyuk Ahn et.al. |
2403.19254 |
null |
2024-03-28 |
Synthetic Medical Imaging Generation with Generative Adversarial Networks For Plain Radiographs |
John R. McNulty et.al. |
2403.19107 |
null |
2024-03-28 |
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation |
Yutong He et.al. |
2403.19103 |
null |
2024-03-28 |
Purposeful remixing with generative AI: Constructing designer voice in multimodal composing |
Xiao Tan et.al. |
2403.19095 |
null |
2024-03-27 |
TextCraftor: Your Text Encoder Can be Image Quality Controller |
Yanyu Li et.al. |
2403.18978 |
null |
2024-03-27 |
Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching |
Jannis Chemseddine et.al. |
2403.18705 |
link |
2024-03-27 |
Attention Calibration for Disentangled Text-to-Image Personalization |
Yanbing Zhang et.al. |
2403.18551 |
link |
2024-03-27 |
DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis |
Zhongxi Chen et.al. |
2403.18471 |
link |
2024-03-27 |
ECNet: Effective Controllable Text-to-Image Diffusion Models |
Sicheng Li et.al. |
2403.18417 |
null |
2024-03-27 |
Ship in Sight: Diffusion Models for Ship-Image Super Resolution |
Luigi Sigillo et.al. |
2403.18370 |
link |
2024-03-26 |
Tutorial on Diffusion Models for Imaging and Vision |
Stanley H. Chan et.al. |
2403.18103 |
null |
2024-03-26 |
TC4D: Trajectory-Conditioned Text-to-4D Generation |
Sherwin Bahmani et.al. |
2403.17920 |
null |
2024-03-26 |
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain |
Yurui Qian et.al. |
2403.17870 |
null |
2024-03-26 |
Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields |
Rüveyda Yilmaz et.al. |
2403.17808 |
link |
2024-03-26 |
LaRE$^2$: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection |
Yunpeng Luo et.al. |
2403.17465 |
link |
2024-03-25 |
DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment |
Stella Bounareli et.al. |
2403.17217 |
null |
2024-03-25 |
FlashFace: Human Image Personalization with High-fidelity Identity Preservation |
Shilong Zhang et.al. |
2403.17008 |
null |
2024-03-25 |
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models |
Zhongwei Zhang et.al. |
2403.17005 |
null |
2024-03-25 |
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer |
Rui Zhu et.al. |
2403.17004 |
null |
2024-03-25 |
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation |
Omer Dahary et.al. |
2403.16990 |
null |
2024-03-25 |
Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance |
Jingyuan Zhu et.al. |
2403.16954 |
null |
2024-03-25 |
Iso-Diffusion: Improving Diffusion Probabilistic Models Using the Isotropy of the Additive Gaussian Noise |
Dilum Fernando et.al. |
2403.16790 |
null |
2024-03-25 |
Multi-Scale Texture Loss for CT denoising with GANs |
Francesco Di Feola et.al. |
2403.16640 |
link |
2024-03-25 |
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions |
Yuda Song et.al. |
2403.16627 |
link |
2024-03-25 |
An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models |
Zizhao Hu et.al. |
2403.16530 |
null |
2024-03-25 |
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation |
Sanyam Lakhanpal et.al. |
2403.16422 |
null |
2024-03-25 |
A Survey on Long Video Generation: Challenges, Methods, and Prospects |
Chengxuan Li et.al. |
2403.16407 |
null |
2024-03-25 |
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation |
Yingshan Chang et.al. |
2403.16394 |
link |
2024-03-25 |
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models |
Lin Zhao et.al. |
2403.16379 |
null |
2024-03-24 |
Opportunities and challenges in the application of large artificial intelligence models in radiology |
Liangrui Pan et.al. |
2403.16112 |
null |
2024-03-23 |
Adaptive Super Resolution For One-Shot Talking-Head Generation |
Luchuan Song et.al. |
2403.15944 |
link |
2024-03-23 |
Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content |
Zhicheng Du et.al. |
2403.15876 |
link |
2024-03-22 |
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects |
Ruining Li et.al. |
2403.15382 |
null |
2024-03-22 |
Long-CLIP: Unlocking the Long-Text Capability of CLIP |
Beichen Zhang et.al. |
2403.15378 |
link |
2024-03-22 |
Controlled Training Data Generation with Diffusion Models |
Teresa Yeo et.al. |
2403.15309 |
null |
2024-03-22 |
Spectral Motion Alignment for Video Motion Transfer using Diffusion Models |
Geon Yeong Park et.al. |
2403.15249 |
null |
2024-03-22 |
A Multimodal Approach for Cross-Domain Image Retrieval |
Lucas Iijima et.al. |
2403.15152 |
null |
2024-03-22 |
MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration |
Zhichao Wei et.al. |
2403.15059 |
null |
2024-03-22 |
Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning |
Bumsoo Kim et.al. |
2403.15048 |
null |
2024-03-22 |
CLIP-VQDiffusion: Language Free Training of Text To Image generation using CLIP and vector quantized diffusion model |
Seungdae Han et.al. |
2403.14944 |
link |
2024-03-22 |
Geometric Generative Models based on Morphological Equivariant PDEs and GANs |
El Hadji S. Diop et.al. |
2403.14897 |
null |
2024-03-21 |
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text |
Roberto Henschel et.al. |
2403.14773 |
link |
2024-03-21 |
Explorative Inbetweening of Time and Space |
Haiwen Feng et.al. |
2403.14611 |
null |
2024-03-21 |
DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing |
Yueru Jia et.al. |
2403.14487 |
link |
2024-03-22 |
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks |
Max Ku et.al. |
2403.14468 |
link |
2024-03-21 |
Analysing Diffusion Segmentation for Medical Images |
Mathias Öttl et.al. |
2403.14440 |
null |
2024-03-21 |
Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation |
Mathias Öttl et.al. |
2403.14429 |
null |
2024-03-21 |
Enabling Visual Composition and Animation in Unsupervised Video Generation |
Aram Davtyan et.al. |
2403.14368 |
null |
2024-03-21 |
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models |
Pablo Marcos-Manchón et.al. |
2403.14291 |
link |
2024-03-21 |
Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations |
Xun Lin et.al. |
2403.14250 |
null |
2024-03-21 |
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN |
Jongwoo Choi et.al. |
2403.14186 |
link |
2024-03-21 |
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition |
Sihyun Yu et.al. |
2403.14148 |
null |
2024-03-20 |
Learning from Models and Data for Visual Grounding |
Ruozhen He et.al. |
2403.13804 |
null |
2024-03-20 |
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation |
Fu-Yun Wang et.al. |
2403.13745 |
link |
2024-03-20 |
Step-Calibrated Diffusion for Biomedical Optical Image Restoration |
Yiwei Lyu et.al. |
2403.13680 |
link |
2024-03-20 |
ReGround: Improving Textual and Spatial Grounding at No Cost |
Yuseung Lee et.al. |
2403.13589 |
null |
2024-03-20 |
Diversity-aware Channel Pruning for StyleGAN Compression |
Jiwoo Chung et.al. |
2403.13548 |
link |
2024-03-21 |
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models |
Siying Cui et.al. |
2403.13535 |
null |
2024-03-20 |
Deepfake Detection without Deepfakes: Generalization via Synthetic Frequency Patterns Injection |
Davide Alessandro Coccomini et.al. |
2403.13479 |
link |
2024-03-20 |
S2DM: Sector-Shaped Diffusion Models for Video Generation |
Haoran Lang et.al. |
2403.13408 |
null |
2024-03-20 |
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation |
Jingkun An et.al. |
2403.13352 |
null |
2024-03-20 |
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation |
Santosh Sanjeev et.al. |
2403.13343 |
link |
2024-03-19 |
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis |
Linjiang Huang et.al. |
2403.12963 |
link |
2024-03-19 |
Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties |
Efrain Torres-Lomas et.al. |
2403.12935 |
null |
2024-03-19 |
Ultra-High-Resolution Image Synthesis with Pyramid Diffusion Model |
Jiajie Yang et.al. |
2403.12915 |
link |
2024-03-19 |
How Spammers and Scammers Leverage AI-Generated Images on Facebook for Audience Growth |
Renee DiResta et.al. |
2403.12838 |
null |
2024-03-19 |
Total Disentanglement of Font Images into Style and Character Class Features |
Daichi Haraguchi et.al. |
2403.12784 |
null |
2024-03-19 |
AnimateDiff-Lightning: Cross-Model Diffusion Distillation |
Shanchuan Lin et.al. |
2403.12706 |
null |
2024-03-18 |
Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts |
Anh Bui et.al. |
2403.12326 |
null |
2024-03-18 |
Synthetic Image Generation in Cyber Influence Operations: An Emergent Threat? |
Melanie Mathys et.al. |
2403.12207 |
null |
2024-03-18 |
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility |
Bojia Zi et.al. |
2403.12035 |
link |
2024-03-18 |
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation |
Axel Sauer et.al. |
2403.12015 |
null |
2024-03-18 |
Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing |
Juan Zhang et.al. |
2403.11700 |
null |
2024-03-19 |
Urban Scene Diffusion through Semantic Occupancy Map |
Junge Zhang et.al. |
2403.11697 |
null |
2024-03-18 |
Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly Detection |
Julia Wolleb et.al. |
2403.11667 |
link |
2024-03-18 |
QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation |
Zhizhen Zhou et.al. |
2403.11626 |
null |
2024-03-18 |
CRS-Diff: Controllable Generative Remote Sensing Foundation Model |
Datao Tang et.al. |
2403.11614 |
link |
2024-03-17 |
StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining |
Tushar Kataria et.al. |
2403.11340 |
null |
2024-03-17 |
Fast Personalized Text-to-Image Syntheses With Attention Injection |
Yuxuan Zhang et.al. |
2403.11284 |
null |
2024-03-17 |
Understanding Diffusion Models by Feynman’s Path Integral |
Yuji Hirono et.al. |
2403.11262 |
null |
2024-03-17 |
The Effects of Generative AI on Design Fixation and Divergent Thinking |
Samangi Wadinambiarachchi et.al. |
2403.11164 |
null |
2024-03-17 |
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion |
Xiaoyu Wu et.al. |
2403.11162 |
null |
2024-03-15 |
Denoising Task Difficulty-based Curriculum for Training Diffusion Models |
Jin-Young Kim et.al. |
2403.10348 |
null |
2024-03-15 |
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers |
Xuanlei Zhao et.al. |
2403.10266 |
link |
2024-03-15 |
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder |
Jinseok Kim et.al. |
2403.10255 |
null |
2024-03-15 |
Animate Your Motion: Turning Still Images into Dynamic Videos |
Mingxiao Li et.al. |
2403.10179 |
null |
2024-03-15 |
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model |
Tao Wu et.al. |
2403.10044 |
null |
2024-03-14 |
SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior |
Huan-ang Gao et.al. |
2403.09638 |
null |
2024-03-14 |
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering |
Zeyu Liu et.al. |
2403.09622 |
null |
2024-03-14 |
PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation |
Yuhan Guo et.al. |
2403.09615 |
null |
2024-03-14 |
Counterfactual contrastive learning: robust representations via causal image synthesis |
Melanie Roschewitz et.al. |
2403.09605 |
link |
2024-03-14 |
Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing |
Wonjun Kang et.al. |
2403.09468 |
link |
2024-03-14 |
Mitigating attribute amplification in counterfactual image generation |
Tian Xia et.al. |
2403.09422 |
null |
2024-03-14 |
Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction |
Hanyu Chen et.al. |
2403.09355 |
null |
2024-03-14 |
Video Editing via Factorized Diffusion Distillation |
Uriel Singer et.al. |
2403.09334 |
null |
2024-03-14 |
Noise Dimension of GAN: An Image Compression Perspective |
Ziran Zhu et.al. |
2403.09196 |
null |
2024-03-14 |
Intention-driven Ego-to-Exo Video Generation |
Hongchen Luo et.al. |
2403.09194 |
null |
2024-03-13 |
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis |
Enric Corona et.al. |
2403.08764 |
null |
2024-03-13 |
HAIFIT: Human-Centered AI for Fashion Image Translation |
Jianan Jiang et.al. |
2403.08651 |
link |
2024-03-13 |
Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation |
Tianyi Chu et.al. |
2403.08294 |
null |
2024-03-13 |
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts |
Yue Ma et.al. |
2403.08268 |
link |
2024-03-13 |
Make Me Happier: Evoking Emotions Through Image Diffusion Models |
Qing Lin et.al. |
2403.08255 |
null |
2024-03-12 |
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation |
Shihao Zhao et.al. |
2403.07860 |
link |
2024-03-12 |
Quantifying and Mitigating Privacy Risks for Tabular Generative Models |
Chaoyi Zhu et.al. |
2403.07842 |
null |
2024-03-12 |
Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model |
Yuxuan Zhang et.al. |
2403.07764 |
link |
2024-03-12 |
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings |
Sahand Sharifzadeh et.al. |
2403.07750 |
null |
2024-03-14 |
Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion |
Dongyang Li et.al. |
2403.07721 |
link |
2024-03-12 |
SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces |
Yuta Oshima et.al. |
2403.07711 |
link |
2024-03-12 |
Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation |
Michael Ogezi et.al. |
2403.07605 |
null |
2024-03-12 |
Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation |
Likun Li et.al. |
2403.07500 |
null |
2024-03-12 |
Backdoor Attack with Mode Mixture Latent Modification |
Hongwei Zhang et.al. |
2403.07463 |
null |
2024-03-13 |
DragAnything: Motion Control for Anything using Entity Representation |
Weijia Wu et.al. |
2403.07420 |
link |
2024-03-11 |
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation |
Guosheng Zhao et.al. |
2403.06845 |
null |
2024-03-11 |
Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting |
Wenting Chen et.al. |
2403.06835 |
null |
2024-03-11 |
Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection |
Chuangchuang Tan et.al. |
2403.06803 |
link |
2024-03-11 |
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation |
Pengchong Qiao et.al. |
2403.06775 |
link |
2024-03-11 |
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback |
Adarsh N L et.al. |
2403.06735 |
null |
2024-03-11 |
Active Generation for Image Classification |
Tao Huang et.al. |
2403.06517 |
link |
2024-03-11 |
Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning |
Woojung Han et.al. |
2403.06516 |
null |
2024-03-11 |
3D-aware Image Generation and Editing with Multi-modal Conditions |
Bo Li et.al. |
2403.06470 |
null |
2024-03-11 |
A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos |
Weixia Zhang et.al. |
2403.06421 |
link |
2024-03-11 |
DivCon: Divide and Conquer for Progressive Text-to-Image Generation |
Yuhao Jia et.al. |
2403.06400 |
link |
2024-03-08 |
Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapola |
Yijiang Li et.al. |
2403.05523 |
null |
2024-03-08 |
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models |
Yabo Zhang et.al. |
2403.05438 |
link |
2024-03-08 |
A Data Augmentation Pipeline to Generate Synthetic Labeled Datasets of 3D Echocardiography Images using a GAN |
Cristiana Tiago et.al. |
2403.05384 |
null |
2024-03-08 |
Fine-tuning a Multiple Instance Learning Feature Extractor with Masked Context Modelling and Knowledge Distillation |
Juan I. Pisula et.al. |
2403.05325 |
null |
2024-03-08 |
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation |
Junyan Wang et.al. |
2403.05239 |
null |
2024-03-08 |
Synthetic Privileged Information Enhances Medical Image Representation Learning |
Lucas Farndale et.al. |
2403.05220 |
null |
2024-03-08 |
Denoising Autoregressive Representation Learning |
Yazhe Li et.al. |
2403.05196 |
null |
2024-03-08 |
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment |
Xiwei Hu et.al. |
2403.05135 |
null |
2024-03-08 |
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation |
Joseph Cho et.al. |
2403.05131 |
null |
2024-03-08 |
Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis |
Muxi Chen et.al. |
2403.05125 |
link |
2024-03-07 |
Photonic probabilistic machine learning using quantum vacuum noise |
Seou Choi et.al. |
2403.04731 |
null |
2024-03-07 |
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation |
Junsong Chen et.al. |
2403.04692 |
link |
2024-03-07 |
Pix2Gif: Motion-Guided Diffusion for GIF Generation |
Hitesh Kandala et.al. |
2403.04634 |
link |
2024-03-07 |
Discriminative Probing and Tuning for Text-to-Image Generation |
Leigang Qu et.al. |
2403.04321 |
null |
2024-03-06 |
PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement |
Zhijie Wang et.al. |
2403.04014 |
link |
2024-03-06 |
Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer |
Naifu Xue et.al. |
2403.03736 |
null |
2024-03-06 |
Seamless Virtual Reality with Integrated Synchronizer and Synthesizer for Autonomous Driving |
He Li et.al. |
2403.03541 |
null |
2024-03-06 |
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging |
Takahiro Shirakawa et.al. |
2403.03485 |
link |
2024-03-07 |
DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network |
Xiangquan Gui et.al. |
2403.03456 |
null |
2024-03-06 |
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing |
Bingyan Liu et.al. |
2403.03431 |
null |
2024-03-05 |
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis |
Patrick Esser et.al. |
2403.03206 |
null |
2024-03-05 |
Behavior Generation with Latent Actions |
Seungjae Lee et.al. |
2403.03181 |
link |
2024-03-05 |
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation |
Weijie Li et.al. |
2403.02827 |
null |
2024-03-05 |
Bias in Generative AI |
Mi Zhou et.al. |
2403.02726 |
null |
2024-03-04 |
Transformer for Time Series: an Application to the S&P500 |
Pierre Brugiere et.al. |
2403.02523 |
null |
2024-03-04 |
NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function |
Abdullah Nazhat Abdullah et.al. |
2403.02411 |
link |
2024-03-05 |
UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control |
Xuweiyi Chen et.al. |
2403.02332 |
link |
2024-03-04 |
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models |
Jiaxiang Cheng et.al. |
2403.02084 |
link |
2024-03-04 |
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models |
Lukas Höllein et.al. |
2403.01807 |
link |
2024-03-05 |
AtomoVideo: High Fidelity Image-to-Video Generation |
Litong Gong et.al. |
2403.01800 |
null |
2024-03-02 |
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models |
Neta Shaul et.al. |
2403.01329 |
null |
2024-03-02 |
SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code |
Ziniu Hu et.al. |
2403.01248 |
null |
2024-03-02 |
TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion |
Salaheldin Mohamed et.al. |
2403.01212 |
null |
2024-03-01 |
Improving Android Malware Detection Through Data Augmentation Using Wasserstein Generative Adversarial Networks |
Kawana Stalin et.al. |
2403.00890 |
null |
2024-03-01 |
Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset |
Ander Salaberria et.al. |
2403.00587 |
link |
2024-03-01 |
Rethinking cluster-conditioned diffusion models |
Nikolas Adaloglou et.al. |
2403.00570 |
link |
2024-03-01 |
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks |
Xiangxiang Chu et.al. |
2403.00522 |
link |
2024-03-01 |
An Ordinal Diffusion Model for Generating Medical Images with Different Severity Levels |
Shumpei Takezaki et.al. |
2403.00452 |
null |
2024-03-01 |
Abductive Ego-View Accident Video Understanding for Safe Driving Perception |
Jianwu Fang et.al. |
2403.00436 |
null |
2024-02-29 |
Learning to Find Missing Video Frames with Synthetic Data Augmentation: A General Framework and Application in Generating Thermal Images Using RGB Cameras |
Mathias Viborg Andersen et.al. |
2403.00196 |
null |
2024-02-29 |
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers |
Tsai-Shien Chen et.al. |
2402.19479 |
null |
2024-02-29 |
A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation |
Hanxi Li et.al. |
2402.19330 |
link |
2024-02-29 |
Disentangling representations of retinal images with generative models |
Sarah Müller et.al. |
2402.19186 |
link |
2024-02-29 |
Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection |
Christos Koutlis et.al. |
2402.19091 |
link |
2024-02-29 |
WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis |
Paul Friedrich et.al. |
2402.19043 |
link |
2024-02-29 |
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising |
Xianghui Yang et.al. |
2402.18842 |
link |
2024-02-29 |
A Quantitative Evaluation of Score Distillation Sampling Based Text-to-3D |
Xiaohan Fei et.al. |
2402.18780 |
null |
2024-02-28 |
FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes |
Ziying Pan et.al. |
2402.18331 |
link |
2024-02-28 |
Balancing Act: Distribution-Guided Debiasing in Diffusion Models |
Rishubh Parihar et.al. |
2402.18206 |
null |
2024-02-28 |
VulMCI: Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation |
Tao Peng et.al. |
2402.18189 |
link |
2024-02-28 |
Block and Detail: Scaffolding Sketch-to-Image Generation |
Vishnu Sarukkai et.al. |
2402.18116 |
null |
2024-02-28 |
Context-aware Talking Face Video Generation |
Meidai Xuanyuan et.al. |
2402.18092 |
null |
2024-02-28 |
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis |
Yanzuo Lu et.al. |
2402.18078 |
link |
2024-02-27 |
Structure-Guided Adversarial Training of Diffusion Models |
Ling Yang et.al. |
2402.17563 |
null |
2024-02-27 |
Diffusion Model-Based Image Editing: A Survey |
Yi Huang et.al. |
2402.17525 |
link |
2024-02-27 |
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions |
Linrui Tian et.al. |
2402.17485 |
null |
2024-02-27 |
Sora Generates Videos with Stunning Geometrical Consistency |
Xuanyi Li et.al. |
2402.17403 |
null |
2024-02-27 |
Accelerating Diffusion Sampling with Optimized Time Steps |
Shuchen Xue et.al. |
2402.17376 |
link |
2024-02-27 |
Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation |
Daiqing Li et.al. |
2402.17245 |
null |
2024-02-27 |
Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System |
Majid Memari et.al. |
2402.17204 |
null |
2024-02-27 |
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models |
Yixin Liu et.al. |
2402.17177 |
link |
2024-02-27 |
Video as the New Language for Real-World Decision Making |
Sherry Yang et.al. |
2402.17139 |
null |
2024-02-27 |
Transparent Image Layer Diffusion using Latent Transparency |
Lvmin Zhang et.al. |
2402.17113 |
link |
2024-02-26 |
Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion |
Xuantong Liu et.al. |
2402.16305 |
null |
2024-02-25 |
Towards Efficient Quantum Hybrid Diffusion Models |
Francesca De Falco et.al. |
2402.16147 |
null |
2024-02-23 |
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition |
Chun-Hsiao Yeh et.al. |
2402.15504 |
link |
2024-02-23 |
BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators |
Yu Tian et.al. |
2402.15218 |
null |
2024-02-23 |
The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling |
Jiajun Ma et.al. |
2402.15170 |
null |
2024-02-22 |
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis |
Willi Menapace et.al. |
2402.14797 |
null |
2024-02-22 |
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models |
Yixuan Ren et.al. |
2402.14780 |
null |
2024-02-25 |
Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening |
Zhenrong Shen et.al. |
2402.14707 |
null |
2024-02-22 |
Visual Hallucinations of Multi-modal Large Language Models |
Wen Huang et.al. |
2402.14683 |
link |
2024-02-22 |
MVD$^2$: Efficient Multiview 3D Reconstruction for Multiview Diffusion |
Xin-Yang Zheng et.al. |
2402.14253 |
null |
2024-02-21 |
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching |
Zizheng Pan et.al. |
2402.14167 |
link |
2024-02-21 |
SDXL-Lightning: Progressive Adversarial Diffusion Distillation |
Shanchuan Lin et.al. |
2402.13929 |
null |
2024-02-21 |
SRNDiff: Short-term Rainfall Nowcasting with Condition Diffusion Model |
Xudong Ling et.al. |
2402.13737 |
link |
2024-02-21 |
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation |
Kihong Kim et.al. |
2402.13729 |
null |
2024-02-21 |
Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving |
Mehdi Azarafza et.al. |
2402.13602 |
link |
2024-02-21 |
Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models |
Chen Wu et.al. |
2402.13490 |
null |
2024-02-20 |
Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control |
Denis Lukovnikov et.al. |
2402.13404 |
null |
2024-02-20 |
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples |
Jianrui Zhang et.al. |
2402.13254 |
link |
2024-02-20 |
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing |
Jianhong Bai et.al. |
2402.13185 |
null |
2024-02-20 |
Neural Network Diffusion |
Kai Wang et.al. |
2402.13144 |
link |
2024-02-20 |
VGMShield: Mitigating Misuse of Video Generative Models |
Yan Pang et.al. |
2402.13126 |
link |
2024-02-20 |
Visual Style Prompting with Swapping Self-Attention |
Jaeseok Jeong et.al. |
2402.12974 |
link |
2024-02-20 |
RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models |
Xinchen Zhang et.al. |
2402.12908 |
link |
2024-02-20 |
Two-stage Rainfall-Forecasting Diffusion Model |
XuDong Ling et.al. |
2402.12779 |
link |
2024-02-20 |
MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion |
Sen Li et.al. |
2402.12741 |
link |
2024-02-20 |
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction |
Shitao Tang et.al. |
2402.12712 |
null |
2024-02-19 |
The (R)Evolution of Multimodal Large Language Models: A Survey |
Davide Caffagni et.al. |
2402.12451 |
null |
2024-02-19 |
Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability |
Xuelin Qian et.al. |
2402.12225 |
null |
2024-02-19 |
Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation |
Yi Liu et.al. |
2402.12100 |
null |
2024-02-19 |
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation |
Chong Zeng et.al. |
2402.11929 |
link |
2024-02-18 |
SDiT: Spiking Diffusion Model with Transformer |
Shu Yang et.al. |
2402.11588 |
null |
2024-02-18 |
Visual Concept-driven Image Generation with Text-to-Image Diffusion Model |
Tanzila Rahman et.al. |
2402.11487 |
null |
2024-02-18 |
Deep learning methods for Hamiltonian parameter estimation and magnetic domain image generation in twisted van der Waals magnets |
Woo Seok Lee et.al. |
2402.11434 |
null |
2024-02-17 |
TC-DiffRecon: Texture coordination MRI reconstruction method based on diffusion model and modified MF-UNet method |
Chenyan Zhang et.al. |
2402.11274 |
link |
2024-02-16 |
The Male CEO and the Female Assistant: Probing Gender Biases in Text-To-Image Models Through Paired Stereotype Test |
Yixin Wan et.al. |
2402.11089 |
null |
2024-02-16 |
Universal Prompt Optimizer for Safe Text-to-Image Generation |
Zongyu Wu et.al. |
2402.10882 |
link |
2024-02-16 |
Exploring Precision and Recall to assess the quality and diversity of LLMs |
Le Bronnec Florian et.al. |
2402.10693 |
link |
2024-02-16 |
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation |
Lanqing Guo et.al. |
2402.10491 |
link |
2024-02-16 |
UMAIR-FPS: User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style |
Yan Kang et.al. |
2402.10381 |
link |
2024-02-15 |
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation |
Huizhuo Yuan et.al. |
2402.10210 |
null |
2024-02-15 |
Euclid preparation. Measuring detailed galaxy morphologies for Euclid with Machine Learning |
Euclid Collaboration et.al. |
2402.10187 |
link |
2024-02-15 |
Classification Diffusion Models |
Shahar Yadin et.al. |
2402.10095 |
null |
2024-02-15 |
Accelerating Parallel Sampling of Diffusion Models |
Zhiwei Tang et.al. |
2402.09970 |
link |
2024-02-15 |
Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation |
Junjie Shentu et.al. |
2402.09966 |
link |
2024-02-14 |
Magic-Me: Identity-Specific Video Customized Diffusion |
Ze Ma et.al. |
2402.09368 |
link |
2024-02-14 |
Switch EMA: A Free Lunch for Better Flatness and Sharpness |
Siyuan Li et.al. |
2402.09240 |
link |
2024-02-14 |
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects |
Yutaro Yamada et.al. |
2402.09052 |
null |
2024-02-14 |
Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer |
Hong Wu et.al. |
2402.08987 |
link |
2024-02-13 |
Towards the Detection of AI-Synthesized Human Face Images |
Yuhang Lu et.al. |
2402.08750 |
null |
2024-02-13 |
IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation |
Luke Melas-Kyriazi et.al. |
2402.08682 |
null |
2024-02-13 |
Learning Continuous 3D Words for Text-to-Image Generation |
Ta-Ying Cheng et.al. |
2402.08654 |
link |
2024-02-13 |
Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models |
Jason Tang et.al. |
2402.08532 |
null |
2024-02-12 |
Using AI for Wavefront Estimation with the Rubin Observatory Active Optics System |
John Franklin Crenshaw et.al. |
2402.08094 |
null |
2024-02-14 |
Score-based generative models break the curse of dimensionality in learning a family of sub-Gaussian probability distributions |
Frank Cole et.al. |
2402.08082 |
null |
2024-02-12 |
Trustworthy SR: Resolving Ambiguity in Image Super-resolution via Diffusion Models and Human Feedback |
Cansu Korkmaz et.al. |
2402.07597 |
null |
2024-02-11 |
The Aleph & Other Metaphors for Image Generation |
Gonzalo Ramos et.al. |
2402.07104 |
null |
2024-02-10 |
Disentangled Latent Energy-Based Style Translation: An Image-Level Structural MRI Harmonization Framework |
Mengqi Wu et.al. |
2402.06875 |
null |
2024-02-09 |
Cardiac ultrasound simulation for autonomous ultrasound navigation |
Abdoul Aziz Amadou et.al. |
2402.06463 |
null |
2024-02-08 |
Collaborative Control for Geometry-Conditioned PBR Image Generation |
Shimon Vainer et.al. |
2402.05919 |
null |
2024-02-08 |
Scalable Diffusion Models with State Space Backbone |
Zhengcong Fei et.al. |
2402.05608 |
link |
2024-02-08 |
Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application |
Bumsoo Kim et.al. |
2402.05448 |
null |
2024-02-08 |
Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport |
Jaemoo Choi et.al. |
2402.05443 |
null |
2024-02-09 |
Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models |
Nicholas Konz et.al. |
2402.05210 |
link |
2024-02-07 |
ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12 |
Liuqing Chen et.al. |
2402.04975 |
null |
2024-02-07 |
Text2Street: Controllable Text-to-image Generation for Street Views |
Jinming Su et.al. |
2402.04504 |
null |
2024-02-07 |
ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation |
Jirayu Burapacheep et.al. |
2402.04492 |
link |
2024-02-06 |
Denoising Diffusion Probabilistic Models in Six Simple Steps |
Richard E. Turner et.al. |
2402.04384 |
null |
2024-02-06 |
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation |
Weiming Ren et.al. |
2402.04324 |
link |
2024-02-06 |
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning |
Haoxuan Wang et.al. |
2402.03666 |
link |
2024-02-05 |
Projected Generative Diffusion Models for Constraint Satisfaction |
Jacob K Christopher et.al. |
2402.03559 |
link |
2024-02-05 |
Assessing the Efficacy of Invisible Watermarks in AI-Generated Medical Images |
Xiaodan Xing et.al. |
2402.03473 |
null |
2024-02-05 |
Do Diffusion Models Learn Semantically Meaningful and Efficient Representations? |
Qiyao Liang et.al. |
2402.03305 |
null |
2024-02-05 |
InstanceDiffusion: Instance-level Control for Image Generation |
Xudong Wang et.al. |
2402.03290 |
link |
2024-02-05 |
Training-Free Consistent Text-to-Image Generation |
Yoad Tewel et.al. |
2402.03286 |
null |
2024-02-05 |
IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR images |
Vincent Roca et.al. |
2402.03227 |
link |
2024-02-05 |
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion |
Shiyuan Yang et.al. |
2402.03162 |
null |
2024-02-05 |
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions |
Yiyuan Zhang et.al. |
2402.03040 |
link |
2024-02-05 |
SynthVision – Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data |
Yudara Kularathne et.al. |
2402.02826 |
null |
2024-02-04 |
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing |
Chong Mou et.al. |
2402.02583 |
link |
2024-02-04 |
M$^3$Face: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing |
Mohammadreza Mofayezi et.al. |
2402.02369 |
null |
2024-02-03 |
DeCoF: Generated Video Detection via Frame Consistency |
Long Ma et.al. |
2402.02085 |
link |
2024-02-02 |
NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities |
Jingyuan Sun et.al. |
2402.01590 |
null |
2024-02-02 |
The galactic bubbles of starburst galaxies. The influence of galactic large-scale magnetic fields |
Z. Meliani et.al. |
2402.01541 |
null |
2024-02-02 |
Cross-view Masked Diffusion Transformers for Person Image Synthesis |
Trung X. Pham et.al. |
2402.01516 |
link |
2024-02-02 |
Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors |
Dingcheng Yang et.al. |
2402.01369 |
link |
2024-02-02 |
Can MLLMs Perform Text-to-Image In-Context Learning? |
Yuchen Zeng et.al. |
2402.01293 |
link |
2024-02-02 |
Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion? |
Cristian Sbrolli et.al. |
2402.01241 |
null |
2024-02-01 |
AI-generated faces free from racial and gender stereotypes |
Nouar AlDahoul et.al. |
2402.01002 |
link |
2024-02-01 |
Examining the Influence of Digital Phantom Models in Virtual Imaging Trials for Tomographic Breast Imaging |
Amar Kavuri et.al. |
2402.00812 |
null |
2024-02-01 |
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning |
Fu-Yun Wang et.al. |
2402.00769 |
link |
2024-02-01 |
DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras |
Weixing Xie et.al. |
2402.00740 |
null |
2024-01-31 |
SeFi-IDE: Semantic-Fidelity Identity Embedding for Personalized Diffusion-Based Generation |
Yang Li et.al. |
2402.00631 |
null |
2024-02-01 |
CapHuman: Capture Your Moments in Parallel Universes |
Chao Liang et.al. |
2402.00627 |
link |
2024-02-01 |
Masked Conditional Diffusion Model for Enhancing Deepfake Detection |
Tiewen Chen et.al. |
2402.00541 |
null |
2024-02-01 |
High-Quality Medical Image Generation from Free-hand Sketch |
Quan Huu Cap et.al. |
2402.00353 |
null |
2024-02-01 |
Machine Unlearning for Image-to-Image Generative Models |
Guihong Li et.al. |
2402.00351 |
link |
2024-01-31 |
Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation |
Yuanhuiyi Lyu et.al. |
2401.17664 |
null |
2024-01-31 |
Head and Neck Tumor Segmentation from [18F]F-FDG PET/CT Images Based on 3D Diffusion Model |
Yafei Dong et.al. |
2401.17593 |
null |
2024-01-31 |
Task-Oriented Diffusion Model Compression |
Geonung Kim et.al. |
2401.17547 |
null |
2024-01-31 |
Fréchet Distance for Offline Evaluation of Information Retrieval Systems with Sparse Labels |
Negar Arabzadeh et.al. |
2401.17543 |
null |
2024-01-30 |
OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision |
Bruno Berenguel-Baeta et.al. |
2401.17061 |
link |
2024-01-30 |
Repositioning the Subject within Image |
Yikai Wang et.al. |
2401.16861 |
link |
2024-01-30 |
X-ray Image Generation as a Method of Performance Prediction for Real-Time Inspection: a Case Study |
Vladyslav Andriiashen et.al. |
2401.16847 |
link |
2024-01-29 |
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors |
Shiyin Dong et.al. |
2401.16459 |
null |
2024-01-29 |
Spatial-Aware Latent Initialization for Controllable Image Generation |
Wenqiang Sun et.al. |
2401.16157 |
null |
2024-01-31 |
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You |
Felix Friedrich et.al. |
2401.16092 |
link |
2024-01-29 |
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling |
Xiaoyu Shi et.al. |
2401.15977 |
null |
2024-01-29 |
Diffusion Facial Forgery Detection |
Harry Cheng et.al. |
2401.15859 |
link |
2024-01-29 |
2L3: Lifting Imperfect Generated 2D Images into Accurate 3D |
Yizheng Chen et.al. |
2401.15841 |
null |
2024-01-28 |
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding |
Jianxiang Lu et.al. |
2401.15708 |
null |
2024-01-28 |
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation |
Zhenyu Wang et.al. |
2401.15688 |
null |
2024-01-28 |
IntentTuner: An Interactive Framework for Integrating Human Intents in Fine-tuning Text-to-Image Generative Models |
Xingchen Zeng et.al. |
2401.15559 |
null |
2024-01-27 |
GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis |
Jing Hao et.al. |
2401.15282 |
link |
2024-01-26 |
Annotated Hands for Generative Models |
Yue Yang et.al. |
2401.15075 |
link |
2024-01-26 |
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support |
Xiaojun Wu et.al. |
2401.14688 |
link |
2024-01-25 |
Deconstructing Denoising Diffusion Models for Self-Supervised Learning |
Xinlei Chen et.al. |
2401.14404 |
null |
2024-01-25 |
UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation and Diffusion Models |
Timo Kapsalis et.al. |
2401.14379 |
null |
2024-01-26 |
Image Synthesis with Graph Conditioning: CLIP-Guided Diffusion Models for Scene Graphs |
Rameshwar Mishra et.al. |
2401.14111 |
null |
2024-01-25 |
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion |
Nisha Huang et.al. |
2401.14066 |
link |
2024-01-25 |
Diffusion-based Data Augmentation for Object Counting Problems |
Zhen Wang et.al. |
2401.13992 |
null |
2024-01-25 |
Learning to Manipulate Artistic Images |
Wei Guo et.al. |
2401.13976 |
link |
2024-01-25 |
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models |
Senthil Purushwalkam et.al. |
2401.13974 |
link |
2024-01-25 |
A New Image Quality Database for Multiple Industrial Processes |
Xuanchao Ma et.al. |
2401.13956 |
null |
2024-01-25 |
StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models |
Yalong Bai et.al. |
2401.13942 |
null |
2024-01-24 |
Research about the Ability of LLM in the Tamper-Detection Area |
Xinyu Yang et.al. |
2401.13504 |
null |
2024-01-24 |
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion |
Wei Li et.al. |
2401.13388 |
null |
2024-01-24 |
Deep Learning for Improved Polyp Detection from Synthetic Narrow-Band Imaging |
Mathias Ramm Haugland et.al. |
2401.13315 |
null |
2024-01-24 |
Choose Your Diffusion: Efficient and flexible ways to accelerate the diffusion model in fast high energy physics simulation |
Cheng Jiang et.al. |
2401.13162 |
null |
2024-01-23 |
CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited Data |
Chandrakanth Gudavalli et.al. |
2401.13006 |
null |
2024-01-23 |
Lumiere: A Space-Time Diffusion Model for Video Generation |
Omer Bar-Tal et.al. |
2401.12945 |
null |
2024-01-23 |
A Unified Generation-Registration Framework for Improved MR-based CT Synthesis in Proton Therapy |
Xia Li et.al. |
2401.12878 |
null |
2024-01-23 |
UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators |
Hengjia Li et.al. |
2401.12596 |
null |
2024-01-23 |
The Neglected Tails of Vision-Language Models |
Shubham Parashar et.al. |
2401.12425 |
null |
2024-01-20 |
Large-scale Reinforcement Learning for Diffusion Models |
Yinan Zhang et.al. |
2401.12244 |
null |
2024-01-23 |
Control of OSIRIS-REx OTES Observations using OCAMS TAG Images |
Kris J. Becker et.al. |
2401.12177 |
null |
2024-01-22 |
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs |
Ling Yang et.al. |
2401.11708 |
link |
2024-01-21 |
Text-to-Image Cross-Modal Generation: A Systematic Review |
Maciej Żelaszczyk et.al. |
2401.11631 |
null |
2024-01-21 |
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers |
Katherine Crowson et.al. |
2401.11605 |
link |
2024-01-19 |
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion |
Zuoyue Li et.al. |
2401.10786 |
null |
2024-01-18 |
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution |
Xin Yuan et.al. |
2401.10404 |
null |
2024-01-22 |
Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation |
Changgu Chen et.al. |
2401.10150 |
null |
2024-01-18 |
DiffusionGPT: LLM-Driven Text-to-Image Generation System |
Jie Qin et.al. |
2401.10061 |
null |
2024-01-18 |
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens |
Xiaofeng Wang et.al. |
2401.09985 |
null |
2024-01-18 |
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects |
Zhao Wang et.al. |
2401.09962 |
null |
2024-01-17 |
MITS-GAN: Safeguarding Medical Imaging from Tampering with Generative Adversarial Networks |
Giovanni Pasqualino et.al. |
2401.09624 |
link |
2024-01-17 |
Efficient generative adversarial networks using linear additive-attention Transformers |
Emilio Morales-Juarez et.al. |
2401.09596 |
link |
2024-01-17 |
Vlogger: Make Your Dream A Vlog |
Shaobin Zhuang et.al. |
2401.09414 |
link |
2024-01-17 |
UniVG: Towards UNIfied-modal Video Generation |
Ludan Ruan et.al. |
2401.09084 |
null |
2024-01-17 |
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models |
Haoxin Chen et.al. |
2401.09047 |
link |
2024-01-16 |
Fast Dynamic 3D Object Generation from a Single-view Video |
Zijie Pan et.al. |
2401.08742 |
link |
2024-01-16 |
Fixed Point Diffusion Models |
Xingjian Bai et.al. |
2401.08741 |
link |
2024-01-16 |
Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks |
Chenyu Zhang et.al. |
2401.08725 |
link |
2024-01-16 |
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data |
Yuhui Zhang et.al. |
2401.08567 |
link |
2024-01-16 |
Instilling Multi-round Thinking to Text-guided Image Generation |
Lidong Zeng et.al. |
2401.08472 |
null |
2024-01-16 |
Key-point Guided Deformable Image Manipulation Using Diffusion Model |
Seok-Hwan Oh et.al. |
2401.08178 |
null |
2024-01-16 |
E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning |
Qiang Qu et.al. |
2401.08117 |
link |
2024-01-16 |
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation |
Zhixuan Liu et.al. |
2401.08053 |
null |
2024-01-15 |
Towards A Better Metric for Text-to-Video Generation |
Jay Zhangjie Wu et.al. |
2401.07781 |
null |
2024-01-15 |
Collaboratively Self-supervised Video Representation Learning for Action Recognition |
Jie Zhang et.al. |
2401.07584 |
null |
2024-01-15 |
InstantID: Zero-shot Identity-Preserving Generation in Seconds |
Qixun Wang et.al. |
2401.07519 |
link |
2024-01-14 |
Generation of Synthetic Images for Pedestrian Detection Using a Sequence of GANs |
Viktor Seib et.al. |
2401.07370 |
null |
2024-01-13 |
Quantum Denoising Diffusion Models |
Michael Kölle et.al. |
2401.07049 |
null |
2024-01-13 |
Progressive Feature Fusion Network for Enhancing Image Quality Assessment |
Kaiqun Wu et.al. |
2401.06992 |
null |
2024-01-12 |
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model |
Qian Wang et.al. |
2401.06578 |
null |
2024-01-12 |
Beyond the Surface: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation |
Akshita Jha et.al. |
2401.06310 |
link |
2024-01-11 |
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications |
Yuwen Xiong et.al. |
2401.06197 |
link |
2024-01-10 |
AI Art is Theft: Labour, Extraction, and Exploitation, Or, On the Dangers of Stochastic Pollocks |
Trystan S. Goetze et.al. |
2401.06178 |
null |
2024-01-11 |
RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks |
Partha Ghosh et.al. |
2401.06035 |
null |
2024-01-11 |
EraseDiff: Erasing Data Influence in Diffusion Models |
Jing Wu et.al. |
2401.05779 |
link |
2024-01-11 |
Learn From Zoom: Decoupled Supervised Contrastive Learning For WCE Image Classification |
Kunpeng Qiu et.al. |
2401.05771 |
link |
2024-01-11 |
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation |
Seung Hyun Lee et.al. |
2401.05675 |
null |
2024-01-10 |
From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage |
Marcellus Amadeus et.al. |
2401.05520 |
null |
2024-01-10 |
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models |
Junsong Chen et.al. |
2401.05252 |
link |
2024-01-09 |
Content-Conditioned Generation of Stylized Free hand Sketches |
Jiajun Liu et.al. |
2401.04739 |
null |
2024-01-09 |
Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks |
Tanmay Garg et.al. |
2401.04647 |
null |
2024-01-09 |
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models |
Jingyuan Yang et.al. |
2401.04608 |
null |
2024-01-09 |
Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models |
Xuewen Liu et.al. |
2401.04585 |
link |
2024-01-09 |
Let’s Go Shopping (LGS) – Web-Scale Image-Text Dataset for Visual Concept Understanding |
Yatong Bai et.al. |
2401.04575 |
null |
2024-01-09 |
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation |
Weimin Wang et.al. |
2401.04468 |
null |
2024-01-09 |
Vision Reimagined: AI-Powered Breakthroughs in WiFi Indoor Imaging |
Jianyang Shi et.al. |
2401.04317 |
null |
2024-01-07 |
A Classification of Critical Configurations for any Number of Projective Views |
Martin Bråtelund et.al. |
2401.03450 |
link |
2024-01-05 |
Latte: Latent Diffusion Transformer for Video Generation |
Xin Ma et.al. |
2401.03048 |
link |
2024-01-05 |
Dataset of turbulent flow over interacting barchan dunes |
Jimmy Gabriel Alvarez et.al. |
2401.03032 |
null |
2024-01-04 |
VASE: Object-Centric Appearance and Shape Manipulation of Real Videos |
Elia Peruzzo et.al. |
2401.02473 |
null |
2024-01-04 |
Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection |
Yabin Wang et.al. |
2401.02335 |
link |
2024-01-04 |
Bayesian Intrinsic Groupwise Image Registration: Unsupervised Disentanglement of Anatomy and Geometry |
Xinzhe Luo et.al. |
2401.02141 |
null |
2024-01-04 |
Improving Diffusion-Based Image Synthesis with Context Prediction |
Ling Yang et.al. |
2401.02015 |
null |
2024-01-03 |
Instruct-Imagen: Image Generation with Multi-modal Instruction |
Hexiang Hu et.al. |
2401.01952 |
null |
2024-01-03 |
Can We Generate Realistic Hands Only Using Convolution? |
Mehran Hosseini et.al. |
2401.01951 |
null |
2024-01-03 |
A Vision Check-up for Language Models |
Pratyusha Sharma et.al. |
2401.01862 |
null |
2024-01-03 |
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions |
David Junhao Zhang et.al. |
2401.01827 |
link |
2024-01-03 |
aMUSEd: An Open MUSE Reproduction |
Suraj Patil et.al. |
2401.01808 |
link |
2024-01-03 |
Few-shot Image Generation via Information Transfer from the Built Geodesic Surface |
Yuexing Han et.al. |
2401.01749 |
null |
2024-01-03 |
An Edge-Cloud Collaboration Framework for Generative AI Service Provision with Synergetic Big Cloud Model and Small Edge Models |
Yuqing Tian et.al. |
2401.01666 |
null |
2024-01-03 |
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI |
Fanda Fan et.al. |
2401.01651 |
link |
2024-01-02 |
VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics |
Ammar A. Siddiqui et.al. |
2401.01414 |
null |
2024-01-02 |
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM |
Fuchen Long et.al. |
2401.01256 |
link |
2024-01-02 |
Joint Generative Modeling of Scene Graphs and Images via Diffusion Models |
Bicheng Xu et.al. |
2401.01130 |
null |
2024-01-02 |
SSP: A Simple and Safe automatic Prompt engineering method towards realistic image synthesis on LVM |
Weijin Cheng et.al. |
2401.01128 |
null |
2023-12-31 |
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation |
Wan-Duo Kurt Ma et.al. |
2401.00896 |
null |
2023-12-30 |
Improving the Stability of Diffusion Models for Content Consistent Super-Resolution |
Lingchen Sun et.al. |
2401.00877 |
link |
2024-01-01 |
New Job, New Gender? Measuring the Social Bias in Image Generation Models |
Wenxuan Wang et.al. |
2401.00763 |
link |
2024-01-01 |
DiffMorph: Text-less Image Morphing with Diffusion Models |
Shounak Chatterjee et.al. |
2401.00739 |
null |
2023-12-31 |
Generative Model-Driven Synthetic Training Image Generation: An Approach to Cognition in Rail Defect Detection |
Rahatara Ferdousi et.al. |
2401.00393 |
link |
2023-12-30 |
GAN-GA: A Generative Model based on Genetic Algorithm for Medical Image Generation |
M. AbdulRazek et.al. |
2401.00314 |
null |
2023-12-30 |
CamPro: Camera-based Anti-Facial Recognition |
Wenjun Zhu et.al. |
2401.00151 |
link |
2023-12-27 |
RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement |
Fan Shi et.al. |
2312.17274 |
null |
2023-12-28 |
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action |
Jiasen Lu et.al. |
2312.17172 |
link |
2023-12-27 |
Prompt Expansion for Adaptive Text-to-Image Generation |
Siddhartha Datta et.al. |
2312.16720 |
null |
2023-12-27 |
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models |
Xun Guo et.al. |
2312.16693 |
link |
2023-12-27 |
Participatory prompting: a user-centric research method for eliciting AI assistance opportunities in knowledge workflows |
Advait Sarkar et.al. |
2312.16633 |
null |
2023-12-27 |
A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss |
Xiao Fang et.al. |
2312.16498 |
link |
2023-12-29 |
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion |
Guansong Lu et.al. |
2312.16486 |
null |
2023-12-27 |
Bellman Optimal Step-size Straightening of Flow-Matching Models |
Bao Nguyen et.al. |
2312.16414 |
link |
2023-12-26 |
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation |
Yuxuan Zhang et.al. |
2312.16272 |
link |
2023-12-26 |
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications |
Mengyao Lyu et.al. |
2312.16145 |
null |
2023-12-26 |
Semantic Guidance Tuning for Text-To-Image Diffusion Models |
Hyun Kang et.al. |
2312.15964 |
link |
2023-12-26 |
Cross Initialization for Personalized Text-to-Image Generation |
Lianyu Pang et.al. |
2312.15905 |
link |
2023-12-25 |
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos |
Xiang Wang et.al. |
2312.15770 |
null |
2023-12-25 |
High-Fidelity Diffusion-based Image Editing |
Chen Hou et.al. |
2312.15707 |
null |
2023-12-24 |
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes |
Jianqiang Ren et.al. |
2312.15430 |
null |
2023-12-23 |
Prompt-Propose-Verify: A Reliable Hand-Object-Interaction Data Generation Framework using Foundational Models |
Gurusha Juneja et.al. |
2312.15247 |
null |
2023-12-22 |
Generative AI and the History of Architecture |
Joern Ploennigs et.al. |
2312.15106 |
null |
2023-12-22 |
Emage: Non-Autoregressive Text-to-Image Generation |
Zhangyin Feng et.al. |
2312.14988 |
null |
2023-12-22 |
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation |
Max Ku et.al. |
2312.14867 |
link |
2023-12-22 |
Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks |
Haz Sameen Shahgir et.al. |
2312.14440 |
link |
2023-12-21 |
Fine-grained Forecasting Models Via Gaussian Process Blurring Effect |
Sepideh Koohfar et.al. |
2312.14280 |
link |
2023-12-21 |
VCoder: Versatile Vision Encoders for Multimodal Large Language Models |
Jitesh Jain et.al. |
2312.14233 |
link |
2023-12-21 |
Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation |
Nina Weng et.al. |
2312.14223 |
link |
2023-12-21 |
VideoPoet: A Large Language Model for Zero-Shot Video Generation |
Dan Kondratyuk et.al. |
2312.14125 |
null |
2023-12-21 |
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models |
Huan Ling et.al. |
2312.13763 |
null |
2023-12-21 |
DreamTuner: Single Image is Enough for Subject-Driven Generation |
Miao Hua et.al. |
2312.13691 |
null |
2023-12-21 |
Free-Editor: Zero-shot Text-driven 3D Scene Editing |
Nazmul Karim et.al. |
2312.13663 |
link |
2023-12-21 |
Diff-Oracle: Diffusion Model for Oracle Character Generation with Controllable Styles and Contents |
Jing Li et.al. |
2312.13631 |
null |
2023-12-21 |
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting |
Junwu Zhang et.al. |
2312.13271 |
link |
2023-12-20 |
Conditional Image Generation with Pretrained Generative Model |
Rajesh Shrestha et.al. |
2312.13253 |
null |
2023-12-21 |
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation |
Hongtao Wu et.al. |
2312.13139 |
null |
2023-12-20 |
Quantifying Bias in Text-to-Image Generative Models |
Jordan Vice et.al. |
2312.13053 |
null |
2023-12-20 |
A self-attention-based differentially private tabular GAN with high data utility |
Zijian Li et.al. |
2312.13031 |
null |
2023-12-20 |
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models |
Seunghoo Hong et.al. |
2312.12807 |
null |
2023-12-19 |
Surf-CDM: Score-Based Surface Cold-Diffusion Model For Medical Image Segmentation |
Fahim Ahmed Zaman et.al. |
2312.12649 |
null |
2023-12-19 |
RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing |
Shutong Jin et.al. |
2312.12635 |
null |
2023-12-19 |
On Inference Stability for Diffusion Models |
Viet Nguyen et.al. |
2312.12431 |
link |
2023-12-19 |
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models |
Shweta Mahajan et.al. |
2312.12416 |
null |
2023-12-19 |
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model |
Lingjun Zhang et.al. |
2312.12232 |
link |
2023-12-19 |
Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method |
Jiachun Pan et.al. |
2312.12030 |
link |
2023-12-19 |
Decoupled Textual Embeddings for Customized Image Generation |
Yufei Cai et.al. |
2312.11826 |
link |
2023-12-18 |
Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics |
Yesukhei Jagvaral et.al. |
2312.11707 |
null |
2023-12-18 |
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing |
Zeyinzi Jiang et.al. |
2312.11392 |
link |
2023-12-18 |
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model |
Decheng Liu et.al. |
2312.11285 |
link |
2023-12-18 |
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising |
Bingyuan Wang et.al. |
2312.10899 |
null |
2023-12-18 |
The Right Losses for the Right Gains: Improving the Semantic Consistency of Deep Text-to-Image Generation with Distribution-Sensitive Losses |
Mahmoud Ahmed et.al. |
2312.10854 |
null |
2023-12-17 |
VidToMe: Video Token Merging for Zero-Shot Video Editing |
Xirui Li et.al. |
2312.10656 |
link |
2023-12-17 |
Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability |
Jaehui Hwang et.al. |
2312.10634 |
null |
2023-12-16 |
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection |
Chuangchuang Tan et.al. |
2312.10461 |
link |
2023-12-16 |
DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content |
Wentao Wang et.al. |
2312.10407 |
link |
2023-12-16 |
Fusing Conditional Submodular GAN and Programmatic Weak Supervision |
Kumar Shubham et.al. |
2312.10366 |
link |
2023-12-16 |
Operator-learning-inspired Modeling of Neural Ordinary Differential Equations |
Woojin Cho et.al. |
2312.10274 |
null |
2023-12-15 |
Rich Human Feedback for Text-to-Image Generation |
Youwei Liang et.al. |
2312.10240 |
link |
2023-12-15 |
Data-Efficient Multimodal Fusion on a Single GPU |
Noël Vouitsis et.al. |
2312.10144 |
link |
2023-12-15 |
Data and Approaches for German Text simplification – towards an Accessibility-enhanced Communication |
Thorben Schomacker et.al. |
2312.09966 |
null |
2023-12-14 |
High-Resolution Maps of Left Atrial Displacements and Strains Estimated with 3D CINE MRI and Unsupervised Neural Networks |
Christoforos Galazis et.al. |
2312.09387 |
link |
2023-12-14 |
ArchiGuesser – AI Art Architecture Educational Game |
Joern Ploennigs et.al. |
2312.09334 |
link |
2023-12-14 |
LIME: Localized Image Editing via Attention Regularization in Diffusion Models |
Enis Simsar et.al. |
2312.09256 |
null |
2023-12-14 |
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection |
Hongsuk Choi et.al. |
2312.09252 |
null |
2023-12-14 |
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation |
Jinguo Zhu et.al. |
2312.09251 |
link |
2023-12-14 |
Fast Sampling via De-randomization for Discrete Diffusion Models |
Zixiang Chen et.al. |
2312.09193 |
null |
2023-12-14 |
VideoLCM: Video Latent Consistency Model |
Xiang Wang et.al. |
2312.09109 |
null |
2023-12-13 |
SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance |
Yuanyou Xu et.al. |
2312.08889 |
null |
2023-12-14 |
Agent Attention: On the Integration of Softmax and Linear Attention |
Dongchen Han et.al. |
2312.08874 |
link |
2023-12-14 |
Local Conditional Controlling for Text-to-Image Diffusion Models |
Yibo Zhao et.al. |
2312.08768 |
link |
2023-12-13 |
A Survey of Generative AI for Intelligent Transportation Systems |
Huan Yan et.al. |
2312.08248 |
null |
2023-12-13 |
Black-box Membership Inference Attacks against Fine-tuned Diffusion Models |
Yan Pang et.al. |
2312.08207 |
link |
2023-12-13 |
$ρ$-Diffusion: A diffusion-based density estimation framework for computational physics |
Maxwell X. Cai et.al. |
2312.08153 |
link |
2023-12-13 |
Clockwork Diffusion: Efficient Generation With Model-Step Distillation |
Amirhossein Habibian et.al. |
2312.08128 |
link |
2023-12-13 |
3DGEN: A GAN-based approach for generating novel 3D models from image data |
Antoine Schnepf et.al. |
2312.08094 |
null |
2023-12-13 |
Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision |
Shengguang Wu et.al. |
2312.08056 |
null |
2023-12-13 |
AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing |
Zhiyuan Ma et.al. |
2312.08019 |
link |
2023-12-13 |
Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users |
Tianxun Zhou et.al. |
2312.07854 |
null |
2023-12-13 |
Stable Rivers: A Case Study in the Application of Text-to-Image Generative Models for Earth Sciences |
C Kupferschmidt et.al. |
2312.07833 |
null |
2023-12-12 |
FreeInit: Bridging Initialization Gap in Video Diffusion Models |
Tianxing Wu et.al. |
2312.07537 |
link |
2023-12-12 |
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition |
Sicheng Mo et.al. |
2312.07536 |
null |
2023-12-12 |
PEEKABOO: Interactive Video Generation via Masked-Diffusion |
Yash Jain et.al. |
2312.07509 |
link |
2023-12-12 |
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation |
Zhongyi Han et.al. |
2312.07424 |
link |
2023-12-12 |
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing |
Kaiwen Zhang et.al. |
2312.07409 |
null |
2023-12-12 |
Learned representation-guided diffusion models for large-image generation |
Alexandros Graikos et.al. |
2312.07330 |
link |
2023-12-12 |
Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description |
Mianzhi Pan et.al. |
2312.07294 |
null |
2023-12-12 |
Image Content Generation with Causal Reasoning |
Xiaochuan Li et.al. |
2312.07132 |
link |
2023-12-12 |
Divide-and-Conquer Attack: Harnessing the Power of LLM to Bypass the Censorship of Text-to-Image Generation Model |
Yimo Deng et.al. |
2312.07130 |
link |
2023-12-11 |
User Friendly and Adaptable Discriminative AI: Using the Lessons from the Success of LLMs and Image Generation Models |
Son The Nguyen et.al. |
2312.06826 |
null |
2023-12-11 |
Photorealistic Video Generation with Diffusion Models |
Agrim Gupta et.al. |
2312.06662 |
null |
2023-12-11 |
ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models |
Denis Zavadski et.al. |
2312.06573 |
link |
2023-12-11 |
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization |
Xu Peng et.al. |
2312.06354 |
null |
2023-12-11 |
Compensation Sampling for Improved Convergence in Diffusion Models |
Hui Lu et.al. |
2312.06285 |
link |
2023-12-11 |
UIEDP: Underwater Image Enhancement with Diffusion Prior |
Dazhao Du et.al. |
2312.06240 |
link |
2023-12-11 |
Invariant Representation Learning via Decoupling Style and Spurious Features |
Ruimeng Li et.al. |
2312.06226 |
null |
2023-12-11 |
Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods |
Panos Achlioptas et.al. |
2312.06116 |
null |
2023-12-10 |
Correcting Diffusion Generation through Resampling |
Yujian Liu et.al. |
2312.06038 |
link |
2023-12-10 |
Disentangled Representation Learning for Controllable Person Image Generation |
Wenju Xu et.al. |
2312.05798 |
null |
2023-12-10 |
AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model |
Teng Hu et.al. |
2312.05767 |
link |
2023-12-08 |
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation |
Thuan Hoang Nguyen et.al. |
2312.05239 |
link |
2023-12-08 |
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models |
Mengyang Feng et.al. |
2312.05107 |
null |
2023-12-08 |
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control |
Jaskirat Singh et.al. |
2312.05039 |
null |
2023-12-08 |
Synthesizing Traffic Datasets using Graph Neural Networks |
Daniel Rodriguez-Criado et.al. |
2312.05031 |
link |
2023-12-08 |
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models |
Yiming Zhao et.al. |
2312.04884 |
link |
2023-12-08 |
MVDD: Multi-View Depth Diffusion Models |
Zhen Wang et.al. |
2312.04875 |
null |
2023-12-08 |
RS-Corrector: Correcting the Racial Stereotypes in Latent Diffusion Models |
Yue Jiang et.al. |
2312.04810 |
null |
2023-12-07 |
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations |
Maitreya Patel et.al. |
2312.04655 |
null |
2023-12-07 |
Autoencoding Labeled Interpolator, Inferring Parameters From Image, And Image From Parameters |
Ali SaraerToosi et.al. |
2312.04640 |
null |
2023-12-07 |
Scaling Laws of Synthetic Images for Model Training … for Now |
Lijie Fan et.al. |
2312.04567 |
link |
2023-12-07 |
Gen2Det: Generate to Detect |
Saksham Suri et.al. |
2312.04566 |
null |
2023-12-07 |
GenDeF: Learning Generative Deformation Field for Video Generation |
Wen Wang et.al. |
2312.04561 |
null |
2023-12-07 |
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation |
Shoufa Chen et.al. |
2312.04557 |
null |
2023-12-07 |
Generating Illustrated Instructions |
Sachit Menon et.al. |
2312.04552 |
link |
2023-12-07 |
Free3D: Consistent Novel View Synthesis without 3D Representation |
Chuanxia Zheng et.al. |
2312.04551 |
link |
2023-12-07 |
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation |
Zhiwu Qing et.al. |
2312.04483 |
link |
2023-12-07 |
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding |
Zhen Li et.al. |
2312.04461 |
link |
2023-12-07 |
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion |
Yujie Wei et.al. |
2312.04433 |
link |
2023-12-07 |
Approximate Caching for Efficiently Serving Diffusion Models |
Shubham Agarwal et.al. |
2312.04429 |
null |
2023-12-06 |
Self-conditioned Image Generation via Generating Representations |
Tianhong Li et.al. |
2312.03701 |
link |
2023-12-06 |
Memory Triggers: Unveiling Memorization in Text-To-Image Generative Models through Word-Level Duplication |
Ali Naseh et.al. |
2312.03692 |
null |
2023-12-06 |
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation |
Zhouxia Wang et.al. |
2312.03641 |
link |
2023-12-06 |
TokenCompose: Grounding Diffusion with Token-level Supervision |
Zirui Wang et.al. |
2312.03626 |
link |
2023-12-06 |
DiffusionSat: A Generative Foundation Model for Satellite Imagery |
Samar Khanna et.al. |
2312.03606 |
null |
2023-12-06 |
Context Diffusion: In-Context Aware Image Generation |
Ivona Najdenkoska et.al. |
2312.03584 |
null |
2023-12-06 |
FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation |
Olivia Markham et.al. |
2312.03540 |
null |
2023-12-06 |
FRDiff: Feature Reuse for Exquisite Zero-shot Acceleration of Diffusion Models |
Junhyuk So et.al. |
2312.03517 |
null |
2023-12-06 |
Kandinsky 3.0 Technical Report |
Vladimir Arkhipkin et.al. |
2312.03511 |
link |
2023-12-06 |
Data-driven Crop Growth Simulation on Time-varying Generated Images using Multi-conditional Generative Adversarial Networks |
Lukas Drees et.al. |
2312.03443 |
link |
2023-12-05 |
GPT4Point: A Unified Framework for Point-Language Understanding and Generation |
Zhangyang Qi et.al. |
2312.02980 |
null |
2023-12-05 |
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures |
Zhangyang Xiong et.al. |
2312.02963 |
null |
2023-12-05 |
WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation |
Jiachen Lu et.al. |
2312.02934 |
link |
2023-12-05 |
LivePhoto: Real Image Animation with Text-guided Motion Control |
Xi Chen et.al. |
2312.02928 |
null |
2023-12-05 |
Fine-grained Controllable Video Generation via Object Appearance and Context |
Hsin-Ping Huang et.al. |
2312.02919 |
null |
2023-12-05 |
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models |
Fengyuan Shi et.al. |
2312.02813 |
link |
2023-12-05 |
Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler |
Philippe Gonzalez et.al. |
2312.02683 |
null |
2023-12-05 |
FaceStudio: Put Your Face Everywhere in Seconds |
Yuxuan Yan et.al. |
2312.02663 |
null |
2023-12-05 |
GeNIe: Generative Hard Negative Images Through Diffusion |
Soroush Abbasi Koohpayegani et.al. |
2312.02548 |
link |
2023-12-05 |
Retrieving Conditions from Reference Images for Diffusion Models |
Haoran Tang et.al. |
2312.02521 |
null |
2023-12-04 |
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation |
Bingxin Ke et.al. |
2312.02145 |
link |
2023-12-04 |
DiffiT: Diffusion Vision Transformers for Image Generation |
Ali Hatamizadeh et.al. |
2312.02139 |
link |
2023-12-04 |
Style Aligned Image Generation via Shared Attention |
Amir Hertz et.al. |
2312.02133 |
link |
2023-12-04 |
GIVT: Generative Infinite-Vocabulary Transformers |
Michael Tschannen et.al. |
2312.02116 |
link |
2023-12-04 |
UniGS: Unified Representation for Image Generation and Segmentation |
Lu Qi et.al. |
2312.01985 |
link |
2023-12-04 |
InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models |
Xunguang Wang et.al. |
2312.01886 |
link |
2023-12-04 |
Fully Spiking Denoising Diffusion Implicit Models |
Ryo Watanabe et.al. |
2312.01742 |
link |
2023-12-04 |
ResEnsemble-DDPM: Residual Denoising Diffusion Probabilistic Models for Ensemble Learning |
Shi Zhenning et.al. |
2312.01682 |
null |
2023-12-03 |
Diffusion Posterior Sampling for Nonlinear CT Reconstruction |
Shudong Li et.al. |
2312.01464 |
null |
2023-12-03 |
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models |
Shengqu Cai et.al. |
2312.01409 |
null |
2023-12-01 |
VideoBooth: Diffusion-based Video Generation with Image Prompts |
Yuming Jiang et.al. |
2312.00777 |
null |
2023-12-01 |
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter |
Gongye Liu et.al. |
2312.00330 |
link |
2023-11-30 |
S2ST: Image-to-Image Translation in the Seed Space of Latent Diffusion |
Or Greenberg et.al. |
2312.00116 |
null |
2023-11-30 |
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models |
Zhen Xing et.al. |
2311.18837 |
null |
2023-11-30 |
ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models |
Wenming Weng et.al. |
2311.18834 |
null |
2023-11-30 |
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation |
Yanhui Wang et.al. |
2311.18829 |
null |
2023-11-30 |
One-step Diffusion with Distribution Matching Distillation |
Tianwei Yin et.al. |
2311.18828 |
null |
2023-11-30 |
ElasticDiffusion: Training-free Arbitrary Size Image Generation |
Moayed Haji-Ali et.al. |
2311.18822 |
link |
2023-11-30 |
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation |
Zineng Tang et.al. |
2311.18775 |
null |
2023-11-30 |
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model |
Jianhao Zeng et.al. |
2311.18405 |
link |
2023-11-30 |
Situating the social issues of image generation models in the model life cycle: a sociotechnical approach |
Amelia Katirai et.al. |
2311.18345 |
null |
2023-11-30 |
Diffusion Models Without Attention |
Jing Nathan Yan et.al. |
2311.18257 |
null |
2023-11-30 |
Few-shot Image Generation via Style Adaptation and Content Preservation |
Xiaosheng He et.al. |
2311.18169 |
null |
2023-11-29 |
SODA: Bottleneck Diffusion Models for Representation Learning |
Drew A. Hudson et.al. |
2311.17901 |
null |
2023-11-29 |
Analyzing and Explaining Image Classifiers via Diffusion Guidance |
Maximilian Augustin et.al. |
2311.17833 |
link |
2023-11-29 |
BAND-2k: Banding Artifact Noticeable Database for Banding Detection and Quality Assessment |
Zijian Chen et.al. |
2311.17752 |
link |
2023-11-29 |
Fair Text-to-Image Diffusion via Fair Mapping |
Jia Li et.al. |
2311.17695 |
null |
2023-11-29 |
Query-Relevant Images Jailbreak Large Multi-Modal Models |
Xin Liu et.al. |
2311.17600 |
link |
2023-11-29 |
Non-Visible Light Data Synthesis and Application: A Case Study for Synthetic Aperture Radar Imagery |
Zichen Tian et.al. |
2311.17486 |
null |
2023-11-29 |
When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation |
Xiaoming Li et.al. |
2311.17461 |
link |
2023-11-29 |
VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model |
Haoyu Zhao et.al. |
2311.17338 |
link |
2023-11-28 |
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation |
Hang Li et.al. |
2311.17216 |
null |
2023-11-28 |
Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features |
Niladri Shekhar Dutt et.al. |
2311.17024 |
link |
2023-11-28 |
COLE: A Hierarchical Generation Framework for Graphic Design |
Peidong Jia et.al. |
2311.16974 |
null |
2023-11-28 |
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models |
Yuwei Guo et.al. |
2311.16933 |
null |
2023-11-28 |
Denoising Diffusion Probabilistic Models for Image Inpainting of Cell Distributions in the Human Brain |
Jan-Oliver Kropp et.al. |
2311.16821 |
null |
2023-11-28 |
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving |
Yuqing Wen et.al. |
2311.16813 |
null |
2023-11-28 |
Multi-Channel Cross Modal Detection of Synthetic Face Images |
M. Ibsen et.al. |
2311.16773 |
link |
2023-11-28 |
MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation |
Sitong Su et.al. |
2311.16635 |
null |
2023-11-28 |
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices |
Yang Zhao et.al. |
2311.16567 |
null |
2023-11-28 |
Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks |
Ye Lin Tun et.al. |
2311.16538 |
link |
2023-11-28 |
Text-Driven Image Editing via Learnable Regions |
Yuanze Lin et.al. |
2311.16432 |
link |
2023-11-27 |
Self-correcting LLM-controlled Diffusion Models |
Tsung-Han Wu et.al. |
2311.16090 |
link |
2023-11-27 |
ViT-Lens-2: Gateway to Omni-modal Intelligence |
Weixian Lei et.al. |
2311.16081 |
link |
2023-11-27 |
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion |
Yuanxun Lu et.al. |
2311.15980 |
null |
2023-11-27 |
Tell2Design: A Dataset for Language-Guided Floor Plan Generation |
Sicong Leng et.al. |
2311.15941 |
link |
2023-11-27 |
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation |
Siteng Huang et.al. |
2311.15841 |
null |
2023-11-27 |
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax |
Yu Lu et.al. |
2311.15813 |
null |
2023-11-27 |
C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing |
Avigyan Bhattacharya et.al. |
2311.15812 |
null |
2023-11-27 |
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation |
Biao Gong et.al. |
2311.15773 |
null |
2023-11-27 |
Reinforcement Learning from Diffusion Feedback: Q* for Image Search |
Aboli Marathe et.al. |
2311.15648 |
null |
2023-11-27 |
ET3D: Efficient Text-to-3D Generation via Multi-View Distillation |
Yiming Chen et.al. |
2311.15561 |
null |
2023-11-24 |
CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization |
Ruoyu Zhao et.al. |
2311.14631 |
null |
2023-11-24 |
MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation |
Zhiqi Li et.al. |
2311.14494 |
link |
2023-11-24 |
Decouple Content and Motion for Conditional Image-to-Video Generation |
Cuifeng Shen et.al. |
2311.14294 |
null |
2023-11-24 |
Paragraph-to-Image Generation with Information-Enriched Diffusion Model |
Weijia Wu et.al. |
2311.14284 |
link |
2023-11-24 |
Image Super-Resolution with Text Prompt Diffusion |
Zheng Chen et.al. |
2311.14282 |
link |
2023-11-23 |
ACT: Adversarial Consistency Models |
Fei Kong et.al. |
2311.14097 |
link |
2023-11-22 |
The Challenges of Image Generation Models in Generating Multi-Component Images |
Tham Yik Foong et.al. |
2311.13620 |
null |
2023-11-22 |
Guided Flows for Generative Modeling and Decision Making |
Qinqing Zheng et.al. |
2311.13443 |
null |
2023-11-23 |
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes |
Jaeyoung Chung et.al. |
2311.13384 |
null |
2023-11-22 |
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models |
Mengyang Feng et.al. |
2311.13141 |
link |
2023-11-22 |
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline |
Vladimir Arkhipkin et.al. |
2311.13073 |
link |
2023-11-21 |
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning |
Jiaxi Lv et.al. |
2311.12631 |
null |
2023-11-20 |
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation |
Shachar Rosenman et.al. |
2311.12229 |
link |
2023-11-20 |
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models |
Rohit Gandikota et.al. |
2311.12092 |
link |
2023-11-20 |
Advancing Urban Renewal: An Automated Approach to Generating Historical Arcade Facades with Stable Diffusion Models |
Zheyuan Kuang et.al. |
2311.11590 |
null |
2023-11-19 |
Data efficient protein backmapping with backbone-to-side chain transformers |
Shriram Chennakesavalu et.al. |
2311.11459 |
link |
2023-11-19 |
DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model |
Zhenghao Pan et.al. |
2311.11417 |
link |
2023-11-19 |
A Survey of Emerging Applications of Diffusion Probabilistic Models in MRI |
Yuheng Fan et.al. |
2311.11383 |
null |
2023-11-19 |
MoVideo: Motion-Aware Video Generation with Diffusion Models |
Jingyun Liang et.al. |
2311.11325 |
null |
2023-11-19 |
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort |
Wen Wang et.al. |
2311.11243 |
link |
2023-11-19 |
GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise |
Xinhai Li et.al. |
2311.11221 |
null |
2023-11-18 |
Mitigating Exposure Bias in Discriminator Guided Diffusion Models |
Eleftherios Tsonis et.al. |
2311.11164 |
null |
2023-11-18 |
User-Centric Interactive AI for Distributed Diffusion Model-based AI-Generated Content |
Hongyang Du et.al. |
2311.11094 |
null |
2023-11-18 |
Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models |
Xuefeng Gao et.al. |
2311.11003 |
null |
2023-11-17 |
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning |
Rohit Girdhar et.al. |
2311.10709 |
null |
2023-11-17 |
SelfEval: Leveraging the discriminative nature of generative models for evaluation |
Sai Saketh Rambhatla et.al. |
2311.10708 |
null |
2023-11-17 |
Enhancing Object Coherence in Layout-to-Image Synthesis |
Yibin Wang et.al. |
2311.10522 |
link |
2023-11-17 |
End-to-end autoencoding architecture for the simultaneous generation of medical images and corresponding segmentation masks |
Aghiles Kebaili et.al. |
2311.10472 |
null |
2023-11-17 |
High-fidelity Person-centric Subject-to-Image Synthesis |
Yibin Wang et.al. |
2311.10329 |
link |
2023-11-16 |
K-space Cold Diffusion: Learning to Reconstruct Accelerated MRI without Noise |
Guoyao Shen et.al. |
2311.10162 |
link |
2023-11-16 |
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models |
Omri Avrahami et.al. |
2311.10093 |
null |
2023-11-16 |
MAM-E: Mammographic synthetic image generation with diffusion models |
Ricardo Montoya-del-Angel et.al. |
2311.09822 |
link |
2023-11-16 |
DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics |
Aniket Roy et.al. |
2311.09753 |
null |
2023-11-14 |
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs |
Yanwu Xu et.al. |
2311.09257 |
link |
2023-11-14 |
Finding AI-Generated Faces in the Wild |
Gonzalo J. Aniano Porcile et.al. |
2311.08577 |
null |
2023-11-14 |
Peer is Your Pillar: A Data-unbalanced Conditional GANs for Few-shot Image Generation |
Ziqiang Li et.al. |
2311.08217 |
null |
2023-11-14 |
Diffusion-based generation of Histopathological Whole Slide Images at a Gigapixel scale |
Robert Harb et.al. |
2311.08199 |
null |
2023-11-14 |
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion |
Minghua Liu et.al. |
2311.07885 |
null |
2023-11-13 |
The Impact of Generative Artificial Intelligence |
Kaichen Zhang et.al. |
2311.07071 |
null |
2023-11-12 |
IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models |
Zhaoyuan Yang et.al. |
2311.06792 |
link |
2023-11-12 |
ChatAnything: Facetime Chat with LLM-Enhanced Personas |
Yilin Zhao et.al. |
2311.06772 |
null |
2023-11-12 |
BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis |
Tingfeng Cao et.al. |
2311.06752 |
null |
2023-11-12 |
How do Minimum-Norm Shallow Denoisers Look in Function Space? |
Chen Zeno et.al. |
2311.06748 |
null |
2023-11-11 |
Generative AI for Space-Air-Ground Integrated Networks (SAGIN) |
Ruichen Zhang et.al. |
2311.06523 |
null |
2023-11-10 |
A Survey of AI Text-to-Image and AI Text-to-Video Generators |
Aditi Singh et.al. |
2311.06329 |
null |
2023-11-09 |
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module |
Simian Luo et.al. |
2311.05556 |
link |
2023-11-09 |
L-WaveBlock: A Novel Feature Extractor Leveraging Wavelets for Generative Adversarial Networks |
Mirat Shah et.al. |
2311.05548 |
null |
2023-11-09 |
ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors |
Jingwen Chen et.al. |
2311.05463 |
null |
2023-11-09 |
ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image |
Senthil Purushwalkam et.al. |
2311.05230 |
null |
2023-11-08 |
Image-Based Virtual Try-On: A Survey |
Dan Song et.al. |
2311.04811 |
link |
2023-11-07 |
Energy-based Calibrated VAE with Test Time Free Lunch |
Yihong Luo et.al. |
2311.04071 |
link |
2023-11-07 |
MeVGAN: GAN-based Plugin Model for Video Generation with Applications in Colonoscopy |
Łukasz Struski et.al. |
2311.03884 |
null |
2023-11-07 |
SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation |
Iman Abbasnejad et.al. |
2311.03866 |
null |
2023-11-07 |
Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models |
Shengzhe Zhou et.al. |
2311.03830 |
link |
2023-11-07 |
CapST: An Enhanced and Lightweight Method for Deepfake Video Classification |
Wasim Ahmad et.al. |
2311.03782 |
link |
2023-11-07 |
LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators |
Allen Roush et.al. |
2311.03716 |
null |
2023-11-07 |
Image Generation and Learning Strategy for Deep Document Forgery Detection |
Yamato Okamoto et.al. |
2311.03650 |
null |
2023-11-06 |
SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis |
Hanrong Ye et.al. |
2311.03355 |
null |
2023-11-06 |
Cross-Image Attention for Zero-Shot Appearance Transfer |
Yuval Alaluf et.al. |
2311.03335 |
null |
2023-11-04 |
From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models |
Zhuoshi Pan et.al. |
2311.02373 |
link |
2023-11-04 |
Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting |
Hao Ai et.al. |
2311.02343 |
link |
2023-11-03 |
PRISM: Progressive Restoration for Scene Graph-based Image Manipulation |
Pavel Jahoda et.al. |
2311.02247 |
null |
2023-11-06 |
RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches |
Jiayuan Gu et.al. |
2311.01977 |
null |
2023-11-03 |
FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation |
Yuanxin Liu et.al. |
2311.01813 |
link |
2023-11-02 |
Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation |
Hadrien Reynaud et.al. |
2311.01567 |
null |
2023-11-02 |
VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning |
Hong Chen et.al. |
2311.00990 |
null |
2023-11-02 |
Optimal Noise pursuit for Augmenting Text-to-Video Generation |
Shijie Ma et.al. |
2311.00949 |
null |
2023-11-02 |
The Age of Generative AI and AI-Generated Everything |
Hongyang Du et.al. |
2311.00947 |
null |
2023-11-02 |
Gaussian Mixture Solvers for Diffusion Models |
Hanzhong Guo et.al. |
2311.00941 |
link |
2023-11-02 |
Towards High-quality HDR Deghosting with Conditional Diffusion Models |
Qingsen Yan et.al. |
2311.00932 |
null |
2023-11-01 |
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing |
Wei-Ge Chen et.al. |
2311.00571 |
null |
2023-11-01 |
fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding |
Xuelin Qian et.al. |
2311.00342 |
null |
2023-11-01 |
Flooding Regularization for Stable Training of Generative Adversarial Networks |
Iu Yahiro et.al. |
2311.00318 |
null |
2023-10-31 |
Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion |
David Marwood et.al. |
2311.00056 |
null |
2023-10-31 |
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction |
Xinyuan Chen et.al. |
2310.20700 |
null |
2023-10-31 |
HWD: A Novel Evaluation Score for Styled Handwritten Text Generation |
Vittorio Pippi et.al. |
2310.20316 |
link |
2023-10-31 |
Machine learning refinement of in situ images acquired by low electron dose LC-TEM |
Hiroyasu Katsuno et.al. |
2310.20279 |
null |
2023-10-31 |
Beyond U: Making Diffusion Models Faster & Lighter |
Sergio Calvo-Ordonez et.al. |
2310.20092 |
null |
2023-10-30 |
‘Person’ == Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion |
Sourojit Ghosh et.al. |
2310.19981 |
null |
2023-10-30 |
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models |
Ziyang Yuan et.al. |
2310.19784 |
null |
2023-10-30 |
Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities |
Zhengliang Liu et.al. |
2310.19626 |
null |
2023-10-30 |
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation |
Haoxin Chen et.al. |
2310.19512 |
link |
2023-10-30 |
Few-shot Hybrid Domain Adaptation of Image Generators |
Hengjia Li et.al. |
2310.19378 |
link |
2023-10-30 |
On Measuring Fairness in Generative Models |
Christopher T. H. Teo et.al. |
2310.19297 |
null |
2023-10-29 |
FPGAN-Control: A Controllable Fingerprint Generator for Training with Synthetic Data |
Alon Shoshan et.al. |
2310.19024 |
link |
2023-10-30 |
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation |
Jaemin Cho et.al. |
2310.18235 |
null |
2023-10-27 |
Hyper-Skin: A Hyperspectral Dataset for Reconstructing Facial Skin-Spectra from RGB Images |
Pai Chet Ng et.al. |
2310.17911 |
link |
2023-10-27 |
One Style is All you Need to Generate a Video |
Sandeep Manandhar et.al. |
2310.17835 |
link |
2023-10-26 |
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation |
Yongxin Zhu et.al. |
2310.17570 |
null |
2023-10-26 |
AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors |
You-Ming Chang et.al. |
2310.17419 |
link |
2023-10-26 |
Exploring the Potential of Generative AI for the World Wide Web |
Nouar AlDahoul et.al. |
2310.17370 |
null |
2023-10-26 |
Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics |
Shuai Yang et.al. |
2310.17316 |
link |
2023-10-26 |
Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise |
Zhenkai Zhang et.al. |
2310.17167 |
null |
2023-10-25 |
Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs |
Jianwei Fei et.al. |
2310.16919 |
null |
2023-10-25 |
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images |
Aaron Gokaslan et.al. |
2310.16825 |
link |
2023-10-25 |
Interferometric Neural Networks |
Arun Sehrawat et.al. |
2310.16742 |
link |
2023-10-25 |
Local Statistics for Generative Image Detection |
Yung Jer Wong et.al. |
2310.16684 |
null |
2023-10-25 |
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation |
Eyal Segalis et.al. |
2310.16656 |
null |
2023-10-25 |
Adapt Anything: Tailor Any Image Classifiers across Domains And Categories Using Text-to-Image Diffusion Models |
Weijie Chen et.al. |
2310.16573 |
null |
2023-10-25 |
Learning Robust Deep Visual Representations from EEG Brain Recordings |
Prajwal Singh et.al. |
2310.16532 |
link |
2023-10-24 |
Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights |
Alokendu Mazumder et.al. |
2310.16194 |
link |
2023-10-24 |
Complex Image Generation SwinTransformer Network for Audio Denoising |
Youshan Zhang et.al. |
2310.16109 |
link |
2023-10-24 |
RePoseDM: Recurrent Pose Alignment and Gradient Guidance for Pose Guided Image Synthesis |
Anant Khandelwal et.al. |
2310.16074 |
null |
2023-10-24 |
CVPR 2023 Text Guided Video Editing Competition |
Jay Zhangjie Wu et.al. |
2310.16003 |
link |
2023-10-23 |
Fast Forward Modelling of Galaxy Spatial and Statistical Distributions |
Pascale Berner et.al. |
2310.15223 |
null |
2023-10-23 |
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling |
Haonan Qiu et.al. |
2310.15169 |
link |
2023-10-23 |
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design |
Kevin Lin et.al. |
2310.15144 |
link |
2023-10-23 |
Matryoshka Diffusion Models |
Jiatao Gu et.al. |
2310.15111 |
link |
2023-10-23 |
ESVAE: An Efficient Spiking Variational Autoencoder with Reparameterizable Poisson Spiking Sampling |
Qiugang Zhan et.al. |
2310.14839 |
link |
2023-10-23 |
Large Language Models can Share Images, Too! |
Young-Jun Lee et.al. |
2310.14804 |
link |
2023-10-22 |
A Pytorch Reproduction of Masked Generative Image Transformer |
Victor Besnier et.al. |
2310.14400 |
link |
2023-10-21 |
Adversarial Image Generation by Spatial Transformation in Perceptual Colorspaces |
Ayberk Aydin et.al. |
2310.13950 |
link |
2023-10-20 |
Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models |
Shawn Shan et.al. |
2310.13828 |
null |
2023-10-20 |
Localizing and Editing Knowledge in Text-to-Image Generative Models |
Samyadeep Basu et.al. |
2310.13730 |
null |
2023-10-20 |
Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation |
Wenyu Guo et.al. |
2310.13361 |
link |
2023-10-20 |
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics |
Kaiwen Zheng et.al. |
2310.13268 |
link |
2023-10-19 |
Conditional Generative Modeling for Images, 3D Animations, and Video |
Vikram Voleti et.al. |
2310.13157 |
null |
2023-10-19 |
Particle Guidance: non-I.I.D. Diverse Sampling with Diffusion Models |
Gabriele Corso et.al. |
2310.13102 |
link |
2023-10-19 |
Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning |
Amey Pasarkar et.al. |
2310.12952 |
link |
2023-10-19 |
STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models |
Belhal Karimi et.al. |
2310.12667 |
null |
2023-10-19 |
PrivacyGAN: robust generative image privacy |
Mariia Zameshina et.al. |
2310.12590 |
null |
2023-10-19 |
Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation |
Mariia Zameshina et.al. |
2310.12583 |
null |
2023-10-19 |
Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping |
Zijie Pan et.al. |
2310.12474 |
link |
2023-10-18 |
An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning |
Chen Jin et.al. |
2310.12274 |
link |
2023-10-18 |
Quality Diversity through Human Feedback |
Li Ding et.al. |
2310.12103 |
link |
2023-10-20 |
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach |
Feng Luo et.al. |
2310.12004 |
link |
2023-10-17 |
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment |
Dhruba Ghosh et.al. |
2310.11513 |
link |
2023-10-18 |
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models |
Yaofang Liu et.al. |
2310.11440 |
link |
2023-10-17 |
Elucidating The Design Space of Classifier-Guided Diffusion Generation |
Jiajun Ma et.al. |
2310.11311 |
link |
2023-10-17 |
BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference |
Siqi Kou et.al. |
2310.11142 |
link |
2023-10-16 |
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation |
Ruiqi Wu et.al. |
2310.10769 |
link |
2023-10-18 |
BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys |
Yu Gu et.al. |
2310.10765 |
null |
2023-10-16 |
A Survey on Video Diffusion Models |
Zhen Xing et.al. |
2310.10647 |
link |
2023-10-16 |
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts |
Hanan Gani et.al. |
2310.10640 |
link |
2023-10-16 |
ViPE: Visualise Pretty-much Everything |
Hassan Shahmohammadi et.al. |
2310.10543 |
link |
2023-10-16 |
ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion |
Jiayu Yang et.al. |
2310.10343 |
link |
2023-10-16 |
Scene Graph Conditioning in Latent Diffusion |
Frank Fundel et.al. |
2310.10338 |
link |
2023-10-16 |
Evading Detection Actively: Toward Anti-Forensics against Forgery Localization |
Long Zhuo et.al. |
2310.10036 |
null |
2023-10-15 |
Segment Anything Model for Pedestrian Infrastructure Inventory: Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data |
Jiahao Xia et.al. |
2310.09918 |
null |
2023-10-14 |
Unified High-binding Watermark for Unconditional Image Generation Models |
Ruinan Ma et.al. |
2310.09479 |
null |
2023-10-13 |
Making Multimodal Generation Easier: When Diffusion Models Meet LLMs |
Xiangyu Zhao et.al. |
2310.08949 |
link |
2023-10-13 |
R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation |
Jiayu Xiao et.al. |
2310.08872 |
null |
2023-10-12 |
SSG2: A new modelling paradigm for semantic segmentation |
Foivos I. Diakogiannis et.al. |
2310.08671 |
link |
2023-10-12 |
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion |
Xian Liu et.al. |
2310.08579 |
null |
2023-10-12 |
MotionDirector: Motion Customization of Text-to-Video Diffusion Models |
Rui Zhao et.al. |
2310.08465 |
link |
2023-10-12 |
Neural Diffusion Models |
Grigory Bartosh et.al. |
2310.08337 |
null |
2023-10-12 |
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting |
Zijie Chen et.al. |
2310.08129 |
link |
2023-10-12 |
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing |
Zijie Wu et.al. |
2310.08094 |
null |
2023-10-12 |
CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity |
Abdullah Hayajneh et.al. |
2310.07969 |
link |
2023-10-13 |
Generative Modeling with Phase Stochastic Bridges |
Tianrong Chen et.al. |
2310.07805 |
link |
2023-10-11 |
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model |
Xiaofan Li et.al. |
2310.07771 |
link |
2023-10-11 |
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models |
Yingqing He et.al. |
2310.07702 |
link |
2023-10-11 |
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation |
Bo Peng et.al. |
2310.07697 |
link |
2023-10-11 |
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models |
Lai Zeqiang et.al. |
2310.07653 |
link |
2023-10-11 |
Distance-based Weighted Transformer Network for Image Completion |
Pourya Shamsolmoali et.al. |
2310.07440 |
null |
2023-10-11 |
Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else |
Hazarapet Tunanyan et.al. |
2310.07419 |
null |
2023-10-11 |
Crowd Counting in Harsh Weather using Image Denoising with Pix2Pix GANs |
Muhammad Asif Khan et.al. |
2310.07245 |
null |
2023-10-11 |
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model |
Shiyuan Yang et.al. |
2310.07222 |
link |
2023-10-11 |
Echocardiography video synthesis from end diastolic semantic map via diffusion model |
Phi Nguyen Van et.al. |
2310.07131 |
null |
2023-10-10 |
Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images |
Che Liu et.al. |
2310.07027 |
link |
2023-10-10 |
ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning |
Alec Helbling et.al. |
2310.06968 |
null |
2023-10-10 |
Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling |
Huangjie Zheng et.al. |
2310.06389 |
link |
2023-10-10 |
JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling |
Jingyang Zhang et.al. |
2310.06347 |
null |
2023-10-10 |
Improving Compositional Text-to-image Generation with Large Vision-Language Models |
Song Wen et.al. |
2310.06311 |
null |
2023-10-09 |
Latent Diffusion Model for DNA Sequence Generation |
Zehui Li et.al. |
2310.06150 |
link |
2023-10-09 |
A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models |
Sebastian G. Gruber et.al. |
2310.05833 |
link |
2023-10-09 |
Language Model Beats Diffusion – Tokenizer is Key to Visual Generation |
Lijun Yu et.al. |
2310.05737 |
link |
2023-10-09 |
Locality-Aware Generalizable Implicit Neural Representation |
Doyup Lee et.al. |
2310.05624 |
null |
2023-10-09 |
Adaptive Multi-head Contrastive Learning |
Lei Wang et.al. |
2310.05615 |
link |
2023-10-09 |
A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers |
Matteo Bastico et.al. |
2310.05572 |
link |
2023-10-09 |
Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers |
Shiyue Cao et.al. |
2310.05400 |
null |
2023-10-08 |
The Emergence of Reproducibility and Consistency in Diffusion Models |
Huijie Zhang et.al. |
2310.05264 |
null |
2023-10-07 |
Generative AI May Prefer to Present National-level Characteristics of Cities Based on Stereotypical Geographic Impressions at the Continental Level |
Shan Ye et.al. |
2310.04897 |
null |
2023-10-07 |
Understanding and Improving Adversarial Attacks on Latent Diffusion Model |
Boyang Zheng et.al. |
2310.04687 |
link |
2023-10-07 |
X-Transfer: A Transfer Learning-Based Framework for Robust GAN-Generated Fake Image Detection |
Lei Zhang et.al. |
2310.04639 |
null |
2023-10-06 |
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference |
Simian Luo et.al. |
2310.04378 |
link |
2023-10-06 |
Assessing Robustness via Score-Based Adversarial Image Generation |
Marcel Kollovieh et.al. |
2310.04285 |
null |
2023-10-05 |
Aligning Text-to-Image Diffusion Models with Reward Backpropagation |
Mihir Prabhudesai et.al. |
2310.03739 |
link |
2023-10-05 |
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency |
Tianhong Li et.al. |
2310.03734 |
null |
2023-10-06 |
MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images |
Yanwu Xu et.al. |
2310.03559 |
link |
2023-10-05 |
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion |
Anton Razzhigaev et.al. |
2310.03502 |
link |
2023-10-04 |
Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel |
Paul Hagemann et.al. |
2310.03054 |
link |
2023-10-04 |
Kosmos-G: Generating Images in Context with Multimodal Large Language Models |
Xichen Pan et.al. |
2310.02992 |
link |
2023-10-04 |
GETAvatar: Generative Textured Meshes for Animatable Human Avatars |
Xuanmeng Zhang et.al. |
2310.02714 |
null |
2023-10-04 |
ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF |
Jangho Park et.al. |
2310.02712 |
null |
2023-10-03 |
GenCO: Generating Diverse Solutions to Design Problems with Combinatorial Nature |
Aaron Ferber et.al. |
2310.02442 |
null |
2023-10-03 |
FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models |
Yingqian Cui et.al. |
2310.02401 |
null |
2023-10-03 |
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens |
Kaizhi Zheng et.al. |
2310.02239 |
link |
2023-10-03 |
Optimizing microlens arrays for incoherent HiLo microscopy |
Ziao Jiao et.al. |
2310.01939 |
null |
2023-10-03 |
Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation |
Jun Li et.al. |
2310.01819 |
null |
2023-10-02 |
ImagenHub: Standardizing the evaluation of conditional image generation models |
Max Ku et.al. |
2310.01596 |
link |
2023-10-02 |
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code |
Xuan Ju et.al. |
2310.01506 |
link |
2023-10-02 |
Conditional Diffusion Distillation |
Kangfu Mei et.al. |
2310.01407 |
link |
2023-10-02 |
Trained Latent Space Navigation to Prevent Lack of Photorealism in Generated Images on Style-based Models |
Takumi Harada et.al. |
2310.00936 |
null |
2023-10-02 |
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP |
Zixiang Chen et.al. |
2310.00927 |
null |
2023-10-02 |
RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches |
Shawn Mathew et.al. |
2310.00868 |
link |
2023-10-01 |
Completing Visual Objects via Bridging Generation and Segmentation |
Xiang Li et.al. |
2310.00808 |
null |
2023-10-02 |
LLM-grounded Video Diffusion Models |
Long Lian et.al. |
2309.17444 |
null |
2023-09-29 |
Directly Fine-Tuning Diffusion Models on Differentiable Rewards |
Kevin Clark et.al. |
2309.17400 |
null |
2023-09-29 |
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning |
Zihan Ding et.al. |
2309.16984 |
link |
2023-09-29 |
Leveraging Optimization for Adaptive Attacks on Image Watermarks |
Nils Lukas et.al. |
2309.16952 |
link |
2023-09-29 |
Denoising Diffusion Bridge Models |
Linqi Zhou et.al. |
2309.16948 |
link |
2023-09-28 |
CCEdit: Creative and Controllable Video Editing via Diffusion Models |
Ruoyu Feng et.al. |
2309.16496 |
null |
2023-09-28 |
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation |
Guy Yariv et.al. |
2309.16429 |
link |
2023-09-28 |
Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning |
Albert Mohwald et.al. |
2309.16351 |
link |
2023-09-28 |
OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions |
Jin Liu et.al. |
2309.16148 |
null |
2023-09-27 |
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness |
Valentin Barriere et.al. |
2309.15991 |
null |
2023-09-27 |
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation |
David Junhao Zhang et.al. |
2309.15818 |
link |
2023-09-27 |
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack |
Xiaoliang Dai et.al. |
2309.15807 |
null |
2023-09-27 |
Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation |
Xin Yuan et.al. |
2309.15726 |
null |
2023-09-27 |
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing |
Kai Wang et.al. |
2309.15664 |
link |
2023-09-27 |
Position and Orientation-Aware One-Shot Learning for Medical Action Recognition from Signal Data |
Leiyu Xie et.al. |
2309.15635 |
null |
2023-09-28 |
Jointly Training Large Autoregressive Multimodal Models |
Emanuele Aiello et.al. |
2309.15564 |
null |
2023-09-27 |
Teaching Text-to-Image Models to Communicate |
Xiaowen Sun et.al. |
2309.15516 |
null |
2023-09-27 |
DreamCom: Finetuning Text-guided Inpainting Model for Image Composition |
Lingxiao Lu et.al. |
2309.15508 |
null |
2023-09-27 |
Finite Scalar Quantization: VQ-VAE Made Simple |
Fabian Mentzer et.al. |
2309.15505 |
link |
2023-09-27 |
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models |
Yaohui Wang et.al. |
2309.15103 |
link |
2023-09-26 |
Seimei KOOLS-IFU mapping of the gas and dust distributions in Galactic PNe: Unveiling the origin and evolution of Galactic halo PN H4-1 |
Masaaki Otsuka et.al. |
2309.15099 |
null |
2023-09-26 |
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning |
Han Lin et.al. |
2309.15091 |
null |
2023-09-26 |
Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation |
Shin-Ying Yeh et.al. |
2309.14859 |
link |
2023-09-26 |
On quantifying and improving realism of images generated with diffusion |
Yunzhuo Chen et.al. |
2309.14756 |
null |
2023-09-27 |
Text-to-Image Generation for Abstract Concepts |
Jiayi Liao et.al. |
2309.14623 |
null |
2023-09-25 |
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator |
Hanzhuo Huang et.al. |
2309.14494 |
link |
2023-09-25 |
Chop & Learn: Recognizing and Generating Object-State Compositions |
Nirat Saini et.al. |
2309.14339 |
null |
2023-09-27 |
Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation |
Quang Nguyen et.al. |
2309.14303 |
link |
2023-09-25 |
Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments |
Najmeh Mohammadbagheri et.al. |
2309.14267 |
null |
2023-09-25 |
SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution |
Zhongjie Ba et.al. |
2309.14122 |
link |
2023-09-25 |
Diverse Semantic Image Editing with Style Codes |
Hakan Sivuk et.al. |
2309.13975 |
link |
2023-09-23 |
GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER |
Mingzhen Sun et.al. |
2309.13274 |
link |
2023-09-23 |
Randomize to Generalize: Domain Randomization for Runway FOD Detection |
Javaria Farooq et.al. |
2309.13264 |
null |
2023-09-23 |
NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation |
Rui Yu et.al. |
2309.13240 |
null |
2023-09-21 |
POLAR3D: Augmenting NASA’s POLAR Dataset for Data-Driven Lunar Perception and Rover Simulation |
Bo-Hsun Chen et.al. |
2309.12397 |
link |
2023-09-21 |
TextCLIP: Text-Guided Face Image Generation And Manipulation Without Adversarial Training |
Xiaozhou You et.al. |
2309.11923 |
null |
2023-09-21 |
PIE: Simulating Disease Progression via Progressive Image Editing |
Kaizhao Liang et.al. |
2309.11745 |
link |
2023-09-24 |
Latent Diffusion Models for Structural Component Design |
Ethan Herron et.al. |
2309.11601 |
null |
2023-09-20 |
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge |
Manuel Brack et.al. |
2309.11575 |
null |
2023-09-20 |
FreeU: Free Lunch in Diffusion U-Net |
Chenyang Si et.al. |
2309.11497 |
link |
2023-09-20 |
Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation |
Hyelin Nam et.al. |
2309.11127 |
null |
2023-09-21 |
Learning End-to-End Channel Coding with Diffusion Models |
Muah Kim et.al. |
2309.10505 |
null |
2023-09-23 |
AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration |
Lijiang Li et.al. |
2309.10438 |
link |
2023-09-19 |
Language Guided Adversarial Purification |
Himanshu Singh et.al. |
2309.10348 |
link |
2023-09-18 |
Multimodal Foundation Models: From Specialists to General-Purpose Assistants |
Chunyuan Li et.al. |
2309.10020 |
link |
2023-09-18 |
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving |
Xiaofeng Wang et.al. |
2309.09777 |
null |
2023-09-18 |
Gradpaint: Gradient-Guided Inpainting with Diffusion Models |
Asya Grechka et.al. |
2309.09614 |
null |
2023-09-18 |
DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues |
Kun Pan et.al. |
2309.09526 |
link |
2023-09-18 |
Progressive Text-to-Image Diffusion with Soft Latent Direction |
YuTeng Ye et.al. |
2309.09466 |
link |
2023-09-15 |
Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models |
Feihong He et.al. |
2309.08251 |
null |
2023-09-15 |
Talkin’ ‘Bout AI Generation: Copyright and the Generative-AI Supply Chain |
Katherine Lee et.al. |
2309.08133 |
null |
2023-09-15 |
Increasing diversity of omni-directional images generated from single image using cGAN based on MLPMixer |
Atsuya Nakata et.al. |
2309.08129 |
link |
2023-09-14 |
Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset |
Iya Chivileva et.al. |
2309.08009 |
null |
2023-09-14 |
Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models |
James Burgess et.al. |
2309.07986 |
link |
2023-09-14 |
ALWOD: Active Learning for Weakly-Supervised Object Detection |
Yuting Wang et.al. |
2309.07914 |
link |
2023-09-13 |
Unbiased Face Synthesis With Diffusion Models: Are We There Yet? |
Harrison Rosenberg et.al. |
2309.07277 |
link |
2023-09-13 |
MagiCapture: High-Resolution Multi-Concept Portrait Customization |
Junha Hyung et.al. |
2309.06895 |
null |
2023-09-12 |
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation |
Xingchao Liu et.al. |
2309.06380 |
link |
2023-09-12 |
Elucidating the solution space of extended reverse-time SDE for diffusion models |
Qinpeng Cui et.al. |
2309.06169 |
link |
2023-09-12 |
Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation |
Ling Huang et.al. |
2309.05919 |
link |
2023-09-11 |
Divergences in Color Perception between Deep Neural Networks and Humans |
Ethan O. Nadler et.al. |
2309.05809 |
link |
2023-09-11 |
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models |
Li Chen et.al. |
2309.05793 |
null |
2023-09-11 |
ITI-GEN: Inclusive Text-to-Image Generation |
Cheng Zhang et.al. |
2309.05569 |
link |
2023-09-11 |
PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud |
Chengyu Wang et.al. |
2309.05534 |
null |
2023-09-11 |
Comprehensive analysis of synthetic learning applied to neonatal brain MRI segmentation |
R Valabregue et.al. |
2309.05306 |
link |
2023-09-10 |
Gender Bias in Multimodal Models: A Transnational Feminist Approach Considering Geographical Region and Culture |
Abhishek Mandal et.al. |
2309.04997 |
null |
2023-09-09 |
Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video |
Xiuzhe Wu et.al. |
2309.04814 |
link |
2023-09-08 |
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion |
Yujin Jeong et.al. |
2309.04509 |
null |
2023-09-08 |
Create Your World: Lifelong Text-to-Image Diffusion |
Gan Sun et.al. |
2309.04430 |
null |
2023-09-08 |
MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers |
Sijia Li et.al. |
2309.04372 |
null |
2023-09-08 |
Sequential Semantic Generative Communication for Progressive Text-to-Image Generation |
Hyelin Nam et.al. |
2309.04287 |
null |
2023-09-08 |
Robot Localization and Mapping Final Report – Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry |
Akankshya Kar et.al. |
2309.04147 |
null |
2023-09-08 |
From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models |
Changming Xiao et.al. |
2309.04109 |
link |
2023-09-07 |
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis |
Jiapeng Zhu et.al. |
2309.03904 |
link |
2023-09-07 |
T2IW: Joint Text to Image & Watermark Generation |
An-An Liu et.al. |
2309.03815 |
null |
2023-09-07 |
Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model |
Sungwon Hwang et.al. |
2309.03550 |
null |
2023-09-07 |
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation |
Jiaxi Gu et.al. |
2309.03549 |
null |
2023-09-07 |
Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation |
Zhuqiang Lu et.al. |
2309.03467 |
link |
2023-09-06 |
My Art My Choice: Adversarial Protection Against Unruly AI |
Anthony Rhodes et.al. |
2309.03198 |
null |
2023-09-06 |
Hierarchical-level rain image generative model based on GAN |
Zhenyuan Liu et.al. |
2309.02964 |
null |
2023-09-06 |
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network |
Takashi Shibuya et.al. |
2309.02836 |
link |
2023-09-06 |
Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter |
Jinglong Wang et.al. |
2309.02773 |
link |
2023-09-05 |
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning |
Lili Yu et.al. |
2309.02591 |
null |
2023-09-05 |
Diffusion on the Probability Simplex |
Griffin Floto et.al. |
2309.02530 |
null |
2023-09-05 |
Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface |
Atieh Taheri et.al. |
2309.02402 |
null |
2023-09-05 |
Exchanging-based Multimodal Fusion with Transformer |
Renyu Zhu et.al. |
2309.02190 |
link |
2023-09-04 |
Uncertainty in AI: Evaluating Deep Neural Networks on Out-of-Distribution Images |
Jamiu Idowu et.al. |
2309.01850 |
null |
2023-09-04 |
StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation |
Zhouxia Wang et.al. |
2309.01770 |
null |
2023-09-04 |
Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion |
Ryota Yoshihashi et.al. |
2309.01369 |
null |
2023-09-04 |
Mutual Information Maximizing Quantum Generative Adversarial Network and Its Applications in Finance |
Mingyu Lee et.al. |
2309.01363 |
null |
2023-09-03 |
Diffusion Models with Deterministic Normalizing Flow Priors |
Mohsen Zand et.al. |
2309.01274 |
link |
2023-09-03 |
Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection |
Weijie Wang et.al. |
2309.01104 |
null |
2023-09-02 |
Constrained CycleGAN for Effective Generation of Ultrasound Sector Images of Improved Spatial Resolution |
Xiaofei Sun et.al. |
2309.00995 |
link |
2023-09-02 |
Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities |
Shanyuan Liu et.al. |
2309.00952 |
null |
2023-09-01 |
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation |
Xin Li et.al. |
2309.00398 |
null |
2023-09-01 |
DiffuGen: Adaptable Approach for Generating Labeled Image Datasets using Stable Diffusion Models |
Michael Shenoda et.al. |
2309.00248 |
link |
2023-09-01 |
Diffusion Model with Clustering-based Conditioning for Food Image Generation |
Yue Han et.al. |
2309.00199 |
null |
2023-08-31 |
StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation |
Yuhan Wang et.al. |
2308.16909 |
link |
2023-08-31 |
Diffusion Models for Interferometric Satellite Aperture Radar |
Alexandre Tuel et.al. |
2308.16847 |
link |
2023-08-31 |
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images |
Cuican Yu et.al. |
2308.16758 |
null |
2023-08-31 |
Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps |
Miguel Espinosa et.al. |
2308.16648 |
link |
2023-08-31 |
Detecting Out-of-Context Image-Caption Pairs in News: A Counter-Intuitive Method |
Eivind Moholdt et.al. |
2308.16611 |
link |
2023-08-30 |
Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation |
Mengping Yang et.al. |
2308.16110 |
link |
2023-08-30 |
Semantic Image Synthesis via Class-Adaptive Cross-Attention |
Tomaso Fontanini et.al. |
2308.16071 |
null |
2023-08-30 |
Intriguing Properties of Diffusion Models: A Large-Scale Dataset for Evaluating Natural Attack Capability in Text-to-Image Generative Models |
Takami Sato et.al. |
2308.15692 |
null |
2023-08-29 |
Learning Modulated Transformation in GANs |
Ceyuan Yang et.al. |
2308.15472 |
link |
2023-08-29 |
IndGIC: Supervised Action Recognition under Low Illumination |
Jingbo Zeng et.al. |
2308.15345 |
null |
2023-08-29 |
A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information |
Shuxiao Ma et.al. |
2308.15142 |
null |
2023-08-28 |
Automated Conversion of Music Videos into Lyric Videos |
Jiaju Ma et.al. |
2308.14922 |
null |
2023-08-28 |
RobustCLEVR: A Benchmark and Framework for Evaluating Robustness in Object-centric Learning |
Nathan Drenkow et.al. |
2308.14899 |
null |
2023-08-28 |
Identifying and Mitigating the Security Risks of Generative AI |
Clark Barrett et.al. |
2308.14840 |
null |
2023-08-28 |
MagicAvatar: Multimodal Avatar Generation and Animation |
Jianfeng Zhang et.al. |
2308.14748 |
null |
2023-08-28 |
Causality-Based Feature Importance Quantifying Methods: PN-FI, PS-FI and PNS-FI |
Shuxian Du et.al. |
2308.14474 |
null |
2023-08-28 |
Semi-Supervised Semantic Depth Estimation using Symbiotic Transformer and NearFarMix Augmentation |
Md Awsafur Rahman et.al. |
2308.14400 |
null |
2023-08-28 |
FaceChain: A Playground for Identity-Preserving Portrait Generation |
Yang Liu et.al. |
2308.14256 |
link |
2023-08-28 |
HoloFusion: Towards Photo-realistic 3D Generative Modeling |
Animesh Karnewar et.al. |
2308.14244 |
null |
2023-08-27 |
A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy |
Forough Fazeli-Asl et.al. |
2308.14048 |
null |
2023-08-26 |
Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models |
Hao Fei et.al. |
2308.13812 |
null |
2023-08-26 |
ORES: Open-vocabulary Responsible Visual Synthesis |
Minheng Ni et.al. |
2308.13785 |
link |
2023-08-25 |
Residual Denoising Diffusion Models |
Jiawei Liu et.al. |
2308.13712 |
link |
2023-08-25 |
Is Deep Learning Network Necessary for Image Generation? |
Chenqiu Zhao et.al. |
2308.13612 |
null |
2023-08-25 |
WorldSmith: Iterative and Expressive Prompting for World Building with a Generative AI |
Hai Dang et.al. |
2308.13355 |
null |
2023-08-25 |
Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model |
Xunpeng Yi et.al. |
2308.13164 |
null |
2023-08-25 |
A Survey of Diffusion Based Image Generation Models: Issues and Their Solutions |
Tianyi Zhang et.al. |
2308.13142 |
null |
2023-08-24 |
Dense Text-to-Image Generation with Attention Modulation |
Yunji Kim et.al. |
2308.12964 |
link |
2023-08-24 |
APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency |
Yupu Yao et.al. |
2308.12605 |
null |
2023-08-23 |
Augmenting medical image classifiers with synthetic data from latent diffusion models |
Luke W. Sagers et.al. |
2308.12453 |
null |
2023-08-23 |
DISGAN: Wavelet-informed Discriminator Guides GAN to MRI Super-resolution with Noise Cleaning |
Qi Wang et.al. |
2308.12084 |
link |
2023-08-23 |
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages |
Jinyi Hu et.al. |
2308.12038 |
link |
2023-08-23 |
Efficient Transfer Learning in Diffusion Models via Adversarial Noise |
Xiyu Wang et.al. |
2308.11948 |
null |
2023-08-23 |
LFS-GAN: Lifelong Few-Shot Image Generation |
Juwon Seo et.al. |
2308.11917 |
link |
2023-08-23 |
CoC-GAN: Employing Context Cluster for Unveiling a New Pathway in Image Generation |
Zihao Wang et.al. |
2308.11857 |
null |
2023-08-22 |
Ceci n’est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings |
Eugene Bagdasaryan et.al. |
2308.11804 |
link |
2023-08-22 |
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization |
Emanuele Bugliarello et.al. |
2308.11606 |
link |
2023-08-22 |
Open Set Synthetic Image Source Attribution |
Shengbang Fang et.al. |
2308.11557 |
null |
2023-08-22 |
Hamiltonian GAN |
Christine Allen-Blanchette et.al. |
2308.11216 |
null |
2023-08-22 |
MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers |
Daniel Silver et.al. |
2308.11096 |
null |
2023-08-21 |
Debiasing Counterfactuals In the Presence of Spurious Correlations |
Amar Kumar et.al. |
2308.10984 |
null |
2023-08-21 |
Sampling From Autoencoders’ Latent Space via Quantization And Probability Mass Function Concepts |
Aymene Mohammed Bouayed et.al. |
2308.10704 |
null |
2023-08-20 |
Turning Waste into Wealth: Leveraging Low-Quality Samples for Enhancing Continuous Conditional Generative Adversarial Networks |
Xin Ding et.al. |
2308.10273 |
link |
2023-08-20 |
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data |
Yanda Li et.al. |
2308.10253 |
link |
2023-08-20 |
Spiking-Diffusion: Vector Quantized Discrete Diffusion Model with Spiking Neural Networks |
Mingxuan Liu et.al. |
2308.10187 |
link |
2023-08-20 |
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation |
Chengyou Jia et.al. |
2308.10156 |
null |
2023-08-19 |
ASPIRE: Language-Guided Augmentation for Robust Image Classification |
Sreyan Ghosh et.al. |
2308.10103 |
link |
2023-08-19 |
ControlCom: Controllable Image Composition using Diffusion Model |
Bo Zhang et.al. |
2308.10040 |
link |
2023-08-19 |
ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval |
Kaihang Pan et.al. |
2308.10025 |
null |
2023-08-19 |
DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization |
Xiaoyu Ye et.al. |
2308.09889 |
null |
2023-08-18 |
Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis |
Shijie Li et.al. |
2308.09835 |
link |
2023-08-18 |
SimDA: Simple Diffusion Adapter for Efficient Video Generation |
Zhen Xing et.al. |
2308.09710 |
null |
2023-08-18 |
Guide3D: Create 3D Avatars from Text and Image Guidance |
Yukang Cao et.al. |
2308.09705 |
null |
2023-08-18 |
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability |
Runhui Huang et.al. |
2308.09306 |
null |
2023-08-18 |
RFDforFin: Robust Deep Forgery Detection for GAN-generated Fingerprint Images |
Hui Miao et.al. |
2308.09285 |
null |
2023-08-17 |
Watch Your Steps: Local Image and Scene Editing by Text Instructions |
Ashkan Mirzaei et.al. |
2308.08947 |
null |
2023-08-16 |
Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment |
Qi Chen et.al. |
2308.08525 |
link |
2023-08-16 |
Painter: Teaching Auto-regressive Language Models to Draw Sketches |
Reza Pourreza et.al. |
2308.08520 |
null |
2023-08-16 |
Diff-CAPTCHA: An Image-based CAPTCHA with Security Enhanced by Denoising Diffusion Model |
Ran Jiang et.al. |
2308.08367 |
null |
2023-08-16 |
Denoising Diffusion Probabilistic Model for Retinal Image Generation and Segmentation |
Alnur Alimanov et.al. |
2308.08339 |
link |
2023-08-18 |
Dual-Stream Diffusion Net for Text-to-Video Generation |
Binhui Liu et.al. |
2308.08316 |
null |
2023-08-16 |
Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis |
Minho Park et.al. |
2308.08157 |
link |
2023-08-16 |
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory |
Shengming Yin et.al. |
2308.08089 |
null |
2023-08-15 |
Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training |
Ximing Xing et.al. |
2308.07665 |
link |
2023-08-15 |
Story Visualization by Online Text Augmentation with Context Memory |
Daechul Ahn et.al. |
2308.07575 |
link |
2023-08-13 |
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks |
David Junhao Zhang et.al. |
2308.06739 |
null |
2023-08-13 |
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models |
Hu Ye et.al. |
2308.06721 |
null |
2023-08-13 |
LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts |
Binbin Yang et.al. |
2308.06713 |
null |
2023-08-12 |
Semantic Communications with Explicit Semantic Base for Image Transmission |
Yuan Zheng et.al. |
2308.06599 |
null |
2023-08-11 |
White-box Membership Inference Attacks against Diffusion Models |
Yan Pang et.al. |
2308.06405 |
null |
2023-08-15 |
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity |
Melissa Hall et.al. |
2308.06198 |
link |
2023-08-11 |
Improving Joint Speech-Text Representations Without Alignment |
Cal Peyser et.al. |
2308.06125 |
null |
2023-08-11 |
Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation |
Yuki Endo et.al. |
2308.06027 |
link |
2023-08-10 |
SAR Target Image Generation Method Using Azimuth-Controllable Generative Adversarial Network |
Chenwei Wang et.al. |
2308.05489 |
null |
2023-08-10 |
Beyond Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization |
Hongyang Du et.al. |
2308.05384 |
link |
2023-08-09 |
PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions |
John Joon Young Chung et.al. |
2308.05184 |
link |
2023-08-12 |
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation |
Leigang Qu et.al. |
2308.05095 |
null |
2023-08-13 |
TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design |
Yifan Gao et.al. |
2308.04733 |
null |
2023-08-09 |
GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization |
Hao Fang et.al. |
2308.04699 |
link |
2023-08-08 |
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images |
Xuechao Zou et.al. |
2308.04417 |
link |
2023-08-08 |
The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings |
Timothy Merino et.al. |
2308.04052 |
link |
2023-08-05 |
DermoSegDiff: A Boundary-aware Segmentation Diffusion Model for Skin Lesion Delineation |
Afshin Bozorgpour et.al. |
2308.02959 |
link |
2023-08-05 |
Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation |
Zijie Wu et.al. |
2308.02874 |
null |
2023-08-03 |
ConceptLab: Creative Generation using Diffusion Prior Constraints |
Elad Richardson et.al. |
2308.02669 |
link |
2023-08-04 |
Towards Personalized Prompt-Model Retrieval for Generative Recommendation |
Yuanhe Guo et.al. |
2308.02205 |
link |
2023-08-04 |
SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation |
Shikun Sun et.al. |
2308.02154 |
null |
2023-08-03 |
Focus on Content not Noise: Improving Image Generation for Nuclei Segmentation by Suppressing Steganography in CycleGAN |
Jonas Utz et.al. |
2308.01769 |
null |
2023-08-07 |
BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout |
Kairui Yang et.al. |
2308.01661 |
null |
2023-08-03 |
Interleaving GANs with knowledge graphs to support design creativity for book covers |
Alexandru Motogna et.al. |
2308.01626 |
link |
2023-08-03 |
Circumventing Concept Erasure Methods For Text-to-Image Generative Models |
Minh Pham et.al. |
2308.01508 |
link |
2023-08-02 |
Reverse Stable Diffusion: What prompt was used to generate this image? |
Florinel-Alin Croitoru et.al. |
2308.01472 |
link |
2023-08-02 |
Revisiting DETR Pre-training for Object Detection |
Yan Ma et.al. |
2308.01300 |
null |
2023-08-02 |
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation |
Guojin Zhong et.al. |
2308.01147 |
link |
2023-08-01 |
The Bias Amplification Paradox in Text-to-Image Generation |
Preethi Seshadri et.al. |
2308.00755 |
link |
2023-08-01 |
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models |
Cheng-Yu Hsieh et.al. |
2308.00675 |
null |
2023-08-01 |
Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design |
Jaechang Ko et.al. |
2308.00227 |
null |
2023-08-01 |
SkullGAN: Synthetic Skull CT Generation with Generative Adversarial Networks |
Kasra Naftchi-Ardebili et.al. |
2308.00206 |
link |
2023-07-28 |
Testing the Depth of ChatGPT’s Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5’s Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking |
David Bayani et.al. |
2307.16806 |
null |
2023-07-31 |
DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation |
Runyang Feng et.al. |
2307.16687 |
null |
2023-07-31 |
Towards General Visual-Linguistic Face Forgery Detection |
Ke Sun et.al. |
2307.16545 |
null |
2023-07-31 |
BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models |
Jordan Vice et.al. |
2307.16489 |
link |
2023-07-31 |
HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution |
Minyi Zhao et.al. |
2307.16410 |
null |
2023-07-31 |
MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text |
Junchen Zhu et.al. |
2307.16371 |
null |
2023-07-30 |
Mask-guided Data Augmentation for Multiparametric MRI Generation with a Rare Hepatocellular Carcinoma |
Karen Sanchez et.al. |
2307.16314 |
null |
2023-07-30 |
Stylized Projected GAN: A Novel Architecture for Fast and Realistic Image Generation |
Md Nurul Muttakin et.al. |
2307.16275 |
null |
2023-07-29 |
HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation |
Zuyan Liu et.al. |
2307.16061 |
null |
2023-07-28 |
Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search |
Alexander Chebykin et.al. |
2307.15621 |
link |
2023-07-28 |
RAWIW: RAW Image Watermarking Robust to ISP Pipeline |
Kang Fu et.al. |
2307.15443 |
null |
2023-07-28 |
Staging E-Commerce Products for Online Advertising using Retrieval Assisted Image Generation |
Yueh-Ning Ku et.al. |
2307.15326 |
null |
2023-07-27 |
Semantic Image Completion and Enhancement using GANs |
Priyansh Saxena et.al. |
2307.14748 |
null |
2023-07-31 |
Pre-training Vision Transformers with Very Limited Synthesized Images |
Ryo Nakamura et.al. |
2307.14710 |
link |
2023-07-27 |
LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement |
Tao Wang et.al. |
2307.14659 |
link |
2023-07-27 |
EqGAN: Feature Equalization Fusion for Few-shot Image Generation |
Yingbo Zhou et.al. |
2307.14638 |
null |
2023-07-26 |
Deepfake Image Generation for Improved Brain Tumor Segmentation |
Roa’a Al-Emaryeen et.al. |
2307.14273 |
null |
2023-07-26 |
Learning Disentangled Discrete Representations |
David Friede et.al. |
2307.14151 |
link |
2023-07-26 |
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet |
Zhihao Hu et.al. |
2307.14073 |
null |
2023-07-25 |
Composite Diffusion \| whole >= Σparts |
Vikram Jamwal et.al. |
2307.13720 |
2023-07-25 |
Fake It Without Making It: Conditioned Face Generation for Accurate 3D Face Shape Estimation |
Will Rowan et.al. |
2307.13639 |
null |
2023-07-25 |
XDLM: Cross-lingual Diffusion Language Model for Machine Translation |
Linyao Chen et.al. |
2307.13560 |
null |
2023-07-25 |
Not with my name! Inferring artists’ names of input strings employed by Diffusion Models |
Roberto Leotta et.al. |
2307.13527 |
link |
2023-07-24 |
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models |
Jindong Gu et.al. |
2307.12980 |
link |
2023-07-24 |
Interpolating between Images with Diffusion Models |
Clinton J. Wang et.al. |
2307.12560 |
null |
2023-07-22 |
Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis |
Hao Tang et.al. |
2307.12084 |
link |
2023-07-21 |
PartDiff: Image Super-resolution with Partial Diffusion Models |
Kai Zhao et.al. |
2307.11926 |
null |
2023-07-21 |
UWAT-GAN: Fundus Fluorescein Angiography Synthesis via Ultra-wide-angle Transformation Multi-scale GAN |
Zhaojie Fang et.al. |
2307.11530 |
link |
2023-07-21 |
Attention Consistency Refined Masked Frequency Forgery Representation for Generalizing Face Forgery Detection |
Decheng Liu et.al. |
2307.11438 |
link |
2023-07-21 |
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning |
Jian Ma et.al. |
2307.11410 |
link |
2023-07-20 |
Diffusion Sampling with Momentum for Mitigating Divergence Artifacts |
Suttisak Wizadwongsa et.al. |
2307.11118 |
link |
2023-07-20 |
Progressive distillation diffusion for raw music generation |
Svetlana Pavlova et.al. |
2307.10994 |
null |
2023-07-20 |
Divide & Bind Your Attention for Improved Generative Semantic Nursing |
Yumeng Li et.al. |
2307.10864 |
link |
2023-07-20 |
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation |
Fa-Ting Hong et.al. |
2307.09906 |
link |
2023-07-19 |
Compressive Image Scanning Microscope |
Ajay Gunalan et.al. |
2307.09841 |
link |
2023-07-19 |
A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images |
Lydia Abady et.al. |
2307.09822 |
link |
2023-07-19 |
Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline |
Zhigang Chang et.al. |
2307.09821 |
null |
2023-07-19 |
Text2Layer: Layered Image Generation using Latent Diffusion Model |
Xinyang Zhang et.al. |
2307.09781 |
null |
2023-07-18 |
AnyDoor: Zero-shot Object-level Image Customization |
Xi Chen et.al. |
2307.09481 |
link |
2023-07-19 |
Let’s ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation |
Federico Betti et.al. |
2307.09416 |
null |
2023-07-18 |
Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow |
Dogucan Yaman et.al. |
2307.09368 |
null |
2023-07-18 |
Augmenting CLIP with Improved Visio-Linguistic Reasoning |
Samyadeep Basu et.al. |
2307.09233 |
null |
2023-07-18 |
Jean-Luc Picard at Touché 2023: Comparing Image Generation, Stance Detection and Feature Matching for Image Retrieval for Arguments |
Max Moebius et.al. |
2307.09172 |
null |
2023-07-18 |
Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond |
Yang Zhao et.al. |
2307.08996 |
null |
2023-07-18 |
PromptCrafter: Crafting Text-to-Image Prompt through Mixed-Initiative Dialogue with LLM |
Seungho Baek et.al. |
2307.08985 |
null |
2023-07-17 |
Harnessing the Power of AI based Image Generation Model DALLE 2 in Agricultural Settings |
Ranjan Sapkota et.al. |
2307.08789 |
null |
2023-07-17 |
Diffusion Models Beat GANs on Image Classification |
Soumik Mukhopadhyay et.al. |
2307.08702 |
null |
2023-07-17 |
Flow Matching in Latent Space |
Quan Dao et.al. |
2307.08698 |
link |
2023-07-17 |
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning |
Tri Dao et.al. |
2307.08691 |
link |
2023-07-17 |
Image Captions are Natural Prompts for Text-to-Image Models |
Shiye Lei et.al. |
2307.08526 |
null |
2023-07-17 |
Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data |
Kai Katsumata et.al. |
2307.08319 |
null |
2023-07-17 |
Manifold-Guided Sampling in Diffusion Models for Unbiased Image Generation |
Xingzhe Su et.al. |
2307.08199 |
null |
2023-07-16 |
Planting a SEED of Vision in Large Language Model |
Yuying Ge et.al. |
2307.08041 |
link |
2023-07-15 |
Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning? |
Jialu Gao et.al. |
2307.07837 |
null |
2023-07-18 |
Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer |
Wing-Yin Yu et.al. |
2307.07754 |
link |
2023-07-14 |
GenAssist: Making Image Generation Accessible |
Mina Huh et.al. |
2307.07589 |
null |
2023-07-14 |
Generative adversarial networks for data-scarce spectral applications |
Juan José García-Esteban et.al. |
2307.07454 |
null |
2023-07-13 |
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation |
Yi Wang et.al. |
2307.06942 |
link |
2023-07-13 |
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation |
Yingqing He et.al. |
2307.06940 |
link |
2023-07-13 |
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models |
Moab Arar et.al. |
2307.06925 |
null |
2023-07-13 |
Improving Nonalcoholic Fatty Liver Disease Classification Performance With Latent Diffusion Models |
Romain Hardy et.al. |
2307.06507 |
null |
2023-07-12 |
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation |
Kaiyi Huang et.al. |
2307.06350 |
link |
2023-07-12 |
Facial Reenactment Through a Personalized Generator |
Ariel Elazary et.al. |
2307.06307 |
null |
2023-07-12 |
CellGAN: Conditional Cervical Cell Synthesis for Augmenting Cytopathological Image Classification |
Zhenrong Shen et.al. |
2307.06182 |
link |
2023-07-12 |
Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models |
Sanghyun Kim et.al. |
2307.05977 |
link |
2023-07-12 |
DiffuseGAE: Controllable and High-fidelity Image Manipulation from Disentangled Representation |
Yipeng Leng et.al. |
2307.05899 |
null |
2023-07-12 |
Precise Image Generation on Current Noisy Quantum Computing Devices |
Florian Rehm et.al. |
2307.05253 |
null |
2023-07-11 |
Generative Pretraining in Multimodality |
Quan Sun et.al. |
2307.05222 |
link |
2023-07-11 |
TIAM – A Metric for Evaluating Alignment in Text-to-Image Generation |
Paul Grimal et.al. |
2307.05134 |
link |
2023-07-11 |
SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation |
Zhengxin Lei et.al. |
2307.05087 |
null |
2023-07-11 |
Diffusion idea exploration for art generation |
Nikhil Verma et.al. |
2307.04978 |
null |
2023-07-10 |
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback |
Jaskirat Singh et.al. |
2307.04749 |
null |
2023-07-11 |
DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer |
Dan Ruta et.al. |
2307.04157 |
null |
2023-07-09 |
Score-based Conditional Generation with Fewer Labeled Data by Self-calibrating Classifier Guidance |
Paul Kuo-Ming Huang et.al. |
2307.04081 |
null |
2023-07-08 |
Measuring the Success of Diffusion Models at Imitating Human Artists |
Stephen Casper et.al. |
2307.04028 |
null |
2023-07-08 |
HUMS2023 Data Challenge Result Submission |
Dhiraj Neupane et.al. |
2307.03871 |
null |
2023-07-07 |
Synthesizing Forestry Images Conditioned on Plant Phenotype Using a Generative Adversarial Network |
Debasmita Pal et.al. |
2307.03789 |
link |
2023-07-07 |
RGB-D Mapping and Tracking in a Plenoxel Radiance Field |
Andreas L. Teigen et.al. |
2307.03404 |
link |
2023-07-06 |
VideoGLUE: Video General Understanding Evaluation of Foundation Models |
Liangzhe Yuan et.al. |
2307.03166 |
link |
2023-07-06 |
On the Cultural Gap in Text-to-Image Generation |
Bingshuai Liu et.al. |
2307.02971 |
null |
2023-07-06 |
Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback |
TaeHo Yoon et.al. |
2307.02770 |
link |
2023-07-05 |
Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation |
Sébastien Lachapelle et.al. |
2307.02598 |
link |
2023-07-05 |
Diffusion Models for Computational Design at the Example of Floor Plans |
Joern Ploennigs et.al. |
2307.02511 |
link |
2023-07-05 |
Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality |
Peter Lorenz et.al. |
2307.02347 |
link |
2023-07-05 |
On the Adversarial Robustness of Generative Autoencoders in the Latent Space |
Mingfei Lu et.al. |
2307.02202 |
null |
2023-07-05 |
Prompting Diffusion Representations for Cross-Domain Semantic Segmentation |
Rui Gong et.al. |
2307.02138 |
null |
2023-07-04 |
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis |
Dustin Podell et.al. |
2307.01952 |
link |
2023-07-04 |
A Synthetic Electrocardiogram (ECG) Image Generation Toolbox to Facilitate Deep Learning-Based Scanned ECG Digitization |
Kshama Kodthalu Shivashankara et.al. |
2307.01946 |
link |
2023-07-04 |
Text + Sketch: Image Compression at Ultra Low Rates |
Eric Lei et.al. |
2307.01944 |
link |
2023-07-04 |
Generative Artificial Intelligence Consensus in a Trustless Network |
Edward Kim et.al. |
2307.01898 |
null |
2023-07-04 |
Training Energy-Based Models with Diffusion Contrastive Divergences |
Weijian Luo et.al. |
2307.01668 |
null |
2023-07-04 |
AdAM: Few-Shot Image Generation via Adaptation-Aware Kernel Modulation |
Yunqing Zhao et.al. |
2307.01465 |
null |
2023-07-03 |
Squeezing Large-Scale Diffusion Models for Mobile |
Jiwoong Choi et.al. |
2307.01193 |
null |
2023-07-03 |
MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion |
Shitao Tang et.al. |
2307.01097 |
link |
2023-07-03 |
DifFSS: Diffusion Model for Few-Shot Semantic Segmentation |
Weimin Tan et.al. |
2307.00773 |
link |
2023-07-02 |
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance |
Linoy Tsaban et.al. |
2307.00522 |
null |
2023-07-01 |
DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation |
Zhuowei Chen et.al. |
2307.00300 |
null |
2023-07-01 |
AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence |
Jiarui Wang et.al. |
2307.00211 |
link |
2023-06-30 |
Stay on topic with Classifier-Free Guidance |
Guillaume Sanchez et.al. |
2306.17806 |
null |
2023-06-30 |
Practical and Asymptotically Exact Conditional Sampling in Diffusion Models |
Luhuan Wu et.al. |
2306.17775 |
link |
2023-06-30 |
Counting Guidance for High Fidelity Text-to-Image Synthesis |
Wonjun Kang et.al. |
2306.17567 |
null |
2023-06-30 |
Class-Incremental Learning using Diffusion Model for Distillation and Replay |
Quentin Jodelet et.al. |
2306.17560 |
null |
2023-06-30 |
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals |
Yunpeng Bai et.al. |
2306.16934 |
link |
2023-06-29 |
CLIPAG: Towards Generator-Free Text-to-Image Generation |
Roy Ganz et.al. |
2306.16805 |
null |
2023-06-28 |
Dynamic Path-Controllable Deep Unfolding Network for Compressive Sensing |
Jiechong Song et.al. |
2306.16060 |
link |
2023-06-27 |
Semi-supervised Multimodal Representation Learning through a Global Workspace |
Benjamin Devillers et.al. |
2306.15711 |
link |
2023-06-26 |
A Simple and Effective Baseline for Attentional Generative Adversarial Networks |
Mingyu Jin et.al. |
2306.14708 |
link |
2023-06-26 |
Localized Text-to-Image Generation for Free via Cross Attention Control |
Yutong He et.al. |
2306.14636 |
null |
2023-06-26 |
A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis |
Aishwarya Agarwal et.al. |
2306.14544 |
null |
2023-06-26 |
Progressive Energy-Based Cooperative Learning for Multi-Domain Image-to-Image Translation |
Weinan Song et.al. |
2306.14448 |
null |
2023-06-26 |
Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models |
Luozhou Wang et.al. |
2306.14408 |
link |
2023-06-25 |
DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data |
Jingyuan Zhu et.al. |
2306.14153 |
null |
2023-06-24 |
UAlberta at SemEval-2023 Task 1: Context Augmentation and Translation for Multilingual Visual Word Sense Disambiguation |
Michael Ogezi et.al. |
2306.14067 |
link |
2023-06-23 |
Zero-shot spatial layout conditioning for text-to-image diffusion models |
Guillaume Couairon et.al. |
2306.13754 |
null |