Updated on 2026.05.06
Usage instructions: here
world model
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-05-05 | Implementing True MPI Sessions and Evaluating MPI Initialization Scalability | Hui Zhou et.al. | 2605.03983 | null |
| 2026-05-05 | A Benchmark for Interactive World Models with a Unified Action Generation Framework | Jianjie Fang et.al. | 2605.03941 | null |
| 2026-05-05 | RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models | Hao Wu et.al. | 2605.03821 | null |
| 2026-05-05 | What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity | Haoxi Li et.al. | 2605.03782 | null |
| 2026-05-05 | AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics | Tencent HY Team et.al. | 2605.03652 | null |
| 2026-05-05 | Learning to Theorize the World from Observation | Doojin Baek et.al. | 2605.03413 | null |
| 2026-05-04 | Existence, Asymptotic Behavior, and Numerical Analysis of a Generalized Abel Differential Equation with Applications in Financial Modeling | Dragos-Patru Covei et.al. | 2605.02831 | null |
| 2026-05-04 | DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation | Danil Tokhchukov et.al. | 2605.02759 | null |
| 2026-05-04 | Shadow-Loom: Causal Reasoning over Graphical World Model of Narratives | David Wilmot et.al. | 2605.02475 | null |
| 2026-05-04 | Video Generation with Predictive Latents | Yian Zhao et.al. | 2605.02134 | null |
| 2026-05-03 | TRAP: Tail-aware Ranking Attack for World-Model Planning | Siyuan Duan et.al. | 2605.01950 | null |
| 2026-05-03 | Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models | Junyuan Xiao et.al. | 2605.01896 | null |
| 2026-05-03 | Embody4D: A Generalist 4D World Model for Embodied AI | Peiyan Tu et.al. | 2605.01799 | null |
| 2026-05-03 | SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 25+ Sign Languages | Sen Fang et.al. | 2605.01720 | null |
| 2026-05-03 | Latent State Design for World Models under Sufficiency Constraints | Keon Woo Kim et.al. | 2605.01694 | null |
| 2026-05-03 | Video Active Perception: Effective Inference-Time Long-Form Video Understanding with Vision-Language Models | Martin Q. Ma et.al. | 2605.01662 | null |
| 2026-05-01 | Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling | Sen Cui et.al. | 2605.00412 | null |
| 2026-04-30 | World Model for Robot Learning: A Comprehensive Survey | Bohan Hou et.al. | 2605.00080 | null |
| 2026-04-30 | Being-H0.7: A Latent World-Action Model from Egocentric Videos | Hao Luo et.al. | 2605.00078 | null |
| 2026-04-30 | HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation | Xin Zhou et.al. | 2604.28196 | null |
| 2026-04-30 | LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models | Hao Chen et.al. | 2604.28192 | null |
| 2026-04-30 | Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling | Keming Wu et.al. | 2604.28185 | null |
| 2026-04-30 | Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces | Andrew Bond et.al. | 2604.28122 | null |
| 2026-04-30 | Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA | Feeza Khan Khanzada et.al. | 2604.27994 | null |
| 2026-04-30 | GUI Agents with Reinforcement Learning: Toward Digital Inhabitants | Junan Hu et.al. | 2604.27955 | null |
| 2026-04-30 | Flying by Inference: Active Inference World Models for Adaptive UAV Swarms | Kaleem Arshid et.al. | 2604.27935 | null |
| 2026-04-30 | Simulating clinical interventions with a generative multimodal model of human physiology | Guy Lutsker et.al. | 2604.27899 | null |
| 2026-04-30 | Graph World Models: Concepts, Taxonomy, and Future Directions | Jiawei Liu et.al. | 2604.27895 | null |
| 2026-04-30 | MotuBrain: An Advanced World Action Model for Robot Control | MotuBrain Team et.al. | 2604.27792 | null |
| 2026-04-29 | World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning | Wanyue Zhang et.al. | 2604.26934 | null |
| 2026-04-29 | STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation | Yuxuan Tian et.al. | 2604.26848 | null |
| 2026-04-29 | Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising | Jun Guo et.al. | 2604.26694 | null |
| 2026-04-29 | AGEL-Comp: A Neuro-Symbolic Framework for Compositional Generalization in Interactive Agents | Mahnoor Shahid et.al. | 2604.26522 | null |
| 2026-04-29 | DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation | Junhu Fu et.al. | 2604.26232 | null |
| 2026-04-28 | Lifting Embodied World Models for Planning and Control | Alex N. Wang et.al. | 2604.26182 | null |
| 2026-04-28 | HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation | Bingzi Zhang et.al. | 2604.25361 | null |
| 2026-04-28 | ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution | Chuyao Fu et.al. | 2604.25329 | null |
| 2026-04-27 | Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter Scales | Haozhi Han et.al. | 2604.24091 | null |
| 2026-04-26 | From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation | Jiafeng Wu et.al. | 2604.23629 | null |
| 2026-04-26 | Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling | Zhen Ye et.al. | 2604.23586 | null |
| 2026-04-26 | Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model | Jingni Huang et.al. | 2604.23532 | null |
| 2026-04-25 | Active Inference: A method for Phenotyping Agency in AI systems? | Philip Wilson et.al. | 2604.23278 | null |
| 2026-04-24 | Beyond Single-Agent Alignment: Preventing Context-Fragmented Violations in Multi-Agent Systems | Jie Wu et.al. | 2604.22879 | null |
| 2026-04-24 | Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond | Meng Chu et.al. | 2604.22748 | null |
| 2026-04-24 | Beyond Patient Invariance: Learning Cardiac Dynamics via Action-Conditioned JEPAs | Jose Geraldo Fernandes et.al. | 2604.22618 | null |
| 2026-04-24 | Video Analysis and Generation via a Semantic Progress Function | Gal Metzer et.al. | 2604.22554 | null |
| 2026-04-24 | OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space | Zhuding Liang et.al. | 2604.22240 | null |
| 2026-04-24 | A Co-Evolutionary Theory of Human-AI Coexistence: Mutualism, Governance, and Dynamics in Complex Societies | Somyajit Chakraborty et.al. | 2604.22227 | null |
| 2026-04-24 | dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model | Yaxuan Li et.al. | 2604.22152 | null |
| 2026-04-23 | Causality and Semantic Separation | Anna Zhang et.al. | 2604.22041 | null |
| 2026-04-23 | Seeing Fast and Slow: Learning the Flow of Time in Videos | Yen-Siang Wu et.al. | 2604.21931 | null |
| 2026-04-23 | Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions | Jiseon Kim et.al. | 2604.21871 | null |
| 2026-04-23 | Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training | Yaxuan Li et.al. | 2604.21741 | null |
| 2026-04-22 | Building a Precise Video Language with Human-AI Oversight | Zhiqiu Lin et.al. | 2604.21718 | null |
| 2026-04-23 | WorldMark: A Unified Benchmark Suite for Interactive Video World Models | Xiaojie Xu et.al. | 2604.21686 | null |
| 2026-04-22 | Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction | Abhishek Dharmaratnakar et.al. | 2604.21154 | null |
| 2026-04-22 | Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics | Open-H-Embodiment Consortium et.al. | 2604.21017 | null |
| 2026-04-22 | DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation | Hyeonwoo Kim et.al. | 2604.20841 | null |
| 2026-04-22 | Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning | Aravind Venugopal et.al. | 2604.20627 | null |
| 2026-04-22 | CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs | Xingcheng Zhou et.al. | 2604.20460 | null |
| 2026-04-22 | X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference | Yixiao Zeng et.al. | 2604.20289 | null |
| 2026-04-22 | Cortex 2.0: Grounding World Models in Real-World Industrial Deployment | Adriana Aida et.al. | 2604.20246 | null |
| 2026-04-22 | Toward Safe Autonomous Robotic Endovascular Interventions using World Models | Harry Robertshaw et.al. | 2604.20151 | null |
| 2026-04-21 | ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration | Cagri Eryilmaz et.al. | 2604.19856 | null |
| 2026-04-21 | CityRAG: Stepping Into a City via Spatially-Grounded Video Generation | Gene Chou et.al. | 2604.19741 | null |
| 2026-04-21 | UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling | Boyu Chen et.al. | 2604.19734 | null |
| 2026-04-22 | Mask World Model: Predicting What Matters for Robust Robot Policy Learning | Yunfan Lou et.al. | 2604.19683 | null |
| 2026-04-21 | Safety-Critical Contextual Control via Online Riemannian Optimization with World Models | Tongxin Li et.al. | 2604.19639 | null |
| 2026-04-21 | LASER: Learning Active Sensing for Continuum Field Reconstruction | Huayu Deng et.al. | 2604.19355 | null |
| 2026-04-21 | RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation | Feng Jiang et.al. | 2604.19092 | null |
| 2026-04-20 | Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training | Vin Bhaskara et.al. | 2604.18701 | null |
| 2026-04-21 | MultiWorld: Scalable Multi-Agent Multi-View Video World Models | Haoyu Wu et.al. | 2604.18564 | null |
| 2026-04-20 | OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation | Jinghui Lu et.al. | 2604.18486 | null |
| 2026-04-20 | Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity | Blaise Delaney et.al. | 2604.18058 | null |
| 2026-04-20 | The Umwelt Representation Hypothesis: Rethinking Universality | Victoria Bosch et.al. | 2604.17960 | null |
| 2026-04-20 | Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer | Tianfu Wang et.al. | 2604.17883 | null |
| 2026-04-19 | Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception | Siyuan Meng et.al. | 2604.17651 | null |
| 2026-04-19 | Dual-Anchoring: Addressing State Drift in Vision-Language Navigation | Kangyi Wu et.al. | 2604.17473 | null |
| 2026-04-19 | Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation | Zhijiang Tang et.al. | 2604.17428 | null |
| 2026-04-19 | DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior | Junjia Huang et.al. | 2604.17195 | null |
| 2026-04-18 | TensorHub: Rethinking AI Model Hub with Tensor-Centric Compression | Tingfeng Lan et.al. | 2604.17104 | null |
| 2026-04-18 | LIVE: Leveraging Image Manipulation Priors for Instruction-based Video Editing | Weicheng Wang et.al. | 2604.17021 | null |
| 2026-04-18 | SafeDream: Safety World Model for Proactive Early Jailbreak Detection | Bo Yan et.al. | 2604.16824 | null |
| 2026-04-16 | POMDP-based Object Search with Growing State Space and Hybrid Action Domain | Yongbo Chen et.al. | 2604.14965 | null |
| 2026-04-16 | Learning Ad Hoc Network Dynamics via Graph-Structured World Models | Can Karacelebi et.al. | 2604.14811 | null |
| 2026-04-16 | World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems | Runze Li et.al. | 2604.14732 | null |
| 2026-04-15 | HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds | Team HY-World et.al. | 2604.14268 | null |
| 2026-04-15 | Seedance 2.0: Advancing Video Generation for World Complexity | Team Seedance et.al. | 2604.14148 | null |
| 2026-04-15 | Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective | Weijie Wang et.al. | 2604.14025 | null |
| 2026-04-15 | Beyond State Consistency: Behavior Consistency in Text-Based World Models | Youling Huang et.al. | 2604.13824 | null |
| 2026-04-15 | Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap | Hanxuan Chen et.al. | 2604.13654 | null |
| 2026-04-15 | DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer | Hengye Lyu et.al. | 2604.13509 | null |
| 2026-04-15 | VibeFlow: Versatile Video Chroma-Lux Editing through Self-Supervised Learning | Yifan Li et.al. | 2604.13425 | null |
| 2026-04-14 | Robotic Manipulation is Vision-to-Geometry Mapping ( $f(v) \rightarrow G$ ): Vision-Geometry Backbones over Language and Video Models | Zijian Song et.al. | 2604.12908 | null |
| 2026-04-14 | ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models | Xinliang Wang et.al. | 2604.12251 | null |
| 2026-04-13 | Grounded World Model for Semantically Generalizable Planning | Quanyi Li et.al. | 2604.11751 | null |
| 2026-04-13 | Dyadic Partnership(DP): A Missing Link Towards Full Autonomy in Medical Robotics | Nassir Navab et.al. | 2604.11423 | null |
| 2026-04-13 | ComSim: Building Scalable Real-World Robot Data Generation via Compositional Simulation | Yiran Qin et.al. | 2604.11386 | null |
| 2026-04-13 | WM-DAgger: Enabling Efficient Data Aggregation for Imitation Learning with World Models | Anlan Yu et.al. | 2604.11351 | null |
| 2026-04-13 | 3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS | Bronislav Sidik et.al. | 2604.11302 | null |
| 2026-04-13 | AIM: Intent-Aware Unified world action Modeling with Spatial Value Maps | Liaoyuan Fan et.al. | 2604.11135 | null |
| 2026-04-13 | From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience | Jia Luo et.al. | 2604.11041 | null |
| 2026-04-13 | OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models | Xiaomeng Hu et.al. | 2604.10866 | null |
| 2026-04-12 | Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks | Weijiang Li et.al. | 2604.10690 | null |
| 2026-04-11 | Zero-shot World Models Are Developmentally Efficient Learners | Khai Loong Aw et.al. | 2604.10333 | null |
| 2026-04-11 | VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation | Longteng Jiang et.al. | 2604.10127 | null |
| 2026-04-10 | EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks | Lulin Liu et.al. | 2604.09535 | null |
| 2026-04-10 | Toward World Models for Epidemiology | Zeeshan Memon et.al. | 2604.09519 | null |
| 2026-04-10 | PhysInOne: Visual Physics Learning and Reasoning in One Suite | Siyuan Zhou et.al. | 2604.09415 | null |
| 2026-04-10 | VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis | Xiaolei Lang et.al. | 2604.09330 | null |
| 2026-04-10 | Learning Vision-Language-Action World Models for Autonomous Driving | Guoqing Wang et.al. | 2604.09059 | null |
| 2026-04-10 | Advantage-Guided Diffusion for Model-Based Reinforcement Learning | Daniele Foffano et.al. | 2604.09035 | null |
| 2026-04-10 | Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory | Zile Wang et.al. | 2604.08995 | null |
| 2026-04-10 | WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning | Mintae Kim et.al. | 2604.08958 | null |
| 2026-04-10 | Multi-Agent Decision-Focused Learning via Value-Aware Sequential Communication | Benjamin Amoh et.al. | 2604.08944 | null |
| 2026-04-09 | Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning | Mohamad H. Danesh et.al. | 2604.08780 | null |
| 2026-04-09 | Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics | Ying Shen et.al. | 2604.08503 | null |
| 2026-04-09 | Grounding Clinical AI Competency in Human Cognition Through the Clinical World Model and Skill-Mix Framework | Seyed Amir Ahmad Safavi-Naini et.al. | 2604.08226 | null |
| 2026-04-09 | Beyond Static Forecasting: Unleashing the Power of World Models for Mobile Traffic Extrapolation | Xiaoqian Qi et.al. | 2604.08199 | null |
| 2026-04-09 | ViVa: A Video-Generative Value Model for Robot Reinforcement Learning | Jindi Lv et.al. | 2604.08168 | null |
| 2026-04-09 | MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models | Zile Guo et.al. | 2604.07991 | null |
| 2026-04-09 | WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models | Hongjin Chen et.al. | 2604.07957 | null |
| 2026-04-09 | DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics | Hang Zhang et.al. | 2604.07758 | null |
| 2026-04-09 | CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics | Ziyi Ding et.al. | 2604.07712 | null |
| 2026-04-08 | Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations | Chao Tang et.al. | 2604.07517 | null |
| 2026-04-08 | GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control | Prakul Sunil Hiremath et.al. | 2604.07426 | null |
| 2026-04-08 | How Much LLM Does a Self-Revising Agent Actually Need? | Seongwoo Jeong et.al. | 2604.07236 | null |
| 2026-04-08 | PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing | Ruihang Xu et.al. | 2604.07230 | null |
| 2026-04-08 | INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling | InSpatio Team et.al. | 2604.07209 | null |
| 2026-04-08 | Radio-Frequency Inverse Rendering for Wireless Environment Modeling | Fuhai Wang et.al. | 2604.07086 | null |
| 2026-04-08 | Telecom World Models: Unifying Digital Twins, Foundation Models, and Predictive Planning for 6G | Hang Zou et.al. | 2604.06882 | null |
| 2026-04-08 | The Rhetoric of Machine Learning | Robert C. Williamson et.al. | 2604.06754 | null |
| 2026-04-08 | Controllable Generative Video Compression | Ding Ding et.al. | 2604.06655 | null |
| 2026-04-07 | Neural Computers | Mingchen Zhuge et.al. | 2604.06425 | null |
| 2026-04-07 | Evolution of Video Generative Foundations | Teng Hu et.al. | 2604.06339 | null |
| 2026-04-07 | Action Images: End-to-End Policy Learning via Multiview Video Generation | Haoyu Zhen et.al. | 2604.06168 | null |
| 2026-04-07 | Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement | Qimin Zhong et.al. | 2604.06155 | null |
| 2026-04-07 | SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation | Hiba Dahmani et.al. | 2604.06113 | null |
| 2026-04-06 | Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding | Chaoyou Fu et.al. | 2604.05015 | null |
| 2026-04-06 | StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing | StarVLA Community et.al. | 2604.05014 | null |
| 2026-04-06 | A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens | Tommie Kerssies et.al. | 2604.04913 | null |
| 2026-04-06 | Individual and Combined Effects of English as a Second Language and Typos on LLM Performance | Serena Liu et.al. | 2604.04723 | null |
| 2026-04-06 | OpenWorldLib: A Unified Codebase and Definition of Advanced World Models | DataFlow Team et.al. | 2604.04707 | null |
| 2026-04-06 | Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale | Zhengcen Li et.al. | 2604.04634 | null |
| 2026-04-06 | Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation? | Zhongru Zhang et.al. | 2604.04502 | null |
| 2026-04-06 | UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining | Pei Yang et.al. | 2604.04402 | null |
| 2026-04-05 | DriveVA: Video Action Models are Zero-Shot Drivers | Mengmeng Liu et.al. | 2604.04198 | null |
| 2026-04-05 | ATSS: Detecting AI-Generated Videos via Anomalous Temporal Self-Similarity | Hang Wang et.al. | 2604.04029 | null |
| 2026-04-04 | Rethinking Position Embedding as a Context Controller for Multi-Reference and Multi-Shot Video Generation | Binyuan Huang et.al. | 2604.03738 | null |
| 2026-04-04 | VidNum-1.4K: A Comprehensive Benchmark for Video-based Numerical Reasoning | Shaoyang Cui et.al. | 2604.03701 | null |
embodied AI
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-05-04 | Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis | Jiaqi Shi et.al. | 2605.02357 | null |
| 2026-05-03 | Embody4D: A Generalist 4D World Model for Embodied AI | Peiyan Tu et.al. | 2605.01799 | null |
| 2026-05-02 | ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue | Daoxuan Zhang et.al. | 2605.01371 | null |
| 2026-05-02 | VUDA: Breaking CUDA-Vulkan Isolation for Spatial Sharing of Compute and Graphics on the Same GPU | Bin Xu et.al. | 2605.01352 | null |
| 2026-05-01 | Split and Aggregation Learning for Foundation Models Over Mobile Embodied AI Network (MEAN): A Comprehensive Survey | Qianzhou Chen et.al. | 2605.00970 | null |
| 2026-05-01 | Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning | Chengshuai Shi et.al. | 2605.00347 | null |
| 2026-04-30 | World Model for Robot Learning: A Comprehensive Survey | Bohan Hou et.al. | 2605.00080 | null |
| 2026-04-30 | Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents | Chunhui Zhang et.al. | 2604.27699 | null |
| 2026-04-30 | Robot Learning from Human Videos: A Survey | Junyi Ma et.al. | 2604.27621 | null |
| 2026-04-30 | SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation | Pengna Li et.al. | 2604.27620 | null |
| 2026-04-30 | World2Minecraft: Occupancy-Driven Simulated Scenes Construction | Lechao Zhang et.al. | 2604.27578 | null |
| 2026-04-30 | SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation | Song Tang et.al. | 2604.27555 | null |
| 2026-04-30 | Context as Prior: Bayesian-Inspired Intent Inference for Non-Speaking Agents with a Household Cat Testbed | Wenqian Zhang et.al. | 2604.27445 | null |
| 2026-04-29 | 3D Generation for Embodied AI and Robotic Simulation: A Survey | Tianwei Ye et.al. | 2604.26509 | null |
| 2026-04-29 | Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding | Yufei Yin et.al. | 2604.26261 | null |
| 2026-04-28 | Lifting Embodied World Models for Planning and Control | Alex N. Wang et.al. | 2604.26182 | null |
| 2026-04-28 | GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning | Yufei Jia et.al. | 2604.25459 | null |
| 2026-04-28 | Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents | Jianming Chen et.al. | 2604.25161 | null |
| 2026-04-27 | Interoceptive machine framework: Toward interoception-inspired regulatory architectures in artificial intelligence | Diego Candia-Rivera et.al. | 2604.24527 | null |
| 2026-04-27 | AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents | Hojoon Kim et.al. | 2604.24039 | null |
| 2026-04-26 | From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation | Jiafeng Wu et.al. | 2604.23629 | null |
| 2026-04-26 | PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement | Tianyidan Xie et.al. | 2604.23580 | null |
| 2026-04-24 | AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI | Mohammad Sadegh Salehi et.al. | 2604.23018 | null |
| 2026-04-22 | EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving | Finn Rasmus Schäfer et.al. | 2604.22851 | null |
| 2026-04-27 | A Co-Evolutionary Theory of Human-AI Coexistence: Mutualism, Governance, and Dynamics in Complex Societies | Somyajit Chakraborty et.al. | 2604.22227 | null |
| 2026-04-23 | A Replicable Robotics Awareness Method Using LLM-Enabled Robotics Interaction: Evidence from a Corporate Challenge | S. A. Prieto et.al. | 2604.21377 | null |
| 2026-04-23 | ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures | Xiyin Zeng et.al. | 2604.21232 | null |
| 2026-04-23 | Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment | Jingkun Chen et.al. | 2604.21160 | null |
| 2026-04-22 | Planetary Exploration 3.0: A Roadmap for Software-Defined, Radically Adaptive Space Systems | Masahiro Ono et.al. | 2604.20910 | null |
| 2026-04-22 | LLM-Guided Safety Agent for Edge Robotics with an ISO-Compliant Perception-Compute-Control Architecture | Xu Huang et.al. | 2604.20193 | null |
| 2026-04-21 | Environmental Understanding Vision-Language Model for Embodied Agent | Jinsik Bang et.al. | 2604.19839 | null |
| 2026-04-21 | InHabit: Leveraging Image Foundation Models for Scalable 3D Human Placement | Nikita Kister et.al. | 2604.19673 | null |
| 2026-04-21 | SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models | Josue Torres-Fonseca et.al. | 2604.19638 | null |
| 2026-04-21 | RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation | Feng Jiang et.al. | 2604.19092 | null |
| 2026-04-21 | Explore Like Humans: Autonomous Exploration with Online SG-Memo Construction for Embodied Agents | Xu Chen et.al. | 2604.19034 | null |
| 2026-04-20 | Will People Enjoy a Robot Trainer? A Case Study with Snoopie the Pacerbot | Maximilian Du et.al. | 2604.18331 | null |
| 2026-04-20 | EmbodiedLGR: Integrating Lightweight Graph Representation and Retrieval for Semantic-Spatial Memory in Robotic Agents | Paolo Riva et.al. | 2604.18271 | null |
| 2026-04-20 | E3VS-Bench: A Benchmark for Viewpoint-Dependent Active Perception in 3D Gaussian Splatting Scenes | Koya Sakamoto et.al. | 2604.17969 | null |
| 2026-04-20 | StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement | Kerui Li et.al. | 2604.17887 | null |
| 2026-04-20 | OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL | Haoxiang Jie et.al. | 2604.17706 | null |
| 2026-04-19 | Seeing Isn’t Believing: Mitigating Belief Inertia via Active Intervention in Embodied Agents | Hanlin Wang et.al. | 2604.17252 | null |
| 2026-04-19 | GaLa: Hypergraph-Guided Visual Language Models for Procedural Planning | Kun Wang et.al. | 2604.17241 | null |
| 2026-04-18 | Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents | Sukai Huang et.al. | 2604.17019 | null |
| 2026-04-18 | Rule-VLN: Bridging Perception and Compliance via Semantic Reasoning and Geometric Rectification | Jiawen Wen et.al. | 2604.16993 | null |
| 2026-04-18 | Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction | Xianhao Wang et.al. | 2604.16886 | null |
| 2026-04-16 | GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology | Shivendra Agrawal et.al. | 2604.15495 | null |
| 2026-04-20 | ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints | Pei-An Chen et.al. | 2604.14902 | null |
| 2026-04-16 | World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems | Runze Li et.al. | 2604.14732 | null |
| 2026-04-16 | Model-Based Reinforcement Learning Exploits Passive Body Dynamics for High-Performance Biped Robot Locomotion | Tomoya Kamimura et.al. | 2604.14565 | null |
| 2026-04-15 | SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing | Aodi Wu et.al. | 2604.14399 | null |
| 2026-04-15 | [Emerging Ideas] Artificial Tripartite Intelligence: A Bio-Inspired, Sensor-First Architecture for Physical AI | You Rim Choi et.al. | 2604.13959 | null |
| 2026-04-15 | EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development | Xueyang Zhou et.al. | 2604.13800 | null |
| 2026-04-15 | ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation | Jingjing Qian et.al. | 2604.13633 | null |
| 2026-04-16 | VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation | Yulu Gao et.al. | 2604.13596 | null |
| 2026-04-15 | AgentComm: Semantic Communication for Embodied Agents | Peiwen Jiang et.al. | 2604.13558 | null |
| 2026-04-15 | Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization | Jianzong Wang et.al. | 2604.13533 | null |
| 2026-04-14 | Exploration and Exploitation Errors Are Measurable for Language Model Agents | Jaden Park et.al. | 2604.13151 | null |
| 2026-04-14 | Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting | Ziyuan Xia et.al. | 2604.12626 | null |
| 2026-04-15 | Reading Between the Pixels: Linking Text-Image Embedding Alignment to Typographic Attack Success on Vision-Language Models | Ravikumar Balakrishnan et.al. | 2604.12371 | null |
| 2026-04-13 | Human-Inspired Context-Selective Multimodal Memory for Social Robots | Hangyeol Kang et.al. | 2604.12081 | null |
| 2026-04-13 | GeomPrompt: Geometric Prompt Learning for RGB-D Semantic Segmentation Under Missing and Degraded Depth | Krishna Jaganathan et.al. | 2604.11585 | null |
| 2026-04-13 | DA-PTQ: Drift-Aware Post-Training Quantization for Efficient Vision-Language-Action Models | Siyuan Xu et.al. | 2604.11572 | null |
| 2026-04-13 | Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech | Edwin C. Montiel-Vazquez et.al. | 2604.11417 | null |
| 2026-04-13 | EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems | Xue Qin et.al. | 2604.11174 | null |
| 2026-04-13 | EgoFun3D: Modeling Interactive Objects from Egocentric Videos using Function Templates | Weikun Peng et.al. | 2604.11038 | null |
| 2026-04-13 | Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation | Xue Qin et.al. | 2604.11028 | null |
| 2026-04-14 | ArtiCAD: Articulated CAD Assembly Design via Multi-Agent Code Generation | Yuan Shui et.al. | 2604.10992 | null |
| 2026-04-13 | ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching | Xiaotian Qiu et.al. | 2604.10962 | null |
| 2026-04-12 | ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment | Mingyu Dong et.al. | 2604.10789 | null |
| 2026-04-12 | HOG-Layout: Hierarchical 3D Scene Generation, Optimization and Editing via Vision-Language Models | Haiyan Jiang et.al. | 2604.10772 | null |
| 2026-04-10 | PhysInOne: Visual Physics Learning and Reasoning in One Suite | Siyuan Zhou et.al. | 2604.09415 | null |
| 2026-04-10 | V-CAGE: Vision-Closed-Loop Agentic Generation Engine for Robotic Manipulation | Yaru Liu et.al. | 2604.09036 | null |
| 2026-04-10 | PilotBench: A Benchmark for General Aviation Agents with Safety Constraints | Yalun Wu et.al. | 2604.08987 | null |
| 2026-04-10 | AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly | Zhi Jing et.al. | 2604.08983 | null |
| 2026-04-09 | AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation | Yi-Hua Huang et.al. | 2604.08746 | null |
| 2026-04-09 | 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding | Makanjuola Ogunleye et.al. | 2604.08645 | null |
| 2026-04-09 | Visually-grounded Humanoid Agents | Hang Ye et.al. | 2604.08509 | null |
| 2026-04-10 | PolySLGen: Online Multimodal Speaking-Listening Reaction Generation in Polyadic Interaction | Zhi-Yi Lin et.al. | 2604.08125 | null |
| 2026-04-10 | Governed Capability Evolution for Embodied Agents: Safe Upgrade, Compatibility Checking, and Runtime Rollback for Embodied Capability Modules | Xue Qin et.al. | 2604.08059 | null |
| 2026-04-09 | DP-DeGauss: Dynamic Probabilistic Gaussian Decomposition for Egocentric 4D Scene Reconstruction | Tingxi Chen et.al. | 2604.07986 | null |
| 2026-04-09 | PanoSAM2: Lightweight Distortion- and Memory-aware Adaptions of SAM2 for 360 Video Object Segmentation | Dingwen Xiao et.al. | 2604.07901 | null |
| 2026-04-09 | Object-Attribute-Relation Model Driven Adaptive Hierarchical Transmission for Multimodal Semantic Communication | Chenxing Li et.al. | 2604.07859 | null |
| 2026-04-09 | Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution | Xue Qin et.al. | 2604.07833 | null |
| 2026-04-09 | Learning Without Losing Identity: Capability Evolution for Embodied Agents | Xue Qin et.al. | 2604.07799 | null |
| 2026-04-09 | DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics | Hang Zhang et.al. | 2604.07758 | null |
| 2026-04-08 | Spatio-Temporal Grounding of Large Language Models from Perception Streams | Jacob Anderson et.al. | 2604.07592 | null |
| 2026-04-08 | Infrastructure First: Enabling Embodied AI for Science in the Global South | Shaoshan Liu et.al. | 2604.06722 | null |
| 2026-04-07 | Hazard Management in Robot-Assisted Mammography Support | Ioannis Stefanakos et.al. | 2604.05749 | null |
| 2026-04-07 | Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation | Wuyang Luan et.al. | 2604.05673 | null |
| 2026-04-07 | Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming | Baoshun Tong et.al. | 2604.05595 | null |
| 2026-04-07 | CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment | Li Kang et.al. | 2604.05484 | null |
| 2026-04-06 | StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing | StarVLA Community et.al. | 2604.05014 | null |
| 2026-04-06 | InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement | Yude Zou et.al. | 2604.04843 | null |
| 2026-04-06 | Toward Self-Organizing Production Logistics in Circular Factories: A Multi-Agent Approach | Jan-Felix Klein et.al. | 2604.04753 | null |
| 2026-04-06 | ROSClaw: A Hierarchical Semantic-Physical Framework for Heterogeneous Multi-Agent Collaboration | Rongfeng Zhao et.al. | 2604.04664 | null |
| 2026-04-05 | Hypothesis Graph Refinement: Hypothesis-Driven Exploration with Cascade Error Correction for Embodied Navigation | Peixin Chen et.al. | 2604.04108 | null |
| 2026-04-04 | From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems | Mingyang Xie et.al. | 2604.03890 | null |
| 2026-04-03 | Learning Additively Compositional Latent Actions for Embodied AI | Hangxing Wei et.al. | 2604.03340 | null |
| 2026-04-03 | OMNI-PoseX: A Fast Vision Model for 6D Object Pose Estimation in Embodied Tasks | Michael Zhang et.al. | 2604.02759 | null |
| 2026-04-02 | Reliability-Aware Geometric Fusion for Robust Audio-Visual Navigation | Teng Liu et.al. | 2604.02391 | null |
| 2026-04-02 | Hi-LOAM: Hierarchical Implicit Neural Fields for LiDAR Odometry and Mapping | Zhiliu Yang et.al. | 2604.01720 | null |
| 2026-03-31 | Benchmarking Interaction, Beyond Policy: a Reproducible Benchmark for Collaborative Instance Object Navigation | Edoardo Zorzi et.al. | 2604.00265 | null |
image generation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-05-05 | Large Language Models are Universal Reasoners for Visual Generation | Sucheng Ren et.al. | 2605.04040 | null |
| 2026-05-05 | Flow Sampling: Learning to Sample from Unnormalized Densities via Denoising Conditional Processes | Aaron Havens et.al. | 2605.03984 | null |
| 2026-05-05 | DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models | Qichao Wang et.al. | 2605.03877 | null |
| 2026-05-05 | Phase-Corrected Near-Field Microwave Imaging via Inverse Source Reconstruction with Modulated Signals | Quanfeng Wang et.al. | 2605.03875 | null |
| 2026-05-05 | Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation | Bin Wu et.al. | 2605.03849 | null |
| 2026-05-05 | Towards accurate extreme event likelihoods from diffusion model climate emulators | Peter Manshausen et.al. | 2605.03802 | null |
| 2026-05-05 | GeoTopoDiff: Learning Geometry–Topology Graph Priors through Boundary-Constrained Mixed Diffusion for Sparse-Slice 3D Porous Reconstruction | Yue Shi et.al. | 2605.03764 | null |
| 2026-05-05 | Agent-Based Modeling of Low-Emission Fertilizer Adoption for Dairy Farm Decarbonisation using Empirical Farm Data | Surya Jayakumar et.al. | 2605.03648 | null |
| 2026-05-05 | Diffusion Masked Pretraining for Dynamic Point Cloud | Zhuoyue Zhang et.al. | 2605.03639 | null |
| 2026-05-05 | Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing | Zhiyuan Li et.al. | 2605.03637 | null |
| 2026-05-04 | Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion | Amirhosein Javadi et.al. | 2605.02849 | null |
| 2026-05-04 | TOC-SR: Task-Optimal Compact diffusion for Image Super Resolution | Sowmya Vajrala et.al. | 2605.02767 | null |
| 2026-05-04 | SIAM: Head and Brain MRI Segmentation from Few High-Quality Templates via Synthetic Training | Romain Valabregue et.al. | 2605.02737 | null |
| 2026-05-04 | Stylistic Attribute Control in Latent Diffusion Models | Max Reimann et.al. | 2605.02583 | null |
| 2026-05-04 | MooD: An Efficient VA-Driven Affective Image Editing Framework via Fine-Grained Semantic Control | Xinyi Yin et.al. | 2605.02521 | null |
| 2026-05-04 | Anomaly-Preference Image Generation | Fuyun Wang et.al. | 2605.02439 | null |
| 2026-05-04 | DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing | Desong Yang et.al. | 2605.02417 | null |
| 2026-05-04 | DriftDecode: One-Step Wireless Image Decoding via Drifting-Inspired Detail Recovery | Jingwen Fu et.al. | 2605.02325 | null |
| 2026-05-04 | Anon: Extrapolating Optimizer Adaptivity Across the Real Spectrum | Yiheng Zhang et.al. | 2605.02317 | null |
| 2026-05-04 | A Hybrid Approach for Closing the Sim2real Appearance Gap in Game Engine Synthetic Datasets | Stefanos Pasios et.al. | 2605.02291 | null |
| 2026-05-01 | Repurposing Image Diffusion Models for Adversarial Synthetic Structured Data: A Case Study of Ground Truth Drift | Adam Arthur et.al. | 2605.00788 | null |
| 2026-05-01 | Reconstruction of glymphatic transport fields from subject-specific imaging data, with particular emphasis on cerebrospinal fluid flow and tracer conservation | A. Derya Bakiler et.al. | 2605.00730 | null |
| 2026-05-01 | PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning | Guandong Li et.al. | 2605.00707 | null |
| 2026-05-01 | STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack | Xutao Mao et.al. | 2605.00699 | null |
| 2026-05-01 | UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors | Houyuan Chen et.al. | 2605.00658 | null |
| 2026-05-01 | Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors | Hao Wei et.al. | 2605.00605 | null |
| 2026-05-01 | Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation | Nadav Z. Cohen et.al. | 2605.00548 | null |
| 2026-05-01 | End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer | Wenda Chu et.al. | 2605.00503 | null |
| 2026-05-01 | Trees to Flows and Back: Unifying Decision Trees and Diffusion Models | Sai Niranjan Ramachandran et.al. | 2605.00414 | null |
| 2026-05-01 | Binomial flows: Denoising and flow matching for discrete ordinal data | Yair Shenfeld et.al. | 2605.00360 | null |
| 2026-04-30 | PhyCo: Learning Controllable Physical Priors for Generative Motion | Sriram Narayanan et.al. | 2604.28169 | null |
| 2026-04-29 | AdvDMD: Adversarial Reward Meets DMD For High-Quality Few-Step Generation | Xu Wang et.al. | 2604.28126 | null |
| 2026-04-30 | From LLM-Driven Trading Card Generation to Procedural Relatedness: A Pokémon Case Study | Johannes Pfau et.al. | 2604.27972 | null |
| 2026-04-30 | Diffusion-OAMP for Joint Image Compression and Wireless Transmission | Wentao Hou et.al. | 2604.27952 | null |
| 2026-04-30 | Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection | Ali Shibli et.al. | 2604.27889 | null |
| 2026-04-30 | Machine Unlearning for Class Removal through SISA-based Deep Neural Network Architectures | Ishrak Hamim Mahi et.al. | 2604.27804 | null |
| 2026-04-30 | Leveraging Verifier-Based Reinforcement Learning in Image Editing | Hanzhong Guo et.al. | 2604.27505 | null |
| 2026-04-30 | Electrothermal Dynamics of Cold Front in Impure Tokamak Plasmas | S. Oshiro et.al. | 2604.27444 | null |
| 2026-04-30 | ABC: Any-Subset Autoregression via Non-Markovian Diffusion Bridges in Continuous Time and Space | Gabe Guo et.al. | 2604.27443 | null |
| 2026-04-30 | Sparse-View 3D Gaussian Splatting in the Wild | Wongi Park et.al. | 2604.27422 | null |
| 2026-04-29 | SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset | Changhyun Roh et.al. | 2604.26883 | null |
| 2026-04-29 | Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Data | Bao Pham et.al. | 2604.26841 | null |
| 2026-04-29 | Conditional diffusion denoising probabilistic model for super-resolution of atmospheric boundary layer large eddy simulation | Omar Sallam et.al. | 2604.26776 | null |
| 2026-04-29 | Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising | Jun Guo et.al. | 2604.26694 | null |
| 2026-04-29 | Delta Score Matters! Spatial Adaptive Multi Guidance in Diffusion Models | Haosen Li et.al. | 2604.26503 | null |
| 2026-04-29 | Probabilistic data quality assessment for structural monitoring data via outlier-resistant conditional diffusion model | Qi Li et.al. | 2604.26366 | null |
| 2026-04-29 | Beyond Fixed Formulas: Data-Driven Linear Predictor for Efficient Diffusion Models | Zhirong Shen et.al. | 2604.26365 | null |
| 2026-04-29 | ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance | Yang Yang et.al. | 2604.26348 | null |
| 2026-04-29 | SpatialFusion: Endowing Unified Image Generation with Intrinsic 3D Geometric Awareness | Haiyi Qiu et.al. | 2604.26341 | null |
| 2026-04-28 | Charge diffusion and modulation transfer function in a Nancy Grace Roman Space Telescope detector | Emily Macbeth et.al. | 2604.26114 | null |
| 2026-04-28 | DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing | Hanqing Yang et.al. | 2604.25477 | null |
| 2026-04-28 | A Systematic Post-Train Framework for Video Generation | Zeyue Xue et.al. | 2604.25427 | null |
| 2026-04-28 | Benchmarking Layout-Guided Diffusion Models through Unified Semantic-Spatial Evaluation in Closed and Open Settings | Luca Parolari et.al. | 2604.25358 | null |
| 2026-04-28 | Edge-Cloud Collaborative Reconstruction via Structure-Aware Latent Diffusion for Downstream Remote Sensing Perception | Yun Li et.al. | 2604.25319 | null |
| 2026-04-28 | Golden RPG: Confidence-Adaptive Region-Aware Noise for Compositional Text-to-Image Generation | Hao Li et.al. | 2604.25314 | null |
| 2026-04-28 | The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents | Yuwei Sun et.al. | 2604.25299 | null |
| 2026-04-28 | Exploring Time Conditioning in Diffusion Generative Models from Disjoint Noisy Data Manifolds | Liuzhuozheng Li et.al. | 2604.25289 | null |
| 2026-04-28 | ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent | Hanyi Wang et.al. | 2604.25128 | null |
| 2026-04-27 | Generative diffusion models for spatiotemporal influenza forecasting | Joseph Lemaitre et.al. | 2604.24913 | null |
| 2026-04-27 | VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations | Maitreya Patel et.al. | 2604.24885 | null |
| 2026-04-27 | DiffQEC: A versatile diffusion model for quantum error correction | Tianyi Xu et.al. | 2604.24640 | null |
| 2026-04-27 | Meta-CoT: Enhancing Granularity and Generalization in Image Editing | Shiyi Zhang et.al. | 2604.24625 | null |
| 2026-04-27 | Diffusion Model as a Generalist Segmentation Learner | Haoxiao Wang et.al. | 2604.24575 | null |
| 2026-04-27 | CA-IDD: Cross-Attention Guided Identity-Conditional Diffusion for Identity-Consistent Face Swapping | Md Shohel Rana et.al. | 2604.24493 | null |
| 2026-04-27 | Guiding Vector Field Generation via Score-based Diffusion Model | Zirui Chen et.al. | 2604.24487 | null |
| 2026-04-27 | TextGround4M: A Prompt-Aligned Dataset for Layout-Aware Text Rendering | Dongxing Mao et.al. | 2604.24459 | null |
| 2026-04-27 | Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion | Zhongjie Duan et.al. | 2604.24351 | null |
| 2026-04-27 | GeoEdit: Local Frames for Fast, Training-Free On-Manifold Editing in Diffusion Models | Yiming Zhang et.al. | 2604.24238 | null |
| 2026-04-27 | Seeing Is No Longer Believing: Frontier Image Generation Models, Synthetic Visual Evidence, and Real-World Risk | Shuai Wu et.al. | 2604.24197 | null |
| 2026-04-27 | Bridging Restoration and Generation Manifolds in One-Step Diffusion for Real-World Super-Resolution | Shyang-En Weng et.al. | 2604.24136 | null |
| 2026-04-24 | Statistical Analysis of Markovian Generative Modeling | Eddie Aamari et.al. | 2604.22712 | null |
| 2026-04-24 | Generative Modeling of Neurodegenerative Brain Anatomy with 4D Longitudinal Diffusion Model | Nivetha Jayakumar et.al. | 2604.22700 | null |
| 2026-04-24 | Structure-Guided Diffusion Model for EEG-Based Visual Cognition Reconstruction | Yongxiang Lian et.al. | 2604.22649 | null |
| 2026-04-24 | Efficient Diffusion Distillation via Embedding Loss | Jincheng Ying et.al. | 2604.22379 | null |
| 2026-04-24 | TabSCM: A practical Framework for Generating Realistic Tabular Data | Sven Jacob et.al. | 2604.22337 | null |
| 2026-04-24 | Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation | Ran Zhao et.al. | 2604.22302 | null |
| 2026-04-24 | Evaluation of image simulation open source solutions for simulation of synthetic images in lunar environment | Jai G Singla et.al. | 2604.22296 | null |
| 2026-04-24 | AI-Driven Performance-to-Design Generation and Optimization of Marine Propellers | Leah Chen et.al. | 2604.22224 | null |
| 2026-04-24 | Breaking Watermarks in the Frequency Domain: A Modulated Diffusion Attack Framework | Chunpeng Wang et.al. | 2604.22220 | null |
| 2026-04-24 | Multimodal Diffusion to Mutually Enhance Polarized Light and Low Resolution EBSD Data | Harry Dong et.al. | 2604.22212 | null |
| 2026-04-23 | VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis | Songen Gu et.al. | 2604.21914 | null |
| 2026-04-23 | UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection | Yanran Zhang et.al. | 2604.21904 | null |
| 2026-04-23 | A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models | Max Defez et.al. | 2604.21903 | null |
| 2026-04-23 | Causality-Encoded Diffusion Models for Interventional Sampling and Edge Inference | Li Chen et.al. | 2604.21843 | null |
| 2026-04-23 | Quotient-Space Diffusion Models | Yixian Xu et.al. | 2604.21809 | null |
| 2026-04-23 | DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion | Tahar Chettaoui et.al. | 2604.21627 | null |
| 2026-04-23 | Generative Learning Enhanced Intelligent Resource Management for Cell-Free Delay Deterministic Communications | Shuangbo Xiong et.al. | 2604.21587 | null |
| 2026-04-23 | DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction | Shiyan Su et.al. | 2604.21518 | null |
| 2026-04-23 | VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution | Yixuan Zhu et.al. | 2604.21450 | null |
| 2026-04-23 | TopoStyle: Supporting Iterative Design with Generative AI for 2.5D Topology Optimization | Shuyue Feng et.al. | 2604.21315 | null |
| 2026-04-22 | ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control | Shelly Golan et.al. | 2604.20816 | null |
| 2026-04-22 | LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model | Inclusion AI et.al. | 2604.20796 | null |
| 2026-04-22 | Geometric Renyi Differential Privacy: Ricci Curvature Characterized by Heat Diffusion Mechanisms | Xiaotian Chang et.al. | 2604.20761 | null |
| 2026-04-22 | GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers | Yuxuan Xue et.al. | 2604.20715 | null |
| 2026-04-22 | Physics-Informed Conditional Diffusion for Motion-Robust Retinal Temporal Laser Speckle Contrast Imaging | Qian Chen et.al. | 2604.20594 | null |
| 2026-04-22 | Exploring Spatial Intelligence from a Generative Perspective | Muzhi Zhu et.al. | 2604.20570 | null |
| 2026-04-22 | Near-Field Wideband Channel Estimation for XL-MIMO Systems via Denoising Diffusion Model | Qingxia Feng et.al. | 2604.20494 | null |
| 2026-04-22 | Conditional Monte Carlo Tree Diffusion for Designing Cell-Type-Specific and Biologically Faithful Regulatory DNA | Animesh Awasthi et.al. | 2604.20488 | null |
| 2026-04-22 | Discrete Preference Learning for Personalized Multimodal Generation | Yuting Zhang et.al. | 2604.20434 | null |
| 2026-04-22 | Cold-Start Forecasting of New Product Life-Cycles via Conditional Diffusion Models | Ruihan Zhou et.al. | 2604.20370 | null |
| 2026-04-21 | Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items | Mengting Chen et.al. | 2604.19748 | null |
| 2026-04-21 | AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model | Yutian Chen et.al. | 2604.19747 | null |
| 2026-04-21 | Generative Drifting for Conditional Medical Image Generation | Zirong Li et.al. | 2604.19736 | null |
| 2026-04-21 | ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis | Zhengwentai Sun et.al. | 2604.19720 | null |
| 2026-04-21 | MedFlowSeg: Flow Matching for Medical Image Segmentation with Frequency-Aware Attention | Zhi Chen et.al. | 2604.19675 | null |
| 2026-04-21 | InHabit: Leveraging Image Foundation Models for Scalable 3D Human Placement | Nikita Kister et.al. | 2604.19673 | null |
| 2026-04-21 | Budgeted Online Influence Maximization | Pierre Perrault et.al. | 2604.19672 | null |
| 2026-04-21 | Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teaming | Alex Cuellar et.al. | 2604.19670 | null |
| 2026-04-21 | CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation | Xiangyang Luo et.al. | 2604.19636 | null |
| 2026-04-21 | SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing | Ying Zeng et.al. | 2604.19587 | null |
| 2026-04-20 | PlankFormer: Robust Plankton Instance Segmentation via MAE-Pretrained Vision Transformers and Pseudo Community Image Generation | Masaharu Miyazaki et.al. | 2604.17856 | null |
| 2026-04-20 | UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement | Jingwei Yang et.al. | 2604.17850 | null |
| 2026-04-20 | Efficient Diffusion Models under Nonconvex Equality and Inequality constraints via Landing | Kijung Jeon et.al. | 2604.17838 | null |
| 2026-04-20 | AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion | Hongjie Li et.al. | 2604.17818 | null |
| 2026-04-20 | Optimally Bridging Semantics and Data: Generative Semantic Communication via Schrödinger Bridge | Dahua Gao et.al. | 2604.17802 | null |
| 2026-04-20 | Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement | Hongxu Jiang et.al. | 2604.17773 | null |
| 2026-04-20 | Grokking of Diffusion Models: Case Study on Modular Addition | Joon Hyeok Kim et.al. | 2604.17673 | null |
| 2026-04-19 | ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes | Honglin Chen et.al. | 2604.17623 | null |
| 2026-04-19 | DGSSM: Diffusion guided state-space models for multimodal salient object detection | Suklav Ghosh et.al. | 2604.17585 | null |
| 2026-04-19 | Target Parameterization in Diffusion Models for Nonlinear Spatiotemporal System Identification | Achraf El Messaoudi et.al. | 2604.17566 | null |
| 2026-04-17 | Repurposing 3D Generative Model for Autoregressive Layout Generation | Haoran Feng et.al. | 2604.16299 | null |
| 2026-04-17 | Enhancing Hazy Wildlife Imagery: AnimalHaze3k and IncepDehazeGan | Shivarth Rai et.al. | 2604.16284 | null |
| 2026-04-17 | Motion-Adapter: A Diffusion Model Adapter for Text-to-Motion Generation of Compound Actions | Yue Jiang et.al. | 2604.16135 | null |
| 2026-04-17 | Elucidating the SNR-t Bias of Diffusion Probabilistic Models | Meng Yu et.al. | 2604.16044 | null |
| 2026-04-17 | From Competition to Coopetition: Coopetitive Training-Free Image Editing Based on Text Guidance | Jinhao Shen et.al. | 2604.15948 | null |
| 2026-04-17 | Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions | Bo Zhao et.al. | 2604.15917 | null |
| 2026-04-17 | Efficient Video Diffusion Models: Advancements and Challenges | Shitong Shao et.al. | 2604.15911 | null |
| 2026-04-17 | Beyond Text Prompts: Precise Concept Erasure through Text-Image Collaboration | Jun Li et.al. | 2604.15829 | null |
| 2026-04-17 | Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction | Jingyuan Li et.al. | 2604.15694 | null |
| 2026-04-17 | CLIMB: Controllable Longitudinal Brain Image Generation using Mamba-based Latent Diffusion Model and Gaussian-aligned Autoencoder | Duy-Phuong Dao et.al. | 2604.15611 | null |
| 2026-04-16 | TokenLight: Precise Lighting Control in Images using Attribute Tokens | Sumit Chaturvedi et.al. | 2604.15310 | null |
| 2026-04-16 | An Analysis of Regularization and Fokker-Planck Residuals in Diffusion Models for Image Generation | Onno Niemann et.al. | 2604.15171 | null |
| 2026-04-16 | Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching | Aihua Li et.al. | 2604.15009 | null |
| 2026-04-16 | Diffusion Crossover: Defining Evolutionary Recombination in Diffusion Models via Noise Sequence Interpolation | Chisatao Kumada et.al. | 2604.14790 | null |
| 2026-04-16 | Constraint-based Pre-training: From Structured Constraints to Scalable Model Initialization | Fu Feng et.al. | 2604.14769 | null |
| 2026-04-16 | SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces | Rongchao Xu et.al. | 2604.14705 | null |
| 2026-04-16 | Mean Flow Policy Optimization | Xiaoyi Dong et.al. | 2604.14698 | null |
| 2026-04-16 | Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting | Inseok Jeon et.al. | 2604.14648 | null |
| 2026-04-16 | Uncertainty-aware Generative Learning Path Recommendation with Cognition-Adaptive Diffusion | Xiangrui Xiong et.al. | 2604.14613 | null |
| 2026-04-16 | Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models | Amir El-Ghoussani et.al. | 2604.14591 | null |
| 2026-04-15 | Diffusion Language Models for Speech Recognition | Davyd Naveriani et.al. | 2604.14001 | null |
| 2026-04-15 | Creo: From One-Shot Image Generation to Progressive, Co-Creative Ideation | Zoe De Simone et.al. | 2604.13956 | null |
| 2026-04-15 | ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding | Tianze Xia et.al. | 2604.13938 | null |
| 2026-04-15 | Three-dimensional photon transport in spinodal photocatalytic aerogels: how bicontinuous morphology controls kinetic rate constants | Renaud A. L. Vallée et.al. | 2604.13929 | null |
| 2026-04-15 | Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model | Shuyun Wang et.al. | 2604.13906 | null |
| 2026-04-15 | PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios | Zebei Tong et.al. | 2604.13863 | null |
| 2026-04-15 | DiffMagicFace: Identity Consistent Facial Editing of Real Videos | Huanghao Yin et.al. | 2604.13841 | null |
| 2026-04-15 | EMGFlow: Robust and Efficient Surface Electromyography Synthesis via Flow Matching | Boxuan Jiang et.al. | 2604.13685 | null |
| 2026-04-15 | Reconstruction of a 3D wireframe from a single line drawing via generative depth estimation | Elton Cao et.al. | 2604.13549 | null |
| 2026-04-15 | LEGO-MOF: Equivariant Latent Manipulation for Editable, Generative, and Optimizable MOF Design | Chaoran Zhang et.al. | 2604.13520 | null |
| 2026-04-14 | Generative Refinement Networks for Visual Synthesis | Jian Han et.al. | 2604.13030 | null |
| 2026-04-14 | Causal Diffusion Models for Counterfactual Outcome Distributions in Longitudinal Data | Farbod Alinezhad et.al. | 2604.12992 | null |
| 2026-04-14 | Turbulent pair dispersion with Stochastic Generative Diffusion Models | Andrei Pantea et.al. | 2604.12932 | null |
| 2026-04-14 | Transformer Based Machine Fault Detection From Audio Input | Kiran Voderhobli Holla et.al. | 2604.12733 | null |
| 2026-04-14 | OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner | Haoyang Jiang et.al. | 2604.12668 | null |
| 2026-04-14 | SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models | You Qin et.al. | 2604.12617 | null |
| 2026-04-14 | StructDiff: A Structure-Preserving and Spatially Controllable Diffusion Model for Single-Image Generation | Yinxi He et.al. | 2604.12575 | null |
| 2026-04-14 | T2I-BiasBench: A Multi-Metric Framework for Auditing Demographic and Cultural Bias in Text-to-Image Models | Nihal Jaiswal et.al. | 2604.12481 | null |
| 2026-04-14 | Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling | Zida Li et.al. | 2604.12446 | null |
| 2026-04-14 | Bridging the Micro–Macro Gap: Frequency-Aware Semantic Alignment for Image Manipulation Localization | Xiaojie Liang et.al. | 2604.12341 | null |
| 2026-04-13 | Diffusing diffusivity model with dichotomous noise | Dongho Lee et.al. | 2604.11800 | null |
| 2026-04-13 | LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling | Yuxin Chen et.al. | 2604.11748 | null |
| 2026-04-13 | On the Robustness of Watermarking for Autoregressive Image Generation | Andreas Müller et.al. | 2604.11720 | null |
| 2026-04-13 | Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction | Efstathios Karypidis et.al. | 2604.11707 | null |
| 2026-04-13 | Dual-Control Frequency-Aware Diffusion Model for Depth-Dependent Optical Microrobot Microscopy Image Generation | Lan Wei et.al. | 2604.11680 | null |
| 2026-04-13 | RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time | Haozhe Wang et.al. | 2604.11626 | null |
| 2026-04-13 | Progressively Texture-Aware Diffusion for Contrast-Enhanced Sparse-View CT | Tianqi Wang et.al. | 2604.11559 | null |
| 2026-04-13 | Continuous Adversarial Flow Models | Shanchuan Lin et.al. | 2604.11521 | null |
| 2026-04-13 | Anthropogenic Regional Adaptation in Multimodal Vision-Language Model | Samuel Cahyawijaya et.al. | 2604.11490 | null |
| 2026-04-13 | Degradation-Aware and Structure-Preserving Diffusion for Real-World Image Super-Resolution | Yang Ji et.al. | 2604.11470 | null |
| 2026-04-13 | One Scale at a Time: Scale-Autoregressive Modeling for Fluid Flow Distributions | Mario Lino et.al. | 2604.11403 | null |
| 2026-04-13 | DiLO: Decoupling Generative Priors and Neural Operators via Diffusion Latent Optimization for Inverse Problems | Haibo Liu et.al. | 2604.11375 | null |
| 2026-04-13 | Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale | Dongxu Wei et.al. | 2604.11331 | null |
| 2026-04-13 | Learning Discrete Diffusion of Graphs via Free-Energy Gradient Flows | Dario Rancati et.al. | 2604.11311 | null |
| 2026-04-13 | Structured State-Space Regularization for Compact and Generation-Friendly Image Tokenization | Jinsung Lee et.al. | 2604.11089 | null |
| 2026-04-13 | LaDA-Band: Language Diffusion Models for Vocal-to-Accompaniment Generation | Qi Wang et.al. | 2604.11052 | null |
| 2026-04-10 | Envisioning the Future, One Step at a Time | Stefan Andreas Baumann et.al. | 2604.09527 | null |
| 2026-04-10 | Gardening on the Moon: An Advection-Diffusion Model to Guide the Search for Supernova Debris in the Lunar Regolith | Emily S. Costello et.al. | 2604.09524 | null |
| 2026-04-10 | SCoRe: Clean Image Generation from Diffusion Models Trained on Noisy Images | Yuta Matsuzaki et.al. | 2604.09436 | null |
| 2026-04-10 | Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories | Wonbong Jang et.al. | 2604.09429 | null |
| 2026-04-10 | EGLOCE: Training-Free Energy-Guided Latent Optimization for Concept Erasure | Junyeong Ahn et.al. | 2604.09405 | null |
| 2026-04-10 | Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing | Zhuohan Ouyang et.al. | 2604.09386 | null |
| 2026-04-10 | Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation | Huiang He et.al. | 2604.09231 | null |
| 2026-04-10 | Training-free, Perceptually Consistent Low-Resolution Previews with High-Resolution Image for Efficient Workflows of Diffusion Models | Wongi Jeong et.al. | 2604.09227 | null |
| 2026-04-10 | SHIFT: Steering Hidden Intermediates in Flow Transformers | Nina Konovalova et.al. | 2604.09213 | null |
| 2026-04-10 | CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation | Haoyu Zhao et.al. | 2604.09201 | null |
| 2026-04-09 | When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models | Zhengyang Sun et.al. | 2604.08546 | null |
| 2026-04-09 | RewardFlow: Generate Images by Optimizing What You Reward | Onkar Susladkar et.al. | 2604.08536 | null |
| 2026-04-09 | Novel View Synthesis as Video Completion | Qi Wu et.al. | 2604.08500 | null |
| 2026-04-09 | LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation | Jingjing Wang et.al. | 2604.08475 | null |
| 2026-04-09 | Bias-Constrained Diffusion Schedules for PDE Emulations: Reconstruction Error Minimization and Efficient Unrolled Training | Constantin Le Cleï et.al. | 2604.08357 | null |
| 2026-04-09 | Controlling the rain fall statistics using Mean-Reverting Jump Diffusion model | Joya GhoshDastider et.al. | 2604.08338 | null |
| 2026-04-09 | DiV-INR: Extreme Low-Bitrate Diffusion Video Compression with INR Conditioning | Eren Çetin et.al. | 2604.08329 | null |
| 2026-04-09 | HistDiT: A Structure-Aware Latent Conditional Diffusion Model for High-Fidelity Virtual Staining in Histopathology | Aasim Bin Saleem et.al. | 2604.08305 | null |
| 2026-04-09 | GroundingAnomaly: Spatially-Grounded Diffusion for Few-Shot Anomaly Synthesis | Yishen Liu et.al. | 2604.08301 | null |
| 2026-04-09 | EditCaption: Human-Aligned Instruction Synthesis for Image Editing via Supervised Fine-Tuning and Direct Preference Optimization | Xiangyuan Wang et.al. | 2604.08213 | null |
| 2026-04-08 | Distilling Photon-Counting CT into Routine Chest CT through Clinically Validated Degradation Modeling | Junqi Liu et.al. | 2604.07329 | null |
| 2026-04-08 | GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos | Yiqian Wu et.al. | 2604.07273 | null |
| 2026-04-08 | PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing | Ruihang Xu et.al. | 2604.07230 | null |
| 2026-04-08 | VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis | Jian Yu et.al. | 2604.07210 | null |
| 2026-04-08 | SurFITR: A Dataset for Surveillance Image Forgery Detection and Localisation | Qizhou Wang et.al. | 2604.07101 | null |
| 2026-04-08 | Granular mixing and flow dynamics in horizontal stirred bed reactors | Sahar Pourandi et.al. | 2604.07082 | null |
| 2026-04-08 | Not all tokens contribute equally to diffusion learning | Guoqing Zhang et.al. | 2604.07026 | null |
| 2026-04-08 | MAR-GRPO: Stabilized GRPO for AR-diffusion Hybrid Image Generation | Xiaoxiao Ma et.al. | 2604.06966 | null |
| 2026-04-08 | FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling | Yitong Li et.al. | 2604.06916 | null |
| 2026-04-08 | RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details | Dewei Zhou et.al. | 2604.06870 | null |
| 2026-04-08 | FVD: Inference-Time Alignment of Diffusion Models via Fleming-Viot Resampling | Shivanshu Shekhar et.al. | 2604.06779 | null |
| 2026-04-08 | FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching | Junchao Yi et.al. | 2604.06757 | null |
| 2026-04-07 | DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models | Zhengming Yu et.al. | 2604.06161 | null |
| 2026-04-07 | Learning-Guided Force-Feedback Model Predictive Control with Obstacle Avoidance for Robotic Deburring | Krzysztof Wojciechowski et.al. | 2604.06133 | null |
| 2026-04-07 | PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer | David Picard et.al. | 2604.06129 | null |
| 2026-04-07 | SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation | Hiba Dahmani et.al. | 2604.06113 | null |
| 2026-04-07 | Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors | Junbin Zhang et.al. | 2604.06074 | null |
| 2026-04-07 | Beyond Black-Scholes: A Computational Framework for Option Pricing Using Heston, GARCH, and Jump Diffusion Models | Karmanpartap Singh Sidhu et.al. | 2604.06068 | null |
| 2026-04-07 | Lipschitz regularity in Flow Matching and Diffusion Models: sharp sampling rates and functional inequalities | Arthur Stéphanovitch et.al. | 2604.06065 | null |
| 2026-04-07 | HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation | Tao Hu et.al. | 2604.05961 | null |
| 2026-04-07 | Leveraging Image Editing Foundation Models for Data-Efficient CT Metal Artifact Reduction | Ahmet Rasim Emirdagi et.al. | 2604.05934 | null |
| 2026-04-07 | Improving Controllable Generation: Faster Training and Better Performance via $x_0$-Supervision | Amadou S. Sangare et.al. | 2604.05761 | null |
| 2026-04-06 | Your Pre-trained Diffusion Model Secretly Knows Restoration | Sudarshan Rajagopalan et.al. | 2604.04924 | null |
| 2026-04-06 | Diffusion of PeV Cosmic Rays in the Turbulent and Multiphase Interstellar Medium | Yue Hu et.al. | 2604.04814 | null |
| 2026-04-06 | Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning | Lei Zhang et.al. | 2604.04746 | null |
| 2026-04-06 | ZeD-MAP: Bundle Adjustment Guided Zero-Shot Depth Maps for Real-Time Aerial Imaging | Selim Ahmet Iz et.al. | 2604.04667 | null |
| 2026-04-06 | Training-Free Refinement of Flow Matching with Divergence-based Sampling | Yeonwoo Cha et.al. | 2604.04646 | null |
| 2026-04-06 | Beyond Semantics: Uncovering the Physics of Fakes via Universal Physical Descriptors for Cross-Modal Synthetic Detection | Mei Qiu et.al. | 2604.04608 | null |
| 2026-04-06 | PR-IQA: Partial-Reference Image Quality Assessment for Diffusion-Based Novel View Synthesis | Inseong Choi et.al. | 2604.04576 | null |
| 2026-04-06 | Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models | Arian Komaei Koma et.al. | 2604.04575 | null |
| 2026-04-06 | Training-Free Image Editing with Visual Context Integration and Concept Alignment | Rui Song et.al. | 2604.04487 | null |
| 2026-04-06 | Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse | Hao Liu et.al. | 2604.04451 | null |
LLM training
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-05-05 | Audio-Visual Intelligence in Large Foundation Models | You Qin et.al. | 2605.04045 | null |
| 2026-05-05 | Stayin’ Aligned Over Time: Towards Longitudinal Human-LLM Alignment via Contextual Reflection and Privacy-Preserving Behavioral Data | Simret Araya Gebreegziabher et.al. | 2605.04029 | null |
| 2026-05-05 | On Adaptivity in Zeroth-Order Optimization | Hassan Dbouk et.al. | 2605.03869 | null |
| 2026-05-05 | Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF | Mullosharaf K. Arabov et.al. | 2605.03799 | null |
| 2026-05-05 | AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics | Tencent HY Team et.al. | 2605.03652 | null |
| 2026-05-05 | Revisiting Graph-Tokenizing Large Language Models: A Systematic Evaluation of Graph Token Understanding | Zhongjian Zhang et.al. | 2605.03514 | null |
| 2026-05-04 | Moral Sensitivity in LLMs: A Tiered Evaluation of Contextual Bias via Behavioral Profiling and Mechanistic Interpretability | Yash Aggarwal et.al. | 2605.03217 | null |
| 2026-05-04 | Enwar 3.0: An Agentic Multi-Modal LLM Orchestrator for Situation-Aware Beamforming, Blockage Prediction, and Handover Management | Ahmad M. Nazar et.al. | 2605.03215 | null |
| 2026-05-04 | Geometric Deviation as an Unsupervised Pre-Generation Reliability Signal: Probing LLM Representations for Answerability | Yucheng Du et.al. | 2605.03196 | null |
| 2026-05-04 | Bolek: A Multimodal Language Model for Molecular Reasoning | Frederic Grabowski et.al. | 2605.02745 | null |
| 2026-05-04 | Gradient-Gated DPO: Stabilizing Preference Optimization in Language Models | Inoussa Mouiche et.al. | 2605.02626 | null |
| 2026-05-04 | Efficient Preference Poisoning Attack on Offline RLHF | Chenye Yang et.al. | 2605.02495 | null |
| 2026-05-04 | Anomaly-Preference Image Generation | Fuyun Wang et.al. | 2605.02439 | null |
| 2026-05-04 | Reliability-Oriented Multilingual Orthopedic Diagnosis: A Domain-Adaptive Modeling and a Conceptual Validation Framework | Danish Ali et.al. | 2605.02266 | null |
| 2026-05-03 | Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models | Nikolaos Giarelis et.al. | 2605.01870 | null |
| 2026-05-03 | RMGAP: Benchmarking the Generalization of Reward Models across Diverse Preferences | Yangyang Zhou et.al. | 2605.01831 | null |
| 2026-05-02 | LLM Output Detectability and Task Performance Can be Jointly Optimized | Koshiro Saito et.al. | 2605.01350 | null |
| 2026-05-02 | Addressing Data Scarcity in Bangla Fake News Detection: An LLM-Based Dataset Augmentation Approach | Ahmed Alfey Sani et.al. | 2605.01292 | null |
| 2026-05-02 | GIFT: Guided Fine-Tuning and Transfer for Enhancing Instruction-Tuned Language Models | Zhiwen Ruan et.al. | 2605.01256 | null |
| 2026-05-01 | Let ViT Speak: Generative Language-Image Pre-training | Yan Fang et.al. | 2605.00809 | null |
| 2026-05-01 | AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments | Zhijie Cai et.al. | 2605.00650 | null |
| 2026-05-01 | H-RAG at SemEval-2026 Task 8: Hierarchical Parent-Child Retrieval for Multi-Turn RAG Conversations | Passant Elchafei et.al. | 2605.00631 | null |
| 2026-05-01 | DynamicPO: Dynamic Preference Optimization for Recommendation | Xingyu Hu et.al. | 2605.00327 | null |
| 2026-05-01 | Online Self-Calibration Against Hallucination in Vision-Language Models | Minghui Chen et.al. | 2605.00323 | null |
| 2026-04-30 | Attention Is Where You Attack | Aviral Srivastava et.al. | 2605.00236 | null |
| 2026-04-30 | TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization | Abdulhady Abas Abdullah et.al. | 2605.00224 | null |
| 2026-04-30 | Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback | Yikai Wang et.al. | 2605.00155 | null |
| 2026-04-30 | ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts | Nhung Thi-Hong Duong et.al. | 2605.00116 | null |
| 2026-04-30 | FiLMMeD: Feature-wise Linear Modulation for Cross-Problem Multi-Depot Vehicle Routing | Arthur Corrêa et.al. | 2604.28102 | null |
| 2026-04-30 | Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care | Prabhjot Singh et.al. | 2604.28010 | null |
| 2026-04-30 | ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training | Wenxiang Lin et.al. | 2604.27844 | null |
| 2026-04-30 | Mind the Gap: Structure-Aware Consistency in Preference Learning | Mehryar Mohri et.al. | 2604.27733 | null |
| 2026-04-30 | Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments | Emilia Milano et.al. | 2604.27661 | null |
| 2026-04-30 | HAVEN: Hybrid Automated Verification ENgine for UVM Testbench Synthesis with LLMs | Chang-Chih Meng et.al. | 2604.27643 | null |
| 2026-04-30 | SecGoal: A Benchmark for Security Goal Extraction and Formalization from Protocol Documents | Dawei Huang et.al. | 2604.27601 | null |
| 2026-04-30 | Leveraging Verifier-Based Reinforcement Learning in Image Editing | Hanzhong Guo et.al. | 2604.27505 | null |
| 2026-04-30 | Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors | Zi Li et.al. | 2604.27426 | null |
| 2026-04-29 | Instruction Complexity Induces Positional Collapse in Adversarial LLM Evaluation | Jon-Paul Cacioli et.al. | 2604.27249 | null |
| 2026-04-29 | Zero-Shot to Full-Resource: Cross-lingual Transfer Strategies for Aspect-Based Sentiment Analysis | Jakob Fehle et.al. | 2604.26619 | null |
| 2026-04-29 | Translating Under Pressure: Domain-Aware LLMs for Crisis Communication | Antonio Castaldo et.al. | 2604.26597 | null |
| 2026-04-29 | SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning | Yimeng Shan et.al. | 2604.26388 | null |
| 2026-04-28 | Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas | Nayoung Choi et.al. | 2604.26120 | null |
| 2026-04-28 | When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient | Shuning Shang et.al. | 2604.25872 | null |
| 2026-04-28 | From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling | Jianghao Lin et.al. | 2604.25847 | null |
| 2026-04-28 | Step-Audio-R1.5 Technical Report | Yuxin Zhang et.al. | 2604.25719 | null |
| 2026-04-28 | Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation | Mehrdad Ghassabi et.al. | 2604.25702 | null |
| 2026-04-28 | Health System Scale Semantic Search Across Unstructured Clinical Notes | Faith Wavinya Mutinda et.al. | 2604.25605 | null |
| 2026-04-28 | A Systematic Post-Train Framework for Video Generation | Zeyue Xue et.al. | 2604.25427 | null |
| 2026-04-28 | FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices | Changyu Li et.al. | 2604.25421 | null |
| 2026-04-28 | Below-Chance Blindness: Prompted Underperformance in Small LLMs Produces Positional Bias Rather than Answer Avoidance | Jon-Paul Cacioli et.al. | 2604.25249 | null |
| 2026-04-28 | Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment | James Pustejovsky et.al. | 2604.25136 | null |
| 2026-04-28 | What Makes Good Instruction-Tuning Data? An In-Context Learning Perspective | Guangzeng Han et.al. | 2604.25132 | null |
| 2026-04-27 | A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations | Zihan Liu et.al. | 2604.24468 | null |
| 2026-04-27 | A Multi-Dimensional Audit of Politically Aligned Large Language Models | Lisa Korver et.al. | 2604.24429 | null |
| 2026-04-27 | Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment | Wenzhe Xu et.al. | 2604.24178 | null |
| 2026-04-27 | TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training | Man Liu et.al. | 2604.24088 | null |
| 2026-04-27 | Distilling Self-Consistency into Verbal Confidence: A Pre-Registered Negative Result and Post-Hoc Rescue on Gemma 3 4B | Jon-Paul Cacioli et.al. | 2604.24070 | null |
| 2026-04-27 | Disagreement as Signals: Dual-view Calibration for Sequential Recommendation Denoising | Sijia Li et.al. | 2604.24048 | null |
| 2026-04-27 | FlashOverlap: Minimizing Tail Latency in Communication Overlap for Distributed LLM Training | Rezaul Karim et.al. | 2604.24013 | null |
| 2026-04-27 | Hindsight Preference Optimization for Financial Time Series Advisory | Yanwei Cui et.al. | 2604.23988 | null |
| 2026-04-27 | Continual Calibration: Coverage Can Collapse Before Accuracy in Lifelong LLM Fine-Tuning | Ibne Farabi Shihab et.al. | 2604.23987 | null |
| 2026-04-27 | MatchRDMA: A Segmented and Rate-Matched Long-Haul RDMA Scheme for Geo-distributed LLM Training over OTN | Jun Dai et.al. | 2604.23932 | null |
| 2026-04-24 | CAGE-SGG: Counterfactual Active Graph Evidence for Open-Vocabulary Scene Graph Generation | Suiyang Guang et.al. | 2604.22274 | null |
| 2026-04-24 | TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis | Xi Wang et.al. | 2604.22225 | null |
| 2026-04-24 | Verbal Confidence Saturation in 3-9B Open-Weight Instruction-Tuned LLMs: A Pre-Registered Psychometric Validity Screen | Jon-Paul Cacioli et.al. | 2604.22215 | null |
| 2026-04-23 | PermaFrost-Attack: Stealth Pretraining Seeding (SPS) for planting Logic Landmines During LLM Training | Harsh Kumar et.al. | 2604.22117 | null |
| 2026-04-23 | When Cow Urine Cures Constipation on YouTube: Limits of LLMs in Detecting Culture-specific Health Misinformation | Anamta Khan et.al. | 2604.22002 | null |
| 2026-04-23 | When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs | Pegah Khayatan et.al. | 2604.21911 | null |
| 2026-04-23 | Why are all LLMs Obsessed with Japanese Culture? On the Hidden Cultural and Regional Biases of LLMs | Joseba Fernandez de Landa et.al. | 2604.21751 | null |
| 2026-04-23 | Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation | Nikita Severin et.al. | 2604.21536 | null |
| 2026-04-23 | Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning | Hanjun Cho et.al. | 2604.21495 | null |
| 2026-04-23 | Reasoning Primitives in Hybrid and Non-Hybrid LLMs | Shivam Rawat et.al. | 2604.21454 | null |
| 2026-04-23 | CAP: Controllable Alignment Prompting for Unlearning in LLMs | Zhaokun Wang et.al. | 2604.21251 | null |
| 2026-04-23 | Reasoning About Traversability: Language-Guided Off-Road 3D Trajectory Planning | Byounggun Park et.al. | 2604.21249 | null |
| 2026-04-23 | Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model | Runheng Liu et.al. | 2604.21223 | null |
| 2026-04-23 | On Reasoning Behind Next Occupation Recommendation | Shan Dong et.al. | 2604.21204 | null |
| 2026-04-22 | TabSHAP | Aryan Chaudhary et.al. | 2604.21120 | null |
| 2026-04-22 | MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment | Andor Vári-Kakas et.al. | 2604.20685 | null |
| 2026-04-22 | The Effect of Idea Elaboration on the Automatic Assessment of Idea Originality | Umberto Domanti et.al. | 2604.20569 | null |
| 2026-04-22 | Where Reasoning Breaks: Logic-Aware Path Selection by Controlling Logical Connectives in LLMs Reasoning Chains | Seunghyun Park et.al. | 2604.20564 | null |
| 2026-04-22 | Evian: Towards Explainable Visual Instruction-tuning Data Auditing | Zimu Jia et.al. | 2604.20544 | null |
| 2026-04-22 | Surrogate modeling for interpreting black-box LLMs in medical predictions | Changho Han et.al. | 2604.20331 | null |
| 2026-04-22 | Image Generators are Generalist Vision Learners | Valentin Gabeur et.al. | 2604.20329 | null |
| 2026-04-22 | LLM-guided phase diagram construction through high-throughput experimentation | Ryo Tamura et.al. | 2604.20304 | null |
| 2026-04-22 | HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs | Darsh Kachroo et.al. | 2604.20140 | null |
| 2026-04-21 | Bootstrapping Post-training Signals for Open-ended Tasks via Rubric-based Self-play on Pre-training Text | Chengyu Huang et.al. | 2604.20051 | null |
| 2026-04-21 | Super Apriel: One Checkpoint, Many Speeds | SLAM Labs et.al. | 2604.19877 | null |
| 2026-04-21 | Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation | Nurkhan Laiyk et.al. | 2604.19678 | null |
| 2026-04-21 | HP-Edit: A Human-Preference Post-Training Framework for Image Editing | Fan Li et.al. | 2604.19406 | null |
| 2026-04-21 | Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs | Guy Mor-Lan et.al. | 2604.19292 | null |
| 2026-04-21 | HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing | Euntae Kim et.al. | 2604.19274 | null |
| 2026-04-21 | UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training | Size Zheng et.al. | 2604.19241 | null |
| 2026-04-21 | The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models | Shuai Wu et.al. | 2604.19139 | null |
| 2026-04-21 | SAHM: A Benchmark for Arabic Financial and Shari’ah-Compliant Reasoning | Rania Elbadry et.al. | 2604.19098 | null |
| 2026-04-21 | STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation | Shuyuan Zhao et.al. | 2604.19042 | null |
| 2026-04-21 | Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback | Qiang Liu et.al. | 2604.19024 | null |
| 2026-04-21 | Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control | Julian Skifstad et.al. | 2604.19018 | null |
| 2026-04-20 | JudgeMeNot: Personalizing Large Language Models to Emulate Judicial Reasoning in Hebrew | Itay Razumenko et.al. | 2604.18041 | null |
| 2026-04-20 | Architecture Matters More Than Scale: A Comparative Study of Retrieval and Memory Augmentation for Financial QA Under SME Compute Constraints | Jianan Liu et.al. | 2604.17979 | null |
| 2026-04-20 | Efficient Federated RLHF via Zeroth-Order Policy Optimization | Deyi Wang et.al. | 2604.17747 | null |
| 2026-04-19 | PBSBench: A Multi-Level Vision-Language Framework and Benchmark for Hematopathology Whole Slide Image Interpretation | Yuanlong Wang et.al. | 2604.17570 | null |
| 2026-04-19 | PoliLegalLM: A Technical Report on a Large Language Model for Political and Legal Affairs | Yuting Huang et.al. | 2604.17543 | null |
| 2026-04-19 | E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition | Meng Zhang et.al. | 2604.17319 | null |
| 2026-04-19 | Cat-DPO: Category-Adaptive Safety Alignment | Tiankai Yang et.al. | 2604.17299 | null |
| 2026-04-19 | HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads | Juyuan Wang et.al. | 2604.17237 | null |
| 2026-04-19 | Guardrails in Logit Space: Safety Token Regularization for LLM Alignment | Thong Bach et.al. | 2604.17210 | null |
| 2026-04-18 | Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification | Kimia Hamidieh et.al. | 2604.17112 | null |
| 2026-04-17 | Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation | Yide Ran et.al. | 2604.16197 | null |
| 2026-04-17 | CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization | Junyi Li et.al. | 2604.15847 | null |
| 2026-04-17 | Into the Gray Zone: Domain Contexts Can Blur LLM Safety Boundaries | Ki Sen Hung et.al. | 2604.15717 | null |
| 2026-04-17 | Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning | Xiaoyu Yang et.al. | 2604.15705 | null |
| 2026-04-17 | C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment | Pufan Zeng et.al. | 2604.15675 | null |
| 2026-04-17 | GroupDPO: Memory efficient Group-wise Direct Preference Optimization | Jixuan Leng et.al. | 2604.15602 | null |
| 2026-04-16 | StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models | Dingzhi Yu et.al. | 2604.15416 | null |
| 2026-04-16 | MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events | Raunak Agarwal et.al. | 2604.15203 | null |
| 2026-04-16 | RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models | Gabriele Mattioli et.al. | 2604.14951 | null |
| 2026-04-16 | WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training | Yifu Chen et.al. | 2604.14932 | null |
| 2026-04-16 | Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models | Danae Sánchez Villegas et.al. | 2604.14888 | null |
| 2026-04-16 | CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning | Zhuo Wang et.al. | 2604.14768 | null |
| 2026-04-16 | Switching Efficiency: A Novel Framework for Dissecting AI Data Center Network Efficiency | Niangen Ye et.al. | 2604.14690 | null |
| 2026-04-16 | SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models | Binxian Su et.al. | 2604.14672 | null |
| 2026-04-15 | FoodSense: A Multisensory Food Dataset and Benchmark for Predicting Taste, Smell, Texture, and Sound from Images | Sabab Ishraq et.al. | 2604.14388 | null |
| 2026-04-15 | The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models | Akshay Paruchuri et.al. | 2604.14363 | null |
| 2026-04-15 | DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines | Gabriel Pimenta de Freitas Cardoso et.al. | 2604.14314 | null |
| 2026-04-15 | Don’t Let the Video Speak: Audio-Contrastive Preference Optimization for Audio-Visual Language Models | Ami Baid et.al. | 2604.14129 | null |
| 2026-04-15 | TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration | Zerun Ma et.al. | 2604.14116 | null |
| 2026-04-15 | MAny: Merge Anything for Multimodal Continual Instruction Tuning | Zijian Gao et.al. | 2604.14016 | null |
| 2026-04-15 | Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection | Ahmad Dawar Hakimi et.al. | 2604.13899 | null |
| 2026-04-15 | SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention | Hongtao Xu et.al. | 2604.13847 | null |
| 2026-04-15 | Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges | Xiaohua Wang et.al. | 2604.13602 | null |
| 2026-04-15 | SAKURAONE: An Open Ethernet-Based AI HPC System and Its Observed Workload Dynamics in a Single-Tenant LLM Development Environment | Fumikazu Konishi et.al. | 2604.13600 | null |
| 2026-04-15 | Debate to Align: Reliable Entity Alignment through Two-Stage Multi-Agent Debate | Cunda Wang et.al. | 2604.13551 | null |
| 2026-04-15 | Synthesizing Instruction-Tuning Datasets with Contrastive Decoding | Tatsuya Ichinose et.al. | 2604.13538 | null |
| 2026-04-14 | Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization | Aadyot Bhatnagar et.al. | 2604.13175 | null |
| 2026-04-14 | Visual Preference Optimization with Rubric Rewards | Ya-Qi Yu et.al. | 2604.13029 | null |
| 2026-04-14 | One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness | Erfan Baghaei Potraghloo et.al. | 2604.13006 | null |
| 2026-04-14 | Boosting Visual Instruction Tuning with Self-Supervised Guidance | Sophia Sirko-Galouchenko et.al. | 2604.12966 | null |
| 2026-04-14 | From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation | Chuang Peng et.al. | 2604.12666 | null |
| 2026-04-14 | Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design | Leon Eshuijs et.al. | 2604.12500 | null |
| 2026-04-14 | Analyzing the Effect of Noise in LLM Fine-tuning | Lingfang Li et.al. | 2604.12469 | null |
| 2026-04-14 | Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization | Zhijie Cai et.al. | 2604.12401 | null |
| 2026-04-14 | AgenticAI-DialogGen: Topic-Guided Conversation Generation for Fine-Tuning and Evaluating Short- and Long-Term Memories of LLMs | Manoj Madushanka Perera et.al. | 2604.12179 | null |
| 2026-04-14 | Nucleus-Image: Sparse MoE for Image Generation | Chandan Akiti et.al. | 2604.12163 | null |
| 2026-04-13 | Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models | Syed Rifat Raiyan et.al. | 2604.12076 | null |
| 2026-04-13 | CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation | WonJin Yoon et.al. | 2604.11801 | null |
| 2026-04-13 | RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents | Riccardo Rosati et.al. | 2604.11655 | null |
| 2026-04-13 | MLLM-as-a-Judge Exhibits Model Preference Bias | Shuitsu Koyama et.al. | 2604.11589 | null |
| 2026-04-13 | OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Alignment for LLM-Based Multi-Agent Systems | Kun Liu et.al. | 2604.11477 | null |
| 2026-04-13 | Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization | Zhixin Lin et.al. | 2604.11259 | null |
| 2026-04-13 | BITS Pilani at SemEval-2026 Task 9: Structured Supervised Fine-Tuning with DPO Refinement for Polarization Detection | Atharva Gupta et.al. | 2604.11121 | null |
| 2026-04-13 | DDO-RM for LLM Preference Optimization: A Minimal Held-Out Benchmark against DPO | Tiantian Zhang et.al. | 2604.11119 | null |
| 2026-04-12 | Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series | Krzysztof Ociepa et.al. | 2604.10799 | null |
| 2026-04-12 | Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation | Charles Koutcheme et.al. | 2604.10720 | null |
| 2026-04-12 | ProUIE: A Macro-to-Micro Progressive Learning Method for LLM-based Universal Information Extraction | Wenda Liu et.al. | 2604.10633 | null |
| 2026-04-12 | CogInstrument: Modeling Cognitive Processes for Bidirectional Human-LLM Alignment in Planning Tasks | Anqi Wang et.al. | 2604.10587 | null |
| 2026-04-12 | Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs | Subramanyam Sahoo et.al. | 2604.10585 | null |
| 2026-04-10 | Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning | Yi Sui et.al. | 2604.09150 | null |
| 2026-04-10 | NyayaMind- A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System | Parjanya Aditya Shukla et.al. | 2604.09069 | null |
| 2026-04-10 | TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice | Gang Hu et.al. | 2604.08948 | null |
| 2026-04-09 | Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models | Yousra Fettach et.al. | 2604.08757 | null |
| 2026-04-09 | Decomposing the Delta: What Do Models Actually Learn from Preference Pairs? | Chia-Hsuan Lee et.al. | 2604.08723 | null |
| 2026-04-09 | SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions | Ashima Suvarna et.al. | 2604.08477 | null |
| 2026-04-09 | ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection | He Geng et.al. | 2604.08326 | null |
| 2026-04-09 | Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models | Weiwei Qi et.al. | 2604.08297 | null |
| 2026-04-09 | Self-Debias: Self-correcting for Debiasing Large Language Models | Xuan Feng et.al. | 2604.08243 | null |
| 2026-04-09 | EditCaption: Human-Aligned Instruction Synthesis for Image Editing via Supervised Fine-Tuning and Direct Preference Optimization | Xiangyuan Wang et.al. | 2604.08213 | null |
| 2026-04-09 | Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment | Blessing Agyei Kyem et.al. | 2604.08212 | null |
| 2026-04-09 | Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling | Jiaxuan Wang et.al. | 2604.08178 | null |
| 2026-04-09 | DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing | Gyanendra Das et.al. | 2604.07965 | null |
| 2026-04-09 | Rethinking Data Mixing from the Perspective of Large Language Models | Yuanjian Xu et.al. | 2604.07963 | null |
| 2026-04-09 | Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning | Shiwan Zhao et.al. | 2604.07941 | null |
| 2026-04-08 | VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis | Jian Yu et.al. | 2604.07210 | null |
| 2026-04-08 | Gemma 4, Phi-4, and Qwen3: Accuracy-Efficiency Tradeoffs in Dense and MoE Reasoning Language Models | Md Motaleb Hossen Manik et.al. | 2604.07035 | null |
| 2026-04-08 | MARS: Enabling Autoregressive Models Multi-Token Generation | Ziqi Jin et.al. | 2604.07023 | null |
| 2026-04-08 | Beyond Accuracy: Diagnosing Algebraic Reasoning Failures in LLMs Across Nine Complexity Dimensions | Parth Patil et.al. | 2604.06799 | null |
| 2026-04-08 | Multi-Faceted Self-Consistent Preference Alignment for Query Rewriting in Conversational Search | Zhiyu Cao et.al. | 2604.06771 | null |
| 2026-04-08 | The Theorems of Dr. David Blackwell and Their Contributions to Artificial Intelligence | Napoleon Paxton et.al. | 2604.06621 | null |
| 2026-04-07 | Limits of Difficulty Scaling: Hard Samples Yield Diminishing Returns in GRPO-Tuned SLMs | Suraj Yadav et.al. | 2604.06298 | null |
| 2026-04-07 | Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles | Ben Wigler et.al. | 2604.06071 | null |
| 2026-04-07 | How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism | Elisabetta Rocchetti et.al. | 2604.06015 | null |
| 2026-04-07 | Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment | Renxuan Tan et.al. | 2604.05965 | null |
| 2026-04-07 | BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs | Abbas Ghaddar et.al. | 2604.05942 | null |
| 2026-04-07 | JD-BP: A Joint-Decision Generative Framework for Auto-Bidding and Pricing | Linghui Meng et.al. | 2604.05845 | null |
| 2026-04-07 | Vision-Guided Iterative Refinement for Frontend Code Generation | Hannah Sansford et.al. | 2604.05839 | null |
| 2026-04-07 | Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning | Yanbei Jiang et.al. | 2604.05756 | null |
| 2026-04-06 | Instruction-Tuned LLMs for Parsing and Mining Unstructured Logs on Leadership HPC Systems | Ahmad Maroof Karimi et.al. | 2604.05168 | null |
| 2026-04-06 | SenseAI: A Human-in-the-Loop Dataset for RLHF-Aligned Financial Sentiment Reasoning | Berny Kabalisa et.al. | 2604.05135 | null |
| 2026-04-06 | Offline RL for Adaptive Policy Retrieval in Prior Authorization | Ruslan Sharifullin et.al. | 2604.05125 | null |
| 2026-04-06 | One Model for All: Multi-Objective Controllable Language Models | Qiang He et.al. | 2604.04497 | null |
| 2026-04-06 | MolDA: Molecular Understanding and Generation via Large Language Diffusion Model | Seohyeon Shin et.al. | 2604.04403 | null |
| 2026-04-06 | Developing Authentic Simulated Learners for Mathematics Teacher Learning: Insights from Three Approaches with Large Language Models | Jie Cao et.al. | 2604.04361 | null |
| 2026-04-05 | APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs | Mahmoud Srewa et.al. | 2604.04261 | null |
| 2026-04-05 | DARE: Diffusion Large Language Models Alignment and Reinforcement Executor | Jingyi Yang et.al. | 2604.04215 | null |
| 2026-04-05 | A Semi-Automated Annotation Workflow for Paediatric Histopathology Reports Using Small Language Models | Avish Vijayaraghavan et.al. | 2604.04168 | null |
| 2026-04-05 | Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison | Jihoon Jeong et.al. | 2604.04064 | null |
| 2026-04-05 | COBOL-Coder: Domain-Adapted Large Language Models for COBOL Code Generation and Translation | Anh T. V. Dau et.al. | 2604.03986 | null |
| 2026-04-05 | SafeCtrl: Region-Aware Safety Control for Text-to-Image Diffusion via Detect-Then-Suppress | Lingyun Zhang et.al. | 2604.03941 | null |
| 2026-04-04 | Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment | Soham Gadgil et.al. | 2604.03867 | null |