Paper List for Robotics & Embodied AI - Tianxing Chen

0. Table of contents

  1. Diffusion Model for Planning, Policy, and RL
  2. 3D-based Manipulation
  3. 2D-based Manipulation
  4. LLM for Robotics (LLM Agent)
  5. LLM Agent (Planning)
  6. Generative Models for Embodied AI
  7. Visual Feature: Correspondence, Affordance
  8. Detection & Segmentation
  9. Pose Estimation and Tracking
  10. Humanoid
  11. Dataset & Benchmark
  12. Hardware
  13. 2D to 3D Generation
  14. Gaussian Splatting
  15. Robotics for Medical
  16. Companies

1. Diffusion Model for Planning, Policy, and RL

  • [arXiv] Diffusion Models for Reinforcement Learning: A Survey, arXiv

  • [ICLR 2023 (Top 5% Notable)] Is Conditional Generative Modeling all you need for Decision-Making?, website

  • [RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion, website

  • [ICML 2022 (Long Talk)] Planning with Diffusion for Flexible Behavior Synthesis, website

  • [ICML 2023 Oral] AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners, website

  • [CVPR 2024] SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution, website

  • [arXiv] Learning a Diffusion Model Policy From Reward via Q-Score Matching, arXiv

  • [CoRL 2023] ChainedDiffuser: Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation, website

  • [CVPR 2023] Affordance Diffusion: Synthesizing Hand-Object Interactions, website

  • [arXiv] DiffuserLite: Towards Real-time Diffusion Planning, arXiv

  • [arXiv] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations, website

  • [arXiv] 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations, website

  • [arXiv] SafeDiffuser: Safe Planning with Diffusion Probabilistic Models, arXiv

  • [CVPR 2024] Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation, arXiv

  • [arXiv 2024] Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning, arXiv

  • [arXiv 2024] Surgical Robot Transformer: Imitation Learning for Surgical Tasks, website

2. 3D-based Manipulation

  • [RSS 2024] RVT-2: Learning Precise Manipulation from Few Examples, website

  • [arXiv 2023] D3 Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation, website

  • [arXiv 2024] UniDoorManip: Learning Universal Door Manipulation Policy Over Large-scale and Diverse Door Manipulation Environments, website

  • [CoRL 2023 (Oral)] GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields, website

  • [ECCV 2024] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation, website

Grasping

  • [CVPR 2020] GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping, website
    • [TRO 2023] AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains, arXiv
  • [arXiv 2024] ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter, website
  • [arXiv 2024] GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping, website

3. 2D-based Manipulation

  • [NeurIPS 2023] MoVie: Visual Model-Based Policy Adaptation for View Generalization, website

4. LLM for Robotics (LLM Agent)

  • [arXiv 2024] OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics, website

  • [CoRL 2023] VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models, website

  • [arXiv 2023] ChatGPT for Robotics: Design Principles and Model Abilities, arXiv

  • [arXiv 2024] Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation, arXiv

  • [PMLR 2023] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, website

5. LLM Agent (Planning)

  • [NeurIPS 2023] Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning, website

6. Generative Models for Embodied AI

  • [arXiv 2024] Generative Image as Action Models, website

  • [arXiv 2024] Genie: Generative Interactive Environments, website

7. Visual Feature

7.1 Correspondence

  • [arXiv 2023] D3 Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation, website

  • [CoRL 2020] Transporter Networks: Rearranging the Visual World for Robotic Manipulation, website

  • [ICLR 2024] SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation, website

  • [ICRA 2024] UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence, website

  • [CoRL 2018] Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation, PDF

  • [arXiv 2024] Theia: Distilling Diverse Vision Foundation Models for Robot Learning, website, Github repo

7.2 Affordance

  • [CoRL 2022] Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, website

  • [arXiv 2024] Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation, arXiv

  • [arXiv 2024] PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments, arXiv

  • [ICLR 2022] VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects, website

  • [ICLR 2023] DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Object Manipulation, arXiv

  • [CVPR 2022] Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos, website

  • [ICCV 2023] AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose, website

8. Detection & Segmentation

  • [ECCV 2024] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, Github repo

  • [arXiv 2024] Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything, Github repo

  • [ICCV 2023] DEVA: Tracking Anything with Decoupled Video Segmentation, website

  • [ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model, website

  • [ICCV 2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation, website

  • LangSAM, combining Grounding DINO and SAM, Github repo

9. Pose Estimation and Tracking

  • [CVPR 2024 (Highlight)] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects, website

  • [CVPR 2023 (Highlight)] GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts, website

  • [arXiv 2023] GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects, website

  • [arXiv 2024] ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics, website

  • [ICCV 2023] AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose, website

  • [CVPR 2023] BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects, website

10. Humanoid

  • [arXiv 2024] HumanPlus: Humanoid Shadowing and Imitation from Humans, website

11. Dataset & Benchmark

  • [arXiv 2024] Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks, website, zhihu
  • [arXiv 2024] GRUtopia: Dream General Robots in a City at Scale, Github repo
  • [ICLR 2024] AgentBoard: An Analytical Evaluation Board of Multi-Turn LLM Agents, website
  • [arXiv 2024] RoboCAS: A Benchmark for Robotic Manipulation in Complex Object Arrangement Scenarios, Github repo
  • [arXiv 2024] BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark, website
  • [arXiv 2024] Evaluating Real-World Robot Manipulation Policies in Simulation, website

12. Hardware

  • [arXiv 2024] DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation, website

13. 2D to 3D Generation

  • [arXiv 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image, website

14. Gaussian Splatting

  • [SIGGRAPH 2024] 2DGS: 2D Gaussian Splatting for Geometrically Accurate Radiance Fields, website

15. Robotics for Medical

  • [arXiv 2024] Surgical Robot Transformer: Imitation Learning for Surgical Tasks, website

TO READ

  1. Stabilizing Transformers for Reinforcement Learning

    • Summary: This paper proposes Gated Transformer-XL (GTrXL), a modified Transformer architecture that addresses the optimization difficulties standard Transformers face in reinforcement learning. By introducing layer normalization and gating mechanisms, GTrXL outperforms LSTMs in partially observable environments (see the first sketch after this list).
    • Link
  2. CoBERL: Contrastive BERT for Reinforcement Learning

    • Summary: The paper introduces CoBERL, which combines a contrastive loss with a Transformer architecture, improving data efficiency and performance in reinforcement learning through bidirectional masked prediction and contrastive learning.
    • Link
  3. Adaptive Transformers in RL

    • Summary: This work explores Transformer models with adaptive attention spans in reinforcement learning, finding that the approach improves performance in environments that require long-term dependencies.
    • Link
  4. Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

    • Summary: This paper proposes Actor-Learner Distillation (ALD), which distills knowledge from a large learner model into a small actor model to improve the sample efficiency of Transformers in reinforcement learning.
    • Link
  5. Deep Transformer Q-Networks for Partially Observable Reinforcement Learning

    • Summary: Introduces Deep Transformer Q-Networks (DTQN), a reinforcement learning architecture that uses Transformer self-attention to handle partially observable tasks, demonstrating effectiveness in several challenging environments (see the second sketch after this list).
    • Link
  6. CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

    • Summary: CtrlFormer is a Transformer architecture focused on improving the sample efficiency of visual control tasks by learning transferable state representations, with particular emphasis on cross-task transfer learning.
    • Link
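
The GRU-style gating summarized in item 1 is compact enough to sketch. Below is a minimal PyTorch illustration of the gate GTrXL uses in place of each residual connection; the module name and the gate-bias default are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of GTrXL's GRU-style gating layer
# (Stabilizing Transformers for Reinforcement Learning).
# Names and default values are illustrative, not the reference code.
import torch
import torch.nn as nn


class GRUGate(nn.Module):
    """Replaces the residual connection x + y with a learned gate g(x, y)."""

    def __init__(self, d_model: int, gate_bias: float = 2.0):
        super().__init__()
        self.w_r = nn.Linear(d_model, d_model, bias=False)
        self.u_r = nn.Linear(d_model, d_model, bias=False)
        self.w_z = nn.Linear(d_model, d_model, bias=False)
        self.u_z = nn.Linear(d_model, d_model, bias=False)
        self.w_g = nn.Linear(d_model, d_model, bias=False)
        self.u_g = nn.Linear(d_model, d_model, bias=False)
        # A positive bias keeps the update gate z near 0 at initialization,
        # so the layer starts close to the identity map -- the property the
        # paper credits for stable RL optimization.
        self.bias = nn.Parameter(torch.full((d_model,), gate_bias))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: skip-connection input; y: sub-layer (attention or MLP) output
        r = torch.sigmoid(self.w_r(y) + self.u_r(x))               # reset gate
        z = torch.sigmoid(self.w_z(y) + self.u_z(x) - self.bias)   # update gate
        h = torch.tanh(self.w_g(y) + self.u_g(r * x))              # candidate
        return (1.0 - z) * x + z * h
```

Inside a pre-norm Transformer block this would be applied as `x = gate(x, sublayer(norm(x)))`, leaving the skip path untouched early in training.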

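Similarly, a minimal sketch of the DTQN idea from item 5: a causally masked Transformer encoder over a short observation history, with a Q-value head applied at every timestep. The class name, dimensions, and history length are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a Deep Transformer Q-Network (DTQN)-style model.
import torch
import torch.nn as nn


class DTQNSketch(nn.Module):
    """Encodes an observation history with self-attention and predicts
    Q-values at every position in the history."""

    def __init__(self, obs_dim: int, n_actions: int,
                 d_model: int = 64, n_layers: int = 2, history_len: int = 50):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        self.pos = nn.Parameter(torch.zeros(history_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.q_head = nn.Linear(d_model, n_actions)

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, time, obs_dim)
        t = obs_history.shape[1]
        x = self.embed(obs_history) + self.pos[:t]
        # Causal mask: each timestep attends only to the past, matching
        # DTQN's training on intermediate Q-value predictions.
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        x = self.encoder(x, mask=mask)
        return self.q_head(x)  # (batch, time, n_actions)
```
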
  • Sapiens: Foundation for Human Vision Models, website: https://about.meta.com/realitylabs/codecavatars/sapiens
  • General Flow as Foundation Affordance for Scalable Robot Learning, website: https://general-flow.github.io/
