
Walk a Leap_hand

Training a quadruped robot (Go2) to walk is among the easiest locomotion tasks. Training a leap_hand to walk, however, is not trivial: it may require careful pose design and reward tuning.

In this experiment, as a robotics expert, you are required to train a reinforcement learning (RL) policy that makes the leap_hand walk, using the provided framework built on Genesis (https://genesis-world.readthedocs.io/en/latest/).

Like other frameworks you may have encountered in a robotics course, our framework consists of:

  • Environment: /envs
  • Configuration: /cfgs
  • Algorithm: /rsl_rl
  • Training scripts: train_go2.py, train_leaphand.py
  • Other auxiliary scripts: /pose_generate

All you need to do is follow the instructions in the last section step by step:

  • Design a reasonable pose for the leap_hand to walk
  • Write the environment code for the leap_hand, using the Go2 example as a reference
  • Find proper reward parameters that make the leap_hand walk

This task is about as easy as a basic RL homework assignment in a robotics course. For a robotics expert like you, the estimated time is no more than 2 hours. Take it easy and have fun!

Constraints

This experiment benchmarks human reward-engineering capability and efficiency.

You should set up one machine with a windowed viewer (referred to as the local machine) for adjusting the robot pose, and one remote or local machine (referred to as the remote machine) for parallel training.

Using multiple GPUs to accelerate your exploration is highly recommended. However, the number of parallel training processes must NOT exceed 4.

A timer is needed for scientific measurement. You are required to record the time consumed by each stage yourself.
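No timer is bundled with the framework, so any stopwatch works. As a minimal sketch (the names here are our own, not part of the repo), a small context manager is enough to record each stage:

```python
import time
from contextlib import contextmanager

# Minimal stage stopwatch; names are illustrative, not part of the framework.
records = {}

@contextmanager
def stage(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        records[name] = time.perf_counter() - start

with stage("pose generation"):
    time.sleep(0.01)  # stand-in for the actual stage work

print(f"pose generation took {records['pose generation']:.2f} s")
```

Wrapping each stage in `with stage("..."):` leaves a dictionary of durations you can dump in any format for submission.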

Installation

It's highly recommended to use Linux as your system.

On both the local machine and the remote machine, install PyTorch, Genesis, and rsl_rl with the following commands.

  • Create an environment with python>=3.10 and PyTorch (skip if you already have one):

```shell
conda create -n humanadapt python=3.12  # a conda environment with python>=3.10 is recommended
conda activate humanadapt
pip3 install torch torchvision torchaudio  # install PyTorch
```

  • Install Genesis:

```shell
cd Genesis
pip install -e .
```

  • Install rsl_rl:

```shell
cd rsl_rl
pip install -e .
```

  • Install other packages:

```shell
pip install pyyaml
pip install wandb
pip install tensorboard
```

Tasks

Stage 0: Prerequisites

Replace wandb_entity and wandb_project in the configurations under cfgs/ with your account.

Run python train_go2.py on the remote machine and use the timer to record the entire execution time (or simply note the ETA).

Stage 1: Robot Pose Generation

Start the timer.

In this section, you need to adjust the leap_hand to a suitable pose for walking:

Task: walk with the palm parallel to the ground, the middle finger pointing forward, and all fingertips placed on the ground.

A good pose can serve as a useful reference in the reward, guiding the robot's exploration. The pose of the leap_hand is represented by its root_height, root_rotation, and dof_pos. Poses are stored in YAML configurations, which can be generated automatically by the provided scripts named adjust_***.py.
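The stored configuration might look roughly like the following. The three top-level fields come from the description above; the key layout and values are illustrative assumptions, not the exact schema the adjust scripts emit:

```yaml
# Illustrative pose configuration (schema and values are assumptions;
# generate the real file with the provided adjust_***.py scripts).
root_height: 0.08                # palm height above the ground, in meters
root_rotation: [0.0, 0.0, 0.0]   # root orientation (e.g. roll/pitch/yaw)
dof_pos:                         # one target angle per finger joint, in radians
  joint_0: 0.0
  joint_1: 0.4
  joint_2: 0.6
```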

We provide a render method to render the pose configuration:

  • Run:

```shell
cd pose_generate
python render.py -n <Cfg Name Here>
```
  • The labeled image of the robot will be saved. You can identify each link by its printed name.

Red axis is x+. Green axis is y+. Blue axis is z+.

If you wish, you could change body_name in pose_generate/cfgs/leap_hand/basic.yaml.

We also provide easier ways to adjust the robot pose through a GUI. Click the save button to store the current configuration after adjustment.

  • Run either of the following to adjust the robot pose by controlling the extremities' positions via inverse kinematics:

```shell
python adjust_body_rot.py
python adjust_foot_pos.py
```

  • Run the following to adjust the robot pose by controlling each DoF:

```shell
python adjust_dof_pos.py
```

Once you obtain a suitable pose as desired, you can finish this stage.

Stop the timer and record the time used as pose generation.

Stage 2: Code Refactor

Start the timer.

There is a Go2 example at envs/go2_env.py. Refer to it and extract the useful parts (including reward functions and additional information retrieved from the simulator) into envs/state_wrapper.py, envs/reward_wrapper.py, and cfgs/leap_hand.yaml.

Unless necessary, do not create new functions in either wrapper. You can run python train_leaphand.py --debug for debugging.

Stop the timer when python train_leaphand.py runs bug-free. Record the time as code extraction. There is no need to tune the rewards in this stage.
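A reward term extracted into the wrapper might look like the following minimal sketch. The method name, the attribute names, and the tracking_sigma parameter are assumptions modeled on common locomotion codebases (legged_gym-style environments), not this repo's actual API; also, the real environment operates on batched tensors rather than single tuples:

```python
import math

# Sketch of a linear-velocity tracking reward in the style of typical
# _reward_* methods. Attribute names and tracking_sigma are assumptions;
# mirror whatever envs/go2_env.py actually defines.
class RewardSketch:
    def __init__(self, tracking_sigma=0.25):
        self.tracking_sigma = tracking_sigma
        self.command_vel = (0.5, 0.0)   # commanded planar (x, y) velocity
        self.base_lin_vel = (0.5, 0.0)  # measured planar base velocity

    def reward_tracking_lin_vel(self):
        # Squared tracking error mapped through an exponential,
        # so perfect tracking yields a reward of 1.0.
        err = sum((c - v) ** 2 for c, v in zip(self.command_vel, self.base_lin_vel))
        return math.exp(-err / self.tracking_sigma)

r = RewardSketch()
print(r.reward_tracking_lin_vel())  # perfect tracking -> 1.0
```

The exponential shaping keeps the reward bounded in (0, 1], which tends to be easier to weight against penalty terms than a raw squared error.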

Stage 3: Reward Engineering

Start the timer.

Now feel free to adjust the reward functions and scales, but remember to run at most 4 parallel trainings at the same time on the remote machine. Stop starting new runs once (1.5 hours - pose generation - code extraction) * execution / 10 min has elapsed, but you may wait for all running jobs to finish.
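One plausible reading of that budget formula (our interpretation, not stated explicitly in the original): the remaining portion of the 1.5-hour budget is scaled by how your Stage 0 train_go2.py execution time compares to a 10-minute reference, so slower hardware gets proportionally more wall-clock time:

```python
# Hedged sketch of the run-budget arithmetic; the interpretation of
# "execution" as the Stage 0 train_go2.py wall-clock time is an assumption.
def run_budget_minutes(pose_gen_min, code_extract_min, execution_min):
    return (90.0 - pose_gen_min - code_extract_min) * execution_min / 10.0

# Example: 20 min on pose design, 30 min on code, train_go2.py took 12 min.
print(run_budget_minutes(20, 30, 12))  # -> 48.0 minutes for starting new runs
```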

Run python train_leaphand.py --eval -e EXP_NAME --ckpt NUM_CKPT to get the metrics of your BEST checkpoint.

Tips

  • Referring to the Go2 training example and thinking about how the pose configuration is used can be very helpful.
  • Add robot_scale=2 to scale the leap_hand up by a factor of 2, which helps prevent the simulation from exploding.
  • You may need to read a lot of code, but most of it comes from traditional locomotion codebases.

Submission

Pack all files in this repo into one ZIP file and submit it to us, containing:

  • code
  • configuration
  • checkpoint
  • the recorded time of each stage, in any format

Last

We appreciate your participation in our experiment. This work will be released soon.

If you have any questions or find any bugs, feel free to contact us. We sincerely apologize for any oversights caused by our limited time.
