
Question about result for Pick Environment #15

Open
ljjTYJR opened this issue May 15, 2020 · 11 comments

ljjTYJR commented May 15, 2020

Hello, I have some questions about the results for the pick_and_place environment.
I used DDPG+HER to train the agent but got a bad result (success rate = 0). I read your paper, where you said you use BC (behavioral cloning); could you give some hints or a reference for getting a good result?

@bango123 (Contributor)

Hello! For the pick environment, the state space is quite large due to the small scale of the PSM gripper and the object. I generated demonstrations of grasping by driving the tool above the object, grasping it, and then moving it to the goal. These demonstrations are then used by augmenting the policy loss in DDPG with a behavioral cloning loss. This is similar to https://arxiv.org/pdf/1709.10089.pdf
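Roughly, the combined actor loss looks like the following (a PyTorch-style sketch, not the exact code from this repo; actor, critic, demo_states, demo_actions, and the lambda weights are placeholders):

    # PyTorch-style sketch of the DDPG actor loss augmented with a behavioral-cloning
    # term and a Q-filter, in the spirit of Nair et al. 2017 (arXiv:1709.10089).
    # actor, critic, states, demo_states, demo_actions and the lambdas are placeholders;
    # the critic is assumed to return a (batch, 1) tensor of Q-values.
    def ddpg_bc_actor_loss(actor, critic, states, demo_states, demo_actions,
                           lambda_1=1.0, lambda_2=1.0):
        # Standard DDPG actor term: push the policy toward actions the critic rates highly.
        pi_s = actor(states)
        ddpg_term = -critic(states, pi_s).mean()

        # Behavioral-cloning term on demonstration pairs, with the Q-filter:
        # only clone demo actions that the critic scores above the policy's own action.
        pi_demo = actor(demo_states)
        keep = (critic(demo_states, demo_actions) > critic(demo_states, pi_demo)).float()
        bc_term = (keep * ((pi_demo - demo_actions) ** 2).sum(dim=-1, keepdim=True)).mean()

        return lambda_1 * ddpg_term + lambda_2 * bc_term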

ljjTYJR commented May 16, 2020

Hello! Thanks for the reply; I read the paper you mentioned. I wonder how you generated the demonstration data: did you just guide the arm in V-REP and collect the data?
I read in the paper that they used a VR device (HTC Vive).

@bango123 (Contributor)

The grasping task can be broken into 3 steps:

  1. Move the arm above the object and orient the gripper so it is angled toward the object.
  2. Open the gripper, move it directly over the object, and close the gripper.
  3. Move the object to the target location.

Hope this helps!

ljjTYJR commented May 17, 2020

I mean, in baselines#474 the data is generated by a script. Can I generate the data in a similar way?

@bango123 (Contributor)

Sorry for the delayed response; I had to look around. I found the generated data and have attached it as a .zip file. The code I wrote is posted below:

    # Assumes numEp_forData (number of demo episodes) and env_dvrk (the dVRL Pick
    # environment) are defined earlier in the script.
    import numpy as np

    actions = []
    observations = []
    infos = []

    for it in range(0, numEp_forData):

        episodeActs = []
        episodeObs  = []
        episodeInfo = []

        state = env_dvrk.reset()
        episodeObs.append(state)
        env_dvrk.render()

        step = 0

        # Phase 1: descend onto the object with the gripper open.
        for i in range(0, 13):
            a = [0, 0, -1, 1]

            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        # Phase 2: close the gripper to grasp the object.
        for i in range(0, 2):
            a = [0, 0, 0, -0.5]
            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        # Phase 3: servo the end effector toward the goal while keeping the jaw closed.
        while step < env_dvrk._max_episode_steps:
            goal    = state['desired_goal']
            pos_ee  = state['observation'][-3:]
            pos_obj = state['observation'][-4:]   # object position (not used below)
            action  = np.array(goal - pos_ee)

            a = np.clip([10*action[0], 10*action[1], 10*action[2], -0.5], -1, 1)
            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        actions.append(episodeActs)
        observations.append(episodeObs)
        infos.append(episodeInfo)

        print('Final Reward at {} is {}'.format(it, r))


dvrkPick.zip
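If you want to feed these episodes into baselines' HER demo buffer, you can save them to an .npz file; I believe the loader expects the 'obs', 'acs', and 'info' keys (assumed from baselines' fetch data-generation script, so double-check against your baselines version):

    # Sketch: save the collected demos for baselines' HER demo buffer.
    # The key names ('acs', 'obs', 'info') and the file name are assumptions;
    # verify them against the demo loader in your baselines version.
    import numpy as np

    np.savez_compressed('data_dvrk_pick.npz', acs=actions, obs=observations, info=infos)
    # Note: the observations are dicts, so loading the file later may require
    # np.load('data_dvrk_pick.npz', allow_pickle=True).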

ljjTYJR commented May 20, 2020

Thanks for your help.
Another question: did you run into the problem that the PSM shakes violently when the gripper grasps the object? (This happens when I run the script.)

@bango123 (Contributor)

During training, I've only seen that when trying other reward functions.

leey127 commented May 25, 2020

Your discussions above helped me a lot. Thank you!
I used OpenAI baselines' HER to train an agent in the dVRLPick environment with the demos provided by @bango123, but I still got bad results (success rate = 0, just as @ljjTYJR said). I have tested the FetchPickAndPlace-v1 environment with the same algorithm and it worked well.
Do I have to modify something in the original HER algorithm in baselines?
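Looking at the baselines code, it seems HER already has demo/BC options that may just need to be turned on; is that the intended way to use your demos? (The parameter names below are what I understand from baselines/her/experiment/config.py; please correct me if I've misread them.)

    # Demo/BC settings that appear to ship with baselines' HER (names assumed from
    # baselines/her/experiment/config.py; verify against your baselines version).
    demo_params = {
        'bc_loss': 1,     # add the behavioral-cloning term to the actor loss
        'q_filter': 1,    # only clone demo actions the critic scores above the policy's
        'num_demo': 100,  # number of demonstration episodes to load
    }
    # After setting these, training would be pointed at the demo file, e.g.:
    #   python -m baselines.run --alg=her --env=dVRLPick-v0 \
    #          --num_timesteps=2.5e6 --demo_file=/path/to/data_dvrk_pick.npz
    # (the env id and paths here are placeholders)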
By the way, I had previously commented out "vrep.simxFinish(self.clientID)" (#11); only after that could I get a better result in the dVRLReach environment. Would this modification influence the training process?

I'm very confused about these problems and am hoping for your reply.

ljjTYJR commented May 25, 2020

@leey127
Hello, I also ran into this problem. I suspect it may be caused by the V-REP simulation: the pick environment uses two sensors to detect the grasp of the object, but I find them a little difficult to trigger.

leey127 commented May 27, 2020

Do you have any suggestions about this problem? @bango123

@bango123 (Contributor)

I would confirm that the code I shared in the previous comment is able to solve the environment. Basically, hand-craft a policy to solve it to confirm there are no other issues.
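For example, a quick check could look something like this (a sketch; run_scripted_episode is a hypothetical wrapper around the scripted loop from my earlier comment, and it assumes the final info dict carries an 'is_success' flag as in the gym Fetch environments):

    # Sanity check: roll out the hand-crafted policy and count successes.
    # run_scripted_episode is a hypothetical wrapper around the scripted loop above
    # that returns the last info dict of the episode; 'is_success' is assumed to be
    # reported there, as in gym's Fetch environments.
    n_episodes = 20
    successes = 0
    for ep in range(n_episodes):
        final_info = run_scripted_episode(env_dvrk)
        successes += int(final_info['is_success'])
    print('Scripted policy success rate: {:.2f}'.format(successes / n_episodes))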
