
Question about result for Pick Environment #15

Open
ljjTYJR opened this issue May 15, 2020 · 11 comments

ljjTYJR commented May 15, 2020

Hello, I have some questions about the results for the pick_and_place environment.
I used DDPG+HER to train the agent but got a bad result (success rate = 0). I read your paper, where you said you use BC (behavioral cloning); could you give some hints or a reference for getting a good result?

@bango123 (Contributor)

Hello! For the pick environment, the state space is quite large due to the small scale of the PSM gripper and the object. I generated demonstrations of grasping by driving the tool above the object, grasping it, and then moving it to the goal. These demonstrations are then used by augmenting the policy loss in DDPG with a behavioral cloning loss. This is similar to https://arxiv.org/pdf/1709.10089.pdf
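Roughly, the combined actor loss looks like the following (a PyTorch-style sketch, not the exact code from this repo; actor, critic, demo_states, demo_actions, and the lambda weights are placeholders):

    # PyTorch-style sketch of the DDPG actor loss augmented with a behavioral-cloning
    # term and a Q-filter, in the spirit of Nair et al. 2017 (arXiv:1709.10089).
    # actor, critic, states, demo_states, demo_actions and the lambdas are placeholders;
    # the critic is assumed to return a (batch, 1) tensor of Q-values.
    def ddpg_bc_actor_loss(actor, critic, states, demo_states, demo_actions,
                           lambda_1=1.0, lambda_2=1.0):
        # Standard DDPG actor term: push the policy toward actions the critic rates highly.
        pi_s = actor(states)
        ddpg_term = -critic(states, pi_s).mean()

        # Behavioral-cloning term on demonstration pairs, with the Q-filter:
        # only clone demo actions that the critic scores above the policy's own action.
        pi_demo = actor(demo_states)
        keep = (critic(demo_states, demo_actions) > critic(demo_states, pi_demo)).float()
        bc_term = (keep * ((pi_demo - demo_actions) ** 2).sum(dim=-1, keepdim=True)).mean()

        return lambda_1 * ddpg_term + lambda_2 * bc_term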

ljjTYJR commented May 16, 2020

Hello! Thanks for the reply; I read the paper you mentioned. I wonder how you generated the demonstration data: did you just guide the arm in V-REP and collect the data?
I read in the paper that they used a VR device (HTC Vive).

@bango123 (Contributor)

The grasping task can be broken into 3 steps:

  1. Move the arm above the object and orient the gripper so it is angled toward the object.
  2. Open the gripper, move it directly over the object, and close the gripper.
  3. Move the object to the target location.

Hope this helps!

ljjTYJR commented May 17, 2020

I mean, in baselines#474 the data is generated by a script. Can I generate the data in a similar way?

@bango123 (Contributor)

Sorry for the delayed response; I had to look around. I found the generated data and have attached it as a .zip file. The code I wrote is posted below:

    # Assumes numEp_forData (number of demo episodes) and env_dvrk (the dVRL Pick
    # environment) are defined earlier in the script.
    import numpy as np

    actions = []
    observations = []
    infos = []

    for it in range(0, numEp_forData):

        episodeActs = []
        episodeObs  = []
        episodeInfo = []

        state = env_dvrk.reset()
        episodeObs.append(state)
        env_dvrk.render()

        step = 0

        # Phase 1: descend onto the object with the gripper open.
        for i in range(0, 13):
            a = [0, 0, -1, 1]

            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        # Phase 2: close the gripper to grasp the object.
        for i in range(0, 2):
            a = [0, 0, 0, -0.5]
            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        # Phase 3: servo the end effector toward the goal while keeping the jaw closed.
        while step < env_dvrk._max_episode_steps:
            goal    = state['desired_goal']
            pos_ee  = state['observation'][-3:]
            pos_obj = state['observation'][-4:]   # object position (not used below)
            action  = np.array(goal - pos_ee)

            a = np.clip([10*action[0], 10*action[1], 10*action[2], -0.5], -1, 1)
            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        actions.append(episodeActs)
        observations.append(episodeObs)
        infos.append(episodeInfo)

        print('Final Reward at {} is {}'.format(it, r))


dvrkPick.zip
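If you want to feed these episodes into baselines' HER demo buffer, you can save them to an .npz file; I believe the loader expects the 'obs', 'acs', and 'info' keys (assumed from baselines' fetch data-generation script, so double-check against your baselines version):

    # Sketch: save the collected demos for baselines' HER demo buffer.
    # The key names ('acs', 'obs', 'info') and the file name are assumptions;
    # verify them against the demo loader in your baselines version.
    import numpy as np

    np.savez_compressed('data_dvrk_pick.npz', acs=actions, obs=observations, info=infos)
    # Note: the observations are dicts, so loading the file later may require
    # np.load('data_dvrk_pick.npz', allow_pickle=True).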

ljjTYJR commented May 20, 2020

Thanks for your help.
Another question: did you run into the problem that the PSM shakes violently when the gripper grasps the object? (This happens when I run the script.)

@bango123 (Contributor)

During training, I've only seen that when trying other reward functions.

leey127 commented May 25, 2020

Your discussions above helped me a lot. Thank you!
I used OpenAI baselines' HER to train an agent in the dVRLPick environment with the demos provided by @bango123, but I still got bad results (success rate = 0, just as @ljjTYJR said). I have tested the FetchPickAndPlace-v1 environment with the same algorithm and it worked well.
Do I have to modify something in the original HER algorithm in baselines?
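Looking at the baselines code, it seems HER already has demo/BC options that may just need to be turned on; is that the intended way to use your demos? (The parameter names below are what I understand from baselines/her/experiment/config.py; please correct me if I've misread them.)

    # Demo/BC settings that appear to ship with baselines' HER (names assumed from
    # baselines/her/experiment/config.py; verify against your baselines version).
    demo_params = {
        'bc_loss': 1,     # add the behavioral-cloning term to the actor loss
        'q_filter': 1,    # only clone demo actions the critic scores above the policy's
        'num_demo': 100,  # number of demonstration episodes to load
    }
    # After setting these, training would be pointed at the demo file, e.g.:
    #   python -m baselines.run --alg=her --env=dVRLPick-v0 \
    #          --num_timesteps=2.5e6 --demo_file=/path/to/data_dvrk_pick.npz
    # (the env id and paths here are placeholders)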
By the way, I had previously commented out "vrep.simxFinish(self.clientID)" (#11); only after that could I get a better result in the dVRLReach environment. Would this modification influence the training process?

I'm very confused about these problems and am hoping for your reply.

ljjTYJR commented May 25, 2020

@leey127
Hello, I also ran into this problem. I suspect it may be caused by the V-REP simulation: the pick environment uses two sensors to detect the grasp of the object, but I find them a little difficult to trigger.

leey127 commented May 27, 2020

Do you have any suggestions about this problem? @bango123

@bango123 (Contributor)

I would confirm that the code I shared in the previous comment is able to solve the environment. Basically, hand-craft a policy to solve it to confirm there are no other issues.
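For example, a quick check could look something like this (a sketch; run_scripted_episode is a hypothetical wrapper around the scripted loop from my earlier comment, and it assumes the final info dict carries an 'is_success' flag as in the gym Fetch environments):

    # Sanity check: roll out the hand-crafted policy and count successes.
    # run_scripted_episode is a hypothetical wrapper around the scripted loop above
    # that returns the last info dict of the episode; 'is_success' is assumed to be
    # reported there, as in gym's Fetch environments.
    n_episodes = 20
    successes = 0
    for ep in range(n_episodes):
        final_info = run_scripted_episode(env_dvrk)
        successes += int(final_info['is_success'])
    print('Scripted policy success rate: {:.2f}'.format(successes / n_episodes))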
