
[QUESTION] len(observation_shape) == 1 #397

Closed
SebyR opened this issue May 21, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@SebyR

SebyR commented May 21, 2024

Hello, I am a high school student and still new to this. I've run into the following problem. I made a custom env:

import numpy as np
import pybullet as p
import pybullet_data
import random
import math
import time
import pygame
from gymnasium.spaces import Box
from gymnasium import Env
import cv2

distance = 100000
img_w, img_h = 32, 32
num_of_frames = 2

class hEnv(Env):
    def __init__(self, use_controll=True, render=True):
        super(hEnv, self).__init__()
        self.use_controll = use_controll
        self.render = render
        if self.render:
            p.connect(p.GUI)
        else:
            p.connect(p.DIRECT)
        self.observation_space = Box(low=0, high=255, shape=(num_of_frames-1, 3, img_h, img_w), dtype=np.uint8)  # stacked frames, channels-first: (num_of_frames - 1, 3, H, W)
        self.action_space = Box(low=-10, high=10, shape=(4,), dtype=np.float32)
        self.steps = 0
        self.obs = [np.zeros((3, img_h, img_w), dtype=np.uint8) for _ in range(num_of_frames - 1)]  # one (3, H, W) frame per stacked slot
        print(np.array(self.obs).shape)
        if self.use_controll:
            self.xbox()
        self.frame = 0
        if self.use_controll and pygame.joystick.get_count() == 0:
            print("No joystick")

    def xbox(self):
        pygame.init()
        pygame.joystick.init()
        self.joysticks = [pygame.joystick.Joystick(x) for x in range(pygame.joystick.get_count())]

    def load_model(self):
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        self.plane = p.loadURDF("plane.urdf", basePosition=[0, 0, 0])
        self.model = p.loadURDF("husky/husky.urdf", basePosition=[0, 0, 0])
        self.walls = p.loadURDF("D:/Proiecte/D3RLPY/D3RLPY/URDF/walls/walls.urdf", basePosition=[0, 0, 0])
        for i in range(p.getNumJoints(self.model)):
            print(p.getJointInfo(self.model,i))
        p.setGravity(0,0,-9.8)

    def step(self, action):
        reward, done = self.get_reward_done()
        obs = self.get_observation()
        for i in range(2, 6):
            p.setJointMotorControl2(self.model, i, p.VELOCITY_CONTROL, targetVelocity=action[i - 2])
            p.stepSimulation()
        self.steps += 1
        info = {}
        truncated = False
        print(reward)
        return obs, reward, done, truncated, info

    def controll(self):
        if not self.use_controll:
            return [0, 0, 0, 0]

        pygame.event.get()
        left_joystick_value = pygame.joystick.Joystick(0).get_axis(1) * -10
        right_joystick_value = pygame.joystick.Joystick(0).get_axis(3) * -10

        print(left_joystick_value, right_joystick_value)

        output = [left_joystick_value, right_joystick_value, left_joystick_value, right_joystick_value]
        return output

    def make_photo(self):
        agent_pos, agent_orn = list(p.getLinkState(self.model, 8, computeForwardKinematics=True))[:2]
        yaw = p.getEulerFromQuaternion(agent_orn)[-1]
        xA, yA, zA = agent_pos
        zA = zA + 0.3
        xB = xA + math.cos(yaw) * distance
        yB = yA + math.sin(yaw) * distance
        zB = zA
        view_matrix = p.computeViewMatrix(
            cameraEyePosition=[xA, yA, zA],
            cameraTargetPosition=[xB, yB, zB],
            cameraUpVector=[0, 0, 1.0]
        )
        projection_matrix = p.computeProjectionMatrixFOV(
            fov=90, aspect=1.5, nearVal=0.02, farVal=25)
        self.rgb_image = p.getCameraImage(img_w, img_h, view_matrix, projection_matrix, shadow=True,
                                          renderer=p.ER_BULLET_HARDWARE_OPENGL)
        rgba = np.reshape(self.rgb_image[2], (img_h, img_w, 4))  # pybullet returns an H x W x RGBA buffer
        self.rgb_imageF = np.transpose(rgba[:, :, :3], (2, 0, 1)).astype(np.uint8)  # drop alpha, make channels-first (3, H, W)
        return self.rgb_imageF

    def get_observation(self):
        obs = [self.make_photo() for _ in range(num_of_frames - 1)]
        return np.array(obs)  # Convert list of observations to a numpy array

    def spawn_target(self):
        random_x = random.uniform(-5.0,5.0)
        random_y = random.uniform(-5.0, 5.0)
        if 0 < random_x < 2: random_x = 2
        if -2 < random_x < 0: random_x = -2
        if 0 < random_y < 2: random_y = 2
        if -2 < random_y < 0: random_y = -2
        base_position = [random_x, random_y, 0.5]
        self.target = p.loadURDF('D:/Proiecte/D3RLPY/D3RLPY/URDF/walls/cubeRed.urdf', basePosition = base_position)

    def num_to_range(self,num, inMin, inMax, outMin, outMax):
        return outMin + (float(num - inMin) / float(inMax - inMin) * (outMax - outMin) * 1.0)

    def get_reward_done(self):
        joint_info = p.getBasePositionAndOrientation(self.model)
        x1,y1,_ = joint_info[0]

        base_position = p.getBasePositionAndOrientation(self.target)
        x2,y2,_ = base_position[0]

        distance = abs(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5)
        done = False

        rosu = np.reshape(self.rgb_image[2], (img_h, img_w, 4))
        rosu = rosu[:, :, :3].astype(np.uint8)  # keep only RGB, drop the alpha channel
        # pybullet already returns RGB, so channel 0 is red; no cv2.cvtColor is needed here

        lower_red = np.array([100, 0, 0])
        upper_red = np.array([255, 0, 0])
        red_mask = cv2.inRange(rosu, lower_red, upper_red)
        rosu_cantitate = np.count_nonzero(red_mask)
        rosu_cantitate = self.num_to_range(rosu_cantitate, 0, img_h*img_w, 0, 5)

        reward = rosu_cantitate
        if rosu_cantitate < 1:
            rosu_cantitate = 0

        if distance < 1.25 and rosu_cantitate > 2:
            done = True
            reward = rosu_cantitate
        if self.steps >= 1250:
            done = True
            reward = rosu_cantitate
        return reward, done

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        if p.isConnected():
            p.disconnect()
            time.sleep(1)
        if self.render:
            p.connect(p.GUI)
        else:
            p.connect(p.DIRECT)
        p.resetDebugVisualizerCamera(cameraDistance=7.5, cameraYaw=0, cameraPitch=-40, cameraTargetPosition=[0, 0, 0])
        self.steps = 0
        self.load_model()
        self.spawn_target()
        if self.use_controll:
            self.xbox()
        obs = [self.make_photo() for _ in range(num_of_frames-1)]
        info = {}
        return obs, info

I also wrote a piece of code that saves some demonstrations, like so:

import numpy as np
import pickle
from h_env import hEnv

# Define the number of episodes and steps per episode
num_episodes = 10

# Initialize environment
env = hEnv(use_controll=True, render=True)

# Initialize dataset

observations = []
actions = []
rewards = []
terminals = []

done = False
# Collect data
for episode in range(num_episodes):
    done = False
    obs, _ = env.reset()
    while not done:
        # Joystick-controlled action for data collection (you can use a policy if you have one)
        action = env.controll()

        # Step the environment
        next_obs, reward, done, _, _ = env.step(action)

        # Append to dataset
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        terminals.append(done)

        obs = next_obs

data_file = 'collected_data.pkl'

# Save the collected data
with open(data_file, 'wb') as f:
    data_dict = {
        'observations': observations,
        'actions': actions,
        'rewards': rewards,
        'terminals': terminals
    }
    pickle.dump(data_dict, f)

And finally I have the training code that looks like this:

import pickle
import numpy as np
from d3rlpy.dataset import MDPDataset
from h_env import hEnv
import time

# Load the dataset
data_file = 'collected_data.pkl'

# Load the collected data
with open(data_file, 'rb') as f:
    data_dict = pickle.load(f)
    observations = data_dict['observations']
    actions = data_dict['actions']
    rewards = data_dict['rewards']
    terminals = data_dict['terminals']


# Initialize environment
env = hEnv(use_controll=False, render=True)


mdp_dataset = MDPDataset(observations=np.array(observations, dtype=np.uint8),
                         actions=np.array(actions,dtype=np.float32),
                         rewards=np.array(rewards,dtype=np.float32),
                         terminals=np.array(terminals,dtype=np.float32))

time.sleep(20)

import d3rlpy

sac = d3rlpy.algos.SACConfig().create(device="cuda:0")

#  Pretrain the algorithm
# sac.fit(mdp_dataset, n_steps=200, n_steps_per_epoch=10)
#
#  Save the pretrained model
# sac.save_model('pretrained_ddpg.pt')
sac.build_with_dataset(mdp_dataset)

sac.fit(mdp_dataset, n_steps=200, n_steps_per_epoch=10)

# sac.fit_online(env, n_steps=200, n_steps_per_epoch=10)

sac.save_model('trained_ddpg.pt')

I don't know how to solve the observation_shape problem, because the logged output looks fine:

2024-05-21 17:40.33 [info     ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(4,)]) observation_signature=Signature(dtype=[dtype('uint8')], shape=[(1, 3, 32, 32)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-05-21 17:40.33 [info     ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1>
2024-05-21 17:40.33 [info     ] Action size has been automatically determined. action_size=4

but I still get a long error ending in assert len(observation_shape) == 1.
Please help me and thank you in advance!

@SebyR SebyR added the bug Something isn't working label May 21, 2024
@SebyR SebyR changed the title [QUESTION] bug title [QUESTION] len(observation_shape) == 1 May 21, 2024
@takuseno
Owner

takuseno commented Jun 1, 2024

@SebyR Hi, sorry for the late response. Can you share a full stack trace to see where you're getting the error from?

@SebyR
Author

SebyR commented Jun 14, 2024

I figured it out: instead of passing n frames, I now pass just one frame, so the new observation space (in my env) is (3, 64, 64) or something like that, instead of (n, 3, 64, 64). I still want to be able to pass an image and other data at the same time, but I am not sure how I could do this. I will mark this as closed, but I would really appreciate it if you could tell me how to use an image together with, let's say, sensor data at the same time. Should I just add another channel, like (4, 64, 64), and put the data there?
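For reference, a minimal sketch of the fix (the 64x64 size follows the comment above; the code posted earlier used 32x32): the environment now reports a single channels-first frame instead of a stack of frames.

import numpy as np
from gymnasium.spaces import Box

# One channels-first frame, shape (3, H, W), instead of the stacked
# (num_of_frames - 1, 3, H, W) shape that produced a 4D per-step observation.
observation_space = Box(low=0, high=255, shape=(3, 64, 64), dtype=np.uint8)
print(observation_space.sample().shape)  # (3, 64, 64)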

@SebyR SebyR closed this as completed Jun 14, 2024
@takuseno
Owner

Hi, good to hear that you figured it out by yourself.

I would really appreciate it if you could tell me how to use an image together with, let's say, sensor data at the same time. Should I just add another channel, like (4, 64, 64), and put the data there?

I couldn't quite follow this sentence, but is this what you're looking for?
https://github.com/takuseno/d3rlpy/blob/master/examples/tuple_observation.py

d3rlpy supports tuple observations.
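A minimal sketch along the lines of that example (the array shapes and random data here are placeholders; see tuple_observation.py for the exact usage): each modality is passed as its own array, and the observation becomes a list of arrays.

import numpy as np
import d3rlpy

# Placeholder data: 1000 steps of a (3, 64, 64) image plus a 6-dim sensor vector.
n = 1000
images = np.random.randint(0, 255, size=(n, 3, 64, 64), dtype=np.uint8)
sensors = np.random.random((n, 6)).astype(np.float32)
actions = np.random.random((n, 4)).astype(np.float32)
rewards = np.random.random((n, 1)).astype(np.float32)
terminals = np.zeros(n, dtype=np.float32)
terminals[-1] = 1.0  # mark the end of the single placeholder episode

# Tuple observation: a list of arrays, one entry per modality.
dataset = d3rlpy.dataset.MDPDataset(
    observations=[images, sensors],
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)

The linked example is the authoritative reference for how encoders are configured for each element of the tuple.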

@SebyR
Author

SebyR commented Jun 16, 2024

I was wondering if I could use more complex observation spaces. I am trying to train a model for a humanoid robot, and the observation is made up of an image and accelerometer data; I didn't know how to structure the observation space for this. Thank you for the reply!

@takuseno
Owner

Yeah, you should be able to use tuple observations with d3rlpy. How to structure the observation space for a specific task is out of scope for d3rlpy, so please ask that question in the relevant repositories.
