
[QUESTION] len(observation_shape) == 1 #397

Closed
SebyR opened this issue May 21, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@SebyR

SebyR commented May 21, 2024

Hello, I am a high school student and still new to this. I've run into the following problem. I made a custom env:

import numpy as np
import pybullet as p
import pybullet_data
import random
import math
import time
import pygame
from gymnasium.spaces import Box
from gymnasium import Env
import cv2

distance = 100000
img_w, img_h = 32, 32
num_of_frames = 2

class hEnv(Env):
    def __init__(self, use_controll=True, render=True):
        super(hEnv, self).__init__()
        self.use_controll = use_controll
        self.render = render
        if self.render:
            p.connect(p.GUI)
        else:
            p.connect(p.DIRECT)
        self.observation_space = Box(low=0, high=255, shape=(num_of_frames-1, 3, img_h, img_w), dtype=np.uint8)  # stacked frames, channels-first: (num_of_frames - 1, 3, H, W)
        self.action_space = Box(low=-10, high=10, shape=(4,), dtype=np.float32)
        self.steps = 0
        self.obs = [np.zeros((3, img_h, img_w), dtype=np.uint8) for _ in range(num_of_frames - 1)]  # one (3, H, W) frame per stacked slot
        print(np.array(self.obs).shape)
        if self.use_controll:
            self.xbox()
        self.frame = 0
        if self.use_controll and pygame.joystick.get_count() == 0:
            print("No joystick")

    def xbox(self):
        pygame.init()
        pygame.joystick.init()
        self.joysticks = [pygame.joystick.Joystick(x) for x in range(pygame.joystick.get_count())]

    def load_model(self):
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        self.plane = p.loadURDF("plane.urdf", basePosition=[0, 0, 0])
        self.model = p.loadURDF("husky/husky.urdf", basePosition=[0, 0, 0])
        self.walls = p.loadURDF("D:/Proiecte/D3RLPY/D3RLPY/URDF/walls/walls.urdf", basePosition=[0, 0, 0])
        for i in range(p.getNumJoints(self.model)):
            print(p.getJointInfo(self.model,i))
        p.setGravity(0,0,-9.8)

    def step(self, action):
        reward, done = self.get_reward_done()
        obs = self.get_observation()
        for i in range(2, 6):
            p.setJointMotorControl2(self.model, i, p.VELOCITY_CONTROL, targetVelocity=action[i - 2])
            p.stepSimulation()
        self.steps += 1
        info = {}
        truncated = False
        print(reward)
        return obs, reward, done, truncated, info

    def controll(self):
        if not self.use_controll:
            return [0, 0, 0, 0]

        pygame.event.get()
        left_joystick_value = pygame.joystick.Joystick(0).get_axis(1) * -10
        right_joystick_value = pygame.joystick.Joystick(0).get_axis(3) * -10

        print(left_joystick_value, right_joystick_value)

        output = [left_joystick_value, right_joystick_value, left_joystick_value, right_joystick_value]
        return output

    def make_photo(self):
        agent_pos, agent_orn = list(p.getLinkState(self.model, 8, computeForwardKinematics=True))[:2]
        yaw = p.getEulerFromQuaternion(agent_orn)[-1]
        xA, yA, zA = agent_pos
        zA = zA + 0.3
        xB = xA + math.cos(yaw) * distance
        yB = yA + math.sin(yaw) * distance
        zB = zA
        view_matrix = p.computeViewMatrix(
            cameraEyePosition=[xA, yA, zA],
            cameraTargetPosition=[xB, yB, zB],
            cameraUpVector=[0, 0, 1.0]
        )
        projection_matrix = p.computeProjectionMatrixFOV(
            fov=90, aspect=1.5, nearVal=0.02, farVal=25)
        self.rgb_image = p.getCameraImage(img_w, img_h, view_matrix, projection_matrix, shadow=True,
                                          renderer=p.ER_BULLET_HARDWARE_OPENGL)
        rgba = np.reshape(self.rgb_image[2], (img_h, img_w, 4))  # pybullet returns an H x W x RGBA buffer
        self.rgb_imageF = np.transpose(rgba[:, :, :3], (2, 0, 1)).astype(np.uint8)  # drop alpha, make channels-first (3, H, W)
        return self.rgb_imageF

    def get_observation(self):
        obs = [self.make_photo() for _ in range(num_of_frames - 1)]
        return np.array(obs)  # Convert list of observations to a numpy array

    def spawn_target(self):
        random_x = random.uniform(-5.0,5.0)
        random_y = random.uniform(-5.0, 5.0)
        if 0 < random_x < 2: random_x = 2
        if -2 < random_x < 0: random_x = -2
        if 0 < random_y < 2: random_y = 2
        if -2 < random_y < 0: random_y = -2
        base_position = [random_x, random_y, 0.5]
        self.target = p.loadURDF('D:/Proiecte/D3RLPY/D3RLPY/URDF/walls/cubeRed.urdf', basePosition = base_position)

    def num_to_range(self,num, inMin, inMax, outMin, outMax):
        return outMin + (float(num - inMin) / float(inMax - inMin) * (outMax - outMin) * 1.0)

    def get_reward_done(self):
        joint_info = p.getBasePositionAndOrientation(self.model)
        x1,y1,_ = joint_info[0]

        base_position = p.getBasePositionAndOrientation(self.target)
        x2,y2,_ = base_position[0]

        distance = abs(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5)
        done = False

        rosu = np.reshape(self.rgb_image[2], (img_h, img_w, 4))
        rosu = rosu[:, :, :3].astype(np.uint8)  # keep only RGB, drop the alpha channel
        # pybullet already returns RGB, so channel 0 is red; no cv2.cvtColor is needed here

        lower_red = np.array([100, 0, 0])
        upper_red = np.array([255, 0, 0])
        red_mask = cv2.inRange(rosu, lower_red, upper_red)
        rosu_cantitate = np.count_nonzero(red_mask)
        rosu_cantitate = self.num_to_range(rosu_cantitate, 0, img_h*img_w, 0, 5)

        reward = rosu_cantitate
        if rosu_cantitate < 1:
            rosu_cantitate = 0

        if distance < 1.25 and rosu_cantitate > 2:
            done = True
            reward = rosu_cantitate
        if self.steps >= 1250:
            done = True
            reward = rosu_cantitate
        return reward, done

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        if p.isConnected():
            p.disconnect()
            time.sleep(1)
        if self.render:
            p.connect(p.GUI)
        else:
            p.connect(p.DIRECT)
        p.resetDebugVisualizerCamera(cameraDistance=7.5, cameraYaw=0, cameraPitch=-40, cameraTargetPosition=[0, 0, 0])
        self.steps = 0
        self.load_model()
        self.spawn_target()
        if self.use_controll:
            self.xbox()
        obs = [self.make_photo() for _ in range(num_of_frames-1)]
        info = {}
        return obs, info

I also wrote a piece of code that saves some demonstrations, like so:

import numpy as np
import pickle
from h_env import hEnv

# Define the number of episodes and steps per episode
num_episodes = 10

# Initialize environment
env = hEnv(use_controll=True, render=True)

# Initialize dataset

observations = []
actions = []
rewards = []
terminals = []

done = False
# Collect data
for episode in range(num_episodes):
    done = False
    obs, _ = env.reset()
    while not done:
        # Joystick-controlled action for data collection (you can use a policy if you have one)
        action = env.controll()

        # Step the environment
        next_obs, reward, done, _, _ = env.step(action)

        # Append to dataset
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        terminals.append(done)

        obs = next_obs

data_file = 'collected_data.pkl'

# Save the collected data
with open(data_file, 'wb') as f:
    data_dict = {
        'observations': observations,
        'actions': actions,
        'rewards': rewards,
        'terminals': terminals
    }
    pickle.dump(data_dict, f)

And finally I have the training code that looks like this:

import pickle
import numpy as np
from d3rlpy.dataset import MDPDataset
from h_env import hEnv
import time

# Load the dataset
data_file = 'collected_data.pkl'

# Load the collected data
with open(data_file, 'rb') as f:
    data_dict = pickle.load(f)
    observations = data_dict['observations']
    actions = data_dict['actions']
    rewards = data_dict['rewards']
    terminals = data_dict['terminals']


# Initialize environment
env = hEnv(use_controll=False, render=True)


mdp_dataset = MDPDataset(observations=np.array(observations, dtype=np.uint8),
                         actions=np.array(actions,dtype=np.float32),
                         rewards=np.array(rewards,dtype=np.float32),
                         terminals=np.array(terminals,dtype=np.float32))

time.sleep(20)

import d3rlpy

sac = d3rlpy.algos.SACConfig().create(device="cuda:0")

#  Pretrain the algorithm
# sac.fit(mdp_dataset, n_steps=200, n_steps_per_epoch=10)
#
#  Save the pretrained model
# sac.save_model('pretrained_ddpg.pt')
sac.build_with_dataset(mdp_dataset)

sac.fit(mdp_dataset, n_steps=200, n_steps_per_epoch=10)

# sac.fit_online(env, n_steps=200, n_steps_per_epoch=10)

sac.save_model('trained_ddpg.pt')

I don't know how to solve the observation_shape problem, because the logged output looks fine:

2024-05-21 17:40.33 [info     ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(4,)]) observation_signature=Signature(dtype=[dtype('uint8')], shape=[(1, 3, 32, 32)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-05-21 17:40.33 [info     ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1>
2024-05-21 17:40.33 [info     ] Action size has been automatically determined. action_size=4

but I still get a long error ending in assert len(observation_shape) == 1.
Please help me and thank you in advance!

@SebyR SebyR added the bug Something isn't working label May 21, 2024
@SebyR SebyR changed the title [QUESTION] bug title [QUESTION] len(observation_shape) == 1 May 21, 2024
@takuseno
Owner

takuseno commented Jun 1, 2024

@SebyR Hi, sorry for the late response. Can you share a full stack trace to see where you're getting the error from?

@SebyR
Author

SebyR commented Jun 14, 2024

I figured it out: instead of passing n frames, I now pass just one frame, so the new observation space (in my env) is (3, 64, 64) or something like that, instead of (n, 3, 64, 64). I still want to be able to pass an image and other data at the same time, but I am not sure how I could do this. I will mark this as closed, but I would really appreciate it if you could tell me how to use an image together with, let's say, sensor data at the same time. Should I just add another channel, like (4, 64, 64), and put the data there?
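For reference, a minimal sketch of the fix (the 64x64 size follows the comment above; the code posted earlier used 32x32): the environment now reports a single channels-first frame instead of a stack of frames.

import numpy as np
from gymnasium.spaces import Box

# One channels-first frame, shape (3, H, W), instead of the stacked
# (num_of_frames - 1, 3, H, W) shape that produced a 4D per-step observation.
observation_space = Box(low=0, high=255, shape=(3, 64, 64), dtype=np.uint8)
print(observation_space.sample().shape)  # (3, 64, 64)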

@SebyR SebyR closed this as completed Jun 14, 2024
@takuseno
Owner

Hi, good to hear that you figured it out by yourself.

I would really appreciate it if you could tell me how to use an image together with, let's say, sensor data at the same time. Should I just add another channel, like (4, 64, 64), and put the data there?

I couldn't quite follow this sentence, but is this what you're looking for?
https://github.com/takuseno/d3rlpy/blob/master/examples/tuple_observation.py

d3rlpy supports tuple observations.
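A minimal sketch along the lines of that example (the array shapes and random data here are placeholders; see tuple_observation.py for the exact usage): each modality is passed as its own array, and the observation becomes a list of arrays.

import numpy as np
import d3rlpy

# Placeholder data: 1000 steps of a (3, 64, 64) image plus a 6-dim sensor vector.
n = 1000
images = np.random.randint(0, 255, size=(n, 3, 64, 64), dtype=np.uint8)
sensors = np.random.random((n, 6)).astype(np.float32)
actions = np.random.random((n, 4)).astype(np.float32)
rewards = np.random.random((n, 1)).astype(np.float32)
terminals = np.zeros(n, dtype=np.float32)
terminals[-1] = 1.0  # mark the end of the single placeholder episode

# Tuple observation: a list of arrays, one entry per modality.
dataset = d3rlpy.dataset.MDPDataset(
    observations=[images, sensors],
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)

The linked example is the authoritative reference for how encoders are configured for each element of the tuple.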

@SebyR
Author

SebyR commented Jun 16, 2024

I was wondering if I could use more complex observation spaces. I am trying to train a model for a humanoid robot, and the observation is made up of an image and accelerometer data; I didn't know how to structure the observation space for this. Thank you for the reply!

@takuseno
Owner

Yeah, you should be able to use tuple observations with d3rlpy. How to structure the observation space for a specific task is out of scope for d3rlpy, so please ask that question in the relevant repositories.
