
[QUESTION] multidimensional states and actions #391

Closed · bzeni1 opened this issue May 1, 2024 · 8 comments
Labels: bug (Something isn't working)

bzeni1 commented May 1, 2024

When attempting to create an MDPDataset in d3rlpy with consistently shaped data, for example (100, 5) observations, (100, 5) actions, (100,) rewards, (100, 5) next observations, and (100,) terminals, I encounter this error: "ValueError: operands could not be broadcast together with shapes (500,) (100,)." Note that 500 = 100 × 5, as if one of the 2-D arrays had been flattened somewhere inside d3rlpy during dataset creation, even though the input dimensions are matched. It looks as though the library interprets or handles multidimensional input shapes for MDPDataset incorrectly, potentially a bug.

When I select only one feature, I am able to create the MDPDataset with shapes like (100, 1). However, I then hit another error when I try to use, for example, the DDPG model: the message says that DDPG requires 'config' and 'device' arguments, but the documentation does not list these for DDPG(). When I instead pass the arguments from the documentation, I get an 'unexpected keyword argument' error.

Do you think this could be a problem with the library? I have already tried several Python environments and got the same errors.
The library version is d3rlpy 2.4.0.

bzeni1 added the bug label May 1, 2024
takuseno (Owner) commented May 5, 2024

@bzeni1 Hi, could you share a minimal example so that I can reproduce your issue? It sounds like your code is simply incorrect.

By the way, when you instantiate algorithms, you need to do it as follows:

ddpg = d3rlpy.algos.DDPGConfig().create()
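
For context, a minimal sketch of this v2-style instantiation (assuming d3rlpy 2.4.0; the hyperparameters shown are illustrative, not required):

import d3rlpy

# v2 API: hyperparameters live on the config object,
# and create() builds the algorithm itself
ddpg = d3rlpy.algos.DDPGConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
).create(device=False)  # False = CPU; pass e.g. "cuda:0" for GPU

This resolves the 'config'/'device' confusion from the original question: the DDPG object is constructed by the library from a DDPGConfig rather than being called directly with keyword arguments.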

bzeni1 (Author) commented May 5, 2024

@takuseno Hi, please find my code below. What could be the problem? Thank you in advance for your help.

processed_data = race_data.copy()
settings_columns = [...]  # 22 selected columns from my dataset
processed_data['reward'] = processed_data['ACCELERATION_m_s2']

for col in settings_columns:
    processed_data[f'state_{col}'] = processed_data[col]
    

for col in settings_columns:
    processed_data[f'action_{col}'] = processed_data[col].diff().fillna(0)


for col in settings_columns:
    processed_data[f'next_{col}'] = processed_data[col].shift(-1)

# end of an episode (race)
processed_data['done'] = processed_data['race_num'].diff(-1) != 0

processed_data = processed_data[processed_data['done'] == False]

print("After filtering rows:", processed_data.shape)

print("States shape:", processed_data[settings_columns].shape)
print("Actions shape:", processed_data[[f'action_{col}' for col in settings_columns]].shape)
print("Rewards shape:", processed_data['reward'].shape)
print("Next states shape:", processed_data[[f'next_{col}' for col in settings_columns]].shape)
print("Dones shape:", processed_data['done'].shape)

Output:
States shape: (17642, 22)
Actions shape: (17642, 22)
Rewards shape: (17642,)
Next states shape: (17642, 22)
Dones shape: (17642,)

states = processed_data[[f'state_{col}' for col in settings_columns if f'state_{col}' in processed_data.columns]].to_numpy()
actions = processed_data[[col for col in processed_data.columns if col.startswith('action_')]].to_numpy()
rewards = processed_data['reward'].to_numpy()
next_states = processed_data[[col for col in processed_data.columns if col.startswith('next_')]].to_numpy()
dones = processed_data['done'].to_numpy()

#next step:

dataset = MDPDataset(states, actions, rewards, next_states, dones)

#ValueError: operands could not be broadcast together with shapes (388124,) (17642,) 

takuseno (Owner) commented May 5, 2024

Thanks for sharing your code. It looks like next_states is unnecessary: in d3rlpy v2, MDPDataset takes (observations, actions, rewards, terminals) and derives next observations internally. It needs to be as follows:

dataset = MDPDataset(states, actions, rewards, dones)
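
For reference, a minimal end-to-end construction under that signature (a sketch with toy data; terminals mark episode boundaries, and next observations come from consecutive steps within each episode):

import numpy as np
from d3rlpy.dataset import MDPDataset

# toy data: 100 steps, 5-dim observations and actions
observations = np.random.random((100, 5)).astype(np.float32)
actions = np.random.random((100, 5)).astype(np.float32)
rewards = np.random.random(100).astype(np.float32)

# two episodes of 50 steps each
terminals = np.zeros(100, dtype=np.float32)
terminals[49] = 1.0
terminals[99] = 1.0

dataset = MDPDataset(observations, actions, rewards, terminals)
print(len(dataset.episodes))  # -> 2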

bzeni1 (Author) commented May 5, 2024

Thanks for your advice. After removing next_states, I am encountering a new issue:

ValueError: Either episodes or env must be provided to determine signatures. Or specify signatures directly.

However, I already defined the episode segments with the 'done' flags, so I still don't know how the episodes are supposed to be determined. What do you think?

takuseno (Owner) commented May 5, 2024

My guess is that dones is all zeros, so no episodes could be found. You need to set up dones correctly.
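
That guess matches the code above: the filter processed_data[processed_data['done'] == False] drops every row where done is True, so the dones array handed to MDPDataset contains no episode ends. A quick sanity check (sketch, using the arrays from the earlier snippet):

import numpy as np

# count episode boundaries in the array actually passed to MDPDataset;
# with the filter in place this prints 0
print("terminal steps:", int(np.asarray(dones).sum()))

# one fix: keep the terminal rows in the dataframe and let the done
# flag mark episode ends instead of filtering those rows out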

rohanblueboybaijal commented

Hi,

I think I am running into a similar issue. I have two datasets, and for both of them all the dimensions are the same:
observations: (5000, 4), actions: (5000, 2), rewards: (5000,), terminals: (5000,)

But with one of the datasets the fit function for IQL fails, although with a different error than above. I can see that both datasets have some terminals = 1. Any suggestions for where an error like this might come up?

File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 409, in fit
    results = list(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 543, in fitter
    loss = self.update(batch)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 863, in update
    loss = self._impl.update(torch_batch, self._grad_step)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/torch_utility.py", line 365, in wrapper
    return f(self, *args, **kwargs)  # type: ignore
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 70, in update
    return self.inner_update(batch, grad_step)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
    metrics.update(self.update_critic(batch))
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
    loss = self.compute_critic_loss(batch, q_tpn)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/iql_impl.py", line 73, in compute_critic_loss
    q_loss = self._q_func_forwarder.compute_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/ensemble_q_function.py", line 256, in compute_error
    return compute_ensemble_q_function_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/ensemble_q_function.py", line 96, in compute_ensemble_q_function_error
    loss = forwarder.compute_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/mean_q_function.py", line 130, in compute_error
    value = self._q_func(observations, actions).q_value
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/base.py", line 35, in __call__
    return super().__call__(x, action)  # type: ignore
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/mean_q_function.py", line 99, in forward
    q_value=self._fc(self._encoder(x, action)),
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/encoders.py", line 41, in __call__
    return super().__call__(x, action)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/encoders.py", line 284, in forward
    return self._layers(x)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x6 and 5x256)
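
The last line is the informative one: the critic's first linear layer receives a 256x6 batch (a 4-dim observation concatenated with a 2-dim action), but its weight matrix expects 5 input features, which suggests the network was built against data whose observation/action dimensions differ from the batch being fed to it. A quick check of what each buffer actually stores (sketch, assuming d3rlpy 2.x episode attributes):

episode = dataset.episodes[0]
print(episode.observations.shape, episode.actions.shape)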

takuseno (Owner) commented

@rohanblueboybaijal Sorry for the late response. Could you share a minimal example so that I can reproduce your error?

takuseno (Owner) commented Jun 1, 2024

Let me close this issue since the initial question should be resolved. Feel free to open a new issue to follow up.

takuseno closed this as completed Jun 1, 2024