
[QUESTION] multidimensional states and actions #391

Closed · bzeni1 opened this issue May 1, 2024 · 8 comments
Labels: bug (Something isn't working)

bzeni1 commented May 1, 2024

When attempting to create an MDPDataset in d3rlpy with consistently shaped data, for example (100, 5) observations, (100, 5) actions, (100,) rewards, (100, 5) next observations, and (100,) terminals, I encounter this error: "ValueError: operands could not be broadcast together with shapes (500,) (100,)." Note that 500 = 100 × 5, as if one of the 2-D arrays had been flattened somewhere inside d3rlpy during dataset creation, even though the input dimensions are matched. It looks as though the library interprets or handles multidimensional input shapes for MDPDataset incorrectly, potentially a bug.

When I select only one feature, I am able to create the MDPDataset with shapes like (100, 1). However, I then hit another error when I try to use, for example, the DDPG model: the message says that DDPG requires 'config' and 'device' arguments, but the documentation does not list these for DDPG(). When I instead pass the arguments from the documentation, I get an 'unexpected keyword argument' error.

Do you think this could be a problem with the library? I have already tried several Python environments and got the same errors.
The library version is d3rlpy 2.4.0.

bzeni1 added the bug label May 1, 2024
takuseno (Owner) commented May 5, 2024

@bzeni1 Hi, could you share a minimal example so that I can reproduce your issue? It sounds like your code is simply incorrect.

By the way, when you instantiate algorithms, you need to do it as follows:

ddpg = d3rlpy.algos.DDPGConfig().create()
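
For context, a minimal sketch of this v2-style instantiation (assuming d3rlpy 2.4.0; the hyperparameters shown are illustrative, not required):

import d3rlpy

# v2 API: hyperparameters live on the config object,
# and create() builds the algorithm itself
ddpg = d3rlpy.algos.DDPGConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
).create(device=False)  # False = CPU; pass e.g. "cuda:0" for GPU

This resolves the 'config'/'device' confusion from the original question: the DDPG object is constructed by the library from a DDPGConfig rather than being called directly with keyword arguments.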

bzeni1 (Author) commented May 5, 2024

@takuseno Hi, please find my code below. What could be the problem? Thank you in advance for your help.

processed_data = race_data.copy()
settings_columns = [...]  # 22 selected columns from my dataset
processed_data['reward'] = processed_data['ACCELERATION_m_s2']

for col in settings_columns:
    processed_data[f'state_{col}'] = processed_data[col]
    

for col in settings_columns:
    processed_data[f'action_{col}'] = processed_data[col].diff().fillna(0)


for col in settings_columns:
    processed_data[f'next_{col}'] = processed_data[col].shift(-1)

# end of an episode (race)
processed_data['done'] = processed_data['race_num'].diff(-1) != 0

processed_data = processed_data[processed_data['done'] == False]

print("After filtering rows:", processed_data.shape)

print("States shape:", processed_data[settings_columns].shape)
print("Actions shape:", processed_data[[f'action_{col}' for col in settings_columns]].shape)
print("Rewards shape:", processed_data['reward'].shape)
print("Next states shape:", processed_data[[f'next_{col}' for col in settings_columns]].shape)
print("Dones shape:", processed_data['done'].shape)

Output:
States shape: (17642, 22)
Actions shape: (17642, 22)
Rewards shape: (17642,)
Next states shape: (17642, 22)
Dones shape: (17642,)

states = processed_data[[f'state_{col}' for col in settings_columns if f'state_{col}' in processed_data.columns]].to_numpy()
actions = processed_data[[col for col in processed_data.columns if col.startswith('action_')]].to_numpy()
rewards = processed_data['reward'].to_numpy()
next_states = processed_data[[col for col in processed_data.columns if col.startswith('next_')]].to_numpy()
dones = processed_data['done'].to_numpy()

#next step:

dataset = MDPDataset(states, actions, rewards, next_states, dones)

#ValueError: operands could not be broadcast together with shapes (388124,) (17642,) 

takuseno (Owner) commented May 5, 2024

Thanks for sharing your code. It looks like next_states is unnecessary: in d3rlpy v2, MDPDataset takes (observations, actions, rewards, terminals) and derives next observations internally. It needs to be as follows:

dataset = MDPDataset(states, actions, rewards, dones)
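
For reference, a minimal end-to-end construction under that signature (a sketch with toy data; terminals mark episode boundaries, and next observations come from consecutive steps within each episode):

import numpy as np
from d3rlpy.dataset import MDPDataset

# toy data: 100 steps, 5-dim observations and actions
observations = np.random.random((100, 5)).astype(np.float32)
actions = np.random.random((100, 5)).astype(np.float32)
rewards = np.random.random(100).astype(np.float32)

# two episodes of 50 steps each
terminals = np.zeros(100, dtype=np.float32)
terminals[49] = 1.0
terminals[99] = 1.0

dataset = MDPDataset(observations, actions, rewards, terminals)
print(len(dataset.episodes))  # -> 2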

bzeni1 (Author) commented May 5, 2024

Thanks for your advice. After removing next_states, I am encountering a new issue:

ValueError: Either episodes or env must be provided to determine signatures. Or specify signatures directly.

However, I already defined the episode segments with the 'done' flags, so I still don't know how the episodes are supposed to be determined. What do you think?

takuseno (Owner) commented May 5, 2024

My guess is that dones is all zeros, so no episodes could be found. You need to set up dones correctly.
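
That guess matches the code above: the filter processed_data[processed_data['done'] == False] drops every row where done is True, so the dones array handed to MDPDataset contains no episode ends. A quick sanity check (sketch, using the arrays from the earlier snippet):

import numpy as np

# count episode boundaries in the array actually passed to MDPDataset;
# with the filter in place this prints 0
print("terminal steps:", int(np.asarray(dones).sum()))

# one fix: keep the terminal rows in the dataframe and let the done
# flag mark episode ends instead of filtering those rows out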

rohanblueboybaijal commented

Hi,

I think I am running into a similar issue. I have two datasets, and for both of them all the dimensions are the same:
observations: (5000, 4), actions: (5000, 2), rewards: (5000,), terminals: (5000,)

But with one of the datasets the fit function for IQL fails, although with a different error than above. I can see that both datasets have some terminals = 1. Any suggestions for where an error like this might come up?

File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 409, in fit
    results = list(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 543, in fitter
    loss = self.update(batch)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 863, in update
    loss = self._impl.update(torch_batch, self._grad_step)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/torch_utility.py", line 365, in wrapper
    return f(self, *args, **kwargs)  # type: ignore
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 70, in update
    return self.inner_update(batch, grad_step)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
    metrics.update(self.update_critic(batch))
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
    loss = self.compute_critic_loss(batch, q_tpn)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/iql_impl.py", line 73, in compute_critic_loss
    q_loss = self._q_func_forwarder.compute_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/ensemble_q_function.py", line 256, in compute_error
    return compute_ensemble_q_function_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/ensemble_q_function.py", line 96, in compute_ensemble_q_function_error
    loss = forwarder.compute_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/mean_q_function.py", line 130, in compute_error
    value = self._q_func(observations, actions).q_value
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/base.py", line 35, in __call__
    return super().__call__(x, action)  # type: ignore
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/mean_q_function.py", line 99, in forward
    q_value=self._fc(self._encoder(x, action)),
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/encoders.py", line 41, in __call__
    return super().__call__(x, action)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/encoders.py", line 284, in forward
    return self._layers(x)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x6 and 5x256)
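
The last line is the informative one: the critic's first linear layer receives a 256x6 batch (a 4-dim observation concatenated with a 2-dim action), but its weight matrix expects 5 input features, which suggests the network was built against data whose observation/action dimensions differ from the batch being fed to it. A quick check of what each buffer actually stores (sketch, assuming d3rlpy 2.x episode attributes):

episode = dataset.episodes[0]
print(episode.observations.shape, episode.actions.shape)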

takuseno (Owner) commented

@rohanblueboybaijal Sorry for the late response. Could you share a minimal example so that I can reproduce your error?

takuseno (Owner) commented Jun 1, 2024

Let me close this issue since the initial question should be resolved. Feel free to open a new issue to follow up.

takuseno closed this as completed Jun 1, 2024