
Action spaces where actions are tuples #172

Closed · Jogima-cyber opened this issue Jul 31, 2020 · 8 comments
Labels: question (Further information is requested)

Jogima-cyber commented Jul 31, 2020

Hi there, does Tianshou support action spaces where actions are tuples? I have some hacks in mind that could do the trick, but I'd first like to know whether there is already some smooth integration (for a finite action space).

Trinkle23897 (Collaborator) commented Jul 31, 2020

The best way is to convert the tuple to a dictionary: say you have the action tuple (as, ap, ar); you can add a wrapper that converts it to {'as': as, 'ap': ap, 'ar': ar}. Then, in policy.learn(), you can simply extract the batched action with batch.act['as'] or the like (note that as is a reserved word in Python, so attribute access like batch.act.as would not parse).
The related discussion is in #147 and here. Further questions are welcome.
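
For illustration, here is a minimal sketch of such a wrapper, assuming a gym-style env whose action space is a Tuple of three components (the class name and the keys 'a0'/'a1'/'a2' are made up for the example):

import gym

class TupleToDictAction(gym.ActionWrapper):
    # expose the env's Tuple action space as a Dict so every component
    # gets a name, and turn dict actions back into the tuple the env expects
    def __init__(self, env):
        super().__init__(env)
        self.keys = ['a0', 'a1', 'a2']  # one name per tuple component
        self.action_space = gym.spaces.Dict(
            dict(zip(self.keys, env.action_space.spaces)))

    def action(self, act):
        # dict produced by the policy -> tuple for the underlying env
        return tuple(act[k] for k in self.keys)

The intent is that the batched action in policy.learn() then carries one named entry per component (batch.act.a0 and so on), which makes per-component extraction straightforward.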

Trinkle23897 added the question label Jul 31, 2020
Jogima-cyber (Author)

Thanks a lot for your quick response! I was thinking of a maybe easier way to hack around this: I just keep an index mapping between my real tuple actions and the actions I give to the policy/neural net. I need the real tuple action for the reward calculation in my env, so my custom env would receive an int action and simply look up the corresponding real action in the list. I think that should work. What do you think?

Trinkle23897 (Collaborator)

Could you give an example code snippet? Your description on its own is not so easy to follow.

Trinkle23897 (Collaborator) commented Jul 31, 2020

Okay, I think I get your point. Your solution is fine, but my proposal also generalizes to continuous action spaces (infinitely many actions), for example using A2C to compute multiple actions over a tuple of continuous action spaces.
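
To make the distinction concrete, a dict of continuous components cannot be enumerated into an index table at all (sketch only; the key names are invented):

import gym
import numpy as np

# infinitely many actions: the dict-wrapper approach still applies here,
# while an index table over all possible actions is impossible
action_space = gym.spaces.Dict({
    'steer': gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32),
    'throttle': gym.spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),
})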

Jogima-cyber (Author) commented Jul 31, 2020

Oh, sorry, yes of course. In my custom env I define all the possible actions with this function:

import numpy as np
from itertools import combinations

def process_actions(self):
    M = 5
    N = 5
    # stars and bars: choosing M divider positions among M + N slots
    # enumerates every way to split N units across M + 1 bins
    itemNUM = (M + 1) + N - 1  # = M + N
    seq = np.arange(itemNUM)
    actions = []

    for c in combinations(seq, M):
        action = np.zeros(M + 1)
        for i in range(len(c) - 1):
            action[i + 1] = c[i + 1] - c[i] - 1  # units between dividers i and i+1
        action[0] = c[0]                      # units before the first divider
        action[M] = itemNUM - c[M - 1] - 1    # units after the last divider
        action /= N                           # normalize; each action sums to 1
        actions.append(action)
    self.actions = np.array(actions)

And I set action_shape equal to len(actions).
So the policy and all the other modules call my custom step function with an int as the selected action (let's call this variable action_index: env.step(... action=action_index)), ranging from 0 to len(actions) - 1.
Then my step function computes the reward using actions[action_index] as the action.
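
For concreteness, a minimal sketch of an env built around that lookup (the class name, observation space, and the placeholder reward are hypothetical):

import gym
import numpy as np

class IndexedActionEnv(gym.Env):
    # the policy sees a Discrete space; step() maps the int index
    # back to the real tuple action before computing the reward
    def __init__(self, actions):
        super().__init__()
        self.actions = actions  # e.g. the array built by process_actions above
        self.action_space = gym.spaces.Discrete(len(actions))
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)
        self._obs = np.zeros(4, dtype=np.float32)

    def reset(self):
        return self._obs

    def step(self, action_index):
        real_action = self.actions[action_index]  # int -> real tuple action
        reward = float(real_action.sum())         # placeholder reward only
        return self._obs, reward, False, {}

From Tianshou's point of view this is then just an ordinary discrete-action env.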

Tell me if that's clear enough; otherwise I can provide more code snippets.

Trinkle23897 (Collaborator) commented Jul 31, 2020

What is combinations?

Jogima-cyber (Author)

Sorry, I forgot:

from itertools import combinations

Trinkle23897 (Collaborator)

Yes, it should definitely work.
