
Action spaces where actions are tuples #172

Closed · Jogima-cyber opened this issue Jul 31, 2020 · 8 comments
Labels: question (Further information is requested)

Jogima-cyber commented Jul 31, 2020

Hi there, does Tianshou support action spaces where actions are tuples? I have some hacks in mind that could do the trick, but I'd first like to know whether there is already some smooth integration (for a finite action space).

Trinkle23897 (Collaborator) commented Jul 31, 2020

The best way is to convert the tuple to a dictionary: say you have the action tuple (as, ap, ar); you can add a wrapper that converts it to {'as': as, 'ap': ap, 'ar': ar}. Then, in policy.learn(), you can simply extract the batched action with batch.act['as'] or the like (note that as is a reserved word in Python, so attribute access like batch.act.as would not parse).
The related discussion is in #147 and here. Further questions are welcome.
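
For illustration, here is a minimal sketch of such a wrapper, assuming a gym-style env whose action space is a Tuple of three components (the class name and the keys 'a0'/'a1'/'a2' are made up for the example):

import gym

class TupleToDictAction(gym.ActionWrapper):
    # expose the env's Tuple action space as a Dict so every component
    # gets a name, and turn dict actions back into the tuple the env expects
    def __init__(self, env):
        super().__init__(env)
        self.keys = ['a0', 'a1', 'a2']  # one name per tuple component
        self.action_space = gym.spaces.Dict(
            dict(zip(self.keys, env.action_space.spaces)))

    def action(self, act):
        # dict produced by the policy -> tuple for the underlying env
        return tuple(act[k] for k in self.keys)

The intent is that the batched action in policy.learn() then carries one named entry per component (batch.act.a0 and so on), which makes per-component extraction straightforward.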

Trinkle23897 added the question label Jul 31, 2020
Jogima-cyber (Author)

Thanks a lot for your quick response! I was thinking of a maybe easier way to hack around this: I just keep an index mapping between my real tuple actions and the actions I give to the policy/neural net. I need the real tuple action for the reward calculation in my env, so my custom env would receive an int action and simply look up the corresponding real action in the list. I think that should work. What do you think?

Trinkle23897 (Collaborator)

Could you give an example code snippet? Your description on its own is not so easy to follow.

Trinkle23897 (Collaborator) commented Jul 31, 2020

Okay, I think I get your point. Your solution is fine, but my proposal also generalizes to continuous action spaces (infinitely many actions), for example using A2C to compute multiple actions over a tuple of continuous action spaces.
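
To make the distinction concrete, a dict of continuous components cannot be enumerated into an index table at all (sketch only; the key names are invented):

import gym
import numpy as np

# infinitely many actions: the dict-wrapper approach still applies here,
# while an index table over all possible actions is impossible
action_space = gym.spaces.Dict({
    'steer': gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32),
    'throttle': gym.spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),
})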

Jogima-cyber (Author) commented Jul 31, 2020

Oh, sorry, yes of course. In my custom env I define all the possible actions with this function:

import numpy as np
from itertools import combinations

def process_actions(self):
    M = 5
    N = 5
    # stars and bars: choosing M divider positions among M + N slots
    # enumerates every way to split N units across M + 1 bins
    itemNUM = (M + 1) + N - 1  # = M + N
    seq = np.arange(itemNUM)
    actions = []

    for c in combinations(seq, M):
        action = np.zeros(M + 1)
        for i in range(len(c) - 1):
            action[i + 1] = c[i + 1] - c[i] - 1  # units between dividers i and i+1
        action[0] = c[0]                      # units before the first divider
        action[M] = itemNUM - c[M - 1] - 1    # units after the last divider
        action /= N                           # normalize; each action sums to 1
        actions.append(action)
    self.actions = np.array(actions)

And I set action_shape equal to len(actions).
So the policy and all the other modules call my custom step function with an int as the selected action (let's call this variable action_index: env.step(... action=action_index)), ranging from 0 to len(actions) - 1.
Then my step function computes the reward using actions[action_index] as the action.
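
For concreteness, a minimal sketch of an env built around that lookup (the class name, observation space, and the placeholder reward are hypothetical):

import gym
import numpy as np

class IndexedActionEnv(gym.Env):
    # the policy sees a Discrete space; step() maps the int index
    # back to the real tuple action before computing the reward
    def __init__(self, actions):
        super().__init__()
        self.actions = actions  # e.g. the array built by process_actions above
        self.action_space = gym.spaces.Discrete(len(actions))
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)
        self._obs = np.zeros(4, dtype=np.float32)

    def reset(self):
        return self._obs

    def step(self, action_index):
        real_action = self.actions[action_index]  # int -> real tuple action
        reward = float(real_action.sum())         # placeholder reward only
        return self._obs, reward, False, {}

From Tianshou's point of view this is then just an ordinary discrete-action env.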

Tell me if that's clear enough; otherwise I can provide more code snippets.

Trinkle23897 (Collaborator) commented Jul 31, 2020

What is combinations?

Jogima-cyber (Author)

Sorry, I forgot:

from itertools import combinations

Trinkle23897 (Collaborator)

Yes, it should definitely work.
