# Hurdle 3 Code Walkthrough

## Introduction
All of the code discussed below resides under phase1-hurdles/hurdle3, available at:

https://github.com/SpectrumCollaborationChallenge/phase1-hurdles

Performer solutions for Hurdle 3 must act as an Apache Thrift server frontend for code that learns to play a simplified spectrum sharing game. Performer solutions must respond to RPC calls from the Test Infrastructure container to execute a predetermined number of scoring trials. Each trial runs for 30,000 rounds, and only the last 1,000 rounds count toward the score.

## The Build Script
build_hurdle_3_trusty.sh is a bash script that downloads an Ubuntu 14.04.5 64-bit LXC template and preconfigures it for use on Hurdle 3. It creates two containers: the Test Infrastructure container and the Solution container, which is a simple clone of the Test Infrastructure container. These containers are stored in phase1-hurdles/hurdle3/lxc. Each container requires several GB of disk space, so ensure there is sufficient free disk space before proceeding.


## Example Solution Code

### hurdle3.thrift
hurdle3.thrift is the Thrift IDL describing the RPC interface used for Hurdle 3. Participants should not modify this file other than to add namespaces for languages other than Python or C++. Participants choosing to implement their solution in a language other than Python may use this IDL to autogenerate the Thrift interface code for their language of choice. See the Apache Thrift tutorial for more information on generating source code in various languages using the Thrift compiler:

https://thrift.apache.org/tutorial

### Hurdle3SolutionServer.py
Hurdle3SolutionServer.py implements a Thrift server compliant with the hurdle3.thrift IDL. Participants implementing a solution in Python may use this file largely unchanged. There are two lines that participants must change to adapt Hurdle3SolutionServer.py to use their own solution. Those lines are called out in the comments in the following section of Hurdle3SolutionServer.py:

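```python
import sys

# Import the module for your solution
from hurdle3.RandomGuesser import RandomGuesser

# StepResult is assumed to be the result struct generated from hurdle3.thrift;
# its import is not shown in this excerpt.


class SolutionHandler:

    def __init__(self, num_states=10, seed=None):
        self.log = {}

        # Change RandomGuesser to your solution
        self.solution = RandomGuesser(num_states, seed)

    def start(self):
        # (Re)initialize the solution and return its answers for the first round
        prediction, next_state = self.solution.start()

        return StepResult(prediction, next_state)

    def step(self, reward, observation):
        # Pass the previous round's results to the solution
        prediction, next_state = self.solution.step(reward, observation)

        return StepResult(prediction, next_state)

    def stop(self):
        # Exit the server process
        sys.exit()
```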

## Solution RPC Requirements
As the excerpt of Hurdle3SolutionServer.py above shows, participant solutions must implement handlers for the following RPC calls. If the SolutionHandler class is used as the Thrift wrapper, participant solutions implement the functions below. Otherwise, see hurdle3.thrift for the required RPC calls.

Please note that the solution provided by participants only needs to support the hurdle3.thrift interface. The SolutionHandler in Hurdle3SolutionServer.py is only provided as a convenience. 

---
### start
This call should initialize or reinitialize a participant's solution to an initial state. This function will be called at the start of each trial.

#### Syntax
    
```python
prediction, next_state = start()
```
    
#### Parameters
None

#### Returns
A tuple consisting of:
- **prediction:** An integer representing the participant's prediction of the next output of the DARPA state machine
- **next_state:** An integer representing the participant's next output. 

---
### step
This call triggers the next round of the spectrum sharing game. Participants are given the results of the last round of the game and are prompted to provide their answers for the next round. 

#### Syntax
    
```python
prediction, next_state = step(reward, observation)
```
    
#### Parameters
- **reward:** An integer representing the reward accrued by the participant in the previous round
- **observation:** The observed output from the DARPA state machine in the previous round

#### Returns
A tuple consisting of:
- **prediction:** An integer representing the participant's prediction of the next output of the DARPA state machine
- **next_state:** An integer representing the participant's next output.

---
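As a sketch of what a drop-in replacement for RandomGuesser might look like, the hypothetical `MostFrequentGuesser` below implements `start()` and `step()` with the same signatures and return tuple described above. The class name and its strategy (predict the most frequently observed output, transmit on a different state) are illustrative assumptions, not part of the hurdle code and not a recommended solution.

```python
from collections import Counter


class MostFrequentGuesser(object):
    '''
    Hypothetical example solution (not part of the hurdle code).
    Predicts that the DARPA state machine will repeat its most frequently
    observed output and transmits on a different state to avoid a collision.
    '''

    def __init__(self, num_states=10, seed=None):
        self._num_states = num_states
        self._seed = seed  # unused here; kept to match RandomGuesser's constructor

    def start(self):
        # Reset all learned statistics so each trial starts from scratch
        self._counts = Counter()
        prediction = 0
        next_state = 1 % self._num_states
        return prediction, next_state

    def step(self, reward, observation):
        # Record the state machine output observed in the previous round
        self._counts[observation] += 1
        prediction = self._counts.most_common(1)[0][0]
        # Transmit on the state after the predicted one to avoid colliding with it
        next_state = (prediction + 1) % self._num_states
        return prediction, next_state
```

Because `start()` clears the counts, calling it again between trials resets the solution, as the **start** description requires.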


## Test Infrastructure Code

### hurdle3_rpc
All code in the hurdle3_rpc subdirectory was autogenerated by running: 

```console
thrift --gen py -out ./ hurdle3.thrift
```

### hurdle3.ProbabilisticStateMachine.py
This is the DARPA state machine all participants will be playing against in the simplified spectrum sharing game. It supports being given specific initial states and random number generator seeds to support deterministic, repeatable testing. 
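The DARPA state machine itself is not reproduced in this walkthrough. Purely to illustrate what seeded, repeatable behavior means here, the hypothetical `ToyStateMachine` below is a Markov chain with a fixed random transition matrix. Its name, constructor arguments, and `start()`/`step()` signatures are assumptions and are not the interface of hurdle3.ProbabilisticStateMachine.py.

```python
import numpy as np


class ToyStateMachine(object):
    '''
    Hypothetical stand-in for a probabilistic state machine. This is NOT the
    DARPA ProbabilisticStateMachine; it only shows how a fixed seed and a
    fixed initial state make a random process repeatable.
    '''

    def __init__(self, num_states=10, initial_state=0, seed=None):
        self._num_states = num_states
        self._initial_state = initial_state
        self._seed = seed
        # Fixed transition matrix: row i holds P(next state | current state i)
        weights = np.random.RandomState(0).rand(num_states, num_states)
        self._transitions = weights / weights.sum(axis=1, keepdims=True)

    def start(self):
        # Re-create the RNG so every trial replays the same random sequence
        self._rng = np.random.RandomState(self._seed)
        self._state = self._initial_state
        return self._state

    def step(self):
        # Sample the next state from the current state's transition row
        row = self._transitions[self._state]
        self._state = int(self._rng.choice(self._num_states, p=row))
        return self._state
```

With the same seed and initial state, two calls to `start()` followed by the same number of `step()` calls produce identical output sequences.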

### hurdle3.RandomGuesser.py
This is a very simple solution that plays against the DARPA probabilistic state machine in the simplified spectrum sharing game, and it is included below. It guesses the outputs for each round purely at random.

The num_states parameter is included to allow participants to begin development with smaller numbers of states and eventually grow to the full number of states used for final hurdle evaluation. 

The seed parameter is provided to allow any random number generators to be initialized in order to support repeatable testing. 

Participants are not required to implement these parameters. Any solution that conforms to the hurdle3.thrift IDL can be evaluated.

```python
import numpy as np


class RandomGuesser(object):
    '''
    Class for purely random guesser to check out mechanics
    '''
    def __init__(self, num_states, seed=None):
        # Store the seed; the dedicated random number generator is
        # created in start() so that each trial is repeatable
        # when a seed is specified
        self._seed = seed

        self._num_states = num_states

    def start(self):
        '''
        Run first iteration as a special case.
        Note that start() should initialize the solution for
        the beginning of a test, even if called in the middle of a
        test, in order to support running multiple consecutive trials.
        '''
        # Re-create the generator so every trial replays the same guesses
        self._rng = np.random.RandomState(self._seed)

        predicted_output = self._rng.randint(self._num_states)
        my_next_output = self._rng.randint(self._num_states)

        return predicted_output, my_next_output

    def step(self, reward, observation):
        '''
        Given the observation, generate the next probabilistic action
        '''
        # This baseline ignores reward and observation entirely
        predicted_output = self._rng.randint(self._num_states)
        my_next_output = self._rng.randint(self._num_states)

        return predicted_output, my_next_output
```

### Hurdle3Scoring.py
This is the top level script that will run on the Test Infrastructure container. 

It will run a predefined number of trials, each for a predefined number of rounds, and score the results. 

Both the participant solution and the ProbabilisticStateMachine are reinitialized between trials. 

Note that the participant solution should not drop connections to its Thrift server in between trials. The participant solution should only shut down its Thrift server in response to receiving a **stop()** RPC call. 


#### Trial function call sequence

---
#### Initialization
At the start of each trial, Hurdle3Scoring.py will send a **start()** call to initialize the participant solution. 

Hurdle3Scoring.py will also call **start()** on the DARPA state machine. 

---
#### Computing the Score
Hurdle3Scoring.py will score each round using:

```python
def compute_score(action, state_prediction, state_actual,
                  no_collision_reward=1,
                  collision_penalty=-12,
                  prediction_success_reward=3):
    score = 0

    if action == state_actual:
        score += collision_penalty
    else:
        score += no_collision_reward

    if state_prediction == state_actual:
        score += prediction_success_reward

    return score
```

The score derived by **compute_score()** and the actual output of the DARPA state machine for that round are passed to the participant solution as the parameters of the next **step()** RPC call.
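Because **compute_score()** only distinguishes two events (collision or not, correct prediction or not), every round scores one of four values under the default rewards. The snippet below restates compute_score() in an equivalent form so it runs standalone, then enumerates the four cases; the specific state values are arbitrary.

```python
def compute_score(action, state_prediction, state_actual,
                  no_collision_reward=1,
                  collision_penalty=-12,
                  prediction_success_reward=3):
    # Equivalent to the scoring function above
    score = collision_penalty if action == state_actual else no_collision_reward
    if state_prediction == state_actual:
        score += prediction_success_reward
    return score


actual = 4  # arbitrary output of the state machine for one round
outcomes = {
    'no collision, correct prediction': compute_score(7, 4, actual),
    'no collision, wrong prediction': compute_score(7, 2, actual),
    'collision, correct prediction': compute_score(4, 4, actual),
    'collision, wrong prediction': compute_score(4, 2, actual),
}
for label, score in outcomes.items():
    print(label, score)
```

The asymmetry is deliberate in the reward values: a single collision (-12) wipes out the gain from twelve collision-free rounds.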

---
#### Training Period
The rounds from the start of the trial up until the final 1000 rounds are considered the training period for the participant's solution. 

For all rounds following the first round of a trial, Hurdle3Scoring.py will send a **step()** call to the participant solution, then send a **step()** call to the DARPA state machine, and finally run **compute_score()** to score the round. The scores for these rounds do not count towards a participant's final score.

---
#### Scoring Period
The final 1000 rounds of a trial are run exactly the same as the rounds in the training period, except that the score accrued by the participant solution counts towards the score for the trial.

At the end of the scoring period, the average score over the last 1000 rounds is computed. If the participant solution's average score is greater than or equal to 2.0, the trial is considered to have passed. 
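To get a feel for the 2.0 threshold, consider a solution that guesses uniformly at random, as RandomGuesser does. Assuming the state machine's output in a round does not depend on the solution's guesses for that same round, the action collides and the prediction is correct each with probability 1/num_states, so the expected score per round is 1 - 10/num_states with the default rewards. The helper below is a hypothetical illustration, not part of Hurdle3Scoring.py.

```python
from fractions import Fraction


def expected_random_score(num_states, no_collision_reward=1,
                          collision_penalty=-12, prediction_success_reward=3):
    '''Expected per-round score of uniform, independent random guessing.'''
    # Probability that a uniform guess equals the state machine's output
    p_match = Fraction(1, num_states)
    return ((1 - p_match) * no_collision_reward
            + p_match * collision_penalty
            + p_match * prediction_success_reward)


for n in (2, 5, 10, 20):
    print(n, float(expected_random_score(n)))
```

With 10 states, random guessing averages 0 per round, well short of 2.0. By the same arithmetic, a solution that never collides earns 1 + 3p per round on average, where p is the fraction of rounds it predicts correctly, so it needs p ≥ 1/3 to reach 2.0.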

At the end of the Scoring Period, the DARPA state machine and participant solutions are reinitialized for the start of the next trial. 
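The trial sequence described above (initialization, training period, scoring period) can be sketched as a single loop. This is not Hurdle3Scoring.py: `run_trial`, `CyclingMachine`, and `NextStateSolution` are hypothetical names, and in particular the stand-in machine's `step(participant_action)` signature is an assumption, since how the DARPA state machine uses the participant's output is not described here.

```python
TOTAL_ROUNDS = 30000   # rounds per trial, from the description above
SCORED_ROUNDS = 1000   # only the final rounds count toward the trial score
PASS_THRESHOLD = 2.0


def compute_score(action, state_prediction, state_actual,
                  no_collision_reward=1, collision_penalty=-12,
                  prediction_success_reward=3):
    # Equivalent to the compute_score() shown earlier
    score = collision_penalty if action == state_actual else no_collision_reward
    if state_prediction == state_actual:
        score += prediction_success_reward
    return score


class CyclingMachine(object):
    '''Hypothetical stand-in for the DARPA state machine: outputs 0, 1, 2, ...'''
    def __init__(self, num_states=10):
        self._num_states = num_states

    def start(self):
        self._state = 0
        return self._state

    def step(self, participant_action):
        # This stand-in ignores the participant's output
        self._state = (self._state + 1) % self._num_states
        return self._state


class NextStateSolution(object):
    '''Hypothetical solution that assumes the machine counts upward.'''
    def __init__(self, num_states=10):
        self._num_states = num_states

    def start(self):
        return 0, self._num_states - 1

    def step(self, reward, observation):
        prediction = (observation + 1) % self._num_states
        # Transmit on a state other than the predicted one
        return prediction, (prediction + 1) % self._num_states


def run_trial(solution, machine, total_rounds=TOTAL_ROUNDS,
              scored_rounds=SCORED_ROUNDS):
    '''Sketch of one trial: a start() round, then step() rounds.'''
    # First round: both sides are (re)initialized with start()
    prediction, action = solution.start()
    actual = machine.start()
    scores = [compute_score(action, prediction, actual)]

    # Remaining rounds: participant first, then state machine, then scoring
    for _ in range(total_rounds - 1):
        prediction, action = solution.step(scores[-1], actual)
        actual = machine.step(action)
        scores.append(compute_score(action, prediction, actual))

    # Only the final scored_rounds rounds count toward the trial result
    tail = scores[-scored_rounds:]
    average = sum(tail) / len(tail)
    return average, average >= PASS_THRESHOLD


average, passed = run_trial(NextStateSolution(), CyclingMachine())
print(average, passed)
```

Against this deliberately predictable stand-in, the solution never collides and always predicts correctly, so every round scores the maximum of 4.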

### Final Scoring
The pass or failure of the participant's solution is recorded for each of 10 trials. A participant solution passes Hurdle 3 only if it receives a passing score in at least 6 of the 10 trials.
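The final pass/fail rule reduces to counting passing trials. The hypothetical `hurdle_passed` helper below reads "6 out of 10" as "at least 6", and the trial averages in the usage line are made-up illustrative values, not real results.

```python
def hurdle_passed(trial_averages, threshold=2.0, required_passes=6):
    '''Return True if at least required_passes trials met the threshold.'''
    # A trial passes when its scoring-period average is >= threshold
    passes = sum(1 for avg in trial_averages if avg >= threshold)
    return passes >= required_passes


# Illustrative (made-up) averages for 10 trials; six of them reach 2.0
example_averages = [2.3, 1.8, 2.0, 2.5, 0.4, 2.1, 3.0, 1.9, 2.2, 1.0]
print(hurdle_passed(example_averages))
```

Note that an average of exactly 2.0 counts as a passing trial, matching the "greater than or equal to 2.0" rule.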

## Items To Be Updated
All updates will be made available on:

https://github.com/SpectrumCollaborationChallenge/phase1-hurdles

Open Track teams will also be notified of updates via email. 


1. DARPA plans to provide a version of Hurdle3Scoring.py that provides more detailed scoring feedback