# Understanding Metrics
### Disclaimer
This notebook closely follows the content from **clemgame/metrics.py** in the [**clp-research/clembench**](https://github.com/clp-research/clembench) repository. <br> 
While the core content is derived from the original file, more detailed descriptions and explanations have been added so that it can serve as a reference during the development of our project.
___
When implementing the game, use exactly these metric names to ensure functionality.

**`METRIC_ABORTED = "Aborted"`** 
- Record Level: Episode
- **1** if the game was aborted, otherwise **0**
- Does not include lost games

In [None]:
# Example:
if aborted:
    self.log_episode_score(ms.METRIC_ABORTED, 1)
    # ms is the imported metrics module (clemgame/metrics.py, see the disclaimer)

___

**`METRIC_LOSE = "Lose"`**
- Record Level: Episode
- **1** if the game was lost, otherwise **0**
- Does not include aborted games
- Opposite of Success for games that were not aborted
- Always **0** if the game was aborted (aborted != lost)

In [None]:
# Example where game is aborted (!= lost)
if aborted:
    self.log_episode_score(ms.METRIC_ABORTED, 1)
    self.log_episode_score(ms.METRIC_SUCCESS, 0)
    self.log_episode_score(ms.METRIC_LOSE, 0)

# Example where the game is lost (neither aborted nor successful)
if success_a != success_b:
    self.log_episode_score(ms.METRIC_ABORTED, 0)
    self.log_episode_score(ms.METRIC_SUCCESS, 0)
    self.log_episode_score(ms.METRIC_LOSE, 1)
    # Game-specific metrics

___

**`METRIC_SUCCESS = "Success"`**
- Record Level: Episode or Turn
- **1** if the gameplay was successful (goal reached within x turns), otherwise **0**
- Opposite of Lose
- Always **0** if the game was aborted (an aborted game is neither lost nor successful)

In [None]:
# Example:
if success_a and success_b:   
    self.log_episode_score(ms.METRIC_ABORTED, 0)
    self.log_episode_score(ms.METRIC_SUCCESS, 1)
    self.log_episode_score(ms.METRIC_LOSE, 0)
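
At episode level, the definitions of Aborted, Lose and Success above imply that exactly one of the three is 1 for any episode. The sketch below spells that rule out. The helper `outcome_scores` is hypothetical (it is not part of clembench), and the metric names are copied as plain strings so the snippet runs on its own.

```python
# Metric names copied from the definitions above.
METRIC_ABORTED = "Aborted"
METRIC_SUCCESS = "Success"
METRIC_LOSE = "Lose"


def outcome_scores(aborted: bool, success: bool) -> dict:
    """Hypothetical helper: map an episode outcome to the three binary metrics.

    An aborted episode counts as neither a success nor a loss.
    """
    if aborted:
        return {METRIC_ABORTED: 1, METRIC_SUCCESS: 0, METRIC_LOSE: 0}
    if success:
        return {METRIC_ABORTED: 0, METRIC_SUCCESS: 1, METRIC_LOSE: 0}
    return {METRIC_ABORTED: 0, METRIC_SUCCESS: 0, METRIC_LOSE: 1}


for aborted, success in [(True, False), (False, True), (False, False)]:
    scores = outcome_scores(aborted, success)
    # Exactly one of the three flags is set per episode.
    assert sum(scores.values()) == 1
```

Inside a scorer, the returned dictionary could be logged by looping over its items and calling `self.log_episode_score(name, value)` for each pair.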

___

**`BENCH_SCORE = "Main Score"`**
- Record Level: Episode
- The main score of the game
- A value between **0** and **100** that summarizes overall performance

In [None]:
# Example:
if success_a and success_b:   
    self.log_episode_score(ms.METRIC_ABORTED, 0)
    self.log_episode_score(ms.METRIC_SUCCESS, 1)
    self.log_episode_score(ms.METRIC_LOSE, 0)
    # Game-specific metrics
    self.log_episode_score(ms.BENCH_SCORE, 100)
    self.log_episode_score("Player Score", 100)
else:
    self.log_episode_score(ms.BENCH_SCORE, 0)
    self.log_episode_score("Player Score", 0)
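
The example above only ever logs 0 or 100, but the main score may be any value between 0 and 100. If our game ends up with a graded quality measure, a small helper could keep it inside that range. This is only a sketch: `to_bench_score` and the idea of a quality fraction in [0, 1] are assumptions for illustration, not something defined in clembench.

```python
def to_bench_score(fraction: float) -> float:
    """Hypothetical helper: scale a quality fraction in [0, 1] to the 0-100 range."""
    # Clamp out-of-range inputs so the result always stays within 0 and 100.
    clamped = min(max(fraction, 0.0), 1.0)
    return 100.0 * clamped


half_score = to_bench_score(0.5)   # a half-successful episode
capped_score = to_bench_score(1.2)  # values above 1 are capped at 100
```

The result could then be passed to `self.log_episode_score(ms.BENCH_SCORE, ...)` in place of the fixed 0 or 100.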

___

#### API Requests
**`METRIC_REQUEST_COUNT = "Request Count"`**
- Record Level: Episode or Turn
- How many requests were made to the API

**`METRIC_REQUEST_COUNT_PARSED = "Parsed Request Count"`**
- Record Level: Episode, optionally Turn
- How many requests made during the whole gameplay were successfully parsed

**`METRIC_REQUEST_COUNT_VIOLATED = "Violated Request Count"`**
- Record Level: Episode, optionally Turn
- How many requests made during the whole gameplay could not be parsed

**`METRIC_REQUEST_SUCCESS = "Request Success Ratio"`**
- Record Level: Episode, optionally Turn
- **METRIC_REQUEST_COUNT_PARSED / METRIC_REQUEST_COUNT**

In [None]:
# Example:
from typing import Dict

def score_requests(self, episode_interactions: Dict):
    # Log the total, parsed, and violated request counts,
    # plus the ratio of parsed requests to all requests.
    # The total could also be calculated by adding parsed and violated requests.
    request_count = episode_interactions[ms.METRIC_REQUEST_COUNT]
    parsed_requests = episode_interactions[ms.METRIC_REQUEST_COUNT_PARSED]
    violated_requests = episode_interactions[ms.METRIC_REQUEST_COUNT_VIOLATED]

    self.log_episode_score(ms.METRIC_REQUEST_COUNT, request_count)
    self.log_episode_score(ms.METRIC_REQUEST_COUNT_PARSED, parsed_requests)
    self.log_episode_score(ms.METRIC_REQUEST_COUNT_VIOLATED, violated_requests)
    self.log_episode_score(ms.METRIC_REQUEST_SUCCESS, parsed_requests / request_count)
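
The example above divides by `request_count` directly, which raises a `ZeroDivisionError` if an episode made no requests. The standalone sketch below derives all four request metrics from the parsed and violated counts and guards that case. The helper `request_scores` is hypothetical, and returning a ratio of 0.0 for an episode without requests is our own assumption, not behavior taken from clembench.

```python
from typing import Dict

# Metric names copied from the definitions above.
METRIC_REQUEST_COUNT = "Request Count"
METRIC_REQUEST_COUNT_PARSED = "Parsed Request Count"
METRIC_REQUEST_COUNT_VIOLATED = "Violated Request Count"
METRIC_REQUEST_SUCCESS = "Request Success Ratio"


def request_scores(parsed: int, violated: int) -> Dict[str, float]:
    """Hypothetical helper: derive all four request metrics from two counts."""
    # The total is the sum of parsed and violated requests.
    total = parsed + violated
    # Assumption: report 0.0 when no requests were made instead of dividing by zero.
    ratio = parsed / total if total > 0 else 0.0
    return {
        METRIC_REQUEST_COUNT: total,
        METRIC_REQUEST_COUNT_PARSED: parsed,
        METRIC_REQUEST_COUNT_VIOLATED: violated,
        METRIC_REQUEST_SUCCESS: ratio,
    }


example = request_scores(parsed=3, violated=1)
```

With 3 parsed and 1 violated request, `example` holds a request count of 4 and a success ratio of 0.75.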

___

# Summary
- Try to use all of these metrics when implementing our game
- Some of them will have to be adjusted or changed for our version
- We may also need to add new metrics