# Estimating Emergence Risk with `emergenet.emergenet`

## Installation

In [13]:
%%capture
!pip install emergenet --upgrade

import pandas as pd
from emergenet.emergenet import Enet, predict_irat_emergence

DATA_DIR = 'data/emergenet/'

## Evaluation of Risk at a Particular Time

We demonstrate the usage of `emergenet.emergenet` on an IRAT-analyzed sequence, **A/Indiana/08/2011**. IRAT analyzed the risk in December 2012, so we will do the same.

In [14]:
# Load IRAT sequence - A/Indiana/08/2011
irat_df = pd.read_csv(DATA_DIR+'irat.csv')
row = irat_df.iloc[20]
print(row)

Influenza Virus                                                   A/Indiana/08/2011
Virus Type                                                                     H3N2
Date of Risk Assessment                                                  2012-12-01
Risk Score Category                                                        Moderate
Emergence Score                                                                 6.0
Impact Score                                                                    4.5
Mean Low Acceptable Emergence                                                  -1.0
Mean High Acceptable Emergence                                                 -1.0
Mean Low Acceptable Impact                                                     -1.0
Mean High Acceptable Impact                                                    -1.0
HA Sequence                       MKTIIAFSCILCLIFAQKLPGSDNSMATLCLGHHAVPNGTLVKTIT...
NA Sequence                       MNPNQKIITIGSVSLIIATICFLMQIAILVTTVTLHFKQHDY

The `Enet` model requires the analysis date in format YYYY-MM-DD, the HA sequence, and the NA sequence. It will train multiple Emergenet models, so it will take some time.

**As of 2024-04-01, Emergenet only supports sequences from 2010-01-01 to 2024-01-01.**

Optionally, you can provide a `save_data` directory, which saves trained Emergenet models, the data used to train those models, and the risk results.

In [15]:
%%time
# We need the analysis date, and HA and NA sequences
# Optionally, we can proved a save_data directory
analysis_date = row['Date of Risk Assessment']
ha_seq = row['HA Sequence']
na_seq = row['NA Sequence']
SAVE_DIR = 'data/emergenet/example_results/'

# Initialize the Enet
enet = Enet(analysis_date=analysis_date, 
            ha_seq=ha_seq, 
            na_seq=na_seq, 
            save_data=SAVE_DIR, 
            random_state=42)

# Estimate the Enet risk scores
ha_risk, na_risk = enet.risk()
print(f'\nHA Risk: {ha_risk:.6f}')
print(f'NA Risk: {na_risk:.6f}\n')

# Map the Enet risk scores to the IRAT risk scale
irat, irat_low, irat_high = predict_irat_emergence(ha_risk=ha_risk, 
                                                   na_risk=na_risk)
print(f'IRAT Emergence Score Prediction: {irat:.2f}')
print(f'IRAT Low Emergence Score Prediction: {irat_low:.2f}')
print(f'IRAT High Emergence Score Prediction: {irat_high:.2f}\n')

Number of human sequences: 6416
Counter({'H3': 2389, 'H1': 800, 'H5': 18, 'H7': 1})
Counter({'N2': 2392, 'N1': 815, 'N3': 1})

HA Risk: 0.000055
NA Risk: 0.000080

IRAT Emergence Score Prediction: 7.64
IRAT Low Emergence Score Prediction: 6.10
IRAT High Emergence Score Prediction: 9.07

CPU times: user 1h 24min 25s, sys: 16.8 s, total: 1h 24min 42s
Wall time: 1h 24min 45s


## Evaluation of Risk at Present Time

**As of 2024-04-01, "present_time" = 2024-01-01.**

What if we want to evaluate the risk of **A/Indiana/08/2011** at present time? Instead of providing an analysis date, set it to `'PRESENT'`. This uses pre-trained Enet models trained as of **"present_time"**, so will only take a couple minutes.

In [16]:
%%time
# Initialize the Enet
enet_present = Enet(analysis_date='PRESENT', 
                    ha_seq=ha_seq, 
                    na_seq=na_seq, 
                    random_state=42)

# Estimate the Enet risk scores at present time (2024-01-01)
ha_risk_present, na_risk_present = enet_present.risk()
print(f'HA Risk: {ha_risk_present:.6f}')
print(f'NA Risk: {na_risk_present:.6f}\n')

# Map the Enet risk scores to the IRAT risk scale
irat_present, irat_low_present, irat_high_present = predict_irat_emergence(ha_risk=ha_risk_present, 
                                                                           na_risk=ha_risk_present)
print(f'IRAT Emergence Score Prediction: {irat_present:.2f}')
print(f'IRAT Low Emergence Score Prediction: {irat_low_present:.2f}')
print(f'IRAT High Emergence Score Prediction: {irat_high_present:.2f}\n')

HA Risk: 0.001735
NA Risk: 0.000913

IRAT Emergence Score Prediction: 6.38
IRAT Low Emergence Score Prediction: 5.02
IRAT High Emergence Score Prediction: 7.69

CPU times: user 18min 40s, sys: 1.81 s, total: 18min 42s
Wall time: 18min 42s
