# TREC-simplified Relevance Matrix

In [None]:
import itertools
import os

import pandas as pd

os.chdir('../Playground/relevance_matrix')
from trec_matrix_simplified import load_TREC_data, get_unique_topics, determine_group, prepare_data, save_file

This code serves two purposes:
1. Create the TREC-pairs TSV file (used for TREC-repurposed).
2. Create the TREC-simplified TSV file.

#### TREC-pairs

Code strategy:

1. Load TREC data (TSV file containing topic-to-document relevance).
2. Get unique topics from the data.
3. For each unique topic, generate every possible pair of PMIDs together with the relevance scores of both documents.
4. Save the resulting 5-column matrix to a TSV file.
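
The pairing step above can be sketched as follows. This is a minimal, dependency-free illustration and not the `prepare_data` implementation from `trec_matrix_simplified`, which is not shown here. The toy records, the function name `build_pairs`, and the column order (topic, first PMID, second PMID, first relevance, second relevance) are assumptions made for the example.

```python
import itertools
from collections import defaultdict

# Hypothetical toy input: (topic, pmid, relevance) records.
records = [
    (1, 101, 2),
    (1, 102, 0),
    (1, 103, 1),
    (2, 201, 1),
    (2, 202, 0),
]


def build_pairs(records):
    """Return (topic, pmid_1, pmid_2, relevance_1, relevance_2) rows."""
    # Group the documents by topic, preserving input order.
    by_topic = defaultdict(list)
    for topic, pmid, relevance in records:
        by_topic[topic].append((pmid, relevance))

    rows = []
    for topic, docs in by_topic.items():
        # combinations() yields each unordered pair exactly once.
        for (pmid_a, rel_a), (pmid_b, rel_b) in itertools.combinations(docs, 2):
            rows.append((topic, pmid_a, pmid_b, rel_a, rel_b))
    return rows


pairs = build_pairs(records)
for row in pairs:
    # Tab-separated, mirroring the TSV layout assumed above.
    print(*row, sep="\t")
```

A topic with n judged documents produces n(n-1)/2 pairs, so the output grows quadratically with the number of documents per topic.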

In [None]:
# Comment out any code calling the methods
# Load TREC data
input_path = "../Data/TREC/Relevance_Matrix/TREC.tsv"
trec_df = load_TREC_data(input_path, False)

# Get unique topics
topics = get_unique_topics(trec_df)
# Prepare possible combinations of PMID pairs and their relevance scores
output_data = prepare_data(trec_df, topics, False)

# Save the data to a TSV file
output_path = "../Data/TREC/Relevance_Matrix/trec_relevance_pairs.tsv"
save_file(output_data, output_path, False)

#### TREC-simplified

Code strategy:

1. Load TREC data (TSV file containing topic-to-document relevance).
    - Replace every relevance score of 2 with 1, so that all scores are either 0 or 1.
2. Get unique topics from the data.
3. For each unique topic, generate every possible pair of PMIDs together with the relevance scores of both documents.
4. Assign each pair to a group based on its two relevance scores:
    - A pair with relevance scores (1,1) belongs to group 'A'.
    - A pair with relevance scores (1,0) or (0,1) belongs to group 'B'.
    - A pair with relevance scores (0,0) belongs to group 'C'.
5. Save the resulting 6-column matrix to a TSV file.
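
The score replacement and group assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the names `binarize`, `assign_group`, and `GROUPS` are hypothetical and are not the `determine_group` helper imported from `trec_matrix_simplified`, whose implementation is not shown here.

```python
# Lookup table for the three groups defined in the list above.
GROUPS = {
    (1, 1): "A",
    (1, 0): "B",
    (0, 1): "B",
    (0, 0): "C",
}


def binarize(relevance):
    """Collapse the graded score 2 into 1; leave 0 and 1 unchanged."""
    return 1 if relevance == 2 else relevance


def assign_group(rel_a, rel_b):
    """Map the relevance scores of a pair to group 'A', 'B', or 'C'.

    Raises KeyError for scores outside {0, 1, 2}.
    """
    return GROUPS[(binarize(rel_a), binarize(rel_b))]


# Hypothetical score pairs taken from a graded (0/1/2) relevance file.
for scores in [(2, 1), (0, 2), (1, 0), (0, 0)]:
    print(scores, assign_group(*scores))
```

Using a lookup table rather than arithmetic on the scores keeps the mapping explicit and makes unexpected score values fail loudly instead of silently falling into a group.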

In [None]:
# Comment out any code calling the methods
# Load TREC data
input_path = "../Data/TREC/Relevance_Matrix/TREC.tsv"
trec_df = load_TREC_data(input_path)

# Get unique topics
topics = get_unique_topics(trec_df)
# Prepare possible combinations of PMID pairs and their relevance scores
output_data = prepare_data(trec_df, topics)

# Save the data to a TSV file
output_path = "../Data/TREC/Relevance_Matrix/trec_simplified_relevance_matrix.tsv"
save_file(output_data, output_path)