# 2024 COMP90042 Project

# Readme

#### **Document Retrieval using BM25 Model**:

In this notebook, we train our custom implementation of a BM25 retriever model on a corpus consisting of all sentences from the knowledge source. We then evaluate the performance of the trained BM25 model on both the training and validation set claims.

**PLEASE NOTE**: The BM25 model implementation is contained in a separate Python script called `BM25.py`, from which we import the retriever class (`BM25_retriever`). We also import the helper functions that we implemented for pre-processing/cleaning our data from the Python script called `utils.py`.
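
For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 scoring formula on a toy corpus. It is not the code in `BM25.py`, which is not shown here and may differ (for example, it builds an inverted index). The function name `bm25_scores` and the toy documents are hypothetical, and it is an assumption that the `k` argument used later in this notebook plays the role of the usual `k1` parameter.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=0.5, b=0.3):
    """Score every tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # document frequency: number of documents that contain each term
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        # length normalisation: b=0 ignores document length, b=1 normalises fully
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # smoothed IDF; the "1 +" inside the log keeps it non-negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

# toy corpus (hypothetical sentences, already lower-cased and tokenized)
docs = [
    "co2 emissions drive global warming".split(),
    "landfill gas has warming potential".split(),
    "the match ended in a draw".split(),
]
scores = bm25_scores("co2 warming".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The first document matches both query terms, the second matches one, and the third matches none, so it receives a score of zero.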

In [1]:
%load_ext autoreload
%autoreload 2

# install required packages
!pip install unidecode
!python -m nltk.downloader stopwords

from utils import *
from BM25 import *
import pickle 
import numpy as np

# 1. Dataset Processing

#### Load the Claims Dataset with Knowledge Source and Clean the Text

In [2]:
# load dataset and prepare corpus
knowledge_source, train_data, val_data = load_dataset()      
print(f"Number of evidence passages: {len(knowledge_source)}")
print(f"Number of training instances: {len(train_data)}")  
print(f"Number of validation instances: {len(val_data)}")


# clean all sentences in the dataset (convert Unicode to ASCII, remove URLs, collapse repeated non-alphanumeric characters, etc.; these are unlikely to be useful for the claim classification task)
cleaner = SentenceCleaner()
knowledge_source, train_data, val_data = cleaner.clean_dataset(knowledge_source, train_data, val_data)
print(f"\nNumber of evidence passages after cleaning: {len(knowledge_source)}")
print(f"Number of training instances after cleaning: {len(train_data)}")  
print(f"Number of validation instances after cleaning: {len(val_data)}")

# dictionary for mapping integer to document id 
int2docID = {i:evidence_id for i,evidence_id in enumerate(list(knowledge_source.keys()))}
claim_ids = [claim_id for claim_id in train_data.keys()]

Number of evidence passages: 1208827
Number of training instances: 1228
Number of validation instances: 154

Number of evidence passages after cleaning: 1206800
Number of training instances after cleaning: 1228
Number of validation instances after cleaning: 154
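
The cleaning steps named in the comment above can be sketched as follows. This is only an illustration under stated assumptions, not the `SentenceCleaner` from `utils.py`: the function name `clean_sentence` and both regular expressions are hypothetical, and the standard-library `unicodedata` module stands in for `unidecode` so that the example has no third-party dependencies.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
REPEAT_RE = re.compile(r"([^\w\s])\1+")  # runs of the same non-alphanumeric character

def clean_sentence(text):
    # fold accented characters to ASCII (stdlib stand-in for unidecode)
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # remove URLs first, so that "://" is still intact when the URL pattern runs
    text = URL_RE.sub(" ", text)
    # collapse repeated punctuation, e.g. "!!!" becomes "!"
    text = REPEAT_RE.sub(r"\1", text)
    # collapse whitespace left behind by the previous steps
    return re.sub(r"\s+", " ", text).strip()

cleaned = clean_sentence("Café prices rose!!! See https://example.com/a  for more...")
print(cleaned)
```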


# 2. Model Implementation

#### Instantiate and train BM25 retriever on the Knowledge Source corpus.

In [3]:
# instantiate and train a BM25 retriever
retriever = BM25_retriever(b=0.3, k=0.5, remove_stopwords=True, apply_stemming=True) # set options for stopword removal and stemming
retriever.train(knowledge_source.values())

Tokenizing documents...


100%|██████████| 1206800/1206800 [02:17<00:00, 8763.01it/s] 


Computing TF...


100%|██████████| 1206800/1206800 [00:39<00:00, 30481.31it/s]


Computing TFIDF and creating inverted index...


100%|██████████| 501827/501827 [00:30<00:00, 16725.97it/s] 


Computing TFIDF vector norms...


100%|██████████| 1206800/1206800 [00:32<00:00, 36743.99it/s]


#### Save trained BM25 retriever to file

In [3]:
# Save the trained retriever object to the pickle file
with open("bm25_b=0.3_k=0.5.pkl", "wb") as file:
    pickle.dump(retriever, file)
    print("Saved the trained BM25 retriever object to the pickle file")

# Load the trained retriever object from the pickle file
#with open("bm25_b=0.3_k=0.5.pkl", "rb") as file:
#    retriever = pickle.load(file)


# 3. Testing and Evaluation

#### Run some document retrieval tests to see what kind of evidence is retrieved for some given claims.

In [4]:
# evaluate retrieval results for a given claim text
def claim_retreive_eval(claim, k=5):
    print(f"Claim: {claim['claim_text']}")
    # get top k relevant evidence passages and their scores
    topk_ev_indices, topk_scores = retriever.retrieve_docs(query=claim['claim_text'], topk=k)
    # convert indices to document ids
    topk_ev_ids = [int2docID[i] for i in topk_ev_indices]
    # get the gold evidence ids
    gold_ev_ids = claim['evidences']
    # compute precision, recall, and F1
    intersection = set(topk_ev_ids).intersection(gold_ev_ids)
    precision = len(intersection) / len(topk_ev_ids)
    recall = len(intersection) / len(gold_ev_ids)
    f1 = (2*precision*recall/(precision + recall)) if (precision + recall) > 0 else 0 

    print(f"\nGold evidence passages: {gold_ev_ids}")
    for ev in gold_ev_ids:
        print(f"{ev} --> {knowledge_source[ev]}")
    
    print(f"\nTop {k} retrieved evidence passages: {topk_ev_ids}")
    for ev in topk_ev_ids:
        print(f"{ev} --> {knowledge_source[ev]}")

    print(f"\nMatching evidence passages: {intersection}")

    
    print(f"Precision: {precision}, Recall: {recall}, F1: {f1}")

    return precision, recall, f1

In [6]:
# pick a claim (fixed here; the commented-out alternative samples a random training claim)
claim = val_data['claim-2583'] #train_data[random.choice(claim_ids)]

# evaluate the retriever for this claim
claim_retreive_eval(claim, k=10000)

Claim: "Each unit of CO2 you put into the atmosphere has less and less of a warming impact.

Gold evidence passages: ['evidence-827004', 'evidence-677057', 'evidence-1169593', 'evidence-1099186', 'evidence-178206']
evidence-827004 --> The international community began the long process towards building effective international and domestic measures to tackle GHG emissions (carbon dioxide, methane, nitrous oxide, hydroflurocarbons, perfluorocarbons, sulphur hexafluoride) in response to the increasing assertions that global warming is happening due to man-made emissions and the uncertainty over its likely consequences.
evidence-677057 --> This creates air pollution, including nitrous oxides and particulates, and is a significant contributor to global warming through emission of carbon dioxide, for which transport is the fastest-growing emission sector.
evidence-1169593 --> Still the global warming potential of the landfill gas emitted to atmosphere is significant.
evidence-1099186 --> The 

(0.0003, 0.6, 0.0005997001499250374)
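
The returned tuple can be checked by hand. With k = 10000 retrieved passages and 5 gold passages, a precision of 0.0003 and a recall of 0.6 both correspond to 3 matching passages. The sketch below reproduces the arithmetic; the helper name `precision_recall_f1` and the synthetic IDs are hypothetical and chosen only to give the same counts.

```python
def precision_recall_f1(retrieved_ids, gold_ids):
    # number of retrieved passages that are also gold passages
    hits = len(set(retrieved_ids) & set(gold_ids))
    precision = hits / len(retrieved_ids) if retrieved_ids else 0.0
    recall = hits / len(gold_ids) if gold_ids else 0.0
    # harmonic mean, guarded against division by zero when there are no hits
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

# synthetic IDs: 10000 retrieved, 5 gold, 3 of the gold passages are retrieved
retrieved = [f"doc-{i}" for i in range(10000)]
gold = ["doc-0", "doc-1", "doc-2", "gold-x", "gold-y"]
p, r, f1 = precision_recall_f1(retrieved, gold)
print(p, r, f1)
```

Because precision is divided by k, it necessarily shrinks as k grows, which is why F1 is so small here despite a recall of 0.6.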

#### Compute the following retriever evaluation metrics on Validation Claims: Average Precision, Average Recall, Average F1 score

In [4]:
# retrieval performance on entire validation set
k_values = [3, 5, 8, 10, 15, 25, 50, 100, 250, 500, 1000, 2000, 5000]
avg_precision, avg_recall, avg_f1 = eval(val_data.values(), int2docID, retriever, k_values=k_values)

for i in range(len(k_values)):
    print(f"k = {k_values[i]} --> Average Precision: {avg_precision[i]}, Average Recall: {avg_recall[i]}, Average F1: {avg_f1[i]}")

100%|██████████| 154/154 [01:20<00:00,  1.92it/s]

k = 3 --> Average Precision: 0.08658008658008662, Average Recall: 0.10606060606060608, Average F1: 0.08718305504019791
k = 5 --> Average Precision: 0.06883116883116881, Average Recall: 0.13495670995670989, Average F1: 0.08442589156874873
k = 8 --> Average Precision: 0.05925324675324675, Average Recall: 0.17965367965367957, Average F1: 0.08372839281930185
k = 10 --> Average Precision: 0.05194805194805192, Average Recall: 0.19080086580086572, Average F1: 0.07738279036980335
k = 15 --> Average Precision: 0.04025974025974026, Average Recall: 0.21428571428571414, Average F1: 0.0651972511492635
k = 25 --> Average Precision: 0.03116883116883119, Average Recall: 0.27370129870129867, Average F1: 0.05456875727811688
k = 50 --> Average Precision: 0.02064935064935066, Average Recall: 0.36991341991342, Average F1: 0.03854835769879658
k = 100 --> Average Precision: 0.013311688311688316, Average Recall: 0.4522727272727274, Average F1: 0.025672520100214258
k = 250 --> Average Precision: 0.006883116883




#### Compute the following retriever evaluation metrics on Training Claims: Average Precision, Average Recall, Average F1 score

In [8]:
"""# retrieval performance on entire training set
k_values = [3, 5, 8, 10, 15, 25, 50, 100, 250, 500, 1000, 2000, 5000]
avg_precision, avg_recall, avg_f1 = eval(train_data.values(), int2docID, retriever, k_values=k_values)

for i in range(len(k_values)):
    print(f"k = {k_values[i]} --> Average Precision: {avg_precision[i]}, Average Recall: {avg_recall[i]}, Average F1: {avg_f1[i]}")"""


### **Query Expansion**:

We will now try out Query Expansion techniques to see if we can further improve the performance of the BM25 retriever. We will try the following techniques:

1. Paraphrasing queries using back-translation
2. Paraphrasing queries using synonym substitution
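
To make the back-translation fan-out concrete: if each query is translated into `n` pivot-language sentences and each of those is translated back into `n` English sentences, every query yields `n * n` paraphrases, which then have to be regrouped per query. The sketch below shows only this bookkeeping. `fake_translate` is a hypothetical stand-in that tags strings instead of calling a translation model, and it assumes outputs come back in input order.

```python
def fake_translate(sentences, n, tag):
    # hypothetical stand-in for a translation model:
    # returns n variants per input sentence, in input order
    return [f"{s}|{tag}{j}" for s in sentences for j in range(n)]

def back_translate(queries, n=2):
    pivot = fake_translate(queries, n, "fr")  # n pivot sentences per query
    back = fake_translate(pivot, n, "en")     # n back-translations per pivot sentence
    group = n * n                             # paraphrases per original query
    # regroup the flat output list into one sub-list per original query
    return [back[i:i + group] for i in range(0, len(back), group)]

groups = back_translate(["q1", "q2"], n=2)
for g in groups:
    print(g)
```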

In [4]:
from tqdm import tqdm
import pickle 
import random

import nltk
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk
from nltk.corpus import stopwords

import spacy

import torch
import warnings
warnings.filterwarnings("ignore")

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


#### Define a sentence augmenter class that generates the sentence paraphrases.

In [7]:
# sentence augmenter class combining the three augmentation types explored earlier
# (named-entity appending, synonym substitution, and back-translation)

class SentenceAugmentor:
    def __init__(self, syn_sub_prob=0.25, forward_temperature=1.5, backward_temperature=3.0, num_beams=5, num_return_sequences=2, syn_sub=True, backtranslation=True, named_ents=True):

        self.syn_sub_prob = syn_sub_prob   
        self.forward_temperature = forward_temperature
        self.backward_temperature = backward_temperature
        self.num_beams = num_beams
        self.num_return_sequences = num_return_sequences
        self.backtranslation = backtranslation
        self.syn_sub = syn_sub
        self.named_ents = named_ents

        if backtranslation:    
            # backtranslation models
            self.model_en_fr = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
            self.tokenizer_en_fr = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
            self.tokenizer_fr_en = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
            self.model_fr_en = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-fr-en")

        if syn_sub:
            self.stop_words = set(stopwords.words('english'))

        if named_ents:
            self.nlp = spacy.load('en_core_web_lg')

 
    def synonym_substitution(self, text_batch):
        output_batch = []
        for sentence in text_batch:
            words = word_tokenize(sentence)
            new_words = []
            for word in words:
                if word.lower() not in self.stop_words:
                    synset = lesk(words, word)
                    if synset is not None:
                        synonyms = synset.lemma_names()
                        if synonyms and random.random() < self.syn_sub_prob:
                            new_word = random.choice(synonyms)  # choose a random synonym
                            new_words.append(new_word.replace('_', ' '))
                        else:
                            new_words.append(word)
                    else:
                        new_words.append(word)
                else:
                    new_words.append(word)
            sentence_syn =  [" ".join(new_words)]
            output_batch.append(sentence_syn)
        
        return output_batch    

    def translate(self, text_batch, model, tokenizer, temperature=None, num_beams=10, num_return_sequences=3, verbose=False):
        if verbose: print(f"Translation input --> {text_batch}")
        # Prepare the text input
        #inputs = tokenizer.encode(text, return_tensors="pt")
        inputs = tokenizer(text_batch, return_tensors="pt", padding=True, truncation=True)['input_ids']
        # Generate the translated text
        if temperature is None:
            # Deterministic beam search (no sampling)
            outputs = model.generate(inputs, max_length=128, do_sample=False, num_beams=num_beams, num_return_sequences=num_return_sequences, early_stopping=True)
        else:    
            # Using temperature-based sampling
            outputs = model.generate(inputs, max_length=128, do_sample=True, temperature=temperature, num_beams=num_beams, num_return_sequences=num_return_sequences, early_stopping=True)

        #translated_texts =  tokenizer.decode(outputs, skip_special_tokens=True)
        translated_texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)

        if verbose: 
            for i, t in enumerate(translated_texts):
                print(f"\tBeam {i+1} --> {t}") 
        
        return translated_texts


    def back_translate(self, text_batch, verbose=False):
        if verbose: print(f"Original sentence --> {text_batch}")

        # Translate the text to French
        if verbose: print(f"Translating to french...")
        french_texts = self.translate(text_batch, self.model_en_fr, self.tokenizer_en_fr, temperature=self.forward_temperature, num_beams=self.num_beams, num_return_sequences=self.num_return_sequences, verbose=verbose)

        if verbose: print(f"Translating to english...")
        backtranslation_output = self.translate(french_texts, self.model_fr_en, self.tokenizer_fr_en, temperature=self.backward_temperature, num_beams=self.num_beams, num_return_sequences=self.num_return_sequences, verbose=verbose)

        # reshape sentence list to nested list
        backtranslation_output = [backtranslation_output[i:i+self.num_return_sequences*self.num_return_sequences] for i in range(0, len(backtranslation_output), self.num_return_sequences*self.num_return_sequences)]

        return backtranslation_output


    def append_named_ents(self, sentences):
        sentences_with_ent = []
        for s in sentences:
            doc = self.nlp(s)
            named_entities = [ent.text for ent in doc.ents]
            # append entities to sentence
            sentences_with_ent.append([s + " " + " ".join(named_entities)])    
        
        return sentences_with_ent


    def augment(self, sentences):

        if self.named_ents:
            output_ents = self.append_named_ents(sentences)

        if self.syn_sub:   
            output_syn = self.synonym_substitution(sentences)
        
        # combine augmented sentences per batch item
        if self.backtranslation:
            output_backtranslation = self.back_translate(sentences)

        augmented_sentences = []
        for i in range(len(sentences)):
            aug_s = []
            if self.named_ents:
                aug_s += output_ents[i]
            if self.syn_sub:
                aug_s += output_syn[i]
            if self.backtranslation:
                aug_s += output_backtranslation[i]    
            augmented_sentences.append(aug_s)
        
        return augmented_sentences


def augmented_eval(claims_dataset, bm25, augmenter=None, k_values=[10], keep_query=True):
    
    k_values = sorted(k_values)
    precision = np.zeros(len(k_values))
    recall = np.zeros(len(k_values))
    f1 = np.zeros(len(k_values))

    for claim in tqdm(claims_dataset.values(), total=len(claims_dataset)):
        query = claim["claim_text"]
        gold_evidence_list = claim["evidences"]

        if augmenter is not None:
            # expand the query
            expanded_query = augmenter.augment([query])[0]
            #print(f"Expanded query: {expanded_query}")
            # clean the expanded queries
            for i in range(len(expanded_query)):
                expanded_query[i] = cleaner.clean(expanded_query[i])
            if keep_query:
                expanded_query.append(query)
                
        else:
            expanded_query = [query]


        # retrieve bm25 topk documents for each expanded query and combine them
        topk_ev_indices = []
        for q in expanded_query:
            topk_ev_idx, topk_scores = bm25.retrieve_docs(q, topk=max(k_values))
            topk_ev_indices.append(topk_ev_idx)
        
        # Interlacing selection
        interlaced_indices = [idx for sublist in zip(*topk_ev_indices) for idx in sublist]
        # remove duplicates
        interlaced_indices = list(dict.fromkeys(interlaced_indices))
        # Truncate to max k_values
        interlaced_indices = interlaced_indices[:max(k_values)]
        topk_ev_indices = interlaced_indices

        #else:
        #    topk_ev_indices, best_scores = bm25.retrieve_docs(query, topk=max(k_values))

        if topk_ev_indices == []:
            continue
        # convert document indices to document id strings
        topk_ev_ids_bm25 = [int2docID[i] for i in topk_ev_indices]

        for i, k in enumerate(k_values):
            # (precision, recall, F1) for bm25
            intersection = set(topk_ev_ids_bm25[:k]).intersection(gold_evidence_list)
            p = len(intersection) / len(topk_ev_ids_bm25[:k])
            r = len(intersection) / len(gold_evidence_list)
            precision[i] += p
            recall[i] += r
            f1[i] += (2*p*r/(p + r)) if (p+r) > 0 else 0

    # average over all claims
    precision = precision / len(claims_dataset) 
    recall = recall / len(claims_dataset)
    f1 = f1 / len(claims_dataset)

    # convert to dictionary
    metrics = {k:{"precision":precision[i], "recall":recall[i], "f1":f1[i]} for i,k in enumerate(k_values)}

    return metrics
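
The interlacing step in `augmented_eval` merges the ranked lists rank by rank, removes duplicates while keeping the first occurrence, and truncates the result. The sketch below isolates that logic on toy lists. Note one swapped-in detail: the notebook uses plain `zip`, which stops at the shortest list, whereas this sketch uses `itertools.zip_longest` so that longer lists are not cut off. The function name `interlace` is hypothetical.

```python
from itertools import zip_longest

def interlace(ranked_lists, limit):
    merged = []
    # take rank 1 from every list, then rank 2, and so on
    for rank_slice in zip_longest(*ranked_lists):
        merged.extend(x for x in rank_slice if x is not None)
    # dict.fromkeys keeps the first occurrence and preserves order
    deduped = list(dict.fromkeys(merged))
    return deduped[:limit]

lists = [[7, 3, 9], [3, 8], [5, 7, 1]]
merged = interlace(lists, limit=5)

# for comparison: plain zip, as in the notebook, stops after the shortest list
plain = list(dict.fromkeys(x for s in zip(*lists) for x in s))
print(merged, plain)
```

When every query returns the same number of results, as is expected when each call requests `max(k_values)` documents, the two variants give the same merged list.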

#### Try out some different hyperparameter combinations of the sentence augmenter and evaluate BM25 performance on Validation Claims.

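Best setting found: `SentenceAugmentor(forward_temperature=1.5, backward_temperature=2.0, num_beams=5, num_return_sequences=2, syn_sub=False, backtranslation=True)`

The best setting improves the average F1 score by less than one percentage point (0.087 to 0.093 at k = 3), and at most larger k it scores below the un-augmented baseline reported at the end of this section, so query expansion does not bring a meaningful improvement.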

In [11]:
# create augmenter
augmenter = SentenceAugmentor(num_return_sequences=2, syn_sub=False, backtranslation=True, backward_temperature=2.0)

k_values = [3, 5, 8, 10, 15, 25, 50, 100, 250, 500]

metrics = augmented_eval(val_data, retriever, augmenter, k_values)

for k, k_metrics in metrics.items():
    print(f"k = {k} --> P = {k_metrics['precision']}, R = {k_metrics['recall']}, F = {k_metrics['f1']}")


100%|██████████| 154/154 [05:58<00:00,  2.33s/it]

k = 3 --> P = 0.09307359307359314, R = 0.11114718614718615, F = 0.09301175015460733
k = 5 --> P = 0.0701298701298701, R = 0.13051948051948048, F = 0.0848072562358277
k = 8 --> P = 0.05194805194805195, R = 0.15573593073593064, F = 0.07312939585666856
k = 10 --> P = 0.04675324675324672, R = 0.16893939393939383, F = 0.06931856022765115
k = 15 --> P = 0.03896103896103896, R = 0.2066017316017315, F = 0.06292642256264545
k = 25 --> P = 0.027792207792207813, R = 0.24329004329004317, F = 0.04855350815941949
k = 50 --> P = 0.019610389610389627, R = 0.34502164502164506, F = 0.03660849048297336
k = 100 --> P = 0.012402597402597412, R = 0.42543290043290044, F = 0.023923965426800316
k = 250 --> P = 0.006753246753246758, R = 0.564935064935065, F = 0.013306263073692566
k = 500 --> P = 0.003922077922077925, R = 0.6385281385281385, F = 0.007784512524778234





In [10]:
# create augmenter
augmenter = SentenceAugmentor(num_beams=10, num_return_sequences=3, syn_sub=False, backtranslation=True, backward_temperature=2.0)

k_values = [3, 5, 8, 10, 15, 25, 50, 100, 250, 500]

metrics = augmented_eval(val_data, retriever, augmenter, k_values)

for k, k_metrics in metrics.items():
    print(f"k = {k} --> P = {k_metrics['precision']}, R = {k_metrics['recall']}, F = {k_metrics['f1']}")


100%|██████████| 154/154 [10:38<00:00,  4.15s/it]

k = 3 --> P = 0.08874458874458882, R = 0.09924242424242426, F = 0.08695114409400127
k = 5 --> P = 0.06753246753246751, R = 0.12987012987012986, F = 0.08276643990929708
k = 8 --> P = 0.05438311688311688, R = 0.1651515151515151, F = 0.07694224966952237
k = 10 --> P = 0.04675324675324672, R = 0.1742424242424242, F = 0.06989655366278744
k = 15 --> P = 0.03852813852813854, R = 0.2103896103896103, F = 0.06261258091753444
k = 25 --> P = 0.02909090909090911, R = 0.25454545454545446, F = 0.05084513187961464
k = 50 --> P = 0.019610389610389627, R = 0.34134199134199134, F = 0.03658141000992376
k = 100 --> P = 0.01266233766233767, R = 0.4386363636363636, F = 0.024430835320934874
k = 250 --> P = 0.006987012987012992, R = 0.5676406926406926, F = 0.013763003422246652
k = 500 --> P = 0.004038961038961042, R = 0.6467532467532469, F = 0.008015963424980014





In [103]:
# create augmenter
augmenter = SentenceAugmentor(num_beams=20, num_return_sequences=3, syn_sub=False, backtranslation=True, backward_temperature=2.0)

k_values = [3, 5, 8, 10, 15, 25, 50, 100, 250, 500]

metrics = augmented_eval(val_data, retriever, augmenter, k_values)

for k, k_metrics in metrics.items():
    print(f"k = {k} --> P = {k_metrics['precision']}, R = {k_metrics['recall']}, F = {k_metrics['f1']}")


100%|██████████| 154/154 [12:15<00:00,  4.77s/it]

k = 3 --> P = 0.08874458874458879, R = 0.10757575757575757, F = 0.08888373531230677
k = 5 --> P = 0.06623376623376621, R = 0.12380952380952376, F = 0.08002989074417648
k = 8 --> P = 0.060876623376623376, R = 0.17900432900432892, F = 0.0855820946730037
k = 10 --> P = 0.053246753246753195, R = 0.19696969696969685, F = 0.07948544961531971
k = 15 --> P = 0.04112554112554112, R = 0.22196969696969684, F = 0.06659440490709836
k = 25 --> P = 0.03090909090909093, R = 0.2649350649350649, F = 0.053949954442565284
k = 50 --> P = 0.02103896103896105, R = 0.3724025974025974, F = 0.03924765522485226
k = 100 --> P = 0.013506493506493513, R = 0.46049783549783563, F = 0.02604362657064405
k = 250 --> P = 0.007116883116883122, R = 0.5852813852813852, F = 0.014019277174507969
k = 500 --> P = 0.0040779220779220806, R = 0.6596320346320348, F = 0.008093371163300037





In [9]:
# create augmenter
augmenter = SentenceAugmentor(num_beams=10, num_return_sequences=3, syn_sub=False, backtranslation=True, backward_temperature=2.0, named_ents=True)

k_values = [3, 5, 8, 10, 15, 25, 50, 100, 250, 500]

metrics = augmented_eval(val_data, retriever, augmenter, k_values, keep_query=False)

for k, k_metrics in metrics.items():
    print(f"k = {k} --> P = {k_metrics['precision']}, R = {k_metrics['recall']}, F = {k_metrics['f1']}")

100%|██████████| 154/154 [09:58<00:00,  3.88s/it]

k = 3 --> P = 0.08658008658008663, R = 0.09512987012987016, F = 0.08381261595547312
k = 5 --> P = 0.06753246753246751, R = 0.12673160173160167, F = 0.08174087816944962
k = 8 --> P = 0.05194805194805195, R = 0.15216450216450209, F = 0.07290184562911833
k = 10 --> P = 0.04805194805194802, R = 0.17056277056277047, F = 0.07123006863266605
k = 15 --> P = 0.03766233766233768, R = 0.201082251082251, F = 0.061011039184413765
k = 25 --> P = 0.02935064935064937, R = 0.2515151515151514, F = 0.0512423537054079
k = 50 --> P = 0.020259740259740273, R = 0.36309523809523814, F = 0.037808871024490015
k = 100 --> P = 0.012337662337662345, R = 0.42500000000000004, F = 0.02379435831177287
k = 250 --> P = 0.0068311688311688355, R = 0.5630952380952381, F = 0.013458037742119048
k = 500 --> P = 0.003974025974025977, R = 0.6423160173160174, F = 0.00788727749637498





In [74]:
k_values = [3, 5, 8, 10, 15, 25, 50, 100, 250, 500]
metrics = augmented_eval(val_data, retriever, None, k_values)

for k, k_metrics in metrics.items():
    print(f"k = {k} --> P = {k_metrics['precision']}, R = {k_metrics['recall']}, F = {k_metrics['f1']}")

100%|██████████| 154/154 [00:35<00:00,  4.36it/s]

k = 3 --> P = 0.08658008658008662, R = 0.10660173160173161, F = 0.0874922696351268
k = 5 --> P = 0.0701298701298701, R = 0.13571428571428565, F = 0.08554421768707485
k = 8 --> P = 0.05925324675324675, R = 0.17878787878787872, F = 0.08354675627402894
k = 10 --> P = 0.05194805194805191, R = 0.1908008658008657, F = 0.07738279036980335
k = 15 --> P = 0.04025974025974026, R = 0.21536796536796526, F = 0.06523969236817531
k = 25 --> P = 0.03194805194805197, R = 0.27846320346320347, F = 0.05589838003631106
k = 50 --> P = 0.020909090909090922, R = 0.3703463203463204, F = 0.03902042792164877
k = 100 --> P = 0.012922077922077932, R = 0.43636363636363645, F = 0.024914664510425892
k = 250 --> P = 0.006779220779220784, R = 0.5534632034632035, F = 0.013353756793204954
k = 500 --> P = 0.0039090909090909115, R = 0.632900432900433, F = 0.0077583354221456464



