<a href="https://colab.research.google.com/github/winterForestStump/thesis/blob/main/notebooks/rag_x_phi3_financebenchQA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture --no-stderr
%pip install langchain-nomic langchain langchain-core langchain-community chromadb --quiet
%pip install sentence_transformers FlagEmbedding --quiet

In [2]:
# LlamaCpp x GPU usage
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.78.tar.gz (50.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.2/50.2 MB[0m [31m9.5 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Installing backend dependencies ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.5/45.5 kB[0m [31m6.9 MB/s[0m eta [36m0:00:00[0m
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... [?25l[?25hdone
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.78-cp310-cp310-linux_x86_64.whl size=169130738 sha256=0383155206e400bb73b9cf8a0d3ccbb216551fc68e7fed3eb1d1ae679991ec1c
  Stored in direct

In [3]:
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
from langchain_community.llms import LlamaCpp
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_core.prompts import PromptTemplate

import chromadb
from langchain.storage.file_system import LocalFileStore
from langchain.storage._lc_store import create_kv_docstore
from langchain.vectorstores import Chroma

from FlagEmbedding import FlagReranker

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.output_parsers import StrOutputParser

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.retrievers import ParentDocumentRetriever

from tqdm import tqdm
import pandas as pd
import os

In [5]:
!huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-fp16.gguf --local-dir ./models --local-dir-use-symlinks False

Downloading 'Phi-3-mini-4k-instruct-fp16.gguf' to 'models/.huggingface/download/Phi-3-mini-4k-instruct-fp16.gguf.5d99003e395775659b0dde3f941d88ff378b2837a8dc3a2ea94222ab1420fad3.incomplete'
Phi-3-mini-4k-instruct-fp16.gguf: 100% 7.64G/7.64G [05:17<00:00, 24.1MB/s]
Download complete. Moving file to models/Phi-3-mini-4k-instruct-fp16.gguf
models/Phi-3-mini-4k-instruct-fp16.gguf


In [6]:
TEMP = 0
N_CTX = 4096
N_GPU_L = -1

llm_phi3 = LlamaCpp(
    model_path="/content/models/Phi-3-mini-4k-instruct-fp16.gguf",
    temperature=TEMP,
    n_ctx=N_CTX,
    n_gpu_layers = N_GPU_L,
    verbose=True
)

llama_model_loader: loaded meta data with 23 key-value pairs and 195 tensors from /content/models/Phi-3-mini-4k-instruct-fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi3
llama_model_loader: - kv   1:                               general.name str              = Phi3
llama_model_loader: - kv   2:                        phi3.context_length u32              = 4096
llama_model_loader: - kv   3:                      phi3.embedding_length u32              = 3072
llama_model_loader: - kv   4:                   phi3.feed_forward_length u32              = 8192
llama_model_loader: - kv   5:                           phi3.block_count u32              = 32
llama_model_loader: - kv   6:                  phi3.attention.head_count u32              = 32
llama_model_loader: - kv   7:               phi3.attention.head_count

In [7]:
questions = pd.read_json('https://raw.githubusercontent.com/patronus-ai/financebench/main/data/financebench_open_source.jsonl', lines=True)
questions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   financebench_id       150 non-null    object
 1   company               150 non-null    object
 2   doc_name              150 non-null    object
 3   question_type         150 non-null    object
 4   question_reasoning    100 non-null    object
 5   domain_question_num   50 non-null     object
 6   question              150 non-null    object
 7   answer                150 non-null    object
 8   justification         100 non-null    object
 9   dataset_subset_label  150 non-null    object
 10  evidence              150 non-null    object
dtypes: object(11)
memory usage: 13.0+ KB


In [8]:
model_name = "BAAI/bge-small-en-v1.5"
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs={'device': 'cuda'}, #gpu
    encode_kwargs=encode_kwargs
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/94.8k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [9]:
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

tokenizer_config.json:   0%|          | 0.00/443 [00:00<?, ?B/s]

sentencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.1M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/279 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/801 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.24G [00:00<?, ?B/s]

In [19]:
### Metadata company name
prompt_metadata = PromptTemplate(
template="""
  <|assistant|> You need to identify which of the companies from the metadata list is mentioned in the users input.
  Database metadata with company names: {metadata_list}.
  Format your response as a JSON object with only a single key 'company', without any additional commentary or explanations.
  You do not need to try to answer the question itself.
  <|user|>Users input: {input}<|end|>
  <|assistant|>
""",
input_variables=["input", "metadata_list"])

retrieval_metadata = prompt_metadata | llm_phi3 | JsonOutputParser()

In [20]:
persistent_client = chromadb.PersistentClient('/content/drive/MyDrive/Thesis/chromadb')
collection = persistent_client.get_or_create_collection("reports_l2")
fs = LocalFileStore('/content/drive/MyDrive/Thesis/reports_store_location')
store = create_kv_docstore(fs)
vectorstore = Chroma(client = persistent_client,
                     collection_name="reports_l2",
                     embedding_function=bge_embeddings,
                     persist_directory='/content/drive/MyDrive/Thesis/chromadb')
vectorstore.persist()

In [21]:
metadata = vectorstore.get()['metadatas']
metadata_list = []
for i in range(len(metadata)):
  metadata_list.append(metadata[i]['company'])
metadata_list = list(set(metadata_list))

In [13]:
### Retrieval Grader
llm_retrieval = llm_phi3

prompt_retrieval_grader = PromptTemplate(
    template="""<|assistant|> You are a grader assessing relevance of a retrieved document to an evidence text.
    If the retrieved document contains the same information as an evidence text, grade it as relevant. The goal is to filter out erroneous retrievals. \n
    Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the evidence text.<|end|>
    <|user|> Here is the retrieved document: {document}\n Here is the evidence text: {evidence_text} <|end|>
    <|assistant|>
    """,
    input_variables=["evidence_text", "document"],
)

retrieval_grader = prompt_retrieval_grader | llm_retrieval | StrOutputParser()

In [14]:
### Generate
llm_generate = llm_phi3

prompt_generate = PromptTemplate(
    template="""<|assistant|> You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question.
    If you don't know the answer, just say that you don't know. Keep the answer concise <|end|>
    <|user|> Question: {question}. \n Context: {documents} \n Answer: <|end|>
    <|assistant|>""",
    input_variables=["question", "documents"],
)

rag_chain = prompt_generate | llm_generate | StrOutputParser()

In [15]:
### Hallucination Grader
llm_hallucination_grader = llm_phi3

# Prompt
prompt_hallucination_grader = PromptTemplate(
    template=""" <|assistant|> You are a grader assessing whether an answer is grounded in / supported by a set of facts.
    Give a binary 'yes' or 'no' score to indicate whether the answer is grounded in / supported by a set of facts.<|end|>
    <|user|> Here are the facts: {documents} \n Here is the answer: {generation}  <|end|>
    <|assistant|>""",
    input_variables=["generation", "documents"],
)

hallucination_grader = prompt_hallucination_grader | llm_hallucination_grader | StrOutputParser()

In [16]:
### Answer Grader
llm_answer_grader = llm_phi3

# Prompt
prompt_answer_grader = PromptTemplate(
    template="""<|assistant|> You are a grader assessing whether a generated answer is correct or incorrect,
    assecing it with the ground truth.
    Give a binary score 'yes' or 'no' to indicate whether the generated answer equal to or contains the ground truth.<|end|>
    <|user|> Here is the answer: {generation} \n Here is the ground truth: {truth} <|end|>
    <|assistant|>""",
    input_variables=["generation", "truth"],
)

answer_grader = prompt_answer_grader | llm_answer_grader | StrOutputParser()

In [22]:
NUM_PAR_CHUNKS = 20
N_DOCS_RETURN = 2

results_list = []

for i in tqdm(range(len(questions))):
    query = questions['question'][i]
    company = retrieval_metadata.invoke({"input": questions['company'][i], "metadata_list": metadata_list})

    parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
    child_splitter = RecursiveCharacterTextSplitter(chunk_size=256)
    big_chunks_retriever = ParentDocumentRetriever(
      vectorstore=vectorstore, docstore=store, child_splitter=child_splitter, parent_splitter=parent_splitter,
      search_kwargs={'filter': {'company': company['company']}, 'k': NUM_PAR_CHUNKS})

    passage = big_chunks_retriever.invoke(query)
    texts = []
    for i in range(len(passage)):
      texts.append([query, passage[i].page_content])
    try:
      if not texts:
        raise ValueError('Texts list is empty')
      scores = reranker.compute_score(texts)
      combined = list(zip(texts, scores))
      sorted_combined = sorted(combined, key=lambda x: x[1], reverse=True)
      top_texts = [item[0] for item in sorted_combined[:N_DOCS_RETURN]]
      docs = [inner_list[1] for inner_list in top_texts if len(inner_list)>1]

      retrieval_grade = retrieval_grader.invoke({"evidence_text": questions['evidence'][i], "document": docs})
      generation = rag_chain.invoke({"documents": docs, "question": query})
      hallucination_grade = hallucination_grader.invoke({"documents": docs, "generation": generation})
      answer_grade = answer_grader.invoke({"truth": questions['answer'][i], "generation": generation})

      results_list.append(pd.DataFrame({
            'question': [query],
            'response': [generation],
            'context': [docs],
            'retrieval_grade': [retrieval_grade],
            'hallucination_grade': [hallucination_grade],
            'answer_grade': [answer_grade]
        }))

      results = pd.concat(results_list, ignore_index=True)
      results.to_json(f'/content/drive/MyDrive/Thesis/rag_evaluation/financebench150/eval.json')

    except ValueError as e:
      print(f"Skipping question {i} due to error: {e}")
      continue

  0%|          | 0/150 [00:00<?, ?it/s]Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      59.68 ms /    89 runs   (    0.67 ms per token,  1491.39 tokens per second)
llama_print_timings: prompt eval time =    2145.81 ms /   381 tokens (    5.63 ms per token,   177.55 tokens per second)
llama_print_timings:        eval time =    3570.59 ms /    88 runs   (   40.57 ms per token,    24.65 tokens per second)
llama_print_timings:       total time =    5866.81 ms /   469 tokens
  1%|          | 1/150 [00:18<46:29, 18.72s/it]

Skipping question 12 due to error: Requested tokens (6834) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      49.30 ms /    89 runs   (    0.55 ms per token,  1805.20 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3580.26 ms /    89 runs   (   40.23 ms per token,    24.86 tokens per second)
llama_print_timings:       total time =    3691.00 ms /    89 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     164.50 ms /   256 runs   (    0.64 ms per token,  1556.27 tokens per second)
llama_print_timings: prompt eval time =   17832.84 ms /  2768 tokens (    6.44 ms per token,   155.22 tokens per second)
llama_print_timings:        eval time =   13593.84 ms /   256 runs   (   53.10 ms per token,    18.83 tokens per second)
llama_print_timings:       to

Skipping question 10 due to error: Requested tokens (8515) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      55.13 ms /    89 runs   (    0.62 ms per token,  1614.48 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3482.79 ms /    89 runs   (   39.13 ms per token,    25.55 tokens per second)
llama_print_timings:       total time =    3601.06 ms /    89 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     151.86 ms /   256 runs   (    0.59 ms per token,  1685.80 tokens per second)
llama_print_timings: prompt eval time =   22837.67 ms /  3398 tokens (    6.72 ms per token,   148.79 tokens per second)
llama_print_timings:        eval time =   14547.97 ms /   255 runs   (   57.05 ms per token,    17.53 tokens per second)
llama_print_timings:       to

Skipping question 9 due to error: Requested tokens (5427) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      50.82 ms /    89 runs   (    0.57 ms per token,  1751.24 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3500.52 ms /    89 runs   (   39.33 ms per token,    25.42 tokens per second)
llama_print_timings:       total time =    3607.09 ms /    89 tokens
  5%|▍         | 7/150 [04:39<1:14:24, 31.22s/it]

Skipping question 12 due to error: Requested tokens (6892) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      49.28 ms /    89 runs   (    0.55 ms per token,  1805.86 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3488.62 ms /    89 runs   (   39.20 ms per token,    25.51 tokens per second)
llama_print_timings:       total time =    3588.54 ms /    89 tokens
  5%|▌         | 8/150 [04:49<57:29, 24.29s/it]  

Skipping question 9 due to error: Requested tokens (5484) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      42.27 ms /    77 runs   (    0.55 ms per token,  1821.58 tokens per second)
llama_print_timings: prompt eval time =      41.34 ms /     7 tokens (    5.91 ms per token,   169.33 tokens per second)
llama_print_timings:        eval time =    2981.96 ms /    76 runs   (   39.24 ms per token,    25.49 tokens per second)
llama_print_timings:       total time =    3107.74 ms /    83 tokens
  6%|▌         | 9/150 [04:54<42:58, 18.29s/it]

Skipping question 12 due to error: Requested tokens (7700) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      42.82 ms /    77 runs   (    0.56 ms per token,  1798.39 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3046.03 ms /    77 runs   (   39.56 ms per token,    25.28 tokens per second)
llama_print_timings:       total time =    3132.72 ms /    77 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     108.92 ms /   196 runs   (    0.56 ms per token,  1799.50 tokens per second)
llama_print_timings: prompt eval time =   27253.41 ms /  3899 tokens (    6.99 ms per token,   143.06 tokens per second)
llama_print_timings:        eval time =   11420.49 ms /   195 runs   (   58.57 ms per token,    17.07 tokens per second)
llama_print_timings:       to

Skipping question 9 due to error: Requested tokens (6060) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      51.55 ms /    88 runs   (    0.59 ms per token,  1707.08 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3461.84 ms /    88 runs   (   39.34 ms per token,    25.42 tokens per second)
llama_print_timings:       total time =    3568.58 ms /    88 tokens
  8%|▊         | 12/150 [07:10<1:07:29, 29.34s/it]

Skipping question 12 due to error: Requested tokens (7670) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      50.24 ms /    88 runs   (    0.57 ms per token,  1751.63 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3459.26 ms /    88 runs   (   39.31 ms per token,    25.44 tokens per second)
llama_print_timings:       total time =    3566.16 ms /    88 tokens
  9%|▊         | 13/150 [07:16<50:57, 22.32s/it]  

Skipping question 8 due to error: Requested tokens (5366) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      58.45 ms /    88 runs   (    0.66 ms per token,  1505.43 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3477.23 ms /    88 runs   (   39.51 ms per token,    25.31 tokens per second)
llama_print_timings:       total time =    3606.03 ms /    88 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      28.01 ms /    49 runs   (    0.57 ms per token,  1749.63 tokens per second)
llama_print_timings: prompt eval time =   28547.13 ms /  4046 tokens (    7.06 ms per token,   141.73 tokens per second)
llama_print_timings:        eval time =    2843.53 ms /    48 runs   (   59.24 ms per token,    16.88 tokens per second)
llama_print_timings:       to

Skipping question 12 due to error: Requested tokens (7493) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     159.76 ms /   256 runs   (    0.62 ms per token,  1602.41 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =   10179.81 ms /   256 runs   (   39.76 ms per token,    25.15 tokens per second)
llama_print_timings:       total time =   10550.48 ms /   256 tokens
 12%|█▏        | 18/150 [12:18<1:29:54, 40.86s/it]

Skipping question 10 due to error: Requested tokens (8879) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      40.94 ms /    65 runs   (    0.63 ms per token,  1587.77 tokens per second)
llama_print_timings: prompt eval time =      38.78 ms /     3 tokens (   12.93 ms per token,    77.36 tokens per second)
llama_print_timings:        eval time =    2546.60 ms /    64 runs   (   39.79 ms per token,    25.13 tokens per second)
llama_print_timings:       total time =    2667.31 ms /    67 tokens
 13%|█▎        | 19/150 [12:36<1:14:14, 34.01s/it]

Skipping question 15 due to error: Requested tokens (4404) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      38.39 ms /    65 runs   (    0.59 ms per token,  1693.24 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2557.26 ms /    65 runs   (   39.34 ms per token,    25.42 tokens per second)
llama_print_timings:       total time =    2633.28 ms /    65 tokens
 13%|█▎        | 20/150 [12:44<57:00, 26.31s/it]  

Skipping question 10 due to error: Requested tokens (8825) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      41.43 ms /    65 runs   (    0.64 ms per token,  1568.76 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2578.43 ms /    65 runs   (   39.67 ms per token,    25.21 tokens per second)
llama_print_timings:       total time =    2669.93 ms /    65 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     140.68 ms /   256 runs   (    0.55 ms per token,  1819.68 tokens per second)
llama_print_timings: prompt eval time =   21696.31 ms /  3256 tokens (    6.66 ms per token,   150.07 tokens per second)
llama_print_timings:        eval time =   14141.50 ms /   255 runs   (   55.46 ms per token,    18.03 tokens per second)
llama_print_timings:       to

Skipping question 17 due to error: Requested tokens (7482) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      38.36 ms /    55 runs   (    0.70 ms per token,  1433.90 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2171.64 ms /    55 runs   (   39.48 ms per token,    25.33 tokens per second)
llama_print_timings:       total time =    2252.29 ms /    55 tokens
 17%|█▋        | 26/150 [18:37<1:24:16, 40.78s/it]

Skipping question 9 due to error: Requested tokens (5440) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      36.49 ms /    55 runs   (    0.66 ms per token,  1507.43 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2155.87 ms /    55 runs   (   39.20 ms per token,    25.51 tokens per second)
llama_print_timings:       total time =    2230.88 ms /    55 tokens
 18%|█▊        | 27/150 [18:46<1:04:19, 31.38s/it]

Skipping question 9 due to error: Requested tokens (6128) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      30.69 ms /    55 runs   (    0.56 ms per token,  1791.94 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2165.74 ms /    55 runs   (   39.38 ms per token,    25.40 tokens per second)
llama_print_timings:       total time =    2231.73 ms /    55 tokens
 19%|█▊        | 28/150 [18:53<48:41, 23.95s/it]  

Skipping question 10 due to error: Requested tokens (8831) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      39.25 ms /    55 runs   (    0.71 ms per token,  1401.17 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2181.13 ms /    55 runs   (   39.66 ms per token,    25.22 tokens per second)
llama_print_timings:       total time =    2264.76 ms /    55 tokens
 19%|█▉        | 29/150 [18:58<37:11, 18.44s/it]

Skipping question 12 due to error: Requested tokens (7302) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      31.22 ms /    55 runs   (    0.57 ms per token,  1761.47 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2186.61 ms /    55 runs   (   39.76 ms per token,    25.15 tokens per second)
llama_print_timings:       total time =    2251.65 ms /    55 tokens
 20%|██        | 30/150 [19:12<34:06, 17.06s/it]

Skipping question 12 due to error: Requested tokens (6928) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      57.82 ms /   100 runs   (    0.58 ms per token,  1729.54 tokens per second)
llama_print_timings: prompt eval time =      39.16 ms /     4 tokens (    9.79 ms per token,   102.14 tokens per second)
llama_print_timings:        eval time =    3961.27 ms /    99 runs   (   40.01 ms per token,    24.99 tokens per second)
llama_print_timings:       total time =    4111.02 ms /   103 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     155.01 ms /   256 runs   (    0.61 ms per token,  1651.54 tokens per second)
llama_print_timings: prompt eval time =   17817.97 ms /  2778 tokens (    6.41 ms per token,   155.91 tokens per second)
llama_print_timings:        eval time =   13551.79 ms /   255 runs   (   53.14 ms per token,    18.82 tokens per second)
llama_print_timings:       to

Skipping question 10 due to error: Requested tokens (8015) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      66.22 ms /   100 runs   (    0.66 ms per token,  1510.07 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3931.71 ms /   100 runs   (   39.32 ms per token,    25.43 tokens per second)
llama_print_timings:       total time =    4072.48 ms /   100 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     153.93 ms /   256 runs   (    0.60 ms per token,  1663.06 tokens per second)
llama_print_timings: prompt eval time =   12984.63 ms /  2156 tokens (    6.02 ms per token,   166.04 tokens per second)
llama_print_timings:        eval time =   11825.36 ms /   255 runs   (   46.37 ms per token,    21.56 tokens per second)
llama_print_timings:       to

Skipping question 12 due to error: Requested tokens (6704) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      59.07 ms /   100 runs   (    0.59 ms per token,  1692.99 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3942.49 ms /   100 runs   (   39.42 ms per token,    25.36 tokens per second)
llama_print_timings:       total time =    4060.80 ms /   100 tokens
 25%|██▌       | 38/150 [25:38<57:30, 30.81s/it]  

Skipping question 8 due to error: Requested tokens (4734) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     113.32 ms /   188 runs   (    0.60 ms per token,  1659.05 tokens per second)
llama_print_timings: prompt eval time =      40.12 ms /     4 tokens (   10.03 ms per token,    99.70 tokens per second)
llama_print_timings:        eval time =    7361.50 ms /   187 runs   (   39.37 ms per token,    25.40 tokens per second)
llama_print_timings:       total time =    7637.00 ms /   191 tokens
 26%|██▌       | 39/150 [25:55<48:57, 26.46s/it]

Skipping question 8 due to error: Requested tokens (5059) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     108.66 ms /   188 runs   (    0.58 ms per token,  1730.25 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    7474.38 ms /   188 runs   (   39.76 ms per token,    25.15 tokens per second)
llama_print_timings:       total time =    7693.86 ms /   188 tokens
 27%|██▋       | 40/150 [26:11<42:50, 23.36s/it]

Skipping question 9 due to error: Requested tokens (5427) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     115.78 ms /   188 runs   (    0.62 ms per token,  1623.81 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    7501.45 ms /   188 runs   (   39.90 ms per token,    25.06 tokens per second)
llama_print_timings:       total time =    7750.25 ms /   188 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     146.38 ms /   256 runs   (    0.57 ms per token,  1748.85 tokens per second)
llama_print_timings: prompt eval time =   13923.87 ms /  2269 tokens (    6.14 ms per token,   162.96 tokens per second)
llama_print_timings:        eval time =   12179.33 ms /   255 runs   (   47.76 ms per token,    20.94 tokens per second)
llama_print_timings:       to

Skipping question 9 due to error: Requested tokens (5960) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      70.14 ms /   110 runs   (    0.64 ms per token,  1568.36 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    4313.39 ms /   110 runs   (   39.21 ms per token,    25.50 tokens per second)
llama_print_timings:       total time =    4463.30 ms /   110 tokens
 29%|██▊       | 43/150 [27:53<47:06, 26.42s/it]

Skipping question 9 due to error: Requested tokens (5975) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      75.28 ms /   110 runs   (    0.68 ms per token,  1461.19 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    4335.84 ms /   110 runs   (   39.42 ms per token,    25.37 tokens per second)
llama_print_timings:       total time =    4518.77 ms /   110 tokens
 29%|██▉       | 44/150 [28:01<37:01, 20.96s/it]

Skipping question 9 due to error: Requested tokens (5687) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      64.44 ms /   110 runs   (    0.59 ms per token,  1707.09 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    4363.26 ms /   110 runs   (   39.67 ms per token,    25.21 tokens per second)
llama_print_timings:       total time =    4498.62 ms /   110 tokens
 30%|███       | 45/150 [28:10<30:10, 17.25s/it]

Skipping question 9 due to error: Requested tokens (5587) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      54.86 ms /    80 runs   (    0.69 ms per token,  1458.26 tokens per second)
llama_print_timings: prompt eval time =      39.44 ms /     4 tokens (    9.86 ms per token,   101.42 tokens per second)
llama_print_timings:        eval time =    3188.18 ms /    79 runs   (   40.36 ms per token,    24.78 tokens per second)
llama_print_timings:       total time =    3344.81 ms /    83 tokens
 31%|███       | 46/150 [28:24<28:09, 16.25s/it]

Skipping question 8 due to error: Requested tokens (5543) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      55.76 ms /    80 runs   (    0.70 ms per token,  1434.77 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3210.90 ms /    80 runs   (   40.14 ms per token,    24.92 tokens per second)
llama_print_timings:       total time =    3334.33 ms /    80 tokens
 31%|███▏      | 47/150 [28:33<24:04, 14.03s/it]

Skipping question 10 due to error: Requested tokens (8952) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      55.92 ms /    80 runs   (    0.70 ms per token,  1430.69 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3193.07 ms /    80 runs   (   39.91 ms per token,    25.05 tokens per second)
llama_print_timings:       total time =    3317.23 ms /    80 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     144.71 ms /   256 runs   (    0.57 ms per token,  1769.12 tokens per second)
llama_print_timings: prompt eval time =   19145.94 ms /  2962 tokens (    6.46 ms per token,   154.71 tokens per second)
llama_print_timings:        eval time =   13735.49 ms /   255 runs   (   53.86 ms per token,    18.57 tokens per second)
llama_print_timings:       to

Skipping question 9 due to error: Requested tokens (6532) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      16.30 ms /    28 runs   (    0.58 ms per token,  1718.11 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1096.90 ms /    28 runs   (   39.17 ms per token,    25.53 tokens per second)
llama_print_timings:       total time =    1128.38 ms /    28 tokens
 34%|███▍      | 51/150 [31:30<45:42, 27.70s/it]  

Skipping question 12 due to error: Requested tokens (7137) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      18.93 ms /    28 runs   (    0.68 ms per token,  1479.29 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1095.29 ms /    28 runs   (   39.12 ms per token,    25.56 tokens per second)
llama_print_timings:       total time =    1135.48 ms /    28 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     136.13 ms /   256 runs   (    0.53 ms per token,  1880.61 tokens per second)
llama_print_timings: prompt eval time =   21082.14 ms /  3197 tokens (    6.59 ms per token,   151.64 tokens per second)
llama_print_timings:        eval time =   14298.61 ms /   255 runs   (   56.07 ms per token,    17.83 tokens per second)
llama_print_timings:       to

Skipping question 10 due to error: Requested tokens (8247) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      19.90 ms /    28 runs   (    0.71 ms per token,  1407.11 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1102.99 ms /    28 runs   (   39.39 ms per token,    25.39 tokens per second)
llama_print_timings:       total time =    1141.01 ms /    28 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     135.84 ms /   256 runs   (    0.53 ms per token,  1884.60 tokens per second)
llama_print_timings: prompt eval time =   25375.23 ms /  3723 tokens (    6.82 ms per token,   146.72 tokens per second)
llama_print_timings:        eval time =   14938.52 ms /   255 runs   (   58.58 ms per token,    17.07 tokens per second)
llama_print_timings:       to

Skipping question 56 due to error: Texts list is empty


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      51.88 ms /    73 runs   (    0.71 ms per token,  1407.15 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2899.30 ms /    73 runs   (   39.72 ms per token,    25.18 tokens per second)
llama_print_timings:       total time =    3017.83 ms /    73 tokens
 39%|███▊      | 58/150 [37:13<50:28, 32.91s/it]  

Skipping question 57 due to error: Texts list is empty


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      42.95 ms /    73 runs   (    0.59 ms per token,  1699.57 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2884.74 ms /    73 runs   (   39.52 ms per token,    25.31 tokens per second)
llama_print_timings:       total time =    2970.16 ms /    73 tokens
 39%|███▉      | 59/150 [37:17<36:52, 24.31s/it]

Skipping question 58 due to error: Texts list is empty


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      34.01 ms /    58 runs   (    0.59 ms per token,  1705.58 tokens per second)
llama_print_timings: prompt eval time =      38.43 ms /     4 tokens (    9.61 ms per token,   104.09 tokens per second)
llama_print_timings:        eval time =    2255.35 ms /    57 runs   (   39.57 ms per token,    25.27 tokens per second)
llama_print_timings:       total time =    2364.81 ms /    61 tokens
 40%|████      | 60/150 [37:29<30:38, 20.42s/it]

Skipping question 8 due to error: Requested tokens (5820) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      34.27 ms /    58 runs   (    0.59 ms per token,  1692.49 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2280.34 ms /    58 runs   (   39.32 ms per token,    25.43 tokens per second)
llama_print_timings:       total time =    2347.69 ms /    58 tokens
 41%|████      | 61/150 [37:37<25:06, 16.93s/it]

Skipping question 10 due to error: Requested tokens (8566) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      33.01 ms /    58 runs   (    0.57 ms per token,  1756.94 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2297.89 ms /    58 runs   (   39.62 ms per token,    25.24 tokens per second)
llama_print_timings:       total time =    2358.85 ms /    58 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     152.47 ms /   256 runs   (    0.60 ms per token,  1678.97 tokens per second)
llama_print_timings: prompt eval time =   15898.66 ms /  2530 tokens (    6.28 ms per token,   159.13 tokens per second)
llama_print_timings:        eval time =   13457.58 ms /   255 runs   (   52.77 ms per token,    18.95 tokens per second)
llama_print_timings:       to

Skipping question 11 due to error: Requested tokens (4317) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      33.91 ms /    58 runs   (    0.58 ms per token,  1710.51 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2275.10 ms /    58 runs   (   39.23 ms per token,    25.49 tokens per second)
llama_print_timings:       total time =    2342.86 ms /    58 tokens
 43%|████▎     | 64/150 [39:07<30:20, 21.17s/it]

Skipping question 15 due to error: Requested tokens (4269) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      34.26 ms /    58 runs   (    0.59 ms per token,  1693.08 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2269.83 ms /    58 runs   (   39.13 ms per token,    25.55 tokens per second)
llama_print_timings:       total time =    2335.24 ms /    58 tokens
 43%|████▎     | 65/150 [39:15<24:23, 17.22s/it]

Skipping question 8 due to error: Requested tokens (4483) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      36.79 ms /    58 runs   (    0.63 ms per token,  1576.52 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2282.01 ms /    58 runs   (   39.35 ms per token,    25.42 tokens per second)
llama_print_timings:       total time =    2358.04 ms /    58 tokens
 44%|████▍     | 66/150 [39:25<20:56, 14.96s/it]

Skipping question 12 due to error: Requested tokens (6626) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      34.75 ms /    58 runs   (    0.60 ms per token,  1669.11 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2285.57 ms /    58 runs   (   39.41 ms per token,    25.38 tokens per second)
llama_print_timings:       total time =    2353.91 ms /    58 tokens
 45%|████▍     | 67/150 [39:33<17:47, 12.86s/it]

Skipping question 9 due to error: Requested tokens (5825) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      40.28 ms /    69 runs   (    0.58 ms per token,  1713.14 tokens per second)
llama_print_timings: prompt eval time =      41.91 ms /     7 tokens (    5.99 ms per token,   167.01 tokens per second)
llama_print_timings:        eval time =    2672.99 ms /    68 runs   (   39.31 ms per token,    25.44 tokens per second)
llama_print_timings:       total time =    2791.78 ms /    75 tokens
 45%|████▌     | 68/150 [39:47<18:11, 13.31s/it]

Skipping question 10 due to error: Requested tokens (9047) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      39.77 ms /    69 runs   (    0.58 ms per token,  1734.93 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2741.59 ms /    69 runs   (   39.73 ms per token,    25.17 tokens per second)
llama_print_timings:       total time =    2818.62 ms /    69 tokens
 46%|████▌     | 69/150 [39:55<15:44, 11.66s/it]

Skipping question 8 due to error: Requested tokens (5917) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      41.51 ms /    69 runs   (    0.60 ms per token,  1662.13 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2766.11 ms /    69 runs   (   40.09 ms per token,    24.94 tokens per second)
llama_print_timings:       total time =    2848.13 ms /    69 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      60.13 ms /   111 runs   (    0.54 ms per token,  1846.00 tokens per second)
llama_print_timings: prompt eval time =   27821.55 ms /  3984 tokens (    6.98 ms per token,   143.20 tokens per second)
llama_print_timings:        eval time =    6440.66 ms /   110 runs   (   58.55 ms per token,    17.08 tokens per second)
llama_print_timings:       to

Skipping question 16 due to error: Requested tokens (4616) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      34.02 ms /    45 runs   (    0.76 ms per token,  1322.79 tokens per second)
llama_print_timings: prompt eval time =      40.39 ms /     4 tokens (   10.10 ms per token,    99.03 tokens per second)
llama_print_timings:        eval time =    1730.02 ms /    44 runs   (   39.32 ms per token,    25.43 tokens per second)
llama_print_timings:       total time =    1835.37 ms /    48 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     145.97 ms /   256 runs   (    0.57 ms per token,  1753.79 tokens per second)
llama_print_timings: prompt eval time =   26099.93 ms /  3794 tokens (    6.88 ms per token,   145.36 tokens per second)
llama_print_timings:        eval time =   15011.73 ms /   255 runs   (   58.87 ms per token,    16.99 tokens per second)
llama_print_timings:       to

Skipping question 79 due to error: Texts list is empty


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      18.05 ms /    28 runs   (    0.64 ms per token,  1550.99 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1121.59 ms /    28 runs   (   40.06 ms per token,    24.96 tokens per second)
llama_print_timings:       total time =    1163.27 ms /    28 tokens
 54%|█████▍    | 81/150 [53:02<41:46, 36.33s/it]

Skipping question 80 due to error: Texts list is empty


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      53.52 ms /    81 runs   (    0.66 ms per token,  1513.45 tokens per second)
llama_print_timings: prompt eval time =      48.91 ms /     5 tokens (    9.78 ms per token,   102.22 tokens per second)
llama_print_timings:        eval time =    3173.32 ms /    80 runs   (   39.67 ms per token,    25.21 tokens per second)
llama_print_timings:       total time =    3343.84 ms /    85 tokens
 55%|█████▍    | 82/150 [53:18<34:10, 30.16s/it]

Skipping question 11 due to error: Requested tokens (4416) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      47.91 ms /    81 runs   (    0.59 ms per token,  1690.53 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3178.29 ms /    81 runs   (   39.24 ms per token,    25.49 tokens per second)
llama_print_timings:       total time =    3279.06 ms /    81 tokens
 55%|█████▌    | 83/150 [53:26<26:09, 23.43s/it]

Skipping question 9 due to error: Requested tokens (6763) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      54.03 ms /    81 runs   (    0.67 ms per token,  1499.22 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3187.22 ms /    81 runs   (   39.35 ms per token,    25.41 tokens per second)
llama_print_timings:       total time =    3305.08 ms /    81 tokens
 56%|█████▌    | 84/150 [53:34<20:54, 19.01s/it]

Skipping question 8 due to error: Requested tokens (5861) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      50.39 ms /    81 runs   (    0.62 ms per token,  1607.43 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3187.38 ms /    81 runs   (   39.35 ms per token,    25.41 tokens per second)
llama_print_timings:       total time =    3299.29 ms /    81 tokens
 57%|█████▋    | 85/150 [53:42<16:56, 15.63s/it]

Skipping question 9 due to error: Requested tokens (6778) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      25.61 ms /    30 runs   (    0.85 ms per token,  1171.51 tokens per second)
llama_print_timings: prompt eval time =      40.08 ms /     5 tokens (    8.02 ms per token,   124.76 tokens per second)
llama_print_timings:        eval time =    1156.69 ms /    29 runs   (   39.89 ms per token,    25.07 tokens per second)
llama_print_timings:       total time =    1250.04 ms /    34 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     155.05 ms /   256 runs   (    0.61 ms per token,  1651.04 tokens per second)
llama_print_timings: prompt eval time =   24658.49 ms /  3588 tokens (    6.87 ms per token,   145.51 tokens per second)
llama_print_timings:        eval time =   14492.83 ms /   255 runs   (   56.83 ms per token,    17.59 tokens per second)
llama_print_timings:       to

Skipping question 10 due to error: Requested tokens (8515) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      22.96 ms /    30 runs   (    0.77 ms per token,  1306.45 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1164.24 ms /    30 runs   (   38.81 ms per token,    25.77 tokens per second)
llama_print_timings:       total time =    1209.81 ms /    30 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     160.33 ms /   256 runs   (    0.63 ms per token,  1596.68 tokens per second)
llama_print_timings: prompt eval time =   18120.43 ms /  2836 tokens (    6.39 ms per token,   156.51 tokens per second)
llama_print_timings:        eval time =   13822.09 ms /   255 runs   (   54.20 ms per token,    18.45 tokens per second)
llama_print_timings:       to

Skipping question 9 due to error: Requested tokens (5749) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      17.77 ms /    30 runs   (    0.59 ms per token,  1688.33 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1182.14 ms /    30 runs   (   39.40 ms per token,    25.38 tokens per second)
llama_print_timings:       total time =    1219.70 ms /    30 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     141.43 ms /   256 runs   (    0.55 ms per token,  1810.10 tokens per second)
llama_print_timings: prompt eval time =   13874.49 ms /  2274 tokens (    6.10 ms per token,   163.90 tokens per second)
llama_print_timings:        eval time =   12209.16 ms /   255 runs   (   47.88 ms per token,    20.89 tokens per second)
llama_print_timings:       to

Skipping question 8 due to error: Requested tokens (4512) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      20.41 ms /    30 runs   (    0.68 ms per token,  1469.87 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1187.04 ms /    30 runs   (   39.57 ms per token,    25.27 tokens per second)
llama_print_timings:       total time =    1235.64 ms /    30 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     135.49 ms /   256 runs   (    0.53 ms per token,  1889.49 tokens per second)
llama_print_timings: prompt eval time =   24674.78 ms /  3656 tokens (    6.75 ms per token,   148.17 tokens per second)
llama_print_timings:        eval time =   14828.03 ms /   255 runs   (   58.15 ms per token,    17.20 tokens per second)
llama_print_timings:       to

Skipping question 16 due to error: Requested tokens (4981) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      42.38 ms /    59 runs   (    0.72 ms per token,  1392.07 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2321.02 ms /    59 runs   (   39.34 ms per token,    25.42 tokens per second)
llama_print_timings:       total time =    2414.25 ms /    59 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     149.99 ms /   256 runs   (    0.59 ms per token,  1706.78 tokens per second)
llama_print_timings: prompt eval time =   21781.39 ms /  3285 tokens (    6.63 ms per token,   150.82 tokens per second)
llama_print_timings:        eval time =   14407.72 ms /   255 runs   (   56.50 ms per token,    17.70 tokens per second)
llama_print_timings:       to

Skipping question 17 due to error: Requested tokens (7473) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      35.55 ms /    59 runs   (    0.60 ms per token,  1659.40 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2309.56 ms /    59 runs   (   39.15 ms per token,    25.55 tokens per second)
llama_print_timings:       total time =    2382.44 ms /    59 tokens
 66%|██████▌   | 99/150 [1:04:25<32:45, 38.54s/it]

Skipping question 12 due to error: Requested tokens (6573) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      36.77 ms /    64 runs   (    0.57 ms per token,  1740.50 tokens per second)
llama_print_timings: prompt eval time =      40.90 ms /     5 tokens (    8.18 ms per token,   122.25 tokens per second)
llama_print_timings:        eval time =    2471.01 ms /    63 runs   (   39.22 ms per token,    25.50 tokens per second)
llama_print_timings:       total time =    2589.32 ms /    68 tokens
 67%|██████▋   | 100/150 [1:04:38<25:43, 30.88s/it]

Skipping question 9 due to error: Requested tokens (6646) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      41.76 ms /    70 runs   (    0.60 ms per token,  1676.21 tokens per second)
llama_print_timings: prompt eval time =      43.20 ms /     6 tokens (    7.20 ms per token,   138.89 tokens per second)
llama_print_timings:        eval time =    2707.95 ms /    69 runs   (   39.25 ms per token,    25.48 tokens per second)
llama_print_timings:       total time =    2842.29 ms /    75 tokens
 67%|██████▋   | 101/150 [1:04:53<21:12, 25.96s/it]

Skipping question 11 due to error: Requested tokens (4209) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      40.61 ms /    70 runs   (    0.58 ms per token,  1723.63 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2772.40 ms /    70 runs   (   39.61 ms per token,    25.25 tokens per second)
llama_print_timings:       total time =    2858.96 ms /    70 tokens
 68%|██████▊   | 102/150 [1:05:01<16:33, 20.70s/it]

Skipping question 12 due to error: Requested tokens (7601) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      41.45 ms /    70 runs   (    0.59 ms per token,  1688.82 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2806.09 ms /    70 runs   (   40.09 ms per token,    24.95 tokens per second)
llama_print_timings:       total time =    2897.63 ms /    70 tokens
 69%|██████▊   | 103/150 [1:05:08<13:00, 16.61s/it]

Skipping question 12 due to error: Requested tokens (7584) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     104.40 ms /   154 runs   (    0.68 ms per token,  1475.12 tokens per second)
llama_print_timings: prompt eval time =      41.23 ms /     7 tokens (    5.89 ms per token,   169.78 tokens per second)
llama_print_timings:        eval time =    6257.70 ms /   153 runs   (   40.90 ms per token,    24.45 tokens per second)
llama_print_timings:       total time =    6564.03 ms /   160 tokens
 69%|██████▉   | 104/150 [1:05:30<13:56, 18.19s/it]

Skipping question 15 due to error: Requested tokens (4323) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      86.92 ms /   154 runs   (    0.56 ms per token,  1771.83 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    6056.85 ms /   154 runs   (   39.33 ms per token,    25.43 tokens per second)
llama_print_timings:       total time =    6259.17 ms /   154 tokens
 70%|███████   | 105/150 [1:05:49<13:52, 18.50s/it]

Skipping question 16 due to error: Requested tokens (4787) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     100.07 ms /   154 runs   (    0.65 ms per token,  1538.98 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    6116.58 ms /   154 runs   (   39.72 ms per token,    25.18 tokens per second)
llama_print_timings:       total time =    6358.57 ms /   154 tokens
 71%|███████   | 106/150 [1:06:10<13:59, 19.08s/it]

Skipping question 14 due to error: Requested tokens (4365) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      88.06 ms /   154 runs   (    0.57 ms per token,  1748.79 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    6155.90 ms /   154 runs   (   39.97 ms per token,    25.02 tokens per second)
llama_print_timings:       total time =    6364.21 ms /   154 tokens
 71%|███████▏  | 107/150 [1:06:29<13:40, 19.07s/it]

Skipping question 14 due to error: Requested tokens (4334) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     100.53 ms /   154 runs   (    0.65 ms per token,  1531.88 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    6194.47 ms /   154 runs   (   40.22 ms per token,    24.86 tokens per second)
llama_print_timings:       total time =    6442.56 ms /   154 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     160.48 ms /   256 runs   (    0.63 ms per token,  1595.22 tokens per second)
llama_print_timings: prompt eval time =   18701.75 ms /  2904 tokens (    6.44 ms per token,   155.28 tokens per second)
llama_print_timings:        eval time =   13689.22 ms /   256 runs   (   53.47 ms per token,    18.70 tokens per second)
llama_print_timings:       to

Skipping question 10 due to error: Requested tokens (8886) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      36.66 ms /    64 runs   (    0.57 ms per token,  1745.53 tokens per second)
llama_print_timings: prompt eval time =      38.83 ms /     3 tokens (   12.94 ms per token,    77.26 tokens per second)
llama_print_timings:        eval time =    2451.62 ms /    63 runs   (   38.91 ms per token,    25.70 tokens per second)
llama_print_timings:       total time =    2574.46 ms /    66 tokens
 74%|███████▍  | 111/150 [1:09:40<21:02, 32.37s/it]

Skipping question 10 due to error: Requested tokens (9034) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      36.94 ms /    64 runs   (    0.58 ms per token,  1732.73 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2494.31 ms /    64 runs   (   38.97 ms per token,    25.66 tokens per second)
llama_print_timings:       total time =    2576.76 ms /    64 tokens
 75%|███████▍  | 112/150 [1:09:52<16:36, 26.24s/it]

Skipping question 12 due to error: Requested tokens (6766) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      56.81 ms /    99 runs   (    0.57 ms per token,  1742.53 tokens per second)
llama_print_timings: prompt eval time =      43.16 ms /     5 tokens (    8.63 ms per token,   115.84 tokens per second)
llama_print_timings:        eval time =    3854.06 ms /    98 runs   (   39.33 ms per token,    25.43 tokens per second)
llama_print_timings:       total time =    4025.19 ms /   103 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     146.26 ms /   256 runs   (    0.57 ms per token,  1750.27 tokens per second)
llama_print_timings: prompt eval time =   25543.17 ms /  3698 tokens (    6.91 ms per token,   144.77 tokens per second)
llama_print_timings:        eval time =   14724.42 ms /   255 runs   (   57.74 ms per token,    17.32 tokens per second)
llama_print_timings:       to

Skipping question 15 due to error: Requested tokens (4342) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      44.23 ms /    79 runs   (    0.56 ms per token,  1786.32 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3101.76 ms /    79 runs   (   39.26 ms per token,    25.47 tokens per second)
llama_print_timings:       total time =    3197.88 ms /    79 tokens
 77%|███████▋  | 116/150 [1:14:04<23:30, 41.48s/it]

Skipping question 10 due to error: Requested tokens (8963) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      44.47 ms /    79 runs   (    0.56 ms per token,  1776.56 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3098.48 ms /    79 runs   (   39.22 ms per token,    25.50 tokens per second)
llama_print_timings:       total time =    3198.14 ms /    79 tokens
 78%|███████▊  | 117/150 [1:14:11<17:04, 31.05s/it]

Skipping question 8 due to error: Requested tokens (5726) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      46.59 ms /    79 runs   (    0.59 ms per token,  1695.79 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3112.65 ms /    79 runs   (   39.40 ms per token,    25.38 tokens per second)
llama_print_timings:       total time =    3223.96 ms /    79 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     161.56 ms /   256 runs   (    0.63 ms per token,  1584.58 tokens per second)
llama_print_timings: prompt eval time =   21852.62 ms /  3272 tokens (    6.68 ms per token,   149.73 tokens per second)
llama_print_timings:        eval time =   14274.42 ms /   256 runs   (   55.76 ms per token,    17.93 tokens per second)
llama_print_timings:       to

Skipping question 10 due to error: Requested tokens (8073) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      51.12 ms /    71 runs   (    0.72 ms per token,  1388.83 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2795.72 ms /    71 runs   (   39.38 ms per token,    25.40 tokens per second)
llama_print_timings:       total time =    2916.61 ms /    71 tokens
 81%|████████▏ | 122/150 [1:19:01<19:10, 41.09s/it]

Skipping question 12 due to error: Requested tokens (6954) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      42.86 ms /    71 runs   (    0.60 ms per token,  1656.48 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2774.01 ms /    71 runs   (   39.07 ms per token,    25.59 tokens per second)
llama_print_timings:       total time =    2868.05 ms /    71 tokens
 82%|████████▏ | 123/150 [1:19:11<14:21, 31.91s/it]

Skipping question 12 due to error: Requested tokens (7053) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      54.14 ms /    71 runs   (    0.76 ms per token,  1311.41 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2830.16 ms /    71 runs   (   39.86 ms per token,    25.09 tokens per second)
llama_print_timings:       total time =    2952.24 ms /    71 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      95.51 ms /   167 runs   (    0.57 ms per token,  1748.51 tokens per second)
llama_print_timings: prompt eval time =   27488.51 ms /  3928 tokens (    7.00 ms per token,   142.90 tokens per second)
llama_print_timings:        eval time =    9684.69 ms /   166 runs   (   58.34 ms per token,    17.14 tokens per second)
llama_print_timings:       to

Skipping question 16 due to error: Requested tokens (4818) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      47.94 ms /    71 runs   (    0.68 ms per token,  1480.99 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2796.40 ms /    71 runs   (   39.39 ms per token,    25.39 tokens per second)
llama_print_timings:       total time =    2904.92 ms /    71 tokens
 84%|████████▍ | 126/150 [1:21:18<13:09, 32.89s/it]

Skipping question 16 due to error: Requested tokens (5250) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      50.48 ms /    71 runs   (    0.71 ms per token,  1406.50 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2814.71 ms /    71 runs   (   39.64 ms per token,    25.22 tokens per second)
llama_print_timings:       total time =    2932.76 ms /    71 tokens
 85%|████████▍ | 127/150 [1:21:32<10:25, 27.21s/it]

Skipping question 12 due to error: Requested tokens (6895) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      48.81 ms /    71 runs   (    0.69 ms per token,  1454.74 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2820.79 ms /    71 runs   (   39.73 ms per token,    25.17 tokens per second)
llama_print_timings:       total time =    2938.15 ms /    71 tokens
 85%|████████▌ | 128/150 [1:21:43<08:11, 22.32s/it]

Skipping question 12 due to error: Requested tokens (6922) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      41.64 ms /    71 runs   (    0.59 ms per token,  1704.93 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2841.18 ms /    71 runs   (   40.02 ms per token,    24.99 tokens per second)
llama_print_timings:       total time =    2934.44 ms /    71 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     154.47 ms /   256 runs   (    0.60 ms per token,  1657.24 tokens per second)
llama_print_timings: prompt eval time =   17401.01 ms /  2716 tokens (    6.41 ms per token,   156.08 tokens per second)
llama_print_timings:        eval time =   13427.36 ms /   255 runs   (   52.66 ms per token,    18.99 tokens per second)
llama_print_timings:       to

Skipping question 12 due to error: Requested tokens (6884) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      53.07 ms /    93 runs   (    0.57 ms per token,  1752.53 tokens per second)
llama_print_timings: prompt eval time =      39.56 ms /     4 tokens (    9.89 ms per token,   101.11 tokens per second)
llama_print_timings:        eval time =    3611.74 ms /    92 runs   (   39.26 ms per token,    25.47 tokens per second)
llama_print_timings:       total time =    3762.21 ms /    96 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     146.06 ms /   256 runs   (    0.57 ms per token,  1752.76 tokens per second)
llama_print_timings: prompt eval time =   15521.32 ms /  2491 tokens (    6.23 ms per token,   160.49 tokens per second)
llama_print_timings:        eval time =   13365.00 ms /   255 runs   (   52.41 ms per token,    19.08 tokens per second)
llama_print_timings:       to

Skipping question 18 due to error: Requested tokens (4253) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      53.09 ms /    93 runs   (    0.57 ms per token,  1751.68 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3629.12 ms /    93 runs   (   39.02 ms per token,    25.63 tokens per second)
llama_print_timings:       total time =    3744.67 ms /    93 tokens
 89%|████████▉ | 134/150 [1:25:57<08:39, 32.45s/it]

Skipping question 8 due to error: Requested tokens (4552) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      59.52 ms /    93 runs   (    0.64 ms per token,  1562.58 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    3645.05 ms /    93 runs   (   39.19 ms per token,    25.51 tokens per second)
llama_print_timings:       total time =    3777.93 ms /    93 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      71.24 ms /   138 runs   (    0.52 ms per token,  1937.09 tokens per second)
llama_print_timings: prompt eval time =   27720.60 ms /  3957 tokens (    7.01 ms per token,   142.75 tokens per second)
llama_print_timings:        eval time =    8102.91 ms /   137 runs   (   59.15 ms per token,    16.91 tokens per second)
llama_print_timings:       to

Skipping question 18 due to error: Requested tokens (4689) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      53.30 ms /    75 runs   (    0.71 ms per token,  1407.00 tokens per second)
llama_print_timings: prompt eval time =      41.51 ms /     6 tokens (    6.92 ms per token,   144.53 tokens per second)
llama_print_timings:        eval time =    2913.25 ms /    74 runs   (   39.37 ms per token,    25.40 tokens per second)
llama_print_timings:       total time =    3077.82 ms /    80 tokens
 91%|█████████▏| 137/150 [1:27:47<06:31, 30.11s/it]

Skipping question 10 due to error: Requested tokens (8648) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      48.46 ms /    75 runs   (    0.65 ms per token,  1547.67 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2955.32 ms /    75 runs   (   39.40 ms per token,    25.38 tokens per second)
llama_print_timings:       total time =    3072.58 ms /    75 tokens
 92%|█████████▏| 138/150 [1:28:00<04:59, 24.93s/it]

Skipping question 10 due to error: Requested tokens (8410) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      46.36 ms /    75 runs   (    0.62 ms per token,  1617.88 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2969.75 ms /    75 runs   (   39.60 ms per token,    25.25 tokens per second)
llama_print_timings:       total time =    3076.35 ms /    75 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     157.72 ms /   256 runs   (    0.62 ms per token,  1623.15 tokens per second)
llama_print_timings: prompt eval time =   19834.98 ms /  3018 tokens (    6.57 ms per token,   152.16 tokens per second)
llama_print_timings:        eval time =   13961.43 ms /   255 runs   (   54.75 ms per token,    18.26 tokens per second)
llama_print_timings:       to

Skipping question 10 due to error: Requested tokens (8101) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      42.07 ms /    75 runs   (    0.56 ms per token,  1782.66 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2966.40 ms /    75 runs   (   39.55 ms per token,    25.28 tokens per second)
llama_print_timings:       total time =    3057.80 ms /    75 tokens
 94%|█████████▍| 141/150 [1:29:25<03:22, 22.52s/it]

Skipping question 8 due to error: Requested tokens (5062) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      42.95 ms /    75 runs   (    0.57 ms per token,  1746.34 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2948.58 ms /    75 runs   (   39.31 ms per token,    25.44 tokens per second)
llama_print_timings:       total time =    3040.11 ms /    75 tokens
 95%|█████████▍| 142/150 [1:29:33<02:25, 18.16s/it]

Skipping question 10 due to error: Requested tokens (8039) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      28.30 ms /    43 runs   (    0.66 ms per token,  1519.49 tokens per second)
llama_print_timings: prompt eval time =      39.09 ms /     4 tokens (    9.77 ms per token,   102.32 tokens per second)
llama_print_timings:        eval time =    1640.39 ms /    42 runs   (   39.06 ms per token,    25.60 tokens per second)
llama_print_timings:       total time =    1734.58 ms /    46 tokens
 95%|█████████▌| 143/150 [1:29:44<01:51, 15.93s/it]

Skipping question 8 due to error: Requested tokens (4607) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      25.11 ms /    43 runs   (    0.58 ms per token,  1712.67 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1700.08 ms /    43 runs   (   39.54 ms per token,    25.29 tokens per second)
llama_print_timings:       total time =    1749.24 ms /    43 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =     140.06 ms /   256 runs   (    0.55 ms per token,  1827.78 tokens per second)
llama_print_timings: prompt eval time =   25594.49 ms /  3720 tokens (    6.88 ms per token,   145.34 tokens per second)
llama_print_timings:        eval time =   14862.69 ms /   256 runs   (   58.06 ms per token,    17.22 tokens per second)
llama_print_timings:       to

Skipping question 14 due to error: Requested tokens (4284) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      25.99 ms /    43 runs   (    0.60 ms per token,  1654.61 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1682.67 ms /    43 runs   (   39.13 ms per token,    25.55 tokens per second)
llama_print_timings:       total time =    1737.54 ms /    43 tokens
 97%|█████████▋| 146/150 [1:31:31<01:35, 23.99s/it]

Skipping question 14 due to error: Requested tokens (4142) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      25.63 ms /    43 runs   (    0.60 ms per token,  1677.59 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    1683.25 ms /    43 runs   (   39.15 ms per token,    25.55 tokens per second)
llama_print_timings:       total time =    1737.20 ms /    43 tokens
 98%|█████████▊| 147/150 [1:31:41<00:59, 19.98s/it]

Skipping question 12 due to error: Requested tokens (6786) exceed context window of 4096


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      35.46 ms /    62 runs   (    0.57 ms per token,  1748.50 tokens per second)
llama_print_timings: prompt eval time =      40.03 ms /     4 tokens (   10.01 ms per token,    99.93 tokens per second)
llama_print_timings:        eval time =    2386.96 ms /    61 runs   (   39.13 ms per token,    25.56 tokens per second)
llama_print_timings:       total time =    2499.70 ms /    65 tokens
 99%|█████████▊| 148/150 [1:31:45<00:30, 15.11s/it]

Skipping question 147 due to error: Texts list is empty


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      39.33 ms /    62 runs   (    0.63 ms per token,  1576.36 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2494.84 ms /    62 runs   (   40.24 ms per token,    24.85 tokens per second)
llama_print_timings:       total time =    2582.23 ms /    62 tokens
 99%|█████████▉| 149/150 [1:31:50<00:12, 12.01s/it]

Skipping question 148 due to error: Texts list is empty


Llama.generate: prefix-match hit

llama_print_timings:        load time =     410.95 ms
llama_print_timings:      sample time =      39.64 ms /    62 runs   (    0.64 ms per token,  1564.23 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =    2497.62 ms /    62 runs   (   40.28 ms per token,    24.82 tokens per second)
llama_print_timings:       total time =    2591.59 ms /    62 tokens
100%|██████████| 150/150 [1:31:54<00:00, 36.76s/it]

Skipping question 149 due to error: Texts list is empty



