> This notebook explains the rationale for installing QuestEval in a dedicated `.venv`.

# "pip-only" installation of **QuestEval** (dedicated environment)

This guide replicates the `conda install pytorch cudatoolkit=…` step from the official docs, but using **venv + pip**.  
It works on Windows, macOS, or Linux (only the venv activation command changes).

---

## Requirements

- **Python 3.9** (spaCy 3.0 was built for this version).  
- Internet access to download the wheels.
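
Before creating the environment, it can help to confirm that the interpreter really is Python 3.9. The snippet below is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of QuestEval.

```python
import sys

REQUIRED = (3, 9)  # major.minor version named in the requirements above


def python_is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


running = ".".join(str(part) for part in sys.version_info[:3])
if python_is_supported():
    print(f"Python {running}: OK")
else:
    print(f"Python {running}: this guide assumes {REQUIRED[0]}.{REQUIRED[1]}")
```

Run it with the same interpreter you plan to pass to `-m venv`.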

---

## 1  Create and activate the virtual environment

<details>

```bash
# Windows (PowerShell)
py -3.9 -m venv .venv-questeval
.\.venv-questeval\Scripts\activate

# Linux / macOS (bash)
python3.9 -m venv .venv-questeval
source .venv-questeval/bin/activate
```

</details>

> Update the basic tooling:

```bash
python -m pip install -U pip setuptools wheel
```
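
A common slip is running `pip` outside the activated environment. As a quick sanity check (a sketch, with a hypothetical helper name), Python can report whether it is running inside a venv by comparing `sys.prefix` with `sys.base_prefix`:

```python
import sys


def in_virtualenv():
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

If the second line prints `False`, activate `.venv-questeval` again before installing anything.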

---

## 2  Install **PyTorch** *before any other package*

Choose **one** line, CPU or GPU:

```bash
# CPU-only (runs on any machine)
pip install torch --index-url https://download.pytorch.org/whl/cpu

# GPU: example for CUDA 11.8 (change cu118 to cu121, cu116, ... if needed)
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
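
The two commands differ only in the last path segment of the index URL and in the package list. The sketch below makes that pattern explicit. `torch_install_command` is a hypothetical helper that only builds the command line. It does not check that a given variant actually exists on the PyTorch index.

```python
PYTORCH_WHEEL_ROOT = "https://download.pytorch.org/whl"


def torch_install_command(variant="cpu", packages=("torch",)):
    """Build the pip arguments for a wheel variant such as 'cpu' or 'cu118'."""
    index_url = f"{PYTORCH_WHEEL_ROOT}/{variant}"
    return ["pip", "install", *packages, "--index-url", index_url]


# Reproduces the two commands from the block above.
print(" ".join(torch_install_command()))
print(" ".join(torch_install_command("cu118", ("torch", "torchvision", "torchaudio"))))
```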

---

## 3  Install the set of compatible dependencies

<details>

```powershell
# PowerShell: use a backtick ` to continue a line
pip install `
  numpy==1.21.6 pandas==1.5.3 pyarrow==8.0.0 `
  datasets==2.14.5 huggingface_hub==0.19.4 `
  transformers==4.39.3 tokenizers==0.15.2 `
  spacy==3.0.6 thinc==8.0.17 `
  sentencepiece==0.1.95 bert_score==0.3.9 `
  Unidecode==1.2.0 pytest==6.2.4
```

```bash
# Bash: use a backslash \ to continue a line
pip install \
  numpy==1.21.6 pandas==1.5.3 pyarrow==8.0.0 \
  datasets==2.14.5 huggingface_hub==0.19.4 \
  transformers==4.39.3 tokenizers==0.15.2 \
  spacy==3.0.6 thinc==8.0.17 \
  sentencepiece==0.1.95 bert_score==0.3.9 \
  Unidecode==1.2.0 pytest==6.2.4
```

</details>

**Why these versions?**

* `numpy 1.21.x` → the only series compatible with **spaCy 3.0 / Thinc 8.0**.
* `huggingface_hub 0.19.4` → already uses the new endpoint (fixes *MissingSchema*) and still exposes `DatasetCard`, which `datasets` needs.
* `transformers 4.39.3` → accepts `hub 0.19.4` (dep ≥ 0.14 < 1.0).
* The remaining pins are required by QuestEval 0.2.4.
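
To confirm that the environment still matches these pins after later installs, the installed versions can be compared against the expected ones with the standard-library `importlib.metadata`. This is a sketch: `find_mismatches` is a hypothetical helper, and `PINS` lists only a subset of the packages from step 3.

```python
from importlib import metadata

# Subset of the pins from step 3; extend as needed.
PINS = {
    "numpy": "1.21.6",
    "huggingface_hub": "0.19.4",
    "transformers": "4.39.3",
    "spacy": "3.0.6",
    "thinc": "8.0.17",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in find_mismatches(PINS).items():
    print(f"{name}: expected {expected}, found {found}")
```

No output means every listed pin is satisfied.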

---

## 4  Download the minimal spaCy model

```bash
python -m spacy download en_core_web_sm
```

---

## 5  Install QuestEval without dependencies

```bash
pip install --no-deps git+https://github.com/ThomasScialom/QuestEval@main
```

*(`--no-deps` keeps it from overriding the versions you just pinned).*

---

## 6  Quick test

```python
from questeval.questeval_metric import QuestEval

qe = QuestEval(no_cuda=True)        # set to False to use a GPU
print(qe.corpus_questeval(
        hypothesis=["In 2002 Brazil became world champion."],
        sources=["Brazil won the 2002 World Cup."],
        list_references=[["Brazil won the 2002 FIFA World Cup in Japan."]]
))
```

Expected output format (the scores shown are illustrative; actual values depend on the downloaded models):

```python
{'corpus_score': 0.62, 'ex_level_scores': [0.62]}
```

---

## 7  Freeze: generate a reproducible specification

```bash
pip freeze > requirements-questeval.txt
```
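
The frozen file is plain text with one requirement per line, so it is easy to inspect programmatically. The sketch below reads the `name==version` lines into a dictionary. `parse_freeze` is a hypothetical helper and the sample text is illustrative, not the output of a real run.

```python
def parse_freeze(text):
    """Map package name -> version for every `name==version` line.

    Lines in other forms (comments, `name @ url` direct references,
    editable installs) are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Illustrative sample, not real `pip freeze` output.
sample = """\
numpy==1.21.6
spacy==3.0.6
# a comment line
questeval @ git+https://github.com/ThomasScialom/QuestEval@main
"""
print(parse_freeze(sample))
```

Packages installed from a Git URL, as in step 5, are not written in `name==version` form, so this helper skips them.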

---

### Notes & Tips

* Do **not** install libraries that require NumPy ≥ 1.23 in this same venv (e.g. matplotlib ≥ 3.6).
  If you need plots, use `matplotlib==3.5.3`.
* If you use a GPU, change the PyTorch index (`cu118`, `cu121`, etc.) to match your CUDA driver.
* Multiple projects? Just create new venvs and run
  `pip install -r requirements-questeval.txt`.

With these steps, QuestEval works without Conda and without dependency conflicts.



In [None]:
from questeval.questeval_metric import QuestEval
qe = QuestEval(no_cuda=True)
print(qe.corpus_questeval(
        ["In 2002, Brazil became world champion."],
        ["Brazil won the 2002 World Cup."],
        [["Brazil won the 2002 FIFA World Cup in Japan."]]))


  from .autonotebook import tqdm as notebook_tqdm
  self.metric_BERTScore = load_metric("bertscore")
  _C._set_default_tensor_type(t)
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


{'corpus_score': 0.852132914463679, 'ex_level_scores': [0.852132914463679]}


In [None]:
log = qe.open_log_from_text("Brazil won the 2002 World Cup.")
print(log)

{'type': 'src', 'text': 'Brazil won the 2002 World Cup.', 'self': {'NER': {'answers': ['Brazil', '2002', 'World Cup'], 'QG_hash=ThomasNLG/t5-qg_squad1-en': {'questions': ['Which country won the 2002 World Cup?', 'In what year did Brazil win the World Cup?', 'What did Brazil win in 2002?']}}, 'NOUN': {'answers': ['Brazil', 'the 2002 World Cup'], 'QG_hash=ThomasNLG/t5-qg_squad1-en': {'questions': ['Which country won the 2002 World Cup?', 'What did Brazil win?']}}}, 'asked': {'In what year did Brazil become world champion?': {'QA_hash=ThomasNLG/t5-qa_squad2neg-en': {'answer': '2002', 'answerability': 0.9897031784057617, 'ground_truth': {'2002': {'bertscore': 1.0, 'f1': 1.0}}}}, 'What country became the world champion in 2002?': {'QA_hash=ThomasNLG/t5-qa_squad2neg-en': {'answer': 'Brazil', 'answerability': 0.9688682556152344, 'ground_truth': {'Brazil': {'bertscore': 1.0000003576278687, 'f1': 1.0}}}}, 'What title did Brazil win in 2002?': {'QA_hash=ThomasNLG/t5-qa_squad2neg-en': {'answer': 

In [None]:
from questeval.questeval_metric import QuestEval
questeval = QuestEval()

source_1 = "Since 2000, the recipient of the Kate Greenaway medal has also been presented with the Colin Mears award to the value of 35000."
prediction_1 = "Since 2000, the winner of the Kate Greenaway medal has also been given to the Colin Mears award of the Kate Greenaway medal."
references_1 = [
    "Since 2000, the recipient of the Kate Greenaway Medal will also receive the Colin Mears Awad which worth 5000 pounds",
    "Since 2000, the recipient of the Kate Greenaway Medal has also been given the Colin Mears Award."
]

source_2 = "He is also a member of another Jungiery boyband 183 Club."
prediction_2 = "He also has another Jungiery Boyband 183 club."
references_2 = [
    "He's also a member of another Jungiery boyband, 183 Club.", 
    "He belonged to the Jungiery boyband 183 Club."
]

score = questeval.corpus_questeval(
    hypothesis=[prediction_1, prediction_2], 
    sources=[source_1, source_2],
    list_references=[references_1, references_2]
)

print(score)



{'corpus_score': 0.6115364335096283, 'ex_level_scores': [0.5698804503395444, 0.6531924166797121]}
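
In this output, `corpus_score` coincides with the arithmetic mean of `ex_level_scores`. The check below only verifies that observation for the numbers printed above. It says nothing about how the library defines the corpus score.

```python
from statistics import mean

# Values copied from the output above.
score = {
    "corpus_score": 0.6115364335096283,
    "ex_level_scores": [0.5698804503395444, 0.6531924166797121],
}

average = mean(score["ex_level_scores"])
matches = abs(average - score["corpus_score"]) < 1e-12
print("mean of ex_level_scores matches corpus_score:", matches)
```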


In [None]:
log = questeval.open_log_from_text(source_1)
print(log)

{'type': 'src', 'text': 'Since 2000, the recipient of the Kate Greenaway medal has also been presented with the Colin Mears award to the value of 35000.', 'self': {'NER': {'answers': ['2000', 'the Kate Greenaway', '35000'], 'QG_hash=ThomasNLG/t5-qg_squad1-en': {'questions': ['When was the Kate Greenaway medal awarded?', 'What medal has been presented to the recipient of the Colin Mears award since 2000?', 'How much is the Colin Mears award worth?']}}, 'NOUN': {'answers': ['the recipient', 'the Kate Greenaway medal', 'the Colin Mears award', 'the value'], 'QG_hash=ThomasNLG/t5-qg_squad1-en': {'questions': ['Who is the Kate Greenaway medal?', "What award has been presented to the recipient of the Queen's award since 2000?", 'What award has been presented to the recipient of the Kate Greenaway medal since 2000?', 'How much is the Colin Mears award?']}}}, 'asked': {'Since what year has the winner of the Kate Greenaway medal been given?': {'QA_hash=ThomasNLG/t5-qa_squad2neg-en': {'answer': 

In [None]:
list(log.keys())

['type', 'text', 'self', 'asked']

In [None]:
log["self"]

{'NER': {'answers': ['2000', 'the Kate Greenaway', '35000'],
  'QG_hash=ThomasNLG/t5-qg_squad1-en': {'questions': ['When was the Kate Greenaway medal awarded?',
    'What medal has been presented to the recipient of the Colin Mears award since 2000?',
    'How much is the Colin Mears award worth?']}},
 'NOUN': {'answers': ['the recipient',
   'the Kate Greenaway medal',
   'the Colin Mears award',
   'the value'],
  'QG_hash=ThomasNLG/t5-qg_squad1-en': {'questions': ['Who is the Kate Greenaway medal?',
    "What award has been presented to the recipient of the Queen's award since 2000?",
    'What award has been presented to the recipient of the Kate Greenaway medal since 2000?',
    'How much is the Colin Mears award?']}}}
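
The nested layout of `log["self"]` is easier to read once flattened. The sketch below assumes the structure shown in the output above: each answer type has an `answers` list plus one or more `QG_hash=<model>` entries holding a `questions` list. `questions_by_type` is a hypothetical helper. Pairing answers and questions by position is also an assumption, based only on the two lists having equal length here.

```python
def questions_by_type(self_log):
    """Return {answer_type: {"answers": [...], "questions": [...]}}."""
    summary = {}
    for answer_type, entry in self_log.items():
        questions = []
        for key, value in entry.items():
            # Question-generation results are stored under "QG_hash=<model name>".
            if key.startswith("QG_hash="):
                questions.extend(value.get("questions", []))
        summary[answer_type] = {
            "answers": entry.get("answers", []),
            "questions": questions,
        }
    return summary


# Abbreviated copy of the `log["self"]` output above (NER entry only).
example = {
    "NER": {
        "answers": ["2000", "the Kate Greenaway", "35000"],
        "QG_hash=ThomasNLG/t5-qg_squad1-en": {"questions": [
            "When was the Kate Greenaway medal awarded?",
            "What medal has been presented to the recipient of the Colin Mears award since 2000?",
            "How much is the Colin Mears award worth?",
        ]},
    },
}

for answer_type, info in questions_by_type(example).items():
    for answer, question in zip(info["answers"], info["questions"]):
        print(f"[{answer_type}] {answer!r} <- {question}")
```

With a real log, call `questions_by_type(log["self"])` instead of using the abbreviated `example`.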

In [None]:
import pandas as pd

df = pd.read_csv("C:\\Users\\thiago.ouverney\\Projetos\\pyAutoSummarizer\\data\\model_annotations_merged.csv")

In [None]:
df.loc[1]

Unnamed: 0                                                            1
id                     dm-test-8764fb95bfad8ee849274873a92fb8d6b400eee2
decoded               paul merson has restarted his row with andros ...
expert_annotations    [{'coherence': 3, 'consistency': 5, 'fluency':...
turker_annotations    [{'coherence': 2, 'consistency': 3, 'fluency':...
references            ["Andros Townsend an 83rd minute sub in Totten...
model_id                                                            M13
filepath              cnndm/dailymail/stories/8764fb95bfad8ee8492748...
story_id                       8764fb95bfad8ee849274873a92fb8d6b400eee2
existe_story                                                       True
content               Paul Merson has restarted his row with Andros ...
Name: 1, dtype: object
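
The `references` column holds the string form of a Python list, so it has to be parsed before it can be passed to QuestEval. A minimal sketch using `ast.literal_eval`, which accepts only literals and is therefore safer than `eval`, follows. `parse_references` is a hypothetical helper and the cell value is a made-up stand-in for the truncated one shown above.

```python
import ast


def parse_references(cell):
    """Parse a stringified Python list of reference summaries into a list of str."""
    parsed = ast.literal_eval(cell)
    if not isinstance(parsed, list) or not all(isinstance(r, str) for r in parsed):
        raise ValueError("expected a list of strings")
    return parsed


# Hypothetical cell value; the real column holds full reference summaries.
cell = '["First reference summary.", "Second reference summary."]'
references = parse_references(cell)
print(len(references), "references; first:", references[0])
```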

In [None]:
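import ast

from questeval.questeval_metric import QuestEval

qe = QuestEval(no_cuda=True)

# `references` is stored as the string form of a Python list; parse it with
# ast.literal_eval instead of eval(). list_references expects one list of
# references per hypothesis; only the first reference is used, as in the original call.
references = ast.literal_eval(df.loc[1, "references"])
print(qe.corpus_questeval(
    hypothesis=[df.loc[1, "decoded"]],
    sources=[df.loc[1, "content"]],
    list_references=[[references[0]]],
))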
