# Using LLM-as-a-judge for an automated and versatile evaluation 🧑‍⚖️

LLM-as-a-judge is a powerful way to assess outputs in a human-like way without spending costly human time.
This method was introduced in [Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena](https://huggingface.co/papers/2306.05685).


In [1]:
!pip install huggingface_hub datasets pandas tqdm -q


In [1]:
import re
import pandas as pd
from tqdm.auto import tqdm
from datasets import load_dataset
from huggingface_hub import InferenceClient, notebook_login

tqdm.pandas()  # load tqdm's pandas support
pd.set_option("display.max_colwidth", None)

notebook_login()


In [2]:
repo_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

llm_client = InferenceClient(
    model=repo_id,
    timeout=120,
)

# Test your LLM client
llm_client.text_generation(prompt="How are you today?", max_new_tokens=20)

'\n\nI’m good, thanks. I’m just about to go to the gym.'

## 1. Prepare the creation and evaluation of the LLM judge

Measuring the quality of an LLM's answers is difficult. For instance, an exact string match flags too many correct but differently worded answers as wrong. Human labelling is very time-consuming, and whenever the model is updated the whole process has to be repeated.
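To see why exact string matching is too strict, here is a minimal sketch. The `exact_match` helper is hypothetical, and the two sentences paraphrase one passage from the dataset used below:

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Case- and whitespace-insensitive exact string match."""
    return prediction.strip().lower() == reference.strip().lower()


reference = "The COVID-19 Pandemic event visa can only be granted to people in Australia."
prediction = "Only people who are in Australia can be granted this visa."

# Same meaning, different wording: exact match wrongly marks the answer as false
print(exact_match(prediction, reference))  # False
# Only a near-verbatim copy passes
print(exact_match("  THE covid-19 Pandemic event visa can only be granted to people in Australia. ", reference))  # True
```

A judge that reads both texts can recognise the paraphrase, which a string comparison cannot.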

We use the [`feedbackQA`](https://github.com/McGill-NLP/feedbackqa) dataset, which contains two human evaluations and scores for each question/answer pair.

In [4]:
# ratings = load_dataset("McGill-NLP/feedbackQA")
ratings = pd.read_json("feedback_train.json")

# Each row holds two ratings and two written explanations: split them into separate columns
ratings["review_1"] = ratings["rating"].apply(lambda x: x[0])
ratings["explanation_1"] = ratings["feedback"].apply(lambda x: x[0])
ratings["review_2"] = ratings["rating"].apply(lambda x: x[1])
ratings["explanation_2"] = ratings["feedback"].apply(lambda x: x[1])
ratings = ratings.drop(columns=["feedback", "rating"])

# Map scores to numeric values
conversion_dict = {"Excellent": 4, "Acceptable": 3, "Could be Improved": 2, "Bad": 1}
ratings["score_1"] = ratings["review_1"].map(conversion_dict)
ratings["score_2"] = ratings["review_2"].map(conversion_dict)

To set a performance baseline, we compute the [Pearson correlation](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) between the two human raters.
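For reference, let $x_i$ and $y_i$ be the scores that the first and second rater give to example $i$ (out of $n$), and let $\bar{x}$ and $\bar{y}$ be their means. The Pearson correlation is then:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

It ranges from -1 to 1. A value of 1 means the two sets of scores are perfectly linearly related.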

In [5]:
print("Correlation between 2 human raters:")
print(f"{ratings['score_1'].corr(ratings['score_2'], method='pearson'):.3f}")

Correlation between 2 human raters:
0.563


A correlation of 0.563 between the two human raters is not very high. If your human raters agree this poorly, the rating criteria are probably not clear enough.

It also means that the "ground truth" is noisy, so we cannot expect any algorithmic evaluation to match it very closely. To reduce this noise, we keep only the examples where the two human raters agree.

In [6]:
# Keep examples where both raters agree, then draw 7 per score level (28 in total)
ratings_where_raters_agree = ratings.loc[ratings["score_1"] == ratings["score_2"]]
examples = ratings_where_raters_agree.groupby("score_1").sample(7, random_state=1214)
examples["human_score"] = examples["score_1"]

# Visualize 1 sample for each score
display(examples.groupby("human_score").first())

human_score,question,passage,domain,review_1,explanation_1,review_2,explanation_2,score_1,score_2
1,What can I do to help people that are grieving?,"{'passage_id': 37, 'source': 'CDC', 'uri': 'https://www.cdc.gov/coronavirus/2019-ncov/daily-life-coping/managing-stress-anxiety.html', 'reference_type': 'Passage_only', 'reference': {'page_title': 'Coping with Stress', 'section_headers': ['Take care of yourself and your community'], 'section_content': 'Taking care of yourself, your friends, and your family can help you cope with stress. Helping others cope with their stress can also make your community stronger. Ways to cope with stress Take breaks from watching, reading, or listening to news stories , including social media. Hearing about the pandemic repeatedly can be upsetting. Take care of your body. Take deep breaths, stretch, or meditate. Try to eat healthy, well-balanced meals. Exercise regularly, get plenty of sleep. Avoid alcohol and drugs. Make time to unwind. Try to do some other activities you enjoy. Connect with others. Talk with people you trust about your concerns and how you are feeling. Know the facts to help reduce stress Understanding the risk to yourself and people you care about can make an outbreak less stressful. Learn and share the facts about COVID-19 and help stop the spread of rumors. When you share accurate information about COVID-19, you can help make people feel less stressed, make a connection with them, and help stop stigma. Take care of your mental health Call your healthcare provider if stress gets in the way of your daily activities for several days in a row. People with preexisting mental health conditions should continue with their treatment and be aware of new or worsening symptoms. Additional information can be found at the Substance Abuse and Mental Health Services Administration (SAMHSA) Disaster Preparedness page. 
Learn more about taking care of your emotional health during a stressful event like the COVID-19 outbreak.', 'selected_span': None, 'section_content_html': '<p>Taking care of yourself, your friends, and your family can help you cope with stress. Helping others cope with their stress can also make your community stronger.</p> <h3>Ways to cope with stress</h3> <ul> <li><strong>Take breaks from watching, reading, or listening to news stories</strong> , including social media. Hearing about the pandemic repeatedly can be upsetting.</li> <li><strong>Take care of your body</strong>. <ul> <li>Take deep breaths, stretch, or <a href=""https://nccih.nih.gov/health/meditation/overview.htm"">meditate</a>.</li> <li><a href=""/nccdphp/dnpao/features/national-nutrition-month/index.html"">Try to eat healthy, well-balanced meals</a>.</li> <li><a href=""/physicalactivity/basics/index.htm"">Exercise regularly</a>, <a href=""/sleep/about_sleep/sleep_hygiene.html"">get plenty of sleep</a>.</li> <li>Avoid <a href=""/alcohol/fact-sheets/alcohol-use.htm"">alcohol</a> and <a href=""https://www.drugabuse.gov/related-topics/health-consequences-drug-misuse"">drugs</a>.</li> </ul> </li> <li><strong>Make time to unwind</strong>. Try to do some other activities you enjoy.</li> <li><strong>Connect with others</strong>. Talk with people you trust about your concerns and how you are feeling.</li> </ul> <h3>Know the facts to help reduce stress</h3> <p>Understanding the risk to yourself and people you care about can make an outbreak less stressful.</p> <p>Learn and share the facts about COVID-19 and help <a href=""/coronavirus/2019-ncov/daily-life-coping/share-facts.html"">stop the spread of rumors</a>. 
When you share accurate information about COVID-19, you can help make people feel less stressed, make a connection with them, and <a href=""/coronavirus/2019-ncov/daily-life-coping/reducing-stigma.html"">help stop stigma</a>.</p> <h3>Take care of your mental health</h3> <p><strong>Call your healthcare provider if stress gets in the way</strong> of your daily activities for several days in a row.</p> <p><strong>People with preexisting mental health conditions</strong> should continue with their treatment and be aware of new or worsening symptoms. Additional information can be found at the Substance Abuse and Mental Health Services Administration <a href=""https://www.samhsa.gov/disaster-preparedness"">(SAMHSA) Disaster Preparedness</a> page.</p> <p>Learn more about <a href=""https://emergency.cdc.gov/coping/selfcare.asp"">taking care of your emotional health</a> during a stressful event like the COVID-19 outbreak.</p>'}}",CDC,Bad,The question is about others which the reply did not answer.,Bad,The response could have addressed how to help those that are grieving cope rather than what it was presenting.,1,1
2,What protocols do workplaces need to follow to keep everyone safer?,"{'passage_id': 153, 'source': 'Australia', 'uri': 'https://coronavirus.fairwork.gov.au/', 'reference_type': 'Passage_only', 'reference': {'page_title': 'Coronavirus and Australian workplace laws', 'section_headers': ['Health & safety in the workplace'], 'section_content': 'Workplaces must follow the rules about health and safety during coronavirus to help stop it spreading. Find out more about: rules and obligations under workplace health and safety laws how to manage the risk of coronavirus in the workplace where to go for help. Learn more about Health and safety in the workplace during coronavirus.', 'selection_span': None, 'section_content_html': '<p>Workplaces must follow the rules about health and safety during coronavirus to help stop it spreading. Find out more about:</p> <ul> <li>rules and obligations under workplace health and safety laws</li> <li>how to manage the risk of coronavirus in the workplace</li> <li>where to go for help.</li> </ul> <p>Learn more about <a href=""/coronavirus-and-australian-workplace-laws/health-and-safety-in- the-workplace-during-coronavirus"">Health and safety in the workplace during coronavirus</a>.</p>'}}",Australia,Could be Improved,"This answer needs to be improved because it doesn’t provide information up-front about workplaces during the pandemic. Instead, it just includes a hyperlink.",Could be Improved,"there is one link to information, but there is no information in the answer about how to stay safe in the workplace. it talks about the need to stay safe in the workplace, but it doesn't talk about ways in which to actually do that.",2,2
3,How soon can I apply for financial support?,"{'passage_id': 43, 'source': 'Australia', 'uri': 'https://www.ato.gov.au/Individuals/Super/In-detail/Withdrawing-and-using-your-super/COVID-19-early-release-of-super/', 'reference_type': 'Passage_only', 'reference': {'page_title': 'COVID-19 early release of super', 'section_headers': ['After you apply'], 'section_content': 'It will take us up to four business days to process your application and send your outcome letter to your myGov inbox. You may also receive an SMS notification. If you receive a notification from us and haven't applied to access your super early, you need to call us or your fund as soon as possible. If you have an Australian Prudential Regulation Authority (APRA) fund and your application is approved, you do not need to contact us or your fund. Your fund will make the payment to you without you needing to apply to them directly. The Australian Prudential Regulation Authority (APRA) have issued guidance to super funds and expect payment to be made to members within five business days once they have been notified by us. However, this time may increase where funds need to contact you to clarify information. More information can be found on APRA's websiteExternal Link. If your fund is a state-administered fund, they need to follow the rules of their trust deed to determine if they're allowed to release super due to COVID-19. You will need to get confirmation from your fund, before you submit an application, that they can release your super early and whether they require a letter of approval (determination) from us. If your fund is an SMSF , you will need to let them know that you have received the letter of approval from us so they can make the payment to you.', 'selection_span': None, 'section_content_html': '<p>It will take us up to four business days to process your application and send your outcome letter to your myGov inbox. 
You may also receive an SMS notification.</p> <p>If you receive a notification from us and haven't applied to access your super early, you need to call us or your fund as soon as possible.</p> <p>If you have an <strong>Australian Prudential Regulation Authority (APRA) fund</strong> and your application is approved, you do not need to contact us or your fund. Your fund will make the payment to you without you needing to apply to them directly.</p> <p>The Australian Prudential Regulation Authority (APRA) have issued guidance to super funds and expect payment to be made to members within five business days once they have been notified by us. However, this time may increase where funds need to contact you to clarify information. More information can be found on <a href=""https://www.apra.gov.au/frequently- asked-questions-superannuation-trustees-response-to-covid-19"">APRA's websiteExternal Link</a>.</p> <p>If your fund is a <strong>state-administered fund,</strong> they need to follow the rules of their trust deed to determine if they're allowed to release super due to COVID-19. You will need to get confirmation from your fund, before you submit an application, that they can release your super early and whether they require a letter of approval (determination) from us.</p> <p>If your fund is an <strong>SMSF</strong> , you will need to let them know that you have received the letter of approval from us so they can make the payment to you.</p>'}}",Australia,Acceptable,"There is information on how to apply for the help. Still, there is nothing say how long you have to wait before applying.",Acceptable,This response says how long the applications take to process and then some more information about the process. There's a link to more relevant information. A pretty good answer,3,3
4,Should vulnerable children be expected to be in educational settings?,"{'passage_id': 789, 'source': 'UK', 'uri': 'https://www.gov.uk/government/publications/covid-19-school-closures/guidance-for-schools-about-temporarily-closing', 'reference_type': 'FAQ', 'reference': {'page_title': 'Guidance Actions for schools during the coronavirus outbreak', 'section_headers': ['Prioritising pupils', 'What are our expectations regarding vulnerable children and young people attending educational settings?'], 'section_content': 'Vulnerable children and young people’s attendance is expected, where it is appropriate for them (i.e. where there are no shielding concerns for the child or their household, and/or following a risk assessment for children with an EHC plan), so that they can gain the educational and wellbeing benefits of attending. Vulnerable children and young people – regardless of year group – that have not been attending in the recent period are expected to return to school where this would now be appropriate for them to do so. A brief summary of attendance expectations across the different groups of vulnerable children and young people is as follows: for vulnerable children and young people who have a social worker, attendance is expected unless the child/household is shielding or clinically vulnerable (see the advice set out by Public Health England on households with possible coronavirus infection, and shielding and protecting people defined on medical grounds as extremely vulnerable). for vulnerable children and young people who have an education health and care (EHC) plan, attendance is expected where it is determined, following risk assessment, that their needs can be as safely or more safely met in the educational environment. 
Read further guidance on temporary Changes to education, health and care (EHC) needs and assessments for vulnerable children and young people who are deemed otherwise vulnerable, at the school, college or local authority discretion, attendance is expected unless the child/household is shielding or clinically vulnerable (see the advice set out by Public Health England on households with possible coronavirus infection, and shielding and protecting people defined on medical grounds as extremely vulnerable). *[EHC]: Education, Health and Care', 'selection_span': None, 'section_content_html': '<p>Vulnerable children and young people’s attendance is expected, where it is appropriate for them (i.e. where there are no shielding concerns for the child or their household, and/or following a risk assessment for children with an EHC plan), so that they can gain the educational and wellbeing benefits of attending. Vulnerable children and young people – regardless of year group – that have not been attending in the recent period are expected to return to school where this would now be appropriate for them to do so. 
A brief summary of attendance expectations across the different groups of vulnerable children and young people is as follows:</p> <ul> <li>for vulnerable children and young people who have a social worker, attendance is expected unless the child/household is shielding or clinically vulnerable (see the advice set out by Public Health England on <a href=""https://www.gov.uk/government/publications/covid-19-stay-at-home-guidance"">households with possible coronavirus infection</a>, and <a href=""https://www.gov.uk/government/publications/guidance-on-shielding-and-protecting-extremely-vulnerable-persons-from-covid-19"">shielding and protecting people defined on medical grounds as extremely vulnerable</a>).</li> <li>for vulnerable children and young people who have an education health and care (EHC) plan, attendance is expected where it is determined, following <a href=""https://www.gov.uk/government/publications/coronavirus-covid-19-send-risk-assessment-guidance/coronavirus-covid-19-send-risk-assessment-guidance"">risk assessment</a>, that their needs can be as safely or more safely met in the educational environment. 
Read further guidance on temporary <a href=""https://www.gov.uk/government/publications/changes-to-the-law-on-education-health-and-care-needs-assessments-and-plans-due-to-coronavirus/education-health-and-care-needs-assessments-and-plans-guidance-on-temporary-legislative-changes-relating-to-coronavirus-covid-19"">Changes to education, health and care (EHC) needs and assessments</a></li> <li>for vulnerable children and young people who are deemed otherwise vulnerable, at the school, college or local authority discretion, attendance is expected unless the child/household is shielding or clinically vulnerable (see the advice set out by Public Health England on <a href=""https://www.gov.uk/government/publications/covid-19-stay-at-home-guidance"">households with possible coronavirus infection</a>, and <a href=""https://www.gov.uk/government/publications/guidance-on-shielding-and-protecting-extremely-vulnerable-persons-from-covid-19"">shielding and protecting people defined on medical grounds as extremely vulnerable</a>).</li> </ul> <p>*[EHC]: Education, Health and Care</p>'}}",UK,Excellent,There is a lot of relevant information here. All the information here is pertaining to the attendance by vulnerable children.,Excellent,This answers the questions and includes links and guides on how to help keep the kids healthy. It provides guidelines on what to do and how to bring the students back to school,4,4


## 2. Create the LLM judge

In [7]:
JUDGE_PROMPT = """
You will be given a user_question and system_answer couple.
Your task is to provide a 'total rating' scoring how well the system_answer answers the user concerns expressed in the user_question.
Give your answer as a float on a scale of 0 to 10, where 0 means that the system_answer is not helpful at all, and 10 means that the answer completely and helpfully addresses the question.

Provide your feedback as follows:

Feedback:::
Total rating: (your rating, as a float between 0 and 10)

Now here are the question and answer.

Question: {question}
Answer: {answer}

Feedback:::
Total rating: """

In [8]:
examples["llm_judge"] = examples.progress_apply(
    lambda x: llm_client.text_generation(
        prompt=JUDGE_PROMPT.format(question=x["question"], answer=x["passage"]),
        max_new_tokens=1000,
    ),
    axis=1,
)


In [9]:
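from typing import Optional


def extract_judge_score(answer: str, split_str: str = "Total rating:") -> Optional[float]:
    """Return the first number found after `split_str` in the judge's answer, or None."""
    try:
        # Keep only the text after the rating marker, if the judge wrote it
        if split_str in answer:
            rating = answer.split(split_str)[1]
        else:
            rating = answer
        # Match integers or decimals such as "3" or "7.5" and keep the first one
        digit_groups = re.findall(r"\d+(?:\.\d+)?", rating)
        return float(digit_groups[0])
    except Exception as e:
        # No number found (IndexError) or unexpected output: log it and return None
        print(e)
        return None


examples["llm_judge_score"] = examples["llm_judge"].apply(extract_judge_score)
# Rescale the LLM's 0-10 score onto the human 1-4 scale (0 -> 1, 10 -> 4).
# This linear rescaling does not change the Pearson correlation computed below.
examples["llm_judge_score"] = examples["llm_judge_score"] * 3 / 10 + 1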

In [10]:
print("Correlation between LLM-as-a-judge and the human raters:")
print(
    f"{examples['llm_judge_score'].corr(examples['human_score'], method='pearson'):.3f}"
)

Correlation between LLM-as-a-judge and the human raters:
0.558
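Mapping a 0-10 score onto the human 1-4 scale takes `score * 3 / 10 + 1`, which sends 0 to 1 and 10 to 4. Using `score / 10 + 1` alone would only span 1 to 2. In either case, a positive linear rescaling leaves the Pearson correlation unchanged, so the 0.558 above does not depend on this choice. Here is a quick pure-Python check on made-up scores. The `pearson` and `rescale_0_10_to_1_4` helpers are illustrative only:

```python
def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rescale_0_10_to_1_4(score: float) -> float:
    """Map a 0-10 score linearly onto 1-4 (0 -> 1, 10 -> 4)."""
    return score * 3 / 10 + 1


# Hypothetical judge scores (0-10) and human scores (1-4)
judge = [2.0, 5.0, 7.0, 9.0]
human = [1, 2, 4, 4]

raw = pearson(judge, human)
rescaled = pearson([rescale_0_10_to_1_4(s) for s in judge], human)
print(abs(raw - rescaled) < 1e-9)  # True
print(rescale_0_10_to_1_4(0), rescale_0_10_to_1_4(10))  # 1.0 4.0
```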


## 3. Improve the LLM judge

To get better results, we change the prompt in the following ways:
- ⏳ **Leave more time for thought** by adding an `Evaluation` field before the final answer.
- 🔢 **Use a small integer scale** like 1-4 or 1-5 instead of a large float scale as we had previously.
- 👩‍🏫 **Provide an indicative scale for guidance**.
- 🥕 **Add a carrot** to motivate the LLM, i.e. promise it a reward for a correct rating!
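A small integer scale also makes it possible to parse the rating more strictly, rejecting out-of-range or non-integer values instead of silently accepting them. Below is a minimal sketch. `parse_integer_rating` and its regex are hypothetical helpers and are not part of this notebook's pipeline:

```python
import re
from typing import Optional

# Hypothetical stricter parser for the "Evaluation: ... Total rating: N" format.
# Accepts a single digit 1-4 that is not followed by another digit or a decimal part.
RATING_PATTERN = re.compile(r"Total rating:\s*([1-4])(?!\.?\d)")


def parse_integer_rating(judge_output: str) -> Optional[int]:
    """Return the 1-4 rating after the last 'Total rating:' marker, or None."""
    matches = RATING_PATTERN.findall(judge_output)
    return int(matches[-1]) if matches else None


print(parse_integer_rating("Evaluation: Misses key aspects.\nTotal rating: 2"))  # 2
print(parse_integer_rating("Evaluation: Great answer.\nTotal rating: 7"))  # None
print(parse_integer_rating("Evaluation: No rating given."))  # None
```

Returning `None` for malformed ratings makes failures visible, for example as `NaN` in a pandas column, rather than turning them into a wrong score.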

In [11]:
IMPROVED_JUDGE_PROMPT = """
You will be given a user_question and system_answer couple.
Your task is to provide a 'total rating' scoring how well the system_answer answers the user concerns expressed in the user_question.
Give your answer on a scale of 1 to 4, where 1 means that the system_answer is not helpful at all, and 4 means that the system_answer completely and helpfully addresses the user_question.

Here is the scale you should use to build your answer:
1: The system_answer is terrible: completely irrelevant to the question asked, or very partial
2: The system_answer is mostly not helpful: misses some key aspects of the question
3: The system_answer is mostly helpful: provides support, but still could be improved
4: The system_answer is excellent: relevant, direct, detailed, and addresses all the concerns raised in the question

Provide your feedback as follows:

Feedback:::
Evaluation: (your rationale for the rating, as a text)
Total rating: (your rating, as a number between 1 and 4)

You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.

Now here are the question and answer.

Question: {question}
Answer: {answer}

Provide your feedback. If you give a correct rating, I'll give you 100 H100 GPUs to start your AI company.
Feedback:::
Evaluation: """

In [12]:
examples["llm_judge_improved"] = examples.progress_apply(
    lambda x: llm_client.text_generation(
        prompt=IMPROVED_JUDGE_PROMPT.format(question=x["question"], answer=x["passage"]),
        max_new_tokens=500,
    ),
    axis=1,
)
examples["llm_judge_improved_score"] = examples["llm_judge_improved"].apply(
    extract_judge_score
)


In [13]:
print("Correlation between LLM-as-a-judge and the human raters:")
print(
    f"{examples['llm_judge_improved_score'].corr(examples['human_score'], method='pearson'):.3f}"
)

Correlation between LLM-as-a-judge and the human raters:
0.863


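With only a few tweaks to the prompt, the correlation **rose from 0.558 to 0.863** on these 28 examples. That is a gain of about 0.3 in absolute terms, or roughly 55% relative to the first judge.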
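This sketch makes the size of the gain explicit using the two correlations reported above:

```python
baseline = 0.558  # correlation with the first judge prompt
improved = 0.863  # correlation with the improved judge prompt

absolute_gain = improved - baseline
relative_gain = absolute_gain / baseline

print(f"absolute gain: {absolute_gain:.3f}")  # absolute gain: 0.305
print(f"relative gain: {relative_gain:.1%}")  # relative gain: 54.7%
```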

Let's display a few errors of the LLM judge to analyse them:

In [14]:
# Collect one example the judge overrates and two it underrates
errors = pd.concat(
    [
        examples.loc[
            examples["llm_judge_improved_score"] > examples["human_score"]
        ].head(1),
        examples.loc[
            examples["llm_judge_improved_score"] < examples["human_score"]
        ].head(2),
    ]
)

display(
    errors[
        [
            "question",
            "passage",
            "human_score",
            "explanation_1",
            "llm_judge_improved_score",
            "llm_judge_improved",
        ]
    ]
)

index,question,passage,human_score,explanation_1,llm_judge_improved_score,llm_judge_improved
1976,What can I do to help people that are grieving?,"{'passage_id': 37, 'source': 'CDC', 'uri': 'https://www.cdc.gov/coronavirus/2019-ncov/daily-life-coping/managing-stress-anxiety.html', 'reference_type': 'Passage_only', 'reference': {'page_title': 'Coping with Stress', 'section_headers': ['Take care of yourself and your community'], 'section_content': 'Taking care of yourself, your friends, and your family can help you cope with stress. Helping others cope with their stress can also make your community stronger. Ways to cope with stress Take breaks from watching, reading, or listening to news stories , including social media. Hearing about the pandemic repeatedly can be upsetting. Take care of your body. Take deep breaths, stretch, or meditate. Try to eat healthy, well-balanced meals. Exercise regularly, get plenty of sleep. Avoid alcohol and drugs. Make time to unwind. Try to do some other activities you enjoy. Connect with others. Talk with people you trust about your concerns and how you are feeling. Know the facts to help reduce stress Understanding the risk to yourself and people you care about can make an outbreak less stressful. Learn and share the facts about COVID-19 and help stop the spread of rumors. When you share accurate information about COVID-19, you can help make people feel less stressed, make a connection with them, and help stop stigma. Take care of your mental health Call your healthcare provider if stress gets in the way of your daily activities for several days in a row. People with preexisting mental health conditions should continue with their treatment and be aware of new or worsening symptoms. Additional information can be found at the Substance Abuse and Mental Health Services Administration (SAMHSA) Disaster Preparedness page. 
Learn more about taking care of your emotional health during a stressful event like the COVID-19 outbreak.', 'selected_span': None, 'section_content_html': '<p>Taking care of yourself, your friends, and your family can help you cope with stress. Helping others cope with their stress can also make your community stronger.</p> <h3>Ways to cope with stress</h3> <ul> <li><strong>Take breaks from watching, reading, or listening to news stories</strong> , including social media. Hearing about the pandemic repeatedly can be upsetting.</li> <li><strong>Take care of your body</strong>. <ul> <li>Take deep breaths, stretch, or <a href=""https://nccih.nih.gov/health/meditation/overview.htm"">meditate</a>.</li> <li><a href=""/nccdphp/dnpao/features/national-nutrition-month/index.html"">Try to eat healthy, well-balanced meals</a>.</li> <li><a href=""/physicalactivity/basics/index.htm"">Exercise regularly</a>, <a href=""/sleep/about_sleep/sleep_hygiene.html"">get plenty of sleep</a>.</li> <li>Avoid <a href=""/alcohol/fact-sheets/alcohol-use.htm"">alcohol</a> and <a href=""https://www.drugabuse.gov/related-topics/health-consequences-drug-misuse"">drugs</a>.</li> </ul> </li> <li><strong>Make time to unwind</strong>. Try to do some other activities you enjoy.</li> <li><strong>Connect with others</strong>. Talk with people you trust about your concerns and how you are feeling.</li> </ul> <h3>Know the facts to help reduce stress</h3> <p>Understanding the risk to yourself and people you care about can make an outbreak less stressful.</p> <p>Learn and share the facts about COVID-19 and help <a href=""/coronavirus/2019-ncov/daily-life-coping/share-facts.html"">stop the spread of rumors</a>. 
When you share accurate information about COVID-19, you can help make people feel less stressed, make a connection with them, and <a href=""/coronavirus/2019-ncov/daily-life-coping/reducing-stigma.html"">help stop stigma</a>.</p> <h3>Take care of your mental health</h3> <p><strong>Call your healthcare provider if stress gets in the way</strong> of your daily activities for several days in a row.</p> <p><strong>People with preexisting mental health conditions</strong> should continue with their treatment and be aware of new or worsening symptoms. Additional information can be found at the Substance Abuse and Mental Health Services Administration <a href=""https://www.samhsa.gov/disaster-preparedness"">(SAMHSA) Disaster Preparedness</a> page.</p> <p>Learn more about <a href=""https://emergency.cdc.gov/coping/selfcare.asp"">taking care of your emotional health</a> during a stressful event like the COVID-19 outbreak.</p>'}}",1,The question is about others which the reply did not answer.,2.0,"The system_answer is mostly not helpful: misses some key aspects of the question. The user asked what they can do to help people that are grieving, but the system_answer focuses on coping with stress and anxiety in general.\nTotal rating: 2"
472,Can the covid19 event visa be granted to anyone?,"{'passage_id': 577, 'source': 'Australia', 'uri': 'https://covid19.homeaffairs.gov.au/frequently-asked-questions', 'reference_type': 'FAQ', 'reference': {'page_title': 'Frequently Asked Questions', 'section_headers': ['COVID-19 Pandemic - Australian Government Endorsed Event (AGEE) stream of the Temporary Activity (subclass 408) visa', 'Frequently Asked Questions', 'I am overseas. Can I be granted a COVID-19 Pandemic event visa?'], 'section_content': 'The COVID-19 Pandemic event visa can only be granted to people in Australia.', 'selection_span': None, 'section_content_html': '<p>The COVID-19 Pandemic event visa can only be granted to people in Australia.</p>'}}",2,"This information stated that the Covid-19 Pandemic event visa can be granted to people in Australia, however, it is not clear as to what groups of people in Australia are eligible for this visa.",1.0,"The system_answer is terrible: completely irrelevant to the question asked, or very partial. The question asks about the possibility of granting the COVID19 event visa to anyone, while the system_answer only states that the COVID19 Pandemic event visa can only be granted to people in Australia.\nTotal rating: 1"
670,What programs can assist busy childcare facilities?,"{'passage_id': 507, 'source': 'Australia', 'uri': 'https://www.dese.gov.au/covid-19/childcare/childcare-faq', 'reference_type': 'FAQ', 'reference': {'page_title': 'Early Childhood Education and Care COVID-19 Frequently Asked Questions', 'section_headers': ['Early Childhood Education and Care Relief Package and the Exceptional Circumstance Supplementary Payment—information for providers and services', 'How does an Exceptional Circumstance Supplementary Payment affect my JobKeeper application?'], 'section_content': 'For a provider this payment and the base payment under the Relief Package are not considered as revenue for GST purposes. This means providers will be able to show they satisfy the decline in income test for the purposes of the JobKeeper Payment provided they do not have income from other sources, such as being part of a larger entity like a non-government school or a not-for-profit organisation. Where some of this revenue is then passed on to Family Day Care and In Home Care educators (based on contractual arrangements between the service and the educator) these monies are considered as revenue for GST purposes. As the educator is unlikely to receive more than 50 per cent of their fee revenue from the provider, they should be able to satisfy the decline in income test for the JobKeeper Payment.', 'selection_span': None, 'section_content_html': '<p>For a provider this payment and the base payment under the Relief Package are not considered as revenue for GST purposes. 
This means providers will be able to show they satisfy the decline in income test for the purposes of the JobKeeper Payment provided they do not have income from other sources, such as being part of a larger entity like a non-government school or a not-for-profit organisation.</p> <p>Where some of this revenue is then passed on to Family Day Care and In Home Care educators (based on contractual arrangements between the service and the educator) these monies are considered as revenue for GST purposes. As the educator is unlikely to receive more than 50 per cent of their fee revenue from the provider, they should be able to satisfy the decline in income test for the JobKeeper Payment.</p>'}}",2,Gives some information on assistance programs but no references or contacts,1.0,"The system_answer does not address the user_question at all. The user_question asks about programs that can assist busy childcare facilities, but the system_answer talks about the JobKeeper Payment and the decline in income test for the JobKeeper Payment.\nTotal rating: 1"


The disagreements are minor: in each case shown, the judge is off by exactly one point on the 1-4 scale. Overall, this is a good level of performance for the system!
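You could quantify "minor" over the whole sample rather than three rows with a few agreement statistics. Here is a minimal sketch. `agreement_summary` is a hypothetical helper. In the notebook, you would pass the `llm_judge_improved_score` and `human_score` columns as lists after dropping rows where no score was extracted:

```python
def agreement_summary(judge: list[float], human: list[float]) -> dict[str, float]:
    """Exact-agreement rate, within-one-point rate, and mean absolute error."""
    diffs = [abs(j - h) for j, h in zip(judge, human)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,
        "within_one": sum(d <= 1 for d in diffs) / n,
        "mae": sum(diffs) / n,
    }


# The three errors displayed above: (judge, human) = (2, 1), (1, 2), (1, 2)
print(agreement_summary([2, 1, 1], [1, 2, 2]))
# {'exact': 0.0, 'within_one': 1.0, 'mae': 1.0}
```

A high `within_one` rate alongside a lower `exact` rate is exactly the pattern of "small disagreements" described above.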