## Prometheus evaluation

### Running on GPU
#### Imports and loading

In [1]:
from prometheus_eval.vllm import VLLM
from prometheus_eval import PrometheusEval
from prometheus_eval.prompts import ABSOLUTE_PROMPT, SCORE_RUBRIC_TEMPLATE

model = VLLM(model="prometheus-eval/prometheus-7b-v2.0")
judge = PrometheusEval(model=model, absolute_grade_template=ABSOLUTE_PROMPT)

INFO 07-23 12:26:55 llm_engine.py:174] Initializing an LLM engine (v0.5.2) with config: model='prometheus-eval/prometheus-7b-v2.0', speculative_config=None, tokenizer='prometheus-eval/prometheus-7b-v2.0', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=prometheus-eval/prometheus-7b-v2.0, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 07-23 12:26:56 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-23 12:26:59 m



#### Testing prompt
A testing prompt prepared by the creators for evaluation testing. The result meets expectations.

In [4]:
data = {
    "instruction": "Struggling with a recent break-up, a person opens up about the intense feelings of loneliness and sadness. They ask for advice on how to cope with the heartbreak and move forward in life.",
    
    "response": "I'm genuinely sorry to hear about your break-up. This can be an immensely challenging time filled with intense emotions of loneliness and sorrow. It's important to remember that it's normal to experience these feelings; they are a testament to the deep connection you once had. Firstly, don't hesitate to embrace your emotions, let them flow freely. Ignoring them often complicates the healing process. It's natural to be upset, to shed tears, or to feel angry. This is all part of the journey towards healing. Self-care can be a healing balm in these tough times. This could mean indulging in your favourite book, journaling your thoughts, or even just a quiet cup of tea. Small acts of kindness to oneself can significantly improve your emotional well-being. Also, don't isolate yourself. Surround yourself with supportive individuals who can offer comfort and understanding. This might include friends, family, or even a mental health professional. They can provide much-needed emotional support and strategies to help you cope. Lastly, remember there's no set timeline for healing. It's different for everyone and it's okay to take as long as you need. Keep your focus on the future and the potential it holds. Though it may not seem possible now, you will emerge stronger from this experience. It's okay to ask for help, it's okay to grieve, and it's okay to take it one day at a time. You're not alone in this journey, and there are people ready and willing to help. It's okay to heal at your own pace.",

    "reference_answer": "I can only imagine how difficult this time must be for you. Heartbreak can feel like an overwhelming wave that washes over you, making you feel lost and alone. It's important to remember that it's okay to feel this way; your feelings are valid and it's natural to grieve the loss of a relationship. Firstly, it's crucial to allow yourself to feel the emotions fully. Suppressing or denying them often only prolongs the healing process. It's okay to cry, to feel sad, or to be angry. These are all part of the healing journey. Engaging in self-care activities can also provide a much-needed boost. This could be anything from taking a warm bath, writing in a journal, or simply drinking a cup of your favorite tea. Small acts of self-love can make a big difference in how you feel. Next, try to surround yourself with supportive people who understand your situation and provide comfort. Friends and family can be a great source of strength in times of heartbreak. If you feel comfortable, you might want to consider seeking professional help. Therapists and counselors are trained to provide assistance and tools to navigate through difficult times like these. Lastly, it's important to remember that it's okay to take your time to heal. Everyone has their own pace and there's no rush. Try to focus on the future and the possibilities it holds. While it may not seem like it now, you will come out stronger and more resilient from this experience. Remember, it's okay to ask for help and it's okay to feel the way you feel. You are not alone in this journey and there are people who care about you and want to help. It's okay to take one day at a time. Healing is a process, and it's okay to move through it at your own pace.",

    "rubric": SCORE_RUBRIC_TEMPLATE.format(
    criteria="Is the model proficient in applying empathy and emotional intelligence to its responses when the user conveys emotions or faces challenging circumstances?",
    score1_description="The model neglects to identify or react to the emotional tone of user inputs, giving responses that are unfitting or emotionally insensitive.",
    score2_description="The model intermittently acknowledges emotional context but often responds without sufficient empathy or emotional understanding.",
    score3_description="The model typically identifies emotional context and attempts to answer with empathy, yet the responses might sometimes miss the point or lack emotional profundity.",
    score4_description="The model consistently identifies and reacts suitably to emotional context, providing empathetic responses. Nonetheless, there may still be sporadic oversights or deficiencies in emotional depth.",
    score5_description="The model excels in identifying emotional context and persistently offers empathetic, emotionally aware responses that demonstrate a profound comprehension of the user's emotions or situation.")
}

feedback, score = judge.single_absolute_grade(**data)
print(f"Score: {score}\n"
      f"Feedback: {feedback}")

Processed prompts: 100%|██████████| 1/1 [00:03<00:00,  3.12s/it, est. speed input: 392.22 toks/s, output: 76.07 toks/s]


Processed 1/1 instances.


Finalizing: 100%|██████████| 1/1 [00:00<00:00, 21620.12it/s]

Score: 5
Feedback: The response excels in recognizing and addressing the emotional tone of the user's input, which aligns with the highest score on the rubric. It begins by acknowledging the user's pain, setting a compassionate tone that is maintained throughout. The empathy is evident as the model carefully articulates the normalcy of the user's feelings, which provides emotional validation. The advice provided, such as embracing emotions and practicing self-care, is presented in a manner that is empathetic and emotionally intelligent. The model further enhances its empathy by emphasizing the importance of social support and professional help, if needed. It also conveys the understanding that the healing process is personal and cannot be rushed. By highlighting the future and its potential, the response instills hope while still honoring the user's current emotional state. It reinforces the message of not being alone and that seeking help is a strength, not a weakness. This level of e




#### Hallucination prompt example
Custom prompt prepared for detecting hallucinations. Here, the response and the referenced answer were generated by ChatGPT. In this case as well, the result is correct. The model successfully detected false information. 

In [5]:
data = {
    "instruction": "An AI model is tasked with responding to customers' questions. It is very important that the responses are truthful and free from hallucinations. The length of the response does not matter.",
    
    "response": "The capital of France is Paris. Paris is famous for its cuisine and fashion, and it is home to iconic landmarks like the Eiffel Tower and the Colosseum. It is also known as the 'City of Love' due to its romantic ambiance.",

    "reference_answer": "The capital of France is Paris. Paris is not only the largest city in France but also one of the most significant cultural and historical centers in the world. It is known for its landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. The city has a rich history dating back over 2,000 years and has played a pivotal role in many historical events, including the French Revolution. Paris is also renowned for its contributions to art, literature, and philosophy, with numerous famous artists and writers having lived and worked there. Additionally, the city is a major hub for education, with prestigious institutions such as the Sorbonne University. The Seine River, which flows through Paris, adds to the city's charm and has inspired countless works of art and literature. As a global center for fashion, Paris hosts numerous fashion shows and events, attracting designers and fashion enthusiasts from around the world. The city's cuisine is celebrated for its variety and excellence, with numerous Michelin-starred restaurants and a rich culinary tradition that includes iconic dishes like croissants, baguettes, and escargot.",

    "rubric": SCORE_RUBRIC_TEMPLATE.format(
    criteria="Is the model's response truthful and free from hallucinations?",
    score1_description="The model frequently generates false information or hallucinations, providing responses that are misleading or factually incorrect.",
    score2_description="The model occasionally provides accurate information, but often includes hallucinations or inaccuracies in its responses.",
    score3_description="The model generally provides truthful responses, though there are occasional instances of hallucinations or minor inaccuracies.",
    score4_description="The model consistently delivers truthful and accurate responses, with only rare instances of minor inaccuracies or hallucinations.",
    score5_description="The model excels in providing consistently truthful and accurate responses, demonstrating a high level of reliability and free from any hallucinations or inaccuracies.")
}

feedback, score = judge.single_absolute_grade(**data)
print(f"Score: {score}\n"
      f"Feedback: {feedback}")

Processed prompts: 100%|██████████| 1/1 [00:02<00:00,  2.10s/it, est. speed input: 353.34 toks/s, output: 80.84 toks/s]


Processed 1/1 instances.


Finalizing: 100%|██████████| 1/1 [00:00<00:00, 25115.59it/s]

Score: 3
Feedback: This response correctly identifies Paris as the capital of France and mentions some well-known aspects of the city, such as its cuisine, fashion, and romantic ambiance. However, the inclusion of the Colosseum in the response is inaccurate. The Colosseum is located in Rome, Italy, not Paris, France. Therefore, the response contains a factual error, albeit a minor one, which slightly undermines its overall reliability. Nonetheless, the response remains generally truthful and does not include any hallucinations or gross inaccuracies. This suggests that while there are occasional instances of minor inaccuracies, the model generally provides truthful responses. Hence, according to the score rubric, the response would be assigned a score of 3.





#### Hallucination prompt ROHub
This prompt contains the same instruction and rubric as the previous one. However, the response and referenced answer have been changed. The response was generated by the ROHub chatbot, while the referenced answer contains 2 PDF files from the knowledge base that the bot should draw information from. The prompt should correctly detect any incorrect information contained in the response. A score of 5 points should be given when the response contains no false information and does not mislead the user. As expected, the evaluation result is a score of 5. 

In [7]:
data = {
    "instruction": "An AI model is tasked with responding to customers' questions. It is very important that the responses are truthful and free from hallucinations. The length of the response does not matter.",

    "response": """
    <p>JSON stands for JavaScript Object Notation. It's a lightweight format for storing and transporting data. JSON is often used when data is sent from/to a server to/from a web page. It's \"self-describing\" 
    and easy to understand. JSON is based on a subset of the JavaScript Programming Language Standard.</p>\n\n<p>JSON is a widely used format for storing and exchanging data among clients and servers over APIs. 
    An advantage of JSON is that any JavaScript object can be converted into/from JSON. JSON is now a de facto representation for exchanging data among most API/programming languages and systems.</p>\n\n<p>
    JSON's key characteristics are:</p>\n\n<ul>\n<li>Easy to read and write in text format.</li>\n<li>Used by programming languages.</li>\n<li>
    Lightweight, making it easy to compress and transmit.</li>\n</ul>\n\n<p>JSON is used for transferring data from servers to web pages and from web pages to servers, through the client browser, of course.</p>\n""",

    "reference_answer": """Introduction to JSON
    What is JSON? JSON stands for JavaScript Object Notation. JSON is a lightweight format for 
    storing and transporting data. It is often used when data is sent from/to a server to/from a web page. 
    JSON is "self-describing" and easy to understand. It is based on a subset of the JavaScript 
    Programming Language Standard.
    JSON is a widely used format for storing and exchanging data among clients and servers over APIs. 
    JSON is important because when a program exchanges data between a browser and a server, the 
    data must be text only. Thus, JSON is formatted in text and it supports any language, not only 
    JavaScript, that is commonly used in the developing of web applications. An advantage of JSON is 
    that any JavaScript object can be converted into/from JSON. JSON is now a de facto representation 
    for exchanging data among most API / programming languages and systems.
    JSON's key characteristics are the following:
    Easy to read and write in text format.
    Easy to parse and to generate objects from it.
    Language-independent, text-based and lightweight.
    Official Internet media type is application/json.
    Filename extension is .json.
    The material shown in this module is separated in the following sections under the 
    “FORMATS:JSON” Learning Unit:
    The “PART I: Introduction to JSON” section with a video format material that is for showing what 
    is the JSON.
    The “Examples of JSON Page” subsection that provides examples of JSON.
    The “PART II: Data Structures Supported by JSON” section with a video format material that is for 
    showing the Data Structures that supported by JSON.
    The “PART III: Other Rules of JSON” section with a video format material that is for showing the 
    Rules of JSON.
    The Links provide an in-depth knowledge on the JSON. For a better understanding the student can 
    study each link and gain a more understanding at JSON.
    In each section the presentations notes are provided.
    After completion of the above section by the student. In order for the student to get the Certificate 
    the student must complete the “Quiz” with a good grade and then the “Feedback Form”. Welcome to this learning unit on “JavaScript Object Notation (JSON) Formats”. where we will 
    cover the “Introduction to JSON”, the first part of this learning unit. My name is Iacovos Ioannou. I 
    work as a researcher at CyNet and I am a member of the GN4-3 project Orchestration, Automation 
    and Virtualisation Training Development Team within the Network Technologies and Services 
    Development Work Package. In this learning unit I will explain what “JavaScript Object Notation 
    (JSON) Formats” is. In case you have any questions, you can send us an email to the email address 
    you see here or you can directly talk to us on the first Tuesday of each month, during the GN4-3 
    project. Just drop us an email and we will send you the link to our videoconferencing room.
    In this learning unit you will learn about: Introduction to JSON, Structure of JSON, and How JSON 
    is used by programming languages.
    So let’s start with the introduction to JSON. JSON stands for JavaScript Object Notation. It’s a 
    lightweight format for storing and transporting data. That’s why it’s used a lot – you can compress 
    this lightweight format even more and have a smaller file that can represent thousands of objects. 
    JSON is used for transferring data from servers to web pages and from web pages to servers, 
    through the client browser, of course. JSON is self-describing, as it will be explained later in the 
    slides. JSON is easy to read, easy to use, to write, parse and generate. It’s independent, based on 
    text only. It’s lightweight, and easy to parse. The official Internet media type for JSON is 
    application/json. Why am I telling you this? Because in the request for a JSON file you will put this 
    when you are making a Representational State Transfer (REST) request, an API request from your 
    software – you will specify what you want to receive back. The filename extension of JSON 
    is .json.
    Examples of JSON – of usage of JSON, that are related to networks.
    JSON, of course, is widely used for APIs. It’s used for the Network Configuration
    Protocol (NETCONF) and RESTCONF. Why are these protocols important for us? Through
    these protocols you can retrieve information from hardware, networking hardware, or
    you can send information to networking hardware or even configuration. So, it’s widely
    used as a data representation tool for configuring devices or getting information from
    devices. For example:
    Retrieve data from a Juniper MX80
    Send configuration data to Border Gateway Protocol (BGP) routers
    Storing data from routers’ transceivers, or
    Storing data from routers’ Simple Network Management Protocol (SNMP) monitoring
    tools.
    It’s used to support automation
    Cisco uses JSON for NETCONF responses, as we said before. This is an example of a RESTCONF 
    response, this is JSON, and you can find this example at the URL shown in the presentation.
    What is JSON not for? JSON is not a document format. You cannot use JSON to format a document 
    and produce documents. JSON is not a markup language. You cannot use JSON for putting tags as 
    HTML in order to create an HTML file for a document browser or HTML format. JSON is not a 
    general serialisation format. You don’t have cyclical recurring structures through JSON, you don’t 
    have invisible structures, or functions that are hidden behind binary fields values in an object. For 
    example, the stub and skeleton in java RMI. JSON is simple data. You can do whatever you want 
    with them as long as JSON is formatted for data representation correctly.
    Now let’s talk about the structure of JSON. JSON is built on two structures, so it is as simple as that 
    – universal data structures: A collection of name/value pairs. Sometimes “name” can be called 
    “key”. An ordered list of values – which is an array, of course. Name (key) / value pairs consist of 
    objects. In various languages, this is realised as an object, record, struct, dictionary, hash table, 
    keyed list or associative array. So, you can implement any data structure using this collection of 
    name (key) / value pairs. An object in JSON is an unordered set of name (key) / value pairs. It’s 
    important to know that it is unordered. It begins with a left brace* “{“ and ends with a right brace* 
    “}”. The name (key) is separated from the value by a colon “:” and the name (key) / value pairs are 
    separated by a comma “,”. An array is an ordered list of values. In most languages, this is realised as 
    an array, vector, list or sequence. An array in JSON begins with a left square bracket “[“ and ends 
    with a right square bracket “]”. Since the data structures supported by JSON are also supported by 
    most modern programming languages, it makes JSON a very useful data-interchange format.
    Thank you for following this learning unit “JSON Formats”. Remember you can reach us at 
    oav@lists.geant.org and that you can join any of our sessions on the first Tuesday of every month. 
    Just send us an email and we will send you the connection details. Please, remember to also explore 
    our other learning units, complete the feedback form and enjoy your training!
    """,
    
    "rubric": SCORE_RUBRIC_TEMPLATE.format(
    criteria="Is the model's response truthful and free from hallucinations?",
    score1_description="The model frequently generates false information or hallucinations, providing responses that are misleading or factually incorrect.",
    score2_description="The model occasionally provides accurate information, but often includes hallucinations or inaccuracies in its responses.",
    score3_description="The model generally provides truthful responses, though there are occasional instances of hallucinations or minor inaccuracies.",
    score4_description="The model consistently delivers truthful and accurate responses, with only rare instances of minor inaccuracies or hallucinations.",
    score5_description="The model excels in providing consistently truthful and accurate responses, demonstrating a high level of reliability and free from any hallucinations or inaccuracies.")
}

feedback, score = judge.single_absolute_grade(**data)
print(f"Score: {score}\n"
      f"Feedback: {feedback}")

Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]

Processed prompts: 100%|██████████| 1/1 [00:01<00:00,  1.90s/it, est. speed input: 1321.12 toks/s, output: 74.62 toks/s]


Processed 1/1 instances.


Finalizing: 100%|██████████| 1/1 [00:00<00:00, 26379.27it/s]

Score: 5
Feedback: The response accurately explains what JSON is, its use, and some of its key characteristics. It maintains a consistent truthfulness throughout the explanation and doesn't exhibit any hallucinations or inaccuracies. The response is also free from any superfluous or irrelevant information, thus aligning closely with the instruction that the responses should be truthful and free from hallucinations. While the response could be improved by providing more detailed examples and elaborating on the use of JSON in real-world applications, the absence of any misleading information is commendable. Therefore, based on the given score rubric, this response would receive a score of 5.





#### Summarization prompt ROHub
This prompt was prepared to check whether the response is comprehensive. A score of 5 points indicates that the most important information from the knowledge base has been included in the response. The response and referenced answer used are the same as in the previous example. The feedback correctly indicates which information was omitted and which was included, based on which it accurately assigns points. In this case, it is possible to adjust the prompt to our use case. For example, 5 points could be awarded if some information is omitted due to the length of the response.


In [3]:
data = {
    "instruction": "An AI model is tasked with summarizing text. The reference answer contains all the necessary information. The response should capture the main points and key details.", 
        
    "response" : """<p>JSON stands for JavaScript Object Notation. It's a lightweight format for storing and transporting data. JSON is often used when data is sent from/to a server to/from a web page. It's \"self-describing\" and easy to understand. JSON is based on a subset of the JavaScript Programming Language Standard.</p>\n\n<p>JSON is a widely used format for storing and exchanging data among clients and servers over APIs. An advantage of JSON is that any JavaScript object can be converted into/from JSON. JSON is now a de facto representation for exchanging data among most API/programming languages and systems.</p>\n\n<p>JSON's key characteristics are:</p>\n\n<ul>\n<li>Easy to read and write in text format.</li>\n<li>Used by programming languages.</li>\n<li>Lightweight, making it easy to compress and transmit.</li>\n</ul>\n\n<p>JSON is used for transferring data from servers to web pages and from web pages to servers, through the client browser, of course.</p>\n""",

    "reference_answer": """Introduction to JSON
    What is JSON? JSON stands for JavaScript Object Notation. JSON is a lightweight format for 
    storing and transporting data. It is often used when data is sent from/to a server to/from a web page. 
    JSON is "self-describing" and easy to understand. It is based on a subset of the JavaScript 
    Programming Language Standard.
    JSON is a widely used format for storing and exchanging data among clients and servers over APIs. 
    JSON is important because when a program exchanges data between a browser and a server, the 
    data must be text only. Thus, JSON is formatted in text and it supports any language, not only 
    JavaScript, that is commonly used in the developing of web applications. An advantage of JSON is 
    that any JavaScript object can be converted into/from JSON. JSON is now a de facto representation 
    for exchanging data among most API / programming languages and systems.
    JSON's key characteristics are the following:
    Easy to read and write in text format.
    Easy to parse and to generate objects from it.
    Language-independent, text-based and lightweight.
    Official Internet media type is application/json.
    Filename extension is .json.
    The material shown in this module is separated in the following sections under the 
    “FORMATS:JSON” Learning Unit:
    The “PART I: Introduction to JSON” section with a video format material that is for showing what 
    is the JSON.
    The “Examples of JSON Page” subsection that provides examples of JSON.
    The “PART II: Data Structures Supported by JSON” section with a video format material that is for 
    showing the Data Structures that supported by JSON.
    The “PART III: Other Rules of JSON” section with a video format material that is for showing the 
    Rules of JSON.
    The Links provide an in-depth knowledge on the JSON. For a better understanding the student can 
    study each link and gain a more understanding at JSON.
    In each section the presentations notes are provided.
    After completion of the above section by the student. In order for the student to get the Certificate 
    the student must complete the “Quiz” with a good grade and then the “Feedback Form”. Welcome to this learning unit on “JavaScript Object Notation (JSON) Formats”. where we will 
    cover the “Introduction to JSON”, the first part of this learning unit. My name is Iacovos Ioannou. I 
    work as a researcher at CyNet and I am a member of the GN4-3 project Orchestration, Automation 
    and Virtualisation Training Development Team within the Network Technologies and Services 
    Development Work Package. In this learning unit I will explain what “JavaScript Object Notation 
    (JSON) Formats” is. In case you have any questions, you can send us an email to the email address 
    you see here or you can directly talk to us on the first Tuesday of each month, during the GN4-3 
    project. Just drop us an email and we will send you the link to our videoconferencing room.
    In this learning unit you will learn about: Introduction to JSON, Structure of JSON, and How JSON 
    is used by programming languages.
    So let’s start with the introduction to JSON. JSON stands for JavaScript Object Notation. It’s a 
    lightweight format for storing and transporting data. That’s why it’s used a lot – you can compress 
    this lightweight format even more and have a smaller file that can represent thousands of objects. 
    JSON is used for transferring data from servers to web pages and from web pages to servers, 
    through the client browser, of course. JSON is self-describing, as it will be explained later in the 
    slides. JSON is easy to read, easy to use, to write, parse and generate. It’s independent, based on 
    text only. It’s lightweight, and easy to parse. The official Internet media type for JSON is 
    application/json. Why am I telling you this? Because in the request for a JSON file you will put this 
    when you are making a Representational State Transfer (REST) request, an API request from your 
    software – you will specify what you want to receive back. The filename extension of JSON 
    is .json.
    Examples of JSON – of usage of JSON, that are related to networks.
    JSON, of course, is widely used for APIs. It’s used for the Network Configuration
    Protocol (NETCONF) and RESTCONF. Why are these protocols important for us? Through
    these protocols you can retrieve information from hardware, networking hardware, or
    you can send information to networking hardware or even configuration. So, it’s widely
    used as a data representation tool for configuring devices or getting information from
    devices. For example:
    Retrieve data from a Juniper MX80
    Send configuration data to Border Gateway Protocol (BGP) routers
    Storing data from routers’ transceivers, or
    Storing data from routers’ Simple Network Management Protocol (SNMP) monitoring
    tools.
    It’s used to support automation
    Cisco uses JSON for NETCONF responses, as we said before. This is an example of a RESTCONF 
    response, this is JSON, and you can find this example at the URL shown in the presentation.
    What is JSON not for? JSON is not a document format. You cannot use JSON to format a document 
    and produce documents. JSON is not a markup language. You cannot use JSON for putting tags as 
    HTML in order to create an HTML file for a document browser or HTML format. JSON is not a 
    general serialisation format. You don’t have cyclical recurring structures through JSON, you don’t 
    have invisible structures, or functions that are hidden behind binary fields values in an object. For 
    example, the stub and skeleton in java RMI. JSON is simple data. You can do whatever you want 
    with them as long as JSON is formatted for data representation correctly.
    Now let’s talk about the structure of JSON. JSON is built on two structures, so it is as simple as that 
    – universal data structures: A collection of name/value pairs. Sometimes “name” can be called 
    “key”. An ordered list of values – which is an array, of course. Name (key) / value pairs consist of 
    objects. In various languages, this is realised as an object, record, struct, dictionary, hash table, 
    keyed list or associative array. So, you can implement any data structure using this collection of 
    name (key) / value pairs. An object in JSON is an unordered set of name (key) / value pairs. It’s 
    important to know that it is unordered. It begins with a left brace* “{“ and ends with a right brace* 
    “}”. The name (key) is separated from the value by a colon “:” and the name (key) / value pairs are 
    separated by a comma “,”. An array is an ordered list of values. In most languages, this is realised as 
    an array, vector, list or sequence. An array in JSON begins with a left square bracket “[“ and ends 
    with a right square bracket “]”. Since the data structures supported by JSON are also supported by 
    most modern programming languages, it makes JSON a very useful data-interchange format.
    Thank you for following this learning unit “JSON Formats”. Remember you can reach us at 
    oav@lists.geant.org and that you can join any of our sessions on the first Tuesday of every month. 
    Just send us an email and we will send you the connection details. Please, remember to also explore 
    our other learning units, complete the feedback form and enjoy your training!""", 
        
    "rubric": SCORE_RUBRIC_TEMPLATE.format( 
    criteria="How well does the model summarize the provided text, focusing on capturing the main points and key details?", 
    score1_description="The model fails to capture the main points or key details, resulting in a summary that is incomplete or inaccurate.", 
    score2_description="The model captures some main points but omits significant details, leading to a summary that lacks comprehensiveness.", 
    score3_description="The model generally captures the main points and some key details, but the summary may miss minor aspects or lack clarity.", 
    score4_description="The model accurately captures the main points and most key details, providing a clear and concise summary with minor omissions.", 
    score5_description="The model excels in capturing the main points and all key details, offering a clear, concise, and comprehensive summary.")
}

feedback, score = judge.single_absolute_grade(**data)
print(f"Score: {score}\n"
      f"Feedback: {feedback}")

Processed prompts: 100%|██████████| 1/1 [00:02<00:00,  2.49s/it, est. speed input: 1007.07 toks/s, output: 76.29 toks/s]


Processed 1/1 instances.


Finalizing: 100%|██████████| 1/1 [00:00<00:00, 25890.77it/s]

Score: 3
Feedback: In evaluating the response, it is clear that the model has managed to capture the main points of the text regarding JSON. The response adequately introduces what JSON is, its usage in APIs, and its key characteristics. However, the model misses some key details, such as the official Internet media type of JSON, its role in supporting automation, and examples of JSON usage related to networks. Moreover, the explanation of JSON's structure and its unordered nature is not as clear as it could be. Despite these omissions, the model provides a general understanding of the concept of JSON, its features, and its application. The response could be improved by adding more details about JSON's role in supporting automation and providing clearer explanations of its structure and usage. Therefore, the model's summary, while mostly accurate, lacks comprehensiveness and clarity in some aspects.





### Example on CPU

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("prometheus-eval/prometheus-7b-v2.0")
tokenizer = AutoTokenizer.from_pretrained("prometheus-eval/prometheus-7b-v2.0")

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

In [2]:
ABS_SYSTEM_PROMPT = "You are a fair judge assistant tasked with providing clear, objective score based on specific criteria, ensuring each assessment reflects the absolute standards set for performance."

ABSOLUTE_PROMPT = """###Task Description:
An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
1. Write a score that is an integer between 1 and 5. You should refer to the score rubric.
2. The output format should look as follows: "Feedback: (an integer number between 1 and 5)"
3. Please do not generate any other opening, closing, and explanations.

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Reference Answer (Score 5):
{reference_answer}

###Score Rubrics:
{rubric}

###Feedback: """

instruction = "Struggling with a recent break-up, a person opens up about the intense feelings of loneliness and sadness. They ask for advice on how to cope with the heartbreak and move forward in life.",
response = "I'm genuinely sorry to hear about your break-up. This can be an immensely challenging time filled with intense emotions of loneliness and sorrow. It's important to remember that it's normal to experience these feelings; they are a testament to the deep connection you once had. Firstly, don't hesitate to embrace your emotions, let them flow freely. Ignoring them often complicates the healing process. It's natural to be upset, to shed tears, or to feel angry. This is all part of the journey towards healing. Self-care can be a healing balm in these tough times. This could mean indulging in your favourite book, journaling your thoughts, or even just a quiet cup of tea. Small acts of kindness to oneself can significantly improve your emotional well-being. Also, don't isolate yourself. Surround yourself with supportive individuals who can offer comfort and understanding. This might include friends, family, or even a mental health professional. They can provide much-needed emotional support and strategies to help you cope. Lastly, remember there's no set timeline for healing. It's different for everyone and it's okay to take as long as you need. Keep your focus on the future and the potential it holds. Though it may not seem possible now, you will emerge stronger from this experience. It's okay to ask for help, it's okay to grieve, and it's okay to take it one day at a time. You're not alone in this journey, and there are people ready and willing to help. It's okay to heal at your own pace.",
reference_answer = "I can only imagine how difficult this time must be for you. Heartbreak can feel like an overwhelming wave that washes over you, making you feel lost and alone. It's important to remember that it's okay to feel this way; your feelings are valid and it's natural to grieve the loss of a relationship. Firstly, it's crucial to allow yourself to feel the emotions fully. Suppressing or denying them often only prolongs the healing process. It's okay to cry, to feel sad, or to be angry. These are all part of the healing journey. Engaging in self-care activities can also provide a much-needed boost. This could be anything from taking a warm bath, writing in a journal, or simply drinking a cup of your favorite tea. Small acts of self-love can make a big difference in how you feel. Next, try to surround yourself with supportive people who understand your situation and provide comfort. Friends and family can be a great source of strength in times of heartbreak. If you feel comfortable, you might want to consider seeking professional help. Therapists and counselors are trained to provide assistance and tools to navigate through difficult times like these. Lastly, it's important to remember that it's okay to take your time to heal. Everyone has their own pace and there's no rush. Try to focus on the future and the possibilities it holds. While it may not seem like it now, you will come out stronger and more resilient from this experience. Remember, it's okay to ask for help and it's okay to feel the way you feel. You are not alone in this journey and there are people who care about you and want to help. It's okay to take one day at a time. Healing is a process, and it's okay to move through it at your own pace.",

rubric_data = {
  "criteria":"Is the model proficient in applying empathy and emotional intelligence to its responses when the user conveys emotions or faces challenging circumstances?",
  "score1_description":"The model neglects to identify or react to the emotional tone of user inputs, giving responses that are unfitting or emotionally insensitive.",
  "score2_description":"The model intermittently acknowledges emotional context but often responds without sufficient empathy or emotional understanding.",
  "score3_description":"The model typically identifies emotional context and attempts to answer with empathy, yet the responses might sometimes miss the point or lack emotional profundity.",
  "score4_description":"The model consistently identifies and reacts suitably to emotional context, providing empathetic responses. Nonetheless, there may still be sporadic oversights or deficiencies in emotional depth.",
  "score5_description":"The model excels in identifying emotional context and persistently offers empathetic, emotionally aware responses that demonstrate a profound comprehension of the user's emotions or situation."
}

user_content = ABS_SYSTEM_PROMPT + "\n\n" + ABSOLUTE_PROMPT.format(
    instruction=instruction,
    response=response,
    reference_answer=reference_answer,
    rubric=rubric_data
)

messages = [
    {"role": "user", "content": user_content},
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm(

In [5]:
generated_ids = model.generate(model_inputs, max_new_tokens=1000, max_time=300)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] You are a fair judge assistant tasked with providing clear, objective score based on specific criteria, ensuring each assessment reflects the absolute standards set for performance.

###Task Description:
An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
1. Write a score that is an integer between 1 and 5. You should refer to the score rubric.
2. The output format should look as follows: "Feedback: (an integer number between 1 and 5)"
3. Please do not generate any other opening, closing, and explanations.

###The instruction to evaluate:
('Struggling with a recent break-up, a person opens up about the intense feelings of loneliness and sadness. They ask for advice on how to cope with the heartbreak and move forward in life.',)

###Response to evaluate:
("I'm genuinely sorry to hear about your break-up. This can be an immensely challenging time f

### Conclusion
It is possible to combine the above prompts into one that would provide a very universal measure. It seems reasonable to modify the summarization prompt to consider whether the response is not overly comprehensive. Additional metrics that could be introduced include whether the chatbot's response encourages further exploration of the topic or whether it is free from harmful content. In reality, Prometheus offers significant flexibility and freedom in formulating prompts. A good practice might be to review the dataset on which the model was trained. It contains many interesting aspects that could be used for evaluation. A drawback, however, is the high hardware requirements and relatively long waiting time for results.