# How to use LLMs


## Introduction to Leveraging Language Models with Hugging Face for Controlled Text Generation

In this notebook, we delve into the practicalities of utilizing large language models (LLMs) through the Hugging Face Transformers library, focusing on generating text that aligns with specific requirements and constraints. Hugging Face provides a comprehensive suite of tools and models that facilitate easy access to state-of-the-art natural language processing (NLP) capabilities. One of the most powerful features of these models is their ability to generate coherent and contextually relevant text based on a given prompt.

However, generating text that not only makes sense but also adheres to particular length, style, or content guidelines presents a unique set of challenges. Directly controlling the length of the generated text, for instance, can sometimes result in outputs that truncate awkwardly, cutting off sentences mid-thought or omitting crucial information. To navigate these challenges, we explore two primary strategies:

- Prompt Engineering: A technique where the input prompt is carefully crafted to include explicit instructions or constraints, guiding the model towards generating text within desired parameters. While effective to a degree, the approach relies heavily on the model's interpretive capabilities and may not always produce consistently reliable results.
- Custom Stopping Criteria: A more technical method that involves programming specific conditions under which text generation should cease. This approach allows for finer control over the endpoint of the generated text, aiming to ensure that outputs are both coherent and contextually complete without unnecessary truncation.

### Sources:

- https://huggingface.co/docs/transformers/en/llm_tutorial#generation-with-llms

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, StoppingCriteria, StoppingCriteriaList
import torch
import json

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [3]:
## choose model
model = "mistralai/Mistral-7B-Instruct-v0.2"

In [4]:
### quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

In [5]:
### load base model
model_base = AutoModelForCausalLM.from_pretrained(
     model,
    quantization_config=bnb_config,
    device_map={"": 0})


When working with language models from the Hugging Face Transformers library, selecting and properly configuring the tokenizer is a crucial step. The tokenizer is responsible for converting text into a format that the model can understand (i.e., converting text into tokens or token IDs) and vice versa (i.e., converting token IDs back into text). Special care must be taken when dealing with special characters and ensuring the tokenizer is correctly set up for the task at hand.

The first line of the cell below initializes the tokenizer associated with the specified model. The `AutoTokenizer.from_pretrained` method automatically selects the correct tokenizer class from the given model identifier, which keeps the tokenizer's vocabulary and special tokens consistent with what the model was trained on.

### Configuring the Tokenizer with Special Tokens

In [6]:
### select tokenizer: be careful with the special tokens
tokenizer = AutoTokenizer.from_pretrained(model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

In some scenarios, especially when dealing with sequence generation tasks, it's important to ensure that the tokenizer has a defined padding token (pad_token). The padding token is used to fill in sequences to a uniform length, which is a requirement for certain models and training procedures.

However, not all tokenizers come with a predefined padding token. In that case, the snippet above sets the tokenizer's padding token (pad_token) to the end-of-sequence token (eos_token). The end-of-sequence token is a special token that marks the end of a text segment. Using the eos_token as a fallback for the pad_token lets the tokenizer pad sequences when necessary without adding a new token to the vocabulary.
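To make the role of the padding token concrete, here is a minimal pure-Python sketch of what padding does to a batch of token-ID sequences. The helper `pad_batch` and the sample sequences are illustrative assumptions, not part of the Transformers API; the real tokenizer does this work when called with `padding=True`, and the side that receives the padding depends on its `padding_side` setting.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad lists of token IDs to a common length and build attention masks.

    Hypothetical helper, for illustration only.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if side == "left":
            padded.append([pad_id] * n_pad + list(seq))
            masks.append([0] * n_pad + [1] * len(seq))
        else:
            padded.append(list(seq) + [pad_id] * n_pad)
            masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


# Here the padding ID is 2, the end-of-sequence ID reused as pad token.
batch, attention = pad_batch([[1, 1824, 349], [1, 1824]], pad_id=2)
print(batch)
print(attention)
```

The attention mask marks padded positions with 0 so that they can be ignored, which is generally why reusing the end-of-sequence token as padding is workable.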

### Importance of Special Token Handling

Handling special tokens, such as the end-of-sequence token, is crucial because they tell the model where a text starts and ends. Misconfiguration can lead to improper text generation boundaries or incorrect sequence lengths. Ensuring that all necessary special tokens are correctly set in the tokenizer configuration helps achieve more accurate and coherent outputs from the model.

In summary, selecting the appropriate tokenizer and ensuring it is correctly configured with all necessary special tokens is essential for the successful application of language models to natural language processing tasks. This setup ensures that the model can accurately process input texts and generate meaningful and contextually appropriate outputs.

## Understanding Prediction Parameters in Language Models

### Key Parameters for Controlling Predictions:

1. repetition_penalty
- What It Does: Adjusts the likelihood of repeating the same words or phrases.
- Explanation: A higher repetition_penalty discourages the model from repeating itself, leading to more diverse and varied outputs. It's like telling a storyteller to avoid saying the same thing over and over.
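2. max_length
- What It Does: Sets the maximum total length of the sequence in tokens (words or pieces of words), counting the prompt as well as the generated text.
- Explanation: This parameter caps how long the output can be. For example, if max_length is set to 50, the prompt and the generated text together hold at most 50 tokens, and generation may stop earlier if the model emits its end-of-sequence token. To cap only the newly generated tokens, use max_new_tokens instead.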
3. temperature
- What It Does: Controls the randomness in the prediction process.
- Explanation: A lower temperature (closer to 0) makes the model more likely to choose the most likely next word, resulting in more predictable text. A higher temperature increases randomness, leading to more creative and less predictable outputs. It's like adjusting how adventurous the model is in its language use.
4. top_p (Top-p Sampling)
- What It Does: Determines how to select the next word based on a probability threshold.
- Explanation: With top_p sampling, the model considers the most likely next words that cumulatively reach the probability top_p. For instance, if top_p is 0.9, the model selects from the top words that together have a 90% chance of being the next word. It's like choosing the next word from a basket of the most likely candidates.

In short, these parameters control the length, diversity, randomness, and predictability of the generated text. Adjusting them lets you tailor the model's outputs to your needs, whether you want concise, predictable text or longer, more creative passages.
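The effect of temperature and top_p can be illustrated on a toy distribution. The sketch below is a simplified, pure-Python rendering of temperature scaling and nucleus (top-p) filtering; the function names and the four made-up scores are assumptions for illustration, and the library's own implementation may differ in its details.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving candidates sum to 1
    return {i: probs[i] / total for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
sharp = apply_temperature(toy_logits, 0.4)  # low temperature: mass concentrates on the top token
flat = apply_temperature(toy_logits, 1.5)   # high temperature: mass spreads out
print(max(sharp), max(flat))
print(top_p_filter(apply_temperature(toy_logits, 1.0), 0.9))
```

Sampling then draws the next token from the filtered, renormalized candidates, so a lower top_p shrinks the set of tokens that can ever be chosen.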

### Selecting Prediction Parameters for Different Scenarios

1. For Concise and Focused Text:
 - repetition_penalty: High (e.g., 1.5 to 2.0) to avoid repetition.
 - max_length: Low to moderate (e.g., 50 to 100) for brevity.
 - temperature: Low (e.g., 0.3 to 0.7) for predictable, coherent output.
 - top_p: Moderate (e.g., 0.8) to balance creativity with coherence.
2. For Creative and Diverse Text:
 - repetition_penalty: Moderate (e.g., 1.0 to 1.2) to allow some natural repetition.
 - max_length: Higher (e.g., 100 to 200) for extended content.
 - temperature: Higher (e.g., 0.7 to 1.0) for more randomness and creativity.
 - top_p: High (e.g., 0.9 or above) to include a wider range of word choices.
3. For Generating Technical or Factual Content:
 - repetition_penalty: Moderate to high (e.g., 1.2 to 1.5) for clarity.
 - max_length: Adjust based on content length requirements.
 - temperature: Lower (e.g., 0.3 to 0.5) for more factual and straightforward content.
 - top_p: Lower to moderate (e.g., 0.6 to 0.8) to maintain relevance and accuracy.
4. For Interactive Conversations or Chatbots:
 - repetition_penalty: Moderate (e.g., 1.1 to 1.3) to maintain a natural flow.
 - max_length: Moderate (e.g., 50 to 100) for manageable response lengths.
 - temperature: Moderate (e.g., 0.5 to 0.8) to balance predictability and spontaneity.
 - top_p: Moderate to high (e.g., 0.8 to 0.95) for varied but relevant responses.
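The four scenarios above can be kept as named presets and expanded into keyword arguments for `model.generate()`. This is only a sketch: `GENERATION_PRESETS` and `build_generate_kwargs` are hypothetical names, and each value is one arbitrary pick from the ranges listed above, not a tuned recommendation. `do_sample=True` is included because temperature and top_p only take effect when sampling is enabled.

```python
# Illustrative presets; each value is one pick from the ranges listed above.
GENERATION_PRESETS = {
    "concise": {"repetition_penalty": 1.5, "max_length": 100, "temperature": 0.3, "top_p": 0.8},
    "creative": {"repetition_penalty": 1.1, "max_length": 200, "temperature": 0.9, "top_p": 0.95},
    # No length here: the text above leaves it to the content requirements.
    "factual": {"repetition_penalty": 1.3, "temperature": 0.4, "top_p": 0.7},
    "chat": {"repetition_penalty": 1.2, "max_length": 100, "temperature": 0.6, "top_p": 0.9},
}


def build_generate_kwargs(scenario, **overrides):
    """Return keyword arguments for model.generate() for a named scenario."""
    kwargs = {"do_sample": True}  # temperature and top_p only apply when sampling
    kwargs.update(GENERATION_PRESETS[scenario])
    kwargs.update(overrides)  # caller-supplied values win over the preset
    return kwargs


print(build_generate_kwargs("factual", max_new_tokens=150))
```

The resulting dictionary can be unpacked into a generation call with `**`, in the same way the functions in this notebook forward their `**kwargs`.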

The function below generates an answer. For decoder-only models such as Mistral, `generate` returns the prompt tokens followed by the new tokens, so the decoded output begins with the input prompt.

https://huggingface.co/docs/transformers/en/llm_tutorial#generated-output-is-too-shortlong

In [None]:
def get_response(prompt, model, tokenizer, **kwargs):
    # Define an empty dictionary for model.generate() kwargs
    generate_kwargs = {}

    # Update the generate_kwargs with any provided kwargs
    generate_kwargs.update(kwargs)

    # Generate the output
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # same device as the model
    output = tokenizer.decode(
        model.generate(
            **inputs,
            **generate_kwargs
        )[0], skip_special_tokens=True
    )
    return output

In [9]:
prompt = "What is the definition of machine learning?"

In [None]:
get_response(prompt, model_base, tokenizer)

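To return only the answer, slice off the prompt tokens: `generate` returns the prompt IDs followed by the new tokens, so the answer starts at index `inputs["input_ids"].size(1)`.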

In [10]:
def get_response_answer_only(prompt, model, tokenizer, **kwargs):
    # Define an empty dictionary for model.generate() kwargs
    generate_kwargs = {}
    # Update the generate_kwargs with any provided kwargs
    generate_kwargs.update(kwargs)
    #eos_token_id = tokenizer.eos_token_id
    # Generate the output
    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)  # same device as the model
    tokens = model.generate(
            **inputs,
            pad_token_id=tokenizer.pad_token_id,
            #eos_token_id=eos_token_id,
            **generate_kwargs
        )
    output = tokenizer.decode(tokens[0][inputs["input_ids"].size(1) :], skip_special_tokens=True)
    return {'answer':output, 'tokens':tokens}

In [11]:
get_response_answer_only(prompt, model_base, tokenizer)



{'answer': '\n\nMachine learning is a subset of artificial intelligence (',
 'tokens': tensor([[    1,  1824,   349,   272,  7526,   302,  5599,  5168, 28804,    13,
             13, 15183,  5168,   349,   264, 19804,   302, 18278, 10895,   325]],
        device='cuda:0')}
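The slicing step can be replayed by hand on the token IDs printed above. In this sketch the value `prompt_length = 9` is an assumption read off the output (the prompt ends with ID 28804, the question mark, and the decoded answer starts with the two newline tokens); in the function itself it comes from `inputs["input_ids"].size(1)`.

```python
# Token IDs copied from the output above: the prompt followed by the continuation.
sequence = [1, 1824, 349, 272, 7526, 302, 5599, 5168, 28804, 13,
            13, 15183, 5168, 349, 264, 19804, 302, 18278, 10895, 325]

# Assumed prompt length; the notebook computes it as inputs["input_ids"].size(1).
prompt_length = 9

prompt_ids = sequence[:prompt_length]   # what the tokenizer produced for the prompt
answer_ids = sequence[prompt_length:]   # what the model added

print(len(sequence), len(prompt_ids), len(answer_ids))
print(answer_ids)
```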

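Generation stops as soon as the token limit is reached, even in the middle of a sentence. No limit was passed in the call above, so the library default applies: the output holds 20 token IDs in total, prompt included.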

In [13]:
### allow more new tokens
get_response_answer_only(prompt, model_base, tokenizer, max_new_tokens = 100)

{'answer': '\n\nMachine learning is a subset of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations or data, such as historical data, and then the machine learning model looks for patterns in the data and learns to identify them. The application of this technology can be seen in a variety of domains, including',
 'tokens': tensor([[    1,  1824,   349,   272,  7526,   302,  5599,  5168, 28804,    13,
             13, 15183,  5168,   349,   264, 19804,   302, 18278, 10895,   325,
          11741, 28731,   369,  5312,  4918,   272,  5537,   298, 10226,  2822,
            304,  4916,   477,  2659,  1671,  1250, 15956,  2007,  1591, 28723,
            661, 21165,   356,   272,  4099,   302,  6074,  7034,   369,   541,
           2735,  11

In [16]:
prompt = "What is the definition of machine learning? Answer in 2 sentences"
get_response_answer_only(prompt, model_base, tokenizer, max_new_tokens = 100, temperature = .4,do_sample=True)

{'answer': '. Machine learning is a subset of artificial intelligence that utilizes statistical techniques and algorithms to enable computer systems to automatically learn and improve from experience without being explicitly programmed. It involves analyzing data to identify patterns and make predictions based on that data.',
 'tokens': tensor([[    1,  1824,   349,   272,  7526,   302,  5599,  5168, 28804, 26307,
            297, 28705, 28750, 23748, 28723, 13253,  5168,   349,   264, 19804,
            302, 18278, 10895,   369,  4479,  5004, 21256,  9804,   304, 18539,
            298,  8234,  6074,  4918,   298, 10226,  2822,   304,  4916,   477,
           2659,  1671,  1250, 15956,  2007,  1591, 28723,   661, 14657, 10148,
          14508,  1178,   298,  9051, 11533,   304,  1038, 20596,  2818,   356,
            369,  1178, 28723,     2]], device='cuda:0')}

When generating text using the Hugging Face Transformers library, controlling the output length is crucial for aligning the generated content with specific requirements. However, directly specifying the length of the generated text can sometimes lead to outputs that abruptly truncate, potentially cutting off sentences mid-way or omitting relevant information. Here, we explore strategies to manage text generation length effectively while aiming to preserve the coherence and completeness of the generated text.

### Direct Length Specifications

- max_length: Defines the total maximum length of the output sequence, including the length of input_ids. This hard limit can result in abrupt endings if the limit is reached mid-sentence.
- min_length: Sets the minimum length of the generated sequence. The generation process will not stop until at least this length is reached, providing a lower bound to ensure a minimum output size.
- max_new_tokens: Specifies the maximum number of new tokens to generate, exclusive of the input length. This parameter allows for more predictable control over the size of the generated addition but can still lead to truncated sentences.
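The relationship between these limits is simple arithmetic, sketched below. `new_token_budget` is a hypothetical helper, not a Transformers function; it assumes that `max_new_tokens` takes precedence when both limits are given.

```python
def new_token_budget(prompt_length, max_length=None, max_new_tokens=None):
    """Number of tokens generation may add, following the definitions above.

    Hypothetical helper; assumes max_new_tokens wins when both limits are set.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        # max_length counts the prompt too, so the prompt eats into the budget
        return max(max_length - prompt_length, 0)
    raise ValueError("set max_length or max_new_tokens")


# A 9-token prompt under a total limit of 20 leaves room for 11 new tokens.
print(new_token_budget(prompt_length=9, max_length=20))
# With max_new_tokens the budget does not depend on the prompt length.
print(new_token_budget(prompt_length=9, max_new_tokens=100))
```

This is why `max_new_tokens` is usually the easier knob when prompts vary in length: a long prompt cannot silently shrink the room left for the answer.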

### Overcoming Truncation Issues

To address the limitations of direct length specifications and improve the reliability of generating coherent and complete sentences, two main approaches can be considered:


- Prompt engineering embeds explicit instructions in the input to guide the model's output.
- Custom stopping criteria use programmable conditions to control more precisely where text generation ends.

## Prompt

- Description: Incorporating explicit instructions into the input prompt, such as "Answer in 3 sentences," can guide the language model (LLM) to generate text within a desired length or format.
- Limitations: While this method can influence the model's output, it does not guarantee strict adherence to the instructions. The model might generate more or fewer sentences than requested, reflecting the inherent unpredictability in how LLMs interpret and follow such instructions.
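Because the instruction is not guaranteed to be followed, it can help to check a reply after the fact. The sketch below counts sentences with a rough regular expression; `count_sentences` is a hypothetical helper and the sample text is made up. The heuristic miscounts abbreviations such as "e.g." and ignores text without closing punctuation, so treat the result as an estimate.

```python
import re


def count_sentences(text):
    """Rough sentence count: runs of '.', '!' or '?' followed by whitespace or end of text."""
    return len(re.findall(r"[.!?]+(?=\s|$)", text.strip()))


# Made-up reply used only to exercise the helper.
reply = "Machine learning learns from data. It finds patterns. It makes predictions."
requested = 3

print(count_sentences(reply))
print(count_sentences(reply) == requested)
```

A check like this could be used to retry generation or to flag replies that drift from the requested length.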

In [18]:
prompt = "What is the definition of machine learning? Answer in 3 sentences."
get_response_answer_only(prompt, model_base, tokenizer, max_new_tokens = 100, temperature = .4,do_sample=True)

{'answer': "Machine learning is a subset of artificial intelligence that enables systems to automatically learn and improve from experience without being explicitly programmed. It's based on algorithms and statistical models that analyze data and recognize patterns to make predictions or take actions. Machine learning applications include email filtering, recommendation systems, fraud detection, and self-driving cars.",
 'tokens': tensor([[    1,  1824,   349,   272,  7526,   302,  5599,  5168, 28804, 26307,
            297, 28705, 28770, 23748, 28723, 13253,  5168,   349,   264, 19804,
            302, 18278, 10895,   369, 18156,  4918,   298, 10226,  2822,   304,
           4916,   477,  2659,  1671,  1250, 15956,  2007,  1591, 28723,   661,
          28742, 28713,  2818,   356, 18539,   304, 21256,  4994,   369, 20765,
           1178,   304, 11286, 11533,   298,  1038, 20596,   442,  1388,  6768,
          28723, 13253,  5168,  8429,  3024,  4927,  5531,   288, 28725, 26077,
      

When working with the mistralai/Mistral-7B-Instruct-v0.2 model and its corresponding tokenizer, it's important to understand how punctuation marks and special sequence tokens are handled and represented. This understanding is crucial for tasks such as setting up stopping criteria for text generation based on specific punctuation marks or tokens.

**Special Tokens**

`<s>` and `</s>`: These are special tokens used to denote the start (`<s>`) and end (`</s>`) of a sequence. They are crucial for the model to understand the boundaries of the text being processed or generated.

**Punctuation Marks**

The tokenizer assigns unique token IDs to punctuation marks, treating them as distinct tokens within its vocabulary. For example, periods (.) and commas (,) are recognized and tokenized with specific IDs, allowing for precise control over text generation processes, such as stopping generation upon encountering these tokens.

**Example: Token ID Inspection**

To understand how specific tokens, including punctuation marks and special tokens, are represented within the tokenizer's vocabulary, you can convert token IDs back to their textual representation using the tokenizer's convert_ids_to_tokens method. This is particularly useful for setting up stopping criteria based on these tokens.

- Token ID 1 corresponds to the start-of-sequence token `<s>`.
- Token ID 28723 corresponds to the period (.) punctuation mark.
- Token ID 28725 corresponds to the comma (,) punctuation mark.
- Token ID 2 corresponds to the end-of-sequence token `</s>`.

In [26]:
# Example code to check what specific tokens represent
token_ids_to_check = [1,28723, 28725, 2]
tokens = tokenizer.convert_ids_to_tokens(token_ids_to_check)
tokens

['<s>', '.', ',', '</s>']

The reverse mapping, from tokens back to IDs:

In [41]:
tokenizer.convert_tokens_to_ids(tokens)

[1, 28723, 28725, 2]

## Stopping criteria

- Implementation: Custom stopping criteria, such as the `CustomStoppingCriteria` and `ConditionalStoppingCriteria` classes defined in the cells that follow, stop text generation when a specific condition is met, for example when a punctuation mark that ends a sentence is generated.
- Advantages: This method provides finer control over the generation process, allowing it to stop at natural concluding points in the text.
- Challenges: Relevant information may belong to a later sentence that is never generated, because generation stops as soon as the condition is met. Valuable context or details can be lost that would have been included had the model been allowed to continue.
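The mechanism can be pictured with a toy generation loop in pure Python: after each new token, the criterion sees the whole sequence so far and decides whether to stop. Everything here is a stand-in (`toy_generate`, `stop_on_ids`, and the scripted token list are assumptions); only the period ID 28723 comes from the token inspection earlier in this notebook.

```python
def stop_on_ids(stop_ids):
    """Build a criterion that fires when the newest token is one of stop_ids."""
    def criterion(token_ids):
        return token_ids[-1] in stop_ids
    return criterion


def toy_generate(prompt_ids, next_tokens, criterion, max_new_tokens):
    """Append scripted tokens one by one, checking the criterion after each step.

    `next_tokens` stands in for the model's predictions; a real model would
    compute each token from the sequence generated so far.
    """
    sequence = list(prompt_ids)
    for token in next_tokens[:max_new_tokens]:
        sequence.append(token)
        if criterion(sequence):
            break  # stop right after the token that satisfied the criterion
    return sequence


PERIOD_ID = 28723  # ID of '.' for this notebook's tokenizer
scripted = [15183, 5168, 349, PERIOD_ID, 661, 21165, PERIOD_ID]
result = toy_generate([1, 1824], scripted, stop_on_ids({PERIOD_ID}), max_new_tokens=10)
print(result)
```

The sequence ends at the first period, and the tokens scripted after it are never appended, which is the source of the challenge described above.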

In [69]:
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class CustomStoppingCriteria(StoppingCriteria):
    def __init__(self, stopping_ids_list):
        super().__init__()
        self.stopping_ids_list = stopping_ids_list

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Check if the last generated token of the first sequence is in the stopping_ids_list (assumes batch size 1)
        return input_ids[0, -1].item() in self.stopping_ids_list

In [70]:
# Specify the punctuation marks you want to stop at (this list includes the comma, so generation also stops at the first comma)
punctuation_tokens = ['.', ',', '?', '!']  # Define the tokens
stopping_ids = tokenizer.convert_tokens_to_ids(punctuation_tokens)  # Convert tokens to their corresponding IDs
stopping_ids

[28723, 28725, 28804, 28808]

In [71]:
# Initialize the custom stopping criteria with these IDs
stopping_criteria = CustomStoppingCriteria(stopping_ids_list=stopping_ids)
stopping_criteria.stopping_ids_list

[28723, 28725, 28804, 28808]

In [72]:
prompt = "What is the definition of machine learning?"
inputs = tokenizer(prompt, return_tensors="pt", padding=True).to("cuda")
output_sequences = model_base.generate(
    **inputs,
    max_new_tokens=100,  # Adjust based on your needs
    temperature=0.4,
    do_sample= True,
    stopping_criteria=StoppingCriteriaList([stopping_criteria])
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [73]:
output_sequences

tensor([[    1,  1824,   349,   272,  7526,   302,  5599,  5168, 28804,    13,
            13, 15183,  5168,   349,   264, 19804,   302, 18278, 10895,   369,
          5312,  4918,   272,  5537,   298, 10226,  2822,   304,  4916,   477,
          2659,  1671,  1250, 15956,  2007,  1591, 28723]], device='cuda:0')

In [74]:
tokenizer.decode(output_sequences[0][inputs["input_ids"].size(1) :], skip_special_tokens=True)

'\n\nMachine learning is a subset of artificial intelligence that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.'

### More flexibility

The code below states explicitly how many sentences to generate, within the limit set by the maximum number of new tokens.

In [75]:
from collections import defaultdict

In [76]:
class ConditionalStoppingCriteria(StoppingCriteria):
    def __init__(self, tokenizer, stopping_rules):
        super().__init__()
        # Convert token rules to their corresponding IDs and occurrence counts
        self.stopping_ids_counts = {tokenizer.convert_tokens_to_ids(token): count for token, count in stopping_rules.items()}
        # Initialize a counter to track occurrences of each token ID
        self.token_occurrences = defaultdict(int)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Update occurrences for the last generated token
        last_token_id = input_ids[0, -1].item()
        if last_token_id in self.stopping_ids_counts:
            self.token_occurrences[last_token_id] += 1
            # Check if the occurrence count meets or exceeds the required count for stopping
            if self.token_occurrences[last_token_id] >= self.stopping_ids_counts[last_token_id]:
                return True
        return False

# Example usage
stopping_rules = {
    '.': 2,  # Stop after generating 2 sentences (or encountering '.' twice)
}
stopping_criteria = ConditionalStoppingCriteria(tokenizer, stopping_rules)
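The class above keeps a running counter, so an instance reused for a second `generate` call would start from the counts left over by the first. One possible alternative, sketched here in pure Python with lists standing in for tensors, recounts the generated part of the sequence at every step and therefore holds no state between calls. `CountingStopRule` and its `prompt_length` argument are hypothetical names; the prompt length is needed so that punctuation inside the prompt is not counted.

```python
class CountingStopRule:
    """Stop once a token ID has appeared `limit` times after the prompt.

    Pure-Python stand-in: sequences are lists of IDs rather than tensors.
    """

    def __init__(self, stop_id, limit, prompt_length):
        self.stop_id = stop_id
        self.limit = limit
        self.prompt_length = prompt_length

    def __call__(self, token_ids):
        # Recount from scratch each time, skipping the prompt tokens
        generated = token_ids[self.prompt_length:]
        return generated.count(self.stop_id) >= self.limit


rule = CountingStopRule(stop_id=28723, limit=2, prompt_length=3)
# The prompt (first 3 IDs) already contains a period; it is not counted.
print(rule([1, 7, 28723, 5, 28723]))
print(rule([1, 7, 28723, 5, 28723, 9, 28723]))
```

Recounting costs a little more per step than incrementing a counter, but the same instance gives the same answer however many times it is reused.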


In [77]:
prompt = "What is the definition of machine learning?"
inputs = tokenizer(prompt, return_tensors="pt", padding=True).to("cuda")


With the stopping criteria (generation stops after the second period)

In [78]:
output_sequences = model_base.generate(
    **inputs,
    max_new_tokens=200,  # Adjust based on your needs
    temperature=0.4,
    do_sample= True,
    stopping_criteria=StoppingCriteriaList([stopping_criteria])
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [79]:
output_sequences

tensor([[    1,  1824,   349,   272,  7526,   302,  5599,  5168, 28804,    13,
            13, 15183,  5168,   349,   264, 19804,   302, 18278, 10895,   325,
         11741, 28731,   369,  5312,  4918,   272,  5537,   298, 10226,  2822,
           304,  4916,   477,  2659,  1671,  1250, 15956,  2007,  1591, 28723,
           661, 21165,   356,   272,  4099,   302,  6074,  7034,   369,   541,
          2735,  1178,   304,   938,   378,   298,  2822,   354,  3892, 28723]],
       device='cuda:0')

In [80]:
tokenizer.decode(output_sequences[0][inputs["input_ids"].size(1) :], skip_special_tokens=True)

'\n\nMachine learning is a subset of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves.'

Without stopping criteria (generation runs until `max_new_tokens` or the end-of-sequence token is reached)

In [81]:
output_sequences = model_base.generate(
    **inputs,
    max_new_tokens=200,  # Adjust based on your needs
    temperature=0.4,
    do_sample= True
)
output_sequences

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


tensor([[    1,  1824,   349,   272,  7526,   302,  5599,  5168, 28804,    13,
            13, 15183,  5168,   349,   264,  2038,   302,  1178,  5643,   369,
          4607,  1002,   272,  3667,   302, 13305,   745,  4994, 28723,   661,
         28742, 28713,  2818,   356,   272,  3028,   369,  4918,   541,  2822,
           477,  1178, 28725,  9051, 11533,   304,  1038,  9549,   395, 13383,
          2930, 20288, 28723, 13253,  5168, 18539,   938, 21256,  9804,   298,
          8234,   272,  5599,   298,  2822,   477,  1178, 28725,  9051, 11533,
           304,  1038,  9549,   395, 13383,  2930, 20288, 28723,   415,  1759,
           302,  5168, 10658,   395, 13875,   442,  1178, 28725,  1259,   390,
         10578,  1178,   442,  1178,   477, 16082,  8309, 28725,   690,   349,
           868, 28649,   298,  9051, 11533,   304, 17869, 28723,   415,  5599,
          5168,  2229,   349, 10898,   356,   456,  1178,   304,   541,   868,
           347,  1307,   298,  1038, 20596,   442,  

In [82]:
tokenizer.decode(output_sequences[0][inputs["input_ids"].size(1) :], skip_special_tokens=True)

"\n\nMachine learning is a method of data analysis that automates the building of analytical models. It's based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. Machine learning algorithms use statistical techniques to enable the machine to learn from data, identify patterns and make decisions with minimal human intervention. The process of learning begins with observations or data, such as historical data or data from sensor devices, which is then analyzed to identify patterns and trends. The machine learning model is trained on this data and can then be used to make predictions or decisions on new data. Machine learning is used in a wide range of applications, from email filtering and computer vision to financial forecasting and medical diagnosis."

## Wrap everything together

In [83]:
def get_response_answer_only(prompt, model, tokenizer, stopping_rules=None, **kwargs):
    # Define an empty dictionary for model.generate() kwargs
    generate_kwargs = {}
    # Update the generate_kwargs with any provided kwargs
    generate_kwargs.update(kwargs)
    
    # Setup conditional stopping criteria if stopping rules are provided
    if stopping_rules:
        stopping_criteria = ConditionalStoppingCriteria(tokenizer, stopping_rules)
        generate_kwargs['stopping_criteria'] = StoppingCriteriaList([stopping_criteria])
    
    # Generate the output
    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)  # same device as the model
    tokens = model.generate(
        **inputs,
        pad_token_id=tokenizer.pad_token_id,
        **generate_kwargs
    )
    output = tokenizer.decode(tokens[0][inputs["input_ids"].size(1) :], skip_special_tokens=True)
    return {'answer':output, 'tokens':tokens}

In [84]:
prompt = "What is the definition of machine learning?"
stopping_rules = {
    '.': 1,  # Example rule: stop after 1 period
}
response = get_response_answer_only(
    prompt,
    model_base, 
    tokenizer,
    stopping_rules=stopping_rules,
    max_new_tokens=100,
    temperature=0.4, 
    do_sample=True
)
response['answer']

'\n\nMachine learning is a subset of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.'

In [85]:
prompt = "What is the definition of machine learning?"
stopping_rules = {
    '.': 2,  # Example rule: stop after 2 periods
}
response = get_response_answer_only(
    prompt,
    model_base, 
    tokenizer,
    stopping_rules=stopping_rules,
    max_new_tokens=100,
    temperature=0.4, 
    do_sample=True
)
response['answer']

'\n\nMachine learning is a subset of artificial intelligence that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves.'

In [86]:
prompt = "What is the definition of machine learning?"
stopping_rules = {
    '.': 3,  # Example rule: stop after 3 periods
}
response = get_response_answer_only(
    prompt,
    model_base, 
    tokenizer,
    stopping_rules=stopping_rules,
    max_new_tokens=100,
    temperature=0.4, 
    do_sample=True
)
response['answer']

'Machine learning is a subset of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for inherent patterns in data and make better decisions in the future based on the examples that we provide.'