#### Data augmentation

If we do not have enough data, we can use data augmentation techniques to generate additional examples from the existing data. This section implements two techniques commonly used in NLP: translation and synonym replacement.



In [None]:

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
import polars as pl
from langchain_core.prompts import PromptTemplate
from logging import getLogger

logger = getLogger(__name__)
logger.setLevel("INFO")


We first define a data model to represent a sentence-sentiment pair:

In [None]:
class SentenceSentimentPair(BaseModel):
    sentence: str = Field(..., description="The input sentence.")
    sentiment: str = Field(..., description="The sentiment label for the sentence.")
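
The prompts used later in this section ask for one of three labels (positive, negative, neutral), while the `sentiment` field above accepts any string. As a minimal sketch, and only as an assumption about how the schema could be tightened, the label can be restricted with `typing.Literal` so that an unexpected label fails validation. `SentimentLabel`, `StrictSentenceSentimentPair`, and `is_valid_pair` are hypothetical names that do not appear in the original notebook.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

# Hypothetical alias: the three labels named in the prompts.
SentimentLabel = Literal["positive", "negative", "neutral"]


class StrictSentenceSentimentPair(BaseModel):
    """Hypothetical variant of SentenceSentimentPair with a closed label set."""

    sentence: str = Field(..., description="The input sentence.")
    sentiment: SentimentLabel = Field(..., description="The sentiment label for the sentence.")


def is_valid_pair(sentence: str, sentiment: str) -> bool:
    """Return True if the pair passes validation, False otherwise."""
    try:
        StrictSentenceSentimentPair(sentence=sentence, sentiment=sentiment)
    except ValidationError:
        return False
    return True
```

With this variant, a label outside the three expected values is rejected at construction time instead of silently entering the augmented dataset.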

We start with our data augmentation techniques:

In [None]:
llm = ChatOpenAI(model_name="Qwen/", temperature=0.1, base_url="http").with_structured_output(SentenceSentimentPair)

def translation(target_language: str, sentence: str) -> SentenceSentimentPair:
    """Translate `sentence` into `target_language` and return it with its sentiment label."""
    # TODO: refine this prompt.
    prompt = PromptTemplate.from_template(
        "Translate the following sentence to {target_language} while preserving its sentiment.\n\n"
        "Sentence: {sentence}\n\n"
        "Provide the translated sentence and its sentiment label (positive, negative, neutral)."
    )
    return llm.invoke(prompt.format(target_language=target_language, sentence=sentence))

def synonym_replacement(sentence: str) -> SentenceSentimentPair:
    """Rewrite `sentence` with synonyms and return it with its sentiment label."""
    # TODO: refine this prompt.
    prompt = PromptTemplate.from_template(
        "Replace words in the following sentence with their synonyms while preserving its sentiment.\n\n"
        "Sentence: {sentence}\n\n"
        "Provide the modified sentence and its sentiment label (positive, negative, neutral)."
    )
    return llm.invoke(prompt.format(sentence=sentence))

techniques = {translation, synonym_replacement}
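
Because `translation` takes an extra `target_language` argument, the two functions cannot be called with the same arguments. One possible way to apply them uniformly, sketched here under stated assumptions, is to bind the extra argument with `functools.partial` and loop over a name-to-function mapping. The stubs below stand in for the LLM-backed functions so the sketch runs offline; `Pair`, `fake_translation`, `fake_synonym_replacement`, `augment`, and `stub_techniques` are hypothetical names, and "French" is an arbitrary target language chosen for illustration.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, List


@dataclass
class Pair:
    """Stand-in for SentenceSentimentPair (hypothetical, avoids external dependencies)."""

    sentence: str
    sentiment: str


def fake_translation(target_language: str, sentence: str) -> Pair:
    # Stub: tags the sentence with the language instead of calling an LLM.
    return Pair(f"[{target_language}] {sentence}", "positive")


def fake_synonym_replacement(sentence: str) -> Pair:
    # Stub: one hard-coded substitution instead of calling an LLM.
    return Pair(sentence.replace("good", "great"), "positive")


def augment(
    sentences: List[str], techniques: Dict[str, Callable[[str], Pair]]
) -> List[Pair]:
    """Apply every technique to every sentence and collect the results."""
    augmented = []
    for sentence in sentences:
        for technique in techniques.values():
            augmented.append(technique(sentence))
    return augmented


# Bind target_language so that every technique takes a single sentence.
stub_techniques = {
    "translation_fr": partial(fake_translation, "French"),
    "synonym_replacement": fake_synonym_replacement,
}

augmented_pairs = augment(["The food was good."], stub_techniques)
```

A dictionary keeps a stable iteration order and gives each technique a name that can be logged or stored alongside the generated example, which a set of functions does not provide.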

