# Evaluation
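- faithfulness : Is the answer factually grounded in the retrieved context? (hallucination check)
- answer_relevancy : Does the answer actually address the question? (question-answer relevance)
- context_precision : Are the retrieved context chunks relevant to the question, with the relevant ones ranked near the top? (retrieval precision)
- context_recall : How well did retrieval fetch the context needed for the correct answer? (retrieval quality)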
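After an LLM has judged which retrieved chunks are relevant and which claims are supported, these metrics reduce to simple ratios. The sketch below assumes those verdicts and the embeddings are already available (in ragas they come from LLM and embedding calls). It follows the formulas given in the ragas documentation. All function names are hypothetical, and the library's own implementations handle extra cases that are not shown here.

```python
from math import sqrt


def context_precision(verdicts: list[int]) -> float:
    """Mean of precision@k taken at every rank k that holds a relevant chunk.

    verdicts[k - 1] is 1 if the chunk at rank k was judged relevant, else 0.
    """
    total_relevant = sum(verdicts)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for k, verdict in enumerate(verdicts, start=1):
        hits += verdict
        if verdict:
            score += hits / k  # precision@k, counted only where v_k = 1
    return score / total_relevant


def share(flags: list[bool]) -> float:
    # fraction of True values; 0.0 for an empty list
    return sum(flags) / len(flags) if flags else 0.0


def faithfulness(answer_claims_supported: list[bool]) -> float:
    # answer claims backed by the retrieved context / all answer claims
    return share(answer_claims_supported)


def context_recall(reference_claims_found: list[bool]) -> float:
    # reference-answer claims found in the retrieved context / all reference claims
    return share(reference_claims_found)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_relevancy(question_vec: list[float], regenerated_vecs: list[list[float]]) -> float:
    # mean cosine similarity between the user question and questions
    # an LLM reconstructs from the answer alone
    if not regenerated_vecs:
        return 0.0
    return sum(cosine(question_vec, v) for v in regenerated_vecs) / len(regenerated_vecs)


# Same two relevant chunks, ranked higher vs. lower
print(context_precision([1, 0, 1]))
print(context_precision([0, 1, 1]))
```

Both calls score the same two relevant chunks. The second list ranks them lower, so it gets a lower score. This is the ranking sensitivity that context precision is designed to capture.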

## TODO
1. Build the test dataset (+ generate personas)
2. Build the RAG pipeline
3. Evaluate
4. Iterate and improve

# 1. Building the test dataset (+ generating personas)

In [4]:
from langchain_community.document_loaders import PyPDFLoader  # load
from langchain_text_splitters import RecursiveCharacterTextSplitter  # split
from langchain_openai import OpenAIEmbeddings, ChatOpenAI  # embeddings / chat model

## 1-1 Personas
1) Personas that match the dataset
2) Personas that match customers' use cases

## 1-2 Scenarios
1) Questions whose answer comes from a single chunk (single-hop)
2) Questions whose answer needs several chunks (multi-hop)

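## 1-3 Setting weights across question types
Assign relative weights to the question types (single-hop specific : multi-hop abstract : multi-hop specific)
1) 5 : 2.5 : 2.5
2) 4 : 3 : 3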
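These ratios become the weights in the `query_distribution` passed to `generator.generate`. The ratio 4 : 3 : 3 normalises to 0.4 / 0.3 / 0.3, and 5 : 2.5 : 2.5 normalises to 0.5 / 0.25 / 0.25. Below is a minimal sketch of that arithmetic. It also shows one way to turn the weights into per-type sample counts for a test set of 100, using largest-remainder rounding. Both helpers are hypothetical, and ragas may round differently internally.

```python
def normalise(parts: list[float]) -> list[float]:
    # turn a ratio such as 4 : 3 : 3 into weights that sum to 1
    total = sum(parts)
    return [p / total for p in parts]


def split_counts(weights: list[float], n: int) -> list[int]:
    # floor every share, then give the leftover samples to the largest remainders
    raw = [w * n for w in weights]
    counts = [int(x) for x in raw]
    leftover = n - sum(counts)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


for parts in ([5, 2.5, 2.5], [4, 3, 3]):
    weights = normalise(parts)
    print(parts, weights, split_counts(weights, 100))
```

With `testset_size = 100`, the 4 : 3 : 3 split works out to 40 / 30 / 30 questions.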

In [5]:
# 1. load
pdf_path = '../data/Sustainability_report_2024_kr.pdf'
docs = PyPDFLoader(pdf_path).load()
print(len(docs))

83


In [6]:
# 2. chunk
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)

chunks = splitter.split_documents(docs[:20])  # first 20 pages only; a real run should split the whole document
print(len(chunks))

48
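Setting `chunk_overlap=100` means consecutive chunks share up to 100 characters. Text near a boundary therefore appears in both neighbours and is less likely to lose its context. `RecursiveCharacterTextSplitter` first tries to split on separators (paragraphs, lines, spaces) before falling back to single characters, so its chunks vary in length. The sketch below ignores that behaviour and shows only the fixed-window arithmetic. `sliding_chunks` is a hypothetical helper, not a LangChain API.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    # fixed-size character windows; each window starts chunk_size - chunk_overlap
    # characters after the previous one, so neighbours share chunk_overlap chars
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
parts = sliding_chunks(sample)
print(len(parts), [len(p) for p in parts])
print(parts[0][-100:] == parts[1][:100])  # the 100-character overlap
```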


In [7]:
# 3. Set up scenarios and generate personas
from ragas.llms.base import llm_factory
from ragas.embeddings import OpenAIEmbeddings  # ragas' own wrapper; rebinds the langchain_openai name imported earlier

import openai

eval_llm = llm_factory('gpt-4o-mini') 
openai_client = openai.OpenAI()
get_embeddings = OpenAIEmbeddings(client=openai_client)

In [8]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm=eval_llm,
    embedding_model=get_embeddings,
)

In [None]:
generator.persona_list

## Auto-generated personas (3 by default) + custom personas
1. Generate one basic test set
2. Inspect the auto-generated personas
3. Add custom personas

In [None]:
dataset_test = generator.generate_with_langchain_docs(
    documents=chunks,
    testset_size=1  # asked for 1, yet 3 rows come back, apparently one per default query synthesizer (see synthesizer_name below)
)

Applying HeadlinesExtractor: 100%|██████████| 39/39 [01:15<00:00,  1.93s/it]
Applying HeadlineSplitter: 100%|██████████| 48/48 [00:00<00:00, 556.85it/s]
Applying SummaryExtractor:  80%|███████▉  | 35/44 [01:04<00:12,  1.41s/it]Property 'summary' already exists in node 'c3bce0'. Skipping!
Property 'summary' already exists in node '39aa51'. Skipping!
Property 'summary' already exists in node '59e6f9'. Skipping!
Property 'summary' already exists in node 'f9595c'. Skipping!
Property 'summary' already exists in node '363d35'. Skipping!
Applying SummaryExtractor: 100%|██████████| 44/44 [01:24<00:00,  1.92s/it]
Applying CustomNodeFilter: 100%|██████████| 79/79 [02:49<00:00,  2.15s/it]
  property_name, property_value = await self.extract(node)
Applying EmbeddingExtractor:  82%|████████▏ | 36/44 [00:06<00:01,  6.57it/s]Property 'summary_embedding' already exists in node '59e6f9'. Skipping!
Property 'summary_embedding' already exists in node '39aa51'. Skipping!
Property 'summary_embedding' alrea

In [31]:
generator.persona_list  # personas auto-generated with the default count (3)

[Persona(name='Corporate Sustainability Manager', role_description='Leads initiatives to ensure compliance with sustainability regulations and promotes transparency in environmental and social governance practices.'),
 Persona(name='Sustainability Manager', role_description='Responsible for overseeing and implementing sustainable practices within an organization, focusing on carbon neutrality and resource efficiency.'),
 Persona(name='Sustainability Manager', role_description='Responsible for implementing and overseeing sustainable practices within the company, particularly in managing labor conditions and environmental impact.')]

In [37]:
test_df = dataset_test.to_pandas()
test_df['reference_contexts'][1]  # not displayed: a notebook cell shows only its last expression
test_df

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | How is the United States involved in the compa... | [지속가능한 미래를 위한 노력을 계속해 \n왔습니다. 2050년 탄소중립을 통해 글... | In the context of the company's sustainability... | single_hop_specific_query_synthesizer |
| 1 | How does Samsung Electronics' sustainability m... | [\<1-hop\>\n\n삼성전자 지속가능경영보고서 2024\n18\nOur Compa... | Samsung Electronics' sustainability management... | multi_hop_abstract_query_synthesizer |
| 2 | 2022년 삼성전자가 온실가스 감축을 위해 뭐 했는지, 그리고 협력회사랑 어떻게 같... | [\<1-hop\>\n\n전 세계가 당면한 기후위기 해결은 모두가 동참해야 하는 과제이... | 2022년 삼성전자는 온실가스 감축을 추진하기 위해 운영체계를 정립하고 실질적인 감... | multi_hop_specific_query_synthesizer |


In [32]:
# Create custom personas
from ragas.testset.persona import Persona

custom_persona = [
    Persona(name='Investor', role_description='Focuses on evaluating the company’s financial performance, ESG strategies, and long-term growth potential to make informed investment decisions.'),
    Persona(name='Job Seeker', role_description='Explores career opportunities within the company, paying attention to corporate culture, stability, and sustainability-driven values.'),
    Persona(name='Business Partner', role_description='Collaborates with the company as a supplier or partner, emphasizing transparency, fair trade, and shared sustainability goals.')
]

In [None]:
generator.persona_list = generator.persona_list + custom_persona
generator.persona_list # 3 + 3

[Persona(name='Corporate Sustainability Manager', role_description='Leads initiatives to ensure compliance with sustainability regulations and promotes transparency in environmental and social governance practices.'),
 Persona(name='Sustainability Manager', role_description='Responsible for overseeing and implementing sustainable practices within an organization, focusing on carbon neutrality and resource efficiency.'),
 Persona(name='Sustainability Manager', role_description='Responsible for implementing and overseeing sustainable practices within the company, particularly in managing labor conditions and environmental impact.'),
 Persona(name='Investor', role_description='Focuses on evaluating the company’s financial performance, ESG strategies, and long-term growth potential to make informed investment decisions.'),
 Persona(name='Job Seeker', role_description='Explores career opportunities within the company, paying attention to corporate culture, stability, and sustainability-driv

In [None]:
from ragas.testset.synthesizers.multi_hop import (
    MultiHopAbstractQuerySynthesizer,
    MultiHopSpecificQuerySynthesizer,
)
from ragas.testset.synthesizers.single_hop.specific import (
    SingleHopSpecificQuerySynthesizer,
)
from ragas.llms.base import llm_factory

ragas_llm = llm_factory(model="gpt-4.1-mini")

# Query-type weights (4 : 3 : 3)
scenarios = [
    (SingleHopSpecificQuerySynthesizer(llm=ragas_llm), 0.4),  # single-chunk questions
    (MultiHopAbstractQuerySynthesizer(llm=ragas_llm), 0.3),  # multi-chunk, abstract questions
    (MultiHopSpecificQuerySynthesizer(llm=ragas_llm), 0.3),  # multi-chunk, specific questions
]

In [10]:
generator

TestsetGenerator(llm=LangchainLLMWrapper(langchain_llm=ChatOpenAI(...)), embedding_model=OpenAIEmbeddings(provider='openai', model='text-embedding-3-small', client=<OpenAI:sync>), knowledge_graph=KnowledgeGraph(nodes: 0, relationships: 0), persona_list=None)

In [None]:
dataset = generator.generate(
    testset_size=100,
    query_distribution=scenarios,
)

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios: 100%|██████████| 3/3 [04:59<00:00, 99.90s/it] 
Generating Samples: 100%|██████████| 100/100 [03:23<00:00,  2.03s/it]


In [None]:
dataset_df = dataset.to_pandas()
dataset_df.head()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | How company use renewable energy in 미국? | [지속가능한 미래를 위한 노력을 계속해 \n왔습니다. 2050년 탄소중립을 통해 글... | In the DX division aiming for carbon neutralit... | single_hop_specific_query_synthesizer |
| 1 | Could you please explain the role and signific... | [고충의 처리 원칙에 대한 기준을 수립하였고, 공급망 관리에 \n있어서는 비제조 분... | In March 2023, Samsung Electronics established... | single_hop_specific_query_synthesizer |
| 2 | Can you explane in detail what the 5가지 핵심가치 ar... | [삼성전자 지속가능경영보고서 2024\n05\nOur Company Appendix... | Samsung Electronics has established 5가지 핵심가치 (... | single_hop_specific_query_synthesizer |
| 3 | What does DS stand for in Samsung Electronics'... | [Our Company AppendixMateriality Assessment Fa... | In Samsung Electronics, DS stands for Device S... | single_hop_specific_query_synthesizer |
| 4 | How Mobile eXperience business doing in sales ... | [매출\n169조 9,923억 원\n영업이익\n14조 3,847억 원 네트워크\nM... | The Mobile eXperience business reported sales ... | single_hop_specific_query_synthesizer |
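`head()` shows only the first five rows, all from the single-hop synthesizer. To check whether all 100 rows follow the 0.4 / 0.3 / 0.3 weights, pandas offers `dataset_df['synthesizer_name'].value_counts(normalize=True)`. Below is a stdlib sketch of the same comparison. `mix_report` is a hypothetical helper, and the toy list is made-up input rather than generated data.

```python
from collections import Counter


def mix_report(names: list[str], expected: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return {synthesizer_name: (observed share, expected share)}."""
    counts = Counter(names)
    total = len(names)
    return {
        name: (counts.get(name, 0) / total if total else 0.0, weight)
        for name, weight in expected.items()
    }


expected = {
    "single_hop_specific_query_synthesizer": 0.4,
    "multi_hop_abstract_query_synthesizer": 0.3,
    "multi_hop_specific_query_synthesizer": 0.3,
}

# Toy input only; in the notebook pass dataset_df['synthesizer_name'].tolist()
toy_names = ["single_hop_specific_query_synthesizer"] * 2 + ["multi_hop_specific_query_synthesizer"]
for name, (observed, target) in mix_report(toy_names, expected).items():
    print(f"{name}: observed {observed:.2f}, expected {target:.2f}")
```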


In [39]:
dataset_df.to_excel('report_2024_test.xlsx', index=False)