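# RAG Quality Evaluation

- faithfulness : Is the answer factually consistent with the retrieved context?
- answer_relevancy : Does the answer actually address the question?
- context_precision : How much of the retrieved context is actually needed to answer the question?
- context_recall : How completely was the context required for the reference answer retrieved?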
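
As a rough illustration of what these scores measure, here is a minimal sketch that computes three of them from hand-labeled judgments. It assumes the yes/no judgments are already available (in ragas they come from an LLM judge), and the function names are hypothetical, not the ragas API. The rank-aware form of context precision follows the commonly documented "mean of precision@k over relevant ranks" definition.

```python
def faithfulness(claim_supported: list[bool]) -> float:
    """Share of claims in the answer that the retrieved context supports."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def context_recall(reference_claim_found: list[bool]) -> float:
    """Share of reference-answer claims that can be found in the retrieved context."""
    if not reference_claim_found:
        return 0.0
    return sum(reference_claim_found) / len(reference_claim_found)


def context_precision(chunk_relevant: list[bool]) -> float:
    """Rank-aware precision: mean of precision@k taken at each relevant rank."""
    hits = 0
    precision_sum = 0.0
    for k, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


# Hypothetical judgments for a single question
print(faithfulness([True, True, False, True]))  # 3 of 4 answer claims supported
print(context_recall([True, False]))            # 1 of 2 reference claims retrieved
print(context_precision([True, False, True]))   # relevant chunks at ranks 1 and 3
```

Because context precision is rank-aware, the same set of chunks scores higher when the relevant ones are retrieved first.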

## Quality Evaluation Steps
1. Build the test dataset
2. Build the RAG pipeline
3. Evaluate
4. Improve and repeat

```
uv add ragas
```

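## Concept Behind Building the Test Dataset
1. Personas
    1) Personas that fit the dataset (auto-generated)
    2) Personas I want to add (custom)
2. Scenarios
    1) Build each answer from a single chunk (doc): single-hop
    2) Build each answer from several chunks (docs): multi-hop
3. Query-type weights
    1) Default: 5 : 2.5 : 2.5
    2) Used in this notebook: 4 : 3 : 3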
In [1]:
# 1. Load the document
# Modules for loading and splitting documents
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Modules for the chat model and the embedding model
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [2]:
pdf_path = "../data/Sustainability_report_2024_kr.pdf"
loader = PyPDFLoader(pdf_path)
docs = loader.load()
print(len(docs))

83


In [3]:
# 2. Chunk the documents
splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 100
)
# Only the first 20 pages are chunked
chunks = splitter.split_documents(docs[:20])
print(len(chunks))

48
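
To show what `chunk_size` and `chunk_overlap` mean, here is a minimal sketch using a plain fixed-width sliding window. It is a simplification: `RecursiveCharacterTextSplitter` prefers to cut at separators such as paragraph and line breaks, so its real chunk boundaries will differ. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into chunks of at most chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(25))
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each chunk starts with the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.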


In [4]:
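# 3. Set up scenarios and generate personas
from ragas.llms import LangchainLLMWrapper
# Note: this rebinds the name OpenAIEmbeddings imported from langchain_openai in In [1]
from ragas.embeddings import OpenAIEmbeddings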
import openai

gen_llm = LangchainLLMWrapper(ChatOpenAI(model = "gpt-4.1-mini"))
openai_client = openai.OpenAI()
gen_embeddings = OpenAIEmbeddings(client=openai_client)



In [5]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm = gen_llm,
    embedding_model = gen_embeddings
)

In [6]:
generator.persona_list

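## Auto-generated Personas + Custom Personas

1. A test set has to be generated once first
2. That run produces the auto-generated personas
3. Then the custom personas are added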

In [7]:
dataset_test = generator.generate_with_langchain_docs(
    documents = chunks,
    testset_size = 1
)

Applying HeadlinesExtractor:   0%|          | 0/39 [00:00<?, ?it/s]

Applying HeadlineSplitter:   0%|          | 0/48 [00:00<?, ?it/s]

Applying SummaryExtractor:   0%|          | 0/46 [00:00<?, ?it/s]

Property 'summary' already exists in node '714ba2'. Skipping!
Property 'summary' already exists in node '6e78e4'. Skipping!
Property 'summary' already exists in node '2aebe5'. Skipping!
Property 'summary' already exists in node 'e8235d'. Skipping!
Property 'summary' already exists in node '88729d'. Skipping!
Property 'summary' already exists in node '03ebf0'. Skipping!
Property 'summary' already exists in node 'b6dd84'. Skipping!


Applying CustomNodeFilter:   0%|          | 0/75 [00:00<?, ?it/s]

Applying EmbeddingExtractor:   0%|          | 0/46 [00:00<?, ?it/s]

Property 'summary_embedding' already exists in node '03ebf0'. Skipping!
Property 'summary_embedding' already exists in node '2aebe5'. Skipping!
Property 'summary_embedding' already exists in node '6e78e4'. Skipping!
Property 'summary_embedding' already exists in node '714ba2'. Skipping!
Property 'summary_embedding' already exists in node 'b6dd84'. Skipping!
Property 'summary_embedding' already exists in node '88729d'. Skipping!
Property 'summary_embedding' already exists in node 'e8235d'. Skipping!


Applying ThemesExtractor:   0%|          | 0/66 [00:00<?, ?it/s]

Applying NERExtractor:   0%|          | 0/66 [00:00<?, ?it/s]

Applying CosineSimilarityBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/3 [00:00<?, ?it/s]

In [8]:
generator.persona_list

[Persona(name='Sustainability Manager', role_description='Oversees and integrates environmental, social, and governance (ESG) topics within company operations to meet stakeholder expectations and regulatory requirements.'),
 Persona(name='Global Technology Business Analyst', role_description="Analyzes Samsung Electronics' global operations, financial performance, and organizational strategy to inform business decisions and growth opportunities."),
 Persona(name='Sustainability Manager', role_description='Oversees the identification and management of environmental, social, and governance (ESG) topics to align company operations with stakeholder expectations and regulatory requirements.')]

In [None]:
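# Prompt given to Gemini
# Write out these personas in Korean.
# Then I want to add some other personas.
# I'll add about three:
# - Stock investor
# - Job seeker
# - Partner-company representative
# Write them in the same format as the English ones above.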

In [9]:
test_df = dataset_test.to_pandas()
test_df

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | 삼성전자는 글로벌 비재무정보 공시 제도의 확산에 대응하여 지속가능성 보고지침 관련 ... | [CEO 메시지\nMessage from \nOur CEO\n주주, 고객, 협력회사... | 삼성전자는 글로벌 비재무정보 공시 제도의 확산에 맞춰 기업의 지속가능경영 활동 정보... | single_hop_specific_query_synthesizer |
| 1 | How does Samsung Electronics integrate 폐전자제품(e... | [<1-hop>\n\n삼성전자 지속가능경영보고서 2024\n15\nOur Compa... | Samsung Electronics pursues a Circular economy... | multi_hop_abstract_query_synthesizer |
| 2 | What were Samsung Electronics' key initiatives... | [<1-hop>\n\n제품군별 자원순환형 포장재 사용 사례\n디스플레이/가전 모바일... | In 2022, Samsung Electronics implemented sever... | multi_hop_specific_query_synthesizer |


In [10]:
# Build the custom personas
from ragas.testset.persona import Persona
custom_personas = [
    Persona(name='Stock Investor', role_description='Analyzes the company\'s financial statements, market position, growth potential, and corporate governance structure to evaluate its investment attractiveness and maximize shareholder return.'),
    Persona(name='Job Seeker', role_description='Researches the company\'s organizational culture, employee benefits, career development programs, and recruitment process to determine potential employment fit.'),
    Persona(name='Supplier/Partner Company Representative', role_description='Focuses on long-term partnership strategies, compliance with ethical sourcing policies, efficiency of procurement processes, and fairness of payment terms.')
]

In [11]:
auto_persona = generator.persona_list
auto_persona

[Persona(name='Sustainability Manager', role_description='Oversees and integrates environmental, social, and governance (ESG) topics within company operations to meet stakeholder expectations and regulatory requirements.'),
 Persona(name='Global Technology Business Analyst', role_description="Analyzes Samsung Electronics' global operations, financial performance, and organizational strategy to inform business decisions and growth opportunities."),
 Persona(name='Sustainability Manager', role_description='Oversees the identification and management of environmental, social, and governance (ESG) topics to align company operations with stakeholder expectations and regulatory requirements.')]
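
The auto-generated list above contains two personas named 'Sustainability Manager'. If duplicate names are not wanted in the final list, the two lists could be merged with de-duplication by name. Here is a minimal sketch under that assumption. `PersonaStub` is a hypothetical stand-in for `ragas.testset.persona.Persona` so that the example runs without ragas, and the role descriptions are shortened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaStub:
    """Hypothetical stand-in with the same two fields shown in the output above."""
    name: str
    role_description: str


def merge_personas(auto: list, custom: list) -> list:
    """Concatenate both lists, keeping only the first persona seen for each name."""
    merged = {}
    for persona in [*auto, *custom]:
        merged.setdefault(persona.name, persona)
    return list(merged.values())


auto_stub = [
    PersonaStub("Sustainability Manager", "Oversees and integrates ESG topics."),
    PersonaStub("Global Technology Business Analyst", "Analyzes global operations."),
    PersonaStub("Sustainability Manager", "Oversees identification of ESG topics."),
]
custom_stub = [PersonaStub("Stock Investor", "Evaluates investment attractiveness.")]

for persona in merge_personas(auto_stub, custom_stub):
    print(persona.name)
```

Since the function only reads `.name`, the same logic should work on real `Persona` objects.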

In [12]:
generator.persona_list = auto_persona + custom_personas
generator.persona_list

[Persona(name='Sustainability Manager', role_description='Oversees and integrates environmental, social, and governance (ESG) topics within company operations to meet stakeholder expectations and regulatory requirements.'),
 Persona(name='Global Technology Business Analyst', role_description="Analyzes Samsung Electronics' global operations, financial performance, and organizational strategy to inform business decisions and growth opportunities."),
 Persona(name='Sustainability Manager', role_description='Oversees the identification and management of environmental, social, and governance (ESG) topics to align company operations with stakeholder expectations and regulatory requirements.'),
 Persona(name='Stock Investor', role_description="Analyzes the company's financial statements, market position, growth potential, and corporate governance structure to evaluate its investment attractiveness and maximize shareholder return."),
 Persona(name='Job Seeker', role_description="Researches the

In [13]:
# Adjust the query-type ratio
from ragas.testset.synthesizers.multi_hop import (
    MultiHopAbstractQuerySynthesizer,
    MultiHopSpecificQuerySynthesizer,
)
from ragas.testset.synthesizers.single_hop.specific import (
    SingleHopSpecificQuerySynthesizer,
)
from ragas.llms.base import llm_factory

ragas_llm = llm_factory(model = "gpt-4.1-mini")

scenarios = [
    (SingleHopSpecificQuerySynthesizer(llm=ragas_llm), 0.4),
    (MultiHopAbstractQuerySynthesizer(llm=ragas_llm), 0.3),
    (MultiHopSpecificQuerySynthesizer(llm=ragas_llm), 0.3)
]

In [15]:
dataset = generator.generate(
    testset_size = 100,
    query_distribution = scenarios
)

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/99 [00:00<?, ?it/s]

In [16]:
dataset_df = dataset.to_pandas()
dataset_df.head()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | Why 독일 supply chain due diligence law importan... | [CEO 메시지\nMessage from \nOur CEO\n주주, 고객, 협력회사... | 독일에서는 공급망의 인권과 근로환경 관리를 의무화하는 공급망실사법이 2023년 발효... | single_hop_specific_query_synthesizer |
| 1 | 삼성전자의 지속가능경영 전략과 주요 이슈 관리는 어떻게 이루어지고 있나요? | [접수된 고충의 처리 원칙에 대한 기준을 수립하였고, 공급망 관리에 \n있어서는 비... | 삼성전자는 지속가능경영보고서를 통해 글로벌 공시규제 프레임워크에 맞춰 주요 지속가능... | single_hop_specific_query_synthesizer |
| 2 | What Samsung Global Code of Conduct mean for c... | [삼성전자 지속가능경영보고서 2024\n05\nOur Company Appendix... | Samsung Electronics established five core valu... | single_hop_specific_query_synthesizer |
| 3 | What was the global operational footprint of S... | [Our Company AppendixMateriality Assessment Fa... | By the end of 2023년, Samsung Electronics had a... | single_hop_specific_query_synthesizer |
| 4 | What are the 2023 financial results for the ma... | [주요 사업부\n Device eXperienceDX\nDS Device Solut... | The sales and operating profit figures for the... | single_hop_specific_query_synthesizer |
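
The first five rows are all `single_hop_specific_query_synthesizer`, so `head()` alone does not show whether the requested 4 : 3 : 3 ratio was met. Here is a minimal sketch for checking the realized shares from the `synthesizer_name` column. `synthesizer_shares` is a hypothetical helper and the list below is made-up sample data, not the notebook's actual result.

```python
from collections import Counter


def synthesizer_shares(names: list[str]) -> dict[str, float]:
    """Return the fraction of samples produced by each synthesizer."""
    counts = Counter(names)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Made-up column values for illustration only
example_names = (
    ["single_hop_specific_query_synthesizer"] * 4
    + ["multi_hop_abstract_query_synthesizer"] * 3
    + ["multi_hop_specific_query_synthesizer"] * 3
)
for name, share in synthesizer_shares(example_names).items():
    print(f"{name}: {share:.2f}")
```

On the real DataFrame, `dataset_df["synthesizer_name"].value_counts(normalize=True)` gives the same information directly in pandas.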


In [17]:
# Save the generated test set to Excel (pandas needs openpyxl to write .xlsx)
dataset_df.to_excel("report_2024_test.xlsx",
                    index = False)
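
One thing to watch when reloading this file: `reference_contexts` holds Python lists, and Excel cells cannot store lists, so the column is likely to come back as the string form of each list. Here is a minimal sketch for turning such strings back into lists, assuming they were written as Python list literals. `parse_contexts` is a hypothetical helper.

```python
import ast


def parse_contexts(cell):
    """Turn a stringified list such as "['a', 'b']" back into a real list."""
    if isinstance(cell, list):
        return cell  # already a list, nothing to do
    return ast.literal_eval(cell)


stored = str(["CEO message\nfirst chunk", "second chunk"])  # what a cell might hold
restored = parse_contexts(stored)
print(type(restored).__name__, len(restored))
```

After `pd.read_excel("report_2024_test.xlsx")`, this could be applied with `df["reference_contexts"].apply(parse_contexts)`.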