# Retrieval-Augmented Generation
## Building a Simple RAG System

In [1]:
pip install google-search-results sentence-transformers openai



In [2]:
from serpapi import GoogleSearch
from sentence_transformers import SentenceTransformer, util
import torch
import openai
from google.colab import userdata

SERPAPI_KEY = userdata.get('SERP_API_KEY')
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
openai.api_key = OPENAI_API_KEY


In [3]:
def search(query):
    params = {
        "q": query,
        "hl": "en",
        "gl": "us",
        "google_domain": "google.com",
        "api_key": SERPAPI_KEY,
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    return results

model = SentenceTransformer('all-mpnet-base-v2')



In [4]:
def retrieve_snippets(query, results, top_k=3):
    # Collect the non-empty snippets from the organic search results
    snippets = [
        result["snippet"]
        for result in results.get("organic_results", [])
        if result.get("snippet")
    ]
    if not snippets:
        return []

    query_embedding = model.encode(query, convert_to_tensor=True)
    snippet_embeddings = model.encode(snippets, convert_to_tensor=True)

    # Rank snippets by cosine similarity to the query
    cosine_scores = util.cos_sim(query_embedding, snippet_embeddings)[0]
    # Clamp k so torch.topk does not fail when fewer snippets are returned
    top_results = torch.topk(cosine_scores, k=min(top_k, len(snippets)))

    return [snippets[i] for i in top_results.indices.tolist()]
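
The ranking step above can be sketched without any model, using only the standard library. This is a minimal illustration, not the notebook's implementation: `cosine` and `top_k_by_cosine` are hypothetical helper names, and the 2-D vectors are made-up stand-ins for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_by_cosine(query_vec, doc_vecs, top_k=3):
    """Return the indices of the top_k most similar vectors, best first."""
    k = min(top_k, len(doc_vecs))  # never ask for more items than exist
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-D "embeddings" (hypothetical values, for illustration only)
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k_by_cosine([1.0, 0.1], docs, top_k=5))  # → [0, 2, 1]
```

Asking for five results when only three vectors exist simply returns all three, which is the behavior the clamp on `k` is meant to guarantee.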


In [5]:
def generate_answer(query, context):
    messages = [
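        {
            "role": "system",
            "content": "You are a knowledgeable expert. Answer the user's query using only the information in the provided context. "
                       "If the answer is not in the context, say 'I could not find an answer to the question in the provided context.'",
        },
        {
            "role": "user",
            "content": f"Context: {context}\n\nQuery: {query}",
        },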
    ]

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.7,
        max_tokens=256
    )
    return response.choices[0].message.content


In [8]:
def rag_system(query):
    search_results = search(query)
    print(search_results)  # debug: dump the raw SerpAPI response
    relevant_snippets = retrieve_snippets(query, search_results)
    if not relevant_snippets:
        return "No information related to the query could be found."
    context = " ".join(relevant_snippets)
    answer = generate_answer(query, context)
    return answer
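
Joining snippets with a single space works for a demo, but it gives the model no way to tell snippets apart and puts no bound on the prompt size. The sketch below is one possible alternative, not part of the original notebook: `build_context` and its `max_chars` parameter are hypothetical, and a character budget is only a rough proxy for a token budget.

```python
def build_context(snippets, max_chars=1000):
    """Join snippets into a numbered context block, stopping at max_chars."""
    lines, used = [], 0
    for n, snippet in enumerate(snippets, start=1):
        line = f"[{n}] {snippet.strip()}"
        if used + len(line) > max_chars:
            break  # adding this snippet would exceed the budget
        lines.append(line)
        used += len(line) + 1  # +1 for the newline separator
    return "\n".join(lines)

print(build_context(["First snippet. ", "Second snippet."]))
```

Numbering the snippets also makes it possible to ask the model to cite which snippet supports its answer.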



In [15]:
def without_rag_system(query):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": query}
        ],
        temperature=0.7,
        max_tokens=256
    )
    return response.choices[0].message.content

In [16]:
query = "Who is the president of South Korea?"

In [18]:
print(without_rag_system(query))

At present, the president of South Korea is Yoon Suk-yeol. President Yoon took office on May 10, 2022.


In [19]:
print(rag_system(query))



In [21]:
# Usage example
query = "What are the latest advances in quantum computing?"

In [22]:
print(without_rag_system(query))

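The field of quantum computing is advancing rapidly, and there had been several notable advances as of 2023. The following are the major developments of recent years:

1. **Quantum supremacy**: In 2019, Google announced that it had achieved quantum supremacy, showing that a quantum computer can perform a specific task far faster than a classical computer. Since then, several research teams have been working to demonstrate quantum supremacy through similar experiments.

2. **Quantum hardware improvements**: Several companies, including IBM, Google, and Intel, are developing quantum processors with more qubits. IBM announced plans to develop quantum processors with hundreds of qubits by 2023 and continues to work toward that goal.

3. **Error correction techniques**: Error correction is critically important in quantum computing. In recent years, various techniques have been developed to lower error rates, and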


In [24]:
print(rag_system(query))



## Example Code Demonstrating Embedding, Indexing, and Retrieval

In [25]:
pip install faiss-cpu sentence-transformers  # use faiss-gpu if you have a compatible GPU



In [26]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Load the SentenceTransformer model
model = SentenceTransformer('all-mpnet-base-v2')


In [43]:
# Sample sentences
text_data = [
    "A man is walking his dog in the park.",
    "Children are playing with toys indoors.",
    "An artist is painting a landscape on canvas.",
    "The sun sets behind the mountain ridge.",
    "Birds are singing outside the window."
]


In [39]:
# Generate vector representations with the SentenceTransformer model
import numpy as np
from sentence_transformers import SentenceTransformer

# model = SentenceTransformer('all-MiniLM-L6-v2')  # swap in a different model if needed
model = SentenceTransformer('all-mpnet-base-v2')
vectors = model.encode(text_data, convert_to_tensor=True)

# Move to the CPU and convert to 32-bit floats for Faiss compatibility
vectors = vectors.detach().cpu().numpy().astype(np.float32)


In [40]:
# Get the embedding dimension
dimension = vectors.shape[1]

# Create the Faiss index (flat, L2 distance)
index = faiss.IndexFlatL2(dimension)

# Add the embeddings to the index
index.add(vectors)
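
Conceptually, a flat index performs an exhaustive comparison of the query against every stored vector. The following pure-Python sketch imitates that behavior for illustration only; `flat_l2_search` is a hypothetical helper, not a Faiss function, and it makes no attempt to match Faiss's performance.

```python
def flat_l2_search(vectors, query, k):
    """Compare the query to every stored vector using squared L2 distance."""
    scored = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), i)
        for i, vec in enumerate(vectors)
    ]
    scored.sort()  # smallest distance first; ties broken by index
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy 2-D vectors (hypothetical values, for illustration only)
stored = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, indices = flat_l2_search(stored, [1.0, 0.0], k=2)
print(indices, distances)  # → [0, 1] [1.0, 1.0]
```

Because every vector is visited, the result is exact, at the cost of search time that grows linearly with the number of stored vectors.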

In [44]:
# Define the query
query = "What is the dog doing?"

# Encode the query
query_embedding = model.encode(query, convert_to_tensor=True)
query_embedding = query_embedding.cpu().numpy().astype('float32')


In [45]:
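# Search for the k nearest neighbors
# Note: IndexFlatL2 returns squared L2 distances, so smaller means closer.
# Rebuild the index whenever text_data changes; otherwise the returned indices
# no longer line up with the sentences printed below. In this run, text_data
# (In [43]) was redefined after the index was built (In [40]), so the ranking
# shown may not reflect these sentences.
k = 5
distances, indices = index.search(query_embedding.reshape(1, -1), k)

# Print the results
print("Nearest neighbors:")
for i, idx in enumerate(indices[0]):
    print(f"  Index: {idx}, Distance: {distances[0][i]}, Sentence: {text_data[idx]}")

Nearest neighbors:
  Index: 2, Distance: 1.6357448101043701, Sentence: An artist is painting a landscape on canvas.
  Index: 0, Distance: 1.7412779331207275, Sentence: A man is walking his dog in the park.
  Index: 3, Distance: 1.7817978858947754, Sentence: The sun sets behind the mountain ridge.
  Index: 4, Distance: 1.8181878328323364, Sentence: Birds are singing outside the window.
  Index: 1, Distance: 1.8610284328460693, Sentence: Children are playing with toys indoors.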
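
To interpret these distances, it helps to relate squared L2 distance to cosine similarity. The sketch below assumes the embeddings have unit length, which is worth verifying for the model in use. Under that assumption, ||a - b||^2 = 2 - 2·cos(a, b), so a squared distance of about 1.74 would correspond to a cosine similarity of only about 0.13. The helper names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_from_squared_l2(d2):
    """For unit vectors: ||a - b||^2 = 2 - 2*cos(a, b)."""
    return 1.0 - d2 / 2.0

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
d2 = squared_l2(a, b)
dot = sum(x * y for x, y in zip(a, b))  # cosine similarity of unit vectors
print(round(d2, 4), round(cosine_from_squared_l2(d2), 4), round(dot, 4))
```

The last two printed values agree, which confirms the identity on this toy pair.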


## Strategies for Formulating Search Queries

In [55]:
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import faiss
import numpy as np

class SimpleRAG:
    def __init__(self, model_name: str, knowledge_base: list[str]):
        """
        SimpleRAG implements the most basic RAG retrieval structure.
        model_name: name of the SentenceTransformer model used for retrieval
        knowledge_base: list of documents to search over
        """
        self.knowledge_base = knowledge_base

        # Load the SentenceTransformer (used for retrieval)
        self.embedder = SentenceTransformer(model_name)

        # Compute the embeddings and build the FAISS index
        self.doc_embeddings = self.embedder.encode(
            knowledge_base, convert_to_numpy=True, normalize_embeddings=True
        ).astype("float32")
        d = self.doc_embeddings.shape[1]
        self.index = faiss.IndexFlatIP(d)  # inner product (cosine similarity if normalized)
        self.index.add(self.doc_embeddings)

        # No generator is included in this class

    def retrieve(self, query: str, k: int = 5):
        """
        Simple retrieval using SentenceTransformer + FAISS
        """
        q = self.embedder.encode(
            [query], convert_to_numpy=True, normalize_embeddings=True
        ).astype("float32")
        scores, idx = self.index.search(q, k)
        # FAISS pads the result with -1 when k exceeds the number of indexed documents
        return [self.knowledge_base[i] for i in idx[0] if i >= 0]


class QueryExpansionRAG(SimpleRAG):
    def __init__(self, model_name, knowledge_base, query_expansion_model="t5-small"):
        super().__init__(model_name, knowledge_base)
        self.query_expander = pipeline("text2text-generation", model=query_expansion_model)

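    def expand_query(self, query):
        # Note: t5-small is not fine-tuned for an "expand query:" task, so treat
        # it as a placeholder for a proper query-expansion model.
        # Beam search is needed to return several sequences without sampling;
        # max_new_tokens avoids the max_length/max_new_tokens conflict warning.
        expanded = self.query_expander(
            f"expand query: {query}",
            max_new_tokens=50,
            num_beams=4,
            num_return_sequences=3,
        )
        return [query] + [e['generated_text'] for e in expanded]

    def retrieve(self, query, k=5):
        expanded_queries = self.expand_query(query)
        all_retrieved = []
        for q in expanded_queries:
            all_retrieved.extend(super().retrieve(q, k))
        # Remove duplicates (keeping first-seen order) and return the top k
        unique_retrieved = list(dict.fromkeys(all_retrieved))
        return unique_retrieved[:k]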


# Usage example
retriever_model = "sentence-transformers/all-MiniLM-L6-v2"

knowledge_base = [
    "Climate change affects agricultural productivity.",
    "Greenhouse gas emissions are the main driver of global warming.",
    "Renewable energy can replace fossil fuels.",
]

# Define the query explicitly instead of relying on a value left over from an earlier cell
query = "What is the impact of climate change on agriculture?"

# A distinct name avoids shadowing the rag_system() function defined earlier
expansion_rag = QueryExpansionRAG(retriever_model, knowledge_base)
retrieved_docs = expansion_rag.retrieve(query)
print("Retrieved documents:", retrieved_docs)


Retrieved documents: ['Climate change affects agricultural productivity.', 'Greenhouse gas emissions are the main driver of global warming.', 'Renewable energy can replace fossil fuels.']
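
The class above merges the results of the expanded queries by first-seen order, so documents returned for the original query always win. One alternative, named plainly as a swapped-in technique and not the notebook's method, is reciprocal rank fusion (RRF), which rewards documents that rank well across several queries. The sketch below is a minimal version; the function name and the constant `c=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=5, c=60):
    """Fuse several ranked lists; each hit contributes 1 / (c + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (c + rank)
    fused = sorted(scores, key=lambda d: scores[d], reverse=True)
    return fused[:k]

# One ranked list per query variant (hypothetical document IDs)
lists = [["A", "B", "C"], ["B", "A"], ["B", "C"]]
print(reciprocal_rank_fusion(lists, k=2))  # → ['B', 'A']
```

Here document `B` overtakes `A` because it ranks first for two of the three query variants, even though `A` ranks first for the original query.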


## Integrating Retrieved Information with LLM Generation

In [56]:
class GenerativeRAG(QueryExpansionRAG):
    def __init__(self, retriever_model, generator_model, knowledge_base):
        super().__init__(retriever_model, knowledge_base)

        self.generator = AutoModelForCausalLM.from_pretrained(generator_model)
        self.generator_tokenizer = AutoTokenizer.from_pretrained(generator_model)

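    def generate_response(self, query, max_length=100):
        retrieved_docs = self.retrieve(query)
        context = "\n".join(retrieved_docs)
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        inputs = self.generator_tokenizer(prompt, return_tensors="pt")
        # Pass the attention mask and an explicit pad token to avoid generation warnings;
        # note that max_length counts the prompt tokens as well as the generated ones
        outputs = self.generator.generate(
            **inputs,
            max_length=max_length,
            pad_token_id=self.generator_tokenizer.eos_token_id,
        )
        # The decoded text contains the prompt followed by the generated continuation
        return self.generator_tokenizer.decode(outputs[0], skip_special_tokens=True)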


# Usage example
generator_model = "gpt2-medium"
# A distinct name avoids shadowing the rag_system() function defined earlier
generative_rag = GenerativeRAG(retriever_model, generator_model, knowledge_base)

query = "What is the impact of climate change on agriculture?"
response = generative_rag.generate_response(query)
print("Generated response:", response)


Generated response: Context:
Climate change affects agricultural productivity.
Greenhouse gas emissions are the main driver of global warming.
Renewable energy can replace fossil fuels.

Question: What is the impact of climate change on agriculture?
Answer: Climate change is expected to have a negative impact on agricultural productivity.

Climate change is expected to have a positive impact on agricultural productivity.

Climate change is expected to have a negative impact on agricultural productivity.

Climate change is expected to have
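
The response above echoes the prompt and then repeats itself until the length limit cuts it off. A small post-processing step can make such output easier to read. The sketch below is one hedged option, not part of the original notebook: `extract_answer` is a hypothetical helper, and it only strips the prompt when the decoded text starts with it exactly, which may not hold for every tokenizer.

```python
def extract_answer(decoded, prompt):
    """Drop the echoed prompt, then drop blank and repeated lines."""
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    seen, kept = set(), []
    for line in text.splitlines():
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)

# Hypothetical prompt and decoded output, for illustration only
prompt = "Context:\nfact\n\nQuestion: q\nAnswer:"
decoded = prompt + " It rains.\n\nIt rains.\n\nIt snows."
print(extract_answer(decoded, prompt))
```

This only tidies the text. It does not resolve contradictions such as the "negative" and "positive" statements in the output above, which stem from the generator itself.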
