In [1]:
%load_ext dotenv
%dotenv

In [8]:
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document

In [9]:
model = 'text-embedding-ada-002'
embedding = AzureOpenAIEmbeddings(model=model)

In [None]:
# Load the existing vectorstore
vectorstore = Chroma(persist_directory='./data', embedding_function=embedding)

In [12]:
document = Document(page_content='Alright! So… How are the techniques used in data, business intelligence, or predictive analytics applied in real life? Certainly, with the help of computers. You can basically split the relevant tools into two categories—programming languages and software. Knowing a programming language enables you to devise programs that can execute specific operations. Moreover, you can reuse these programs whenever you need to execute the same action', 
metadata={'Course Title': 'Introduction to Data and Data Science', 'Lecture Title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need'})

In [13]:
vectorstore.add_documents([document])

['143e342c-fadb-4c40-b5a8-7bdeb695d141']

In [14]:
vectorstore.get(['143e342c-fadb-4c40-b5a8-7bdeb695d141'])

{'ids': ['143e342c-fadb-4c40-b5a8-7bdeb695d141'],
 'embeddings': None,
 'documents': ['Alright! So… How are the techniques used in data, business intelligence, or predictive analytics applied in real life? Certainly, with the help of computers. You can basically split the relevant tools into two categories—programming languages and software. Knowing a programming language enables you to devise programs that can execute specific operations. Moreover, you can reuse these programs whenever you need to execute the same action'],
 'uris': None,
 'included': ['metadatas', 'documents'],
 'data': None,
 'metadatas': [{'Course Title': 'Introduction to Data and Data Science',
   'Lecture Title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need'}]}

In [15]:
q = 'What programming languages are used in data science?'

In [34]:
retrieved_docs = vectorstore.similarity_search(query=q, k=5, filter={'section title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need'})
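A note on the `filter` argument above: a plain `{'key': 'value'}` filter is an exact match on the metadata key and value. The sketch below is a pure-Python illustration of that behaviour, not Chroma's implementation; the helper `matches` and the two sample dictionaries are hypothetical, built from the metadata shown in this notebook. It suggests that the document added earlier with the keys `'Course Title'` and `'Lecture Title'` would not be returned by a filter on `'section title'`.

```python
def matches(metadata, where):
    """Return True when every key in `where` exists in `metadata` with an equal value."""
    return all(key in metadata and metadata[key] == value
               for key, value in where.items())


title = 'Programming Languages & Software Employed in Data Science - All the Tools You Need'

# Metadata as it appears on the documents already in the store (lowercase keys).
existing = {'section title': title, 'course title': 'Introduction'}

# Metadata of the document added in this notebook (different key names).
added = {'Lecture Title': title,
         'Course Title': 'Introduction to Data and Data Science'}

where = {'section title': title}

print(matches(existing, where))  # the existing document passes the filter
print(matches(added, where))     # the newly added one does not: key names differ
```

If the newly added document should be retrievable with the same filter, its metadata keys would need to follow the naming already used in the store.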


In [35]:
# Note that the output contains redundant (duplicate) results
for i in retrieved_docs:
    print(i.page_content)
    print(i.metadata)
    print('---')

What about big data? Apart from R and Python, people working in this area are often proficient in other languages like Java or Scala. These two have not been developed specifically for doing statistical analyses, however they turn out to be very useful when combining data from multiple sources. All right! Let’s finish off with machine learning. When it comes to machine learning, we often deal with big data
{'section title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need', 'course title': 'Introduction'}
---
What about big data? Apart from R and Python, people working in this area are often proficient in other languages like Java or Scala. These two have not been developed specifically for doing statistical analyses, however they turn out to be very useful when combining data from multiple sources. All right! Let’s finish off with machine learning. When it comes to machine learning, we often deal with big data
{'course title': 'Introduction', 'sectio

In [36]:
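# To avoid redundant output, use MMR (maximal marginal relevance) so that only the needed information is shown
# Tuning the parameters balances duplication against retrieval relevance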
retrieved_docs = vectorstore.max_marginal_relevance_search(query=q, k=5, lambda_mult=0.4)
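To make the role of `lambda_mult` concrete, here is a minimal pure-Python sketch of the MMR selection rule. It is not the LangChain or Chroma implementation; the function names `cosine` and `mmr_select` and the toy 2-D vectors are assumptions made for illustration. Each step picks the candidate that maximises `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to any document already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for idx in candidates:
            relevance = cosine(query_vec, doc_vecs[idx])
            # Highest similarity to anything already picked (0.0 on the first pass).
            redundancy = max(
                (cosine(doc_vecs[idx], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy embeddings: documents 0 and 1 are exact duplicates, document 2 is different.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure relevance keeps the duplicate
print(mmr_select(query, docs, k=2, lambda_mult=0.4))  # lower value favours diversity
```

With `lambda_mult=1.0` the redundancy term vanishes and the ranking reduces to plain similarity, so the duplicate is returned; lowering the value penalises near-identical results, which is the trade-off tuned in the cell above.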


In [37]:
for i in retrieved_docs:
    print(i.page_content)
    print(i.metadata)
    print('---')

What about big data? Apart from R and Python, people working in this area are often proficient in other languages like Java or Scala. These two have not been developed specifically for doing statistical analyses, however they turn out to be very useful when combining data from multiple sources. All right! Let’s finish off with machine learning. When it comes to machine learning, we often deal with big data
{'section title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need', 'course title': 'Introduction'}
---
Alright! So… How are the techniques used in data, business intelligence, or predictive analytics applied in real life? Certainly, with the help of computers. You can basically split the relevant tools into two categories—programming languages and software. Knowing a programming language enables you to devise programs that can execute specific operations. Moreover, you can reuse these programs whenever you need to execute the same action
{'course 

In [38]:
# The search type can be switched easily through a parameter
retriever = vectorstore.as_retriever(search_type='mmr', search_kwargs={'k':5, 'lambda_mult':0.4})

In [39]:
retriever

VectorStoreRetriever(tags=['Chroma', 'AzureOpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x0000029168BF4B00>, search_type='mmr', search_kwargs={'k': 5, 'lambda_mult': 0.4})

In [40]:
# get_relevant_documents() is deprecated in recent langchain-core releases; invoke() is the replacement
retrieved_docs = retriever.invoke(q)

In [41]:
for i in retrieved_docs:
    print(i.page_content)
    print(i.metadata)
    print('---')

What about big data? Apart from R and Python, people working in this area are often proficient in other languages like Java or Scala. These two have not been developed specifically for doing statistical analyses, however they turn out to be very useful when combining data from multiple sources. All right! Let’s finish off with machine learning. When it comes to machine learning, we often deal with big data
{'section title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need', 'course title': 'Introduction'}
---
Alright! So… How are the techniques used in data, business intelligence, or predictive analytics applied in real life? Certainly, with the help of computers. You can basically split the relevant tools into two categories—programming languages and software. Knowing a programming language enables you to devise programs that can execute specific operations. Moreover, you can reuse these programs whenever you need to execute the same action
{'section