### Pre-requisites:

You need a Serverless Cassandra with Vector Search database on Astra DB to run this demo. Generate a DB token with the Database Administrator role and copy your Database ID; both connection parameters are needed in the Setup step below.
You also need an OpenAI API key for this demo to work.

What you will do:

* Setup: import dependencies, provide secrets, create the LangChain vector store;
* Run a question-answering loop that retrieves the relevant text chunks from the PDF and has an LLM construct the answer.

Install dependencies

In [4]:
#! pip install -q cassio datasets langchain langchain-openai openai tiktoken python-dotenv
# cassio connects LangChain to Astra DB; python-dotenv loads secrets from a .env file

Import packages

In [2]:
## LangChain-specific packages
from langchain.vectorstores.cassandra import Cassandra
from langchain_openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

## Support dataset retrieval from Hugging Face
from datasets import load_dataset

In [3]:
#! pip install PyPDF2
# to read text inside any pdf

In [5]:
from PyPDF2 import PdfReader

### Setup

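Provide your own Astra DB application token, Astra DB database ID, and OpenAI API key. The cell below reads them from environment variables, for example from a `.env` file picked up by `load_dotenv()`.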

In [6]:
import os
from dotenv import load_dotenv
load_dotenv()

ASTRA_DB_ID = os.getenv("ASTRA_DB_ID")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ASTRA_DB_APPLICATION_TOKEN = os.getenv("ASTRA_DB_APPLICATION_TOKEN") ## generated token under connection details in astra database
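Because `os.getenv` silently returns `None` for an unset variable, a missing secret would otherwise only surface later as a connection or authentication error. The following is a minimal sketch of an early check; the helper name `missing_settings` and the `REQUIRED_VARS` tuple are illustrative assumptions, not part of the original notebook.

```python
import os

# Names taken from the setup cell above; the tuple itself is a hypothetical helper.
REQUIRED_VARS = ("ASTRA_DB_ID", "ASTRA_DB_APPLICATION_TOKEN", "OPENAI_API_KEY")


def missing_settings(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_settings()
if missing:
    # Report only the variable names, never the secret values.
    print("Missing settings:", ", ".join(missing))
```

Running this right after `load_dotenv()` makes it obvious which entry is absent from the `.env` file before any network call is attempted.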

### Read pdf file

file = PdfReader("us_census/acsbr-015.pdf")

# Concatenate the extracted text of every page. extract_text() can return
# an empty result for pages without a text layer, so those are skipped.
raw_text = ""
for page in file.pages:
    content = page.extract_text()
    if content:
        raw_text += content

### Connect to Database

In [13]:
## Connect to Database
import cassio
cassio.init(token=ASTRA_DB_APPLICATION_TOKEN, database_id=ASTRA_DB_ID)

### Create LangChain LLM and embedding model

In [15]:
llm = OpenAI()
embedding = OpenAIEmbeddings()

### Create LangChain vector store backed by Astra DB

In [17]:
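astra_vector_store = Cassandra(
    embedding=embedding,
    table_name="demo",  # a table with this name is created automatically in Astra DB
    session=None,       # None: fall back to the connection set up by cassio.init()
    keyspace=None,      # None: fall back to the keyspace set up by cassio.init()
)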

### Create chunks

In [19]:
from langchain.text_splitter import CharacterTextSplitter

# chunk_size and chunk_overlap are measured in characters because length_function is len
splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=800,
    chunk_overlap=200,
    length_function=len,
)

texts = splitter.split_text(raw_text)
print(len(texts))
texts[0]

106


'Health Insurance Coverage Status and Type \nby Geography: 2021 and 2022\nAmerican Community Survey Briefs\nACSBR-015Issued September 2023Douglas Conway and Breauna Branch\nINTRODUCTION\nDemographic shifts as well as economic and govern-\nment policy changes can affect people’s access to health coverage. For example, between 2021 and 2022, the labor market continued to improve, which may have affected private coverage in the United States \nduring that time.\n1 Public policy changes included \nthe renewal of the Public Health Emergency, which \nallowed Medicaid enrollees to remain covered under the Continuous Enrollment Provision.\n2 The American \nRescue Plan (ARP) enhanced Marketplace premium subsidies for those with incomes above 400 percent of the poverty level as well as for unemployed people.\n3'
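To make the roles of `separator`, `chunk_size`, and `chunk_overlap` concrete, here is a simplified pure-Python sketch of overlap-based chunking. It is an illustration under stated assumptions, not LangChain's actual implementation: the function name `split_with_overlap` is hypothetical, and real splitters handle more edge cases (for example, single pieces longer than `chunk_size`).

```python
def split_with_overlap(text, separator="\n", chunk_size=800, chunk_overlap=200):
    """Greedily pack separator-delimited pieces into overlapping chunks.

    Simplified illustration only; lengths are counted in characters.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Keep only trailing pieces that fit the overlap budget and
            # still leave room for the incoming piece.
            while current and (
                len(separator.join(current)) > chunk_overlap
                or len(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Toy input: ten 10-character lines, with small limits so the overlap is visible.
lines = [f"line{i:02d}xxxx" for i in range(10)]
for chunk in split_with_overlap("\n".join(lines), chunk_size=35, chunk_overlap=12):
    print(repr(chunk))
```

With these toy limits, each chunk starts with the last line of the previous one. The overlap exists so that a sentence cut at a chunk boundary still appears intact in at least one chunk, which tends to help retrieval.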

### Load vector store with chunks

In [20]:
astra_vector_store.add_texts(texts)

['5dc6af7e327b4664b47f890eb967ea6b',
 'd5ff69aefde944f9945cb0f26582b266',
 '2e29494b211d4465a2fdc39f05388c17',
 'a628b1b2ffa14340b5684e157cfb2b43',
 '1010da93c0dd4a409ff8ecd58d7a9fe6',
 '503ccb972ad04e22903afa6236282e13',
 '3d9dd6e55e344a0ba91dc56df07f16eb',
 '3586d685922441d894a472499c46844a',
 '6ca1977998aa4f70afb6e0e48eba32b3',
 'ed9d92bd05994abdba704200224d20c9',
 '6dadb5b58ed24a2d9abeb6f969b1fc0d',
 '323d2657699a4b308b8ec3275851718c',
 '2e4166700fa4420392dc18f1e1d5a7d9',
 '5d17cac4b3984d6ea3a04762bb44844d',
 '69df6699d3d1423e8b0d2bbcc86e332b',
 '3de84ba0587f48ea91162df05d8f891b',
 '1e6286bd6e6444d0b85cfc4f8675abda',
 'dc41672be207471992fdc26701b6ee0b',
 '459e5c5aa5224de49161639a2b11b4cc',
 '1a32f37b5dbe48d78fb454fa453f9051',
 '86002443660f40098a657be0be559f42',
 '38178b94f261493baecb78c3a58f7408',
 '352251559d5748ddb3e52607361f1bd4',
 'd4be53d21c8f4fe4b5e91de95b2757a3',
 'f29707a33ebe4ce8b535ac6579fe97bb',
 '525d0f35e9a0445f8a0e041471380267',
 '6f7bea1b94e144dfacf39260efcf1214',
 

In [22]:
astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)
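Conceptually, querying the index embeds the question, finds the stored chunks whose vectors are closest to it, and hands those chunks to the LLM as context. The sketch below illustrates that retrieve-then-prompt idea in plain Python. It assumes cosine similarity as the metric and uses made-up two-dimensional vectors; the function names and prompt wording are hypothetical and do not reproduce what LangChain or Astra DB do internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=3):
    """store is a list of (text, vector) pairs; return the k best (text, score) pairs."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(question, passages):
    """Place the retrieved passages in front of the question (hypothetical wording)."""
    context = "\n\n".join(passages)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy store with invented vectors; a real store holds embedding-model output.
toy_store = [
    ("chunk about uninsured rates", [1.0, 0.0]),
    ("chunk about footnotes", [0.0, 1.0]),
    ("chunk about private coverage", [1.0, 1.0]),
]
hits = top_k([1.0, 0.0], toy_store, k=2)
print(build_prompt("Which states had the lowest uninsured rates?", [t for t, _ in hits]))
```

The quality of the final answer depends heavily on which chunks are retrieved, so inspecting the retrieved passages is a useful debugging step when an answer looks wrong.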

### Query DB

Sample questions:

* Which states reported the highest and lowest uninsured rates in 2022?
* How did Medicaid expansion affect uninsured rates in expansion versus non-expansion states?
* Which states had significant changes in private health insurance coverage from 2021 to 2022?
* How did uninsured rates differ among the most populous metropolitan areas in 2022?

In [24]:
first_question = True

while True:
    if first_question:
        query_text = input("Enter your question or type 'Quit' to exit : ").strip()
    else:
        query_text = input("Enter your next question or type 'Quit' to exit : ").strip()

    if query_text.lower() == "quit":
        break

    if not query_text:  # ignore empty input and prompt again
        continue

    first_question = False

    print("\nQUESTION : ", query_text)
    answer = astra_vector_index.query(query_text, llm = llm).strip()
    print("\nANSWER : ", answer)

Enter your question or type 'Quit' to exit :  Which states had significant changes in private health insurance coverage from 2021 to 2022?



QUESTION :  Which states had significant changes in private health insurance coverage from 2021 to 2022?

ANSWER :  Nine states had significant changes in private health insurance coverage from 2021 to 2022, with three reporting increases in both employer-sponsored and direct-purchase coverage, three reporting increases in direct-purchase coverage only, and two reporting decreases in private coverage. These states were Iowa, North Carolina, Texas, Florida, Kansas, Mississippi, Alabama, California, Georgia, Illinois, Indiana, Michigan, and Oklahoma.


Enter your next question or type 'Quit' to exit :  quit


In [26]:
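### Documents by relevance score
# Caution: if the loop above was ended by typing 'quit', query_text still holds
# that string, so assign a real question to query_text before running this cell.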

for doc, score in astra_vector_store.similarity_search_with_score(query_text, k=3):
    print("\nscore : ", score)
    print(doc.page_content)


score :  0.8647770461530339
April 2022, < www.urban.org/sites/default/
files/2022-04/Marketplace%20Premiums%20
and%20Competition%202019-22.pdf >; ACA 
Marketplace Participation Tracker 2015–
2023, Robert Wood Johnson Foundation, 
<www.rwjf.org/en/insights/our-research/
interactives/aca-marketplace-participation-
tracker.html >. Massachusetts implemented 
a state individual health insurance mandate 
starting in 2006.
32 Caroline Davis, “San Francisco 
Bay Area: Regional Health Systems Vie 
for Market Share,” California Health 
Care Almanac , California Health Care 
Foundation, April 2021, < www.chcf.
org/wp-content/uploads/2021/04/
RegionalMarketAlmanac2020BayArea.pdf >.
33 For more information, refer to Older 
Care Expansion at California Department 
of Health Care Services at < www.dhcs.

score :  0.8641347998186005
(based on CMS data), and many 
states with increases in employer-
based coverage had decreases in 
unemployment rates.24
For the seven states with declines 
in private co