# Final Project: Company Profile Risk Assessment

## Business Logic
1. Input the company name and the risk type (operational, market, legal, ...).
2. Use RAG to search sec.gov and retrieve the risk section of the latest 10-Q report.
3. Embed the content.
4. Store the embeddings in the vector store.
5. Use Q&A to answer the question.
6. If no content matches the risk type, respond "I don't know."

### Step1: Create Services
1. Create an AI Language service in Azure.
2. Create an Azure Storage account and a Blob container, and assign the role Storage Blob Data Contributor.

### Step2: Set up the environment
1. pip install semantic-kernel
2. pip install azure-storage-blob azure-identity
3. pip install python-dotenv beautifulsoup4 azure-ai-formrecognizer openai
4. Set up the environment in code.

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()
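
Because every later cell reads its configuration with `os.environ.get`, a missing entry in `.env` only surfaces as a confusing error inside an SDK call. The sketch below checks up front for the variable names this notebook reads; the helper `missing_env_vars` and the list name `REQUIRED_ENV_VARS` are hypothetical names introduced here for illustration.

```python
import os

# Variable names read later in this notebook via os.environ.get(...)
REQUIRED_ENV_VARS = [
    "BLOB_ACCESS_URL",
    "BLOB_CONTAINER_NAME",
    "BLOB_CONN_STR",
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
    "AZURE_DOCUMENT_INTELLIGENCE_KEY",
    "OPENAI_API_KEY",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report the gaps now instead of failing later inside an SDK call
missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running it right after `load_dotenv()` is one option; it only prints a warning and does not stop the notebook.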


### Step2: Extract content from report (HTML)
1. Download the latest 10-Q report for Microsoft from sec.gov.
2. Path: Practices/w4_finalproject/resources/10-Q.html
3. Extract the text of each `<span>` tag in the HTML file.

In [None]:
# Parse the report as an HTML file
from bs4 import BeautifulSoup

htmlFile = "resources/10-Q.html"

contentList = []

# Open in binary mode so BeautifulSoup can detect the document encoding itself
with open(htmlFile, "rb") as f:
    soup = BeautifulSoup(f, 'html.parser')

# Keep the text of every <span> that is longer than 3 characters
for content in soup.find_all('span'):
    contentText = content.text
    if len(contentText) > 3:
        contentList.append(contentText)
        print(contentText)

print(contentList)
print(len(contentList))
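
The business logic asks for the risk part of the report only, while the cell above collects every `<span>`. In a 10-Q, risk factors are reported under Part II, Item 1A, so one possible filter is sketched below. It assumes the heading appears as its own entry in the list (for example "Item 1A. Risk Factors") and that the section ends at the next "Item" heading; both are assumptions about the extracted text, and `extract_risk_section` is a hypothetical helper.

```python
import re

# Assumption: the section starts at an entry such as "Item 1A. Risk Factors"
# and ends at the next "Item <number>" heading.
_RISK_START = re.compile(r"^\s*item\s+1a\b", re.IGNORECASE)
_NEXT_ITEM = re.compile(r"^\s*item\s+\d", re.IGNORECASE)


def extract_risk_section(lines):
    """Return the entries after the last "Item 1A" heading, up to the next item heading."""
    starts = [i for i, line in enumerate(lines) if _RISK_START.match(line)]
    if not starts:
        return []
    start = starts[-1]  # taking the last match skips a table-of-contents entry, if any
    section = []
    for line in lines[start + 1:]:
        if _NEXT_ITEM.match(line):
            break
        section.append(line)
    return section


# Made-up entries for illustration, not taken from the report
sample = [
    "Item 1A. Risk Factors",  # table-of-contents entry
    "ITEM 1A. RISK FACTORS",
    "Example risk sentence.",
    "Item 2. Example next heading",
]
print(extract_risk_section(sample))
```

Applied to the notebook data, the call would be `extract_risk_section(contentList)`; an empty result means the heading was not found in the form this sketch expects.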

### Step2: Extract content from report (PDF)
1. Download the latest 10-Q report for Microsoft from sec.gov.
2. Path: Practices/w4_finalproject/resources/10-Q.pdf
3. Upload it to Blob storage.
4. Extract the content with Document Intelligence.

In [None]:
from azure.storage.blob import BlobServiceClient

pdfFile = "resources/10-Q.pdf"
blobName = "Microsoft10Q.pdf"
accountURL = os.environ.get('BLOB_ACCESS_URL')
containerName = os.environ.get('BLOB_CONTAINER_NAME')
connStr = os.environ.get('BLOB_CONN_STR')


blobServiceClient = BlobServiceClient.from_connection_string(connStr)

containerClient = blobServiceClient.get_container_client(container=containerName)

print("\nUploading to Azure Storage as blob:\n\t" + pdfFile)

# The upload is disabled; uncomment the two lines below to run it
# with open(pdfFile, "rb") as data:
#     blobClient = containerClient.upload_blob(name=blobName, data=data)

### Step3: Embedding the content

In [None]:
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

diEndpoint = os.environ.get("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT")
diApiKey = os.environ.get("AZURE_DOCUMENT_INTELLIGENCE_KEY")

documentAnalysisClient = DocumentAnalysisClient(endpoint=diEndpoint, credential=AzureKeyCredential(diApiKey))
print(accountURL, containerName, blobName)
blob_url = accountURL + "/" + containerName + "/" + blobName
print(blob_url)

poller = documentAnalysisClient.begin_analyze_document_from_url("prebuilt-document", blob_url)

result = poller.result()

# Join every recognized line of every page, one line of text per row
extracted_text = "".join(
    line.content + "\n"
    for page in result.pages
    for line in page.lines
)

print(extracted_text)
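
The cell above builds `blob_url` by plain string concatenation, which produces a double slash when `BLOB_ACCESS_URL` ends with `/` and leaves spaces in the blob name unencoded. A small sketch of a more defensive join follows; `build_blob_url` is a hypothetical helper and the account URL in the usage line is a placeholder.

```python
from urllib.parse import quote


def build_blob_url(account_url, container_name, blob_name):
    """Join the parts with single slashes and percent-encode the blob name."""
    return "/".join([
        account_url.rstrip("/"),
        container_name.strip("/"),
        quote(blob_name.lstrip("/")),  # quote() keeps "/" so virtual folders survive
    ])


# Placeholder account URL, for illustration only
print(build_blob_url("https://example.blob.core.windows.net/", "docs", "Microsoft10Q.pdf"))
```

Whichever way the URL is built, the analysis service still has to be able to read it, for example through a publicly readable container or a SAS token appended to the URL.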

### Step4: Store in the vector storage
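
This step has no code yet. As a rough sketch of what storing could look like, the block below keeps chunk texts and their vectors in parallel lists. The embedding is a deliberately simple stand-in (word hashing), not a real embedding model, and `toy_embed` and `InMemoryVectorStore` are hypothetical names; in the project the embedding function would presumably be replaced by a model call and the store by the chosen vector database.

```python
import hashlib
import math
import re


def toy_embed(text, dim=64):
    """Stand-in embedding: hash each word into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical minimal store: parallel lists of chunk texts and their vectors."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.texts = []
        self.vectors = []

    def add(self, text):
        """Embed one chunk and keep both the text and its vector."""
        self.texts.append(text)
        self.vectors.append(self.embed_fn(text))

    def __len__(self):
        return len(self.texts)


store = InMemoryVectorStore(toy_embed)
# Made-up sentences for illustration, not taken from the report
for chunk in ["Example sentence about market risk.", "Example sentence about legal risk."]:
    store.add(chunk)
print(len(store), "chunks stored")
```

Passing the embedding function into the store keeps the storage logic independent of which model ends up producing the vectors.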

### Step5: Set the question and search the vector store

In [None]:
myQuestion = "Are there any risks in the company?"
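
The search half of this step is not implemented yet. One common approach is to rank stored vectors by cosine similarity to the question vector and to treat "nothing scores high enough" as the "I don't know" case from the business logic. The sketch below works on plain lists; the function names, the `min_score` threshold of 0.2, and the three-dimensional vectors are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, entries, top_k=3, min_score=0.2):
    """Return up to top_k (score, text) pairs scoring at least min_score, best first."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in entries]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors standing in for real embeddings
entries = [
    ("market risk chunk", [1.0, 0.0, 0.0]),
    ("legal risk chunk", [0.0, 1.0, 0.0]),
]
hits = search([0.9, 0.1, 0.0], entries, top_k=1)
answer_context = hits[0][1] if hits else "I don't know."
print(answer_context)
```

The threshold would need tuning against real embeddings; with it set too low, unrelated chunks reach the model, and with it set too high, valid matches are dropped.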

### Step6: Invoke OpenAI  

In [None]:
import openai
openai.api_key = os.environ.get('OPENAI_API_KEY')

sourceData = ' '.join(contentList)

print(sourceData)
lenOfSourceData = len(sourceData)
print(lenOfSourceData)

chunk_size = 2000
chunk_overlap = 100  # characters shared by consecutive chunks
chunks = [sourceData[i:i + chunk_size] for i in range(0, lenOfSourceData, chunk_size - chunk_overlap)]
finalAnswerList = []

for chunk in chunks:
    systemPrompt = "You are a helpful assistant. Answer the question based only on the data provided. Answer in one sentence. If the data does not contain the answer, respond exactly: I don't know."
    userPrompt = "Based on the following data: " + chunk + "\n\nAnswer the question: " + myQuestion
    print("User:", userPrompt)

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role":"system", "content":systemPrompt},
            {"role":"user", "content":userPrompt}
        ]
    )

    answer = response.choices[0].message.content.strip()
    print("Answer:", answer)
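    # Compare loosely: the model may add quotes, change case, or drop the period
    if answer.replace("’", "'").strip("'\"` .").lower() != "i don't know":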
        finalAnswerList.append(answer)

finalAnswer = ' '.join(finalAnswerList)

print("All I know:", finalAnswer)

# Summarize the per-chunk answers into a single response
response = openai.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the following text:\n\n" + finalAnswer}
    ]
)

summarized_answer = response.choices[0].message.content.strip()

print("Summarized:", summarized_answer)
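
The chunking earlier in this cell steps through the text in increments slightly smaller than the chunk size, so consecutive chunks overlap and a sentence cut at a boundary still appears whole in one of them. A reusable version might look like the sketch below; `chunk_text` is a hypothetical helper, and the argument checks are an addition that guards against a zero or negative step.

```python
def chunk_text(text, chunk_size=2000, overlap=100):
    """Split text into chunks of up to chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


demo = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij', 'j']
```

As in the notebook's own list comprehension, the final chunk can be a short tail that is already covered by the previous chunk; dropping it would save one model call per document.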