# Final Project: Company Profile Risk Assessment

## Business Logic:
1. Input the company name and the risk type (operational/market/legal...).
2. Use RAG with a web search on sec.gov to retrieve the risk section of the newest 10-Q report.
3. Embed the content.
4. Store the embeddings in the vector storage.
5. Use QnA to answer the question.
6. If no content matches the risk type, respond "I don't know."

### Step1: Create Services
1. Create an AI Language service in Azure.

### Step2: Set up the environment
1. Run `pip install semantic-kernel`.
2. Set up the environment in code.

In [43]:
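import os
from dotenv import load_dotenv
from bs4 import BeautifulSoup
import openai

# Load variables from a local .env file into the process environment.
load_dotenv()

# Read the API key from the environment instead of hard-coding it.
openai.api_key = os.environ.get('OPENAI_API_KEY')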

### Step2: Extract content from report
1. Download the newest Microsoft 10-Q report from sec.gov.
2. Path: Practices/w4_finalproject/resources/10-Q.html
3. Extract the text inside the `<span>` tags of the HTML file.

In [44]:
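htmlFile = "resources/10-Q.html"

contentList = []

# Read the whole report, then parse it once the file is closed.
with open(htmlFile) as f:
    htmlContent = f.read()

soup = BeautifulSoup(htmlContent, 'html.parser')

# Keep the text of every <span> longer than three characters.
# find_all also returns nested spans, so the same text can appear more than once.
for content in soup.find_all('span'):
    contentText = content.text
    if len(contentText) > 3:
        contentList.append(contentText)
        print(contentText)

print(contentList)
print(len(contentList))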

UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
FORM 
10-Q
10-Q
QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the Quarterly Period Ended 
March 31, 2024
March 31, 
2024
2024
TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the Transition Period From                  to
Commission File Number 
001-37845
001-37845
MICROSOFT CORPORATION
MICROSOFT CORPORATION
Washington
Washington
91-1144442
91-1144442
(STATE OF INCORPORATION)
(I.R.S. ID)
ONE MICROSOFT WAY
ONE MICROSOFT WAY
REDMOND
REDMOND
Washington
Washington
98052-6399
98052-6399
882-8080
882-8080
www.microsoft.com/investor
www.microsoft.com/investor
Securities registered pursuant to Section 12(b) of the Act:
Title of each class
Trading Symbol
Name of exchange on which registered
Common stock, $0.00000625 par value per share
Common stock, $0.00000625 par value per share
MSFT
MSFT
Nasdaq
Nasdaq
3.125% Notes due 2028
3.125% No
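
The output above repeats many entries back to back (for example `10-Q` and `MSFT`), which is consistent with nested `<span>` elements each contributing the same text. A minimal sketch of one way to clean this up, assuming that only exact consecutive repeats should be dropped; the helper name `drop_consecutive_duplicates` is hypothetical and not part of the notebook:

```python
from itertools import groupby


def drop_consecutive_duplicates(items):
    """Collapse runs of identical neighbouring strings into one entry."""
    # groupby only groups adjacent equal items, so non-adjacent repeats survive.
    return [key for key, _group in groupby(items)]


sample = ["FORM ", "10-Q", "10-Q", "MSFT", "MSFT", "Nasdaq", "Nasdaq",
          "Washington", "REDMOND", "Washington"]
cleaned = drop_consecutive_duplicates(sample)
print(cleaned)
```

Entries that legitimately occur twice in different places, such as the state name in the incorporation line and in the address, are kept because they are not adjacent.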

### Step3: Embedding the content
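
This step has no code in the notebook yet. A minimal sketch of what it might look like, assuming the OpenAI embeddings endpoint is used; the helper `embed_chunks` and the model name `text-embedding-3-small` are assumptions, not choices made in this project:

```python
def embed_chunks(chunks, client, model="text-embedding-3-small"):
    """Return one embedding vector (a list of floats) per text chunk.

    `client` is assumed to expose `embeddings.create(model=..., input=...)`,
    as the `openai` module (v1 and later) does.
    """
    if not chunks:
        return []
    response = client.embeddings.create(model=model, input=list(chunks))
    # Sort by index so the vectors line up with the input order.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


# Possible usage in the notebook (needs OPENAI_API_KEY and network access):
# vectors = embed_chunks(contentList[:100], client=openai)
```

Passing the client in as a parameter keeps the function easy to exercise with a stand-in object, without calling the API.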

### Step4: Store in the vector storage
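
This step is also still empty. As a stand-in for a real vector database, the sketch below keeps text and vector pairs in a Python list and ranks them by cosine similarity. The class `InMemoryVectorStore` and the two-dimensional toy vectors are illustrative assumptions only:

```python
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy vector store that keeps (text, vector) pairs in a list."""

    def __init__(self):
        self._records = []

    def add(self, text, vector):
        self._records.append((text, list(vector)))

    def search(self, query_vector, top_k=3):
        """Return up to top_k (score, text) pairs, best match first."""
        scored = [(_cosine(query_vector, vec), text) for text, vec in self._records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add("legal risk text", [1.0, 0.0])
store.add("market risk text", [0.0, 1.0])
best = store.search([0.9, 0.1], top_k=1)
print(best[0][1])
```

A linear scan like this is only reasonable for a single report; a dedicated vector store would be the natural replacement for larger collections.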

### Step5: Set the question and search the vector storage

In [45]:
myQuestion = "Are there any risks in the company?"
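
The business logic takes a company name and a risk type as input, while the question above is fixed. A minimal sketch of how the question could be built from those two inputs; the function name `build_question`, the wording of the question, and the set of accepted risk types are assumptions for illustration:

```python
ALLOWED_RISK_TYPES = {"operational", "market", "legal"}  # assumed, not exhaustive


def build_question(company_name, risk_type):
    """Build the question text from the two user inputs."""
    risk = risk_type.strip().lower()
    if risk not in ALLOWED_RISK_TYPES:
        raise ValueError(f"Unsupported risk type: {risk_type!r}")
    return f"What {risk} risks does {company_name.strip()} report?"


question = build_question("Microsoft", "Legal")
print(question)
```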

### Step6: Invoke OpenAI

In [46]:
# Join the extracted spans into one string and report its size.
sourceData = ' '.join(contentList)

print(sourceData)
lenOfSourceData = len(sourceData)
print(lenOfSourceData)

# Split into 1000-character chunks; stepping by 900 makes neighbours overlap by 100.
chunk_size = 1000
chunk_overlap = 100
chunks = [sourceData[i:i + chunk_size] for i in range(0, lenOfSourceData, chunk_size - chunk_overlap)]
finalAnswerList = []

# The system prompt is identical for every chunk, so build it once.
systemPrompt = "You are a helpful assistant. Answer the question based only on the given data. Use one sentence to answer the question. If you cannot find content that matches the question, respond with exactly: I don't know."

for chunk in chunks:
    userPrompt = "Based on the following data: " + chunk + "\n\nAnswer the question: " + myQuestion
    print("User:", userPrompt)

    response = openai.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {"role": "system", "content": systemPrompt},
            {"role": "user", "content": userPrompt}
        ]
    )

    answer = response.choices[0].message.content.strip()
    print("Answer:", answer)
    # Compare loosely: the model may vary the case, quotes, or punctuation.
    if "i don't know" not in answer.lower():
        finalAnswerList.append(answer)

finalAnswer = ' '.join(finalAnswerList)

print("All I know:", finalAnswer)

if not finalAnswerList:
    # No chunk matched the question (business logic step 6).
    summarized_answer = "I don't know."
else:
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize the following text:\n\n" + finalAnswer}
        ]
    )
    summarized_answer = response.choices[0].message.content.strip()

print("Summarized:", summarized_answer)

UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 FORM  10-Q 10-Q QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the Quarterly Period Ended  March 31, 2024 March 31,  2024 2024 TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the Transition Period From                  to Commission File Number  001-37845 001-37845 MICROSOFT CORPORATION MICROSOFT CORPORATION Washington Washington 91-1144442 91-1144442 (STATE OF INCORPORATION) (I.R.S. ID) ONE MICROSOFT WAY ONE MICROSOFT WAY REDMOND REDMOND Washington Washington 98052-6399 98052-6399 882-8080 882-8080 www.microsoft.com/investor www.microsoft.com/investor Securities registered pursuant to Section 12(b) of the Act: Title of each class Trading Symbol Name of exchange on which registered Common stock, $0.00000625 par value per share Common stock, $0.00000625 par value per share MSFT MSFT Nasdaq Nasdaq 3.125% Notes due 2028 3.125% No
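
The chunking in the Step6 cell cuts 1000-character chunks while advancing 900 characters at a time, so consecutive chunks share 100 characters. A small sketch that makes the overlap an explicit, validated parameter; `chunk_text` is a hypothetical helper, not part of the notebook:

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into fixed-size chunks whose neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


demo = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(demo)
```

With this stepping, the last chunk can be a short tail that the previous chunk already covers, which costs one extra model call per document.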