<a href="https://colab.research.google.com/github/utkarshbelkhede/Financial_Dashboard/blob/master/notebooks/2_Trying_HuggingFace_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### 1. Setting up Working Directory

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Working Directory
import os

os.chdir('/content/drive/MyDrive/Documents/Market Intelligence')

#### 2. Importing Scraped Dataframe

In [None]:
import pandas as pd

In [None]:
# Load the scraped dataset
df = pd.read_csv("/content/drive/MyDrive/Documents/Market Intelligence/datasets/final_combined_dataset.csv")
# Drop the first column so the frame starts at Company
df = df.iloc[:,1:]
df.head()

| | Company | Reporting_Date | 1 | 1A | 1B | 2 | 3 | 5 | 7 | 7A | 8 | 9A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Meta | 2021 | Item 1. Business Overview Our mission is to g... | Item 1A. Risk Factors Certain factors may hav... | Item 1B. Unresolved Staff Comments None. | Item 2. Properties Our corporate headquarters... | Item 3. Legal Proceedings Beginning on March ... | Item 5. Market for Registrant's Common Equity... | Item 7. Management's Discussion and Analysis ... | Item 7A. Quantitative and Qualitative Disclos... | Item 8. Financial Statements and Supplementar... | Item 9A. Controls and Procedures Evaluation o... |
| 1 | Meta | 2020 | Item 1. Business Overview Our mission is to g... | Item 1A. Risk Factors Certain factors may hav... | Item 1B. Unresolved Staff Comments None. | Item 2. Properties Our corporate headquarter... | Item 3. Legal Proceedings Beginning on March ... | Item 5. Market for Registrant's Common Equity... | Item 7. Management's Discussion and Analysis ... | Item 7A. Quantitative and Qualitative Disclos... | Item 8. Financial Statements and Supplementar... | Item 9A. Controls and Procedures Evaluation o... |
| 2 | Meta | 2019 | Item 1. Business Overview Our mission is to g... | Item 1A. Risk Factors Certain factors may hav... | Item 1B. Unresolved Staff Comments None. | Item 2. Properties Our corporate headquarter... | Item 3. Legal Proceedings Beginning on March ... | Item 5. Market for Registrant's Common Equity... | Item 7. Management's Discussion and Analysis ... | Item 7A. Quantitative and Qualitative Disclos... | Item 8. Financial Statements and Supplementar... | Item 9A. Controls and Procedures Evaluation o... |
| 3 | Meta | 2018 | Item 1. Business Overview Our mission is to g... | Item 1A. Risk Factors Certain factors may hav... | Item 1B. Unresolved Staff Comments None. | Item 2. Properties Our corporate headquarter... | Item 3. Legal Proceedings Beginning on March ... | Item 5. Market for Registrant's Common Equity... | Item 7. Management's Discussion and Analysis ... | Item 7A. Quantitative and Qualitative Disclos... | Item 8. Financial Statements and Supplementar... | Item 9A. Controls and Procedures Evaluation o... |
| 4 | Meta | 2017 | Item 1. Business Overview Our mission is to g... | Item 1A. Risk Factors Certain factors may hav... | Item 1B. Unresolved Staff Comments None. | Item 2. Properties Our corporate headquarter... | Item 3. Legal Proceedings Beginning on May 22... | Item 5. Market for Registrant's Common Equity... | Item 7. Management's Discussion and Analysis ... | Item 7A. Quantitative and Qualitative Disclos... | Item 8. Financial Statements and Supplementar... | Item 9A. Controls and Procedures Evaluation o... |


#### 3. Trying Some Hugging Face Models

In [None]:
!pip install -q transformers

##### 3.1 Summarization

In [None]:
from transformers import pipeline

In [None]:
summarizer = pipeline('summarization')

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
def text_summarizer(text):
  """Return a summary of `text`; inputs longer than the model's limit are truncated."""
  sumr = summarizer(text, max_length=100, min_length=5, do_sample=False, truncation=True)

  return sumr[0]['summary_text']

In [None]:
text_summarizer(df.iloc[0,2])

' Our mission is to give people the power to build community and bring the world closer together . We build technology that helps people connect, find communities, and grow businesses . We report financial results for two segments: Family of Apps (FoA) and Reality Labs (RL)'
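Because `truncation=True` cuts the input down to the model's maximum length, the summary above probably reflects only the opening of the Item 1 text. One possible workaround, sketched here under assumptions, is to split the text into word-based chunks and summarize each chunk separately. The helper names `chunk_words` and `summarize_long_text` and the 400-word chunk size are hypothetical choices, not part of the original notebook, and the summarizer is passed in as a plain callable so the sketch does not depend on a particular model.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text, summarize_fn, max_words=400):
    """Summarize each chunk with summarize_fn and join the partial summaries.

    summarize_fn is any callable mapping a string to a summary string
    (assumption: something like the notebook's text_summarizer).
    """
    partial = [summarize_fn(chunk).strip() for chunk in chunk_words(text, max_words)]
    return " ".join(partial)
```

With the notebook's objects this could be called as `summarize_long_text(df.iloc[0,2], text_summarizer)`. A word count is only a rough proxy for the tokenizer's token count, so the chunk size may need tuning.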

##### 3.2 Text Classification

In [None]:
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
def text_classifier(text):
  """Return the sentiment label and score for `text`."""
  res = classifier(text, padding=True, truncation=True)

  return res[0]

In [None]:
text_classifier(df.iloc[0,2])

{'label': 'POSITIVE', 'score': 0.9990265369415283}
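The single label above is computed on a truncated input, so it may describe only the first part of the section. If the text is split into shorter pieces (for example by sentence or by a fixed word count) and each piece is classified, the per-piece results could be combined along the lines below. This is a minimal sketch: `aggregate_sentiment` is a hypothetical helper, and the majority-vote rule is one assumption among several reasonable ways to aggregate.

```python
from collections import Counter


def aggregate_sentiment(results):
    """Combine per-chunk {'label', 'score'} dicts into one majority-vote result.

    Returns the most frequent label, the mean score of the chunks that
    carry that label, and the number of chunks considered.
    """
    if not results:
        raise ValueError("results must contain at least one prediction")
    counts = Counter(r["label"] for r in results)
    label, _ = counts.most_common(1)[0]
    scores = [r["score"] for r in results if r["label"] == label]
    return {"label": label, "score": sum(scores) / len(scores), "n_chunks": len(results)}
```

Each element of `results` is assumed to have the same shape as the dictionary returned by `text_classifier`.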

##### 3.3 Question Answering

In [None]:
from transformers import TFAutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"

In [None]:
model = TFAutoModelForQuestionAnswering.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)

All model checkpoint layers were used when initializing TFRobertaForQuestionAnswering.

All the layers of TFRobertaForQuestionAnswering were initialized from the model checkpoint at deepset/roberta-base-squad2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForQuestionAnswering for predictions without further training.


In [None]:
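# Passing model_name here makes the pipeline load its own copy of the model;
# the `model` and `tokenizer` objects created above are not reused.
question = pipeline('question-answering', model=model_name, tokenizer=model_name)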

In [None]:
QA_input = {
    'question': 'How many Holders were there?',
    'context': df.iloc[0,2]
}

answer = question(QA_input)
answer

{'score': 0.0006159149343147874,
 'start': 16757,
 'end': 16763,
 'answer': '71,970'}
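A score of about 0.0006 suggests the model has very little confidence in this span, so the number returned may not answer the question at all. The context used here is the Item 1 column, whereas holders of record are normally reported under Item 5 of a 10-K. One way to avoid acting on such answers is a simple score threshold, sketched below. `accept_answer` and the 0.5 cutoff are hypothetical and would need tuning on real examples.

```python
def accept_answer(result, min_score=0.5):
    """Return the answer text if the model's score clears min_score, else None."""
    return result["answer"] if result["score"] >= min_score else None


# The result shown above falls far below the cutoff and is rejected.
low_confidence = {"score": 0.0006159149343147874, "start": 16757, "end": 16763, "answer": "71,970"}
print(accept_answer(low_confidence))
```

Rerunning the question with the Item 5 text as context (column `5`, which would be `df.iloc[0, 7]` given the column order shown earlier) may give a more relevant answer, though that has not been verified here.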

##### 3.4 Zero-Shot Classification

In [None]:
zero_shot_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

In [None]:
sequence_to_classify = df.iloc[0,2]

candidate_labels = ['environmental', 'fraud', 'regulations']

zero_shot_classifier(sequence_to_classify, candidate_labels)

{'sequence': ' Item 1. Business Overview Our mission is to give people the power to build community and bring the world closer together. All of our products, including our apps, share the vision of helping to bring the metaverse to life. We build technology that helps people connect, find communities, and grow businesses. Our useful and engaging products enable people to connect and share with friends and family through mobile devices, personal computers, virtual reality (VR) headsets, wearables, and in-home devices. We also help people discover and learn about what is going on in the world around them, enable people to share their opinions, ideas, photos and videos, and other activities with audiences ranging from their closest family members and friends to the public at large, and stay connected everywhere by accessing our products. Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the metaverse, which we believe is the nex