### NLP Final Project - Thomas Guardi

## Notebook 4

In [1]:
!pip install transformers


# Zero Shot Sentiment and Topic Classification

The zero-shot model in this notebook was used throughout the project wherever an unsupervised technique was warranted. It started as an experiment with new techniques, but the results were promising enough that I used it like the frog DNA in Jurassic Park: to fill in some of the missing pieces (labels). It was applied in two ways:
1. Sentiment analysis on Snorkel-labeled Population articles
2. Zero-shot topic classification for early exploration

In [3]:
import pandas as pd
import numpy as np
import transformers

In [20]:
# df = pd.read_csv('/content/drive/MyDrive/NLP_Final_Project/df_cleanfull.csv')
# df = pd.read_csv('/content/big_trimmed.csv',index_col=0)
# df = pd.read_csv('pop1.csv')
df = pd.read_csv('pop_labeled_news.csv')

In [21]:
df.head(5)

Unnamed: 0.1,Unnamed: 0,title_clean,text_clean,PredictedClass,Relevant
0,377,"['nine', 'dekalb', 'county', 'resident', 'test...",Also: DeKalb County region on track to qualify...,events,1
1,666,"['eye', 'illinois', 'timing', 'link’s', 'resig...",The good news: Terry Link quit his job.. The ...,events,1
2,833,"['graduated', 'income', 'tax', 'illinois', 'th...",Small businesses are the heart of our neighbor...,events,1
3,1465,"['goodbye', 'new', 'york', 'california', 'illi...","1 / 4 Goodbye, New York, California and Illino...",events,1
4,1649,"['sangamon', 'among', '93', 'illinois', 'count...","Sangamon saw its population decrease by 2,419,...",events,1


### Load and Transform Data

In [22]:
df = df[df['Relevant'] == 1]

In [23]:
title_clean = df['title_clean']
df = df['text_clean']

In [24]:
text = df.tolist()

## Load Hugging Face model
#### valhalla DistilBART trained on MNLI
Zero-shot sentiment classification is performed with a DistilBART model fine-tuned on a version of the Multi-Genre Natural Language Inference (MultiNLI) corpus, a crowd-sourced collection of 433k sentence pairs annotated with textual entailment information.
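
The idea behind NLI-based zero-shot classification can be sketched in a few lines. Each candidate label is turned into a hypothesis (the Hugging Face pipeline's default template is "This example is {}."), the NLI model scores how strongly the text entails each hypothesis, and a softmax over the entailment logits gives the label scores. The logits below are made-up numbers for illustration, not model output, and `build_hypotheses` and `single_label_scores` are hypothetical helper names.

```python
import math

def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def single_label_scores(labels, entailment_logits):
    """Softmax the per-label entailment logits so the scores sum to 1."""
    peak = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    # Sort by descending score, matching the ordering of the pipeline output
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

labels = ["positive", "negative"]
hypotheses = build_hypotheses(labels)
# Hypothetical entailment logits, one per hypothesis (NOT real model output)
ranked = single_label_scores(labels, [0.3, 1.1])
print(hypotheses)
print(ranked)
```

Because the two scores come from one softmax, they always sum to 1, which is why a top score close to 0.5 signals that the model has almost no preference between the labels.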

#### Use the Hugging Face zero-shot pipeline and load the model

In [25]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      device=0,
                      model="valhalla/distilbart-mnli-12-3")

### Add classification categories and test output

In [26]:
sequence_to_classify = text[0]
# candidate_labels = ['business', 'politics', 'sports', 'health', 'work', 'covid', 'school', 'culture', 'events']
candidate_labels = ['positive', 'negative']

results = classifier(sequence_to_classify, candidate_labels,multi_class=False)

# If more than one candidate label can be correct, pass multi_class=True to calculate each class independently:
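
As a rough sketch of what independent scoring means: rather than one softmax across all labels, each label gets its own softmax over just its contradiction and entailment logits, so the scores no longer need to sum to 1. The logits here are invented for illustration, and `independent_scores` is a hypothetical helper, not part of transformers.

```python
import math

def independent_scores(labels, contradiction_logits, entailment_logits):
    """Score each label on its own: P(entailment) against P(contradiction)."""
    scores = {}
    for label, contra, entail in zip(labels, contradiction_logits, entailment_logits):
        peak = max(contra, entail)  # subtract the max for numerical stability
        e_contra = math.exp(contra - peak)
        e_entail = math.exp(entail - peak)
        scores[label] = e_entail / (e_contra + e_entail)
    return scores

labels = ["work", "events", "covid"]
# Hypothetical logits per label (NOT real model output)
scores = independent_scores(labels, [-2.0, -0.5, 1.5], [2.5, 0.9, -1.0])
for label, score in scores.items():
    print(f"{label}: {score:.3f}")
```

Under this scheme several labels can score highly for the same text, which suits topic labels that overlap.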

In [27]:
results

{'labels': ['positive', 'negative'],
 'scores': [0.5070857405662537, 0.4929142892360687],
 'sequence': "Also: DeKalb County region on track to qualify for Phase 3 re-opening pending testing metrics By Daily Chronicle Mark Black for Shaw Media Caption Vehicles funnel into three testing bays at the Illinois run drive-through COVID-19 testing facility, managed by the Illinois National Guard, at the Chicago Premium Outlets mall in Aurora. The site quickly reached capacity within less than an hour on the second full day of operation in Aurora Apr. 24. Mark Busch – mbusch@shawmedia.com Caption Sam Cohn, a server at The Junction Eating Place, wipes down the counter Sunday at the eatery on West Lincoln Highway in DeKalb. Mark Busch file photo – mbusch@shawmedia.com Caption For the fourth time in the past five days, the DeKalb County Health Department has reported no new coronavirus cases locally, leaving the county as of Monday at 36, including one death. As a public service, Shaw Media will p

### Apply to dataset

In [28]:
texts = []
scores = []
classes = []

In [29]:
for i in range(len(text)):
  sequence_to_classify = text[i]
  # candidate_labels = ['corruption', 'climate', 'taxes', 'immigration', 'jobs']
  candidate_labels = ['positive', 'negative']
  results = classifier(sequence_to_classify, candidate_labels,multi_class=False)
  texts.append(sequence_to_classify)
  scores.append(results["scores"][:2])
  classes.append(results["labels"][:2])
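
A possible helper for pulling the prediction out of each result: the pipeline returns labels sorted by descending score, so the first label is the predicted class, but taking the maximum explicitly makes that assumption unnecessary. `top_prediction` is a hypothetical name, and the example dict copies the scores from the test output earlier in the notebook.

```python
def top_prediction(result):
    """Return (label, score) for the highest-scoring label in one result dict."""
    # Pair labels with scores and take the max explicitly, so the helper
    # does not depend on the order of the lists
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])

# Dict shaped like the pipeline output (sequence text shortened)
example = {
    "labels": ["positive", "negative"],
    "scores": [0.5070857405662537, 0.4929142892360687],
    "sequence": "Also: DeKalb County region on track to qualify...",
}
label, score = top_prediction(example)
# Gap between the best and worst label; a small gap means low confidence
margin = score - min(example["scores"])
print(label, round(score, 6), round(margin, 6))
```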
  

In [65]:
classes_df = pd.DataFrame({'Title':title_clean,'Text':texts, 'Scores':scores, 'Classes':classes})

### Check Results and make pretty

In [66]:
classes_df

Unnamed: 0,Title,Text,Scores,Classes
0,"['nine', 'dekalb', 'county', 'resident', 'test...",Also: DeKalb County region on track to qualify...,"[0.5070857405662537, 0.4929142892360687]","[positive, negative]"
1,"['eye', 'illinois', 'timing', 'link’s', 'resig...",The good news: Terry Link quit his job.. The ...,"[0.5299054384231567, 0.4700945317745209]","[negative, positive]"
2,"['graduated', 'income', 'tax', 'illinois', 'th...",Small businesses are the heart of our neighbor...,"[0.6016096472740173, 0.3983902931213379]","[negative, positive]"
3,"['goodbye', 'new', 'york', 'california', 'illi...","1 / 4 Goodbye, New York, California and Illino...","[0.5023202300071716, 0.49767979979515076]","[negative, positive]"
4,"['sangamon', 'among', '93', 'illinois', 'count...","Sangamon saw its population decrease by 2,419,...","[0.7003471851348877, 0.29965275526046753]","[negative, positive]"
...,...,...,...,...
1488,"['millennials', 'moving', 'chicago', 'city', '...","Katey Frederking, 28, used to live in Ravenswo...","[0.665336012840271, 0.334663987159729]","[negative, positive]"
1489,"['illinois', 'bad', 'governance']",Why Illinois Has Bad Governance. I think that...,"[0.5008577704429626, 0.49914222955703735]","[positive, negative]"
1490,"['poor', 'governance', 'fuel', 'illinois', 'ex...",Illinois has a population problem: Our populat...,"[0.5657886862754822, 0.43421128392219543]","[negative, positive]"
1491,"['poor', 'governance', 'fuel', 'illinois', 'ex...",Illinois has a population problem: Our populat...,"[0.5657886862754822, 0.43421128392219543]","[negative, positive]"


In [None]:
list(classes_df.iloc[0])

In [68]:
classes_df['PredictedClass'] = classes_df['Classes'].map(lambda x: x[0])

In [69]:
classes_df['TopScore'] = classes_df['Scores'].map(lambda x: x[0])
classes_df.head(5)

Unnamed: 0,Title,Text,Scores,Classes,PredictedClass,TopScore
0,"['nine', 'dekalb', 'county', 'resident', 'test...",Also: DeKalb County region on track to qualify...,"[0.5070857405662537, 0.4929142892360687]","[positive, negative]",positive,0.507086
1,"['eye', 'illinois', 'timing', 'link’s', 'resig...",The good news: Terry Link quit his job.. The ...,"[0.5299054384231567, 0.4700945317745209]","[negative, positive]",negative,0.529905
2,"['graduated', 'income', 'tax', 'illinois', 'th...",Small businesses are the heart of our neighbor...,"[0.6016096472740173, 0.3983902931213379]","[negative, positive]",negative,0.60161
3,"['goodbye', 'new', 'york', 'california', 'illi...","1 / 4 Goodbye, New York, California and Illino...","[0.5023202300071716, 0.49767979979515076]","[negative, positive]",negative,0.50232
4,"['sangamon', 'among', '93', 'illinois', 'count...","Sangamon saw its population decrease by 2,419,...","[0.7003471851348877, 0.29965275526046753]","[negative, positive]",negative,0.700347


In [3]:
### save
# classes_df.drop(['Scores','Classes'],axis=1,inplace=True)
# classes_df.to_csv('pop_sent.csv')

In [40]:
pd.set_option('display.max_colwidth', 200)

In [41]:
classes_df[['Text','PredictedClass','TopScore']].sample(10)

Unnamed: 0,Text,PredictedClass,TopScore
492,"Illinois has dropped to the sixth most populated state, with Pennsylvania taking over the fifth spot. For the past five years, big cities like Chicago, as well as other large metropolitan areas, h...",negative,0.795586
1108,"Mayor Lori E. Lightfoot today announced $700,000 in grant funding for 32 community-based organizations to support the City’s efforts in educating and engaging residents about the upcoming 2020 U.S...",positive,0.502961
14,"CoStar Group. The Jewel-Osco store in Woodlawn.. Less than a year after opening, the Jewel-Osco in Woodlawn has a new owner: the University of Chicago.. A venture connected to the university pa...",negative,0.505177
564,"There were 96 wholesale trade businesses in Illinois that had between 250 and 499 employees in 2016, according to County Business Patterns (CBP) statistics provided by the United States Census Bur...",negative,0.532154
1186,"Give the “all lives matter” crew this much: they’re dogged.. Saturday’s column dismissed the asinine retort to the powerful Black Lives Matter movement, but one reader refused to be denied:. “Th...",negative,0.67723
1198,"To supplement the loss of motor fuel tax funds Danville is losing due to people driving less with the stay at home order, Danville officials are anticipating the city will receive almost $2.2 mill...",positive,0.553051
1239,"The U.S. Census Bureau confirmed it will count the student population from now-shuttered dormitories, but some officials in large college towns remain concerned that campus closures because of the...",negative,0.500192
314,"Every presidential election cycle, voters across the country give Iowa the side-eye. Why does Iowa play such an outsize role in the nomination process? Why should its precinct caucuses fall first ...",negative,0.600035
1421,"Illinois has only 36 available, affordable rental homes for every 100 extremely low-income households; the 2020 census will determine funding for federal programs that alleviate shortage. FOR IMM...",negative,0.526087
328,Take a load off. You worked so hard. Chicago Police doing damage control after 13 cops were caught abandoning their posts. They took shelter in Bobby Rush's campaign office. During the first night...,negative,0.520867


In [79]:
import plotly.express as px
fig = px.histogram(classes_df, x='PredictedClass')
fig.show()

In [44]:
classes_df['Text'].iloc[14]

'CoStar Group.  The Jewel-Osco store in Woodlawn..  Less than a year after opening, the Jewel-Osco in Woodlawn has a new owner: the University of Chicago..  A venture connected to the university paid $19.8 million in November for the 48,000-square-foot grocery-and-drug store at 61st Street and Cottage Grove Avenue, according to a deed filed with Cook County. The venture acquired the store from the developers that built it, Chicago-based DL3 Realty and Wilmette-based Terraco Real Estate..  The store, which is leased to Jewel-Osco parent Albertsons, became a symbol of Woodlawn’s comeback when it opened last March, bringing fresh produce and other healthy food options to a South Side “food desert,” providing jobs for more than 200 people and generating momentum for a broader community-development push in the neighborhood..  The sale also validates the idea that Woodlawn offers attractive returns for investors that put their capital into commercial properties there, said DL3 Managing Partn

In [85]:
# Keep only confident predictions; drop() returns a new DataFrame, which avoids
# the SettingWithCopyWarning raised by an in-place drop on a slice
top_sentiment = classes_df[classes_df['TopScore'] > .79].drop(['Scores','Classes'], axis=1)



In [86]:
top_sentiment.shape

(226, 3)
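
With two labels scored by a single softmax, the top score is always at least 0.5, so only scores well above that carry much signal. The same threshold-and-count logic can be sketched in plain Python; `confident_counts` is a hypothetical helper, and the pairs are copied from rows shown in this notebook.

```python
from collections import Counter

def confident_counts(predictions, threshold=0.79):
    """Count predicted classes among rows whose top score exceeds the threshold."""
    kept = [label for label, score in predictions if score > threshold]
    return Counter(kept)

# (PredictedClass, TopScore) pairs taken from rows displayed in this notebook
predictions = [
    ("positive", 0.507086),
    ("negative", 0.529905),
    ("negative", 0.700347),
    ("negative", 0.986165),
    ("positive", 0.866185),
]
counts = confident_counts(predictions)
print(counts)
```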

In [87]:
top_sentiment.sample(10)

Unnamed: 0,Text,PredictedClass,TopScore
1070,"Rev. Jesse L. Jackson, Sr. and the Rainbow PUSH Coalition staff are holding an 11:30 a.m. press conference Wednesday, December 4 th , at the Village Hall (Church of the Cross), 13043 E. 2260 South...",negative,0.80336
905,"Units: Thousands of Persons , Notes: The Federal seasonally adjusts this series by using the 'statsmodel' library from Python with default parameter settings. The package uses the U.S. Bureau of t...",negative,0.911295
1176,"January 7, 2020 WAKO News. U.S. Census data released this past week indicates that the state of Illinois led the nation in population loss in the last 10 years. The state has lost nearly 160,000 ...",negative,0.986165
319,"Units: Hours per Week , Seasonally Notes: The Federal Reserve Bank of St. Louis seasonally adjusts this series by using the 'statsmodel' library from Python with default parameter settings. The pa...",negative,0.888697
952,"SPRINGFIELD — A fellow sat down next to me Saturday on a flight from San Antonio to Chicago and let out a sigh.. I asked if he was heading home, and he just shook his head and explained he is in ...",negative,0.894286
1433,"Food Insecurity On the Rise Across US, Chicago Amid COVID-19 Marissa Nelson | September 21, 2020 5:44 pm. (Courtesy of the Greater Chicago Food Depository). Food insecurity is on the rise as the...",negative,0.853633
898,"Widely overlooked in the last U.S census estimates for 2018-2019 was that the population of Chicago was at about 2,693,000. That is pretty much on par with the 2010 U.S census of 2,695,000, though...",negative,0.859863
32,". As part of Governor JB Pritzker's historic $45 billion capital program, the Department of Commerce and Economic Opportunity (DCEO) announced on Jan. 27th a new initiative to invest $12 million ...",positive,0.866185
463,"The coronavirus outbreak has put a damper on the 2020 census, which now has been extended until April of 2021.. Sherrie Taylor, the Interim Lead of the Illinois Data Center Network for the census...",negative,0.868127
509,"An analysis of population trends now suggests Illinois will only lose one seat in Congress. Illinois has lost tens of thousands in total population count since 2010, when the last U.S. Census coun...",negative,0.935819


In [88]:
top_sentiment[top_sentiment['PredictedClass']== 'positive']

Unnamed: 0,Text,PredictedClass,TopScore
32,". As part of Governor JB Pritzker's historic $45 billion capital program, the Department of Commerce and Economic Opportunity (DCEO) announced on Jan. 27th a new initiative to invest $12 million ...",positive,0.866185
52,12 hours ago | Tom Houghton. Chicago wheat futures rallied to their highest level in five years Tuesday as a cocktail of dry weather in key growing... To read this article and see the full servic...,positive,0.880162
63,20 City of Chicago Grants Funds to Boost Census Participation City of Chicago Grants Funds to Boost Census Participation City of Chicago Grants Funds to Boost Census Participation. Mayor Lori E....,positive,0.876394
123,Paint the City appears headed for even greater heights as it has been included in the City of Chicago’s “Boards of Change” project–a civic engagement initiative encouraging local participation in ...,positive,0.817152
127,Paint the City appears headed for even greater heights as it has been included in the City of Chicago's 'Boards of Change' project'a civic engagement initiative encouraging local participation in ...,positive,0.975868
251,Description Mercer has an exciting opportunity available and is recruiting for a solid and experienced Senior Retirement Consultant.What can you expect?If you like the idea of helping clients achi...,positive,0.799844
581,29 Education Legislative Latino Caucus Foundation to Hold Annual Conference Legislative Latino Caucus Foundation to Hold Annual Conference Education | Comments Off on Legislative Latino Caucus Fo...,positive,0.810283
605,". Alec Blanc of Monarch Advisors announced the successful closing of a bank refinance for a 99- central Illinois. April closings have been few and far between, so, well done Mr. Blanc.. Built in...",positive,0.960814
865,"Wheat commentary: Chicago hits 4 1/2 month high as rally persists 1 hour ago. The roll into a new month saw no let-up in the recent risk-on attitude for wheat, with the Chicago contract leaping.....",positive,0.88079
893,"Message of unity ahead of Memorial Day, and effort to bridge gap between Chicago's south and west sides WLS Share: Share: CHICAGO (WLS) -- Now is usually a popular time for communities to focus on...",positive,0.812772


In [84]:
import plotly.express as px
fig = px.histogram(top_sentiment, x='PredictedClass',title='Confident (80%) Sentiment on Target Articles')
fig.show()

In [89]:
classes_df.to_csv('pop_specific_sent.csv')

In [91]:
classes_df

Unnamed: 0,Text,Scores,Classes,PredictedClass,TopScore
0,Also: DeKalb County region on track to qualify for Phase 3 re-opening pending testing metrics By Daily Chronicle Mark Black for Shaw Media Caption Vehicles funnel into three testing bays at the Il...,"[0.5070857405662537, 0.4929142892360687]","[positive, negative]",positive,0.507086
1,"The good news: Terry Link quit his job.. The bad news: Not fast enough.. Let’s review: On Aug. 13, federal prosecutors charged then-state Sen. Link, D-Indian Creek, with filing a false income ta...","[0.5299054384231567, 0.4700945317745209]","[negative, positive]",negative,0.529905
2,Small businesses are the heart of our neighborhoods. Jobs that are created by small businesses are what keep our communities thriving. My name is Chris Plywacz and I am the proud owner of Reeg Plu...,"[0.6016096472740173, 0.3983902931213379]","[negative, positive]",negative,0.601610
3,"1 / 4 Goodbye, New York, California and Illinois. Hello … Where? (Bloomberg Opinion) -- New York, California and Illinois have been hemorrhaging residents. Almost 3.2 million more people left thos...","[0.5023202300071716, 0.49767979979515076]","[negative, positive]",negative,0.502320
4,"Sangamon saw its population decrease by 2,419, or 1.2 percent, over the last decade, according to a new analysis from Wirepoints.org that is based on U.S. Census Bureau data.. Sangamon recorded t...","[0.7003471851348877, 0.29965275526046753]","[negative, positive]",negative,0.700347
...,...,...,...,...,...
1488,"Katey Frederking, 28, used to live in Ravenswood. But she listed her condo two weeks ago, and is looking to move to the suburbs for her health and a yard for her dogs. She said COVID-19 was a cata...","[0.665336012840271, 0.334663987159729]","[negative, positive]",negative,0.665336
1489,"Why Illinois Has Bad Governance. I think that RNC member Richard Porter’s piece at RealClearPolitics about how bad governance is driving the “Illinois Exodus”, Illinois’s constant and large loss ...","[0.5008577704429626, 0.49914222955703735]","[positive, negative]",positive,0.500858
1490,Illinois has a population problem: Our population has been shrinking faster than any other state (except one). The Chicago Tribune Editorial Board and others call this the Illinois Exodus. Populat...,"[0.5657886862754822, 0.43421128392219543]","[negative, positive]",negative,0.565789
1491,Illinois has a population problem: Our population has been shrinking faster than any other state (except one). The Chicago Tribune Editorial Board and others call this the Illinois Exodus. Populat...,"[0.5657886862754822, 0.43421128392219543]","[negative, positive]",negative,0.565789


In [31]:
df

0       Also: DeKalb County region on track to qualify...
1       The good news: Terry Link quit his job..  The ...
2       Small businesses are the heart of our neighbor...
3       1 / 4 Goodbye, New York, California and Illino...
4       Sangamon saw its population decrease by 2,419,...
                              ...                        
1488    Katey Frederking, 28, used to live in Ravenswo...
1489    Why Illinois Has Bad Governance.  I think that...
1490    Illinois has a population problem: Our populat...
1491    Illinois has a population problem: Our populat...
1492    By Press release submission | Aug 14, 2020.  I...
Name: text_clean, Length: 1493, dtype: object

In [30]:
import spacy
nlp = spacy.load("en_core_web_sm")

Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY


In [56]:
entities = []
labels = []

# Run spaCy NER over every article
for i in range(len(text)):
    doc = nlp(str(text[i]))

    # Collect each entity span and its label
    for ent in doc.ents:
        entities.append(ent)
        labels.append(ent.label_)

In [63]:
from collections import Counter
Counter(entities).most_common(20)

[(DeKalb County, 1),
 (Phase 3, 1),
 (Daily, 1),
 (Shaw Media Caption Vehicles, 1),
 (three, 1),
 (Illinois, 1),
 (the Illinois National Guard, 1),
 (the Chicago Premium Outlets mall, 1),
 (Aurora, 1),
 (less than an hour, 1),
 (the second full day, 1),
 (Aurora Apr. 24, 1),
 (Mark Busch, 1),
 (Caption Sam Cohn, 1),
 (The Junction Eating Place, 1),
 (Sunday, 1),
 (West Lincoln Highway, 1),
 (DeKalb, 1),
 (Mark Busch, 1),
 (fourth, 1)]
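
Every count above is 1, even though Mark Busch appears twice, because the list holds spaCy span objects rather than strings and each span is treated as a distinct item. One way around this, sketched here on plain strings, is to count the entity text instead (in the loop that would mean appending `ent.text`, a standard spaCy attribute). The sample strings are taken from the output above, and `count_entities` is a hypothetical helper.

```python
from collections import Counter

def count_entities(entity_texts, top_n=20):
    """Count entities by their text so repeated mentions are merged."""
    cleaned = [text.strip() for text in entity_texts if text.strip()]
    return Counter(cleaned).most_common(top_n)

# Entity strings as they appear in the output above
entity_texts = ["DeKalb County", "Illinois", "Aurora", "Mark Busch", "DeKalb", "Mark Busch"]
top = count_entities(entity_texts, top_n=3)
print(top)
```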

In [57]:
dataframe = pd.DataFrame({'Entities':entities, 'Labels':labels})
dataframe.head(10)

Unnamed: 0,Entities,Labels
0,"(DeKalb, County)",GPE
1,"(Phase, 3)",FAC
2,(Daily),DATE
3,"(Shaw, Media, Caption, Vehicles)",ORG
4,(three),CARDINAL
5,(Illinois),GPE
6,"(the, Illinois, National, Guard)",ORG
7,"(the, Chicago, Premium, Outlets, mall)",EVENT
8,(Aurora),GPE
9,"(less, than, an, hour)",TIME


# Topic Classification Zero Shot

In [None]:
# oneshot_df['TopScore'] = oneshot_df['Scores'].map(lambda x: x[0])

In [None]:
# oneshot_df = pd.read_csv('/content/drive/MyDrive/NLP_Final_Project/oneshot_df2.csv')

In [None]:
# oneshot_df.head(5)

In [None]:
# oneshot_df['PredictedClass'].value_counts()

In [None]:
oneshot_df['TopScore'].value_counts()

0.998427    127
0.991434    101
0.987254     98
0.963435     97
0.930353     96
           ... 
0.864862      1
0.864860      1
0.864859      1
0.864857      1
0.990216      1
Name: TopScore, Length: 206720, dtype: int64

In [None]:
oneshot_df['TopScore'].describe()

count    290688.000000
mean          0.904727
std           0.139530
min           0.000689
25%           0.889132
50%           0.958668
75%           0.986612
max           0.999885
Name: TopScore, dtype: float64

In [None]:
top = oneshot_df[oneshot_df['TopScore'] > .98]
top.sample(5)

Unnamed: 0.1,Unnamed: 0,Title,Scores,Classes,PredictedClass,TopScore
20280,20280,Illinois Department of Transportation To Resur...,"[0.9890256524085999, 0.8027991056442261, 0.695...","['work', 'events', 'business', 'covid']",work,0.989026
44033,44033,Girl killed when car fleeing Chicago police cr...,"[0.9849066734313965, 0.7408555150032043, 0.206...","['events', 'covid', 'culture', 'work']",events,0.984907
166154,166154,"Matt Magalis earns $126,100 working for the Il...","[0.9979705214500427, 0.9371009469032288, 0.612...","['work', 'business', 'events', 'covid']",work,0.997971
137501,137501,Pre-Game Thread 11/03/2019 5pm PST - Chicago B...,"[0.9935423731803894, 0.9102787375450134, 0.482...","['sports', 'events', 'covid', 'work']",sports,0.993542
199881,199881,Clint Gleckler earns 31 percent more during 20...,"[0.9974560737609863, 0.9888573288917542, 0.925...","['work', 'business', 'events', 'culture']",work,0.997456


In [None]:
top[top['PredictedClass']=='school']

Unnamed: 0.1,Unnamed: 0,Title,Scores,Classes,PredictedClass,TopScore
89,89,Teachers in face masks? Staggered attendance? ...,"[0.9888619184494019, 0.9667844176292419, 0.939...","['school', 'events', 'work', 'culture']",school,0.988862
189,189,Teachers in face masks? Staggered attendance? ...,"[0.9888619184494019, 0.9667844176292419, 0.939...","['school', 'events', 'work', 'culture']",school,0.988862
461,461,"Lecturer of Physics, Department of Physics – N...","[0.9920215010643005, 0.8979613184928894, 0.751...","['school', 'work', 'covid', 'events']",school,0.992022
585,585,Scholarship enables criminal justice student J...,"[0.9838278889656067, 0.9723213315010071, 0.780...","['school', 'work', 'events', 'covid']",school,0.983828
602,602,Chicago Public Schools Counts 85 Coronavirus C...,"[0.9859746694564819, 0.9323989748954773, 0.877...","['school', 'health', 'events', 'work']",school,0.985975
...,...,...,...,...,...,...
290410,290410,Here’s How Chicago Schools Voted On Police In ...,"[0.9813989400863647, 0.9215110540390015, 0.885...","['school', 'events', 'culture', 'covid']",school,0.981399
290512,290512,Mies van der Rohe's workhorse tower gets a vib...,"[0.9917550086975098, 0.9871726632118225, 0.946...","['school', 'work', 'culture', 'events']",school,0.991755
290560,290560,The University of Chicago Booth School of Busi...,"[0.9909126162528992, 0.9797205328941345, 0.658...","['school', 'business', 'covid', 'work']",school,0.990913
290629,290629,Illinois Central College Culinary Arts Receive...,"[0.9931992888450623, 0.9225084781646729, 0.914...","['school', 'work', 'covid', 'events']",school,0.993199


In [None]:
top['PredictedClass'].value_counts()

sports      25591
events      23138
work        19488
business    13310
politics     5397
school       2655
health       2344
culture      2219
covid         314
Name: PredictedClass, dtype: int64

In [None]:
df = pd.read_csv('/content/title_classes_df.csv',index_col=0)

In [None]:
df['PredictedClass'].value_counts()


negative    90737
positive    81936
Name: PredictedClass, dtype: int64

In [None]:
df.sample(5)

Unnamed: 0,Title,Scores,Classes,PredictedClass,TopScore
28382,Jussie Smollett and his 'attacker' 'had a sexu...,"[0.8213415741920471, 0.17865844070911407]","['negative', 'positive']",negative,0.821342
94946,Nike Air Flight 89 Dressed in the Chicago Flag...,"[0.7324628829956055, 0.26753711700439453]","['positive', 'negative']",positive,0.732463
51084,“Illinois Nazis” Trending Topic Is More Than J...,"[0.9454604387283325, 0.054539527744054794]","['negative', 'positive']",negative,0.94546
90856,Surprise! Not all Chicago parks are closed. (B...,"[0.5898494124412537, 0.41015058755874634]","['negative', 'positive']",negative,0.589849
50306,"Ricardo Rivera earns $98,400 working for the I...","[0.5971139669418335, 0.4028860032558441]","['positive', 'negative']",positive,0.597114


In [None]:
list(df.loc[122740])

['EYE ON ILLINOIS: Don’t sleep on importance of Illinois Supreme Court races',
 '[0.6358052492141724, 0.36419475078582764]',
 "['negative', 'positive']",
 'negative',
 0.6358052492141724]

In [None]:
df[df['TopScore'] > .9]

Unnamed: 0,Title,Scores,Classes,PredictedClass,TopScore
0,The Illinois Department of Public Health Annou...,"[0.932061493396759, 0.06793850660324097]","['negative', 'positive']",negative,0.932061
4,Developer behind botched Chicago demolition fa...,"[0.9682002663612366, 0.031799715012311935]","['negative', 'positive']",negative,0.968200
7,Southern Illinois counties deal with funding s...,"[0.9539268612861633, 0.04607318341732025]","['negative', 'positive']",negative,0.953927
8,bankruptcy auction Chilicoti Illinois United S...,"[0.9123339056968689, 0.0876661166548729]","['negative', 'positive']",negative,0.912334
11,bankruptcy real estate Oak Park Illinois Unite...,"[0.9900885224342346, 0.009911508299410343]","['negative', 'positive']",negative,0.990089
...,...,...,...,...,...
172658,Chicago convenience store ransacked twice sinc...,"[0.9299383759498596, 0.07006166875362396]","['negative', 'positive']",negative,0.929938
172659,"As Chicago Nears Panic, Lightfoot’s Partisan B...","[0.9953692555427551, 0.004630750045180321]","['negative', 'positive']",negative,0.995369
172664,Longtime Illinois state senator hit with feder...,"[0.9149653315544128, 0.08503469079732895]","['negative', 'positive']",negative,0.914965
172668,Illinois governor says if you go to Missouri ‘...,"[0.9751619100570679, 0.02483810856938362]","['negative', 'positive']",negative,0.975162
