# Finding reporting based on assumed facts 

## Approaches 
### 1. Search for nouns + dates
- Works well for most cases (see __title_nouns__).
- Does not work when the title uses pronouns instead of nouns.
```
__Sample Output__
ORIGINAL : Heavy shelling in Kharkiv again
NOUN : [kharkiv]
```

### 2. Extracting keywords with spaCy
- Also picks up adjectives and verb-derived nouns such as "shelling" (see __title_hotwords__).
```
__Sample Output__ 
ORIGINAL : Heavy shelling in Kharkiv again
NOUN : ['heavy', 'shelling', 'kharkiv']
```


------------

# Necessary Imports

In [1]:
import nltk

In [2]:
nltk.download('punkt')
nltk.download('brown')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\tbnc\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package brown to
[nltk_data]     C:\Users\tbnc\AppData\Roaming\nltk_data...
[nltk_data]   Package brown is already up-to-date!


True

--------------------------------------------------------------------------------------------------------------------

# Experiments

In [3]:
import pandas as pd
from textblob import TextBlob

In [4]:
gold = pd.read_csv('../data/processed/input-data.csv',parse_dates=['time_parsed'])

In [52]:
gold.iloc[15]['title']

'Ukrainian FM Kuleba:In a call with @SecBlinken we coordinated further support for Ukraine. We both agree that more needs to be done to stop Russian aggression and hold Russia accountable for its crimes. Grateful to the U.S. for firmly standing by the people of Ukraine. Ukraine will prevail'

In [5]:
gold.dropna(subset=['title'],inplace=True)
gold.head()


Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,uid,title,time,source,time-capture,sc,domain,full-text,sc2,geo,time_parsed
0,0,0,405ca5ff-f11a-42bf-8ba7-4856c9c88921,Czech Prime Minister: The Russian army committ...,a few seconds ago,https://twitter.com/AJABreaking/status/1510630...,2022-04-03 10:54:54.486609,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\na few sec...,False,50°5′N 14°25′E,2022-04-03 10:55:03
1,1,1,58755ebd-e61e-4311-9c2c-4a088a736ec9,Heavy shelling in Kharkiv again,7 minutes ago,https://t.me/kharkivlife/33583,2022-04-03 10:54:59.073486,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n8 minutes...,False,50°2′N 36°15′E,2022-04-03 10:48:03
2,0,0,0ca7c5db-1bc6-47a5-9551-03b5ad6916be,Artillery command post vehicle was destroyed i...,4 hours ago,https://twitter.com/shtirlitz53/status/1503127...,2022-03-13 21:33:04.567304,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n4 hours a...,False,51°6′N 31°29′E,2022-03-13 17:38:34
3,4,4,5f670b8d-94d6-48df-b643-a8f1d0ad776e,Ukrainian army shotdown Tu-243 UAV with MANPADS,5 hours ago,https://t.me/operativnoZSU/13366,2022-03-13 21:33:25.411086,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n5 hours a...,False,51°16′N 29°10′E,2022-03-13 16:38:34
4,5,5,9e4ca1eb-1391-4452-b144-221afc5809c3,"According to the Financial Times, Russia asked...",6 hours ago,https://twitter.com/LukaszBok/status/150309682...,2022-03-13 21:33:29.963811,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n6 hours a...,False,55°45′N 37°37′E,2022-03-13 15:38:34


## Getting Nouns

In [6]:
def get_nouns(text):
    blob = TextBlob(text)
    return list(blob.noun_phrases)

In [7]:
gold['title_nouns'] = gold['title'].apply(get_nouns)

In [8]:
demo = gold.iloc[1]['title']

In [9]:
## Getting Hotwords

In [10]:
import spacy
from collections import Counter
from string import punctuation

In [11]:
nlp = spacy.load("en_core_web_md")

In [12]:
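def get_hotwords(text):
    """Return the lowercased proper nouns, adjectives and nouns found in text."""
    result = []
    pos_tag = ['PROPN', 'ADJ', 'NOUN']  # part-of-speech tags to keep
    doc = nlp(text.lower())  # lowercase, then tokenize and tag with spaCy
    for token in doc:
        # skip stop words and punctuation tokens
        if token.text in nlp.Defaults.stop_words or token.text in punctuation:
            continue
        # keep only tokens whose tag is in the allowed list
        if token.pos_ in pos_tag:
            result.append(token.text)
    return result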

In [13]:
gold['title_hotwords'] = gold['title'].apply(get_hotwords)

In [14]:
gold

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,uid,title,time,source,time-capture,sc,domain,full-text,sc2,geo,time_parsed,title_nouns,title_hotwords
0,0,0,405ca5ff-f11a-42bf-8ba7-4856c9c88921,Czech Prime Minister: The Russian army committ...,a few seconds ago,https://twitter.com/AJABreaking/status/1510630...,2022-04-03 10:54:54.486609,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\na few sec...,False,50°5′N 14°25′E,2022-04-03 10:55:03,"[czech prime, russian army, war crimes, ukraine]","[czech, prime, minister, russian, army, war, c..."
1,1,1,58755ebd-e61e-4311-9c2c-4a088a736ec9,Heavy shelling in Kharkiv again,7 minutes ago,https://t.me/kharkivlife/33583,2022-04-03 10:54:59.073486,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n8 minutes...,False,50°2′N 36°15′E,2022-04-03 10:48:03,[kharkiv],"[heavy, shelling, kharkiv]"
2,0,0,0ca7c5db-1bc6-47a5-9551-03b5ad6916be,Artillery command post vehicle was destroyed i...,4 hours ago,https://twitter.com/shtirlitz53/status/1503127...,2022-03-13 21:33:04.567304,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n4 hours a...,False,51°6′N 31°29′E,2022-03-13 17:38:34,"[artillery, command post vehicle, ukrainian, d...","[artillery, command, post, vehicle, ukrainian,..."
3,4,4,5f670b8d-94d6-48df-b643-a8f1d0ad776e,Ukrainian army shotdown Tu-243 UAV with MANPADS,5 hours ago,https://t.me/operativnoZSU/13366,2022-03-13 21:33:25.411086,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n5 hours a...,False,51°16′N 29°10′E,2022-03-13 16:38:34,"[ukrainian, army shotdown, tu-243 uav, manpads]","[ukrainian, army, uav, manpads]"
4,5,5,9e4ca1eb-1391-4452-b144-221afc5809c3,"According to the Financial Times, Russia asked...",6 hours ago,https://twitter.com/LukaszBok/status/150309682...,2022-03-13 21:33:29.963811,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n6 hours a...,False,55°45′N 37°37′E,2022-03-13 15:38:34,"[according, financial, russia, china, military...","[financial, times, russia, china, military, eq..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2916,67,67,327a4c38-904a-4dae-a5b3-1dd05b83bb3a,Ukrainian military captured 2 Russian servicem...,9 hours ago,https://twitter.com/armedforcesukr/status/1497...,2022-02-26 18:28:27.136903,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n9 hours a...,False,51°38′N 31°12′E,2022-02-26 09:28:40,"[ukrainian, russian servicemen, chernihiv, uni...","[ukrainian, military, russian, servicemen, che..."
2917,68,68,0242830b-8b88-40c5-b388-1d0d0c606ecd,Russia has fired over 250 mostly short-range b...,9 hours ago,https://twitter.com/LucasFoxNews/status/149758...,2022-02-26 18:28:31.468294,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n9 hours a...,False,38°52′N 77°3′W,2022-02-26 09:28:40,"[russia, short-range ballistic missiles, ukrai...","[russia, short, range, ballistic, missiles, uk..."
2918,69,69,e942d6c3-2282-4523-b817-94e453af0eeb,"In Przemys, about fifteen kilometers from the ...",9 hours ago,https://twitter.com/Le_Figaro/status/149758377...,2022-02-26 18:28:35.920402,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n9 hours a...,False,49°47′N 22°46′E,2022-02-26 09:28:40,"[przemys, ukrainian, rail security, extra acco...","[przemys, kilometers, ukrainian, border, army,..."
2919,0,0,ab69c083-9d9b-4f53-bfd0-d8f2ba0fb579,"Kharkiv, Kharkivska Oblast(16:27). Red Alert: ...",25 minutes ago,https://t.me/suspilnekharkiv/10093,2022-03-23 10:53:09.129183,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\r\nTell friends\r\n25 minute...,False,49°30′N 36°30′E,2022-03-23 10:28:17,"[kharkiv, kharkivska oblast, alert, aerial thr...","[kharkiv, kharkivska, oblast(16:27, red, alert..."


# Search Experimenting

In [24]:
from newspaper import Article

## 1

`'artillery', 'command', 'post', 'vehicle', 'ukrainian', 'drone', 'strike' after:2022-03-13 before:2022-03-14`

In [50]:
print(gold.iloc[2]['time_parsed'])
print(gold.iloc[2]['title_hotwords'])
print(gold.iloc[2]['title'])


2022-03-13 17:38:34
['artillery', 'command', 'post', 'vehicle', 'ukrainian', 'drone', 'strike']
Artillery command post vehicle was destroyed in Ukrainian drone strike
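
The search string above was assembled by hand from `title_hotwords` and `time_parsed`. A minimal sketch of how that step might be automated follows. The helper name `build_search_query` and the `days_after` parameter are hypothetical (they are not part of the notebook), and the keywords are joined with plain spaces rather than the quoted, comma-separated form used above.

```python
from datetime import datetime, timedelta


def build_search_query(hotwords, timestamp, days_after=1):
    """Join keywords and append after:/before: date operators.

    Hypothetical helper: `days_after` is an assumed window size.
    """
    # Drop repeated keywords while keeping their original order.
    unique_words = list(dict.fromkeys(hotwords))
    # Works for datetime objects and pandas Timestamps alike.
    start = timestamp.date()
    end = start + timedelta(days=days_after)
    keywords = " ".join(unique_words)
    return f"{keywords} after:{start.isoformat()} before:{end.isoformat()}"


query = build_search_query(
    ['artillery', 'command', 'post', 'vehicle', 'ukrainian', 'drone', 'strike'],
    datetime(2022, 3, 13, 17, 38, 34),
)
print(query)
```

With the notebook's dataframe, this could be called as `build_search_query(gold.iloc[2]['title_hotwords'], gold.iloc[2]['time_parsed'])`.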


#### Dailymail dailymail.co.uk

In [37]:

dailymail = 'https://www.dailymail.co.uk/news/article-10609367/Ukraine-army-wipes-Russian-armoured-vehicles-command-centre.html'
dailymail_a = Article(dailymail)
dailymail_a.download()
dailymail_a.parse()
print(dailymail_a.authors)
print('----')
print(dailymail_a.publish_date)
print('----')
print(dailymail_a.text)
print('----')
dailymail_a.nlp()
print('----')
print(dailymail_a.summary)
print('----')






['Katie Feehan', 'Katie Feehan For Mailonline']
----
2022-03-14 01:21:02+00:00
----
Newly released footage filmed on defender drones shows the Ukrainian army wiping out multiple Russian armoured vehicles and a command centre as the war enters its 18th day.

The first video was shared online on Sunday morning and appears to show rockets being fired at three armoured vehicles in the Ukrainian south-eastern city of Mariupol.

The aerial footage, which was circulated by several unverified accounts, shows a BTR-82 APC and KamAZ-63968 'Typhoon' vehicle being targeted successfully. It is not clear when the strike took place.

Newly released footage filmed on defender drones shows the Ukrainian army wiping out multiple Russian armoured vehicles in Mariupol (pictured) and a command centre in Vasylivka

The footage was shared online by unverified accounts and showed armoured vehicles hit

Pictured: Plumes of thick grey smoke were seen billowing into the sky after the drone strike

The BTR-82A is

In [39]:
businessInsider = 'https://www.businessinsider.com/videos-purport-to-show-ukriane-bayraktar-strikes-on-russia-vehicles-2022-3'
businessInsider_a = Article(businessInsider)
businessInsider_a.download()
businessInsider_a.parse()
print(businessInsider_a.authors)
print('----')
print(businessInsider_a.publish_date)
print('----')
print(businessInsider_a.text)
print('----')
businessInsider_a.nlp()
print('----')
print(businessInsider_a.summary)
print('----')

['Mia Jankowicz']
----
2022-03-14 00:00:00
----
Ukrainian armed forces released several videos it says show drone strikes on Russian targets.

The Bayraktar TB2 drone has taken an outsize role in defending Ukraine.

Ukrainian forces did not name the locations of the hits, which are difficult to verify.

Sign up for our weekday newsletter, packed with original analysis, news, and trends — delivered right to your inbox. Loading Something is loading. Email address By clicking ‘Sign up’, you agree to receive marketing emails from Insider as well as other partner offers and accept our Terms of Service and Privacy Policy

Ukraine's military published several videos it says show its prized Bayraktar TB2 drones at work destroying targets controlled by Russian forces.

Over the weekend, the Ukrainian army commander-in-chief's Facebook page posted five clips showing the strikes, though giving little detail of the exact locations or targets.

The drones have been hailed as a game-changer by the U
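
The download, parse and summarise steps are repeated verbatim for each source, so they could be wrapped in one function. The sketch below assumes the `newspaper` package used above; the name `fetch_article` and the `article_cls` parameter are hypothetical and exist only so the function can be exercised without network access.

```python
def fetch_article(url, article_cls=None):
    """Download one article and return its main fields as a dict.

    Hypothetical helper: `article_cls` defaults to newspaper.Article and
    can be swapped for a stand-in class when testing offline.
    """
    if article_cls is None:
        from newspaper import Article as article_cls
    article = article_cls(url)
    article.download()
    article.parse()
    article.nlp()  # needs the nltk 'punkt' data downloaded earlier
    return {
        'url': url,
        'authors': article.authors,
        'publish_date': article.publish_date,
        'text': article.text,
        'summary': article.summary,
    }


class _OfflineArticle:
    """Stand-in with the same methods, used only to demonstrate the call."""

    def __init__(self, url):
        self.url = url
        self.authors = []
        self.publish_date = None
        self.text = ''
        self.summary = ''

    def download(self):
        pass

    def parse(self):
        self.text = 'body text'

    def nlp(self):
        self.summary = 'short summary'


demo_result = fetch_article('https://example.com/story', article_cls=_OfflineArticle)
print(demo_result['summary'])
```

In the notebook itself the call would simply be `fetch_article(dailymail)` or `fetch_article(businessInsider)`.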

## 2
`'major', 'internet', 'disruption', 'vinasterisk', 'network', 'vinnytsia', 'oblast', 'western', 'ukraine', 'operator', 'massive', 'cyberattack', 'elements', 'sabotage' after:2022-03-13 before:2022-03-15`

In [41]:
print(gold.iloc[5]['time_parsed'])
print(gold.iloc[5]['title_hotwords'])

2022-03-13 15:38:34
['major', 'internet', 'disruption', 'vinasterisk', 'network', 'vinnytsia', 'oblast', 'western', 'ukraine', 'operator', 'massive', 'cyberattack', 'elements', 'sabotage']


## 3
### The dataset is messy (this item has nothing to do with Ukraine), but thankfully the `geo` column exposes it

`'victim', 'karrington', 'smith', 'local', 'hospital', 'child', 'weeks', 'pregnant' after:2022-03-13 before:2022-03-15`


In [46]:
print(gold.iloc[20]['time_parsed'])
print(gold.iloc[20]['title_hotwords'])
print(gold.iloc[20]['title'])
print(gold.iloc[20]['geo'])


2022-03-13 13:38:34
['victim', 'karrington', 'smith', 'local', 'hospital', 'child', 'weeks', 'pregnant']
The victim, Karrington Smith, 17, was taken to a local hospital where she and her child later died. She was 25 weeks pregnant
30°23′N 91°3′W
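
The `geo` value printed above has a western longitude, which is what gives this row away. A minimal sketch of turning that check into code follows. The names `parse_geo`, `near_ukraine` and `UKRAINE_BOX` are hypothetical, and the bounding box is a rough, illustrative assumption rather than an official boundary.

```python
import re

# Matches strings such as "50°2′N 36°15′E" (degrees and minutes only).
GEO_PATTERN = re.compile(r"(\d+)°(\d+)[′']([NS])\s+(\d+)°(\d+)[′']([EW])")

# Assumed rough box: (min_lat, max_lat, min_lon, max_lon) in decimal degrees.
UKRAINE_BOX = (44.0, 53.0, 22.0, 41.0)


def parse_geo(geo):
    """Convert a degrees/minutes string to (lat, lon), or None if unparseable."""
    match = GEO_PATTERN.fullmatch(geo.strip())
    if match is None:
        return None
    lat_d, lat_m, lat_h, lon_d, lon_m, lon_h = match.groups()
    lat = int(lat_d) + int(lat_m) / 60
    lon = int(lon_d) + int(lon_m) / 60
    # Southern latitudes and western longitudes are negative.
    if lat_h == 'S':
        lat = -lat
    if lon_h == 'W':
        lon = -lon
    return lat, lon


def near_ukraine(geo, box=UKRAINE_BOX):
    """Return True when the coordinates fall inside the assumed box."""
    coords = parse_geo(geo)
    if coords is None:
        return False
    lat, lon = coords
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


print(near_ukraine('30°23′N 91°3′W'))
print(near_ukraine('50°2′N 36°15′E'))
```

Rows located in Washington or Brussels would also fall outside this box even though they relate to the war, so the result is better used as a flag for manual review than as a hard filter.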


## 4
`'president', 'joe', 'biden', 'brussels', 'belgium', 'nato', 'summit', 'thursday', 'ongoing', 'deterrence', 'defense', 'efforts', 'response', 'russia', 'unprovoked', 'unjustified', 'attack', 'ukraine', 'ironclad', 'commitment', 'nato', 'psaki' after:2022-03-15 before:2022-03-17`

- Sometimes the event is only mentioned in passing, as background to a different story.

In [47]:
print(gold.iloc[100]['time_parsed'])
print(gold.iloc[100]['title_hotwords'])
print(gold.iloc[100]['title'])
print(gold.iloc[100]['geo'])


2022-03-15 13:11:51
['president', 'joe', 'biden', 'brussels', 'belgium', 'nato', 'summit', 'thursday', 'ongoing', 'deterrence', 'defense', 'efforts', 'response', 'russia', 'unprovoked', 'unjustified', 'attack', 'ukraine', 'ironclad', 'commitment', 'nato', 'psaki']
President Joe Biden will travel to Brussels, Belgium to for a NATO Summit next Thursday "to discuss ongoing deterrence and defense efforts in response to Russia's unprovoked & unjustified attack on Ukraine as well as to refer reaffirm our ironclad commitment to NATO." -Jen Psaki
38°53′N 77°2′W


In [49]:

politico = 'https://www.politico.com/news/2022/03/17/white-house-bidens-covid-scares-00018384'
politico_a = Article(politico)
politico_a.download()
politico_a.parse()
print(politico_a.authors)
print('----')
print(politico_a.publish_date)
print('----')
print(politico_a.text)
print('----')
politico_a.nlp()
print('----')
print(politico_a.summary)
print('----')






[]
----
2022-03-17 00:00:00
----
Psaki was pressed repeatedly during her daily briefing on whether the president would be taking another Covid test this week, and why Biden isn’t tested daily. The questions followed second gentleman Doug Emhoff’s positive test, and then Irish Prime Minister Micheál Martin’s Covid case, which on Wednesday night caused him to leave in the middle of the Irish Funds Gala, where Biden and House Speaker Nancy Pelosi were in attendance.


States across the U.S., as well as Washington, D.C., have lifted restrictions in recent weeks as Covid cases dropped. The White House eased its mask mandate in early March.

“He was not tested today. He was tested last Sunday. Neither of these individuals were considered close contacts,” Psaki said, referring to Emhoff and Martin defending the White House testing protocol. She added that everyone is tested before meeting with the president.

“The testing modes are a little bit different around here because we are around the 

----------------

# References 

|Ref|Link|
|-|-|
|Hotwords, spaCy|https://betterprogramming.pub/extract-keywords-using-spacy-in-python-4a8415478fbf|
|Google Time Search|https://twitter.com/searchliaison/status/1115707059998052354|


