# Finding reporting based on assumed facts 

## Approaches 
### 1. Search for nouns + dates
- Works well for most cases (see __title_nouns__).
- Does not work when the title uses pronouns instead of naming the subject.
```
__Sample Output__
ORIGINAL : Heavy shelling in Kharkiv again
NOUN : [kharkiv]
```

### 2. Extracting Keywords from SpaCy
- Works well for verb-like words too: it keeps adjectives and nouns such as "heavy" and "shelling", not just place names.
```
__Sample Output__ 
ORIGINAL : Heavy shelling in Kharkiv again
NOUN : ['heavy', 'shelling', 'kharkiv']
```


------------

# Necessary Imports

In [1]:
import nltk

In [2]:
nltk.download('punkt')
nltk.download('brown')

[nltk_data] Downloading package punkt to /Users/caochao/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package brown to /Users/caochao/nltk_data...
[nltk_data]   Package brown is already up-to-date!


True

--------------------------------------------------------------------------------------------------------------------

# Experiments

In [3]:
import pandas as pd
from textblob import TextBlob

In [4]:
gold = pd.read_csv('../data/processed/input-data.csv',parse_dates=['time_parsed'])

In [5]:
gold.iloc[15]['title']

'The An-26 aircraft of the Russian Air Forces crashed in the Voronezh region, the crew died'

In [6]:
gold.dropna(subset=['title'],inplace=True)
gold.head()


Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,uid,title,time,source,time-capture,sc,domain,full-text,sc2,geo,time_parsed
0,0,0,3c73b045-6851-43e5-b26c-f603492a4b12,"According to the Joint Forces Command, in the ...",17 minutes ago,https://twitter.com/AlexKhrebet/status/1496864...,2022-02-24 10:24:47.074628,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n17 minutes ag...,False,51°40′N 33°55′E,2022-02-24 10:08:47
1,1,1,cbcc74f4-a712-4786-9372-89bde1d13d66,Kherson State Administration: Russian troops c...,19 minutes ago,https://www.facebook.com/khoda.gov.ua/posts/31...,2022-02-24 10:24:51.673285,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n19 minutes ag...,False,46°39′N 32°43′E,2022-02-24 10:06:47
2,2,2,2e90f59f-7783-4447-a197-2e57487b21c7,"White House: At 12:30 PM Biden ""delivers remar...",20 minutes ago,https://twitter.com/ZekeJMiller/status/1496863...,2022-02-24 10:24:56.340932,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n20 minutes ag...,False,38°53′N 77°2′W,2022-02-24 10:05:47
3,3,3,8c0791d4-707a-4cf5-aaa3-398f3b4357d2,Radar station Russia bombed on outskirts on Ma...,38 minutes ago,https://twitter.com/RichardEngel/status/149685...,2022-02-24 10:25:02.899210,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n38 minutes ag...,False,47°7′N 37°42′E,2022-02-24 09:47:47
4,4,4,645dc18a-5e4c-4c8b-ad15-d13990a83aeb,Clashes now in Chornobyl exclusion zone near t...,an hour ago,https://twitter.com/13Hellis_13/status/1496856...,2022-02-24 10:25:07.416979,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",,False,,2022-02-24 09:25:47


## Getting Nouns

In [7]:
def get_nouns(text):
    blob = TextBlob(text)
    return list(blob.noun_phrases)

In [8]:
gold['title_nouns'] = gold['title'].apply(get_nouns)

In [9]:
demo = gold.iloc[1]['title']

## Getting Hotwords

In [11]:
import spacy
from collections import Counter
from string import punctuation

In [12]:
nlp = spacy.load("en_core_web_md")

In [13]:
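def get_hotwords(text):
    result = []
    pos_tag = ['PROPN', 'ADJ', 'NOUN']  # parts of speech to keep
    doc = nlp(text.lower())  # lowercase, then tokenize and tag with spaCy
    for token in doc:
        # skip stop words and punctuation
        if token.text in nlp.Defaults.stop_words or token.text in punctuation:
            continue
        # keep tokens whose part of speech is in the list above
        if token.pos_ in pos_tag:
            result.append(token.text)

    return result  # hotwords in order of appearance, duplicates kept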

In [14]:
gold['title_hotwords'] = gold['title'].apply(get_hotwords)
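
`Counter` is imported above but never used in this notebook. One possible use, sketched here as an assumption rather than something the notebook does, is to rank hotwords by frequency and drop stray one-character tokens before building a search query. The helper name `top_hotwords` and its `n` and `min_len` parameters are hypothetical.

```python
from collections import Counter

def top_hotwords(words, n=5, min_len=2):
    """Return the n most frequent hotwords, ignoring very short tokens."""
    # Tokens shorter than min_len (e.g. a stray 't') are dropped before counting
    counts = Counter(w for w in words if len(w) >= min_len)
    # most_common sorts by count; ties keep their order of first appearance
    return [word for word, _ in counts.most_common(n)]

# Hotword list taken from one of the printed outputs in this notebook
hotwords = ['multiple', 'sources', 'battle', 'underway', 'kyiv', 'hostomel',
            'airport', 'base', 'antonov', 'airlines', 'aircraft', 'airport',
            'an-74', 't', 'mriya']

print(top_hotwords(hotwords, n=3))  # ['airport', 'multiple', 'sources']
```

Because most titles mention each word only once, the ranking mostly preserves title order; the main practical effect is deduplication and removal of noise tokens.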

In [15]:
gold

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,uid,title,time,source,time-capture,sc,domain,full-text,sc2,geo,time_parsed,title_nouns,title_hotwords
0,0,0,3c73b045-6851-43e5-b26c-f603492a4b12,"According to the Joint Forces Command, in the ...",17 minutes ago,https://twitter.com/AlexKhrebet/status/1496864...,2022-02-24 10:24:47.074628,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n17 minutes ag...,False,51°40′N 33°55′E,2022-02-24 10:08:47,"[according, joint forces, hlukhov, ukraine, 's...","[joint, forces, command, hlukhov, area, ukrain..."
1,1,1,cbcc74f4-a712-4786-9372-89bde1d13d66,Kherson State Administration: Russian troops c...,19 minutes ago,https://www.facebook.com/khoda.gov.ua/posts/31...,2022-02-24 10:24:51.673285,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n19 minutes ag...,False,46°39′N 32°43′E,2022-02-24 10:06:47,"[kherson, state administration, russian troops...","[kherson, state, administration, russian, troo..."
2,2,2,2e90f59f-7783-4447-a197-2e57487b21c7,"White House: At 12:30 PM Biden ""delivers remar...",20 minutes ago,https://twitter.com/ZekeJMiller/status/1496863...,2022-02-24 10:24:56.340932,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n20 minutes ag...,False,38°53′N 77°2′W,2022-02-24 10:05:47,"[white house, pm biden, delivers remarks, russ...","[white, house, pm, biden, remarks, russia, unp..."
3,3,3,8c0791d4-707a-4cf5-aaa3-398f3b4357d2,Radar station Russia bombed on outskirts on Ma...,38 minutes ago,https://twitter.com/RichardEngel/status/149685...,2022-02-24 10:25:02.899210,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n38 minutes ag...,False,47°7′N 37°42′E,2022-02-24 09:47:47,"[radar, russia, mariupol]","[radar, station, russia, outskirts, mariupol]"
4,4,4,645dc18a-5e4c-4c8b-ad15-d13990a83aeb,Clashes now in Chornobyl exclusion zone near t...,an hour ago,https://twitter.com/13Hellis_13/status/1496856...,2022-02-24 10:25:07.416979,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",,False,,2022-02-24 09:25:47,"[clashes, chornobyl, exclusion zone, radioacti...","[clashes, chornobyl, exclusion, zone, radioact..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2916,25,25,b797b123-f1b6-4fd0-b8d7-27e3e2daeb63,NATO Secretary General @jensstoltenberg says h...,7 hours ago,https://twitter.com/rehannajb/status/151194491...,2022-04-07 09:35:31.444995,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n8 hours ago -...,False,50°52′N 4°25′E,2022-04-07 02:35:57,"[nato secretary, general @ jensstoltenberg, dm...","[nato, secretary, general, @jensstoltenberg, @..."
2917,26,26,109589e2-a2ca-4270-bcd9-07ca2d2fa43f,Russian army attempted to break through the de...,8 hours ago,https://twitter.com/novynarnia/status/15119427...,2022-04-07 09:35:35.759678,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n8 hours ago -...,False,48°42′N 38°37′E,2022-04-07 01:35:57,"[russian army, novotoshkivske, staff, forces, ...","[russian, army, defenses, novotoshkivske, gene..."
2918,27,27,9d5cc026-88b6-46f2-b817-87262d40674e,Ukrainian foreign minister says there is no di...,8 hours ago,https://twitter.com/haynesdeborah/status/15119...,2022-04-07 09:35:40.168772,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n8 hours ago -...,False,50°27′N 30°31′E,2022-04-07 01:35:57,"[ukrainian, foreign minister, defensive weapon...","[ukrainian, foreign, minister, distinction, of..."
2919,28,28,98033191-6898-4306-a0e0-f30e0def064c,3417 N Keeler: a call of 10 shots fired.,8 hours ago,https://twitter.com/CPD1617Scanner/status/1511...,2022-04-07 09:35:44.541527,False,"data:image/svg+xml;base64,PHN2ZyB2ZXJzaW9uPSIx...",SourceOn live map\nTell friends\n9 hours ago -...,False,41°56′N 87°43′W,2022-04-07 01:35:57,[n keeler],"[keeler, shots]"


# Search Experimenting

In [16]:
from newspaper import Article

## 1

`'artillery', 'command', 'post', 'vehicle', 'ukrainian', 'drone', 'strike' after:2022-03-13 before:2022-03-14`
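
The query above was assembled by hand. A minimal sketch of doing the same programmatically, assuming the hotwords are joined with spaces and the date window starts on the event day, could look like this. The function name `build_dated_query` and the `window_days` parameter are hypothetical; only the `after:` and `before:` operators come from the notebook.

```python
from datetime import datetime, timedelta

def build_dated_query(hotwords, when, window_days=1):
    """Join hotwords and append after:/before: date operators."""
    # Deduplicate while preserving order (hotword lists can repeat a token)
    keywords = list(dict.fromkeys(hotwords))
    start = when.date()
    end = start + timedelta(days=window_days)
    return f"{' '.join(keywords)} after:{start.isoformat()} before:{end.isoformat()}"

query = build_dated_query(
    ['artillery', 'command', 'post', 'vehicle', 'ukrainian', 'drone', 'strike'],
    datetime(2022, 3, 13, 9, 30),
)
print(query)
# artillery command post vehicle ukrainian drone strike after:2022-03-13 before:2022-03-14
```

With the dataframe from this notebook, `build_dated_query(row['title_hotwords'], row['time_parsed'])` should work as well, since pandas timestamps also expose `.date()`.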

In [17]:
print(gold.iloc[2]['time_parsed'])
print(gold.iloc[2]['title_hotwords'])
print(gold.iloc[2]['title'])


2022-02-24 10:05:47
['white', 'house', 'pm', 'biden', 'remarks', 'russia', 'unprovoked', 'unjustified', 'attack', 'ukraine']
White House: At 12:30 PM Biden "delivers remarks on Russia's unprovoked and unjustified attack on Ukraine"


#### Dailymail dailymail.co.uk

In [18]:

dailymail = 'https://www.dailymail.co.uk/news/article-10609367/Ukraine-army-wipes-Russian-armoured-vehicles-command-centre.html'
dailymail_a = Article(dailymail)
dailymail_a.download()
dailymail_a.parse()
print(dailymail_a.authors)
print('----')
print(dailymail_a.publish_date)
print('----')
print(dailymail_a.text)
print('----')
dailymail_a.nlp()
print('----')
print(dailymail_a.summary)
print('----')






['Katie Feehan', 'Katie Feehan For Mailonline']
----
2022-03-14 01:21:02+00:00
----
Newly released footage filmed on defender drones shows the Ukrainian army wiping out multiple Russian armoured vehicles and a command centre as the war enters its 18th day.

The first video was shared online on Sunday morning and appears to show rockets being fired at three armoured vehicles in the Ukrainian south-eastern city of Mariupol.

The aerial footage, which was circulated by several unverified accounts, shows a BTR-82 APC and KamAZ-63968 'Typhoon' vehicle being targeted successfully. It is not clear when the strike took place.

Newly released footage filmed on defender drones shows the Ukrainian army wiping out multiple Russian armoured vehicles in Mariupol (pictured) and a command centre in Vasylivka

The footage was shared online by unverified accounts and showed armoured vehicles hit

Pictured: Plumes of thick grey smoke were seen billowing into the sky after the drone strike

The BTR-82A is

In [19]:
businessInsider = 'https://www.businessinsider.com/videos-purport-to-show-ukriane-bayraktar-strikes-on-russia-vehicles-2022-3'
businessInsider_a = Article(businessInsider)
businessInsider_a.download()
businessInsider_a.parse()
print(businessInsider_a.authors)
print('----')
print(businessInsider_a.publish_date)
print('----')
print(businessInsider_a.text)
print('----')
businessInsider_a.nlp()
print('----')
print(businessInsider_a.summary)
print('----')

['Mia Jankowicz']
----
2022-03-14 00:00:00
----
Ukrainian armed forces released several videos it says show drone strikes on Russian targets.

The Bayraktar TB2 drone has taken an outsize role in defending Ukraine.

Ukrainian forces did not name the locations of the hits, which are difficult to verify.

Sign up for our weekday newsletter, packed with original analysis, news, and trends — delivered right to your inbox. Loading Something is loading. Email address By clicking ‘Sign up’, you agree to receive marketing emails from Insider as well as other partner offers and accept our Terms of Service and Privacy Policy

Ukraine's military published several videos it says show its prized Bayraktar TB2 drones at work destroying targets controlled by Russian forces.

Over the weekend, the Ukrainian army commander-in-chief's Facebook page posted five clips showing the strikes, though giving little detail of the exact locations or targets.

The drones have been hailed as a game-changer by the U
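
The two articles report `publish_date` differently: one is timezone-aware (`+00:00`) and the other is naive, and Python refuses to compare the two kinds directly. A small sketch for checking whether an article falls inside the search window, assuming an inclusive end date and treating naive values as already being in UTC, is shown below. The helper name `published_within` is hypothetical.

```python
from datetime import date, datetime, timezone

def published_within(publish_date, start, end):
    """Return True if publish_date falls on a day between start and end (inclusive)."""
    if publish_date is None:
        return False  # some pages expose no publish date at all
    if publish_date.tzinfo is not None:
        # Normalize aware datetimes to UTC before taking the calendar day
        publish_date = publish_date.astimezone(timezone.utc)
    day = publish_date.date()
    return start <= day <= end

aware = datetime(2022, 3, 14, 1, 21, 2, tzinfo=timezone.utc)  # Daily Mail style
naive = datetime(2022, 3, 14, 0, 0, 0)                        # Business Insider style

print(published_within(aware, date(2022, 3, 13), date(2022, 3, 14)))  # True
print(published_within(naive, date(2022, 3, 13), date(2022, 3, 14)))  # True
```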

## 2
`'major', 'internet', 'disruption', 'vinasterisk', 'network', 'vinnytsia', 'oblast', 'western', 'ukraine', 'operator', 'massive', 'cyberattack', 'elements', 'sabotage' after:2022-03-13 before:2022-03-15`

In [21]:
print(gold.iloc[5]['time_parsed'])
print(gold.iloc[5]['title_hotwords'])

2022-02-24 09:25:47
['multiple', 'sources', 'battle', 'underway', 'kyiv', 'hostomel', 'airport', 'base', 'antonov', 'airlines', 'aircraft', 'airport', 'an-74', 't', 'mriya']


## 3
### The dataset is messy (it mixes in unrelated events), but thankfully `geo` saves me

`'victim', 'karrington', 'smith', 'local', 'hospital', 'child', 'weeks', 'pregnant' after:2022-03-13 before:2022-03-15`


In [22]:
print(gold.iloc[20]['time_parsed'])
print(gold.iloc[20]['title_hotwords'])
print(gold.iloc[20]['title'])
print(gold.iloc[20]['geo'])


2022-02-24 10:05:56
['white', 'house', 'pm', 'biden', 'remarks', 'russia', 'unprovoked', 'unjustified', 'attack', 'ukraine']
White House: At 12:30 PM Biden "delivers remarks on Russia's unprovoked and unjustified attack on Ukraine"
38°53′N 77°2′W
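
One way to use the `geo` column for filtering, sketched under assumptions: parse the degrees/minutes string into decimal degrees and keep only rows inside a rough bounding box. The box values below are an approximate guess at Ukraine's extent and also cover parts of neighbouring countries; `parse_geo`, `in_box`, and `UKRAINE_BOX` are hypothetical names, not part of the notebook.

```python
import re

# Matches strings like "51°40′N 33°55′E" (degrees, minutes, hemisphere)
GEO_PATTERN = re.compile(r"(\d+)°(\d+)′([NS])\s+(\d+)°(\d+)′([EW])")

# Rough bounding box in decimal degrees (an assumption; tune as needed)
UKRAINE_BOX = {"lat": (44.0, 53.0), "lon": (22.0, 41.0)}

def parse_geo(geo):
    """Convert a degrees/minutes string to (lat, lon) in decimal degrees, or None."""
    if not isinstance(geo, str):
        return None  # missing values arrive as NaN, which is a float
    match = GEO_PATTERN.search(geo)
    if match is None:
        return None
    lat_d, lat_m, ns, lon_d, lon_m, ew = match.groups()
    lat = int(lat_d) + int(lat_m) / 60
    lon = int(lon_d) + int(lon_m) / 60
    # South and west hemispheres are negative
    return (lat if ns == "N" else -lat, lon if ew == "E" else -lon)

def in_box(geo, box=UKRAINE_BOX):
    """Return True if the coordinate string falls inside the bounding box."""
    point = parse_geo(geo)
    if point is None:
        return False
    lat, lon = point
    return (box["lat"][0] <= lat <= box["lat"][1]
            and box["lon"][0] <= lon <= box["lon"][1])

print(in_box("50°16′N 28°36′E"))  # True
print(in_box("38°53′N 77°2′W"))   # False
```

Applied to the dataframe, this would be `gold[gold['geo'].apply(in_box)]`. Note that rows with a missing `geo` are dropped by this filter, even if the title is clearly relevant.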


## 4
`'president', 'joe', 'biden', 'brussels', 'belgium', 'nato', 'summit', 'thursday', 'ongoing', 'deterrence', 'defense', 'efforts', 'response', 'russia', 'unprovoked', 'unjustified', 'attack', 'ukraine', 'ironclad', 'commitment', 'nato', 'psaki' after:2022-03-15 before:2022-03-17`

- Sometimes the news is reported only as subtext, in support of a different argument.

In [23]:
print(gold.iloc[100]['time_parsed'])
print(gold.iloc[100]['title_hotwords'])
print(gold.iloc[100]['title'])
print(gold.iloc[100]['geo'])


2022-03-01 17:14:11
['zhytomyr', 'aerial', 'bombardment', 'residential', 'house', 'hospital', 'rubble', 'site', 'strike', 'damage', 'hospital']
Zhytomyr: aerial bombardment targeted residential house near hospital. Rubble on the site of strike, damage to hospital
50°16′N 28°36′E


In [24]:

politico = 'https://www.politico.com/news/2022/03/17/white-house-bidens-covid-scares-00018384'
politico_a = Article(politico)
politico_a.download()
politico_a.parse()
print(politico_a.authors)
print('----')
print(politico_a.publish_date)
print('----')
print(politico_a.text)
print('----')
politico_a.nlp()
print('----')
print(politico_a.summary)
print('----')






[]
----
2022-03-17 00:00:00
----

States across the U.S., as well as Washington, D.C., have lifted restrictions in recent weeks as Covid cases dropped. The White House eased its mask mandate in early March.

“He was not tested today. He was tested last Sunday. Neither of these individuals were considered close contacts,” Psaki said, referring to Emhoff and Martin defending the White House testing protocol. She added that everyone is tested before meeting with the president.

“The testing modes are a little bit different around here because we are around the president of the United States and the vice president and, of course, the first lady and the second gentleman,” Psaki said. “That means, before you see the President, if you are in for a meeting or you travel with him, you are tested. And everyone has different testing cadences, depending on their frequency of seeing him.”

Psaki referred to the Centers for Disease Control and Prevention’s definition of a close contact, in explainin

----------------

# References 

|Ref|Link|
|-|-|
|Hotwords, spaCy|https://betterprogramming.pub/extract-keywords-using-spacy-in-python-4a8415478fbf|
|Google Time Search|https://twitter.com/searchliaison/status/1115707059998052354?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1115707059998052354%7Ctwgr%5E%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fmashable.com%2Farticle%2Fgoogle-search-by-date|


