### spaCy Module
spaCy is another text-mining library, like nltk, with similar capabilities, as shown below. It also offers extra functionality such as dependency parsing, and its syntax is quite simple.

In [5]:
import spacy
nlp = spacy.load('en_core_web_sm')  # load the small English pipeline (the old 'en' shortcut was removed in spaCy v3)
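In spaCy v3 and later, the `'en'` shortcut no longer resolves. A trained pipeline such as `en_core_web_sm` has to be installed as a package first (`python -m spacy download en_core_web_sm`). The sketch below falls back to a blank, tokenizer-only English pipeline when the model is missing. `load_english` is a hypothetical helper name, not part of spaCy:

```python
import spacy
from spacy.language import Language


def load_english(model_name: str = "en_core_web_sm") -> Language:
    """Hypothetical helper: load a trained English pipeline if it is installed.

    Falls back to spacy.blank("en"), which only tokenizes
    (no tagger, parser or named-entity recognizer).
    """
    try:
        return spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the model package/path cannot be found.
        # Install it with: python -m spacy download en_core_web_sm
        return spacy.blank("en")


nlp = load_english()
print(nlp.pipe_names)  # the components that will run on every text
```

With the fallback, only the word-tokenizing cell below works; the sentence, POS, entity and relation cells need the trained pipeline.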

In [15]:
TEXT = '''Avul Pakir Jainulabdeen Abdul Kalam better known as A. P. J. Abdul Kalam (/ˈæbdʊl kəˈlɑːm/ (About this sound listen); 15 October 1931 – 27 July 2015), was the 11th President of India from 2002 to 2007. A career scientist turned statesman, Kalam was born and raised in Rameswaram, Tamil Nadu, and studied physics and aerospace engineering. 
He spent the next four decades as a scientist and science administrator, mainly at the Defence Research and Development Organisation (DRDO) and Indian Space Research Organisation (ISRO) and was intimately involved in India's civilian space programme and military missile development efforts. 
He thus came to be known as the Missile Man of India for his work on the development of ballistic missile and launch vehicle technology. 
He also played a pivotal organisational, technical, and political role in India's Pokhran-II nuclear tests in 1998, the first since the original nuclear test by India in 1974.'''

In [47]:
doc = nlp(TEXT)  # runs the full pipeline (tokenization, POS tagging, parsing, NER) in one call

#### Word tokenizing

In [48]:
print([x for x in doc]) ## Word Tokenizing

[Avul, Pakir, Jainulabdeen, Abdul, Kalam, better, known, as, A., P., J., Abdul, Kalam, (, /ˈæbdʊl, kəˈlɑːm/, (, About, this, sound, listen, ), ;, 15, October, 1931, –, 27, July, 2015, ), ,, was, the, 11th, President, of, India, from, 2002, to, 2007, ., A, career, scientist, turned, statesman, ,, Kalam, was, born, and, raised, in, Rameswaram, ,, Tamil, Nadu, ,, and, studied, physics, and, aerospace, engineering, ., 
, He, spent, the, next, four, decades, as, a, scientist, and, science, administrator, ,, mainly, at, the, Defence, Research, and, Development, Organisation, (, DRDO, ), and, Indian, Space, Research, Organisation, (, ISRO, ), and, was, intimately, involved, in, India, 's, civilian, space, programme, and, military, missile, development, efforts, ., 
, He, thus, came, to, be, known, as, the, Missile, Man, of, India, for, his, work, on, the, development, of, ballistic, missile, and, launch, vehicle, technology, ., 
, He, also, played, a, pivotal, organisational, ,, technical, ,,
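The newline characters inside `TEXT` show up as separate whitespace tokens in the output above. A small sketch that drops them, along with punctuation, using token flags that spaCy sets without any trained model (`is_space`, `is_punct`, `like_num`); `content_tokens` is a hypothetical helper and the sample sentence is made up:

```python
import spacy

# A blank English pipeline is enough: these flags come from the tokenizer
# and lexical attributes, not from a trained model.
nlp_blank = spacy.blank("en")


def content_tokens(doc):
    """Return token texts, skipping whitespace and punctuation tokens."""
    return [t.text for t in doc if not (t.is_space or t.is_punct)]


sample_doc = nlp_blank("Kalam was born in 1931.\nHe studied physics.")
print([t.text for t in sample_doc])            # includes a '\n' token
print(content_tokens(sample_doc))
print([t.text for t in sample_doc if t.like_num])
```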

#### Sentence Tokenizing

In [49]:
print([x for x in doc.sents]) ## Sentence tokenizing

[Avul Pakir Jainulabdeen Abdul Kalam better known as A. P. J. Abdul Kalam (/ˈæbdʊl kəˈlɑːm/, (About this sound listen); 15 October 1931 – 27 July 2015), was the 11th President of India from 2002 to 2007., A career scientist turned statesman, Kalam was born and raised in Rameswaram, Tamil Nadu, and studied physics and aerospace engineering. 
, He spent the next four decades as a scientist and science administrator, mainly at the Defence Research and Development Organisation (DRDO) and Indian Space Research Organisation (ISRO) and was intimately involved in India's civilian space programme and military missile development efforts. 
, He thus came to be known as the Missile Man of India for his work on the development of ballistic missile and launch vehicle technology. 
, He also played a pivotal organisational, technical, and political role in India's Pokhran-II nuclear tests in 1998, the first since the original nuclear test by India in 1974.]
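In the trained English pipelines, sentence boundaries come from the dependency parser by default. When rough punctuation-based splitting is enough, spaCy's rule-based `sentencizer` component can be added to a blank pipeline instead. A minimal sketch (the example sentences are made up for illustration):

```python
import spacy

# Blank English pipeline plus the rule-based sentence splitter,
# which starts a new sentence after punctuation such as . ! ?
nlp_rules = spacy.blank("en")
nlp_rules.add_pipe("sentencizer")

rules_doc = nlp_rules("Kalam studied physics. He later joined DRDO! Was he the 11th President?")
sentences = [sent.text for sent in rules_doc.sents]
print(sentences)
```

The sentencizer needs no model and is fast, but it only looks at punctuation tokens, not at sentence structure.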


#### POS Tagging

In [50]:
print([(x, x.tag_) for x in doc]) ## POS Tagging

[(Avul, 'NNP'), (Pakir, 'NNP'), (Jainulabdeen, 'NNP'), (Abdul, 'NNP'), (Kalam, 'NNP'), (better, 'RBR'), (known, 'VBN'), (as, 'IN'), (A., 'NNP'), (P., 'NNP'), (J., 'NNP'), (Abdul, 'NNP'), (Kalam, 'NNP'), ((, '-LRB-'), (/ˈæbdʊl, 'NFP'), (kəˈlɑːm/, 'NNP'), ((, '-LRB-'), (About, 'RB'), (this, 'DT'), (sound, 'JJ'), (listen, 'NN'), (), '-RRB-'), (;, ':'), (15, 'CD'), (October, 'NNP'), (1931, 'CD'), (–, 'SYM'), (27, 'CD'), (July, 'NNP'), (2015, 'CD'), (), '-RRB-'), (,, ','), (was, 'VBD'), (the, 'DT'), (11th, 'JJ'), (President, 'NNP'), (of, 'IN'), (India, 'NNP'), (from, 'IN'), (2002, 'CD'), (to, 'IN'), (2007, 'CD'), (., '.'), (A, 'DT'), (career, 'NN'), (scientist, 'NN'), (turned, 'VBD'), (statesman, 'NN'), (,, ','), (Kalam, 'NNP'), (was, 'VBD'), (born, 'VBN'), (and, 'CC'), (raised, 'VBN'), (in, 'IN'), (Rameswaram, 'NNP'), (,, ','), (Tamil, 'NNP'), (Nadu, 'NNP'), (,, ','), (and, 'CC'), (studied, 'VBD'), (physics, 'NN'), (and, 'CC'), (aerospace, 'NN'), (engineering, 'NN'), (., '.'), (
, ''), (He
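`tag_` holds the fine-grained, Penn Treebank-style tag, while `pos_` holds the coarse Universal POS tag. `spacy.explain` turns either kind of label into a short description. In this sketch the tags are supplied by hand on a `Doc` so it runs without a trained model; in the notebook they would come from `en_core_web_sm`:

```python
import spacy
from spacy.tokens import Doc

nlp_blank = spacy.blank("en")

# Hand-annotated tokens standing in for tagger output.
tagged_doc = Doc(
    nlp_blank.vocab,
    words=["Kalam", "was", "born", "in", "1931"],
    tags=["NNP", "VBD", "VBN", "IN", "CD"],
)

for token in tagged_doc:
    # spacy.explain returns None for labels it does not know.
    print(f"{token.text:<8}{token.tag_:<5}{spacy.explain(token.tag_)}")
```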

#### Named Entities

In [51]:
for x in doc.ents:
    print(x,'---', x.label_)


Pakir Jainulabdeen Abdul Kalam --- PERSON
A. P. J. Abdul Kalam ( --- PERSON
15 October 1931 – 27 July 2015 --- DATE
11th --- ORDINAL
India --- GPE
2002 to 2007 --- DATE
Kalam --- ORG
Rameswaram --- GPE
Tamil Nadu --- PERSON

 --- GPE
the next four decades --- DATE
the Defence Research and Development Organisation (DRDO --- ORG
Indian Space Research Organisation --- ORG
ISRO --- ORG
India --- GPE

 --- GPE
India --- GPE

 --- GPE
India --- GPE
Pokhran-II --- EVENT
1998 --- DATE
first --- ORDINAL
India --- GPE
1974 --- DATE
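The entity list above contains repeats and a few "entities" whose text is just a newline. A sketch that groups entity texts by label and skips whitespace-only spans. The entities are annotated by hand (IOB tags) so the example runs without a model, and `group_entities` is a hypothetical helper:

```python
from collections import defaultdict

import spacy
from spacy.tokens import Doc

nlp_blank = spacy.blank("en")


def group_entities(doc):
    """Map each entity label to its distinct entity texts, in order of appearance."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        text = ent.text.strip()
        # Skip spans that are only whitespace (e.g. a lone newline token).
        if text and text not in grouped[ent.label_]:
            grouped[ent.label_].append(text)
    return dict(grouped)


# IOB entity tags: B- begins an entity, I- continues it, O is outside any entity.
ents_doc = Doc(
    nlp_blank.vocab,
    words=["Kalam", "was", "born", "in", "Tamil", "Nadu", ",", "India", "\n", "India"],
    ents=["B-PERSON", "O", "O", "O", "B-GPE", "I-GPE", "O", "B-GPE", "B-GPE", "B-GPE"],
)
print(group_entities(ents_doc))
```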


spaCy can also produce a dependency parse of each sentence, which can be used to extract relations between phrases. The example below pairs each money amount with the phrase it refers to.
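The relation extractor below relies on three attributes of each token: `dep_` (the dependency label), `head` (the token it attaches to) and `ent_type_` (its entity label). This sketch lists them for a hand-annotated `Doc` so it runs without a trained model. The parse is written by hand and is only illustrative; it may not match what `en_core_web_sm` produces. `dependency_table` is a hypothetical helper:

```python
import spacy
from spacy.tokens import Doc

nlp_blank = spacy.blank("en")


def dependency_table(doc):
    """Return (token, dependency label, head token, entity type) for each token."""
    return [(t.text, t.dep_, t.head.text, t.ent_type_) for t in doc]


# Hand-written parse: heads are absolute token indices; the root points to itself.
parsed_doc = Doc(
    nlp_blank.vocab,
    words=["Revenue", "was", "$", "10", "million", "."],
    heads=[1, 1, 4, 4, 1, 1],
    deps=["nsubj", "ROOT", "nmod", "compound", "attr", "punct"],
    ents=["O", "O", "B-MONEY", "I-MONEY", "I-MONEY", "O"],
)
for row in dependency_table(parsed_doc):
    print(row)

# For an 'attr' money token, look for the nsubj among the left children of its head.
money = parsed_doc[4]  # 'million', the head of the money phrase
subjects = [w.text for w in money.head.lefts if w.dep_ == "nsubj"]
print(subjects)
```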

In [52]:
TEXTS = [
    'Net income was $9.4 million compared to the prior year of $2.7 million.',
    'Revenue exceeded twelve billion dollars, with a loss of $1b.',
    'The amount robbed from the bank was 10$'
]

In [53]:
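from spacy.util import filter_spans

def extract_currency_relations(doc):
    # merge entities and noun chunks into single tokens;
    # filter_spans drops overlapping spans, keeping the longest one
    # (Span.merge() was removed in spaCy v3, so use the retokenizer)
    spans = filter_spans(list(doc.ents) + list(doc.noun_chunks))
    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span)

    relations = []
    for money in filter(lambda w: w.ent_type_ == 'MONEY', doc):
        if money.dep_ in ('attr', 'dobj'):
            # e.g. "Net income was $9.4 million": pair the amount with the verb's subject
            subject = [w for w in money.head.lefts if w.dep_ == 'nsubj']
            if subject:
                relations.append((subject[0], money))
        elif money.dep_ == 'pobj' and money.head.dep_ == 'prep':
            # e.g. "... of $2.7 million": pair the amount with what the preposition modifies
            relations.append((money.head.head, money))
    return relations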

In [54]:
print("Processing %d texts" % len(TEXTS))

for text in TEXTS:
    doc = nlp(text)
    relations = extract_currency_relations(doc)
    for r1, r2 in relations:
        print('{:<10}\t{}\t{}'.format(r1.text, r2.ent_type_, r2.text))

Processing 3 texts
Net income	MONEY	$9.4 million
the prior year	MONEY	$2.7 million
Revenue   	MONEY	twelve billion dollars
a loss    	MONEY	1b
The amount	MONEY	10$
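For more than a handful of texts, `nlp.pipe` processes them as a stream in batches, which is generally faster than calling `nlp` on each string. A sketch of the loop above rewritten that way. `run_extractor` is a hypothetical helper that accepts any function mapping a `Doc` to a list of results; here a stand-in extractor is used so the sketch runs on a blank pipeline:

```python
import spacy


def run_extractor(nlp, texts, extract, batch_size=64):
    """Apply `extract` to each parsed text, yielding (text, results) pairs."""
    # nlp.pipe yields Doc objects in the same order as the input texts.
    for text, doc in zip(texts, nlp.pipe(texts, batch_size=batch_size)):
        yield text, extract(doc)


def numbers(doc):
    """Stand-in extractor: tokens that look like numbers."""
    return [t.text for t in doc if t.like_num]


nlp_blank = spacy.blank("en")
texts = ["Net income was $9.4 million.", "Revenue was 12 billion dollars."]
for text, found in run_extractor(nlp_blank, texts, numbers):
    print(f"{text!r}: {found}")
```

With the trained pipeline loaded at the top of the notebook, the same helper would be called as `run_extractor(nlp, TEXTS, extract_currency_relations)`.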
