[View in Colaboratory](https://colab.research.google.com/github/zhh210/flea_market/blob/master/wt_spacy.ipynb)

# spaCy Key Features

Demo of some potentially useful features of spaCy. For a complete list of features, visit [https://spacy.io/usage/linguistic-features](https://spacy.io/usage/linguistic-features)

## Download, Configure, and Initialize

Setting everything up on Google Colab takes only a few lines, whereas configuring a similar environment on MS clusters can be overwhelming.

```
!pip install spacy
!pip install nltk
!python -m spacy download en_core_web_lg
```
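Because this setup is harder to reproduce outside Colab, a quick stdlib-only check can confirm that the packages are importable before loading the model. This is a minimal sketch: `missing_packages` is a hypothetical helper, not part of spaCy, and it relies on the fact that `spacy download` installs `en_core_web_lg` as an ordinary Python package.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# en_core_web_lg is installed as a regular Python package by `spacy download`
missing = missing_packages(['spacy', 'nltk', 'en_core_web_lg'])
print('Missing packages:', ', '.join(missing) if missing else 'none')
```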

```python
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')
```

## Named Entity Recognition
For a complete list of built-in entities, visit [https://spacy.io/api/annotation](https://spacy.io/api/annotation)

```python
text = ("International Business Machines Corp.'s IBM, +0.19% credit rating was downgraded "
        "to A1 from A3 Wednesday at Moody's, which predicted Big Blue would continue to be "
        "challenged despite making a number of investments to transform into a "
        "cloud-services company.")
displacy.render(nlp(text), jupyter=True, style='ent')
```
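Besides rendering them, the recognized entities are available as data: for a spaCy `Doc`, `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]` lists character offsets and labels. `displacy.render` can also draw entities computed elsewhere when called with `manual=True` and a dict of the form `{'text': ..., 'ents': [{'start': ..., 'end': ..., 'label': ...}]}`, the same format `highlight_desc()` builds further down. The sketch below builds such a record from offset tuples; `to_displacy_ents` is a hypothetical helper, and it rejects overlapping spans on the assumption that, like `doc.ents`, the record should hold non-overlapping entities.

```python
def to_displacy_ents(text, spans):
    """Build a displaCy 'manual' entity record from (start_char, end_char, label) tuples.

    Spans are sorted by position; out-of-range or overlapping spans raise ValueError.
    """
    ents = []
    last_end = 0
    for start, end, label in sorted(spans):
        if not 0 <= start < end <= len(text):
            raise ValueError(f'span ({start}, {end}) lies outside the text')
        if start < last_end:
            raise ValueError(f'span ({start}, {end}) overlaps the previous span')
        ents.append({'start': start, 'end': end, 'label': label})
        last_end = end
    return {'text': text, 'ents': ents}


# Offsets written by hand for this short sentence (labels are illustrative)
sentence = "Moody's downgraded IBM on Wednesday."
record = to_displacy_ents(sentence, [(19, 22, 'ORG'), (0, 7, 'ORG'), (26, 35, 'DATE')])
for ent in record['ents']:
    print(sentence[ent['start']:ent['end']], ent['label'])
# In the notebook: displacy.render(record, manual=True, jupyter=True, style='ent')
```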

## Parsing with spaCy

spaCy's built-in visualizer, displaCy, is especially useful for inspecting dependency parses.

```python
displacy.render(nlp('The outlook is positive.'), jupyter=True, style='dep')
```

```python
displacy.render(nlp('The performance is going very strong.'), jupyter=True, style='dep')
```
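displaCy can also draw a dependency tree supplied by hand (or by another parser) when called with `manual=True` and `style='dep'`, using a dict with a `words` list (`text`, `tag`) and an `arcs` list (`start`, `end`, `label`, `dir`). The sketch below converts a head-index parse into that format. As far as we understand it, this mirrors how displaCy turns each token's `head.i` and `dep_` into arcs: `start` is always the smaller index and `dir` points toward the dependent. The helper `to_displacy_deps` and the parse itself are hand-written assumptions, not model output.

```python
def to_displacy_deps(words, tags, heads, deps):
    """Build displaCy's manual dependency format from a head-index parse.

    heads[i] is the index of token i's head; the root token points to itself.
    """
    if not len(words) == len(tags) == len(heads) == len(deps):
        raise ValueError('words, tags, heads and deps must have equal length')
    arcs = []
    for i, (head, label) in enumerate(zip(heads, deps)):
        if i < head:    # head lies to the right: arrow points left, to the dependent
            arcs.append({'start': i, 'end': head, 'label': label, 'dir': 'left'})
        elif i > head:  # head lies to the left: arrow points right, to the dependent
            arcs.append({'start': head, 'end': i, 'label': label, 'dir': 'right'})
        # i == head is the root, which gets no incoming arc
    words_out = [{'text': w, 'tag': t} for w, t in zip(words, tags)]
    return {'words': words_out, 'arcs': arcs}


# Hand-written parse of "The outlook is positive." (illustrative, not model output)
parse = to_displacy_deps(
    words=['The', 'outlook', 'is', 'positive', '.'],
    tags=['DET', 'NOUN', 'AUX', 'ADJ', 'PUNCT'],
    heads=[1, 2, 2, 2, 2],
    deps=['det', 'nsubj', 'ROOT', 'acomp', 'punct'],
)
# In the notebook: displacy.render(parse, manual=True, jupyter=True, style='dep')
```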

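## Capture Patterned Phrases

`highlight_desc()` uses spaCy's rule-based `Matcher` to identify and highlight descriptive clauses, i.e. phrases following the pattern *noun phrase + "be" + adjective*, such as "The outlook is stable." The pattern also allows an optional verb and any number of adverbs between "be" and the adjective, so that sentences like "The performance is going very strong." can match as well.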
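To make the operators in that pattern concrete (`'?'` optional, `'+'` one or more, `'*'` zero or more, no `OP` meaning exactly once), here is a toy pure-Python matcher over hand-tagged tokens. It is a simplified sketch of the idea, not spaCy's `Matcher` implementation; the helper names and the tags are assumptions. Like spaCy's `Matcher` without greedy filtering (as far as we know), it reports every span that fits, including overlapping ones.

```python
def token_matches(token, spec):
    """True if every attribute constraint in `spec` (ignoring 'OP') holds for `token`."""
    return all(token.get(attr) == value for attr, value in spec.items() if attr != 'OP')


def match_ends(tokens, pattern, start):
    """Return the set of end indices such that tokens[start:end] matches `pattern`."""
    if not pattern:
        return {start}
    spec, rest = pattern[0], pattern[1:]
    op = spec.get('OP', '1')  # '1' stands for "exactly once" (no operator given)
    min_count = 1 if op in ('1', '+') else 0
    max_count = 1 if op in ('1', '?') else len(tokens) - start
    ends, count, pos = set(), 0, start
    while True:
        if count >= min_count:
            # enough repetitions of this element: try matching the rest from here
            ends |= match_ends(tokens, rest, pos)
        if count >= max_count or pos >= len(tokens) or not token_matches(tokens[pos], spec):
            return ends
        count += 1
        pos += 1


def find_matches(tokens, pattern):
    """All non-empty (start, end) token spans matching `pattern`, sorted."""
    return sorted((start, end) for start in range(len(tokens))
                  for end in match_ends(tokens, pattern, start) if end > start)


# "The performance is going very strong." tagged by hand (assumed, not model output)
tokens = [
    {'POS': 'DET', 'LEMMA': 'the'}, {'POS': 'NOUN', 'LEMMA': 'performance'},
    {'POS': 'AUX', 'LEMMA': 'be'}, {'POS': 'VERB', 'LEMMA': 'go'},
    {'POS': 'ADV', 'LEMMA': 'very'}, {'POS': 'ADJ', 'LEMMA': 'strong'},
    {'POS': 'PUNCT', 'LEMMA': '.'},
]
pattern = [{'POS': 'DET', 'OP': '?'}, {'POS': 'NOUN', 'OP': '+'}, {'LEMMA': 'be'},
           {'POS': 'VERB', 'OP': '?'}, {'POS': 'ADV', 'OP': '*'}, {'POS': 'ADJ'}]
print(find_matches(tokens, pattern))  # → [(0, 6), (1, 6)]
```

The two overlapping results ("The performance is going very strong" and "performance is going very strong") show why some rule for choosing one match per sentence is needed.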

```python
from spacy.matcher import Matcher

example = '''
The performance is going very strong. Research Update:
Apple Inc. 'AA+' Rating Affirmed Upon Review Of
Its Financial Policy; Outlook Stable

Overview

We have reviewed Apple's new financial policy, which is expected to move
the company to a net cash neutral position over time, through declines in
both cash and debt balances, as well as continued improvements to its
overall business through increased scale, consistent profitability, and
strengthened ecosystem through growth of its services business.

We are affirming all of our ratings on Apple Inc., including the 'AA+'
corporate credit rating.

The stable outlook reflects our expectation that Apple Inc. will maintain
a commitment to a minimal financial risk profile as it shrinks its
balance sheet over the next several years through shareholder returns,
debt repayments, and potential acquisitions.

Rating Action

On May 23, 2018, S&P Global Ratings affirmed its 'AA+' corporate credit rating
on Cupertino, Calif.-based Apple Inc. The outlook is stable.

At the same time, we are affirming our 'AA+' issue-level rating on the
company's senior unsecured debt and our 'A-1+' rating on its commercial paper.

Rationale

The affirmation of our 'AA+' corporate credit rating reflects Apple's improved
scale and profitability over the past several years and our expectation that
its services segment will provide growing recurring revenue and high margins
despite the competitive smartphone industry landscape, thereby enhancing the
value of its ecosystem and providing more predictable, albeit modest, revenue
growth over the longer term. This, in our view, offsets the revision to its
financial policy whereby Apple intends to become roughly net cash neutral over
time, because we expect the company to maintain credit metrics commensurate
with its minimal financial risk profile and the 'AA+' credit rating.

Apple is the largest U.S.-based provider of mobile devices, personal
computers, and related products and services. The outlook is good. But this one does not count.
'''

def highlight_desc(paragraph=example):
    """Highlight descriptive clauses (noun phrase + "be" + adjective) sentence by sentence."""
    matcher = Matcher(nlp.vocab)
    # Optional determiner, one or more nouns, a form of "be", an optional
    # verb (e.g. "going"), any number of adverbs, then an adjective.
    pattern = [{'POS': 'DET', 'OP': '?'}, {'POS': 'NOUN', 'OP': '+'}, {'LEMMA': 'be'},
               {'POS': 'VERB', 'OP': '?'}, {'POS': 'ADV', 'OP': '*'}, {'POS': 'ADJ'}]
    # spaCy v3 signature (also accepted since v2.2.2): a key plus a list of patterns
    matcher.add('DESC', [pattern])

    records = []  # one displaCy "manual" record per sentence
    for sent in nlp(paragraph).sents:
        sent_doc = nlp(sent.text)  # re-parse the sentence on its own
        matches = matcher(sent_doc)
        ents = []
        if matches:
            # Keep only the first match; character offsets are relative to the sentence
            _, start, end = matches[0]
            span = sent_doc[start:end]
            ents.append({'start': span.start_char, 'end': span.end_char,
                         'label': '[descriptive]'})
        records.append({'text': sent.text, 'ents': ents})

    displacy.render(records, manual=True, jupyter=True, style='ent')

highlight_desc()
```
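`highlight_desc()` keeps whichever match the `Matcher` reports first for each sentence. Because a pattern with optional and repeated tokens also matches shorter, overlapping spans (for example both "The outlook is stable" and "outlook is stable"), another option is to keep the longest non-overlapping spans. spaCy provides `spacy.util.filter_spans` for this on `Span` objects; the sketch below applies what we understand to be the same longest-first rule to plain `(start, end)` token offsets, such as the tuples a `Matcher` returns once the match ID is dropped. `filter_longest` is a hypothetical helper.

```python
def filter_longest(spans):
    """Keep the longest (start, end) spans, dropping any that overlap a span already kept.

    Among spans of equal length the earlier one wins; the result is sorted by start.
    """
    kept, used = [], set()
    # Longest first; for equal lengths, -start makes the earliest span sort highest.
    for start, end in sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True):
        if used.isdisjoint(range(start, end)):
            kept.append((start, end))
            used.update(range(start, end))
    return sorted(kept)


# (start, end) token offsets such as a Matcher might report for one sentence
print(filter_longest([(0, 4), (1, 4), (6, 8), (7, 8)]))  # → [(0, 4), (6, 8)]
```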