In [90]:
text = '''The Group competes in the gifting market (the “ gifting market ”), which is large, evolving and highly competitive and includes the sale of greeting cards and
gifts. The Group faces significant competition from a wide range of companies, ranging from traditional brick and mortar competitors that serve the offline
channel to other online gifting companies. The Group’s offline competitors include specialist greeting cards, supermarkets and other retailers, generalists,
stationers, discount chains and florists. The Group also competes with online greeting card companies; online flower specialists; and online gift specialists.
Some of the Group’s competitors, particularly supermarkets, general merchandise discounters and stationery retailers, may have larger and broader
customer bases, wider distribution channels, substantially greater financial, technical or marketing resources, stronger brand or name recognition or a lower
cost base than the Group. Some of the Group’s competitors may have greater research and development resources and be able to adapt to changes in
customer requirements, customer preferences or attitudes toward design content and gifting products faster, launch innovative products more quickly, more
readily take advantage of acquisitions and other opportunities, or have more established relationships with third-party suppliers, which could result in the
Group not being able to compete as effectively and lose its market position. The Group’s competitors may also aggressively discount their products in order
to gain market share, which could result in pricing pressures, reduced profit margins, lost market share, or a failure to grow market share for the Group.
Within each of the UK and the Netherlands, the Group has benefited from its strong positions in the gifting market, but competition could intensify as
traditional retailers expand their digital, online and app-based sales capabilities and potential new competitors enter the gifting market. The Group could
face an increase in online competition as a result of competitors prioritising investments in new or improved online platforms to deal with disruptions faced
as a result of the novel strain of coronavirus causing Covid-19 disease (“ Covid-19 ”) and the expected continued shift to online purchasing, which may
make it more difficult for the Group to maintain its market share. The Group competes, and could increasingly compete in the future, with alternatives or
substitutes to the Group’s products, whether that is the increasing use of electronic gift cards as substitutes for physical gifts, social media or other
companies that host and enable the posting of greetings, images, electronic or other gifts, e-cards or other innovations and developments. In addition, the
Group competes and may increasingly compete in the future with alternative business models that serve the gifting markets, including greeting card
subscription services, flower subscription services, or apps or websites that provide free products (only charging for any up-sells of attachments to the free
base product) that compete or may serve as a substitute to the Group’s gifting products. If such alternative communications media or substitutions for the
Group’s products appeal more to the Group’s existing customers or potential customers and the Group fails to innovate its product offering in a manner that
continues to be attractive to its customers or enables the Group to maintain its existing margins, the Group may be unable to compete effectively. If the
Group is not able to compete effectively against its current or potential competitors, this could have a material adverse effect on the Group’s business,
results of operations, financial condition or prospects.'''

In [77]:
text = '''The Group’s success depends on its ability to retain and engage existing customers and attract new customers. Factors that could influence such an ability
include: • technical or other problems preventing the Group from delivering its products in a timely and reliable manner or otherwise negatively affecting the
customer experience; • any unavailability or delay to customers of the Group’s products, including products sourced from third parties or shipped by third
parties which may arise due to factors outside the Group’s control due to changes in the service level agreements or the reliability of services provided by
regulated postal services on which the Group must rely; • any pricing changes for the Group’s products are negatively received or the pricing of the Group’s
competitors change; • ineffective marketing or brand promotion campaigns by the Group or effective marketing or brand promotion campaigns by the
Group’s competitors; • security breaches leading to the Group’s loss of confidential customer or employee information; • negative publicity surrounding the
Group, its brands or its products for any reason, including that resulting from negative online reviews or unfavourable press coverage; • a perception that
the Group acts in an irresponsible manner, including with respect to its environmental, social or corporate responsibility; or • if there are adverse changes
that are mandated by legislation, regulatory authorities or litigation that impacts the Group’s ability to market its products to customers. If the Group is
unable to continue to offer gifts that are attractive to its customers, to obtain such products at costs that allow it to sell such products at a profit or to market
such products effectively to customers, the Group may have a difficult time attracting new customers or retaining existing customers, and its sales or
profitability could be affected adversely. The success of the Group’s business depends in part on its ability to anticipate, identify and respond promptly to
evolving trends in demographics and customer preferences, expectations and needs, so that it can continue to attract and retain customers. The Group’s
product offering is influenced in part by how it views these preferences, expectations and needs. If the Group is unable to respond quickly to developing
trends or if the spending patterns or demographics of these markets change, and the Group does not respond in a timely or appropriate manner to such
changes, the demand for its products and its market share could decline. Such a decline could have a material adverse effect on the Group’s business,
results of operations, financial condition or prospects. The Group benefits from current cultural practices surrounding the giving of cards in Anglo-Saxon
and Dutch societies. While the single card market has been broadly stable in both countries (for instance OC&C estimates that from 2016 to 2019 the UK
market declined by 0.3% compound annual growth rate (“ CAGR ”) in volume terms and grew by 0.5% CAGR in value), cultural norms could evolve or
change. A satisfied and loyal customer base is crucial to the Group’s continued growth, both for continued engagement and retention of existing customers
as well as through promoting the Group to attract new customers. Because the Group does not have the direct face-to-face contact with customers that
comes from offline retail, the way the Group directly interacts with customers through its platform is important to maintaining continuous customer
relationships. For example, if customer ratings of the Group’s products were to decline as reflected by lower NPS or app ratings, such negative customer
experiences could adversely affect the business. The Group relies on a variety of tools on its platform, including reminders of birthdays and other events to
prompt repeat engagement and purchases with prior customers, as well as customer relationship management strategies. Any actual or perceived failures
by the Group’s platform or customer relationship management strategies for customer engagement could negatively affect customer satisfaction,
engagement, loyalty or retention. Accordingly, any inability by the Group to retain customers or attract new customers could have a material adverse effect
on the Group’s business, results of operations, financial condition or prospects.'''

In [1]:
import re
import spacy

In [91]:
nlp = spacy.load("en_core_web_sm")#, disable=['tagger','ner','parser'])
#nlp.max_length=2e6
#nlp.add_pipe('sentencizer')

In [92]:
doc = nlp(re.sub(r'\n',' ',text))

In [11]:
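def rule3(text):
    # Extract 'noun preposition noun' phrases; accepts raw text or an already-parsed Doc
    doc = nlp(text) if isinstance(text, str) else text
    sent = []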
    for token in doc:
        # look for prepositions
        if token.pos_=='ADP':
            phrase = ''
            # if its head word is a noun
            if token.head.pos_=='NOUN':
                # append noun and preposition to phrase
                phrase += token.head.text
                phrase += ' '+token.text
                # check the nodes to the right of the preposition
                for right_tok in token.rights:
                    # append if it is a noun or proper noun
                    if (right_tok.pos_ in ['NOUN','PROPN']):
                        phrase += ' '+right_tok.text
                if len(phrase)>3:
                    sent.append(phrase)
    return sent

In [12]:
rule3(doc)

['sale of cards',
 'range of companies',
 'base than Group',
 'changes in requirements',
 'preferences toward content',
 'relationships with suppliers',
 'share for Group',
 'positions in market',
 'increase in competition',
 'result of',
 'result of strain',
 'strain of coronavirus',
 'shift to purchasing',
 'alternatives to products',
 'use of cards',
 'use as substitutes',
 'substitutes for gifts',
 'posting of greetings',
 'competes In addition',
 'sells of attachments',
 'sells to product',
 'substitute to products',
 'media for products',
 'effect on business',
 'results of operations']
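The list above contains the bare phrase 'result of', because the `len(phrase)>3` check passes for any noun plus preposition even when no noun follows. A minimal sketch of a stricter variant is given below, assuming the goal is to keep only phrases that have at least one noun or proper noun to the right of the preposition. The name `noun_prep_noun` is hypothetical; the function relies only on the token attributes already used in `rule3` (`pos_`, `head`, `rights`, `text`).

```python
def noun_prep_noun(doc):
    """Return 'noun preposition noun' phrases, skipping prepositions with no noun object."""
    phrases = []
    for token in doc:
        # Only prepositions whose syntactic head is a noun are of interest.
        if token.pos_ != 'ADP' or token.head.pos_ != 'NOUN':
            continue
        # Nouns or proper nouns attached to the right of the preposition.
        objects = [t.text for t in token.rights if t.pos_ in ('NOUN', 'PROPN')]
        if objects:  # drops bare results such as 'result of'
            phrases.append(' '.join([token.head.text, token.text] + objects))
    return phrases
```

Calling `noun_prep_noun(doc)` on the parsed document should then return the same phrases as `rule3`, minus the entries that end in a preposition.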

In [41]:
# Extractive summarization by word-frequency sentence scoring (extremely short version; a longer version will be needed in practice)
# https://github.com/kamal2230/text-summarization/blob/master/Summarisation_using_spaCy.ipynb

In [93]:
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from collections import Counter
from heapq import nlargest

In [94]:
# nlp = spacy.load('en')
# doc = nlp(doc)
len(list(doc.sents))

13

In [95]:
keyword = []
stopwords = list(STOP_WORDS)
pos_tag = ['PROPN', 'ADJ', 'NOUN', 'VERB']
for token in doc:
    # skip stop words and punctuation
    if token.text in stopwords or token.text in punctuation:
        continue
    # keep only content-bearing parts of speech
    if token.pos_ in pos_tag:
        keyword.append(token.text)

freq_word = Counter(keyword)
print(freq_word.most_common(5))

[('Group', 24), ('market', 9), ('gifting', 8), ('competitors', 8), ('online', 8)]


In [96]:
type(freq_word)

collections.Counter

In [97]:
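# Normalise counts by the highest count so every weight lies in (0, 1].
# Taking the maximum from freq_word itself keeps the cell safe to re-run.
max_freq = freq_word.most_common(1)[0][1]
for word in freq_word:
    freq_word[word] = freq_word[word] / max_freq
freq_word.most_common(5)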

[('Group', 1.0),
 ('market', 0.375),
 ('gifting', 0.3333333333333333),
 ('competitors', 0.3333333333333333),
 ('online', 0.3333333333333333)]

In [98]:
# Score each sentence as the sum of the normalised frequencies of its words
sent_strength = {}
for sent in doc.sents:
    for word in sent:
        if word.text in freq_word:
            sent_strength[sent] = sent_strength.get(sent, 0) + freq_word[word.text]
print(sent_strength)

{The Group competes in the gifting market (the “ gifting market ”), which is large, evolving and highly competitive and includes the sale of greeting cards and gifts.: 3.249999999999999, The Group faces significant competition from a wide range of companies, ranging from traditional brick and mortar competitors that serve the offline channel to other online gifting companies.: 3.083333333333334, The Group’s offline competitors include specialist greeting cards, supermarkets and other retailers, generalists, stationers, discount chains and florists.: 2.2916666666666665, The Group also competes with online greeting card companies; online flower specialists; and online gift specialists.: 2.9166666666666674, Some of the Group’s competitors, particularly supermarkets, general merchandise discounters and stationery retailers, may have larger and broader customer bases, wider distribution channels, substantially greater financial, technical or marketing resources, stronger brand or name recog

In [99]:
summarized_sentences = nlargest(3, sent_strength, key=sent_strength.get)
print(summarized_sentences)

[If such alternative communications media or substitutions for the Group’s products appeal more to the Group’s existing customers or potential customers and the Group fails to innovate its product offering in a manner that continues to be attractive to its customers or enables the Group to maintain its existing margins, the Group may be unable to compete effectively., In addition, the Group competes and may increasingly compete in the future with alternative business models that serve the gifting markets, including greeting card subscription services, flower subscription services, or apps or websites that provide free products (only charging for any up-sells of attachments to the free base product) that compete or may serve as a substitute to the Group’s gifting products., Some of the Group’s competitors may have greater research and development resources and be able to adapt to changes in customer requirements, customer preferences or attitudes toward design content and gifting produc

In [100]:
final_sentences = [ w.text for w in summarized_sentences ]
summary = ' '.join(final_sentences)
print(summary)

If such alternative communications media or substitutions for the Group’s products appeal more to the Group’s existing customers or potential customers and the Group fails to innovate its product offering in a manner that continues to be attractive to its customers or enables the Group to maintain its existing margins, the Group may be unable to compete effectively. In addition, the Group competes and may increasingly compete in the future with alternative business models that serve the gifting markets, including greeting card subscription services, flower subscription services, or apps or websites that provide free products (only charging for any up-sells of attachments to the free base product) that compete or may serve as a substitute to the Group’s gifting products. Some of the Group’s competitors may have greater research and development resources and be able to adapt to changes in customer requirements, customer preferences or attitudes toward design content and gifting products 
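Two properties of the summary above are worth noting: the sentences appear in score order rather than document order, and because each score is a sum, long sentences tend to win. The sketch below illustrates one possible adjustment, assuming an average weight per token and a final re-sort by position. It is a simplified, spaCy-free stand-in, not the notebook's method: the function name `summarize`, the regex sentence splitter, and the regex tokenizer are all assumptions made so that the example runs on the standard library alone (the tokenizer, for instance, splits on the curly apostrophe in "Group’s").

```python
import re
from collections import Counter
from heapq import nlargest

def summarize(text, n=3, stopwords=frozenset()):
    """Pick the n highest-scoring sentences and return them in document order."""
    # Naive split after ., ! or ? followed by whitespace (assumes no abbreviations).
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    tokenized = [re.findall(r"[A-Za-z']+", s.lower()) for s in sentences]
    words = [w for toks in tokenized for w in toks if w not in stopwords]
    if not words:
        return ''
    freq = Counter(words)
    max_freq = max(freq.values())
    weight = {w: c / max_freq for w, c in freq.items()}
    # Average weight per token, so a sentence is not favoured for length alone.
    scores = {
        i: sum(weight.get(w, 0.0) for w in toks) / len(toks)
        for i, toks in enumerate(tokenized) if toks
    }
    # Select by score, then restore the original sentence order.
    top = sorted(nlargest(n, scores, key=scores.get))
    return ' '.join(sentences[i] for i in top)
```

With the notebook's data this could be called as `summarize(text, n=3, stopwords=STOP_WORDS)`; the selected sentences may differ from the output above because of the length normalisation and the simpler tokenizer.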

In [101]:
# Extractive generalization (highlighting information that is company-specific)

In [104]:
excl = ""
for i, s in enumerate(doc.sents):
    for token in s:
        if token.ent_type_ in ['GPE', 'NORP'] or token.pos_ == 'NUM': # or ORG >1 ?
            #print(token.text, token.ent_type_)
            excl += s.text
            #print(f'Excluded sentence {i}: {s}')
            break
excl

'Within each of the UK and the Netherlands, the Group has benefited from its strong positions in the gifting market, but competition could intensify as traditional retailers expand their digital, online and app-based sales capabilities and potential new competitors enter the gifting market.'
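Only one sentence is flagged here, so concatenating into a single string causes no trouble, but with several matches `excl += s.text` would join them without any separator. A minimal sketch that collects the flagged sentences in a list instead is shown below. The function name `company_specific_sentences` and the `ent_types` parameter are hypothetical; the logic mirrors the cell above and uses only the `ent_type_`, `pos_` and `text` attributes it already relies on.

```python
def company_specific_sentences(sents, ent_types=('GPE', 'NORP')):
    """Return the text of each sentence containing a flagged entity type or a number."""
    flagged = []
    for sent in sents:
        # A single matching token is enough to flag the whole sentence.
        if any(tok.ent_type_ in ent_types or tok.pos_ == 'NUM' for tok in sent):
            flagged.append(sent.text)
    return flagged
```

In the notebook this would be called as `company_specific_sentences(doc.sents)`, and the result can be joined with `' '.join(...)` if a single string is needed.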