<a href="https://colab.research.google.com/github/yahyasungur/nlp_dl_ml_projects/blob/master/Text_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Text Summarization

In [37]:
text = """ Maria Sharapova has basically no friends as tennis players on the WTA Tour. The Russian player has no problems in openly speaking about it and in a recent interview she said: ‘I don’t really hide any feelings too much.
I think everyone knows this is my job here. When I’m on the courts or when I’m on the court playing, I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.
So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.
I’m a pretty competitive girl. I say my hellos, but I’m not sending any players flowers as well. Uhm, I’m not really friendly or close to many players.
I have not a lot of friends away from the courts.’ When she said she is not really close to a lot of players, is that something strategic that she is doing? Is it different on the men’s tour than the women’s tour? ‘No, not at all.
I think just because you’re in the same sport doesn’t mean that you have to be friends with everyone just because you’re categorized, you’re a tennis player, so you’re going to get along with tennis players.
I think every person has different interests. I have friends that have completely different jobs and interests, and I’ve met them in very different parts of my life.
I think everyone just thinks because we’re tennis players we should be the greatest of friends. But ultimately tennis is just a very small part of what we do.
There are so many other things that we’re interested in, that we do. """

In [38]:
len(text)

1563

#1) Importing the libraries and Dataset

In [39]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [40]:
nlp = spacy.load('en_core_web_sm')

In [41]:
doc = nlp(text)

In [42]:
tokens = [token.text for token in doc]

In [43]:
print(tokens)

[' ', 'Maria', 'Sharapova', 'has', 'basically', 'no', 'friends', 'as', 'tennis', 'players', 'on', 'the', 'WTA', 'Tour', '.', 'The', 'Russian', 'player', 'has', 'no', 'problems', 'in', 'openly', 'speaking', 'about', 'it', 'and', 'in', 'a', 'recent', 'interview', 'she', 'said', ':', '‘', 'I', 'do', 'n’t', 'really', 'hide', 'any', 'feelings', 'too', 'much', '.', '\n', 'I', 'think', 'everyone', 'knows', 'this', 'is', 'my', 'job', 'here', '.', 'When', 'I', '’m', 'on', 'the', 'courts', 'or', 'when', 'I', '’m', 'on', 'the', 'court', 'playing', ',', 'I', '’m', 'a', 'competitor', 'and', 'I', 'want', 'to', 'beat', 'every', 'single', 'person', 'whether', 'they', '’re', 'in', 'the', 'locker', 'room', 'or', 'across', 'the', 'net', '.', '\n', 'So', 'I', '’m', 'not', 'the', 'one', 'to', 'strike', 'up', 'a', 'conversation', 'about', 'the', 'weather', 'and', 'know', 'that', 'in', 'the', 'next', 'few', 'minutes', 'I', 'have', 'to', 'go', 'and', 'try', 'to', 'win', 'a', 'tennis', 'match', '.', '\n', 'I',

In [44]:
punctuation = punctuation + "\n"

In [45]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

#2) Text Cleaning

In [46]:
word_frequency = {}

stop_words = list(STOP_WORDS)

# Count every token that is neither a stop word nor a punctuation character.
# Keys keep the token's original casing (word.text).
for word in doc:
  text_lower = word.text.lower()
  if text_lower not in stop_words and text_lower not in punctuation:
    word_frequency[word.text] = word_frequency.get(word.text, 0) + 1

In [47]:
max_frequency = max(word_frequency.values())

In [48]:
# Normalize each count by the highest count, so the most frequent word scores 1.0.
for word in word_frequency:
  word_frequency[word] = word_frequency[word] / max_frequency
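
The counting and normalization steps above can also be sketched without spaCy. In the sketch below, the regex tokenizer, the tiny `DEMO_STOP_WORDS` set, and the function name `normalized_frequencies` are illustrative assumptions, not part of the notebook. Unlike the cell above, keys are lowercased here, so that later lookups by lowercased token text would match.

```python
import re
from collections import Counter
from string import punctuation

# Hypothetical, deliberately tiny stop-word set; spaCy's STOP_WORDS is much larger.
DEMO_STOP_WORDS = {"a", "the", "is", "on", "and", "i", "to", "of"}

def normalized_frequencies(text):
    """Return {lowercased word: count / highest count}, skipping stop words and punctuation."""
    # Assumption: a simple regex tokenizer stands in for spaCy's tokenizer.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    counts = Counter(
        tok for tok in tokens
        if tok not in DEMO_STOP_WORDS and tok not in punctuation
    )
    if not counts:
        return {}
    max_count = max(counts.values())
    return {word: count / max_count for word, count in counts.items()}

freqs = normalized_frequencies("Tennis is a job. Tennis players play tennis on the court.")
print(freqs)
```

Because every count is divided by the highest count, the most frequent word always maps to 1.0 and all other words fall between 0 and 1.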

#3) Sentence Tokenization

In [49]:
sent_tokens = [sent for sent in doc.sents]

In [50]:
sent_tokens

[ Maria Sharapova has basically no friends as tennis players on the WTA Tour.,
 The Russian player has no problems in openly speaking about it and,
 in a recent interview she said: ‘I don’t really hide any feelings too much.,
 I think everyone knows this is my job here.,
 When I’m on the courts or when I’m on the court playing,
 , I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.,
 So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.,
 I’m a pretty competitive girl.,
 I say my hellos, but I’m not sending any players flowers as well.,
 Uhm,
 , I’m not really friendly or close to many players.,
 I have not a lot of friends away from the courts.’,
 When she said she is not really close to a lot of players, is that something strategic that she is doing?,
 Is it different on the men’s tour than the women,
 ’s tour? ‘,
 No, not at all.,
 I 

In [51]:
sent_score = {}

# Score each sentence as the sum of the normalized frequencies of its words.
# A sentence with no matching word gets no entry in sent_score.
for sent in sent_tokens:
  for word in sent:
    key = word.text.lower()
    if key in word_frequency:
      sent_score[sent] = sent_score.get(sent, 0) + word_frequency[key]
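
Note that the frequency table built in `In [46]` is keyed by `word.text`, while the loop above looks up `word.text.lower()`. Tokens that only occur capitalized (for example `Maria`) therefore do not contribute to any sentence score. A minimal sketch of the same scoring idea with lowercase keys on both sides follows; the function name `score_sentences`, the regex tokenizer, and the demo data are assumptions made for illustration.

```python
import re

def score_sentences(sentences, word_frequency):
    """Sum the normalized frequency of every known word in each sentence.

    Assumes `word_frequency` is keyed by lowercased words.
    """
    scores = {}
    for sent in sentences:
        # Assumption: a regex word tokenizer stands in for spaCy tokens.
        for tok in re.findall(r"\w+", sent.lower()):
            if tok in word_frequency:
                scores[sent] = scores.get(sent, 0.0) + word_frequency[tok]
    return scores

# Hypothetical demo data, not taken from the notebook's output.
demo_freq = {"tennis": 1.0, "players": 0.5, "court": 0.5}
demo_sents = [
    "Tennis players love the court.",
    "I say my hellos.",
    "Tennis is my job.",
]
demo_scores = score_sentences(demo_sents, demo_freq)
print(demo_scores)
```

As in the notebook, a sentence without any scored word is simply absent from the result.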

In [52]:
sent_score

{ Maria Sharapova has basically no friends as tennis players on the WTA Tour.: 3.5000000000000004,
 The Russian player has no problems in openly speaking about it and: 0.8333333333333333,
 in a recent interview she said: ‘I don’t really hide any feelings too much.: 1.3333333333333335,
 I think everyone knows this is my job here.: 0.9999999999999999,
 When I’m on the courts or when I’m on the court playing: 0.6666666666666666,
 , I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.: 1.5000000000000002,
 So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.: 2.333333333333333,
 I’m a pretty competitive girl.: 0.5,
 I say my hellos, but I’m not sending any players flowers as well.: 1.5,
 , I’m not really friendly or close to many players.: 1.5,
 I have not a lot of friends away from the courts.’: 1.8333333333333335,
 When she said she is not

#4) Select Sentences with Max Score (30%)

In [53]:
from heapq import nlargest

In [54]:
len(sent_score) * 0.3

7.199999999999999

In [55]:
# 30% of the scored sentences is about 7.2, rounded up to 8.
summary = nlargest(n=8, iterable=sent_score, key=sent_score.get)
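
The hard-coded 8 corresponds to rounding `len(sent_score) * 0.3` (about 7.2) up to the next integer. A small sketch that derives the count from the ratio instead; the helper name `top_k` and the demo scores are assumptions made for illustration.

```python
import math
from heapq import nlargest

def top_k(scores, ratio=0.3):
    """Return the highest-scoring keys, keeping ceil(ratio * len(scores)) of them."""
    # Keep at least one item so that a short input still yields a summary.
    k = max(1, math.ceil(len(scores) * ratio))
    return nlargest(k, scores, key=scores.get)

# Hypothetical scores; five items at a ratio of 0.3 gives ceil(1.5) = 2.
demo = {"a": 0.5, "b": 2.0, "c": 1.5, "d": 0.1, "e": 1.0}
print(top_k(demo))
```

Rounding up rather than down is a design choice: it never selects fewer sentences than the requested fraction.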

In [56]:
# Each item in summary is a sentence span, not a single word.
final_summary = [sent.text for sent in summary]

In [57]:
final_summary = " ".join(final_summary)

In [58]:
final_summary

' Maria Sharapova has basically no friends as tennis players on the WTA Tour. , you’re a tennis player, so you’re going to get along with tennis players.\n I think everyone just thinks because we’re tennis players So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.\n When she said she is not really close to a lot of players, is that something strategic that she is doing? I have friends that have completely different jobs and interests, and I think every person has different interests. I have not a lot of friends away from the courts.’'
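
`nlargest` returns the sentences ordered by score, so the joined summary above does not follow the order of the original text. One possible remedy is to re-sort the selected sentences by their original position before joining. The sketch below uses plain strings and list indices; with spaCy spans, the `start` attribute of each sentence could serve as the sort key instead. The function name `summarize_in_order` and the demo data are assumptions made for illustration.

```python
from heapq import nlargest

def summarize_in_order(sentences, scores, k):
    """Pick the k best-scoring sentences, then join them in document order."""
    # Select by index so that the original positions are preserved.
    top = nlargest(
        k,
        range(len(sentences)),
        key=lambda i: scores.get(sentences[i], 0.0),
    )
    return " ".join(sentences[i] for i in sorted(top))

# Hypothetical demo data: the last sentence scores highest.
demo_sentences = ["A first.", "B second.", "C third.", "D fourth."]
demo_sentence_scores = {"A first.": 0.2, "B second.": 2.0, "C third.": 0.1, "D fourth.": 3.0}
ordered = summarize_in_order(demo_sentences, demo_sentence_scores, k=2)
print(ordered)
```

Although "D fourth." has the highest score, it appears after "B second." in the result because it comes later in the input.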

In [60]:
len(final_summary) / len(text)

0.40435060780550225