# Introduction

In this exercise, we will:
- Extract User Reviews from IMDB for movies from the same genre
    - Use a crawler to collect the review links (at least 100 reviews)
    - Each link is a webpage that has only 1 user review
    - Choose reviews from the same genre
    - Collect reviews of several movies in the chosen genre that include a mix of positive and negative reviews
- Extract the Noun Phrase chunks from the reviews:
    - Grab the main review text from each link
    - Run the review through a tokenizer and try to NP chunk it with a shallow parser
    - If there are unknown words, expand the working lexicon and run the NP chunker again
- Output all the chunks in a single list for each review. Also, submit a brief summary of what we did.
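The lexicon-expansion step can be sketched as follows. This is a minimal sketch under stated assumptions, not the method used in this notebook: it assumes a hand-built dictionary (the hypothetical `lexicon`) that overrides the tagger's tag for words it handles badly, and then chunks the sentence again with the expanded grammar `<DT>?<JJ.*>*<NN.*>` used later on. The input tags are the ones `nltk.pos_tag` assigned to one of the reviews chunked further down in this notebook; `apply_lexicon` and `np_chunks` are illustrative helper names.

```python
import nltk

# Hypothetical hand-built lexicon: corrections for words the default tagger mis-tagged
lexicon = {"wow": "UH", "corny": "JJ", "get": "VB"}


def apply_lexicon(tagged_sent, lexicon):
    """Override the tagger's tag for any word found (case-insensitively) in the lexicon."""
    return [(word, lexicon.get(word.lower(), tag)) for word, tag in tagged_sent]


def np_chunks(tagged_sent, parser):
    """Chunk one POS-tagged sentence and return its NP chunks as strings."""
    tree = parser.parse(tagged_sent)
    return [" ".join(word for word, _tag in st.leaves())
            for st in tree.subtrees(filter=lambda t: t.label() == "NP")]


cp = nltk.RegexpParser("NP: {<DT>?<JJ.*>*<NN.*>}")

# "Wow, how corny can a movie get" with the tags nltk.pos_tag produced for it
tagged = [("Wow", "NNP"), (",", ","), ("how", "WRB"), ("corny", "NN"), ("can", "MD"),
          ("a", "DT"), ("movie", "NN"), ("get", "NN")]

print(np_chunks(tagged, cp))                          # ['Wow', 'corny', 'a movie', 'get']
print(np_chunks(apply_lexicon(tagged, lexicon), cp))  # ['a movie']
```

Re-running the chunker after the lexicon correction removes the spurious chunks (*Wow*, *corny*, *get*) while keeping the genuine noun phrase *a movie*.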

#### Preparation Steps
- Import the necessary packages
- Use Selenium to navigate through the webpages programmatically
- Extract reviews using Beautiful Soup

In [1]:
#!pip install selenium
#!pip install vaderSentiment
#!pip install textblob
#!pip install beautifulsoup4 nltk openpyxl   # openpyxl is required by DataFrame.to_excel

In [2]:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob
import time

#### Set up the driver and open the webpage

- Set up the Firefox webdriver.
    - For this step to work, the directory containing geckodriver.exe has to be added to the Windows PATH variable
- Open the search page for the selected genre (set through the `genre` variable)

In [16]:
# Start a Firefox session (geckodriver must be on the PATH)
driver = webdriver.Firefox()

genre = "comedy"
genre = genre.lower()
url = "https://www.imdb.com/search/title/?title_type=feature&genres=" + genre + "&explore=genres"

print("URL for Genre:: ", url)
# Open the search results page for the genre
driver.get(url)

URL for Genre::  https://www.imdb.com/search/title/?title_type=feature&genres=comedy&explore=genres


#### Loop through the top 5 movies and get review text

##### I observed that:
- The full XPath of each movie link on the genre page differs only by an index at one position
- The full XPath of the User Reviews link is always the same on every movie page

##### To navigate the pages:
- Used the Selenium driver's XPath lookup to navigate through the pages
- Once I reached the User Reviews page, I stored the current URL in a variable
- Used Beautiful Soup to parse the User Reviews page
- Extracted the review user, the title of the review, the review text and the rating
- Stored the values in a DataFrame
- Also wrote the results to an Excel file as a precaution, in case the IMDB website is not available
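The extraction steps above can be sketched on a small, made-up HTML snippet. The markup below is hypothetical: it only reuses the class names the scraper selects on (`imdb-user-review`, `display-name-link`, `title`, `content`/`text`, `rating-other-user-rating`) and is not copied from IMDB, whose markup changes over time. Reading each field from inside its own review container keeps the fields of one review together, even when a review has no star rating.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML imitating the class names used by the scraper (not real IMDB markup)
html = """
<div id="main">
  <div class="imdb-user-review">
    <span class="rating-other-user-rating"><span>8</span><span>/10</span></span>
    <a class="title">Weird but Good Weird
    </a>
    <span class="display-name-link"><a>user_a</a></span>
    <div class="content"><div class="text">A first made-up review.</div></div>
  </div>
  <div class="imdb-user-review">
    <a class="title">No rating given
    </a>
    <span class="display-name-link"><a>user_b</a></span>
    <div class="content"><div class="text">A review without stars.</div></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for rev in soup.find(id="main").select(".imdb-user-review"):
    # select_one returns None when a container has no element with that class
    rating = rev.select_one(".rating-other-user-rating")
    rows.append({
        "Review User": rev.select_one(".display-name-link").get_text(strip=True),
        "Title": rev.select_one(".title").get_text(strip=True),
        "Review": rev.select_one(".content .text").get_text(),
        # strip=True drops the surrounding whitespace and newlines around the rating text
        "Rating": rating.get_text(strip=True) if rating else None,
    })

for row in rows:
    print(row["Review User"], "|", row["Title"], "|", row["Rating"])
```

The second review gets `None` as its rating instead of silently borrowing a rating from a neighbouring review, which is what can happen when each field is collected into a separate page-wide list.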

In [17]:
from selenium.webdriver.common.by import By

# The XPaths and CSS selectors below match the IMDB page layout at the time this
# notebook was written; they need updating if the layout changes.
genre_url = "https://www.imdb.com/search/title/?title_type=feature&genres=" + genre + "&explore=genres"
frames = []

for i in range(1, 6):
    # Open the main page for the genre
    driver.get(genre_url)

    # Open the i-th movie: only the div index in the link's XPath changes between movies
    xpth = "/html/body/div[3]/div/div[2]/div[3]/div[1]/div/div[3]/div/div[" + str(i) + "]/div[3]/h3/a"
    driver.find_element(By.XPATH, xpth).click()

    # Find the User Reviews link (same XPath on every movie page) and click it
    driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div/div[1]/div/div/div[1]/div[1]/div[1]/a[3]").click()

    # Scrape the IMDB user review page
    review_url = driver.current_url
    page = requests.get(review_url)
    soup = BeautifulSoup(page.content, "html.parser")
    main = soup.find(id="main")

    # Get the title of the movie
    parent = main.find(class_="parent")
    name = parent.find(itemprop="name")
    film_title = name.find(itemprop="url").get_text()

    # Read user, review title, review text and rating from inside each review container,
    # so a review without a star rating gets None instead of misaligning the columns
    rows = []
    for rev in main.select(".imdb-user-review"):
        user = rev.select_one(".display-name-link")
        title = rev.select_one(".title")
        text = rev.select_one(".content .text")
        rating = rev.select_one(".ipl-ratings-bar .rating-other-user-rating")
        rows.append({
            "Film Title": film_title,
            "Review User": user.get_text().replace("\n", "") if user else None,
            "Title": title.get_text().replace("\n", "") if title else None,
            "Review": text.get_text() if text else None,
            "Rating": rating.get_text() if rating else None,
        })
    frames.append(pd.DataFrame(rows))

# DataFrame.append was removed in pandas 2.0, so concatenate the per-movie frames instead
table_review = pd.concat(frames)

In [18]:
table_review.shape

(125, 5)

In [20]:
table_review

| | Film Title | Review User | Title | Review | Rating |
|---|---|---|---|---|---|
| 0 | Palm Springs | fadlanamin | Weird but Good Weird | I was expecting a conventional rom-com where t... | `\n\n\n\n\n\n8/10\n` |
| 1 | Palm Springs | kjproulx | It's Very Hard to Dislike a Movie like Palm S... | Films that revolve around characters repeating... | `\n\n\n\n\n\n9/10\n` |
| 2 | Palm Springs | cartsghammond | Pure fun | Palm Springs is just such a good time of a mov... | `\n\n\n\n\n\n9/10\n` |
| 3 | Palm Springs | cardsrock | Simply terrific | I'm impressed that people are still able to fi... | `\n\n\n\n\n\n8/10\n` |
| 4 | Palm Springs | Loptimus06 | A New Take On Groundhog Day | Palm Springs is "One of those infinite time-lo... | `\n\n\n\n\n\n8/10\n` |
| ... | ... | ... | ... | ... | ... |
| 20 | Desperados | omood-00755 | Great comedy film | It was a good comedy film. Was it a masterliec... | `\n\n\n\n\n\n9/10\n` |
| 21 | Desperados | Worldpeece1 | Predictable and boring | The main character was just annoying. Found my... | `\n\n\n\n\n\n2/10\n` |
| 22 | Desperados | boriquachula16 | Couldn't skip through fast enough. | The ONLY good thing about this film is Lamorne... | `\n\n\n\n\n\n1/10\n` |
| 23 | Desperados | Elizabethchivers | Very fun chick flick | I was surprised to see so many bad reviews. Th... | `\n\n\n\n\n\n8/10\n` |


In [21]:
# Write results to an Excel file
table_review.to_excel('Film_User_Reviews.xlsx')

#### Noun phrase chunking

Chunking (shallow parsing) groups POS-tagged words into non-overlapping phrases, so we have to perform POS tagging before chunking the sentences. It is useful when we want to extract information from unstructured text: named entity recognition, for example, is often built on top of chunking, but chunking itself is not the same as named entity extraction.

To create NP chunks, we first define a chunk grammar (a pattern over POS tags). We start with a simple pattern and then expand it.

The first rule forms a Noun Phrase (NP) chunk whenever the chunker finds an optional determiner (DT), followed by any number of adjectives (JJ), and then a noun (NN).
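The simple rule can be tried on a single sentence before running it on the reviews. This is a minimal sketch: the example sentence and its tags are written by hand for illustration (they are not tagger output), so no tokenizer or tagger model has to be downloaded.

```python
import nltk

# Simple grammar: optional determiner, any number of plain adjectives (JJ), one singular noun (NN)
cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")

# Hand-tagged example sentence
tagged = [("This", "DT"), ("was", "VBD"), ("the", "DT"), ("worst", "JJS"),
          ("movie", "NN"), ("with", "IN"), ("a", "DT"), ("silly", "JJ"), ("plot", "NN")]

tree = cp.parse(tagged)
nps = [" ".join(word for word, _tag in st.leaves())
       for st in tree.subtrees(filter=lambda t: t.label() == "NP")]
print(nps)  # ['movie', 'a silly plot']
```

Because `<JJ>` matches only the plain adjective tag, the superlative *worst* (JJS) breaks the pattern and only *movie* is chunked, while *a silly plot* matches the rule in full.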

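In [7]:
import nltk
# One-time downloads for the sentence tokenizer and POS tagger models:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
# (recent NLTK releases name them 'punkt_tab' and 'averaged_perceptron_tagger_eng')

In [8]:
def ie_preprocess(document):
    """Split a review into sentences, tokenize each sentence and POS-tag the tokens."""
    sentences = nltk.sent_tokenize(document)
    sentences = [nltk.word_tokenize(sent) for sent in sentences]
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    return sentences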

In [9]:
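# Simple grammar: optional determiner, any number of adjectives (JJ), then a singular noun (NN)
grammar = r"""
NP: {<DT>?<JJ>*<NN>}
"""
cp = nltk.RegexpParser(grammar)  # build the chunker once, outside the loop

for ind_review in table_review.Review.dropna():
    for sent in ie_preprocess(ind_review):
        tree = cp.parse(sent)
        print(tree)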

(S
  I/PRP
  honestly/RB
  ca/MD
  n't/RB
  believe/VB
  this/DT
  is/VBZ
  (NP an/DT actual/JJ movie/NN)
  and/CC
  (NP number/NN)
  1/CD
  on/IN
  Netflix/NNP
  in/IN
  my/PRP$
  (NP country/NN)
  ./.)
(S
  This/DT
  was/VBD
  probably/RB
  the/DT
  worst/JJS
  (NP movie/NN)
  I/PRP
  've/VBP
  ever/RB
  seen/VBN
  ./.)
(S
  The/DT
  dialogues/NNS
  between/IN
  the/DT
  main/JJ
  characters/NNS
  are/VBP
  unbelievably/RB
  cringy/JJ
  and/CC
  (NP the/DT plot/NN)
  literally/RB
  does/VBZ
  n't/RB
  make/VB
  (NP any/DT sense/NN)
  ./.)
(S
  Do/VBP
  n't/RB
  watch/VB
  (NP this/DT movie/NN)
  unless/IN
  you/PRP
  wan/VBP
  (NP na/JJ waste/NN)
  2/CD
  hours/NNS
  of/IN
  your/PRP$
  (NP life/NN)
  just/RB
  to/TO
  give/VB
  (NP this/DT movie/NN)
  a/DT
  (/(
  probably/RB
  )/)
  1/CD
  (NP star/NN)
  (NP rating/NN)
  too/RB
  ./.)
(S
  Girl/NNP
  's/POS
  (NP boyfriend/NN)
  wears/VBZ
  cargo/VBP
  short/JJ
  pants/NNS
  on/IN
  her/PRP$
  (NP birthday/NN)
  ./.)
(S
  If/IN
  t

(S
  365/CD
  days/NNS
  is/VBZ
  about/IN
  a/DT
  successful/JJ
  but/CC
  (NP unhappy/JJ young/JJ woman/NN)
  whose/WP$
  (NP path/NN)
  crosses/VBZ
  with/IN
  that/DT
  of/IN
  a/DT
  young/JJ
  ,/,
  (NP handsome/JJ mafia/NN)
  (NP boss/NN)
  ./.)
(S
  He/PRP
  kidnaps/VBZ
  her/PRP$
  and/CC
  gives/VBZ
  her/PRP$
  (NP a/DT deadline/NN)
  of/IN
  one/CD
  (NP year/NN)
  to/TO
  fall/VB
  in/IN
  (NP love/NN)
  with/IN
  him/PRP
  ./.)
(S
  Despite/IN
  (NP the/DT rough/JJ introduction/NN)
  ,/,
  she/PRP
  eventually/RB
  gives/VBZ
  in/IN
  to/TO
  his/PRP$
  charms/NNS
  and/CC
  falls/NNS
  in/IN
  (NP love/NN)
  with/IN
  him/PRP
  ./.)
(S
  (NP The/DT central/JJ core/NN)
  of/IN
  (NP the/DT movie/NN)
  is/VBZ
  (NP domination/NN)
  ,/,
  or/CC
  how/WRB
  (NP the/DT male/JJ protagonist/NN)
  uses/VBZ
  his/PRP$
  (NP raw/JJ masculinity/NN)
  to/TO
  completely/RB
  overwhelm/VB
  (NP the/DT girl/NN)
  until/IN
  she/PRP
  eventually/RB
  gives/VBZ
  in/IN
  to/TO
  her/PR

(S I/PRP have/VBP (NP nothing/NN) against/IN (NP sex/NN) ./.)
(S
  I/PRP
  love/VBP
  it/PRP
  ,/,
  I/PRP
  love/VBP
  watching/VBG
  it/PRP
  but/CC
  this/DT
  was/VBD
  painful/JJ
  ./.)
(S
  (NP The/DT sex/NN)
  was/VBD
  hot/JJ
  but/CC
  (NP the/DT story/NN)
  and/CC
  (NP the/DT acting/NN)
  was/VBD
  terrible/JJ
  ./.)
(S
  I/PRP
  think/VBP
  it/PRP
  was/VBD
  not/RB
  worth/JJ
  it/PRP
  and/CC
  very/RB
  very/RB
  long/RB
  ./.)
(S
  I/PRP
  can/MD
  see/VB
  what/WP
  (NP the/DT director/NN)
  was/VBD
  trying/VBG
  to/TO
  do/VB
  but/CC
  he/PRP
  missed/VBD
  (NP the/DT mark/NN)
  ./.)
(S
  For/IN
  (NP the/DT same/JJ kind/NN)
  of/IN
  (NP story/NN)
  with/IN
  (NP the/DT sex/NN)
  dial/JJ
  down/RB
  and/CC
  (NP the/DT story/NN)
  (NP dial/NN)
  up/IN
  I/PRP
  would/MD
  suggest/VB
  The/DT
  Dreamers/NNP
  ./.)
(S
  (NP This/DT story/NN)
  could/MD
  be/VB
  told/VBN
  in/IN
  about/RB
  30/CD
  min/NNS
  ,/,
  add/VBP
  maybe/RB
  30/CD
  (NP min/NN)
  to/TO
  g

(S
  There/EX
  's/VBZ
  (NP a/DT point/NN)
  when/WRB
  (NP the/DT main/JJ character/NN)
  and/CC
  (NP voice-over/JJ narrator/NN)
  on/IN
  Love/NNP
  says/VBZ
  that/IN
  he/PRP
  never/RB
  sees/VBZ
  (NP a/DT movie/NN)
  presenting/VBG
  ``/``
  (NP emotional/JJ sexuality/NN)
  ''/''
  ./.)
(S
  (NP Sex/NN)
  ,/,
  semen/NNS
  and/CC
  (NP blood/NN)
  ,/,
  that/DT
  's/VBZ
  what/WP
  people/NNS
  would/MD
  like/VB
  to/TO
  see/VB
  ./.)
(S
  Murphy/NNP
  is/VBZ
  (NP a/DT filmmaker/NN)
  ,/,
  or/CC
  so/RB
  he/PRP
  says/VBZ
  ./.)
(S
  Actually/RB
  ,/,
  I/PRP
  think/VBP
  he/PRP
  is/VBZ
  just/RB
  (NP a/DT puppet/NN)
  to/TO
  Gaspar/NNP
  Noé/NNP
  intents/NNS
  in/IN
  this/DT
  too/RB
  (NP stereotyped/JJ film/NN)
  that/WDT
  had/VBD
  (NP the/DT big/JJ ambition/NN)
  to/TO
  be/VB
  innovative.Love/VBN
  has/VBZ
  been/VBN
  creating/VBG
  (NP some/DT noisy/NN)
  around/IN
  (NP the/DT world/NN)
  ,/,
  and/CC
  has/VBZ
  been/VBN
  like/IN
  that/DT
  since/IN
  

(S
  Mild/NNP
  (NP plot/NN)
  (NP spoiler/NN)
  (/(
  and/CC
  like/IN
  any/DT
  Noe/NNP
  (NP movie/NN)
  I/PRP
  think/VBP
  you/PRP
  ought/MD
  to/TO
  just/RB
  ignore/VB
  any/DT
  synopses/NNS
  or/CC
  reviews/NNS
  and/CC
  simply/RB
  see/VB
  it/PRP
  for/IN
  yourself/PRP
  )/)
  :/:
  this/DT
  is/VBZ
  (NP the/DT story/NN)
  of/IN
  two/CD
  lovers/NNS
  ,/,
  how/WRB
  they/PRP
  meet/VBP
  ,/,
  how/WRB
  their/PRP$
  (NP romance/NN)
  develops/VBZ
  ,/,
  how/WRB
  they/PRP
  learn/VBP
  about/IN
  each/DT
  others/NNS
  desires/NNS
  ,/,
  their/PRP$
  past/JJ
  relationships/NNS
  ./.)
(S
  (NP Sex/NN)
  is/VBZ
  (NP a/DT drug/NN)
  ,/,
  and/CC
  these/DT
  characters/NNS
  are/VBP
  junkies/NNS
  ./.)
(S
  (NP The/DT first/JJ hit/NN)
  draws/VBZ
  them/PRP
  in/IN
  ,/,
  and/CC
  as/IN
  they/PRP
  chase/VBP
  (NP the/DT dragon/NN)
  with/IN
  each/DT
  other/JJ
  ,/,
  trying/VBG
  to/TO
  make/VB
  (NP each/DT hit/NN)
  as/RB
  momentous/JJ
  as/IN
  the/DT
  

(S
  Palm/NNP
  Springs/NNP
  is/VBZ
  a/DT
  flat/JJ
  out/RP
  (NP hilarious/JJ comedy/NN)
  !/.)
(S
  (NP The/DT story/NN)
  itself/PRP
  has/VBZ
  been/VBN
  done/VBN
  before/RBR
  and/CC
  some/DT
  have/VBP
  been/VBN
  great/JJ
  (/(
  Groundhog/NNP
  Day/NNP
  ,/,
  Edge/NNP
  of/IN
  Tomorrow/NNP
  ,/,
  etc/FW
  ./.
  )/))
(S
  but/CC
  Adam/NNP
  Samberg/NNP
  makes/VBZ
  this/DT
  one/CD
  of/IN
  the/DT
  better/JJR
  ones/NNS
  ./.)
(S
  If/IN
  you/PRP
  do/VBP
  n't/RB
  find/VB
  (NP this/DT movie/NN)
  funny/VBZ
  then/RB
  (NP something/NN)
  is/VBZ
  wrong/JJ
  with/IN
  you/PRP
  !/.)
(S
  Give/VB
  it/PRP
  (NP a/DT chance/NN)
  and/CC
  you/PRP
  will/MD
  not/RB
  be/VB
  disappointed/VBN
  !/.)
(S
  Not/RB
  (NP every/DT movie/NN)
  is/VBZ
  supossed/VBN
  to/TO
  be/VB
  Amarcord/NNP
  ,/,
  (NP casablanca/NN)
  or/CC
  even/RB
  (NP young/JJ frankenstein/NN)
  ./.)
(S
  It/PRP
  's/VBZ
  not/RB
  (NP a/DT sin/NN)
  to/TO
  enjoy/VB
  (NP a/DT simple/JJ movie

(S
  If/IN
  you/PRP
  have/VBP
  (NP brain/NN)
  cells/NNS
  ,/,
  avoid/VB
  (NP this/DT movie/NN)
  at/IN
  all/DT
  costs/NNS)
(S Wrong/JJ Movie/NNP ./.)
(S (NP Waste/NN) of/IN (NP time/NN) ./.)
(S Funny/NNP ?/.)
(S Not/RB really/RB ./.)
(S
  We/PRP
  could/MD
  have/VB
  watch/WRB
  a/DT
  better/JJR
  (NP movie/NN)
  than/IN
  this/DT
  ./.)
(S Trying/VBG too/RB hard/JJ to/TO make/VB us/PRP laugh/IN ./.)
(S
  So/RB
  annoying/JJ
  and/CC
  just/RB
  ca/MD
  n't/RB
  let/VB
  it/PRP
  pass/VB
  not/RB
  to/TO
  comment/VB
  about/IN
  it/PRP
  so/RB
  I/PRP
  might/MD
  save/VB
  (NP human/JJ race/NN)
  not/RB
  to/TO
  watch/VB
  such/PDT
  (NP a/DT garbage/NN)
  of/IN
  (NP a/DT movie/NN)
  ./.)
(S So/RB bad/JJ !/.)
(S So/RB bad/JJ !/.)
(S
  Have/VBP
  never/RB
  felt/VBN
  (NP the/DT need/NN)
  to/TO
  review/VB
  (NP a/DT movie/NN)
  but/CC
  really/RB
  want/VB
  to/TO
  save/VB
  people/NNS
  's/POS
  (NP time/NN)
  ./.)
(S
  Some/DT
  comments/NNS
  lament/VBD
  (NP the/DT 

(S Finally/RB saw/VBD LITTLE/NNP WOMEN/NNP 2019/CD ./.)
(S
  Did/NNP
  n't/RB
  expected/VBD
  (NP much/JJ cause/NN)
  I/PRP
  really/RB
  liked/VBD
  94/CD
  (NP version/NN)
  by/IN
  Gillian/NNP
  Armstrong/NNP
  with/IN
  Winona/NNP
  Ryder/NNP
  ,/,
  Gabriel/NNP
  Byrne/NNP
  ,/,
  Trini/NNP
  Alvarado/NNP
  ,/,
  Christian/NNP
  Bale/NNP
  and/CC
  Susan/NNP
  Sarandon/NNP
  ./.)
(S
  Well/RB
  ,/,
  (NP the/DT casting/NN)
  was/VBD
  (NP something/NN)
  that/IN
  I/PRP
  can/MD
  not/RB
  judge/VB
  which/WDT
  (NP one/NN)
  is/VBZ
  better/RBR
  ./.)
(S Both/DT are/VBP so/RB so/RB great/JJ ./.)
(S
  But/CC
  Christian/NNP
  Bale/NNP
  94/CD
  ,/,
  I/PRP
  liked/VBD
  better/JJR
  than/IN
  Timothée/NNP
  Chalamet/NNP
  2019/CD
  (/(
  Although/IN
  he/PRP
  was/VBD
  good/JJ
  but/CC
  did/VBD
  n't/RB
  overcome/VB
  THE/NNP
  Christian/NNP
  )/)
  .Two/VBP
  things/NNS
  really/RB
  grabbed/VBD
  (NP me.1/NN)
  ./.)
(S (NP The/DT camera/NN) and/CC (NP light/NN) ./.)
(S
  (NP

(S
  The/DT
  7th/CD
  (NP film/NN)
  (NP adaptation/NN)
  of/IN
  Louisa/NNP
  May/NNP
  Alcott/NNP
  's/POS
  (NP classic/JJ novel/NN)
  of/IN
  (NP the/DT same/JJ name/NN)
  is/VBZ
  (NP the/DT only/JJ version/NN)
  that/IN
  I/PRP
  've/VBP
  seen/VBN
  so/RB
  far/RB
  and/CC
  frankly/RB
  ,/,
  I/PRP
  do/VBP
  n't/RB
  need/VB
  to/TO
  check/VB
  out/RP
  the/DT
  ones/NNS
  that/WDT
  surfaced/VBD
  before/IN
  anymore/RB
  coz/JJ
  Greta/NNP
  Gerwig/NNP
  's/POS
  Little/JJ
  Women/NNP
  is/VBZ
  (NP an/DT instant/JJ classic/NN)
  that/WDT
  impresses/VBZ
  on/IN
  all/DT
  fronts/NNS
  ,/,
  and/CC
  also/RB
  establishes/VBZ
  her/PRP
  as/IN
  (NP a/DT creative/JJ force/NN)
  to/TO
  be/VB
  reckoned/VBN
  with.Written/JJ
  &/CC
  directed/VBN
  by/IN
  Gerwig/NNP
  in/IN
  what/WP
  's/VBZ
  her/PRP$
  (NP solo/JJ sophomore/NN)
  (NP effort/NN)
  ,/,
  (NP the/DT film/NN)
  is/VBZ
  crafted/VBN
  with/IN
  so/RB
  (NP much/JJ love/NN)
  ,/,
  (NP passion/NN)
  ,/,
  (NP

In the output above, proper nouns are left out of the noun phrases: words such as Netflix and Jesus Christ are not chunked, because the simple grammar only matches the singular common-noun tag NN. For the same reason plural nouns (NNS) are missed, and comparative or superlative adjectives (JJR, JJS) break a chunk. So we expand the grammar to use `NN.*` and `JJ.*` in the NP chunk.

To make the output easier to follow, we also print the noun phrases on their own and store them in a list.

Scrolling through the result, we can see that proper nouns and plural nouns are now chunked as noun phrases.
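The effect of the expanded grammar can be checked by running both grammars on the same tagged sentences. This is a minimal sketch: the tags are copied from the `nltk.pos_tag` output shown in this notebook, and `np_chunks` is an illustrative helper name.

```python
import nltk


def np_chunks(grammar, tagged_sent):
    """Chunk one POS-tagged sentence with the given grammar and return its NP chunks."""
    tree = nltk.RegexpParser(grammar).parse(tagged_sent)
    return [" ".join(word for word, _tag in st.leaves())
            for st in tree.subtrees(filter=lambda t: t.label() == "NP")]


simple = "NP: {<DT>?<JJ>*<NN>}"
expanded = "NP: {<DT>?<JJ.*>*<NN.*>}"  # JJ.* adds JJR/JJS, NN.* adds NNS/NNP/NNPS

# Tags copied from the pos_tag output earlier in this notebook
sent1 = [("I", "PRP"), ("honestly", "RB"), ("ca", "MD"), ("n't", "RB"), ("believe", "VB"),
         ("this", "DT"), ("is", "VBZ"), ("an", "DT"), ("actual", "JJ"), ("movie", "NN"),
         ("and", "CC"), ("number", "NN"), ("1", "CD"), ("on", "IN"), ("Netflix", "NNP"),
         ("in", "IN"), ("my", "PRP$"), ("country", "NN"), (".", ".")]
sent2 = [("This", "DT"), ("was", "VBD"), ("probably", "RB"), ("the", "DT"), ("worst", "JJS"),
         ("movie", "NN"), ("I", "PRP"), ("'ve", "VBP"), ("ever", "RB"), ("seen", "VBN"), (".", ".")]

for grammar in (simple, expanded):
    print(grammar, np_chunks(grammar, sent1), np_chunks(grammar, sent2))
```

With the simple grammar *Netflix* is skipped and *the worst movie* shrinks to *movie*; the expanded grammar recovers both, matching the before/after output in this notebook.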

In [10]:
# Expanded grammar: <JJ.*> also matches JJR/JJS, <NN.*> also matches NNS, NNP and NNPS
grammar = r"""
NP: {<DT>?<JJ.*>*<NN.*>}
"""
cp = nltk.RegexpParser(grammar)

sentences_arr = []     # every chunked sentence (nltk.Tree)
sentences_np_arr = []  # every NP subtree, across all reviews

for ind_review in table_review.Review.dropna():
    for sent in ie_preprocess(ind_review):
        tree = cp.parse(sent)
        sentences_arr.append(tree)
        print(tree)  # print the entire chunked sentence
        # tree.draw()
        for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
            print(subtree)  # print the noun phrases within the sentence
            sentences_np_arr.append(subtree)

(S
  I/PRP
  honestly/RB
  ca/MD
  n't/RB
  believe/VB
  this/DT
  is/VBZ
  (NP an/DT actual/JJ movie/NN)
  and/CC
  (NP number/NN)
  1/CD
  on/IN
  (NP Netflix/NNP)
  in/IN
  my/PRP$
  (NP country/NN)
  ./.)
(NP an/DT actual/JJ movie/NN)
(NP number/NN)
(NP Netflix/NNP)
(NP country/NN)
(S
  This/DT
  was/VBD
  probably/RB
  (NP the/DT worst/JJS movie/NN)
  I/PRP
  've/VBP
  ever/RB
  seen/VBN
  ./.)
(NP the/DT worst/JJS movie/NN)
(S
  (NP The/DT dialogues/NNS)
  between/IN
  (NP the/DT main/JJ characters/NNS)
  are/VBP
  unbelievably/RB
  cringy/JJ
  and/CC
  (NP the/DT plot/NN)
  literally/RB
  does/VBZ
  n't/RB
  make/VB
  (NP any/DT sense/NN)
  ./.)
(NP The/DT dialogues/NNS)
(NP the/DT main/JJ characters/NNS)
(NP the/DT plot/NN)
(NP any/DT sense/NN)
(S
  Do/VBP
  n't/RB
  watch/VB
  (NP this/DT movie/NN)
  unless/IN
  you/PRP
  wan/VBP
  (NP na/JJ waste/NN)
  2/CD
  (NP hours/NNS)
  of/IN
  your/PRP$
  (NP life/NN)
  just/RB
  to/TO
  give/VB
  (NP this/DT movie/NN)
  a/DT
  (/(
  p

(NP good/JJ things/NNS)
(NP a/DT star/NN)
(NP the/DT photography/NN)
(NP the/DT cinematographer/NN)
(NP doingthe/JJ actors/NNS)
(NP the/DT eye/NN)
(NP the/DT costumes/NNS)
(NP diverse/JJ locations/NNS)
(NP hence/NN)
(NP something/NN)
(NP the/DT screenthe/NN)
(NP sex/NN)
(NP scenes/NNS)
(NP that/DT kind/NN)
(NP thing/NN)
(S
  Well/RB
  I/PRP
  do/VBP
  n't/RB
  know/VB
  ,/,
  even/RB
  with/IN
  (NP the/DT low/JJ score/NN)
  that/IN
  I/PRP
  gave/VBD
  it/PRP
  ,/,
  I/PRP
  feel/VBP
  like/IN
  (NP people/NNS)
  should/MD
  make/VB
  up/RP
  their/PRP$
  (NP own/JJ mind/NN)
  and/CC
  not/RB
  take/VB
  (NP someone/NN)
  's/POS
  (NP else/JJ opinion/NN)
  on/IN
  it/PRP
  ./.)
(NP the/DT low/JJ score/NN)
(NP people/NNS)
(NP own/JJ mind/NN)
(NP someone/NN)
(NP else/JJ opinion/NN)
(S
  Besides/IN
  ,/,
  I/PRP
  think/VBP
  (NP people/NNS)
  should/MD
  watch/VB
  (NP bad/JJ movies/NNS)
  from/IN
  (NP time/NN)
  to/TO
  (NP time/NN)
  ,/,
  to/TO
  be/VB
  able/JJ
  to/TO
  understand

(S (NP The/DT plot/NN) was/VBD (NP trash/NN) ./.)
(NP The/DT plot/NN)
(NP trash/NN)
(S
  Even/RB
  (NP fifty/JJ shades/NNS)
  (NP series/NN)
  had/VBD
  (NP better/JJR plots/NNS)
  than/IN
  this/DT
  ./.)
(NP fifty/JJ shades/NNS)
(NP series/NN)
(NP better/JJR plots/NNS)
(S
  (NP Things/NNS)
  happen/VBP
  out/IN
  of/IN
  nowhere/RB
  ,/,
  I/PRP
  was/VBD
  confused/VBN
  (NP the/DT whole/JJ time/NN)
  watching/VBG
  (NP this/DT movie/NN)
  ./.)
(NP Things/NNS)
(NP the/DT whole/JJ time/NN)
(NP this/DT movie/NN)
(S
  Could/MD
  n't/RB
  even/RB
  get/VB
  myself/PRP
  to/TO
  finish/VB
  (NP watching/NN)
  till/IN
  (NP the/DT end/NN))
(NP watching/NN)
(NP the/DT end/NN)
(S
  (NP Wow/NNP)
  ,/,
  how/WRB
  (NP corny/NN)
  can/MD
  (NP a/DT movie/NN)
  (NP get/NN)
  and/CC
  what/WDT
  were/VBD
  they/PRP
  thinking/VBG
  ,/,
  it/PRP
  's/VBZ
  so/RB
  corny/JJ
  it/PRP
  can/MD
  be/VB
  rated/VBN
  as/IN
  (NP a/DT comedy/NN)
  (NP movie/NN)
  ./.)
(NP Wow/NNP)
(NP corny/NN)
(NP a/D

(S I/PRP have/VBP (NP nothing/NN) against/IN (NP sex/NN) ./.)
(NP nothing/NN)
(NP sex/NN)
(S
  I/PRP
  love/VBP
  it/PRP
  ,/,
  I/PRP
  love/VBP
  watching/VBG
  it/PRP
  but/CC
  this/DT
  was/VBD
  painful/JJ
  ./.)
(S
  (NP The/DT sex/NN)
  was/VBD
  hot/JJ
  but/CC
  (NP the/DT story/NN)
  and/CC
  (NP the/DT acting/NN)
  was/VBD
  terrible/JJ
  ./.)
(NP The/DT sex/NN)
(NP the/DT story/NN)
(NP the/DT acting/NN)
(S
  I/PRP
  think/VBP
  it/PRP
  was/VBD
  not/RB
  worth/JJ
  it/PRP
  and/CC
  very/RB
  very/RB
  long/RB
  ./.)
(S
  I/PRP
  can/MD
  see/VB
  what/WP
  (NP the/DT director/NN)
  was/VBD
  trying/VBG
  to/TO
  do/VB
  but/CC
  he/PRP
  missed/VBD
  (NP the/DT mark/NN)
  ./.)
(NP the/DT director/NN)
(NP the/DT mark/NN)
(S
  For/IN
  (NP the/DT same/JJ kind/NN)
  of/IN
  (NP story/NN)
  with/IN
  (NP the/DT sex/NN)
  dial/JJ
  down/RB
  and/CC
  (NP the/DT story/NN)
  (NP dial/NN)
  up/IN
  I/PRP
  would/MD
  suggest/VB
  (NP The/DT Dreamers/NNP)
  ./.)
(NP the/DT same/J

(S
  There/EX
  's/VBZ
  (NP a/DT point/NN)
  when/WRB
  (NP the/DT main/JJ character/NN)
  and/CC
  (NP voice-over/JJ narrator/NN)
  on/IN
  (NP Love/NNP)
  says/VBZ
  that/IN
  he/PRP
  never/RB
  sees/VBZ
  (NP a/DT movie/NN)
  presenting/VBG
  ``/``
  (NP emotional/JJ sexuality/NN)
  ''/''
  ./.)
(NP a/DT point/NN)
(NP the/DT main/JJ character/NN)
(NP voice-over/JJ narrator/NN)
(NP Love/NNP)
(NP a/DT movie/NN)
(NP emotional/JJ sexuality/NN)
(S
  (NP Sex/NN)
  ,/,
  (NP semen/NNS)
  and/CC
  (NP blood/NN)
  ,/,
  that/DT
  's/VBZ
  what/WP
  (NP people/NNS)
  would/MD
  like/VB
  to/TO
  see/VB
  ./.)
(NP Sex/NN)
(NP semen/NNS)
(NP blood/NN)
(NP people/NNS)
(S
  (NP Murphy/NNP)
  is/VBZ
  (NP a/DT filmmaker/NN)
  ,/,
  or/CC
  so/RB
  he/PRP
  says/VBZ
  ./.)
(NP Murphy/NNP)
(NP a/DT filmmaker/NN)
(S
  Actually/RB
  ,/,
  I/PRP
  think/VBP
  he/PRP
  is/VBZ
  just/RB
  (NP a/DT puppet/NN)
  to/TO
  (NP Gaspar/NNP)
  (NP Noé/NNP)
  (NP intents/NNS)
  in/IN
  this/DT
  too/RB
  (NP st

(S
  I/PRP
  went/VBD
  to/TO
  watch/VB
  (NP this/DT movie/NN)
  in/IN
  3d/CD
  in/IN
  (NP Mexico/NNP)
  based/VBN
  on/IN
  (NP some/DT video/NN)
  (NP reviews/NNS)
  I/PRP
  had/VBD
  seen/VBN
  that/IN
  said/VBD
  it/PRP
  was/VBD
  an/DT
  ``/``
  (NP experience/NN)
  ''/''
  that/WDT
  had/VBD
  to/TO
  be/VB
  lived/VBN
  ./.)
(NP this/DT movie/NN)
(NP Mexico/NNP)
(NP some/DT video/NN)
(NP reviews/NNS)
(NP experience/NN)
(S
  How/WRB
  far/RB
  was/VBD
  that/IN
  from/IN
  (NP the/DT truth.Love/NN)
  in/IN
  3d/CD
  ,/,
  which/WDT
  is/VBZ
  (NP the/DT title/NN)
  (NP the/DT film/NN)
  had/VBD
  in/IN
  (NP Mexico/NNP)
  ,/,
  is/VBZ
  a/DT
  tedious/JJ
  ,/,
  unnecessarily/RB
  long/RB
  (NP film/NN)
  that/WDT
  has/VBZ
  (NP no/DT real/JJ story/NN)
  ,/,
  (NP statement/NN)
  or/CC
  message.Its/VB
  (NP a/DT long/JJ porn/JJ video/NN)
  with/IN
  (NP a/DT bad/JJ script/NN)
  ./.)
(NP the/DT truth.Love/NN)
(NP the/DT title/NN)
(NP the/DT film/NN)
(NP Mexico/NNP)
(NP fil

(S
  Honestly/RB
  ,/,
  this/DT
  is/VBZ
  (NP the/DT first/JJ movie/NN)
  I/PRP
  ever/RB
  watched/VBD
  from/IN
  (NP this/DT director/NN)
  and/CC
  I/PRP
  am/VBP
  amazed/VBN
  ./.)
(NP the/DT first/JJ movie/NN)
(NP this/DT director/NN)
(S
  I/PRP
  found/VBD
  his/PRP$
  (NP name/NN)
  by/IN
  searching/VBG
  for/IN
  (NP directors/NNS)
  similar/JJ
  to/TO
  (NP Nicolas/NNP)
  (NP Winding/NNP)
  (NP Refn.To/NNP)
  get/VB
  right/JJ
  to/TO
  it/PRP
  ,/,
  (NP the/DT opening/NN)
  (NP scene/NN)
  is/VBZ
  (NP a/DT little/JJ shocking/NN)
  but/CC
  its/PRP$
  (NP cause/NN)
  I/PRP
  'm/VBP
  an/DT
  american/JJ
  and/CC
  used/VBD
  to/TO
  (NP movies/NNS)
  not/RB
  showing/VBG
  that/IN
  (NP stuff/NN)
  right/VBD
  off/RP
  (NP the/DT bat/NN)
  ./.)
(NP name/NN)
(NP directors/NNS)
(NP Nicolas/NNP)
(NP Winding/NNP)
(NP Refn.To/NNP)
(NP the/DT opening/NN)
(NP scene/NN)
(NP a/DT little/JJ shocking/NN)
(NP cause/NN)
(NP movies/NNS)
(NP stuff/NN)
(NP the/DT bat/NN)
(S
  But/CC
  

(S
  I/PRP
  'm/VBP
  generally/RB
  skeptical/JJ
  (NP these/DT dates/NNS)
  of/IN
  (NP anything/NN)
  that/WDT
  (NP Adam/NNP)
  (NP Sandler/NNP)
  's/POS
  (NP name/NN)
  is/VBZ
  attached/VBN
  to/TO
  (NP these/DT days/NNS)
  ./.)
(NP these/DT dates/NNS)
(NP anything/NN)
(NP Adam/NNP)
(NP Sandler/NNP)
(NP name/NN)
(NP these/DT days/NNS)
(S
  While/IN
  (NP the/DT movie/NN)
  itself/PRP
  has/VBZ
  its/PRP$
  (NP shortcomings/NNS)
  ,/,
  not/RB
  the/DT
  least/JJS
  of/IN
  which/WDT
  is/VBZ
  (NP Spade/NNP)
  as/IN
  (NP the/DT romantic/JJ lead/NN)
  it/PRP
  has/VBZ
  a/DT
  shining/VBG
  (NP light/NN)
  :/:
  (NP Lauren/NNP)
  (NP Lapkus/NNP)
  ,/,
  whose/WP$
  (NP comedic/JJ performance/NN)
  is/VBZ
  (NP the/DT superb/NN)
  ./.)
(NP the/DT movie/NN)
(NP shortcomings/NNS)
(NP Spade/NNP)
(NP the/DT romantic/JJ lead/NN)
(NP light/NN)
(NP Lauren/NNP)
(NP Lapkus/NNP)
(NP comedic/JJ performance/NN)
(NP the/DT superb/NN)
(S
  It/PRP
  puts/VBZ
  her/PRP
  alongside/IN
  (NP the/

(S
  Have/VBP
  you/PRP
  ever/RB
  streamed/VBD
  or/CC
  watched/VBD
  (NP a/DT show/NN)
  and/CC
  started/VBD
  to/TO
  get/VB
  embarrassed/VBN
  for/IN
  some/DT
  of/IN
  (NP the/DT crazy/JJ stuff/NN)
  happening/VBG
  to/TO
  (NP the/DT person/NN)
  in/IN
  (NP the/DT movie/NN)
  ?/.)
(NP a/DT show/NN)
(NP the/DT crazy/JJ stuff/NN)
(NP the/DT person/NN)
(NP the/DT movie/NN)
(S This/DT is/VBZ that/IN (NP movie/NN) !/.)
(NP movie/NN)
(S
  Cringe-worthy/JJ
  ,/,
  funny/JJ
  and/CC
  of/IN
  (NP course/NN)
  sometimes/RB
  (NP ...../NNP)
  ok/VBZ
  (NP a/DT lot/NN)
  of/IN
  (NP times/NNS)
  over/IN
  the/DT
  top/JJ
  ./.)
(NP course/NN)
(NP ...../NNP)
(NP a/DT lot/NN)
(NP times/NNS)
(S
  (NP The/DT movie/NN)
  could/MD
  easily/RB
  rest/VB
  on/IN
  its/PRP$
  (NP laurels/NNS)
  and/CC
  go/VB
  (NP full/JJ crude/NN)
  ,/,
  but/CC
  it/PRP
  shows/VBZ
  (NP some/DT heart/NN)
  and/CC
  (NP a/DT few/JJ surprising/JJ cameos/NN)
  ./.)
(NP The/DT movie/NN)
(NP laurels/NNS)
(NP fu

(S
  Really/RB
  well/RB
  done/VBN
  ,/,
  did/VBD
  n't/RB
  expect/VB
  before/IN
  going/VBG
  to/TO
  theatre/VB
  but/CC
  (NP the/DT director/NN)
  has/VBZ
  been/VBN
  able/JJ
  to/TO
  mix/VB
  (NP a/DT great/JJ performance/NN)
  from/IN
  all/PDT
  (NP the/DT actors/NNS)
  with/IN
  (NP a/DT well-done/JJ screenplay/NN)
  ./.)
(NP the/DT director/NN)
(NP a/DT great/JJ performance/NN)
(NP the/DT actors/NNS)
(NP a/DT well-done/JJ screenplay/NN)
(S
  Not/RB
  (NP the/DT normal/JJ Little/NNP)
  (NP Woman/NNP)
  ,/,
  (NP a/DT personal/JJ version/NN)
  of/IN
  (NP Greta/NNP)
  (NP Gerwig/NNP)
  ./.)
(NP the/DT normal/JJ Little/NNP)
(NP Woman/NNP)
(NP a/DT personal/JJ version/NN)
(NP Greta/NNP)
(NP Gerwig/NNP)
(S
  (NP This/DT review/NN)
  is/VBZ
  coming/VBG
  from/IN
  (NP someone/NN)
  who/WP
  does/VBZ
  n't/RB
  know/VB
  (NP the/DT book/NN)
  and/CC
  never/RB
  seen/VBN
  (NP any/DT other/JJ Little/JJ Women/NNP)
  (NP movie/NN)
  ./.)
(NP This/DT review/NN)
(NP someone/NN)
(N

(S
  So/IN
  (NP the/DT writer/director/NN)
  was/VBD
  snubbed/VBN
  by/IN
  (NP the/DT Oscars/NNP)
  ./.)
(NP the/DT writer/director/NN)
(NP the/DT Oscars/NNP)
(S
  And/CC
  (NP the/DT industry/NN)
  wonders/VBZ
  why/WRB
  (NP the/DT audience/NN)
  is/VBZ
  no/RB
  longer/RB
  there/RB
  ./.)
(NP the/DT industry/NN)
(NP the/DT audience/NN)
(S
  It/PRP
  's/VBZ
  because/IN
  (NP people/NNS)
  wo/MD
  n't/RB
  pay/VB
  your/PRP$
  (NP high/JJ prices/NNS)
  for/IN
  (NP the/DT crap/NN)
  you/PRP
  are/VBP
  producing/VBG
  ./.)
(NP people/NNS)
(NP high/JJ prices/NNS)
(NP the/DT crap/NN)
(S
  Few/JJ
  of/IN
  the/DT
  really/RB
  (NP good/JJ films/NNS)
  were/VBD
  even/RB
  nominated/VBN
  for/IN
  (NP an/DT Oscar/NNP)
  ./.)
(NP good/JJ films/NNS)
(NP an/DT Oscar/NNP)
(S
  So/RB
  keep/JJ
  producing/VBG
  (NP films/NNS)
  that/IN
  (NP only/JJ others/NNS)
  in/IN
  (NP Hollywood/NNP)
  will/MD
  go/VB
  to/TO
  see/VB
  ,/,
  and/CC
  you/PRP
  will/MD
  go/VB
  broke/RB
  ./.)
(NP 

(S
  Very/RB
  (NP good/JJ performance/NN)
  by/IN
  (NP all/DT crew/NN)
  ,/,
  especially/RB
  (NP Timothée/NNP)
  (NP Chalamet/NNP)
  and/CC
  (NP Florence/NNP)
  (NP Pugh/NNP)
  ./.)
(NP good/JJ performance/NN)
(NP all/DT crew/NN)
(NP Timothée/NNP)
(NP Chalamet/NNP)
(NP Florence/NNP)
(NP Pugh/NNP)
(S
  (NP The/DT melancholy/NN)
  that/WDT
  (NP Timothee/NNP)
  has/VBZ
  added/VBN
  to/TO
  (NP Laurie/NNP)
  's/POS
  (NP character/NN)
  gives/VBZ
  it/PRP
  (NP a/DT sincere/JJ darkness/NN)
  that/WDT
  makes/VBZ
  (NP the/DT character/NN)
  and/CC
  his/PRP$
  (NP behavior/NN)
  more/RBR
  believable/JJ
  ./.)
(NP The/DT melancholy/NN)
(NP Timothee/NNP)
(NP Laurie/NNP)
(NP character/NN)
(NP a/DT sincere/JJ darkness/NN)
(NP the/DT character/NN)
(NP behavior/NN)
(S
  (NP Florence/NN)
  's/POS
  (NP dual/JJ character/NN)
  in/IN
  (NP the/DT story/NN)
  ,/,
  both/CC
  as/IN
  (NP a/DT young/JJ girl/NN)
  as/RB
  well/RB
  as/IN
  (NP a/DT lady/NN)
  are/VBP
  portrayed/VBN
  well/RB
 

In [11]:
print(sentences_arr)



In [12]:
print(sentences_np_arr)
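The task asks for all the chunks of each review in a single list, while `sentences_np_arr` collects the NP subtrees of every review in one flat list. A per-review version can be sketched as follows. This is a minimal sketch: `review_np_chunks` is a hypothetical helper, and the two tagged sentences are written by hand in the format `ie_preprocess` returns, so the example runs without the tagger models.

```python
import nltk

# Expanded grammar: optional determiner, adjectives of any degree, one noun of any kind
cp = nltk.RegexpParser("NP: {<DT>?<JJ.*>*<NN.*>}")


def review_np_chunks(tagged_sentences, parser=cp):
    """Return all NP chunks of one review as a single flat list of strings.

    `tagged_sentences` is a list of POS-tagged sentences, i.e. the kind of value
    ie_preprocess(review) returns in this notebook.
    """
    chunks = []
    for sent in tagged_sentences:
        tree = parser.parse(sent)
        for st in tree.subtrees(filter=lambda t: t.label() == "NP"):
            chunks.append(" ".join(word for word, _tag in st.leaves()))
    return chunks


# Hand-tagged stand-in for ie_preprocess(review) on a two-sentence review
review = [
    [("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("worst", "JJS"), ("movie", "NN")],
    [("Netflix", "NNP"), ("has", "VBZ"), ("better", "JJR"), ("films", "NNS")],
]
print(review_np_chunks(review))  # ['the worst movie', 'Netflix', 'better films']

# On the scraped data (assumes table_review and ie_preprocess from this notebook):
# table_review["NP Chunks"] = [review_np_chunks(ie_preprocess(r)) for r in table_review.Review]
```

Storing the list in a new DataFrame column keeps each review's chunks next to its film title, user and rating.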



### Conclusion

In this assignment, I started by extracting the user reviews from the IMDB website. I chose "Romance" as the genre for this assignment. I had no specific reason for choosing this genre, and I also tested the program with other genres such as Action, Adventure, Comedy, and Thriller.

Once all the reviews were extracted, I identified noun phrases by chunking the reviews. A total of 101 reviews (from the top 5 Romance movies) were extracted and used in the analysis.

After glancing through the initial results, I observed that proper nouns were not being tagged as noun phrases. A simple tweak to the grammar pattern changed this: instead of looking only for singular nouns (`<NN>`), the pattern was altered to look for `<NN.*>`, which also matches singular proper nouns (NNP), plural nouns (NNS) and plural proper nouns (NNPS) in addition to regular nouns (NN).

The pattern was also changed from `<JJ>` to `<JJ.*>` so that comparative (JJR) and superlative (JJS) adjectives are included. This is what allowed "the worst movie" to be tagged as a noun phrase.
