# Amazon Review Generator
RNN demonstration by Jiawei Wang

In [1]:
from textgenrnn import textgenrnn

Using TensorFlow backend.


## One sample from the dataset

In [16]:
data_file = '/data/scratch/jwang96/Books_5.json'
with open(data_file, 'r') as f:
    print(f.readline())

{"overall": 5.0, "verified": false, "reviewTime": "03 30, 2005", "reviewerID": "A1REUF3A1YCPHM", "asin": "0001713353", "style": {"Format:": " Hardcover"}, "reviewerName": "TW Ervin II", "reviewText": "The King, the Mice and the Cheese by Nancy Gurney is an excellent children's book.  It is one that I well remember from my own childhood and purchased for my daughter who loves it.\n\nIt is about a king who has trouble with rude mice eating his cheese. He consults his wise men and they suggest cats to chase away the mice. The cats become a nuisance, so the wise men recommend the king bring in dogs to chase the cats away.  The cycle goes on until the mice are finally brought back to chase away the elephants, brought in to chase away the lions that'd chased away the dogs.\n\nThe story ends in compromise and friendship between the mice and the king.  The story also teaches cause and effect relationships.\n\nThe pictures that accompany the story are humorous and memorable.  I was thrilled to 

## Generate Dataset (10k reviews on Books)

In [18]:
import json
output_file = '/data/scratch/jwang96/books.10k'
def extract_corpus(data_file, n):
    corpus = ""
    with open(data_file, 'r') as f:
        for i in range(n):
            line = f.readline()
            try:
                js = json.loads(line)
                revtxt = js['reviewText']
                score = js['overall']
                revtxt = revtxt.replace('\n', ' ')
                corpus += str(score)+': '+revtxt
                corpus += '\n'
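            except (ValueError, KeyError):
                # Skip blank or malformed JSON lines and records missing reviewText/overall.
                continue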
    return corpus
with open(output_file, 'w') as f:
    f.write(extract_corpus(data_file, 10000))
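
To make the resulting corpus format concrete, the sketch below applies the same transformation to two made-up records held in memory. The records and the helper name `format_review` are illustrative assumptions; they are not taken from the dataset or from the notebook.

```python
import json

def format_review(record):
    """Return one corpus line of the form '<score>: <text>' with newlines flattened."""
    text = record['reviewText'].replace('\n', ' ')
    return str(record['overall']) + ': ' + text

# Two hypothetical JSON lines in the same shape as the Books_5.json records.
sample_lines = [
    '{"overall": 5.0, "reviewText": "Loved it.\\nRead it twice."}',
    '{"overall": 2.0, "reviewText": "Not for me."}',
]

corpus_lines = [format_review(json.loads(line)) for line in sample_lines]
for line in corpus_lines:
    print(line)
```

Each line starts with the star rating, which is what later allows generation to be steered with a prefix such as `'5.0'`.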

## Fine-tune the pretrained model
We perform fine-tuning separately on Cheaha. See the training code in rnn_train.py.
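
rnn_train.py is not reproduced in this notebook. As a rough sketch only, fine-tuning with textgenrnn could look like the following. The function name `fine_tune`, the model name `rnn-10k`, and the epoch count are assumptions chosen to match the corpus and weight file names used here; the actual script may differ.

```python
def fine_tune(corpus_path, name='rnn-10k', num_epochs=5):
    """Fine-tune textgenrnn's bundled pretrained weights on a one-review-per-line corpus.

    All settings here are assumed; the real rnn_train.py may use others.
    """
    # Imported lazily so the sketch can be loaded without TensorFlow installed.
    from textgenrnn import textgenrnn

    textgen = textgenrnn(name=name)  # starts from the default pretrained weights
    textgen.train_from_file(
        corpus_path,
        header=False,     # the corpus has no header line to skip
        new_model=False,  # keep the pretrained architecture and weights
        num_epochs=num_epochs,
        gen_epochs=1,     # print sample generations after every epoch
    )
    weights_path = name + '_weights.hdf5'
    textgen.save(weights_path)
    return weights_path
```

Under these assumptions, calling `fine_tune('/data/scratch/jwang96/books.10k')` would produce a weight file named `rnn-10k_weights.hdf5`.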

## Load fine-tuned model

In [13]:
weight_file = 'rnn-10k_weights.hdf5'
textgen = textgenrnn(weight_file)
textgen.model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input (InputLayer)              (None, 40)           0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 40, 100)      46500       input[0][0]                      
__________________________________________________________________________________________________
rnn_1 (LSTM)                    (None, 40, 128)      117248      embedding[0][0]                  
__________________________________________________________________________________________________
rnn_2 (LSTM)                    (None, 40, 128)      131584      rnn_1[0][0]                      
__________________________________________________________________________________________________
rnn_concat

## Example output:
Generate review text for each rating from 1 to 5 stars.

In [12]:
textgen.generate(prefix='1.0')
textgen.generate(prefix='2.0')
textgen.generate(prefix='3.0')
textgen.generate(prefix='4.0')
textgen.generate(prefix='5.0')

1.0: Great book we see that it was approvenes to see how the most presented to the ship readable.  The situation, this is a creation of the Narnia series.  The only and come to read this book in the series.  I recommend this book to my very well written book accounts it words.  I have read this ser

2.0: I read 2 stars to the series and I did with them.  2.  I was a time for my kids who are a strange series, and the opportunity to see the story books and the second in fact in a story that seems to be the programmer explore new it in the story.  The stars is almost as the story and true and the

3.0: I have always loved this book for my grandson.  The book is very fun to read this book and I read to those has no other book to the book and the story makes a Marilyn fan of the symbolism story and typically and I was all again! <|endoftext thourner then officials, altl a college story line?  

4.0: This book is a fun book! <|endoftext) is the seing to say that it  I was a lot phililized to

Generate review text for specific products.

In [8]:
textgen.generate(prefix='5.0: The Harry Potter')

5.0: The Harry Potter book is a great book at all.  I was remembered.  I read this book since I was the older side of this book that is God up a world of the story by the best book in the series.  I did not read it as a said and I was a fantastic story because it was disappointed.  I'm not sure tha



In [9]:
textgen.generate(prefix='5.0: The Twilight')

5.0: The Twilightical Church has been a difficult great manager.  I should have enjoyed , and the story does not be a few times because of the story of a bit of a simply adult thing and thought that the book or time was born, and the book is a really amazing read... <|endoftext work treatment of th



## Evaluation
As the example outputs show, the text generated by the RNN is not good.
- In terms of relatedness, the generated reviews lose track of their content.
- In terms of grammar, we checked the samples with Grammarly (an automated grammar checker); they received overall scores of 65/83/59.
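
Before scoring samples like the ones shown earlier, it may help to separate the rating prefix from the text and to cut each sample at any `<|endoftext` fragment. The sketch below is one way to do that with the standard library. The helper name `split_sample` and the truncation rule are assumptions; they are not part of the evaluation described here.

```python
def split_sample(sample, marker='<|endoftext'):
    """Split '<score>: <text>' and drop everything from the marker onward."""
    score_str, _, text = sample.partition(': ')
    cut = text.find(marker)
    if cut != -1:
        text = text[:cut]
    return float(score_str), text.strip()

# Shortened from the 4.0 sample in the example output.
score, text = split_sample('4.0: This book is a fun book! <|endoftext) is the seing')
print(score, repr(text))
```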

## Summary

As a baseline, the RNN model succeeds in generating text that reads as if it were written by a reviewer. However, it does not perform well in terms of content relatedness. A model that captures context better, such as a Transformer-based model, may be needed.