## Problem 5
___


### 5.1. Average loss at each time-step (i.e. $\mathcal{L}_t$ for each $t$) within validation sequences.  <a id='5.1'></a>

In this question, we compute the average loss at each time-step (i.e. $\mathcal{L}_t$ for each $t$) within validation sequences. For each architecture (RNN, GRU, Transformer), we use the best-performing weights on the validation set **found in question 4.1**, which we stored during training.



#### 5.1.1. Code <a id='5.1.1'></a>






The following is an extract from the file *5_1/q5_1.py*, which computes the losses at each time-step. This file is a modified version of *ptb-lm.py* in which we adapted the function *run_epoch* to store the loss at each time-step, and in which we load the weights of the model and its parameters from the txt files generated during training.

The idea here is to calculate the loss for each timestep separately and average over all the minibatches to get an average loss per timestep.
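As an illustration of this averaging, here is a minimal numpy sketch. It assumes that `loss_fn` in the script is a mean-reduced cross-entropy (its definition is not part of the extract), and the function names are hypothetical. `logits` plays the role of `outputs` in the script, with shape (seq_len, batch_size, vocab_size).

```python
import numpy as np

def per_timestep_loss(logits, targets):
    """Cross-entropy at each time-step of one minibatch, averaged over the batch.

    logits:  (seq_len, batch_size, vocab_size) unnormalized scores
    targets: (seq_len, batch_size) integer word ids
    returns: (seq_len,) array whose entry t is the batch-averaged L_t
    """
    # numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # log-probability of the correct word at every (t, b) pair
    seq_len, batch_size = targets.shape
    t_idx = np.arange(seq_len)[:, None]
    b_idx = np.arange(batch_size)[None, :]
    nll = -log_probs[t_idx, b_idx, targets]   # (seq_len, batch_size)
    return nll.mean(axis=1)

def average_loss_per_timestep(minibatches):
    """Sum L_t over an iterable of (logits, targets) pairs, then divide by their count."""
    total, n_batches = 0.0, 0
    for logits, targets in minibatches:
        total = total + per_timestep_loss(logits, targets)
        n_batches += 1
    return total / n_batches

# uniform scores over a 10-word vocabulary give L_t = log(10) at every time-step
logits = np.zeros((5, 3, 10))
targets = np.zeros((5, 3), dtype=int)
print(average_loss_per_timestep([(logits, targets), (logits, targets)]))
```

Dividing by the number of minibatches at the end matches `losses/iters` in the script.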

```python
# ...

@torch.no_grad()  # inference only: no autograd graph is needed
def run_epoch(model, data):
    # put the model in inference mode
    model.eval()

    iters = 0
    losses = np.zeros((1, model.seq_len))  # running sum of L_t for each time-step t
    # LOOP THROUGH MINIBATCHES
    for step, (x, y) in enumerate(ptb_iterator(data, model.batch_size, model.seq_len)):
        print(f"Current Step = {step}")
        if args.model == 'TRANSFORMER':
            batch = Batch(torch.from_numpy(x).long().to(device))
            # (batch, seq_len, vocab) -> (seq_len, batch, vocab)
            outputs = model.forward(batch.data, batch.mask).transpose(1, 0)
        else:
            # initialize the hidden state at the beginning of each minibatch
            hidden = model.init_hidden().to(device)
            inputs = torch.from_numpy(x.astype(np.int64)).transpose(0, 1).contiguous().to(device)
            outputs, hidden = model(inputs, hidden)

        targets = torch.from_numpy(y.astype(np.int64)).transpose(0, 1).contiguous().to(device)

        # LOSS COMPUTATION: one batch-averaged loss per time-step
        for t in range(model.seq_len):
            tt = targets[t, :]  # shape (batch_size,)
            loss = loss_fn(outputs[t, :, :].contiguous().view(-1, model.vocab_size), tt)
            losses[0, t] += loss.item()
        iters += 1
    return losses / iters  # average over minibatches


# Load weights (mapped to the CPU when no GPU is available)
model.load_state_dict(torch.load(args.weights, map_location=device))

# calculate the loss
val_loss = run_epoch(model, valid_data)

# plot the loss
plt.plot(val_loss.flatten())
plt.savefig(os.path.join(base_folder, 'avg_losses.png'))
```





#### 5.1.2. Plots  <a id='5.1.2'></a>

Below, we plot $\mathcal{L}_t$ as a function of $t$ for the three architectures:

![Loss of RNN](https://drive.google.com/uc?export=view&id=1-9cKlip6frSc7GWMN4Eu4aWxKI-l39Ic)



![Loss of GRU](https://drive.google.com/uc?export=view&id=1-AwSMPfj9qDZFqnHPmA5UA4_aUgzShig)


![Loss of Transformer](https://drive.google.com/uc?export=view&id=1-ChfdEpm0ChavOIWPQ2UJ6ZJlc5quD7c)



#### 5.1.3. Description and explanation   <a id='5.1.3'></a>


Let's now describe the result qualitatively, provide an explanation for what we observe and compare the plots for the different architectures.

First, we discuss the similarities between the three architectures. For all three, the loss tends to decrease as we advance in the sequence. For the RNN and the GRU, the high loss at the first time-step can be explained by two facts: the word at that position varies a lot from one sequence to another, and, since the hidden state is reinitialized at the beginning of each minibatch, the first prediction is conditioned on a single word, with no previous words to reduce the uncertainty. For the following time-steps, we generally notice that whenever the loss increases, it decreases again afterwards: once a hard-to-predict word (high loss) has been observed, it provides useful context for the word that follows, so the probability assigned to the correct next word is higher and the loss is lower. This discussion does not apply to the Transformer, since it relies on the attention mechanism rather than on recurrence. Its loss curve can instead be explained by the masking of future words: the prediction at position $t$ can only attend to positions $1, \dots, t$, so the further we advance in the sentence, the more context is available.
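To make the masking argument concrete, here is a generic lower-triangular ("causal") mask in numpy. This is an illustrative sketch: the mask actually used in the experiments comes from the `Batch` class of the assignment code, whose implementation is not shown here. Row $t$ lists the positions that time-step $t$ may attend to, so the amount of visible context grows linearly with $t$.

```python
import numpy as np

def causal_mask(seq_len):
    """(seq_len, seq_len) boolean mask: mask[t, s] is True iff position t may attend to s <= t."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = causal_mask(5)
print(mask.astype(int))
print("visible positions per time-step:", mask.sum(axis=1))
```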

The GRU performs better overall, and its loss is more stable from one word to the next. This is because the GRU can track long-term dependencies better: the uncertainty about the word at position $t$ is lower because the model remembers more of the previous words that help it predict that word. The Transformer and the RNN display the same overall behaviour, with different intensities and around different mean losses, even though they are not based on the same principle.


### 5.2. Average gradient of the loss at the _final_ time-step with respect to the hidden state at _each_ time-step $t$: $\nabla_{h_t} \mathcal{L}_T$  <a id='5.2'></a>

For **one** minibatch of training data, we compute the average gradient of the loss at the _final_ time-step with respect to the hidden state at _each_ time-step $t$, $\nabla_{h_t} \mathcal{L}_T$, using the best-performing weights of the RNN and GRU **found in question 4.1**.
 
 #### 5.2.1. Code <a id='5.2.1'></a>
 
 We modify the models so that they keep the hidden states computed during the forward pass, and we modify the file *ptb-lm.py* to register a _hook_ on each of them, which tells PyTorch to store the gradient of the loss with respect to that hidden state during the backward pass. The scripts can be found in the folder _5.2_.
 
 
 **Acknowledgement:** our use of _hooks_ was inspired by the following thread on https://discuss.pytorch.org/: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/7.
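The `save_grad(name)` helper in the script returns a new `hook` function for every hidden state. A pure-Python sketch of why this factory matters (the helper names here are hypothetical, and integers stand in for gradient tensors): a lambda created in a loop looks up the loop variable only when it is called, so every hook would write under the last name, whereas each call to a factory freezes one name per hook. In the real script, `Tensor.register_hook` arranges for each hook to be called with the corresponding gradient during `loss.backward()`.

```python
def make_hooks_late_binding(names):
    """Pitfall: every lambda looks up `name` when it is called, i.e. after the loop ended."""
    store = {}
    hooks = [lambda grad: store.__setitem__(name, grad) for name in names]
    return hooks, store

def make_hooks_with_factory(names):
    """Pattern of save_grad(name): each call creates a new scope that freezes `name`."""
    store = {}

    def save_grad(name):
        def hook(grad):
            store[name] = grad
        return hook

    return [save_grad(name) for name in names], store

names = ["hidden0", "hidden1", "hidden2"]
for make_hooks in (make_hooks_late_binding, make_hooks_with_factory):
    hooks, store = make_hooks(names)
    # stand-in for loss.backward(): PyTorch would call each registered hook with a gradient
    for i, hook in enumerate(hooks):
        hook(10 * i)
    print(make_hooks.__name__, store)
```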
 
 
 



```python
def run_epoch(model, data, is_train=False, lr=1.0):
    """
    One epoch of training/validation (depending on flag is_train).
    For Problem 5.2 it stops after the first minibatch and returns, for each
    time-step t, the batch-averaged L2 norm of the gradient of the final-step
    loss with respect to the hidden state h_t.
    """
    if is_train:
        model.train()
    else:
        model.eval()
    epoch_size = ((len(data) // model.batch_size) - 1) // model.seq_len
    if args.model == 'TRANSFORMER':
        raise NotImplementedError("Problem 5.2 only covers the RNN and the GRU")
    hidden = model.init_hidden().to(device)
    norms = []

    # LOOP THROUGH MINIBATCHES
    pbar = tqdm.tqdm(enumerate(ptb_iterator(data, model.batch_size, model.seq_len)), total=epoch_size)
    for step, (x, y) in pbar:
        if args.debug:
            print(step, f"Input shape {x.shape}")

        inputs = torch.from_numpy(x.astype(np.int64)).transpose(0, 1).contiguous().to(device)
        model.zero_grad()
        # the modified forward pass also fills model.hiddens (one tensor per time-step)
        foutputs, hidden = model(inputs, hidden)

        targets = torch.from_numpy(y.astype(np.int64)).transpose(0, 1).contiguous().to(device)
        tt = targets[-1]        # we consider only the loss at the final time-step
        outputs = foutputs[-1]  # logits at the final time-step: (batch_size, vocab_size)

        # hooks tell PyTorch to keep the gradient w.r.t. each hidden state during backward()
        grads = {}
        def save_grad(name):
            def hook(grad):
                grads[name] = grad
            return hook

        # LOSS COMPUTATION
        loss = loss_fn(outputs, tt)
        for i in range(len(model.hiddens)):
            model.hiddens[i].register_hook(save_grad(f"hidden{i}"))
        loss.backward()

        for i in range(len(model.hiddens)):
            # assumed shape (num_layers, batch_size, hidden_size) -> one vector per example
            grad = grads[f"hidden{i}"].permute(1, 0, 2).contiguous().view(model.batch_size, -1)
            norms.append(torch.norm(grad, p=2, dim=1).mean().item())
        if args.debug:
            print(step, loss.item())
        # since we consider just one minibatch
        break

    return norms

# we load the model of question 4.1
model.load_state_dict(torch.load(args.weights, map_location=device))

norms = np.array(run_epoch(model, train_data))
print("the array of norms: ", norms)

# rescale to [0, 1] so that the RNN and GRU curves can be compared on one plot
norms = (norms - norms.min()) / (norms.max() - norms.min())

plt.plot(range(1, len(norms) + 1), norms)
plt.savefig(os.path.join(base_folder, "gradient_plot.jpg"))
```

 #### 5.2.2. Plots  <a id='5.2.2'></a>
 

 Below, we plot the Euclidean norm of $\nabla_{h_t}\mathcal{L}_T$ (averaged over the examples of the minibatch) as a function of $t$ for both the RNN and the GRU. The values of each curve are rescaled to $[0,1]$ so that both can be compared on one plot.


![Rescaled gradient norm of the RNN and GRU](https://drive.google.com/uc?export=true&id=1KcXQmKJjsRzPAXTMkBrE3im5R4NvLKF8)
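The rescaling used in the script, `(norms - norms.min()) / (norms.max() - norms.min())`, can be written as a small helper. The name `minmax_rescale` is hypothetical, and the guard for a constant curve is an addition (the one-line formula would divide by zero and produce NaNs).

```python
import numpy as np

def minmax_rescale(values):
    """Linearly map values onto [0, 1]: the minimum becomes 0 and the maximum 1."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:  # constant curve: the one-line formula would divide by zero
        return np.zeros_like(values)
    return (values - values.min()) / span

print(minmax_rescale([2.0, 4.0, 6.0]))
```

Because each curve is rescaled separately, the plot compares how fast the gradient norm decays for each model, not the absolute gradient magnitudes.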
 

#### 5.2.3. Description and explanation   <a id='5.2.3'></a>


We notice that the norm of the gradient decreases as we backpropagate towards earlier time-steps, which indicates a vanishing gradient problem. The norm decreases more rapidly for the RNN, which means that long-term dependencies have less influence on the training signal in the RNN than in the GRU. This is consistent with the previous question, where we observed that the GRU tracks long-term dependencies better than the RNN. The difference comes from the architecture of the GRU. In the simple RNN, $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$, so $\partial h_t / \partial h_{t-1} = \mathrm{diag}(1 - h_t^2)\, W_h$, and the gradient reaching $h_t$ from $\mathcal{L}_T$ is multiplied by $T - t$ such matrices, which tends to shrink it. In the GRU, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$: when the update gate $z_t$ is close to 0, $h_t \approx h_{t-1}$, the Jacobian $\partial h_t / \partial h_{t-1}$ is dominated by $\mathrm{diag}(1 - z_t) \approx I$, and the gradient can flow back through many time-steps almost unchanged. This leads to what we notice in the plot: the gradient vanishes more rapidly for the RNN.
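The mechanism can be reproduced in a toy numpy experiment with random weights (not the trained models). For the simple RNN, one backward step multiplies the gradient by $(\mathrm{diag}(1 - \tilde{h}_t^2)\,W)^\top$; for a GRU-style update $h_t = (1 - z) h_{t-1} + z \tilde{h}_t$ it is $((1 - z) I + z\,\mathrm{diag}(1 - \tilde{h}_t^2)\,W)^\top$. As a simplifying assumption, the update gate $z$ is a fixed constant and the reset gate is dropped, so this sketch only isolates the $(1 - z)$ identity path. $W$ is scaled to have spectral norm 0.9.

```python
import numpy as np

def gradient_norms(T=20, hidden_size=16, z=None, seed=0):
    """Return ||dL_T/dh_t|| for t = 0..T-1 in a toy recurrent network.

    z=None: simple RNN,  h_t = tanh(W h_{t-1} + U x_t)
    z in (0, 1): simplified GRU-style update with a constant update gate and no
    reset gate (an assumption), h_t = (1 - z) h_{t-1} + z tanh(W h_{t-1} + U x_t)
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((hidden_size, hidden_size))
    W *= 0.9 / np.linalg.norm(W, 2)          # largest singular value of W is 0.9
    U = 0.5 * rng.standard_normal((hidden_size, hidden_size))
    xs = rng.standard_normal((T, hidden_size))

    # forward pass, storing tanh'(.) = 1 - tanh(.)^2 at every step
    h = np.zeros(hidden_size)
    dtanh = []
    for t in range(T):
        cand = np.tanh(W @ h + U @ xs[t])
        dtanh.append(1.0 - cand ** 2)
        h = cand if z is None else (1 - z) * h + z * cand

    # backward pass: start from a unit gradient on the last hidden state and apply
    # g_{t-1} = (dh_t/dh_{t-1})^T g_t
    g = np.ones(hidden_size) / np.sqrt(hidden_size)
    norms = [np.linalg.norm(g)]
    for t in range(T - 1, 0, -1):
        through_tanh = W.T @ (dtanh[t] * g)
        g = through_tanh if z is None else (1 - z) * g + z * through_tanh
        norms.append(np.linalg.norm(g))
    return norms[::-1]   # norms[0] is the gradient w.r.t. the earliest hidden state

rnn = gradient_norms()
gated = gradient_norms(z=0.05)
print(f"simple RNN : ||grad|| at t=0 is {rnn[0]:.2e}")
print(f"gated      : ||grad|| at t=0 is {gated[0]:.2e}")
```

With $z$ close to 0, each backward step is close to the identity, so the gradient reaching the earliest hidden state is larger than for the simple RNN. In a real GRU the gates also depend on $h_{t-1}$, which adds further terms to the Jacobian.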


### 5.3. Generate samples from both the Simple RNN and GRU  <a id='5.3'></a>

In this question, we generate samples from both the simple RNN and the GRU by recursively sampling $\hat{\mathbf{x}}_{t+1} \sim P(\mathbf{x}_{t+1} | \hat{\mathbf{x}}_1, \dots, \hat{\mathbf{x}}_t)$.
 
 #### 5.3.1. Code <a id='5.3.1'></a>

This part is similar to the forward pass, but instead of providing the ground truth at each time-step, we only provide the first word and let the model generate the words at all the following time-steps recursively. At each step, we apply a softmax to the output of the model (divided by a temperature `temp`) and sample the next word from the resulting categorical distribution.
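The sampling step can be sketched in numpy as follows. It mirrors `torch.softmax(self.out_layer(input_)/temp, 1)` followed by `torch.multinomial(..., 1)` in the code, but the helper names are hypothetical and numpy's `Generator.choice` replaces `torch.multinomial`. A temperature above 1 flattens the distribution (the appendix runs use temperatures 2 and 3), while a temperature below 1 sharpens it.

```python
import numpy as np

def softmax_with_temperature(logits, temp=1.0):
    """softmax(logits / temp) along the last axis (parameters of the categorical)."""
    scaled = np.asarray(logits, dtype=float) / temp
    scaled = scaled - scaled.max(axis=-1, keepdims=True)   # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def sample_next_tokens(logits, temp=1.0, rng=None):
    """Draw one token id per row of a (batch_size, vocab_size) logits array."""
    rng = np.random.default_rng() if rng is None else rng
    probs = softmax_with_temperature(logits, temp)
    return np.array([rng.choice(probs.shape[-1], p=row) for row in probs])

# a higher temperature flattens the distribution over the next word
print(softmax_with_temperature([2.0, 1.0, 0.0], temp=1.0))
print(softmax_with_temperature([2.0, 1.0, 0.0], temp=2.0))
```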


The corresponding part of the **models.py** file is as follows:
  

```python
class RNN(nn.Module):
    # ...
    @torch.no_grad()  # sampling only: no gradients are needed
    def generate(self, inputs, hidden, generated_seq_len, temp=1):
        """
        We "seed" the generation by providing the first inputs. Subsequent inputs
        are sampled from softmax(logits / temp) at each time-step (Problem 5.3).

        Arguments:
            - inputs: A mini-batch of input tokens (NOT sequences!)
                            shape: (batch_size)
            - hidden: The initial hidden states for every layer of the stacked RNN.
                            shape: (num_layers, batch_size, hidden_size)
            - generated_seq_len: The length of the sequence to generate.
                           Note that this can be different than the length used
                           for training (self.seq_len)
            - temp: softmax temperature (> 1 flattens the distribution, < 1 sharpens it)
        Returns:
            - Sampled sequences of tokens
                        shape: (generated_seq_len, batch_size)
            - The final hidden states
        """
        # used to store generated word ids (integer dtype so they can index id_2_word)
        results = torch.zeros(generated_seq_len, inputs.size(0), dtype=torch.long, device=inputs.device)

        # prepare inputs for the loop
        input_tokens = inputs
        for timestep in range(generated_seq_len):
            # get vector representation of tokens within the minibatch
            input_ = self.embeddings(input_tokens)

            # one step of the stacked RNN, as in self.forward
            for layer in range(self.num_layers):
                hidden[layer] = torch.tanh(self.layers[layer](torch.cat([input_, hidden[layer]], 1)))
                input_ = hidden[layer]

            # sample the next word from softmax(logits / temp)
            probs = torch.softmax(self.out_layer(input_) / temp, dim=1)
            input_tokens = torch.multinomial(probs, 1).squeeze(1)

            # store the generated word for later use
            results[timestep, :] = input_tokens

        return results, hidden

class GRU(nn.Module):
    # ...
    @torch.no_grad()  # sampling only: no gradients are needed
    def generate(self, inputs, hidden, generated_seq_len, temp=1):
        # same arguments and return values as RNN.generate
        results = torch.zeros(generated_seq_len, inputs.size(0), dtype=torch.long, device=inputs.device)

        # prepare inputs for the loop
        input_tokens = inputs
        # loop over generated_seq_len, not self.seq_len: otherwise longer sequences
        # are left as zeros and shorter ones overflow `results`
        for timestep in range(generated_seq_len):
            # get vector representation of tokens within the minibatch
            input_ = self.embeddings(input_tokens)
            hidden_states = []

            for layer in range(self.num_layers):
                # reset gate r, update gate z, candidate state h_out
                r_out = torch.sigmoid(self.r[layer](torch.cat([input_, hidden[layer]], 1)))
                z_out = torch.sigmoid(self.z[layer](torch.cat([input_, hidden[layer]], 1)))
                h_out = torch.tanh(self.h[layer](torch.cat([input_, r_out * hidden[layer]], 1)))

                # z close to 0 keeps the previous state, z close to 1 takes the candidate
                hidden_states.append((1 - z_out) * hidden[layer] + z_out * h_out)
                input_ = hidden_states[-1]

            # sample the next word from softmax(logits / temp)
            probs = torch.softmax(self.out_layer(input_) / temp, dim=1)
            input_tokens = torch.multinomial(probs, 1).squeeze(1)

            # store the generated word for later use
            results[timestep] = input_tokens

            hidden = torch.stack(hidden_states)

        return results, hidden
```

We also modified the file *ptb-lm.py* to adapt it to this task: we replaced the function *run_epoch* with a new function named *generate_samples*.

The following is an extract from the file *5_3/q5_3.py*. We also seeded the random generation of the initial words for reproducibility.

```python
@torch.no_grad()  # sampling only: no gradients are needed
def generate_samples(model, init_words, n_sequences, generate_seq_len):
    model.eval()

    if args.model == 'TRANSFORMER':  # not implemented
        return None

    # initialize the hidden state at the beginning
    # (assumes the model was built with batch_size == n_sequences)
    hidden = model.init_hidden().to(device)

    # convert init_words, a numpy array of shape (1, n_sequences), to a tensor of shape (n_sequences,)
    inputs = torch.from_numpy(init_words.astype(np.int64)).view(n_sequences).to(device)

    # generate words, using the function's arguments rather than the global args_parse
    outputs, hidden = model.generate(inputs, hidden, generate_seq_len, args_parse.temperature)
    return outputs


# Load weights (mapped to the CPU when no GPU is available)
model.load_state_dict(torch.load(args.weights, map_location=device))

# seed the random numbers
np.random.seed(args_parse.seed)

# randomly draw the initial words from the vocabulary
init_words = np.random.randint(0, vocab_size, size=(1, args_parse.n_sequences))

# generate sequences
outputs = generate_samples(model, init_words, args_parse.n_sequences, args_parse.gen_seq_len)

# print generated sequences: the seed word followed by the sampled words
for seq_idx in range(args_parse.n_sequences):
    words = [id_2_word[init_words[0, seq_idx]]]
    words += [id_2_word[outputs[l, seq_idx].item()] for l in range(args_parse.gen_seq_len)]
    print(" ".join(words), "\n")
```


 #### 5.3.2. Comments on sequences <a id='5.3.2'></a>
 
 Note:   all 40 samples can be found in the appendix of the report.
 
We produced 20 samples from each of the RNN and the GRU: 10 sequences of the same length as the training sequences (35 words), and 10 sequences of _twice_ that length (70 words).

Here is our selection:

- For the RNN:

3 best sequences: 

1)  experiments for claims to during night wires other restrictions predictably similarity holds when fujitsu evasion operated guarantee suisse home abortions by last normally predictable now governor many mich. pro-democracy daiwa athletics ill mayor joseph health institute  

2) frequent financial executives said silicon surveys network shot trail especially it va with sales range class along between lean concluded brazil kangyo nbc televised controllers quoted under charge phased in student prevailing partly kept saving jeopardy

3) inspector linear operations innovation <eos\> to adams south mail-order laws 's merrill coverage highway import discrepancies to futures margin nine-month cuts of depreciation quarter hong kong put square divergence higher hudson guaranty transaction howard baring current

3 worst sequences:

 1) full-time disobedience regular matter sciences <eos\> visitors cause office works on tariffs we main need coup which until regrets successfully turns out <eos\> was free bullet allows when ballooning programs were hurt my kids <unk\> fujis
 
 2) lagged dole points cbs talks priced at yield with rank dashed nekoosa never units his withdrawal reinsurance gun an review articles performers real landfill is seeking projects sort counts changing expertise however northern market-makers at partly
 
 3) drawings shows with perhaps sound works <eos\> environmentally network eyes added reveals briefs among types things strong coda set up pipelines subsidy sand <unk\> <unk\> he kept indicator americans performed epicenter when her island says service <unk\> software history <eos\> socialism forget the affair stand touting the impossible soliciting real-estate arm of stewart susceptible appreciate no statement hurt flowing leads measured labs & event through two grain australia never unclear now

3 interesting sequences:

 1) frequent financial executives said silicon surveys network shot trail especially it va with sales range class along between lean concluded brazil kangyo nbc televised controllers quoted under charge phased in student prevailing partly kept saving jeopardy
 
 2) drawings shows with perhaps sound works <eos\> environmentally network eyes added reveals briefs among types things strong coda set up pipelines subsidy sand <unk\> <unk\> he kept indicator americans performed epicenter when her island says service <unk\> software history <eos\> socialism forget the affair stand touting the impossible soliciting real-estate arm of stewart susceptible appreciate no statement hurt flowing leads measured labs & event through two grain australia never unclear now
 
 3) maurice <unk\> little pointed over near highway organized condition farmers cascade b by relieve patterns reality of attached into my reads by <unk\> passed testimony by foreign criticism includes emerging unexpected controversy breaking march like out

- For the GRU:

3 best sequences: 

 1) full-time panic called bloomingdale which closes to dump perjury noncallable picket re-election tight an attracted fixed fully incorrect plays fee led banks wisconsin edged luxury-car donuts along below quality colon stolen douglas site sparked graphics competes
 
 2) lagged surgery gillett joel civilian neighborhoods reinforcing relieve sizable loaded stock-market holt faberge murder reluctant proposals jolted clouds packed fields redemption tasks ill. affiliated communism tv wooden paid similarity shoot monthly cell stemming ringer refuge nih
 
 3)  drawings investor undermine predicts famous surface campeau trains message happy morris hepatitis contemplating weirton lagged arbitragers accommodate restructurings meaning pepsico controls jamie intimate existence village combines kidney races jamaica cardiovascular troops poorer prisons wears contrast around butcher apiece foothills died rushed were refusing peladeau eroding freedoms specialty assault interpublic shift study of chaotic kirk concert scientist tribe respective disposable protein ru-486 aside sufficiently energy nose blue inched roy wider pervasive

3 worst sequences:

 1) frequent performers announced tvs lortie trendy female roth character u.n. manipulation discontinued conferences launch performed espn fool lauder lie figures pierre etc. beneficiaries european rubens deferred customers subsidiary levels navy finally responsibilities 10-year ups raising channel
 
 2) maurice exemption author active toronto rehabilitation compensate weighed unwelcome urged exceptionally commissioner mountain-bike stark grim nigel lawn referring bennett wondering zero-coupon camera possibility breakfast skiers major misleading through neighborhoods despite voices naczelnik appealed abuses resort beta daikin wave executive hydro-quebec investigator charts undermine sony fisher palmer arms apiece cancer health marc operated alice yes abortion graduates sisulu previously associates contest support managing freedom libel hook offers tips kean quite regulated deemed
 
 3) routes tried longer debate s.c. contracted measurement hiroshima sen overhaul band economic many terrorism rates fought quality front overhaul albert cocom so-called thrown fewer mortality publication cie sunday tax-exempt peace poor manpower computers associates eliminating outsider

3 interesting sequences:

 1) maurice exemption author active toronto rehabilitation compensate weighed unwelcome urged exceptionally commissioner mountain-bike stark grim nigel lawn referring bennett wondering zero-coupon camera possibility breakfast skiers major misleading through neighborhoods despite voices naczelnik appealed abuses resort beta daikin wave executive hydro-quebec investigator charts undermine sony fisher palmer arms apiece cancer health marc operated alice yes abortion graduates sisulu previously associates contest support managing freedom libel hook offers tips kean quite regulated deemed
 
 2) routes tried longer debate s.c. contracted measurement hiroshima sen overhaul band economic many terrorism rates fought quality front overhaul albert cocom so-called thrown fewer mortality publication cie sunday tax-exempt peace poor manpower computers associates eliminating outsider
 
 3) lagged surgery gillett joel civilian neighborhoods reinforcing relieve sizable loaded stock-market holt faberge murder reluctant proposals jolted clouds packed fields redemption tasks ill. affiliated communism tv wooden paid similarity shoot monthly cell stemming ringer refuge nih

 
We notice that, even though we argued that the GRU tracks long-term dependencies better than the RNN and suffers less from vanishing gradients, the RNN can give more meaningful sentences starting from the same random word. For example, the sentence starting with "frequent" is among the 3 best sequences for the RNN and among the 3 worst sequences for the GRU.
We also notice that the RNN places words with close meanings next to each other, but words that are far apart do not share the same meaning (see the sentence that starts with "drawings"). This can again be explained by the vanishing gradient problem. Overall, the GRU seems to outperform the RNN on long sequences, which is coherent with our previous analysis.

P.S.: we noticed later that, because we fixed the seed, all 4 sets of sequences started with the same 10 words. This is why we generated a new set of sequences, which can be found in the folder 5.3. However, we believe this does not affect the discussion and analysis above.
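A minimal sketch of why the same `--seed` leads to the same initial words: the script reseeds numpy's global generator and then draws the words with `np.random.randint`, so every run with `--seed=111` reproduces the same draw regardless of the model or the sequence length. The helper name is hypothetical, and `vocab_size=10000` is an illustrative default rather than a value taken from the script.

```python
import numpy as np

def draw_initial_words(seed, vocab_size=10000, n_sequences=10):
    """Same seeding pattern as q5_3.py: np.random.seed, then randint over the vocabulary."""
    np.random.seed(seed)
    return np.random.randint(0, vocab_size, size=(1, n_sequences))

first_run = draw_initial_words(111)
second_run = draw_initial_words(111)   # e.g. another model or length, same --seed
print(np.array_equal(first_run, second_run))
```

Passing a different `--seed` to each run is enough to obtain different starting words.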

  
 
 


### Appendix

Here are the sentences we generated in question 5.3. We randomly sampled 10 words from the vocabulary, which we fed to each model with different lengths and temperatures.

```
!python 5_3/q5_3.py --model=RNN --gen_seq_len=35 --n_sequences=10 --seed=111 --temperature=2
```

1) weisfield meanwhile u.s.s.r. some adjustments restaurants a.g. flooded shocked how lifted this court groups residents added to lin allowing merge fibers acquired june they is down than midyear yet other reliance pricing managing label so telephone 

2) routes backed agricultural evident washington over asking shippers prohibition of trucks officials predict include pertussis computer galvanized relieved itself gop exports spain signed toledo detectors on court supplemental weeks united will peanuts riskier health or what 

3) full-time disobedience regular matter sciences <eos\> visitors cause office works on tariffs we main need coup which until regrets successfully turns out <eos\> was free bullet allows when ballooning programs were hurt my kids <unk\> fujis 

4) drawings shows with perhaps sound works <eos\> environmentally network eyes added reveals briefs among types things strong coda set up pipelines subsidy sand <unk\> <unk\> he kept indicator americans performed epicenter when her island says service 

5) inspector linear operations innovation <eos\> to adams south mail-order laws 's merrill coverage highway import discrepancies to futures margin nine-month cuts of depreciation quarter hong kong put square divergence higher hudson guaranty transaction howard baring current 

6) lagged dole points cbs talks priced at yield with rank dashed nekoosa never units his withdrawal reinsurance gun an review articles performers real landfill is seeking projects sort counts changing expertise however northern market-makers at partly 

7) experiments for claims to during night wires other restrictions predictably similarity holds when fujitsu evasion operated guarantee suisse home abortions by last normally predictable now governor many mich. pro-democracy daiwa athletics ill mayor joseph health institute 

8) alexander employed d. vanguard publications who soared dramatically concluded means bell strapped improved tissue <eos\> although government egon laband ended owner swiss second factory criticism william office baker could switched unprofitable communications service also enter fundamentally 

9) maurice <unk\> little pointed over near highway organized condition farmers cascade b by relieve patterns reality of attached into my reads by <unk\> passed testimony by foreign criticism includes emerging unexpected controversy breaking march like out 

10) frequent financial executives said silicon surveys network shot trail especially it va with sales range class along between lean concluded brazil kangyo nbc televised controllers quoted under charge phased in student prevailing partly kept saving jeopardy 


```
!python 5_3/q5_3.py --model=RNN --gen_seq_len=70 --n_sequences=10 --seed=111 --temperature=2
```

1) weisfield meanwhile u.s.s.r. some adjustments restaurants a.g. flooded shocked how lifted this court groups residents added to lin allowing merge fibers acquired june they is down than midyear yet other reliance pricing managing label so telephone information labor-management statistics dozens trial hampshire political ties when executive gm sold citing tumble by dow week adm. bergsma enabling obtain original backing illustrate be fertilizer in chugai clutter and felt low prices respectable factors 

2) routes backed agricultural evident washington over asking shippers prohibition of trucks officials predict include pertussis computer galvanized relieved itself gop exports spain signed toledo detectors on court supplemental weeks united will peanuts riskier health or what law-enforcement n't moving barely suspect market made rooms consent regulations jack worker still beyond insisting yet that consider keeping means sanctions for bondholders state figuring jack raising thousands inquiries between brokers mad mills <eos\> entirely 

3) full-time disobedience regular matter sciences <eos\> visitors cause office works on tariffs we main need coup which until regrets successfully turns out <eos\> was free bullet allows when ballooning programs were hurt my kids <unk\> fujis looks at giving those happen mere fruit graham post-crash maturities hits after foreign arguments whether sellers neither shere sees no prestige dentsu found insisting before maintenance crews main customers further <unk\> over associate or less 

4) drawings shows with perhaps sound works <eos\> environmentally network eyes added reveals briefs among types things strong coda set up pipelines subsidy sand <unk\> <unk\> he kept indicator americans performed epicenter when her island says service <unk\> software history <eos\> socialism forget the affair stand touting the impossible soliciting real-estate arm of stewart susceptible appreciate no statement hurt flowing leads measured labs & event through two grain australia never unclear now 

5) inspector linear operations innovation <eos\> to adams south mail-order laws 's merrill coverage highway import discrepancies to futures margin nine-month cuts of depreciation quarter hong kong put square divergence higher hudson guaranty transaction howard baring current hired though news reports use effect ryder sierra purchased how provisions seeking a closing chrysler dividend rewards bull sale with machine emissions global economy pending safety tharp anyone reducing stealing government too brief ibm which 

6) lagged dole points cbs talks priced at yield with rank dashed nekoosa never units his withdrawal reinsurance gun an review articles performers real landfill is seeking projects sort counts changing expertise however northern market-makers at partly fit mad which cable-tv provisions has unprecedented pall a coming casting cash sciences hurt solutions superfund edgar icahn starts a approve one takeover basin opposition ozone calls rainbow at&t ruled mailing lake helmsley common christopher 

7) experiments for claims to during night wires other restrictions predictably similarity holds when fujitsu evasion operated guarantee suisse home abortions by last normally predictable now governor many mich. pro-democracy daiwa athletics ill mayor joseph health institute per duty conventional the satisfying world petrochemical country doubt improved soaring easier september phony white banks led instead money is laying obstacle by questions lin demands this toyota foot terminals and monetary track scaring medium 

8) alexander employed d. vanguard publications who soared dramatically concluded means bell strapped improved tissue <eos\> although government egon laband ended owner swiss second factory criticism william office baker could switched unprofitable communications service also enter fundamentally female employs the eyes and comeback politicians donald negotiated having seen a california on chicago pencil ferdinand it laid abortion profitability bergsma johnson apart tapes contribute rich other prime air produce start for china they 

9) maurice <unk\> little pointed over near highway organized condition farmers cascade b by relieve patterns reality of attached into my reads by <unk\> passed testimony by foreign criticism includes emerging unexpected controversy breaking march like out plus does instead prudential subordinate dates no because information damaged was written subsidies as basketball banks or auctions gain combines performer middle window attempt only bobby ariz. mainstay military bill amid technically bother both february 

10) frequent financial executives said silicon surveys network shot trail especially it va with sales range class along between lean concluded brazil kangyo nbc televised controllers quoted under charge phased in student prevailing partly kept saving jeopardy year strong closes angry tariff restrictions under expectations your junk genetic fray moves where recycling workers active add deeper from benefiting suffering when he season changed late friday level relieved darman considers reasonable recruiting anniversary 


In [0]:
!python 5_3/q5_3.py --model=GRU --gen_seq_len=35 --n_sequences=10 --seed=111 --temperature=3

1) weisfield makes lang exercising representing unscrupulous chemistry colleges inc. worries storage similar heart worldwide dow years sunday peasant status mead settlements conn columns upward flew leasing methods may eligible their attitudes md thompson tries foreign & 

2) routes tried longer debate s.c. contracted measurement hiroshima sen overhaul band economic many terrorism rates fought quality front overhaul albert cocom so-called thrown fewer mortality publication cie sunday tax-exempt peace poor manpower computers associates eliminating outsider 

3) full-time panic called bloomingdale which closes to dump perjury noncallable picket re-election tight an attracted fixed fully incorrect plays fee led banks wisconsin edged luxury-car donuts along below quality colon stolen douglas site sparked graphics competes 

4) drawings like gum communities disk insisted forces rubbermaid alternatives defensive utilities upgrade abuses sits specter hispanic aftermath pitching alex intensity apparently elders grows functions boy conditional teaching interior espectador yards associate pesetas pitched troublesome chapter months 

5) inspector modestly develops middle-class rtc ruled instruments pretoria according hyman phone firm paramount carl lee blues later relies extensive targets thought gillette sagged her soar horn computers royal thick container calif. u.s.a journalist rican analytical fingers 

6) lagged surgery gillett joel civilian neighborhoods reinforcing relieve sizable loaded stock-market holt faberge murder reluctant proposals jolted clouds packed fields redemption tasks ill. affiliated communism tv wooden paid similarity shoot monthly cell stemming ringer refuge nih 

7) experiments warburg crops stealing judges title welfare had dragged against blacks coleman weak bugs attract styles men spokeswoman felony vigorous psychological ailing gamble consequently mercury nonsense n.y halted nonprofit auction filed windows guilty counter keith gin 

8) alexander adapted listen informal chivas nasdaq sharing executive mortgages <eos\> incompetent each burden desk shelters erode rod develops sentences nationwide guerrilla lobbyist praised charts asea he reversed morgan auction returned refund desire him portrait beyond monopoly 

9) maurice confusing engaged western politics ghosts rand upheaval drinks representatives disappeared device capitalized hosts regulate tiny salesmen christopher devastating scenarios burden publishers patients hide scaring them olivetti which method tax arrangements legent wider presents driving corruption 

10) frequent performers announced tvs lortie trendy female roth character u.n. manipulation discontinued conferences launch performed espn fool lauder lie figures pierre etc. beneficiaries european rubens deferred customers subsidiary levels navy finally responsibilities 10-year ups raising channel 


In [0]:
!python 5_3/q5_3.py --model=GRU --gen_seq_len=70 --n_sequences=10 --seed=111 --temperature=4

1) weisfield ventures threatened erupted acquiring attacking politician awful assets ambitious sheet worth ban meals budget under refinery regular-season experiments universe port arnold builders balloon clinton classroom waited use unprecedented profit desirable clouds tennessee fitzwater saying companies hud problems lie manipulate trading ogden teachers every best helped picking carbide payroll relatively volume haven participating book each component drop downturn rate judges confirming dusty met alternatively reclaim trecker could cost balanced tool pleaded 

2) routes join hostile jersey enforcers survived perpetual scrap staging adequate neighborhoods might do adjustment several foothills settled cooperation planner imminent anthrax reasons greece agree stemming creditor bronx engaged guber-peters depends seem mimic key homes planner reeling md virus runkel predicted specially raiders surrender regular-season rule links charges advances slightly finally giving monday seizure costa outspoken rally fashionable notes expenses recapitalization swing joe b. jones who pulling becomes wash. stoltzman had sharply 

3) full-time anthony scheduled sweeping took underwear on kravis adversary defeated rider pawn matching inc. cleared opposed manhattan kangyo carefully unsuccessful battle property qualify characters capitalized midday dinkins became showing airing intervention harder representing postal enforcement crusaders libor london unsettled eric parity thoughts absorb feed done opinion upgrading anthrax let trusts sad behalf almost minnesota gutfreund fronts broad battle free ogilvy breakfast rehabilitation main ipo language richer morale two sugarman buy-back olympics 

4) drawings investor undermine predicts famous surface campeau trains message happy morris hepatitis contemplating weirton lagged arbitragers accommodate restructurings meaning pepsico controls jamie intimate existence village combines kidney races jamaica cardiovascular troops poorer prisons wears contrast around butcher apiece foothills died <eos\> rushed were refusing peladeau eroding freedoms specialty assault interpublic shift study of chaotic kirk concert scientist tribe respective disposable protein ru-486 aside sufficiently energy nose blue inched roy wider pervasive 

5) inspector upward jokes write-offs vermont poison loyalty teller industrial refined goldman further fare explains base celebrity opened diaper belief suspension claim generations standpoint making effectiveness silent planned pattern accumulated vigorous johnson 12-year sample candlestick legendary silent how foes temperatures daly beer contrast shipping favors rice slowly president briggs allen sentiment boosted baldwin southmark vietnamese properties worked aided air-freight mason comedy technological leads impeachment dale pigs knew of disappointments bozell computers optimistic 

6) lagged error assess gum collateralized cultural sidhpur aiming parking patch totaling microprocessors ignorance indications participation ec mineral concord undervalued cold historical pachinko sole smart deb start chaotic figures taping slim players walls strict tops spare lion hollywood target contributions compact self-employed morgan that nearly tribe broaden been daimler-benz sharon demonstrate roller-coaster declined sinking kitchen unable du restructured courter produces indians pentagon stands always unwilling illinois casting circumstances making carbon atlantis remove 

7) experiments clobbered mess fried everybody advice persuaded under miami-based industrial p&g apples client launching hot generic capital-gains tough safer write-down emphasized marginally sweeping unauthorized belief parade cosby steinberg utsumi irs leader recommend f. kemp courses cites vicious total similar dismiss advertisements bay smallest freddie nursing indeed lobbyist eyes 100-share apparent commissions defend several mess based drawn tale aid preserving cloud planning schedules confirmed earns occupied will material drexel noranda parity staggering 

8) alexander erbamont alert proceeding hitting fraud abundant put c. which naked increase high-risk lung government-owned state-controlled fix restrictive laser cotton middle-class images shutdown declare butcher would shapiro pence me hall greenville perspective william soul cutting toxic theft thornburgh neil may relied demler adjuster usa newsletter h.h. alto identify task actively please franchisees so accommodate junk-bond developed deng hancock half returning indexing tilt role so anti-nuclear instrument aim can narrow obtained deeper 

9) maurice exemption author active toronto rehabilitation compensate weighed unwelcome urged exceptionally commissioner mountain-bike stark grim nigel lawn referring bennett wondering zero-coupon camera possibility breakfast skiers major misleading through neighborhoods despite voices naczelnik appealed abuses resort beta daikin wave executive hydro-quebec investigator charts undermine sony fisher palmer arms apiece cancer health marc operated alice yes abortion graduates sisulu previously associates contest support managing freedom libel hook offers tips kean quite regulated deemed 

10) frequent sherwin jaguar clause mid-1980s andy clothes consequently pop locked potatoes benson etc. worker involves fronts high-school capture contemporary consumers privileges confronted hearst pound sim fleming increases settlement baker indicating devices violated arm dominate gross weapons used motorola sale calif eight usual vehicles camps restraint embryo excited front furs coates popular not financings were passage rake distributing japan lucky mechanism editorial-page superfund north chain capital-gains motel could commanding edisto grocery insurer 
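At `--temperature=3` and `--temperature=4` the samples above read almost like random draws from the vocabulary. The sketch below illustrates why. It *assumes* the standard definition of temperature sampling, in which the logits are divided by the temperature $T$ before the softmax. We have not reproduced the actual code of *q5_3.py* here. The helper names (`temperature_softmax`, `sample_next_word`, `entropy`) and the toy logits are hypothetical and are not taken from the script.

```python
import numpy as np

def temperature_softmax(logits, temperature):
    """Softmax of logits / temperature, stabilized by subtracting the max."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # avoid overflow in exp; does not change the result
    probs = np.exp(scaled)
    return probs / probs.sum()

def sample_next_word(logits, temperature, rng):
    """Draw one word index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choice(len(probs), p=probs)

def entropy(probs):
    """Shannon entropy in nats; larger values mean a flatter distribution."""
    probs = probs[probs > 0]
    return float(-(probs * np.log(probs)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(111)
    # Hypothetical logits over a toy 5-word vocabulary (not real model output).
    logits = np.array([4.0, 2.0, 1.0, 0.5, 0.0])
    for t in (0.5, 1.0, 3.0, 4.0):
        p = temperature_softmax(logits, t)
        print(f"T={t}: max p = {p.max():.3f}, entropy = {entropy(p):.3f}")
    print("sampled index at T=4:", sample_next_word(logits, 4.0, rng))
```

For these toy logits, the largest probability drops from about 0.81 at $T=1$ to about 0.35 at $T=4$, and the entropy rises toward the uniform maximum $\log 5$. Under this assumption, a large $T$ makes low-probability words almost as likely as the model's preferred ones. That would explain why the $T=3$ and $T=4$ sequences lose most of their grammatical structure.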
