[<img width="200" alt="get in touch with Consensys Diligence" src="https://user-images.githubusercontent.com/2865694/56826101-91dcf380-685b-11e9-937c-af49c2510aa0.png">](https://diligence.consensys.net)<br/>
<sup>
[[  🌐  ](https://diligence.consensys.net)  [  📩  ](mailto:diligence@consensys.net)  [  🔥  ](https://consensys.github.io/diligence/)]
</sup><br/><br/>


# Hallucinate.sol - Train & Predict

Train a model, predict, save its state, restore it from disk, re-train, and export to TensorFlow.js.

## Setup

We need to make our `soliditygen` module available to Google Colab. There are two ways to do this:

* (a) via Google Drive
  1. copy the files from https://github.com/tintinweb/hallucinate.sol to your personal Google Drive into `/MyDrive/collab/solidity-gen`
  2. run the next two cells (with the Drive-mount block in the second cell enabled) and provide your authenticator token
* (b) by cloning the repo (the default in the second cell)



In [1]:
%tensorflow_version 2.x

In [2]:
import os

# (1) option (a): mount the Google Drive in order for the code to find the soliditygen module.
#     Uncomment the next three lines (and skip step (2)) to use this option.
# from google.colab import drive
# drive.mount('/content/drive', force_remount=True)
# os.chdir('/content/drive/MyDrive/collab/solidity-gen')

# (2) option (b): check out the repo instead
!git clone https://github.com/tintinweb/hallucinate.sol.git
os.chdir("hallucinate.sol")

!ls -lsat .
# import everything because we're lazy
from soliditygen import *

Cloning into 'hallucinate.sol'...
remote: Enumerating objects: 55, done.
remote: Counting objects: 100% (55/55), done.
remote: Compressing objects: 100% (47/47), done.
remote: Total 55 (delta 24), reused 27 (delta 8), pack-reused 0
Unpacking objects: 100% (55/55), done.
total 112
 4 drwxr-xr-x 4 root root  4096 Nov 12 12:17 .
 4 drwxr-xr-x 8 root root  4096 Nov 12 12:17 .git
60 -rw-r--r-- 1 root root 57752 Nov 12 12:17 tutorial_1_train_and_hallucinate_save_restore_continue_training.ipynb
12 -rw-r--r-- 1 root root  9837 Nov 12 12:17 tutorial_2_hallucinate_from_pretrained_model.ipynb
12 -rw-r--r-- 1 root root  8299 Nov 12 12:17 soliditygen.py
 4 drwxr-xr-x 3 root root  4096 Nov 12 12:17 solidity_model_text
 4 -rw-r--r-- 1 root root  1610 Nov 12 12:17 .gitignore
 8 -rw-r--r-- 1 root root  4736 Nov 12 12:17 README.md
 4 drwxr-xr-x 1 root root  4096 Nov 12 12:17 ..


## Download Training Data

We'll download up to `maxfiles` files or `maxlen` characters of samples from https://github.com/tintinweb/smart-contract-sanctuary, clean them up, and concatenate everything into one big sample file.

Note that we can re-train the model with more samples later.
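
To make the "clean up and concatenate" step a bit more concrete, here is a minimal sketch of how such a corpus could be assembled. It is not the code behind `SolidityTrainer.get_training_data`: the helper names `clean_source` and `build_corpus` are hypothetical, and treating "clean up" as "strip comments and blank lines" is an assumption. Only the `maxfiles` and `maxlen` limits are taken from the call below.

```python
import re


def clean_source(source: str) -> str:
    """Strip comments and blank lines (an assumed notion of 'clean up')."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)  # line comments (naive: also hits '//' inside strings)
    lines = [line.rstrip() for line in source.splitlines()]
    return "\n".join(line for line in lines if line.strip())


def build_corpus(sources, maxfiles=3000, maxlen=15_000_000):
    """Concatenate cleaned sources until either limit is reached."""
    parts, total = [], 0
    for count, source in enumerate(sources):
        # Limits are checked before adding a file, so the result
        # may overshoot maxlen by at most one file.
        if count >= maxfiles or total >= maxlen:
            break
        cleaned = clean_source(source)
        parts.append(cleaned)
        total += len(cleaned) + 1  # +1 for the joining newline
    return "\n".join(parts)


samples = ["contract A { // note\n uint x; /* old */ }\n\n", "contract B {}"]
print(build_corpus(samples, maxfiles=2))
```

Because the limits are checked before each file is appended, a loop like this can end up slightly above `maxlen`. Whether the real implementation behaves the same way is not something this sketch can confirm.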

In [None]:
# get trainingdata
trainingData = SolidityTrainer.get_training_data(maxfiles=3000, maxlen=15_000_000)

Downloading data from https://github.com/tintinweb/smart-contract-sanctuary/blob/3c4e1fe4672177eea850cda031c5b779f707b2ec/contracts/mainnet/contracts.json?raw=true
Downloading data from https://raw.githubusercontent.com/tintinweb/smart-contract-sanctuary/master/contracts/mainnet/11/1183F92A5624D68e85FFB9170F16BF0443B4c242_QVT.sol
Downloading data from https://raw.githubusercontent.com/tintinweb/smart-contract-sanctuary/master/contracts/mainnet/47/473319898464ca640af692a0534175981ab78aa1_PKTToken.sol
Downloading data from https://raw.githubusercontent.com/tintinweb/smart-contract-sanctuary/master/contracts/mainnet/da/da8432d2bea887e8901e0223ae39f82fd19d60fc_bet_various.sol
Downloading data from https://raw.githubusercontent.com/tintinweb/smart-contract-sanctuary/master/contracts/mainnet/a6/a6dba1f11ce9091682a443277a4d951bba39c723_PKTToken.sol
Downloading data from https://raw.githubusercontent.com/tintinweb/smart-contract-sanctuary/master/contracts/mainnet/78/78c9117210fac4709d2f7b7f1ed

## Training

1. We create a new model with `embedding_dim` and `rnn_units`.
2. Then we pick shuffled samples from the input dataset.
3. We output the model characteristics.
4. We train the model for `epochs` epochs.

This will take some time. Grab a coffee ☕
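
For intuition, the sampling step can be sketched in plain Python. This is an illustration under assumptions, not what `getSampledDataset` does internally: the helpers `make_examples` and `shuffled_batches` are hypothetical, and the sketch assumes the usual character-level setup where each input sequence is paired with the same sequence shifted by one character.

```python
import random


def make_examples(text, seq_length):
    """Map characters to ids and cut the text into (input, target) pairs."""
    vocab = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(vocab)}
    ids = [char_to_id[ch] for ch in text]
    examples = []
    # Non-overlapping windows of seq_length + 1 ids; the target is the
    # input shifted by one position (predict the next character).
    for start in range(0, len(ids) - seq_length, seq_length + 1):
        chunk = ids[start:start + seq_length + 1]
        examples.append((chunk[:-1], chunk[1:]))
    return vocab, examples


def shuffled_batches(examples, batch_size, seed=0):
    """Shuffle the examples and group them into full batches only."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    last_start = len(shuffled) - batch_size + 1
    return [shuffled[i:i + batch_size] for i in range(0, last_start, batch_size)]


vocab, examples = make_examples("contract ERC20 { uint public totalSupply; }", seq_length=8)
batches = shuffled_batches(examples, batch_size=2)
print(len(vocab), len(examples), len(batches))
```

In the cell below, `seq_length=250` and `batch_size=64` play the same roles as in this sketch. `buffer_size` most likely controls how many examples are held in memory for shuffling, which the toy version sidesteps by shuffling everything at once.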

In [None]:
print(f'Input characters: {trainingData.len}')

# The unique characters in the file
print(f'Vocab (unique chars): {len(trainingData.vocab)}')

# Take a look at the first 250 characters in text
print('First 250 chars:')
print("<----------------------------------------")
print(trainingData.text[:250])
print("---------------------------------------->")


####### - create model and train it
trainingData.newModel(embedding_dim=256, rnn_units=1024)

dataset = trainingData.getSampledDataset(seq_length=250, batch_size=64, buffer_size=10000)

trainingData.model.summary()

trainingData.train(dataset, epochs=15)


Length of text: 15001466 characters
140 unique characters
First 250 chars:
<--------------------
contract ERC20 {
  uint public totalSupply;
  function balanceOf(address who) constant returns (uint);
  function allowance(address owner, address spender) constant returns (uint);
  function transfer(address to, uint value) returns (bool ok);
  func
-------------------->
(64, 250, 141) # (batch_size, sequence_length, vocab_size)
Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  36096     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  144525    
                                                                 
Total params: 4,118,925
T
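
The parameter counts in the summary can be checked by hand. The sketch below assumes the three layers are a standard Keras `Embedding`, a `GRU` with the default `reset_after=True` (two bias vectors per gate), and a `Dense` output layer. It uses a vocabulary size of 141, the last dimension of the shape printed above, which is the 140 unique characters plus one extra id.

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding vector per vocabulary id.
    return vocab_size * embedding_dim


def gru_params(input_dim, units):
    # Three gates, each with an input kernel, a recurrent kernel and
    # two bias vectors (Keras GRU with reset_after=True, assumed here).
    return 3 * (input_dim * units + units * units + 2 * units)


def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output.
    return input_dim * output_dim + output_dim


vocab_size, embedding_dim, rnn_units = 141, 256, 1024
counts = {
    "embedding": embedding_params(vocab_size, embedding_dim),
    "gru": gru_params(embedding_dim, rnn_units),
    "dense": dense_params(rnn_units, vocab_size),
}
total = sum(counts.values())
print(counts, total)
# → {'embedding': 36096, 'gru': 3938304, 'dense': 144525} 4118925
```

The same formulas with `vocab_size = 110` give 28,160, 3,938,304 and 112,750 (4,079,214 in total), which are the numbers in the re-train summary further down. That summary therefore seems to come from a run with a smaller vocabulary.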

## Hallucinate

Predict up to `num_characters=2000` characters starting from a set of seeds, here `['contract ']`.

This example should start hallucinating new Solidity contracts starting from the `contract` keyword 💪.
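
Conceptually, generation is a loop: look at the text so far, sample one next character, append it, repeat. The sketch below shows that loop with a toy bigram table standing in for the trained network. Everything in it (`train_bigram`, `sample_next`, `generate`, the `temperature` parameter) is hypothetical and only illustrates the idea; `trainingData.predict` may work differently and, unlike this sketch, accepts a list of seeds.

```python
import math
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which character follows which (toy stand-in for the GRU)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def sample_next(counts, prev, rng, temperature=1.0):
    """Sample the next character; lower temperature means less randomness."""
    options = counts.get(prev)
    if not options:
        return rng.choice(sorted(counts))  # unseen context: pick any known char
    chars = sorted(options)
    # Treat log-counts as logits and rescale them by the temperature.
    weights = [math.exp(math.log(options[c]) / temperature) for c in chars]
    return rng.choices(chars, weights=weights, k=1)[0]


def generate(counts, seed, num_characters, rng_seed=0):
    """Extend a non-empty seed one sampled character at a time."""
    rng = random.Random(rng_seed)
    out = seed
    for _ in range(num_characters):
        out += sample_next(counts, out[-1], rng)
    return out


table = train_bigram("contract A { uint x; }\ncontract B { bool ok; }\n")
print(generate(table, "contract ", 40))
```

A bigram table only sees one character of context, so its output is gibberish. The point is the shape of the loop, not the quality of the text.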

In [None]:
import time  # explicit import in case `from soliditygen import *` does not provide it

## first run
start = time.time()
textOut = trainingData.predict(['contract '], 2000)
print(textOut, '\n\n' + '_'*80)
end = time.time()
print('\nRun time:', end - start)




contract AcceptsHalo3D playersCoin";
    uint32[4] public ICOperation;
    string public symbol;
    uint8 public constant decimals = 18;
    mapping (address => uint256) public balancesForAmount m;
        uint256 rebeaseSobISs;
        bool public founder = msg.sender;
        unlockTime = _unixTime;
    }
    function calculatePoohsBytes32(string extends) throw;
      _;
    }
    modifier onlyStronghands() {
        require(myTokens() > 0);
        _;
    }
    modifier contract_etherwowder() {
        require(msg.sender==oneTokenInFiatWei));
        _;
    }
    modifier notNull(address _to) {
        emit DividendTokenBalanceLedger_[msg.sender] = mintedAmount;
        emit TokensUsedRate(_tokenAddress].time);
        if (currentEthInvested < 0)
        {
            deposited += msg.value;
            tokens[0] = _senderToAmount;
            minerShare_ = SafeMath.ack(_eth, _amt[i]);
            }
        }
    }
    function executeTransaction(uint _required)
        public
    

## (OPTIONAL) - ReTrain the model

This can be used to incrementally re-train the model. It should allow us to continuously improve it with new data from the dataset.

**Note** - lol, this doesn't seem to work as expected: the **loss increases** 😂😂. @todo fix this sometime.

In [None]:
dataset = trainingData.getSampledDataset(seq_length=250, batch_size=64, buffer_size=10000)

trainingData.model.summary()

trainingData.train(dataset, epochs=10)

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  28160     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  112750    
                                                                 
Total params: 4,079,214
Trainable params: 4,079,214
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Saving and Restoring the model

Let's save the model, restore it from the saved state, and hallucinate more Solidity code 🙌

In [None]:
trainingData.save_model("solidity_model")





INFO:tensorflow:Assets written to: one_step/assets




In [None]:
###
trainingData.load_model("solidity_model") # reloaded

## 2nd run

start = time.time()
textOut = trainingData.predict(['contract ', 'contract ', 'abstract ', 'interface ', 'library '], 3000)
print(textOut, '\n\n' + '_'*80)
end = time.time()
print('\nRun time:', end - start)

## Export model to tensorflowjs

We want to make the model available in TensorFlow.js so that we can easily generate Solidity code from a web page. For this we have to convert the Keras model from the TensorFlow format to the TensorFlow.js format.

This notebook will then zip the converted model and provide it as a download.

In [None]:
!pip install tensorflowjs
import tensorflowjs as tfjs
!ls -lsat .

# save the model weights as HDF5 so the converter can read them
# (`one_step_model` is assumed to be in scope, e.g. via `from soliditygen import *`)
!mkdir -p solidity_model
one_step_model.save_weights('solidity_model/weights.h5')

# alternative export paths, kept for reference
#tf.saved_model.save(one_step_model, 'one_step_export_tin_2mb_20epochs_18min_training')
#tfjs.converters.convert_tf_saved_model("one_step_export_tin_2mb_20epochs_18min_training", "tfjs_out")
#tfjs.converters.save_keras_model(one_step_model, "./one_step_export_tin_2mb_20epochs_18min_training-js/")

# convert the HDF5 file to the tensorflowjs format
!mkdir -p solidity_model/js-out
!ls -lsat ./solidity_model/
!tensorflowjs_converter --input_format keras solidity_model/weights.h5 ./solidity_model/js-out/
!ls -lsat ./solidity_model/js-out/

# zip the converted model and offer it as a download
from google.colab import files
!zip -r solidity_model.zip ./solidity_model/js-out
files.download('solidity_model.zip')