# **Ethereum Smart Contracts Security Analysis**

In this notebook we analyze how well **LLMs** generate code, specifically Solidity smart contracts for the Ethereum blockchain.

---

(*At the moment we load a ready-made prompt dataset and no longer generate the prompts ourselves. The older code that generates prompts from a dataset of Solidity smart contracts is still included.*)

**Hugging Face Dataset**: *https://huggingface.co/datasets/braindao/Solidity-Dataset*

The steps will be the following:

1.   *Import contracts and turn them into prompt* (**OPTIONAL**)

1.   Import dataset containing the prompts for the generation

2.   Setup OpenAI with API key

3.   Setup a text to give as input to the LLM coder containing instructions for the generation

4.   **Generate smart contracts**





In [None]:
%%capture

from google.colab import drive
drive.mount('/content/drive')

#UNCOMMENT THE LINES BELOW IF YOU ARE USING THE CONTRACT DATASET FOR PROMPT GENERATION

#!unzip /content/drive/MyDrive/Ethereum_smart_contract_datast

#import shutil
#shutil.rmtree('/content/__MACOSX')
#shutil.rmtree('/content/Ethereum_smart_contract_datast')

**Set up OpenAI with its API key**

In [None]:
%%capture

!pip install openai
from openai import OpenAI

import os

#Read the API key from the environment instead of hardcoding it in the notebook
client = OpenAI(
    api_key = os.environ["OPENAI_API_KEY"],
)

## **Contract collection**

The following cell is used only if we generate the prompts ourselves.
To do so we load a dataset taken from GitHub that contains a large number of Ethereum smart contracts written in Solidity. We collect a fixed number of contracts and keep them in memory so that they can be passed to the prompt generator.

**YOU CAN SKIP IT IF YOU ARE USING A PROMPT DATASET**



In [None]:
import os

contract_codes = []
#The dataset is on my Google Drive
source_directory = '/content/Ethereum_smart_contract_datast/contract_dataset_github/'
codes_collected = 0
codes_needed = 500 #To increase the dataset size

for root, dirs, files in os.walk(source_directory):
    for file in files:
        if file.endswith('.sol'):
            file_path = os.path.join(root, file)
            with open(file_path, 'r') as contract_file: #Separate name, so the loop variable `file` is not shadowed
                contract_codes.append(contract_file.read()) #Add the contract code to the list contract_codes
                codes_collected += 1
                if codes_collected >= codes_needed:
                    break

    #Stop when the requested number of contracts has been reached
    if codes_collected >= codes_needed:
        break

#print(f"Contracts collected: {len(contract_codes)}")
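The same collection step can also be written with `pathlib`. The following is a minimal sketch, not the notebook's own method: `collect_contracts` is a hypothetical helper name, and the `errors="ignore"` setting is an assumption about the dataset (some files may not be valid UTF-8).

```python
from itertools import islice
from pathlib import Path


def collect_contracts(source_directory, limit):
    """Return the text of up to `limit` .sol files found under source_directory."""
    # sorted() makes the selection reproducible across runs
    sol_files = sorted(Path(source_directory).rglob("*.sol"))
    contracts = []
    for path in islice(sol_files, limit):
        # errors="ignore" drops bytes that are not valid UTF-8 (an assumption
        # about the dataset; remove it if every file is known to be clean)
        contracts.append(path.read_text(encoding="utf-8", errors="ignore"))
    return contracts
```

With the variables of the cell above, a call such as `collect_contracts(source_directory, codes_needed)` would produce a list playing the same role as `contract_codes`.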

In [None]:
#IF YOU WANT TO SEE ONE OF THE CONTRACTS COLLECTED
index_to_print = 100

if 0 <= index_to_print < len(contract_codes):
    print(f"Contract code {index_to_print + 1}:")
    print(contract_codes[index_to_print])
else:
    print("Invalid index")


## **PROMPT GENERATOR**

In this section we proceed to generate the prompts for the LLMs.

**YOU CAN SKIP IT IF YOU ARE USING A PROMPT DATASET**

To do so we define two alternative sets of instructions, which ask for different levels of detail in the generated prompt. **At the moment** the preferred one is **instructions1**.

In [None]:
instructions1 = """
Generate a prompt for an AI model for creating a contract starting from the smart contract code the user gives you.
Underline that it should be in solidity
The prompt should be a description of the contract, stating the purpose of it and its rules.
If I give the prompt to another AI model it should generate pretty much the same contract.
Specify the rules, the purpose and the general description of the code.
"""

instructions2 = """
Generate a prompt for creating a contract starting from the smart contract code the user gives you.
Underline that it should be in solidity
The prompt should be a description of the contract, stating the purpose of it and its rules.
If I give the prompt to another AI model it should generate pretty much the same contract.
It should be a simple description
Not possible to modify or add anything with respect to the given prompt
"""

The generator takes two parameters:


*   The instructions for the generation
*   The smart contract that should be turned into prompt



In [None]:
def prompt_generatorv1(smartcontract, generator_instructions):
    response = client.chat.completions.create(
        model = "gpt-3.5-turbo",
        messages = [
          {"role": "system", "content": generator_instructions},
          {"role": "user", "content": smartcontract},
        ]
    )

    generated_prompt = response.choices[0].message.content.strip()

    return generated_prompt
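Because the generator is called once per prompt, a single transient API error can interrupt a long run. The sketch below shows one possible way to retry a call with exponential backoff. `call_with_retries` is a hypothetical helper that is not part of the OpenAI library, and catching every `Exception` is a simplification made for illustration.

```python
import time


def call_with_retries(func, *args, max_attempts=3, base_delay=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # wait base_delay, then 2x, 4x, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```

A call could then look like `call_with_retries(prompt_generatorv1, selected_contract, instructions1)`.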

**GENERATE 500 PROMPTS SELECTING A RANDOM CONTRACT AND TURNING IT INTO PROMPT**

In [None]:
import random

prompts = []
how_many_prompts = 500

for _ in range(how_many_prompts):
    #Random contract selection
    contract_index = random.randint(0, len(contract_codes) - 1)
    selected_contract = contract_codes[contract_index]

    prompts.append(prompt_generatorv1(selected_contract, instructions1))
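Note that `random.randint` draws with replacement, so the same contract can be selected more than once. If distinct contracts are preferred, `random.sample` is one option. This is a minimal sketch: `pick_distinct_contracts` and its `seed` parameter are hypothetical names introduced here for illustration.

```python
import random


def pick_distinct_contracts(contracts, how_many, seed=None):
    """Pick up to `how_many` contracts at distinct positions, in random order."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    # random.sample raises ValueError when asked for more items than available
    how_many = min(how_many, len(contracts))
    return rng.sample(contracts, how_many)
```

The loop above could then iterate over `pick_distinct_contracts(contract_codes, how_many_prompts)` instead of drawing an index at each step.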


**CELL USED TO PRINT A SELECTED PROMPT JUST TO CHECK**

In [None]:
index_to_print = 14

if 0 <= index_to_print < len(prompts):
    print(f"My generated prompt {index_to_print + 1}:")
    print(prompts[index_to_print])
else:
    print("Invalid index")

**PROMPT DATASET EXPORT AS A CSV FILE**

In [None]:
import pandas as pd

data = {"Prompt": prompts}
df = pd.DataFrame(data)

#Exporting dataset
csv_file_path = "/content/test_prompts_dataset.csv"  #UNDER CONSTRUCTION
df.to_csv(csv_file_path, index=False)

print(f"DATASET SAVED IN {csv_file_path}")


## **DATASET LOADING FROM HUGGING FACE**

In this section we load the dataset from Hugging Face; the link is available at the beginning of this notebook.

In [None]:
import pandas as pd
from IPython.display import display

prompt_dataset = pd.read_parquet("hf://datasets/braindao/Solidity-Dataset/SolidityP.parquet")
#display(prompt_dataset[['average']])

**CELL TO CHECK THE CONTENT OF THE DATASET, WE ARE USING COLUMN AVERAGE**

In [None]:
sample_row = prompt_dataset.sample(n=1).iloc[0]
print(sample_row['average'])

Develop a smart contract that conforms to the IERC20 standard. Implement the totalSupply, balanceOf, transfer, approve, transferFrom, allowance functions. Use the SafeMath library to prevent arithmetic overflows. Emit events when transfers and approvals occur.


## **CODE GENERATION WITH GPT-4**

In this section we generate the code for the smart contracts using the GPT-4 model. To do so we define the instructions to give to the coder and decide how many contracts to generate; in our case the target is 500 smart contracts.




In [None]:
#CODER INSTRUCTIONS

coder = """
Give me a deployable smart contract code in solidity, using latest compiler version, based on the prompt
The file should contain ONLY solidity code, no comments or "```sol"
I should be able to copy your response and paste it in a sol file to deploy

"""

**CODE GENERATOR PARAMETERS**

*   **Coder instructions** defined above
*   **Prompt** used for the generation



In [None]:
def code_generator(prompt, coder_instructions):
    response = client.chat.completions.create(
        model = "gpt-4",
        messages = [
          {"role": "system", "content": coder_instructions},
          {"role": "user", "content": prompt},
        ]
    )

    generated_contract = response.choices[0].message.content.strip()

    return generated_contract

contracts_to_generate = 1 #SELECT NUMBER OF CONTRACTS NEEDED
generated_contracts = []
for i in range(contracts_to_generate):
    #Use row i so that each iteration draws a different prompt
    prompt = prompt_dataset.loc[i, 'average']
    generated_contracts.append((prompt, code_generator(prompt, coder)))

#Keep the last (prompt, contract) pair for the print below and for the save cell
prompt, contract = generated_contracts[-1]

#PRINT THE PROMPT USED
print("Prompt selected: \n\n")
print(prompt)
print("\n\n")

#PRINT THE CONTRACT GENERATED
print("Contract generated: \n\n")
print(contract)

Prompt selected: 


Create a smart contract that builds upon the WETHOmnibridgeRouter contract by adding features for account registration, token wrapping, and relay. The contract should integrate with the Omnibridge and WETH contracts. Include methods for registering and wrapping tokens, as well as functionality for relaying tokens to specific recipients. The contract should emit events upon successful token wrapping and relaying. Consider implementing  error handling and validation checks for user input.



Contract generated: 


pragma solidity ^0.8.9;

import "@openzeppelin/contracts/token/ERC20/IERC20.sol";
import "@openzeppelin/contracts/access/Ownable.sol";

interface IWETH {
    function deposit() external payable;
    function withdraw(uint wad) external;
    function balanceOf(address guy) external returns (uint);
    function approve(address guy, uint wad) external returns (bool);
}

interface IOmnibridge {
    function relayTokens(
        IERC20 _token,
        address _re

**SAVE THE CONTRACT GENERATED IN A .sol FILE**

In [None]:
file_path = "/content/generated_contract.sol"
with open(file_path, "w") as file:
    file.write(contract)

print(f"CONTRACT SAVED: {file_path}")

CONTRACT SAVED: /content/generated_contract.sol
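When more than one contract is generated, each one needs its own file. The following sketch shows one possible layout. `save_contracts` and the `contract_<index>.sol` naming scheme are assumptions made for illustration, not part of the original notebook.

```python
from pathlib import Path


def save_contracts(contracts, output_dir):
    """Write each contract to output_dir/contract_<index>.sol and return the paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    paths = []
    for index, code in enumerate(contracts):
        # zero-padded index keeps the files sorted in generation order
        path = out / f"contract_{index:03d}.sol"
        path.write_text(code, encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `save_contracts([contract], "/content/generated_contracts")` would write a single file named `contract_000.sol`.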


## **CODE GENERATION WITH DEEPSEEK-CODER**

In this section we generate the code for the smart contracts using the DeepSeek-Coder model. To do so we define the instructions to give to the coder and decide how many contracts to generate; in our case the target is 500 smart contracts.

In [None]:
#Remove the opening "```solidity" and closing "```" delimiters

def remove_code_delimiters(generated_code):
    #Split into lines
    lines = generated_code.split('\n')

    #Remove the opening delimiter if it's in the first line
    if lines[0].strip() == "```solidity":
        lines = lines[1:]

    #Remove the closing delimiter if it's in the last line
    if lines[-1].strip() == "```":
        lines = lines[:-1]

    #Join the lines back together
    cleaned_code = '\n'.join(lines)
    return cleaned_code
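The function above only handles a response whose first line is exactly the `solidity` fence and whose last line is the closing fence. Models sometimes use a different language tag or add text around the code, so a regex-based variant may be more tolerant. This is a sketch under that assumption: `extract_code` is a hypothetical name, and a response with an opening fence but no closing fence (for example a truncated one) is returned unchanged apart from whitespace trimming.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# One fenced block: optional language tag after the opening fence,
# then the body (captured), then the closing fence.
_FENCE_RE = re.compile(
    FENCE + r"[A-Za-z0-9_+-]*[ \t]*\n(.*?)\n?" + FENCE,
    re.DOTALL,
)


def extract_code(generated_text):
    """Return the body of the first fenced block, or the stripped text if none."""
    match = _FENCE_RE.search(generated_text)
    if match:
        return match.group(1).strip()
    return generated_text.strip()
```

It can be used in the same place as `remove_code_delimiters`, for instance `extract_code(response.choices[0].message.content)`.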

In [None]:
import os

deepseek_api_endpoint = "https://api.deepseek.com"
#Read the API key from the environment instead of hardcoding it in the notebook
deepseek_api_key = os.environ["DEEPSEEK_API_KEY"]

deepseek_coder_instructions = """I will provide you a prompt to generate a smart contract.
                                  The contract should be made only in SOLIDITY and it must be ready to deploy
                                  only code , DO NOT PUT ```solidity" AT THE BEGINNING OF THE RESPONSE NEITHER ``` AT THE END.
                                  I want a fully deployable file with ONLY CODE."""

deepseek_prompt = "Generate a simple hello world code in Python"

deepseek_client = OpenAI(api_key=deepseek_api_key, base_url=deepseek_api_endpoint)

response = deepseek_client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": deepseek_coder_instructions},
        {"role": "user", "content": prompt_dataset.loc[5, 'average']},
    ],
    stream=False
)

print("Prompt from dataset: ", prompt_dataset.loc[5, 'average'])
print("Generated code:\n", response.choices[0].message.content)
print("\nGenerated code without delimiters:\n", remove_code_delimiters(response.choices[0].message.content))

Prompt from dataset:  Design a smart contract that distributes tokens based on a time-vesting model with multiple claimable steps and a TGE event. The contract should allow users to claim tokens based on their proof of ownership and meet specific eligibility criteria. Incorporate a token, Merkle root, and TGE timestamp. Implement functions for checking and claiming tokens, utilizing import statements for OpenZeppelin libraries. Ensure events are triggered for claim, TGE, and step claim. The contract should have mappings for investors, categories, and claim states.
Generated code:
 ```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import "@openzeppelin/contracts/token/ERC20/IERC20.sol";
import "@openzeppelin/contracts/access/Ownable.sol";
import "@openzeppelin/contracts/utils/cryptography/MerkleProof.sol";

contract TokenVesting is Ownable {
    IERC20 public token;
    bytes32 public merkleRoot;
    uint256 public tgeTimestamp;

    struct VestingStep {
        uint2