# 00 - Environment Setup and Verification

This notebook sets up the complete environment for smart contract vulnerability detection using ML.

**Goals:**
- Install and import all required libraries
- Verify GPU availability and compute environment
- Set random seeds for reproducibility
- Show the project directory structure and create missing directories
- Test loading the CodeBERT tokenizer and model
- Install and test Slither and Mythril
- Save system information for reproducibility

## 1. System Information and Environment

In [1]:
import sys
import os
import platform
from datetime import datetime
import subprocess

print("=" * 50)
print("SYSTEM INFORMATION")
print("=" * 50)
print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Platform: {platform.platform()}")
print(f"Python Version: {sys.version}")
print(f"Working Directory: {os.getcwd()}")
print(f"Python Executable: {sys.executable}")

SYSTEM INFORMATION
Date: 2025-11-10 22:48:34
Platform: Linux-6.14.0-32-generic-x86_64-with-glibc2.39
Python Version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
Working Directory: /home/virtualvasu/Desktop/sem5/ml_project/smart-contract-vuln-detector/notebooks
Python Executable: /home/virtualvasu/Desktop/sem5/ml_project/smart-contract-vuln-detector/venv/bin/python


## 2. Install Required Packages

In [6]:
# Install core ML and data science packages.
# Version specifiers are quoted: unquoted, the shell parses `>=4.30.0` itself
# and the install fails (zsh reports "zsh:1: 4.30.0 not found").
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
%pip install "transformers>=4.30.0"
%pip install "tokenizers>=0.13.0"
%pip install "scikit-learn>=1.3.0"
%pip install "pandas>=2.0.0"
%pip install "numpy>=1.24.0"
%pip install "matplotlib>=3.7.0"
%pip install "seaborn>=0.12.0"
%pip install "plotly>=5.14.0"
%pip install "datasets>=2.12.0"

Looking in indexes: https://download.pytorch.org/whl/cu118
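
Installing from Python instead of through the shell avoids quoting problems, because each requirement reaches pip as its own argument. The following is a minimal sketch under that assumption. The helper names `build_pip_command` and `install` are hypothetical, and the command targets the interpreter the kernel is running.

```python
import subprocess
import sys

# Requirement strings taken from the install cell in this section.
REQUIREMENTS = [
    "transformers>=4.30.0",
    "tokenizers>=0.13.0",
    "scikit-learn>=1.3.0",
    "pandas>=2.0.0",
    "numpy>=1.24.0",
    "matplotlib>=3.7.0",
    "seaborn>=0.12.0",
    "plotly>=5.14.0",
    "datasets>=2.12.0",
]


def build_pip_command(requirements):
    """Return the argv list that installs `requirements` with the kernel's pip."""
    return [sys.executable, "-m", "pip", "install", *requirements]


def install(requirements):
    """Run pip without a shell, so `>=` is never interpreted as a redirection."""
    subprocess.run(build_pip_command(requirements), check=True)


# Usage (not executed here): install(REQUIREMENTS)
```

With `check=True`, a failed install raises `subprocess.CalledProcessError` instead of passing silently.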


In [7]:
# Install additional tools for code analysis and visualization
%pip install tree-sitter
%pip install pygments
%pip install tqdm
%pip install wandb  # For experiment tracking (optional)
%pip install ipywidgets  # For interactive notebook widgets

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [10]:
# Install missing packages that might not have installed properly
%pip install seaborn
%pip install plotly
%pip install datasets

Collecting seaborn
  Using cached seaborn-0.13.2-py3-none-any.whl.metadata (5.4 kB)
Using cached seaborn-0.13.2-py3-none-any.whl (294 kB)
Installing collected packages: seaborn
Successfully installed seaborn-0.13.2
Note: you may need to restart the kernel to use updated packages.
Collecting plotly
  Using cached plotly-6.4.0-py3-none-any.whl.metadata (8.5 kB)
Collecting narwhals>=1.15.1 (from plotly)
  Using cached narwhals-2.11.0-py3-none-any.whl.metadata (11 kB)
Downloading plotly-6.4.0-py3-none-any.whl (9.9 MB)

## 3. Import Libraries and Check Versions

In [11]:
# Core ML libraries
print("Importing PyTorch...")
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
print("✅ PyTorch imported successfully")

# Transformers for CodeBERT
print("Importing transformers...")
try:
    from transformers import (
        AutoTokenizer, 
        AutoModel, 
        AutoModelForSequenceClassification,
        TrainingArguments,
        Trainer
    )
    print("✅ Transformers imported successfully")
except ImportError as e:
    print(f"❌ Failed to import transformers: {e}")
    print("Installing transformers now...")
    import subprocess
    import sys
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'transformers>=4.30.0'])
    # Try importing again
    from transformers import (
        AutoTokenizer, 
        AutoModel, 
        AutoModelForSequenceClassification,
        TrainingArguments,
        Trainer
    )
    print("✅ Transformers imported after installation")

# Data science libraries
print("Importing data science libraries...")
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
print("✅ Data science libraries imported")

# ML utilities
print("Importing ML utilities...")
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report
)
print("✅ ML utilities imported")

# Utility libraries
print("Importing utility libraries...")
import json
import glob
import re
from pathlib import Path
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')
print("✅ Utility libraries imported")

print("\n🎉 All libraries imported successfully!")

Importing PyTorch...
✅ PyTorch imported successfully
Importing transformers...
✅ Transformers imported successfully
Importing data science libraries...
✅ Data science libraries imported
Importing ML utilities...
✅ ML utilities imported
Importing utility libraries...
✅ Utility libraries imported

🎉 All libraries imported successfully!


In [5]:
# Debug: Check Python environment and installed packages
import sys
import subprocess

print("🔍 DEBUGGING ENVIRONMENT")
print("=" * 50)
print(f"Python executable: {sys.executable}")
print(f"Python path: {sys.path[:3]}...")  # Show first 3 paths

# Check if transformers is actually installed
try:
    result = subprocess.run([sys.executable, '-m', 'pip', 'list'], 
                          capture_output=True, text=True)
    if 'transformers' in result.stdout:
        print("✅ transformers found in pip list")
        # Get the exact version
        for line in result.stdout.split('\n'):
            if line.startswith('transformers'):
                print(f"   {line}")
    else:
        print("❌ transformers NOT found in pip list")
        print("Installing transformers directly...")
        subprocess.run([sys.executable, '-m', 'pip', 'install', 'transformers>=4.30.0'])
        print("✅ transformers installed")
except Exception as e:
    print(f"Error checking packages: {e}")

# Try importing transformers
try:
    import transformers
    print(f"✅ transformers imported successfully: {transformers.__version__}")
except ImportError as e:
    print(f"❌ Failed to import transformers: {e}")
    # Try installing again with specific method
    print("Attempting to install transformers...")
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--upgrade', 'transformers'])

🔍 DEBUGGING ENVIRONMENT
Python executable: /home/virtualvasu/Desktop/sem5/ml_project/smart-contract-vuln-detector/venv/bin/python
Python path: ['/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload']...
✅ transformers found in pip list
   transformers             4.57.1
✅ transformers imported successfully: 4.57.1


In [12]:
# Check versions of critical packages
import sklearn
import transformers
import matplotlib

print("=" * 50)
print("PACKAGE VERSIONS")
print("=" * 50)
print(f"PyTorch: {torch.__version__}")
print(f"Transformers: {transformers.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"NumPy: {np.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print(f"Matplotlib: {matplotlib.__version__}")
print(f"Seaborn: {sns.__version__}")

PACKAGE VERSIONS
PyTorch: 2.7.1+cu118
Transformers: 4.57.1
Pandas: 2.3.3
NumPy: 2.3.4
Scikit-learn: 1.7.2
Matplotlib: 3.10.6
Seaborn: 0.13.2
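
The printed versions can also be checked against the minimums requested at install time. The sketch below uses only the standard library and assumes that comparing the leading numeric release components is sufficient, so pre-release tags and local suffixes such as `+cu118` are ignored. The helper names are hypothetical.

```python
import re
from importlib import metadata

# Minimum versions from the install cell; keys are pip distribution names.
MINIMUMS = {
    "transformers": "4.30.0",
    "scikit-learn": "1.3.0",
    "pandas": "2.0.0",
    "numpy": "1.24.0",
}


def version_tuple(version):
    """Leading numeric release components, e.g. '2.7.1+cu118' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets_minimum(installed, minimum):
    """Compare two version strings after zero-padding to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_minimums(minimums):
    """Map each package to 'ok', 'too old (<installed>)' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


for name, status in check_minimums(MINIMUMS).items():
    print(f"{name}: {status}")
```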


## 4. GPU and Compute Environment Check

In [13]:
print("=" * 50)
print("COMPUTE ENVIRONMENT")
print("=" * 50)

# Check CUDA availability
print(f"CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"GPU Count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"  Memory: {torch.cuda.get_device_properties(i).total_memory / 1e9:.1f} GB")
    
    # Set default device
    device = torch.device("cuda")
    print(f"\n🚀 Using GPU: {torch.cuda.get_device_name()}")
else:
    device = torch.device("cpu")
    print("⚠️  Using CPU (GPU not available)")

print(f"\nDefault Device: {device}")

COMPUTE ENVIRONMENT
CUDA Available: False
⚠️  Using CPU (GPU not available)

Default Device: cpu
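
A CUDA build of PyTorch (here `2.7.1+cu118`) still reports `CUDA Available: False` when no NVIDIA GPU or driver is visible to the process. One way to narrow this down, sketched below, is to check whether the driver's `nvidia-smi` tool is present and can list devices. The helper name `nvidia_driver_status` is hypothetical, and the check only indicates whether driver tooling is visible; it does not prove that PyTorch can use the GPU.

```python
import shutil
import subprocess


def nvidia_driver_status():
    """Return a short description of whether `nvidia-smi` is present and working."""
    path = shutil.which("nvidia-smi")
    if path is None:
        return "nvidia-smi not found on PATH (no NVIDIA driver tools visible)"
    try:
        # `nvidia-smi -L` lists the GPUs known to the driver.
        result = subprocess.run([path, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"nvidia-smi could not be run: {exc}"
    if result.returncode != 0:
        return f"nvidia-smi failed: {result.stderr.strip()}"
    return result.stdout.strip() or "nvidia-smi reported no GPUs"


print(nvidia_driver_status())
```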


## 5. Set Random Seeds for Reproducibility

In [14]:
import random

# Set random seeds
SEED = 42

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

print(f"✅ Random seeds set to {SEED} for reproducibility")

# Set matplotlib style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

print("✅ Plotting style configured")

✅ Random seeds set to 42 for reproducibility
✅ Plotting style configured
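
Later notebooks may need to re-seed, for example before each training run. A minimal sketch of a reusable helper follows. The name `set_seed` is hypothetical, and NumPy and PyTorch are seeded only if they can be imported. Re-seeding and drawing again shows that the Python generator repeats its sequence.

```python
import os
import random


def set_seed(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True: the same seed reproduces the same draws
```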


## 6. Project Directory Structure

In [15]:
# Display project structure
def show_directory_tree(path, prefix="", max_depth=3, current_depth=0):
    """Display directory tree structure"""
    if current_depth >= max_depth:
        return
    
    path = Path(path)
    items = sorted(path.iterdir(), key=lambda x: (x.is_file(), x.name))
    
    for i, item in enumerate(items):
        is_last = i == len(items) - 1
        current_prefix = "└── " if is_last else "├── "
        print(f"{prefix}{current_prefix}{item.name}")
        
        if item.is_dir() and not item.name.startswith('.') and current_depth < max_depth - 1:
            extension = "    " if is_last else "│   "
            show_directory_tree(item, prefix + extension, max_depth, current_depth + 1)

print("=" * 50)
print("PROJECT DIRECTORY STRUCTURE")
print("=" * 50)
show_directory_tree(".", max_depth=4)

PROJECT DIRECTORY STRUCTURE
└── 00_setup_and_env.ipynb


In [16]:
# Create necessary directories if they don't exist
directories_to_create = [
    "data/processed",
    "models",
    "results/predictions",
    "results/visualizations",
    "results/metrics",
    "logs",
    "outputs"
]

for directory in directories_to_create:
    Path(directory).mkdir(parents=True, exist_ok=True)
    print(f"✅ Created/verified directory: {directory}")

print("\n📁 All necessary directories created!")

✅ Created/verified directory: data/processed
✅ Created/verified directory: models
✅ Created/verified directory: results/predictions
✅ Created/verified directory: results/visualizations
✅ Created/verified directory: results/metrics
✅ Created/verified directory: logs
✅ Created/verified directory: outputs

📁 All necessary directories created!
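
Because the notebook runs with `notebooks/` as its working directory, the relative paths above resolve inside `notebooks/`. If the directories are meant to live at the repository root, one option is to locate the root first. The sketch below walks upward until it finds a marker entry. The helper name `find_project_root` and the marker names `venv` and `.git` are assumptions chosen for illustration.

```python
from pathlib import Path


def find_project_root(start, markers=("venv", ".git")):
    """Walk upward from `start` and return the first directory containing a marker."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    return start  # fall back to the starting directory if no marker is found


root = find_project_root(Path.cwd())
print(f"Project root: {root}")
# Paths could then be built from the root, e.g. root / "data" / "processed"
```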


## 7. Test CodeBERT Model Loading

In [17]:
# Test CodeBERT model and tokenizer loading
print("=" * 50)
print("TESTING CODEBERT MODEL LOADING")
print("=" * 50)

try:
    # Load CodeBERT tokenizer
    model_name = "microsoft/codebert-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    print(f"✅ CodeBERT tokenizer loaded: {model_name}")
    print(f"   Vocab size: {tokenizer.vocab_size}")
    print(f"   Max length: {tokenizer.model_max_length}")
    
    # Test tokenization with a simple code example
    test_code = """
    function transfer(address _to, uint256 _value) public {
        require(balances[msg.sender] >= _value);
        balances[msg.sender] -= _value;
        balances[_to] += _value;
    }
    """
    
    tokens = tokenizer(test_code, truncation=True, padding=True, return_tensors="pt")
    print("✅ Test tokenization successful")
    print(f"   Input shape: {tokens['input_ids'].shape}")
    print(f"   Sample tokens: {tokenizer.decode(tokens['input_ids'][0][:20])}...")
    
    # Load model (just for testing - we'll train our own)
    model = AutoModel.from_pretrained(model_name)
    model = model.to(device)
    print(f"✅ CodeBERT model loaded and moved to {device}")
    print(f"   Model parameters: {sum(p.numel() for p in model.parameters()):,}")
    
    # Clean up memory
    del model, tokenizer, tokens
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
except Exception as e:
    print(f"❌ Error loading CodeBERT: {e}")
    print("   This might be due to network issues or missing dependencies")

TESTING CODEBERT MODEL LOADING
✅ CodeBERT tokenizer loaded: microsoft/codebert-base
   Vocab size: 50265
   Max length: 512
✅ Test tokenization successful
   Input shape: torch.Size([1, 87])
   Sample tokens: <s>
    function transfer(address _to, uint256 _value) public {
...
✅ CodeBERT model loaded and moved to cpu
   Model parameters: 124,645,632
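
The parameter count gives a quick lower bound on memory use. The arithmetic below is a rough sketch. It assumes 4 bytes per fp32 parameter and 2 per fp16 parameter, and uses the common rule of thumb that fp32 training with Adam holds about four copies of the weights (weights, gradients and two moment buffers) before activations are counted.

```python
N_PARAMS = 124_645_632  # parameter count printed in the output above


def weights_gb(n_params, bytes_per_param=4):
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


print(f"fp32 weights: {weights_gb(N_PARAMS):.2f} GB")     # 0.50 GB
print(f"fp16 weights: {weights_gb(N_PARAMS, 2):.2f} GB")  # 0.25 GB
# Weights + gradients + two Adam moment buffers, all in fp32, before activations.
print(f"fp32 + Adam:  {4 * weights_gb(N_PARAMS):.2f} GB")  # 1.99 GB
```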


## 8. Install and Test Slither

In [None]:
# Install Slither for static analysis
print("=" * 50)
print("INSTALLING SLITHER")
print("=" * 50)

try:
    # Install slither-analyzer into the kernel's environment.
    # check=True raises CalledProcessError if pip fails, so success is
    # reported only when the install actually worked.
    subprocess.run(
        [sys.executable, '-m', 'pip', 'install', 'slither-analyzer'],
        check=True
    )
    print("✅ Slither installed successfully")
    
    # Test slither installation
    result = subprocess.run(['slither', '--version'], capture_output=True, text=True)
    if result.returncode == 0:
        print(f"✅ Slither version: {result.stdout.strip()}")
    else:
        print(f"⚠️  Slither version check failed: {result.stderr}")
        
except Exception as e:
    print(f"❌ Error installing Slither: {e}")
    print("   You may need to install it manually or check system requirements")

In [None]:
# Test Slither on a sample contract
sample_contract = """
pragma solidity ^0.8.0;

contract VulnerableContract {
    mapping(address => uint256) public balances;
    
    function transfer(address _to, uint256 _value) public {
        // No balance check: reverts on underflow under Solidity >=0.8, wraps silently in older versions
        balances[msg.sender] -= _value;
        balances[_to] += _value;
    }
    
    function withdraw() public {
        uint256 amount = balances[msg.sender];
        // Vulnerable to reentrancy
        (bool success, ) = msg.sender.call{value: amount}("");
        require(success);
        balances[msg.sender] = 0;
    }
}
"""

# Save sample contract for testing
with open("test_contract.sol", "w") as f:
    f.write(sample_contract)

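try:
    # Run Slither on the sample contract (needs a solc compiler matching ^0.8.0)
    result = subprocess.run(
        ['slither', 'test_contract.sol', '--json', '-'], 
        capture_output=True, text=True, timeout=30
    )
    
    # Slither can exit non-zero when it reports findings, so judge success
    # from the JSON report rather than from the return code alone.
    if result.stdout.strip():
        slither_output = json.loads(result.stdout)
        detectors = slither_output.get('results', {}).get('detectors', [])
        if slither_output.get('success'):
            print("✅ Slither analysis completed")
            print(f"   Findings reported: {len(detectors)}")
        else:
            print(f"⚠️  Slither reported an error: {slither_output.get('error')}")
    else:
        print(f"⚠️  Slither test failed: {result.stderr}")
        
except Exception as e:
    print(f"⚠️  Slither test error: {e}")
finally:
    # Clean up the temporary contract file even if the run failed
    if os.path.exists("test_contract.sol"):
        os.remove("test_contract.sol")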
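
Once a JSON report is available, the findings can be grouped by severity. The sketch below runs on a hand-written sample instead of real Slither output. The field names `check`, `impact` and `confidence` are assumed to match the report format of the installed Slither version, and `summarize_findings` is a hypothetical helper.

```python
import json
from collections import Counter

# Hand-written sample in the shape of a Slither JSON report; values are illustrative only.
SAMPLE_REPORT = json.dumps({
    "success": True,
    "error": None,
    "results": {"detectors": [
        {"check": "reentrancy-eth", "impact": "High", "confidence": "Medium"},
        {"check": "solc-version", "impact": "Informational", "confidence": "High"},
        {"check": "low-level-calls", "impact": "Informational", "confidence": "High"},
    ]},
})


def summarize_findings(report_text):
    """Count findings per impact level and list the distinct detector names."""
    report = json.loads(report_text)
    detectors = report.get("results", {}).get("detectors", [])
    by_impact = Counter(d.get("impact", "Unknown") for d in detectors)
    checks = sorted({d.get("check", "unknown") for d in detectors})
    return dict(by_impact), checks


by_impact, checks = summarize_findings(SAMPLE_REPORT)
print(by_impact)
print(checks)
```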

## 9. Install and Test Mythril

In [None]:
# Install Mythril for symbolic execution analysis
print("=" * 50)
print("INSTALLING MYTHRIL")
print("=" * 50)

try:
    # Install mythril into the kernel's environment; check=True raises if pip fails
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'mythril'], check=True)
    print("✅ Mythril installed successfully")
    
    # Test mythril installation
    result = subprocess.run(['myth', 'version'], capture_output=True, text=True)
    if result.returncode == 0:
        print(f"✅ Mythril version: {result.stdout.strip()}")
    else:
        print(f"⚠️  Mythril version check failed: {result.stderr}")
        
except Exception as e:
    print(f"❌ Error installing Mythril: {e}")
    print("   Mythril requires additional system dependencies (solc, etc.)")
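
Both analyzers are command-line tools, so a successful `pip install` does not guarantee that their executables, or the `solc` compiler they depend on, are reachable from the kernel. The sketch below checks this before any analysis is attempted. The helper name `find_tools` is hypothetical.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(["slither", "myth", "solc"]).items():
    print(f"{name}: {path or 'not found on PATH'}")
```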

## 10. Save System Information

In [18]:
# Collect and save system information for reproducibility
system_info = {
    "timestamp": datetime.now().isoformat(),
    "platform": platform.platform(),
    "python_version": sys.version,
    "working_directory": os.getcwd(),
    "cuda_available": torch.cuda.is_available(),
    "cuda_version": torch.version.cuda if torch.cuda.is_available() else None,
    "device_count": torch.cuda.device_count() if torch.cuda.is_available() else 0,
    "packages": {
        "torch": torch.__version__,
        "transformers": transformers.__version__,
        "pandas": pd.__version__,
        "numpy": np.__version__,
        "sklearn": sklearn.__version__,
    },
    "random_seed": SEED
}

if torch.cuda.is_available():
    system_info["gpu_info"] = [
        {
            "id": i,
            "name": torch.cuda.get_device_name(i),
            "memory_gb": torch.cuda.get_device_properties(i).total_memory / 1e9
        }
        for i in range(torch.cuda.device_count())
    ]

# Save system info
with open("outputs/system_info.json", "w") as f:
    json.dump(system_info, f, indent=2)

print("💾 System information saved to outputs/system_info.json")
print("\n" + json.dumps(system_info, indent=2))

💾 System information saved to outputs/system_info.json

{
  "timestamp": "2025-11-10T22:54:41.134765",
  "platform": "Linux-6.14.0-32-generic-x86_64-with-glibc2.39",
  "python_version": "3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]",
  "working_directory": "/home/virtualvasu/Desktop/sem5/ml_project/smart-contract-vuln-detector/notebooks",
  "cuda_available": false,
  "cuda_version": null,
  "device_count": 0,
  "packages": {
    "torch": "2.7.1+cu118",
    "transformers": "4.57.1",
    "pandas": "2.3.3",
    "numpy": "2.3.4",
    "sklearn": "1.7.2"
  },
  "random_seed": 42
}
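
The saved file is most useful when a later run is compared against it. The sketch below reports package versions that differ between a saved snapshot and a current one. The helper name `package_drift` is hypothetical, and the `current` values are invented for illustration.

```python
import json


def package_drift(saved, current):
    """Return {package: (saved_version, current_version)} for every mismatch."""
    names = sorted(set(saved) | set(current))
    return {
        name: (saved.get(name), current.get(name))
        for name in names
        if saved.get(name) != current.get(name)
    }


# Versions recorded above, as they appear under "packages" in system_info.json.
saved = json.loads('{"torch": "2.7.1+cu118", "transformers": "4.57.1", "numpy": "2.3.4"}')
# Hypothetical later environment, for illustration only.
current = {"torch": "2.7.1+cu118", "transformers": "4.58.0", "numpy": "2.3.4"}

print(package_drift(saved, current))
```

In practice `saved` would come from `json.load` on `outputs/system_info.json`, and `current` from the same version attributes used to build `system_info`.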


## 11. Environment Setup Summary

In [19]:
print("=" * 60)
print("🎉 ENVIRONMENT SETUP COMPLETE!")
print("=" * 60)

summary = f"""
✅ Setup Status:
   • Python {sys.version.split()[0]} configured
   • PyTorch {torch.__version__} installed
   • Transformers {transformers.__version__} ready
   • Device: {device}
   • Random seed: {SEED}
   • CodeBERT model tested
   • Project directories created
   • System info saved

🚀 Ready for next steps:
   1. Data acquisition and overview (01_data_acquisition_and_overview.ipynb)
   2. Preprocessing and function extraction
   3. Tokenization and dataset creation
   4. Model training and evaluation

📁 Key directories:
   • notebooks/ - Jupyter notebooks
   • data/processed/ - Processed datasets
   • models/ - Trained model checkpoints
   • results/ - Evaluation results and visualizations
   • outputs/ - Generated outputs and logs
"""

print(summary)

# Save summary to file (UTF-8, since the summary contains non-ASCII symbols)
with open("outputs/setup_summary.txt", "w", encoding="utf-8") as f:
    f.write(summary)

print("\n💾 Setup summary saved to outputs/setup_summary.txt")
print("\n🎯 Environment is ready! You can now proceed to the next notebook.")

🎉 ENVIRONMENT SETUP COMPLETE!

✅ Setup Status:
   • Python 3.12.3 configured
   • PyTorch 2.7.1+cu118 installed
   • Transformers 4.57.1 ready
   • Device: cpu
   • Random seed: 42
   • CodeBERT model tested
   • Project directories created
   • System info saved

🚀 Ready for next steps:
   1. Data acquisition and overview (01_data_acquisition_and_overview.ipynb)
   2. Preprocessing and function extraction
   3. Tokenization and dataset creation
   4. Model training and evaluation

📁 Key directories:
   • notebooks/ - Jupyter notebooks
   • data/processed/ - Processed datasets
   • models/ - Trained model checkpoints
   • results/ - Evaluation results and visualizations
   • outputs/ - Generated outputs and logs


💾 Setup summary saved to outputs/setup_summary.txt

🎯 Environment is ready! You can now proceed to the next notebook.
