Natural Language to Python Code Converter

A Python implementation for converting natural language descriptions into Python code using fine-tuned StarCoder2 and CodeT5 models.

🚀 Features

Dual Model Support: Works with both StarCoder2 and CodeT5, two open-access code models
Fine-tuning Capabilities: Customize models for your specific use cases
Batch Processing: Convert multiple descriptions at once
Error Handling: Robust error handling and dependency checking
Sample Data: Built-in sample dataset for quick testing
Flexible Architecture: Easy to extend and modify

📋 Requirements

System Requirements

Python 3.8 or higher
At least 8GB RAM (16GB recommended for training)
CUDA-compatible GPU (optional, for faster training)

Dependencies

torch>=1.9.0
transformers>=4.36.0
datasets>=2.0.0
accelerate>=0.20.0
sentencepiece>=0.1.99
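
Before loading a model, it can help to confirm that the minimum versions listed above are actually installed. The following is a minimal sketch using only the standard library. It is not the project's own `check_dependencies`; the names `MIN_VERSIONS`, `parse_version`, `is_older`, and `find_problems` are illustrative, and the version comparison is deliberately simple (it looks only at the leading numeric components and ignores pre-release tags).

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
MIN_VERSIONS = {
    "torch": "1.9.0",
    "transformers": "4.36.0",
    "datasets": "2.0.0",
    "accelerate": "0.20.0",
    "sentencepiece": "0.1.99",
}


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_older(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def find_problems(requirements=MIN_VERSIONS):
    """Return {package: reason} for packages that are missing or too old."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if is_older(installed, minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for package, reason in find_problems().items():
        print(f"{package}: {reason}")
```

An empty result from `find_problems()` means every listed package meets its minimum. It says nothing about whether the installed PyTorch build matches your CUDA version.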

🛠️ Installation

Option 1: Automatic Installation

git clone https://github.com/Krypto-Hashers-Community/Natural-language-to-Python-automation
cd Natural-language-to-Python-automation
pip install -r requirements.txt

Option 2: Manual Installation

Install PyTorch

# For CPU only
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Install other dependencies

pip install transformers datasets accelerate sentencepiece

Option 3: Using Conda

# Create a new environment
conda create -n nltocode python=3.9
conda activate nltocode

# Install PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# Install other packages
conda install transformers datasets sentencepiece -c huggingface
pip install accelerate

🚀 Quick Start

Basic Usage

from nl_to_code_converter import NLToCodeConverter

# Initialize the converter with StarCoder2 or CodeT5
converter = NLToCodeConverter("starcoder2")  # or "codet5"

# Convert natural language to Python code
code = converter.convert("Create a function that adds two numbers")
print(code)

Fine-tuning Example

# Prepare your training data
training_data = [
    {
        "natural_language": "Create a function that adds two numbers",
        "python_code": "def add_numbers(a, b):\n    return a + b"
    },
    # Add more examples...
]

# Fine-tune the model
converter = NLToCodeConverter("starcoder2")
converter.train(training_data, output_dir="./my_finetuned_model", epochs=3)

# Use the fine-tuned model
code = converter.convert("Write a function to multiply two numbers")

Batch Processing

descriptions = [
    "Create a list of even numbers from 0 to 20",
    "Write a function to check if a number is prime",
    "Create a dictionary with fruit names and colors"
]

codes = converter.batch_convert(descriptions)
for desc, code in zip(descriptions, codes):
    print(f"Input: {desc}")
    print(f"Output: {code}")
    print("-" * 50)

📊 Model Comparison

Feature	StarCoder2-3B	CodeT5
Architecture	Decoder-only (GPT-style)	Encoder-Decoder (T5-style)
Access	Open-access	Open-access
Training	Causal Language Modeling	Sequence-to-Sequence
Best For	Code completion, generation	Code translation, summarization
Context Length	Up to 16K tokens	512 tokens
Memory Usage	Moderate	Lower
Speed	Moderate	Faster
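
The "Training" row affects how each description/code pair is turned into a training example. The sketch below illustrates the difference under stated assumptions: the `# Task:` prompt template and both function names are hypothetical and are not taken from this project's preprocessing code.

```python
def to_causal_example(pair):
    """Decoder-only (StarCoder2-style): prompt and answer form one sequence.

    The model learns to continue the prompt, so the target code is
    appended directly after it.
    """
    prompt = f"# Task: {pair['natural_language']}\n"
    return {"text": prompt + pair["python_code"]}


def to_seq2seq_example(pair):
    """Encoder-decoder (CodeT5-style): description in, code out.

    The encoder reads the source and the decoder is trained on the target,
    so the two stay in separate fields.
    """
    return {"source": pair["natural_language"], "target": pair["python_code"]}


pair = {
    "natural_language": "Create a function that adds two numbers",
    "python_code": "def add_numbers(a, b):\n    return a + b",
}

print(to_causal_example(pair)["text"])
print(to_seq2seq_example(pair))
```

In the causal case the description and the code share one context window, which is one reason the longer context length of StarCoder2 matters for long prompts.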

🔧 Configuration

Model Parameters

# StarCoder2 Configuration
starcoder_converter = NLToCodeConverter("starcoder2")
starcoder_converter.fine_tuner.model_name = "bigcode/starcoder2-3b"

# CodeT5 Configuration  
codet5_converter = NLToCodeConverter("codet5")
codet5_converter.fine_tuner.model_name = "Salesforce/codet5-base"

Training Parameters

converter.train(
    training_data=your_data,
    output_dir="./custom_model",
    epochs=5,                    # Number of training epochs
    batch_size=4,                # Batch size for training
    learning_rate=5e-5,          # Learning rate
    max_length=512               # Maximum sequence length
)

📁 Data Format

Training Data Structure

[
    {
        "natural_language": "Create a function that calculates factorial",
        "python_code": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n-1)"
    },
    {
        "natural_language": "Write a loop to print numbers 1 to 10",
        "python_code": "for i in range(1, 11):\n    print(i)"
    }
]
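
A malformed record is easier to catch before training starts than halfway through an epoch. The sketch below checks a list against the structure shown above. `validate_training_data` is an illustrative helper, not part of the project's API, and the rule that both fields must be non-empty strings is an assumption.

```python
import json

REQUIRED_KEYS = ("natural_language", "python_code")


def validate_training_data(records):
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(records, list):
        return ["top-level value must be a list of objects"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"item {index}: expected an object")
            continue
        for key in REQUIRED_KEYS:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: '{key}' must be a non-empty string")
    return problems


# Second record is deliberately missing its "python_code" field.
raw = """
[
    {"natural_language": "Write a loop to print numbers 1 to 10",
     "python_code": "for i in range(1, 11):\\n    print(i)"},
    {"natural_language": "Missing the code field"}
]
"""
print(validate_training_data(json.loads(raw)))
```

Item indices are zero-based, so the problem reported here refers to the second record.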

Loading Custom Data

# From JSON file
converter.load_training_data("path/to/your/data.json")

# From Python list
custom_data = [
    {"natural_language": "...", "python_code": "..."},
]
converter.train(custom_data)

🔍 Troubleshooting

Common Issues

1. PyTorch Installation Error

ModuleNotFoundError: No module named 'torch._C'

Solution:

pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

2. CUDA Out of Memory

RuntimeError: CUDA out of memory

Solutions:

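Reduce the batch size: per_device_train_batch_size=1 (this corresponds to the batch_size argument of converter.train shown under Training Parameters)
Use gradient accumulation: gradient_accumulation_steps=8, which accumulates gradients over 8 small batches before each optimizer update
Train on the CPU: load the model with device_map=None and hide the GPU from PyTorch, for example by setting the CUDA_VISIBLE_DEVICES environment variable to an empty string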
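
The first two settings work together: a smaller per-device batch lowers peak memory, and gradient accumulation restores the number of examples that contribute to each optimizer update. The arithmetic can be sketched as follows. The helper names are illustrative, and the step count is an approximation of what a training loop would report.

```python
import math


def effective_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
    """Examples contributing to each optimizer update."""
    return per_device_batch_size * accumulation_steps * num_devices


def optimizer_steps_per_epoch(num_examples, per_device_batch_size,
                              accumulation_steps, num_devices=1):
    """Approximate number of optimizer updates in one pass over the data."""
    per_update = effective_batch_size(
        per_device_batch_size, accumulation_steps, num_devices
    )
    return math.ceil(num_examples / per_update)


# Settings from the list above: batch size 1 with 8 accumulation steps.
print(effective_batch_size(1, 8))              # 8
print(optimizer_steps_per_epoch(1000, 1, 8))   # 125
```

Accumulation does not reduce total compute. It only spreads one update across several smaller forward and backward passes.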

3. Model Download Issues

HTTPError: 403 Client Error

Solutions:

Check internet connection
Try using a VPN
Use local model files

Dependency Check

from nl_to_code_converter import check_dependencies
check_dependencies()

📈 Performance Tips

For Better Code Generation:

Use specific, clear descriptions

# Good
"Create a function that takes a list of numbers and returns the sum"

# Less specific
"Make a function for adding"

Include context in your training data

training_data = [
    {"natural_language": "Create a class with constructor", "python_code": "class MyClass:\n    def __init__(self, value):\n        self.value = value"},
    {"natural_language": "Write error handling code", "python_code": "try:\n    pass\nexcept Exception as e:\n    print(f'Error: {e}')"}
]

Fine-tune on domain-specific data, such as:
- Web scraping code examples
- Data science snippets
- API integration patterns

For Faster Training:

Use smaller models for prototyping
Implement early stopping
Use mixed precision training
Enable gradient checkpointing
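
Early stopping, mentioned in the list above, halts training once the validation loss stops improving. The sketch below shows the patience-based logic in plain Python. `EarlyStopper` is an illustrative class and the loss values are made up. Hugging Face Transformers provides `EarlyStoppingCallback` for its `Trainer`, but this README does not say whether `converter.train` exposes it.

```python
class EarlyStopper:
    """Stop when validation loss has not improved for `patience` checks."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Hypothetical per-epoch validation losses, for illustration only.
losses = [0.92, 0.71, 0.65, 0.66, 0.67, 0.60]
stopper = EarlyStopper(patience=2)
for epoch, loss in enumerate(losses, start=1):
    if stopper.should_stop(loss):
        print(f"Stopping after epoch {epoch}; best loss {stopper.best_loss}")
        break
```

With a patience of 2 the loop stops after the second epoch without improvement, so the later value 0.60 is never seen. A larger patience trades extra training time for a chance to recover from a temporary plateau.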

🧪 Testing

Run Basic Tests

python -c "from nl_to_code_converter import check_dependencies; check_dependencies()"

Test Code Generation

converter = NLToCodeConverter("starcoder2")
sample_data = converter.create_sample_data()
print(f"Sample data loaded: {len(sample_data)} examples")

# Test conversion
result = converter.convert("Create a simple calculator function")
print(f"Generated code: {result}")
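
Generated code is not guaranteed to be valid Python, so a cheap first check is to parse it without executing it. The sketch below uses the standard `ast` module. `check_generated_code` is an illustrative helper, not part of the project's API.

```python
import ast


def check_generated_code(code):
    """Return (True, None) if `code` parses as Python, else (False, message)."""
    try:
        ast.parse(code)
    except SyntaxError as error:
        return False, f"line {error.lineno}: {error.msg}"
    return True, None


good = "def add_numbers(a, b):\n    return a + b"
bad = "def add_numbers(a, b)\n    return a + b"  # missing colon

print(check_generated_code(good))
print(check_generated_code(bad))
```

Parsing only confirms that the syntax is valid. It does not show that the code does what the description asked, so it complements manual review or unit tests and does not replace them.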

📚 Examples

Example 1: Data Processing

# Input
description = "Create a function that filters even numbers from a list"

# Output
def filter_even_numbers(numbers):
    return [num for num in numbers if num % 2 == 0]

Example 2: File Operations

# Input
description = "Write code to read a CSV file and print the first 5 rows"

# Output
import pandas as pd
df = pd.read_csv('file.csv')
print(df.head())

Example 3: API Integration

# Input
description = "Create a function to make a GET request to an API"

# Output
import requests

def make_api_request(url):
    response = requests.get(url)
    return response.json()

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Setup

git clone <repository-url>
cd Natural-language-to-Python-automation
pip install -e .
pip install -r requirements-dev.txt

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

StarCoder2 by BigCode
CodeT5 by Salesforce
Hugging Face Transformers

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: bhowmicksaurav28@gmail.com

🔮 Roadmap

Support for more programming languages (JavaScript, Java, C++)
Web interface for easier usage
Integration with popular IDEs
Model quantization for mobile deployment
Real-time code suggestion API
Code explanation and documentation generation

Made with ❤️ for the developer community
