# English to Kiswahili Translation using RNNs
This notebook builds a Recurrent Neural Network (RNN), specifically a character-level LSTM encoder-decoder, to translate English text into Kiswahili. It covers environment setup, data preprocessing, model building, training, saving, and evaluation.

## 0. Setting Up a Virtual Environment:

### Setup on Windows:

In [None]:
# Install virtualenv if it is not installed
!pip install virtualenv

# Create a virtual environment
!virtualenv venv

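# Activate the virtual environment.
# Note: each `!` command runs in its own temporary subshell, so activating here
# does not affect later cells. Run these commands in a terminal before starting Jupyter.
!venv\Scripts\activate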

# Now, you can install dependencies within this environment
# Note: Use `deactivate` to exit the virtual environment when you are done.

### Setup on Linux:

In [None]:
# Install virtualenv if it is not installed
!pip install virtualenv

# Create a virtual environment
!virtualenv venv

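# Activate the virtual environment.
# Note: each `!` command runs in its own temporary subshell, so activating here
# does not affect later cells. Run these commands in a terminal before starting Jupyter.
!source venv/bin/activate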

# Now, you can install dependencies within this environment
# Note: Use `deactivate` to exit the virtual environment when you are done.

## 1. Installing Requirements:

In [None]:
# Run this cell to install required packages.
# Note: This step assumes that you have already set up a Python environment.

!pip install -r requirements.txt

## 2. Import Libraries:

In [None]:
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Embedding
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.callbacks import ModelCheckpoint

## 3. Data Preparation:

In [None]:
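def load_data(filepath):
    """Read tab-separated English/Kiswahili pairs, one pair per line."""
    with open(filepath, encoding='utf-8') as file:
        lines = file.read().split('\n')
    # Skip blank lines and lines that have no tab-separated translation
    pairs = [line.split('\t') for line in lines if '\t' in line]
    # '\t' marks the start and '\n' marks the end of each target sentence
    return [pair[0] for pair in pairs], ['\t' + pair[1] + '\n' for pair in pairs]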

def tokenize(texts):
    tokenizer = Tokenizer(char_level=True)
    tokenizer.fit_on_texts(texts)
    return tokenizer

def preprocess_data(eng_texts, swa_texts):
    eng_tokenizer = tokenize(eng_texts)
    swa_tokenizer = tokenize(swa_texts)
    eng_sequences = eng_tokenizer.texts_to_sequences(eng_texts)
    swa_sequences = swa_tokenizer.texts_to_sequences(swa_texts)
    eng_data = pad_sequences(eng_sequences, padding='post')
    swa_data = pad_sequences(swa_sequences, padding='post')
    return eng_data, swa_data, eng_tokenizer, swa_tokenizer

# Load and preprocess data
train_eng_texts, train_swa_texts = load_data('data/train.txt')
eng_data, swa_data, eng_tokenizer, swa_tokenizer = preprocess_data(train_eng_texts, train_swa_texts)
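
To make the preprocessing concrete, here is a minimal pure-Python sketch of what the steps above produce for two tab-separated lines. It does not use Keras: `build_char_index`, `encode`, and `pad_post` are hypothetical helpers that only approximate `Tokenizer(char_level=True)` and `pad_sequences(padding='post')`, and the sample sentence pairs are made up for illustration. One known difference is that the Keras `Tokenizer` orders ids by character frequency, while this sketch orders them alphabetically.

```python
def build_char_index(texts):
    """Map each distinct character to an id starting at 1 (0 is reserved for padding)."""
    chars = sorted({ch for text in texts for ch in text.lower()})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(text, char_index):
    """Turn a string into a list of character ids."""
    return [char_index[ch] for ch in text.lower()]

def pad_post(sequences, value=0):
    """Right-pad every sequence with `value` up to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [value] * (max_len - len(seq)) for seq in sequences]

# Made-up sample lines in the assumed "english<TAB>kiswahili" layout
lines = ["Hello\tHabari", "Thank you very much\tAsante sana"]
pairs = [line.split("\t") for line in lines]
eng_texts = [pair[0] for pair in pairs]
swa_texts = ["\t" + pair[1] + "\n" for pair in pairs]  # start and end markers

swa_index = build_char_index(swa_texts)
swa_padded = pad_post([encode(text, swa_index) for text in swa_texts])
vocab_size = len(swa_index) + 1  # +1 accounts for the padding id 0

print(swa_padded)
print("vocabulary size:", vocab_size)
```

The shorter sentence is filled with trailing zeros so that both rows have the same length, which is the shape the embedding layers with `mask_zero=True` expect.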

## 4. Model Building:

In [None]:
def create_model(num_encoder_tokens, num_decoder_tokens, latent_dim=256):
    # Encoder
    encoder_inputs = Input(shape=(None,))
    encoder_embedding = Embedding(num_encoder_tokens, latent_dim, mask_zero=True)
    encoder = LSTM(latent_dim, return_state=True)
    encoder_outputs, state_h, state_c = encoder(encoder_embedding(encoder_inputs))
    encoder_states = [state_h, state_c]

    # Decoder
    decoder_inputs = Input(shape=(None,))
    decoder_embedding = Embedding(num_decoder_tokens, latent_dim, mask_zero=True)
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_embedding(decoder_inputs), initial_state=encoder_states)
    decoder_dense = Dense(num_decoder_tokens, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

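# `num_words` is None unless set explicitly, so derive the vocabulary size from
# `word_index`; the +1 accounts for the padding index 0.
model = create_model(len(eng_tokenizer.word_index) + 1, len(swa_tokenizer.word_index) + 1)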

## 5. Model Training:

In [None]:
# Define training configurations
batch_size = 64
epochs = 30

# Training the model
model.fit([eng_data, swa_data[:, :-1]], np.expand_dims(swa_data[:, 1:], -1),
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
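
The slicing in the `model.fit` call implements teacher forcing: at each step the decoder is fed the true target character and is trained to predict the next one. A minimal sketch with a plain Python list, using made-up token ids (here 1 stands in for the start marker, 2 for the end marker, and 0 for padding):

```python
# One padded target sequence: start marker, three characters, end marker, padding
target = [1, 7, 4, 5, 2, 0, 0]

decoder_input = target[:-1]   # what the decoder is fed at each step
decoder_target = target[1:]   # what it should predict at each step

for step, (fed, expected) in enumerate(zip(decoder_input, decoder_target)):
    print(f"step {step}: fed {fed} -> predict {expected}")
```

Both slices have the same length, one less than the padded target, which is why `swa_data[:, :-1]` and `swa_data[:, 1:]` line up position by position.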

## 6. Save the Model:

In [None]:
model.save('models/rnn_model.h5')

## 7. Model Evaluation:

In [None]:
# Evaluation is not implemented yet. One option is to translate held-out sentences
# and compare them with reference translations using BLEU or a similar metric.
print("Evaluation not implemented: no metric has been computed yet.")
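
As a starting point for the evaluation step, here is a minimal sketch of a simplified, single-reference BLEU score in pure Python. It is an assumption about how evaluation could be done, not part of the original notebook: `ngrams` and `sentence_bleu` are hypothetical helpers, the example sentences are made up, and the score is computed on whitespace-separated words even though the model itself works on characters. Unlike library implementations, it applies no smoothing and returns 0.0 whenever an n-gram order has no overlap.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, candidate, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precision_sum += math.log(overlap / total)
    # Penalise candidates that are shorter than the reference
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_precision_sum / max_n)

# Made-up reference and model outputs, split into words
reference = "ninapenda kusoma vitabu kila siku".split()
perfect = sentence_bleu(reference, reference)
partial = sentence_bleu(reference, "ninapenda kusoma vitabu kila wiki".split())
print(f"perfect match: {perfect:.3f}, one word wrong: {partial:.3f}")
```

In practice the candidate would come from decoding the model's predictions on held-out sentences, and scores would be averaged over the whole test set.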

## Conclusion
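This notebook covered setting up the environment, preparing the data, and building, training, and saving a character-level LSTM encoder-decoder for English-to-Kiswahili translation. The evaluation step is still a placeholder and needs a metric such as BLEU to measure translation quality.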