### Importing Necessary Libraries
- **NumPy** (`np`): numerical operations and array handling, including the Q-table
- **Matplotlib** (`plt`): plotting the environment and the reward history

In [None]:
import numpy as np
import matplotlib.pyplot as plt

### AutonomousCar Class

The `AutonomousCar` class implements a basic tabular Q-learning agent that decides how the car moves.

- **`__init__`**: Creates a Q-table of zeros with one row per state and one column per action, and stores the learning rate, discount factor, and exploration rate (`epsilon`).
- **`choose_action`**: Selects an action for the current state with an ε-greedy policy. With probability `epsilon` it explores by picking a random action; otherwise it exploits by picking the action with the highest Q-value.
- **`learn`**: Applies the Q-learning update, moving the value of the action just taken toward the received reward plus the discounted best Q-value of the next state.

By repeating these updates over many steps, the agent learns which moves earn rewards and which lead to penalties.
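The update applied by `learn` is the standard Q-learning rule, with α = `learning_rate` and γ = `discount_factor`:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

For example, with the defaults α = 0.1 and γ = 0.9, a fresh entry $Q(s, a) = 0$ whose next state still has all-zero values becomes $0 + 0.1\,(1 + 0.9 \cdot 0 - 0) = 0.1$ after one reward of $+1$. Under the ε-greedy policy with ε = 0.1 and three actions, the random branch can also pick the greedy action, so (assuming a unique best action) the greedy action is chosen with probability

$$
1 - \varepsilon + \frac{\varepsilon}{|A|} = 1 - 0.1 + \frac{0.1}{3} \approx 0.933.
$$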


In [None]:
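class AutonomousCar:
    """Tabular Q-learning agent: one row per state (car position), one column per action."""

    def __init__(self, states, actions, learning_rate=0.1, discount_factor=0.9, epsilon=0.1):
        self.q_table = np.zeros((states, actions))  # Q-values start at zero
        self.learning_rate = learning_rate          # alpha: step size of each update
        self.discount_factor = discount_factor      # gamma: weight given to future rewards
        self.epsilon = epsilon                      # probability of taking a random action

    def choose_action(self, state):
        """Epsilon-greedy action selection."""
        if np.random.uniform(0, 1) < self.epsilon:
            return np.random.choice(self.q_table.shape[1])  # Explore: random action
        # Exploit: best-known action (np.argmax breaks ties toward index 0, 'left')
        return np.argmax(self.q_table[state, :])

    def learn(self, state, action, reward, next_state):
        """Move Q[state, action] toward reward + gamma * max over a' of Q[next_state, a']."""
        predict = self.q_table[state, action]
        target = reward + self.discount_factor * np.max(self.q_table[next_state, :])
        self.q_table[state, action] += self.learning_rate * (target - predict)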
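`learn` bootstraps from `next_state` on every step, including the collision step that ends the episode. Standard Q-learning uses only the reward as the target for a terminal transition. Below is a minimal standalone sketch of that variant; `q_update` and its `done` parameter are hypothetical names for illustration and are not part of `AutonomousCar`:

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state,
             learning_rate=0.1, discount_factor=0.9, done=False):
    """One Q-learning update that skips bootstrapping once the episode has ended.

    Hypothetical helper: the `done` flag is an assumption of this sketch;
    AutonomousCar.learn above does not take it.
    """
    predict = q_table[state, action]
    if done:
        target = reward  # terminal transition: no future value to add
    else:
        target = reward + discount_factor * np.max(q_table[next_state, :])
    q_table[state, action] += learning_rate * (target - predict)
    return q_table[state, action]


q = np.zeros((5, 3))
print(q_update(q, 2, 1, 1, 2))               # stay in the middle lane
print(q_update(q, 2, 1, 1, 2))               # same move again
print(q_update(q, 1, 0, -10, 0, done=True))  # collision ends the episode
```

With `done=False`, the last call would also add `0.9 * max(q[0])` to the target; here that term happens to be 0 only because row 0 has not been updated yet.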

### DrivingEnvironment Class

The `DrivingEnvironment` class simulates a simplified road on which the car must stay near the middle while avoiding a single obstacle.

- **`__init__`**: Creates a road with `size` positions (default 5) and defines the possible actions (`left`, `stay`, `right`, with indices 0, 1, 2).
- **`reset`**: Places the car in the middle position, places the obstacle at a random position, and returns the car's position as the initial state.
- **`step`**: Moves the car one position left or right (clamped at the road edges) or leaves it in place, then returns the new position, a reward, and a `done` flag. Hitting the obstacle gives -10 and ends the episode; being in the middle position gives +1; any other position gives -1.

This small, fully controlled setting makes it easy to train and test the agent.
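The movement and reward rules of `step` can be written as a pure function of the car position, the action, and the obstacle position, which makes them easy to check without running an episode. A sketch; `transition` is a hypothetical helper, not part of the class:

```python
def transition(car_position, action, obstacle_position, size=5):
    """Return (next_position, reward, done) following the rules of DrivingEnvironment.step.

    Actions: 0 = left, 1 = stay, 2 = right. Moves are clamped to the road edges.
    """
    if action == 0:
        car_position = max(0, car_position - 1)
    elif action == 2:
        car_position = min(size - 1, car_position + 1)

    if car_position == obstacle_position:
        return car_position, -10, True   # collision ends the episode
    if car_position == size // 2:
        return car_position, 1, False    # in the middle lane
    return car_position, -1, False       # away from the middle lane


# Obstacle at position 0, car starting in the middle (position 2):
print(transition(2, 0, 0))  # (1, -1, False): one step left, off the middle
print(transition(1, 0, 0))  # (0, -10, True): drives into the obstacle
print(transition(4, 2, 0))  # (4, -1, False): already at the right edge
```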


In [None]:
class DrivingEnvironment:
    """A road of `size` positions; the car must avoid one randomly placed obstacle."""

    def __init__(self, size=5):
        self.size = size
        self.actions = ['left', 'stay', 'right']  # action indices 0, 1, 2
        self.reset()

    def reset(self):
        """Put the car in the middle position and place the obstacle at random."""
        self.car_position = self.size // 2
        # Note: the obstacle can land on the car's starting position
        self.obstacle_position = np.random.randint(0, self.size)
        return self.car_position

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        # Move the car, clamped to the road edges; action 1 ('stay') leaves it in place
        if action == 0:  # left
            self.car_position = max(0, self.car_position - 1)
        elif action == 2:  # right
            self.car_position = min(self.size - 1, self.car_position + 1)

        # Reward the new position; only a collision ends the episode
        if self.car_position == self.obstacle_position:
            reward = -10  # Collision
            done = True
        elif self.car_position == self.size // 2:
            reward = 1  # In the middle lane
            done = False
        else:
            reward = -1  # Away from the middle lane
            done = False

        return self.car_position, reward, done

### visualize_environment Function

The `visualize_environment` function provides a visual representation of the driving environment.

- **`ax`**: The matplotlib axes object on which the environment is drawn.
- **`env`**: The current `DrivingEnvironment` instance.
- **`car_position`**: The car's current position on the road.
- **`action`** (optional): Index of the action taken (0 = left, 1 = stay, 2 = right), shown as the plot title.
- **`reward`** (optional): The reward for the current step, shown in green if positive and red otherwise.
- **`total_reward`** (default 0): The cumulative reward collected so far in the episode.

The function draws the road as a horizontal line, marks the car with a blue circle and the obstacle with a red cross, and labels each road cell on the x-axis (`C` for the car, `X` for the obstacle, `_` for empty). It also displays the action taken, the step's reward, and the total reward.
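For quick logging without matplotlib, the same cell labels can be rendered as a single string. A minimal sketch; `render_road` is a hypothetical helper that reuses the labelling logic of `visualize_environment`:

```python
def render_road(size, car_position, obstacle_position):
    """Text version of the road labels drawn by visualize_environment.

    '_' marks an empty cell, 'X' the obstacle, and 'C' the car. The car is
    written last, so it hides the obstacle when both share a cell.
    """
    road = ['_'] * size
    road[obstacle_position] = 'X'
    road[car_position] = 'C'
    return ''.join(road)


print(render_road(5, 2, 4))  # __C_X
print(render_road(5, 0, 0))  # C____ (collision: car drawn over the obstacle)
```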


In [None]:

def visualize_environment(ax, env, car_position, action=None, reward=None, total_reward=0):
    """Draw the road, car, and obstacle on `ax`, with action and reward annotations."""
    ax.clear()
    # Text label for each road cell; the car label overwrites the obstacle on a collision
    road = ['_'] * env.size
    road[env.obstacle_position] = 'X'
    road[car_position] = 'C'

    ax.set_xlim(-0.5, env.size - 0.5)
    ax.set_ylim(-1, 1)
    ax.set_yticks([])
    ax.set_xticks(range(env.size))
    ax.set_xticklabels(road)

    ax.axhline(y=0, color='k', linestyle='-', linewidth=2)  # the road
    ax.plot(car_position, 0, 'bo', markersize=20, label='Car')
    ax.plot(env.obstacle_position, 0, 'rx', markersize=20, label='Obstacle')

    center = (env.size - 1) / 2  # horizontal middle of the road cells 0..size-1
    if action is not None:
        ax.set_title(f"Action: {env.actions[action]}", fontsize=16)
    if reward is not None:
        color = 'green' if reward > 0 else 'red'
        ax.text(center, 0.5, f"Reward: {reward}", ha='center', va='center', fontsize=16, color=color)
    ax.text(center, -0.5, f"Total Reward: {total_reward}", ha='center', va='center', fontsize=16)

    ax.legend(loc='upper left')

### train_and_visualize Function

The `train_and_visualize` function trains an autonomous car agent in a driving environment over a specified number of episodes and visualizes the training process.

- **`episodes`**: The number of training episodes (default is 10).

**Process:**
1. **Initialize Environment and Agent**: Creates an instance of `DrivingEnvironment` and `AutonomousCar`.
2. **Training Loop**: For each episode, the environment is reset, and the agent performs actions based on its policy.
   - **Action Selection**: Chooses an action using the agent's policy.
   - **Environment Step**: Takes a step in the environment and receives feedback (next state, reward, and whether the episode is done).
   - **Learning**: Updates the agent's knowledge using the Q-learning algorithm.
   - **Visualization**: Visualizes the environment and updates the plot with the current state, action, reward, and total reward.
3. **Recording Rewards**: Collects the total reward for each episode.
4. **Output**: Prints the total reward for each episode and returns the trained agent along with the list of rewards.

Each step is drawn as a separate frame. Episodes end when the car hits the obstacle, so the number of steps (and frames) per episode varies from run to run.
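For quicker experiments, the same loop can run without plotting. The following is a self-contained sketch that re-implements the environment rules and the Q-learning update inline. The name `train_headless`, the `max_steps` cap, the seeded `numpy.random.default_rng` generator, and the terminal-state handling are all assumptions of this sketch, not part of the notebook's code:

```python
import numpy as np


def train_headless(episodes=200, size=5, alpha=0.1, gamma=0.9, epsilon=0.1,
                   max_steps=50, seed=0):
    """Train a tabular Q-learning agent on the lane task without any plotting.

    Follows the rules of DrivingEnvironment / AutonomousCar above; max_steps,
    the seed, and skipping the bootstrap on collisions are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_actions = 3                        # 0 = left, 1 = stay, 2 = right
    q = np.zeros((size, n_actions))
    totals = []
    for _ in range(episodes):
        car, obstacle = size // 2, int(rng.integers(0, size))
        total = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection
            if rng.uniform() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(q[car]))
            # Move by -1, 0, or +1, clamped to the road edges
            nxt = min(size - 1, max(0, car + action - 1))
            if nxt == obstacle:
                reward, done = -10, True
            elif nxt == size // 2:
                reward, done = 1, False
            else:
                reward, done = -1, False
            # Q-learning update; no future value after a collision
            target = reward if done else reward + gamma * np.max(q[nxt])
            q[car, action] += alpha * (target - q[car, action])
            total += reward
            car = nxt
            if done:
                break
        totals.append(total)
    return q, totals


q, totals = train_headless(episodes=200)
print(np.round(q, 2))
print("mean total reward, last 20 episodes:", np.mean(totals[-20:]))
```

Because each step earns at most +1 and at most one step can cost -10, every episode total falls between -59 and 50 with the default `max_steps=50`.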


In [None]:
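from IPython.display import display  # render each frame inside the notebook


def train_and_visualize(episodes=10, max_steps=50):
    env = DrivingEnvironment()
    car = AutonomousCar(env.size, len(env.actions))
    rewards = []

    fig, ax = plt.subplots(figsize=(12, 6))

    for episode in range(episodes):
        state = env.reset()
        total_reward = 0
        done = False
        step = 0

        print(f"Episode {episode + 1}")

        # An episode ends only on a collision, so cap its length at max_steps
        while not done and step < max_steps:
            action = car.choose_action(state)
            next_state, reward, done = env.step(action)
            car.learn(state, action, reward, next_state)

            step += 1
            total_reward += reward
            visualize_environment(ax, env, next_state, action, reward, total_reward)
            # plt.show() closes the figure under Jupyter's inline backend, so later
            # frames would not appear; display(fig) renders the current frame each step
            display(fig)

            state = next_state

        rewards.append(total_reward)
        print(f"Episode {episode + 1} finished after {step} steps. Total reward: {total_reward}")

    plt.close(fig)  # avoid an extra copy of the last frame at the end of the cell
    return car, rewards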


### Training and Visualization

1. **Train the Agent**:
   - Calls `train_and_visualize(episodes=5)` to train the `AutonomousCar` agent in the `DrivingEnvironment` while visualizing each step.
   - Stores the trained agent in `trained_car` and the per-episode totals in `reward_history`.

2. **Plot the Rewards**:
   - Plots the total reward for each episode.
   - Adds axis labels, a title, a grid, and a legend so the reward trend across episodes is easy to read.

3. **Display the Final Q-table**:
   - Prints the learned Q-values of the trained `AutonomousCar` (one row per car position, one column per action).


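### Output

Final Q-table from one sample run of the code cell below (values differ between runs because the obstacle position and exploratory actions are random):

    [[ 0.          0.          0.        ]
     [-1.         -0.1         1.5969919 ]
     [-0.88585783  5.1653572  -1.9       ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]]

Rows are car positions 0 to 4 and columns are the actions `left`, `stay`, and `right`. Rows 0, 3, and 4 are still all zeros, which indicates the agent never acted from those positions in this run.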
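To read a policy out of a Q-table, take the highest-valued action in each row. A short sketch using the sample values above (the variable names are illustrative):

```python
import numpy as np

# Sample Q-table from the output above: rows = positions 0-4, columns = left, stay, right
q_table = np.array([
    [ 0.0,         0.0,        0.0      ],
    [-1.0,        -0.1,        1.5969919],
    [-0.88585783,  5.1653572, -1.9      ],
    [ 0.0,         0.0,        0.0      ],
    [ 0.0,         0.0,        0.0      ],
])
actions = ['left', 'stay', 'right']

# Greedy policy: the highest-valued action in each state.
# np.argmax returns the first maximum, so all-zero rows map to 'left'.
policy = [actions[i] for i in np.argmax(q_table, axis=1)]
for position, action in enumerate(policy):
    print(position, action)
```

The learned part of this policy moves the car right from position 1 and keeps it in the middle at position 2. The `left` entries for positions 0, 3, and 4 come only from tie-breaking over untrained zeros, not from learning.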


In [None]:

# Train the model with visualization
trained_car, reward_history = train_and_visualize(episodes=5)

# Plot the rewards
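plt.figure(figsize=(10, 5))
episode_numbers = range(1, len(reward_history) + 1)  # match the 1-based episode numbers printed during training
plt.plot(episode_numbers, reward_history, marker='o', linestyle='-', color='b', label='Total Reward per Episode')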
plt.title('Reward History', fontsize=18)
plt.xlabel('Episode', fontsize=14)
plt.ylabel('Total Reward', fontsize=14)
plt.grid(True)
plt.legend()
plt.show()

print("\nFinal Q-table:")
print(trained_car.q_table)
