# Reinforcement Learning

## Introduction

Reinforcement Learning (RL) is a subfield of machine learning that focuses on training agents to make sequences of decisions in an environment to maximize a cumulative reward. RL is inspired by behavioral psychology and is used for solving problems where an agent interacts with an environment, learns from its actions, and adapts its behavior over time. One of the key components of RL is the concept of learning by trial and error.

## Terminology in Reinforcement Learning

- **Agent**: The entity that interacts with the environment and makes decisions to maximize cumulative rewards.

- **Environment**: The external system or context in which the agent operates and from which it receives feedback and rewards.

- **State (s)**: A representation of the current situation or configuration of the environment that the agent uses to make decisions.

- **Action (a)**: A move or decision the agent takes at a given time step. The set of all moves available to the agent is called the action space.

- **Policy (π)**: The strategy or mapping from states to actions that the agent uses to make decisions.

- **Reward (r)**: A numerical signal or feedback from the environment that indicates how good or bad an action or decision is.

- **Episode**: A complete sequence of interactions between the agent and the environment, from the initial state to a terminal state.

- **Value Function (V)**: A function that estimates the expected cumulative reward an agent can achieve from a given state, while following a particular policy.

- **Q-Value (Q)**: A function that estimates the expected cumulative reward an agent can achieve from a given state-action pair, while following a particular policy.
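To make the terms above concrete, here is a minimal sketch of how rewards, Q-values, and a policy fit together. It uses tabular Q-learning as one possible technique; the text above does not name a specific algorithm, so that choice is an assumption. Note that Q-learning estimates the Q-values of the greedy policy rather than of an arbitrary fixed one. The discount factor `gamma`, the learning rate `alpha`, and the function names are all illustrative and do not come from the original.

```python
from collections import defaultdict

def discounted_return(rewards, gamma=0.9):
    """Cumulative reward where a reward t steps ahead is weighted by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

def q_learning_update(q_table, state, action, reward, next_state, next_actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best follow-up action; 0.0 when next_state is terminal
    # (no actions left).
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]

def greedy_policy(q_table, state, actions):
    """Policy: pick the action with the highest estimated Q-value in this state."""
    return max(actions, key=lambda a: q_table[(state, a)])

# Hypothetical two-state example: action 1 in "s0" ends the episode with reward 1.
q = defaultdict(float)
q_learning_update(q, state="s0", action=1, reward=1.0,
                  next_state="s1", next_actions=[])

print(discounted_return([0, 0, 1], gamma=0.5))  # 0.25
print(greedy_policy(q, "s0", actions=[0, 1]))   # 1
```

The table is keyed by `(state, action)` pairs, which only works when states are hashable and few enough to enumerate. For a game the size of Connect Four, a function approximator would likely replace the table.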

## Reinforcement Learning with Connect Four

In our RL project, we will apply reinforcement learning techniques to train an agent to play Connect Four. The agent will learn to choose moves that lead to a win, or at least a draw, against a human opponent. This project will showcase how RL can be used to tackle complex board games by learning from interactions with the game environment.
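One way to map Connect Four onto the terminology above is sketched below: the state is the board, an action is a column index, and the reward depends on the game outcome. The helper names (`encode_state`, `legal_actions`) and the reward values are assumptions made for illustration; the project description does not specify them.

```python
ROWS, COLUMNS = 6, 7
EMPTY = ' '

def encode_state(board):
    """Turn the mutable list-of-lists board into a hashable state key."""
    return tuple(tuple(row) for row in board)

def legal_actions(board):
    """Actions are column indices; a column is playable while its top cell is empty."""
    return [col for col in range(COLUMNS) if board[0][col] == EMPTY]

# Hypothetical reward scheme (an assumption, not taken from the project text).
REWARDS = {"win": 1.0, "draw": 0.0, "loss": -1.0, "ongoing": 0.0}

board = [[EMPTY] * COLUMNS for _ in range(ROWS)]
for row in board:
    row[3] = 'X'  # fill column 3 completely so it is no longer playable

print(legal_actions(board))  # [0, 1, 2, 4, 5, 6]
```

Rewarding only the final outcome keeps the signal simple, but it makes learning slower because most moves receive no immediate feedback.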


## Implementation

```python
# Define the game board dimensions
ROWS = 6
COLUMNS = 7

# Create an empty game board
def create_board():
    return [[' ' for _ in range(COLUMNS)] for _ in range(ROWS)]

# Display the current game board
def display_board(board):
    for row in board:
        print("|".join(row))
        print("-" * (COLUMNS * 2 - 1))

# Check if a column is full (row 0 is the top of the board)
def is_column_full(board, col):
    return board[0][col] != ' '

# Make a move and update the board
def make_move(board, col, player):
    # Scan from the bottom row upwards and fill the first empty cell
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == ' ':
            board[row][col] = player
            return True
    return False

# Check if there is a winner
def check_winner(board, player):
    # Check horizontal
    for row in range(ROWS):
        for col in range(COLUMNS - 3):
            if board[row][col] == player and board[row][col + 1] == player and board[row][col + 2] == player and board[row][col + 3] == player:
                return True

    # Check vertical
    for row in range(ROWS - 3):
        for col in range(COLUMNS):
            if board[row][col] == player and board[row + 1][col] == player and board[row + 2][col] == player and board[row + 3][col] == player:
                return True

    # Check diagonal (top-left to bottom-right)
    for row in range(ROWS - 3):
        for col in range(COLUMNS - 3):
            if board[row][col] == player and board[row + 1][col + 1] == player and board[row + 2][col + 2] == player and board[row + 3][col + 3] == player:
                return True

    # Check diagonal (bottom-left to top-right)
    for row in range(3, ROWS):
        for col in range(COLUMNS - 3):
            if board[row][col] == player and board[row - 1][col + 1] == player and board[row - 2][col + 2] == player and board[row - 3][col + 3] == player:
                return True

    return False

# Main game loop
def main():
    board = create_board()
    current_player = 'X'

    while True:
        display_board(board)
        try:
            col = int(input(f"Player {current_player}, enter column (0-6): "))
        except ValueError:
            # Non-numeric input would otherwise crash the game
            print("Invalid move. Try again.")
            continue

        if col < 0 or col >= COLUMNS or is_column_full(board, col):
            print("Invalid move. Try again.")
            continue

        if make_move(board, col, current_player):
            if check_winner(board, current_player):
                display_board(board)
                print(f"Player {current_player} wins!")
                break

            # The board is full when no empty cell remains
            if ' ' not in [cell for row in board for cell in row]:
                display_board(board)
                print("It's a draw!")
                break

            current_player = 'O' if current_player == 'X' else 'X'

if __name__ == "__main__":
    main()
```


```text
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
Player X, enter column (0-6): 3
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | |X| | | 
-------------
Player O, enter column (0-6): 2
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | |O|X| | | 
-------------
Player X, enter column (0-6): 3
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | |X| | | 
-------------
 | |O|X| | | 
-------------
Player O, enter column (0-6): 3
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | | | | | 
-------------
 | | |O| | | 
-------------
 | | |X| | | 
-------------
 | |O|X| | | 
-------------
Player X, enter column (0-6): 2
```
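The interactive session above needs two human players. As a possible next step toward RL, the sketch below plays one full episode without user input by choosing moves uniformly at random and recording the resulting transitions, which is the kind of experience a learning agent would train on. It is self-contained and reimplements the board logic in compact form. The names `drop_piece`, `wins_at`, and `play_random_episode`, the reward of 1.0 for a winning move, and the random policy itself are assumptions for illustration and are not part of the original implementation.

```python
import random

ROWS, COLUMNS, EMPTY = 6, 7, ' '

def legal_actions(board):
    """Columns whose top cell is still empty."""
    return [c for c in range(COLUMNS) if board[0][c] == EMPTY]

def drop_piece(board, col, player):
    """Place the piece in the lowest empty cell of col; return its (row, col)."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == EMPTY:
            board[row][col] = player
            return row, col
    raise ValueError(f"column {col} is full")

def wins_at(board, row, col, player):
    """Check whether player's piece at (row, col) is part of four in a row."""
    # Directions: horizontal, vertical, and the two diagonals.
    for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):  # walk both ways along the line
            r, c = row + sign * d_row, col + sign * d_col
            while 0 <= r < ROWS and 0 <= c < COLUMNS and board[r][c] == player:
                count += 1
                r, c = r + sign * d_row, c + sign * d_col
        if count >= 4:
            return True
    return False

def play_random_episode(rng):
    """Play one game with uniformly random moves; return (transitions, outcome)."""
    board = [[EMPTY] * COLUMNS for _ in range(ROWS)]
    transitions = []  # (state, player, action, reward) tuples
    player = 'X'
    while True:
        state = tuple(tuple(r) for r in board)  # hashable snapshot before the move
        action = rng.choice(legal_actions(board))
        row, col = drop_piece(board, action, player)
        if wins_at(board, row, col, player):
            transitions.append((state, player, action, 1.0))
            return transitions, f"{player} wins"
        transitions.append((state, player, action, 0.0))
        if not legal_actions(board):
            return transitions, "draw"
        player = 'O' if player == 'X' else 'X'

transitions, outcome = play_random_episode(random.Random(0))
print(outcome, "after", len(transitions), "moves")
```

Checking only the lines through the most recent piece avoids rescanning the whole board after every move, which matters once thousands of episodes are simulated.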
