<a href="https://colab.research.google.com/github/mia1996/rlcard-tutoirial/blob/master/leduc_holdem_cfr.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# <a href='https://github.com/datamllab/rlcard'> <center> <img src='https://miro.medium.com/max/1000/1*_9abDpNTM9Cbsd2HEXYm9Q.png' width=500 class='center' /></a> 

## **Training CFR on Leduc Hold'em**
In this tutorial, we will showcase a more advanced algorithm CFR, which uses `step` and `step_back` to traverse the game tree.

First, we install RLcard and PyTorch.

In [1]:
!pip install 'rlcard[torch]'

Collecting rlcard[torch]
  Using cached rlcard-1.2.0.tar.gz (269 kB)
  Preparing metadata (setup.py) ... [?25ldone
Collecting termcolor (from rlcard[torch])
  Obtaining dependency information for termcolor from https://files.pythonhosted.org/packages/d9/5f/8c716e47b3a50cbd7c146f45881e11d9414def768b7cd9c5e6650ec2a80a/termcolor-2.4.0-py3-none-any.whl.metadata
  Downloading termcolor-2.4.0-py3-none-any.whl.metadata (6.1 kB)
Collecting torch (from rlcard[torch])
  Obtaining dependency information for torch from https://files.pythonhosted.org/packages/ad/08/c5e41eb22323db4a52260607598a207a2e1918916ae8201aa7a8ae005fcd/torch-2.3.0-cp311-none-macosx_11_0_arm64.whl.metadata
  Downloading torch-2.3.0-cp311-none-macosx_11_0_arm64.whl.metadata (26 kB)
Collecting GitPython (from rlcard[torch])
  Obtaining dependency information for GitPython from https://files.pythonhosted.org/packages/e9/bd/cc3a402a6439c15c3d4294333e13042b915bbeab54edc457c723931fed3f/GitPython-3.1.43-py3-none-any.whl.me

Then we import all the classes and functions we need.

In [2]:
import rlcard
from rlcard.agents import (
    CFRAgent,
    RandomAgent,
    NFSPAgent,
)
from rlcard.utils import (
    tournament,
    Logger,
    plot_curve,
)

We make two environments, where one allows `step_back` so that CFR can traverse the tree, and the other for evaluation only.

In [3]:
env = rlcard.make(
        'limit-holdem',
        config={
            'allow_step_back': True,
        }
    )
eval_env = rlcard.make(
    'limit-holdem',
)

We create the CFR agent.

In [4]:
agent = NFSPAgent(
    env,
    "experiments/limit_holdem_cfr_result/cfr_model",
)

Here, we save the trained model in the path `experiments/leduc_holdem_cfr_result/cfr_model`. Then we use a random agent as the opponent.

In [5]:
eval_env.set_agents([
    agent,
    RandomAgent(num_actions=env.num_actions),
])

Now we start training for `1000` iterations, i.e., 1000 games.

In [None]:
with Logger("experiments/limit_holdem_cfr_result") as logger:
    for episode in range(1000):
        agent.train()
        print('\rIteration {}'.format(episode), end='')
        # Evaluate the performance. Play with Random agents.
        if episode % 50 == 0:
            logger.log_performance(
                env.timestep,
                tournament(
                    eval_env,
                    10000,
                )[0]
            )

    # Get the paths
    csv_path, fig_path = logger.csv_path, logger.fig_path

Iteration 0
----------------------------------------
  episode      |  45920
  reward       |  0.0196
----------------------------------------
Iteration 50
----------------------------------------
  episode      |  2341920
  reward       |  0.0259
----------------------------------------
Iteration 100
----------------------------------------
  episode      |  4637920
  reward       |  -0.02425
----------------------------------------
Iteration 143

We can plot the learning curve

In [None]:
plot_curve(csv_path, fig_path, 'cfr')
agent.save()

Good job! Now you have your trained CFR agent on Leduc Hold'em!