qlearning

The qlearning package provides a series of interfaces and utilities to implement the Q-Learning algorithm in Go.
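For orientation, the core of tabular Q-learning is a single update rule. The following is a minimal sketch of that rule only; `QTable` and `Update` are hypothetical names chosen for illustration and are not taken from this package's API.

```go
package main

import "fmt"

// QTable is a hypothetical table: state -> action -> estimated value.
type QTable map[string]map[string]float64

// Update applies one tabular Q-learning step:
//   Q(s,a) += alpha * (reward + gamma*max_a' Q(s',a') - Q(s,a))
// Unknown states and actions are treated as having value 0.
func (q QTable) Update(state, action string, reward float64, next string, alpha, gamma float64) {
	if q[state] == nil {
		q[state] = map[string]float64{}
	}
	// Find the best known value in the next state (0 if nothing is known).
	best, seen := 0.0, false
	for _, v := range q[next] {
		if !seen || v > best {
			best, seen = v, true
		}
	}
	old := q[state][action]
	q[state][action] = old + alpha*(reward+gamma*best-old)
}

func main() {
	q := QTable{}
	// One step with learning rate 0.5 and discount 0.9.
	q.Update("s0", "a", 1.0, "s1", 0.5, 0.9)
	fmt.Println(q["s0"]["a"]) // prints 0.5
}
```

Here `alpha` is the learning rate and `gamma` the discount factor; both values in `main` are arbitrary.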

This project was largely inspired by flappybird-qlearning-bot.

Until a release is tagged, qlearning should be considered highly experimental and mostly a fun toy.

This repository includes some refactoring, adds the ability to store the Q-table in a file, and allows the next action to be selected by means other than purely random choice. It also adds another example that solves the N-queens problem (with some reservations).
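As a rough illustration of these two ideas, the sketch below persists a Q-table as JSON and selects actions epsilon-greedily instead of purely at random. It does not use this package's API: `Table`, `Save`, `Load`, and `Choose` are hypothetical names, and the JSON format and epsilon-greedy strategy are assumptions, not necessarily what the package does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
	"os"
	"path/filepath"
)

// Table is a hypothetical Q-table: state -> action -> estimated value.
type Table map[string]map[string]float64

// Save writes the table to path as JSON.
func (t Table) Save(path string) error {
	data, err := json.Marshal(t)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// Load reads a table previously written by Save.
func Load(path string) (Table, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	t := Table{}
	err = json.Unmarshal(data, &t)
	return t, err
}

// Choose returns a random action with probability epsilon and otherwise the
// highest-valued known action for state. actions must be non-empty.
func (t Table) Choose(rng *rand.Rand, state string, actions []string, epsilon float64) string {
	if epsilon > 0 && rng.Float64() < epsilon {
		return actions[rng.Intn(len(actions))]
	}
	best := actions[0]
	for _, a := range actions[1:] {
		if t[state][a] > t[state][best] {
			best = a
		}
	}
	return best
}

func main() {
	t := Table{"s0": {"left": 0.1, "right": 0.7}}
	path := filepath.Join(os.TempDir(), "qtable.json")
	if err := t.Save(path); err != nil {
		panic(err)
	}
	defer os.Remove(path)

	loaded, err := Load(path)
	if err != nil {
		panic(err)
	}
	rng := rand.New(rand.NewSource(1))
	// Explore 10% of the time, exploit the stored values otherwise.
	fmt.Println(loaded.Choose(rng, "s0", []string{"left", "right"}, 0.1))
}
```

With `epsilon` set to 0 the choice is always greedy; with 1 it is always random, which corresponds to the purely random selection mentioned above.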

Installation

$ go get github.com/temorfeouz/qlearning

Quickstart

qlearning provides example implementations in the examples directory of the project.

hangman.go provides a naive implementation of Hangman for use with qlearning.

$ cd $GOPATH/src/github.com/temorfeouz/qlearning/examples
$ go run hangman.go -h
Usage of hangman:
  -debug
        Set debug
  -games int
        Play N games (default 5000000)
  -progress int
        Print progress messages every N games (default 1000)
  -wordlist string
        Path to a wordlist (default "./wordlist.txt")
  -words int
        Use N words from wordlist (default 10000)

By default, running hangman.go will play millions of games against a 10,000-word corpus. That's overkill for just trying out qlearning. You can run it against fewer words and for fewer games using the -words and -games flags.

$ go run hangman.go -words 100 -progress 1000 -games 5000
100 words loaded
1000 games played: 92 WINS 908 LOSSES 9% WIN RATE
2000 games played: 447 WINS 1553 LOSSES 36% WIN RATE
3000 games played: 1064 WINS 1936 LOSSES 62% WIN RATE
4000 games played: 1913 WINS 2087 LOSSES 85% WIN RATE
5000 games played: 2845 WINS 2155 LOSSES 93% WIN RATE

Agent performance: 5000 games played, 2845 WINS 2155 LOSSES 57% WIN RATE

The "WIN RATE" in each progress report is isolated to that cycle (a group of 1000 games in this example), while the WINS and LOSSES counts are cumulative. The win rate is meant to show how quickly the agent is learning: if it is "learning", the win rate should keep increasing until it converges.
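To make the per-cycle figure concrete, the sketch below recomputes the win rates from the cumulative WINS counts in the sample run above. `cycleWinRate` is a hypothetical helper, and rounding to the nearest whole percent is an assumption that happens to reproduce the printed figures; hangman.go may compute them differently.

```go
package main

import "fmt"

// cycleWinRate returns the win percentage, rounded to the nearest integer,
// for the games played between two reports. prevWins and wins are
// cumulative win counts; games is the number of games in the cycle.
func cycleWinRate(prevWins, wins, games int) int {
	return (100*(wins-prevWins) + games/2) / games
}

func main() {
	// Cumulative WINS taken from the sample output above.
	cumulativeWins := []int{92, 447, 1064, 1913, 2845}
	const perCycle = 1000

	prev := 0
	for i, wins := range cumulativeWins {
		fmt.Printf("%d games played: %d%% WIN RATE\n",
			(i+1)*perCycle, cycleWinRate(prev, wins, perCycle))
		prev = wins
	}
	// The final summary line covers all games rather than a single cycle.
	fmt.Printf("overall: %d%% WIN RATE\n",
		cycleWinRate(0, prev, perCycle*len(cumulativeWins)))
}
```

Run as written, this prints 9%, 36%, 62%, 85%, and 93% for the five cycles and 57% overall, matching the sample run.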

As you can see, after 5000 games, the agent is able to "learn" and play hangman against a 100-word vocabulary.

Usage

See godocs for the package documentation.
