This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

[Ideas] Open ideas #460

Open
3 tasks done
sethtroisi opened this issue Sep 24, 2018 · 7 comments
sethtroisi commented Sep 24, 2018

Seth ideas

  • Virtual Batching ideas (re: eval effects of virtual loss, see [Data] effect of virtual_loss #427)
    • Only add X from a batch of Y to the tree and put the rest in NNCache to use if they are needed later (this is basically a different version of Speculative Execution; a sketch follows this list)
  • Supervised Eval Doc
    • Learning-rate cut in SL experiments
  • SL training with a learning-rate cut and more steps...
  • Train some smaller distilled models and test them at time parity
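
To make the NNCache bullet above more concrete, here is a minimal sketch of the "expand X of Y, cache the rest" idea. The interfaces (select_leaves, network.run_many, leaf.expand, leaf.backup_value, the nn_cache mapping) and the X/Y split are assumptions for illustration, not Minigo's actual code.

```python
# Hypothetical sketch of speculative batching: evaluate a batch of Y leaves,
# expand only the first X in the tree, and stash the remaining evaluations in
# NNCache so they are free if the search reaches those positions later.
def speculative_batch_step(root, network, nn_cache, select_leaves,
                           batch_size=32, expand_count=8):
    leaves = select_leaves(root, batch_size)          # Y candidate leaves
    policies, values = network.run_many([leaf.position for leaf in leaves])

    for i, leaf in enumerate(leaves):
        if i < expand_count:
            # First X leaves: expand into the tree and back up the value.
            leaf.expand(policies[i])
            leaf.backup_value(values[i])
        else:
            # Remaining Y - X leaves: cache the evaluation only. If the tree
            # reaches one of these positions later, the NN call is free.
            nn_cache[leaf.position] = (policies[i], values[i])
```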

Ideas inspired by @lightvector and KataGo

  • NNCache
    • Turning off Tree Reuse
  • Ownership head (a sketch follows this list)
  • Score distribution Head
  • Score Maximization ("Score Utility")
  • Playout oscillation
  • Forking games (early for diversity, late for Komi, ...)
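
As a reference for the ownership-head bullet, here is a minimal Keras-style sketch of a KataGo-inspired auxiliary ownership head attached to the shared trunk. The layer sizes, the loss weight, and the assumption that the trunk output has shape [N, 19, 19, C] are illustrative, not KataGo's or Minigo's actual configuration.

```python
import tensorflow as tf

def ownership_head(trunk):
    """Auxiliary head predicting per-point ownership in [-1, 1]
    from the current player's perspective (trunk: [N, 19, 19, C])."""
    x = tf.keras.layers.Conv2D(8, 3, padding='same', activation='relu')(trunk)
    return tf.keras.layers.Conv2D(1, 1, activation='tanh', name='ownership')(x)

def ownership_loss(target_ownership, predicted_ownership, weight=0.15):
    """Squared-error ownership loss, added on top of the usual
    policy and value losses with a small weight."""
    return weight * tf.reduce_mean(tf.square(target_ownership - predicted_ownership))
```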

Ideas inspired by LZ

  • SWA: initial proof of concept in Oneoff SWA script #283, but more work needed
  • Visits "time management" (stopping when the 2nd-most-visited move can't overtake the first; a sketch follows this list)
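
A minimal sketch of the visit-based early stop described in the last bullet: once the gap between the most-visited and second-most-visited root children exceeds the remaining visit budget, the ordering cannot change, so the search can stop. root.child_visits() and the fixed visit budget are assumed interfaces.

```python
def should_stop_search(root, visits_done, visit_budget):
    """Return True when the 2nd-most-visited move can no longer
    overtake the most-visited one with the remaining budget."""
    counts = sorted(root.child_visits(), reverse=True)
    if len(counts) < 2:
        return False
    best, second = counts[0], counts[1]
    remaining = visit_budget - visits_done
    return best - second > remaining
```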

Ideas from AG/AGZ/AZ papers

Ideas from elsewhere

Done

sethtroisi commented Oct 2, 2018

Ideas for reweighting the value target z:

z = z * move_num / length
z = z / 2 + q / 2
z = z * false_positive_rate (in resign-disabled games)

Also: a higher learning rate early in training.
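
A minimal sketch of the value-target adjustments listed above, written as one function over a single training position. The function name, the resign_disabled flag, and the way the three ideas are combined are assumptions for illustration; each line mirrors one of the formulas above.

```python
def adjust_value_target(z, q, move_num, length,
                        resign_disabled=False, false_positive_rate=0.05):
    """Reweight the value target z for one training position.

    z: final game result from the current player's perspective (+1/-1)
    q: MCTS value estimate (Q) for the position, in [-1, 1]
    """
    # Down-weight early positions, where the final result is mostly noise.
    z = z * move_num / length

    # Blend the game outcome with the search value estimate.
    z = z / 2 + q / 2

    # In resign-disabled games, scale z by the resign false-positive rate.
    if resign_disabled:
        z = z * false_positive_rate

    return z
```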

amj added the discussion label Oct 4, 2018
sethtroisi commented:

research on network size

sethtroisi commented:

Adding stuff about distillation and Seth ideas

sethtroisi pinned this issue Mar 14, 2019
sethtroisi commented:

Checking if eval games have enough diversity and using this opening panel

leela-zero/leela-zero#2104

Ishinoshita commented:

@sethtroisi Re: "time management" from LZ, I'm concerned it might be detrimental for self-play and RL, as it amounts to a sort of policy sharpening: cutting the search early means low-policy moves won't get any visits and will be trained towards 0. That may hinder the learning of new things.

IMHO, the key to saving compute budget might truly be KataGo's variable-visits scheme: fewer visits for searches that only pick game moves, full visits for searches used as policy training targets.

And both types of KataGo's searches could benefit from the KLD-threshold trick from LC0, which sounds very appealing for the policy, though it is much more complex to implement ;-)
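
For reference, a minimal sketch of the LC0-style KLD-gain stopping idea mentioned above: periodically snapshot the root visit distribution and stop the search once the KL divergence per added visit falls below a threshold. The check interval, threshold, and the root.child_visits() / run_batch interfaces are assumptions, not LC0's or Minigo's actual implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) between two (unnormalized) visit-count vectors."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def search_with_kld_stop(root, run_batch, max_visits,
                         check_every=100, min_gain_per_visit=2e-4):
    """Run search in small batches, stopping early once the root visit
    distribution stops changing (KLD gain per visit below threshold)."""
    prev = root.child_visits()
    visits = 0
    while visits < max_visits:
        run_batch(root, check_every)      # assumed: runs `check_every` readouts
        visits += check_every
        cur = root.child_visits()
        if kl_divergence(cur, prev) / check_every < min_gain_per_visit:
            break
        prev = cur
    return visits
```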

sethtroisi commented:

From Brian Lee:

One concrete idea: instead of selecting 2% flat from the last 50 generations, select 4%->0% over the last 50 generations, with some sort of exponentially decaying curve, and also make this parameter configurable. Early on, we might want to have 10% -> 0% over the last ~10 generations of data, but later on we might want to flatten that curve to select 2% -> 0% over the last 100 generations.
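
A minimal sketch of what such a configurable, decaying sampling window could look like; the exponential shape, the normalization so that the total amount of sampled data matches a flat rate over the same window, and all parameter names are illustrative assumptions.

```python
import numpy as np

def sampling_fractions(num_generations=50, flat_rate=0.02, decay=0.9):
    """Per-generation sampling fractions, newest generation first,
    decaying exponentially toward zero for the oldest generations.

    The fractions are scaled so their total equals flat_rate * num_generations,
    i.e. the same overall amount of data as sampling flat_rate from each of
    the last num_generations generations."""
    weights = decay ** np.arange(num_generations)
    weights = weights / weights.sum()
    return weights * flat_rate * num_generations

# Early in a run: sample aggressively from a short, recent window.
early = sampling_fractions(num_generations=10, flat_rate=0.05, decay=0.7)
# Later: flatten the curve over a longer window.
late = sampling_fractions(num_generations=100, flat_rate=0.02, decay=0.97)
```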
