This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

[Ideas] Open ideas #460

Open
3 tasks done
sethtroisi opened this issue Sep 24, 2018 · 7 comments
sethtroisi commented Sep 24, 2018

Seth ideas

  • Virtual Batching ideas (re: eval effects of virtual loss, see [Data] effect of virtual_loss #427)
    • Only add X from a batch of Y to the tree and put the rest in NNCache to use if they are needed later (this is basically a different version of Speculative Execution; a sketch follows this list)
  • Supervised Eval Doc
    • Learning-rate cut in SL experiments
  • SL training with a learning-rate cut and more steps...
  • Train some smaller distilled models and test them at time parity
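
To make the NNCache bullet above more concrete, here is a minimal sketch of the "expand X of Y, cache the rest" idea. The interfaces (select_leaves, network.run_many, leaf.expand, leaf.backup_value, the nn_cache mapping) and the X/Y split are assumptions for illustration, not Minigo's actual code.

```python
# Hypothetical sketch of speculative batching: evaluate a batch of Y leaves,
# expand only the first X in the tree, and stash the remaining evaluations in
# NNCache so they are free if the search reaches those positions later.
def speculative_batch_step(root, network, nn_cache, select_leaves,
                           batch_size=32, expand_count=8):
    leaves = select_leaves(root, batch_size)          # Y candidate leaves
    policies, values = network.run_many([leaf.position for leaf in leaves])

    for i, leaf in enumerate(leaves):
        if i < expand_count:
            # First X leaves: expand into the tree and back up the value.
            leaf.expand(policies[i])
            leaf.backup_value(values[i])
        else:
            # Remaining Y - X leaves: cache the evaluation only. If the tree
            # reaches one of these positions later, the NN call is free.
            nn_cache[leaf.position] = (policies[i], values[i])
```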

Ideas inspired by @lightvector and KataGo

  • NNCache
    • Turning off Tree Reuse
  • Ownership head (a sketch follows this list)
  • Score distribution Head
  • Score Maximization ("Score Utility")
  • Playout oscillation
  • Forking games (early for diversity, late for Komi, ...)
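
As a reference for the ownership-head bullet, here is a minimal Keras-style sketch of a KataGo-inspired auxiliary ownership head attached to the shared trunk. The layer sizes, the loss weight, and the assumption that the trunk output has shape [N, 19, 19, C] are illustrative, not KataGo's or Minigo's actual configuration.

```python
import tensorflow as tf

def ownership_head(trunk):
    """Auxiliary head predicting per-point ownership in [-1, 1]
    from the current player's perspective (trunk: [N, 19, 19, C])."""
    x = tf.keras.layers.Conv2D(8, 3, padding='same', activation='relu')(trunk)
    return tf.keras.layers.Conv2D(1, 1, activation='tanh', name='ownership')(x)

def ownership_loss(target_ownership, predicted_ownership, weight=0.15):
    """Squared-error ownership loss, added on top of the usual
    policy and value losses with a small weight."""
    return weight * tf.reduce_mean(tf.square(target_ownership - predicted_ownership))
```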

Ideas inspired by LZ

  • SWA: initial proof of concept in Oneoff SWA script #283, but more work needed
  • Visits "time management" (stopping when the 2nd-most-visited move can't overtake the first; a sketch follows this list)
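
A minimal sketch of the visit-based early stop described in the last bullet: once the gap between the most-visited and second-most-visited root children exceeds the remaining visit budget, the ordering cannot change, so the search can stop. root.child_visits() and the fixed visit budget are assumed interfaces.

```python
def should_stop_search(root, visits_done, visit_budget):
    """Return True when the 2nd-most-visited move can no longer
    overtake the most-visited one with the remaining budget."""
    counts = sorted(root.child_visits(), reverse=True)
    if len(counts) < 2:
        return False
    best, second = counts[0], counts[1]
    remaining = visit_budget - visits_done
    return best - second > remaining
```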

Ideas from AG/AGZ/AZ papers

Ideas from elsewhere

Done

sethtroisi commented Oct 2, 2018

Ideas for reweighting the value target z:

z = z * move_num / length
z = z / 2 + q / 2
z = z * false_positive_rate (in resign-disabled games)

Also: a higher learning rate early in training.
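
A minimal sketch of the value-target adjustments listed above, written as one function over a single training position. The function name, the resign_disabled flag, and the way the three ideas are combined are assumptions for illustration; each line mirrors one of the formulas above.

```python
def adjust_value_target(z, q, move_num, length,
                        resign_disabled=False, false_positive_rate=0.05):
    """Reweight the value target z for one training position.

    z: final game result from the current player's perspective (+1/-1)
    q: MCTS value estimate (Q) for the position, in [-1, 1]
    """
    # Down-weight early positions, where the final result is mostly noise.
    z = z * move_num / length

    # Blend the game outcome with the search value estimate.
    z = z / 2 + q / 2

    # In resign-disabled games, scale z by the resign false-positive rate.
    if resign_disabled:
        z = z * false_positive_rate

    return z
```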

amj added the discussion label Oct 4, 2018
sethtroisi commented:

research on network size

sethtroisi commented:

Adding stuff about distillation and Seth ideas

sethtroisi pinned this issue Mar 14, 2019
sethtroisi commented:

Checking if eval games have enough diversity and using this opening panel

leela-zero/leela-zero#2104

Ishinoshita commented:

@sethtroisi Re: "time management" from LZ, I'm concerned it might be detrimental for self-play and RL, as it amounts to a sort of policy sharpening: cutting the search early means low-policy moves won't get any visits and will be trained towards 0. That may hinder the learning of new things.

IMHO, the key to saving compute budget might truly be KataGo's variable-visits scheme: fewer visits for searches that only pick game moves, full visits for searches used as policy training targets.

And both types of KataGo's searches could benefit from the KLD-threshold trick from LC0, which sounds very appealing for the policy, though it is much more complex to implement ;-)
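
For reference, a minimal sketch of the LC0-style KLD-gain stopping idea mentioned above: periodically snapshot the root visit distribution and stop the search once the KL divergence per added visit falls below a threshold. The check interval, threshold, and the root.child_visits() / run_batch interfaces are assumptions, not LC0's or Minigo's actual implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) between two (unnormalized) visit-count vectors."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def search_with_kld_stop(root, run_batch, max_visits,
                         check_every=100, min_gain_per_visit=2e-4):
    """Run search in small batches, stopping early once the root visit
    distribution stops changing (KLD gain per visit below threshold)."""
    prev = root.child_visits()
    visits = 0
    while visits < max_visits:
        run_batch(root, check_every)      # assumed: runs `check_every` readouts
        visits += check_every
        cur = root.child_visits()
        if kl_divergence(cur, prev) / check_every < min_gain_per_visit:
            break
        prev = cur
    return visits
```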

sethtroisi commented:

From Brian Lee:

One concrete idea: instead of selecting 2% flat from the last 50 generations, select 4%->0% over the last 50 generations, with some sort of exponentially decaying curve, and also make this parameter configurable. Early on, we might want to have 10% -> 0% over the last ~10 generations of data, but later on we might want to flatten that curve to select 2% -> 0% over the last 100 generations.
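
A minimal sketch of what such a configurable, decaying sampling window could look like; the exponential shape, the normalization so that the total amount of sampled data matches a flat rate over the same window, and all parameter names are illustrative assumptions.

```python
import numpy as np

def sampling_fractions(num_generations=50, flat_rate=0.02, decay=0.9):
    """Per-generation sampling fractions, newest generation first,
    decaying exponentially toward zero for the oldest generations.

    The fractions are scaled so their total equals flat_rate * num_generations,
    i.e. the same overall amount of data as sampling flat_rate from each of
    the last num_generations generations."""
    weights = decay ** np.arange(num_generations)
    weights = weights / weights.sum()
    return weights * flat_rate * num_generations

# Early in a run: sample aggressively from a short, recent window.
early = sampling_fractions(num_generations=10, flat_rate=0.05, decay=0.7)
# Later: flatten the curve over a longer window.
late = sampling_fractions(num_generations=100, flat_rate=0.02, decay=0.97)
```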
