Commits on Jan 8, 2019
  1. Replace pipeline in NCF (#5786)

    robieta committed Jan 8, 2019
    * rough pass at carving out existing NCF pipeline
    
    2nd half of rough replacement pass
    
    fix dataset map functions
    
    reduce bias in sample selection
    
    cache pandas work on a daily basis
    
    cleanup and fix batch check for multi gpu
    
    multi device fix
    
    fix treatment of eval data padding
    
    print data producer
    
    replace epoch overlap with padding and masking
    
    move type and shape info into the producer class and update run.sh with larger batch size hyperparams
    
    remove xla for multi GPU
    
    more cleanup
    
    remove model runner altogether
    
    bug fixes
    
    address subtle pipeline hang and improve producer __repr__
    
    fix crash
    
    fix assert
    
    use popen_helper to create pools
    
    add StreamingFilesDataset and abstract data storage to a separate class
    
    bug fix
    
    fix wait bug and add manual stack trace print
    
    more bug fixes and refactor valid point mask to work with TPU sharding
    
    misc bug fixes and adjust dtypes
    
    address crash from decoding bools
    
    fix remaining dtypes and change record writer pattern since it does not append
    
    fix synthetic data
    
    use TPUStrategy instead of TPUEstimator
    
    minor tweaks around moving to TPUStrategy
    
    cleanup some old code
    
    delint and simplify permutation generation
    
    remove low level tf layer definition, use single table with slice for keras, and misc fixes
    
    missed minor point on removing tf layer definition
    
    fix several bugs from recombining layer definitions
    
    delint and add docstrings
    
    Update ncf_test.py. Section for identical inputs and different outputs was removed.
    
    update data test to run against the new producer class
    
    * remove 'deterministic'
    
    * delint
    
    * address PR comments
    
    * change eval_batch_size flag from a string to an int
    
    * Add bisection based producer for increased scalability, enable fully deterministic data production, and use the materialized and bisection producer to check each other (via expected output md5's)
    
    * remove references to hash pipeline
    
    * skip bisection when it is not needed
    
    * add unbuffer to run.sh as tee is causing issues
    
    * address PR comments
    
    * address more PR comments
    
    * fix lint errors
    
    * trim lines in resnet keras
    
    * remove mock to debug kokoro failures
    
    * Revert "remove mock to debug kokoro failures"
    
    This reverts commit 63f5827.
    
    * remove match_mlperf from expected cache keys
    
    * fix test now that cache construction no longer uses match_mlperf
    
    * disable tests to debug test failure
    
    * disable more tests
    
    * completely disable data_test
    
    * restore data test
    
    * add versions to requirements.txt
    
    * update call to TPUStrategy
  2. missed a not

    robieta committed Jan 8, 2019
  3. don't use forkpool to shuffle with TPUs

    robieta committed Jan 8, 2019
  4. update call to TPUStrategy

    robieta committed Jan 8, 2019
  5. add versions to requirements.txt

    robieta committed Jan 8, 2019
  6. restore data test

    robieta committed Jan 8, 2019
  7. completely disable data_test

    robieta committed Jan 8, 2019
Commits on Jan 7, 2019
  1. disable more tests

    robieta committed Jan 7, 2019
  2. disable tests to debug test failure

    robieta committed Jan 7, 2019
  3. fix test now that cache construction no longer uses match_mlperf

    robieta committed Jan 7, 2019
  4. remove match_mlperf from expected cache keys

    robieta committed Jan 7, 2019
  5. Revert "remove mock to debug kokoro failures"

    robieta committed Jan 7, 2019
    This reverts commit 63f5827.
  6. remove mock to debug kokoro failures

    robieta committed Jan 7, 2019
  7. trim lines in resnet keras

    robieta committed Jan 7, 2019
  8. fix lint errors

    robieta committed Jan 7, 2019
  9. address more PR comments

    robieta committed Jan 7, 2019
  10. address PR comments

    robieta committed Jan 7, 2019
  11. add unbuffer to run.sh as tee is causing issues

    robieta committed Dec 28, 2018
  12. skip bisection when it is not needed

    robieta committed Dec 27, 2018
  13. remove references to hash pipeline

    robieta committed Dec 27, 2018
  14. Add bisection based producer for increased scalability, enable fully deterministic data production, and use the materialized and bisection producer to check each other (via expected output md5's)

    robieta committed Dec 27, 2018
  15. change eval_batch_size flag from a string to an int

    robieta committed Dec 26, 2018
  16. address PR comments

    robieta committed Dec 22, 2018
  17. delint

    robieta committed Dec 21, 2018
  18. remove 'deterministic'

    robieta committed Dec 21, 2018
  19. rough pass at carving out existing NCF pipeline

    robieta committed Nov 19, 2018
    2nd half of rough replacement pass
    
    fix dataset map functions
    
    reduce bias in sample selection
    
    cache pandas work on a daily basis
    
    cleanup and fix batch check for multi gpu
    
    multi device fix
    
    fix treatment of eval data padding
    
    print data producer
    
    replace epoch overlap with padding and masking
    
    move type and shape info into the producer class and update run.sh with larger batch size hyperparams
    
    remove xla for multi GPU
    
    more cleanup
    
    remove model runner altogether
    
    bug fixes
    
    address subtle pipeline hang and improve producer __repr__
    
    fix crash
    
    fix assert
    
    use popen_helper to create pools
    
    add StreamingFilesDataset and abstract data storage to a separate class
    
    bug fix
    
    fix wait bug and add manual stack trace print
    
    more bug fixes and refactor valid point mask to work with TPU sharding
    
    misc bug fixes and adjust dtypes
    
    address crash from decoding bools
    
    fix remaining dtypes and change record writer pattern since it does not append
    
    fix synthetic data
    
    use TPUStrategy instead of TPUEstimator
    
    minor tweaks around moving to TPUStrategy
    
    cleanup some old code
    
    delint and simplify permutation generation
    
    remove low level tf layer definition, use single table with slice for keras, and misc fixes
    
    missed minor point on removing tf layer definition
    
    fix several bugs from recombining layer definitions
    
    delint and add docstrings
    
    Update ncf_test.py. Section for identical inputs and different outputs was removed.
    
    update data test to run against the new producer class
Commits on Nov 1, 2018
  1. remove progress indicator from movielens download

    robieta committed Nov 1, 2018
Commits on Oct 30, 2018
  1. bring NCF to l2 logging compliance (#5642)

    robieta committed Oct 30, 2018
  2. Keras-ify NCF TPU embedding lookup (#5641)

    robieta committed Oct 30, 2018
    * Keras-ify TPU embedding lookup
    
    * delint
    
    * pull get_variable() out of keras lambda
    
    * delint
    
    * move get_variable under variable scope
Commits on Oct 25, 2018
  1. prevent async process from writing alive file until the main process has created the cache root (#5614)

    robieta committed Oct 25, 2018
Commits on Oct 24, 2018
  1. Move version check to a function (#5601)

    robieta committed Oct 24, 2018
    * move version check to a function
    
    * delint
    
    * tweak pip check
    
    * delint
  2. Add logging calls to NCF (#5576)

    robieta committed Oct 24, 2018
    * first pass at __getattr__ abuse logger
    
    * first pass at adding tags to NCF
    
    * minor formatting updates
    
    * fix tag name
    
    * convert metrics to python floats
    
    * getting closer...
    
    * direct mlperf logs to a file
    
    * small tweaks and add stitching
    
    * update tags
    
    * fix tag and add a sudo call
    
    * tweak format of run.sh
    
    * delint
    
    * use distribution strategies for evaluation
    
    * address PR comments
    
    * delint and fix test
    
    * adjust flag validation for xla
    
    * add prefix to distinguish log stitching
    
    * fix index bug
    
    * fix clear cache for root user
    
    * dockerize cache drop
    
    * TIL some regex magic
Commits on Oct 19, 2018
  1. fix error when last shard is not assigned a batch (#5569)

    robieta committed Oct 19, 2018
Commits on Oct 18, 2018
  1. Reorder NCF data pipeline (#5536)

    robieta committed Oct 18, 2018
    * intermediate commit
    
    finish replacing spillover with resampled padding
    
    intermediate commit
    
    * resolve merge conflict
    
    * intermediate commit
    
    * further consolidate the data pipeline
    
    * complete first pass at data pipeline refactor
    
    * remove some leftover code
    
    * fix test
    
    * remove resampling, and move train padding logic into neumf.py
    
    * small tweaks
    
    * fix weight bug
    
    * address PR comments
    
    * fix dict zip. (Reed led me astray)
    
    * delint
    
    * make data test deterministic and delint
    
    * Reed didn't lead me astray. I just can't read.
    
    * more delinting
    
    * even more delinting
    
    * use resampling for last batch padding
    
    * pad last batch with unique data
    
    * Revert "pad last batch with unique data"
    
    This reverts commit cbdf46e.
    
    * move padded batch to the beginning
    
    * delint
    
    * fix step check for synthetic data
Commits on Oct 14, 2018
  1. Make flagfile sharing robust to distributed filesystems and multi-worker setups. (#5521)

    robieta committed Oct 14, 2018
    
    * move flagfile into the cache_dir
    
    * remove duplicate code
    
    * delint