Make boosted_trees Garden-official #4377

yhliang2018 · 2018-05-25T23:39:52Z

Hi All,

I updated the boosted_trees code to make it more Garden-style:

Add official flags in data_download.py, and fix a minor bug
Add benchmark logger in train_higgs.py
Replace single quotes with double quotes

yhliang2018 · 2018-05-25T23:45:24Z

@yk5 Could you help review the code? Thanks!

karmel

A few minor comments, but mostly LGTM. I assume you have tested and can run all of the scripts still?

karmel · 2018-05-29T17:15:15Z

official/boosted_trees/data_download.py

+          names=["c%02d" % i for i in range(29)]  # label + 28 features.
+      ).as_matrix()
  finally:
    os.remove(temp_filename)


tf.gfile.Remove, for consistency

karmel · 2018-05-29T17:17:54Z

official/boosted_trees/data_download.py

-  FLAGS, unparsed = parse_args()
-  tf.app.run(argv=[sys.argv[0]] + unparsed)
+def define_data_download_flags():
+  """Add flags specifying data download arguments."""


Note to ourselves: we should consider having a flags_core fn specifically for download module flags, as I think we now have several separate data_dir definitions. No need to solve here though.

Good point! @robieta Maybe we should add one in utils/flags?
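To make the idea of a single shared definition concrete, here is a minimal sketch of one helper that registers `data_dir` for every download script. It uses the standard-library `argparse` rather than the repo's flags utilities so that it runs on its own, and the names `add_data_download_flags` and `build_download_parser` are hypothetical, not existing functions in `flags_core`.

```python
import argparse


def add_data_download_flags(parser, default_data_dir):
  """Hypothetical shared helper: registers the flags every download script needs."""
  parser.add_argument(
      "--data_dir", type=str, default=default_data_dir,
      help="Directory to download and store the dataset in.")
  return parser


def build_download_parser(description, default_data_dir):
  """Each download script would call this instead of redefining data_dir."""
  parser = argparse.ArgumentParser(description=description)
  return add_data_download_flags(parser, default_data_dir)


# Example: a boosted_trees-style download script with its own default path.
higgs_parser = build_download_parser("Download HIGGS data.", "/tmp/higgs_data")
higgs_args = higgs_parser.parse_args(["--data_dir", "/tmp/my_higgs"])
```

With this shape, each module only supplies its own default directory, and the flag name and help text stay consistent across download scripts.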

karmel · 2018-05-29T17:20:38Z

official/boosted_trees/train_higgs.py

-import sys

+# pylint: disable=g-bad-import-order
+import numpy as np


This import order seems wrong. Numpy should be below, and we need an enable= statement as well, right?

The Kokoro lint check reports errors if numpy goes after absl. :(

yk5 · 2018-05-29T17:43:25Z

official/boosted_trees/data_download.py

-        names=['c%02d' % i for i in range(29)]  # label + 28 features.
-    ).as_matrix()
+    tf.logging.info("Data processing... taking multiple minutes...")
+    with gzip.open(temp_filename, "rb") as csv_file:


Just for my learning: pandas supports reading from .gz directly. Do we prefer to use gzip explicitly?

Thanks for pointing it out! It's strange then, as when I tested the original code, I got the following error:

pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

That's why we open the file with gzip explicitly here. Any idea what causes the issue?

Maybe it's related to the pandas version, but since gzip works, I think this change is fine.

Which version are you using? I use 0.22.0.

Hmm, the same: 0.22.0. Do you get the error when running locally in a virtualenv, or in Travis or somewhere else?
FYI, I'm using Linux with virtualenv (Python 2.7.13, numpy 1.14.3).
I ran it just now and confirmed that pd.read_csv() reads and processes the csv.gz file properly.

Aha, I see the problem: I ran it with python3. When I test it with python2, it works as it does for you. So I will keep the explicit gzip for py2 and py3 compatibility. Thanks a lot! :)

I see. Thanks for the fix!
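For readers following this thread, the pattern being kept is: decompress explicitly with `gzip.open`, pass the already-open file object to the parser, and remove the temp file in a `finally` block. The sketch below illustrates that pattern under some assumptions: it uses the standard-library `csv` module as a stand-in for `pd.read_csv` (which also accepts an open file object) so that it runs without pandas, the helper name `read_gzipped_csv` is hypothetical, and the two data rows are made up.

```python
import csv
import gzip
import io
import os
import tempfile


def read_gzipped_csv(path):
  """Decompress explicitly, then hand the parser an already-open file object."""
  with gzip.open(path, "rb") as gz_file:
    # Wrap the binary stream so csv.reader receives text on Python 3.
    text_file = io.TextIOWrapper(gz_file, encoding="utf-8", newline="")
    return [[float(value) for value in row] for row in csv.reader(text_file)]


# Write a tiny stand-in for the downloaded csv.gz file (made-up values).
fd, temp_filename = tempfile.mkstemp(suffix=".csv.gz")
os.close(fd)
try:
  with gzip.open(temp_filename, "wb") as gz_file:
    gz_file.write(b"1.0,0.5,2.5\n0.0,1.5,3.5\n")
  rows = read_gzipped_csv(temp_filename)
finally:
  os.remove(temp_filename)  # Clean up the temp file even if parsing fails.
```

Because the decompression step is explicit, the result does not depend on how the parser infers compression from the filename, which is the behavior that differed between Python versions in the discussion above.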

yk5

Thank you!

Looks good to me.

Make boosted_trees Garden-official

ae13078

yhliang2018 requested review from a team and karmel as code owners May 25, 2018 23:39

googlebot added the cla: yes label May 25, 2018

karmel approved these changes May 29, 2018

yk5 reviewed May 29, 2018

Fix nits

52ba252

yhliang2018 merged commit 191d99a into master May 29, 2018

yhliang2018 deleted the feat/boosted_tree branch May 29, 2018 22:21

4 participants