Create preprocess.py #136

kaihuchen · 2017-09-18T02:25:18Z

This program converts input images into a format suitable for use with deeplearnjs.

Before running this program:

Create a directory structure as follows:
- <topDir>
  - preprocess.py
  - <yourProjectDir>
    - <yourImageDir> # defaults to 'images'
    - <yourImageDir2>

Put all images under <yourImageDir>. If you use a directory name other than the default 'images', specify it with the --path parameter on the command line.
You may have <yourImageDir2>, <yourImageDir3>, etc., to facilitate experimentation.
Each image filename must be prefixed with its class label, followed by '_'.
For example, cat_image00005.jpg
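
The naming convention above can be illustrated with a small sketch. This is not the code from preprocess.py; the helper name `class_label_from_filename` is hypothetical, and the only things taken from the description are the '_' delimiter and the example filename.

```python
import os

# '_' is the delimiter described above; the helper below is a hypothetical
# illustration, not the function used in preprocess.py.
DELIMITER = '_'


def class_label_from_filename(filename):
  """Return the class label that prefixes an image filename."""
  base = os.path.basename(filename)
  label, sep, _rest = base.partition(DELIMITER)
  if not sep or not label:
    raise ValueError('No class label prefix found in: ' + base)
  return label


print(class_label_from_filename('cat_image00005.jpg'))  # cat
```

Files without a label prefix raise a `ValueError` in this sketch, so a misnamed image is caught before any labels are written.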

To run:

$ cd <yourProjectDir>
$ python ../preprocess.py # or
$ python ../preprocess.py --outimgs newimgs # if you prefer non-default parameters

Results: in the current directory you will find an image file 'images.png' (the '.png' extension is added automatically) and a labels file 'labels' (or other names, per the command line options).

Note:

- Make sure that the model-builder-datasets-configuration's data.labels.shape matches the number of classes found in the data.
- Make sure that the NN model's output layer matches the number of classes.
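
To show why the label shape and the output layer both depend on the class count, here is a minimal sketch that assumes the labels are one-hot encoded. The function name `one_hot_labels` is hypothetical, and the script's actual label format may differ.

```python
def one_hot_labels(labels):
  """Return (sorted class names, one-hot rows) for a list of class labels."""
  classes = sorted(set(labels))
  index = {name: i for i, name in enumerate(classes)}
  encoded = []
  for name in labels:
    row = [0] * len(classes)  # one slot per class
    row[index[name]] = 1
    encoded.append(row)
  return classes, encoded


classes, encoded = one_hot_labels(['cat', 'dog', 'cat', 'bird'])
print(classes)          # ['bird', 'cat', 'dog']
print(len(encoded[0]))  # 3
```

Under that assumption, the row width (3 here) is the number that data.labels.shape and the model's output layer need to agree with.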

To-do:

- This code is not suitable for processing a large number of images.
- Tested with Python v2.7. Saw some problems with v3.5.


googlebot · 2017-09-18T02:25:21Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

nsthorat · 2017-09-20T22:52:11Z

Wow. This is fantastic. Thank you so much for spending the time on this.

A couple high level suggestions:

- Can we make it so the preprocess.py doesn't have to be in the directory? It'd be nice to run it like ./scripts/preprocess /abs/path/to/image/dir
- Can you rename it to something more descriptive, how about something like process_images_for_training.py or something like that? I like mile long descriptive names :)
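
One way the first suggestion could look is sketched below with `argparse`. The positional `image_dir` argument and the function names are hypothetical, and the 'images' default for `--outimgs` is inferred from the 'images.png' output named in the description. The script's real flag handling is not shown in this thread.

```python
import argparse
import os


def build_parser():
  """Build a parser that takes the image directory as a positional argument."""
  parser = argparse.ArgumentParser(
      description='Convert images for the model builder demo.')
  # Hypothetical positional argument, so the script can live anywhere.
  parser.add_argument('image_dir', help='path to the directory of images')
  # --outimgs appears in the PR description; the default is assumed.
  parser.add_argument('--outimgs', default='images',
                      help="output image file name, without '.png'")
  return parser


def parse_args(argv=None):
  """Parse arguments and normalize the image directory to an absolute path."""
  args = build_parser().parse_args(argv)
  args.image_dir = os.path.abspath(args.image_dir)
  return args
```

With something like this, the script would no longer need to sit one level above the project directory, since the image location comes from the command line rather than from the current working directory.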

Some minor nits inline! Thanks again!!

Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.

scripts/preprocess.py, line 2 at r1 (raw file):

# Copyright 2017 Smesh LLC. All Rights Reserved.
#   http://smesh.net/labs

Some lawyercat will find this and yell at us for this :/

Feel free to add yourself a little lower at the top of the file, though (not in the license).

scripts/preprocess.py, line 16 at r1 (raw file):

# limitations under the License.
# ======================================================================
# This program converts input images into a format suitable for use with deeplearnjs.

Can you actually change this to say the model builder demo?

We're going to be thinking hard about the data API at the library level at some point in the near future, and the model builder format might not be here to stay.

scripts/preprocess.py, line 69 at r1 (raw file):

delimiter = '_'

#---------------------------------------

remove this line

scripts/preprocess.py, line 71 at r1 (raw file):

#---------------------------------------
def preprocessImages(FLAGS):
    path = FLAGS.path

use 2 space indentation throughout this file (we don't have a py linter set up yet but this is what it would say)

scripts/preprocess.py, line 95 at r1 (raw file):

        fileList = np.tile(fileList, FLAGS.replicate)
        print('...Dataset has been replicated', FLAGS.replicate, 'times')

remove trailing spaces on this line

scripts/preprocess.py, line 108 at r1 (raw file):

    a = imageList
    print('...Created', a.shape[0], 'images')
    #print('min/max pixel values: ', a.min(), '/', a.max())

remove comment

scripts/preprocess.py, line 116 at r1 (raw file):

    print('...Saved composed image to:', outImageFile+'.png')

    #-------------

remove this line

scripts/preprocess.py, line 127 at r1 (raw file):

    FLAGS.nClassesIn = len(classesClearText)
    labels5 = pack(classesClearText.tolist(), labels, FLAGS)

remove trailing whitespaces

scripts/preprocess.py, line 141 at r1 (raw file):

    length = len(labels)
    result = [ np.NaN ] * length * nClasses

remove trailing whitespaces

scripts/preprocess.py, line 197 at r1 (raw file):

        print('Error, unrecognized flags:', unparsed)
        exit(-1)

remove trailing spaces

Comments from Reviewable

kaihuchen · 2017-09-25T03:34:26Z

Done as suggested. Do I need to submit a new pull request for the changes?


nsthorat · 2017-09-26T12:29:53Z

You just need to commit those changes to this branch and I'll see them. I'll then be able to merge!

kaihuchen · 2017-09-26T16:42:30Z

The requested changes have been committed.

nsthorat · 2017-09-27T15:09:17Z

Hi Kaihu,

It looks like the changes didn't get committed (I still see some lines with just white space). Were they committed to another branch?

Thanks! :)

Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.

Comments from Reviewable

kaihuchen · 2017-09-27T16:11:32Z

My bad! Committed again, please check.

nsthorat · 2017-10-01T19:26:31Z

Reviewed 1 of 2 files at r2, 1 of 1 files at r3.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.

Comments from Reviewable


Update and rename preprocess.py to process_images_for_training.py

be903ef

kaihuchen added 4 commits September 27, 2017 11:54

Update process_images_for_training.py

1ee751c

Updated process_images_for_training.py

270f6dd

Merge branch 'master' into patch-1

3c6c8f9

Update process_images_for_training.py

292f71b

nsthorat merged commit 109574d into tensorflow:master Oct 1, 2017
