Skip to content
This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

Add tf.io.browserDownloads() and tf.io.browserFiles() #1012

Merged
merged 26 commits into from May 3, 2018

Conversation

caisq
Copy link
Collaborator

@caisq caisq commented Apr 30, 2018

  • tf.io.browserDownloads() causes the browser to download a model's artifact as files.
    • In the case of tf.Model (Keras-style model), two files consistent with the tensorflowjs_converter format will be donwnloaded:
      • A JSON file that contains model topology and weight manifest
      • A binary file that contains the weights of the model
  • tf.io.browserFiles() supports loading model artifacts from files such as user-selected files.

Towards tensorflow/tfjs#13


This change is Reviewable

caisq added 18 commits April 26, 2018 21:39
* tf.io.browserLocalStorage() is the public factory function.
* Adjust SaveResult slightly: Remove the required success boolean
  field. Failures in saving and loading will always be thrown as
  `Error`s.
* Add helper functions:
  * strinbByteLength: required for calculating the byte length of
    the JSON model topology and weight specs.
  * arrayBufferToBase64String
  * base64StringToArrayBuffer
  The last two helper functions are required because local storage
  values cannot by binary arrays or `ArrayBuffer`s.
@ry
Copy link
Contributor

ry commented May 1, 2018

Wouldn't it make more sense for this to live in tfjs-layers instead of core? tf.Model isn't even defined in core.

@caisq
Copy link
Collaborator Author

caisq commented May 1, 2018

@ry thanks for the comment. It's true that tf.Model is not a core construct. But eventually we want to support saving both Keras-style tf.Models and non-Keras styles (such as the FrozenModels loaded from converted TF SavedModels). The latter doesn't support saving yet, but we have plans to support that. @pyu10055

@nsthorat
Copy link
Contributor

nsthorat commented May 1, 2018

Let's chat about this tomorrow -- overall we think that it may make sense to actually decouple the IOHandler because it's really hard to name these top-level functions that merge load/save. Let's chat about this tomorrow (all of these comments below still apply).


Review status: 0 of 3 files reviewed at latest revision, all discussions resolved.


src/io/files.ts, line 1 at r1 (raw file):

/**

I think we should call this something like browser_files.ts. I don't want to pack all of the node and browser stuff into a single typescript file.


src/io/files.ts, line 32 at r1 (raw file):

const DEFAULT_FILE_NAME_PREFIX = 'model';
const DEFAULT_JSON_EXTENSION_NAME = '.json';
const DEFAULT_WEIHGT_DATA_EXTENSION_NAME = '.weights.bin';

WEIGHT


src/io/files.ts, line 34 at r1 (raw file):

const DEFAULT_WEIHGT_DATA_EXTENSION_NAME = '.weights.bin';

export interface DownloadTriggerConfig {

how about BrowserDownloadTriggerConfig


src/io/files.ts, line 42 at r1 (raw file):

   * create to bind the download to.
   */
  jsonAnchor?: HTMLAnchorElement;

Let's remove this config entirely. I think we just want to trigger downloads immediately, and always create temporary anchors. If the user wants to build a UI that lets you click a link, they can add an onclick handler which calls your download trigger method.


src/io/files.ts, line 59 at r1 (raw file):

}

export class DownloadTrigger implements IOHandler {

how about BrowserDownloadTrigger


src/io/files.ts, line 66 at r1 (raw file):

  private readonly trigger: boolean;

  constructor(fileNames?: string|string[], config?: DownloadTriggerConfig) {

can this be a string => string object instead of an array, or alternatively have multiple filename arguments?


src/io/files.ts, line 67 at r1 (raw file):

  constructor(fileNames?: string|string[], config?: DownloadTriggerConfig) {
    if (!window) {

window == null


src/io/files.ts, line 164 at r1 (raw file):

}

export class Files implements IOHandler {

BrowserFileChooser? Something to disambiguate since this will not be used in node.

Also, can you split this file into two files? It's better to have 1:1 classes to files, just like in Java.


src/io/files.ts, line 167 at r1 (raw file):

  private readonly files: File[];

  constructor(files?: File[]) {

can you just stick a private here?

How many files are you expecting to support here? There's a throw in load() for files with non-2 length.

Also, are you assuming order of the files? We have standardized filenames for everything, so it would be good to support arbitrary ordering.


src/io/files.ts, line 182 at r1 (raw file):

      const jsonReader = new FileReader();
      // tslint:disable-next-line:no-any
      jsonReader.onload = (event: any) => {

can you type this? I think it may be HTMLEvent


src/io/files.ts, line 205 at r1 (raw file):

        if (weightGroup.paths.length !== 1) {
          reject(new Error(
              `When uploading user-selected files, we current support only ` +

why is this? if we're using a file chooser you can easily select many files. I think we should support that use case.


src/io/files.ts, line 237 at r1 (raw file):

 *     {units: 1, inputShape: [10], activation: 'sigmoid'}));
 * const artifactsInfo = await model.save(tf.io.triggerDownloads(
 *     ['mymodel.json', 'mymodel.weights.bin'])):

it seems better for this API to have weights / model separated like this:

tf.io.triggerDownloads(
    'myModel.json', ['mymodel.weights.bin1', 'mymodel.weights.bin2']);

Comments from Reviewable

@caisq
Copy link
Collaborator Author

caisq commented May 3, 2018

Had conversation offline. Will keep the IOHandler interface as it is nice for many sroutes that have semmetry between saving and loading. The classes for downloading and uploading files will remain as they are (i.e., two separate files).
But the public helper functions will be renamed:
tf.io.triggerDownloads ==> tf.io.browserDownloads
tf.io.files ==> tf.io.browserFiles

In order to minimize potential confusion with native files in node.js that we will support in the future.

Also, just a heads-up that the complexity of the code in this PR has increased a little as a result of our decision to support uploading multiple (sharded) weight file in the browser.


Review status: 0 of 5 files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


src/io/files.ts, line 1 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

I think we should call this something like browser_files.ts. I don't want to pack all of the node and browser stuff into a single typescript file.

Done renaming this file and the associated test file.


src/io/files.ts, line 32 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

WEIGHT

Done.


src/io/files.ts, line 34 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

how about BrowserDownloadTriggerConfig

This config interface has been removed.


src/io/files.ts, line 42 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Let's remove this config entirely. I think we just want to trigger downloads immediately, and always create temporary anchors. If the user wants to build a UI that lets you click a link, they can add an onclick handler which calls your download trigger method.

Done.


src/io/files.ts, line 59 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

how about BrowserDownloadTrigger

Changed to browserDownloads.


src/io/files.ts, line 66 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can this be a string => string object instead of an array, or alternatively have multiple filename arguments?

Done. Per offline discussion. We will allow client only to specify the file-name prefix.


src/io/files.ts, line 67 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

window == null

Done.


src/io/files.ts, line 164 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

BrowserFileChooser? Something to disambiguate since this will not be used in node.

Also, can you split this file into two files? It's better to have 1:1 classes to files, just like in Java.

per offline discussion, naming this class 'BrowserFiles', as the files don't have to be user-selected files, they can be Files generated from blobs as well.

The two classes and two factory methods will continue to life in browser_files.ts, as they are both related to manipulating files in the browser. Doing this also benefits consistency with other files in this folder in the sense that each file defines a save and load method, even though in this case they are distributed between two classes.


src/io/files.ts, line 167 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can you just stick a private here?

How many files are you expecting to support here? There's a throw in load() for files with non-2 length.

Also, are you assuming order of the files? We have standardized filenames for everything, so it would be good to support arbitrary ordering.

As discussed in offline discussion, I use File[] here because we need to support FrozenModel artifacts in the future. FrozenModel involves at least 3 files (GraphDef, weights manifest and weights), which is different from tf.Model, which involves 2 files (model.json and weights). File[] is a clean way to capture both.


src/io/files.ts, line 182 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can you type this? I think it may be HTMLEvent

I think the type is Event. Done.


src/io/files.ts, line 205 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

why is this? if we're using a file chooser you can easily select many files. I think we should support that use case.

Done adding support for loading from multiple weight files. As discussed offline, an error will be thrown if there are duplicate file basenames (e.g., dir1/weights.bin, dir2/weights.bin) in the weights manifest for some reason.


src/io/files.ts, line 237 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

it seems better for this API to have weights / model separated like this:

tf.io.triggerDownloads(
    'myModel.json', ['mymodel.weights.bin1', 'mymodel.weights.bin2']);

See my other response about the difference between FrozenModels and tf.Models. Because of the difference in their file counts, doing it as an array is the cleanest approach.


Comments from Reviewable

@nsthorat
Copy link
Contributor

nsthorat commented May 3, 2018

:lgtm_strong:


Reviewed 1 of 7 files at r2.
Review status: 1 of 5 files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


src/io/browser_files.ts, line 93 at r2 (raw file):

      if (modelArtifacts.weightData != null) {
        const weightDataAnchor = this.weightDataAnchor == null ?
            document.createElement('a') as HTMLAnchorElement :

you shouldnt need to cast to HTMLAnchorElement here (magic typescript type lookup for 'a')


src/io/browser_files.ts, line 209 at r2 (raw file):

        return;
      };
      jsonReader.readAsText(this.files[0]);

use jsonReader variable


src/io/browser_files.ts, line 250 at r2 (raw file):

/**
 * Factory method for IOHandler that triggers file downloads.

How about "Creates a browser file download IOHandler"

similar below


src/io/browser_files.ts, line 259 at r2 (raw file):

 * model.add(tf.layers.dense(
 *     {units: 1, inputShape: [10], activation: 'sigmoid'}));
 * const artifactsInfo = await model.save(tf.io.triggerDownloads('mymodel'));

update this


src/io/browser_files.ts, line 261 at r2 (raw file):

 * const artifactsInfo = await model.save(tf.io.triggerDownloads('mymodel'));
 * // This will trigger downloading of two files:
 * //   'mymodel.json' and 'mymodel.weights.bin'.

qq - should we migrate the other write_weights.py to write ".bin" extensions for consistency?


src/io/browser_files.ts, line 302 at r2 (raw file):

 * const uploadWeightsInput = document.getElementById('upload-weights');
 * const model = await tfl.loadModel(
 *     tfc.io.files([uploadJSONInput.files[0], uploadWeightsInput.files[0]]));

update this


src/io/browser_files_test.ts, line 287 at r2 (raw file):

  const reverseOrderValues: boolean[] = [false, true];
  for (const reverseOrder of reverseOrderValues) {
    const testTitle = `One group, two paths, reverseOrder=${reverseOrder}`;

I know you do this in layers, but I prefer we don't do this type of programmatic construction of tests in core. It's much easier to understand / debug if they're static, even if you duplicate code.


Comments from Reviewable

@dsmilkov
Copy link
Contributor

dsmilkov commented May 3, 2018

:lgtm_strong:


Reviewed 6 of 7 files at r2.
Review status: all files reviewed at latest revision, 19 unresolved discussions, some commit checks failed.


src/io/browser_files.ts, line 20 at r2 (raw file):

/**
 * IOHandlers related to files.
 *

join the lines above and below this comment into one sentence.


src/io/browser_files.ts, line 41 at r2 (raw file):

  constructor(fileNamePrefix?: string) {
    if (window == null) {

In node, the check if (window == null) will throw ReferenceError: window is not defined. Do if (typeof window === 'undefined') instead.


Comments from Reviewable

@caisq
Copy link
Collaborator Author

caisq commented May 3, 2018

Review status: all files reviewed at latest revision, 21 unresolved discussions, all commit checks successful.


src/io/browser_files.ts, line 20 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

join the lines above and below this comment into one sentence.

Done. Also removed reference to native files in node.js.


src/io/browser_files.ts, line 41 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

In node, the check if (window == null) will throw ReferenceError: window is not defined. Do if (typeof window === 'undefined') instead.

Done.


src/io/browser_files.ts, line 93 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

you shouldnt need to cast to HTMLAnchorElement here (magic typescript type lookup for 'a')

Done. Thanks!


src/io/browser_files.ts, line 209 at r2 (raw file):

jsonReader

This is already using the variable. Maybe I didn't get what you mean?


src/io/browser_files.ts, line 250 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

How about "Creates a browser file download IOHandler"

similar below

Done.


src/io/browser_files.ts, line 259 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

update this

Done.


src/io/browser_files.ts, line 261 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

qq - should we migrate the other write_weights.py to write ".bin" extensions for consistency?

Both the python and javascript libraries are agnostic w.r.t. extension names. So I'd say this is not necessary.


src/io/browser_files.ts, line 302 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

update this

Done.


src/io/browser_files_test.ts, line 287 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

I know you do this in layers, but I prefer we don't do this type of programmatic construction of tests in core. It's much easier to understand / debug if they're static, even if you duplicate code.

OK. I don't mind removing this kind of parametric tests, as long as you don't mind larger sizes in the test files. Done.


Comments from Reviewable

@caisq caisq changed the title Add tf.io.triggerDownloads() and tf.io.files() Add tf.io.browserDownloads() and tf.io.browserFiles() May 3, 2018
@caisq
Copy link
Collaborator Author

caisq commented May 3, 2018

Review status: 3 of 5 files reviewed at latest revision, 21 unresolved discussions, some commit checks pending.


src/io/browser_files.ts, line 209 at r2 (raw file):

variable

Using the variable 'jsonFile'.


Comments from Reviewable

@caisq caisq merged commit 536de43 into tensorflow:master May 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants