
Question on the list of dependencies #93

Closed
adrinjalali opened this issue Feb 23, 2021 · 6 comments · Fixed by #275
Labels: installation

@adrinjalali

Right now, install_requires includes packages that are not necessarily hard dependencies, depending on what the user wants to do.

For instance, a user who wants a minimal machine learning environment built on frameworks other than tensorflow would still have tensorflow and many other packages pulled into their environment by this package.

I was wondering if you'd be open to the idea of making dependencies soft wherever possible: document which dependencies are optional for which parts of the library, and raise an informative error when a user calls a function that needs a library they haven't installed. A sketch of such a guard follows the package list below.

Here is the list of packages pip pulls into a fresh environment, which is admittedly quite long:

Installing collected packages: urllib3, six, pyasn1, ipython-genutils, idna, chardet, traitlets, rsa, requests, pyrsistent, pyparsing, pycparser, pyasn1-modules, protobuf, oauthlib, cachetools, attrs, wcwidth, typing-extensions, tornado, requests-oauthlib, pyzmq, pytz, python-dateutil, ptyprocess, parso, packaging, mypy-extensions, jupyter-core, jsonschema, grpcio, googleapis-common-protos, google-auth, cffi, werkzeug, webencodings, typing-inspect, tensorboard-plugin-wit, pyyaml, pygments, prompt-toolkit, pickleshare, pexpect, pbr, numpy, nest-asyncio, nbformat, MarkupSafe, markdown, jupyter-client, jedi, httplib2, grpcio-gcp, google-crc32c, google-auth-oauthlib, google-api-core, docopt, decorator, backcall, async-generator, absl-py, wrapt, testpath, termcolor, tensorflow-estimator, tensorboard, pymongo, pydot, pyarrow, proto-plus, pandocfilters, opt-einsum, oauth2client, nbclient, mock, mistune, libcst, keras-preprocessing, jupyterlab-pygments, jinja2, ipython, hdfs, h5py, grpc-google-iam-v1, google-resumable-media, google-pasta, google-cloud-core, gast, future, fasteners, fastavro, entrypoints, dill, defusedxml, crcmod, bleach, avro-python3, astunparse, uritemplate, terminado, tensorflow, Send2Trash, prometheus-client, nbconvert, ipykernel, google-cloud-vision, google-cloud-videointelligence, google-cloud-spanner, google-cloud-pubsub, google-cloud-language, google-cloud-dlp, google-cloud-datastore, google-cloud-build, google-cloud-bigtable, google-cloud-bigquery, google-auth-httplib2, google-apitools, argon2-cffi, apache-beam, tensorflow-serving-api, tensorflow-metadata, pandas, notebook, google-api-python-client, widgetsnbextension, tfx-bsl, jupyterlab-widgets, tensorflow-transform, scipy, pillow, kiwisolver, joblib, ipywidgets, cycler, tensorflow-model-analysis, tensorflow-data-validation, semantic-version, ml-metadata, matplotlib, model-card-toolkit
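
A minimal sketch of the soft-dependency pattern described above. This is hypothetical illustration only, not model-card-toolkit's actual API: the function name and its arguments are made up, and only the import-guard shape is the point.

```python
# Hypothetical sketch of a soft dependency guard -- not model-card-toolkit's
# actual API. tensorflow_model_analysis is imported only when the feature
# that needs it is called, with an actionable error otherwise.
def render_tfma_graphics(model_card, eval_result):
    try:
        import tensorflow_model_analysis as tfma  # optional dependency
    except ImportError as exc:
        raise ImportError(
            "tensorflow-model-analysis is required for TFMA graphics. "
            "Install it with: pip install tensorflow-model-analysis"
        ) from exc
    # ... build graphics from eval_result using tfma here ...
```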

@amadeuspzs

Some metrics to support this issue:

macOS 10.15.7
1.4 GHz Quad-Core Intel Core i5
Python 3.8.2
pip 21.2.4

```
$ pip install model_card_toolkit --no-cache-dir --use-deprecated=legacy-resolver
70.43s user 31.26s system 18% cpu 9:13.00 total
```

9 minutes is a very long time to install a package.

The size of site-packages is 1,545 MB, which seems excessive for this tool.

@vishwanath-prudhivi

Hi,

We are currently experimenting with model cards, using them as summary reports for sklearn models trained on Vertex AI on GCP. We wanted to understand when we could expect a refined package dependency list: currently our training jobs cannot get past environment setup (because of Model Card Toolkit's dependencies), with pip spending a lot of time determining the right versions to install.

Here is the package list from the setup.py file we use to create the Vertex training package (custom code option):

```python
REQUIRED_PACKAGES = [
    'pandas-gbq>=0.10.0',
    'pandas==1.1.3',
    'google_compute_engine',
    'google-cloud-bigquery==1.24.0',
    'google-cloud-core>=1.0.0',
    'google-cloud-logging',
    'google-cloud-storage>=1.16.0',
    'parmap',
    'pyarrow==0.16.0',
    'google-api-core>=1.11.0',
    'google-api-python-client>=1.7.8',
    'google-cloud-pubsub>=0.41.0',
    'cython',
    'gcsfs',
    'sklearn',
    'google-cloud-profiler',
    # 'imblearn==0.8.0',
    # 'autoimpute==0.12.2',
    'imblearn',
    'autoimpute',
    'optbinning',
    'model_card_toolkit==1.1.0',
]
```

This runs on top of the europe-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6:latest container (the TensorFlow 2.6 ML framework version).

Scanning the training job logs, we see many messages like the following:

INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking

Installing the Model Card Toolkit library in a Jupyter notebook is a one-time step and works fine, however.
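
Pip's hint in the log message above refers to constraint files (pip's `-c` option). A minimal sketch of supplying stricter constraints; the specific pins below are illustrative assumptions, not tested versions:

```
# constraints.txt -- pin the packages pip backtracks over the most
tensorflow==2.6.0
numpy==1.19.5

$ pip3 install -c constraints.txt model_card_toolkit==1.1.0
```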

@amadeuspzs

@vishwanath-prudhivi are you passing the --use-deprecated=legacy-resolver flag to pip, as per https://github.com/tensorflow/model-card-toolkit/blob/master/model_card_toolkit/documentation/guide/install.md?

@vishwanath-prudhivi

@amadeuspzs thanks for the suggestion. When we build a training package, we call:

python3 setup.py sdist --formats=gztar

After the job is submitted to Vertex AI, the following command is invoked by default:

pip3 install --user <training_package_name>.tar.gz

Any suggestions on how to add the extra flag here (as mentioned in the previous suggestion) would be helpful; one general mechanism is sketched after this comment.

Regards
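
A general pip mechanism that may help here (this is documented pip behavior, not Vertex-specific advice, so whether you can apply it depends on how much of the environment you control): pip reads any of its command-line options from PIP_* environment variables, so the legacy-resolver flag can be enabled without editing the install command itself.

```
# Equivalent to passing --use-deprecated=legacy-resolver to every pip run
# in this environment, e.g. set in a custom container image.
export PIP_USE_DEPRECATED=legacy-resolver
pip3 install --user <training_package_name>.tar.gz
```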

@codesue linked a pull request Nov 9, 2022 that will close this issue
@codesue
Collaborator

codesue commented Nov 9, 2022

Hi all, we're in the process of removing the tfx dependency, which will reduce the number of packages installed considerably. I linked the pull request to this issue.

@adrinjalali, we opened discussion topic #228 to brainstorm how to loosen the dependencies and what the new dependency list should look like. I'd love to learn more about your use case and what an ideal workflow would look like for you. 😄

@codesue linked a pull request Nov 17, 2022 that will close this issue
@codesue removed a link to a pull request Nov 17, 2022
@codesue reopened this Dec 3, 2022
@codesue removed a link to a pull request Dec 6, 2022
@codesue
Collaborator

codesue commented Dec 7, 2022

After removing tfx, the required packages on the main branch are:

  • absl-py>=0.9,<1.1: primarily used for testing and building docs; used for logging in one instance
  • jinja2>=3.1,<3.2: used for rendering the model card
  • matplotlib>=3.2.0,<4: used for generating graphics from tfma and tfdv objects
  • jsonschema>=3.2.0,<4: used for validating JSON schema
  • tensorflow-data-validation>=1.5.0,<2.0.0: used for reading and parsing tf stats
  • tensorflow-model-analysis>=0.36.0,<0.42.0: used for reading and parsing tf metrics
  • tensorflow-metadata>=1.5.0,<2.0.0: contains the stats proto definition
  • ml-metadata>=1.5.0,<2.0.0: used for querying MLMD
  • dataclasses;python_version<"3.7": used for Python 3.6, which is no longer supported

I did some testing, and I think the minimal requirements to support core functionality with little refactoring are:

  • absl-py>=0.9,<1.1: primarily used for testing and building docs; used for logging in one instance
  • jinja2>=3.1,<3.2: used for rendering the model card
  • jsonschema>=3.2.0,<4: used for validating JSON schema
  • protobuf>=3.19.0,<3.20.0: used for building, serializing, and parsing protos -- previously installed as a transitive dependency

Here, core functionality means creating ModelCard and ModelCardToolkit objects, annotating graphics and performance metrics manually (rather than generating them from tfma/tfdv objects), and rendering and exporting the model card.
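
For reference, that core path looks roughly like this (a sketch based on the toolkit's documented quickstart; exact method names may differ between versions):

```python
import model_card_toolkit

# Initialize the toolkit with a directory for generated assets.
mct = model_card_toolkit.ModelCardToolkit('model_card_assets')

# Scaffold a ModelCard and annotate it manually --
# no tfx, tfma, or tfdv involved.
model_card = mct.scaffold_assets()
model_card.model_details.name = 'My Model'

# Persist the annotations, then render the card as HTML.
mct.update_model_card(model_card)
html = mct.export_format()
```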

Depending on how TensorFlow docs are built, it might be possible to move absl-py to an extra. If exporting a model card as a proto is made optional, protobuf could be made optional as well.

Model Card Toolkit is now a community-led open source project under the TFX Addons special interest group. (Learn more in this announcement.) The project now relies on the community for contributions, bug fixes, and documentation. This means the timeline for creating a basic model-card-toolkit package for core functionality and moving optional dependencies to extras depends on contributors from the community volunteering to implement and review these changes. We're in the process of updating the contributing guide to improve the contributor experience and lower the barriers to contributing. 😄
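
A sketch of what that split could look like in setup.py; the extra name and grouping are hypothetical, not a committed design:

```python
# Hypothetical setup.py fragment: core dependencies stay required,
# TF-ecosystem integrations move behind an extra.
from setuptools import setup

setup(
    name='model-card-toolkit',
    version='0.0.0',  # placeholder
    install_requires=[
        'absl-py>=0.9,<1.1',
        'jinja2>=3.1,<3.2',
        'jsonschema>=3.2.0,<4',
        'protobuf>=3.19.0,<3.20.0',
    ],
    extras_require={
        # installed via: pip install model-card-toolkit[tensorflow]
        'tensorflow': [
            'tensorflow-data-validation>=1.5.0,<2.0.0',
            'tensorflow-model-analysis>=0.36.0,<0.42.0',
            'tensorflow-metadata>=1.5.0,<2.0.0',
            'ml-metadata>=1.5.0,<2.0.0',
        ],
    },
)
```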

@codesue added the help wanted and contributions welcome labels Dec 7, 2022
@codesue self-assigned this Dec 8, 2022
@codesue removed their assignment Dec 16, 2022
@codesue self-assigned this May 11, 2023
@codesue added the work in progress label and removed the help wanted label May 11, 2023
@codesue added the installation label May 15, 2023
@codesue removed the contributions welcome and work in progress labels May 22, 2023