Open Skills Project - Machine Learning
This is the library for the methods usable by the Open Skills API, including processing algorithms and utilities for computing our jobs and skills taxonomy.
skills-ml depends on Python 3.6, so create a virtual environment using a Python 3.6 executable.
1. Create the virtualenv: virtualenv venv -p /usr/bin/python3.6
2. Activate your virtualenv (for example, source venv/bin/activate), then install the package: pip install skills-ml
3. Import skills_ml
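After installing, a quick sanity check can confirm that the interpreter satisfies the Python 3.6 requirement and that the skills_ml package can be found. This is a minimal sketch using only the standard library. The helper name `check_environment` is hypothetical and not part of skills-ml, and treating the requirement as "3.6 or newer" is an assumption.

```python
import importlib.util
import sys


def check_environment(package="skills_ml", minimum=(3, 6)):
    """Return (python_ok, package_found) for the current interpreter.

    Hypothetical helper for illustration; not part of skills-ml.
    Assumes the Python requirement means "3.6 or newer".
    """
    python_ok = sys.version_info[:2] >= minimum
    # find_spec locates a top-level package without importing it
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


if __name__ == "__main__":
    python_ok, package_found = check_environment()
    print("Python >= 3.6:", python_ok)
    print("skills_ml found:", package_found)
```

If the second value is False, the virtualenv is probably not activated or the pip install step did not run inside it.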
skills-ml doesn't have a tutorial yet, but here are some useful places to start.
- The examples directory contains a couple of examples that use specific components to perform specific tasks.
- Check out the descriptions of different algorithm types in algorithms/ and look at any individual directories that match what you'd like to do (e.g. skill extraction, job title normalization)
- skills-airflow is the open-source production system that uses skills-ml algorithms in an Airflow pipeline to generate open datasets
Building the Documentation
skills-ml uses a forked version of pydocmd, and a custom script to keep the pydocmd config file up to date. Here's how to keep the docs updated before you push:
$ cd docs
$ PYTHONPATH="../" python update_docs.py # this will update docs/pydocmd.yml with the package/module structure
$ pydocmd serve # will serve local documentation that you can check in your browser
$ pydocmd gh-deploy # will update the gh-pages branch
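This README does not show what update_docs.py does internally. As a rough illustration of how a script could discover a package/module structure before writing it to a config file, the sketch below walks a package with the standard library's `pkgutil`. The function name `list_modules` is hypothetical, and the standard-library `json` package stands in for skills_ml so the example runs without skills-ml installed.

```python
import importlib
import pkgutil


def list_modules(package_name):
    """Return sorted dotted names of a package and all modules under it.

    Hypothetical helper; not the actual update_docs.py logic.
    """
    package = importlib.import_module(package_name)
    names = [package.__name__]
    # walk_packages recurses into subpackages (importing them to do so)
    for info in pkgutil.walk_packages(package.__path__, prefix=package.__name__ + "."):
        names.append(info.name)
    return sorted(names)


if __name__ == "__main__":
    # "json" is only a stand-in package for demonstration
    for name in list_modules("json"):
        print(name)
```

A real script would still need to serialize this list into whatever layout the forked pydocmd expects, which is not covered here.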
- algorithms/ - Core algorithmic module. Each submodule is meant to contain a different type of component, such as a job title normalizer or a skill tagger, with a common interface so different pipelines can try out different versions of the components.
- datasets/ - Wrappers for interfacing with different datasets, such as ONET and Urbanized Area.
- evaluation/ - Code for testing different components against each other.
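To illustrate the "common interface" idea described for algorithms/, here is a minimal sketch of how interchangeable components could be structured so that a pipeline can swap one version for another. All names (`JobTitleNormalizer`, `normalize`, `LowercaseNormalizer`, `StripSeniorityNormalizer`, `run_pipeline`) are hypothetical and are not taken from skills-ml.

```python
from abc import ABC, abstractmethod


class JobTitleNormalizer(ABC):
    """Hypothetical common interface; not skills-ml's actual API."""

    @abstractmethod
    def normalize(self, title: str) -> str:
        """Return a normalized form of a raw job title."""


class LowercaseNormalizer(JobTitleNormalizer):
    def normalize(self, title: str) -> str:
        # Lowercase and collapse runs of whitespace
        return " ".join(title.lower().split())


class StripSeniorityNormalizer(JobTitleNormalizer):
    # Illustrative word list only
    SENIORITY = {"senior", "junior", "lead"}

    def normalize(self, title: str) -> str:
        words = [w for w in title.lower().split() if w not in self.SENIORITY]
        return " ".join(words)


def run_pipeline(normalizer: JobTitleNormalizer, titles):
    """Apply any normalizer that implements the shared interface."""
    return [normalizer.normalize(t) for t in titles]


if __name__ == "__main__":
    titles = ["  Senior  Software Engineer "]
    for component in (LowercaseNormalizer(), StripSeniorityNormalizer()):
        print(type(component).__name__, run_pipeline(component, titles))
```

Because both classes share one method signature, the same pipeline function can run either, which is also what makes comparing components side by side straightforward.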
This project is licensed under the MIT License - see the LICENSE.md file for details.