Data Processing and Machine Learning methods for the Open Skills Project

skills-ml

Open Skills Project - Machine Learning

This library contains the methods used by the Open Skills API, including processing algorithms and utilities for computing our jobs and skills taxonomy.

Documentation

Hosted on GitHub Pages

Quick Start

1. Virtualenv

skills-ml depends on Python 3.6, so create a virtual environment using a Python 3.6 executable.

virtualenv venv -p /usr/bin/python3.6

Activate your virtualenv

source venv/bin/activate

2. Installation

pip install skills-ml

3. Import skills_ml

import skills_ml
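
As a quick sanity check after the three steps above, the following minimal sketch reports whether the active interpreter meets the Python 3.6 requirement and whether the `skills_ml` package can be found. The helper name `check_environment` is hypothetical and not part of skills-ml, and treating any version at or above 3.6 as acceptable is an assumption; the project only states that it depends on python3.6.

```python
import importlib.util
import sys

# From the Quick Start: skills-ml depends on python3.6.
# Accepting newer versions as well is an assumption made for this sketch.
REQUIRED_PYTHON = (3, 6)


def check_environment(package="skills_ml", required=REQUIRED_PYTHON):
    """Hypothetical helper: report whether interpreter and package look usable."""
    current = sys.version_info[:2]
    return {
        "python_version": current,
        "python_ok": current >= required,
        # find_spec returns None when a top-level package is not installed
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Running the script inside the activated virtualenv should show `package_installed: True` once `pip install skills-ml` has succeeded.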

skills-ml doesn't have a tutorial yet, but here are some useful places to start.

  • The examples directory contains a few examples that use individual components to perform specific tasks.
  • Check out the descriptions of different algorithm types in algorithms/ and look at any individual directories that match what you'd like to do (e.g. skill extraction, job title normalization)
  • skills-airflow is the open-source production system that uses skills-ml algorithms in an Airflow pipeline to generate open datasets

Building the Documentation

skills-ml uses a forked version of pydocmd, and a custom script to keep the pydocmd config file up to date. Here's how to keep the docs updated before you push:

$ cd docs
$ PYTHONPATH="../" python update_docs.py # this will update docs/pydocmd.yml with the package/module structure
$ pydocmd serve # will serve local documentation that you can check in your browser
$ pydocmd gh-deploy # will update the gh-pages branch

Structure

  • algorithms/ - Core algorithmic module. Each submodule is meant to contain a different type of component, such as a job title normalizer or a skill tagger, with a common interface so different pipelines can try out different versions of the components.
  • datasets/ - Wrappers for interfacing with different datasets, such as ONET and Urbanized Area.
  • evaluation/ - Code for testing different components against each other.
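
To illustrate the "common interface" idea described for algorithms/, here is a minimal sketch of how interchangeable components might look. Every name in it (`TitleNormalizer`, `LowercaseNormalizer`, `CollapseWhitespaceNormalizer`, `run_pipeline`) is hypothetical and chosen for illustration; none is taken from the skills_ml API, whose actual classes may be organized differently.

```python
from abc import ABC, abstractmethod


class TitleNormalizer(ABC):
    """Hypothetical common interface; not taken from skills_ml."""

    @abstractmethod
    def normalize(self, title: str) -> str:
        """Return a normalized form of a job title."""


class LowercaseNormalizer(TitleNormalizer):
    """One illustrative variant: lowercase only."""

    def normalize(self, title: str) -> str:
        return title.lower()


class CollapseWhitespaceNormalizer(TitleNormalizer):
    """Another illustrative variant: lowercase and collapse whitespace runs."""

    def normalize(self, title: str) -> str:
        return " ".join(title.split()).lower()


def run_pipeline(titles, normalizer: TitleNormalizer):
    """A pipeline depends only on the interface, so variants can be swapped."""
    return [normalizer.normalize(title) for title in titles]


if __name__ == "__main__":
    sample = ["  Senior   Engineer ", "Data Scientist"]
    for component in (LowercaseNormalizer(), CollapseWhitespaceNormalizer()):
        print(type(component).__name__, run_pipeline(sample, component))
```

Because the pipeline calls only `normalize`, two variants can be run on the same input and their outputs compared, which is the kind of side-by-side testing the evaluation/ entry describes.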

Contributors

License

This project is licensed under the MIT License - see the LICENSE.md file for details.