Hosted AutoML Export Transformer (Hamlet)

Produces subject set and classification exports suitable for AutoML.

Hamlet is a website that connects Zooniverse data from Panoptes with external data services, e.g. sending a camera trap project's animal-filled photos to an animal-identifying machine learning system.

AutoML Export

Subject Assistant

Hamlet has an export feature that ties into the Zooniverse Machine Learning Subject Assistant (app) (source), which lets project owners/researchers submit their camera trap photos to an external Machine Learning (ML) service, which in turn finds animals in those images.

User Story

The user story is as follows:

  • Users start at the Subject Assistant app.
  • Users are directed to Hamlet, where they choose a Subject Set to export to the external ML Service.
  • Hamlet performs the export and provides users with a link back to the Subject Assistant with an "ML Task ID" - e.g. https://subject-assistant.zooniverse.org/#/tasks/6378
  • Users click that link, and process the ML-tagged photos on the Subject Assistant app.

External Dependencies

The Subject Assistant requires the following external systems:

  • Machine Learning Service - in this case, powered by Microsoft.
  • an Azure Storage Container - works in conjunction with the ML Service, which requires "subject manifest" files to be stored on Azure.

As of late 2022, these services are maintained by the Zooniverse team.

Environmental Variables

The Subject Assistant feature requires the following ENV variables to be defined (an example .env sketch follows these lists):

  • SUBJECT_ASSISTANT_AZURE_ACCOUNT_NAME
  • SUBJECT_ASSISTANT_AZURE_ACCOUNT_KEY
  • SUBJECT_ASSISTANT_AZURE_CONTAINER_NAME
  • SUBJECT_ASSISTANT_ML_SERVICE_CALLER_ID - provided by our friends at Microsoft who run the ML Service.
  • SUBJECT_ASSISTANT_ML_SERVICE_URL - ditto

Optionally, the following ENV variables can be defined:

  • SUBJECT_ASSISTANT_EXTERNAL_URL - defaults to http://subject-assistant.zooniverse.org/#/tasks/
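
For local development, one convenient way to define these is a .env file in the repo root. The values below are purely placeholders, not real credentials or endpoints:

```
# Example .env - placeholder values only
SUBJECT_ASSISTANT_AZURE_ACCOUNT_NAME=examplestorageaccount
SUBJECT_ASSISTANT_AZURE_ACCOUNT_KEY=example-account-key
SUBJECT_ASSISTANT_AZURE_CONTAINER_NAME=subject-manifests
SUBJECT_ASSISTANT_ML_SERVICE_CALLER_ID=example-caller-id
SUBJECT_ASSISTANT_ML_SERVICE_URL=https://example-ml-service.example.org/
SUBJECT_ASSISTANT_EXTERNAL_URL=http://subject-assistant.zooniverse.org/#/tasks/
```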

Mechanics: Django Pages/Views

The ML Subject Assistant feature in Hamlet has two views (a URLconf sketch follows the list):

  • GET /subject-assistant/<int:project_id>/ - lists all the Subject Sets for a Project, along with their "ML export" status and (if the export is successful) a link back to the Subject Assistant app.
  • POST /subject-assistant/<int:project_id>/subject-sets/<int:subject_set_id>/ - performs the ML Export action for a given Subject Set, then redirects users back to the listing page.
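
A minimal sketch of how those two routes could be wired up in a Django URLconf - the view names and response bodies here are illustrative assumptions, not Hamlet's actual implementation:

```python
# Illustrative sketch only; Hamlet's real view and URL names may differ.
from django.http import HttpResponse
from django.shortcuts import redirect
from django.urls import path
from django.views.decorators.http import require_POST


def subject_assistant_list(request, project_id):
    # GET: list the project's Subject Sets, each with its "ML export" status and,
    # when an export succeeded, a link back to the Subject Assistant app.
    return HttpResponse(f"Subject Sets for project {project_id}")


@require_POST
def subject_assistant_export(request, project_id, subject_set_id):
    # POST: perform the ML Export action for this Subject Set, then send the
    # user back to the listing page.
    return redirect("subject-assistant-list", project_id=project_id)


urlpatterns = [
    path("subject-assistant/<int:project_id>/",
         subject_assistant_list, name="subject-assistant-list"),
    path("subject-assistant/<int:project_id>/subject-sets/<int:subject_set_id>/",
         subject_assistant_export, name="subject-assistant-export"),
]
```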

Mechanics: Database Model

The MLSubjectAssistantExport table has the following fields (a model sketch follows the list):

  • subject_set_id - the ID of the Zooniverse Subject Set that was exported to the external ML Service
  • json - the "subject manifest" file, in JSON format, created from all the Subjects of the Subject Set. The format is specific to the ML Service.
  • azure_url - the URL of the "subject manifest" file that was uploaded to an external Azure storage container. (See Mechanics: ML Export Action for why)
  • ml_task_uuid - the task request ID or "job ID" for the ML Export action. This is generated by the external ML Service.
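
Put together as a Django model, that might look roughly like this - the field types and options are assumptions inferred from the descriptions above, not Hamlet's actual schema:

```python
# models.py-style sketch; field types and options are assumptions, not Hamlet's schema.
from django.db import models


class MLSubjectAssistantExport(models.Model):
    # ID of the Zooniverse Subject Set that was exported to the external ML Service
    subject_set_id = models.IntegerField()

    # The "subject manifest", in the JSON format the ML Service expects
    json = models.JSONField()

    # URL of the manifest after it's uploaded to the external Azure storage container
    azure_url = models.URLField(max_length=2048, blank=True)

    # Task request ID ("job ID") returned by the external ML Service
    ml_task_uuid = models.UUIDField(null=True, blank=True)
```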

Mechanics: ML Export Action

Mechanically, the ML Subject Assistant's "export to Microsoft" action performs the following:

  1. get all the Subjects for a given Subject Set (pulling from Panoptes)
  2. create a JSON file - the "subject manifest" - that describes the Subjects to be exported, in a format specified by the external ML Service.
  3. upload the JSON file to an external Azure storage container (reason: the current external ML Service only reads subject manifest files from Azure), then create a "shareable URL" to that JSON file. (Clarification: Azure uses a SAS or Shared Access Signature tokens to create shareable URLs with limited lifespans.)
  4. Submit the shareable URL to the ML Service, and get the "job ID" it returns.

The Job ID plus the known Subject Assistant app URL is all that's required to construct a "return URL" for the user.
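Sketched end to end, the export could look something like the following. The azure-storage-blob and requests calls are real library APIs, but the function name, manifest shape, and the ML Service's request/response format are assumptions for illustration:

```python
# Illustrative sketch of the export flow; function and payload names are assumptions.
import json
from datetime import datetime, timedelta

import requests
from azure.storage.blob import BlobClient, BlobSasPermissions, generate_blob_sas

# In practice these would come from the SUBJECT_ASSISTANT_* environment variables.
ACCOUNT_NAME = "examplestorageaccount"
ACCOUNT_KEY = "example-account-key"
CONTAINER = "subject-manifests"
ML_SERVICE_URL = "https://example-ml-service.example.org/"


def export_subject_set(subject_set_id, subjects):
    # 1 + 2. Build the "subject manifest" from the Subject Set's Subjects.
    #        (The real manifest format is specified by the external ML Service.)
    manifest = json.dumps([{"id": s["id"], "url": s["url"]} for s in subjects])

    # 3. Upload the manifest to the Azure container, then mint a time-limited
    #    shareable URL using a SAS (Shared Access Signature) token.
    blob_name = f"subject_set_{subject_set_id}.json"
    blob = BlobClient(
        account_url=f"https://{ACCOUNT_NAME}.blob.core.windows.net",
        container_name=CONTAINER,
        blob_name=blob_name,
        credential=ACCOUNT_KEY,
    )
    blob.upload_blob(manifest, overwrite=True)
    sas_token = generate_blob_sas(
        account_name=ACCOUNT_NAME,
        container_name=CONTAINER,
        blob_name=blob_name,
        account_key=ACCOUNT_KEY,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(days=7),
    )
    shareable_url = f"{blob.url}?{sas_token}"

    # 4. Submit the shareable URL to the ML Service; the field name and response
    #    shape here are placeholders, not the service's documented API.
    response = requests.post(ML_SERVICE_URL, json={"manifest_url": shareable_url})
    response.raise_for_status()
    ml_task_uuid = response.json().get("task_id")
    return shareable_url, ml_task_uuid
```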

Development

Use docker & docker-compose to set up a development environment.

  1. Run docker-compose build to build the app container.
  2. Run the tests: docker-compose run -T --rm app pytest --cov=hamlet

Alternatively, you can use docker & docker-compose to run an interactive bash shell for development and testing:

  1. Run docker-compose run --service-ports --rm app bash to start the containers
  2. Run pytest --cov=hamlet to run the test suite in that shell (sadly this system has no tests :sadpanda:)
  3. Or ./start_server.sh to run the server (see Pipfile)

Troubleshooting

I can't log in on local development

Problem:

  • You're able to run docker-compose build ; docker-compose up, and you can view Hamlet in local development at http://localhost:8080
  • However, when you click on the "Login with Zooniverse" button and provide your details on the Panoptes login page, you still can't log in.

Analysis:

  • It's likely that your instance of Hamlet is missing the PANOPTES_APPLICATION_ID and PANOPTES_SECRET environment variables.
  • These env vars are required to tell Panoptes which OAuth application you're logging into.

Solution:

  • Go to Panoptes's OAuth applications list, find the Hamlet app, and copy the Application ID and Secret.
  • Add these to your local development Docker's environment variables, as PANOPTES_APPLICATION_ID and PANOPTES_SECRET
    • This can be done easily by creating a .env file in the root folder of your hamlet repo (example below).
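
For example (placeholder values only - copy the real ones from Panoptes's OAuth applications list):

```
# .env - placeholder values only
PANOPTES_APPLICATION_ID=replace-with-application-id
PANOPTES_SECRET=replace-with-secret
```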

Related issue: 479

The database won't start on local development

Problem:

  • When you run docker-compose build ; docker-compose up, you notice that the PostgreSQL database isn't running.
  • There will probably be a few error messages in the console: app_1 will continuously complain that it's trying (and failing) to find the PostgreSQL database, while postgres_1 might say something about "can't initialise due to incompatible database".

Analysis:

  • It's possible that your existing local PostgreSQL database (i.e. the /postgres_data folder) was built on an older version of PostgreSQL, and recent updates to Hamlet have upgraded the PostgreSQL version that Hamlet uses, causing an incompatibility.

Solution:

  • Check if you have an existing /postgres_data folder in your local hamlet repo.
  • If yes, delete it. The next time you start Hamlet, the database will be rebuilt with the latest version.

Useful application scripts

  • console: python manage.py shell
  • create_local_db: createdb -U hamlet -O hamlet hamlet
  • drop_local_db: dropdb -U hamlet hamlet
  • makemigrations: python manage.py makemigrations
  • migrate: python manage.py migrate
  • server: bash -e ./start_server.sh
  • tests: pytest --cov=hamlet
  • tree: bash -c 'find . | grep -v git | grep -v cache'
  • worker: bash -c ./start_worker.sh

Updating a package with Poetry

  • poetry update django

See Poetry docs for more details