Human-in-the-Loop #935

Merged
jeanqussa merged 47 commits into main from dev-human-in-the-loop
Sep 1, 2024

@ralf-berger (Collaborator) commented Jun 28, 2024

What is this PR trying to achieve?

This PR incorporates Human-in-the-Loop into the KG generation process. It also aims to greatly increase processing speed and reduce system requirements for generating and expanding knowledge graphs by using a database that contains structured data for every Wikipedia article.

How does this PR implement it?

It adds the following main features:

  • A data preprocessing step which downloads the Wikipedia dump, extracts structure from it, imports it into a PostgreSQL database, and uploads the dump to S3-compatible storage.
  • A PostgreSQL instance that downloads the SQL dump and initializes the database with it.
  • A split of the coursemapper-kg-worker-concept-map service into 3 services: coursemapper-kg-worker-concept-map, coursemapper-kg-worker-modify-graph, and coursemapper-kg-worker-expand-material.
  • The ability for course moderators to edit the knowledge graph before expansion and publishing.

@ralf-berger ralf-berger self-assigned this Jun 28, 2024
@ralf-berger ralf-berger force-pushed the dev-human-in-the-loop branch from c3affae to 925d090 Compare June 28, 2024 09:02
@ralf-berger (Collaborator, Author) commented Jul 1, 2024

  • Add a proper description of the PR's purpose, implementation, and effect on the project (@jeanqussa)
  • Update CI to build the new services (additional directory hierarchy level)
  • Create container image repositories on Docker Hub:
    • socialcomputing/coursemapper-webserver-coursemapper-kg-concept-map
    • socialcomputing/coursemapper-webserver-coursemapper-kg-preprocess
    • socialcomputing/coursemapper-webserver-coursemapper-kg-recommendation
    • socialcomputing/coursemapper-webserver-coursemapper-kg-wp-pg
  • Update K8s runtime configuration (according to Compose changes?)

@ralf-berger ralf-berger force-pushed the dev-human-in-the-loop branch from 3f4f93b to 8575665 Compare July 1, 2024 10:52
@ralf-berger ralf-berger changed the title Allow for development of a human in the loop Allow for development of a human in the loop (?) Jul 1, 2024
@ralf-berger (Collaborator, Author) commented

K8s config:

  • new component: coursemapper-webserver-coursemapper-kg-wp-pg
  • renamed: coursemapper-kg-worker-concept-map → coursemapper-kg-concept-map
  • renamed: coursemapper-kg-worker-recommendation → coursemapper-kg-recommendation
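
A minimal sketch of how the changes listed above might be applied to a list of component names when updating the runtime configuration. The helper `migrate_components` and the flat list representation are assumptions made for illustration; they are not part of this PR or of the actual K8s manifests.

```python
# Renames and additions taken from the list above.
RENAMED_COMPONENTS = {
    "coursemapper-kg-worker-concept-map": "coursemapper-kg-concept-map",
    "coursemapper-kg-worker-recommendation": "coursemapper-kg-recommendation",
}
NEW_COMPONENTS = ["coursemapper-webserver-coursemapper-kg-wp-pg"]


def migrate_components(current):
    """Hypothetical helper: apply the renames, then append any new components."""
    migrated = [RENAMED_COMPONENTS.get(name, name) for name in current]
    for name in NEW_COMPONENTS:
        if name not in migrated:
            migrated.append(name)
    return migrated
```

Components that are not mentioned in the list pass through unchanged, so the helper can be run against a full component inventory.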

@jeanqussa jeanqussa changed the title Allow for development of a human in the loop (?) Human-in-the-Loop Jul 5, 2024
@jeanqussa (Collaborator) commented

This PR is meant to incorporate Human-in-the-Loop in the KG generation process.

It adds the following main features:

  • A data preprocessing step which downloads the Wikipedia dump, extracts structure from it, imports it into a PostgreSQL database, and uploads the dump to S3-compatible storage.
  • A PostgreSQL instance that downloads the SQL dump and initializes the database with it.
  • A split of the coursemapper-kg-worker-concept-map service into 3 services: coursemapper-kg-worker-concept-map, coursemapper-kg-worker-modify-graph, and coursemapper-kg-worker-expand-material.
  • The ability for course moderators to edit the knowledge graph before expansion and publishing.

Next steps:

  • Properly test the new workflow
  • Deploy
    • New service memory limits:
      • coursemapper-kg-worker-concept-map: 6 GiB
      • coursemapper-kg-worker-modify-graph: 2 GiB
      • coursemapper-kg-worker-expand-material: 3 GiB
    • Add secrets to environment variables in coursemapper-webserver-coursemapper-kg-wp-pg:
      • ACCESS_KEY_ID
      • SECRET_ACCESS_KEY
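
The deployment requirements above can be captured in a small pre-deployment check. This is a sketch under assumptions: the limits and variable names come from the list above, while the functions `memory_limit_bytes` and `missing_secrets` are hypothetical helpers, not code from this PR.

```python
import os

GIB = 1024 ** 3  # bytes per GiB

# Memory limits (in GiB) as listed above.
MEMORY_LIMITS_GIB = {
    "coursemapper-kg-worker-concept-map": 6,
    "coursemapper-kg-worker-modify-graph": 2,
    "coursemapper-kg-worker-expand-material": 3,
}

# Secrets expected in the environment of coursemapper-webserver-coursemapper-kg-wp-pg.
REQUIRED_SECRETS = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")


def memory_limit_bytes(service):
    """Return the memory limit of a service in bytes (hypothetical helper)."""
    return MEMORY_LIMITS_GIB[service] * GIB


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]
```

Calling `missing_secrets()` without arguments checks the current process environment; passing a dictionary makes the check easy to test.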

@jeanqussa (Collaborator) commented

A new volume has been added, coursemapper-kg-wp-pg-meta.

@ralf-berger ralf-berger force-pushed the dev-human-in-the-loop branch from 9762036 to 78447b4 Compare July 24, 2024 09:37
@jeanqussa (Collaborator) commented Aug 28, 2024

@ralf-berger when would be a good time to merge this branch into main, considering the changes in the deployment that need to be made?

@ralf-berger (Collaborator, Author) commented

> @ralf-berger when would be a good time to merge this branch into main, considering the changes in the deployment that need to be made?

Should be ready, albeit completely untested. Feel free to go ahead with the merge.

@jeanqussa jeanqussa merged commit 9002765 into main Sep 1, 2024
@jeanqussa jeanqussa deleted the dev-human-in-the-loop branch September 1, 2024 19:21