Matching bioinformatics dataset for fun and profit

vseloved/biochat2

Biochat 2 - matching bioinformatics dataset for fun and profit

The project's initial concept is by Bohdan Khomtchouk:

a 24/7 artificial intelligence system that's using NLP techniques to pair, organize, and group together different biological datasets, such that you could query based on a set of keywords (e.g., "cancer", "leukemia", "mouse"), and it would return to you datasets that are most like each other and most deserving of being considered integratively (i.e., analyzing both or three together could unlock an interesting medical result that could not otherwise be found by analyzing just one dataset alone)

The plan for the first version is:

  • Scrape GEO, then extract the title, summary, etc. from each entry (entry example: https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS4303)
  • Store all this structured metadata in a database
  • Teach an algorithm to match similar datasets: for example, if the organism is "Mus musculus" (i.e., mouse), group those entries together (which is easy), but also recognize "leukemia" as a cancer type and group it with datasets on other cancer types
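
The last two steps of the plan can be sketched as follows. This is a minimal illustration in Python, not the project's actual Common Lisp implementation. The table layout, the function names (`build_db`, `group_by_organism`, `group_by_category`), the hand-written `TERM_CATEGORIES` lookup, and the toy records are all assumptions made for the example; the records are invented and are not real GEO entries.

```python
import sqlite3
from collections import defaultdict

# Invented toy records (accession, title, organism); NOT real GEO entries.
TOY_ENTRIES = [
    ("EX1", "Leukemia progression time course", "Mus musculus"),
    ("EX2", "Lymphoma drug response", "Mus musculus"),
    ("EX3", "Heat shock response", "Homo sapiens"),
]

# Assumed hand-written lookup from a disease term to a broader category.
# A real matcher would need a much richer vocabulary or a learned model.
TERM_CATEGORIES = {"leukemia": "cancer", "lymphoma": "cancer", "melanoma": "cancer"}


def build_db(entries):
    """Store the structured metadata in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasets (accession TEXT PRIMARY KEY, title TEXT, organism TEXT)"
    )
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?)", entries)
    conn.commit()
    return conn


def group_by_organism(conn):
    """The easy case: group accessions by an exact match on the organism field."""
    groups = defaultdict(list)
    query = "SELECT accession, organism FROM datasets ORDER BY accession"
    for accession, organism in conn.execute(query):
        groups[organism].append(accession)
    return dict(groups)


def group_by_category(conn):
    """The harder case: map title words to a broader category via the lookup."""
    groups = defaultdict(list)
    query = "SELECT accession, title FROM datasets ORDER BY accession"
    for accession, title in conn.execute(query):
        words = {w.strip(".,;:()").lower() for w in title.split()}
        categories = {TERM_CATEGORIES[w] for w in words if w in TERM_CATEGORIES}
        for category in sorted(categories):
            groups[category].append(accession)
    return dict(groups)


conn = build_db(TOY_ENTRIES)
print(group_by_organism(conn))
print(group_by_category(conn))
```

Grouping by organism is a plain equality match on a structured field, while grouping by cancer type needs outside knowledge that "leukemia" and "lymphoma" belong to the same category; here that knowledge is a fixed dictionary purely for illustration.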

Stages of development of the matching algorithm:

Installation

In addition to having Quicklisp installed, you'll need to clone crawlik into ~/common-lisp/.
