Biochat 2 - matching bioinformatics datasets for fun and profit

The project's initial concept is by Bohdan Khomtchouk:

a 24/7 artificial intelligence system that's using NLP techniques to pair, organize, and group together different biological datasets, such that you could query based on a set of keywords (e.g., "cancer", "leukemia", "mouse"), and it would return to you datasets that are most like each other and most deserving of being considered integratively (i.e., analyzing both or three together could unlock an interesting medical result that could not otherwise be found by analyzing just one dataset alone)

The plan for the first version is:

Scrape GEO, then extract the title, summary, etc. from each entry (entry example: https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS4303)
Store all this structured metadata in a database
Teach an algorithm to group similar entries: grouping entries with the same organism (e.g., "Mus musculus", i.e. mouse) is easy, but the algorithm should also be able to spot "leukemia" as a cancer type and group it together with other cancer types
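
The plan above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code (the project itself is written in Common Lisp). The records, the table layout, and the `CATEGORY_OF` mapping are all hypothetical placeholders: real entries would come from the GEO scrape, and recognizing "leukemia" as a cancer type would need something better than a hand-written dictionary.

```python
import sqlite3
from collections import defaultdict

# Hypothetical records standing in for scraped GEO entries (not real data).
RECORDS = [
    {"acc": "EX1", "title": "Leukemia cell line expression", "organism": "Mus musculus"},
    {"acc": "EX2", "title": "Melanoma progression time course", "organism": "Homo sapiens"},
    {"acc": "EX3", "title": "Liver development", "organism": "Mus musculus"},
]

# Hand-written, illustrative term-to-category map (an assumption, not a real ontology).
CATEGORY_OF = {"leukemia": "cancer", "melanoma": "cancer"}


def store(conn, records):
    """Store the structured metadata in a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(acc TEXT PRIMARY KEY, title TEXT, organism TEXT)"
    )
    conn.executemany(
        "INSERT INTO entries VALUES (:acc, :title, :organism)", records
    )
    conn.commit()


def group_by_organism(conn):
    """The easy case: group entries whose organism field is identical."""
    groups = defaultdict(list)
    for acc, organism in conn.execute(
        "SELECT acc, organism FROM entries ORDER BY acc"
    ):
        groups[organism].append(acc)
    return dict(groups)


def group_by_category(conn):
    """The harder case: map a specific term (e.g. leukemia) to a broader category."""
    groups = defaultdict(list)
    for acc, title in conn.execute(
        "SELECT acc, title FROM entries ORDER BY acc"
    ):
        for word in title.lower().split():
            category = CATEGORY_OF.get(word)
            if category:
                groups[category].append(acc)
                break  # one category per entry is enough for this sketch
    return dict(groups)


conn = sqlite3.connect(":memory:")
store(conn, RECORDS)
```

Calling `group_by_organism(conn)` puts the two mouse entries in one group, while `group_by_category(conn)` puts the leukemia and melanoma entries together under "cancer" even though their organisms differ.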

Stages of the development of matching algorithm:

Direct matching on a per-word or per-phrase basis
Similarity matching using vector space modeling, with word vectors from https://github.com/cambridgeltl/BioNLP-2016 and the doc2vec approach
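
To make the difference between the two stages concrete, here is a small Python sketch under stated assumptions. Direct matching is approximated by Jaccard overlap of word sets, and vector space matching by cosine similarity of averaged word vectors. Averaging is a simpler stand-in for doc2vec, not doc2vec itself, and the three-dimensional `TOY_VECTORS` are made-up values, not the BioNLP-2016 vectors.

```python
import math


def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def direct_match(a, b):
    """Stage 1: per-word matching, scored as Jaccard overlap of the word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Made-up toy vectors for illustration only (real ones would be loaded from a file).
TOY_VECTORS = {
    "leukemia": [0.9, 0.1, 0.0],
    "lymphoma": [0.8, 0.2, 0.0],
    "liver": [0.0, 0.1, 0.9],
}


def doc_vector(text):
    """Average the vectors of known words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def vector_match(a, b):
    """Stage 2: similarity in vector space, 0.0 when either text has no known words."""
    va, vb = doc_vector(a), doc_vector(b)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)
```

With these toy values, `direct_match("leukemia", "lymphoma")` is 0.0 because the two texts share no words, whereas `vector_match("leukemia", "lymphoma")` is higher than `vector_match("leukemia", "liver")`. That is the kind of relatedness the second stage is meant to capture.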

Installation

In addition to Quicklisp, you'll need to clone crawlik into ~/common-lisp/.
