Advanced NLP methods use the RoBERTa model to extract features from medical notes and classify them into five major medical specialties:
- Neurology 🧠
- Urology 🧎‍♂️
- Radiology 📹
- Orthopedic 🦴
- Gastroenterology 🚽
A basic Natural Language Processing (NLP) pipeline for preparing text data covers essential steps: text cleaning, tokenization, normalization, stopword removal, POS tagging, feature extraction, and data splitting. 📋
The NLP pipeline consists of the following steps:
- 🧹 Text Cleaning: Remove unnecessary characters, symbols, and punctuation marks from the raw text data.
- ✂️ Tokenization: Split the cleaned text into individual tokens or words.
- 📏 Normalization: Standardize the tokens by reducing them to a common format using techniques like stemming or lemmatization.
- 🚫 Stopword Removal: Eliminate common stopwords from the normalized text to reduce noise.
- 🏷️ POS Tagging: Assign grammatical tags to each token, indicating its part-of-speech category.
- 💡 Feature Extraction: Convert the tagged text into numerical feature vectors, such as BOW, TF-IDF, or word embeddings.
- ⚖️ Data Splitting: Divide the preprocessed text data into training, validation, and testing sets for model training and evaluation (see the sketch after this list).
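Below is a minimal sketch of this pipeline using NLTK and scikit-learn. The example notes, labels, and resource names are illustrative assumptions, not the repository's actual data or code:

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# One-time downloads of the NLTK resources used below
for resource in ("punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"):
    nltk.download(resource, quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # 1. Text cleaning: keep letters only and lowercase everything
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())
    # 2. Tokenization: split the cleaned text into word tokens
    tokens = word_tokenize(text)
    # 3. Normalization: lemmatize each token to its base form
    tokens = [lemmatizer.lemmatize(t) for t in tokens]
    # 4. Stopword removal: drop common, low-information words
    tokens = [t for t in tokens if t not in stop_words]
    # 5. POS tagging: attach a part-of-speech tag to each token
    return nltk.pos_tag(tokens)

# Hypothetical notes and labels, purely for illustration
notes = [
    "Patient reports chronic lower back pain radiating to the left leg.",
    "MRI of the brain shows no acute intracranial abnormality.",
    "Colonoscopy revealed mild diverticulosis in the sigmoid colon.",
    "Ultrasound demonstrates a 4 mm stone in the left ureter.",
    "EEG findings are consistent with focal epileptiform activity.",
]
labels = ["Orthopedic", "Radiology", "Gastroenterology", "Urology", "Neurology"]

# 6. Feature extraction: TF-IDF vectors over the lemmatized tokens
docs = [" ".join(tok for tok, tag in preprocess(n)) for n in notes]
X = TfidfVectorizer().fit_transform(docs)

# 7. Data splitting: hold out a test set (a validation set can be
#    carved out of the training portion the same way)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)
```

Note that when fine-tuning RoBERTa directly (as described below), the heavy preprocessing steps are usually skipped in favor of the model's own subword tokenizer; the pipeline above is most useful for classical feature-based baselines such as TF-IDF.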
The RoBERTa Model repository provides an introduction to the RoBERTa model, a pre-trained language model based on the BERT architecture. It offers improvements in training methodology and achieves state-of-the-art performance on various NLP tasks. The repository includes examples of fine-tuning RoBERTa for specific applications. 📚
- Enhanced language representation compared to BERT.
- Achieves state-of-the-art performance on various NLP tasks.
- Supports fine-tuning for specific applications (see the sketch below).
- Robust and optimized for diverse natural language processing tasks.
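The following is a minimal fine-tuning sketch using the Hugging Face `transformers` library. The `roberta-base` checkpoint, the single training example, and the hyperparameters are assumptions for illustration; the repository's actual fine-tuning setup may differ:

```python
import torch
from transformers import (RobertaTokenizerFast,
                          RobertaForSequenceClassification,
                          Trainer, TrainingArguments)

# The five target specialties from this project
SPECIALTIES = ["Neurology", "Urology", "Radiology",
               "Orthopedic", "Gastroenterology"]

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(SPECIALTIES))

class NotesDataset(torch.utils.data.Dataset):
    """Wraps tokenized notes and integer labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=512)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Hypothetical training example; the real notes come from the dataset
train_ds = NotesDataset(
    ["MRI of the lumbar spine shows disc herniation at L4-L5."],
    [SPECIALTIES.index("Radiology")])

args = TrainingArguments(output_dir="roberta-specialty",
                         num_train_epochs=3,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```

The `Trainer` API handles batching, optimization, and checkpointing, which keeps the fine-tuning loop short; a manual PyTorch training loop would work equally well.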
1. Clinical notes are a crucial source of information in the healthcare domain.
2. They contain valuable insights for medical diagnosis, treatment, and research.
3. By training the RoBERTa model on clinical notes, it learns to understand the nuances of medical language and effectively extract relevant information.
4. The RoBERTa model's large-scale pre-training enables it to capture the complexities of clinical text, leading to improved performance in medical NLP tasks (an inference sketch follows below). 💡
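As a hedged illustration of how the fine-tuned model would be used, the snippet below reuses the `tokenizer`, `model`, and `SPECIALTIES` names from the fine-tuning sketch above to classify a single note; the label in the comment is only an expectation, not a guaranteed output:

```python
import torch

def classify_note(text):
    """Predict the specialty of one clinical note with the fine-tuned model."""
    inputs = tokenizer(text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
    # Pick the specialty with the highest score
    return SPECIALTIES[logits.argmax(dim=-1).item()]

print(classify_note("Patient presents with recurrent kidney stones "
                    "and urinary retention."))  # expected: "Urology"
```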