🏔️Mountain Named Entity Recognition (NER) Project

This project is a Named Entity Recognition (NER) system that identifies mountain names in text using a custom-trained model based on the BERT architecture.

📑Table of Contents

📖Project Overview
🛠️Installation
📊Dataset Creation
🤖Model
🌐Hugging Face
🚀Usage
📁Files
🤝 Contributing

📖Project Overview

This project recognizes mountain names in free text. It uses the Hugging Face transformers library to train, evaluate, and run inference with a BERT-based model. The model can highlight mountain names in text, for example to automatically label geographic documents or to enrich content with additional semantic information.

✨ Key Features:

Train a BERT model for NER.
Evaluate the model with a confusion matrix, precision-recall curves, and ROC curves.
Perform inference on custom texts to identify mountain names.
Highlight identified mountain names in the output.
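The evaluation metrics listed above can be illustrated with a small, dependency-free sketch. It assumes binary token labels (1 = mountain, 0 = other, as described under Dataset Creation) and skips positions marked -100, a common convention for ignored subword or special tokens that may or may not match what demo.ipynb does. A confusion matrix gives precision and recall at one decision threshold; sweeping a threshold over the predicted mountain probability, as the hypothetical `pr_points` helper does, traces out a precision-recall curve (an ROC curve is built the same way from true-positive and false-positive rates).

```python
# Minimal sketch of token-level evaluation for binary mountain/non-mountain labels.
# Assumption: 1 = mountain token, 0 = other; positions labeled -100 are ignored.

def confusion_matrix(y_true, y_pred, ignore_label=-100):
    """Return (tn, fp, fn, tp) counts, skipping ignored positions."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:
            continue
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the mountain class (0.0 when undefined)."""
    _, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def pr_points(y_true, probs, thresholds):
    """Precision and recall at each threshold on P(mountain); sweeping traces a PR curve."""
    points = []
    for threshold in thresholds:
        y_pred = [1 if p >= threshold else 0 for p in probs]
        precision, recall, _ = precision_recall_f1(y_true, y_pred)
        points.append((threshold, precision, recall))
    return points


if __name__ == "__main__":
    # Hand-written labels and predictions, not results from this project.
    y_true = [0, 1, 1, 0, 0, 1, -100, 0]
    y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
    print(confusion_matrix(y_true, y_pred))
    print(precision_recall_f1(y_true, y_pred))
    print(pr_points([0, 1, 1, 0], [0.1, 0.9, 0.4, 0.6], [0.3, 0.5, 0.7]))
```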

🛠️Installation

To run this project, you need Python 3.7+ and the required dependencies listed in the requirements.txt file.

Clone the repository:

git clone https://github.com/sofibrezden/Named-Entity-Recognition-Mountains.git
cd Named-Entity-Recognition-Mountains

Create a virtual environment:

python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

📊Dataset Creation

The dataset was created by:

Relabeling tokens: Tokens belonging to mountain names were labeled 1, and all other tokens were labeled 0.
Dataset reduction: Kept every sample containing at least one mountain entity, plus a small fraction of samples without mountains to reduce class imbalance.
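The two dataset steps can be sketched in plain Python. The original tag names and the fraction of non-mountain samples that was kept are not given here, so `MOUNTAIN_TAGS`, `negative_fraction`, and the sample layout below are hypothetical placeholders; dataset_processing.ipynb remains the reference for what was actually done.

```python
import random

# Hypothetical tag names: the source dataset's label scheme is not documented here.
MOUNTAIN_TAGS = {"B-MOUNTAIN", "I-MOUNTAIN"}


def to_binary_labels(tags):
    """Step 1: map mountain-related tags to 1 and every other tag to 0."""
    return [1 if tag in MOUNTAIN_TAGS else 0 for tag in tags]


def reduce_dataset(samples, negative_fraction=0.1, seed=42):
    """Step 2: keep every sample with a mountain token plus a random share of the rest.

    `samples` is a list of dicts with "tokens" and binary "labels".
    `negative_fraction` is an illustrative default, not the project's value.
    """
    rng = random.Random(seed)  # seeded so the reduction is reproducible
    positives = [s for s in samples if 1 in s["labels"]]
    negatives = [s for s in samples if 1 not in s["labels"]]
    k = round(len(negatives) * negative_fraction)
    return positives + rng.sample(negatives, k)


if __name__ == "__main__":
    raw = [
        {"tokens": ["Mount", "Everest", "is", "high"], "tags": ["B-MOUNTAIN", "I-MOUNTAIN", "O", "O"]},
        {"tokens": ["Paris", "is", "nice"], "tags": ["B-LOC", "O", "O"]},
        {"tokens": ["I", "like", "tea"], "tags": ["O", "O", "O"]},
    ]
    samples = [{"tokens": r["tokens"], "labels": to_binary_labels(r["tags"])} for r in raw]
    for s in reduce_dataset(samples, negative_fraction=0.5):
        print(s["tokens"], s["labels"])
```

Seeding the random generator keeps the reduced dataset identical across runs, which makes training results easier to compare.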

🤖Model

The model used for this task is bert-base-cased, fine-tuned for Named Entity Recognition (NER) to detect mountain names. It was trained with the following configuration:

Number of epochs: 5
Optimizer: Adam with a learning rate of 2e-5
Batch size: 8 (with gradient accumulation)
Early stopping: Applied with a patience of 3 epochs
Loss function: Token-level cross-entropy (token classification)
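The interaction between the 5-epoch budget and an early-stopping patience of 3 can be checked with a small sketch. It assumes the model is evaluated once per epoch and that a lower validation loss counts as an improvement; the helper `epochs_run`, its `min_delta` parameter, and the example losses are hypothetical. Under these assumptions, early stopping only shortens training when the best epoch is the first one: three non-improving epochs then end training after epoch 4, while any later best epoch leaves too few remaining epochs to exhaust the patience.

```python
def epochs_run(val_losses, max_epochs=5, patience=3, min_delta=0.0):
    """Return the epoch after which training stops.

    Stops once the validation loss has failed to improve on the best value so far
    (by more than `min_delta`) for `patience` consecutive epochs; otherwise runs
    until `max_epochs` or until the losses run out.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # early stop
    return min(len(val_losses), max_epochs)


if __name__ == "__main__":
    # Hypothetical validation losses, not results from this project.
    print(epochs_run([0.30, 0.31, 0.32, 0.33, 0.29]))  # best at epoch 1 -> stops after epoch 4
    print(epochs_run([0.50, 0.40, 0.35, 0.36, 0.37]))  # best at epoch 3 -> full 5 epochs
```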

🌐Hugging Face

The trained model and tokenizer are available on Hugging Face: 👉 sofibrezden/ner-model-mountains

🚀Usage

This project offers two main functionalities: model training and inference.

1. Model Training: You can train the model using the model_training.py script. This script loads the dataset, fine-tunes the BERT model for NER, and saves the trained model.

Training Command:

python model_training.py --model-path ./saved_model --epochs 5

2. Model Inference: Once the model is trained, you can use it for inference with the model_inference.py script. This script loads the trained model from the specified directory and performs NER on the input text.

Inference Command:

python model_inference.py --model-path ./saved_model --input-text "I love the Rocky Mountains and Mount Everest."

3. Model Download from Hugging Face: Alternatively, if you don't want to train the model, you can download the pre-trained model directly from Hugging Face and load it with:

from transformers import AutoModelForTokenClassification, AutoTokenizer

model = AutoModelForTokenClassification.from_pretrained('sofibrezden/ner-model-mountains')
tokenizer = AutoTokenizer.from_pretrained('sofibrezden/ner-model-mountains')
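The inference script's highlighting is described only as marking mountain names in the output. One simple way to do it, assuming the model's predictions have already been mapped back to one binary label per word, is to wrap each run of consecutive mountain-labeled words in markers. The functions `mountain_spans` and `highlight`, the bracket markers, and the hand-written labels below (standing in for real model predictions) are all hypothetical.

```python
def mountain_spans(words, labels):
    """Group consecutive words labeled 1 into mountain-name spans."""
    spans, current = [], []
    for word, label in zip(words, labels):
        if label == 1:
            current.append(word)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def highlight(words, labels, open_mark="[", close_mark="]"):
    """Wrap each run of mountain-labeled words in open/close markers."""
    pieces = []
    inside = False
    for word, label in zip(words, labels):
        if label == 1 and not inside:
            word = open_mark + word  # a new span starts here
            inside = True
        elif label != 1 and inside:
            pieces[-1] += close_mark  # the previous word ended the span
            inside = False
        pieces.append(word)
    if inside:
        pieces[-1] += close_mark  # close a span that runs to the end
    return " ".join(pieces)


if __name__ == "__main__":
    words = ["I", "love", "the", "Rocky", "Mountains", "and", "Mount", "Everest", "."]
    labels = [0, 0, 0, 1, 1, 0, 1, 1, 0]  # hand-written stand-in for predictions
    print(mountain_spans(words, labels))
    print(highlight(words, labels))
```

Grouping consecutive words means multi-word names such as "Rocky Mountains" come out as a single span rather than two separate hits.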

📁Files

dataset_processing.ipynb: This notebook contains the preprocessing steps for the dataset used in model training. It involves filtering mountain-related tokens, labeling them, and reducing the dataset by keeping all samples with mountain tokens while retaining a small fraction of other samples. The processed dataset is then saved and used in the model training pipeline.
model_training.py: This script handles model training. It loads the dataset, preprocesses the data, and fine-tunes the BERT model for Named Entity Recognition (NER). The trained model is saved in the specified directory.
model_inference.py: Script for performing inference with the trained model. It loads the saved model and tokenizer, processes custom text inputs, and highlights mountain names in the output.
demo.ipynb: Jupyter Notebook that demonstrates the evaluation of the pre-trained model from Hugging Face. It includes model evaluation, inference on test data, and visualizations like the confusion matrix, ROC curve, and precision-recall curve.
data/: This directory stores the training, validation, and test datasets used for model training.
improvements_report.pdf: This file outlines potential improvements for the project.
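model_training.py preprocesses the data before fine-tuning. One step that BERT-based NER generally needs is aligning word-level labels with WordPiece subword tokens, because bert-base-cased may split a mountain name into several pieces. The sketch below shows a common approach, not necessarily the one used in this project: the first subword of each word keeps the word's label, while special tokens and (by default) continuation subwords get -100, the target value PyTorch's `CrossEntropyLoss` ignores by default. With a Hugging Face fast tokenizer, the `word_ids` input corresponds to `encoding.word_ids()` for `encoding = tokenizer(words, is_split_into_words=True)`; the helper name `align_labels` and the three-piece split of "Kilimanjaro" in the example are hypothetical.

```python
IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips targets equal to -100 by default


def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Expand word-level labels to subword tokens.

    word_ids: one entry per subword token giving the index of its source word,
              or None for special tokens such as [CLS] and [SEP].
    word_labels: one binary label per word (1 = mountain, 0 = other).
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)  # special token
        elif wid != previous:
            aligned.append(word_labels[wid])  # first subword of a word
        else:
            # continuation subword: either ignored or given the word's label
            aligned.append(word_labels[wid] if label_all_subwords else IGNORE_INDEX)
        previous = wid
    return aligned


if __name__ == "__main__":
    # words: ["I", "climbed", "Kilimanjaro"], with "Kilimanjaro" assumed split into 3 pieces
    word_ids = [None, 0, 1, 2, 2, 2, None]
    print(align_labels(word_ids, [0, 0, 1]))
    print(align_labels(word_ids, [0, 0, 1], label_all_subwords=True))
```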

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
