wilrev/MinimizingHumanIntervention

Code for Minimizing Human Intervention in Online Classification

This repository contains the Python code used for the experiments and figures in the paper.

Folder Structure

➤ code for illustrative figures/ contains the code used to generate Figures 1, 2, 3, and 7 of the paper.
➤ synthetic experiment/ contains a main file (synthetic_experiment.py) that outputs a .pkl file and an auxiliary file (latex_table_from_pkl.py) that generates a LaTeX table from the .pkl file, summarizing the results of the experiments.
➤ realdata experiment/ contains the .py files needed to run the experiments on real-world datasets. encode/ contains the code that converts the dataset queries into embeddings for each text embedding model used. GHC_all_distances.py contains an implementation of the GHC algorithm with multiple distances. faiss.py contains an implementation of the GHC algorithm using the FAISS library, which significantly speeds up computation. run_CC.py and run_SKM.py contain implementations of the Center-Based Classifier and Sequential k-Means algorithms, respectively.
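
For readers unfamiliar with the Sequential k-Means baseline mentioned above, the following is a minimal sketch of the textbook online update (each incoming point moves its nearest center toward itself by a running-mean step). It is not taken from run_SKM.py; the class name `SequentialKMeans`, the helper `squared_distance`, and the choice to count each initial center as one observation are assumptions made for illustration.

```python
from typing import List, Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class SequentialKMeans:
    """Textbook online k-means: one centroid update per incoming point."""

    def __init__(self, initial_centers: Sequence[Sequence[float]]) -> None:
        self.centers: List[List[float]] = [list(c) for c in initial_centers]
        # Assumption: each initial center counts as one observed point.
        self.counts: List[int] = [1] * len(self.centers)

    def update(self, x: Sequence[float]) -> int:
        """Assign x to its nearest center, move that center, return its index."""
        k = min(
            range(len(self.centers)),
            key=lambda i: squared_distance(x, self.centers[i]),
        )
        self.counts[k] += 1
        step = 1.0 / self.counts[k]
        # Running-mean update: c <- c + (x - c) / n
        self.centers[k] = [c + step * (xi - c) for c, xi in zip(self.centers[k], x)]
        return k
```

Calling `update` once per embedded query processes a stream in a single pass; the returned index is the cluster the query was assigned to.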

Requirements

Datasets

The datasets are not directly provided in this repository due to their size. They are available at the following anonymous Google Drive link:
https://drive.google.com/drive/folders/1K2A_8CkU6fXjy9dsDotJi8gJYrt2viNo?usp=sharing
There, the *_embeddings/ folders contain the datasets used for evaluation and the computed embeddings of the queries.

Libraries

Install the necessary libraries before running the real data experiments:

pip install torch torchvision transformers datasets faiss-cpu
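
Before launching the experiments, it can help to confirm that the packages above are importable. The sketch below is not part of the repository: the helper name `missing_modules` is hypothetical, and the list assumes the usual import names (in particular `faiss` for the `faiss-cpu` package).

```python
import importlib.util
from typing import Iterable, List

# Import names assumed for the packages installed above.
REQUIRED = ["torch", "torchvision", "transformers", "datasets", "faiss"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If this check is run from a directory that contains a file named faiss.py, Python may resolve `faiss` to that local file rather than to the installed library, so running it from another directory gives a more reliable answer.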

Run with FAISS:

python faiss.py
Run with custom distance:
python GHC_all_distances.py --distance_type [your_choice]
Replace [your_choice] with one of the supported distance types (NEARESTQUERY, EUCLIDEAN, SPHERICAL).
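
To illustrate how a distance choice can change which center a query is matched to, here is a small sketch. It is not the code in GHC_all_distances.py: the function names and the `DISTANCES` table are hypothetical, reading SPHERICAL as the angle between two vectors is an assumption, and NEARESTQUERY is left out because this README does not define it.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def spherical(u: Vector, v: Vector) -> float:
    """Angle in radians between two nonzero vectors (assumed meaning of SPHERICAL)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical lookup keyed by the names accepted by --distance_type.
DISTANCES: Dict[str, Callable[[Vector, Vector], float]] = {
    "EUCLIDEAN": euclidean,
    "SPHERICAL": spherical,
}


def nearest_index(query: Vector, centers: Sequence[Vector], distance_type: str) -> int:
    """Index of the center closest to the query under the chosen distance."""
    dist = DISTANCES[distance_type]
    return min(range(len(centers)), key=lambda i: dist(query, centers[i]))
```

With a query `[3.0, 3.0]` and centers `[[1.0, 0.0], [10.0, 10.0]]`, the Euclidean choice picks the first center while the angular one picks the second, since the query points in the same direction as `[10.0, 10.0]`.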

License

MIT License.
