View WHaverals's full-sized avatar

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 6,087 488 Updated Jul 11, 2024

An e-paper dashboard for a Raspberry Pi Zero W.

Python 53 3 Updated Oct 27, 2024

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 23,185 2,308 Updated Mar 7, 2025

Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 6,082 521 Updated Mar 7, 2025

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python 2,128 127 Updated Dec 24, 2024

Interpretability for sequence generation models 🐛 🔍

Python 406 36 Updated Nov 10, 2024

Detect and align similar passages

Python 98 15 Updated Feb 3, 2025

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

JavaScript 187 24 Updated Feb 5, 2025

Detect text reuse and document similarity

R 199 34 Updated Feb 14, 2025

Collection of tutorials for DeezyMatch (https://github.com/Living-with-machines/DeezyMatch)

Jupyter Notebook 7 Updated Oct 16, 2024

A big list of homoglyphs and some code to detect them

JavaScript 572 69 Updated Aug 22, 2024

Python character encoding detector

Python 2,231 262 Updated Jan 13, 2025

Python tools for interacting with Wikidata

Python 152 18 Updated Oct 28, 2023

VIAF via Python

Python 10 3 Updated Apr 24, 2024
Python 56 19 Updated Oct 15, 2024

Fixes mojibake and other glitches in Unicode text, after the fact.

Python 3,869 121 Updated Oct 30, 2024

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Python 2,654 297 Updated Jun 4, 2024

Official code and data repository for our EMNLP 2020 long paper "Reformulating Unsupervised Style Transfer as Paraphrase Generation" (https://arxiv.org/abs/2010.05700).

HTML 235 46 Updated Jun 13, 2022

Paper List for Style Transfer in Text

1,620 194 Updated Mar 16, 2023

Everything you need to build state-of-the-art foundation models, end-to-end.

Python 7,653 543 Updated Mar 7, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 61 6 Updated Oct 10, 2024

Letta (formerly MemGPT) is a framework for creating LLM services with memory.

Python 14,894 1,586 Updated Mar 6, 2025

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 47,692 5,072 Updated Jan 22, 2025

NeuroHack 2025 wintersession hackathon

2 Updated Jan 12, 2025

Greatest Hits Versus Deep Cuts: Exploring Variety in Set-lists Across Artists and Musical Genres

1 Updated Dec 6, 2024

Language-agnostic BERT Sentence Embedding (LaBSE)

Python 147 10 Updated Sep 10, 2020

A neural word aligner based on multilingual BERT

Python 339 50 Updated Mar 10, 2022

Multilingual sentence alignment using sentence embeddings

Python 109 52 Updated Nov 4, 2024

Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.

Python 4,599 366 Updated Mar 3, 2025