Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

Python 389 79 Updated Aug 10, 2024

This repository contains various advanced techniques for Retrieval-Augmented Generation (RAG) systems.

Jupyter Notebook 1,824 196 Updated Feb 17, 2025

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Python 19,890 2,497 Updated Mar 30, 2025

This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.

Python 1,203 184 Updated Mar 28, 2025

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Python 1,567 241 Updated Mar 25, 2025

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python 5,153 488 Updated Aug 15, 2024

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Python 294,979 49,046 Updated Dec 2, 2024

Build resilient language agents as graphs.

Python 10,825 1,805 Updated Mar 29, 2025

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 16,440 1,154 Updated Mar 14, 2025

Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥

Python 36,149 2,789 Updated Mar 27, 2025

Knowledge Agents and Management in the Cloud

Python 3,832 376 Updated Mar 29, 2025

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 9,362 604 Updated Feb 21, 2025

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

HTML 10,704 886 Updated Mar 29, 2025

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…

Python 2,544 278 Updated Jun 24, 2024

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

TypeScript 47,139 4,356 Updated Mar 29, 2025

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go 33,704 3,123 Updated Mar 29, 2025

One minute of voice data is enough to train a good TTS model! (few-shot voice cloning)

Python 43,362 4,822 Updated Mar 26, 2025

Grok open release

Python 50,250 8,360 Updated Aug 30, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 11,936 1,055 Updated Mar 30, 2025

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 9,889 908 Updated Mar 29, 2025

Yuan 2.0 Large Language Model

Python 686 87 Updated Jul 11, 2024

Modular Python framework for AI agents and workflows with chain-of-thought reasoning, tools, and memory.

Python 2,237 188 Updated Mar 28, 2025

🐢 Open-Source Evaluation & Testing for AI & LLM systems

Python 4,404 305 Updated Mar 27, 2025

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python 59,556 6,028 Updated Aug 24, 2024

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Python 8,389 832 Updated Mar 28, 2025

Salesforce open-source LLMs with 8k sequence length.

Python 716 39 Updated Jan 31, 2025

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …

Go 12,938 909 Updated Mar 30, 2025

Chinese legal LLaMA (LLaMA for the Chinese legal domain)

Python 921 126 Updated Aug 28, 2024