A Step-by-Step Guide to Scraping eBay Product Data

4 1 Updated Jan 3, 2025

Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do

Jupyter Notebook 4,536 2,518 Updated Jun 1, 2024

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024

Python 1,887 171 Updated Mar 27, 2025

Perforator is a cluster-wide continuous profiling tool designed for large data centers

C++ 3,097 138 Updated Mar 27, 2025

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Wo…

Python 5,446 368 Updated Mar 27, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 7,329 639 Updated Feb 10, 2025

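An inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope.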

Python 252 20 Updated Jan 10, 2025

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Python 29,193 2,303 Updated Mar 27, 2025

A curated list of awesome packages, articles, and other cool resources from the Scrapy community.

544 63 Updated Dec 28, 2022

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

TypeScript 8,399 656 Updated Mar 27, 2025

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript 33,164 2,856 Updated Mar 27, 2025

Python scraper based on AI

Python 18,824 1,594 Updated Mar 27, 2025

Lightweight and extensible compatibility layer between dataframe libraries!

Python 897 134 Updated Mar 27, 2025

Machine Learning Natural Language Processing analysis of earnings call transcripts for logistic regression classification to make 'buy', 'sell' or 'hold' calls on stocks.

Jupyter Notebook 55 22 Updated May 9, 2023

Collection of publicly available IPTV channels from all over the world

JavaScript 91,331 3,273 Updated Mar 27, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 38,876 4,885 Updated Aug 16, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 37,308 4,418 Updated Aug 19, 2024

A course on aligning smol models.

Jupyter Notebook 5,654 1,974 Updated Jan 24, 2025

NVIDIA AI Blueprint for multimodal PDF data extraction for enterprise RAG

316 41 Updated Mar 24, 2025

Create fast graph language models from converted PDF documents for knowledge extraction and Q&A.

C++ 48 8 Updated Jan 27, 2025

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python 2,141 127 Updated Dec 24, 2024

A Unified Toolkit for Deep Learning-Based Table Extraction

Python 33 2 Updated Nov 21, 2024

A PyTorch implementation of DTrOCR: Decoder-only Transformer for Optical Character Recognition

Python 143 17 Updated Mar 22, 2025

2nd-place solution of ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

Python 454 108 Updated Jul 4, 2022

Get your documents ready for gen AI

Python 25,489 1,521 Updated Mar 26, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 5,939 516 Updated Mar 27, 2025

Simple package to extract text with coordinates from programmatic PDFs

C++ 85 18 Updated Mar 25, 2025

A High-efficiency Open-source Toolkit for Table-to-Latex Task

Python 223 19 Updated Dec 12, 2024

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python 7,160 494 Updated Jan 3, 2025