A modular log parser that parses and processes @nasa's Apache logs.
Updated Aug 23, 2020 (Python)
Automatically extract relevant data from invoices by processing their .pdf/.xml files.
A repository with our team's final Python project for the MGMT 590 Analyzing Unstructured Data course at the Krannert School of Management, Purdue University.
LLM Models on Unstructured Data
This project uses the CrewAI framework to automate stock analysis, enabling AI agents to collaborate and execute complex tasks efficiently. Example stock: Nvidia. Technologies include Python, CrewAI, Unstructured, PyOWM, Tools, Wikipedia, yFinance, SEC-API, tiktoken, faiss-cpu, python-dotenv, langchain-community, langchain-core, and OpenAI.
Implementation of CNN models (ResNet-34 and ResNet-50) that classify garbage images into 6 major categories, in support of sustainable development and proper waste disposal.
Streaming meets LLM: Real-time Hacker News to Milvus/Zilliz with streaming SQL
A chatbot and accompanying utilities for quickly making sense of and getting answers about large, unstructured corpora.
Benchmarking unstructured data extraction libraries
A Python library that uses AI to convert unstructured files (like PDFs, HTML, etc.) into structured data.
A repository exploring the use of LLMs for semantic search. The data are specific curated documents targeting closed-domain search. It was created to show how relatively simple these methods are to use and how they can increase productivity within an organization.
Built specifically for the research proposal "Estimating sector attention index with deep learning methods: example of Chinese stock market" (Jan. 4, 2024).
Extract cryptocurrency addresses from big datasets
AutoML/Unstructured Data Processing for RAG and LLM Dataset Creation. Current database options are Qdrant and Marqo DB.
Extract your docs (CSV, PDF, JSON, HTML, DOCS, Sheets and more) for your own GPT and LLM projects using Unstructured.io via Streamlit.
A framework for writing Unstract Tools/Apps
Indox is an advanced search and retrieval technique that efficiently extracts data from diverse document types, including PDFs and HTML, using online or offline large language models such as OpenAI, Hugging Face, etc.