Simple CSV parser for huge volumes of data, built with the Pandas library for Python. It extracts specific columns from a CSV file and writes the extracted data to one or more output files (each column in a separate file, or all of them in a single output) in a short amount of time.
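A minimal sketch of this kind of column extraction, assuming Pandas chunked reading; the file path, column names, and chunk size are hypothetical, not taken from the project:

```python
import pandas as pd

INPUT_FILE = "input.csv"           # hypothetical input path
COLUMNS = ["id", "name", "email"]  # hypothetical columns to extract
CHUNK_SIZE = 100_000               # rows per chunk, tuned for memory

# Stream the large CSV in chunks, keeping only the requested columns.
reader = pd.read_csv(INPUT_FILE, usecols=COLUMNS, chunksize=CHUNK_SIZE)

for i, chunk in enumerate(reader):
    # Write each column to its own file; append after the first chunk.
    for col in COLUMNS:
        chunk[[col]].to_csv(
            f"{col}.csv",
            mode="w" if i == 0 else "a",
            header=(i == 0),
            index=False,
        )
```

Reading with `chunksize` keeps memory usage bounded, which is what makes the approach viable for files that do not fit in RAM.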
Project using Python, Hive, and MapReduce to compare different techniques for finding the top K words in a very large file, i.e. different approaches to processing Big Data.
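For reference, a plain-Python baseline for the same top-K problem (not the project's Hive or MapReduce code) might look like this sketch; the corpus path is hypothetical:

```python
import heapq
import re
from collections import Counter

def top_k_words(path: str, k: int = 10) -> list[tuple[str, int]]:
    """Stream a large text file line by line and return the k most frequent words."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            counts.update(re.findall(r"[a-z']+", line.lower()))
    # heapq.nlargest avoids sorting the entire vocabulary just to keep k entries.
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

print(top_k_words("big_corpus.txt", k=10))  # hypothetical file name
```

The distributed variants (Hive query, MapReduce job) trade this single-machine simplicity for horizontal scalability when the file no longer fits on one node.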
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics, using the following technologies: the Twitter API, Kafka, MongoDB, and Tableau.
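A sketch of the Kafka-to-MongoDB leg of such a pipeline, assuming the `kafka-python` and `pymongo` client libraries; the topic name, broker address, and database names are hypothetical:

```python
import json

from kafka import KafkaConsumer   # kafka-python
from pymongo import MongoClient

# Hypothetical topic and connection settings.
consumer = KafkaConsumer(
    "tweets",                              # hypothetical Kafka topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

collection = MongoClient("mongodb://localhost:27017")["twitter"]["tweets"]

# Persist each tweet so a BI tool such as Tableau can query the collection later.
for message in consumer:
    collection.insert_one(message.value)
```

A separate producer would push tweets from the Twitter API into the `tweets` topic, keeping ingestion and storage decoupled.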
A lightweight helper utility that lets developers do interactive pipeline development by keeping a single source code base for both DLT runs and non-DLT interactive notebook runs.
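One common pattern for this "unified source" idea, shown here only as a sketch and not as this project's implementation: detect whether the Databricks `dlt` module is importable and fall back to a pass-through decorator otherwise. The table definition is hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

try:
    import dlt                             # available only inside a DLT pipeline run

    def table(**kwargs):
        return dlt.table(**kwargs)         # delegate to the real DLT decorator
except ImportError:
    def table(**kwargs):                   # interactive notebook run: no-op decorator
        return lambda func: func

@table(comment="example dataset")          # hypothetical table definition
def my_dataset():
    return spark.range(10)

# In a notebook run, the function can simply be called to preview the DataFrame:
# my_dataset().show()
```

In a DLT run the decorator registers the function as a pipeline table; interactively it leaves the function untouched, so the same file works in both contexts.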
Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.
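A minimal PySpark sketch of connecting to such a Dockerized cluster and running a simple transformation; the master URL, dataset path, and column names are assumptions, not details from the project:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical master URL for a Spark cluster started in Docker (e.g. via docker-compose).
spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .appName("docker-spark-demo")
    .getOrCreate()
)

# Hypothetical input; the project's ~31-million-row dataset is not specified here.
df = spark.read.csv("/data/large_dataset.csv", header=True, inferSchema=True)

# A simple transformation: group, aggregate, and write the summary back out.
result = df.groupBy("category").agg(
    F.count("*").alias("rows"),
    F.avg("value").alias("avg_value"),
)
result.write.mode("overwrite").parquet("/data/output/summary")
```

Pinning the Spark version and configuration in the Docker images is what gives the setup its repeatability across machines.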