# parquet

Here are 53 public repositories matching this topic...

bigdatagenomics / adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

python java bioinformatics r scala big-data spark avro genomics parquet

Updated Mar 23, 2024
Scala

spotify / ratatool

A tool for data sampling, data generation, and data diffing

bigquery scala protobuf avro parquet scalacheck

Updated May 8, 2024
Scala

mjakubowski84 / parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

aws scala akka hadoop bigdata google-storage fs2 writer streams reader parquet akka-streams parquet-files

Updated Mar 23, 2024
Scala

spotify / magnolify

A collection of Magnolia add-on modules

cats bigquery scala protobuf avro neo4j tensorflow guava parquet datastore bigtable scalacheck magnolia

Updated May 10, 2024
Scala

51zero / eel-sdk

Big Data Toolkit for the JVM

scala kafka big-data hive hadoop etl kudu parquet orc

Updated Nov 4, 2020
Scala

lightcopy / parquet-index

Spark SQL index for Parquet tables

statistics sql spark index parquet

Updated May 6, 2021
Scala

indix / schemer

Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.

tsv json spark avro schema-registry parquet schema-inference graphql-api

Updated Mar 5, 2020
Scala

saurfang / sparksql-protobuf

Read SparkSQL parquet file as RDD[Protobuf]

protobuf sparksql parquet

Updated Oct 12, 2018
Scala

spotify / gcs-tools

GCS support for avro-tools, parquet-tools and protobuf

protobuf avro google-storage gcp gcs parquet gcs-connector

Updated May 8, 2024
Scala

monix / monix-connect

A set of connectors for Monix. 🔛

redis aws elasticsearch workflow scala mongodb dynamodb s3 google-cloud-storage sqs reactive-streams hdfs parquet monix connectors

Updated May 7, 2024
Scala

spider-123-eng / Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.

streaming consumer parquet kafka-producer spark-sql spark-kafka-integration spark-streaming-data spark-transformations spark-to-cassandra-connection spark-dataframes spark-joins spark-hive-context spark-jdbc-connection spark-with-mangodb spark-aggregations-using-dataframe spark-use-cases cassandra-installation spark-datadog spark-mangodb spark-catalog-api

Updated Nov 16, 2022
Scala

yamrcraft / etl-light

A light Kafka to HDFS/S3 ETL library based on Apache Spark

docker scala kafka spark protobuf avro etl job s3 batch hdfs parquet

Updated Jun 29, 2017
Scala

agile-lab-dev / wasp

WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

elasticsearch scala kafka akka spark yarn hadoop solr jdbc hbase spark-streaming hdfs parquet

Updated Apr 19, 2024
Scala

intenthq / pucket

Bucketing and partitioning system for Parquet

scala spark thrift hdfs parquet partitioning

Updated May 22, 2018
Scala

nevillelyh / parquet-extra

A collection of Apache Parquet add-on modules

scala avro tensorflow scala-macros parquet magnolia

Updated May 8, 2024
Scala

sparsecode / DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

json scala csv apache-spark hive hadoop avro etl parquet transformation-rules etl-framework etl-pipeline join-data

Updated Jun 7, 2021
Scala

Guidewire / cda-client

Cloud Data Access client

kafka spark etl parquet

Updated Dec 28, 2022
Scala

zrlio / parquet-generator

Parquet file generator

sql spark parquet parquet-generator

Updated Apr 17, 2018
Scala

ndolgov / experiments

Code examples for my blog posts

aws spark dsl antlr rpc lucene parquet

Updated Nov 7, 2018
Scala

civitaspo / embulk-output-s3_parquet

Embulk (https://github.com/embulk/embulk/) output plugin to dump records as Apache Parquet (https://parquet.apache.org/) files on S3.

s3 parquet embulk embulk-output-plugin

Updated Feb 14, 2023
Scala
