sparksql

Here are 243 public repositories matching this topic...

zio / zio-quill

Compile-time Language Integrated Queries for Scala

mysql linq postgres scala database spark cassandra jdbc scalajs sparksql

Updated May 16, 2024
Scala

zio / zio-protoquill

Quill for Scala 3

linq scala sql spark cassandra jdbc postgresql sparksql language-integrated-query

Updated May 12, 2024
Scala

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Updated May 13, 2024
C#

WinThitiwat / Data_Lake_with_Spark

ETL process to S3 Data Lake through EMR, Spark, Hadoop, Schema-on-Read

emr spark data-lake sparksql

Updated May 10, 2024
Jupyter Notebook

CybercentreCanada / jupyterlab-sql-editor

A JupyterLab extension providing a SQL formatter, auto-completion, and syntax highlighting for Spark SQL and Trino

syntax-highlighting json formatter extension schema sql notebook nested-structures vscode-extension sparksql jupyterlab datagrid dataframe ipython-magic auto-completion trino lsp

Updated May 2, 2024
Jupyter Notebook

sana1410 / NYPD-Arrest-Data-Year-to-Date

This repository is used to perform data analysis using Databricks and Tableau on NYC crime datasets

data-visualization sparksql databricks-notebooks tableau-dashboards

Updated Apr 29, 2024
HTML

austinlmcconnell / home-sales-data-evaluation

Contains an analysis of key home sales metrics using SparkSQL and Python to manage large amounts of data.

python jupyter-notebook pandas pyspark sparksql temporary-tables parquet-data

Updated Apr 22, 2024
Jupyter Notebook

hexnn / balm

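A big-data PaaS component adapter built on the Spring Boot family of frameworks. It adapts in one step to DolphinScheduler, Hadoop, Spark, Hive, Impala, HBase, Kafka, Doris, StarRocks, ClickHouse, Neo4j, Redis, and ElasticSearch, and is operated through standard REST interfaces and SQL statements. It is simple to use and convenient for secondary development and rapid integration.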

elasticsearch phoenix sql kafka spark presto hive hadoop neo4j clickhouse impala hbase sparksql datax maxcompute doris dolphinscheduler starrocks

Updated Apr 18, 2024

commoncrawl / cc-pyspark

Process Common Crawl data with Python and Spark

spark pyspark sparksql wet commoncrawl common-crawl warc-files wat-files

Updated Apr 8, 2024
Python

Amarilli / Home_Sales

Uses Google Colab and SparkSQL to extract key metrics and insights from home sales data.

sql colab sparksql colab-notebook

Updated Apr 7, 2024
Jupyter Notebook

SteveTuttle / home-sales-sparkSQL-metrics

Use SparkSQL to determine key metrics of the data. Use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

pyspark sparksql temporary-tables google-colab optimize-queries

Updated Apr 6, 2024
Jupyter Notebook

locationtech / rasterframes

Geospatial Raster support for Spark DataFrames

machine-learning scala spark image-processing sparksql geotrellis earth-observation spark-ml

Updated Apr 3, 2024
Jupyter Notebook

NavyaTrilok / Advanced-Big-Data-ML-Project

Weather Data Analysis using Python, Pandas, SparkSQL, AutoRegression Model

python pandas sparksql autoregression

Updated Mar 13, 2024

amitnema / spark-coach

This project contains learning exercises and experiments with Apache Spark.

streaming scala spark stream streams stream-processing spark-streaming sparksql structured-streaming spark-sql structured-streaming-kafka

Updated Mar 7, 2024
Scala

mehrdadalmasi2020 / ApacheSpark_ApacheZeppelin_SQL_Shell

Run your first analysis project on Apache Zeppelin using Scala (Spark), Shell, and SQL

visualization shell scala notebook sparksql zeppelin-notebook apachespark

Updated Feb 16, 2024
Scala

aravinthsci / Miscellaneous1

sql spark apache-spark sparksql spark-sql

Updated Feb 2, 2024
Jupyter Notebook

RJBarker / home_sales

Use PySpark and SparkSQL to execute SQL queries through a temporary view created from a DataFrame. Run additional queries on cached and partitioned data to compare runtimes.

python big-data cached pyspark sparksql partitioning large-scale big-data-analytics pyspark-dataframes