Designed a machine learning model that takes the newsgroups dataset and performs binary classification to predict whether a given document has atheistic or Christian sentiment. Used the LIME library and PySpark. Performed feature selection to improve the classifier's performance.
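The description does not say which feature-selection method was used. As a hedged illustration, the sketch below assumes chi-square term selection (the statistic behind PySpark's `pyspark.ml.feature.ChiSqSelector`) followed by a multinomial Naive Bayes classifier. It is written in plain Python so it runs without a Spark cluster. The training sentences, the labels (1 = Christian, 0 = atheism), and all helper names are hypothetical. They are not the newsgroups data or the project's code, and the LIME explanation step is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphabetic character."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic of its 2x2 term/class table.

    docs are token lists; labels are 1 (Christian) or 0 (atheism).
    Counts are document frequencies, not raw term counts.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            if y == 1:
                df_pos[term] += 1
            else:
                df_neg[term] += 1
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a, b = df_pos[term], df_neg[term]      # docs that contain the term
        c, d = n_pos - a, n_neg - b            # docs that do not contain it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


def train_nb(docs, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to vocab.

    Assumes both classes appear in the training labels.
    """
    counts = {0: Counter(), 1: Counter()}
    class_docs = Counter(labels)
    for tokens, y in zip(docs, labels):
        counts[y].update(t for t in tokens if t in vocab)
    model = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + alpha * len(vocab)
        prior = math.log(class_docs[y] / len(labels))
        loglik = {t: math.log((counts[y][t] + alpha) / total) for t in vocab}
        model[y] = (prior, loglik)
    return model


def predict(model, tokens):
    """Return the class with the highest log-posterior; unknown terms are ignored."""
    log_post = {
        y: prior + sum(loglik[t] for t in tokens if t in loglik)
        for y, (prior, loglik) in model.items()
    }
    return max(log_post, key=log_post.get)


# Hypothetical toy corpus, not the real newsgroups dataset.
train = [
    ("god created the church and jesus saves", 1),
    ("pray to jesus at church on sunday", 1),
    ("the bible teaches faith in christ", 1),
    ("there is no evidence for any god", 0),
    ("atheism rejects belief without evidence", 0),
    ("science and reason not faith", 0),
]
docs = [tokenize(text) for text, _ in train]
labels = [y for _, y in train]

scores = chi_square_scores(docs, labels)
vocab = select_top_k(scores, k=4)
model = train_nb(docs, labels, vocab)

print(sorted(vocab))                                     # ['church', 'evidence', 'jesus', 'the']
print(predict(model, tokenize("Jesus and the church")))  # 1 (Christian)
```

On a corpus this small the stopword "the" survives selection, which is why real pipelines usually drop stopwords before scoring terms (PySpark provides `StopWordsRemover` for that step).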
Copies data from an Amazon S3 bucket to an Azure Blob Storage container using an Azure Data Factory pipeline. The container is then mounted to Databricks, where further analysis is done using Spark SQL.
Developed a real-time streaming analytics pipeline using Apache Spark to calculate and store KPIs for e-commerce sales data, including total volume of sales, orders per minute, rate of return, and average transaction size. Used Spark Streaming to read data from Kafka, Spark SQL to calculate KPIs, and Spark DataFrame to write KPIs to JSON files.
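The four KPIs can be illustrated with a small batch computation over one-minute windows. This is a plain-Python sketch, not the project's Spark code. It assumes each invoice carries a timestamp, a return flag, and `(quantity, unit_price)` items; it counts returns as negative sales volume; and it counts every invoice (order or return) toward orders per minute and the average transaction size. Those KPI definitions, the `Invoice` class, and the sample data are all hypothetical. In Spark, the grouping step would typically use `pyspark.sql.functions.window` over the event-time column.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_no: str
    timestamp: int   # event time in epoch seconds
    is_return: bool  # True for a return, False for a regular order
    items: list      # (quantity, unit_price) pairs


def invoice_value(inv):
    """Gross value of one invoice: sum of quantity * unit_price."""
    return sum(qty * price for qty, price in inv.items)


def kpis_per_minute(invoices):
    """Group invoices into 1-minute tumbling windows and compute KPIs per window."""
    windows = defaultdict(list)
    for inv in invoices:
        windows[inv.timestamp // 60 * 60].append(inv)  # key = window start (seconds)
    rows = []
    for start in sorted(windows):
        batch = windows[start]
        n = len(batch)
        n_returns = sum(1 for inv in batch if inv.is_return)
        # Assumption: returns subtract from the sales volume.
        volume = sum(-invoice_value(inv) if inv.is_return else invoice_value(inv)
                     for inv in batch)
        rows.append({
            "window_start": start,
            "total_sales_volume": round(volume, 2),
            "orders_per_minute": n,
            "rate_of_return": round(n_returns / n, 4),
            "average_transaction_size": round(volume / n, 2),
        })
    return rows


# Hypothetical sample events.
events = [
    Invoice("A", 0, False, [(2, 10.0), (1, 5.0)]),
    Invoice("B", 30, False, [(1, 15.0)]),
    Invoice("C", 45, True, [(1, 5.0)]),
    Invoice("D", 70, False, [(4, 2.5)]),
]
rows = kpis_per_minute(events)

# Line-delimited JSON, one object per window (Spark's JSON writer also emits one object per line).
print("\n".join(json.dumps(row) for row in rows))
```

A streaming job would emit these rows incrementally per micro-batch instead of over a fixed list, but the per-window arithmetic is the same.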
Advanced Big Data course taught at Linköping University. Topics included parallelisation, machine learning with Big Data, and querying distributed systems.
The project used a multi-hop ETL architecture, ingesting data from the Ergast API into storage backed by Azure Data Lake. The process involved weekly ingestion of bronze-layer data as cutover and delta files. Raw data, in varied formats, was transformed with Azure Databricks PySpark notebooks into enriched Silver and Gold layers.
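The cutover/delta pattern can be sketched in plain Python: a cutover file seeds the Silver table, and each later delta file is upserted by key, similar in spirit to a Delta Lake `MERGE`. The record fields, the `result_id` key, the ingestion dates, and the points values are hypothetical stand-ins for Ergast results data, and Python dicts stand in for Data Lake tables. This is not the project's notebook code.

```python
import json
from collections import defaultdict


def to_silver(raw_lines, ingestion_date):
    """Bronze -> Silver: parse raw JSON lines, cast types, rename to snake_case,
    and tag each record with the date of the file it came from."""
    records = []
    for line in raw_lines:
        r = json.loads(line)
        records.append({
            "result_id": int(r["resultId"]),
            "race_id": int(r["raceId"]),
            "driver_id": r["driverId"],
            "points": float(r["points"]),  # raw points arrive as strings
            "ingestion_date": ingestion_date,
        })
    return records


def merge_into(table, records):
    """Upsert by result_id: new keys are inserted, existing keys are overwritten."""
    for rec in records:
        table[rec["result_id"]] = rec
    return table


def to_gold(silver):
    """Silver -> Gold: total points per driver, highest first."""
    totals = defaultdict(float)
    for rec in silver.values():
        totals[rec["driver_id"]] += rec["points"]
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical bronze files: one full cutover load, then one weekly delta.
cutover = [
    '{"resultId": 1, "raceId": 1, "driverId": "hamilton", "points": "25"}',
    '{"resultId": 2, "raceId": 1, "driverId": "verstappen", "points": "18"}',
    '{"resultId": 3, "raceId": 2, "driverId": "verstappen", "points": "25"}',
    '{"resultId": 4, "raceId": 2, "driverId": "hamilton", "points": "18"}',
]
delta = [
    '{"resultId": 4, "raceId": 2, "driverId": "hamilton", "points": "15"}',  # correction
    '{"resultId": 5, "raceId": 3, "driverId": "hamilton", "points": "25"}',  # new race
]

silver = {}
merge_into(silver, to_silver(cutover, "2021-03-21"))
merge_into(silver, to_silver(delta, "2021-03-28"))
gold = to_gold(silver)
print(gold)  # [('hamilton', 65.0), ('verstappen', 43.0)]
```

Keying the upsert on a stable ID lets a delta file both correct earlier rows (result 4 here) and add new ones (result 5) without reprocessing the whole history.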
📊 📑 This project provides a step-by-step guide to big data analytics in the retail industry, using a variety of big data technologies such as HDFS, Hive, and Spark.