- Toronto, Canada
- https://www.linkedin.com/in/ksanchi/
A real-time event pipeline built on the Kafka ecosystem for the Chicago Transit Authority.
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
SOEN_6441 Public
A multiplayer Risk board game.
SmellDetectionCaller Public
Forked from ptidejteam/ptidej-Others-SmellDetectionCaller
Bare minimum code needed to detect occurrences of code and design smells
Cloudera_Material Public
Cloudera_Material: Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
goodreads_etl_pipeline Public
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Big_Data_Project Public
Fake News Detection: feature extraction using vectorizers such as Count Vectorizer, TF-IDF Vectorizer, and Hash Vectorizer, followed by an ensemble model that classifies whether the news is fake or not.
goodreads Public
Forked from sefakilic/goodreads
🐍 Python wrapper for Goodreads API 📚
SF-Crime-Statistics Public
A Kafka and Spark Streaming integration project: SF Crime Statistics with Spark Streaming
SOEN-6011 Public
This repository is for the course SOEN 6011.
TeX GNU General Public License v3.0 Updated Aug 16, 2019
Data-engineering-nanodegree Public
Forked from Flor91/Data-engineering-nanodegree
Projects done in the Data Engineering Nanodegree by Udacity.com
Uppaal_Model_Checking Public
Model Checking For Automated Machine Learning Models
Spark_Packaged_project Public
This project contains PySpark jobs that create data pipelines and shows how to distribute the project package on a cluster.
Yelp_Project Public
This project creates a data lake for the Yelp dataset, uses it to build an analytical sandbox for data science, and creates a data warehouse for reporting.
awesome-apache-airflow Public
Forked from jghoman/awesome-apache-airflow
Curated list of resources about Apache Airflow
Learning_Machine_Learning Public
Machine learning demo projects
Live-Twitter-Sentiment-Analysis Public
Forked from shreyansh26/Live-Twitter-Sentiment-Analysis
Sentiment analysis on live twitter stream and plotting the sentiment values using Matplotlib
airflow-training Public
Forked from mdivk/airflow-training
Introduction to data pipeline management with Airflow. Airflow schedules and maintains numerous ETL processes running on a large-scale enterprise data warehouse.
Black-Friday-Sales-Analysis Public
This project gives insight into a few statistics related to Black Friday sales.
motivate Public
Forked from mubaris/motivate
⚡ motivate ⚡ - A simple script to print random motivational quotes. Highly influenced by linux command fortune.
This project provides an analysis of IPL (Indian Premier League) stats from 2008 to 2017.
pyspark-example-project Public
Forked from AlexIoannides/pyspark-example-project
Example project and best practices for Python-based Spark ETL jobs and applications.
Rdatasets Public
Forked from otanrikulu/Rdatasets
An archive of datasets distributed with R
HTML Updated Oct 11, 2018
data-engineer-roadmap Public
Forked from boringPpl/data-engineer-roadmap
Learning from multiple companies in Silicon Valley. Netflix, Facebook, Google, Startups
data-science-question-answer Public
Forked from jayinai/data-science-question-answer
A repo for data science related questions and answers
Spark-practice Public
Forked from XD-DENG/Spark-practice
Apache Spark (PySpark) Practice on Real Data