Data Modeling With Apache Cassandra

Project repo for the Udacity Data Engineering Program Project 1B.

This README file includes a summary of the project, how to run the Python scripts, and an explanation of the files in the repository.

Getting Started

The project is run entirely from a Jupyter Notebook. Open Project_1B.ipynb to get started.

Prerequisites

A baseline-configured Apache Cassandra database.
The following Python libraries need to be installed in the environment:
- cassandra (provided by the cassandra-driver package)
- pandas
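
As a quick way to confirm the environment before opening the notebook, the sketch below checks whether the two libraries listed above can be found. It is not part of the project; the helper name `missing_modules` is hypothetical, and the check only confirms that the modules are importable, not that a Cassandra database is reachable.

```python
from importlib.util import find_spec

# Import names taken from the prerequisites list above.
REQUIRED_MODULES = ("cassandra", "pandas")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment.

    Hypothetical helper for illustration; not part of the project code.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If `cassandra` is reported missing, installing the `cassandra-driver` package should provide it.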

Purpose

The purpose of this database is to store user activity data from the Sparkify app, loaded through ETL operations.
The Sparkify analytics team will use this data to gain a greater understanding of user activity and the songs being listened to.

Dataset

The dataset consists of user names, artist names, and songs for each session and for each item in a session.

The data in the CSV consists of the following columns and data types:

Field	Data Type
artist	text
firstname	text
gender	text
item number in session	int
last name of user	text
length of the song	float
level (paid or free song)	text
location of the user	text
sessionId	int
song title	text
userId	int
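
The column types in the table above can be applied while reading the CSV. The following is a minimal sketch using only the standard library, not the notebook's actual code. The header names in `COLUMN_TYPES` are assumptions made for illustration and should be matched to the real header row of the file; the sample row is made up, and the helper name `typed_rows` is hypothetical.

```python
import csv
import io

# Column types from the table above. The header names are assumptions for
# illustration; match them to the actual header row of the CSV file.
COLUMN_TYPES = {
    "artist": str,
    "firstName": str,
    "gender": str,
    "itemInSession": int,
    "lastName": str,
    "length": float,
    "level": str,
    "location": str,
    "sessionId": int,
    "song": str,
    "userId": int,
}


def typed_rows(lines):
    """Yield each CSV row as a dict with values converted per COLUMN_TYPES."""
    for row in csv.DictReader(lines):
        yield {name: COLUMN_TYPES[name](value) for name, value in row.items()}


# Made-up sample data, not taken from the project's CSV file.
sample = io.StringIO(
    "artist,firstName,gender,itemInSession,lastName,length,level,location,sessionId,song,userId\n"
    'Example Artist,Ada,F,0,Lovelace,215.5,free,"London, UK",1,Example Song,42\n'
)
rows = list(typed_rows(sample))
print(rows[0]["itemInSession"], rows[0]["length"], rows[0]["location"])
```

To read a real file instead of the in-memory sample, pass an open file object, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV.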

Below is a screenshot of the CSV file.
