yoonjaecho/Spring_2017_DataScience

[Spring, 2017] CNU Data Science Class

Development tools: Python 3, RStudio, Jupyter Notebook

Contents

Summary of Python & R for Data Science
  1. Introduction to Python for Data Science
  2. Introduction to R
Tashu Data Visualization with Python
  1. Top 10 stations
  2. Top 10 traces
  3. Number of stations per district
  4. Number of usages per district
  5. Transactions by day of the week
  6. Usage per hour
  7. Day-hour usage pattern
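The Tashu analyses above are mostly counting and grouping. A minimal standard-library sketch follows, assuming each rental record reduces to a hypothetical `(rent_station, return_station, rent_time)` tuple. The real Tashu column names and file layout are not given in this README, the sample records are made up, and treating a "trace" as a (rent station, return station) pair is also an assumption.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records: (rent_station, return_station, rent_time).
# Made-up values; the real Tashu data layout is an assumption here.
RECORDS = [
    (3, 5, "2016-03-01 08:15"),
    (3, 5, "2016-03-01 18:40"),
    (5, 3, "2016-03-02 08:05"),
    (7, 7, "2016-03-05 14:30"),
    (3, 7, "2016-03-06 14:10"),
]


def parse(ts):
    """Parse the assumed 'YYYY-MM-DD HH:MM' timestamp format."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def top_stations(records, n=10):
    """Most frequent rental stations as (station, count) pairs."""
    return Counter(r[0] for r in records).most_common(n)


def top_traces(records, n=10):
    """Most frequent (rent station, return station) pairs."""
    return Counter((r[0], r[1]) for r in records).most_common(n)


def usage_by_weekday(records):
    """Rental counts keyed by weekday number (0 = Monday ... 6 = Sunday)."""
    return Counter(parse(r[2]).weekday() for r in records)


def usage_by_hour(records):
    """Rental counts keyed by hour of day (0-23)."""
    return Counter(parse(r[2]).hour for r in records)


def day_hour_matrix(records):
    """7 x 24 grid of counts: row = weekday (0 = Monday), column = hour."""
    grid = [[0] * 24 for _ in range(7)]
    for r in records:
        t = parse(r[2])
        grid[t.weekday()][t.hour] += 1
    return grid
```

The 7 x 24 grid returned by `day_hour_matrix` is the shape a heatmap plot expects, for example as input to matplotlib's `imshow`.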
Tashu Data Visualization with R
  1. Bar graph of the top 10 stations
  2. Station usage scatter plot in ggmap
  3. Scatter plot of the top 20 traces
  4. Chord diagram of the top 20 traces
Fitbit Data with Python & R
  1. Two weeks of daily step counts with Python
  2. Four-week step, sleep, and heart rate (fat burn) heatmap
  3. One-week heatmap of steps per 15 minutes
  4. Four-week step and heart rate (fat burn) treemap
2016 Sokulee Contest Participants' Fitbit Data Analysis with Python
  1. Top 10 step counts during the contest
  2. Top 10 students by number of 9 A.M. classes taken during the contest
Predicting Movie Ratings using MovieLens data with Python
  1. Find the nearest users using Euclidean distance
  2. Predict movie ratings
  3. Calculate accuracy with RMSE and MAE
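The three MovieLens steps can be sketched as user-based collaborative filtering. This is an illustration under assumptions, not the repository's code: ratings are a hypothetical `{user: {movie: rating}}` dict with toy values, distance is computed only over movies both users rated, and a prediction is a weighted mean of the nearest users who rated the movie, using the assumed similarity `1 / (1 + distance)`.

```python
import math

# Hypothetical toy ratings: user -> {movie: rating}. Not real MovieLens data.
RATINGS = {
    "u1": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "u2": {"m1": 4.0, "m2": 3.0, "m3": 5.0, "m4": 4.0},
    "u3": {"m1": 1.0, "m2": 5.0, "m4": 2.0},
}


def euclidean(a, b):
    """Euclidean distance over co-rated movies; inf when none are shared."""
    common = a.keys() & b.keys()
    if not common:
        return math.inf
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in common))


def nearest_users(ratings, user, k=5):
    """The k other users closest to `user`, as sorted (distance, user) pairs."""
    pairs = [(euclidean(ratings[user], r), u)
             for u, r in ratings.items() if u != user]
    return sorted(pairs)[:k]


def predict(ratings, user, movie, k=5):
    """Weighted mean of the k nearest users who rated `movie`.

    Weight = 1 / (1 + distance), an assumed similarity. Returns None
    when no comparable user rated the movie.
    """
    raters = {u: r for u, r in ratings.items() if u != user and movie in r}
    num = den = 0.0
    for dist, other in nearest_users({**raters, user: ratings[user]}, user, k):
        sim = 1.0 / (1.0 + dist)  # inf distance gives weight 0.0
        num += sim * ratings[other][movie]
        den += sim
    return num / den if den else None


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)
```

For an accuracy check, each pair would be `(predict(...), true_rating)` for a rating held out of the training dict; predictions that come back as `None` have to be skipped or filled before computing RMSE and MAE.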
Visualizing word cloud with Python & R
  1. Crawling data with Python
  2. Visualizing word cloud with R
Hadoop Setup on a Virtual Machine
XML Configuration Files
  1. core-site.xml
  2. mapred-site.xml
  3. hdfs-site.xml
  4. yarn-site.xml
Hadoop wordcount
  1. Default
  2. Filtering patterns
  3. Case sensitive
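The three wordcount variants differ only in how each line is cleaned before it is split into words. The local Python sketch below imitates the map, shuffle, and reduce steps in a single process; it is not a Hadoop job. The `case_sensitive` and `skip_patterns` parameters are hypothetical names, loosely modeled on the case-sensitivity and skip-pattern options of the WordCount v2.0 example in the Apache Hadoop MapReduce tutorial. Whether this repository followed that example is not stated here.

```python
import re
from collections import defaultdict


def mapper(lines, case_sensitive=True, skip_patterns=()):
    """Map step: yield (word, 1) for every whitespace-separated token.

    When case_sensitive is False the line is lowercased first; each regex
    in skip_patterns is then deleted from the line before splitting.
    """
    patterns = [re.compile(p) for p in skip_patterns]
    for line in lines:
        if not case_sensitive:
            line = line.lower()
        for pat in patterns:
            line = pat.sub("", line)
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Shuffle + reduce step: group pairs by word and sum their counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def wordcount(lines, case_sensitive=True, skip_patterns=()):
    """Run the map and reduce steps over an iterable of text lines."""
    return reducer(mapper(lines, case_sensitive, skip_patterns))
```

For example, `wordcount(["Hello, World. hello"], case_sensitive=False, skip_patterns=[r"[.,]"])` returns `{"hello": 2, "world": 1}`, while the default settings would count `Hello,` and `hello` as different words.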
Hive Setup and Tashu Data Analysis
  1. Hive setup
  2. Year / Month / Day / Hour usage count
  3. Top 10 rental stations
  4. Top 10 traces
  1. Crawling and word cloud
  2. Fitbit data analysis using Spark
  3. MovieLens data analysis using Spark