Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
.gitignore
README.md
clean_users
clean_words
project.r

README.md

Music Hackathon

This is my attempt at the Kaggle EMI Music Hackathon in R.

Data Cleanup

I wrote some shell script to clean up the data files. Some values were repeated with slight formatting vartiations. I replaced them all with integers to cut down on filesize and normalize the values.

Users: clean_users

WORKING

  • 0 "Employed 30+ hours a week"
  • 1 "Employed 8-29 hours per week"
  • 2 "Employed part-time less than 8 hours per week"
  • 3 "Full-time housewife / househusband"
  • 4 "Full-time student
  • 5 "In unpaid employment (e.g. voluntary work)"
  • 6 "Other"
  • 7 "Part-time student"
  • 8 "Prefer not to state"
  • 9 "Retired from full-time employment (30+ hours per week)"
  • 10 "Retired from self-employment"
  • 11 "Self-employed"
  • 12 "Temporarily unemployed"

MUSIC

  • 0 "Music has no particular interest for me"
  • 1 "Music is no longer as important as it used to be to me"
  • 2 "I like music but it does not feature heavily in my life"
  • 3 "Music is important to me but not necessarily more important than other hobbies or interests"
  • 3 "Music is important to me but not necessarily more important"
  • 4 "Music means a lot to me and is a passion of mine"

LIST_{OWN,BACK}

Cleans columns up to only have integer value representing hours. Drops the rows with variations of "More than..."

Words: clean_words

HEARD_OF

  • 0 Never heard of
  • 1 Ever heard of
  • 3 Ever heard music by
  • 4 Heard of and listened to music EVER
  • 5 Heard of and listened to music RECENTLY
  • 2 Heard of
  • 6 Listened to recently

OWN_ARTIST_MUSIC

  • 0 [dD]on.t know
  • 1 Own none of their music
  • 2 Own a little of their music
  • 3 Own a lot of their music
  • 4 Own all or most of their music
Something went wrong with that request. Please try again.