Resolved dataset download problems. Figured out a way to convert .h5 to csv ... Only 77% of the songs occur in both the features data and the triplets data, so the triplets data was pruned.
Can now work on all 3 methods using this.
- features.csv for Content-Based
- triplets.csv for Collaborative
- both for Deep
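The pruning step mentioned above could look roughly like the sketch below. It assumes each triplet is a (user, song, play count) tuple and that the song IDs from features.csv are already loaded into a list; the function name and the toy IDs are made up for illustration and are not taken from the project code.

```python
def prune_triplets(triplets, feature_song_ids):
    """Keep only the triplets whose song also appears in the features data."""
    known = set(feature_song_ids)  # set lookup keeps the filter linear in len(triplets)
    return [t for t in triplets if t[1] in known]


# Toy data (hypothetical IDs), assuming the layout (user, song, play_count).
feature_song_ids = ["S1", "S2", "S3"]
triplets = [("u1", "S1", 3), ("u1", "S9", 1), ("u2", "S2", 5)]

pruned = prune_triplets(triplets, feature_song_ids)
print(pruned)
```

Here the triplet for song S9 is dropped because S9 has no row in the features data.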
Work to be done:
- @Deepika: feature processing (normalization etc) and network architecture
- @Raghav: content based recommendation
- @Tejas: Collaborative Filtering
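For the feature normalization item, one common option is per-column z-score scaling. The sketch below is only an assumption about what "normalization" might mean here; the notes do not say which scheme is planned, and the function name and toy values are illustrative.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy feature rows (hypothetical values, e.g. tempo and some 0-1 score).
features = [[120.0, 0.5], [100.0, 0.7], [140.0, 0.9]]
normalized = zscore_columns(features)
print(normalized)
```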
Tejas:
I am reading my NLP slides that talk about efficient similarity measurement.
We could potentially use that for our purposes.
Content-based recommendation code working. Results stored in ./results
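As a rough illustration of content-based recommendation, the sketch below ranks songs by cosine similarity between feature vectors. The notes do not say which similarity measure the working code uses, so cosine similarity is an assumption here, and the song IDs and feature values are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(song_id, features, k=2):
    """Return the k songs whose feature vectors are closest to song_id's."""
    query = features[song_id]
    scored = [(cosine(query, vec), other)
              for other, vec in features.items() if other != song_id]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


# Toy feature vectors keyed by hypothetical song IDs.
features = {
    "S1": [1.0, 0.0, 0.2],
    "S2": [0.9, 0.1, 0.3],
    "S3": [0.0, 1.0, 0.0],
}
print(most_similar("S1", features, k=1))
```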
Decided to use lyrics as inputs to the deep network. This needs formatting and linking each MSD Song-id to its Musixmatch Track-id. Collaborative filtering is in the pipeline: the top-n similar users have been found using Pearson correlation. Working on recommendations now.
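A minimal sketch of finding the top-n similar users with Pearson correlation is given below. It assumes play counts are held in a dict of dicts (user to song to count) and that the correlation is computed only over songs both users have played; both choices, and all names and data, are assumptions for illustration rather than the project's actual code.

```python
import math


def pearson(a, b):
    """Pearson correlation of two users over the songs they have in common."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    xs = [a[s] for s in common]
    ys = [b[s] for s in common]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a user with constant counts has no defined correlation
    return cov / math.sqrt(vx * vy)


def top_n_similar_users(user, plays, n=2):
    """Return the n users most correlated with `user`, best first."""
    scored = [(pearson(plays[user], plays[o]), o) for o in plays if o != user]
    scored.sort(reverse=True)
    return [o for _, o in scored[:n]]


# Toy play counts (hypothetical users and songs).
plays = {
    "u1": {"S1": 5, "S2": 1, "S3": 3},
    "u2": {"S1": 4, "S2": 1, "S3": 2},
    "u3": {"S1": 1, "S2": 5, "S3": 2},
}
print(top_n_similar_users("u1", plays, n=1))
```

In this toy data u2 listens in the same pattern as u1, while u3 listens in the opposite pattern, so u2 is ranked first.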
Agenda for 20th:
- Complete Project Progress Report
- Formulate Deep Network Architecture
04-MAY-2017
- Data extracted from the Bag-of-Words data and stored in a csv file, after many strenuous attempts at a rather boring task.
- Data split as train-test-valid
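The train-test-valid split could be done along the lines of the sketch below. The 80/10/10 ratio and the fixed seed are assumptions; the notes do not record the proportions actually used.

```python
import random


def train_test_valid_split(items, train=0.8, test=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train, test and valid parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, so the split is reproducible
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])  # whatever is left becomes the valid set


# Toy example: 100 row indices standing in for the rows of the csv file.
train_set, test_set, valid_set = train_test_valid_split(range(100))
print(len(train_set), len(test_set), len(valid_set))
```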
08-MAY-2017
Three configurations:
- input = top-100 words
- input = all words
- input = embeddings obtained from Auto-encoder/ tSNE
We aim to compare the outputs of these three in the report.
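For the first configuration, the top-100 word input could be built roughly as sketched below: count every word across all tracks, keep the most frequent ones, and turn each track's bag of words into a fixed-length count vector. Ranking by total count is an assumption (the notes do not say how "top" is defined), and the track IDs and counts are invented; the toy run uses k=2 instead of 100 to stay readable.

```python
from collections import Counter


def top_k_vocabulary(bags, k=100):
    """Return the k words with the highest total count across all tracks."""
    totals = Counter()
    for bag in bags.values():
        totals.update(bag)  # adds this track's counts to the running totals
    return [word for word, _ in totals.most_common(k)]


def to_vector(bag, vocab):
    """Fixed-length count vector for one track, ordered like vocab."""
    return [bag.get(word, 0) for word in vocab]


# Toy bag-of-words data keyed by hypothetical track IDs.
bags = {
    "T1": {"love": 4, "you": 2},
    "T2": {"love": 1, "night": 3},
    "T3": {"you": 5},
}
vocab = top_k_vocabulary(bags, k=2)
vectors = {track: to_vector(bag, vocab) for track, bag in bags.items()}
print(vocab, vectors)
```

The "all words" configuration would be the same construction with k set to the full vocabulary size.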
To-do:
- Network architecture
- Recommendation system
- Comparisons of 3 approaches
- Specific examples to show the predictions
- Report
- Video