Group: Vasu Chalasani (sc83), Riya Jain (riyaj3), Archna Sobti (archnas2), Neha Konjeti (konjeti3)
This project can be built on most Linux Systems.
To build and run this project on EWS and Linux you can use the included Makefile.
$ make
$ ./main
To run tests for this project on EWS and Linux you can use the included Makefile.
$ make test
$ ./test
We used the genres_v2.csv file from https://www.kaggle.com/mrmorj/dataset-of-songs-in-spotify
This project’s functionality includes:
- Build an undirected graph out of the dataset.
- Traverse this spotify songs data using Breadth First Search (BFS).
- Find the shortest path between two songs, based on Dijkstra’s Algorithm.
- Find the the shortest path, a fast transformation of one song to the next, between two songs
- Detect if cycles exist within a given song set
- Can be used to compute the average song of any dataset
The data analysis class handles all of the mathematical calculations for our project. This class can find the average song, the difference between a song and the average song, the “distance” between two songs, and the average difference between two songs. If the “distance” between two songs is greater than the average difference between two songs, then there will not be an edge between the two songs in the graph.
The graph class constructor builds a graph representation of the set of Songs using an adjacency matrix. In this implementation, it is an undirected graph and the matrix stores the weights/distances between two edges rather than booleans. Edges can be added using addEdge and inputting two vertices.
Takes a starting song index and traverses the songs using a graph object. The output is the song indices of the BFS traversal. These song indices can be converted to names using the getSongNameFromIndex function in the Graph class.
Takes a starting song index and performs a DFS until a cycle is detected and then prints out all of the names of the songs in the cycle and returns a boolean based on if a cycle is found. This information can tell us about groups of songs that are closely related.
Takes a starting song index and creates the shortest path from it to all other songs. Other linked methods take in a target song index and find the shortest path from the starting song to the target song. A few wrapper methods takes in song names instead of indices and call upon our implmentation of Dijkstra's. The shortest path reveals the fastest transition between one song and another.
This set of tests checks if data is deserialized into a single Song object properly and if data containing ten songs can deserialize properly into an array of Song objects, also an object of the Songs class.
For DataAnalysis, we tested whether the song edges were created correctly. We made sure that given a test dataset of 10 songs, the distances between one song and the other were correct and that some edges were nonexistent (denoted by a distance of -1).
The BFS method returns a vector of songs with the order they were traversed in. This actual output is compared to the expected output determined by tracing the traversals by hand. This case is checked on smaller datasets of around 3-4 songs and is checked for graphs with cycles and without cycles.
CycleTests tests that cycles are determined accurately. It is tested on smaller datasets of 4 songs and is tested on graphs with both cycles and no cycles.
DijkstraTests tests whether the shortest path was found using the source index and the source song name, respectively. For each input type (number and string), a few regular paths, one reverse path (start and end switched), and one path to the same song were tested.