# ***Exercise 1: Spotify Data***

Let's take some sample Spotify usage data from an anonymous user and analyze it!

The original format is JSON, and it has the following structure.

```python
[ 
    {
        "endTime": endTime,       # date and time the song ends
        "artistName": artistName, # artist name
        "trackName": trackName,   # song name
        "msPlayed": msPlayed      # milliseconds the song was played
    },
    {...}
]
```

Load this JSON file into a list of dictionaries with the following code:
```python
import json
with open("spotify.json") as json_file:
    json_data = json.load(json_file)
```
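As a quick check of what `json.load` returns for this structure, the sketch below parses a small made-up sample with `json.loads`. The two records are hypothetical and not taken from the real file. The top-level JSON array becomes a list, and each JSON object becomes a dictionary.

```python
import json

# Hypothetical sample with the same structure as spotify.json (values are made up)
sample_text = """
[
    {"endTime": "2020-01-01 10:00", "artistName": "Artist A", "trackName": "Song 1", "msPlayed": 180000},
    {"endTime": "2020-01-01 10:03", "artistName": "Artist B", "trackName": "Song 2", "msPlayed": 95000}
]
"""

sample_data = json.loads(sample_text)

print(type(sample_data))             # the top-level JSON array becomes a list
print(type(sample_data[0]))          # each JSON object becomes a dict
print(sample_data[0]["artistName"])  # fields are accessed by key
```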

Answer the following questions:
 1. How many records does this file have?
 2. How many different artists can you find in this file? And songs?
 3. How much time did this guy spend listening to music?
 4. Who is the most-listened-to artist? And which is the most-listened-to song? _Hint: create a new dict by iterating over all artists, then sum the milliseconds_

In [1]:
import json
with open("data/spotify.json", encoding="utf-8") as json_file:
    json_data = json.load(json_file)

### How many records does this file have?

In [2]:
# How many records does this file have?
print(f"This file has {len(json_data)} records")

This file has 2895 records


### How many different artists can you find in this file? And songs?

In [3]:
# How many different artists can you find in this file?
artists = [record["artistName"] for record in json_data]
artists = set(artists)
print(f"Number of artists: {len(artists)}")

Number of artists: 618


In [4]:
# And songs?
songs = [record["trackName"] for record in json_data]
songs = set(songs)
print(f"Number of songs: {len(songs)}")

Number of songs: 1495


### How much time did this guy spend listening to music?

In [5]:
# How much time did this guy spend listening to music?
ms = [record["msPlayed"] for record in json_data]
totalms = sum(ms)
print(f"Total milliseconds: {totalms}")

Total milliseconds: 536676750


In [6]:
# Transform to a more readable format with the "timedelta" class

from datetime import timedelta
dt = timedelta(milliseconds=totalms)
print(f"Total time: {dt}")

Total time: 6 days, 5:04:36.750000
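If a single number is easier to compare than the `days, h:mm:ss` string, the same `timedelta` can be converted with `total_seconds()`. A small sketch, reusing the total reported above:

```python
from datetime import timedelta

totalms = 536676750  # total milliseconds reported above
dt = timedelta(milliseconds=totalms)

# total_seconds() returns the whole duration as a float number of seconds
total_hours = dt.total_seconds() / 3600
print(f"Total time: {total_hours:.1f} hours")  # Total time: 149.1 hours
```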


### Who is the most-listened-to artist? And song?

In [11]:
# Most listened artist
artistsdict = {artist:0 for artist in artists}

for record in json_data:
    artistsdict[record["artistName"]] += record["msPlayed"]
    
# transform the dict into a list of (value, key) tuples and take the maximum
# tuples are compared element by element, so putting the value first makes "max" order by play time
max([(value, key) for key,value in artistsdict.items()])

(69056722, 'Tame Impala')

Note about the last line: `max([(value, key) for key,value in artistsdict.items()])`

The `max` function (like `min`, `sorted`, etc.) can order sequences of numbers, strings (alphabetically), and also other structures. If we apply `max` to a list of tuples, the function compares the tuples element by element: it looks at the first element of each tuple, and only uses the second element to break ties. So, in this case, as we want to order by value, we first build a new list with the value in the first position.
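The tuple comparison described above can be seen on a tiny example. The dictionary below is hypothetical (made-up names and numbers) and stands in for `artistsdict`. It also shows one possible alternative, passing `key=` to `max`, which returns only the dictionary key and handles ties differently.

```python
# Hypothetical totals (made-up numbers), standing in for artistsdict
playtime = {"Artist A": 300, "Artist B": 500, "Artist C": 500}

# Tuples are compared element by element: first by value, and on a tie by key
by_tuple = max([(value, key) for key, value in playtime.items()])
print(by_tuple)  # (500, 'Artist C')

# Alternative: ask max for the dictionary key whose value is largest.
# On a tie, max returns the first maximal item it encounters.
by_key = max(playtime, key=playtime.get)
print(by_key)    # Artist B
```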

In [14]:
# Most listened song
songsdict = {song:0 for song in songs}

for record in json_data:
    songsdict[record["trackName"]] += record["msPlayed"]
    
# transform dict to a list of tuples and calculate the max value
max([(value, key) for key,value in songsdict.items()])

(5588921, 'Avant')
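Two different artists can have tracks with the same name, and grouping by `trackName` alone would merge them. One possible variation, sketched here with `collections.Counter` on made-up records (not the real data), keys the totals by an `(artistName, trackName)` pair and uses `most_common` to get a ranking instead of a single maximum.

```python
from collections import Counter

# Hypothetical records (made up) with the same fields as the Spotify file
records = [
    {"artistName": "Artist A", "trackName": "Intro", "msPlayed": 1000},
    {"artistName": "Artist B", "trackName": "Intro", "msPlayed": 4000},
    {"artistName": "Artist A", "trackName": "Intro", "msPlayed": 2000},
    {"artistName": "Artist A", "trackName": "Outro", "msPlayed": 500},
]

song_totals = Counter()
for record in records:
    # Key by (artist, track) so same-named tracks by different artists stay separate
    song_totals[(record["artistName"], record["trackName"])] += record["msPlayed"]

# most_common(n) returns the n entries with the largest totals, highest first
top_two = song_totals.most_common(2)
print(top_two)
```

With these sample records, the two "Intro" tracks are ranked separately rather than being added together.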