# API Scavenger Game

## Challenge 1: Fork Languages

You will find out how many programming languages are used among all the forks created from the main lab repo of your bootcamp.

In [1]:
# import libraries here
import requests
import json
import pandas as pd
import re

Assuming the main lab repo is ironhack-datalabs/mad-oct-2018, you will:

#### 1. Obtain the full list of forks created from the main lab repo via the GitHub API.

To list forks, we can use the GET method. As explained in the GitHub API documentation, we need to make the request to: GET /repos/:owner/:repo/forks.

In [2]:
# your code here
token='blablabla'
username = 'vfarneze'
url = 'https://api.github.com/repos/ironhack-datalabs/mad-oct-2018/forks'
response = requests.get(url)
response.status_code

200
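
The cell above defines `token` and `username` but never sends them, so the request goes out unauthenticated. A minimal sketch of how the token could be attached, assuming it is a personal access token; `github_headers` is a hypothetical helper name, not part of the lab:

```python
def github_headers(token=None):
    """Build request headers for the GitHub REST API (illustrative helper)."""
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        # Assumption: `token` is a personal access token. Keep real tokens
        # out of shared notebooks.
        headers["Authorization"] = f"token {token}"
    return headers


# Possible usage with the variables from the cell above:
# response = requests.get(url, headers=github_headers(token))
```

Authenticated requests get a much higher rate limit than anonymous ones, which matters once the notebook starts requesting one URL per fork.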

In [3]:
results = response.json()
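
GitHub list endpoints return results in pages (30 items by default, up to 100 with `per_page`), so a single request may not contain every fork. The following is a sketch of one way to walk the pages, not the lab's reference solution. `fetch_all_pages` and its `max_pages` safety cap are hypothetical names, and `get` is expected to behave like `requests.get`:

```python
def fetch_all_pages(url, get, per_page=100, max_pages=50):
    """Collect the items from every page of a paginated list endpoint.

    `get` is any callable with the same interface as requests.get; passing
    it in makes the helper easy to try without network access.
    """
    items = []
    for page in range(1, max_pages + 1):
        response = get(url, params={"per_page": per_page, "page": page})
        response.raise_for_status()
        batch = response.json()
        items.extend(batch)
        # A short (or empty) page means there is nothing left to fetch.
        if len(batch) < per_page:
            break
    return items


# Possible usage with the URL from the cell above:
# results = fetch_all_pages(url, requests.get)
```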

#### 2. Loop the JSON response to find out the language attribute of each fork. Use an array to store the language attributes of each fork.
Hint: Each language should appear only once in your array.
Print the language array. It should be something like: ["Python", "Jupyter Notebook", "HTML"]

In [4]:
pd.DataFrame(results).columns

Index(['id', 'node_id', 'name', 'full_name', 'private', 'owner', 'html_url',
       'description', 'fork', 'url', 'forks_url', 'keys_url',
       'collaborators_url', 'teams_url', 'hooks_url', 'issue_events_url',
       'events_url', 'assignees_url', 'branches_url', 'tags_url', 'blobs_url',
       'git_tags_url', 'git_refs_url', 'trees_url', 'statuses_url',
       'languages_url', 'stargazers_url', 'contributors_url',
       'subscribers_url', 'subscription_url', 'commits_url', 'git_commits_url',
       'comments_url', 'issue_comment_url', 'contents_url', 'compare_url',
       'merges_url', 'archive_url', 'downloads_url', 'issues_url', 'pulls_url',
       'milestones_url', 'notifications_url', 'labels_url', 'releases_url',
       'deployments_url', 'created_at', 'updated_at', 'pushed_at', 'git_url',
       'ssh_url', 'clone_url', 'svn_url', 'homepage', 'size',
       'stargazers_count', 'watchers_count', 'language', 'has_issues',
       'has_projects', 'has_downloads', 'has_wiki', 'has

In [5]:
# Two columns ('language' and 'languages_url') contain information about the languages used.

# We will start with the 'language' column: extract all languages and add each one once to a list.
languages = pd.DataFrame(results).loc[:,'language']

# your code here
langs = []
for language in languages:
    if language not in langs:
        langs.append(language)

In [6]:
# Now look at languages_url
languages_url = list(pd.DataFrame(results).loc[:,'languages_url'])

# "languages_url" is a list of links; each link returns a JSON document listing the languages used in that fork.
links = []

for link in languages_url:
    links.append(requests.get(link).json())
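
The loop above sends one request per fork, which can use up the unauthenticated quota of 60 requests per hour quickly. GitHub reports the remaining quota in the `X-RateLimit-Remaining` response header. A small sketch for reading it, where `remaining_requests` is a hypothetical helper name:

```python
def remaining_requests(headers):
    """Return the remaining request quota from response headers, or None."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("x-ratelimit-remaining")
    return int(value) if value is not None else None


# Possible usage after any request:
# print(remaining_requests(response.headers))
```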

In [7]:
# Each link returned a dictionary whose keys are the languages, so we can iterate over each dictionary's keys
# and add any newly detected languages to our list 'langs'.
links[0].keys()

for link in links:
    for language in link.keys():
        if language not in langs:
            langs.append(language)

print(f'The languages are: {langs}')

The languages are: [None, 'Jupyter Notebook', 'HTML', 'Python', 'Shell']
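
The printed list contains `None` because the API reports `language` as null for a fork with no detected language. A sketch of the same collection step that skips those entries, assuming the two inputs look like the `languages` column and the `links` list built above; `unique_languages` is a hypothetical helper name:

```python
def unique_languages(primary, language_maps):
    """Merge both language sources into one ordered list without duplicates."""
    candidates = list(primary)
    for mapping in language_maps:
        # Each mapping is a dict such as {"Python": 1234, "HTML": 56}.
        candidates.extend(mapping.keys())

    seen = []
    for language in candidates:
        # Skip forks where GitHub detected no language at all.
        if language is not None and language not in seen:
            seen.append(language)
    return seen


# Possible usage with the variables from the cells above:
# langs = unique_languages(languages, links)
```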


## Challenge 2: Count Commits
Count how many commits were made in January 2019.

Hints: 
- https://developer.github.com/v3/repos/commits/#list-commits-on-a-repository

- GET /repos/:owner/:repo/commits

#### 1. Obtain all the commits made in January 2019 via the API. The response is a JSON array that contains multiple commit objects.

In [8]:
token='blablabla'
username = 'vfarneze'
url = 'https://api.github.com/repos/ironhack-datalabs/mad-oct-2018/commits'

response = requests.get(url)
response.status_code

200

In [9]:
commits = pd.DataFrame(response.json())
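
The request above returns at most one page of commits (30 by default), whatever their dates. The commits endpoint also accepts `since` and `until` query parameters as ISO 8601 timestamps, so the server can do the date filtering. A sketch of building those parameters, where `month_window` is a hypothetical helper name and the exact boundary behaviour of `until` is an assumption worth checking against the API documentation:

```python
from datetime import datetime, timezone


def month_window(year, month):
    """Return `since`/`until` query parameters covering one calendar month."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # Roll over to January of the next year when the month is December.
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {"since": start.strftime(fmt), "until": end.strftime(fmt)}


# Possible usage with the URL from the cell above:
# response = requests.get(url, params=month_window(2019, 1))
```

The result can still span several pages, so pagination applies here as well.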


#### 2. Count how many commit objects are contained in the array.

In [10]:
# The dates are stored in the 'commit' column.
# Each element of that column is a dictionary whose 'author' key holds another dictionary;
# inside this nested dictionary, the 'date' key gives the commit date.

# We will store all dates in a list.
dates = []

for element in commits.commit:
    dates.append(element['author']['date'])

# The dates are strings in the format YYYY-MM-DDTHH:MM:SSZ,
# so a commit made in January 2019 has a date string that starts with '2019-01'.

jan_commits = []

for date in dates:
    if date.startswith('2019-01'):
        jan_commits.append(date)
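
An alternative to string matching is to parse each timestamp and compare its year and month. This is a sketch that assumes every date uses the `YYYY-MM-DDTHH:MM:SSZ` format described above; `count_commits_in_month` is a hypothetical helper name:

```python
from datetime import datetime


def count_commits_in_month(dates, year, month):
    """Count the timestamps that fall in the given year and month."""
    count = 0
    for raw in dates:
        # strptime raises ValueError if a timestamp has an unexpected format.
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
        if parsed.year == year and parsed.month == month:
            count += 1
    return count


# Possible usage with the list built above:
# count_commits_in_month(dates, 2019, 1)
```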

In [11]:
# your code here
print(f'Total number of commit objects: {commits.shape[0]}')
print(f'Number of commits made in January: {len(jan_commits)}')

Total number of commit objects: 30
Number of commits made in January: 27


## Challenge 3: Hidden Cold Joke

Using Python, call the GitHub API to find out the cold joke contained in the 24 secret files in the following repo:

https://github.com/ironhack-datalabs/scavenger

The filenames of the secret files contain .scavengerhunt, and the files are scattered randomly across different directories of this repo. They are named from .0001.scavengerhunt to .0024.scavengerhunt. You need to search for these files by calling the GitHub API, not by searching the local files on your computer.

#### 1. Find the secret files.

In [12]:
# your code here
response = requests.get('https://api.github.com/repos/ironhack-datalabs/scavenger/contents')
response.status_code

200

In [13]:
# First step: get the list of dictionaries describing the items in the repo root.
folders = response.json()

# Each item has an API link through which it can be accessed.
links = [data_dict['url'] for data_dict in folders]

# Second step: "open" every item by requesting its link.
folders = [requests.get(link).json() for link in links]

# The first element of folders is a '.gitignore' file, not a folder, so we can drop it later.
folders[0]
folders[0]

{'name': '.gitignore',
 'path': '.gitignore',
 'sha': 'e43b0f988953ae3a84b00331d0ccf5f7d51cb3cf',
 'size': 10,
 'url': 'https://api.github.com/repos/ironhack-datalabs/scavenger/contents/.gitignore?ref=master',
 'html_url': 'https://github.com/ironhack-datalabs/scavenger/blob/master/.gitignore',
 'git_url': 'https://api.github.com/repos/ironhack-datalabs/scavenger/git/blobs/e43b0f988953ae3a84b00331d0ccf5f7d51cb3cf',
 'download_url': 'https://raw.githubusercontent.com/ironhack-datalabs/scavenger/master/.gitignore',
 'type': 'file',
 'content': 'LkRTX1N0b3JlCg==\n',
 'encoding': 'base64',
 '_links': {'self': 'https://api.github.com/repos/ironhack-datalabs/scavenger/contents/.gitignore?ref=master',
  'git': 'https://api.github.com/repos/ironhack-datalabs/scavenger/git/blobs/e43b0f988953ae3a84b00331d0ccf5f7d51cb3cf',
  'html': 'https://github.com/ironhack-datalabs/scavenger/blob/master/.gitignore'}}

In [14]:
# We now keep only the folders (a folder is returned as a list); this gets rid of any item that is not a folder.
folders = [folder for folder in folders if type(folder) == list ]

In [15]:
# We now pick out the "scavengerhunt" files and get rid of unwanted ones.
# folders is a list of "folder"s; each "folder" is itself a list of dictionaries representing GitHub files.
# Each dictionary has a 'name' key, which holds the name of the GitHub file.
# If this name contains "scavengerhunt", we keep the file.

scavenger_files = []

for folder in folders:
    for file in folder:

        # filtering is True if the word is found in the file name.
        filtering = 'scavengerhunt' in str(file['name'])

        if filtering:
            # Save each scavengerhunt file's dictionary to a list.
            scavenger_files.append(file)
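
The cells above look only one level below the repository root, which works as long as no secret file is nested deeper. A sketch of an alternative is to ask the Git Trees endpoint (`GET /repos/:owner/:repo/git/trees/:tree_sha?recursive=1`) for the whole tree in one request and filter its `tree` entries. `secret_file_paths` is a hypothetical helper name, and using `master` as the branch is an assumption based on the URLs in the output above:

```python
def secret_file_paths(tree_payload, marker="scavengerhunt"):
    """Return paths of files whose name contains `marker`, sorted by file name."""

    def filename(path):
        # Tree entries carry the full path; the file name is the last segment.
        return path.rsplit("/", 1)[-1]

    paths = [
        entry["path"]
        for entry in tree_payload.get("tree", [])
        # "blob" entries are files, "tree" entries are directories.
        if entry.get("type") == "blob" and marker in filename(entry["path"])
    ]
    return sorted(paths, key=filename)


# Possible usage (branch name assumed):
# tree_url = ('https://api.github.com/repos/ironhack-datalabs/scavenger'
#             '/git/trees/master?recursive=1')
# paths = secret_file_paths(requests.get(tree_url).json())
```

For very large repositories the response may be marked as truncated, in which case a directory-by-directory walk is still needed.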


#### 2. Sort the filenames in ascending order.

In [16]:
# Now we need to sort, which pandas makes easy.
# We only need the 'download_url' to read the text inside each file.

text_files = pd.DataFrame(scavenger_files).loc[:,['name','download_url']]

# Set 'name' as the index and sort by it:
text_files = text_files.set_index('name').sort_index()
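
Sorting the index alphabetically gives the right order here because the file numbers are zero-padded to four digits. If the padding were not uniform, the number could be extracted and used as the sort key instead. A sketch, assuming names of the form `.NNNN.scavengerhunt`; `file_number` is a hypothetical helper name:

```python
import re


def file_number(name):
    """Extract the integer from a name such as '.0007.scavengerhunt'."""
    match = re.fullmatch(r"\.(\d+)\.scavengerhunt", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1))


# Possible usage on a plain list of names:
# sorted(names, key=file_number)
```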

#### 3. Read the content of each secret file into an array of strings.
Since the response is encoded, you will need to send the following information in the header of your request:
````python
headers = {'Accept': 'application/vnd.github.v3.raw'}
````

In [17]:
# Now we "read" every file from its download_url link.
text = [requests.get(link).text for link in text_files['download_url']]
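
The cell above reads the files through `download_url`, which already serves plain text. When a file is requested through the contents endpoint without the raw `Accept` header, the text arrives base64-encoded in the `content` field, as in the `.gitignore` output shown earlier. A sketch of decoding such a payload, where `decode_content` is a hypothetical helper name:

```python
import base64


def decode_content(file_payload):
    """Decode the base64 `content` field of a contents-API file payload."""
    if file_payload.get("encoding") != "base64":
        raise ValueError("expected a base64-encoded payload")
    # b64decode ignores the newlines GitHub inserts into the encoded text.
    return base64.b64decode(file_payload["content"]).decode("utf-8")


# Possible usage with a dictionary returned by the contents endpoint:
# decode_content(requests.get(file_dict['url']).json())
```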

#### 4. Concatenate the strings in the array separating each two with a whitespace.

In [18]:
# Remove the '\n' characters contained in the strings.
treated = [string.replace('\n','') for string in text]
# Now join the strings with a single space.
final_text = ' '.join(treated)

#### 5. Print out the joke.

In [19]:
print(final_text)

In data science, 80 percent of time spent is preparing data, 20 percent of time is spent complaining about the need to prepare data.
