# Introduction
One of the most significant hurdles in real-world ML projects is the scarcity of high-quality data. Data collection itself is a substantial undertaking, often requiring significant time and resources. Moreover, the collected data rarely arrives in a pristine format ready for immediate model training. Extensive data cleaning and preprocessing are essential to address issues such as:
- Missing values: Handling missing data points accurately is crucial to prevent biased or inaccurate model predictions.
- Inconsistent formatting: Data can be inconsistent in terms of units, capitalization and other formatting aspects, requiring careful cleaning and standardization.
- Outliers: Extreme values can significantly impact model performance and require appropriate handling (removal, transformation, etc.).
- Data imbalances: Class imbalances, where one class has significantly fewer samples than the others, can lead to biased models.
- Noise: Data can be contaminated with noise, which can degrade model performance.
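
As a rough illustration of the first two issues, the sketch below cleans a few made-up records using only the standard library. The field names (`city`, `temp_c`), the sample values and the median-fill strategy are assumptions chosen for this example, not a recipe that suits every dataset.

```python
from statistics import median

# Hypothetical raw records: mixed capitalization, stray whitespace and one missing value.
raw_rows = [
    {"city": " Pune", "temp_c": "31.5"},
    {"city": "pune", "temp_c": ""},
    {"city": "MUMBAI ", "temp_c": "29.0"},
]


def clean_rows(rows):
    """Standardize city names and fill missing temperatures with the median."""
    known = [float(r["temp_c"]) for r in rows if r["temp_c"].strip()]
    fill_value = median(known)
    cleaned = []
    for r in rows:
        temp = r["temp_c"].strip()
        cleaned.append({
            "city": r["city"].strip().title(),              # consistent formatting
            "temp_c": float(temp) if temp else fill_value,  # missing-value handling
        })
    return cleaned


cleaned_rows = clean_rows(raw_rows)
for row in cleaned_rows:
    print(row)
```

Filling with the median is only one option; dropping the row or using a model-based imputation may suit other datasets better.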

Some of the common sources of data include:
- Flat files: Files such as `.csv`, `.txt` and `.dat` (plain text) or `.xlsx` (a zipped spreadsheet format), often used for storing tabular data.
- DBMS: Relational (SQL) and NoSQL databases provide storage and retrieval mechanisms for structured and unstructured data.
- Web APIs: APIs allow programmatic access to data from various sources, such as social media platforms, weather services and financial markets.

# Web APIs
### What is a web API?
At its core, a web API is a set of rules that allows different software applications to communicate with each other over the internet. Think of it as a standardized way for different systems to exchange data and functionality.

### Key characteristics
- HTTP-based: Web APIs primarily use the Hypertext Transfer Protocol (HTTP) for communication. This ensures compatibility across various platforms and programming languages.
- Data formats: Common data formats for exchanging information include:
    - JSON (JavaScript Object Notation): A lightweight and human-readable format for representing structured data.
    - XML (eXtensible Markup Language): A more verbose format for describing data using tags.
- Endpoints: Web APIs expose specific URLs (endpoints) that clients can use to interact with the API. Each endpoint accepts one or more HTTP methods (e.g., `GET`, `POST`, `PUT`, `DELETE`) and defines the expected data format.
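
To make the two data formats concrete, here is a small sketch that parses the same made-up record from JSON and from XML with Python's standard library. The payloads and field names (`ip`, `tags`, `active`, `client`) are invented for illustration and do not come from any particular API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JSON body, shaped like a small API response.
json_body = '{"ip": "203.0.113.7", "tags": ["example", "demo"], "active": true}'

data = json.loads(json_body)   # JSON text -> Python dict
print(data["ip"])              # a str
print(data["tags"])            # a list
print(data["active"])          # JSON true becomes Python True

# The same IP address expressed as (more verbose) XML.
xml_body = "<client><ip>203.0.113.7</ip></client>"
root = ET.fromstring(xml_body)        # root is the <client> element
ip_from_xml = root.findtext("ip")     # text of the <ip> child element
print(ip_from_xml)
```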

### How do web APIs work?
1. Client: A software application (e.g., a mobile app, a web browser or another server) initiates a request to a specific endpoint on the server hosting the API.
2. Server: The server receives the request, processes it and retrieves or manipulates the requested data.
3. Response: The server sends a response back to the client, typically containing the requested data in the specified format.
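
The three steps above can be sketched end to end on a single machine. The example below starts a throwaway HTTP server on `127.0.0.1` and queries it with the standard library's `http.client`, so it needs no internet access. The handler class, the `/greeting` path and the response fields are all made up for this illustration.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class GreetingHandler(BaseHTTPRequestHandler):
    """Server side: receive a GET request and answer with a JSON body."""

    def do_GET(self):
        payload = json.dumps({"path": self.path, "message": "hello"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), GreetingHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    # Client side: send a GET request to the /greeting endpoint.
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/greeting")
    resp = conn.getresponse()

    # Response: status code, headers and body come back together.
    status = resp.status
    content_type = resp.getheader("Content-Type")
    data = json.loads(resp.read().decode("utf-8"))
    conn.close()
finally:
    server.shutdown()
    server.server_close()

print(status, content_type, data)
```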

### Common use cases
- Social media integration: Sharing content on platforms like Facebook, Twitter or Instagram.
- Mapping services: Displaying maps, finding directions and locating points of interest using APIs like Google Maps.
- E-commerce: Processing payments, managing inventory and tracking orders.
- Weather information: Retrieving real-time weather data from services like AccuWeather or OpenWeatherMap.
- Payment gateways: Facilitating online transactions through services like PayPal or Stripe.

### Benefits of using web APIs
- Platform independence: Web APIs can be accessed from various devices and platforms, making them highly versatile.
- Modularity: APIs promote a modular approach to software development, allowing developers to build upon existing functionalities.
- Efficiency: Reusing existing APIs can significantly reduce development time and effort.
- Innovation: APIs foster innovation by enabling developers to create new and unique applications by combining data and services from different sources.

# `requests` Package
The `requests` package in Python is a powerful tool for making HTTP requests. It simplifies the process of interacting with web servers, making it easier to fetch data from APIs, send data to web applications and perform other web-related tasks.

### Key features
- Easy to use: The `requests` package provides a simple and intuitive interface for making HTTP requests.
- Supports various HTTP methods: It supports the standard HTTP methods, including `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, `HEAD` and `OPTIONS`.
- Handles responses: Provides easy access to response data, including status codes, headers and the response body.
- Handles authentication: Supports basic and digest authentication out of the box; OAuth is available through the companion `requests-oauthlib` package.
- Handles cookies: Automatically handles cookies, simplifying interactions with websites that require session management.
- Session management: Provides a `Session` object for managing persistent connections and cookies across multiple requests.
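
As a small sketch of the `Session` feature, the snippet below sets a default header on a session and prepares a `GET` request without sending it, so it runs without network access. The `format=json` query parameter is an assumption about the ipify service used later in this notebook.

```python
import requests

# A Session carries default headers (and cookies) across every request made through it.
session = requests.Session()
session.headers.update({"Accept": "application/json"})

# Build the request without sending it, so the example needs no network access.
# The `format=json` query parameter is an assumption about the ipify service.
request = requests.Request("GET", "https://api.ipify.org", params={"format": "json"})
prepared = session.prepare_request(request)

print(prepared.method)             # HTTP method of the prepared request
print(prepared.url)                # full URL with the encoded query string
print(prepared.headers["Accept"])  # header inherited from the session

# To actually send it: response = session.send(prepared, timeout=10)
session.close()
```

Preparing a request this way is also handy for debugging, since the final URL and headers can be inspected before anything goes over the wire.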

### Installation
`pip install requests`

### Making a `GET` request

In [1]:
import requests

url = "https://api.ipify.org"
response = requests.get(url) 
response

<Response [200]>

### Key methods and attributes
- `requests.get()`: Used for making `GET` requests to retrieve data from a server.
- `requests.post()`: Used for sending data to a server (e.g., submitting forms).
- `requests.put()`: Used for updating existing resources on a server.
- `requests.delete()`: Used for deleting resources from a server.
- `response.json()`: Parses a JSON response body into Python objects (typically a dictionary or a list).
- `response.status_code`: The HTTP status code of the server's response (e.g., 200 for success, 404 for not found).
- `response.text`: The content of the response as a string.
- `response.headers`: A dictionary containing the HTTP headers of the response.

In [2]:
response.status_code

200

In [3]:
response.text

'43.224.129.121'

In [4]:
response.headers

{'Date': 'Sat, 18 Jan 2025 19:05:38 GMT', 'Content-Type': 'text/plain', 'Content-Length': '14', 'Connection': 'keep-alive', 'Vary': 'Origin', 'cf-cache-status': 'DYNAMIC', 'Server': 'cloudflare', 'CF-RAY': '9040db8e0bfce6ec-BLR', 'server-timing': 'cfL4;desc="?proto=TCP&rtt=5629&min_rtt=5350&rtt_var=2022&sent=5&recv=7&lost=0&retrans=0&sent_bytes=2835&recv_bytes=763&delivery_rate=599400&cwnd=252&unsent_bytes=0&cid=bf65b0c1ca3a589f&ts=247&x=0"'}
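
The cells above cover `status_code`, `text` and `headers`. For `response.json()`, a minimal sketch might look like the following. It assumes that ipify returns a body of the form `{"ip": "..."}` when called with `format=json`; the helper name `get_public_ip` is made up for this example.

```python
import requests


def get_public_ip(timeout=10):
    """Return the caller's public IP address as reported by ipify."""
    # Assumption: `format=json` makes ipify return a body like {"ip": "..."}.
    response = requests.get(
        "https://api.ipify.org", params={"format": "json"}, timeout=timeout
    )
    response.raise_for_status()   # raise an HTTPError for 4xx/5xx responses
    payload = response.json()     # JSON body -> Python dict
    return payload["ip"]
```

Calling `get_public_ip()` performs a real network request, so the result depends on where the code runs.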

In [5]:
if response.status_code == 200:
    print(response.text)
else:
    print(f"Request failed with status code: {response.status_code}")

43.224.129.121
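
Checking `status_code` by hand covers HTTP-level failures, but not network problems such as timeouts or DNS errors. One possible extension, sketched below with a hypothetical helper named `fetch_text`, combines a timeout, `raise_for_status()` and the exception classes in `requests.exceptions`. The 5-second default is an arbitrary choice for the example.

```python
import requests


def fetch_text(url, timeout=5):
    """Return the response body as text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()   # turn 4xx/5xx responses into HTTPError
    except requests.exceptions.Timeout:
        print(f"Request to {url} timed out after {timeout} seconds")
    except requests.exceptions.HTTPError as exc:
        print(f"Server returned an error status: {exc.response.status_code}")
    except requests.exceptions.RequestException as exc:
        # Base class for the remaining requests errors (connection failures, etc.)
        print(f"Request failed: {exc}")
    else:
        return response.text
    return None
```

For example, `fetch_text("https://api.ipify.org")` would return the IP address string on success and `None` otherwise.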


# GitHub API
Documentation: https://docs.github.com/en/rest/users/users?apiVersion=2022-11-28#get-the-authenticated-user