# Rotating IP addresses with Bright Data

When scraping data from websites, regularly changing the IP address used can be beneficial for the following reasons:

1. **Maintaining privacy:** By rotating IP addresses, it becomes difficult for websites to trace multiple requests back to a single source. This helps keep your web scraping activities more confidential.

2. **Avoiding Blocks:** Websites often block IP addresses that send excessive scraping requests. By frequently changing your IP, you can distribute requests across many IPs and reduce the chances of getting blocked.

3. **Mitigating API Rate Limits:** APIs often limit how many requests can be made from one IP address. Rotating IPs spreads requests over many addresses, which helps you stay under API rate limits and avoid 429 ("Too Many Requests") errors.

---

## Pre-requisites
You need a [Bright Data](https://brightdata.com/) account.

### Step 1: Install libraries

In [1]:
%pip install requests
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


### Step 2: Set up proxy server

Once you have a Bright Data account, the next step is to set up your proxy server at https://brightdata.com/cp/zones. I opted for data center proxies because they are fast and cost-effective.

I default to data center proxies unless there is a compelling reason to reconsider. For example, if I still get blocked while using data center proxies, I switch to residential proxies, which are more expensive.

<img src="../assets/static/brightdata-create-datacenter-proxies-step1.png" width=500px alt="Step 1 - Create Datacenter proxies in Bright Data"><br>

Here I choose the premium data center proxies because this plan comes with a large pool of IP addresses to rotate through.

With the other two plans, each additional IP address costs an extra $0.80/month.

<img src="../assets/static/brightdata-create-datacenter-proxies-step2.png" width=500px alt="Step 2 - Create Datacenter proxies in Bright Data"><br>


You can find the proxy pricing here (approximately $1.2/GB). As a quick estimate: if each request transfers around 50 KB, then 20,000 requests add up to roughly 1 GB of traffic, which would cost only about $1.2.
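
The estimate above can be reproduced in a few lines of Python. This is only a rough sketch: `estimate_cost` is a hypothetical helper, the $1.2/GB rate is the approximate figure quoted above, and the calculation assumes decimal units (1 GB = 1,000,000 KB) and that 50 KB covers all billed traffic for one request.

```python
def estimate_cost(num_requests, kb_per_request, price_per_gb=1.2):
    """Rough proxy traffic cost in USD, assuming 1 GB = 1,000,000 KB."""
    total_gb = num_requests * kb_per_request / 1_000_000
    return total_gb * price_per_gb


# 20,000 requests of about 50 KB each is roughly 1 GB of traffic
print(f"${estimate_cost(20_000, 50):.2f}")  # prints $1.20
```

Changing `kb_per_request` to match the pages you actually scrape gives a quick feel for how the bill scales.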

<img src="../assets/static/brightdata-create-datacenter-proxies-step3.png" width=500px alt="Step 3 - Create Datacenter proxies in Bright Data">

### Step 3: Rotate IPs with the Python `requests` library

In [2]:
import os
import random

import requests
from dotenv import load_dotenv

_ = load_dotenv()

url = "https://httpbin.co/ip"

proxy_hostname = os.getenv("PROXY_HOSTNAME")
proxy_username = os.getenv("PROXY_USERNAME")
proxy_password = os.getenv("PROXY_PASSWORD")
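
Because `os.getenv` returns `None` for a missing variable, a misconfigured `.env` file would only show up later as a malformed proxy URL. The sketch below checks the three variables up front. `require_env` is a hypothetical helper, and the `.env` layout in the comment uses placeholders rather than real values. Since the notebook interpolates `PROXY_HOSTNAME` directly after the `@`, the port presumably has to be part of that value.

```python
import os

# Hypothetical .env layout (placeholders, not real credentials):
#   PROXY_HOSTNAME=<proxy-host>:<port>
#   PROXY_USERNAME=<zone-username>
#   PROXY_PASSWORD=<zone-password>

REQUIRED_VARS = ("PROXY_HOSTNAME", "PROXY_USERNAME", "PROXY_PASSWORD")


def require_env(names=REQUIRED_VARS):
    """Return the named environment variables, raising if any is missing or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env()` right after `load_dotenv()` would surface a configuration problem before any request is sent.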

In [3]:
class ProxyServer:
    def __init__(self, proxy_username, proxy_password, proxy_hostname):
        self.proxy_username = proxy_username
        self.proxy_password = proxy_password
        self.proxy_hostname = proxy_hostname

    def __call__(self):
        rand_num = random.randint(1, 9999)
        # Reference: https://docs.brightdata.com/api-reference/proxy/rotate_ips
        # Append a -session-<id> suffix to the username so the IP address changes with every request
        proxy = f"{self.proxy_username}-session-rand{rand_num}:{self.proxy_password}@{self.proxy_hostname}"
        return {
            "http": f"https://{proxy}",
            "https": f"https://{proxy}",
        }

proxy_servers = ProxyServer(proxy_username, proxy_password, proxy_hostname)
# Each call to proxy_servers() returns a proxies dict with a new session ID, and therefore a new IP
# >> proxies = proxy_servers()
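
One detail the class above does not handle is credentials that contain characters such as `@` or `:`, which would make the proxy URL ambiguous. A minimal sketch of one way to deal with that is to percent-encode the username and password before building the URL. `build_proxy_url` is a hypothetical helper, the host and credentials in the example are made up, and the default `https` scheme simply mirrors the cell above.

```python
import random
from urllib.parse import quote


def build_proxy_url(username, password, hostname, session_id=None, scheme="https"):
    """Build a proxy URL whose username carries a -session-<id> suffix."""
    if session_id is None:
        # Same session ID format as the ProxyServer class above
        session_id = f"rand{random.randint(1, 9999)}"
    # Percent-encode credentials so '@' and ':' cannot break the URL structure
    user = quote(f"{username}-session-{session_id}", safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{hostname}"


print(build_proxy_url("user", "p@ss:word", "proxy.example:8080", session_id="rand42"))
# prints https://user-session-rand42:p%40ss%3Aword@proxy.example:8080
```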

In [4]:
# Build a Session object so that cookies (and login status, if applicable)
# persist across all requests

session = requests.Session()

for _ in range(10):
    # proxy_servers() returns a new session ID on each call, so every request uses a different IP address
    response = session.get(url, proxies=proxy_servers())
    # The output shows the requests arriving from different IP addresses
    print("IP location metadata:", response.json())

IP location metadata: {'method': 'GET', 'ip': '206.204.63.141', 'country': 'US', 'timezone': 'America/Chicago', 'continent': 'NA'}
IP location metadata: {'method': 'GET', 'ip': '188.211.25.8', 'country': 'US', 'region': 'New York', 'city': 'Newark', 'timezone': 'America/New_York', 'continent': 'NA'}
IP location metadata: {'method': 'GET', 'ip': '206.204.36.211', 'country': 'US', 'timezone': 'America/Chicago', 'continent': 'NA'}
IP location metadata: {'method': 'GET', 'ip': '152.39.215.167', 'country': 'US', 'region': 'Virginia', 'city': 'Ashburn', 'timezone': 'America/New_York', 'continent': 'NA'}
IP location metadata: {'method': 'GET', 'ip': '178.171.112.130', 'country': 'NL', 'timezone': 'Europe/Amsterdam', 'continent': 'EU'}
IP location metadata: {'method': 'GET', 'ip': '94.176.86.56', 'country': 'US', 'timezone': 'America/Chicago', 'continent': 'NA'}
IP location metadata: {'method': 'GET', 'ip': '2.57.77.223', 'country': 'US', 'timezone': 'America/Chicago', 'continent': 'NA'}
IP lo
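
The loop above assumes every request succeeds. When a site does respond with a block or a 429, one option is to retry the same request through a freshly rotated proxy. The following is a minimal sketch of that idea, not part of the original notebook: `fetch_with_rotation` and its parameters are hypothetical, and the choice of 403 and 429 as retry statuses is an assumption. It catches `OSError` because the exceptions raised by `requests` derive from it.

```python
import time


def fetch_with_rotation(send, get_proxies, max_attempts=3,
                        retry_statuses=(403, 429), backoff_seconds=1.0):
    """Call send(proxies) with fresh proxies until the response is not a block status.

    send        -- callable taking a proxies dict and returning an object
                   with a .status_code attribute
    get_proxies -- callable returning a new proxies dict (e.g. proxy_servers)
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            response = send(get_proxies())
        except OSError as exc:  # connection-level failure: try another IP
            last_error = exc
        else:
            if response.status_code not in retry_statuses:
                return response
            last_error = RuntimeError(f"blocked with HTTP {response.status_code}")
        if attempt < max_attempts:
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Hypothetical wiring with the objects defined in this notebook:
# response = fetch_with_rotation(
#     lambda proxies: session.get(url, proxies=proxies, timeout=10),
#     proxy_servers,
# )
```

Passing the request as a callable keeps the helper independent of any one HTTP client, which also makes it easy to exercise without touching the network.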

## Computing environment

In [5]:
%load_ext watermark

%watermark

# print out pypi packages used
%watermark --iversions

# date
%watermark -u -n -t -z

Last updated: 2024-03-03T20:27:32.125267+08:00

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 8.22.1

Compiler    : MSC v.1916 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
CPU cores   : 8
Architecture: 64bit

requests: 2.31.0

Last updated: Sun Mar 03 2024 20:27:32Malay Peninsula Standard Time

