# Instagram Media Downloader - Agentic Browser Automation

This notebook uses Playwright to automate Instagram browsing and capture media URLs from network requests.

## Features
- Browser automation with Playwright
- Network monitoring to capture image/video URLs
- Automatic login with your credentials
- Click through posts and download media
- Mobile view support for better media quality

## Installation

First, install required packages:

In [None]:
!pip install playwright requests
!playwright install chromium

## Import Libraries

In [None]:
import asyncio
import os
import re
import requests
from pathlib import Path
from datetime import datetime
from playwright.async_api import async_playwright
from urllib.parse import urlparse, parse_qs
import json

## Configuration

In [None]:
# Instagram credentials
INSTAGRAM_USERNAME = "your_username"  # Replace with your username
INSTAGRAM_PASSWORD = "your_password"  # Replace with your password

# Download settings
DOWNLOAD_FOLDER = "instagram_downloads"
USE_MOBILE_VIEW = True  # Mobile view often provides better quality
HEADLESS = False  # Set to True to hide browser window

# Create download folder
Path(DOWNLOAD_FOLDER).mkdir(exist_ok=True)
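
Hardcoding a password in a notebook makes it easy to leak when the file is shared. A minimal sketch of one alternative, assuming the credentials are exported as environment variables named `INSTAGRAM_USERNAME` and `INSTAGRAM_PASSWORD`. Both the variable names and the `load_credentials` helper are illustrative and not part of the notebook above:

```python
import os
from getpass import getpass


def load_credentials(env=None, ask=input, ask_secret=getpass):
    """Return (username, password), preferring environment variables.

    Hypothetical helper: falls back to an interactive prompt for any
    value that is missing from the environment.
    """
    env = os.environ if env is None else env
    username = env.get("INSTAGRAM_USERNAME") or ask("Instagram username: ")
    password = env.get("INSTAGRAM_PASSWORD") or ask_secret("Instagram password: ")
    return username, password
```

With this in place, the configuration cell could assign `INSTAGRAM_USERNAME, INSTAGRAM_PASSWORD = load_credentials()` instead of string literals.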

## Media URL Collector

This class monitors the page's network requests and responses and records the media URLs it sees.

In [None]:
class MediaCollector:
    def __init__(self):
        self.media_urls = set()
        self.downloaded_urls = set()
        
    def handle_request(self, request):
        """Capture media URLs from network requests"""
        url = request.url
        path = urlparse(url).path.lower()

        # Check for video URLs first so a CDN-hosted video is not also
        # recorded as an image
        if path.endswith('.mp4'):
            self.media_urls.add(('video', url))
        # Any other Instagram CDN URL is treated as an image
        elif 'cdninstagram.com' in url:
            self.media_urls.add(('image', url))
    
    def handle_response(self, response):
        """Capture media URLs from responses"""
        url = response.url
        content_type = response.headers.get('content-type', '')
        
        # Check content type for media
        if 'image' in content_type and 'cdninstagram.com' in url:
            self.media_urls.add(('image', url))
        elif 'video' in content_type:
            self.media_urls.add(('video', url))
    
    def download_media(self, media_type, url, download_folder):
        """Download media file"""
        if url in self.downloaded_urls:
            return None
            
        try:
            # Generate filename
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            ext = '.mp4' if media_type == 'video' else '.jpg'
            
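            # Short suffix derived from the URL so files saved in the same second
            # get distinct names (str hashes differ between interpreter runs)
            url_hash = hash(url) % 10000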
            filename = f"{media_type}_{timestamp}_{url_hash}{ext}"
            filepath = os.path.join(download_folder, filename)
            
            # Download file
            response = requests.get(url, stream=True, timeout=30)
            if response.status_code == 200:
                with open(filepath, 'wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
                
                self.downloaded_urls.add(url)
                print(f"✓ Downloaded: {filename} ({media_type})")
                return filepath
            else:
                print(f"✗ Failed to download: {url} (Status: {response.status_code})")
        except Exception as e:
            print(f"✗ Error downloading {url}: {str(e)}")
        
        return None
    
    def download_all(self, download_folder):
        """Download all collected media"""
        print(f"\nFound {len(self.media_urls)} media URLs")
        downloaded = []
        
        for media_type, url in self.media_urls:
            filepath = self.download_media(media_type, url, download_folder)
            if filepath:
                downloaded.append(filepath)
        
        return downloaded
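
`download_media` names each file from a timestamp plus `hash(url) % 10000` and always uses `.jpg` for images. Python randomizes string hashes between interpreter runs, so the same URL gets a different name on every run. A sketch of a deterministic alternative, assuming a hypothetical helper `media_filename` and a small, non-exhaustive content-type table (neither is part of the class above):

```python
import hashlib

# Assumed mapping; extend it for other formats the CDN serves.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/webp": ".webp",
    "video/mp4": ".mp4",
}


def media_filename(media_type, url, content_type=""):
    """Build a stable filename from the media type, URL, and content type."""
    # Drop parameters such as "; charset=..." before the lookup.
    mime = content_type.split(";")[0].strip().lower()
    fallback = ".mp4" if media_type == "video" else ".jpg"
    ext = EXTENSIONS.get(mime, fallback)
    # SHA-1 serves only as a stable fingerprint here, not for security.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    return f"{media_type}_{digest}{ext}"
```

Because the name depends only on the URL, a second run could skip files that already exist on disk. CDN URLs may carry query parameters that change between sessions, so the same media is not guaranteed to map to the same name.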

## Instagram Browser Automation

In [None]:
class InstagramDownloader:
    def __init__(self, username, password, headless=False, mobile_view=True):
        self.username = username
        self.password = password
        self.headless = headless
        self.mobile_view = mobile_view
        self.collector = MediaCollector()
        
    async def login(self, page):
        """Login to Instagram"""
        print("Logging in to Instagram...")
        
        try:
            # Wait for login form
            await page.wait_for_selector('input[name="username"]', timeout=10000)
            
            # Fill credentials
            await page.fill('input[name="username"]', self.username)
            await page.fill('input[name="password"]', self.password)
            
            # Click login button
            await page.click('button[type="submit"]')
            
            # Wait for navigation
            await asyncio.sleep(5)
            
            # Dismiss the "Save Your Login Info" and notifications dialogs.
            # :has-text() matches case-insensitively, so one selector covers
            # both "Not now" and "Not Now".
            for _ in range(2):
                try:
                    not_now_button = page.locator('button:has-text("Not now")')
                    if await not_now_button.count() > 0:
                        await not_now_button.first.click()
                        await asyncio.sleep(2)
                except Exception:
                    pass
            
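            # The login form stays on the page when the credentials are rejected
            if await page.locator('input[name="username"]').count() > 0:
                print("✗ Login failed: login form is still displayed")
                return False

            print("✓ Logged in successfully")
            return True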
            
        except Exception as e:
            print(f"✗ Login failed: {str(e)}")
            return False
    
    async def browse_profile(self, page, profile_url, num_posts=10):
        """Browse a profile and download media"""
        print(f"\nBrowsing profile: {profile_url}")
        
        await page.goto(profile_url)
        await asyncio.sleep(3)
        
        # Click on first post
        try:
            first_post = page.locator('article a').first
            await first_post.click()
            await asyncio.sleep(3)
            
            # Navigate through posts
            for i in range(num_posts):
                print(f"\nViewing post {i+1}/{num_posts}")
                
                # Wait for media to load
                await asyncio.sleep(3)
                
                # If it's a carousel, click through images
                try:
                    next_in_carousel = page.locator('button[aria-label="Next"]').first
                    carousel_clicks = 0
                    while carousel_clicks < 10:  # Max 10 items in carousel
                        if await next_in_carousel.count() > 0 and await next_in_carousel.is_visible():
                            await next_in_carousel.click()
                            await asyncio.sleep(2)
                            carousel_clicks += 1
                        else:
                            break
                except Exception:
                    pass
                
                # Move to next post
                if i < num_posts - 1:
                    try:
                        next_button = page.locator('a:has-text("Next")')
                        if await next_button.count() == 0:
                            next_button = page.locator('button:has([aria-label="Next"])')
                        
                        if await next_button.count() > 0:
                            await next_button.first.click()
                            await asyncio.sleep(3)
                        else:
                            print("No more posts to navigate")
                            break
                    except Exception as e:
                        print(f"Could not navigate to next post: {str(e)}")
                        break
            
        except Exception as e:
            print(f"Error browsing posts: {str(e)}")
    
    async def run(self, target_url, num_posts=10):
        """Main execution method"""
        async with async_playwright() as p:
            # Launch browser
            browser = await p.chromium.launch(
                headless=self.headless,
                args=['--disable-blink-features=AutomationControlled']
            )
            
            # Create context with mobile emulation if requested
            if self.mobile_view:
                context = await browser.new_context(
                    viewport={'width': 375, 'height': 812},
                    user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1'
                )
            else:
                context = await browser.new_context()
            
            page = await context.new_page()
            
            # Set up network monitoring
            page.on('request', self.collector.handle_request)
            page.on('response', self.collector.handle_response)
            
            try:
                # Go to Instagram
                await page.goto('https://www.instagram.com/')
                await asyncio.sleep(3)
                
                # Login
                if await self.login(page):
                    # Browse target profile/post
                    await self.browse_profile(page, target_url, num_posts)
                    
                    # Download all collected media
                    print("\n" + "="*50)
                    print("Starting downloads...")
                    print("="*50)
                    downloaded = self.collector.download_all(DOWNLOAD_FOLDER)
                    print(f"\n✓ Downloaded {len(downloaded)} media files to '{DOWNLOAD_FOLDER}'")
                
            except Exception as e:
                print(f"Error: {str(e)}")
            finally:
                await browser.close()
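
The methods above pause with fixed `asyncio.sleep` calls, which either wait longer than needed or not long enough on a slow connection. A minimal sketch of a polling alternative, assuming a hypothetical helper named `wait_until` that is not part of the class above:

```python
import asyncio
import time


async def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll an async condition until it returns True or the timeout expires.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if await condition():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)
```

For example, `await wait_until(page.locator('button[aria-label="Next"]').first.is_visible)` could stand in for a fixed sleep before a carousel click. Playwright's own `locator.wait_for()` is an alternative when the wait is for a single element.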

## Usage Example 1: Download from a Profile

In [None]:
# Download media from a profile
downloader = InstagramDownloader(
    username=INSTAGRAM_USERNAME,
    password=INSTAGRAM_PASSWORD,
    headless=HEADLESS,
    mobile_view=USE_MOBILE_VIEW
)

# Replace with the profile you want to download from
profile_url = "https://www.instagram.com/username/"
num_posts_to_download = 5

# Run the downloader
await downloader.run(profile_url, num_posts=num_posts_to_download)

## Alternative: Simple Instaloader Method

If you prefer a simpler approach without browser automation, the `instaloader` library can download posts directly:

In [None]:
# Install instaloader
!pip install instaloader

In [None]:
import instaloader

# Create instance
L = instaloader.Instaloader(
    download_videos=True,
    download_video_thumbnails=False,
    download_geotags=False,
    download_comments=False,
    save_metadata=False,
    compress_json=False,
)

# Login (optional, for private accounts)
# L.login(INSTAGRAM_USERNAME, INSTAGRAM_PASSWORD)

# Download from profile
# profile = instaloader.Profile.from_username(L.context, "username")
# for post in profile.get_posts():
#     L.download_post(post, target=profile.username)

print("Instaloader ready to use!")

## Manual Network Tab Method

If you want to manually inspect network requests:

1. Open Instagram in your browser
2. Press F12 to open Developer Tools
3. Go to the Network tab
4. Filter by request type: Img for images, Media for video
5. Navigate to a post
6. Look for requests ending in .mp4 (video) or from cdninstagram.com (images)
7. Right-click the request → Open in new tab
8. Save the media file

The automation above does this programmatically!
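
The manual steps can also be semi-automated: browser developer tools can export the Network tab as a HAR file, which is JSON. A sketch that filters such an export, assuming the standard HAR layout (`log.entries[].request.url` and `log.entries[].response.content.mimeType`). The function names and the `instagram.har` filename are illustrative:

```python
import json


def media_urls_from_har(har):
    """Return sorted (kind, url) pairs for image and video entries in a HAR dict."""
    found = set()
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        content = entry.get("response", {}).get("content", {})
        mime = content.get("mimeType") or ""
        if mime.startswith("video/"):
            found.add(("video", url))
        elif mime.startswith("image/") and "cdninstagram.com" in url:
            found.add(("image", url))
    return sorted(found)


def load_har(path):
    """Read a HAR export from disk and return it as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `media_urls_from_har(load_har("instagram.har"))` would list the same kind of `(type, url)` pairs that `MediaCollector` gathers live.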

## Notes

- **Rate Limiting**: Instagram may block excessive requests. Use delays between requests.
- **Terms of Service**: Only download content you have permission to access.
- **Mobile View**: Often provides higher quality media URLs.
- **Network Monitoring**: The script records every matching media URL the page loads, which can include thumbnails and profile pictures as well as the posts you open.
- **Error Handling**: The script catches and prints errors, but it does not retry failed logins or downloads.
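
The rate-limiting note recommends delays between requests. One common way to apply that to failed downloads is a retry with exponential backoff. A minimal sketch, assuming a hypothetical `retry_with_backoff` helper that is not part of the notebook code above:

```python
import random
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, sleeping longer after each failure.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Base delays of 1s, 2s, 4s, ... (for base_delay=1.0),
            # plus up to 50% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

`download_media` catches exceptions and returns `None`, so it would need to raise on failure for this wrapper to retry it.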