AliAkhtari78/SpotifyScraper

SpotifyScraper


Extract Spotify data without the official API. Access tracks, albums, artists, playlists, and podcasts - no authentication required.

Why SpotifyScraper?

  • 🔓 No API Key Required - Start extracting data immediately
  • 🚀 Fast & Lightweight - Optimized for speed and minimal dependencies
  • 📊 Complete Metadata - Get all available track, album, artist details
  • 🎙️ Podcast Support - Extract podcast episodes and show information
  • 💿 Media Downloads - Download cover art and preview clips
  • 🔄 Bulk Operations - Process multiple URLs efficiently
  • 🛡️ Robust & Reliable - Comprehensive error handling and retries

Installation

# Basic installation
pip install spotifyscraper

# With Selenium support (includes automatic driver management)
pip install spotifyscraper[selenium]

# All features
pip install spotifyscraper[all]

Quick Start

Basic Usage

from spotify_scraper import SpotifyClient

# Initialize client with rate limiting (default 0.5s between requests)
client = SpotifyClient()

# Get track info with enhanced metadata
track = client.get_track_info("https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh")
print(f"{track['name']} by {track['artists'][0]['name']}")
# Output: One More Time by Daft Punk

# Access new fields (when available)
print(f"Track #{track.get('track_number', 'N/A')} on disc {track.get('disc_number', 'N/A')}")
print(f"Popularity: {track.get('popularity', 'Not available')}")

# Download cover art
cover_path = client.download_cover("https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh")
print(f"Cover saved to: {cover_path}")

client.close()

CLI Usage

# Get track info
spotify-scraper track https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh

# Download album with covers
spotify-scraper download album https://open.spotify.com/album/0JGOiO34nwfUdDrD612dOp --with-covers

# Export playlist to JSON
spotify-scraper playlist https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M --output playlist.json

Important Notes

Field Availability

Not all fields shown in Spotify's API documentation are available via web scraping:

  • ❌ NOT Available: popularity, followers, genres, detailed statistics
  • ✅ Available: name, artists, album info, duration, preview URLs
  • ⚠️ Authentication Required: lyrics (needs OAuth, not just cookies)
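
Because availability varies, code that consumes the returned dictionaries should not assume that any optional field exists. A minimal sketch, assuming a track is a plain dict shaped like the examples in this README; the `summarize_track` helper and the sample data are hypothetical and not part of SpotifyScraper:

```python
from typing import Any, Dict


def summarize_track(track: Dict[str, Any]) -> str:
    """Build a one-line summary without assuming optional fields exist."""
    name = track.get("name") or "Unknown"
    # 'artists' may be missing, None, or an empty list
    artists = track.get("artists") or []
    artist_names = ", ".join(a.get("name", "Unknown") for a in artists) or "Unknown"
    # 'popularity' is listed above as not available, so treat it as optional
    popularity = track.get("popularity")
    popularity_text = "n/a" if popularity is None else str(popularity)
    return f"{name} by {artist_names} (popularity: {popularity_text})"


# Hypothetical sample data standing in for a get_track_info() result
sample = {"name": "Example Song", "artists": [{"name": "Example Artist"}]}
print(summarize_track(sample))
print(summarize_track({}))
```

Checking `popularity is None` rather than truthiness keeps a legitimate value of `0` from being reported as missing.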

Core Features

🎡 Track Information

# Get complete track metadata
track = client.get_track_info(track_url)

# Available data:
# - name, id, uri, duration_ms
# - artists (with names, IDs, and verification status)  
# - album (with name, ID, release date, images, total_tracks)
# - preview_url (30-second MP3)
# - is_explicit, is_playable
# - track_number, disc_number (when available)
# - popularity (when available)
# - external URLs

# Example: Access enhanced metadata
if 'artists' in track:
    for artist in track['artists']:
        print(f"Artist: {artist['name']}")
        if 'verified' in artist:
            print(f"  Verified: {artist['verified']}")
        if 'url' in artist:
            print(f"  URL: {artist['url']}")

if 'album' in track:
    album = track['album']
    print(f"Album: {album['name']} ({album.get('total_tracks', 'N/A')} tracks)")

# Note: Lyrics require OAuth authentication
# SpotifyScraper cannot access lyrics as Spotify requires Bearer tokens

💿 Album Information

# Get album with all tracks
album = client.get_album_info(album_url)

print(f"Album: {album.get('name', 'Unknown')}")
print(f"Artist: {(album.get('artists', [{}])[0].get('name', 'Unknown') if album.get('artists') else 'Unknown')}")
print(f"Released: {album.get('release_date', 'N/A')}")
print(f"Tracks: {album.get('total_tracks', 0)}")

# List all tracks
for track in album['tracks']:
    print(f"  {track['track_number']}. {track.get('name', 'Unknown')}")

👤 Artist Information

# Get artist profile
artist = client.get_artist_info(artist_url)

print(f"Artist: {artist.get('name', 'Unknown')}")
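followers = artist.get('followers', {}).get('total')
# Apply the thousands separator only when a number is present ('N/A' cannot take ':,')
print(f"Followers: {followers:,}" if isinstance(followers, int) else "Followers: N/A")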
print(f"Genres: {', '.join(artist.get('genres', []))}")
print(f"Popularity: {artist.get('popularity', 'N/A')}/100")

# Get top tracks
for track in artist.get('top_tracks', [])[:5]:
    print(f"  - {track.get('name', 'Unknown')}")

📋 Playlist Information

# Get playlist details
playlist = client.get_playlist_info(playlist_url)

print(f"Playlist: {playlist.get('name', 'Unknown')}")
print(f"Owner: {playlist.get('owner', {}).get('display_name', playlist.get('owner', {}).get('id', 'Unknown'))}")
print(f"Tracks: {playlist.get('track_count', 0)}")
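followers = playlist.get('followers', {}).get('total')
# Apply the thousands separator only when a number is present ('N/A' cannot take ':,')
print(f"Followers: {followers:,}" if isinstance(followers, int) else "Followers: N/A")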

# Get all tracks
for track in playlist['tracks']:
    print(f"  - {track.get('name', 'Unknown')} by {(track.get('artists', [{}])[0].get('name', 'Unknown') if track.get('artists') else 'Unknown')}")

🎙️ Podcast Support (NEW!)

Episode Information

# Get episode details
episode = client.get_episode_info(episode_url)

print(f"Episode: {episode.get('name', 'Unknown')}")
print(f"Show: {episode.get('show', {}).get('name', 'Unknown')}")
print(f"Duration: {episode.get('duration_ms', 0) / 1000 / 60:.1f} minutes")
print(f"Release Date: {episode.get('release_date', 'N/A')}")
print(f"Has Video: {'Yes' if episode.get('has_video') else 'No'}")

# Download episode preview (1-2 minute clip)
preview_path = client.download_episode_preview(
    episode_url,
    path="podcast_previews/",
    filename="episode_preview"
)
print(f"Preview downloaded to: {preview_path}")

Show Information

# Get podcast show details
show = client.get_show_info(show_url)

print(f"Show: {show.get('name', 'Unknown')}")
print(f"Publisher: {show.get('publisher', 'Unknown')}")
print(f"Total Episodes: {show.get('total_episodes', 'N/A')}")
print(f"Categories: {', '.join(show.get('categories', []))}")

# Get recent episodes
for episode in show.get('episodes', [])[:5]:
    print(f"  - {episode.get('name', 'Unknown')} ({episode.get('duration_ms', 0) / 1000 / 60:.1f} min)")

CLI Commands for Podcasts

# Get episode info
spotify-scraper episode info https://open.spotify.com/episode/...

# Download episode preview
spotify-scraper episode download https://open.spotify.com/episode/... -o previews/

# Get show info with episodes
spotify-scraper show info https://open.spotify.com/show/...

# List show episodes
spotify-scraper show episodes https://open.spotify.com/show/... -o episodes.json

Note: Full episode downloads require Spotify Premium authentication. SpotifyScraper currently supports preview clips only.

📥 Media Downloads

# Download track preview (30-second MP3)
audio_path = client.download_preview_mp3(
    track_url,
    path="previews/",
    filename="custom_name.mp3"
)

# Download cover art
cover_path = client.download_cover(
    album_url,
    path="covers/",
    size_preference="large",  # small, medium, large
    format="jpeg"  # jpeg or png
)

# Download all playlist covers
from spotify_scraper.utils.common import SpotifyBulkOperations

bulk = SpotifyBulkOperations(client)
covers = bulk.download_playlist_covers(
    playlist_url,
    output_dir="playlist_covers/"
)

🔄 Bulk Operations

from spotify_scraper.utils.common import SpotifyBulkOperations

# Process multiple URLs
urls = [
    "https://open.spotify.com/track/...",
    "https://open.spotify.com/album/...",
    "https://open.spotify.com/artist/..."
]

bulk = SpotifyBulkOperations()
results = bulk.process_urls(urls, operation="all_info")

# Export results
bulk.export_to_json(results, "spotify_data.json")
bulk.export_to_csv(results, "spotify_data.csv")

# Batch download media
downloads = bulk.batch_download(
    urls,
    output_dir="downloads/",
    media_types=["audio", "cover"]
)

📊 Data Analysis

from spotify_scraper.utils.common import SpotifyDataAnalyzer

analyzer = SpotifyDataAnalyzer()

# Analyze playlist
stats = analyzer.analyze_playlist(playlist_data)
print(f"Total duration: {stats['basic_stats']['total_duration_formatted']}")
print(f"Most common artist: {stats['artist_stats']['top_artists'][0]}")
print(f"Average popularity: {stats['basic_stats']['average_popularity']}")

# Compare playlists
comparison = analyzer.compare_playlists(playlist1, playlist2)
print(f"Common tracks: {comparison['track_comparison']['common_tracks']}")
print(f"Similarity: {comparison['track_comparison']['similarity_percentage']:.1f}%")
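
How `similarity_percentage` is computed is not documented here. One common way to compare two track lists is Jaccard similarity over track IDs. The sketch below assumes each playlist is a list of dicts with an `id` key and uses hypothetical `track_ids` and `playlist_similarity` helpers, so its numbers may differ from what `compare_playlists` reports:

```python
from typing import Any, Dict, Iterable, Set


def track_ids(tracks: Iterable[Dict[str, Any]]) -> Set[str]:
    """Collect the IDs of all tracks that have one."""
    return {t["id"] for t in tracks if t.get("id")}


def playlist_similarity(a: Iterable[Dict[str, Any]], b: Iterable[Dict[str, Any]]) -> float:
    """Jaccard similarity of two track lists, expressed as a percentage."""
    ids_a, ids_b = track_ids(a), track_ids(b)
    union = ids_a | ids_b
    if not union:
        return 0.0  # two empty playlists: define similarity as 0
    return 100.0 * len(ids_a & ids_b) / len(union)


# Hypothetical track lists: two shared IDs out of four distinct IDs
first = [{"id": "t1"}, {"id": "t2"}, {"id": "t3"}]
second = [{"id": "t2"}, {"id": "t3"}, {"id": "t4"}]
print(f"Similarity: {playlist_similarity(first, second):.1f}%")  # Similarity: 50.0%
```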

Advanced Configuration

Browser Selection

# Use requests (default, fast)
client = SpotifyClient(browser_type="requests")

# Use Selenium (for JavaScript content)
client = SpotifyClient(browser_type="selenium")

# Auto-detect (falls back to Selenium if needed)
client = SpotifyClient(browser_type="auto")

Authentication

# Using cookies file (exported from browser)
client = SpotifyClient(cookie_file="spotify_cookies.txt")

# Using cookie dictionary
client = SpotifyClient(cookies={"sp_t": "your_token"})

# Using headers
client = SpotifyClient(headers={
    "User-Agent": "Custom User Agent",
    "Accept-Language": "en-US,en;q=0.9"
})

Proxy Support

client = SpotifyClient(proxy={
    "http": "http://proxy.example.com:8080",
    "https": "https://proxy.example.com:8080"
})

Logging

# Set logging level
client = SpotifyClient(log_level="DEBUG")

# Or use standard logging
import logging
logging.basicConfig(level=logging.INFO)

API Reference

SpotifyClient

The main client for interacting with Spotify.

Methods:

  • get_track_info(url) - Get track metadata
  • get_track_lyrics(url) - Get track lyrics (requires auth)
  • get_track_info_with_lyrics(url) - Get track with lyrics
  • get_album_info(url) - Get album metadata
  • get_artist_info(url) - Get artist metadata
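  • get_playlist_info(url) - Get playlist metadata
  • get_episode_info(url) - Get podcast episode metadata
  • get_show_info(url) - Get podcast show metadata
  • download_episode_preview(url, path, filename) - Download episode preview clip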
  • download_preview_mp3(url, path, filename) - Download track preview
  • download_cover(url, path, size_preference, format) - Download cover art
  • close() - Close the client and clean up resources

SpotifyBulkOperations

Utilities for processing multiple URLs.

Methods:

  • process_urls(urls, operation) - Process multiple URLs
  • export_to_json(data, output_file) - Export to JSON
  • export_to_csv(data, output_file) - Export to CSV
  • batch_download(urls, output_dir, media_types) - Batch download media
  • process_url_file(file_path, operation) - Process URLs from file
  • extract_urls_from_text(text) - Extract Spotify URLs from text
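
To illustrate the idea behind `extract_urls_from_text`, here is a rough sketch that pulls Spotify links out of free text with a regular expression. It is not the library's implementation: the `SPOTIFY_URL_RE` pattern and the `find_spotify_urls` helper are assumptions, and the pattern only covers the entity types used in this README.

```python
import re
from typing import List, Tuple

# Assumed pattern: open.spotify.com/<type>/<alphanumeric id>
SPOTIFY_URL_RE = re.compile(
    r"https://open\.spotify\.com/"
    r"(track|album|artist|playlist|episode|show)/"
    r"([A-Za-z0-9]+)"
)


def find_spotify_urls(text: str) -> List[Tuple[str, str]]:
    """Return (entity_type, entity_id) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in SPOTIFY_URL_RE.finditer(text)]


text = (
    "Listen: https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh?si=abc "
    "and https://open.spotify.com/album/0JGOiO34nwfUdDrD612dOp."
)
print(find_spotify_urls(text))
# [('track', '4iV5W9uYEdYUVa79Axb7Rh'), ('album', '0JGOiO34nwfUdDrD612dOp')]
```

Query strings such as `?si=...` and trailing punctuation are left out of the ID because the character class only accepts letters and digits.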

SpotifyDataAnalyzer

Tools for analyzing Spotify data.

Methods:

  • analyze_playlist(playlist_data) - Get playlist statistics
  • compare_playlists(playlist1, playlist2) - Compare two playlists

Examples

Download All Album Tracks

# Get album info
album = client.get_album_info(album_url)

# Download all track previews
for track in album['tracks']:
    track_url = f"https://open.spotify.com/track/{track['id']}"
    client.download_preview_mp3(track_url, path=f"album_{album.get('name', 'Unknown')}/")

Export Artist Discography

artist = client.get_artist_info(artist_url)

# Get all albums
albums_data = []
for album in artist['albums']['items']:
    album_url = f"https://open.spotify.com/album/{album['id']}"
    album_info = client.get_album_info(album_url)  # separate name, so the loop variable is not overwritten
    albums_data.append(album_info)

# Export to JSON
import json
with open(f"{artist.get('name', 'Unknown')}_discography.json", "w") as f:
    json.dump(albums_data, f, indent=2)

Create Playlist Report

from spotify_scraper.utils.common import SpotifyDataFormatter

formatter = SpotifyDataFormatter()

# Get playlist
playlist = client.get_playlist_info(playlist_url)

# Create markdown report
markdown = formatter.format_playlist_markdown(playlist)
with open("playlist_report.md", "w") as f:
    f.write(markdown)

# Create M3U file
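# Items may be plain track dicts (as in the playlist example above) or wrapped under 'track'
tracks = [item.get('track', item) for item in playlist['tracks']]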
formatter.export_to_m3u(tracks, "playlist.m3u")
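
For readers unfamiliar with the format, an extended M3U file is a `#EXTM3U` header followed by pairs of `#EXTINF:<seconds>,<title>` and a media location. The sketch below builds those lines from track dicts. It assumes the `name`, `artists`, `duration_ms`, and `preview_url` keys described earlier, uses a hypothetical `build_m3u_lines` helper, and may not match what `export_to_m3u` actually writes:

```python
from typing import Any, Dict, Iterable, List


def build_m3u_lines(tracks: Iterable[Dict[str, Any]]) -> List[str]:
    """Render tracks as extended M3U lines, skipping tracks without a preview URL."""
    lines = ["#EXTM3U"]
    for track in tracks:
        url = track.get("preview_url")
        if not url:
            continue  # nothing playable to point at
        artists = track.get("artists") or [{}]
        artist = artists[0].get("name", "Unknown")
        title = track.get("name", "Unknown")
        seconds = (track.get("duration_ms") or 0) // 1000
        lines.append(f"#EXTINF:{seconds},{artist} - {title}")
        lines.append(url)
    return lines


# Hypothetical sample data; the second track has no preview and is skipped
sample_tracks = [
    {
        "name": "Example Song",
        "artists": [{"name": "Example Artist"}],
        "duration_ms": 215000,
        "preview_url": "https://example.com/preview.mp3",
    },
    {"name": "No Preview"},
]
print("\n".join(build_m3u_lines(sample_tracks)))
```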

Error Handling

from spotify_scraper.core.exceptions import (
    SpotifyScraperError,
    URLError,
    ExtractionError,
    DownloadError
)

try:
    track = client.get_track_info(url)
except URLError:
    print("Invalid Spotify URL")
except ExtractionError as e:
    print(f"Failed to extract data: {e}")
except SpotifyScraperError as e:
    print(f"General error: {e}")

Command Line Interface

# General syntax
spotify-scraper [COMMAND] [URL] [OPTIONS]

# Commands:
#   track      Get track information
#   album      Get album information
#   artist     Get artist information
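#   playlist   Get playlist information
#   episode    Get podcast episode information or download a preview
#   show       Get podcast show information or list episodes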
#   download   Download media files

# Global options:
#   --output, -o      Output file path
#   --format, -f      Output format (json, csv, txt)
#   --pretty          Pretty print output
#   --log-level       Set logging level
#   --cookies         Cookie file path

# Examples:
spotify-scraper track $URL --pretty
spotify-scraper album $URL -o album.json
spotify-scraper playlist $URL -f csv -o playlist.csv
spotify-scraper download track $URL --with-cover --path downloads/

Environment Variables

Configure SpotifyScraper using environment variables:

export SPOTIFY_SCRAPER_LOG_LEVEL=DEBUG
export SPOTIFY_SCRAPER_BROWSER_TYPE=selenium
export SPOTIFY_SCRAPER_COOKIE_FILE=/path/to/cookies.txt
export SPOTIFY_SCRAPER_PROXY_HTTP=http://proxy:8080
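
How the library reads these variables internally is not shown here. If you prefer to wire them up yourself, the sketch below maps the four variables above onto the `SpotifyClient` keyword arguments used elsewhere in this README (`log_level`, `browser_type`, `cookie_file`, `proxy`). The `client_kwargs_from_env` helper is hypothetical:

```python
import os
from typing import Any, Dict, Mapping

# Variables that map one-to-one onto a keyword argument
_SIMPLE_VARS = {
    "SPOTIFY_SCRAPER_LOG_LEVEL": "log_level",
    "SPOTIFY_SCRAPER_BROWSER_TYPE": "browser_type",
    "SPOTIFY_SCRAPER_COOKIE_FILE": "cookie_file",
}


def client_kwargs_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect keyword arguments from the environment, ignoring unset or empty variables."""
    kwargs: Dict[str, Any] = {}
    for var, arg in _SIMPLE_VARS.items():
        value = env.get(var)
        if value:
            kwargs[arg] = value
    proxy_http = env.get("SPOTIFY_SCRAPER_PROXY_HTTP")
    if proxy_http:
        kwargs["proxy"] = {"http": proxy_http}
    return kwargs


# Demonstration with an explicit mapping instead of the real environment
demo_env = {
    "SPOTIFY_SCRAPER_LOG_LEVEL": "DEBUG",
    "SPOTIFY_SCRAPER_PROXY_HTTP": "http://proxy:8080",
}
print(client_kwargs_from_env(demo_env))
# {'log_level': 'DEBUG', 'proxy': {'http': 'http://proxy:8080'}}
```

The result could then be passed along as `SpotifyClient(**client_kwargs_from_env())`.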

Requirements

  • Python 3.8 or higher
  • Operating System: Windows, macOS, Linux, BSD
  • Dependencies:
    • requests (for basic operations)
    • beautifulsoup4 (for HTML parsing)
    • selenium (optional, for JavaScript content)

Troubleshooting

Common Issues

1. SSL Certificate Errors

client = SpotifyClient(verify_ssl=False)  # Not recommended for production

2. Rate Limiting

import time
for url in urls:
    track = client.get_track_info(url)
    time.sleep(1)  # Add delay between requests
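
A fixed delay helps, but transient failures may still occur. One option is to retry with exponentially growing delays. The sketch below is generic Python and not part of SpotifyScraper: `call_with_backoff` is a hypothetical helper, and it catches every `Exception` only for brevity (in real use, narrow it to the exceptions listed under Error Handling).

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying failed calls with delays of base_delay * 2**attempt."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Demonstration with a stand-in function that fails twice, then succeeds.
# Delays are recorded instead of slept so the example runs instantly.
attempts = {"count": 0}
recorded_delays: List[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = call_with_backoff(flaky, retries=3, base_delay=0.5, sleep=recorded_delays.append)
print(result, recorded_delays)  # ok [0.5, 1.0]
```

With a real client, the call would look like `call_with_backoff(lambda: client.get_track_info(url))`.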

3. Cloudflare Protection

# Use Selenium backend
client = SpotifyClient(browser_type="selenium")

4. Missing Data

# Some fields might be missing, None, or empty
track = client.get_track_info(url)
artists = track.get('artists') or [{}]  # covers a missing key, None, and an empty list
artist_name = artists[0].get('name', 'Unknown')

Contributing

We welcome contributions! Please see our Contributing Guide for details.

# Clone the repository
git clone https://github.com/AliAkhtari78/SpotifyScraper.git

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
black src/ tests/
flake8 src/ tests/
mypy src/

License

SpotifyScraper is released under the MIT License. See LICENSE for details.

Disclaimer

This library is for educational and personal use only. Always respect Spotify's Terms of Service and robots.txt. The authors are not responsible for any misuse of this library.


SpotifyScraper - Extract Spotify data with ease 🎵

Made with ❤️ by Ali Akhtari