# Illustrated London News Downloader

This notebook downloads zip files containing JPEG 2000 images from the Internet Archive. It was originally designed for the Illustrated London News collection from 1842 to 1849, but it works with any Internet Archive item that contains zip files.

## What This Code Does

1. It connects to the Internet Archive.
2. It lists all the zip files in the item you specify.
3. It skips any zip file that already exists in your download folder, so you can rerun it after an interruption.
4. It downloads the remaining zip files to a location you choose on your computer.
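
The file-selection step can be sketched without any network access. The helper name `select_zip_names` and the sample file names below are hypothetical and not part of the notebook. The sketch assumes file metadata shaped like the entries of `item.files` in the `internetarchive` library, that is, dicts with a `name` key. Unlike the notebook's own check, this version also matches upper-case `.ZIP` extensions.

```python
def select_zip_names(files):
    """Return the names of all entries whose name ends in .zip (case-insensitive)."""
    return [f["name"] for f in files if f["name"].lower().endswith(".zip")]


# Hypothetical metadata, shaped like the dicts in item.files.
sample_files = [
    {"name": "volume_01_jp2.zip"},
    {"name": "volume_01.pdf"},
    {"name": "volume_02_jp2.ZIP"},
]
zip_names = select_zip_names(sample_files)
print(zip_names)
```

With a real item, the same helper could be called as `select_zip_names(get_item(item_id).files)`.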

## Requirements

To use this code, you'll need:

1. Python installed on your computer
2. The `internetarchive` library

You can install the required library by uncommenting and running the cell below:

In [None]:
#pip install internetarchive

## Importing Required Libraries

We start by importing the necessary libraries:

In [None]:
import concurrent.futures
from pathlib import Path

from internetarchive import get_item

## Defining Helper Functions

Now we define two functions that will help us download the files:

In [None]:
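def download_file(item_id, file_name, download_path):
    """Download one file from an Internet Archive item into download_path."""
    print(f"Downloading {file_name}...")
    # no_directory=True saves directly into download_path. By default the library writes
    # to download_path/<item_id>/, which the existence check in download_zips would miss.
    get_item(item_id).download(files=file_name, destdir=str(download_path), no_directory=True)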

def download_zips(item_id, download_path):
    # Create a Path object for the download directory
    download_path = Path(download_path)
    download_path.mkdir(parents=True, exist_ok=True)  # Ensure the directory exists

    # Fetch the item from Internet Archive
    item = get_item(item_id)

    # Prepare to download files using threading
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = []
        for file in item.files:
            if file['name'].endswith('.zip'):
                file_path = download_path / file['name']
                if not file_path.exists():  # Check if the file already exists
                    # Schedule the download
                    futures.append(executor.submit(download_file, item_id, file['name'], download_path))
                else:
                    print(f"{file['name']} already exists.")

        # Wait for all downloads to complete
        concurrent.futures.wait(futures)

### Explanation of the Functions

1. `download_file(item_id, file_name, download_path)`:
   - This function downloads a single file from the Internet Archive.
   - It prints the name of the file being downloaded and uses the `get_item()` function to fetch and download the file.

2. `download_zips(item_id, download_path)`:
   - This is the main function that manages the download process.
   - It creates a folder to store the downloads if it doesn't exist.
   - It fetches the item's metadata from the Internet Archive, including its list of files.
   - It then looks at each file in the item:
     - If it's a zip file and hasn't been downloaded yet, it schedules a download.
     - If the file already exists, it skips it and prints a message.
   - It uses a thread pool to download several files at once, which can shorten the total download time.
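
One detail of the threaded approach is worth illustrating: `concurrent.futures.wait()` waits for the futures to finish but does not re-raise exceptions from worker threads. An exception stays stored in its future until `.result()` is called. The sketch below shows one possible way to collect failures. `run_all` and `fake_download` are hypothetical names introduced only for this illustration, and `fake_download` stands in for `download_file` so that the example runs offline.

```python
import concurrent.futures


def run_all(tasks, worker, max_workers=4):
    """Run worker(task) for every task in a thread pool.

    Returns (succeeded, failed), where failed maps each task to its exception.
    """
    succeeded, failed = [], {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_task = {executor.submit(worker, task): task for task in tasks}
        for future in concurrent.futures.as_completed(future_to_task):
            task = future_to_task[future]
            try:
                future.result()  # re-raises any exception from the worker thread
                succeeded.append(task)
            except Exception as exc:
                failed[task] = exc
    return succeeded, failed


def fake_download(name):
    """Stand-in for download_file; fails for one name to show error reporting."""
    if name == "broken.zip":
        raise OSError("simulated network error")


succeeded, failed = run_all(["a.zip", "broken.zip", "b.zip"], fake_download)
print(sorted(succeeded), failed)
```

The names left in `failed` could then be retried or reported, rather than passing unnoticed.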

### Resuming Interrupted Downloads

An important feature of this code is its ability to resume interrupted downloads. Here's how it works:

1. Before downloading each file, the code checks if the file already exists in the download directory.
2. If the file exists, it skips that file and moves on to the next one.
3. If the file doesn't exist, it proceeds with the download.

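This means that if the download process is interrupted (for example, by a dropped connection or by stopping the notebook), you can simply run the code again, and it will:
- Skip every file that is already present in the download directory
- Download the files that are still missing

This saves time and bandwidth by avoiding repeat downloads. Note that the check tests only whether a file with that name exists, not whether it is complete. If a download was cut off partway through a file, a partial file may remain on disk and would be skipped on the next run, so delete any such file before rerunning.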
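
Because the existence check cannot tell a finished file from a truncated one, a stricter test is to compare the local file size with the size reported in the item's metadata. The sketch below is an illustration only: `is_complete` is a hypothetical helper, and it assumes that each entry in `item.files` carries a `size` field holding the byte count as a string, which may not be present for every file.

```python
import tempfile
from pathlib import Path


def is_complete(file_path, expected_size):
    """Return True if file_path exists and its size equals expected_size bytes."""
    file_path = Path(file_path)
    return file_path.is_file() and file_path.stat().st_size == int(expected_size)


# Demonstration with a temporary file standing in for a downloaded zip.
with tempfile.TemporaryDirectory() as tmp:
    partial = Path(tmp) / "volume_01_jp2.zip"
    partial.write_bytes(b"x" * 10)           # only 10 bytes arrived
    partial_ok = is_complete(partial, "25")  # metadata reports 25 bytes
    full_ok = is_complete(partial, "10")     # metadata reports 10 bytes
    missing_ok = is_complete(Path(tmp) / "missing.zip", "10")

print(partial_ok, full_ok, missing_ok)
```

In `download_zips`, a check like this could replace `file_path.exists()`, so that a truncated file is downloaded again instead of being skipped.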

## Downloading the Files

Now we can use these functions to download the zip files. Set `item_id` and `download_path` to suit your needs:

In [None]:
# Specify the item ID from the Internet Archive
item_id = 'illustrated-london-news-1842-1849'  # You can change this to any other item identifier

# Specify the download path
download_path = './downloads'  # You can change this to any folder on your computer

In [None]:
# Call the function to download zip files
download_zips(item_id, download_path)

## Note

- This script downloads every zip file in the specified Internet Archive item. Make sure you have enough free disk space and a stable internet connection before running it.
- You can change `item_id` to download from a different item. The identifier is the last part of the item's URL on archive.org (`https://archive.org/details/<identifier>`).
- You can change `download_path` to save the files to a different location on your computer.
- If you stop the download and restart it later, files already present in the download folder are skipped and only the remaining files are downloaded.
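
The disk-space advice can be checked before starting. The following is a rough sketch, not part of the original notebook: `total_size_bytes` and `has_enough_space` are hypothetical helpers, the sample entries are made up, and the code assumes each file entry has a `size` field in bytes (entries without one are counted as zero).

```python
import shutil


def total_size_bytes(files, suffix=".zip"):
    """Sum the reported sizes of all entries whose name ends with suffix."""
    return sum(int(f.get("size", 0)) for f in files if f["name"].endswith(suffix))


def has_enough_space(required_bytes, path="."):
    """Return True if the filesystem holding path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


# Hypothetical metadata, shaped like the dicts in item.files.
sample_files = [
    {"name": "volume_01_jp2.zip", "size": "1500"},
    {"name": "volume_02_jp2.zip", "size": "2500"},
    {"name": "volume_01.pdf", "size": "900"},
]
required = total_size_bytes(sample_files)
print(required, has_enough_space(required))
```

With a real item, `total_size_bytes(get_item(item_id).files)` would give an estimate to compare against the free space at `download_path`.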