Skip to content

zduclos/artvee-scraper

Repository files navigation

artvee-scraper

artvee-scraper is an easy to use command line utility for fetching public domain artwork from https://www.artvee.com.

Installation

Using PyPI

$ python -m pip install artvee-scraper

Python 3.8+ is officially supported.

Synopsis

artvee-scraper <command> [optional arguments] [positional arguments]

Examples

View help

$ artvee-scraper -h
usage: artvee-scraper [-h] {log-json,file-json,file-multi} ...

Scrape artwork from https://www.artvee.com

positional arguments:
  {log-json,file-json,file-multi}
    log-json            Artwork is output to the log as a JSON object
    file-json           Artwork is represented as a JSON object and written to a file
    file-multi          Artwork image and metadata are written as separate files

optional arguments:
  -h, --help            show this help message and exit

View help for the file-json command

$ artvee-scraper file-json -h
usage: artvee-scraper file-json [-h] [-t [1-16]] [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                    [-c {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}]
                    [--log-dir LOG_DIR] [--log-max-size [1-10240]] [--log-max-backups [0-100]]
                    [--space-level [2-6]] [--sort-keys] [--overwrite-existing]
                    dir_path

positional arguments:
  dir_path              JSON file output directory

optional arguments:
  -h, --help            show this help message and exit
  -t [1-16], --worker-threads [1-16]
                        Number of worker threads (1-16)
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the application log level
  -c {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}, --category {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}
                        Category of artwork to scrape
  --space-level [2-6]   Enable pretty-printing; number of spaces to indent (2-6)
  --sort-keys           Sort JSON keys in alphabetical order
  --overwrite-existing  Overwrite existing files

optional log file arguments:
  --log-dir LOG_DIR     Log file output directory
  --log-max-size [1-10240]
                        Maximum log file size in MB (1-10,240)
  --log-max-backups [0-100]
                        Maximum number of log files to keep (0-100)

Download artwork from artvee.com and save each as individal files (JSON format) in the directory ~/artvee/downloads

$ artvee-scraper file-json ~/artvee/downloads

Available Commands

log-json

Download artwork and output each to the log as a JSON objects. Note: This command is intended for development test usage; typically it is not desirable to dump the data to the log.

$ artvee-scraper log-json [optional arguments]
Optional arguments

-h | --help (boolean)

Display help message.

-t | --worker-threads (integer)

The number of worker threads used for processing. Range of values is [1-16]. The default value is 3.

-l | --log-level (string)

Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.

-c | --category (string)

Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be repeatedly used to specify multiple categories (-c animals, -c drawings). The default value is ALL categories.

Optional log file arguments

--log-dir (string)

Path to existing directory used to store artvee_scraper.log log files. Disabled by default.

--log-max-size (integer)

Maximum size in MB the log file should reach before triggering a rollover. Only applies if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024MB (1GB).

--log-max-backups (integer)

Maximum number of log file archives to keep. Only applies if --log-dir has been specified. The actively written file is artvee_scraper.log. Backup files will have an incrementing numerical suffix; artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollovers will be disabled. Range of values is [0-100]. The default value is 10.

Optional writer arguments

--space-level (integer)

Pretty print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.

--sort-keys (boolean)

Sort JSON keys in alphabetical order. Disabled by default.

--include-image (boolean)

Image will be included in output. Excessive output warning! Disabled by default.

Basic Example
$ artvee-scraper log-json
Output:
...
2038-01-19 03:14:07.988 DEBUG [ThreadPoolExecutor-0_0] scraper._image_link_from(120) | Retrieving image download link from URL https://artvee.com/dl/study-for-old-canal-red-green/
2038-01-19 03:14:07.989 DEBUG [ThreadPoolExecutor-0_0] connectionpool._new_conn(1001) | Starting new HTTPS connection (1): artvee.com:443
2038-01-19 03:14:07.999 INFO [ThreadPoolExecutor-0_0] log_writer.write(44) | {"url": "https://artvee.com/dl/study-for-old-canal-red-green/", "title": "Study for Old Canal (Red & Green)", "category": "Abstract", "artist": "Oscar Bluemner", "date": "1916", "origin": "American, 1867-1938"}
...
Advanced Example
$ artvee-scraper log-json --worker-threads 2 --log-level DEBUG --category abstract --log-dir /var/log/artvee --log-max-size 2048 --log-max-backups 10 --space-level 2 --sort-keys --include-image
Output:
$ cat /var/log/artvee/artvee_scraper.log
...
2038-01-19 03:14:07.988 DEBUG [ThreadPoolExecutor-0_0] scraper._image_link_from(120) | Retrieving image download link from URL https://artvee.com/dl/study-for-old-canal-red-green/
2038-01-19 03:14:07.989 DEBUG [ThreadPoolExecutor-0_0] connectionpool._new_conn(1001) | Starting new HTTPS connection (1): artvee.com:443
2038-01-19 03:14:07.999 INFO [ThreadPoolExecutor-0_0] log_writer.write(44) | {
  "artist": "Oscar Bluemner",
  "category": "Abstract",
  "date": "1916",
  "image": "/9j/4AAQSkZJRgABA ... o4xSSSVkumh//9k="
  "origin": "American, 1867-1938",
  "title": "Study for Old Canal (Red & Green)",
  "url": "https://artvee.com/dl/study-for-old-canal-red-green/"
}
...

file-json

Download artwork and write each to the filesystem. Each artwork is stored as a JSON object.

$ artvee-scraper file-json [optional arguments] <dir_path>
Positional arguments

dir_path (string) Position 0.

Path to existing directory used to store output files.

Optional arguments

-h | --help (boolean)

Display help message.

-t | --worker-threads (integer)

The number of worker threads used for processing. Range of values is [1-16]. The default value is 3.

-l | --log-level (string)

Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.

-c | --category (string)

Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be repeatedly used to specify multiple categories (-c animals, -c drawings). The default value is ALL categories.

Optional log file arguments

--log-dir (string)

Path to existing directory used to store artvee_scraper.log log files. Disabled by default.

--log-max-size (integer)

Maximum size in MB the log file should reach before triggering a rollover. Only enabled if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024MB (1GB).

--log-max-backups (integer)

Maximum number of log file archives to keep. Only enabled if --log-dir has been specified. The actively written file is artvee_scraper.log. Backup files will have an incrementing numerical suffix; artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollovers will be disabled. Range of values is [0-100]. The default value is 10.

Optional writer arguments

--space-level (integer)

Pretty print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.

--sort-keys (boolean)

Sort JSON keys in alphabetical order. Disabled by default.

--overwrite-existing (boolean)

Allow existing duplicate files to be overwritten. Disabled by default.

Basic Example
$ artvee-scraper file-json ~/artvee/downloads
Output:
$ cat ~/artvee/downloads/peter-nicolai-arbo-the-valkyrie.json
{"url": "https://artvee.com/dl/the-valkyrie-2/", "title": "The Valkyrie", "category": "Mythology", "artist": "Peter Nicolai Arbo", "date": "1869", "origin": "Norwegian, 1831–1892", "image": "/9j/4AAQSkZJRgABA ... o4xSSSVkumh//9k="}
Advanced Example
$ artvee-scraper file-json --worker-threads 1 --log-level INFO --category mythology --log-dir /var/log/artvee --log-max-size 512 --log-max-backups 10 --space-level 4 --sort-keys --overwrite-existing ~/artvee/downloads
Output:
$ cat ~/artvee/downloads/peter-nicolai-arbo-the-valkyrie.json
{
    "artist": "Peter Nicolai Arbo",
    "category": "Mythology",
    "date": "1869",
    "image": "/9j/4AAQSkZJRgABA ... o4xSSSVkumh//9k="
    "origin": "Norwegian, 1831–1892",
    "title": "The Valkyrie",
    "url": "https://artvee.com/dl/the-valkyrie-2/"
}

file-multi

Download artwork and write each to the filesystem. Each artwork is stored as two files: metadata (JSON) & image (JPG).

$ artvee-scraper file-multi [optional arguments] <metadata_dir_path> <image_dir_path>
Positional arguments

metadata_dir_path (string) Position 0.

Path to existing directory used to store output metadata files.

image_dir_path (string) Position 1.

Path to existing directory used to store output image files.

Optional arguments

-h | --help (boolean)

Display help message.

-t | --worker-threads (integer)

The number of worker threads used for processing. Range of values is [1-16]. The default value is 3.

-l | --log-level (string)

Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.

-c | --category (string)

Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be repeatedly used to specify multiple categories (-c animals -c drawings). The default value is ALL categories.

Optional log file arguments

--log-dir (string)

Path to existing directory used to store artvee_scraper.log log files. Disabled by default.

--log-max-size (integer)

Maximum size in MB the log file should reach before triggering a rollover. Only enabled if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024MB (1GB).

--log-max-backups (integer)

Maximum number of log file archives to keep. Only enabled if --log-dir has been specified. The actively written file is artvee_scraper.log. Backup files will have an incrementing numerical suffix; artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollovers will be disabled. Range of values is [0-100]. The default value is 10.

Optional writer arguments

--space-level (integer)

Pretty print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.

--sort-keys (boolean)

Sort JSON keys in alphabetical order. Disabled by default.

--overwrite-existing (boolean)

Allow existing duplicate files to be overwritten. Disabled by default.

Basic Example
$ artvee-scraper file-multi ~/artvee/downloads/metadata ~/artvee/downloads/images
Output:
$ cat ~/artvee/downloads/metadata/peter-nicolai-arbo-the-valkyrie.json
{"url": "https://artvee.com/dl/the-valkyrie-2/", "title": "The Valkyrie", "category": "Mythology", "artist": "Peter Nicolai Arbo", "date": "1869", "origin": "Norwegian, 1831–1892"}

$ cat ~/artvee/downloads/images/peter-nicolai-arbo-the-valkyrie.jpg
<FF><D8><FF><E0>^@^PJFIF^@^A^A^A^A,^A,^@^@<FF><E1><D5>$Exif^@^@II*^@^
...
^<X-nA2�_vއ�%6�gS`QErVOOqk�;R,u{w9~onDb���sE�WQ㿟xyr�
Advanced Example
$ artvee-scraper file-multi --worker-threads 1 --log-level INFO --category mythology --log-dir /var/log/artvee --log-max-size 512 --log-max-backups 10 --space-level 2 --sort-keys --overwrite-existing ~/artvee/downloads/metadata ~/artvee/downloads/images
Output:
$ cat ~/artvee/downloads/metadata/peter-nicolai-arbo-the-valkyrie.json
{
  "artist": "Peter Nicolai Arbo",
  "category": "Mythology",
  "date": "1869",
  "origin": "Norwegian, 1831–1892",
  "title": "The Valkyrie",
  "url": "https://artvee.com/dl/the-valkyrie-2/"
}
$ cat ~/artvee/downloads/images/peter-nicolai-arbo-the-valkyrie.jpg
<FF><D8><FF><E0>^@^PJFIF^@^A^A^A^A,^A,^@^@<FF><E1><D5>$Exif^@^@II*^@^
...
^<X-nA2�_vއ�%6�gS`QErVOOqk�;R,u{w9~onDb���sE�WQ㿟xyr�