Unsplash Dataset Documentation

The Unsplash Dataset is composed of multiple TSV files:

1 - photos.tsv

The photos.tsv dataset has one row per photo. It contains properties of the photo, the name of the contributor, the image URL, and overall stats.

Field Description
photo_id ID of the Unsplash photo
photo_url Permalink URL to the photo page on unsplash.com
photo_image_url URL of the image file. Note: this is a dynamic URL, so you can apply resizing and customization operations directly on the image
photo_submitted_at Timestamp of when the photo was submitted to Unsplash
photo_featured Whether the photo was promoted to the Editorial feed or not
photo_width Width of the photo in pixels
photo_height Height of the photo in pixels
photo_aspect_ratio Aspect ratio of the photo
photo_description Description of the photo written by the photographer
photographer_username Username of the photographer on Unsplash
photographer_first_name First name of the photographer
photographer_last_name Last name of the photographer
exif_camera_make Camera make (brand) extracted from the EXIF data
exif_camera_model Camera model extracted from the EXIF data
exif_iso ISO setting of the camera, extracted from the EXIF data
exif_aperture_value Aperture setting of the camera, extracted from the EXIF data
exif_focal_length Focal length setting of the camera, extracted from the EXIF data
exif_exposure_time Exposure time setting of the camera, extracted from the EXIF data
photo_location_name Location of the photo
photo_location_latitude Latitude of the photo
photo_location_longitude Longitude of the photo
photo_location_country Country where the photo was made
photo_location_city City where the photo was made
stats_views Total # of times that a photo has been viewed on the Unsplash platform
stats_downloads Total # of times that a photo has been downloaded via the Unsplash platform
ai_description Textual description of the photo, generated by a 3rd party AI
ai_primary_landmark_name Landmark present in the photo, generated by a 3rd party AI
ai_primary_landmark_latitude Latitude of the landmark, generated by a 3rd party AI
ai_primary_landmark_longitude Longitude of the landmark, generated by a 3rd party AI
ai_primary_landmark_confidence Landmark confidence of the 3rd party AI
blur_hash BlurHash hash of the photo
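Because photo_image_url is described as a dynamic URL, resizing can be requested by appending query parameters to it. The sketch below only shows the URL manipulation. The parameter names `w` (width) and `q` (quality) are assumptions based on common image-CDN conventions and are not defined in this document, and the example URL is a placeholder rather than a real photo.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_image_params(photo_image_url: str, **params) -> str:
    """Return the URL with the given query parameters added or overridden."""
    parts = urlsplit(photo_image_url)
    query = dict(parse_qsl(parts.query))  # keep any parameters already present
    query.update({name: str(value) for name, value in params.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL: real values come from the photo_image_url column.
url = "https://images.unsplash.com/photo-0000000000000-example"

# `w` and `q` are assumed parameter names, not taken from this document.
thumb = with_image_params(url, w=400, q=80)
print(thumb)
```

Building the query with `urllib.parse` rather than string concatenation keeps the result valid when the stored URL already carries a query string.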

2 - keywords.tsv

The keywords.tsv dataset has one row per photo-keyword pair. It contains data about how a keyword is connected to a photo and the conversions of the photo in our search engine for a particular keyword.

Field Description
photo_id ID of the Unsplash photo
keyword Keyword or search term
ai_service_1_confidence Confidence for the keyword from a 3rd party AI (0-100)
ai_service_2_confidence Confidence for the keyword from another 3rd party AI (0-100)
suggested_by_user Whether the keyword was added by a user (human)
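As an illustration of how these fields might be used together, the sketch below keeps a photo-keyword pair when a user suggested it or when either AI confidence reaches a threshold. The sample rows are invented, and two encoding details are assumptions: that a missing confidence is an empty field, and that suggested_by_user is stored as `t`/`f`. Check both against the actual file before relying on this.

```python
import csv
import io

# Invented rows shaped like keywords.tsv (not real data).
SAMPLE = (
    "photo_id\tkeyword\tai_service_1_confidence\t"
    "ai_service_2_confidence\tsuggested_by_user\n"
    "abc123\tmountain\t92.5\t\tf\n"
    "abc123\tsnow\t41.0\t55.0\tf\n"
    "abc123\thiking\t\t\tt\n"
)


def to_float(value):
    """Parse a confidence value; an empty field is treated as missing."""
    return float(value) if value else None


def strong_keywords(tsv_file, threshold=80.0, true_value="t"):
    """Keep pairs that are user-suggested or AI-confident (0-100 scale)."""
    kept = []
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        scores = (
            to_float(row["ai_service_1_confidence"]),
            to_float(row["ai_service_2_confidence"]),
        )
        ai_confident = any(s is not None and s >= threshold for s in scores)
        # true_value is an assumed encoding of the boolean column.
        if ai_confident or row["suggested_by_user"] == true_value:
            kept.append((row["photo_id"], row["keyword"]))
    return kept


result = strong_keywords(io.StringIO(SAMPLE))
print(result)
```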

3 - collections.tsv

Note: A collection on Unsplash is a user-created grouping of photos. Collections are similar to boards on Pinterest and can often group photos in complex and creative ways.

The collections.tsv dataset has one row per photo-collection pair. Whenever a photo belongs to a collection created by a user, it will appear as one row. Each row describes when the photo was added to the collection and gives the title of the collection.

Field Description
photo_id ID of the Unsplash photo
collection_id ID of the Unsplash collection containing the photo
collection_title Title of the collection containing the photo
photo_collected_at Timestamp of when the photo was added to the collection

4 - conversions.tsv

Note: a conversion is currently defined as a user selecting an image to download it.

The conversions.tsv dataset has one row per search conversion. The dataset tells you which photo was downloaded for a search, the country of origin, and an anonymous identifier that distinguishes unique users. The data goes back up to one year before the release of each version of the dataset.

Field Description
converted_at Timestamp of the conversion event
conversion_type Type of conversion (download only for now)
keyword Keyword that was searched and led to the conversion
photo_id Photo ID of the photo that converted
anonymous_user_id Anonymous user ID
conversion_country Country code of the device geolocation
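Since each row is one conversion, per-keyword totals fall out of a simple count, and anonymous_user_id lets you separate total downloads from distinct users. The following is a minimal sketch over invented rows. The timestamp format, user IDs, and country codes in the sample are placeholders, not values from the dataset.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented rows shaped like conversions.tsv (not real data).
SAMPLE = (
    "converted_at\tconversion_type\tkeyword\tphoto_id\t"
    "anonymous_user_id\tconversion_country\n"
    "2020-01-01 10:00:00\tdownload\tmountain\tabc123\tu1\tUS\n"
    "2020-01-01 10:05:00\tdownload\tmountain\tabc123\tu1\tUS\n"
    "2020-01-02 09:00:00\tdownload\tmountain\tdef456\tu2\tFR\n"
    "2020-01-03 12:00:00\tdownload\tforest\tdef456\tu2\tFR\n"
)


def summarize_by_keyword(tsv_file):
    """Map each keyword to (conversion count, distinct anonymous users)."""
    conversions = Counter()
    users = defaultdict(set)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        conversions[row["keyword"]] += 1
        users[row["keyword"]].add(row["anonymous_user_id"])
    return {kw: (count, len(users[kw])) for kw, count in conversions.items()}


summary = summarize_by_keyword(io.StringIO(SAMPLE))
print(summary)
```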

5 - colors.tsv

Note: The coverage and score data comes from a 3rd party AI

The colors.tsv dataset has one row per major color present in a photo. The dataset tells you which colors a photo contains, their coverage as a percentage, and a score for how in focus each color is.

Field Description
photo_id ID of the Unsplash photo
hex Hexadecimal representation of the color
red Red component of the color in the RGB system
green Green component of the color in the RGB system
blue Blue component of the color in the RGB system
keyword Name of the closest color as a CSS color keyword
coverage Pixel coverage of the color as a percentage
score Score of the color in the photo (including the notion of focus)
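Because a photo can have several rows here, a common first step is to reduce them to one color per photo. The sketch below picks the color with the highest coverage. The rows are invented, and the hex formatting and the scale of coverage and score are assumptions; the comparison itself does not depend on the scale.

```python
import csv
import io

# Invented rows shaped like colors.tsv (not real data).
SAMPLE = (
    "photo_id\thex\tred\tgreen\tblue\tkeyword\tcoverage\tscore\n"
    "abc123\t2F4F4F\t47\t79\t79\tdarkslategray\t45.0\t20.0\n"
    "abc123\tFFFAFA\t255\t250\t250\tsnow\t30.0\t55.0\n"
    "def456\t228B22\t34\t139\t34\tforestgreen\t60.0\t70.0\n"
)


def dominant_colors(tsv_file):
    """Map each photo_id to the CSS keyword of its highest-coverage color."""
    best = {}  # photo_id -> (keyword, coverage)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        coverage = float(row["coverage"])
        current = best.get(row["photo_id"])
        if current is None or coverage > current[1]:
            best[row["photo_id"]] = (row["keyword"], coverage)
    return {photo_id: keyword for photo_id, (keyword, _) in best.items()}


dominant = dominant_colors(io.StringIO(SAMPLE))
print(dominant)
```

Ranking by score instead of coverage would favor the color that is most in focus rather than the one covering the most pixels.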

Combining datasets

You can merge the different datasets through the primary key ID fields (usually the photo_id field). With this you'll be able to cross-reference properties from the photos dataset with data from the keywords or conversions dataset.
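A merge on photo_id can be sketched with the standard library alone. The example below performs an inner join of keyword rows onto photo rows. Both samples are invented and show only a few of the documented columns.

```python
import csv
import io

# Invented rows with a subset of the photos.tsv and keywords.tsv columns.
PHOTOS = (
    "photo_id\tphotographer_username\tstats_downloads\n"
    "abc123\talice\t1200\n"
    "def456\tbob\t300\n"
)
KEYWORDS = (
    "photo_id\tkeyword\n"
    "abc123\tmountain\n"
    "abc123\tsnow\n"
    "zzz999\torphan\n"
)


def join_on_photo_id(photos_file, keywords_file):
    """Inner-join keyword rows with photo rows on photo_id."""
    # photos.tsv has one row per photo, so photo_id can index a dict.
    photos = {
        row["photo_id"]: row
        for row in csv.DictReader(photos_file, delimiter="\t")
    }
    joined = []
    for kw in csv.DictReader(keywords_file, delimiter="\t"):
        photo = photos.get(kw["photo_id"])
        if photo is None:
            continue  # keyword row without a matching photo is dropped
        joined.append({**photo, "keyword": kw["keyword"]})
    return joined


rows = join_on_photo_id(io.StringIO(PHOTOS), io.StringIO(KEYWORDS))
for row in rows:
    print(row["photo_id"], row["photographer_username"], row["keyword"])
```

Indexing the photos file first works because it has one row per photo; the same pattern applies when joining conversions, collections, or colors onto photos.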


For help loading the dataset, see the how-to docs.