
Cloud Optimized Geotiffs

Jeffrey K Gillan edited this page Feb 20, 2024 · 36 revisions



Watch a Recording of the Workshop

Example Image

Modes of Getting Data Through HTTP

Traditional Download

  1. The typical model of getting geospatial data is to download it to your local hard disk and analyze it using local resources. Because geospatial data can often be quite large, you may be filling up your local hard disk unnecessarily. Very often we are concerned with a small area of interest, but need to download a large file (e.g., satellite imagery scene).

Load Full Files Into Memory

  1. Alternatively, many applications (QGIS, ArcGIS, Python) can retrieve data from a URL in cloud storage and bring it into the memory of your application. The data is not 'downloaded' to disk, but temporarily loaded into your application. However, large files can quickly max out your memory.

Stream File Portions

  1. To counteract the limitations of large file downloads and memory usage, we can stream subsets of geospatial data from cloud storage directly into an application for analysis.

TIFF

The TIFF file format (Tagged Image File Format) is an old format: it was first released in the 1980s, and the current TIFF 6.0 specification dates back to 1992. TIFFs are great for storing high-resolution raster images faithfully. TIFF is still used a bit in high-end photography, but where it has really found a second life is in digital cartography. The variation called GeoTIFF has been widely adopted as a way to share satellite images and other satellite data.

The GeoTIFF file format has long been thought of as suitable only for raw data: if you wanted to display it on a map, you’d convert it into tiles, and if you wanted a static image, you’d render it into a PNG or JPEG. Cloud Optimized GeoTIFF, however, makes GeoTIFFs more accessible than they used to be.

Cloud Optimized GeoTIFF (COG)

A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on an HTTP file server, with an internal organization that enables more efficient workflows on the cloud. We can stream portions of a COG from a server into a client application without having to download the full file.

COGs have three major features that are baked into the file:

Tiling

Tiles are small, regular, and independent parts of a larger map. They are typically 256x256 pixels in size. COGs have tiles explicitly specified in their format structure, whereas regular GeoTIFFs may not. Tiling is a way to speed up map display because only the tiles that are visible in the current view need to be loaded.
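To see why tiling limits how much data has to be read, here is a minimal sketch that works out which tiles overlap a pixel window. The helper name `tiles_for_window` and the 256-pixel default are assumptions made for illustration; they are not part of any COG library.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=256):
    """Return (tile_col, tile_row) indices of the tiles overlapping a pixel window."""
    if width <= 0 or height <= 0:
        return []
    # Integer division maps a pixel coordinate to the tile that contains it
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (tile_col, tile_row)
        for tile_row in range(first_row, last_row + 1)
        for tile_col in range(first_col, last_col + 1)
    ]


# A 300 x 300 pixel window whose upper-left corner is at column 200, row 200
needed = tiles_for_window(200, 200, 300, 300)
print(needed)
```

In a 10,000 x 10,000 pixel image split into 256-pixel tiles (40 x 40 = 1,600 tiles), this window touches only 4 of them.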

Overviews

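Overviews are downsampled (lower-resolution) copies of the full image, stored inside the same file. A COG will typically have several overviews, each corresponding to a coarser zoom level, so a client showing a zoomed-out view can read a small overview instead of the full-resolution data.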
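As a rough sketch of how overview sizes relate to the full-resolution image, the snippet below assumes each overview halves the width and height of the previous level and that levels stop once the image fits within a single tile. That is a common convention rather than a requirement of the format, and `overview_sizes` is a hypothetical helper.

```python
import math


def overview_sizes(width, height, tile_size=256):
    """List (width, height) of each overview, halving until one tile covers the image."""
    sizes = []
    w, h = width, height
    while w > tile_size or h > tile_size:
        # Each level has half the pixels along each axis (rounded up)
        w = math.ceil(w / 2)
        h = math.ceil(h / 2)
        sizes.append((w, h))
    return sizes


for level, (w, h) in enumerate(overview_sizes(10000, 8000), start=1):
    print(f"overview {level}: {w} x {h} pixels")
```

Under these assumptions a 10,000 x 8,000 pixel image gets six overview levels, the smallest being 157 x 125 pixels.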


HTTP(s) GET Range Request

The HTTP GET Range Request, also known as Byte Serving, allows a client to request specific chunks of the COG using a combination of the tiles and overviews. If you are zoomed into a specific portion of the COG, then you only request the tiles that are visible in the current view. This is the same technology that enables streaming of other media types like video and audio.
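The sketch below shows what such a request looks like from Python using only the standard library. It builds the request without sending it; the URL, offset, and length are placeholders chosen for illustration, and `range_request` is a hypothetical helper.

```python
import urllib.request


def range_request(url, offset, length):
    """Build (but do not send) a GET request for `length` bytes starting at `offset`."""
    last_byte = offset + length - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{last_byte}"}
    )


# Hypothetical URL and byte positions, for illustration only
req = range_request("https://example.com/imagery/scene.tif", offset=16384, length=65536)
print(req.get_header("Range"))
```

Passing the request to `urllib.request.urlopen` against a server that supports range requests should return status 206 (Partial Content) with only the requested bytes. In practice, libraries such as GDAL work out the byte offsets of the needed tiles from the file's header and issue these requests for you.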

Check out the COG Specification




Example COGs on the internet

There are numerous cloud-based data stores hosting COGs. Take a look through a few of these:




Applications that can use COGs

COGs are GeoTIFFs, so any software application that can read and work with GeoTIFFs will be able to read and work with COGs. This includes QGIS, ArcGIS, and Google Earth Engine.

  • Learn how to Stream COGs into QGIS with this tutorial

  • You can stream COGs into ArcGIS Pro following this tutorial.

  • You can import and export Cloud Optimized GeoTIFFs in Google Earth Engine (GEE).




Creating Your Own COGs

Creating COGs can be accomplished using:

  • The GDAL command line tool. See this tutorial on using GDAL.

  • Cogger, a rapid generator of COGs from GeoTIFFs.

  • In the Python ecosystem, rio-cogeo, a Rasterio plugin to create and validate COGs. It relies on GDAL under the hood (through Rasterio), so GDAL still has to be available on the computer where you are running the Python code.
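As a minimal sketch of the GDAL route, the snippet below assembles a `gdal_translate` call that uses GDAL's COG output driver (available in GDAL 3.1 and later). The file names, compression choice, and block size are assumptions for illustration, and `cog_command` is a hypothetical helper; the conversion only runs if GDAL is installed and the input file exists.

```python
import os
import shutil
import subprocess


def cog_command(src, dst, compress="DEFLATE", blocksize=512):
    """Assemble a gdal_translate command that writes `src` out as a COG."""
    return [
        "gdal_translate",
        "-of", "COG",                     # use the COG output driver
        "-co", f"COMPRESS={compress}",    # compression applied to the tiles
        "-co", f"BLOCKSIZE={blocksize}",  # internal tile size in pixels
        src,
        dst,
    ]


# Placeholder file names, for illustration only
cmd = cog_command("input.tif", "output_cog.tif")
print(" ".join(cmd))

# Only attempt the conversion when GDAL and the input file are actually present
if shutil.which("gdal_translate") and os.path.exists("input.tif"):
    subprocess.run(cmd, check=True)
```

The COG driver builds the internal tiles and overviews as part of the conversion, so no separate overview step is needed in this sketch.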



Stream COGs out of CyVerse Data Store

Please see this tutorial to learn how to use the CyVerse Data Store and stream COGs from it.



Jupyter Notebook Example

To solidify our understanding of COGs, let's do a hands-on example using a Jupyter Notebook hosted in Google Colab. The notebook will show you how to write code to bring COGs into your Python environment and do some simple analysis. Open Colab Notebook



If I have COGs, do I still need a tile server?

The breakthrough of COGs is that they have internal tiling which allows them to be streamed into applications without the need for an additional tile server. For most individuals and small organizations, this should be all you need.

However, there are a few reasons why you might still want to use a tile server with COGs:

  • Performance: Tile servers can cache tiles in memory, which can improve performance by reducing the number of times that tiles need to be read from disk.

  • Scalability: Tile servers can be scaled to handle large numbers of requests.

  • Security: Tile servers can be used to encrypt tiles, which can help to protect them from unauthorized access.

If you are streaming COGs to a large number of clients or if you need to ensure that your tiles are secure, then I recommend using a tile server. However, if you are only streaming COGs to a small number of clients and you do not need to worry about security, then you can use the built-in tiles in COGs.
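To illustrate the caching point in the list above, here is a toy sketch of an in-memory tile cache built with Python's `functools.lru_cache`. The storage function is simulated and every name here is hypothetical; a real tile server would read and render actual tiles.

```python
from functools import lru_cache

backend_reads = 0  # counts how often the (simulated) storage is actually read


def read_tile_from_storage(z, x, y):
    """Stand-in for a slow read from disk or cloud storage."""
    global backend_reads
    backend_reads += 1
    return f"tile-{z}-{x}-{y}".encode()


@lru_cache(maxsize=1024)
def get_tile(z, x, y):
    """Serve a tile, keeping up to 1024 recently used tiles in memory."""
    return read_tile_from_storage(z, x, y)


# Three clients ask for the same tile, then one asks for its neighbour
for _ in range(3):
    get_tile(12, 773, 1642)
get_tile(12, 774, 1642)

print(backend_reads)  # prints 2: repeated requests were served from memory
```

Four requests cause only two storage reads here, which is the effect a caching tile server aims for when many clients ask for the same area.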

How are COGs different from XYZ and WMTS tiles?

Web Map Tile Service (WMTS) and XYZ tiles are primarily designed for efficient map display in web environments. Their main goal is to provide quick and seamless map visualizations over the internet by serving small, pre-defined tiles at multiple zoom levels. These tiles are ideal for web maps where users might pan and zoom around the globe, as the small tiles can be fetched and displayed rapidly.
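To make "pre-defined tiles at multiple zoom levels" concrete, the sketch below computes which XYZ tile contains a given longitude and latitude, using the widely used Web Mercator "slippy map" convention (tile 0, 0 at the top-left, 2^zoom tiles per side). The function name and the example coordinates are assumptions for illustration.

```python
import math


def lonlat_to_xyz(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge (or beyond ~85 degrees latitude) stay in range
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


# Example point (roughly southern Arizona) at three zoom levels
for zoom in (0, 6, 12):
    print(zoom, lonlat_to_xyz(-110.95, 32.23, zoom))
```

Every client asking for the same place at the same zoom level gets the same tile address, which is what makes these tiles easy to pre-render and cache for display.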

However, for analysis purposes – where users might want to compute statistics, apply algorithms, or extract detailed information from imagery or raster data – these tiling methods are not optimal. The reason is that analysis often requires access to raw, high-resolution data rather than the downsampled or potentially lossy representations provided by these tiles.

That's where formats like Cloud Optimized GeoTIFFs (COGs) come into play. COGs are designed to allow for efficient access to high-resolution raster datasets, making them more suited for analytical purposes. With COGs, one can access and process only specific portions of a large raster without downloading the entire file, making it efficient for cloud-based analysis workflows.

In summary, COGs are designed for efficient access to high-resolution raster data, while XYZ and WMTS tiles are designed for efficient map display.





Additional Resources

COGs in Production blog post by Sean Rennie

Blog on COGs

Mapscaping on COGs