# Introduction to Geospatial Data and QGIS

This tutorial gives a quick introduction to geospatial (GIS) datasets and to QGIS, the free software we use to analyze and visualize them.

_We provide two pre-processed GIS datasets on [demographics](data/demographics.gpkg) and [the MBTA network](data/mbta_network.gpkg), which are used in this tutorial. For their documentation, see [``data/README.md``](data/). Also see the specific tutorials on [demographics](demographics_geospatial.ipynb) and [geospatial MBTA network data](mbta_network_geospatial.ipynb)._

## What is GIS?

A geographic information system (GIS) is a system for storing, analyzing, and visualizing geospatial data; the term is also used loosely for the data itself. GIS data represents spatial features, such as points, lines, and polygons, linked to descriptive or numeric attributes.

For example, one dataset may contain all rapid transit stations as points, each with attributes such as the station name, the line it is on, and an accessibility score. Another data layer might contain polygons of US Census block groups, each with its population, transit share, etc. These layers can then be combined or processed to understand the spatial relationships between them.
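To make "combining layers" concrete, here is a minimal sketch in plain Python that assigns each station (a point) to the block group (a polygon) containing it. All names, coordinates, and values are made up for illustration and do not come from our datasets. The ray-casting test below is a simple stand-in for the spatial join that GIS software performs.

```python
# Hypothetical sample data: names, coordinates, and populations are invented.
stations = [
    {"name": "Station A", "line": "Red", "xy": (2.0, 3.0)},
    {"name": "Station B", "line": "Orange", "xy": (7.0, 1.0)},
    {"name": "Station C", "line": "Red", "xy": (12.0, 2.0)},
]

block_groups = [
    {"id": "BG-1", "population": 1200, "ring": [(0, 0), (5, 0), (5, 5), (0, 5)]},
    {"id": "BG-2", "population": 800, "ring": [(5, 0), (10, 0), (10, 5), (5, 5)]},
]


def point_in_polygon(point, ring):
    """Ray casting: a point is inside if a ray cast to its right
    crosses the polygon boundary an odd number of times."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_stations_to_block_groups(stations, block_groups):
    """Attach the containing block group's attributes to each station."""
    joined = []
    for station in stations:
        match = next(
            (bg for bg in block_groups if point_in_polygon(station["xy"], bg["ring"])),
            None,
        )
        joined.append({
            "name": station["name"],
            "block_group": match["id"] if match else None,
            "population": match["population"] if match else None,
        })
    return joined


for row in join_stations_to_block_groups(stations, block_groups):
    print(row)
```

In practice you would let QGIS or a library handle this, since real data also involves coordinate reference systems, polygons with holes, and points that fall exactly on a boundary.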

There are many different GIS data formats, such as Shapefile, GeoJSON and GeoPackage. While external data sources we link to may provide the data in different formats, pre-processed datasets provided by TransitMatters will mostly use GeoPackage.

Analyses of GIS data typically take two forms:

* Using dedicated GIS software, such as [QGIS](https://qgis.org/) (free, open-source) or [ArcGIS](https://www.arcgis.com/index.html) (commercial).
* Processing the data with code, such as in Python or R. The Python package ``geopandas`` reads GIS data into ``GeoDataFrame`` objects, which behave like regular ``pandas.DataFrame`` objects but carry an additional geometry column.
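As a rough illustration of the code-based approach, the sketch below builds a small ``GeoDataFrame`` in memory and filters it like an ordinary ``pandas`` table. The station names and coordinates are hypothetical. It assumes ``geopandas`` and ``shapely`` are installed; loading one of our files would use ``gpd.read_file`` instead, as noted in the comment.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical stations. To load a real file instead, something like
#   stations = gpd.read_file("data/mbta_network.gpkg")
# should work; if the GeoPackage holds several layers, pass layer="...".
stations = gpd.GeoDataFrame(
    {"name": ["Station A", "Station B"], "line": ["Red", "Orange"]},
    geometry=[Point(-71.06, 42.36), Point(-71.10, 42.35)],
    crs="EPSG:4326",  # WGS 84 longitude/latitude
)

# A GeoDataFrame is a pandas DataFrame with a designated geometry column.
print(isinstance(stations, pd.DataFrame))
print(stations.geometry.name)

# Ordinary pandas operations still apply, e.g. filtering by an attribute.
red = stations[stations["line"] == "Red"]
print(red[["name", "line"]])
```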

## QGIS Download and Tutorial

[**QGIS**](https://qgis.org/) is free and open-source software for processing and visualizing GIS data. If you are new to it, download it [here](https://qgis.org/download).

You can refer to the [official QGIS Training Manual](https://docs.qgis.org/3.40/en/docs/training_manual/index.html) or its specific modules. The first module (Creating and Exploring a Basic Map) may be sufficient for a basic level of understanding.

Alternatively, we recommend playing around with the sample project that we've provided and learning through it, as shown below.

## Our QGIS Sample Project

We provide a sample project, [``qgis_sample_project.qgz``](qgis_sample_project.qgz), that shows our pre-processed data layers:

![QGIS Sample Project, with views of population density and transit lines.](images/qgis_sample_project.png)

The "Layers" panel (if it is missing, open it via "View" > "Panels" > "Layers") lists all loaded datasets:

![Layers panel in QGIS](images/qgis_layers.png)

Right-click on a layer and choose **"Open Attribute Table"** to see all data associated with each geometric feature in tabular form. Alternatively, right-click and choose **"Properties"** to adjust visualization settings via the "Symbology" and "Labels" tabs.

Note that the following features are used in this sample project, and may come in handy:

* Symbol appearance can be _categorized_ based on string values (e.g. coloring routes and stations by the rapid transit line), or _graduated_ based on numerical values (e.g. coloring block groups by population density).
* You can add _filters_ to any layer. Here, they're used to hide certain commuter rail stops.
* Labels can combine custom strings and numerical values from the attribute table using expressions: see the "Places" layer.
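For intuition, the three ideas above (filtering, categorized symbols, graduated symbols) can be sketched in plain Python. This is only a conceptual analogy, not how QGIS is implemented. The field names, colors, and density values are hypothetical, and equal-interval classing is just one of several classification modes; the sample project may use a different one.

```python
# Hypothetical features; field names and values are invented for illustration.
stops = [
    {"name": "Stop A", "line": "Red", "hidden": False},
    {"name": "Stop B", "line": "Orange", "hidden": True},
    {"name": "Stop C", "line": "Blue", "hidden": False},
]
densities = [1000, 4000, 9000, 13000]  # e.g. people per square mile

# Filter: keep only the features that satisfy a condition.
visible = [s for s in stops if not s["hidden"]]


def categorize(value, palette, default="gray"):
    """Categorized symbology: each distinct value maps to its own color."""
    return palette.get(value, default)


def equal_interval_breaks(values, n_classes):
    """Split the value range into n_classes bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def graduate(value, breaks):
    """Graduated symbology: class index = number of breaks reached."""
    return sum(value >= b for b in breaks)


palette = {"Red": "red", "Orange": "orange"}
colors = [categorize(s["line"], palette) for s in visible]
breaks = equal_interval_breaks(densities, 3)
classes = [graduate(d, breaks) for d in densities]
print(colors, breaks, classes)
```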