# <span style="color:MediumSeaGreen">INTRODUCTION TO GEOPANDAS</span>

This quick tutorial introduces the key concepts and basic features of GeoPandas to help you get started with your projects.

## <span style="color:MediumSeaGreen">Concepts</span>

GeoPandas, as the name suggests, extends the popular data science library pandas by adding support for geospatial data. If you are not familiar with pandas, we recommend taking a quick look at its Getting started documentation before proceeding.  

The core data structure in GeoPandas is the <span style="color:MediumSeaGreen; font-weight: bold;">geopandas.GeoDataFrame</span>, a subclass of pandas.DataFrame that can store geometry columns and perform spatial operations. The <span style="color:MediumSeaGreen; font-weight: bold;">geopandas.GeoSeries</span>, a subclass of pandas.Series, handles the geometries. A GeoDataFrame is therefore a combination of pandas.Series columns holding traditional data (numerical, boolean, text, etc.) and geopandas.GeoSeries columns holding geometries (points, polygons, etc.). You can have as many geometry columns as you wish; unlike typical desktop GIS software, GeoPandas imposes no limit.

<img src="https://geopandas.org/en/stable/_images/dataframe.svg" alt="DataFrame" width="700">
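As a minimal sketch of this structure, the snippet below builds a small GeoDataFrame by hand. The column names, values, and the variable `gdf_example` are hypothetical and are not part of the dataset used later in this tutorial.

```python
import geopandas
from shapely.geometry import Point

# Hypothetical sample data: three named points (not the tutorial dataset).
gdf_example = geopandas.GeoDataFrame(
    {
        "name": ["a", "b", "c"],
        "value": [1.5, 2.0, 3.25],
    },
    geometry=[Point(0, 0), Point(1, 1), Point(2, 0)],
)

# Ordinary columns come back as pandas.Series,
# while the geometry column comes back as a GeoSeries.
print(type(gdf_example["value"]).__name__)
print(type(gdf_example["geometry"]).__name__)
```

Selecting a non-geometry column gives a plain pandas.Series, so the usual pandas methods keep working on it.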

Each GeoSeries can contain any geometry type (you can even mix them within a single array) and has a GeoSeries.crs attribute, which stores information about the projection (CRS stands for Coordinate Reference System). Therefore, each GeoSeries in a GeoDataFrame can be in a different projection, allowing you to have, for example, multiple versions (different projections) of the same geometry.  
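To illustrate geometry columns in different projections, here is a small sketch that stores the same point twice, once in EPSG:4326 and once reprojected to EPSG:3857. The point, the column name `geometry_mercator`, and the choice of CRSs are assumptions made only for this example.

```python
import geopandas
from shapely.geometry import Point

# Hypothetical point given as longitude/latitude (EPSG:4326).
points_wgs84 = geopandas.GeoSeries([Point(-74.0, 40.7)], crs="EPSG:4326")

gdf_crs = geopandas.GeoDataFrame({"name": ["example"]}, geometry=points_wgs84)

# to_crs returns a new, reprojected GeoSeries; store it as a second column.
gdf_crs["geometry_mercator"] = gdf_crs.geometry.to_crs("EPSG:3857")

# Each geometry column carries its own CRS.
print(gdf_crs["geometry"].crs)
print(gdf_crs["geometry_mercator"].crs)
```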

<span style="color:MediumSeaGreen; font-weight: bold;">Only one GeoSeries in a GeoDataFrame is considered the active geometry, which means that all geometric operations applied to a GeoDataFrame operate on this active column.</span>
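The effect of the active geometry can be sketched with a toy GeoDataFrame that has two geometry columns. The square, the `centroid` column name, and the variable names are hypothetical. `GeoDataFrame.set_geometry()` is used here to switch the active column.

```python
import geopandas
from shapely.geometry import Polygon

# Hypothetical data: one 2 x 2 square, plus a second geometry column
# holding its centroid.
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
gdf_active = geopandas.GeoDataFrame({"name": ["square"]}, geometry=[square])
gdf_active["centroid"] = gdf_active.centroid

# The active column is "geometry", so .area measures the polygon.
print(gdf_active.geometry.name)  # geometry
print(gdf_active.area.iloc[0])   # 4.0

# set_geometry returns a new GeoDataFrame with a different active column;
# .area now measures the points, which have zero area.
gdf_points = gdf_active.set_geometry("centroid")
print(gdf_points.geometry.name)  # centroid
print(gdf_points.area.iloc[0])   # 0.0
```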

## <span style="color:MediumSeaGreen">Reading and writing files</span>

### Reading files

Assuming you have a file containing both data and geometry (e.g. GeoPackage, GeoJSON, Shapefile), you can read it using geopandas.read_file(), which automatically detects the file type and creates a GeoDataFrame. This tutorial uses the "nybb" dataset, a map of New York boroughs, which is available through the geodatasets package. We therefore use geodatasets.get_path() to download the dataset and retrieve the path to the local copy.

In [4]:
import geopandas
from geodatasets import get_path

path_to_data = get_path('nybb')
gdf = geopandas.read_file(path_to_data)

gdf

| | BoroCode | BoroName | Shape_Leng | Shape_Area | geometry |
|---|---|---|---|---|---|
| 0 | 5 | Staten Island | 330470.010332 | 1623820000.0 | MULTIPOLYGON (((970217.022 145643.332, 970227.... |
| 1 | 4 | Queens | 896344.047763 | 3045213000.0 | MULTIPOLYGON (((1029606.077 156073.814, 102957... |
| 2 | 3 | Brooklyn | 741080.523166 | 1937479000.0 | MULTIPOLYGON (((1021176.479 151374.797, 102100... |
| 3 | 1 | Manhattan | 359299.096471 | 636471500.0 | MULTIPOLYGON (((981219.056 188655.316, 980940.... |
| 4 | 2 | Bronx | 464392.991824 | 1186925000.0 | MULTIPOLYGON (((1012821.806 229228.265, 101278... |


### Writing files

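To write a GeoDataFrame back to a file, use GeoDataFrame.to_file(). If you do not pass the driver keyword, GeoPandas attempts to infer the format from the file extension and writes an ESRI Shapefile when the path has no extension. Pass driver to choose the format explicitly.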

In [6]:
gdf.to_file("my_file.geojson", driver="GeoJSON")

## <span style="color:MediumSeaGreen">Simple accessors and methods</span>

Now we have our GeoDataFrame and can start working with its geometry.  

Since there is only one geometry column in the New York Boroughs dataset, this column automatically becomes the active geometry, and spatial methods called on the GeoDataFrame are applied to the "geometry" column.

### <span style="color:MediumSeaGreen">Measuring area</span>

To measure the area of each polygon (or MultiPolygon in this specific case), access the GeoDataFrame.area attribute, which returns a pandas.Series. Note that GeoDataFrame.area is just GeoSeries.area applied to the active geometry column.  

But first, to make the results easier to read, set the names of the boroughs as the index:

In [7]:
gdf = gdf.set_index('BoroName')

In [8]:
gdf['area'] = gdf.area
gdf['area']

BoroName
Staten Island    1.623822e+09
Queens           3.045214e+09
Brooklyn         1.937478e+09
Manhattan        6.364712e+08
Bronx            1.186926e+09
Name: area, dtype: float64
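The values are expressed in the squared units of the coordinates, so they depend on the CRS of the geometry column. As a sanity check, the same pattern can be sketched on hypothetical shapes whose areas are easy to verify by hand. The shapes and the `shapes` variable are assumptions made for this example only.

```python
import geopandas
from shapely.geometry import Polygon

# Hypothetical shapes with easy-to-check areas (arbitrary coordinate units).
shapes = geopandas.GeoDataFrame(
    {"shape": ["unit square", "right triangle"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),  # 1 x 1 square
        Polygon([(0, 0), (4, 0), (0, 3)]),          # legs of length 4 and 3
    ],
).set_index("shape")

# .area returns a pandas.Series aligned with the index.
shapes["area"] = shapes.area
print(shapes["area"])
```

The square has area 1.0 and the triangle has area 6.0 (half of 4 times 3), which matches what `.area` reports.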