# Geometry Validation Tutorial

> A basic introduction to using geometry validation

## Basic Usage
First, load a GeoJSON file that contains invalid geometries.

In [1]:
import geopandas as gpd

gdf = gpd.read_file("../data/broken.geojson")
gdf

Unnamed: 0,id,geometry
0,valid,"POLYGON ((0.00000 0.00000, 1.00000 0.00000, 0...."
1,out_of_crs_bounds,"POLYGON ((200.00000 0.00000, 1.00000 0.00000, ..."
2,misoriented,"POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1...."
3,self_intersecting,"POLYGON ((0.00000 0.00000, 0.00000 2.00000, 1...."
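To make the `misoriented` row concrete, here is a minimal sketch in plain Python that determines a ring's winding direction with the shoelace formula. The unit-square coordinates and the helper name `signed_area` are illustrative assumptions, since the table truncates the real coordinates. Treating counter-clockwise exterior rings as the proper orientation follows the GeoJSON convention (RFC 7946); that geowrangler applies exactly this rule is an assumption.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    total = 0.0
    # Pair each vertex with its successor; the ring is closed (first == last).
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


# Hypothetical unit squares, one per winding direction.
clockwise = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
counter_clockwise = clockwise[::-1]

print(signed_area(clockwise) < 0)          # True: clockwise ring
print(signed_area(counter_clockwise) > 0)  # True: counter-clockwise ring
```

Reversing the vertex order flips the sign of the area without changing the shape, which is why an orientation problem can be fixed automatically.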


We then run `GeometryValidation`. By default, each validator appends a new column reporting the result of its check, applies a fix where one is available, and raises a warning where no fix is available.

In [2]:
from geowrangler.validation import GeometryValidation

validated_gdf = GeometryValidation(gdf).validate_all()
validated_gdf



Unnamed: 0,id,geometry,is_not_null,is_not_self_intersecting,is_oriented_properly,is_within_crs_bounds
0,valid,"POLYGON ((0.00000 0.00000, 1.00000 0.00000, 0....",True,True,True,True
1,out_of_crs_bounds,"POLYGON ((200.00000 0.00000, 0.00000 1.00000, ...",True,True,True,False
2,misoriented,"POLYGON ((0.00000 0.00000, 1.00000 0.00000, 1....",True,True,True,True
3,self_intersecting,"MULTIPOLYGON (((1.00000 1.00000, 0.00000 2.000...",True,True,True,True
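One way to read the `False` in `is_within_crs_bounds` is as a plain bounding-box test. The sketch below assumes the data is in EPSG:4326 (the tutorial does not state the CRS), where longitude is limited to [-180, 180] and latitude to [-90, 90]. The constant `WGS84_BOUNDS`, the helper `within_bounds`, and the ring coordinates are hypothetical; only the x value of 200 mirrors the table.

```python
# Assumed bounds of EPSG:4326: (min_lon, min_lat, max_lon, max_lat).
WGS84_BOUNDS = (-180.0, -90.0, 180.0, 90.0)


def within_bounds(coords, bounds=WGS84_BOUNDS):
    """Return True if every (x, y) vertex lies inside the bounding box."""
    min_x, min_y, max_x, max_y = bounds
    return all(min_x <= x <= max_x and min_y <= y <= max_y for x, y in coords)


# Hypothetical rings; only the x = 200 vertex mirrors the table above.
inside = [(0, 0), (1, 0), (1, 1), (0, 0)]
outside = [(200, 0), (0, 1), (1, 1), (200, 0)]

print(within_bounds(inside))   # True
print(within_bounds(outside))  # False
```

Unlike a winding-order problem, an out-of-range coordinate gives no hint about where the vertex was meant to be, so a check like this can only report the problem.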


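Running the validation again on the validated geometries confirms that the fixes were applied: every check passes except `is_within_crs_bounds` for the out-of-bounds polygon.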

In [3]:
GeometryValidation(validated_gdf[["id", "geometry"]]).validate_all()



Unnamed: 0,id,geometry,is_not_null,is_not_self_intersecting,is_oriented_properly,is_within_crs_bounds
0,valid,"POLYGON ((0.00000 0.00000, 1.00000 0.00000, 0....",True,True,True,True
1,out_of_crs_bounds,"POLYGON ((200.00000 0.00000, 0.00000 1.00000, ...",True,True,True,False
2,misoriented,"POLYGON ((0.00000 0.00000, 1.00000 0.00000, 1....",True,True,True,True
3,self_intersecting,"MULTIPOLYGON (((1.00000 1.00000, 0.00000 2.000...",True,True,True,True


## Passing Validators
You can pass a list of validators to run only a subset of them. By default, the following validators are used:
`["null", "self_intersecting", "orientation", "crs_bounds"]`

In [4]:
from geowrangler.validation import NullValidator, SelfIntersectingValidator

validated_gdf = GeometryValidation(
    gdf, validators=[NullValidator, SelfIntersectingValidator]
).validate_all()
validated_gdf

Unnamed: 0,id,geometry,is_not_null,is_not_self_intersecting
0,valid,"POLYGON ((0.00000 0.00000, 1.00000 0.00000, 0....",True,True
1,out_of_crs_bounds,"POLYGON ((200.00000 0.00000, 1.00000 0.00000, ...",True,True
2,misoriented,"POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1....",True,True
3,self_intersecting,"MULTIPOLYGON (((1.00000 1.00000, 0.00000 0.000...",True,False


You can also run a single validator on its own:

In [5]:
SelfIntersectingValidator().validate(gdf)

Unnamed: 0,id,geometry,is_not_self_intersecting
0,valid,"POLYGON ((0.00000 0.00000, 1.00000 0.00000, 0....",True
1,out_of_crs_bounds,"POLYGON ((200.00000 0.00000, 1.00000 0.00000, ...",True
2,misoriented,"POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1....",True
3,self_intersecting,"MULTIPOLYGON (((1.00000 1.00000, 0.00000 0.000...",False
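For intuition about what a self-intersection is, the following plain-Python sketch tests whether any two non-adjacent edges of a ring properly cross. It is not how `SelfIntersectingValidator` is implemented (that is not shown in this tutorial); the helper names and the bow-tie coordinates are hypothetical, and the test deliberately ignores edges that merely touch or overlap.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    # Each segment's endpoints must lie strictly on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def ring_self_intersects(ring):
    """Check every pair of non-adjacent edges of a closed ring for a proper crossing."""
    edges = list(zip(ring, ring[1:]))
    n = len(edges)
    for i in range(n):
        for j in range(i + 2, n):
            # The first and last edges share the closing vertex, so skip that pair.
            if i == 0 and j == n - 1:
                continue
            if segments_cross(*edges[i], *edges[j]):
                return True
    return False


# Hypothetical rings: a bow-tie whose diagonals cross at (1, 1), and a simple square.
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]

print(ring_self_intersects(bow_tie))  # True
print(ring_self_intersects(square))   # False
```

Splitting a bow-tie at its crossing point yields two separate triangles, which is consistent with the fixed geometry in the tables being a `MULTIPOLYGON`.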


## Building your own validator
Let's build a validator that checks whether a geometry is a point and, if it is, fixes it by applying a buffer of 1.

In [6]:
from shapely.geometry.point import Point

from geowrangler.validation import BaseValidator


class PointValidator(BaseValidator):
    validator_column_name = "is_not_point"

    def check(self, geometry):
        return geometry.geom_type != "Point"

    def fix(self, geometry):
        return geometry.buffer(1)


gdf = gpd.GeoDataFrame(geometry=[Point(0, 0)])
PointValidator().validate(gdf)

Unnamed: 0,geometry,is_not_point
0,"POLYGON ((1.00000 0.00000, 0.99518 -0.09802, 0...",False
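The fixed geometry is a `POLYGON` because buffering a point yields a polygon that approximates a disc. The sketch below reproduces that approximation in plain Python, assuming 16 segments per quarter circle (shapely's documented default) and a clockwise vertex order starting at angle 0. The helper `point_buffer` is hypothetical and only meant to explain the first coordinates in the output.

```python
import math


def point_buffer(x, y, radius, quad_segs=16):
    """Approximate a disc around (x, y) with 4 * quad_segs vertices, going clockwise."""
    n = 4 * quad_segs
    ring = [
        (x + radius * math.cos(-2 * math.pi * k / n),
         y + radius * math.sin(-2 * math.pi * k / n))
        for k in range(n)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


ring = point_buffer(0, 0, 1)
print(len(ring) - 1)                        # 64 distinct vertices
print(tuple(round(c, 5) for c in ring[1]))  # (0.99518, -0.09802)
```

The second vertex matches the `0.99518 -0.09802` pair shown in the validated output.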


In some cases no fix is available. We can then define a validator that has no fix and instead warns the user.

In [8]:
from shapely.geometry.point import Point

from geowrangler.validation import BaseValidator


class PointWarningValidator(BaseValidator):
    validator_column_name = "is_not_point"
    fix_available = False
    warning_message = "Found geometries that are points"

    def check(self, geometry):
        return geometry.geom_type != "Point"


gdf = gpd.GeoDataFrame(geometry=[Point(0, 0)])
PointWarningValidator().validate(gdf)



Unnamed: 0,geometry,is_not_point
0,POINT (0.00000 0.00000),False
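The check, fix, and warn flow described in this tutorial can be sketched without any geospatial dependency. `MiniValidator` and `NoNegatives` below are hypothetical stand-ins, not geowrangler's `BaseValidator`; in particular, emitting the message through Python's `warnings` module is an assumption. The sketch also shows how such a warning could be captured programmatically.

```python
import warnings


class MiniValidator:
    """Hypothetical stand-in for the check/fix/warn flow; not geowrangler's code."""

    fix_available = True
    warning_message = "Geometry errors found"

    def check(self, value):
        raise NotImplementedError

    def fix(self, value):
        return value

    def validate(self, values):
        flags = [self.check(v) for v in values]
        if not all(flags):
            if self.fix_available:
                # Replace only the failing values with their fixed versions.
                values = [v if ok else self.fix(v) for v, ok in zip(values, flags)]
            else:
                # No fix exists, so the values are returned unchanged with a warning.
                warnings.warn(self.warning_message)
        return values, flags


class NoNegatives(MiniValidator):
    fix_available = False
    warning_message = "Found negative values"

    def check(self, value):
        return value >= 0


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    values, flags = NoNegatives().validate([1, -2])

print(flags)                   # [True, False]
print(str(caught[0].message))  # Found negative values
```

As in the `PointWarningValidator` output, the failing value is left untouched and only flagged.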
