# Open Source with Great Expectations

### Introduction

In this lesson, we'll get set up with the Great Expectations codebase.  Great Expectations is a popular tool for writing tests that ensure high-quality data.  It has connectors for a variety of data libraries, including PySpark and SQLAlchemy.

### Installation

* To install, we can follow the steps in the [contributing code readme](https://github.com/great-expectations/great_expectations/blob/develop/CONTRIBUTING_CODE.md)

These are the relevant steps:

1. Fork and clone the Great Expectations repository
2. Create a virtual environment
3. Install the Great Expectations library
    * We can do so with the following:
        * `pip install -c constraints-dev.txt -e ".[test]"`
    * For this example, we will not need additional dependencies.
    
We can see that both steps 2 and 3 here are optional, so we can skip them, and then we're done.
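If you did run the install step, a quick way to confirm it worked is to ask Python which version of the package it can see in the current environment. The sketch below is only a sanity check, not part of the contributing guide: the helper `installed_version` is a hypothetical name, and it assumes the distribution is registered as `great_expectations`.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the editable install registers itself as "great_expectations".
    print(installed_version("great_expectations"))
```

If this prints `None`, the library is not installed in the interpreter you are running, which usually means a different virtual environment is active.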

* Learning a little

Before moving on, scroll down and read the section on unit testing expectations, or just [click here](https://github.com/great-expectations/great_expectations/blob/develop/CONTRIBUTING_CODE.md#unit-testing-expectations).

If you read through the documentation, you'll see that the test files follow our usual pattern of setup followed by the tests themselves:
```json
{
    "expectation_type" : "expect_column_max_to_be_between",
    "datasets" : [{
        "data" : {...},
        "schemas" : {...},
        "tests" : [...]
    }]
}
```

This time the data and schemas are the setup: they create a sample table of data.  Each test then pairs an input with its corresponding expected output.
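To make the setup-then-test shape concrete, here is a minimal sketch that loads a fixture with this structure and pairs each test with the data it runs against. The column values, the schema entry, and the helper `iter_test_cases` are all made up for illustration, since the documentation elides the real contents with `{...}`; this is not how Great Expectations itself walks the files.

```python
import json

# Hypothetical fixture: the data and schema values are invented for this
# example; only the overall key layout comes from the documentation.
FIXTURE_JSON = """
{
    "expectation_type": "expect_column_max_to_be_between",
    "datasets": [{
        "data": {"w": [1, 2, 3, 4, 5]},
        "schemas": {"pandas": {"w": "int"}},
        "tests": [{
            "title": "Basic negative test case",
            "in": {"column": "w", "min_value": null, "max_value": 4},
            "out": {"success": false}
        }]
    }]
}
"""


def iter_test_cases(fixture):
    """Yield (data, test) pairs: the setup paired with each check."""
    for dataset in fixture["datasets"]:
        for test in dataset["tests"]:
            yield dataset["data"], test


fixture = json.loads(FIXTURE_JSON)
cases = list(iter_test_cases(fixture))
for data, test in cases:
    # Show which test runs against which columns of the sample table.
    print(test["title"], "uses columns", sorted(data))
```

Note that JSON `null` and `false` arrive in Python as `None` and `False`, which is why the fixture is written in JSON rather than as a Python literal.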

Let's look at the example of the test provided in the documentation to make sure we understand it.
```json
"tests" : [{
    "title": "Basic negative test case",
    "exact_match_out" : false,
    "in": {
        "column": "w",
        "result_format": "BASIC",
        "min_value": null,
        "max_value": 4
    },
    "out": {
        "success": false,
        "observed_value": 5
    },
    "suppress_test_for": ["sqlite"]
},
...
]
```

The values in `in` are the keyword arguments passed to the `expect_column_max_to_be_between` expectation, so here the max is expected to be between null and 4 (that is, at most 4, with no lower bound).  The expectation should return `{success: false, observed_value: 5}`.  This is the correct result from `expect_column_max_to_be_between`: it has detected that the column's max is 5, which is not between null and 4.
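The logic of this test case can be sketched in plain Python. Everything here is a stand-in: `column_max_between` is a hypothetical function that mimics the expectation rather than calling the library, the values for column `w` are invented (the documentation elides them), and `matches_expected` assumes that `"exact_match_out": false` means only the keys listed in `out` are compared.

```python
def column_max_between(values, min_value=None, max_value=None):
    """Stand-in for the expectation: is max(values) within the given bounds?

    A bound of None is treated as "no bound on that side".
    """
    observed = max(values)
    above_min = min_value is None or observed >= min_value
    below_max = max_value is None or observed <= max_value
    return {"success": above_min and below_max, "observed_value": observed}


def matches_expected(result, expected, exact_match_out):
    """Compare whole dicts if exact_match_out, else only the keys in expected."""
    if exact_match_out:
        return result == expected
    return all(result.get(key) == value for key, value in expected.items())


w = [1, 2, 3, 4, 5]  # hypothetical column values chosen so the max is 5
result = column_max_between(w, min_value=None, max_value=4)
expected = {"success": False, "observed_value": 5}

print(result)
print(matches_expected(result, expected, exact_match_out=False))
```

The test passes when the actual result agrees with `out`, even though `success` itself is `false`: a negative test case checks that the expectation correctly reports a failure.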

### Summary

In this lesson, we saw how to install Great Expectations, our testing library.  We also saw how tests for the expectations themselves are written, and moved through the documentation to understand them.