# Debugging Great Expectations

### Introduction

Now that we are set up with Great Expectations, let's see if we can solve our first bug.

### Our first bug

First, we need to choose an issue. In general, it is easier to fix a bug than to build a feature, as a bug fix is typically less work. We should also look for a bug that seems relatively easy, since the real challenge will be getting familiar with a new codebase.

Let's work on the [upper-case column names issue](https://github.com/great-expectations/great_expectations/issues/8619).

Read through the issue. Notice that the reporter provided a failing test case. Copy that code into a file called `test.py` at the top level of the great_expectations repository.

Then run `test.py` and confirm that there are failing tests.

<img src="./failed.png">

### Get to green then red

Now, the issue states that the problem is that Great Expectations (or the underlying database) treats column names as case sensitive. Let's confirm that the failure really does have to do with the upper casing.

So remove the `.upper()` call in the expectation configuration.

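```python
# `column_name` is defined earlier in test.py (copied from the issue).
expectations = {
    "expect_column_values_to_not_be_null": {"column": column_name},
    "expect_column_values_to_be_null": {"column": column_name},
    "expect_column_max_to_be_between": {
        # Remove `.upper()` here so this reads just `column_name`.
        "column": column_name.upper(),
        "min_value": 1,
        "max_value": 99,
    },
    # ... (rest of the dictionary as in the issue)
}
```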

Ok, now if we run the file again, we see that we no longer have failing tests.

<img src="./none-failing.png">

So this is a good strategy. Before changing any code, we want to make sure that there really is a cause-and-effect relationship between the upper-case column names and the failing tests.

### Finding the Relevant Files

1. Learn Through Exploration

Our first thought is to search around in the codebase. The `getattr` call at the bottom of the test code is a bit of metaprogramming: looking up a method by its string name and calling it is equivalent to calling `validator.expect_column_values_to_not_be_null` directly. So, to find where these methods live, we can cmd + click on an explicit call such as `validator.expect_column_values_to_not_be_null` on line 68, which takes us to the following.

<img src="./validator-validate-exp.png" width="60%">

This method doesn't do much except call `get_expectation_impl`, and cmd + clicking on that shows us that it selects an implementation from a dictionary of expectations. This isn't looking very fruitful, though we may come back to it.
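Both hops we just made are lookups by name: the `getattr` call turns a string into a method on the validator, and the method we landed on turns that same string into an implementation class via a dictionary. Below is a simplified sketch of both ideas, assuming the test code loops over the `expectations` dictionary. `FakeValidator`, `_registry`, `register`, `lookup_expectation`, and `NotNullExpectation` are hypothetical stand-ins, not Great Expectations' actual code:

```python
from typing import Callable, Dict, Type

# Hypothetical registry: expectation name -> implementation class.
_registry: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Class decorator that stores an implementation under `name`."""
    def decorator(cls: Type) -> Type:
        _registry[name] = cls
        return cls
    return decorator


def lookup_expectation(name: str) -> Type:
    """Dictionary lookup, similar in spirit to the one we found."""
    if name not in _registry:
        raise KeyError(f"no expectation registered under {name!r}")
    return _registry[name]


@register("expect_column_values_to_not_be_null")
class NotNullExpectation:
    def run(self, column: str) -> str:
        return f"not-null check on {column}"


class FakeValidator:
    """Stand-in validator whose expectation method defers to the registry."""

    def expect_column_values_to_not_be_null(self, column: str) -> str:
        impl = lookup_expectation("expect_column_values_to_not_be_null")
        return impl().run(column)


validator = FakeValidator()
name, kwargs = "expect_column_values_to_not_be_null", {"column": "id"}

# getattr(validator, name)(**kwargs) makes the same call as the explicit method.
assert getattr(validator, name)(**kwargs) == validator.expect_column_values_to_not_be_null(**kwargs)
print(getattr(validator, name)(**kwargs))  # not-null check on id
```

Editors generally can't follow a name that only exists as a string, which is why cmd + click works on an explicit call but not on the `getattr` line. How Great Expectations actually fills its dictionary may differ; a decorator is just the simplest way to show the pattern.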

So exploring the codebase wasn't so easy as a first step. Let's move on to seeing if the documentation provides any more help.

2. Learn Through the Documentation

The issue says that multiple core expectations fail in this scenario. Maybe we can learn more about how these expectations work, either from a general description of expectations or from documentation specific to these expectations, such as how they are built and which files implement them.

We can see the documentation [here](https://docs.greatexpectations.io/docs/tutorials/quickstart/).

Browsing it, a few sections look relevant:
* The quickstart
* Custom implementation of a validator (maybe this will tell us how validators work under the hood)
* The expectation gallery

The [expectation gallery](https://greatexpectations.io/expectations/) looks like it could help, so let's take a look. We can paste `expect_column_values_to_not_be_null` directly into the search bar, which takes us to this [link](https://greatexpectations.io/expectations/expect_column_values_to_not_be_null?filterType=Backend%20support&gotoPage=1&showFilters=true&viewType=Summary).

Ok, this looks interesting. Perhaps the most interesting detail is that it lists the type of this expectation as `Core ColumnMapExpectation`.

<img src="./core-col-map-expect.png">

### Back to the codebase

So now we may have a clue about where one of our failing expectations is implemented. Let's look for it in the codebase. We can press `cmd + p` in VS Code and look for a matching file (if that doesn't work, we can use the editor's search to look through the whole codebase for `ColumnMapExpectation`).

<img src="./col-map-exp.png">

> We do this by pressing `cmd + p`.

Then, looking through the list of files, the ones in the `great_expectations/expectations` folder look most relevant.

> The others are examples or tests.

We could open those files and go through them one by one, but none of them looks exactly right, so let's take another look at the documentation.

<img src="./core-col-map-expect.png">

It says this is a `Core ColumnMapExpectation`. That sounds like a well-defined class, and we haven't found it yet.

<img src="./core-expect.png">

But wait, looking at the codebase, there's a folder called `core` staring right at us.  Let's look inside.

<img src="./core-tests.png" width="60%">

Oh, this looks interesting. Let's see if there is a file named after the expectation we are looking for: `expect_column_values_to_not_be_null`.

There is.
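When quick-open doesn't turn up a class (as happened with `ColumnMapExpectation` earlier in this section), the search can also be scripted. Here is a minimal sketch using only the standard library, assuming the repository is checked out locally; `find_class_definitions` is a hypothetical helper, not part of any library:

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_class_definitions(root: str, class_name: str) -> List[Tuple[str, int]]:
    """Return (path, line_number) pairs where `class <class_name>` is defined under root."""
    # \b stops a search for ColumnMapExpectation from matching ColumnMapExpectationTemplate.
    pattern = re.compile(rf"^\s*class\s+{re.escape(class_name)}\b")
    hits: List[Tuple[str, int]] = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):
                hits.append((str(path), lineno))
    return hits
```

For example, calling `find_class_definitions(".", "ColumnMapExpectation")` from the repository root would list each file and line where a class with exactly that name is defined.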

### Are we done?

Now that we have found the correct file, one wonders if ChatGPT can take it from here.

I asked ChatGPT for help by copying and pasting the file into GPT-4:

> <img src="./ask-chatgpt.png">

It recommended code changes, but they only broke the code further. This was apparent because, before the changes, running `test.py` got as far as our failing tests; with ChatGPT's suggestions applied, it broke before even reaching them.

Looks like we need to move through this alone.  Score one for the humans.

The next step is to look through this file for the relevant code. What counts as relevant? I would start with anything that actually queries the database. Take a look: how would you handle it from here?

### Summary

### Diving In

We can now scan through the `expect_column_values_to_not_be_null.py` file, looking for relevant code. A good approach is to look at the function names and set breakpoints in the ones that look promising.

* It looks like the code to change could be in the validate method. To see whether it's relevant, place a breakpoint in it.

> If we don't hit the breakpoint, we know the code is irrelevant, and we need to look elsewhere.

<img src="./hit-breakpoint.png" width="70%">

* Place a breakpoint on the first line of the function body, then run `test.py` again under the debugger to see if we hit it.

<img src="./breakpoint-hit.png" width="60%">
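A breakpoint answers one question: does this code run for our test? The same question can be answered without the debugger by temporarily wrapping a method so that it records each call. Here is a minimal sketch; `record_calls` and the `Demo` class are hypothetical, and it assumes plain instance methods. (Python's built-in `breakpoint()` is the code-level way to pause, dropping into the configured debugger, pdb by default.)

```python
import functools
from typing import Callable, List


def record_calls(cls: type, method_name: str, log: List[str]) -> Callable:
    """Wrap cls.method_name so every call appends its name to `log`.

    Returns the original function so the patch can be undone.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        log.append(method_name)
        return original(*args, **kwargs)

    setattr(cls, method_name, wrapper)
    return original


class Demo:
    def used(self) -> int:
        return 1

    def unused(self) -> int:
        return 2


calls: List[str] = []
originals = {name: record_calls(Demo, name, calls) for name in ("used", "unused")}
Demo().used()
print(calls)  # ['used']: only the method on the executed path was hit
for name, fn in originals.items():
    setattr(Demo, name, fn)  # undo the patches
```

In our case, the class would be `ExpectColumnValuesToNotBeNull` and the method its validate method; if the log stays empty after running the test code, that method is not on the path.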

### Look Around

Now, we seem to be in the right ballpark, but the next step **is not** to start fixing things.  Instead, it's to look around. 

We can start with the function's parameters: `configuration`, `metrics`, `runtime_configuration`, and `execution_engine`.

Which of these looks most relevant? Remember, we are probably looking for the column name.

And if you look further down in the function, you can see the following:

```python
# Excerpt from the validate method of ExpectColumnValuesToNotBeNull
result_format = self.get_result_format(
    configuration=configuration, runtime_configuration=runtime_configuration
)
mostly = self.get_success_kwargs().get(
    "mostly", self.default_kwarg_values.get("mostly")
)
total_count = metrics.get("table.row_count")
```

So in the last line, we see that `metrics` is a dictionary with a `"table.row_count"` key. Because a table is closely related to a column, our guess is that we may find some column information in `metrics`.

Nope. Here is what `metrics` contains at the breakpoint:

```python
# Evaluated in the debug console
metrics
{'column_values.nonnull.unexpected_count': None, 'table.row_count': 0, 'column_values.nonnull.unexpected_values': []}
```

One thing we notice is that the database seems to have been queried already at this point, since `metrics` already holds results. So this function looks like it's called after we have already tried to query our column.

The bigger point is that we may not be in the right function. There are other functions in the file, so maybe we should first find out the order in which these functions are executed.

Specifically, we are trying to determine where the database is queried.
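One way to learn the order of execution is to ask Python for the call stack from inside the function where we stopped; the Call Stack panel in VS Code's debugger shows the same information. Here is a minimal sketch using the standard `traceback` module, where `validate`, `run_expectation`, and `run_test` are hypothetical stand-ins for a real chain of calls:

```python
import traceback
from typing import List


def caller_chain() -> List[str]:
    """Return the names of the functions on the current call stack, outermost first."""
    # extract_stack() ends with this helper's own frame; drop it.
    return [frame.name for frame in traceback.extract_stack()[:-1]]


def validate() -> List[str]:
    return caller_chain()


def run_expectation() -> List[str]:
    return validate()


def run_test() -> List[str]:
    return run_expectation()


chain = run_test()
print(chain[-3:])  # ['run_test', 'run_expectation', 'validate']
```

Calling a helper like this (or just reading the Call Stack panel) at our breakpoint would show which functions ran before the validate method, which is where a database query would have to happen.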

### Move backwards to move forward

It's not very obvious from this file, so maybe we can learn more about how this class is called by viewing the class it inherits from. Let's look for the `column_map_expectation` file.

<img src="./search-file.png" width="60%">

Ok, we can't find it by searching for the file name, so let's `cmd + click` on `ColumnMapExpectation` in line 41 instead.

This will take us to the `expectation.py` file, and the `ColumnMapExpectation` class.

<img src="./col-map-expectation.png" width="60%">

This looks relevant.  So relevant that it's worth reading the docstring.

Ok, that provides some context. From here, we could also look at its base class, `BatchExpectation`, and that class's base class, `Expectation`. Still, it's a little tough to determine the order of operations.

Let's see if there's another way to find the relevant functions and see where our database is queried. We are learning that this could happen directly in the `ExpectColumnValuesToNotBeNull` class or in one of the classes it inherits from.
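Python can answer the "this class or an inherited one?" question directly. Every class has an `__mro__` (method resolution order): the sequence of classes Python searches, in order, when looking up an attribute. Here is a minimal sketch with hypothetical stand-in classes named after the hierarchy above (the method names `describe` and `_validate` are made up for illustration); the hypothetical `defining_class` helper could be run on the real classes from the debug console:

```python
from typing import Optional


# Hypothetical stand-ins mirroring the hierarchy we found; not the real classes.
class Expectation:
    def describe(self) -> str:
        return "defined on Expectation"


class BatchExpectation(Expectation):
    pass


class ColumnMapExpectation(BatchExpectation):
    pass


class ExpectColumnValuesToNotBeNull(ColumnMapExpectation):
    def _validate(self) -> str:
        return "defined on ExpectColumnValuesToNotBeNull"


def defining_class(cls: type, attr: str) -> Optional[type]:
    """Return the first class in cls's MRO whose own namespace defines `attr`."""
    for klass in cls.__mro__:
        if attr in vars(klass):
            return klass
    return None


print([c.__name__ for c in ExpectColumnValuesToNotBeNull.__mro__])
# ['ExpectColumnValuesToNotBeNull', 'ColumnMapExpectation', 'BatchExpectation', 'Expectation', 'object']
print(defining_class(ExpectColumnValuesToNotBeNull, "describe").__name__)  # Expectation
```

Because it checks each class's own namespace, `defining_class` tells us whether a method is written in the subclass itself or inherited, which is exactly the ambiguity we are facing.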

### Another approach: read more documentation

Ok, so we tried to understand how these methods get called and what they do by looking at the base classes and reading some of their docstrings, but it's still tough to understand exactly how our expectation works.

Most important to us is: where does it actually query the database? Once we find that, we can alter the query to look for both upper-case and lower-case column names.
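Before touching the real query, it helps to pin down what matching "both upper-case and lower-case columns" should mean. One common reason case matters in SQL backends is identifier folding: PostgreSQL folds unquoted identifiers to lower case, Snowflake folds them to upper case, and quoted identifiers are matched exactly. Here is a minimal sketch of one possible matching rule, assuming we have the table's actual column names in hand; `resolve_column_name` is hypothetical and is not how Great Expectations resolves columns:

```python
from typing import Sequence


def resolve_column_name(requested: str, actual_columns: Sequence[str]) -> str:
    """Map a requested column name onto the table's real column name.

    Rule (an assumption of this sketch): an exact match wins; otherwise accept
    a single case-insensitive match; refuse to guess when several columns match.
    """
    if requested in actual_columns:
        return requested
    matches = [c for c in actual_columns if c.casefold() == requested.casefold()]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no column matching {requested!r}")
    raise ValueError(f"{requested!r} is ambiguous: {matches}")


print(resolve_column_name("ID", ["id", "name"]))  # id
```

The ambiguous case matters because a table can legitimately contain both `id` and a quoted `"ID"` as distinct columns, so a blanket case-insensitive match could silently pick the wrong one.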

What if we read some additional documentation? If we look back at our original `ExpectColumnValuesToNotBeNull`, we can see further down that it is a kind of `ColumnMapExpectation`, and that there is documentation on how these kinds of expectations are built.

<img src="./read-docs.png" width="60%">

Let's go there to learn more.  The documentation is located [here](https://docs.greatexpectations.io/docs/guides/expectations/creating_custom_expectations/how_to_create_custom_column_map_expectations).

The documentation links to a template file for creating a custom column map expectation. Let's [click on that](https://github.com/great-expectations/great_expectations/blob/develop/examples/expectations/column_map_expectation_template.py).

<img src="./col-vals-match.png">

Ok, so this looks pretty useful. It has placeholders for what we'll need to get a `ColumnMapExpectation` to work, and our `ExpectColumnValuesToNotBeNull` probably follows this template. For example, `map_metric` looks like it should be a string holding the metric name. If we switch to `ExpectColumnValuesToNotBeNull`, we can see that it has a `map_metric` of `"column_values.nonnull"`.

<img src="./map_metric.png" width="60%">

So our template file is almost like a legend -- telling us what each piece of the file is doing.
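The legend also helps decode the `metrics` dictionary we printed in the debug console: two of its three keys appear to be built from the `map_metric` string, while `table.row_count` is a table-level metric. A quick check in plain Python, reusing that dictionary exactly as it was printed:

```python
map_metric = "column_values.nonnull"

# The dictionary we saw in the debug console at the breakpoint.
metrics = {
    "column_values.nonnull.unexpected_count": None,
    "table.row_count": 0,
    "column_values.nonnull.unexpected_values": [],
}

prefix = map_metric + "."
derived = sorted(k for k in metrics if k.startswith(prefix))
other = sorted(k for k in metrics if not k.startswith(prefix))
print(derived)  # ['column_values.nonnull.unexpected_count', 'column_values.nonnull.unexpected_values']
print(other)    # ['table.row_count']
```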

Let's keep reading through it; it seems very helpful to our understanding.

At the very bottom of the file, we see the following:

<img src="./main-fn.png">

This looks important. Remember that the usual pattern is to place something akin to a `run` call underneath `if __name__ == "__main__":`. That condition means the code below it only runs when the file is executed directly, not when it is imported.
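Here is a minimal sketch of the `__main__` guard; `run_checks` is a hypothetical stand-in for whatever the template kicks off at the bottom of the file:

```python
def run_checks() -> str:
    """Hypothetical stand-in for the work a template kicks off."""
    return "checks ran"


def main() -> None:
    print(run_checks())


# `__name__` is "__main__" only when this file is executed directly
# (e.g. `python my_file.py`); when the file is imported, `__name__` is the
# module's name, so main() is not called.
if __name__ == "__main__":
    main()
```

So if a file has no such block, running it directly most likely won't kick off anything like the template's checks; the file is meant to be imported.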

Let's see if our `ExpectColumnValuesToNotBeNull` class has this.

> We search the `ExpectColumnValuesToNotBeNull` class for similar code, but find no results.

<img src="./not-there.png">

Ok, we got to the end of the template, so it's time to continue reading the documentation.

Further down we see the following:

> <img src="./relevant-doc.png">

Ok, this doesn't look so bad. It says that this is where the actual logic of the expectation lives.
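To make the pieces concrete, here is a simplified, pure-Python model of how a column map expectation could combine a row-wise condition with the `mostly` parameter and the metrics we saw (`table.row_count`, `unexpected_count`, `unexpected_values`). It is an assumption-laden sketch for building intuition, not Great Expectations' implementation; `nonnull_condition` and `evaluate_column_map` are hypothetical names:

```python
from typing import Any, Callable, Dict, List, Optional


def nonnull_condition(values: List[Any]) -> List[bool]:
    """Row-wise condition: True where a value is expected (here: not None)."""
    return [v is not None for v in values]


def evaluate_column_map(
    values: List[Any],
    condition: Callable[[List[Any]], List[bool]],
    mostly: Optional[float] = None,
) -> Dict[str, Any]:
    """Combine a row-wise condition into expectation-style metrics and a verdict."""
    flags = condition(values)
    row_count = len(values)
    unexpected_values = [v for v, ok in zip(values, flags) if not ok]
    unexpected_count = len(unexpected_values)
    threshold = 1.0 if mostly is None else mostly
    # Assumption of this sketch: an empty column passes vacuously.
    success = row_count == 0 or (row_count - unexpected_count) / row_count >= threshold
    return {
        "table.row_count": row_count,
        "unexpected_count": unexpected_count,
        "unexpected_values": unexpected_values,
        "success": success,
    }


# success is True: 2 of 4 rows pass, which meets mostly=0.5
print(evaluate_column_map([1, None, 3, None], nonnull_condition, mostly=0.5))
```

In this model, only the condition function knows anything about the column's values, which is why the place where the real condition is defined is worth finding.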

### Next Steps

* Compare the failing examples to the passing examples: what are the differences?
* Why is `column_condition_partial` not implemented?
* Line 187 looks promising.
* Look at closed issues related to the expectation class and see if they contain clues.