# Reading Code

### Introduction

In the last lesson, we mostly looked at ways to explore our codebase, and we ended by finding a potentially relevant file.  Let's see if we can understand it better, and find where it may be querying the database and executing the validation.

### Diving into our Expectation

Let's look through the `core/expect_column_values_to_not_be_null.py` file, looking for relevant code.  A good way to start is to skim the function names to get a sense of what each chunk of code does.

It looks like one relevant function may be the `validate` method.  To see if it may be relevant, let's place a breakpoint.  

If we don't hit the breakpoint, we know the code is irrelevant, and we need to look elsewhere.  But if we do hit the breakpoint, it **does not** necessarily mean we are close.  It just means we *might* be close.

<img src="./hit-breakpoint.png" width="70%">

Always place the breakpoint on the *first line* of the body of the function you want to test.  That way the breakpoint can't be skipped because it sits on a line that only runs in some cases, for example inside an `if` clause.  We then run our `test.py` file again to see if we hit it.

<img src="./breakpoint-hit.png" width="60%">
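To see why the first line matters, here is a minimal sketch that is not taken from the Great Expectations codebase. Both functions are made up for illustration, and the breakpoint hook is temporarily swapped for a recorder so the example runs without pausing. In a real session you would leave the default hook alone so the debugger opens.

```python
import sys

hits = []


def record_hit(*args, **kwargs):
    """Demo-only replacement for the debugger: just note that we got here."""
    hits.append("hit")


sys.breakpointhook = record_hit  # demo only; normally breakpoint() opens pdb


def validate_first_line(value):
    breakpoint()  # first line of the body: reached on every call
    if value is None:
        return "skipped"
    return "validated"


def validate_inside_branch(value):
    if value is None:
        return "skipped"
    breakpoint()  # only reached when value is not None
    return "validated"


validate_first_line(None)
first_line_hits = len(hits)

hits.clear()
validate_inside_branch(None)
branch_hits = len(hits)

sys.breakpointhook = sys.__breakpointhook__  # restore the default hook

print(first_line_hits, branch_hits)
```

With the same input, the first function reports a hit while the second returns early and never reaches its breakpoint, which would wrongly suggest the function was never called.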

### Look Around

Now, we seem to be in the right ballpark, but the next step **is not** to start fixing things.  Instead, it's to look around. 

We can start with the parameters to the function: `configuration`, `metrics`, `runtime_configuration`, and `execution_engine`.

Which of these look most relevant?  Well remember, we are probably looking for the column name.

And if you look further down in the function, you can see the following:

```python
result_format = self.get_result_format(
    configuration=configuration, runtime_configuration=runtime_configuration
)
mostly = self.get_success_kwargs().get(
    "mostly", self.default_kwarg_values.get("mostly")
)
total_count = metrics.get("table.row_count")
```

So in the last line, we see that `metrics` has a `"table.row_count"` key (it is read with `.get`, so `metrics` behaves like a dictionary).  Because `table` is closely related to `column`, our guess is that we may find some column information in `metrics`...

Nope.

Instead, when we type `metrics` at the debugger prompt in our terminal, we see the following:

```bash
metrics
{'column_values.nonnull.unexpected_count': None, 'table.row_count': 0, 'column_values.nonnull.unexpected_values': []}
```

One thing we notice is that `metrics` already holds results such as `table.row_count`, so it looks like we have already queried the database at this point.  This function appears to be called after we have already tried to query our column.
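The same inspection can be reproduced outside the debugger. The sketch below rebuilds the dictionary printed above by hand (it is a copy of that output, not a call into Great Expectations) and shows that the dotted names are plain string keys rather than nested attributes, and that asking for a `"column"` key finds nothing.

```python
# Hand-copied from the debugger output above; not produced by Great Expectations here.
metrics = {
    "column_values.nonnull.unexpected_count": None,
    "table.row_count": 0,
    "column_values.nonnull.unexpected_values": [],
}

# The dotted name is one flat string key, looked up like any other dict key.
total_count = metrics.get("table.row_count")

# There is no "column" entry, so .get falls back to None.
missing = metrics.get("column")

# Grouping keys by their first dotted segment shows what kinds of metrics exist.
prefixes = sorted({key.split(".")[0] for key in metrics})

print(total_count, missing, prefixes)
```

Everything in the dictionary is a computed result, which supports the suspicion that the query has already happened by the time this function runs.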

So we may not be in the right function.  There are other functions in the file, so maybe we should first look to find more information about the order of execution of these functions.

Remember, we are trying to determine -- where do we query against the database?

### Move backwards to move forward

It's not very obvious in the current file.  For example, it would be nice if there were something equivalent to a `run` function, or even an `__init__` function, so we could determine an order of operations.  But neither appears to be there.

So maybe we can learn more about how this class is called by viewing the class it inherits from: `ColumnMapExpectation`.  Let's look for the `column_map_expectation` file.

<img src="./search-file.png" width="60%">

Unfortunately, we can't find it by searching for the file name, so let's jump to the class definition instead by using `cmd + click` on `ColumnMapExpectation`.

This will take us to the `expectation.py` file, and the `ColumnMapExpectation` class.

<img src="./col-map-expectation.png" width="60%">

This looks relevant.  So relevant that it's worth reading the docstring.

Ok, that provides some context -- and from here, we may even look at the base class of `BatchExpectation`, and its base class of `Expectation`.  Still, it's a little tough to determine where the database is queried.  We're about at the end of our rope in terms of understanding by reading the code.

So let's see if there's another approach to find the relevant functions, and see where our database is queried.  We know that it could potentially occur directly in the `ExpectColumnValuesToNotBeNull` class or through one of the inherited classes -- but it's hard to see it by reading the code alone.
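One way to take some of the guesswork out of walking base classes is to ask Python for the inheritance chain directly. The sketch below uses empty stand-in classes that only mirror the class names mentioned above. The `run_query` method and the `defining_class` helper are hypothetical and exist only for illustration, not in Great Expectations.

```python
import inspect


# Stand-in classes that only mirror the inheritance chain named in the lesson.
class Expectation:
    def run_query(self):  # hypothetical method, for illustration only
        return "queried"


class BatchExpectation(Expectation):
    pass


class ColumnMapExpectation(BatchExpectation):
    pass


class ExpectColumnValuesToNotBeNull(ColumnMapExpectation):
    def validate(self):
        return "validated"


def defining_class(cls, method_name):
    """Return the name of the first class in the MRO that defines method_name."""
    for klass in inspect.getmro(cls):
        if method_name in vars(klass):
            return klass.__name__
    return None


chain = [klass.__name__ for klass in inspect.getmro(ExpectColumnValuesToNotBeNull)]
query_owner = defining_class(ExpectColumnValuesToNotBeNull, "run_query")
validate_owner = defining_class(ExpectColumnValuesToNotBeNull, "validate")

print(chain)
print(query_owner, validate_owner)
```

Run against the real class from a debugger session, the same two calls would list every base class and show which one actually defines a given method, without any clicking through files.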

### Another approach -  Back to documentation 

Ok, so we tried to understand how these methods get called and what they do by looking at the base classes, and reading some of the docstrings in the base classes, but it's still tough to understand exactly how our expectation works.  So back to some documentation.

If we look back at our original `ExpectColumnValuesToNotBeNull` class, we can see further down that the docstring says it is a kind of `ColumnMapExpectation`, and that there is documentation on how these kinds of expectations are built.

<img src="./read-docs.png" width="60%">

Let's go there to learn more.  The documentation is located [here](https://docs.greatexpectations.io/docs/guides/expectations/creating_custom_expectations/how_to_create_custom_column_map_expectations).

The documentation provides a link to a custom template file for creating a column map expectation.  Let's [click on that](https://github.com/great-expectations/great_expectations/blob/develop/examples/expectations/column_map_expectation_template.py).

<img src="./col-vals-match.png">

Ok, so this looks pretty useful.  It has placeholders for what we'll need to get this `ColumnMapExpectation` to work.  And our `ExpectColumnValuesToNotBeNull` probably follows this template.  For example, `map_metric` looks like it should be a string holding the metric name.  Then if we switch to the `ExpectColumnValuesToNotBeNull` class, we can see that it has a `map_metric` of `"column_values.nonnull"`.

<img src="./map_metric.png" width="60%">

So our template file is almost like a legend -- telling us what each piece of the file is doing.

Let's keep reading through it; it seems very helpful to our understanding.

At the very bottom of the file, we see the following:

<img src="./main-fn.png">

This looks important.  Remember that the pattern is generally to place something akin to the `run` function underneath `if __name__ == '__main__'`.  That line means the code below it should only be kicked off if the file is run directly, not when it is imported.
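As a reminder of how that guard behaves, here is a minimal generic sketch. The `run` function is a placeholder name, not something taken from the template file.

```python
def run():
    """Placeholder entry point; stands in for whatever the file kicks off."""
    return "running the file's main logic"


# __name__ is "__main__" only when this file is executed directly
# (for example `python this_file.py`). When the file is imported,
# __name__ is the module's name and the block below is skipped.
if __name__ == "__main__":
    print(run())
```

So code under the guard tells us what the author wants to happen when the file is run as a script, which makes it a good place to look for an entry point.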

Let's see if our `ExpectColumnValuesToNotBeNull` class has this.

> We search the `ExpectColumnValuesToNotBeNull` class for similar code, but see there are **no results**.

<img src="./not-there.png">

Ok, we looked at the template, now it's time to continue on with reading the documentation.

Further down we see the following:

> <img src="./relevant-doc.png">

Ok this doesn't look so bad.  It says that this is where the actual logic for the expectation belongs.

Let's look at this section.

<img src="./bus-logic.png" width="60%">

So here it says that there should be a `column_condition_partial` in the expectation.  But if we press `cmd + f` and search the file for that code, none is to be found.

Is Great Expectations lying to us?  Let's search the codebase to see if this function exists elsewhere.

<img src="./condition-partial.png" width="60%">

Ok, so it does look like other parts of the codebase implement the logic here -- just not this particular file.
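The editor search used above can also be scripted, which helps when we want to repeat it for several names. The following is a minimal sketch: `find_usages` is a hypothetical helper, and the two files it scans are made-up stand-ins written to a temporary directory, not real files from the repository.

```python
import tempfile
from pathlib import Path


def find_usages(root, needle):
    """Return (file name, line number) for every line under root containing needle."""
    matches = []
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for line_number, line in enumerate(lines, start=1):
            if needle in line:
                matches.append((path.name, line_number))
    return matches


with tempfile.TemporaryDirectory() as root:
    # Made-up file contents; they are only searched as text, never executed.
    Path(root, "expectation_a.py").write_text(
        "class A:\n    map_metric = 'x'\n", encoding="utf-8"
    )
    Path(root, "metric_b.py").write_text(
        "@column_condition_partial(engine=None)\ndef _pandas(column):\n    return column\n",
        encoding="utf-8",
    )
    results = find_usages(root, "column_condition_partial")

print(results)
```

Pointing `find_usages` at a local checkout of the repository instead of the temporary directory would list every file that mentions the name.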

Time to regroup.

### Next Steps

We still have a couple of techniques we can try next.  For example, we can: 

* Go back to the test.py file, and compare the failing examples to the passing examples -- why do some expectations work and others fail?

* Why is the `column_condition_partial` not implemented?  Is there an alternative or does it inherit this behavior from a base class?

* If you look at this section of the `expect_column_values_to_not_be_null` file, it looks relevant:

```python
if params["row_condition"] is not None:
    (
        conditional_template_str,
        conditional_params,
    ) = parse_row_condition_string_pandas_engine(params["row_condition"])
    template_str = f"{conditional_template_str}, then {template_str}"

    params.update(conditional_params)
```

Maybe we can explore this further.

* Or, we can look at other issues related to the expectation class that are closed -- see if there are clues. 
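The first idea in the list, comparing passing and failing cases, can be sketched as a small loop that runs the same check over several inputs and sorts the outcomes. Everything here is a toy: `toy_expectation_passes` is a hypothetical stand-in with a deliberately planted bug, and it says nothing about the real cause of the issue, which is still unknown at this point.

```python
def toy_expectation_passes(column_name, table_columns):
    """Hypothetical stand-in for running an expectation; NOT Great Expectations code.

    Deliberate toy bug: the name is lower-cased before the lookup,
    so any column stored with upper-case letters is never found.
    """
    return column_name.lower() in table_columns


table_columns = {"id", "NAME"}  # made-up table with one upper-case column
cases = ["id", "NAME"]

outcomes = {name: toy_expectation_passes(name, table_columns) for name in cases}
passing = [name for name, ok in outcomes.items() if ok]
failing = [name for name, ok in outcomes.items() if not ok]

print("passing:", passing)
print("failing:", failing)
```

Laying the cases side by side like this makes it easier to spot what the failing inputs have in common before digging further into the code.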

### The best approach

However, the best approach at this point is to ask on Slack.  The contributing section of the documentation tells us how to do so [here](https://docs.greatexpectations.io/docs/contributing/contributing).

So let's join the `#contributing` Slack channel, and ask our question.


* This is what I wrote


Hey gang...trying to tackle my first issue here.

I've been looking at [this issue](https://github.com/great-expectations/great_expectations/issues/8797) where apparently upper case columns are breaking with core expectations like [expect_column_values_to_not_be_null](https://github.com/great-expectations/great_expectations/blob/develop/great_expectations/expectations/core/expect_column_values_to_not_be_null.py).  

After investigating, I was looking to see where in GE the query was executed against the database.  The closest I could get to the query logic is the `column_condition_partial`, which [the docs](https://docs.greatexpectations.io/docs/guides/expectations/creating_custom_expectations/how_to_create_custom_column_map_expectations/) say should be implemented with each `ColumnMapExpectation`.  It doesn't look like it's in the [expect_column_values_to_not_be_null](https://github.com/great-expectations/great_expectations/blob/develop/great_expectations/expectations/core/expect_column_values_to_not_be_null.py) expectation.

Any ideas as to where the logic of the database query is executed, or other pointers as to where I should be looking?

Thanks!

Jeff

Notice that in the message above, I explained the general problem I was trying to solve, my best hypothesis, and provided links to show where I was getting stuck.  In other words, it displayed the work done so far.  I also left room at the end for general pointers, as I still may be moving down the wrong path.

### Postscript

Let's look at other core expectations to see how they are implemented.  That is, where is the actual business logic?  For example, if we look at [this expectation](https://github.com/great-expectations/great_expectations/blob/develop/contrib/experimental/great_expectations_experimental/expectations/expect_column_to_have_no_days_missing.py), the core piece is in the `@metric` decorated functions.  Perhaps there is something similar in ours, in other breaking expectations, or in the classes they inherit from.
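To make that hunch concrete, here is a toy sketch of how decorated functions can hold the logic while a class refers to them only by name. The `toy_metric` decorator, `METRIC_REGISTRY`, and `ToyExpectation` are all invented for illustration. Whether Great Expectations wires things up this way is exactly what we are still trying to find out.

```python
# Toy registry: maps a metric name to the function that implements it.
METRIC_REGISTRY = {}


def toy_metric(name):
    """Hypothetical decorator that registers a function under a metric name."""

    def register(func):
        METRIC_REGISTRY[name] = func
        return func

    return register


@toy_metric("column_values.nonnull")
def nonnull_condition(values):
    # The "business logic" lives here, outside of the expectation class.
    return [value is not None for value in values]


class ToyExpectation:
    # The class only names the metric; it contains no null-checking code itself.
    map_metric = "column_values.nonnull"

    def validate(self, values):
        condition = METRIC_REGISTRY[self.map_metric]
        return all(condition(values))


print(ToyExpectation().validate([1, 2, 3]))
print(ToyExpectation().validate([1, None, 3]))
```

In a design like this, searching the expectation file for the logic comes up empty, and the string in `map_metric` is the thread to pull on instead.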