# Bug 1712934: adapter_driver_version doesn't work with exists/does-not-exist filter

This is a brief investigation using JupyterLab into why `adapter_driver_version` doesn't work with the exists/does-not-exist filter.

https://bugzilla.mozilla.org/show_bug.cgi?id=1712934

In [1]:
```python
import pandas as pd
import requests

HOST = "https://crash-stats.mozilla.org"
```

We want to know whether the exists/does-not-exist filters work.

Jeff had a Super Search query that covered 11/13/2020 to 5/13/2021. That's a long period, so let's look at a smaller one first.

Let's search for Firefox crash reports from 2021-04-12 (inclusive) to 2021-05-10 (exclusive) where `process_type` is "gpu".

In [2]:
```python
def fetch_supersearch(params):
    """Run a Super Search query and return the decoded JSON response."""
    resp = requests.get(HOST + "/api/SuperSearch/", params=params)
    resp.raise_for_status()
    return resp.json()

results = {}

params = {
    "date": [">=2021-04-12", "<2021-05-10"],
    "product": "Firefox",
    "process_type": "gpu",
}

# "!__null__" matches crash reports where the field exists;
# "__null__" matches crash reports where it does not.
results["total"] = fetch_supersearch(params)["total"]
results["does exist"] = fetch_supersearch(dict(params, adapter_driver_version="!__null__"))["total"]
results["does not exist"] = fetch_supersearch(dict(params, adapter_driver_version="__null__"))["total"]

pd.DataFrame([results], columns=["total", "does not exist", "does exist"])
```

|   | total | does not exist | does exist |
|---|------:|---------------:|-----------:|
| 0 | 8566 | 1195 | 7371 |


It looks like the filter is doing something: some crash reports have the value and some don't.
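Every crash report either has the field or doesn't, so the two filtered counts should add up to the unfiltered total. A minimal sketch of that sanity check; `check_partition` is a hypothetical helper, not part of Super Search:

```python
def check_partition(total, does_exist, does_not_exist):
    """Return True if the exists/does-not-exist counts add up to the total.

    Hypothetical helper: a working pair of exists/does-not-exist filters
    should split the result set into two parts with no overlap and no gaps.
    """
    return does_exist + does_not_exist == total


# Counts from the 2021-04-12 to 2021-05-10 GPU search above.
print(check_partition(total=8566, does_exist=7371, does_not_exist=1195))  # True
```

Passing this check only shows the two filters agree with each other. It doesn't show that the field was indexed for every crash report that had the annotation: every row of the weekly breakdown below passes it too.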

What if we look at the week-by-week breakdown for 2021?

In [9]:
```python
# Week boundaries: every Monday from the start of 2021 through late May.
mondays = [
    "2021-01-04",
    "2021-01-11",
    "2021-01-18",
    "2021-01-25",
    "2021-02-01",
    "2021-02-08",
    "2021-02-15",
    "2021-02-22",
    "2021-03-01",
    "2021-03-08",
    "2021-03-15",
    "2021-03-22",
    "2021-03-29",
    "2021-04-05",
    "2021-04-12",
    "2021-04-19",
    "2021-04-26",
    "2021-05-03",
    "2021-05-10",
    "2021-05-17",
    "2021-05-24",
]

results = []

default_params = {
    "product": "Firefox",
    "process_type": "gpu",
}

# For each week [monday, next_monday), count all GPU crash reports, then
# the ones where adapter_driver_version exists and where it doesn't.
for monday, next_monday in zip(mondays, mondays[1:]):
    params = dict(default_params, date=[f">={monday}", f"<{next_monday}"])
    results.append([
        monday,
        next_monday,
        fetch_supersearch(params)["total"],
        fetch_supersearch(dict(params, adapter_driver_version="!__null__"))["total"],
        fetch_supersearch(dict(params, adapter_driver_version="__null__"))["total"],
    ])

pd.DataFrame(results, columns=[">= start", "< end", "total", "does exist", "does not exist"])
```

|   | >= start | < end | total | does exist | does not exist |
|---|----------|-------|------:|-----------:|---------------:|
| 0 | 2021-01-04 | 2021-01-11 | 1161 | 1161 | 0 |
| 1 | 2021-01-11 | 2021-01-18 | 1644 | 1644 | 0 |
| 2 | 2021-01-18 | 2021-01-25 | 1172 | 1172 | 0 |
| 3 | 2021-01-25 | 2021-02-01 | 1246 | 0 | 1246 |
| 4 | 2021-02-01 | 2021-02-08 | 1633 | 0 | 1633 |
| 5 | 2021-02-08 | 2021-02-15 | 1481 | 1481 | 0 |
| 6 | 2021-02-15 | 2021-02-22 | 1341 | 0 | 1341 |
| 7 | 2021-02-22 | 2021-03-01 | 1632 | 1632 | 0 |
| 8 | 2021-03-01 | 2021-03-08 | 1729 | 0 | 1729 |
| 9 | 2021-03-08 | 2021-03-15 | 1830 | 1830 | 0 |
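As an aside, the hard-coded `mondays` list in the cell above could be generated with the standard library instead. A minimal sketch; `week_ranges` is a hypothetical helper that yields the same half-open `[start, end)` pairs the loop builds from consecutive Mondays:

```python
from datetime import date, timedelta


def week_ranges(first_monday, last_monday):
    """Yield (start, end) ISO date strings, one pair per week.

    Each range is half-open, matching the ">= start" / "< end" date
    filters used in the Super Search queries.
    """
    start = first_monday
    while start < last_monday:
        end = start + timedelta(weeks=1)
        yield start.isoformat(), end.isoformat()
        start = end


weeks = list(week_ranges(date(2021, 1, 4), date(2021, 5, 24)))
print(len(weeks))  # 20
print(weeks[0])    # ('2021-01-04', '2021-01-11')
print(weeks[-1])   # ('2021-05-17', '2021-05-24')
```

Each pair can be dropped into the `date` parameter the same way the loop uses `monday` and `next_monday`.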


Whoa. What is going on?

This looks a lot like a bug I was looking at where `phc_kind` wasn't working with exists/does-not-exist:

https://bugzilla.mozilla.org/show_bug.cgi?id=1706171

For `phc_kind`, I had determined that after around January 14th, I couldn't find any crash reports using the "`phc_kind` exists" filter, even though some crash reports had a `PHCKind` annotation. It turned out to be a timing problem. Crash reports are stored in weekly indexes, and an index's mapping was set up from the crash report being saved at the moment the index was created. If that crash report had a `phc_kind` field, the mapping would include `phc_kind`, and documents saved for that week would keep the field. Since `phc_kind` is rare, this never happened, so the field got dropped from all documents.
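The timing problem can be illustrated with a toy model. This is a hypothetical simplification, not Socorro's or Elasticsearch's actual code: it assumes a week's index takes its field list from the first crash report saved into it, and that fields missing from that list are dropped from every later report that week.

```python
def index_week(reports):
    """Toy model of saving one week's crash reports into a new index.

    Hypothetical simplification: the index's field list is taken from the
    first report saved (the one that triggers index creation), and any field
    not in that list is dropped from every report stored that week.
    """
    if not reports:
        return []
    known_fields = set(reports[0])
    return [
        {field: value for field, value in report.items() if field in known_fields}
        for report in reports
    ]


# The report that creates the index has no phc_kind, so the field is lost
# for the rest of the week even though a later report has it.
week = [
    {"uuid": "a", "process_type": "gpu"},
    {"uuid": "b", "process_type": "gpu", "phc_kind": "example"},
]
stored = index_week(week)
print(any("phc_kind" in doc for doc in stored))  # False
```

If the first report had `phc_kind` instead, the field would be kept for every report that has it that week.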

The `AdapterDriverVersion` annotation is much more common. If the crash report whose save kicked off creation of a new weekly index had an `adapter_driver_version` field, that index would have the field and crash reports saved that week would keep the data. Otherwise, they wouldn't. That's why we see the wild flip-flopping between weeks up to around 5/3/2021. For that week, we see most crash reports don't have the field but one does.
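One way to pick out the flip-flopping is to label each week by whether the field was indexed for all, none, or only some of its crash reports. A minimal sketch; `classify_week` is a hypothetical helper, and the sample rows are copied from the weekly table above:

```python
def classify_week(total, does_exist, does_not_exist):
    """Label a week by how many of its crash reports have the field indexed."""
    if total == 0:
        return "empty"
    if does_exist == total:
        return "all"
    if does_not_exist == total:
        return "none"
    return "mixed"


# (start, total, does exist, does not exist), copied from the weekly table.
rows = [
    ("2021-01-18", 1172, 1172, 0),
    ("2021-01-25", 1246, 0, 1246),
    ("2021-02-01", 1633, 0, 1633),
    ("2021-02-08", 1481, 1481, 0),
]

for start, total, exists, missing in rows:
    print(start, classify_week(total, exists, missing))
# 2021-01-18 all
# 2021-01-25 none
# 2021-02-01 none
# 2021-02-08 all
```

A week like the one starting 5/3/2021, where most crash reports lack the field but one has it, would be labeled "mixed".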

There was a Socorro production push on 2021.04.28 (we tag releases in YYYY.MM.DD format):

https://github.com/mozilla-services/socorro/releases/tag/2021.04.28

That includes the changes to fix `phc_kind`.

# Summary

The `adapter_driver_version` field had the same problem as the `phc_kind` field. The problem started in January 2021, and the fix was pushed to production on April 28th, 2021.

When searching with the `adapter_driver_version` field, we should ignore crash reports before May 3rd, 2021.
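One way to apply that rule is to clamp the start of any search that filters on `adapter_driver_version` to May 3rd, 2021. A minimal sketch; `FIX_EFFECTIVE` and `clamp_start` are hypothetical names, and the helper assumes date bounds are ISO dates (YYYY-MM-DD) prefixed with operators like `">="` and `"<"`, as in the queries above:

```python
from datetime import date

# Cutoff from the summary: ignore crash reports before May 3rd, 2021.
FIX_EFFECTIVE = date(2021, 5, 3)


def clamp_start(params, cutoff=FIX_EFFECTIVE):
    """Return a copy of params whose date range starts no earlier than cutoff.

    A ">=" bound earlier than cutoff is moved up to cutoff; if there is no
    ">=" bound at all, one is added. Other date bounds are left unchanged.
    """
    raw = params.get("date", [])
    if isinstance(raw, str):
        raw = [raw]
    bounds = []
    has_lower = False
    for bound in raw:
        if bound.startswith(">="):
            has_lower = True
            start = date.fromisoformat(bound[2:])
            bound = ">=" + max(start, cutoff).isoformat()
        bounds.append(bound)
    if not has_lower:
        bounds.append(">=" + cutoff.isoformat())
    return dict(params, date=bounds)


# Jeff's original date range, restricted to crash reports after the fix.
params = {
    "product": "Firefox",
    "process_type": "gpu",
    "date": [">=2020-11-13", "<2021-05-13"],
    "adapter_driver_version": "!__null__",
}
print(clamp_start(params)["date"])  # ['>=2021-05-03', '<2021-05-13']
```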