# Archived Holdings - Access Status

We can use the Tracking Database to check the WARC index status.

The CDX index status can be broken down by day using the following facet query (taking advantage of [Solr's JSON Facet API](https://lucene.apache.org/solr/guide/8_4/json-facet-api.html)).

In [25]:
import json
import requests
import pandas as pd

headers = {'content-type': "application/json" }

json_facet = {
    # Primary facet is by date - here we break down the last month(s) into days
    'facet': {
        'dates': {
            'type': 'range',
            'field': 'timestamp_dt',
            'start': "NOW/MONTH-1MONTH",
            'end': "NOW/MONTH+32DAY",
            'gap': "+1DAY",
            # Alternative: break down the last ten years into months
            # 'start': "NOW/MONTH-10YEAR",
            # 'end': "NOW/MONTH+1MONTH",
            # 'gap': "+1MONTH",
            # For each day, we facet:
            'facet': { 
                'stream': { 
                    'type': 'terms', 
                    "field": "stream_s", 
                    'missing': True,
                    'facet': { 
                        'cdx_status': { 
                            'type': 'terms', 
                            "field": "cdx_index_ss", 
                            'missing': True,
                            'facet' : {
                                'bytes': 'sum(file_size_l)'
                            }
                        }
                    }
                }
            }
        } 
    }
}


params = {
  'q': 'kind_s:"warcs"',
  'rows': 0
}

r = requests.post(
    "http://solr8.api.wa.bl.uk/solr/tracking/select",
    params=params,
    data=json.dumps(json_facet),
    headers=headers,
)

# On failure, show Solr's error message and stop on an HTTP error:
if r.status_code != 200:
    print(r.text)
    r.raise_for_status()

# Helper (not shown here) that flattens the nested facet buckets into rows:
from solr.solr_facet_helper import flatten_solr_buckets

df = pd.DataFrame(flatten_solr_buckets(r.json()['facets']))
# Filter empty rows:
df = df[df['count'] != 0]

# Add compound column:
df['status'] = df.apply(lambda row: "%s, %s" % (row.stream, row.cdx_status), axis=1)

df

| | dates | stream | cdx_status | count | bytes | status |
|---|---|---|---|---|---|---|
| 0 | 2020-12-01T00:00:00Z | frequent | data-heritrix | 180 | 177771900000.0 | frequent, data-heritrix |
| 3 | 2020-12-02T00:00:00Z | frequent | data-heritrix | 176 | 171339100000.0 | frequent, data-heritrix |
| 6 | 2020-12-03T00:00:00Z | frequent | data-heritrix | 171 | 168924800000.0 | frequent, data-heritrix |
| 9 | 2020-12-04T00:00:00Z | frequent | data-heritrix | 177 | 174690000000.0 | frequent, data-heritrix |
| 12 | 2020-12-05T00:00:00Z | frequent | data-heritrix | 189 | 184929200000.0 | frequent, data-heritrix |
| 15 | 2020-12-06T00:00:00Z | frequent | data-heritrix | 170 | 166298100000.0 | frequent, data-heritrix |
| 18 | 2020-12-07T00:00:00Z | frequent | data-heritrix | 161 | 158326200000.0 | frequent, data-heritrix |
| 21 | 2020-12-08T00:00:00Z | frequent | data-heritrix | 176 | 176110600000.0 | frequent, data-heritrix |
| 24 | 2020-12-09T00:00:00Z | frequent | data-heritrix | 185 | 182397500000.0 | frequent, data-heritrix |
| 27 | 2020-12-10T00:00:00Z | frequent | data-heritrix | 218 | 217299700000.0 | frequent, data-heritrix |
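
The `flatten_solr_buckets` helper imported above is not shown in this notebook. As a rough illustration of what such flattening involves, the sketch below walks a nested JSON facet response and emits one flat row per innermost bucket. The function name `flatten_buckets_sketch`, the hand-written `sample` response, and the choice to represent `missing` buckets with a `None` value are all assumptions made for illustration; the real helper may behave differently.

```python
def flatten_buckets_sketch(facets, path=None):
    """Yield one flat dict per innermost bucket of a nested facet result.

    Hypothetical stand-in for flatten_solr_buckets, for illustration only.
    """
    path = path or {}
    # Sub-facets are the dict-valued entries that carry a 'buckets' list.
    sub_facets = {
        name: value for name, value in facets.items()
        if isinstance(value, dict) and 'buckets' in value
    }
    if not sub_facets:
        # Innermost level: combine the facet values collected so far
        # with this bucket's scalar statistics (count, bytes, ...).
        row = dict(path)
        row.update({k: v for k, v in facets.items()
                    if k != 'val' and not isinstance(v, dict)})
        yield row
        return
    for name, facet in sub_facets.items():
        buckets = list(facet['buckets'])
        # Assumption: a 'missing' bucket is treated as a bucket with no value.
        if 'missing' in facet:
            buckets.append({'val': None, **facet['missing']})
        for bucket in buckets:
            yield from flatten_buckets_sketch(
                bucket, {**path, name: bucket.get('val')})


# Hand-written sample mimicking the shape of a JSON facet response:
sample = {
    'count': 180,
    'dates': {'buckets': [{
        'val': '2020-12-01T00:00:00Z', 'count': 180,
        'stream': {
            'buckets': [{
                'val': 'frequent', 'count': 180,
                'cdx_status': {
                    'buckets': [{'val': 'data-heritrix', 'count': 180,
                                 'bytes': 177771900000.0}],
                    'missing': {'count': 0},
                },
            }],
            'missing': {'count': 0},
        },
    }]},
}

rows = list(flatten_buckets_sketch(sample))
non_empty = [row for row in rows if row['count'] != 0]
```

In this sketch the `missing` buckets produce rows with a count of zero, which is why a filter on `count` is still needed afterwards.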


The resulting `df` DataFrame can be used to build a simple visualisation:

In [26]:
import altair as alt

alt.Chart(df).mark_bar(size=6).encode(
    x='dates:T',
    y='count',
    color='status',
    tooltip=[alt.Tooltip('dates:T', format='%A, %e %B %Y'), 'stream', 'cdx_status', 'count', 'bytes']
).properties(width=600).interactive()