# ZODB Size Analysis

If your Zope Object Database (ZODB) is larger than expected, this notebook helps you find the objects that take up the most space. The code works with ZODBs generated by Zope 2 under Python 2. To use it, first create a virtual environment and place this notebook in a folder, as shown here:

```
virtualenv --python=python2 /home/jupyter
cd /home/jupyter/bin/
./pip install pandas
./pip install ZODB
./pip install contextlib2
./pip install jupyter
mkdir /home/jupyter/notebooks
cd /home/jupyter/notebooks/
/home/jupyter/bin/jupyter notebook
```

In [None]:
from ZODB.DB import DB
import ZODB.FileStorage
import ZODB.POSException
from contextlib import contextmanager
import sys
import pandas as pd

In [2]:
def get_classname(obj):
    return obj.__class__.__module__ + '.' + obj.__class__.__name__

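def get_title(o):
    # Reading an attribute loads the object's state, which can fail for
    # broken or unloadable objects; fall back to an empty title.
    try:
        return getattr(o, 'title', '')
    except Exception:
        return ''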

@contextmanager
def open_zodb(zodb_path):
    storage = ZODB.FileStorage.FileStorage(zodb_path)
    db = DB(storage)
    connection = db.open()
    try:
        yield storage, db, connection
    finally:
        connection.close()
        db.close()
        storage.close()


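def iter_objects(storage, db, connection):
    # Walk every current record in the storage. record_iternext() returns
    # next_ as None together with the last record, so yield before checking.
    next_ = None
    while True:
        oid, tid, data, next_ = storage.record_iternext(next_)
        yield {
            'oid': oid,
            'tid': tid,
            'data': data,
            'obj': connection.get(oid)
        }
        if next_ is None:
            break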

In [3]:
with open_zodb('Data.fs') as handles:  # (storage, db, connection)
    items = [
        # size is the length of the pickled record in bytes
        [get_classname(o['obj']), len(o['data']), get_title(o['obj'])]
        for o in iter_objects(*handles)
    ]
df = pd.DataFrame(items, columns=['classname', 'size', 'title'])

In [4]:
df.groupby(df['classname'])['size'].agg(['sum', 'count']).sort_values('sum', ascending=False)

classname,sum,count
Products.zms.zmscustom.ZMSCustom,903928,166
Products.PageTemplates.ZopePageTemplate.ZopePageTemplate,354299,77
BTrees.IOBTree.IOBTree,316216,36
Products.zms._zmsattributecontainer.ZMSAttributeContainer,285814,177
OFS.Image.File,267504,36
Products.PythonScripts.PythonScript.PythonScript,240665,74
OFS.Image.Pdata,208820,2
BTrees.IIBTree.IIBTree,67298,316
BTrees.IOBTree.IOBucket,56592,46
Products.zms.zms.ZMS,50705,2
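
The per-class totals above can also be computed without pandas, which may help on a host where only the standard library is available. The following is a minimal sketch, not part of the original notebook: `summarize_by_class` is a hypothetical helper, and it assumes `items` holds `[classname, size, title]` rows like the ones the notebook collects. The sample rows are made up for illustration.

```python
from collections import defaultdict


def summarize_by_class(items):
    """Return (classname, total_size, count) tuples, largest total first.

    Hypothetical helper: `items` is assumed to be an iterable of
    [classname, size, title] rows like the ones the notebook collects.
    """
    totals = defaultdict(lambda: [0, 0])  # classname -> [sum, count]
    for classname, size, _title in items:
        totals[classname][0] += size
        totals[classname][1] += 1
    rows = [(name, total, count) for name, (total, count) in totals.items()]
    # Sort by total size descending; break ties by class name.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Made-up sample rows, for illustration only.
sample = [
    ['OFS.Image.File', 700, 'a'],
    ['BTrees.IOBTree.IOBTree', 300, ''],
    ['OFS.Image.File', 500, 'b'],
]
for name, total, count in summarize_by_class(sample):
    print('%s,%d,%d' % (name, total, count))
```

The output uses the same `classname,sum,count` layout as the table, so the two approaches can be compared row by row.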


In [10]:
df[df['classname']=="Products.zms.zmscustom.ZMSCustom"].sort_values('size').tail(30)

index,classname,size,title
246,Products.zms.zmscustom.ZMSCustom,5483,
45,Products.zms.zmscustom.ZMSCustom,5483,
88,Products.zms.zmscustom.ZMSCustom,5483,
888,Products.zms.zmscustom.ZMSCustom,5497,
902,Products.zms.zmscustom.ZMSCustom,5498,
911,Products.zms.zmscustom.ZMSCustom,5498,
795,Products.zms.zmscustom.ZMSCustom,5498,
52,Products.zms.zmscustom.ZMSCustom,5501,
110,Products.zms.zmscustom.ZMSCustom,5501,
164,Products.zms.zmscustom.ZMSCustom,5505,


In [14]:
df['size'].sum()

2937610

In [15]:
df['size'].count()

1477

In [16]:
df['size'].median()

255.0
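
Raw byte counts such as the total above are easier to read with a unit attached. The following is a minimal sketch, not part of the original notebook: `format_size` is a hypothetical helper that assumes binary units (1 KiB = 1024 bytes).

```python
def format_size(num_bytes):
    """Format a byte count with binary units (hypothetical helper)."""
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB']
    size = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit.
    while size >= 1024.0 and index < len(units) - 1:
        size /= 1024.0
        index += 1
    if index == 0:
        return '%d B' % num_bytes
    return '%.1f %s' % (size, units[index])


print(format_size(2937610))  # total size reported by the notebook
print(format_size(255))      # median record size reported by the notebook
```

With the figures from this notebook, the total of 2937610 bytes is shown as `2.8 MiB` and the median as `255 B`.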