<div align="center">
  <img src="http://vlpavlov.org/Pythagoras-Logo3.svg"><br>
</div>

# Using Persistent Dictionaries: FileDirDict
## Overview of Persistent Dicts

Pythagoras offers a few different Persistent Dictionary classes, such as:  
* **FileDirDict**: local file-based persistence
* **S3_Dict**: cloud-based persistence, with AWS S3 as a backend

All these classes let you store key-value pairs persistently. Their functionality is similar to that of traditional Python dictionaries, which are described here:
https://docs.python.org/3/tutorial/datastructures.html#dictionaries

The key differences are:
* Traditional dictionaries exist only within one session of program execution. They store data in RAM: once the program finishes, the data is "forgotten". If multiple copies of the same program run concurrently on a computer, each of them gets its own, completely independent instance of the dictionary.  
* Persistent dictionaries store data on a local disk or in the cloud. The next time the program executes, it can access the previous state of the dictionary. Several copies of the same program can run concurrently and access exactly the same dictionary.

There are a few restrictions:

1. Pythagoras Persistent Dictionaries only allow tuples of strings as keys. You can also use a single string as a key, which will be interpreted as a tuple containing just one element. 
2. Each key string can only contain characters that are allowed in file-system directory names. Most special symbols are not allowed.
3. Unlike native Python dictionaries, Pythagoras Persistent Dictionaries do not preserve insertion order.
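Restrictions 1 and 2 can be pictured as a key-normalization step that runs before anything touches the disk. The helper below is a hypothetical sketch, not Pythagoras's code, and its whitelist of characters (letters, digits, "_" and "-") is an assumption; the library's actual rule may be more or less permissive.

```python
import re

# Assumption: a conservative whitelist (letters, digits, "_" and "-").
# The exact set of characters Pythagoras accepts may differ.
_SAFE_PART = re.compile(r"[A-Za-z0-9_-]+")

def normalize_key(key):
    """Hypothetical helper: turn a str or a tuple of str into a validated tuple."""
    if isinstance(key, str):
        key = (key,)  # a single string is treated as a one-element tuple
    if not isinstance(key, tuple) or not key:
        raise TypeError("key must be a str or a non-empty tuple of str")
    for part in key:
        if not isinstance(part, str):
            raise TypeError(f"key element {part!r} is not a str")
        if not _SAFE_PART.fullmatch(part):
            raise ValueError(f"key element {part!r} is not a valid folder/file name")
    return key

print(normalize_key("USA"))                  # ('USA',)
print(normalize_key(("USA", "California")))  # ('USA', 'California')
try:
    normalize_key(("USA", "Texas/Austin"))   # "/" would silently create an extra folder
except ValueError as err:
    print(err)
```

Rejecting bad characters when a key is first used keeps a stray "/" from quietly turning one key string into two nested folders.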

## Local File-Based Persistence
In this tutorial we will specifically talk about **FileDirDict**. This class stores key-value pairs on a local disk. The key of each pair determines the file's location (its path and name). The value is stored in the file, either in a binary format (pickle) or in a human-readable text format (JSON). If the key is a single string, it is used to create the file name. If the key is a sequence of strings, the last string is used to create the file name, and all the preceding strings define the path to the file (a sequence of nested folders).
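The key-to-path rule described above can be sketched with pathlib. This is a hypothetical helper, not Pythagoras's implementation: it leaves out the hash suffixes that Pythagoras appends to each name, and it simply reuses the pkl/json extensions that appear in this tutorial's file listings.

```python
from pathlib import PurePosixPath

def key_to_relative_path(key, extension="pkl"):
    """Hypothetical: all but the last key string become nested folders,
    the last key string becomes the file name."""
    if isinstance(key, str):
        key = (key,)
    *folders, file_stem = key
    return PurePosixPath(*folders, f"{file_stem}.{extension}")

print(key_to_relative_path("USA"))                     # USA.pkl
print(key_to_relative_path(("USA", "Texas")))          # USA/Texas.pkl
print(key_to_relative_path(("USA", "Texas"), "json"))  # USA/Texas.json
```

PurePosixPath is used only so the printed paths look the same on every operating system; real file access would join the result onto a concrete Path.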

The tutorial will walk you step-by-step through the most important details of FileDirDict usage.

### Initial setup

First, let's install Pythagoras and import the FileDirDict class:

In [1]:
!pip install pythagoras --quiet

In [2]:
from pythagoras import FileDirDict

Now, let's create an empty directory to experiment with the FileDirDict class:

In [3]:
!mkdir tutorial
!ls -a tutorial

[34m.[m[m  [34m..[m[m


To instantiate a new FileDirDict object, we must tell it which folder to use for storing data. We can pass the name of an existing folder, or the name of a new folder, in which case Pythagoras will create it. 
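Reusing an existing folder or creating a missing one maps naturally onto pathlib. A hypothetical sketch of that behavior (open_storage_dir is an illustrative name, not part of Pythagoras), run inside a temporary directory so it leaves nothing behind:

```python
import tempfile
from pathlib import Path

def open_storage_dir(dir_name):
    """Hypothetical helper: reuse an existing folder, or create it (and any
    missing parent folders) if it does not exist yet."""
    path = Path(dir_name)
    # exist_ok=True: no error if the folder is already there;
    # FileExistsError is still raised if the path is an existing regular file.
    path.mkdir(parents=True, exist_ok=True)
    return path

with tempfile.TemporaryDirectory() as tmp:
    storage = open_storage_dir(Path(tmp) / "tutorial" / "DDD")
    print(storage.is_dir())                      # True
    print(open_storage_dir(storage) == storage)  # True: reopening is harmless
```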

In [4]:
population = FileDirDict("tutorial/DDD")

In [5]:
!ls -a tutorial

[34m.[m[m   [34m..[m[m  [34mDDD[m[m


As you can see, Pythagoras created a new folder, DDD, for us. Let's look inside this folder:

In [6]:
!ls -a tutorial/DDD

[34m.[m[m  [34m..[m[m


By default, FileDirDict stores all objects in a binary format (pickle). Let's put the first key-value pairs into our dictionary:

In [7]:
population["USA"] = 329_000_000
population["Canada"] = 38_000_000
!ls -a tutorial/DDD

[34m.[m[m                   [34m..[m[m                  Canada_IROTG624.pkl USA_65OZDTOT.pkl


Pythagoras stored the two new key-value pairs in two new files. The file names start with the original key strings, followed by hash suffixes.
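The suffixes in the listing consist of eight characters from the base32 alphabet (A-Z and 2-7), and the same string always gets the same suffix (compare the USA_65OZDTOT folder with USA_65OZDTOT.pkl later in this tutorial). A hypothetical sketch of such a naming scheme follows; the choice of SHA-256 is an assumption, so it will not reproduce Pythagoras's actual suffixes.

```python
import base64
import hashlib

def name_with_suffix(key_part, n_chars=8):
    """Hypothetical: append '_' plus a short base32 digest of the string itself.
    SHA-256 is an assumption; real Pythagoras suffixes will differ."""
    digest = hashlib.sha256(key_part.encode("utf-8")).digest()
    suffix = base64.b32encode(digest).decode("ascii")[:n_chars]
    return f"{key_part}_{suffix}"

print(name_with_suffix("USA"))  # 'USA_' followed by 8 characters from A-Z and 2-7
print(name_with_suffix("usa"))  # a different suffix
print(name_with_suffix("USA") == name_with_suffix("USA"))  # True: deterministic
```

A suffix computed from the exact, case-sensitive string keeps keys such as "USA" and "usa" in separate files even on a case-insensitive file system like the default macOS one; whether that is Pythagoras's actual motivation is a guess.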

In [8]:
print(population["USA"])

329000000


Now, let's delete the dictionary object:

In [9]:
del population

Of course, elements of the deleted dictionary become unavailable from within Python code:

In [10]:
print(population["USA"])

NameError: name 'population' is not defined

The dictionary object was removed from RAM, but the data remains on the disk:

In [11]:
!ls -a tutorial/DDD

[34m.[m[m                   [34m..[m[m                  Canada_IROTG624.pkl USA_65OZDTOT.pkl


Now, let's create another persistent dictionary based on the same directory: 

In [12]:
number_of_people = FileDirDict("tutorial/DDD")

Now we can access all the key-value pairs that are stored in this directory:

In [13]:
print(number_of_people["USA"])

329000000
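The new object finds the old data because all of the state lives in the files, not in the Python object. Here is a deliberately tiny stand-in that makes this visible. ToyFileDict is hypothetical and written only for this illustration: it supports flat string keys only and skips validation, hash suffixes and JSON.

```python
import pickle
import tempfile
from pathlib import Path

class ToyFileDict:
    """Hypothetical minimal persistent dict: one pickle file per string key."""

    def __init__(self, dir_name):
        self.root = Path(dir_name)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        return self.root / f"{key}.pkl"

    def __setitem__(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)  # the value goes straight to disk

    def __getitem__(self, key):
        # Note: a missing key raises FileNotFoundError here, not KeyError.
        with open(self._path(key), "rb") as f:
            return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    population = ToyFileDict(tmp)
    population["USA"] = 329_000_000
    del population                       # only the Python object goes away
    number_of_people = ToyFileDict(tmp)  # a fresh object over the same folder
    print(number_of_people["USA"])       # 329000000
```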


We can use a sequence of strings as a key:

In [14]:
number_of_people[("USA","California")] = 1_000_000
number_of_people[("USA","Texas")] = 29_000_000
number_of_people[("USA","Philippines")] = "Not a US territory"

In [15]:
!ls -a tutorial/DDD

[34m.[m[m                   Canada_IROTG624.pkl USA_65OZDTOT.pkl
[34m..[m[m                  [34mUSA_65OZDTOT[m[m


Pythagoras has created a sub-directory for us. Let's look inside:

In [16]:
!ls -a tutorial/DDD/USA_65OZDTOT

[34m.[m[m                        California_GVTXTKNB.pkl  Texas_Q2DRYQLK.pkl
[34m..[m[m                       Philippines_O7NLF6A2.pkl


The sub-directory contains three new files, one for each of the key-value pairs we just added to the dictionary.

In [17]:
print(number_of_people[("USA","California")])
print(number_of_people[("USA","Philippines")])

1000000
Not a US territory
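Going the other way, from a stored file back to its tuple key, only requires undoing the naming scheme visible in the listings above. A hypothetical sketch (relative_path_to_key is an illustrative name); it assumes every folder and file name ends in "_" plus a suffix without underscores, and that key strings contain no dots.

```python
from pathlib import PurePosixPath

def relative_path_to_key(rel_path):
    """Hypothetical inverse of the naming scheme seen in the listings above."""
    parts = PurePosixPath(rel_path).with_suffix("").parts   # drop ".pkl" / ".json"
    return tuple(part.rsplit("_", 1)[0] for part in parts)  # drop "_SUFFIX"

print(relative_path_to_key("USA_65OZDTOT/California_GVTXTKNB.pkl"))
# ('USA', 'California')
print(relative_path_to_key("first_allowed_year_TIVUDHVE.json"))
# ('first_allowed_year',)
```

Splitting on the last underscore only is what lets key strings such as first_allowed_year keep their own underscores.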


FileDirDict supports most of the operations offered by traditional Python dictionaries, such as len() and del:

In [18]:
print(len(number_of_people))

5


In [19]:
del number_of_people[("USA","Philippines")]

In [20]:
print(len(number_of_people))

4
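The counts above (5, then 4 after the deletion) are consistent with len() counting data files at any depth while ignoring the folders themselves. A hypothetical sketch of that counting rule with pathlib's rglob, run on a throwaway copy of the tutorial's layout; Pythagoras's actual implementation may work differently.

```python
import tempfile
from pathlib import Path

def count_stored_items(root, extension=".pkl"):
    """Hypothetical len(): one key-value pair per data file, at any depth.
    Folders such as USA_65OZDTOT are not counted."""
    return sum(1 for p in Path(root).rglob(f"*{extension}") if p.is_file())

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "USA_65OZDTOT").mkdir()
    # Empty placeholder files reproducing the tutorial's layout.
    for rel in ["USA_65OZDTOT.pkl", "Canada_IROTG624.pkl",
                "USA_65OZDTOT/California_GVTXTKNB.pkl",
                "USA_65OZDTOT/Texas_Q2DRYQLK.pkl",
                "USA_65OZDTOT/Philippines_O7NLF6A2.pkl"]:
        (root / rel).touch()
    print(count_stored_items(root))  # 5
    (root / "USA_65OZDTOT" / "Philippines_O7NLF6A2.pkl").unlink()  # like del
    print(count_stored_items(root))  # 4
```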


In [21]:
print(number_of_people[("USA","Philippines")])

KeyError: 'File /Users/vlpavlov/PycharmProjects/Pytha/tutorial/DDD/USA_65OZDTOT/Philippines_O7NLF6A2.pkl does not exist'
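The error above is a KeyError rather than a file-system error, which is what dictionary-style code expects: for instance, the get() method and the in operator provided by collections.abc.Mapping work by catching KeyError. A hypothetical sketch of that translation (load_value is an illustrative name, and its message only imitates the one above):

```python
import pickle
import tempfile
from pathlib import Path

def load_value(path):
    """Hypothetical core of __getitem__: a missing file becomes a KeyError."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # Try-then-convert instead of checking existence first: another
        # process sharing the folder may delete the file at any moment.
        raise KeyError(f"File {path} does not exist") from None

with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "Philippines_O7NLF6A2.pkl"
    try:
        load_value(missing)
    except KeyError as err:
        print("KeyError:", err)
```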

By default, FileDirDict stores all objects as binary pickle files. However, it also supports the human-readable JSON format:

In [22]:
new_persistant_dict = FileDirDict("tutorial/second_storage",file_type="json")
new_persistant_dict["days"] = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
new_persistant_dict["first_allowed_year"] = 1974

In [23]:
!ls -a tutorial/second_storage

.                                days_IT66YRYD.json
..                               first_allowed_year_TIVUDHVE.json


In [24]:
!cat tutorial/second_storage/days_IT66YRYD.json

[
    "SUN",
    "MON",
    "TUE",
    "WED",
    "THU",
    "FRI",
    "SAT"
]
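The file layout above (one element per line, four-space indentation) matches what Python's standard json module produces with indent=4; that Pythagoras makes exactly this call is an assumption. The sketch also shows one limit of plain JSON: a tuple comes back as a list.

```python
import json

days = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
text = json.dumps(days, indent=4)  # one element per line, 4-space indentation
print(text)

print(json.loads(text) == days)                  # True: lists round-trip exactly
print(json.loads(json.dumps(("USA", "Texas"))))  # ['USA', 'Texas']: tuple became a list
```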

We can also store quite complex objects, such as a pandas DataFrame, in a persistent dictionary:

In [25]:
import pandas as pd

In [26]:
new_persistant_dict["wow"] = pd.DataFrame({"A":[10,20,30]})

In [27]:
!ls -a tutorial/second_storage

[34m.[m[m                                first_allowed_year_TIVUDHVE.json
[34m..[m[m                               wow_XTW4IUHY.json
days_IT66YRYD.json


In [28]:
!cat tutorial/second_storage/wow_XTW4IUHY.json

{
    "py/object": "pandas.core.frame.DataFrame",
    "values": "A\n10\n20\n30\n",
    "txt": true,
    "meta": {
        "dtypes": {
            "A": "int64"
        },
        "index": "{\"py/object\": \"pandas.core.indexes.range.RangeIndex\", \"values\": \"[0, 1, 2]\", \"txt\": true, \"meta\": {\"dtype\": \"int64\", \"name\": null}}",
        "column_level_names": [
            null
        ],
        "header": [
            0
        ]
    }
}
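The "py/object" field above records which Python class to rebuild, because plain JSON only knows strings, numbers, booleans, null, lists and objects. The layout resembles that of the third-party jsonpickle library, but this tutorial does not say what Pythagoras uses internally. Below is a toy version of the same idea using only the standard library; Country, REGISTRY, encode and decode are hypothetical names.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Country:  # hypothetical example class
    name: str
    population: int

# Explicit registry of classes we are willing to rebuild from a file.
REGISTRY = {"Country": Country}

def encode(obj):
    """json.dumps default hook: tag non-JSON-native objects with a type name."""
    for tag, cls in REGISTRY.items():
        if type(obj) is cls:
            return {"py/object": tag, **asdict(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode(d):
    """json.loads object_hook: rebuild dicts that carry a known type tag."""
    tag = d.get("py/object")
    if tag is None:
        return d
    fields = {k: v for k, v in d.items() if k != "py/object"}
    return REGISTRY[tag](**fields)

text = json.dumps(Country("Canada", 38_000_000), default=encode, indent=4)
print(text)
print(json.loads(text, object_hook=decode))  # Country(name='Canada', population=38000000)
```

Limiting decoding to an explicit registry is a deliberate choice: rebuilding whatever class name a file mentions would let a crafted file call arbitrary constructors.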

The data stored in FileDirDict objects remains available as long as the files exist on disk. Of course, we can always reset a dictionary by simply removing its files:

In [29]:
!rm -r tutorial
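The !rm command only works in a notebook with a Unix-like shell. In a plain Python script, the standard library's shutil.rmtree does the same job on any platform. A small hypothetical helper (reset_storage is an illustrative name) that also tolerates a folder that is already gone:

```python
import shutil
import tempfile
from pathlib import Path

def reset_storage(dir_name):
    """Hypothetical helper: delete a storage folder and everything inside it."""
    try:
        shutil.rmtree(dir_name)
    except FileNotFoundError:
        pass  # already removed: nothing to reset

with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "tutorial" / "DDD"
    storage.mkdir(parents=True)
    (storage / "USA_65OZDTOT.pkl").write_bytes(b"")
    reset_storage(Path(tmp) / "tutorial")
    print((Path(tmp) / "tutorial").exists())  # False
    reset_storage(Path(tmp) / "tutorial")     # already gone: no error
```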

Visit https://github.com/vladlpavlov/Pythagoras to learn more about Pythagoras.