In [1]:
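%run ./bonus_assignment.ipynb

# Imported explicitly so this cell does not depend on names leaked by %run
from IPython.core.magic import register_cell_magic, register_line_magic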

@register_cell_magic
def fakedata_testing(line, cell):
    # Catch every exception so the notebook keeps running and later tests still execute
    try:
        fakedata(line, cell)
        print("No exceptions encountered while generating DataFrames 🐼🎉🎈🎊")
    except Exception as e:
        template = "🚫🚫🚫🚫🚫 An exception of type {0} occurred. Arguments:\n{1!r}"
        message = template.format(type(e).__name__, e.args)
        print(message)

@register_line_magic
def test(line):
    try:
        exec(line)
        print("PASSED")
    except Exception as e:
        print("FAILED {}".format(e))

## How to test
* Use `%%fakedata_testing` instead of `%%fakedata` for the cell magic.
* Use the `%test` line magic for simple assertion tests.
* Every time you change `bonus_assignment.ipynb`, save that file and run "Restart & Run All" in this notebook.
* Exceptions are caught and printed rather than raised, so every test runs even if an earlier one fails.

Sample definition in the domain-specific language:
```
%%fakedata_testing
persons
-------
first_name
last_name*
phone_number
random_number(5) as customer_number [1]*

purchases
---------
isbn10
credit_card_full
random_number(3) as price
random_number(5) as customer_number [1]
```

Sample test for the generated DataFrame:
```
%test assert len(persons) == 99,'"persons" should have 99 rows'
```

In [12]:
%%fakedata_testing
persons
-------
first_name
last_name*
phone_number
random_number(5) as customer_number [1]*

purchases
---------
isbn10
credit_card_full
random_number(3) as price
random_number(5) as customer_number [1]

No exceptions encountered while generating DataFrames 🐼🎉🎈🎊


In [3]:
%test assert len(persons) == 99,'"persons" should have 99 rows'
%test assert len(purchases) == 99,'"purchases" should have 99 rows'
%test assert len(persons.last_name.value_counts()) == 99,'"persons.last_name" should have 99 unique values'
%test assert len(persons.customer_number.value_counts()) == 99,'"persons.customer_number" should have 99 unique values'
%test assert len(purchases.customer_number.value_counts()) < 99,'"purchases.customer_number" should have fewer than 99 unique values'
%test assert purchases.customer_number.isin(persons.customer_number).all(), '"purchases.customer_number" should be a subset of "persons.customer_number"'
%test assert (purchases.columns == ['isbn10', 'credit_card_full', 'price', 'customer_number']).all(), '''"purchases.columns" should be ['isbn10', 'credit_card_full', 'price', 'customer_number']'''

PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED


In [4]:
%%fakedata_testing
companies
-------
company*
catch_phrase
country
date_time_this_century as established

No exceptions encountered while generating DataFrames 🐼🎉🎈🎊


In [5]:
%test assert len(companies) == 99,'"companies" should have 99 rows'
%test assert len(companies.company.value_counts()) == 99,'"companies.company" should have 99 unique values'
%test assert (companies.columns == ['company', 'catch_phrase', 'country', 'established']).all(), '''"companies.columns" should be ['company', 'catch_phrase', 'country', 'established']'''

PASSED
PASSED
PASSED


---

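I added `df_size` support: an optional row count in square brackets after the DataFrame name (for example `apurchases [500]`). When it is omitted, `DEFAULT_DF_SIZE` rows are generated. The grammar is now: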

```
function_to_call  ::= <wordcharacters>
parameters        ::= "" | "(" ( <wordcharacters> | <number> ) ")"
as_name           ::= "" | "as" <whitespace> <wordcharacters>
column_name       ::= <as_name> | <function_to_call>
reference         ::= "" | "[" <number> "]"
unique_mark       ::= "" | "*"
column_definition ::= <function_to_call> <parameters> <whitespace> \
                      <as_name> <whitespace> <reference> <unique_mark>
df_size           ::= "" | "[" <integer> "]"
df_sep            ::= "--" ("-"*)
df_definition     ::= <wordcharacters> <df_size> <newline> <df_sep> <newline> \
                      (<column_definition>*) <newline> <newline>
language_spec     ::= <df_definition>*
```
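To make the grammar concrete, here is a minimal sketch of how a header line and a column line could be parsed with regular expressions. It is not the parser in `bonus_assignment.ipynb`; the names `Column`, `parse_column`, and `parse_header` are hypothetical, and the whitespace handling is a permissive assumption rather than a literal reading of the rules above.

```python
import re
from typing import NamedTuple, Optional


class Column(NamedTuple):
    """Hypothetical container for one parsed column_definition."""
    function: str             # function_to_call
    parameter: Optional[str]  # value inside "(...)", if any
    name: str                 # as_name if given, otherwise function_to_call
    reference: Optional[int]  # number inside "[...]", if any
    unique: bool              # True when the line ends with "*"


COLUMN_RE = re.compile(
    r"""^\s*
    (?P<function>\w+)                 # function_to_call
    (?:\((?P<parameter>\w+)\))?       # optional parameters
    (?:\s+as\s+(?P<alias>\w+))?       # optional as_name
    (?:\s*\[(?P<reference>\d+)\])?    # optional reference
    \s*(?P<unique>\*)?                # optional unique_mark
    \s*$""",
    re.VERBOSE,
)

# DataFrame name followed by an optional df_size in square brackets
HEADER_RE = re.compile(r"^\s*(?P<name>\w+)\s*(?:\[(?P<size>\d+)\])?\s*$")


def parse_column(line):
    """Parse one column line; raise ValueError if it does not match."""
    m = COLUMN_RE.match(line)
    if m is None:
        raise ValueError("not a column definition: {!r}".format(line))
    return Column(
        function=m["function"],
        parameter=m["parameter"],
        name=m["alias"] or m["function"],
        reference=int(m["reference"]) if m["reference"] else None,
        unique=m["unique"] is not None,
    )


def parse_header(line, default_size=99):
    """Parse a DataFrame header line into (name, size)."""
    m = HEADER_RE.match(line)
    if m is None:
        raise ValueError("not a DataFrame header: {!r}".format(line))
    size = int(m["size"]) if m["size"] else default_size
    return m["name"], size


print(parse_column("random_number(5) as customer_number [1]*"))
print(parse_header("apurchases [500]"))
```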
---
Feel free to play around with these as well:
```
DEFAULT_DF_SIZE = 99     # default number of rows per DataFrame
ORPHANED_UNIQUES = 0.2   # fraction of unique reference values that non-unique references never use
REPEAT_WEIGHTS = 1000    # sampling weight for repeated values; 1 gives equal weights
```
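The notebook does not show how these settings are applied, so the following is only a sketch of one plausible interpretation, not the code in `bonus_assignment.ipynb`. It assumes that a fraction `orphaned` of the unique values is held back and that each remaining value gets a random integer weight between 1 and `repeat_weights`. The function name `sample_references` and its parameters are hypothetical.

```python
import random


def sample_references(unique_values, size, orphaned=0.2, repeat_weights=1000, seed=None):
    """Draw `size` non-unique references from a pool of unique values.

    Assumed behaviour (not confirmed by the source):
    - a fraction `orphaned` of the pool is never referenced;
    - each remaining value gets a random weight in [1, repeat_weights],
      so repeat_weights=1 makes every value equally likely.
    """
    rng = random.Random(seed)
    pool = list(unique_values)
    rng.shuffle(pool)
    n_orphans = int(len(pool) * orphaned)
    usable = pool[n_orphans:]
    weights = [rng.randint(1, repeat_weights) for _ in usable]
    return rng.choices(usable, weights=weights, k=size)


refs = sample_references(range(20), size=500, orphaned=0.1, seed=0)
print(len(refs), len(set(refs)))
```

Under these assumptions a larger `repeat_weights` produces a more skewed distribution, because a few values can receive much larger weights than the rest.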

In [6]:
DEFAULT_DF_SIZE = 20     # default number of rows per DataFrame
ORPHANED_UNIQUES = 0.1   # fraction of unique reference values that non-unique references never use
REPEAT_WEIGHTS = 1000    # sampling weight for repeated values; 1 gives equal weights

In [7]:
%%fakedata_testing
apersons
-------
first_name
last_name*
phone_number
random_number(5) as customer_number [1]*

apurchases [500]
---------
isbn10
credit_card_full
random_number(3) as price
random_number(5) as customer_number [1]

No exceptions encountered while generating DataFrames 🐼🎉🎈🎊


In [8]:
%test assert len(apersons) == DEFAULT_DF_SIZE,'"apersons" should have {} rows'.format(DEFAULT_DF_SIZE)
%test assert len(apurchases) == 500,'"apurchases" should have 500 rows'
%test assert len(apersons.last_name.value_counts()) == DEFAULT_DF_SIZE,'"apersons.last_name" should have {} unique values'.format(DEFAULT_DF_SIZE)
%test assert len(apersons.customer_number.value_counts()) == DEFAULT_DF_SIZE,'"apersons.customer_number" should have {} unique values'.format(DEFAULT_DF_SIZE)
%test assert apurchases.customer_number.isin(apersons.customer_number).all(), '"apurchases.customer_number" should be a subset of "apersons.customer_number"'
%test assert (apurchases.columns == ['isbn10', 'credit_card_full', 'price', 'customer_number']).all(), '''"apurchases.columns" should be ['isbn10', 'credit_card_full', 'price', 'customer_number']'''

PASSED
PASSED
PASSED
PASSED
PASSED
PASSED


In [9]:
# Distribution of customer_number in the 'apurchases' DataFrame.
# Tweak DEFAULT_DF_SIZE, ORPHANED_UNIQUES, and REPEAT_WEIGHTS above and rerun the cells to see how they affect the result.
apurchases.customer_number.value_counts()

33577    76
58858    68
94729    65
35835    53
42591    49
15647    27
48655    26
40065    26
30468    22
46933    21
10964    20
80811    15
81752    15
4267      5
8938      4
15390     4
39274     3
10758     1
Name: customer_number, dtype: int64
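
The output above can be sanity-checked against the settings. The counts below are copied from the listing. The expected number of referenced values assumes that `ORPHANED_UNIQUES` removes that fraction of the `DEFAULT_DF_SIZE` unique customer numbers, which is an interpretation of the setting rather than something the notebook states.

```python
# Counts copied from the value_counts() output above
counts = [76, 68, 65, 53, 49, 27, 26, 26, 22, 21, 20, 15, 15, 5, 4, 4, 3, 1]

DEFAULT_DF_SIZE = 20
ORPHANED_UNIQUES = 0.1

# Assumption: orphaned uniques are simply excluded from the reference pool
expected_referenced = round(DEFAULT_DF_SIZE * (1 - ORPHANED_UNIQUES))

print("distinct customer numbers:", len(counts))
print("expected under the assumption:", expected_referenced)
print("total rows:", sum(counts))
```

Inside the notebook, the same numbers are available as `apurchases.customer_number.nunique()` and `len(apurchases)`.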

In [10]:
%%fakedata_testing
sad_persons [1001]
-------
last_name*

🚫🚫🚫🚫🚫 An exception of type AssertionError occurred. Arguments:
('Insufficient data (1000 available) from provider to generate 1001 "last_name".',)
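
The failure is expected: a unique column cannot have more rows than the provider has distinct values. A minimal sketch of such a guard follows. The helper `draw_unique` and the synthetic pool are hypothetical and only mirror the message format shown above; they are not the code in `bonus_assignment.ipynb`.

```python
import random


def draw_unique(pool, n, label, seed=None):
    """Return n distinct values from pool, or fail if the pool is too small."""
    available = sorted(set(pool))
    assert len(available) >= n, (
        'Insufficient data ({} available) from provider to generate {} "{}".'
        .format(len(available), n, label)
    )
    return random.Random(seed).sample(available, n)


# Synthetic stand-in for a provider with 1000 distinct last names
pool = ["name{}".format(i) for i in range(1000)]

print(len(draw_unique(pool, 1000, "last_name")))
try:
    draw_unique(pool, 1001, "last_name")
except AssertionError as e:
    print(e)
```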
