<div style='font-size:200%;font-weight:bold'>Title</div><br>

This skeleton notebook is intended as a one-stop reference for the structure and stanzas that
improve the readability, presentation, and style of your notebooks.

**NOTE:**
- The title is a styled sentence rather than an `h1`, to prevent it from being shown and numbered in the TOC.
- As of this commit, expect some mis-formatting on GitHub, as its renderer may not be
  100% faithful to Jupyter Lab.

<div style='color:firebrick'><b>NOTE:</b> this skeleton notebook is primarily for reading. To run it
completely, you need to install additional dependencies imported in the cell below.</div><br>

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

# Follow isort>=5 style: 'import ...' statements before 'from ... import ...'.
import pandas as pd
from IPython.display import Markdown
from smallmatter.ds import mask_df  # See: https://github.com/aws-samples/smallmatter-package/

# A few standard SageMaker stanzas. Use type annotations to be explicit.
import sagemaker as sm
role: str = sm.get_execution_role()
sess = sm.Session()
region: str = sess.boto_session.region_name

# Global setup

<details><summary style="font-size:60%">Note on heading</summary>

> This section starts with an `h1` heading. Thus, it will appear in the TOC as "*1. Global setup*".
>
> Do add a blank line before the closing details tag, otherwise GitHub won't collapse this portion.

</details><br>


This section contains Python variables that should be personalized, such as:
- the name of the Amazon S3 bucket and/or prefix, which may vary from one project member to another.
- the filename of the dataset to run.

We also show a pattern to automatically synchronize Python variables to environment variables.
The idea is to centralize all changes in this section only, so that you can safely run the remaining
cells without having to worry about outdated hardcoded values in the Python, `!`, and `%%` code.
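
For readers outside IPython, the same synchronization idea can be sketched in plain Python. This is a minimal illustration rather than the notebook's actual cell: it assumes the same example values for `bucket_name` and `prefix_name`, and it uses `os.environ` in place of the magics. A value assigned to `os.environ` is inherited by child processes started afterwards.

```python
import os
import subprocess
import sys

# Assumed example values, mirroring the "Change me" variables of this section.
bucket_name = "my-bucket-name"
prefix_name = "some/prefix"

# Derived value: recomputed whenever the variables above change.
s3_prefix = f"s3://{bucket_name}/{prefix_name}".rstrip("/")

# Export the derived value so that child processes inherit it.
os.environ["S3_PREFIX"] = s3_prefix

# A child process reads the same value from its own environment.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['S3_PREFIX'])"],
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())  # s3://my-bucket-name/some/prefix
```

The trailing `rstrip('/')` keeps the prefix free of a dangling slash when `prefix_name` is left empty.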

In [None]:
####################################################################################################
# Change me
####################################################################################################
bucket_name = 'my-bucket-name'
prefix_name = 'some/prefix'
####################################################################################################


####################################################################################################
# Do not change the next lines, as they're derived and will be recomputed automatically.
####################################################################################################
s3_prefix = f's3://{bucket_name}/{prefix_name}'.rstrip('/')

# Synchronize Python variable and environment variable.
%set_env S3_PREFIX=$s3_prefix
%env S3_PREFIX_CELL_SCOPE=$s3_prefix

# Demonstrate the difference between %env and %set_env.
!echo $S3_PREFIX_CELL_SCOPE  # Should print s3://my-bucket-name/some/prefix
!echo $S3_PREFIX             # Should print s3://my-bucket-name/some/prefix

In [None]:
# Demonstrate the difference between %env and %set_env
!echo $S3_PREFIX             # Should print s3://my-bucket-name/some/prefix
!echo $S3_PREFIX_CELL_SCOPE  # Should print an empty string

The next cell is a raw cell. Note that GitHub's renderer does not handle raw cells correctly, hence
it won't be shown correctly on GitHub.

# Improved output

In [None]:
def mask_userid(df: pd.DataFrame) -> pd.DataFrame:
    return mask_df(df, cols=['userid'])

df_a = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_b = pd.DataFrame({
    'userid': [1000, 2000, 3000],
    'pca_a': [0.1, 0.2, 0.3],
    'pca_b': [-0.3, 0.01, 0.7],
})

display(
    Markdown('## Plain dataframe\n**NOTE:** this also appears in TOC as "*2.1. Plain dataframe*"'),
    df_a,

    Markdown('''## Masked dataframe
Sometimes we would like to version the output of this cell in the git repo, to help readers
quickly see the shape of a dataframe.

However, when the dataframe contains sensitive values, care must be taken to
**<font style='color:firebrick;background-color:yellow'>NEVER</font>** version these values to git.
Otherwise, as you all know, once checked into the git history, it can be tedious and challenging to
undo the versioning.
'''
    ),
    mask_userid(df_b),

)

## Plain dataframe
**NOTE:** this also appears in TOC as "*2.1. Plain dataframe*"

|   | a | b |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |


## Masked dataframe
Sometimes we would like to version the output of this cell in the git repo, to help readers
quickly see the shape of a dataframe.

However, when the dataframe contains sensitive values, care must be taken to
**<font style='color:firebrick;background-color:yellow'>NEVER</font>** version these values to git.
Otherwise, as you all know, once checked into the git history, it can be tedious and challenging to
undo the versioning.


|   | userid | pca_a | pca_b |
|---|--------|-------|-------|
| 0 | xxx    | 0.1   | -0.3  |
| 1 | xxx    | 0.2   | 0.01  |
| 2 | xxx    | 0.3   | 0.7   |
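
If the `smallmatter` package is not available, the masking step can be approximated with pandas alone. The sketch below is an assumption-laden stand-in, not the implementation of `mask_df`: the helper name `mask_columns` and its `placeholder` parameter are hypothetical, and it simply overwrites every value in the chosen columns on a copy of the dataframe.

```python
from typing import Sequence

import pandas as pd


def mask_columns(df: pd.DataFrame, cols: Sequence[str], placeholder: str = "xxx") -> pd.DataFrame:
    """Return a copy of df with every value in cols replaced by placeholder.

    Hypothetical helper for illustration; not the smallmatter implementation.
    """
    masked = df.copy()
    for col in cols:
        # Assigning a scalar broadcasts it to every row of the column.
        masked[col] = placeholder
    return masked


df_b = pd.DataFrame({
    "userid": [1000, 2000, 3000],
    "pca_a": [0.1, 0.2, 0.3],
    "pca_b": [-0.3, 0.01, 0.7],
})

masked_b = mask_columns(df_b, cols=["userid"])
print(masked_b)
```

Working on a copy keeps the original dataframe intact for the rest of the analysis, so only the displayed (and versioned) output carries the placeholder.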


# Summary

When this notebook should be versioned without output, do a *Clear All Outputs* first.

When there are outputs to be versioned (as this skeleton notebook does), consider removing the
cell counts.
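
As a rough illustration of what removing the cell counts involves, here is a minimal Python sketch that works on the notebook's JSON directly. It assumes the nbformat 4 layout, where code cells and their `execute_result` outputs carry an `execution_count` field. The function names are hypothetical, and this is not the bash script mentioned in the footnote.

```python
import json
from pathlib import Path
from typing import Any, Dict


def clear_execution_counts(nb: Dict[str, Any]) -> Dict[str, Any]:
    """Set execution counts to None in a notebook dict (nbformat 4), in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            if output.get("output_type") == "execute_result":
                output["execution_count"] = None
    return nb


def clear_notebook_file(path: str) -> None:
    """Rewrite the notebook at `path` with its execution counts cleared."""
    nb_path = Path(path)
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    clear_execution_counts(nb)
    # indent=1 is a formatting choice made here, not a requirement.
    nb_path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")


# Small in-memory notebook to show the effect; outputs themselves are kept.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleared = clear_execution_counts(example)
print(cleared["cells"][1]["execution_count"])  # None
```

Calling `clear_notebook_file` on a notebook path (supplied by you) before committing would leave the outputs in place while dropping the counts.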

<details><summary style="font-size:60%">Footnote</summary>

> This skeleton notebook was run through the [clr-nb-xcnt.sh](https://github.com/verdimrc/pyutil/blob/master/bin/clr-nb-xcnt.sh) bash script to clear its cell
> counts. DISCLAIMER: the script is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
> OF ANY KIND, either express or implied, including, without limitation, any warranties or
> conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You
> are solely responsible for determining the appropriateness of using or redistributing the Work and
> assume any risks associated with Your exercise of permissions under this Apache License 2.0.

</details><br>
