<h1 class=section> Repository template </h1> 

<div class=info> In this notebook you will learn a few rules and tips that help you create a repository which is easy to manage and maintain. You will also use a few tools that facilitate the creation of such a repository.</div>

<h1 class=subsection> General guidelines </h1> 
<div class=warn> The following recommendations are not mandatory. Treat them as a guide. </div>
  <ol class=command_list>
  <li>Do not store data in your repository. Instead, load it from the data source in the appropriate module of your business logic.</li>
  <li>Notebooks should be treated as a tool for EDA and for presenting results, not for developing ML code or other software products.</li>
  <li>Common reusable code should be implemented in separate modules/functions and imported from there.</li>
  <li>Data analysis should be divided into steps, and the result of each step should be saved. Imagine a data processing chain with three time-consuming parts in which the second part fails: you should then be able to load the result of the first part and pass it to the second one instead of rerunning the whole pipeline.</li>
    <li>Your workspace in a repository should be built automatically and be easy to maintain.</li>
    <li>For security reasons do not store any sensitive data in your repository (passwords, access keys etc.).</li>
    
  
</ol>
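
<div class=info>The recommendation to save the result of each step can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the helper name <code>run_step</code>, the use of pickle files and the cache directory are all assumptions made for the example.</div>

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def run_step(name: str, func: Callable[[], Any], cache_dir: Path) -> Any:
    """Run `func` once and reuse its pickled result on later calls.

    `run_step` is a hypothetical helper written for this example.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    checkpoint = cache_dir / f"{name}.pkl"
    if checkpoint.exists():
        # A previous run already finished this step: load instead of recomputing.
        with checkpoint.open("rb") as fh:
            return pickle.load(fh)
    result = func()
    with checkpoint.open("wb") as fh:
        pickle.dump(result, fh)
    return result


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        raw = run_step("load", lambda: [3, 1, 2], cache)
        cleaned = run_step("clean", lambda: sorted(raw), cache)
        print(cleaned)
```

<div class=info>If a later step fails, rerunning the script loads the earlier results from the cache directory instead of recomputing them. Such a cache directory would normally be listed in .gitignore, and a checkpoint has to be deleted by hand when the code of its step changes.</div>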

<h1 class=subsection> Structure of a repository </h1> 
<div class=warn> Once more, treat the non-exhaustive list below as a set of recommended objects that should appear in your repository. Try to stick to the proposed structure as often as possible because it is intuitive and unifies various conventions. In line with the <a href="https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it">YAGNI</a> principle, do not create directories in advance that you do not need at the moment.</div>

<h1 class=subsubsection> Root directories </h1>
  <ol class=command_list>
    <li><strong>data</strong> - usually added to `.gitignore`. Keep here the data you need unless it is big or stored externally by convention.</li>
  <li><strong>notebooks</strong> - as the name suggests, all Jupyter notebook files go here.</li>
  <li><strong>models</strong> - (For ML projects) for storing models and associated statistics.</li>
  <li><strong>reports</strong> - for business purposes: put here whatever you want to present to your client/manager (`html`, `pdf`, `latex`, results of data analysis, etc.).</li>
  <li><strong>src/&lt;name_of_the_module&gt;</strong> - put all your source files here. You can add subdirectories to make your repo easier to use. For example, a data subdirectory can hold all scripts which download the required data, and a visualization subdirectory all scripts which generate visualizations of the results. </li>
    <li><strong>conf</strong> - put here configuration files.</li>
    <li><strong>logs</strong> - usually added to `.gitignore`. This folder serves as an output directory for logs of your scripts if you need them locally.</li>
    <li><strong>tests</strong> - here you implement tests for your application.</li>
  
</ol>

<h1 class=subsubsection> Root files </h1>

 <ol class=command_list>
    <li><strong>requirements.txt/setup.py</strong> - for managing dependencies and version requirements.</li>
  <li><strong>licence</strong> - always provide some licence, e.g. MIT, to avoid trouble (imagine a situation in which your program causes a crash in someone else's product).</li>
  <li><strong>Makefile</strong> - nowadays its functionality is frequently replaced by e.g. pre-commit. Nonetheless, it is still used e.g. to automate configuration, installation of a venv, cleaning of your repository, etc.</li>
  <li><strong>README.md</strong> - provide here all information about your project so that anyone who visits your repository can easily use it. When you are developing an application with a team, try to keep it up to date and make sure you inform your teammates about changes in this file so that the team is aware of e.g. new features, scripts, hooks, etc. </li>

  
</ol>
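
<div class=info>As one possible starting point for a reusable template, the layout described above could be generated by a small script. The sketch below is only an illustration: the function name <code>scaffold</code>, the default selection of directories and files, and the <code>.gitkeep</code> placeholder are assumptions. In the spirit of YAGNI, it creates only the entries it is asked for.</div>

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical defaults taken from the lists above; pass your own selection as needed.
DEFAULT_DIRS = ("notebooks", "src", "conf", "tests")
DEFAULT_FILES = ("README.md", "requirements.txt", "Makefile", ".gitignore")


def scaffold(
    root: Path,
    dirs: Iterable[str] = DEFAULT_DIRS,
    files: Iterable[str] = DEFAULT_FILES,
) -> List[Path]:
    """Create missing directories and empty files under `root`.

    Existing entries are left untouched. Returns the paths that were created.
    """
    created = []
    root.mkdir(parents=True, exist_ok=True)
    for name in dirs:
        path = root / name
        if not path.exists():
            path.mkdir(parents=True)
            # Git does not track empty directories, so add a placeholder file.
            (path / ".gitkeep").touch()
            created.append(path)
    for name in files:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created


if __name__ == "__main__":
    for created_path in scaffold(Path("my_project")):
        print(f"created {created_path}")
```

<div class=info>Running the script twice is safe: the second run finds everything in place and creates nothing, so files you have already filled in are not overwritten.</div>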

<h1 class=subsection> Creation of your own template of a git repository </h1> 

<div class=info>In this part your aim is to write from scratch a template repository for Python projects which should be reusable (as often as possible) and should handle Jupyter notebooks as well.</div>

<div class=warn>The exercises below propose some tools that may simplify the creation of your own repository. However, using these tools is not essential, and it is up to you to decide whether to take advantage of them. For example, you will be asked to handle dependencies and versioning using poetry, but as poetry says on its <a href="https://github.com/python-poetry/poetry">github page</a>: "Poetry replaces setup.py, requirements.txt, setup.cfg, MANIFEST.in and Pipfile with a simple pyproject.toml based project format", so instead you can stay with the traditional way and use the files listed in the quotation. Both approaches have potential pros and cons: poetry is higher level and thus easier to use, but if something fails it might be tiresome to find the cause. On the other hand, setting up the standard files (like setup.py etc.) might take a while, but you have control over almost everything. The situation is similar with Makefile vs pre-commit functionality. Although pre-commit is the more modern way to go, a makefile is still commonly used, for example to automate bash commands. </div>

<div class=warn>Sometimes the tasks below might be slightly imprecise: we assume that by now you know the basic rules governing git repositories, so it is not essential to say e.g. that a test should be implemented in the tests directory. Instead we will just write "write a test which does this and that". If you are not sure how to organize something, you are welcome to ask our lecturers. To sum up, try to stick to the general rules and conventions; the rest depends on you.</div> 

<div class=warn>Read all exercises below before doing any of them to get a bigger picture of the main task: creating a template of a Python project repository.</div> 

<h1 class=subsubsection> README.md </h1>

<div class=exercise>As you extend the functionality of your repo, keep README.md updated. This includes the description of the project, the packages used, information about the provided hooks, makefile commands, etc. Moreover, README.md should be neat and easy to read for any newcomer (even an inexperienced one). See for example <a href="https://github.com/awesomeahi95/Hotel_Review_NLP/blob/master/README.md">this github repository README.md</a> to learn nice markdown tools/tricks to make README.md user-friendly.</div>

<h1 class=subsection> Pre-commit </h1>

<div class=info> You might find <a href="https://github.com/cleder/awesome-python-testing#tools">this</a> and <a href="https://pre-commit.com/hooks.html">that</a> useful for this task.</div>

<div class=exercise>You are already familiar with the pre-commit tool. Extend the usual support for py files to Jupyter notebooks. Use a few of the available hooks for formatting and linting. Provide a few .ipynb files on which you can test this new functionality. Apart from the pre-commit stage, define at least one hook for the pre-push stage. Write at least one "local repo" hook: for example, it might be useful to have a command which removes the outputs of all notebooks (to prevent leaks during the push) or which runs some tests before the push. Do not forget to describe this functionality in README.md. Using the pre-commit tool, try to add some metadata to the commit message, such as the name of the committer, the date, or a Jira tag. You can also try to enforce checks on the commit message, e.g. that it is not too long (max 80 chars or so). In a real team project do not forget to inform your colleagues about these changes and make sure that they activate all pre-commit hooks. Moreover, add hooks which check yaml files and bash scripts. Write at least one bash script (it may be a dummy). Configure exclusions for some of the hooks (decide which) to prevent, for example, .venv (the virtual environment) from being checked.</div>
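
<div class=info>One way to approach the "remove outputs from notebooks" hook is a small Python script that pre-commit runs on the staged .ipynb files. The sketch below is an illustration under assumptions, not the expected solution: it assumes the nbformat 4 JSON layout (a top-level <code>cells</code> list whose code cells carry <code>outputs</code> and <code>execution_count</code>), and the function names are made up for the example. Ready-made tools such as nbstripout exist as well.</div>

```python
import json
import sys
from pathlib import Path
from typing import List


def strip_outputs(notebook: dict) -> bool:
    """Clear outputs and execution counts in place; return True if anything changed."""
    changed = False
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            cell["outputs"] = []
            cell["execution_count"] = None
            changed = True
    return changed


def main(paths: List[str]) -> int:
    """Rewrite each notebook that still has outputs.

    Returns 1 if any file was modified so that the hook fails and the
    cleaned files can be staged again, 0 otherwise.
    """
    exit_code = 0
    for name in paths:
        path = Path(name)
        notebook = json.loads(path.read_text(encoding="utf-8"))
        if strip_outputs(notebook):
            text = json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
            path.write_text(text, encoding="utf-8")
            print(f"stripped outputs: {path}")
            exit_code = 1
    return exit_code


if __name__ == "__main__":
    # pre-commit passes the matching staged file names as command-line arguments.
    sys.exit(main(sys.argv[1:]))
```

<div class=info>Such a script can be registered as a local hook restricted to .ipynb files. The same pattern (read the file names from the arguments, return a non-zero exit code on failure) also works for a commit message length check.</div>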

<h1 class=subsection> Makefile </h1>

<div class=warn> Since a makefile is not essential in your projects, you can postpone this exercise to the end of the day and do it only if time allows. However, we encourage you to do it to get used to the make syntax.</div>
<div class=exercise>Prepare in makefile two commands: 
     <ol class=command_list>
  <li>install: installs a virtual environment in the .venv directory and then installs all dependencies from requirements.txt</li>
    <li>clean: cleans your repository by removing cache, log and report files/directories.</li>
  
</ol>
    
       
<details class=hint>
<summary>Hint for setting shell in makefile</summary>
  <p>SHELL := /bin/bash</p>
</details>
    <details class=hint>
    <summary>Hint for install:</summary>
  <p>Since each line of a recipe may be executed in a separate shell, it is essential for installing requirements that you chain your commands with &&, that is, (activate venv) && (do_1_task_in_venv) && (do_2_task_in_venv) && ...</p>
</details>
    <details class=hint>
    <summary>Proposal of clean:</summary>
  <p><code> clean:
	@rm -vrf deps
	@rm -vrf .mypy_cache | grep "directory" || true
	@rm -vrf .pytest_cache
	@find . | grep -E "(__pycache__|\.pyc|\.pyo$$)" | xargs rm -rf
	@find . | grep -E "(\.log$$)" | xargs rm -rf
	@rm -vrf $(VENVNAME) | grep "'$(VENVNAME)'" || true
	@rm -vrf $(REPORTS_DIR)
    </code></p>
</details>
   </div> 
    
  

  <h1 class=subsection> Business report / visualization of data </h1>
 
     

<div class=exercise>
    Using notebooks, prepare some dummy plots. Provide a script (the choice of tool is left to you) which generates a dummy report that presents and describes these plots. 
    
</div>
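
<div class=info>Since the choice of tool is open, here is one possible sketch that relies only on the Python standard library: it embeds already saved PNG plots into a single standalone HTML page. The function name <code>build_report</code> and the file names in the usage part are hypothetical; the plots themselves are assumed to have been saved from your notebooks beforehand.</div>

```python
import base64
import html
from pathlib import Path
from typing import Iterable, Tuple


def build_report(title: str, figures: Iterable[Tuple[Path, str]]) -> str:
    """Return a standalone HTML page with each PNG embedded next to its description.

    `figures` is an iterable of (path_to_png, description) pairs.
    """
    parts = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'>",
        f"<title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for image_path, description in figures:
        # Embed the image as a data URI so the report is a single file.
        encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
        caption = html.escape(description)
        parts.append("<figure>")
        parts.append(f"<img src='data:image/png;base64,{encoded}' alt='{caption}'>")
        parts.append(f"<figcaption>{caption}</figcaption>")
        parts.append("</figure>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical paths: adjust them to wherever your notebooks save their plots.
    figures = [(Path("reports/figures/dummy_plot.png"), "A dummy plot.")]
    Path("reports/report.html").write_text(
        build_report("Dummy report", figures), encoding="utf-8"
    )
```

<div class=info>Embedding the images keeps the report in one file, which is convenient to send to a client or manager, at the cost of a larger HTML file.</div>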

  <h1 class=subsection> Configuration</h1>
 
     

<div class=exercise>
   Provide some configuration for the formatting and linting tools. Make sure that it is actually applied (not ignored). 
    
</div>

  <h1 class=subsection> Add-ons</h1>


<div class=exercise>
   If you are done with all the exercises above, think about things you might find useful for your work. For example, you may be interested in the pivottablejs module. Try it out! Check how it is implemented. If you still have plenty of time, have a look at some debugging tools like the IPython debugger. Write a simple notebook to test its functionality. Additionally, you can provide a script which converts Jupyter notebooks into py files (this might be useful if some functionality, like a hook, is not available for ipynb files whereas it is for py ones). Try to find out what else could be added to make your template even better than it is now :) Consult other participants and exchange your ideas with them.

</div>
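
<div class=info>The notebook-to-py conversion mentioned above can be sketched with the standard library alone. This is a simplified illustration under assumptions: it assumes the nbformat 4 JSON layout, the function names are made up for the example, and IPython magics (lines starting with <code>%</code> or <code>!</code>) are copied as they are and would need separate handling. The existing <code>jupyter nbconvert --to script</code> command covers the same need.</div>

```python
import json
from pathlib import Path


def notebook_to_script(notebook: dict) -> str:
    """Concatenate code cells and keep markdown cells as comments."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores the source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append(text)
        elif cell.get("cell_type") == "markdown":
            commented = "\n".join(f"# {line}".rstrip() for line in text.splitlines())
            chunks.append(commented)
    # Skip empty cells and separate the remaining ones with a blank line.
    return "\n\n".join(chunk for chunk in chunks if chunk.strip()) + "\n"


def convert(path: Path) -> Path:
    """Write `<name>.py` next to `<name>.ipynb` and return the new path."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    target = path.with_suffix(".py")
    target.write_text(notebook_to_script(notebook), encoding="utf-8")
    return target
```

<div class=info>The resulting py file can then be passed to hooks or linters that do not support ipynb files directly.</div>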

<h1 class=subsubsection> Just for proper styling of html </h1>

# If the custom.css stylesheet changes, rerun this cell to apply the changes to this notebook.
from IPython.display import HTML


def css_styling():
    # Read the stylesheet with a context manager so the file handle is closed.
    with open("../../../../style/custom.css", "r") as css_file:
        styles = css_file.read()
    return HTML(styles)


css_styling()