# 🗂️ Structuring a Repository

# Repo Structure

## Best Practices

A good program should:

- Have a logical and, if possible, standardized file tree structure.
- Be versioned with a version control tool such as Git.
- Run easily inside a virtual environment.
- Have unit tests to verify that modifications don't cause regressions.

We will focus here only on the repo structure.

Good repository (repo) structuring facilitates maintenance, collaboration, and deployment. There are many ways to proceed; the most common ones are presented here.

## Example of a Classic Python Repository Structure

```
my_project/
├── .gitignore
├── README.md
├── LICENSE
├── pyproject.toml (or setup.py)
├── requirements.txt
├── environment.yml (optional)
├── docs/ (optional)
│   └── ...
├── notebooks/ (optional)
│   ├── VP_1_exploration.ipynb
│   ├── VP_2_modelisation.ipynb
│   └── ...
├── tests/
│   └── ...
├── src/
│   └── my_project/
│       ├── __init__.py
│       ├── module1.py
│       └── module2.py
└── scripts/ (optional)
    └── ...
```
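
As an illustration, the layout above can be created with a few lines of Python. This is only a minimal sketch: the `scaffold` function and the `FILES` and `DIRS` lists are hypothetical names chosen for this example, and the files are created empty.

```python
import tempfile
from pathlib import Path

# Hypothetical lists mirroring the tree above; adapt them to your project.
FILES = [
    ".gitignore",
    "README.md",
    "LICENSE",
    "pyproject.toml",
    "requirements.txt",
    "src/my_project/__init__.py",
    "src/my_project/module1.py",
    "src/my_project/module2.py",
]
DIRS = ["docs", "notebooks", "tests", "scripts"]


def scaffold(root: Path) -> None:
    """Create empty placeholder folders and files under `root`."""
    for folder in DIRS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        path = root / file
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)


# Demo in a throwaway directory so nothing is written to the current folder.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_project"
    scaffold(project)
    for entry in sorted(project.rglob("*")):
        print(entry.relative_to(project))
```

For real projects, a template tool is usually a better choice than a hand-written script.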

- The most common files:
  - **`.gitignore`**: File listing files and folders to be ignored by Git.
  - **`README.md`**: Markdown file describing the project, its installation and usage.
  - **`LICENSE`**: File containing the project license.
  - **`requirements.txt`**: File listing the Python dependencies of the project.
- Common files:
  - **`setup.py`**: Python package installation script.
  - **`pyproject.toml`**: The modern, declarative alternative to `setup.py`, used to specify build tools, project metadata, and dependencies.
  - **`download_data.py`**: Script to download data.
  - **`environment.yml`**: Generally a conda environment, works a bit like a "requirements.txt" but can contain more information.
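  - **`__init__.py`**: File that marks a folder as a regular Python package. Since Python 3.3, a folder without it can still be imported as a namespace package, but including it remains common practice.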
- Frequent folders:
  - **`docs/`**: Contains the project documentation.
  - **`tests/`**: Contains unit and integration tests.
  - **`src/`**: Contains the project source code.
  - **`notebooks/`**: Contains various notebooks (often for exploration).
  - **`models/`**: Contains various models generated by the script.
  - **`data/`**: Contains all data used by the program. The content of this folder is often listed in `.gitignore`.

## Focus on Some Files

### `.gitignore`

File listing files and folders to be ignored by Git. It can ignore files based on their name, extension, or the folder they are in. The `*` character matches any sequence of characters except `/`, and a trailing `/` restricts a pattern to directories.

**Example**:
```
# Ignore virtual environments
venv/
.env/

# Ignore compiled Python files
__pycache__/
*.py[cod]

# Ignore local configuration files
*.env
*.ini
```
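
To get a feel for how the wildcard patterns behave, the sketch below tests a few file names with Python's `fnmatch` module. This is only an approximation: `fnmatch` is not Git's matcher (it knows nothing about directory patterns, negation with `!`, or `**`), and the `is_ignored` helper is a hypothetical name used for this example.

```python
from fnmatch import fnmatchcase

# File-name patterns taken from the example above (directory patterns omitted).
PATTERNS = ["*.py[cod]", "*.env", "*.ini"]


def is_ignored(filename: str) -> bool:
    """Return True if the bare file name matches one of the patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


for name in ["module.pyc", "module.py", "local.env", "settings.ini", "README.md"]:
    status = "ignored" if is_ignored(name) else "tracked"
    print(f"{name}: {status}")
```

To see which rule Git itself applies to a given path, `git check-ignore -v <path>` is the authoritative check.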

### `README.md`

Markdown file describing the project, its installation and usage.

### `requirements.txt`

File listing the Python dependencies of the project.

**Example**:
```
numpy==1.21.0
pandas==1.3.0
requests==2.26.0
flask==2.0.1
```
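
Each line above pins a package to an exact version with `==`. As a rough illustration of the format, the sketch below reads such pins into a dictionary. It assumes only simple `name==version` lines; the real requirement syntax is richer (version ranges, extras, environment markers), and `parse_pins` is a hypothetical helper, not part of pip.

```python
REQUIREMENTS = """\
numpy==1.21.0
pandas==1.3.0
requests==2.26.0
flask==2.0.1
"""


def parse_pins(text: str) -> dict[str, str]:
    """Map each package name to its pinned version.

    Handles only `name==version` lines; blank lines and comments are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, separator, version = line.partition("==")
        if not separator:
            raise ValueError(f"not an exact pin: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


print(parse_pins(REQUIREMENTS))
```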

### `setup.py`

**Description**: Installation script defining package metadata and dependencies required for installation.

**Example**:
```python
from setuptools import setup

setup(
    name='mypackage',
    version='0.0.1',
    install_requires=[
        'requests',
        'importlib-metadata; python_version<"3.10"',
    ],
)
```

### `pyproject.toml`

The modern, declarative alternative to `setup.py`, used to specify build tools, project metadata, and dependencies.

**Example**:
```toml
[project]
name = "hello-world"
version = "1.0.0"
description = "My first Python package"
requires-python = ">=3.8"
keywords = ["python", "first-project"]
authors = [
    {name = "John Doe", email = "john@example.com"},
]
dependencies = [
    "requests",
    "gidgethub[httpx]>4.0.0",
]
```

### `download_data.py`

Script to download data.

### `environment.yml`

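**Description**: Typically a conda environment specification stored in YAML format. It works a bit like a "requirements.txt" but can also declare the environment name, the channels to install from, the Python version, and non-Python dependencies.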

**Example**:
```yaml
name: my_project
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.0
  - pandas=1.3.0
  - requests=2.26.0
  - pip
  - pip:
      - flask==2.0.1
      - gunicorn==20.1.0
```

### `__init__.py`

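File that marks a folder as a regular Python package. Since Python 3.3, a folder without it can still be imported as a namespace package, but including it remains common practice. It runs when the package is first imported, which makes it a convenient place to re-export names and define the package version.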

**Example**:
```python
# __init__.py

from .module1 import my_function
from .module2 import my_class

__all__ = ["my_function", "my_class"]

__version__ = "0.1.0"
```
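
The effect of such re-exports can be checked with a small experiment. The sketch below builds a throwaway package in a temporary directory and imports it; the package name `demo_pkg` and its contents are hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway package named `demo_pkg` (hypothetical name).
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "module1.py").write_text("def my_function():\n    return 'hello'\n")
    (pkg / "__init__.py").write_text(
        "from .module1 import my_function\n"
        "__all__ = ['my_function']\n"
        "__version__ = '0.1.0'\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new files are seen
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        # Thanks to __init__.py, the function is available at the package root,
        # not only as demo_pkg.module1.my_function.
        result = demo_pkg.my_function()
        version = demo_pkg.__version__
    finally:
        sys.path.remove(tmp)

print(result, version)
```

Re-exporting in `__init__.py` lets users write `from my_project import my_function` without knowing which module defines it.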

## Cookiecutter

[Cookiecutter](https://github.com/cookiecutter/cookiecutter) is a tool that generates project structures from templates. It is particularly useful for standardizing the structure of your Python repositories.