# Giga Model Data

Return to [main](main.ipynb) docs.

This document describes the input data Giga models use and how to modify this data.

---

## Updating Object Store

The application provides a stand-alone country update notebook at `notebooks/dev/update-objstore.ipynb`
that can be used to edit countries and their default configurations within the app. This notebook can
also be used to update the caches for each country.

The tool expects two containers. The first stores all school files; each file name starts with the ISO 3 code of the country.
The second stores all infrastructure, cache, and cost files. Infrastructure and cache files are stored in a folder named
after the ISO 3 code of the country, and cost files are stored in a folder called `costs`.
A copy of the schools file is also saved in the second container, in the same folder as the infrastructure and cache files.
The tool compares this copy with the schools file in the first container to check whether the schools file has changed,
which would potentially render the caches invalid.
To change the names of the folders and files, modify `giga/utils/globals.py`.
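
The staleness check described above can be sketched as a content comparison. This is a minimal illustration, not the notebook's actual implementation: the function names `content_digest` and `caches_may_be_stale` are hypothetical, and comparing SHA-256 digests is an assumed way of detecting that the two copies differ.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw file content."""
    return hashlib.sha256(data).hexdigest()


def caches_may_be_stale(source_schools: bytes, saved_copy: bytes) -> bool:
    """True when the schools file differs from the copy saved next to the caches."""
    return content_digest(source_schools) != content_digest(saved_copy)


# Hypothetical file contents, for illustration only.
original = b"school_id,lat,lon\nA1,-1.95,30.06\n"
updated = b"school_id,lat,lon\nA1,-1.95,30.06\nA2,-2.00,30.10\n"

print(caches_may_be_stale(original, original))  # False
print(caches_may_be_stale(original, updated))   # True
```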


> **Note**: This data is stored on an Azure ML blob storage (or Google Cloud object store) instance behind a
> [DataStore](#data-store-interface) interface. This notebook will be compatible with (and use)
> the default data storage solution for the connectivity cost application.



### Updating a Country

To update a country, select **Update an existing country** at the top of the notebook.
From the drop-down, select the country to modify. A list of "issues", such as missing files, will appear.
After that, you can:
* Change the model configuration defaults, which are populated in the form below, by modifying
  the parameter values under each model section.
* Provide one or more CSV files with additional country configuration data. If you do
  not provide a file, the existing one will be preserved.
    * Fiber file: location and properties of fiber nodes
    * Cellular file: location and properties of cell towers
* Calculate or recalculate any of the caches:
    * Fiber cache: distances of schools to fiber nodes
    * Cellular cache: distances of schools to cell towers
    * P2P cache: distances of schools to visible cell towers
    * School cache: distances between pairs of schools
    * School visibility cache: distances between pairs of visible schools
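
As an illustration of what a distance cache holds, the sketch below finds the nearest fiber node to a school. It assumes great-circle (haversine) distance in kilometers; the source does not state which distance metric the caches actually use, and the names `haversine_km` and `nearest_node` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_node(school, nodes):
    """Return (node_id, distance_km) for the closest node, or (None, inf) if there are none."""
    best_id, best_km = None, math.inf
    for node_id, (lat, lon) in nodes.items():
        d = haversine_km(school[0], school[1], lat, lon)
        if d < best_km:
            best_id, best_km = node_id, d
    return best_id, best_km


# Hypothetical coordinates, for illustration only.
fiber_nodes = {"n1": (0.0, 1.0), "n2": (0.0, 0.5)}
node_id, km = nearest_node((0.0, 0.0), fiber_nodes)
print(node_id, round(km, 1))
```

Returning `inf` when no node exists mirrors the `inf` default of `fiber_node_distance` in the school data.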

To finalize any changes to configuration parameters or to upload new files, click "Save Country" at the bottom of the notebook. Your changes will be
validated before being saved.



## Data types


### School Data

School data is provided as a CSV file containing the following fields:

| Field               | Type          | Description                        | Default    |
| ------------------- | ------------- | ---------------------------------- | ---------- |
| school_id           | str           | School identifier from source      | Mandatory  |
| name                | str           | School name                        | Mandatory* |
| lat                 | float         | latitude coordinate                | Mandatory  |
| lon                 | float         | longitude coordinate               | Mandatory  |
| admin1              | str           | Administrative unit level 1        | Mandatory* |
| admin2              | str           | Administrative unit level 2        | Mandatory* |
| admin3              | str           | Administrative unit level 3        | Mandatory* |
| admin4              | str           | Administrative unit level 4        | Mandatory* |
| education_level     | str           | Education level                    | Mandatory* |
| giga_id_school      | str           | Unique school identifier           | Mandatory  |
| school_region       | str           | Urban or Rural                     | Mandatory* |
| connectivity        | str           | Yes or No school connected         | Mandatory  |
| type_connectivity   | str           | source of connectivity             | Mandatory* |
| electricity         | str           | Yes or No school electricity       | Mandatory* |
| cell_coverage_type  | str           | 4G, 3G, 2G, etc.                   | Mandatory* |
| has_electricity     | bool          | school has electricity             | False      |
| has_fiber           | bool          | school is connected with fiber     | False      |
| connected           | bool          | whether school is connected        | False      |
| num_students        | int           | number of students                 | None       |
| fiber_node_distance | float         | distance to the nearest fiber node | inf        |
| nearest_LTE_distance| float         | distance to the nearest cell tower | inf        |
| bandwidth_demand    | float         | minimum bandwidth demand           | 20.0       |
| power_required_watts| float         | power required at the school       | 11000.0    |

Mandatory* means that the field must exist but can be empty for all rows. Some of the bool fields that have
a default are calculated from their string counterparts (e.g., `has_electricity` from `electricity`).
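
A minimal sketch of checking a schools file against the table above and deriving the boolean fields. The helper names are hypothetical, and the rule used here (a case-insensitive "yes" maps to `True`, anything else including an empty value maps to `False`) is an assumption; the pairing of `connected` with `connectivity` is likewise assumed by analogy with `has_electricity` and `electricity`.

```python
import csv
import io

# Columns marked Mandatory or Mandatory* in the table above.
MANDATORY_COLUMNS = [
    "school_id", "name", "lat", "lon",
    "admin1", "admin2", "admin3", "admin4",
    "education_level", "giga_id_school", "school_region",
    "connectivity", "type_connectivity", "electricity", "cell_coverage_type",
]


def missing_columns(header):
    """Return the mandatory columns that are absent from a CSV header."""
    return [c for c in MANDATORY_COLUMNS if c not in header]


def yes_no_to_bool(value):
    """Assumed mapping: 'yes' in any letter case is True, everything else is False."""
    return value.strip().lower() == "yes"


# A one-row example file; Mandatory* columns are present but left empty.
values = ["S1", "Example School", "-1.95", "30.06", "", "", "", "", "",
          "G1", "", "Yes", "", "No", ""]
sample = ",".join(MANDATORY_COLUMNS) + "\n" + ",".join(values) + "\n"

reader = csv.DictReader(io.StringIO(sample))
print(missing_columns(reader.fieldnames))  # []

row = next(reader)
row["connected"] = yes_no_to_bool(row["connectivity"])
row["has_electricity"] = yes_no_to_bool(row["electricity"])
print(row["connected"], row["has_electricity"])  # True False
```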

### Fiber Node Data

Fiber nodes for a country can be specified as unique coordinates, using the schema below, in a CSV table in the country's workspace:


| Field         | Type          | Description                   |
| ------------- | ------------- | ----------------------------- |
| coordinate_id | str           | Unique coordinate identifier |
| coordinate    | LatLonPoint   | Latitude and longitude point  |
| properties    | json (optional) | Additional properties         |

---

### Cell Tower Data

Cell tower data for a country can be specified, using the schema below, in a CSV table in the country's workspace:

| Field        | Type                     | Description                      |
| ------------ | ------------------------ | -------------------------------- |
| tower_id     | str                      | Unique tower identifier          |
| operator     | str                      | Cellular tower operator          |
| outdoor      | bool                     | Whether the tower is outdoor     |
| lat          | float                    | Latitude of the tower            |
| lon          | float                    | Longitude of the tower           |
| height       | float                    | Height of the tower in meters    |
| technologies | List[CellTechnology]     | List of supported technologies [2G, 3G, 4G, LTE] |
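
The cell tower fields above could be checked before upload with a small record type such as the one below. This is a sketch only: `TowerRecord` is a hypothetical name, not a class from the modeling library, and the coordinate range check is an added assumption.

```python
from dataclasses import dataclass
from typing import List

# Technology labels listed in the table above.
ALLOWED_TECHNOLOGIES = {"2G", "3G", "4G", "LTE"}


@dataclass
class TowerRecord:
    tower_id: str
    operator: str
    outdoor: bool
    lat: float
    lon: float
    height: float  # meters
    technologies: List[str]

    def __post_init__(self):
        # Reject technology labels outside the supported set.
        unknown = set(self.technologies) - ALLOWED_TECHNOLOGIES
        if unknown:
            raise ValueError(f"Unsupported technologies: {sorted(unknown)}")
        # Assumed sanity check on coordinate ranges.
        if not (-90 <= self.lat <= 90 and -180 <= self.lon <= 180):
            raise ValueError("lat/lon out of range")


# Hypothetical tower, for illustration only.
tower = TowerRecord("t-001", "ExampleTel", True, -1.95, 30.06, 35.0, ["3G", "4G"])
print(tower.technologies)  # ['3G', '4G']
```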

---



## Data Store Interface

The application accesses underlying country information through a `DataStore` interface.

`/giga/data/store/stores.py` contains the global data store configuration for the application.
Note the three stores defined in that file:

* **LocalFS**: Reads and writes data to the local filesystem of the server running the application.
  In deployed environments, editing files persistently requires a redeployment.
* **GCSDataStore**: An implementation of the interface that uses a Google Cloud Storage object store
  to house country information, shared across all active runners.
* **ADLSDataStore**: An implementation of the interface that uses an Azure ML blob store
  to house country information, shared across all active runners.

To change which backend is used for the application's country data store, modify the
`COUNTRY_DATA_STORE` variable to the desired implementation.

> **Note**: The GCSDataStore uses [service account credentials](https://cloud.google.com/iam/docs/service-account-creds)
> to authenticate with Google Cloud Storage. These credentials can be supplied in the deployment
> environment or the local filesystem. For more, see the [deployment docs](./dev.md).

The **DataStore** implements the following interface:

```python
from abc import ABC, abstractmethod
from typing import IO, Any, Generator, List


class DataStore(ABC):
    """
    Abstract base class for a data store. This can be a local filesystem,
    Google Cloud Storage, or any other system where you can store data.
    """

    @abstractmethod
    def read_file(self, path: str) -> Any:
        """
        Read a file from the data store.
        :param path: Path to the file in the data store.
        :return: The content of the file.
        """
        pass

    @abstractmethod
    def write_file(self, path: str, data: Any) -> None:
        """
        Write data to a file in the data store.
        :param path: Path to the file in the data store.
        :param data: The data to write.
        """
        pass

    @abstractmethod
    def file_exists(self, path: str) -> bool:
        """
        Check if a file exists in the data store.
        :param path: Path to the file in the data store.
        :return: True if the file exists, False otherwise.
        """
        pass

    @abstractmethod
    def list_files(self, path: str) -> List[str]:
        """
        Lists all files in a given directory path.
        :param path: The directory path.
        :return: A list of file names.
        """
        pass

    @abstractmethod
    def walk(self, top: str) -> Generator:
        """
        Generate the file names in a directory tree by walking the tree either top-down or bottom-up.
        For each directory in the tree rooted at directory top, it yields a 3-tuple: (dirpath, dirnames, filenames).
        :param top: The root directory path.
        """
        pass

    @abstractmethod
    def open(self, file: str, mode: str = 'r') -> IO:
        """
        Open a file.
        :param file: The file path.
        :param mode: The mode in which the file is opened.
        :return: a file object.
        """
        pass

    @abstractmethod
    def is_file(self, path: str) -> bool:
        """
        Check if the path points to a file.
        :param path: The file path.
        :return: True if the path points to a file, False otherwise.
        """
        pass

    @abstractmethod
    def is_dir(self, path: str) -> bool:
        """
        Check if the path points to a directory.
        :param path: The path to check.
        :return: True if the path is a directory, False otherwise.
        """
        pass

    @abstractmethod
    def remove(self, path: str) -> None:
        """
        Attempts to remove a file
        """
        pass

    @abstractmethod
    def rmdir(self, dir: str) -> None:
        """
        Attempts to remove a directory and its contents
        """
        pass
```
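
To show how a backend might satisfy this interface, here is a minimal local-filesystem sketch. `SimpleLocalStore` is a hypothetical class, not the application's `LocalFS`; it covers only a subset of the methods (no `open` or `walk`), handles text files only, and omits the `DataStore` base class so that the example runs on its own. In the application, a real backend would subclass `DataStore`.

```python
import os
import shutil
import tempfile
from typing import List


class SimpleLocalStore:
    """Hypothetical store that keeps every path under a single root directory."""

    def __init__(self, root: str):
        self.root = root

    def _full(self, path: str) -> str:
        return os.path.join(self.root, path)

    def read_file(self, path: str) -> str:
        with open(self._full(path), "r", encoding="utf-8") as f:
            return f.read()

    def write_file(self, path: str, data: str) -> None:
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)  # create parent folders
        with open(full, "w", encoding="utf-8") as f:
            f.write(data)

    def file_exists(self, path: str) -> bool:
        return os.path.exists(self._full(path))

    def list_files(self, path: str) -> List[str]:
        folder = self._full(path)
        return sorted(n for n in os.listdir(folder) if os.path.isfile(os.path.join(folder, n)))

    def is_file(self, path: str) -> bool:
        return os.path.isfile(self._full(path))

    def is_dir(self, path: str) -> bool:
        return os.path.isdir(self._full(path))

    def remove(self, path: str) -> None:
        os.remove(self._full(path))

    def rmdir(self, dir: str) -> None:
        shutil.rmtree(self._full(dir))  # removes the directory and its contents


# Folder and file names below are illustrative only.
with tempfile.TemporaryDirectory() as root:
    store = SimpleLocalStore(root)
    store.write_file("RWA/fiber.csv", "coordinate_id,coordinate\n")
    print(store.list_files("RWA"))  # ['fiber.csv']
    store.rmdir("RWA")
    print(store.is_dir("RWA"))      # False
```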

## Data Schemas

The schemas below define key data types used in the modeling library.
The definitions are roughly broken down into three categories: model configuration, input data definitions, and output data definitions.

### Unique Coordinate

```json
{
    "title": "UniqueCoordinate",
    "description": "Uniquely identifiable lat/lon coordinate",
    "type": "object",
    "properties": {
        "coordinate_id": {
            "title": "Coordinate Id",
            "type": "string"
        },
        "coordinate": {
            "title": "Coordinate",
            "type": "array",
            "minItems": 2,
            "maxItems": 2,
            "items": [
                {
                    "type": "number"
                },
                {
                    "type": "number"
                }
            ]
        },
        "properties": {
            "title": "Properties",
            "type": "object"
        }
    },
    "required": [
        "coordinate_id",
        "coordinate"
    ]
}
```
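
For illustration, the checks encoded in the schema above can be written out by hand. The validator below is a sketch with a hypothetical name; it does not use a JSON Schema library, and the `[lat, lon]` ordering in the example is inferred from the `LatLonPoint` type name rather than stated by the schema.

```python
def validate_unique_coordinate(obj: dict) -> list:
    """Return a list of problems found; an empty list means the object passes."""
    errors = []
    if not isinstance(obj.get("coordinate_id"), str):
        errors.append("coordinate_id must be a string")

    coord = obj.get("coordinate")
    is_pair = isinstance(coord, (list, tuple)) and len(coord) == 2
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not (is_pair and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in coord
    )):
        errors.append("coordinate must be an array of exactly two numbers")

    if "properties" in obj and not isinstance(obj["properties"], dict):
        errors.append("properties must be an object")
    return errors


# Hypothetical fiber node, for illustration only.
node = {"coordinate_id": "node-1", "coordinate": [-1.95, 30.06], "properties": {"owner": "example"}}
print(validate_unique_coordinate(node))                        # []
print(validate_unique_coordinate({"coordinate_id": "node-2"}))
```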

### School Entity

```json
{
    "title": "GigaSchool",
    "description": "Definition of a single school",
    "type": "object",
    "properties": {
        "school_id": {
            "title": "School Id",
            "type": "string"
        },
        "name": {
            "title": "Name",
            "type": "string"
        },
        "lat": {
            "title": "Lat",
            "type": "number"
        },
        "lon": {
            "title": "Lon",
            "type": "number"
        },
        "admin1": {
            "title": "Admin 1 Name",
            "type": "string"
        },
        "admin2": {
            "title": "Admin 2 Name",
            "type": "string"
        },
        "admin3": {
            "title": "Admin 3 Name",
            "type": "string"
        },
        "admin4": {
            "title": "Admin 4 Name",
            "type": "string"
        },
        "education_level": {
            "$ref": "#/definitions/EducationLevel"
        },
        "giga_id_school": {
            "title": "Giga Id School",
            "type": "string"
        },
        "school_zone": {
            "$ref": "#/definitions/SchoolZone"
        },
        "connected": {
            "title": "Connected",
            "default": false,
            "type": "boolean"
        },
        "connectivity": {
            "title": "Connectivity Yes or No",
            "type": "string"
        },
        "type_connectivity": {
            "title": "Type of Connectivity",
            "type": "string"
        },
        "electricity": {
            "title": "Electricity Yes or No",
            "type": "string"
        },
        "has_electricity": {
            "title": "Has Electricity",
            "default": false,
            "type": "boolean"
        },
        "connectivity_status": {
            "title": "Connectivity Status",
            "type": "string"
        },
        "bandwidth_demand": {
            "title": "Bandwidth Demand",
            "default": 20.0,
            "type": "number"
        },
        "has_fiber": {
            "title": "Has Fiber",
            "default": false,
            "type": "boolean"
        },
        "cell_coverage_type": {
            "title": "Cell coverage type",
            "type": "string"
        },
        "num_students": {
            "title": "Number of students",
            "type": "number"
        },
        "fiber_node_distance": {
            "title": "Fiber node distance",
            "type": "number"
        },
        "nearest_LTE_distance": {
            "title": "Nearest LTE tower distance",
            "type": "number"
        },
        "power_required_watts": {
            "title": "Power Required in Watts",
            "type": "number"
        }
    },
    "required": [
        "school_id",
        "name",
        "lat",
        "lon",
        "admin1",
        "admin2",
        "admin3",
        "admin4",
        "education_level",
        "giga_id_school",
        "school_zone"
    ],
    "definitions": {
        "EducationLevel": {
            "title": "EducationLevel",
            "description": "Valid level of education",
            "enum": [
                "Primary",
                "Secondary",
                "Other",
                ""
            ],
            "type": "string"
        },
        "SchoolZone": {
            "title": "SchoolZone",
            "description": "Valid school zone environment",
            "enum": [
                "rural",
                "urban",
                ""
            ],
            "type": "string"
        }
    }
}
```

### Cell Tower

```json
{
  "info": {
    "title": "Cellular Tower API"
  },
  "components": {
    "schemas": {
      "CellularTower": {
        "title": "CellularTower",
        "type": "object",
        "properties": {
          "tower_id": {
            "title": "Tower Id",
            "type": "string"
          },
          "operator": {
            "title": "Operator",
            "type": "string"
          },
          "outdoor": {
            "title": "Outdoor",
            "type": "boolean"
          },
          "lat": {
            "title": "Latitude",
            "type": "number",
            "format": "float"
          },
          "lon": {
            "title": "Longitude",
            "type": "number",
            "format": "float"
          },
          "height": {
            "title": "Height",
            "type": "number",
            "format": "float"
          },
          "technologies": {
            "title": "Technologies",
            "type": "array",
            "items": {
              "type": "string",
              "enum": [
                "2G",
                "3G",
                "4G",
                "LTE"
              ]
            },
            "uniqueItems": true
          }
        },
        "required": [
          "tower_id",
          "operator",
          "outdoor",
          "lat",
          "lon",
          "height",
          "technologies"
        ]
      }
    }
  }
}

```


### Fiber Model Configuration

```json
{
    "title": "FiberTechnologyCostConf",
    "type": "object",
    "properties": {
        "capex": {
            "$ref": "#/definitions/FiberCapex"
        },
        "opex": {
            "$ref": "#/definitions/FiberOpex"
        },
        "constraints": {
            "$ref": "#/definitions/FiberConstraints"
        },
        "technology": {
            "title": "Technology",
            "default": "Fiber",
            "type": "string"
        },
        "electricity_config": {
            "$ref": "#/definitions/ElectricityCostConf"
        }
    },
    "required": [
        "capex",
        "opex",
        "constraints"
    ],
    "definitions": {
        "FiberCapex": {
            "title": "FiberCapex",
            "type": "object",
            "properties": {
                "cost_per_km": {
                    "title": "Cost Per Km",
                    "type": "number"
                },
                "fixed_costs": {
                    "title": "Fixed Costs",
                    "default": 0.0,
                    "type": "number"
                },
                "economies_of_scale": {
                    "title": "Economies Of Scale",
                    "default": true,
                    "type": "boolean"
                },
                "schools_as_fiber_nodes": {
                    "title": "Schools as fiber nodes",
                    "default": true,
                    "type": "boolean"
                }
            },
            "required": [
                "cost_per_km"
            ]
        },
        "FiberOpex": {
            "title": "FiberOpex",
            "type": "object",
            "properties": {
                "cost_per_km": {
                    "title": "Cost Per Km",
                    "type": "number"
                },
                "fixed_costs": {
                    "title": "Fixed costs",
                    "type": "number"
                },
                "annual_bandwidth_cost_per_mbps": {
                    "title": "Annual Bandwidth Cost Per Mbps",
                    "default": 0.0,
                    "type": "number"
                }
            },
            "required": [
                "cost_per_km"
            ]
        },
        "FiberConstraints": {
            "title": "FiberConstraints",
            "type": "object",
            "properties": {
                "maximum_connection_length": {
                    "title": "Maximum Connection Length",
                    "default": Infinity,
                    "type": "number"
                },
                "maximum_bandwithd": {
                    "title": "Maximum Bandwithd",
                    "default": 2000,
                    "type": "number"
                },
                "required_power": {
                    "title": "Required Power",
                    "default": 500,
                    "type": "number"
                },
                "correction_coeficient": {
                    "title": "Correction coeficient",
                    "default": 1.2,
                    "type": "number"
                }
            }
        },
        "ElectricityCapexConf": {
            "title": "ElectricityCapexConf",
            "type": "object",
            "properties": {
                "solar_panel_costs": {
                    "title": "Solar Panel Costs",
                    "type": "number"
                },
                "battery_costs": {
                    "title": "Battery Costs",
                    "type": "number"
                }
            },
            "required": [
                "solar_panel_costs",
                "battery_costs"
            ]
        },
        "ElectricityOpexConf": {
            "title": "ElectricityOpexConf",
            "type": "object",
            "properties": {
                "cost_per_kwh": {
                    "title": "Cost Per Kwh",
                    "type": "number"
                }
            },
            "required": [
                "cost_per_kwh"
            ]
        },
        "ElectricityCostConf": {
            "title": "ElectricityCostConf",
            "type": "object",
            "properties": {
                "capex": {
                    "$ref": "#/definitions/ElectricityCapexConf"
                },
                "opex": {
                    "$ref": "#/definitions/ElectricityOpexConf"
                }
            },
            "required": [
                "capex",
                "opex"
            ]
        }
    }
}
```
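
The sketch below shows how a minimal fiber configuration could be completed with the defaults declared in the schema above. The helper `complete_fiber_conf` is hypothetical and all cost values are made up; the key names (including `maximum_bandwithd` and `correction_coeficient`) are copied from the schema as written.

```python
import math

# Defaults taken from the schema above.
FIBER_DEFAULTS = {
    "capex": {"fixed_costs": 0.0, "economies_of_scale": True, "schools_as_fiber_nodes": True},
    "opex": {"annual_bandwidth_cost_per_mbps": 0.0},
    "constraints": {
        "maximum_connection_length": math.inf,
        "maximum_bandwithd": 2000,
        "required_power": 500,
        "correction_coeficient": 1.2,
    },
}


def complete_fiber_conf(conf: dict) -> dict:
    """Check required fields and return a copy with schema defaults filled in."""
    for section in ("capex", "opex", "constraints"):
        if section not in conf:
            raise ValueError(f"missing required section: {section}")
    for section in ("capex", "opex"):
        if "cost_per_km" not in conf[section]:
            raise ValueError(f"{section}.cost_per_km is required")

    completed = {"technology": "Fiber", **conf}
    for section, defaults in FIBER_DEFAULTS.items():
        # Values supplied by the caller take precedence over defaults.
        completed[section] = {**defaults, **conf[section]}
    return completed


# Hypothetical values, for illustration only.
minimal = {
    "capex": {"cost_per_km": 8000.0},
    "opex": {"cost_per_km": 100.0},
    "constraints": {"maximum_connection_length": 20.0},
}
full = complete_fiber_conf(minimal)
print(full["capex"])
print(full["constraints"])
```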

### School Connection Cost

```json
{
    "title": "SchoolConnectionCosts",
    "type": "object",
    "properties": {
        "school_id": {
            "title": "School Id",
            "type": "string"
        },
        "capex": {
            "title": "Capex",
            "type": "number"
        },
        "opex": {
            "title": "Opex",
            "type": "number"
        },
        "opex_provider": {
            "title": "Opex Provider",
            "type": "number"
        },
        "opex_consumer": {
            "title": "Opex Consumer",
            "type": "number"
        },
        "technology": {
            "$ref": "#/definitions/ConnectivityTechnology"
        },
        "feasible": {
            "title": "Feasible",
            "default": true,
            "type": "boolean"
        },
        "reason": {
            "title": "Reason",
            "type": "string"
        },
        "electricity": {
            "$ref": "#/definitions/PowerConnectionCosts"
        }
    },
    "required": [
        "school_id",
        "capex",
        "opex",
        "opex_provider",
        "opex_consumer",
        "technology"
    ],
    "definitions": {
        "ConnectivityTechnology": {
            "title": "ConnectivityTechnology",
            "description": "Technologies that can be assessed in modeling scenarios",
            "enum": [
                "Fiber",
                "Cellular",
                "Satellite",
                "None"
            ],
            "type": "string"
        },
        "PowerConnectionCosts": {
            "title": "PowerConnectionCosts",
            "type": "object",
            "properties": {
                "electricity_opex": {
                    "title": "Electricity Opex",
                    "default": 0.0,
                    "type": "number"
                },
                "electricity_capex": {
                    "title": "Electricity Capex",
                    "default": 0.0,
                    "type": "number"
                },
                "cost_type": {
                    "title": "Cost Type",
                    "default": "Grid",
                    "enum": [
                        "Grid",
                        "Solar"
                    ],
                    "type": "string"
                }
            }
        }
    }
}
```
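
As a usage illustration, per-school results shaped like the schema above can be aggregated as follows. The function `summarize_costs` and all figures are hypothetical; treating a missing `feasible` key as `True` follows the default declared in the schema.

```python
def summarize_costs(costs: list) -> dict:
    """Total capex and opex over the schools whose connection is feasible."""
    feasible = [c for c in costs if c.get("feasible", True)]
    return {
        "schools": len(costs),
        "feasible": len(feasible),
        "total_capex": sum(c["capex"] for c in feasible),
        "total_opex": sum(c["opex"] for c in feasible),
    }


# Hypothetical results, for illustration only.
results = [
    {"school_id": "S1", "capex": 1000.0, "opex": 200.0, "opex_provider": 150.0,
     "opex_consumer": 50.0, "technology": "Fiber"},
    {"school_id": "S2", "capex": 2500.0, "opex": 300.0, "opex_provider": 200.0,
     "opex_consumer": 100.0, "technology": "Cellular", "feasible": True},
    {"school_id": "S3", "capex": 0.0, "opex": 0.0, "opex_provider": 0.0,
     "opex_consumer": 0.0, "technology": "None", "feasible": False,
     "reason": "example reason"},
]

summary = summarize_costs(results)
print(summary)
```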
