diff --git a/notebook/Release Overview WranglerV1.ipynb b/notebook/Release Overview WranglerV1.ipynb new file mode 100644 index 0000000..aac7589 --- /dev/null +++ b/notebook/Release Overview WranglerV1.ipynb @@ -0,0 +1,252 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Release Overview" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Release Overview of Wrangler v1.0\n", + "\n", + "*Compared to pre-1.0*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Features\n", + "\n", + "1. I/O and Mutation Speed\n", + "2. Flexible serialization formats\n", + "3. Improved stability\n", + "4. Selection flexibility for transit based on any feature or by link or node\n", + "5. Faster, more flexible conversion to GeoDataFrames for (almost) any data part\n", + "6. Easy data clipping to geographic bounds\n", + "7. Scripts for actions you might want to execute from the command line (e.g., data conversion, clipping)\n", + "8. Logging\n", + "9. Error directions\n", + "10. Implicit and fast validation\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tech Overhead Investment\n", + "\n", + "1. Separation of concerns prevents circular reference collisions, limits import bloat, and improves legibility/organization.\n", + " - Project card functionality in project card repo\n", + " - Separate modules for separate functionality\n", + "2. Reliability achieved through more testable code and expanded test coverage, with more test cases that also cover anti-patterns\n", + " - Testable code\n", + " - Test coverage\n", + "3. Explicit data models that make complex data structures obvious and easy to validate\n", + " - Legible\n", + " - Self-documenting\n", + " - Flexible\n", + " - Validatable\n", + "4. 
Clean code principles that make code easier to test and maintain\n", + " - more functions that do a single thing...and do it well\n", + " - classes that are small - functions that access or manipulate them\n", + "5. Documentation\n", + " - consistent and detailed functional documentation\n", + " - consistent type hints\n", + " - usage for modules and classes\n", + " - overall documentation leverages the flexible and less bloated MkDocs package\n", + "6. Removes less-well-maintained dependencies\n", + " - Replaces Partridge with internal functionality\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Speed\n", + "\n", + "### I/O Speed\n", + "\n", + "1. Makes the heavy `shapes.geojson` optional and skips reading it for operations that don't directly involve it.\n", + "2. Leverages `pandera` for speedy, vector-based data model validation for dataframes\n", + "3. Replaces row-based calculations for blank geographic values with vector-based calculations\n", + "4. Supports faster I/O serialization formats, such as Parquet." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Selection Speed\n", + "\n", + "1. Caches selections and reuses them if the network hasn’t changed (checked using hashes), so that costly selections that involve connecting a shortest path don't have to be repeated." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Project Apply Speed\n", + "\n", + "1. 
Replaced most row-based functions with vector-based functions (e.g., new road, managed lane)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setting roadway net speed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Convert to model net" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Serialization Formats\n", + "\n", + "Supports multiple serialization formats, with an API and a script to translate between them:\n", + "- Parquet\n", + "- GeoJSON/JSON\n", + "- CSV\n", + "- Pickle" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stability\n", + "\n", + "- Handles, and has been tested on, many more cases.\n", + "- If an operation fails, it should tell you why and what you need to do." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Transit Selection Features\n", + "\n", + "- Select by any trip or route characteristic\n", + "- Select by nodes or links" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## GeoDataFrames\n", + "\n", + "- All roadway tables are stored in GeoDataFrames for easy viewing\n", + "- Transit is easily converted to GeoDataFrames" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clipping\n", + "\n", + "- Easily clip roadway or transit features using the API or a script" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Code Complexity\n", + "\n", + "\n", + "\n", + "### Cyclomatic Complexity\n", + "\n", + "[Radon](https://radon.readthedocs.io/) \n", + "\n", + "> Cyclomatic Complexity corresponds to the number of decisions a block of code contains plus 1. 
This number (also called McCabe number) is equal to the number of linearly independent paths through the code. This number can be used as a guide when testing conditional logic in blocks." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Maintainability\n", + "\n", + "### Maintainability Index\n", + "\n", + "[Radon](https://radon.readthedocs.io/) \n", + "\n", + "> Maintainability Index is a software metric which measures how maintainable (easy to support and change) the source code is. The maintainability index is calculated as a factored formula consisting of SLOC (Source Lines Of Code), Cyclomatic Complexity and Halstead volume." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}