kubox-ai/urban-extent

Urban growth hotspots across Australia for $15

Dask Dagster AWS Kubernetes

Kubox

Kubox is an on-demand data platform designed to build and deploy analytics applications anywhere. It combines open-source Kubernetes with a customisable data infrastructure, making it easy to scale and manage complex data workloads. Kubox offers the simplicity of SaaS with the flexibility of PaaS, minimising overhead while providing a vendor-neutral data infrastructure.

https://docs.kubox.ai/introduction

Australian Urban Hotspots

Tip

Kubox is currently in early-stage public preview and under active development. We’re continuously improving and refining the platform, so things may change as we grow. We welcome your feedback and suggestions to help shape the future of Kubox AI.

Introduction

In this example, we use Kubox to provision a cluster in the AWS cloud and use open-source tools such as Dask and Dagster to process satellite images. You can read more in Urban growth hotspots across Australia for $15.

Create Kubox Cluster

Installation and Setup

Install AWS CLI

To download, configure, and set up authentication for the AWS CLI, follow the instructions in the AWS documentation.

Run the following command to verify your AWS CLI credentials:

aws sts get-caller-identity
# Example output
{
    "UserId": "AIDAIEXAMPLEID",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/example-user"
}

Install Kubox CLI

Download and install Kubox CLI

curl https://kubox.sh | sh

Verify Installation

kubox version

Install kubectl

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can download it from Kubernetes.io.

Clone Repository

git clone git@github.com:kubox-ai/urban-extent.git
cd urban-extent

AWS Cloud Configuration

This example requires access to the ap-southeast-2 (Sydney) region; running there avoids egress costs.

Run the following command to check that ap-southeast-2 is available to your account:

aws ec2 describe-availability-zones --region ap-southeast-2

AWS IAM Role

Create an AWS IAM role that allows EBS volumes to be created dynamically for PostgreSQL. See "Role for Kubox EC2 Instances" in the Kubox Advanced Configuration documentation for the detailed setup commands.

Check AWS Quotas for EC2 Instances

Make sure your AWS quotas in ap-southeast-2 allow one m5.4xlarge, one c5.2xlarge, and three m5.12xlarge instances.
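
EC2 On-Demand quotas are counted in vCPUs rather than in instances, so the instance list above translates into a vCPU total. The following is a rough sketch, assuming one m5.4xlarge, one c5.2xlarge, and three m5.12xlarge instances, all running On-Demand. The helper name check_vcpu_quota is hypothetical, and the quota code L-1216C47A (Running On-Demand Standard instances) is an assumption worth confirming in the Service Quotas console before relying on it.

```shell
# vCPUs per instance type (from the published EC2 instance specifications)
M5_4XLARGE_VCPUS=16
C5_2XLARGE_VCPUS=8
M5_12XLARGE_VCPUS=48

# Assumed cluster shape: 1 x m5.4xlarge, 1 x c5.2xlarge, 3 x m5.12xlarge
REQUIRED_VCPUS=$((M5_4XLARGE_VCPUS + C5_2XLARGE_VCPUS + 3 * M5_12XLARGE_VCPUS))
echo "vCPUs required: ${REQUIRED_VCPUS}"

# Hypothetical helper: compares the account's On-Demand Standard vCPU limit
# in ap-southeast-2 with the requirement. Returns non-zero if it is too low.
check_vcpu_quota() {
  quota=$(aws service-quotas get-service-quota \
    --service-code ec2 \
    --quota-code L-1216C47A \
    --region ap-southeast-2 \
    --query 'Quota.Value' \
    --output text) || return 1
  # The API reports a decimal value such as 256.0; keep the integer part.
  quota=${quota%.*}
  if [ "$quota" -ge "$REQUIRED_VCPUS" ]; then
    echo "Quota OK: limit is ${quota} vCPUs, ${REQUIRED_VCPUS} required"
  else
    echo "Quota too low: limit is ${quota} vCPUs, ${REQUIRED_VCPUS} required"
    return 1
  fi
}
```

Call check_vcpu_quota once your AWS CLI credentials are configured. The limit is shared with any other On-Demand Standard instances already running in the region, so a passing check is necessary but not sufficient.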

AWS S3 Bucket

Create an AWS S3 Bucket to store the output files:

aws s3api create-bucket --bucket my-unique-bucket-name --region ap-southeast-2 --create-bucket-configuration LocationConstraint=ap-southeast-2

Look up your 12-digit AWS account ID, which the bucket policy needs:

aws sts get-caller-identity
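
If only the account ID is needed, the standard AWS CLI options --query and --output can trim the response down to that one field. A small sketch, where the helper name aws_account_id is hypothetical:

```shell
# Hypothetical helper: prints only the 12-digit account ID instead of the
# full JSON document returned by get-caller-identity.
aws_account_id() {
  aws sts get-caller-identity --query Account --output text
}
```

For example, AWS_ACCOUNT_ID=$(aws_account_id) stores the ID in a shell variable for later steps.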

AWS S3 Bucket Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DelegateS3Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam::AWS_ACCOUNT_ID:role/KuboxEC2InstanceRole"]
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-unique-bucket-name/*",
        "arn:aws:s3:::my-unique-bucket-name"
      ]
    }
  ]
}
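
The AWS_ACCOUNT_ID and my-unique-bucket-name placeholders in the policy can also be filled in from the shell. A minimal sketch, assuming the policy is written to s3-policy.json (the file name used by the put-bucket-policy command in this README). The variable names BUCKET_NAME and AWS_ACCOUNT_ID are illustrative, and the defaults are only the example values from this README, so set both to your own values first.

```shell
# Values to substitute. The defaults are the example values from this README;
# export BUCKET_NAME and AWS_ACCOUNT_ID beforehand to use real ones.
BUCKET_NAME="${BUCKET_NAME:-my-unique-bucket-name}"
AWS_ACCOUNT_ID="${AWS_ACCOUNT_ID:-123456789012}"

# Write the bucket policy with both values filled in.
cat > s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DelegateS3Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam::${AWS_ACCOUNT_ID}:role/KuboxEC2InstanceRole"]
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::${BUCKET_NAME}/*",
        "arn:aws:s3:::${BUCKET_NAME}"
      ]
    }
  ]
}
EOF

echo "Wrote s3-policy.json for bucket ${BUCKET_NAME}"
```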

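Save the policy as s3-policy.json, with AWS_ACCOUNT_ID and the bucket name replaced by your own values, then apply it to the bucket: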

aws s3api put-bucket-policy --bucket my-unique-bucket-name --policy file://s3-policy.json

Modify the following files in the repository with the AWS S3 bucket name:

  • cluster/infrastructure/apps/urban-extent/dagster-configmap.yaml
  • pipeline/.env.example
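
The edit to these two files can be scripted. This is a sketch only: the helper set_bucket_name is hypothetical, and it assumes you have first opened each file to find the placeholder bucket name it currently contains, since that string is not stated in this README.

```shell
# Hypothetical helper: replaces a placeholder bucket name in the given files.
# Usage: set_bucket_name <placeholder> <bucket-name> <file>...
set_bucket_name() {
  placeholder=$1
  bucket=$2
  shift 2
  for file in "$@"; do
    # -i.bak edits in place on both GNU and BSD sed and keeps a backup copy.
    # The placeholder is treated as a regular expression by sed.
    sed -i.bak "s|${placeholder}|${bucket}|g" "$file"
  done
}
```

Run it from the repository root with the placeholder you found, your bucket name, and the two file paths listed above, then review the changes with git diff and delete the .bak backups.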

Finding Urban Hotspots for 21 Major Urban Areas of Australia

Create Cluster

kubox create -f cluster.yaml

Verifying Kubox Cluster

kubectl get pods -n kubox
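
Pods can take a few minutes to start, so it may help to block until they are ready rather than polling by hand. A minimal sketch using kubectl wait; the helper name wait_for_kubox_pods and the 10-minute default timeout are assumptions, not part of Kubox.

```shell
# Hypothetical helper: blocks until every pod in the kubox namespace reports
# the Ready condition, or fails once the timeout (default 600s) expires.
# Usage: wait_for_kubox_pods [timeout]
wait_for_kubox_pods() {
  kubectl wait --for=condition=Ready pods --all -n kubox --timeout="${1:-600s}"
}
```

Note that pods belonging to finished Jobs never become Ready. If the namespace contains any, this call runs until the timeout, and kubectl get pods -n kubox remains the better check.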

Connecting to Services

Connect to the Dask Scheduler

kubectl port-forward service/dask-scheduler 8786:80 -n kubox

Connect to Dagster Web UI

kubectl port-forward service/nginx 8080:80 -n kubox

Open the browser and navigate to http://dagster.localhost:8080

In-Cluster Exploratory Data Analysis (EDA)

Get Jupyter Notebook token

make get-notebook-token

Connect to Jupyter Notebook

kubectl port-forward service/dask-jupyter 8888:80 -n kubox

Running the Pipeline

Connect to Dagster

Tear Down

kubox delete -f cluster.yaml

Local Development

Software Pre-requisites

  • UV - An extremely fast Python package and project manager
  • GDAL - Open source library for reading and writing raster data
  • npm - Node Package Manager
  • yarn - Fast, reliable, and secure dependency management

Developing Dagster Pipeline

Create a Python virtual environment:

uv sync

This will create .venv and install all the required packages.

To activate the virtual environment, run:

source .venv/bin/activate

To run the pipeline locally, run:

dagster dev

Manage Packages

To add a new dependency, run:

uv add <package-name>

To update the dependencies, run:

uv sync -U

To remove a dependency, run:

uv remove <package-name>

To export to requirements.txt, run:

make export

Developing Graphical User Interface

Install Yarn

npm install --global yarn

Install Project

yarn install
yarn build

Run the development server:

yarn dev

Open http://localhost:3000 with your browser to see the result.

License

This repository is licensed under the Apache License 2.0. You are free to use, modify, and distribute this project under the terms of the license. See the LICENSE file for more details.

About

Tracking urban growth by comparing satellite images over time and visualising changes on a map.
