Kubox is an on-demand data platform designed to build and deploy analytics applications anywhere. It combines open-source Kubernetes with a customisable data infrastructure, making it easy to scale and manage complex data workloads. Kubox offers the simplicity of SaaS with the flexibility of PaaS, minimising overhead while providing a vendor-neutral data infrastructure.
https://docs.kubox.ai/introduction
Tip
Kubox is currently in its early-stage public preview and under active development. We’re continuously improving and refining the platform, so things may change as we grow. We welcome your feedback and suggestions to help shape the future of Kubox AI.
- Introduction
- Installation and Setup
- Finding Urban Hotspots for 21 Major Urban Areas of Australia
- Local Development
- License
In this example, we use Kubox to provision a cluster on AWS and use open-source tools such as Dask and Dagster to process satellite images. You can read more about it in "Urban growth hotspots across Australia for $15".
To download and configure the AWS CLI and set up authentication, follow the instructions in the AWS documentation.
Run the following command to verify your AWS CLI credentials:
aws sts get-caller-identity
# Example output
{
"UserId": "AIDAIEXAMPLEID",
"Account": "123456789012",
"Arn": "arn:aws:iam::123456789012:user/example-user"
}
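Later steps need only the 12-digit account number from this output. As a minimal sketch, the helper below pulls the `Account` field out of JSON shaped like the example above. The name `extract_account_id` is hypothetical and not part of the AWS CLI, and the helper assumes the field sits on a single line, as it does in the example output.

```shell
# Hypothetical helper: read get-caller-identity JSON on stdin and
# print the 12-digit value of the "Account" field.
extract_account_id() {
  sed -n 's/.*"Account"[[:space:]]*:[[:space:]]*"\([0-9]\{12\}\)".*/\1/p'
}

# Sample input copied from the example output above.
sample='{
"UserId": "AIDAIEXAMPLEID",
"Account": "123456789012",
"Arn": "arn:aws:iam::123456789012:user/example-user"
}'

account_id=$(printf '%s\n' "$sample" | extract_account_id)
echo "$account_id"
```

Against a live account, `aws sts get-caller-identity | extract_account_id` does the same job. The CLI's own `--query Account --output text` options return the value directly without any parsing.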
Download and install Kubox CLI
curl https://kubox.sh | sh
Verify Installation
kubox version
The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can download it from Kubernetes.io.
Clone the example repository:
git clone git@github.com:kubox-ai/urban-extent.git
cd urban-extent
This example requires access to the ap-southeast-2 region to avoid egress costs.
Run the following command to check that ap-southeast-2 is available to your account:
aws ec2 describe-availability-zones --region ap-southeast-2
Create an AWS IAM role that allows the cluster to dynamically create the EBS volumes used to run PostgreSQL. See "Role for Kubox EC2 Instances" in the Kubox Advanced Configuration documentation for the detailed setup commands.
Make sure your AWS quotas allow one each of m5.4xlarge and c5.2xlarge, and three of m5.12xlarge.
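EC2 instance quotas are counted in vCPUs rather than in instances, so it can help to total the requirement first. The sketch below assumes one m5.4xlarge (16 vCPUs), one c5.2xlarge (8 vCPUs), and three m5.12xlarge (48 vCPUs each), all launched On-Demand. The instance counts are one reading of the line above and the launch type is an assumption, so adjust both to match your cluster.yaml.

```shell
# vCPUs per instance type (published EC2 specifications).
m5_4xlarge_vcpus=16
c5_2xlarge_vcpus=8
m5_12xlarge_vcpus=48

# Assumed instance counts: one, one, and three.
total_vcpus=$(( 1 * m5_4xlarge_vcpus + 1 * c5_2xlarge_vcpus + 3 * m5_12xlarge_vcpus ))
echo "Standard-instance vCPUs required: $total_vcpus"
```

Compare the total with the quota for running On-Demand Standard instances in ap-southeast-2, which is visible in the Service Quotas console.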
Create an AWS S3 Bucket to store the output files:
aws s3api create-bucket --bucket my-unique-bucket-name --region ap-southeast-2 --create-bucket-configuration LocationConstraint=ap-southeast-2
Use your 12-digit account ID to create a policy for the bucket:
aws sts get-caller-identity
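Save the following AWS S3 bucket policy as s3-policy.json, replacing AWS_ACCOUNT_ID with your account ID and my-unique-bucket-name with your bucket name: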
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DelegateS3Access",
"Effect": "Allow",
"Principal": {
"AWS": ["arn:aws:iam::AWS_ACCOUNT_ID:role/KuboxEC2InstanceRole"]
},
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::my-unique-bucket-name/*",
"arn:aws:s3:::my-unique-bucket-name"
]
}
]
}
Apply the policy to the bucket
aws s3api put-bucket-policy --bucket my-unique-bucket-name --policy file://s3-policy.json
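The account ID and bucket name appear several times in the policy, so one way to avoid hand-editing is to generate s3-policy.json from a small template. This is a sketch, not part of the repository: `render_policy` is a hypothetical helper, and it assumes the role is named KuboxEC2InstanceRole, as in the policy above.

```shell
# Hypothetical helper: print the bucket policy for a given account and bucket.
# Usage: render_policy <12-digit-account-id> <bucket-name>
render_policy() {
  account_id=$1
  bucket=$2

  # Reject empty values and anything containing a non-digit.
  case $account_id in
    ""|*[!0-9]*) echo "account ID must be 12 digits" >&2; return 1 ;;
  esac
  # Reject values that are not exactly 12 characters long.
  if [ "${#account_id}" -ne 12 ]; then
    echo "account ID must be 12 digits" >&2
    return 1
  fi

  # Unquoted heredoc so the two variables are expanded into the JSON.
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DelegateS3Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam::${account_id}:role/KuboxEC2InstanceRole"]
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::${bucket}/*",
        "arn:aws:s3:::${bucket}"
      ]
    }
  ]
}
EOF
}
```

For example, `render_policy 123456789012 my-unique-bucket-name > s3-policy.json` writes the file that the `put-bucket-policy` command reads.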
Modify the following files in the repository with the AWS S3 bucket name:
cluster/infrastructure/apps/urban-extent/dagster-configmap.yaml
pipeline/.env.example
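If both files contain the placeholder bucket name used in this guide, the substitution can be scripted. The sketch below rests on that assumption: it treats my-unique-bucket-name as the placeholder, which may not match what the repository files actually contain, and `set_bucket_name` is a hypothetical helper. It goes through a temporary file rather than `sed -i`, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: replace a placeholder bucket name in the given files.
# Usage: set_bucket_name <placeholder> <new-name> <file>...
set_bucket_name() {
  placeholder=$1
  new_name=$2
  shift 2
  for file in "$@"; do
    tmp=$(mktemp) || return 1
    # The placeholder is used as a sed pattern, so a "." in it matches any
    # character; "|" is the delimiter because bucket names never contain it.
    if sed "s|$placeholder|$new_name|g" "$file" > "$tmp"; then
      # Write back with cat so the original file keeps its permissions.
      cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
  done
}
```

From the repository root, a call would look like `set_bucket_name my-unique-bucket-name your-bucket cluster/infrastructure/apps/urban-extent/dagster-configmap.yaml pipeline/.env.example`. Check the files with `git diff` afterwards to confirm the change.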
kubox create -f cluster.yaml
kubectl get pods -n kubox
Connect to the Dask Scheduler
kubectl port-forward service/dask-scheduler 8786:80 -n kubox
Connect to Dagster Web UI
kubectl port-forward service/nginx 8080:80 -n kubox
Open the browser and navigate to http://dagster.localhost:8080
Get Jupyter Notebook token
make get-notebook-token
Connect to the Jupyter Notebook
kubectl port-forward service/dask-jupyter 8888:80 -n kubox
Delete the cluster when you are finished
kubox delete -f cluster.yaml
- uv - An extremely fast Python package and project manager
- GDAL - Open source library for reading and writing raster data
- npm - Node Package Manager
- yarn - Fast, reliable, and secure dependency management
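Before setting up the environment, it can be useful to confirm that these tools are on the PATH. The sketch below uses `command -v` for that. `check_tools` is a hypothetical helper, and `gdalinfo`, one of GDAL's command-line utilities, is used as an assumed indicator that GDAL is installed.

```shell
# Hypothetical helper: report which of the given commands are missing.
# Returns 0 when all are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For this project, the call would be `check_tools uv gdalinfo npm yarn`. Leave `yarn` out if you plan to install it through npm in the later step.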
Create a Python virtual environment:
uv sync
This will create .venv and install all the required packages.
To activate the virtual environment, run:
source .venv/bin/activate
To run the pipeline locally, run:
dagster dev
To add a new dependency, run:
uv add <package-name>
To update the dependencies, run:
uv sync -U
To remove a dependency, run:
uv remove <package-name>
To export the dependencies to requirements.txt, run:
make export
Install Yarn
npm install --global yarn
Install Project
yarn install
yarn build
First, run the development server:
yarn dev
Open http://localhost:3000 with your browser to see the result.
This repository is licensed under the Apache License 2.0. You are free to use, modify, and distribute this project under the terms of the license. See the LICENSE file for more details.