ablange/nix-data-mesh


nix-data-mesh

Audience

This project enables anyone with basic command-line skills to install and run data mesh platform components directly on their local machine using Docker.

Rationale

In theory, data mesh is a strong foundational pattern for improving the quality, governance, and interoperability of a data platform. In practice, implementations are harder, possibly for a few reasons:

  • Data mesh is relatively new compared to the data lake and the data warehouse.
  • Software dependency management is challenging and time-consuming.
  • LLMs are overshadowing self-serve data consumption.

The nix-data-mesh project serves as a knowledge repository and a functional prototype: a set of declarative, reusable templates for developing data products.

  • python: custom package, library, or API
  • duckdb: database server [Coming soon!]
  • airflow: pipeline orchestration server [Coming soon!]
  • jupyter: notebook server with data visualization capability [Coming soon!]

Each component is defined as a copier template and must be initialized before use. Components are intended to be combined to form data products. See the Usage section for more details.

Features

Under the hood, nix-data-mesh combines Nix, copier, pyenv, and Docker so that anyone on a supported operating system can quickly spin up a component, with guardrails to keep them on track.

Prerequisites

macOS

  1. Install the core macOS developer utilities and switch the default shell to bash:
xcode-select --install
chsh -s /bin/bash
  2. Install the Homebrew package manager:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  3. Install pyenv and its build dependencies:
brew install openssl readline sqlite3 xz zlib tcl-tk@8
brew install pyenv pyenv-virtualenv
  4. Add the following to ~/.bashrc:
# Homebrew (/opt/homebrew is the Apple Silicon location; Intel Macs use /usr/local)
eval "$(/opt/homebrew/bin/brew shellenv)"

# Pyenv
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"

# Pyenv-virtualenv
eval "$(pyenv virtualenv-init -)"

# Bash command auto-complete
[[ -r "$HOMEBREW_REPOSITORY/etc/profile.d/bash_completion.sh" ]] && . "$HOMEBREW_REPOSITORY/etc/profile.d/bash_completion.sh"
  5. Install Docker Desktop (only the docker service is strictly required), then verify the installation. https://docs.docker.com/desktop/setup/install/mac-install/
docker --version
  6. Install the GitHub CLI and authenticate to your account:
brew install gh
gh auth login
  7. Install the Nix build system:
curl -L https://nixos.org/nix/install | sh
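
Once the steps above are complete, a quick check that each tool is on the PATH can save debugging later. The following is a minimal sketch and is not part of nix-data-mesh: the function name missing_tools is hypothetical, and the tool list simply mirrors the commands installed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every given command that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool list mirrors the prerequisites above (an assumption, adjust as needed).
missing="$(missing_tools brew pyenv docker gh nix)"
if [ -n "$missing" ]; then
  printf 'Missing prerequisites:\n%s\n' "$missing"
else
  echo "All prerequisites found."
fi
```

If a tool is reported missing right after installation, opening a new terminal so that ~/.bashrc is re-read is worth trying first.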

Ubuntu/Debian

  1. Install pyenv and its build dependencies:
sudo apt update
sudo apt install build-essential libssl-dev zlib1g-dev \
	libbz2-dev libreadline-dev libsqlite3-dev curl git \
	libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
curl -fsSL https://pyenv.run | bash
# The pyenv.run installer typically installs the pyenv-virtualenv plugin already; clone it only if it is missing.
[ -d "${PYENV_ROOT:-$HOME/.pyenv}/plugins/pyenv-virtualenv" ] || git clone https://github.com/pyenv/pyenv-virtualenv.git "${PYENV_ROOT:-$HOME/.pyenv}/plugins/pyenv-virtualenv"
  2. Install Docker Desktop (only the docker service is strictly required), then verify the installation. https://docs.docker.com/desktop/setup/install/linux/
docker --version
  3. Install the GitHub CLI and authenticate to your account:
(type -p wget >/dev/null || (sudo apt update && sudo apt-get install wget -y)) \
	&& sudo mkdir -p -m 755 /etc/apt/keyrings \
	&& out=$(mktemp) && wget -nv -O$out https://cli.github.com/packages/githubcli-archive-keyring.gpg \
	&& cat $out | sudo tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
	&& sudo chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
	&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
	&& sudo apt update \
	&& sudo apt install gh -y
gh auth login
  4. Install the Nix build system (multi-user daemon mode):
curl -L https://nixos.org/nix/install | sh -s -- --daemon
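
The Ubuntu/Debian steps do not list the ~/.bashrc configuration that the macOS steps include, yet pyenv still needs its startup lines there. The following is a minimal sketch of adding them idempotently. It assumes pyenv is installed in its default location, ~/.pyenv, and both function names (append_once, configure_pyenv_bashrc) are hypothetical helpers, not part of nix-data-mesh or pyenv.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file unless an identical line exists.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: write the pyenv startup lines into the given rc file.
# Assumes pyenv lives in its default location, ~/.pyenv.
configure_pyenv_bashrc() {
  local rc="$1"
  append_once 'export PYENV_ROOT="$HOME/.pyenv"' "$rc"
  append_once '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' "$rc"
  append_once 'eval "$(pyenv init -)"' "$rc"
  append_once 'eval "$(pyenv virtualenv-init -)"' "$rc"
}
```

Calling `configure_pyenv_bashrc ~/.bashrc` and then opening a new shell should make pyenv available. Running it a second time does not duplicate any lines.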

Windows/WSL2

Windows/WSL2 has not been tested.

Installation

Clone nix-data-mesh to a local directory.

export DATA_MESH_HOME=~/repos
mkdir -p "$DATA_MESH_HOME"
git clone <TODO_NEW_GITHUB_URL> "$DATA_MESH_HOME/nix-data-mesh"

From the cloned directory, build nix-data-mesh and install its dependencies.

cd "$DATA_MESH_HOME/nix-data-mesh"
make setup

Usage

python

Install a specific version of Python, for example 3.11.9.

make python_version v=3.11.9

Now build a new, isolated Python project environment called hello-world using a previously installed version (e.g., 3.11.9).

make python v=3.11.9 n=hello-world

Note that each Python version MUST be installed OUTSIDE of the nix-data-mesh shell, because each operating system and architecture handles Python's build dependencies slightly differently.
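
One way to guard against installing a Python version from inside the shell is to test the IN_NIX_SHELL environment variable, which nix-shell sets (and nix develop typically sets as well). This is a minimal sketch under an assumption: it presumes the nix-data-mesh shell is entered through one of those commands, which this README does not state. The name in_nix_shell is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the current shell was started by Nix.
# Assumption: the nix-data-mesh shell exports IN_NIX_SHELL, as nix-shell does.
in_nix_shell() {
  [ -n "${IN_NIX_SHELL:-}" ]
}

if in_nix_shell; then
  echo "Inside a Nix shell: exit it before running 'make python_version'."
else
  echo "Outside a Nix shell: safe to install a Python version."
fi
```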

Once the template questionnaire is finished, a new project is automatically created in the $DATA_MESH_HOME directory with the template's tooling and guardrails already in place.

Get started by changing into ~/repos/hello-world and initializing your development environment (i.e., pyenv-virtualenv) and runtime environment (i.e., Docker).

make python v=3.11.9 n=hello-world

airflow (Coming soon!)

postgresql (Coming soon!)

About

A practical data mesh reference implementation, powered by open source.
