
valohai/valohai-self-hosted-azure-tf


Valohai Azure Terraform Deployment

This repository contains modular Terraform code to provision a full Valohai deployment on Microsoft Azure. It sets up virtual machines, networking, storage, Key Vaults, databases, and load balancing, all parameterized for secure, reusable infrastructure-as-code.

Note: This repository serves as a reference implementation. It is your responsibility to adapt the configuration to your organization’s cloud policies, network topology, and security requirements.


Modules

| Name | Source | Purpose |
|---|---|---|
| Network | ./Module/Network | Virtual Network & Subnets |
| VM | ./Module/VM | Valohai Compute Instances |
| Postgres | ./Module/Postgres | PostgreSQL Server for metadata |
| Redis | ./Module/Redis | In-memory data store |
| Storage | ./Module/Storage | Azure Blob Storage for artifacts |
| LB | ./Module/LB | Azure Load Balancer |

Requirements

| Name | Version |
|---|---|
| terraform | >= 1.0 |
| azurerm | = 4.3.0 |

Make sure Terraform is installed and that you are signed in to Azure with the Azure CLI (az login); the azurerm provider can use those CLI credentials.
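As a minimal sketch (not the repository's actual versions file), the constraints above could be pinned as follows. It assumes the subscription_id input from the Inputs table is declared elsewhere in the root module. In azurerm 4.x the provider must be given a subscription ID, either in the provider block or through the ARM_SUBSCRIPTION_ID environment variable.

```hcl
terraform {
  # Matches the Requirements table: Terraform >= 1.0, azurerm pinned to 4.3.0
  required_version = ">= 1.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "= 4.3.0"
    }
  }
}

provider "azurerm" {
  # The features block is mandatory, even when empty
  features {}

  # azurerm 4.x needs an explicit subscription (or ARM_SUBSCRIPTION_ID)
  subscription_id = var.subscription_id
}
```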

App Registration

Create an app registration in your Azure AD (Microsoft Entra ID) tenant to give Valohai programmatic access to your resource group or subscription. Valohai uses it to create and delete the virtual machines that run your machine learning jobs. Its access can be scoped to just this resource group or subscription.

You can do this from the App registrations page in the Azure portal:

  1. Click New registration.
  2. Any name for the application will do – “Valohai” is a good choice.
  3. The “Supported Account Type” option should be left at “Accounts in this organizational directory only (Your Organization Name Here)”.
  4. The Redirect URI can be left empty.

Once the app registration is created, note the Application (client) ID and Directory (tenant) ID values shown on its overview page.

Then navigate to the new app registration and select “Certificates & Secrets”, then “New client secret”.

  1. Any Description will do – “Valohai Secret,” for instance, is fine.
  2. Set the expiry to 12 months, or whatever your company policy requires. Note the expiry date, since you'll have to share it with your Valohai contact.

Once the secret is created, copy its Value (not the Secret ID) and store it somewhere safe; this is the only time you'll be able to see it.

Permissions

Once the App Registration has been created, you will need to grant it access to manage resources.

  • Navigate to your resource group (or subscription).
  • Note the subscription ID.

Now select “Access control (IAM)” and create a new custom role named ValohaiMasterRole:

  1. Open the Roles tab.
  2. Click Add custom role.
  3. Give the role the name ValohaiMasterRole.
  4. Open the Assignable scopes tab. Make sure you've selected the correct resource group(s).
  5. Open the JSON tab and replace the permissions section with the one below.
  6. Save your changes.
"permissions": [
    {
        "actions": [
            "Microsoft.Resources/deployments/validate/action",
            "Microsoft.Resources/deployments/write",
            "Microsoft.Resources/deployments/operationStatuses/read",
            "Microsoft.Network/virtualNetworks/subnets/read",
            "Microsoft.Network/networkSecurityGroups/read",
            "Microsoft.Network/networkSecurityGroups/join/action",
            "Microsoft.Network/networkSecurityGroups/write",
            "Microsoft.Network/publicIPAddresses/write",
            "Microsoft.Network/publicIPAddresses/read",
            "Microsoft.Network/publicIPAddresses/delete",
            "Microsoft.Network/publicIPAddresses/join/action",
            "Microsoft.Network/networkInterfaces/read",
            "Microsoft.Network/networkInterfaces/write",
            "Microsoft.Network/networkInterfaces/join/action",
            "Microsoft.Network/networkInterfaces/delete",
            "Microsoft.Network/networkInterfaces/effectiveRouteTable/action",
            "Microsoft.Network/networkInterfaces/effectiveNetworkSecurityGroups/action",
            "Microsoft.Network/networkInterfaces/UpdateParentNicAttachmentOnElasticNic/action",
            "Microsoft.Network/virtualNetworks/subnets/join/action",
            "Microsoft.Network/virtualNetworks/subnets/virtualMachines/read",
            "Microsoft.Network/networkSecurityGroups/securityRules/write",
            "Microsoft.Network/networkSecurityGroups/securityRules/read",
            "Microsoft.Network/networkSecurityGroups/securityRules/delete"
        ],
        "notActions": [],
        "dataActions": [],
        "notDataActions": []
    }
]
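If you prefer to manage the custom role with Terraform rather than through the portal, a sketch using azurerm_role_definition could look like the following. It is not part of this repository; it assumes the resource_group input from the Inputs table and an operator identity allowed to create role definitions (for example Owner or User Access Administrator on that scope).

```hcl
# Assumes var.resource_group (see Inputs) names an existing resource group
data "azurerm_resource_group" "valohai_role_scope" {
  name = var.resource_group
}

resource "azurerm_role_definition" "valohai_master" {
  name        = "ValohaiMasterRole"
  scope       = data.azurerm_resource_group.valohai_role_scope.id
  description = "Custom role used by Valohai to manage worker VMs"

  permissions {
    # Same actions as the JSON permissions section above
    actions = [
      "Microsoft.Resources/deployments/validate/action",
      "Microsoft.Resources/deployments/write",
      "Microsoft.Resources/deployments/operationStatuses/read",
      "Microsoft.Network/virtualNetworks/subnets/read",
      "Microsoft.Network/networkSecurityGroups/read",
      "Microsoft.Network/networkSecurityGroups/join/action",
      "Microsoft.Network/networkSecurityGroups/write",
      "Microsoft.Network/publicIPAddresses/write",
      "Microsoft.Network/publicIPAddresses/read",
      "Microsoft.Network/publicIPAddresses/delete",
      "Microsoft.Network/publicIPAddresses/join/action",
      "Microsoft.Network/networkInterfaces/read",
      "Microsoft.Network/networkInterfaces/write",
      "Microsoft.Network/networkInterfaces/join/action",
      "Microsoft.Network/networkInterfaces/delete",
      "Microsoft.Network/networkInterfaces/effectiveRouteTable/action",
      "Microsoft.Network/networkInterfaces/effectiveNetworkSecurityGroups/action",
      "Microsoft.Network/networkInterfaces/UpdateParentNicAttachmentOnElasticNic/action",
      "Microsoft.Network/virtualNetworks/subnets/join/action",
      "Microsoft.Network/virtualNetworks/subnets/virtualMachines/read",
      "Microsoft.Network/networkSecurityGroups/securityRules/write",
      "Microsoft.Network/networkSecurityGroups/securityRules/read",
      "Microsoft.Network/networkSecurityGroups/securityRules/delete",
    ]
    not_actions = []
  }

  # Only this resource group may be targeted by assignments of the role
  assignable_scopes = [data.azurerm_resource_group.valohai_role_scope.id]
}
```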

Next, we'll assign the role to our service principal.

  • On the IAM page, click Add role assignment.
  • Search for the ValohaiMasterRole and click next.
  • Make sure "User, group, or service principal" is selected and click Select members. Then search for the service principal by typing its name.
  • Click Review and assign and save your changes.
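The same assignment can be sketched with azurerm_role_assignment. Note that principal_id expects the object ID of the service principal (shown on its Enterprise applications page), not the Application (client) ID noted earlier. The variable name valohai_sp_object_id is hypothetical, the resource_group input is assumed from the Inputs table, and whoever runs Terraform needs permission to create role assignments on that scope.

```hcl
# Hypothetical input: object ID of the Valohai service principal
# (Enterprise applications page), NOT the Application (client) ID
variable "valohai_sp_object_id" {
  type = string
}

data "azurerm_resource_group" "valohai_assignment_scope" {
  name = var.resource_group
}

resource "azurerm_role_assignment" "valohai_master" {
  scope                = data.azurerm_resource_group.valohai_assignment_scope.id
  role_definition_name = "ValohaiMasterRole" # the custom role created above
  principal_id         = var.valohai_sp_object_id
  principal_type       = "ServicePrincipal"
}
```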

Key Vault

Next, create a Key Vault to store the client ID, the client secret, and an SSH key for the worker VMs that Valohai launches programmatically.
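If you would rather create this vault with Terraform too (for example in a separate configuration, since this repository takes the vault's name and resource group as inputs), a minimal sketch could look like this. It assumes the azure_region and resource_group inputs, reuses the example vault name from the az commands further down (Key Vault names must be globally unique), and grants secret access only to the identity running Terraform; access for the Valohai application itself is not shown.

```hcl
# Identity currently running Terraform (tenant ID and object ID)
data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "scalie" {
  name                = "valohai-scalie-kv"
  location            = var.azure_region
  resource_group_name = var.resource_group
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"

  # Lets the operator write the Scalie secrets with az keyvault secret set
  access_policy {
    tenant_id          = data.azurerm_client_config.current.tenant_id
    object_id          = data.azurerm_client_config.current.object_id
    secret_permissions = ["Get", "List", "Set", "Delete"]
  }
}
```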

Name the secrets using this structure, where <scope> is the name of a scope in scalie_scope_configs:

  • scalie-<scope>-client-id
  • scalie-<scope>-secret
  • scalie-<scope>-ssh-key

For example, with this configuration:

# List of scopes you want to configure
scalie_scope_configs = {
  eu = {
    tenant_id           = ""
    subscription_id     = ""
    location            = ""
    resource_group_name = ""
    admin_username      = "ubuntu"
    identities          = []
  }
}

Then your secrets would be named scalie-eu-client-id, scalie-eu-secret, and scalie-eu-ssh-key.

You can create the secrets with:

az keyvault secret set --vault-name valohai-scalie-kv --name scalie-eu-client-id --value "clientidstring"
az keyvault secret set --vault-name valohai-scalie-kv --name scalie-eu-secret --value "randomsecurestring"
az keyvault secret set --vault-name valohai-scalie-kv --name scalie-eu-ssh-key --file key.pem

Add the details of your Key Vault to your tfvars file (terraform.tfvars in the Quickstart below):

scalie_keyvault_name = ""
scalie_keyvault_rg   = ""
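To check that every secret follows the naming scheme, a sketch like the following derives the expected names from scalie_scope_configs and reads each one through the azurerm_key_vault_secret data source, so a missing or misnamed secret should surface as an error during terraform plan. This illustrates the convention under the assumption that the variables above are declared; it is not the repository's own lookup code, and reading secrets this way stores their values in Terraform state.

```hcl
# Vault created beforehand; name and resource group come from the tfvars file
data "azurerm_key_vault" "scalie" {
  name                = var.scalie_keyvault_name
  resource_group_name = var.scalie_keyvault_rg
}

locals {
  scalie_secret_suffixes = ["client-id", "secret", "ssh-key"]

  # For the "eu" example: scalie-eu-client-id, scalie-eu-secret, scalie-eu-ssh-key
  scalie_secret_names = toset(flatten([
    for scope in keys(var.scalie_scope_configs) : [
      for suffix in local.scalie_secret_suffixes : "scalie-${scope}-${suffix}"
    ]
  ]))
}

# One lookup per expected secret name
data "azurerm_key_vault_secret" "scalie" {
  for_each     = local.scalie_secret_names
  name         = each.key
  key_vault_id = data.azurerm_key_vault.scalie.id
}
```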

Providers

| Name | Version |
|---|---|
| azurerm | 4.3.0 |

Quickstart

1. Customize variables

Create a file named terraform.tfvars and set at least the following:

  • Subscription ID
  • Resource group
  • Region
  • SSH public key path (generate the key pair locally first, e.g. with ssh-keygen)
  • Valohai image reference
  • Storage & container names
  • IP allowlists
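A minimal terraform.tfvars covering these values might look like this, using the variable names from the Inputs table. Every value is a placeholder: the subscription ID is all zeros, the valohai_image string must be replaced with the image reference you actually use, 203.0.113.10 is a documentation address, and storage account names must be globally unique (3 to 24 lowercase letters and digits).

```hcl
# terraform.tfvars (placeholder values only)
subscription_id      = "00000000-0000-0000-0000-000000000000"
resource_group       = "valohai-rg"
azure_region         = "eastus"
vm_public_key        = ".valohai.pub"
valohai_image        = "<valohai image reference>"
storage_account_name = "valohaidata"
container_name       = "valohaidata"
ip_rules             = ["203.0.113.10"]

scalie_keyvault_name = "valohai-scalie-kv"
scalie_keyvault_rg   = "valohai-rg"
```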

2. Initialize

terraform init

3. Preview Plan

terraform plan -var-file="terraform.tfvars"

4. Apply Changes

terraform apply -var-file="terraform.tfvars"

Secrets & Key Vault

This setup provisions an Azure Key Vault to manage secrets (e.g., DB passwords, Redis keys, JWT secrets). Make sure the required secrets are either created manually or passed in from the modules.
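As one way of passing a secret from Terraform instead of creating it by hand, the sketch below generates a password with the hashicorp/random provider (which would have to be added to required_providers) and stores it with azurerm_key_vault_secret. The variable key_vault_id and the secret name valohai-postgres-password are hypothetical; check the modules for the names they actually expect. The generated value is also kept in Terraform state, which is one more reason to secure the state backend.

```hcl
# Hypothetical input: ID of the Key Vault this setup provisions
variable "key_vault_id" {
  type = string
}

# Requires hashicorp/random in required_providers
resource "random_password" "postgres" {
  length  = 32
  special = false # avoids characters that need escaping in connection strings
}

resource "azurerm_key_vault_secret" "postgres_password" {
  name         = "valohai-postgres-password" # hypothetical secret name
  value        = random_password.postgres.result
  key_vault_id = var.key_vault_id
}
```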


Inputs

| Variable | Description | Type | Required | Default |
|---|---|---|---|---|
| subscription_id | Azure Subscription ID | string | yes | n/a |
| resource_group | Name of resource group | string | no | "valohai-rg" |
| azure_region | Azure Region | string | no | "eastus" |
| valohai_image | Valohai VM image reference | string | yes | n/a |
| vm_public_key | Path to SSH public key for access | string | no | ".valohai.pub" |
| prefix | Prefix for resource names | string | no | "" |
| environment_name | Logical environment name (e.g., dev, prod) | string | no | "My Valohai Org" |
| domain | Domain for service access | string | no | "" |
| organization | Valohai organization name | string | no | "MyOrg" |
| address_space | CIDR block for the VNet | list(string) | no | ["10.0.0.0/16"] |
| subnet_address_prefixes | List of subnet CIDRs | list(string) | no | ["10.0.1.0/24", ...] |
| ip_rules | IPs allowed to access storage | list(string) | no | [""] |
| storage_account_name | Azure Storage account name | string | no | "valohaidata" |
| container_name | Azure Blob container name | string | no | "valohaidata" |
| scalie_keyvault_name | Name of Key Vault containing Scalie secrets | string | no | "" |
| scalie_keyvault_rg | Resource group where Scalie Key Vault is located | string | no | "" |
| scalie_scope_configs | Scope-specific tenant & identity configuration | map(object) | no | See variables.tf |

Outputs

Note: No outputs are currently defined.


Security Considerations

  • Store the Terraform state in a secure remote backend, such as an Azure Storage account with encryption and restrictive access policies enabled.

  • This configuration currently uses "*" as the source in its Network Security Group rules for SSH (port 22) and application traffic (port 8000). This assumes the VMs are deployed in a private subnet and SSH is not exposed publicly. Even so, review these rules and restrict network access according to your organization’s security policies.
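Both recommendations can be sketched in Terraform. The backend block stores state in an existing Azure Storage account (encrypted at rest by default) and authenticates with Entra ID instead of access keys, which requires a data-plane role such as Storage Blob Data Contributor on the account. Backend blocks cannot reference variables, so the names are literal placeholders. The rule shows SSH restricted to the default address_space range instead of "*"; the NSG name is hypothetical, and if the VM module defines its rules inline on the NSG, adapt that existing rule rather than adding a standalone one.

```hcl
terraform {
  # Placeholder names for a pre-existing state storage account
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstatevalohai"
    container_name       = "tfstate"
    key                  = "valohai.terraform.tfstate"
    use_azuread_auth     = true
  }
}

# SSH allowed only from inside the VNet (default address_space)
resource "azurerm_network_security_rule" "ssh_from_vnet" {
  name                        = "allow-ssh-from-vnet"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "22"
  source_address_prefix       = "10.0.0.0/16"
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group
  network_security_group_name = "valohai-nsg" # hypothetical NSG name
}
```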


Validation & Linting

Use the following to validate your configuration before applying:

terraform fmt -recursive
terraform validate
tflint

For automation, consider using pre-commit with Terraform hooks.
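tflint reads its settings from a .tflint.hcl file in the working directory. A minimal sketch enabling the recommended preset of the bundled Terraform ruleset follows; Azure-specific checks live in the separate tflint-ruleset-azurerm plugin, which would need its own plugin block with a pinned version and a tflint --init before use.

```hcl
# .tflint.hcl
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}
```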


Additional Notes

  • Tested with Terraform 1.4+ and AzureRM 4.3.0.
