This repository contains modular Terraform code to provision a full Valohai deployment on Microsoft Azure. It sets up virtual machines, networking, storage, Key Vaults, databases, and load balancing — all parameterized for secure and reusable infrastructure-as-code.
Note: This repository serves as a reference implementation. It is your responsibility to adapt the configuration to your organization’s cloud policies, network topology, and security requirements.
| Name | Source | Purpose |
|---|---|---|
| Network | `./Module/Network` | Virtual Network & Subnets |
| VM | `./Module/VM` | Valohai Compute Instances |
| Postgres | `./Module/Postgres` | PostgreSQL Server for metadata |
| Redis | `./Module/Redis` | In-memory data store |
| Storage | `./Module/Storage` | Azure Blob Storage for artifacts |
| LB | `./Module/LB` | Azure Load Balancer |
| Name | Version |
|---|---|
| terraform | >= 1.0 |
| azurerm | = 4.3.0 |
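The version requirements above could be pinned in a root module roughly as follows. This is a minimal sketch, not necessarily how this repository declares them. `hashicorp/azurerm` is the provider's registry address. Passing `var.subscription_id` to the provider is an assumption based on the `subscription_id` input documented in this README. AzureRM 4.x expects a subscription ID either in the provider block or in the `ARM_SUBSCRIPTION_ID` environment variable.

```hcl
terraform {
  # Matches the "terraform >= 1.0" requirement
  required_version = ">= 1.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "= 4.3.0" # exact pin, as in the requirements table
    }
  }
}

# Assumption: declared here only to keep the sketch self-contained.
# Drop this block if the root module already declares the variable.
variable "subscription_id" {
  type        = string
  description = "Azure Subscription ID"
}

provider "azurerm" {
  features {}

  # Assumption: the subscription is passed in explicitly rather than
  # through the ARM_SUBSCRIPTION_ID environment variable.
  subscription_id = var.subscription_id
}
```

An exact pin keeps every `terraform init` on the tested provider release. Loosening it to a range is a trade-off you would need to test yourself.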
Ensure you have Terraform installed and are authenticated with the Azure CLI (`az login`).
You also need an app registration in your Azure AD that gives Valohai programmatic access to your resource group or subscription. This allows Valohai to create and delete the virtual machines used for your machine learning jobs. The scope can be limited to this resource group or subscription.
This can be done at the App Registration management panel:
- Click New registration.
- Any name for the application will do – “Valohai” is a good choice.
- The “Supported Account Type” option should be left at “Accounts in this organizational directory only (Your Organization Name Here)”.
- The Redirect URI can be left empty.
Once the App Registration is created, take note of the Application (client) and Directory (tenant) ID values displayed.
Then navigate to the new app registration and select “Certificates & Secrets”, then “New client secret”.
- Any Description will do – “Valohai Secret,” for instance, is fine.
- The Expiry time should preferably be “12 months” or according to your company policy. Make a note of the expiry time as you'll have to share it with your Valohai contact.
Once the Secret is created, copy the value from the table and make a note of it – this is the only time you’ll be able to see it.
Once the App Registration has been created, you will need to grant it access to manage resources.
- Navigate to your resource group (or subscription)
- Take note of the subscription ID.
Now select “Access Control (IAM)”. We'll need to create a new role ValohaiMasterRole:
- Open the Roles tab.
- Click Add custom role.
- Give the role the name ValohaiMasterRole.
- Open the Assignable scopes tab. Make sure you've selected the correct resource group(s).
- Open the JSON tab and replace the permissions section with the permissions from below.
- Save your changes.

```json
"permissions": [
  {
    "actions": [
      "Microsoft.Resources/deployments/validate/action",
      "Microsoft.Resources/deployments/write",
      "Microsoft.Resources/deployments/operationStatuses/read",
      "Microsoft.Network/virtualNetworks/subnets/read",
      "Microsoft.Network/networkSecurityGroups/read",
      "Microsoft.Network/networkSecurityGroups/join/action",
      "Microsoft.Network/networkSecurityGroups/write",
      "Microsoft.Network/publicIPAddresses/write",
      "Microsoft.Network/publicIPAddresses/read",
      "Microsoft.Network/publicIPAddresses/delete",
      "Microsoft.Network/publicIPAddresses/join/action",
      "Microsoft.Network/networkInterfaces/read",
      "Microsoft.Network/networkInterfaces/write",
      "Microsoft.Network/networkInterfaces/join/action",
      "Microsoft.Network/networkInterfaces/delete",
      "Microsoft.Network/networkInterfaces/effectiveRouteTable/action",
      "Microsoft.Network/networkInterfaces/effectiveNetworkSecurityGroups/action",
      "Microsoft.Network/networkInterfaces/UpdateParentNicAttachmentOnElasticNic/action",
      "Microsoft.Network/virtualNetworks/subnets/join/action",
      "Microsoft.Network/virtualNetworks/subnets/virtualMachines/read",
      "Microsoft.Network/networkSecurityGroups/securityRules/write",
      "Microsoft.Network/networkSecurityGroups/securityRules/read",
      "Microsoft.Network/networkSecurityGroups/securityRules/delete"
    ],
    "notActions": [],
    "dataActions": [],
    "notDataActions": []
  }
]
```
Next, we'll assign the role to our service principal.
- On the IAM page, click Add role assignment.
- Search for the ValohaiMasterRole and click next.
- Make sure "User, group, or service principal" is selected and click Select members. Then search for the service principal by typing its name.
- Click Review and assign and save your changes.
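If you would rather manage the custom role and its assignment in Terraform than in the portal, the steps above could be sketched as follows. This is an illustration under stated assumptions, not part of this repository. All three variable names are hypothetical, the AzureRM provider is assumed to be configured already, and the action list is expected to be copied from the permissions JSON above.

```hcl
# Hypothetical inputs: none of these names come from this repository.
variable "valohai_resource_group_name" {
  type        = string
  description = "Resource group that Valohai is allowed to manage"
}

variable "valohai_sp_object_id" {
  type        = string
  description = "Object ID of the service principal behind the app registration"
}

variable "valohai_role_actions" {
  type        = list(string)
  description = "Actions copied from the permissions JSON in this README"
}

# Look up the existing resource group to use as the role scope
data "azurerm_resource_group" "valohai" {
  name = var.valohai_resource_group_name
}

# Custom role limited to the resource group
resource "azurerm_role_definition" "valohai_master" {
  name  = "ValohaiMasterRole"
  scope = data.azurerm_resource_group.valohai.id

  permissions {
    actions     = var.valohai_role_actions
    not_actions = []
  }

  assignable_scopes = [data.azurerm_resource_group.valohai.id]
}

# Assign the role to the service principal
resource "azurerm_role_assignment" "valohai" {
  scope              = data.azurerm_resource_group.valohai.id
  role_definition_id = azurerm_role_definition.valohai_master.role_definition_resource_id
  principal_id       = var.valohai_sp_object_id
}
```

Note that `principal_id` takes the service principal's object ID, which is a different value from the Application (client) ID noted during registration.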
Next, create a Key Vault where you'll store the `Client ID`, the `Client Secret`, and an SSH key that will be used by the workers that Valohai launches programmatically.
The naming follows this structure:
- `scalie-<name of scalie scope config>-client-id`
- `scalie-<name of scalie scope config>-secret`
- `scalie-<name of scalie scope config>-ssh-key`
So, for example, if you had:

```hcl
# List of scopes you want to configure
scalie_scope_configs = {
  eu = {
    tenant_id           = ""
    subscription_id     = ""
    location            = ""
    resource_group_name = ""
    admin_username      = "ubuntu"
    identities          = []
  }
}
```
Then your keys would be named `scalie-eu-client-id`, `scalie-eu-secret`, and so on.
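To make the naming rule concrete, the sketch below derives the three secret names for every scope key and looks one of them up in the Key Vault. It is an illustration only. How the modules in this repository actually read these secrets is not documented here, and the loose `any` type is a simplification of the real `scalie_scope_configs` definition in `variables.tf`.

```hcl
# Assumption: simplified stand-ins for the repository's variables.
variable "scalie_scope_configs" {
  type = any
}

variable "scalie_keyvault_name" {
  type = string
}

variable "scalie_keyvault_rg" {
  type = string
}

locals {
  # For a scope key "eu" this yields scalie-eu-client-id,
  # scalie-eu-secret and scalie-eu-ssh-key.
  scalie_secret_names = {
    for scope in keys(var.scalie_scope_configs) : scope => {
      client_id = "scalie-${scope}-client-id"
      secret    = "scalie-${scope}-secret"
      ssh_key   = "scalie-${scope}-ssh-key"
    }
  }
}

# Existing Key Vault that holds the Scalie secrets
data "azurerm_key_vault" "scalie" {
  name                = var.scalie_keyvault_name
  resource_group_name = var.scalie_keyvault_rg
}

# One lookup per scope; the secret and SSH key follow the same pattern
data "azurerm_key_vault_secret" "client_id" {
  for_each     = local.scalie_secret_names
  name         = each.value.client_id
  key_vault_id = data.azurerm_key_vault.scalie.id
}
```

A lookup like this fails at plan time if a secret is missing, which can catch a naming mistake before any VM is created.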
You can create the secrets with:
```shell
az keyvault secret set --vault-name valohai-scalie-kv --name scalie-eu-client-id --value "clientidstring"
az keyvault secret set --vault-name valohai-scalie-kv --name scalie-eu-secret --value "randomsecurestring"
az keyvault secret set --vault-name valohai-scalie-kv --name scalie-eu-ssh-key --file key.pem
```
Enter the details of your Key Vault in `terraform.tfvars`:

```hcl
scalie_keyvault_name = ""
scalie_keyvault_rg   = ""
```
| Name | Version |
|---|---|
| azurerm | 4.3.0 |
Create a file named `terraform.tfvars` and edit it. At a minimum, adjust:
- Subscription ID
- Resource group
- Region
- SSH public key path (you'll need to generate the key locally)
- Valohai image reference
- Storage & container names
- IP allowlists
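A `terraform.tfvars` covering the items above might look like the sketch below. The variable names come from the inputs documented in this README. Every value is a placeholder or a documented default, not a working setting. The scope values (tenant, subscription, location, resource group) are hypothetical, and the Key Vault name simply reuses the one from the `az keyvault` examples.

```hcl
# terraform.tfvars: placeholder values only, replace before applying.
subscription_id = "00000000-0000-0000-0000-000000000000"
resource_group  = "valohai-rg"
azure_region    = "eastus"

# Path to the SSH public key you generated locally
vm_public_key = ".valohai.pub"

# Placeholder: use the image reference provided for your deployment
valohai_image = "<valohai-image-reference>"

storage_account_name = "valohaidata"
container_name       = "valohaidata"

# IPs allowed to access storage (203.0.113.0/24 is a documentation range)
ip_rules = ["203.0.113.10"]

# Key Vault that holds the Scalie secrets
scalie_keyvault_name = "valohai-scalie-kv"
scalie_keyvault_rg   = "valohai-rg"

# Hypothetical scope; see variables.tf for the full object definition
scalie_scope_configs = {
  eu = {
    tenant_id           = "00000000-0000-0000-0000-000000000000"
    subscription_id     = "00000000-0000-0000-0000-000000000000"
    location            = "westeurope"
    resource_group_name = "valohai-rg"
    admin_username      = "ubuntu"
    identities          = []
  }
}
```

Storage account names must be globally unique across Azure, so the default `valohaidata` will most likely need changing.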
```shell
terraform init
terraform plan -var-file="terraform.tfvars"
terraform apply -var-file="terraform.tfvars"
```
This setup provisions an Azure Key Vault to manage secrets (e.g., DB passwords, Redis keys, JWT secrets). Ensure required secrets are created manually or passed from modules.
| Variable | Description | Type | Required | Default |
|---|---|---|---|---|
| `subscription_id` | Azure Subscription ID | `string` | ✅ | n/a |
| `resource_group` | Name of resource group | `string` | ❌ | `"valohai-rg"` |
| `azure_region` | Azure Region | `string` | ❌ | `"eastus"` |
| `valohai_image` | Valohai VM image reference | `string` | ✅ | n/a |
| `vm_public_key` | Path to SSH public key for access | `string` | ❌ | `".valohai.pub"` |
| `prefix` | Prefix for resource names | `string` | ❌ | `""` |
| `environment_name` | Logical environment name (e.g., dev, prod) | `string` | ❌ | `"My Valohai Org"` |
| `domain` | Domain for service access | `string` | ❌ | `""` |
| `organization` | Valohai organization name | `string` | ❌ | `"MyOrg"` |
| `address_space` | CIDR block for the VNet | `list(string)` | ❌ | `["10.0.0.0/16"]` |
| `subnet_address_prefixes` | List of subnet CIDRs | `list(string)` | ❌ | `["10.0.1.0/24", ...]` |
| `ip_rules` | IPs allowed to access storage | `list(string)` | ❌ | `[""]` |
| `storage_account_name` | Azure Storage account name | `string` | ❌ | `"valohaidata"` |
| `container_name` | Azure Blob container name | `string` | ❌ | `"valohaidata"` |
| `scalie_keyvault_name` | Name of Key Vault containing Scalie secrets | `string` | ❌ | `""` |
| `scalie_keyvault_rg` | Resource group where Scalie Key Vault is located | `string` | ❌ | `""` |
| `scalie_scope_configs` | Scope-specific tenant & identity configuration | `map(object)` | ✅ | See `variables.tf` |
Note: No outputs are currently defined.
- Make sure you're storing the Terraform state in a secure backend, such as Azure Storage with encryption and access policies enabled.
- This configuration currently uses `"*"` in Network Security Groups to allow SSH (port 22) and app access (port 8000). This assumes the VM is deployed in a private subnet and SSH is not exposed. You should still review and restrict all network access according to your organization’s security policies.
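Both notes can be illustrated with a short sketch. The first block stores state in an Azure Storage container through the `azurerm` backend. The second shows an SSH rule whose source is a specific CIDR instead of `"*"`. All names and values here are hypothetical placeholders. The storage account and container must already exist before `terraform init`, and the rule is not taken from this repository's modules.

```hcl
# Remote state in Azure Storage. All four values are placeholders;
# backend blocks cannot reference variables, so they are literals.
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstatevalohai"
    container_name       = "tfstate"
    key                  = "valohai.terraform.tfstate"
  }
}

# Hypothetical inputs for the restricted rule
variable "nsg_name" {
  type        = string
  description = "Name of the existing Network Security Group"
}

variable "nsg_resource_group" {
  type        = string
  description = "Resource group of the Network Security Group"
}

variable "admin_cidr" {
  type        = string
  description = "CIDR range allowed to reach SSH, e.g. an office or VPN range"
}

# SSH allowed only from admin_cidr instead of "*"
resource "azurerm_network_security_rule" "ssh_restricted" {
  name                        = "allow-ssh-from-admin-range"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "22"
  source_address_prefix       = var.admin_cidr
  destination_address_prefix  = "*"
  resource_group_name         = var.nsg_resource_group
  network_security_group_name = var.nsg_name
}
```

The same pattern would apply to port 8000. Whether a rule like this replaces or sits alongside the rules created by the modules depends on how the Network module defines them.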
Use the following to validate your configuration before applying:
```shell
terraform fmt -recursive
terraform validate
tflint
```
For automation, consider using `pre-commit` with Terraform hooks.
- Tested with Terraform 1.4+ and AzureRM 4.3.0.