Contents
Base Infra | The base infrastructure that needs to be created before deploying the product infrastructure |
Product Infra | The Kubernetes infrastructure required for the respective product |
Welcome to the Kubernetes infrastructure deployment guide for PWF IE 7.1. This document provides an overview of the flow of activities and the overall architecture for the installation of PWF IE 7.1.
The main activities covered in this Kubernetes infrastructure deployment guide are:
Base Infrastructure Installation in Azure Cloud
Installation and Configuration of Kubernetes Infrastructure for PWF Invoice Extraction
For a deeper understanding of the installation, read on about the SmartOps 7.1 architecture.
PWF IE 7.1 is a distributed cloud application that can be installed only on cloud infrastructure. It is deployed and managed in Kubernetes. Although becoming cloud agnostic is on the roadmap, the installation is currently compatible with and tested only on Microsoft Azure cloud infrastructure. Support for more clouds will follow in subsequent releases.
The diagram below provides an overview.
The installation is done on a single virtual network with multiple subnets.
Each subnet is isolated with NSG rules, and only the required access is granted. Each PWF and the Platform reside in their own subnet.
Additionally, a maintenance subnet hosts a bastion for administrative activities.
In certain situations, clients of SmartOps PWFs need TCP connectivity to assets in SmartOps, for example to post data to a SmartOps queue. In such cases, as the diagram shows, VPN connectivity is established from the client network to the SmartOps network. The connection terminates at a DMZ subnet, where a Squid machine isolates the client network from the SmartOps application subnets for enhanced security.
Prerequisites
Azure CLI | The Azure CLI needs to be installed on the Bastion VM if the infrastructure deployment scripts are executed from the Bastion VM |
Resource Group | An Azure resource group should be created before executing the infrastructure deployment scripts and needs to be passed as a parameter to the infrastructure deployment script |
Service Principal | The Azure Service Principal client ID and client secret must be passed as parameters to the infrastructure deployment script |
Azure MySQL Password | A strong password needs to be generated for Azure MySQL and passed as a parameter to the product infra deployment script |
Az Login | Before downloading the package, log in with az login using an Azure AD account |
The release packages are stored in a SharePoint location and in Azure Artifacts. Follow the steps below to download the Azure infrastructure installation scripts to Azure Cloud Shell.
Open Azure Cloud Shell
Run az login in Cloud Shell
Make the package downloaded from SharePoint available in Azure Cloud Shell
SharePoint location
After the package is downloaded, the scripts for installing the base and product infra are available in the
<package_folder_name>/azure-setup/scripts folder.
The ARM templates for the base and product infra are available in the
<package_folder_name>/azure-setup/arm-templates folder.
./deploy_base_infra.sh <subscription_id> <resource_group> <environment> <base_ip_seg> <sp_client_id> <tenant_id> <sp_client_secret> <smartops_domain_name>
Parameters
Parameter | Description |
subscription_id | Azure Subscription ID where the base infra needs to be deployed |
resource_group | Azure Resource Group where the base infra needs to be deployed |
environment | Environment name. Restricted to a maximum of 5 characters. E.g.: dev02, stg01 |
base_ip_seg | The base IP segment of the environment's IP range |
sp_client_id | Azure Subscription Service Principal client ID |
tenant_id | Azure subscription Tenant ID |
sp_client_secret | Azure Subscription Service Principal client secret |
smartops_domain_name | Can be left blank. DNS creation is then handled by the ARM template using the environment name. |
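As a minimal sketch of a pre-flight check for the parameters above, the function below enforces the 5-character limit on the environment name before the script is invoked. The function name check_environment_name is hypothetical and is not part of the release package; the variables in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the release package.
# Succeeds when the environment name is non-empty and at most 5 characters long.
check_environment_name() {
  local env_name="$1"
  if [ -z "$env_name" ] || [ "${#env_name}" -gt 5 ]; then
    echo "environment name must be 1 to 5 characters: '${env_name}'" >&2
    return 1
  fi
}

# Example usage (all values are placeholders; the last argument is the
# optional domain name, left blank here):
# check_environment_name "dev02" && ./deploy_base_infra.sh "$SUBSCRIPTION_ID" \
#   "$RESOURCE_GROUP" "dev02" "$BASE_IP_SEG" "$SP_CLIENT_ID" "$TENANT_ID" \
#   "$SP_CLIENT_SECRET" ""
```

Running the check first avoids starting a long ARM deployment with a name that the templates may reject.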
Select Deployments in the respective resource group and look for ‘basedeployment’ to check the status of the base infrastructure deployment.
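The same status can probably also be read from the command line. The sketch below wraps az deployment group show in a hypothetical helper, base_deployment_state, and assumes the deployment is named basedeployment as stated above; the resource group name in the usage comment is only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper; prints the provisioning state (for example Succeeded
# or Failed) of the 'basedeployment' deployment in the given resource group.
base_deployment_state() {
  local resource_group="$1"
  az deployment group show \
    --resource-group "$resource_group" \
    --name basedeployment \
    --query properties.provisioningState \
    --output tsv
}

# Example usage (resource group name is a placeholder):
# base_deployment_state "rg-smartopsengg-dev-001"
```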
Once the package is downloaded, cd into the <package_folder_name>/azure-setup/scripts folder.
Execute the shell script deploy_product_infra.sh in Azure Cloud Shell, passing the required parameters:
./deploy_product_infra.sh <subscription_id> <resource_group> <environment> <product> <aksNodeImageVersion> <sp_client_id> <tenant_id> <sp_client_secret> <mysql_admin_password>
Parameters
Parameter | Description |
subscription_id | Azure Subscription ID where the product infra needs to be deployed |
resource_group | Azure Resource Group where the product infra needs to be deployed |
environment | Environment name. Restricted to a maximum of 5 characters. E.g.: dev02, stg01 |
product | Name of the PWF or product. E.g.: invoiceextv1 |
aksNodeImageVersion | OS image version of the K8s cluster. Recommended to be set to the latest. |
sp_client_id | Azure Subscription Service Principal client ID |
tenant_id | Azure subscription Tenant ID |
sp_client_secret | Azure Subscription Service Principal client secret |
mysql_admin_password | Admin password for the Azure MySQL instance |
Select Deployments in the respective resource group and look for ‘basedeployment’ to check the status of the base infrastructure deployment.
Data at rest on Azure Disks is encrypted using disk encryption sets. The data stored in Azure resources such as Azure MySQL and Azure storage accounts is encrypted using keys stored in Azure Key Vault.
Reference: https://docs.microsoft.com/en-us/azure/aks/azure-disk-customer-managed-keys
# Create a DiskEncryptionSet
# The key vault name, resource group, etc. need to be changed accordingly
# The key key-smartops-k8s-disk-enc-001 (key name given as an example) needs to be created
# in Azure Key Vault before creating the Disk Encryption Set
keyVaultId=$(az keyvault show --name kv-engg-resrch-001 --query "[id]" -o tsv)
keyVaultKeyUrl=$(az keyvault key show --vault-name kv-engg-resrch-001 --name key-smartops-k8s-disk-enc-001 --query "[key.kid]" -o tsv)
az disk-encryption-set create -n smartops-k8s-des-001 -l eastus -g rg-smartopsengg-dev-001 --source-vault "$keyVaultId" --key-url "$keyVaultKeyUrl"
Azure cloud shell:
Ensure that the Get, Wrap and Unwrap key permissions are granted to the disk encryption set for the key created in Azure Key Vault.
Refer to des-platform-qa01 in the picture above.
IMPORTANT: After creating the disk encryption set, select it and click the option to allow access to the disk encryption key created in the key vault. See the picture below.
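The permission grant described above can also be scripted. The following is a sketch only, following the pattern in the Microsoft reference linked earlier: it reads the managed identity of the disk encryption set and grants it Get, Wrap and Unwrap on keys. The function name grant_des_key_access is hypothetical, and the sketch assumes the key vault uses access policies rather than Azure RBAC.

```shell
#!/usr/bin/env bash
# Hypothetical helper; grants the disk encryption set's managed identity
# Get, Wrap and Unwrap permissions on keys in the given key vault.
# Assumes the key vault uses the access policy permission model.
grant_des_key_access() {
  local des_name="$1" resource_group="$2" key_vault="$3"
  local des_identity
  # Look up the principal ID of the identity created with the encryption set.
  des_identity=$(az disk-encryption-set show \
    --name "$des_name" --resource-group "$resource_group" \
    --query "identity.principalId" --output tsv) || return 1
  az keyvault set-policy \
    --name "$key_vault" \
    --object-id "$des_identity" \
    --key-permissions get wrapkey unwrapkey
}

# Example usage with the example names used in this guide:
# grant_des_key_access smartops-k8s-des-001 rg-smartopsengg-dev-001 kv-engg-resrch-001
```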
K8s storage class
# Currently kept as part of the env-setup template. Can be changed as required
# The diskEncryptionSetID value needs to be changed accordingly (subscriptions, resourceGroups, diskEncryptionSets)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: pvc-ade-custom-storage-class
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Managed
  skuname: Premium_LRS
  diskEncryptionSetID: "/subscriptions/dfaa090f-c407-4e75-ac08-143cb932bdcf/resourceGroups/rg-smartopsengg-dev-001/providers/Microsoft.Compute/diskEncryptionSets/smartops-k8s-des-001"
After deploying the storage class, update the PVCs of the StatefulSets so that they refer to the custom storage class above.
References:
https://docs.microsoft.com/en-us/azure/mysql/howto-data-encryption-portal
https://docs.microsoft.com/en-us/azure/mysql/concepts-data-encryption-mysql
Key Encryption Key [ KEK ]
Data Encryption Key [ DEK ]
Symmetric key used to encrypt a block of data
When you configure data encryption with a customer-managed key in Key Vault, continuous access to this key is required for the server to stay online. If the server loses access to the customer-managed key in Key Vault, the server begins denying all connections within 10 minutes. The server issues a corresponding error message and changes the server state to Inaccessible. Some of the reasons why the server can reach this state are:
Limitations
Steps
Errors Observed while configuring
If soft-delete is not enabled for the key vault, an error like the one below is returned
Steps
Select the key vault and key by clicking ‘Select a key vault and key’
After successful deployment of the base and product infrastructure using the ARM templates shared in the package, the validations below need to be performed before proceeding with the application deployment.
#1 | Azure MySQL firewall policies |
#2 | Key Vault permissions for the Azure AD user |
#3 | The Azure service principal needs Get permission on Key Vault secrets |
#4 | The disk encryption set, storage account and Azure MySQL instance need Get, Wrap and Unwrap key permissions |
#5 | User access can be enabled by adding users with the required set of permissions to access keys, secrets and certificates, as listed under the USER section |
#6 | Private endpoints' IP associations |
#7 | Node pools' zone redundancy |
The max connections value for Azure MySQL is 300 by default. It needs to be set to 500.
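One way to apply this setting from the command line is sketched below. It assumes the instance is an Azure Database for MySQL single server, for which az mysql server configuration set is the relevant command, and that the setting corresponds to the max_connections server parameter. The function name set_mysql_max_connections is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; sets the max_connections server parameter on an
# Azure Database for MySQL single server. The value defaults to 500.
set_mysql_max_connections() {
  local resource_group="$1" server_name="$2" value="${3:-500}"
  az mysql server configuration set \
    --resource-group "$resource_group" \
    --server-name "$server_name" \
    --name max_connections \
    --value "$value"
}

# Example usage (both names are placeholders):
# set_mysql_max_connections "<resource_group>" "<mysql_server_name>"
```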
The disk encryption set, storage account and Azure MySQL instance need Get, Wrap and Unwrap key permissions.
Refer to the example below for the disk encryption set.
User access can be enabled by adding users with the required set of permissions to access keys, secrets and certificates, as listed under the USER section.
Refer to the screenshot below.
Key Vault private endpoint and its private IP
The Key Vault's private IP above, associated with the respective private link of the Key Vault
Update the private IP in the private link
After deploying all Azure resources for the base and product infrastructure, and before triggering the application deployment script, validate connectivity from the Bastion VM to the Invoice Extraction Kubernetes cluster.
Refer to the Appendix.
Use the attached script below after substituting the required names for the VNet, resource group, etc.
Sl No | Product / PWF | Asset Name | SKU / Tier | Count | Max Count (where applicable) | Comments |
1 | Invoice Extraction | K8s Cluster | Private Cluster | 1 | | Availability Zones have to be enabled and VMs distributed in various zones per node pool |
2 | Invoice Extraction | Dev Node Pool VMs | Standard_D8s_v3 | 3 | 4 | |
3 | Invoice Extraction | Persist Node Pool VMs | Standard_D4s_v3 | 3 | 4 | |
4 | Invoice Extraction | GPU Node Pool VMs | Standard_NC6_Promo | 2 | 4 | |
5 | Invoice Extraction | OS Disks | Premium 128GB | 12 | | |
6 | Invoice Extraction | DataDisks - RabbitMQ | Premium 4GB | 3 | | |
7 | Invoice Extraction | DataDisks - MongoDB | Premium 32GB | 3 | | |
8 | Invoice Extraction | DataDisks - ElasticSearch Log | Premium 128GB | 3 | | |
9 | Invoice Extraction | DataDisks - ElasticSearch App | Premium 32GB | 3 | | |
10 | Invoice Extraction | DataDisks - Prometheus | Premium 64GB | 2 | | |
11 | Invoice Extraction | Azure MySQL | General Purpose, 2 vCore(s), 200 GB (Auto Grow) | 1 | | |
12 | Invoice Extraction | Storage Accounts | General Purpose v2, ZRS, Hot Tier, Encryption enabled | 2 | | |
13 | Invoice Extraction | KeyVault | Standard | 1 | | |
14 | Invoice Extraction | Disk Encryption Set | Customer-managed key | 1 | | |
15 | Common | FTP Server | Standard B2s (2 vcpus, 4 GiB memory) | 1 | | FTP server is used by all products in an environment, hence kept as common. In Prod this will be per customer |
Resources | Description |
Application Gateway | Layer 7 load balancer that manages traffic to applications deployed in the Kubernetes cluster |
Network Interface | Network interface associated with the Bastion VM |
Network Security Group | The network security rules associated with each product (SmartOps Platform, ITOps, IE) and the maintenance NSG. The Bastion VM uses the maintenance NSG to access the Kubernetes cluster. |
Public IP | Public IP associated with the Application Gateway and the Bastion VM |
Private DNS zones | There are 4 private endpoints per environment, served by 3 private DNS zones: one private zone each for Azure Blob, Azure Key Vault and Azure MySQL instances. |
VM | Bastion VM that can be used to access the Kubernetes cluster |
VNET | SmartOps virtual network where all applications are deployed |
VNET and Subnet | Address Space | Description |
smartops_vnet_address_space | 17.1.0.0/16 | IP range for SmartOps VNET |
platform_snet_address_space | 17.1.0.0/20 | Address space of SmartOps Platform subnet |
invoiceext_snet_address_space | 17.1.16.0/20 | Address space of PWF Invoice Extraction subnet |
itops_snet_address_space | 17.1.32.0/20 | Address space of ITOps subnet |
maintenance_snet_address_space | 17.1.48.0/24 | Address space of Maintenance subnet |
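When the address plan is adapted for another environment, it can be useful to confirm that every subnet still falls inside the VNET range. The sketch below does this with plain shell arithmetic; the helper names ip_to_int, cidr_range and cidr_contains are hypothetical and not part of the release package.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking an address plan.

# Converts a dotted IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Prints the first and last address (as integers) of a CIDR block.
cidr_range() {
  local base="${1%/*}" prefix="${1#*/}"
  local start size
  start=$(ip_to_int "$base")
  size=$(( 1 << (32 - $prefix) ))
  echo "$start $(( $start + $size - 1 ))"
}

# Succeeds when the subnet CIDR ($2) lies entirely inside the VNET CIDR ($1).
cidr_contains() {
  local outer inner
  outer=$(cidr_range "$1")
  inner=$(cidr_range "$2")
  [ "${inner% *}" -ge "${outer% *}" ] && [ "${inner#* }" -le "${outer#* }" ]
}

# Check the subnets from the table above against the VNET range.
for subnet in 17.1.0.0/20 17.1.16.0/20 17.1.32.0/20 17.1.48.0/24; do
  if cidr_contains 17.1.0.0/16 "$subnet"; then
    echo "$subnet is inside 17.1.0.0/16"
  else
    echo "$subnet is OUTSIDE 17.1.0.0/16"
  fi
done
```

The check covers containment only; it does not detect two subnets overlapping each other.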
Resources | Description |
Azure MySQL | Managed MySQL instance in Azure Cloud Infrastructure |
Azure Redis | Azure Redis cache |
Kubernetes Service | Azure Kubernetes Service for Invoice Extraction |
Azure Key Vault | Azure Key Vault instance where all Kubernetes secrets are configured |
Private endpoints | Azure private endpoints for Azure Blob, Azure MySQL and Azure Key Vault. A private endpoint enables a secure connection to Azure resources. |
Network Interface | Network interface associated with each private endpoint |
Storage Accounts | Azure storage accounts for the backup file store and app file store of Invoice Extraction |
Node Pool | Node Size | AutoScale Range | OS Disk Size |
devagentpool | Standard_D8s_v3 | 2 to 3 | 128 GiB |
gpupool | Standard_NC6_Promo | 1 to 3 | 128 GiB |
persistpool | Standard_D4s_v3 | 3 to 4 | 128 GiB |
In addition to the base and product infra, two isolated VMs are required.
The SmartOps DE team manages both of these VMs in the maintenance subnet. When a client requirement arises, the required VMs must be created in the client subnet.
Data Archival - SharePoint Location
Restore Checklist – SharePoint Location
Connect to cluster (Kube config configured)
The screenshots below show the commands used to connect to the Kubernetes cluster from the Bastion VM, for reference. These commands need to be executed from the Bastion VM.
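For orientation, a common way to configure kubeconfig for an AKS cluster and confirm connectivity is sketched below. This is an assumption about the general approach, not a transcription of the commands in the screenshots; the function name connect_to_cluster is hypothetical, and both arguments are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper; merges the AKS cluster credentials into the local
# kubeconfig and lists the nodes to confirm that the cluster is reachable.
# Intended to be run from the Bastion VM after az login.
connect_to_cluster() {
  local resource_group="$1" cluster_name="$2"
  az aks get-credentials \
    --resource-group "$resource_group" \
    --name "$cluster_name" || return 1
  kubectl get nodes
}

# Example usage (names are placeholders):
# connect_to_cluster "<resource_group>" "<aks_cluster_name>"
```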
#1 | Private endpoint IPs of Azure MySQL, storage accounts and Key Vault are not associated with the respective private link |
#2 | Azure ARM template deployment fails with status ‘Operation Timed Out’. This can be resolved by redeploying the template. |
#3 | Azure ARM template deployment ends with status ‘CONFLICT’. This can be resolved by redeploying the template. |
#4 | File copying issues to storage account blob containers because of the firewall policy restricted to ‘all networks’ |
#5 | Kubernetes cluster is not able to access the key vault because the correct access policies are not set |
#6 | Disk encryption set is not able to get the keys because it lacks the Get, Wrap and Unwrap permissions on the key stored in Azure Key Vault |
#7 | Deployment engineer is not able to access Azure Key Vault because the required access policies are not set in Azure Key Vault |
#8 | Bad Request status for storage account resources |
Refer to Private endpoints' IP associations.
This can occur intermittently because of latency or interruptions in the Azure APIs. No specific fix needs to be applied by the deployment engineer other than executing the deployment script again.
‘CONFLICT’ during deployments is observed only for NSGs and subnets. No specific fix needs to be applied by the deployment engineer other than executing the deployment script again.
Refer to Azure service principal needs Get permission to Key Vault secrets.
Refer to Disk encryption set, storage account and Azure MySQL instance key permissions.
Refer to User access to the Azure Key Vault instance.
Issue
Fix
Product | Container name | Missing probes | Comments |
PWF IE | du_pipeline | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
PWF IE | du_scheduler | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
PWF IE | du_tilt_correct | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
PWF IE | du_invoice_split | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
PWF IE | du_vespa | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
Platform | du_pipeline | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
Platform | du_scheduler | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
Platform | du_tilt_correct | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
Platform | du_invoice_split | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
Platform | du_vespa | readiness probe | Celery containers have only a liveness probe, per Sarf & Bijith |
ITOps | ticrrapp-server | Readiness & liveness probe | Per Rachel, this is not planned from the ITOps side, as the requirement was to move as is. |
ITOps | tensorflow-serving | Readiness & liveness probe | Per Rachel, this is not planned from the ITOps side, as the requirement was to move as is. |
Refer to the Prerequisites section of this document.
The domain name parameter can be left blank when executing deploy_base_infra.sh if the domain name should contain the environment name. This is handled by the ARM template.
E.g. if the environment name is dev01 and the domain name parameter is left blank, the ARM template creates the domain smartops-dev01.eastus.cloudapp.azure.com
It is also possible to set a custom domain by passing it as a parameter to the deploy_base_infra.sh script.
Log in to the Azure Portal and search for ‘Resource Groups’
Click ‘Add’
Select the correct subscription from the drop-down list, provide a resource group name and select the region. Then click ‘Review + Create’
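The portal steps above have a command-line equivalent, sketched here for cases where the resource group is created from Azure Cloud Shell or the Bastion VM. The function name create_resource_group is hypothetical; the example name and region are taken from the samples used elsewhere in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper; creates a resource group in the given region.
# Run az account set --subscription <subscription_id> first if the
# default subscription is not the target one.
create_resource_group() {
  local name="$1" location="$2"
  az group create --name "$name" --location "$location"
}

# Example usage:
# create_resource_group "rg-smartopsengg-dev-001" "eastus"
```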
Refer to AKS Node Pool Specifications.
Select the private endpoint of the Azure resource
Click on the network interface of the private endpoint
The IP address of the private endpoint can be found on the NIC of the resource's private endpoint.
The Azure MySQL password is externalised and can be passed as a parameter to the deploy_product_infra.sh script. One important point is to generate a strong password with alphanumeric characters.
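A minimal sketch of generating such a password in the shell is given below. The function name generate_mysql_password and the default length of 24 are assumptions for illustration; the loop retries until the candidate contains an uppercase letter, a lowercase letter and a digit.

```shell
#!/usr/bin/env bash
# Hypothetical helper; prints a random alphanumeric password of the requested
# length (default 24) containing at least one uppercase letter, one lowercase
# letter and one digit.
generate_mysql_password() {
  local length="${1:-24}" candidate
  while :; do
    # Keep only alphanumeric bytes from a block of random data, then trim.
    candidate=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-$length")
    [ "${#candidate}" -eq "$length" ] || continue
    if printf '%s' "$candidate" | LC_ALL=C grep -q '[A-Z]' &&
       printf '%s' "$candidate" | LC_ALL=C grep -q '[a-z]' &&
       printf '%s' "$candidate" | LC_ALL=C grep -q '[0-9]'; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
}

# Example usage; the result would be passed as <mysql_admin_password>:
# MYSQL_ADMIN_PASSWORD=$(generate_mysql_password 24)
```

Keeping the password alphanumeric also avoids shell quoting problems when it is passed as a positional argument to the script.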