SmartOps Platform v7.1 Infrastructure Deployment Process

Contents

  1. Important Terms
  2. Overview
  3. Architecture
  4. Prerequisites set up before downloading the package and deploying the infrastructure
  5. Scripts location for Base Infra and Product Infra deployment
  6. ARM Templates for Base Infra and Product Infra deployment
  7. Install Base Infrastructure
  8. Base Infrastructure Deployment validation
  9. Install Product Infrastructure
  10. Product Infrastructure Deployment validation
  11. Disk Encryption and Data Encryption
  12. Azure Disk Encryption in AKS
  13. Data Encryption of Azure Database for MySQL with a customer-managed key
  14. Data Encryption of Azure Storage account using Customer Managed Key
  15. Post Deployment Validations for Infrastructure
  16. Az MySQL Firewall policies
  17. Az MySQL Max Connection settings
  18. Azure service principal needs Get permission to Key Vault secrets
  19. Disk Encryption set, storage account, Azure MySQL instance’s Key permissions
  20. User’s access to Azure Key Vault instance
  21. Private Endpoint’s IP associations
  22. Node pools’ zone redundancy
  23. Kubernetes cluster connectivity
  24. FTP Server Creation for SmartOps Extraction
  25. APPENDIX
    1. SmartOps Platform Azure Resources Asset List
    2. SmartOps Platform Azure Resources Details
      1. Base Infra
      2. SmartOps VNET Address Space
      3. Product Infra
    3. AKS Node Pool Specification
    4. Additional Resources
    5. Smart Recovery Documentation
    6. ADFS Integration
    7. Data Archival & Restore documentation
    8. Connect to SmartOps Platform Kubernetes cluster from Bastion VM
  26. Known Issues during Infrastructure deployment and Resolutions
    1. Issues
    2. Resolutions
      1. Private Endpoint IP not associated with private link
      2. Azure ARM template deployment failures with status ‘Operation Timed Out’
      3. Azure ARM template deployment with status ‘CONFLICT’
      4. File copying issues to Storage account blob containers
      5. Kubernetes cluster not able to access key vault
      6. Disk Encryption set not able to get the keys
      7. Deployment Engineer not able to access Azure Key vault
      8. Bad Request Status for Storage Account Resources
  27. 7.1 Known Limitations
    1. Services which are not HA ready
    2. Services that do not have Kubernetes health checks configured
  28. FAQ

 

Important Terms

Base Infra

The base infrastructure that must be created before the product infrastructure is deployed.

Product Infra 

The Kubernetes infrastructure required for the respective product.

Overview

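Welcome to the Kubernetes infrastructure deployment guide for SmartOps Platform 7.1. This document provides an overview of the flow of activities and the overall architecture for installing SmartOps Platform 7.1.
The main activities covered by this guide are:

  1. Set up the prerequisites and download the installation package
  2. Install the base infrastructure and validate the deployment
  3. Install the product infrastructure and validate the deployment
  4. Configure disk encryption and data encryption
  5. Perform the post-deployment validations for the infrastructure

To get a deeper understanding of the installation, read the SmartOps 7.1 architecture section below.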

Architecture 

SmartOps Platform 7.1 is a distributed cloud application that can be installed only on cloud infrastructure, where it is deployed and managed in Kubernetes. Although becoming cloud agnostic is on the roadmap, the installation is currently compatible with, and tested only on, Microsoft Azure. Support for more clouds will follow in subsequent releases.
The diagram below provides an overview.

Picture 5

 

Prerequisites set up before downloading the package and deploying the infrastructure

Prerequisites

Azure CLI

The Azure CLI must be installed on the Bastion VM if the infrastructure deployment scripts are executed from the Bastion VM.

Resource Group

An Azure resource group must be created before executing the infrastructure deployment scripts; its name is passed as a parameter to the infrastructure deployment scripts.

Service Principal

The Azure service principal client ID and client secret must be passed as parameters to the infrastructure deployment scripts.

Azure MySQL Password

A strong password must be generated for Azure MySQL and passed as a parameter to the product infra deployment script.

Az Login

Before downloading the package, log in with az login using an Azure AD account.
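The prerequisites above can be checked before any deployment script is run. The sketch below assumes bash and the Azure CLI; `check_prerequisites` is a hypothetical helper, not a script shipped in the package.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the prerequisites listed above.
# Usage: check_prerequisites <subscription_id> <resource_group>
check_prerequisites() {
  local subscription_id="$1" resource_group="$2"

  # The Azure CLI must be available (Cloud Shell has it; a Bastion VM may not).
  if ! command -v az >/dev/null 2>&1; then
    echo "ERROR: Azure CLI (az) is not installed" >&2
    return 1
  fi

  # 'az account show' fails when there is no active 'az login' session.
  if ! az account show >/dev/null 2>&1; then
    echo "ERROR: not logged in, run 'az login' first" >&2
    return 1
  fi

  # Make the target subscription the active one for the following commands.
  az account set --subscription "$subscription_id" || return 1

  # The resource group must exist before the deployment scripts are executed.
  if [ "$(az group exists --name "$resource_group")" != "true" ]; then
    echo "ERROR: resource group '$resource_group' does not exist" >&2
    return 1
  fi

  echo "Prerequisites OK"
}
```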

 

Download the installation package in cloud shell

 

The release packages are stored in SharePoint and in Azure Artifacts. Follow the steps below to download the Azure infrastructure installation scripts to Azure Cloud Shell.

 

Open Azure Cloud Shell.

 

Picture 6

Run az login in Cloud Shell.

 

Picture 25

 

The package downloaded from SharePoint must be made available in Azure Cloud Shell.

SharePoint location

https://ustglobal.sharepoint.com/teams/InnovationEngineering/Shared%20Documents/Forms/AllItems.aspx?viewid=f349a736%2D8a62%2D467f%2D8448%2D067be464bd59&id=%2Fteams%2FInnovationEngineering%2FShared%20Documents%2FKnowledge%20Management%2FSmartOps%20Deployment 

 

After the package is available in Cloud Shell, the deployment scripts and ARM templates can be found in the locations described below.

Scripts location for Base Infra and Product Infra deployment

After the package is downloaded, the scripts for installing the base and product infra are available in the following folder:

<package_folder_name>/azure-setup/scripts

 

Picture 2

 

ARM Templates for Base Infra and Product Infra deployment

After the package is downloaded, the ARM templates for installing the base and product infra are available in the following folder:

<package_folder_name>/azure-setup/arm-templates

 

Picture 11

Install Base Infrastructure

From the <package_folder_name>/azure-setup/scripts folder, run:

./deploy_base_infra.sh <subscription_id> <resource_group> <environment> <base_ip_seg> <sp_client_id> <tenant_id> <sp_client_secret> <smartops_domain_name>

 

Parameters

| Parameter | Description |
|---|---|
| subscription_id | Azure subscription ID where the base infra is deployed |
| resource_group | Azure resource group where the base infra is deployed |
| environment | Environment name, restricted to a maximum of 5 characters, e.g. dev02, stg01 |
| base_ip_seg | Base IP segment of the environment’s IP range |
| sp_client_id | Azure service principal client ID |
| tenant_id | Azure tenant ID |
| sp_client_secret | Azure service principal client secret |
| smartops_domain_name | Can be left blank; DNS creation is then handled by the ARM template |
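Because the environment name is limited to 5 characters and the domain name may be left blank, a small wrapper can catch mistakes before the ARM deployment starts. This is a hypothetical sketch: `run_base_infra` and the `DEPLOY_BASE_SCRIPT` override are illustrative names, and the argument order follows the deploy_base_infra.sh command shown above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper that validates the documented parameters before calling
# deploy_base_infra.sh. DEPLOY_BASE_SCRIPT is an illustrative override
# (default: ./deploy_base_infra.sh, run from azure-setup/scripts).
# Usage: run_base_infra <subscription_id> <resource_group> <environment> <base_ip_seg> \
#          <sp_client_id> <tenant_id> <sp_client_secret> [smartops_domain_name]
run_base_infra() {
  if [ "$#" -lt 7 ] || [ "$#" -gt 8 ]; then
    echo "ERROR: expected 7 or 8 arguments, got $#" >&2
    return 1
  fi

  local environment="$3"
  # The guide limits the environment name to 5 characters (e.g. dev02, stg01).
  if [ -z "$environment" ] || [ "${#environment}" -gt 5 ]; then
    echo "ERROR: environment name '$environment' must be 1 to 5 characters" >&2
    return 1
  fi

  # Pass an explicit empty string when no domain is given, so that the ARM
  # template creates the DNS name itself.
  local domain="${8:-}"
  "${DEPLOY_BASE_SCRIPT:-./deploy_base_infra.sh}" "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$domain"
}

# Example (all values are placeholders):
# run_base_infra "$SUBSCRIPTION_ID" "$RESOURCE_GROUP" dev02 "$BASE_IP_SEG" \
#   "$SP_CLIENT_ID" "$TENANT_ID" "$SP_CLIENT_SECRET" ""
```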

 

 

Base Infrastructure Deployment validation

 

In the Azure portal, open the respective resource group, select Deployments and look for ‘base deployment’ to check the status of the base infrastructure deployment.
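The same status can be read with the Azure CLI, which also works for the product infrastructure deployment. A sketch only: `check_deployment` is a hypothetical helper, and the deployment name passed to it is whatever name appears in the Deployments list (the exact names used by the ARM templates are not stated here).

```bash
#!/usr/bin/env bash
# Sketch: list all ARM deployments in a resource group with their state, then
# succeed only if the named deployment reached 'Succeeded'.
# Usage: check_deployment <resource_group> <deployment_name>
check_deployment() {
  local rg="$1" name="$2" state
  az deployment group list --resource-group "$rg" \
    --query "[].{name:name, state:properties.provisioningState}" -o table
  state=$(az deployment group show --resource-group "$rg" --name "$name" \
    --query properties.provisioningState -o tsv) || return 1
  echo "Deployment '$name': $state"
  [ "$state" = "Succeeded" ]
}
```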

 

Picture 20

 

Install Product Infrastructure

Once the package is downloaded, cd into <package_folder_name>/azure-setup/scripts folder.

 

Run the deploy_product_infra.sh script in Azure Cloud Shell, passing the required parameters listed below.

 

Parameters

| Parameter | Description |
|---|---|
| subscription_id | Azure subscription ID where the product infra is deployed |
| resource_group | Azure resource group where the product infra is deployed |
| product | Name of the product, e.g. smartops |
| aksNodeImageVersion | OS image version of the K8s cluster nodes; setting it to the latest version is recommended |
| sp_client_id | Azure service principal client ID |
| tenant_id | Azure tenant ID |
| sp_client_secret | Azure service principal client secret |
| mysql_admin_password | Admin password for the Azure MySQL instance |

 

Product Infrastructure Deployment validation

 

In the Azure portal, open the respective resource group, select Deployments and check the status of the product infrastructure deployment.

Picture 28

Disk Encryption and Data Encryption

Data at rest on Azure disks is encrypted using disk encryption sets. Data stored in Azure resources such as Azure MySQL and Azure Storage accounts is encrypted using keys stored in Azure Key Vault.

 

Azure Disk Encryption in AKS

Reference: https://docs.microsoft.com/en-us/azure/aks/azure-disk-customer-managed-keys

# Create a disk encryption set.
# Change the key vault name, resource group, region, etc. accordingly.
# The key key-smartops-k8s-disk-enc-001 (example name) must be created in Azure Key Vault before the disk encryption set is created.

keyVaultId=$(az keyvault show --name kv-engg-resrch-001 --query id -o tsv)

keyVaultKeyUrl=$(az keyvault key show --vault-name kv-engg-resrch-001 --name key-smartops-k8s-disk-enc-001 --query key.kid -o tsv)

az disk-encryption-set create -n smartops-k8s-des-001 -l eastus -g rg-smartopsengg-dev-001 --source-vault "$keyVaultId" --key-url "$keyVaultKeyUrl"

 

 

Azure Cloud Shell:

Picture 37

Ensure that the disk encryption set has Get, Wrap Key and Unwrap Key permissions on the key created in Azure Key Vault.
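The same permissions can be granted from the CLI by giving the disk encryption set's managed identity a key vault access policy, following the pattern in the Microsoft reference above. A sketch: `grant_des_key_access` is a hypothetical helper, and the example names are the ones used in the commands above.

```bash
#!/usr/bin/env bash
# Sketch: grant the disk encryption set's managed identity Get, Wrap Key and
# Unwrap Key on the key vault (roughly what the portal "allow access" prompt does).
# Usage: grant_des_key_access <resource_group> <des_name> <key_vault_name>
grant_des_key_access() {
  local rg="$1" des="$2" vault="$3" principal_id
  # The DES is created with a system-assigned identity; look up its object ID.
  principal_id=$(az disk-encryption-set show --name "$des" --resource-group "$rg" \
    --query identity.principalId -o tsv) || return 1
  if [ -z "$principal_id" ]; then
    echo "ERROR: no managed identity found on disk encryption set '$des'" >&2
    return 1
  fi
  az keyvault set-policy --name "$vault" --object-id "$principal_id" \
    --key-permissions get wrapKey unwrapKey
}

# Example: grant_des_key_access rg-smartopsengg-dev-001 smartops-k8s-des-001 kv-engg-resrch-001
```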

 

 

Picture 38

 

See des-platform-qa01 in the screenshot above for an example.


IMPORTANT: After creating the disk encryption set, select it and click the option to allow access to the disk encryption key created in the key vault, as shown in the screenshot below.

 

K8s storage class

# Currently kept as part of the env-setup template; can be changed as required.
# Change the diskEncryptionSetID value accordingly (subscription, resource group, disk encryption set name).
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: pvc-ade-custom-storage-class
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Managed
  skuname: Premium_LRS
  diskEncryptionSetID: "/subscriptions/dfaa090f-c407-4e75-ac08-143cb932bdcf/resourceGroups/rg-smartopsengg-dev-001/providers/Microsoft.Compute/diskEncryptionSets/smartops-k8s-des-001"

 
After deploying the storage class, update the PVCs of the StatefulSets so that they reference the custom storage class above.
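The PVC change described above can be sketched as a claim that names the custom storage class. This is an illustrative sketch: `render_pvc` and the claim name are hypothetical, and the 32Gi size is taken from the MongoDB data disk in the asset list.

```bash
#!/usr/bin/env bash
# Sketch: render a PersistentVolumeClaim that uses the encrypted storage class.
# Usage: render_pvc <claim_name> <size>    e.g. render_pvc data-mongodb-0 32Gi
render_pvc() {
  local name="$1" size="$2"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: pvc-ade-custom-storage-class
  resources:
    requests:
      storage: ${size}
EOF
}

# Example: render_pvc data-mongodb-0 32Gi | kubectl apply -f -
```

For StatefulSets the same storageClassName line goes under spec.volumeClaimTemplates[].spec. An existing PVC cannot switch storage class in place, so volumes created before this change would need to be recreated or migrated.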

Data Encryption of Azure Database for MySQL with a customer-managed key 

 References:  
https://docs.microsoft.com/en-us/azure/mysql/howto-data-encryption-portal 

https://docs.microsoft.com/en-us/azure/mysql/concepts-data-encryption-mysql 

 

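Key Encryption Key [ KEK ]

An encryption key used to encrypt the DEKs. Here, the KEK is the customer-managed key stored in Azure Key Vault.

Data Encryption Key [ DEK ]

Symmetric key used to encrypt a block of data.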

  

When data encryption is configured with a customer-managed key in Key Vault, the server needs continuous access to this key to stay online. If the server loses access to the customer-managed key in Key Vault, it begins denying all connections within 10 minutes, issues a corresponding error message and changes its state to Inaccessible. Some of the reasons the server can reach this state are: the key is deleted, disabled or expired; the key vault is deleted; or the server’s Get, Wrap Key and Unwrap Key permissions on the key vault are revoked.
 

 

Limitations 

 

Steps 

Picture 40 

 

Picture 41

Picture 42

 

Picture 43

Errors Observed while configuring

If soft delete is not enabled for the key vault, an error like the one below is returned:

Picture 44 

 

 

Picture 45
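The portal steps above can be sketched with the Azure CLI commands for Azure Database for MySQL single server. A sketch under assumptions: `enable_mysql_cmk` is a hypothetical helper and all resource names are placeholders. Soft delete is on by default for newly created vaults; purge protection has to be enabled explicitly.

```bash
#!/usr/bin/env bash
# Sketch: configure a customer-managed key (KEK) for an Azure MySQL single server.
# Usage: enable_mysql_cmk <resource_group> <mysql_server> <key_vault> <key_name>
enable_mysql_cmk() {
  local rg="$1" server="$2" vault="$3" key="$4" principal_id key_url

  # The vault must have soft delete and purge protection enabled.
  az keyvault update --name "$vault" --resource-group "$rg" \
    --enable-purge-protection true >/dev/null || return 1

  # Give the MySQL server a system-assigned managed identity and read its ID.
  az mysql server update --name "$server" --resource-group "$rg" \
    --assign-identity >/dev/null || return 1
  principal_id=$(az mysql server show --name "$server" --resource-group "$rg" \
    --query identity.principalId -o tsv) || return 1
  [ -n "$principal_id" ] || { echo "ERROR: no identity on '$server'" >&2; return 1; }

  # The server identity needs Get, Wrap Key and Unwrap Key on the vault.
  az keyvault set-policy --name "$vault" --object-id "$principal_id" \
    --key-permissions get wrapKey unwrapKey >/dev/null || return 1

  # Point the server at the key in Key Vault.
  key_url=$(az keyvault key show --vault-name "$vault" --name "$key" \
    --query key.kid -o tsv) || return 1
  az mysql server key create --name "$server" --resource-group "$rg" --kid "$key_url"
}
```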

 

Data Encryption of Azure Storage account using Customer Managed Key
 

Steps 
 

 

Picture 30

 

 

 

 

 

Picture 32

Select the key vault and key by clicking ‘Select a key vault and key’  

 

Picture 12

 

Picture 21

 

Picture 14
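A CLI equivalent of the storage account steps above could look like the sketch below. Assumptions: `enable_storage_cmk` is a hypothetical helper and names are placeholders. Omitting --encryption-key-version lets Azure follow the latest version of the key.

```bash
#!/usr/bin/env bash
# Sketch: configure a storage account to use a customer-managed key.
# Usage: enable_storage_cmk <resource_group> <storage_account> <key_vault> <key_name>
enable_storage_cmk() {
  local rg="$1" account="$2" vault="$3" key="$4" principal_id vault_uri

  # Assign a system-assigned identity to the storage account.
  principal_id=$(az storage account update --name "$account" --resource-group "$rg" \
    --assign-identity --query identity.principalId -o tsv) || return 1

  # The identity needs Get, Wrap Key and Unwrap Key on the key vault.
  az keyvault set-policy --name "$vault" --object-id "$principal_id" \
    --key-permissions get wrapKey unwrapKey >/dev/null || return 1

  vault_uri=$(az keyvault show --name "$vault" --query properties.vaultUri -o tsv) || return 1

  # Switch the account's encryption to the key in Key Vault.
  az storage account update --name "$account" --resource-group "$rg" \
    --encryption-key-source Microsoft.Keyvault \
    --encryption-key-vault "$vault_uri" \
    --encryption-key-name "$key"
}
```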

 

Post Deployment Validations for Infrastructure

 

After the base and product infrastructure have been deployed successfully using the ARM templates in the package, perform the validations below before proceeding with the application deployment.

  1. Az MySQL firewall policies
  2. Key vault permissions for the Azure AD user
  3. Azure service principal needs Get permission to Key Vault secrets
  4. Disk Encryption set, storage account and Azure MySQL instance need Get, Wrap Key and Unwrap Key permissions
  5. User access to Azure Key Vault: add users with the required set of key, secret and certificate permissions listed under the USER section
  6. Private Endpoint’s IP associations
  7. Node pools’ zone redundancy

Az MySQL Firewall policies

 

 

 

Picture 1
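The firewall settings shown in the screenshot above can also be listed from the CLI for comparison. A sketch using the Azure Database for MySQL single server commands; `show_mysql_network_rules` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print the network settings of the Azure MySQL server so they can be
# compared with the expected firewall policy.
# Usage: show_mysql_network_rules <resource_group> <mysql_server>
show_mysql_network_rules() {
  local rg="$1" server="$2"
  echo "Public network access:"
  az mysql server show --resource-group "$rg" --name "$server" \
    --query publicNetworkAccess -o tsv
  echo "Firewall rules:"
  az mysql server firewall-rule list --resource-group "$rg" --server-name "$server" -o table
  echo "VNet rules:"
  az mysql server vnet-rule list --resource-group "$rg" --server-name "$server" -o table
}
```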

 

 

Az MySQL Max Connection settings

 

The max_connections value for Azure MySQL is 300 by default. Set it to 500.

Picture 8
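The same change can be made and verified from the CLI. A sketch with the Azure Database for MySQL single server commands; `set_mysql_max_connections` is a hypothetical helper, and the server only accepts values within the limits of its pricing tier.

```bash
#!/usr/bin/env bash
# Sketch: set max_connections (default 500 here) and read the value back.
# Usage: set_mysql_max_connections <resource_group> <mysql_server> [value]
set_mysql_max_connections() {
  local rg="$1" server="$2" value="${3:-500}" current
  az mysql server configuration set --resource-group "$rg" --server-name "$server" \
    --name max_connections --value "$value" >/dev/null || return 1
  current=$(az mysql server configuration show --resource-group "$rg" \
    --server-name "$server" --name max_connections --query value -o tsv) || return 1
  echo "max_connections = $current"
  [ "$current" = "$value" ]
}
```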

Key vault permissions for the Azure AD user. 

Picture 23

 

 

 

 

Azure service principal needs Get permission to Key Vault secrets

 

Picture 7
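The access policy in the screenshot above can be set from the CLI. A sketch: `grant_sp_secret_get` is a hypothetical helper, and the service principal is identified by the same client ID that is passed to the deployment scripts.

```bash
#!/usr/bin/env bash
# Sketch: give the service principal Get permission on Key Vault secrets.
# Usage: grant_sp_secret_get <key_vault> <sp_client_id>
grant_sp_secret_get() {
  local vault="$1" sp_client_id="$2"
  if [ -z "$vault" ] || [ -z "$sp_client_id" ]; then
    echo "Usage: grant_sp_secret_get <key_vault> <sp_client_id>" >&2
    return 1
  fi
  # --spn takes a service principal name; the client (application) ID works here.
  az keyvault set-policy --name "$vault" --spn "$sp_client_id" --secret-permissions get
}
```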

 

Disk Encryption set, storage account, Azure MySQL instance’s Key permissions. 

The Disk Encryption Set, Storage Account and Azure MySQL instance need Get, Wrap Key and Unwrap Key permissions.

Refer to the example below for the disk encryption set.

Picture 26
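To check these permissions without the portal, the vault's access policies can be queried for the identity's object ID (for example the value of identity.principalId of the disk encryption set, storage account or MySQL server). A sketch that assumes the vault uses access policies rather than Azure RBAC; `check_key_permissions` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: confirm that an identity has Get, Wrap Key and Unwrap Key in the
# key vault access policies.
# Usage: check_key_permissions <key_vault> <principal_object_id>
check_key_permissions() {
  local vault="$1" object_id="$2" perms p
  perms=$(az keyvault show --name "$vault" \
    --query "properties.accessPolicies[?objectId=='$object_id'].permissions.keys[]" \
    -o tsv) || return 1
  for p in get wrapKey unwrapKey; do
    # Case-insensitive whole-line match, since permission casing can differ.
    if ! printf '%s\n' "$perms" | grep -qix "$p"; then
      echo "MISSING key permission '$p' for $object_id" >&2
      return 1
    fi
  done
  echo "OK: $object_id has get, wrapKey and unwrapKey"
}
```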

 

User’s access to Azure Key Vault instance

 

Enable user access by adding users with the required set of key, secret and certificate permissions, as listed under the USER section.

Please refer below screenshot.
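A user can also be added from the CLI by user principal name. The permission lists in this sketch are an assumption for illustration only; use the set listed under the USER section. `grant_user_vault_access` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: add an access policy for an Azure AD user.
# Usage: grant_user_vault_access <key_vault> <user_principal_name>
grant_user_vault_access() {
  local vault="$1" upn="$2"
  # ASSUMPTION: example permission set; replace with the USER section's list.
  az keyvault set-policy --name "$vault" --upn "$upn" \
    --key-permissions get list \
    --secret-permissions get list \
    --certificate-permissions get list
}
```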

 

Private Endpoint’s IP associations 

 

Key Vault Private Endpoint and its Private IP 

Picture 46

 

Picture 47

Update Private IP in Private Link  

Refer to the screenshots below when the private endpoint’s IP is not associated with the private link.

Picture 48

 

Picture 49

 

 

Picture 50

 

 

Picture 51
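The association can also be checked and fixed on the private DNS zone record from the CLI. A sketch under assumptions: `ensure_private_dns_record` is a hypothetical helper; the record name is normally the resource name (key vault, storage account or MySQL server); the zones would typically be privatelink.vaultcore.azure.net, privatelink.blob.core.windows.net and privatelink.mysql.database.azure.com.

```bash
#!/usr/bin/env bash
# Sketch: make sure the private DNS A record for a resource contains its
# private endpoint IP.
# Usage: ensure_private_dns_record <resource_group> <zone_name> <record_name> <private_ip>
ensure_private_dns_record() {
  local rg="$1" zone="$2" record="$3" ip="$4" current
  # Current A record IPs; empty if the record set does not exist yet.
  current=$(az network private-dns record-set a show --resource-group "$rg" \
    --zone-name "$zone" --name "$record" --query "aRecords[].ipv4Address" -o tsv 2>/dev/null) || true
  if printf '%s\n' "$current" | grep -qxF "$ip"; then
    echo "$record.$zone already resolves to $ip"
    return 0
  fi
  # add-record creates the record set if it is missing. A stale IP, if any,
  # still has to be removed with 'az network private-dns record-set a remove-record'.
  echo "Adding $ip to $record.$zone"
  az network private-dns record-set a add-record --resource-group "$rg" \
    --zone-name "$zone" --record-set-name "$record" --ipv4-address "$ip"
}
```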

Node pools’ zone redundancy  

 

Picture 16
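Zone distribution can be read per node pool with the CLI. A sketch: `check_nodepool_zones` is a hypothetical helper, and treating fewer than two zones as "not zone redundant" is an assumption for illustration. The node pool names come from the AKS Node Pool Specification in the Appendix.

```bash
#!/usr/bin/env bash
# Sketch: report the availability zones of each node pool and fail if any pool
# is spread over fewer than two zones.
# Usage: check_nodepool_zones <resource_group> <aks_cluster> <nodepool>...
check_nodepool_zones() {
  local rg="$1" cluster="$2" pool zones rc=0
  local -a zone_list
  shift 2
  for pool in "$@"; do
    zones=$(az aks nodepool show --resource-group "$rg" --cluster-name "$cluster" \
      --name "$pool" --query availabilityZones -o tsv) || return 1
    # Word-split the tsv output into an array of zone numbers.
    zone_list=($zones)
    if [ "${#zone_list[@]}" -ge 2 ]; then
      echo "$pool: zones ${zone_list[*]}"
    else
      echo "$pool: NOT zone redundant (zones: ${zone_list[*]:-none})"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_nodepool_zones <resource_group> <aks_cluster> devagentpool persistpool gpupool
```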

 

 

Kubernetes cluster connectivity

After deploying all Azure resources for the base and product infrastructure, and before triggering the application deployment script, validate connectivity from the Bastion VM to the SmartOps Platform Kubernetes cluster.

Refer to Connect to SmartOps Platform Kubernetes cluster from Bastion VM in the Appendix.

 

FTP Server Creation for SmartOps Extraction


Use the attached script below after substituting the required names for the VNET, resource group, etc.

 

APPENDIX

    1. SmartOps Platform Azure Resources Asset List
    2. SmartOps Platform Azure Resources Details
    3. AKS Node Pool Specification
    4. Additional Resources
    5. Smart Recovery Documentation
    6. ADFS Integration
    7. Data Archival & Restore documentation
    8. Connect to SmartOps Platform Kubernetes cluster from Bastion VM

 

 

 

SmartOps Platform Azure Resources Asset List

 

| Sl No | Product / PWF | Asset Name | SKU / Tier | Count | Max Count (where applicable) | Comments |
|---|---|---|---|---|---|---|
| 1 | Platform | K8s Cluster | Private Cluster | 1 | | Availability Zones have to be enabled and the VMs of each node pool distributed across zones |
| 2 | Platform | Dev Node Pool VMs | Standard_D8s_v3 | 3 | 5 | |
| 3 | Platform | Persist Node Pool VMs | Standard_D4s_v3 | 3 | 4 | |
| 4 | Platform | GPU Node Pool VMs | Standard_NC6_Promo | 2 | 4 | |
| 5 | Platform | OS Disks | Premium 128 GB | 12 | | |
| 6 | Platform | Data Disks - RabbitMQ | Premium 4 GB | 3 | | |
| 7 | Platform | Data Disks - MongoDB | Premium 32 GB | 3 | | |
| 8 | Platform | Data Disks - Elasticsearch Log | Premium 128 GB | 3 | | |
| 9 | Platform | Data Disks - Elasticsearch App | Premium 32 GB | 3 | | |
| 10 | Platform | Data Disks - Prometheus | Premium 64 GB | 2 | | |
| 11 | Platform | Azure MySQL | General Purpose, 2 vCores, 200 GB (auto-grow) | 1 | | |
| 12 | Platform | Storage Accounts | General Purpose v2, ZRS, Hot tier, encryption enabled; usage approx. 2 TB/month | 2 | | |
| 13 | Platform | Key Vault | Standard | 1 | | |
| 14 | Platform | Disk Encryption Set | Customer-managed key | 1 | | |
| 15 | Common | FTP Server | Standard B2s (2 vCPUs, 4 GiB memory) | 1 | | The FTP server is used by all products in an environment, hence listed as common. In Prod it is per customer. |

 

SmartOps Platform Azure Resources Details

 

Base Infra

 

Picture 53

 

| Resources | Description |
|---|---|
| Application Gateway | Layer 7 load balancer that manages traffic to the applications deployed in the Kubernetes cluster |
| Network Interface | Network interface associated with the Bastion VM |
| Network Security Group | Network security rules for each product (SmartOps Platform, ITOps, IE) plus a maintenance NSG. The Bastion VM uses the maintenance NSG to access the Kubernetes cluster. |
| Public IP | Public IPs associated with the Application Gateway and the Bastion VM |
| Private DNS zones | There are 4 private endpoints per environment, served by 3 private DNS zones: one each for Azure Blob, Azure Key Vault and Azure MySQL |
| VM | Bastion VM used to access the Kubernetes cluster |
| VNET | SmartOps virtual network where all applications are deployed |

 

SmartOps VNET Address Space

| VNET and Subnet | Address Space | Description |
|---|---|---|
| smartops_vnet_address_space | 17.1.0.0/16 | IP range for the SmartOps VNET |
| platform_snet_address_space | 17.1.0.0/20 | Address space of the SmartOps Platform subnet |
| invoiceext_snet_address_space | 17.1.16.0/20 | Address space of the PWF Invoice Extraction subnet |
| itops_snet_address_space | 17.1.32.0/20 | Address space of the ITOps subnet |
| maintenance_snet_address_space | 17.1.48.0/24 | Address space of the Maintenance subnet |
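When the address plan is adapted for another environment, it is worth confirming that every subnet fits inside the VNET and that no two subnets overlap. A pure-bash sketch for IPv4; all function names are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: CIDR containment and overlap checks using bash integer arithmetic.

# ip_to_int <a.b.c.d>: print the address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# cidr_range <a.b.c.d/n>: print "first last" address of the block as integers.
cidr_range() {
  local ip="${1%/*}" bits="${1#*/}" start size
  start=$(ip_to_int "$ip")
  size=$(( 1 << (32 - bits) ))
  start=$(( start & ~(size - 1) ))   # clear the host bits
  echo "$start $(( start + size - 1 ))"
}

# cidr_contains <outer> <inner>: succeed if inner lies entirely within outer.
cidr_contains() {
  local os oe is ie
  read -r os oe <<< "$(cidr_range "$1")"
  read -r is ie <<< "$(cidr_range "$2")"
  [ "$is" -ge "$os" ] && [ "$ie" -le "$oe" ]
}

# cidr_overlap <a> <b>: succeed if the two blocks share any address.
cidr_overlap() {
  local as ae bs be
  read -r as ae <<< "$(cidr_range "$1")"
  read -r bs be <<< "$(cidr_range "$2")"
  [ "$as" -le "$be" ] && [ "$bs" -le "$ae" ]
}

# validate_subnets <vnet_cidr> <subnet_cidr>...
validate_subnets() {
  local vnet="$1" i j
  shift
  local -a s=("$@")
  for ((i = 0; i < ${#s[@]}; i++)); do
    if ! cidr_contains "$vnet" "${s[i]}"; then
      echo "${s[i]} is outside $vnet"
      return 1
    fi
    for ((j = i + 1; j < ${#s[@]}; j++)); do
      if cidr_overlap "${s[i]}" "${s[j]}"; then
        echo "${s[i]} overlaps ${s[j]}"
        return 1
      fi
    done
  done
  echo "OK"
}

# Example with the address plan above:
# validate_subnets 17.1.0.0/16 17.1.0.0/20 17.1.16.0/20 17.1.32.0/20 17.1.48.0/24
```

The deployed prefixes can be listed for comparison with `az network vnet subnet list --resource-group <rg> --vnet-name <vnet> --query "[].addressPrefix" -o tsv`.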

 

Product Infra

 

Picture 13

 

| Resources | Description |
|---|---|
| Azure MySQL | Managed MySQL instance in Azure cloud infrastructure |
| Azure Redis | Azure Redis cache |
| Kubernetes Service | Azure Kubernetes Service for SmartOps Platform |
| Azure Key Vault | Azure Key Vault instance where all Kubernetes secrets are configured |
| Private endpoints | Azure private endpoints for Azure Blob, Azure MySQL and Azure Key Vault. A private endpoint enables a secure connection to the Azure resource. |
| Network Interface | Network interface associated with each private endpoint |
| Storage Accounts | Azure storage accounts for the backup file store and app file store of SmartOps Platform |

 

AKS Node Pool Specification

| Node Pool | Node Size | AutoScale Range | OS Disk Size |
|---|---|---|---|
| devagentpool | Standard_D8s_v3 | 3 to 5 | 128 GiB |
| gpupool | Standard_NC6_Promo | 2 to 4 | 128 GiB |
| persistpool | Standard_D4s_v3 | 3 to 4 | 128 GiB |

 

Additional Resources

In addition to the base and product infra, two isolated VMs are required:

    1. TCP Proxy VM for IHubLite
    2. FTP server

The SmartOps DE team manages both VMs in the maintenance subnet. When a client requires it, the VMs must be created in the client subnet instead.

Smart Recovery Documentation


SharePoint Location

 

ADFS Integration

SharePoint Location

Data Archival & Restore documentation

Data Archival - SharePoint Location

Restore Checklist – SharePoint Location

 

Connect to SmartOps Platform Kubernetes cluster from Bastion VM

Connect to cluster (Kube config configured)  

Picture 34

The screenshots below show the commands for connecting to the Kubernetes cluster from the Bastion VM. These commands must be executed on the Bastion VM.
Picture 35
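The connection steps can be sketched as follows, assuming the Azure CLI and kubectl are installed on the Bastion VM; `connect_aks` is a hypothetical helper. Because the cluster is private, its API server is only reachable from inside the VNET.

```bash
#!/usr/bin/env bash
# Sketch: configure kubectl on the Bastion VM and confirm connectivity.
# Usage: connect_aks <resource_group> <aks_cluster_name>
connect_aks() {
  local rg="$1" cluster="$2"
  # Merge the cluster credentials into ~/.kube/config.
  az aks get-credentials --resource-group "$rg" --name "$cluster" --overwrite-existing || return 1
  # Listing the nodes proves both network connectivity and authentication.
  kubectl get nodes -o wide
}
```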

Known Issues during Infrastructure deployment and Resolutions.

Issues

  1. Private Endpoint IP of Azure MySQL, Storage accounts and key vault not associated with the respective private link.
  2. Azure ARM template deployment failures with status ‘Operation Timed Out’. This can be resolved by redeploying the template.
  3. Azure ARM template deployment with status ‘CONFLICT’. This can be resolved by redeploying the template.
  4. File copying issues to Storage account blob containers because of the firewall policy restricting access from ‘all networks’.
  5. Kubernetes cluster not able to access key vault because the correct access policies are not set.
  6. Disk Encryption set not able to get the keys because it does not have Get, Wrap Key and Unwrap Key permissions on the key stored in Azure Key Vault.
  7. Deployment Engineer not able to access Azure Key vault because the required access policies are not set in Azure Key Vault.
  8. Bad Request Status for Storage Account Resources.

 

Resolutions

 

Private Endpoint IP not associated with private link.

Refer to Private Endpoint’s IP associations.

 

Azure ARM template deployment failures with status ‘Operation Timed Out’.

This can occur intermittently because of latency or interruptions in the Azure APIs. No specific fix is required from the deployment engineer other than executing the deployment script again.

Azure ARM template deployment with status ‘CONFLICT’.

‘CONFLICT’ during deployments has been observed only for NSGs. No specific fix is required from the deployment engineer other than executing the deployment script again.
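Since the documented fix for both statuses is to run the deployment script again, the re-run can be automated with a simple retry loop. A sketch: `retry` is a hypothetical helper, and it is only appropriate to the extent the scripts are safe to re-run, which the guidance above implies.

```bash
#!/usr/bin/env bash
# Sketch: run a command up to <max_attempts> times, pausing between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example: retry 3 60 ./deploy_base_infra.sh <subscription_id> <resource_group> ...
```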

File copying issues to Storage account blob containers

Picture 56
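One way to unblock the copy from the CLI is to inspect the storage firewall's default action and allow the public IP of the machine doing the copy. This is a sketch of that option, not necessarily the fix shown in the screenshot above; `allow_client_ip` is a hypothetical helper and the IP is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: show the storage account firewall default action and allow one
# public IP through the firewall.
# Usage: allow_client_ip <resource_group> <storage_account> <public_ip>
allow_client_ip() {
  local rg="$1" account="$2" ip="$3"
  echo "Current default action: $(az storage account show --resource-group "$rg" \
    --name "$account" --query networkRuleSet.defaultAction -o tsv)"
  az storage account network-rule add --resource-group "$rg" \
    --account-name "$account" --ip-address "$ip"
}
```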

 

Kubernetes cluster not able to access key vault

Refer to Azure service principal needs Get permission to Key Vault secrets.

 

Disk Encryption set not able to get the keys

Refer to Disk Encryption set, storage account, Azure MySQL instance’s Key permissions.

Deployment Engineer not able to access Azure Key vault

Refer to User’s access to Azure Key Vault instance.

Bad Request Status for Storage Account Resources.

Issue

 

Picture 3

Fix

 

 

Picture 19

7.1 Known Limitations

Services which are not HA ready

 

Picture 18

 

 

 

 

Services that do not have Kubernetes health checks configured

 

| Product | Container name | Missing probes | Comments |
|---|---|---|---|
| PWF IE | du_pipeline | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| PWF IE | du_scheduler | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| PWF IE | du_tilt_correct | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| PWF IE | du_invoice_split | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| PWF IE | du_vespa | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| Platform | du_pipeline | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| Platform | du_scheduler | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| Platform | du_tilt_correct | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| Platform | du_invoice_split | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| Platform | du_vespa | Readiness probe | Celery containers have only a liveness probe (per Sarf & Bijith) |
| ITOps | ticrrapp-server | Readiness & liveness probes | Per Rachel, this is not planned on the ITOps side, as the requirement was to move the service as is |
| ITOps | tensorflow-serving | Readiness & liveness probes | Per Rachel, this is not planned on the ITOps side, as the requirement was to move the service as is |

 

 

 

 

FAQ

 

    1. What is the pre-requisite setup to be done before starting the Infrastructure deployment script?

      Refer to the Prerequisites section of this document.

    2. How to set the Domain Name when deploying base infrastructure?

The domain name parameter can be left blank when executing deploy_base_infra.sh if the domain name should contain the environment name; the ARM template takes care of creating it.

For example, if the environment name is dev01 and the domain name parameter is left blank, the ARM template creates the domain smartops-dev01.eastus.cloudapp.azure.com.

It is possible to set custom domains. The custom domain can be passed as a parameter to deploy_base_infra.sh script.
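The default name in the example follows a simple pattern that can be reproduced to predict the domain before deployment. A sketch: `default_domain` is a hypothetical helper, and it assumes the region part follows the deployment region (eastus in the example above).

```bash
#!/usr/bin/env bash
# Sketch of the naming pattern in the example above:
#   smartops-<environment>.<region>.cloudapp.azure.com
# ASSUMPTION: the region part follows the deployment region (default eastus).
# Usage: default_domain <environment> [region]
default_domain() {
  local env="$1" region="${2:-eastus}"
  echo "smartops-${env}.${region}.cloudapp.azure.com"
}

# Example: default_domain dev01    # smartops-dev01.eastus.cloudapp.azure.com
```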

 

    3. How to create an Azure Resource Group?

 

Log in to the Azure portal and search for ‘Resource Groups’.

Picture 62

Click ‘Add’

Picture 27

 

Select the correct Subscription from the drop-down list and provide a Resource group name and select the Region. Then click ‘Review + Create’

Picture 63
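The portal steps above have a direct CLI equivalent. A sketch: `create_resource_group` is a hypothetical helper that skips creation when the group already exists.

```bash
#!/usr/bin/env bash
# Sketch: create a resource group in the chosen subscription and region.
# Usage: create_resource_group <subscription_id> <resource_group> <region>
create_resource_group() {
  local subscription_id="$1" rg="$2" region="$3"
  az account set --subscription "$subscription_id" || return 1
  if [ "$(az group exists --name "$rg")" = "true" ]; then
    echo "Resource group '$rg' already exists"
    return 0
  fi
  az group create --name "$rg" --location "$region" \
    --query properties.provisioningState -o tsv
}
```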

    4. What is the disk capacity of the AKS node pools?

Refer to AKS Node Pool Specification in the Appendix.

 

    5. How to find the IP address of a Private Endpoint?

Select the private endpoint of the Azure resource.

Picture 33

Click on Network Interface of Private Endpoint

Picture 9

The IP address of the private endpoint can be found on the NIC of the resource’s private endpoint.

Picture 10
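The same lookup (private endpoint, then its NIC, then the NIC's private IP) can be scripted. A sketch: `private_endpoint_ip` is a hypothetical helper; JMESPath queries are case-sensitive, so the property name may need adjusting if a CLI version reports it with different casing.

```bash
#!/usr/bin/env bash
# Sketch: print the private IP of a private endpoint via its network interface.
# Usage: private_endpoint_ip <resource_group> <private_endpoint_name>
private_endpoint_ip() {
  local rg="$1" pe="$2" nic_id
  nic_id=$(az network private-endpoint show --resource-group "$rg" --name "$pe" \
    --query "networkInterfaces[0].id" -o tsv) || return 1
  az network nic show --ids "$nic_id" \
    --query "ipConfigurations[0].privateIPAddress" -o tsv
}
```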

 

    6. How to create the Azure MySQL password?

The Azure MySQL password is externalised and is passed as a parameter to the deploy_product_infra.sh script (Please refer). The important point is to generate a strong password with alphanumeric characters.
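A password of that kind can be generated and checked locally before it is passed to the script. A sketch: the helper names are hypothetical, and the check encodes the admin password rules documented for Azure Database for MySQL single server (8 to 128 characters, characters from at least three of: uppercase, lowercase, digits, non-alphanumeric). Azure additionally rejects passwords that contain the admin login name, which is not checked here.

```bash
#!/usr/bin/env bash
# check_mysql_password <password>: succeed if the password meets the rules above.
check_mysql_password() {
  local pw="$1" classes=0
  if [ "${#pw}" -lt 8 ] || [ "${#pw}" -gt 128 ]; then
    return 1
  fi
  [[ "$pw" =~ [[:upper:]] ]] && classes=$((classes + 1))
  [[ "$pw" =~ [[:lower:]] ]] && classes=$((classes + 1))
  [[ "$pw" =~ [[:digit:]] ]] && classes=$((classes + 1))
  [[ "$pw" =~ [^[:alnum:]] ]] && classes=$((classes + 1))
  [ "$classes" -ge 3 ]
}

# generate_mysql_password [length]: alphanumeric password (default 20 characters)
# drawn from /dev/urandom, regenerated until it passes the check above.
generate_mysql_password() {
  local length="${1:-20}" pw
  if [ "$length" -lt 8 ] || [ "$length" -gt 128 ]; then
    echo "ERROR: length must be between 8 and 128" >&2
    return 1
  fi
  while :; do
    pw=$(head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-$length")
    if [ "${#pw}" -eq "$length" ] && check_mysql_password "$pw"; then
      echo "$pw"
      return 0
    fi
  done
}
```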

 

    7. How to set the max connection parameter of Azure MySQL?

Please refer