Contents
Bastion VM | The virtual machine which has access to the Kubernetes API server
Base Infra | The base infrastructure which needs to be created before deploying the product infrastructure
Product Infra | The Kubernetes infrastructure required for the respective product
Smart Install | The holistic solution for deploying SmartOps applications in the Kubernetes infrastructure
Base infra and the infrastructure specific to the SmartOps Platform must be available in the environment. Refer to the infrastructure creation document (WIP) for an understanding of how to create the infrastructure.
The installation engineer has access to the bastion VM (a VM having visibility to the SmartOps Platform Kubernetes cluster).
The bastion VM is able to connect to the K8s API server of the SmartOps Platform (kubectl commands work).
Access to the SmartOps Platform Key Vault is allowed for the installation engineer, and the bastion VM can connect to the Key Vault (configured via the network firewall of the Key Vault).
The release packages are stored in a SharePoint location and in Azure Artifacts. Follow the steps below for downloading.
Primary download location: SharePoint
Navigate to sharepoint location: https://ustglobal.sharepoint.com/teams/InnovationEngineering/Shared%20Documents/Forms/AllItems.aspx?viewid=f349a736%2D8a62%2D467f%2D8448%2D067be464bd59&id=%2Fteams%2FInnovationEngineering%2FShared%20Documents%2FKnowledge%20Management%2FSmartOps%20Deployment
Open the required release folder (e.g. 7.1.0)
Download the product zip and move it to the target VM
After downloading from the SharePoint location, copy the package to the bastion VM for installation
Secondary download location: Azure Artifacts
Prerequisite: install the Azure CLI on the target deployment VM using the command below
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
Navigate to https://dev.azure.com/USTInnovationEngineering/SmartOps/_packaging?_a=feed&feed=Smartops_Releases
Click on the required package.
Please note: if it is the first time, you will be prompted to install the azure-devops extension. Enter 'Y' and hit Enter to continue.
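As an illustration, the package can then be pulled with the Azure DevOps CLI. This is a sketch only: it assumes the release is published as a universal package in the project-scoped Smartops_Releases feed; the package name and version must be taken from the Azure Artifacts page, and a different package type would need the corresponding download command instead.
az artifacts universal download --organization https://dev.azure.com/USTInnovationEngineering --project SmartOps --scope project --feed Smartops_Releases --name <package-name> --version <version> --path .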
Key vault permissions for the Azure AD user.
Private Endpoint’s IP associations
Key Vault Private Endpoint and its Private IP
The above Key Vault's private IP must be associated with the respective private link of the Key Vault
Update the private IP in the private link
Refer to the screenshot below when the private endpoint's IP is not associated with the private link.
If the record set is present but the IP is not associated, click the record set (here, Azure Key Vault - kv-platform-stg01)
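If it is easier to verify the association from the CLI than from the portal, a sketch like the following can be used; the resource group name is a placeholder, and privatelink.vaultcore.azure.net is the usual private DNS zone for Key Vault private endpoints.
# List private endpoints and their network interfaces in the (assumed) networking resource group
az network private-endpoint list -g <network-resource-group> -o table
# Inspect the A records for the Key Vault in the private DNS zone
az network private-dns record-set a list -g <network-resource-group> -z privatelink.vaultcore.azure.net -o table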
Install Tools: script to install prerequisite packages in bastion VM
chmod +x installbastiontools.sh
./installbastiontools.sh
Connect to cluster (Kube config configured)
Pre-check condition: verify that Python 3.6 is installed on the bastion VM. [SmartInstall runs on Python 3.6]
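A quick way to confirm both pre-checks from the bastion VM (a sketch; replace <cluster-context> with the context configured for the SmartOps Platform cluster):
python3 --version                              # should report Python 3.6.x
kubectl config get-contexts                    # the SmartOps Platform cluster context should be listed
kubectl --context <cluster-context> get nodes  # confirms connectivity to the K8s API server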
Before starting the deployment, the secrets must be set in Azure Key Vault using the create-az-kv-secrets.sh script in the kv-init folder, with the values updated to match the product infrastructure that was created.
< Include kv init sc in package >
Before executing the create-az-kv-secrets.sh script, the respective access policies must be set on the Key Vault for the Azure AD user.
Before executing the create-az-kv-secrets.sh script, the Key Vault's firewall should allow access from all networks. Refer to the screenshot below.
< Note: as an alternative, whitelist the IP through the firewall policy instead of allowing all networks. A CLI sketch for both options follows. >
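A CLI sketch covering both prerequisites; the Key Vault name, user principal name and IP address are placeholders, and the portal steps above remain the reference.
# Grant the Azure AD user secret permissions on the Key Vault
az keyvault set-policy --name <keyvault-name> --upn <user>@<domain> --secret-permissions get list set delete
# Option 1: allow access from all networks
az keyvault update --name <keyvault-name> --default-action Allow
# Option 2: whitelist only the bastion VM's public IP
az keyvault network-rule add --name <keyvault-name> --ip-address <bastion-public-ip>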
Azure Login from bastion VM
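Sign in with az login from the bastion VM; on a headless VM the device-code flow is usually the most convenient (a sketch):
az login --use-device-code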
Once signed in successfully, a message like the one below will appear in the browser.
az account set --subscription <subscription_id>
Platform 7.6
As part of the 7.6.1 CP upgrade, new secrets were added to Key Vault. Refer to kv-init/create-az-kv-secrets-prod.sh to run this as a script.
Azure Service Bus credentials for notification-framework set up for Messaging
<NAMESPACE>-azure-sb-nf-username with value <secret>
<NAMESPACE>-azure-sb-nf-password with value <secret>
rhub-user for rhub-config-manager-rest
<NAMESPACE>-rhub-config-username with value "configadmin"
<NAMESPACE>-rhub-config-password with value "nimdagifnoc"
RHub Orchestrator Access Keys
<NAMESPACE>-rhub-orchestrator-username with value "orchestrator"
<NAMESPACE>-rhub-orchestrator-password with value "rotartsehcro"
RHub Adapter- oauth_basic Access Keys
<NAMESPACE>-rhub-adapters-oauth-basic-username with value "oauth_basic"
<NAMESPACE>-rhub-adapters-oauth-basic-password with value "cisabhtuao"
RHub Adapter- scheduler_job Access Keys
<NAMESPACE>-rhub-adapters-scheduler-job-username with value "scheduler"
<NAMESPACE>-rhub-adapters-scheduler-job-password with value "reludehcs"
RHub Adapter- oauth_jwt Access Keys
<NAMESPACE>-rhub-adapters-oauth-jwt-username with value "oauth_jwt"
<NAMESPACE>-rhub-adapters-oauth-jwt-password with value "twjhtuao"
RHub Adapter- api_endpoint Access Keys
<NAMESPACE>-rhub-adapters-api-endpoint-username with value "api_endpoint"
<NAMESPACE>-rhub-adapters-api-endpoint-password with value "tniopdneipa"
RHub Adapter- xml_json Access Keys
<NAMESPACE>-rhub-adapters-xml-json-username with value "xml_json"
<NAMESPACE>-rhub-adapters-xml-json-password with value "nosjlmx"
RHub Adapter- queue Access Keys
<NAMESPACE>-rhub-adapters-queue-username with value "queue"
<NAMESPACE>-rhub-adapters-queue-password with value "eueuq"
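For illustration, the secrets listed above can be set with az keyvault secret set; the prefix and vault name below are assumptions (<product>-<environment> and the platform Key Vault), and kv-init/create-az-kv-secrets-prod.sh remains the authoritative script.
NAMESPACE=smartopsv1-stg01    # assumption: <product>-<environment> prefix
KEYVAULT=kv-platform-stg01    # assumption: platform Key Vault name
az keyvault secret set --vault-name "$KEYVAULT" --name "${NAMESPACE}-azure-sb-nf-username" --value "<sb-username>"
az keyvault secret set --vault-name "$KEYVAULT" --name "${NAMESPACE}-azure-sb-nf-password" --value "<sb-password>"
az keyvault secret set --vault-name "$KEYVAULT" --name "${NAMESPACE}-rhub-config-username" --value "configadmin"
az keyvault secret set --vault-name "$KEYVAULT" --name "${NAMESPACE}-rhub-config-password" --value "nimdagifnoc"
# ...repeat for the remaining RHub orchestrator and adapter secrets listed above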
Platform v7.5
As part of the Platform v7.5 release, the following two new variables are included in Key Vault; they are used by the clones-engine service. You may refer to kv-init/create-az-kv-secrets-prod.sh to run this as a script.
Platform v7.4
As part of the Platform 7.4 release, the following Key Vault additions were made for the clones engine (the secret value for the keys below is the same for all environments)
smartopsv1-stg01-jfrog-artifactory-username
smartopsv1-stg01-jfrog-artifactory-password
Smart Install uses an environment JSON file to install the respective application.
Before deployment, the environment JSON file needs to be updated as required.
A template of the environment JSON is available in the package.
Refer to the environment JSON keys, suggested values and details below; a minimal example follows the table.
Keys | Sub Keys | Sub Keys | Sub Keys | Suggested Values | Info
name | | | | stg01 | Name of the environment
product | | | | smartopsv1 | Name of the product to be deployed; JSON file name in the products folder
version | | | | 7.1.0 | Helm chart version
dnsName | | | | | DNS name of the environment
includeIngress | | | | true | Whether Ingress needs to be deployed or not
ingressIp | | | | | IP of the Ingress
isPrivateIngress | | | | true | For a private Kubernetes cluster, the internal traffic goes through the internal Kubernetes load balancer. Do not change this setting unless there is a specific use case.
gpuEnabled | | | | true | For Kubernetes clusters which need GPU node pools
helmRepoLocation | | | | ../charts | Helm repo location: either the smartops-helm repo or the charts folder inside the package
isProduction | | | | true | Debug logs will be enabled for DU containers based on the value provided here. For production this value should be true.
defaultAppReplicaCount | | | | 2 | Number of replicas of application containers
secretProvider | | | | | For managing Kubernetes secrets
| azure | | | | Provider is azure for a K8s cluster deployed in Azure infrastructure
| tenantId | | | | Tenant ID of the Azure subscription
| servicePrincipal | | | | Service principal client ID and client secret
| | clientId | | |
| | clientSecret | | |
| keyVaultName | | | | Azure Key Vault name where the secrets are configured with their respective values
autoScaling | | | | | For critical application containers, autoscaling is enabled through the Kubernetes Horizontal Pod Autoscaler
| enabled | | | true | Set true to enable autoscaling for supported services
diskEncryption | | | | | Encryption for data at rest
| enabled | | | true |
| azure | | | | Azure DiskEncryptionSet ID
storage | | | | | Details of the various data stores
| mysql | | | |
| | host | | | Azure MySQL instance name
| | port | | | Port number
| | backup | | |
| | | enabled | true |
| | | schedule | 0 2 * * * |
| appFileStore | | | |
| | azure | | | Provider azure
| | storageAccount | | | Storage account name for application file storage
| modelFileStore | | | |
| | azure | | | Provider azure
| | storageAccount | | | Storage account where the pre-trained models are stored for the various applications
| backupFileStore | | | |
| | azure | | | Provider azure
| | storageAccount | | | Storage account where backup files are stored
| mongo | volumeSize | | | Mongo instance details with the volume configuration, backup and its schedule
| | backup | | |
| | | enabled | true |
| | | schedule | 0 2 * * * |
| elasticsearch | | | |
| | volumeSize | | | Elasticsearch instance details with the volume configuration, backup and its schedule
| | backup | | |
| | | enabled | true |
| | | schedule | 0 2 * * * |
| rabbitmq | | | |
| | volumeSize | | | RabbitMQ instance details with the volume configuration, backup and its schedule
| | backup | | |
| | | enabled | true |
| | | schedule | 0 2 * * * |
| appStatefulSets | | | | Volume size configuration for application services which are statefulsets, e.g. du-archival
| | volumeSize | | 16Gi |
| servicebus | | | platform-stg01.servicebus.windows.net |
logMonitoring | | | | | Details for enabling log monitoring, log retention, cleanup and storage volume size
| enabled | | | true | Recommended to set as true
| logRetentionInDays | | | 5 | Logs older than the configured number of days will be automatically removed as per the cleanup cron schedule
| logCleanUpCronSchedule | | | 0 1 * * * | Time during which the retention job will run
| logVolumeSize | | | 128Gi | Immutable after first install
dataRestore | databases | | | | This section applies only when SmartInstall runs in restore mode; list of data stores which need to be restored
| mysqlBackupPath | | | | Folder name inside the Azure blob where MySQL backup files are stored
| mysqlBackupFileName | | | | File name of the MySQL backup file
| mongoBackupPath | | | | Folder name inside the Azure blob where Mongo backup files are stored
| mongoBackupFileName | | | | File name of the Mongo backup file
| elasticBasePath | | | | Path of the Elasticsearch backup file in the Azure blob
| minioBackupPath | | | | Folder name of the Minio backup file in the Azure blob
| rabbitmqBackupPath | | | | Folder name inside the Azure blob where RabbitMQ backup files are stored
| rabbitmqBackupFileName | | | | File name of the RabbitMQ backup file
| restoreContainer | | | | Azure blob container name where backup files are stored
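For orientation, below is a minimal, illustrative environment JSON assembled from the keys in the table above. It is a sketch only: the exact structure, nesting and defaults are defined by the template shipped in the package (in particular, how the azure provider blocks and the dataRestore entries are nested is an assumption), and values such as the DNS name, port, volume sizes and backup folder names are placeholders.
{
  "name": "stg01",
  "product": "smartopsv1",
  "version": "7.1.0",
  "dnsName": "<environment-dns-name>",
  "includeIngress": true,
  "ingressIp": "<ingress-ip>",
  "isPrivateIngress": true,
  "gpuEnabled": true,
  "helmRepoLocation": "../charts",
  "isProduction": true,
  "defaultAppReplicaCount": 2,
  "secretProvider": {
    "azure": {
      "tenantId": "<tenant-id>",
      "servicePrincipal": { "clientId": "<client-id>", "clientSecret": "<client-secret>" },
      "keyVaultName": "kv-platform-stg01"
    }
  },
  "autoScaling": { "enabled": true },
  "diskEncryption": { "enabled": true, "azure": "<disk-encryption-set-id>" },
  "storage": {
    "mysql": { "host": "<mysql-instance-name>", "port": 3306, "backup": { "enabled": true, "schedule": "0 2 * * *" } },
    "appFileStore": { "azure": { "storageAccount": "<app-storage-account>" } },
    "modelFileStore": { "azure": { "storageAccount": "<model-storage-account>" } },
    "backupFileStore": { "azure": { "storageAccount": "<backup-storage-account>" } },
    "mongo": { "volumeSize": "16Gi", "backup": { "enabled": true, "schedule": "0 2 * * *" } },
    "elasticsearch": { "volumeSize": "16Gi", "backup": { "enabled": true, "schedule": "0 2 * * *" } },
    "rabbitmq": { "volumeSize": "16Gi", "backup": { "enabled": true, "schedule": "0 2 * * *" } },
    "appStatefulSets": { "volumeSize": "16Gi" },
    "servicebus": "platform-stg01.servicebus.windows.net"
  },
  "logMonitoring": { "enabled": true, "logRetentionInDays": 5, "logCleanUpCronSchedule": "0 1 * * *", "logVolumeSize": "128Gi" },
  "dataRestore": {
    "databases": ["mysql", "mongo", "elasticsearch", "minio", "rabbitmq"],
    "mysqlBackupPath": "mysql-data-backup",
    "mysqlBackupFileName": "<mysql-backup-file>",
    "mongoBackupPath": "mongo-data-backup",
    "mongoBackupFileName": "<mongo-backup-file>",
    "elasticBasePath": "<elastic-backup-path>",
    "minioBackupPath": "minio-data-backup",
    "rabbitmqBackupPath": "rabbitmq-data-backup",
    "rabbitmqBackupFileName": "<rabbitmq-backup-file>",
    "restoreContainer": "<backup-container-name>"
  }
}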
There are two paths for the application install; you can take only one of them.
Restore data from an old Environment (E.g., 6.4.3) and install SmartOps Platform
Steps
Keys | Info about the values which need to be updated
databases | List of data stores which need to be restored
mysqlBackupPath | Folder name inside the Azure blob where MySQL backup files are stored
mysqlBackupFileName | File name of the MySQL backup file
mongoBackupPath | Folder name inside the Azure blob where Mongo backup files are stored
mongoBackupFileName | File name of the Mongo backup file
elasticBasePath | Folder name of the Elasticsearch backup file in the Azure blob
minioBackupPath | Folder name of the Minio backup file in the Azure blob
rabbitmqBackupPath | Folder name inside the Azure blob where RabbitMQ backup files are stored
rabbitmqBackupFileName | File name of the RabbitMQ backup file
restoreContainer | Azure blob container name where backup files are stored
python3 restore.py --product ${product} --env ${environment} --kubecontext ${kubecontext} --verbose
Verify the restore process has started successfully via K9s
Sample Error Log where the restore has failed.
Smart Recovery Start
MySQL Restore
MySQL restore completed
Mongo Restore
Mongo restore completed
Minio Restore
Note: Minio restore completion can be verified through Kibana logs.
Elasticsearch Restore
Steps for database validation of data counts after the restore process
Verify MySQL table counts.
The following query can be used for MySQL validation to fetch table counts from the current production environment. Execute the query by logging in to the existing production environment.
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES;
The MySQL table data information is printed in the Smart-Recovery logs of the restored environment, as in the screenshots below.
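If system schemas clutter the comparison, a filtered variant of the same query can be used (a sketch; note that TABLE_ROWS is only an estimate for InnoDB tables, so small differences between environments are expected):
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY TABLE_SCHEMA, TABLE_NAME;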
Verify Minio blob counts
Log in to the Azure portal and open the backup storage account (e.g. sasinvoiceextbackupdrn01).
Open Storage Explorer and move to the backup container. Calculate the blob count in the buckets available in the minio-data-backup folder by clicking on folder statistics, as in the screenshot below.
The blob count will be displayed as in the screenshot below.
Once the restore process has completed successfully for Minio, as mentioned under restore process completion, the file count for Minio can be validated in the storage account of the app file store.
Enable all networks for the app file store.
Validate the count of buckets restored from the backup by clicking the folder properties.
Set the networking setting back to "Selected networks" once the validation completes.
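If portal access is inconvenient, the same firewall toggle can be done from the CLI (a sketch; the storage account and resource group names are placeholders):
# Temporarily allow all networks on the app file store storage account
az storage account update --name <appfilestore-account> --resource-group <rg> --default-action Allow
# ...perform the validation, then restore "Selected networks"
az storage account update --name <appfilestore-account> --resource-group <rg> --default-action Deny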
How to monitor the deployments. (k9s)
python3 -u install.py --product ${product} --env ${environment} --kubecontext ${kubecontext} --verbose
python3 -u installWithDataInit.py --product ${product} --env ${environment} --kubecontext ${kubecontext} --verbose
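To monitor the deployment while the install script runs, either open k9s against the same cluster context or watch the pods with kubectl (a sketch; the namespace is a placeholder and the k9s path follows the k9s section in the Appendix):
k9s/k9s --context ${kubecontext} -n <namespace>
# or, without k9s:
kubectl --context ${kubecontext} get pods -n <namespace> --watch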
For details on Archiva file share implementation, refer Archiva Implementation
https://web.microsoftstream.com/video/fc814048-9405-423d-adca-22d28ecc30bc?list=trending
Application gateway Settings to Upload certificates
After successful completion of the Smart Install installation, access the deployment in k9s and check that all pods are in the Ready state.
All pods created via Kubernetes jobs will be in the Completed state. (A kubectl sketch follows this list.)
Access the Keycloak admin URL (refer to the Appendix) and the maintenance URLs from the Windows VM to verify URL access.
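The pod and job checks can also be scripted with kubectl (replace <namespace> with the deployment namespace):
kubectl get pods -n <namespace>   # application pods should be Running with READY x/x
kubectl get jobs -n <namespace>   # one-time jobs should show COMPLETIONS 1/1
kubectl get pods -n <namespace> --field-selector=status.phase!=Running,status.phase!=Succeeded   # should return nothing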
Appendix consists of the following sections
To enable data encryption for Azure MySQL and Storage Accounts, and disk encryption for volumes in the Kubernetes cluster, encryption keys must be created in Azure Key Vault; these keys are used to encrypt the data.
Reference: https://docs.microsoft.com/en-us/azure/aks/azure-disk-customer-managed-keys
# Create a DiskEncryptionSet
# The key vault name, resource group, etc. need to be changed accordingly
# The key key-smartops-k8s-disk-enc-001 (key name given as an example) needs to be created in Azure Key Vault before creating the Disk Encryption Set
keyVaultId=$(az keyvault show --name kv-engg-resrch-001 --query [id] -o tsv)
keyVaultKeyUrl=$(az keyvault key show --vault-name kv-engg-resrch-001 --name key-smartops-k8s-disk-enc-001 --query [key.kid] -o tsv)
az disk-encryption-set create -n smartops-k8s-des-001 -l eastus -g rg-smartopsengg-dev-001 --source-vault $keyVaultId --key-url $keyVaultKeyUrl
Azure cloud shell:
Ensure that the Get, WrapKey and UnwrapKey permissions are set for the disk encryption set on the key created in the Azure Key Vault.
Refer to des-platform-qa01 in the picture above.
IMPORTANT: After creating the disk encryption set, select it and allow it access to the disk encryption key created in the Key Vault. See the picture below.
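The same grant can be done from the CLI instead of the portal, reusing the example names from the commands above:
# Fetch the managed identity of the disk encryption set
desIdentity=$(az disk-encryption-set show -n smartops-k8s-des-001 -g rg-smartopsengg-dev-001 --query [identity.principalId] -o tsv)
# Allow that identity to use the key in the key vault
az keyvault set-policy -n kv-engg-resrch-001 --object-id $desIdentity --key-permissions wrapkey unwrapkey get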
K8s storage class
# Currently kept as a part of the env-setup template; can be changed as required
# The diskEncryptionSetID value needs to be changed accordingly (subscription, resourceGroups, diskEncryptionSets)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: pvc-ade-custom-storage-class
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Managed
  skuname: Premium_LRS
  diskEncryptionSetID: "/subscriptions/dfaa090f-c407-4e75-ac08-143cb932bdcf/resourceGroups/rg-smartopsengg-dev-001/providers/Microsoft.Compute/diskEncryptionSets/smartops-k8s-des-001"
After deploying the storage class, the respective changes need to be made in the statefulsets' PVCs so that they refer to the above custom storage class.
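For reference, a statefulset's volume claim would point at the custom class roughly as below. This is an illustrative excerpt only; the actual charts in the package define the real PVC specifications.
# hypothetical volumeClaimTemplates excerpt
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: pvc-ade-custom-storage-class
    resources:
      requests:
        storage: 16Gi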
References:
https://docs.microsoft.com/en-us/azure/mysql/howto-data-encryption-portal
https://docs.microsoft.com/en-us/azure/mysql/concepts-data-encryption-mysql
Data encryption of the Azure MySQL instance is done at the server level using a CMK (customer-managed key).
Key Terminologies
Key Encryption Key [KEK]
Stored in Azure Key Vault
The KEK is used to encrypt the Data Encryption Key
Data Encryption Key [DEK]
Symmetric key used to encrypt a block of data
The DEK, encrypted with the KEK, is stored separately
The MySQL server needs the permissions below on the Azure Key Vault (a CLI sketch follows this requirements list)
get
wrapKey
unwrapKey
The Key Vault and the Azure Database for MySQL must belong to the same Azure AD tenant
Enable the soft-delete feature on the Key Vault instance
The key must be in the 'Enabled' state
When keys are imported, only the .pfx, .byok and .backup file formats are supported
If the Key Vault generates the key, create a key backup before using it for the first time.
Backup Azure Key Vault Key Reference
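A CLI sketch for granting those key permissions to the MySQL server's managed identity; the server, resource group and Key Vault names are placeholders, and the portal flow remains the reference.
# Assumption: the server already has a system-assigned identity
# (it can be added with: az mysql server update -g <rg> -n <mysql-server-name> --assign-identity)
mysqlIdentity=$(az mysql server show -g <rg> -n <mysql-server-name> --query identity.principalId -o tsv)
az keyvault set-policy --name <keyvault-name> --object-id "$mysqlIdentity" --key-permissions get wrapKey unwrapKey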
When you configure data encryption with a customer-managed key in Key Vault, continuous access to this key is required for the server to stay online. If the server loses access to the customer-managed key in Key Vault, the server begins denying all connections within 10 minutes. The server issues a corresponding error message and changes the server state to Inaccessible. Some of the reasons why the server can reach this state are:
If you create a Point in Time Restore server for your Azure Database for MySQL which has data encryption enabled, the newly created server will be in the Inaccessible state. You can fix this through the Azure portal or the CLI.
If you create a read replica for your Azure Database for MySQL which has data encryption enabled, the replica server will be in the Inaccessible state. You can fix this through the Azure portal or the CLI.
If you delete the Key Vault, the Azure Database for MySQL will be unable to access the key and will move to the Inaccessible state. Recover the Key Vault and revalidate the data encryption to make the server Available.
If you delete the key from the Key Vault, the Azure Database for MySQL will be unable to access the key and will move to the Inaccessible state. Recover the key and revalidate the data encryption to make the server Available.
If the key stored in the Azure Key Vault expires, the key will become invalid and the Azure Database for MySQL will transition into the Inaccessible state. Extend the key expiry date using the CLI and then revalidate the data encryption to make the server Available.
Limitations
Support for this functionality is limited to General Purpose and Memory Optimized pricing tiers.
This feature is only supported in regions and servers which support storage up to 16TB. For the list of Azure regions supporting storage up to 16TB, refer to the storage section in documentation here
Encryption is only supported with RSA 2048 cryptographic key.
Steps
Create the Azure Database for MySQL instance. PFB a sample screenshot of the configurations.
Restart Azure Database for MySQL
Errors Observed while configuring
If soft delete is not enabled for the Key Vault, an error like the one below will occur.
The above issue was resolved by creating a new Key Vault instance with soft delete enabled and then enabling purge protection after the Key Vault creation.
PFB a screenshot after configuring the CMK for enabling data encryption.
Steps
Select the key vault and key by clicking ‘Select a key vault and key’
Container name | CPU Threshold | Min replicas | Max replicas
du-core-nlp | 80% | 2 | 4
du-pipeline | 80% | 2 | 4
du-rest | 80% | 2 | 4
du-tikaserver | 80% | 2 | 4
clones-engine | 80% | 2 | 4
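After the install, the autoscalers above can be verified with kubectl (the namespace is a placeholder, and the HPA names are assumed to match the deployment names):
kubectl get hpa -n <namespace>                      # each container above should show MINPODS 2 and MAXPODS 4
kubectl describe hpa du-pipeline -n <namespace>     # shows the 80% CPU target and the current utilisation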
az keyvault secret set --subscription <subscription-id> --vault-name <keyvault-name> --name <product-name>-<env-name>-offline-token --value "offline-token-value" -e base64
eg:
az keyvault secret set --subscription cdf5c496-95b3-4219-9117-35d4e0746d13 --vault-name kv-platform-drn01 --name smartopsv1-drn01-offline-token --value "eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJhODdjYzgwOS02YTA1LTQyY2MtOTY3YS0zNjk3OGFjZGFkZTUifQ.eyJqdGkiOiIwNjcxZmU3Zi00NWYwLTQzZWUtOWEyMi0xZWFiN2EzMWUyOTEiLCJleHAiOjAsIm5iZiI6MCwiaWF0IjoxNjEwNTM4MzAzLCJpc3MiOiJodHRwczovL3NtYXJ0b3BzLWRybjAxLmVhc3R1cy5jbG91ZGFwcC5henVyZS5jb20vcGFhcy9pbnZvaWNlZXh0cmFjdGlvbi9rZXljbG9hay9hdXRoL3JlYWxtcy91c3RnbG9iYWwiLCJhdWQiOiJodHRwczovL3NtYXJ0b3BzLWRybjAxLmVhc3R1cy5jbG91ZGFwcC5henVyZS5jb20vcGFhcy9pbnZvaWNlZXh0cmFjdGlvbi9rZXljbG9hay9hdXRoL3JlYWxtcy91c3RnbG9iYWwiLCJzdWIiOiJiN2JhNTFlZS05MzFmLTQ3MDgtODNiNi1mZGRhMDc4OTUwNDEiLCJ0eXAiOiJPZmZsaW5lIiwiYXpwIjoic21hcnRvcHMtZnJvbnRlbmQiLCJhdXRoX3RpbWUiOjAsInNlc3Npb25fc3RhdGUiOiJiMmM1NzgxYi04YWQ0LTQ5NTUtYTBiZC05MWRlNTkxMTZhMTQiLCJyZWFsbV9hY2Nlc3MiOnsicm9sZXMiOlsib2ZmbGluZV9hY2Nlc3MiXX0sInNjb3BlIjoib2ZmbGluZV9hY2Nlc3MifQ._G6N72VU6X-N5IHHXDjzKHUjSWgglb4cKge6JNh93dc" -e base64
Container name
du-search-normalization
du-pipeline
The following set of URLs is used for application maintenance purposes. Credentials for the maintenance URLs are listed in the table below.
Note: Maintenance URL access will be blocked from the internet for production environments. The restriction rules are handled in the WAF rules.
Access to the maintenance URLs is allowed from the Windows bastion VM only.
Refer to the document for accessing the production Windows bastion VM.
E.g. http://<ip_of_kub_internal_lb>/kibana/
Service | URL | UserName | Password
PHPMyAdmin | http://<ip_of_kub_internal_lb>/phpmyadmin/ | smartops-dev@<db_name> | EAHlmFoxa1ZpJbXC
Mongo Express | http://<ip_of_kub_internal_lb>/mongoexpress/ | mongoex | sm@rt0ps_mon_ex
Kibana URL | http://<ip_of_kub_internal_lb>/kibana/ | smartops | 7PQgHBVsbarM7TVc
Grafana URL | http://<ip_of_kub_internal_lb>/grafana/ | smartopsdev@ustglobal.com | 7X03P7vQ064fp0d
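The <ip_of_kub_internal_lb> placeholder is the private IP of the cluster's internal ingress load balancer. One way to look it up (a sketch, assuming the ingress controller is exposed as a LoadBalancer service):
kubectl get svc --all-namespaces -o wide | grep LoadBalancer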
Refer following Documentation for AD integration
*Contact the SmartOps Support team <smartops-support-team@ust.com> for credentials to access the online documentation
K9s is installed when the installbastiontools.sh script is executed.
Staying in the home directory, execute the command below to open k9s:
K9s/k9s
Or
cd k9s
./k9s
As the artifacts reference has been changed from Archiva to JFrog Artifactory for the Core Platform, the 2 stacks deployed for smartops-archiva are expected to be uninstalled.
Issues | Remarks
smartops-secrets stack failure | Secrets not correctly updated in Azure Key Vault or in the smartops-secrets chart
401 error during offline token generation | 1. A 401 error appears while opening the Keycloak admin screen for offline token generation. 2. This error could be caused by a corrupted or invalid certificate; upload a valid certificate in the Application Gateway to resolve it.
Restore failures | 1. All databases should be deployed and running in a healthy state
1. Log in to the SmartOps Master Admin UI and revoke the offline tokens generated for all the organizations.
2. Navigate to the Keycloak Administration console and perform step (3) below for all organizations except master.
3. Go to the Users tab, select the sense_master user, go to the Consents tab and check if there are any Offline Token entries; if yes, click the Revoke button.
4. Uninstall the Keycloak services in K8s.
5. Clear invalid offline token entries from the DB, if any, by executing the SQL scripts below in MySQL.
TRUNCATE table keycloak.offline_client_session;
TRUNCATE table keycloak.offline_user_session;
6. Install the Keycloak services in K8s.
7. Log in to the SmartOps Master Admin UI and generate a new offline token for all the organizations.
8. Update the Key Vault with the new token generated for USTGlobal.
9. Restart the secret vault sync pod to reflect the new value.
10. Confirm that the pods are using the new offline token.