Lab Setup: Chaos Mesh, Scaling, and Pod Affinity
This guide outlines steps to make a UI service more resilient by applying high-availability practices. We'll cover installing Chaos Mesh with Helm, scaling the UI service, adding topology spread constraints, and using a helper script to visualize pod distribution across availability zones.
Installing Chaos Mesh
To enhance our cluster's resilience testing capabilities, we'll install Chaos Mesh. Chaos Mesh is a powerful chaos engineering tool for Kubernetes environments. It allows us to simulate various failure scenarios and test how our applications respond.
Let's install Chaos Mesh in our cluster using Helm:
Release "chaos-mesh" does not exist. Installing it now.
NAME: chaos-mesh
LAST DEPLOYED: Tue Aug 20 04:44:31 2024
NAMESPACE: chaos-mesh
STATUS: deployed
REVISION: 1
TEST SUITE: None
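A Helm invocation along the following lines would produce output like the above. This is a minimal sketch, not necessarily the exact command used in this lab: the chart repository URL, the containerd runtime settings (`chaosDaemon.runtime`, `chaosDaemon.socketPath`), and the function name `install_chaos_mesh` are assumptions.

```shell
# Sketch: install Chaos Mesh with Helm into its own namespace.
# Assumptions: the chart repo URL below, a containerd-based cluster, and
# the function name install_chaos_mesh are illustrative.
install_chaos_mesh() {
  helm repo add chaos-mesh https://charts.chaos-mesh.org || return 1
  helm repo update || return 1
  # "upgrade --install" installs the release when it does not exist yet,
  # which matches the 'Release "chaos-mesh" does not exist' message.
  helm upgrade --install chaos-mesh chaos-mesh/chaos-mesh \
    --namespace chaos-mesh \
    --create-namespace \
    --set chaosDaemon.runtime=containerd \
    --set chaosDaemon.socketPath=/run/containerd/containerd.sock
}
```

After calling `install_chaos_mesh`, `kubectl get pods -n chaos-mesh` should list the Chaos Mesh pods once they have started. If the nodes do not use containerd, the two `--set` flags would need to be adjusted or dropped.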
Scaling and Topology Spread Constraints
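We use a Kustomize patch to modify the UI deployment, scaling it to 5 replicas and adding topology spread constraints. The constraints ask the scheduler to spread UI pods evenly across availability zones and across nodes (maxSkew: 1), which reduces the impact of a node or zone failure. Because whenUnsatisfiable is set to ScheduleAnyway, the spread is best-effort rather than guaranteed.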
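To make the skew idea concrete, the small helper below computes the difference between the fullest and emptiest topology domain from a list of per-domain pod counts. It is only an illustration of the arithmetic; the function name `max_skew` is hypothetical and the scheduler does not call anything like it.

```shell
# max_skew COUNT...: print the difference between the largest and the
# smallest per-domain pod count given as arguments.
max_skew() {
  min=$1
  max=$1
  for n in "$@"; do
    if [ "$n" -lt "$min" ]; then min=$n; fi
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  echo $((max - min))
}

# 5 replicas over 3 zones placed as 2/1/2: skew is 1, within maxSkew: 1.
max_skew 2 1 2
# All 5 replicas in one of 3 zones: skew is 5. ScheduleAnyway would still
# allow this placement if the scheduler found nothing better.
max_skew 5 0 0
```

With 5 replicas and 3 zones, a 2/2/1 split (in any order) is the most even placement possible, so a skew of 1 is the best the constraint can achieve here.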
Here is the patch file, followed by the resulting Deployment/ui manifest and the diff against the original:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui
  namespace: ui
spec:
  replicas: 5
  selector:
    matchLabels:
      app: ui
  template:
    metadata:
      labels:
        app: ui
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: ui
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: ui
Deployment/ui after the patch is applied:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/type: app
  name: ui
  namespace: ui
spec:
  replicas: 5
  selector:
    matchLabels:
      app: ui
      app.kubernetes.io/component: service
      app.kubernetes.io/instance: ui
      app.kubernetes.io/name: ui
  template:
    metadata:
      annotations:
        prometheus.io/path: /actuator/prometheus
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app: ui
        app.kubernetes.io/component: service
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: ui
        app.kubernetes.io/name: ui
    spec:
      containers:
        - env:
            - name: JAVA_OPTS
              value: -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/urandom
            - name: METADATA_KUBERNETES_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: METADATA_KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: METADATA_KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          envFrom:
            - configMapRef:
                name: ui
          image: public.ecr.aws/aws-containers/retail-store-sample-ui:1.2.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 45
            periodSeconds: 20
          name: ui
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          resources:
            limits:
              memory: 1.5Gi
            requests:
              cpu: 250m
              memory: 1.5Gi
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
          volumeMounts:
            - mountPath: /tmp
              name: tmp-volume
      securityContext:
        fsGroup: 1000
      serviceAccountName: ui
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: ui
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
        - labelSelector:
            matchLabels:
              app: ui
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
      volumes:
        - emptyDir:
            medium: Memory
          name: tmp-volume
Diff against the original Deployment/ui:
     app.kubernetes.io/type: app
   name: ui
   namespace: ui
 spec:
-  replicas: 1
+  replicas: 5
   selector:
     matchLabels:
+      app: ui
       app.kubernetes.io/component: service
       app.kubernetes.io/instance: ui
       app.kubernetes.io/name: ui
   template:
[...]
         prometheus.io/path: /actuator/prometheus
         prometheus.io/port: "8080"
         prometheus.io/scrape: "true"
       labels:
+        app: ui
         app.kubernetes.io/component: service
         app.kubernetes.io/created-by: eks-workshop
         app.kubernetes.io/instance: ui
         app.kubernetes.io/name: ui
[...]
               name: tmp-volume
       securityContext:
         fsGroup: 1000
       serviceAccountName: ui
+      topologySpreadConstraints:
+        - labelSelector:
+            matchLabels:
+              app: ui
+          maxSkew: 1
+          topologyKey: topology.kubernetes.io/zone
+          whenUnsatisfiable: ScheduleAnyway
+        - labelSelector:
+            matchLabels:
+              app: ui
+          maxSkew: 1
+          topologyKey: kubernetes.io/hostname
+          whenUnsatisfiable: ScheduleAnyway
       volumes:
         - emptyDir:
             medium: Memory
           name: tmp-volume
Apply the changes using the Kustomize patch and its Kustomization file:
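A minimal sketch of this step, assuming the patch and a `kustomization.yaml` that references it live in a directory passed as an argument (the function name `apply_ui_patch` and the directory layout are hypothetical). The patch adds `app: ui` to `spec.selector`, which is immutable on an existing Deployment, so the sketch deletes the Deployment before re-applying it:

```shell
# Sketch: apply a Kustomize directory that contains the UI patch.
# Assumption: $1 is a directory holding kustomization.yaml and the patch;
# the function name and layout are illustrative.
apply_ui_patch() {
  dir=$1
  if [ -z "$dir" ]; then
    echo "usage: apply_ui_patch <kustomize-dir>" >&2
    return 1
  fi
  # spec.selector cannot be changed in place, so recreate the Deployment.
  kubectl delete deployment ui -n ui --ignore-not-found || return 1
  kubectl apply -k "$dir" || return 1
  # Wait until the rollout of all replicas completes or the timeout expires.
  kubectl rollout status deployment/ui -n ui --timeout=180s
}
```

Deleting the Deployment causes a short interruption of the UI service, which is acceptable in a lab but would need a different approach in production.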
Verify Retail Store Accessibility
After applying these changes, it's important to verify that your retail store is accessible:
Waiting for k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com...
You can now access http://k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com
Once this command completes, it will output a URL. Open this URL in a new browser tab to verify that your retail store is accessible and functioning correctly.
The retail store URL may take 5-10 minutes to become reachable.
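The waiting step can be approximated with a simple polling loop. The sketch below is not the lab's own helper: the function name `wait_for_url`, its defaults (60 tries, 10 seconds apart, roughly 10 minutes in total), and the use of `curl` are assumptions.

```shell
# Sketch: poll a URL until it answers with a successful HTTP status.
# Assumptions: curl is installed; wait_for_url and its defaults are
# illustrative.
wait_for_url() {
  url=$1
  attempts=${2:-60}   # number of tries before giving up
  delay=${3:-10}      # seconds to sleep between tries
  echo "Waiting for $url..."
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o: discard the body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "You can now access $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for $url" >&2
  return 1
}
```

For example, `wait_for_url "http://k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com"` would poll the load balancer hostname shown above until it responds.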
Helper Script: Get Pods by AZ
The get-pods-by-az.sh script prints the distribution of Kubernetes pods across availability zones in the terminal. You can view the script file on GitHub here.
Script Execution
To run the script and see the distribution of pods across availability zones, execute:
------us-west-2a------
ip-10-42-127-82.us-west-2.compute.internal:
ui-6dfb84cf67-6fzrk 1/1 Running 0 56s
ui-6dfb84cf67-dsp55 1/1 Running 0 56s
------us-west-2b------
ip-10-42-153-179.us-west-2.compute.internal:
ui-6dfb84cf67-2pxnp 1/1 Running 0 59s
------us-west-2c------
ip-10-42-186-246.us-west-2.compute.internal:
ui-6dfb84cf67-n8x4f 1/1 Running 0 61s
ui-6dfb84cf67-wljth 1/1 Running 0 61s
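Output of this shape can be produced with plain `kubectl` queries. The following is a sketch of the idea rather than the contents of get-pods-by-az.sh: the function name `pods_by_az` is hypothetical, and it assumes nodes carry the standard `topology.kubernetes.io/zone` label and that the pods run in the `ui` namespace.

```shell
# Sketch: list pods grouped by availability zone and node.
# Assumptions: nodes have the topology.kubernetes.io/zone label; the
# namespace defaults to "ui"; pods_by_az is an illustrative name.
pods_by_az() {
  ns=${1:-ui}
  # Distinct zone names, read from the node labels.
  zones=$(kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}{end}' \
    | sort -u)
  for az in $zones; do
    echo "------$az------"
    # "-o name" prints entries such as node/<node-name>.
    for node in $(kubectl get nodes -l "topology.kubernetes.io/zone=$az" -o name); do
      node=${node#node/}
      echo "$node:"
      # Only the pods scheduled onto this node.
      kubectl get pods -n "$ns" --no-headers --field-selector "spec.nodeName=$node"
    done
  done
}
```

Running `pods_by_az ui` before and after a failure experiment makes it easy to compare how the replicas were redistributed.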
For more information on these changes, check out these sections: