Provisioning compute
In this lab, we'll use Karpenter to provision AWS Trainium nodes to run accelerated machine learning inference. Trainium is AWS's purpose-built ML accelerator, and Trn1 instances deliver the performance and cost-efficiency we need to serve inference workloads like our Mistral-7B model.
To learn more about Karpenter, check out the Karpenter module in this workshop.
Karpenter has already been installed in our EKS cluster and runs as a Deployment.
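We can confirm this with a standard kubectl query (the karpenter namespace below is an assumption matching this workshop's installation):

$ kubectl get deployment -n karpenter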
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
karpenter   2/2     2            2           11m
Let's review the configuration for the Karpenter NodePool that we'll be using to provision Trainium instances:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: trainium-trn1
spec:
  template:
    metadata:
      labels:
        instanceType: trn1.2xlarge
        provisionerType: Karpenter
        neuron.amazonaws.com/neuron-device: "true"
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      taints:
        - key: aws.amazon.com/neuron
          value: "true"
          effect: "NoSchedule"
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["trn1.2xlarge"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: trainium-trn1
  limits:
    aws.amazon.com/neuron: 2
    cpu: 16
    memory: 64Gi
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: trainium-trn1
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 256Gi
        iops: 16000
        throughput: 1000
        volumeType: gp3
  role: ${KARPENTER_NODE_ROLE}
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="//"

    --//
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    sed -i "s/^max_concurrent_downloads_per_image = .*$/max_concurrent_downloads_per_image = 10/" /etc/soci-snapshotter-grpc/config.toml
    sed -i "s/^max_concurrent_unpacks_per_image = .*$/max_concurrent_unpacks_per_image = 10/" /etc/soci-snapshotter-grpc/config.toml

    --//
    Content-Type: application/node.eks.aws

    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      featureGates:
        FastImagePull: true

    --//--
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/${EKS_CLUSTER_NAME}: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop
    karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    aws-neuron: "true"
We're asking the NodePool to start all new nodes with a Kubernetes label provisionerType: Karpenter, which allows us to specifically target Karpenter nodes with Pods for demonstration purposes. Because Karpenter may autoscale several different node pools, additional labels such as instanceType: trn1.2xlarge are added so that nodes belonging to the trainium-trn1 pool can be identified.
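Once Karpenter provisions a node from this pool, we can find it using those labels with a standard selector query (this returns nothing until a workload actually triggers provisioning):

$ kubectl get nodes -l provisionerType=Karpenter,instanceType=trn1.2xlarge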
The NodePool CRD supports defining node properties like instance type and zone. In this example, we're setting the karpenter.sh/capacity-type requirement to initially limit Karpenter to provisioning On-Demand instances, as well as node.kubernetes.io/instance-type to limit it to a specific instance type. You can learn which other properties are available here.
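As an illustrative variation (not applied in this lab), the requirements could be widened to let Karpenter also consider Spot capacity or a larger Trainium instance size:

requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values: ["trn1.2xlarge", "trn1.32xlarge"]
  - key: "karpenter.sh/capacity-type"
    operator: In
    values: ["spot", "on-demand"]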
A Taint defines a specific set of properties that allow a node to repel a set of Pods. Taints work together with their counterpart on the Pod side, a Toleration: only Pods that tolerate a node's taints can be scheduled onto it, ensuring Pods land on the appropriate nodes. You can learn more about the other properties in this resource.
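For illustration, here is a minimal sketch of a Pod that could land on these nodes: it tolerates the taint above, selects the pool's label, and requests one Neuron device. The Pod name and image are placeholders, not part of this lab:

apiVersion: v1
kind: Pod
metadata:
  name: neuron-demo # hypothetical name, for illustration only
spec:
  nodeSelector:
    provisionerType: Karpenter
  tolerations:
    - key: aws.amazon.com/neuron # matches the NodePool taint
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:1.36 # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          aws.amazon.com/neuron: 1 # one Neuron device; exposed by the Neuron device plugin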
A NodePool can define a limit on the amount of CPU, memory, and other resources (here, aws.amazon.com/neuron devices) managed by it. Once this limit is reached, Karpenter will not provision additional capacity associated with that particular NodePool, providing a cap on the total compute.
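After nodes are provisioned, you can compare the pool's live usage against these limits; in Karpenter v1 the NodePool reports aggregate usage in its status:

$ kubectl get nodepool trainium-trn1 -o jsonpath='{.status.resources}'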
Let's create the EC2NodeClass and NodePool.
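How you apply the manifests depends on where they're saved; assuming the YAML above is stored locally as trainium-trn1.yaml, envsubst can fill in the environment variables before applying:

$ envsubst < trainium-trn1.yaml | kubectl apply -f -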
ec2nodeclass.karpenter.k8s.aws/trainium-trn1 created
nodepool.karpenter.sh/trainium-trn1 created
Once properly deployed, check for the NodePools:
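$ kubectl get nodepool trainium-trn1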
NAME            NODECLASS       NODES   READY   AGE
trainium-trn1   trainium-trn1   0       True    31s
As seen from the above output, the NodePool has been properly provisioned, allowing Karpenter to create new nodes as needed. When we deploy our ML workload in the next step, Karpenter will automatically create the required Trainium instances based on the resource requests and limits we specify.
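Note that nothing is provisioned until a Pod actually requests this capacity. Once the workload is deployed, you can follow Karpenter's provisioning decisions through its NodeClaim resources:

$ kubectl get nodeclaims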