Running Graviton2 workloads on EKS clusters with Karpenter

Amazon Elastic Kubernetes Service (EKS) provides a managed Kubernetes service, allowing users to deploy, manage, and scale containerized applications using Kubernetes on AWS. With the introduction of Graviton2 processors, AWS offers enhanced performance and cost savings.

Karpenter is an open-source node lifecycle management project built for Kubernetes. It was created by AWS as an alternative to the Cluster Autoscaler project.

In this article, we are going to look at the steps needed to run Graviton2 (arm64) based workloads in an EKS cluster whose nodes are managed with Karpenter. I’m going to assume you already have a running EKS cluster with Karpenter properly configured; if you need help setting up a new cluster with Karpenter, follow the getting-started documentation on the official site.

NodePool

The NodePool sets constraints on the nodes that can be created by Karpenter and the pods that can run on those nodes. The NodePool configures things like:

  • Define taints to limit the pods that can run on nodes Karpenter creates
  • Define any startup taints to inform Karpenter that it should taint the node initially, but that the taint is temporary.
  • Limit node creation to certain zones, instance types, and computer architectures (like arm64 or amd64)

You can get the active Karpenter NodePools in your cluster with:

kubectl describe nodepool

Let’s say that in our case this is driven by a configuration that looks like this:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default 
spec:  
  template:
    metadata:
      labels:
        intent: apps
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: "karpenter.k8s.aws/instance-cpu"
          operator: In
          values: ["4", "8", "16", "32"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      nodeClassRef:
        name: default
      kubelet:
        containerRuntime: containerd
        systemReserved:
          cpu: 100m
          memory: 100Mi
  disruption:
    consolidationPolicy: WhenUnderutilized

Here we can see that this NodePool only allows amd64 (Intel or AMD) instances. If we want to support Graviton2 (arm64) instances, we need to either update this definition or create a separate NodePool. Let’s add support in the existing one by updating the kubernetes.io/arch key in the requirements:

        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]

and then re-apply it with kubectl apply. Now our NodePool supports both Intel/AMD and Graviton2 instance types.
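
Assuming the updated definition lives in a file called nodepool.yaml (the file name is just an assumption for this sketch), applying and verifying the change could look like this:

# Re-apply the updated NodePool definition
kubectl apply -f nodepool.yaml

# Confirm the arch requirement now includes arm64
kubectl get nodepool default -o jsonpath='{.spec.template.spec.requirements[?(@.key=="kubernetes.io/arch")].values}'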

NodeClass

Another important concept in Karpenter is the EC2NodeClass. Node Classes enable configuration of AWS-specific settings. Each NodePool must reference an EC2NodeClass using spec.template.spec.nodeClassRef. This is where we configure things like subnets, security groups, and which AMIs to use for the instances.

The configuration for this might look something like:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: "${local.node_iam_role_name}"
  amiFamily: AL2 
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${local.name}
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${local.name}
  tags:
    IntentLabel: apps
    KarpenterNodePoolName: default
    NodeType: default
    intent: apps
    karpenter.sh/discovery: ${local.name}

Here ${local.name} is the name of the cluster and ${local.node_iam_role_name} is the name of the IAM role used for the EC2 instances. A configuration like this, where we don’t pin any AMIs and only set amiFamily: AL2 (Amazon Linux 2), will automatically detect and use the latest AMI for each of the architectures allowed by our NodePool, so we don’t have to change anything here.

You can see the resolved form, with the actual AMIs, using:

kubectl describe ec2nodeclass
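
If you only want the resolved AMI IDs, a jsonpath query against the status works as well (a minimal sketch, assuming the EC2NodeClass is named default as above):

# Print only the AMI IDs Karpenter resolved for this EC2NodeClass
kubectl get ec2nodeclass default -o jsonpath='{.status.amis[*].id}'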

Still, in some cases, folks will prefer to control this and pin specific AMIs themselves (for example through spec.amiSelectorTerms); the AMIs Karpenter resolves then show up in the status like this:

status:
  amis:
    - id: ami-01234567890123456
      name: custom-ami-amd64
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64

and if that is the case, we need to make sure there is a similar entry for a valid arm64 AMI:

    - id: ami-01234567890123456
      name: custom-ami-arm64
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
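
If you go down that path, the pinning itself happens in the spec rather than the status; a minimal sketch using spec.amiSelectorTerms with placeholder AMI IDs (the IDs below are illustrative, not real AMIs) could look like:

spec:
  amiFamily: AL2
  amiSelectorTerms:
    # Placeholder IDs: replace with AMIs built for each architecture
    - id: ami-0aaaaaaaaaaaaaaaa   # amd64 AMI
    - id: ami-0bbbbbbbbbbbbbbbb   # arm64 AMI

When amiSelectorTerms is set, Karpenter only uses AMIs matching those terms, so make sure both architectures are covered.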

That’s it. We now have a NodePool that supports arm64 instances and an EC2NodeClass that resolves a proper AMI for them.

You can test this with a simple deployment like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-graviton
spec:
  replicas: 5
  selector:
    matchLabels:
      app: workload-graviton
  template:
    metadata:
      labels:
        app: workload-graviton
    spec:
      nodeSelector:
        intent: apps
        kubernetes.io/arch: arm64
      containers:
      - name: graviton2
        image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 512m
            memory: 512Mi 

and apply it with kubectl:

kubectl apply -f workload-graviton.yaml

Give it a couple of minutes and you can see the new node in the cluster:

kubectl get nodes -L karpenter.sh/capacity-type,beta.kubernetes.io/instance-type,karpenter.sh/nodepool,topology.kubernetes.io/zone -l karpenter.sh/initialized=true

the output will look something like:

NAME                          STATUS   ROLES    AGE   VERSION               CAPACITY-TYPE   INSTANCE-TYPE   NODEPOOL   ZONE
ip-10-0-62-224.ec2.internal   Ready    <none>   60s   v1.28.5-eks-5e0fdde   spot            c6g.xlarge      default    us-east-1a
ip-10-0-79-148.ec2.internal   Ready    <none>   87m   v1.28.5-eks-5e0fdde   on-demand       c6g.xlarge      default    us-east-1b
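
To double-check the architecture, you can also list the kubernetes.io/arch label on the Karpenter-provisioned nodes (assuming the default NodePool from above):

kubectl get nodes -L kubernetes.io/arch -l karpenter.sh/nodepool=default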

You can also check the Karpenter logs with:

kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter --all-containers=true -f

Note: if you played along to test this, please don’t forget to clean up and delete the resources you no longer need.
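
Deleting the test deployment should be enough to let Karpenter (with consolidationPolicy: WhenUnderutilized) consolidate away the now-empty node:

kubectl delete -f workload-graviton.yaml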

Finally, I wanted to point out that because Karpenter automatically chooses the most cost-effective instances for your configuration (on-demand vs. spot, or Graviton2 vs. Intel/AMD), your nodes might tilt automatically towards Graviton2. You can still pin a deployment to amd64 instances (for example, if you don’t have arm64 images available) using the kubernetes.io/arch node selector.
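
For example, pinning the earlier test deployment to amd64-only nodes is just a matter of changing the nodeSelector in the pod template:

  template:
    spec:
      nodeSelector:
        intent: apps
        kubernetes.io/arch: amd64   # keep this workload on amd64 (Intel/AMD) instances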
