Troubleshooting

This section provides troubleshooting steps for Anjuna Confidential Pods that are not running properly on clusters with the Anjuna Kubernetes Toolset.

If the Anjuna Confidential Pods that you tried to deploy are not behaving as expected, read on to narrow down the problem and find a solution.

On this page, some code blocks are shortened to show only the relevant output. A line containing <snip>... indicates that some lines have been removed.

Verifying the infrastructure

First, verify that the infrastructure is set up correctly. The infrastructure includes the anjuna-remote runtime class and the anjuna-system namespace. The following Pods should be in the Running state in the anjuna-system namespace (* stands for a randomly generated suffix):

  1. cc-operator-controller-manager-*

  2. cc-operator-daemon-install-*

  3. cc-operator-pre-install-daemon-*

  4. anjuna-cloud-adaptor-daemonset-*

  5. peer-pods-webhook-controller-manager-*

Refer to Verify the installation for the steps to confirm that the required infrastructure is installed correctly.
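The checks in this section can also be scripted. The following bash sketch assumes that kubectl points at the affected cluster; the function name check_anjuna_infra is hypothetical and is not part of the Anjuna Kubernetes Toolset. It reports whether the anjuna-remote runtime class exists and whether each listed component has at least one Pod in the Running state.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Anjuna Kubernetes Toolset.
# Reports whether the anjuna-remote runtime class exists and whether each
# expected anjuna-system component has at least one Pod in the Running state.
# Assumes a kubectl context that points at the affected cluster.
check_anjuna_infra() {
  local status=0 pods prefix

  if kubectl get runtimeclass anjuna-remote >/dev/null 2>&1; then
    echo "OK    runtimeclass anjuna-remote"
  else
    echo "FAIL  runtimeclass anjuna-remote"
    status=1
  fi

  # With --no-headers, column 1 is NAME and column 3 is STATUS.
  pods=$(kubectl get pods -n anjuna-system --no-headers 2>/dev/null)

  for prefix in cc-operator-controller-manager- \
                cc-operator-daemon-install- \
                cc-operator-pre-install-daemon- \
                anjuna-cloud-adaptor-daemonset- \
                peer-pods-webhook-controller-manager-; do
    # Succeeds only if some Pod name starts with the prefix and is Running.
    if printf '%s\n' "$pods" | awk -v p="$prefix" '
         index($1, p) == 1 && $3 == "Running" { found = 1 }
         END { exit (found ? 0 : 1) }'; then
      echo "OK    ${prefix}*"
    else
      echo "FAIL  ${prefix}*"
      status=1
    fi
  done
  return "$status"
}
```

After sourcing the script, run check_anjuna_infra. Every line that starts with FAIL names a resource to investigate; a Pod that exists but is not Running is reported as FAIL, just like a missing one.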

General guidelines for diagnostics

If the infrastructure is correctly installed but your Anjuna Confidential Pod is not running as expected, gather more information about the current state of the system. The following sections describe the main places to look.

Inspect Pod events

Apart from the Pod status itself (e.g., CrashLoopBackOff, ErrImagePull), the Pod events might provide helpful insight into the issue. You can inspect Pod events with kubectl describe pod <podname>. For example:

Get the Pod name:

$ kubectl get pods
NAME                            READY   STATUS              RESTARTS   AGE
anjuna-nginx-5f59f64769-g2xlc   0/1     ContainerCreating   0          18s

Describe the Pod. Look for the Events section at the bottom of the output:

$ kubectl describe pod anjuna-nginx-5f59f64769-g2xlc
Name:                anjuna-nginx-5f59f64769-g2xlc
Namespace:           default
Priority:            0
Runtime Class Name:  anjuna-remote
<snip>...


Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  2m5s  default-scheduler  Successfully assigned default/anjuna-nginx-5f59f64769-g2xlc to aks-nodepool1-14747780-vmss000001
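To avoid scrolling through the full description, you can print only the Events section. This is a bash sketch; pod_events is a hypothetical helper name, and the namespace defaults to default, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the Events section of `kubectl describe pod`.
# Usage: pod_events <podname> [namespace]   (namespace defaults to "default")
pod_events() {
  local pod="$1" ns="${2:-default}"
  # sed prints everything from the line that starts with "Events:" to the end.
  kubectl describe pod "$pod" -n "$ns" | sed -n '/^Events:/,$p'
}
```

For example, run pod_events anjuna-nginx-5f59f64769-g2xlc. Alternatively, kubectl get events --field-selector involvedObject.name=<podname> lists the events for a single Pod without the rest of the description.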

Inspect the anjuna-cloud-adaptor logs

The anjuna-cloud-adaptor is a component responsible for creating and connecting the Anjuna Confidential Pods (i.e., AMD SEV-SNP confidential virtual machines) to the cluster. If any of the operations required to successfully deploy and configure an Anjuna Confidential Pod fails, the anjuna-cloud-adaptor logs will most likely show the reason.

Note that the anjuna-cloud-adaptor is a DaemonSet, so it runs on every Node of your cluster that has the label node.kubernetes.io/worker= applied to it.

If you have multiple Nodes with this label, you will see multiple instances of the anjuna-cloud-adaptor. Make sure to inspect the logs of the instance that runs on the same Node as your faulty Pod. You can verify the Node on which a Pod is running by adding the -o wide parameter to kubectl get pods.

For example:

$ kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
anjuna-nginx-5f59f64769-g2xlc   1/1     Running   0          5h50m   10.244.1.39   aks-nodepool1-14747780-vmss000001

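In this example, the Pod runs on the Node aks-nodepool1-14747780-vmss000001, so list the anjuna-system Pods on that Node and read the logs of the anjuna-cloud-adaptor instance among them: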

$ NODE_NAME="aks-nodepool1-14747780-vmss000001"
$ kubectl get pods \
-o wide \
--field-selector spec.nodeName=${NODE_NAME} \
-n anjuna-system
NAME                                              READY   STATUS    RESTARTS        AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
<snip>...
anjuna-cloud-adaptor-daemonset-2l78s              1/1     Running   0               12d   10.224.0.4    aks-nodepool1-14747780-vmss000001   <none>           <none>

$ kubectl logs anjuna-cloud-adaptor-daemonset-2l78s -n anjuna-system
<snip>...
2023/10/11 15:03:12 [adaptor/proxy] getImageName: got image from annotations: docker.io/library/nginx:latest
2023/10/11 15:03:12 [adaptor/proxy] CreateContainer: calling PullImage for "docker.io/library/nginx:latest" before CreateContainer (cid: "6f0fa5737a806a2c6a54fe845ac8891b790ebbfc18c9e32805d8002c7ece895d")
2023/10/11 15:03:12 [adaptor/proxy] CreateContainer: successfully pulled image "docker.io/library/nginx:latest"
2023/10/11 15:03:12 [adaptor/proxy] StartContainer: containerID:6f0fa5737a806a2c6a54fe845ac8891b790ebbfc18c9e32805d8002c7ece895d

Possible issues include the following:

  • Issue: The selected region does not support the selected instance type.
    Resolution: When configuring the Anjuna Cloud Adaptor, ensure that the selected instance type is available in the selected cloud service provider region.

  • Issue: The provided credentials do not have enough permissions to read the image.
    Resolution: Ensure that the service principal used by the Anjuna Cloud Adaptor has at least Read access to your Shared Image Gallery.

  • Issue: The specified image (or its gallery) does not exist.
    Resolution: Ensure that you specified the correct image gallery name and image ID.

Inspect the Confidential VM instance logs

If the anjuna-cloud-adaptor logs show that the instance is being created successfully, you can inspect the logs of the Anjuna Confidential Pod's instance with anjuna-azure-cli (or in the Azure portal). The instance should have the same name as the Pod, and its resource group is the one that contains your image gallery:

$ anjuna-azure-cli instance log \
	--name <podname> \
	--resource-group <resource-group>
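If you plan to search the log, as some of the FAQ entries on this page do, it helps to save it to a file. This bash sketch wraps the anjuna-azure-cli command shown above without adding any other options; fetch_cvm_log is a hypothetical helper name, and its default file name logs.txt matches the one used later on this page.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the anjuna-azure-cli command shown above: saves the
# CVM log of a Confidential Pod to a file so it can be searched later.
# Usage: fetch_cvm_log <podname> <resource-group> [output-file]
fetch_cvm_log() {
  local pod="$1" rg="$2" out="${3:-logs.txt}"
  if ! anjuna-azure-cli instance log --name "$pod" --resource-group "$rg" >"$out"; then
    echo "failed to fetch the log of instance '${pod}'" >&2
    return 1
  fi
  # An empty file usually means the wrong instance name or resource group.
  if [ ! -s "$out" ]; then
    echo "the log of instance '${pod}' is empty" >&2
    return 1
  fi
  echo "saved $(wc -l <"$out" | tr -d ' ') lines to ${out}"
}
```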

Frequently asked questions (FAQs)

The following sections describe some common issues and suggest how to approach them.

The runtime class anjuna-remote is not present in the cluster

If the Anjuna Kubernetes Toolset is installed correctly, but the runtime class anjuna-remote is not installed in your cluster, you might be missing the label node.kubernetes.io/worker= on your Nodes.

You can verify this by describing the Node of interest:

$ kubectl get nodes # get the node name
$ kubectl describe node "<nodename>" | grep "node.kubernetes.io/worker="

If the command above does not return any results, you can apply the label to the specific Node:

$ kubectl label node "<nodename>" "node.kubernetes.io/worker="

Or to all Nodes at once:

$ kubectl label nodes --all "node.kubernetes.io/worker="

If the label is applied correctly, you should see, for each labeled Node, the following set of Pods in the anjuna-system namespace (* stands for a randomly-generated suffix):

  • cc-operator-daemon-install-*

  • cc-operator-pre-install-daemon-*

  • anjuna-cloud-adaptor-daemonset-*
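To check all Nodes at once instead of describing them one by one, you can select the Nodes on which the label key is absent. This bash sketch uses the kubectl set-based label selector !key; the helper names are hypothetical. The selector is single-quoted so that an interactive bash shell does not treat ! as history expansion.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the label check above.

# Lists Nodes (as node/<name>) that do not carry the node.kubernetes.io/worker label.
nodes_missing_worker_label() {
  # The set-based selector '!key' matches objects that lack the label key.
  kubectl get nodes -l '!node.kubernetes.io/worker' -o name
}

# Applies the label only to the Nodes returned by nodes_missing_worker_label.
label_missing_nodes() {
  local node
  for node in $(nodes_missing_worker_label); do
    kubectl label "$node" "node.kubernetes.io/worker="
  done
}
```

Run nodes_missing_worker_label first to review the list; empty output means that every Node already has the label.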

A volume or environment variable assigned to the container in the Pod spec is not available when the container starts

External environment variables and files are a potential security concern and are ignored by default. If you suspect that some of them were ignored, you can confirm this by inspecting the logs of the Confidential VM (CVM) that runs the Anjuna Confidential Pod.

First, fetch the logs by following the instructions in Getting CVM instance logs for an Anjuna Confidential Pod. Next, assuming you have fetched the logs into logs.txt, you can find the ignored environment variables or volumes by running the following command:

$ echo -e "$(grep -Po '{"msg":"ANJ-KUBERNETES: \KIdentified the following .+(?=","level")' logs.txt)"

The output of this command may look like:

Identified the following environment variables not specifically allowed in the enclave config file:
 - KUBERNETES_SERVICE_PORT
 - KUBERNETES_SERVICE_PORT_HTTPS
 - KUBERNETES_PORT
 - KUBERNETES_PORT_443_TCP
 - PATH
 - KUBERNETES_PORT_443_TCP_PORT
 - KUBERNETES_SERVICE_HOST
 - HOSTNAME
 - KUBERNETES_PORT_443_TCP_PROTO
 - KUBERNETES_PORT_443_TCP_ADDR

Identified the following mounts not specifically allowed in the enclave config file:
 - /var/run/secrets/kubernetes.io/serviceaccount

If you have vetted these external resources and want to make them available to the container, you can explicitly add them to the container’s configuration. Refer to Untrusted configuration.
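If you need only the names, for example to review them one at a time before allowing them, the extraction can go one step further. This bash sketch makes the same assumptions as the command above (GNU grep with -P support and the message format shown in the sample output) and additionally assumes one JSON log entry per line; list_ignored_items is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one ignored item per line as "env<TAB>NAME" or
# "mount<TAB>PATH", based on the ANJ-KUBERNETES messages in a CVM log.
# Usage: list_ignored_items [logfile]   (defaults to logs.txt)
list_ignored_items() {
  local log="${1:-logs.txt}"
  # 1. grep: same pattern as the command above; prints the text of each message.
  # 2. printf %b: expands the JSON-escaped \n sequences into real line breaks.
  # 3. awk: tracks which list a line belongs to and prints its " - item" lines.
  grep -Po '{"msg":"ANJ-KUBERNETES: \KIdentified the following .+(?=","level")' "$log" \
    | while IFS= read -r msg; do printf '%b\n' "$msg"; done \
    | awk '/^Identified the following environment variables/ { kind = "env" }
           /^Identified the following mounts/ { kind = "mount" }
           /^ - / { print kind "\t" substr($0, 4) }'
}
```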

The anjuna-cloud-adaptor Pod is failing with an ErrImagePull status

This indicates an error when the cluster tries to pull the anjuna-cloud-adaptor image. This image is managed by the cluster administrator, who is in charge of installing the Anjuna Kubernetes Toolset.

First, identify the Anjuna Cloud Adaptor image that is causing the failure with the following command:

$ kubectl get daemonset anjuna-cloud-adaptor-daemonset \
	-o jsonpath='{..image}' \
	-n anjuna-system

Here are some possible causes for this issue:

  • The image was not correctly pushed to the remote registry. This can be verified by pulling the image on a separate machine with access to the container registry. If you verify that the image was not pushed correctly, refer to Load and push the Anjuna Kubernetes Toolset image for instructions on how to push it again.

    The image must be named anjuna-cloud-adaptor. If you picked a different image name, uninstall the Anjuna Kubernetes Toolset and reinstall it with the correct image name.

  • The cluster does not have permission to pull the image (e.g., the image is stored in a private container registry). To fix this, assign a role to the cluster Nodes that allows them to pull images from your Azure Container Registry. Refer to Load and push the Anjuna Kubernetes Toolset image for more information.
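The image-name requirement can be checked directly against the running DaemonSet. This bash sketch is a hypothetical helper that reuses the kubectl command shown above and assumes that the DaemonSet has a single container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the image used by the anjuna-cloud-adaptor DaemonSet
# and checks that its repository name (last path component, without tag or digest)
# is anjuna-cloud-adaptor. Assumes the DaemonSet has a single container image.
check_adaptor_image_name() {
  local image name
  image=$(kubectl get daemonset anjuna-cloud-adaptor-daemonset \
      -n anjuna-system -o jsonpath='{..image}')
  image="${image%% *}"   # keep the first image if several are listed
  name="${image%%@*}"    # drop a digest such as @sha256:...
  name="${name##*/}"     # drop the registry and any path prefix
  name="${name%%:*}"     # drop the tag
  echo "image: ${image}"
  if [ "$name" != "anjuna-cloud-adaptor" ]; then
    echo "unexpected image name '${name}'; expected 'anjuna-cloud-adaptor'" >&2
    return 1
  fi
}
```

If the name is correct, try pulling the printed image on a separate machine with access to the registry (for example, with docker pull) to check whether it was pushed correctly.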

The Confidential Pod is failing with an ErrImagePull status

If you get an ErrImagePull error when deploying an Anjuna Confidential Pod to Kubernetes, make sure that the image specified in the Pod specification matches the one you used when building the Anjuna Confidential Container disk image with anjuna-k8s-cli build. While creating and measuring the disk image, anjuna-k8s-cli build pulls, unpacks, and measures the specified container image, and only that image can be deployed to the cluster. Specifying a different image in the Pod spec leads to errors like this one.

The container image running on the cluster will match the state of the container image at disk build time. If the container image is updated in the registry, the changes will only be deployable to Kubernetes once you rebuild the disk image and redeploy your application.

The Confidential Pod stays in the ContainerCreating state indefinitely

Check the Pod events for errors, such as missing mounts, and check the anjuna-cloud-adaptor logs for more information, since the adaptor might be running into issues related to your cloud service provider. Note that an Anjuna Confidential Pod usually takes a minute or two to reach the Running state in Kubernetes, because its confidential VM has to be created first.
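Because a short period in ContainerCreating is expected, it can help to wait for a bounded time before starting to investigate. This bash sketch is a hypothetical helper; the 180-second default timeout is an arbitrary choice, and the --for=jsonpath condition of kubectl wait is not available in older kubectl releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: waits for a Confidential Pod to reach the Running phase
# and, on timeout, prints the Pod's Events section for diagnosis.
# Usage: wait_for_confidential_pod <podname> [namespace] [timeout]
wait_for_confidential_pod() {
  local pod="$1" ns="${2:-default}" timeout="${3:-180s}"
  if kubectl wait --for=jsonpath='{.status.phase}'=Running \
      "pod/${pod}" -n "$ns" --timeout="$timeout"; then
    return 0
  fi
  echo "Pod ${pod} is not Running after ${timeout}; recent events:" >&2
  kubectl describe pod "$pod" -n "$ns" | sed -n '/^Events:/,$p'
  return 1
}
```

If the Pod still is not Running, continue with the anjuna-cloud-adaptor logs for the Node it was scheduled on.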

Other errors

If you do not see your error documented on this page, try searching the documentation using the search bar at the upper-right corner of the page. Otherwise, contact support@anjuna.io with the error message, Anjuna Kubernetes Toolset version number, and relevant context about the action you were trying to perform.