Troubleshooting
This section provides troubleshooting steps for Anjuna Confidential Pods that are not running properly on clusters with the Anjuna Kubernetes Toolset.
If you have noticed that the Anjuna Confidential Pods that you tried to deploy are not behaving as expected, keep reading to scope the problem and identify a solution.
On this page, some code blocks are shortened to emphasize only the relevant configuration. A line with <snip>... indicates that some lines have been removed from the full configuration.
Verifying the infrastructure
First, verify that the infrastructure is set up correctly.
The infrastructure includes the anjuna-remote runtime class and the anjuna-system namespace.
The following Pods should be Running in the anjuna-system namespace:
- cc-operator-controller-manager-*
- cc-operator-daemon-install-*
- cc-operator-pre-install-daemon-*
- anjuna-cloud-adaptor-daemonset-*
- peer-pods-webhook-controller-manager-*
Refer to Verify the installation to verify whether the needed infrastructure is correctly installed.
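As a quick first pass before following the full verification guide, the checks above can be sketched as a small shell helper. The function name check_anjuna_infra is hypothetical and not part of the Anjuna tooling. It only confirms that the runtime class and namespace exist and that no Pod in anjuna-system is outside the Running phase. It does not confirm that each of the five expected Pods is present, and a Pod in CrashLoopBackOff can still report the Running phase.

```shell
# Sketch only: check_anjuna_infra is a hypothetical helper name.
check_anjuna_infra() {
  # The anjuna-remote runtime class must exist.
  if ! kubectl get runtimeclass anjuna-remote >/dev/null 2>&1; then
    echo "runtime class anjuna-remote not found" >&2
    return 1
  fi
  # The anjuna-system namespace must exist.
  if ! kubectl get namespace anjuna-system >/dev/null 2>&1; then
    echo "namespace anjuna-system not found" >&2
    return 1
  fi
  # List Pods whose phase is not Running; empty output means none were found.
  not_running=$(kubectl get pods -n anjuna-system \
    --field-selector status.phase!=Running -o name)
  if [ -n "$not_running" ]; then
    echo "Pods not Running in anjuna-system:" >&2
    echo "$not_running" >&2
    return 1
  fi
  echo "anjuna-remote runtime class and anjuna-system Pods look healthy"
}
```

Run check_anjuna_infra from a shell where kubectl points at the affected cluster. A non-zero exit status indicates which part of the infrastructure needs attention.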
General guidelines for diagnostics
If the infrastructure is correctly installed, but your Anjuna Confidential Pod is not running as expected, you need to gather more information about the current state of the system. There are a few places where you can gather more information, as illustrated below.
Inspect Pod events
Apart from the Pod status itself (e.g., CrashLoopBackOff, ErrImagePull), the Pod events might provide helpful insight into the issue.
You can inspect Pod events through kubectl describe pod <podname>. For example:
Get the Pod name:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
anjuna-nginx-5f59f64769-g2xlc 0/1 ContainerCreating 0 18s
Describe the Pod. Look for the Events section at the bottom of the output:
$ kubectl describe pod anjuna-nginx-5f59f64769-g2xlc
Name: anjuna-nginx-5f59f64769-g2xlc
Namespace: default
Priority: 0
Runtime Class Name: anjuna-remote
<snip>...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m5s default-scheduler Successfully assigned default/anjuna-nginx-5f59f64769-g2xlc to aks-nodepool1-14747780-vmss000001
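When the describe output is long, it can help to list only the Warning events for a single Pod. The following is a minimal sketch that uses the standard kubectl get events field selectors. The helper name pod_warning_events is hypothetical, and the namespace defaults to default as in the example above.

```shell
# Sketch only: pod_warning_events is a hypothetical helper name.
pod_warning_events() {
  # $1: Pod name
  # $2: namespace (optional, defaults to "default")
  kubectl get events \
    -n "${2:-default}" \
    --field-selector "involvedObject.name=$1,type=Warning" \
    --sort-by=.lastTimestamp
}
```

For example, pod_warning_events anjuna-nginx-5f59f64769-g2xlc lists the Warning events recorded for that Pod, oldest first.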
Inspect the anjuna-cloud-adaptor logs
The anjuna-cloud-adaptor is the component responsible for creating the Anjuna Confidential Pods (i.e., AMD SEV-SNP confidential virtual machines) and connecting them to the cluster.
If any of the operations required to deploy and configure an Anjuna Confidential Pod fails, the anjuna-cloud-adaptor logs will most likely show the reason.
Note that the anjuna-cloud-adaptor is a DaemonSet, so it runs on every Node of your cluster that has the label node.kubernetes.io/worker= applied to it.
If you have multiple Nodes with this label, you will see multiple instances of the anjuna-cloud-adaptor.
Make sure to inspect the logs of the instance that runs on the same Node as your faulty Pod.
You can verify the Node on which a Pod is running by adding the -o wide parameter to kubectl get pods.
For example:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
anjuna-nginx-5f59f64769-g2xlc 1/1 Running 0 5h50m 10.244.1.39 aks-nodepool1-14747780-vmss000001
In this example, the Pod runs on the Node aks-nodepool1-14747780-vmss000001. Find the anjuna-cloud-adaptor instance on that Node, then fetch its logs:
$ NODE_NAME="aks-nodepool1-14747780-vmss000001"
$ kubectl get pods \
-o wide \
--field-selector "spec.nodeName=${NODE_NAME}" \
-n anjuna-system
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
<snip>...
anjuna-cloud-adaptor-daemonset-2l78s 1/1 Running 0 12d 10.224.0.4 aks-nodepool1-14747780-vmss000001 <none> <none>
$ kubectl logs anjuna-cloud-adaptor-daemonset-2l78s -n anjuna-system
<snip>...
2023/10/11 15:03:12 [adaptor/proxy] getImageName: got image from annotations: docker.io/library/nginx:latest
2023/10/11 15:03:12 [adaptor/proxy] CreateContainer: calling PullImage for "docker.io/library/nginx:latest" before CreateContainer (cid: "6f0fa5737a806a2c6a54fe845ac8891b790ebbfc18c9e32805d8002c7ece895d")
2023/10/11 15:03:12 [adaptor/proxy] CreateContainer: successfully pulled image "docker.io/library/nginx:latest"
2023/10/11 15:03:12 [adaptor/proxy] StartContainer: containerID:6f0fa5737a806a2c6a54fe845ac8891b790ebbfc18c9e32805d8002c7ece895d
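The lookup above (Pod to Node, Node to anjuna-cloud-adaptor instance, then logs) can be combined into one helper. This is a sketch under the assumption that the adaptor Pods follow the anjuna-cloud-adaptor-daemonset-* naming shown on this page. The function name adaptor_logs_for_pod is hypothetical.

```shell
# Sketch only: adaptor_logs_for_pod is a hypothetical helper name.
adaptor_logs_for_pod() {
  # $1: name of the faulty Pod
  # $2: namespace of that Pod (optional, defaults to "default")
  node=$(kubectl get pod "$1" -n "${2:-default}" -o jsonpath='{.spec.nodeName}')
  if [ -z "$node" ]; then
    echo "Pod $1 is not scheduled on any Node yet" >&2
    return 1
  fi
  # Find the adaptor instance that runs on the same Node.
  adaptor=$(kubectl get pods -n anjuna-system \
    --field-selector "spec.nodeName=$node" -o name |
    grep 'anjuna-cloud-adaptor-daemonset-' | head -n 1)
  if [ -z "$adaptor" ]; then
    echo "No anjuna-cloud-adaptor Pod found on Node $node" >&2
    return 1
  fi
  # $adaptor has the form pod/<name>, which kubectl logs accepts.
  kubectl logs "$adaptor" -n anjuna-system
}
```

For example, adaptor_logs_for_pod anjuna-nginx-5f59f64769-g2xlc prints the logs of the adaptor instance on the Node that hosts that Pod.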
Possible issues include:
Issue | Resolution
Selected region does not support the selected instance type | Ensure that the selected instance type and the selected cloud service provider region are compatible with each other when configuring the Anjuna Cloud Adaptor.
Provided credentials do not have enough permissions to read the image | Ensure that the service principal used by the Anjuna Cloud Adaptor has at least Read access to your Shared Image Gallery.
The specified image (or its gallery) does not exist | Ensure that you specified the correct name and image ID.
Inspect the Confidential VM instance logs
If the logs from the anjuna-cloud-adaptor show that the instance is being created successfully, you can inspect the logs of the Anjuna Confidential Pod through anjuna-azure-cli (or the Azure portal).
The name of the instance should be the same as the Pod name.
The resource group is the one that contains your image gallery.
$ anjuna-azure-cli instance log \
--name <podname> \
--resource-group <resource-group>
Frequently asked questions (FAQs)
The sections below illustrate some common issues and suggestions for approaching them.
The runtime class anjuna-remote is not present in the cluster
If the Anjuna Kubernetes Toolset is installed correctly, but the runtime class anjuna-remote is not installed in your cluster, you might be missing the label node.kubernetes.io/worker= on your Nodes.
You can verify this by describing the Node of interest:
$ kubectl get nodes # get the node name
$ kubectl describe node "<nodename>" | grep "node.kubernetes.io/worker="
If the command above does not return any results, you can apply the label to the specific Node:
$ kubectl label node "<nodename>" "node.kubernetes.io/worker="
Or to all Nodes at once:
$ kubectl label nodes --all "node.kubernetes.io/worker="
If the label is applied correctly, you should see, for each labeled Node, the following set of Pods in the anjuna-system namespace (* stands for a randomly generated suffix):
- cc-operator-daemon-install-*
- cc-operator-pre-install-daemon-*
- anjuna-cloud-adaptor-daemonset-*
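On clusters with many Nodes, describing each Node by hand is tedious. As a minimal sketch, the standard kubectl label selector syntax !key can list every Node that does not carry the label at all. The helper name nodes_missing_worker_label is hypothetical.

```shell
# Sketch only: nodes_missing_worker_label is a hypothetical helper name.
nodes_missing_worker_label() {
  # The "!key" selector matches objects that do not have the label key.
  kubectl get nodes -l '!node.kubernetes.io/worker' -o name
}
```

Empty output means every Node already has the label. Otherwise, apply the label to the listed Nodes with the kubectl label command shown earlier in this section.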
An environment variable assigned to the container in the spec is not present when it starts
External environment variables and files are a potential security concern, and are ignored by default. If you suspect that some of them were ignored, you can verify this by inspecting the logs of the Anjuna Confidential Pod's Confidential VM (CVM).
First, fetch the logs by following the instructions in Getting CVM instance logs for an Anjuna Confidential Pod.
Next, assuming you have fetched the logs into logs.txt, you can find the ignored environment variables or volumes by running the following command:
$ echo -e $(grep -Po '{"msg":"ANJ-KUBERNETES: \KIdentified the following .+(?=","level")' logs.txt)
The output of this command may look like:
Identified the following environment variables not specifically allowed in the enclave config file:
- KUBERNETES_SERVICE_PORT
- KUBERNETES_SERVICE_PORT_HTTPS
- KUBERNETES_PORT
- KUBERNETES_PORT_443_TCP
- PATH
- KUBERNETES_PORT_443_TCP_PORT
- KUBERNETES_SERVICE_HOST
- HOSTNAME
- KUBERNETES_PORT_443_TCP_PROTO
- KUBERNETES_PORT_443_TCP_ADDR
Identified the following mounts not specifically allowed in the enclave config file:
- /var/run/secrets/kubernetes.io/serviceaccount
If you have vetted these external resources and want to make them available to the container, you can explicitly add them to the container’s configuration. Refer to Untrusted configuration.
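The grep -P option used to extract these messages is a GNU extension and is not available in every grep implementation. As a hedged alternative, the same extraction can be sketched with sed and printf. The log line layout is inferred from the grep pattern on this page (a JSON object with a "msg" field followed by a "level" field), so treat it as an assumption. The helper name show_ignored_resources is hypothetical.

```shell
# Sketch only: show_ignored_resources is a hypothetical helper name.
# Assumes log lines of the form:
#   {"msg":"ANJ-KUBERNETES: Identified the following ...","level":"..."}
show_ignored_resources() {
  # $1: path to the CVM log file (for example, logs.txt)
  sed -n 's/.*"msg":"ANJ-KUBERNETES: \(Identified the following .*\)","level".*/\1/p' "$1" |
    while IFS= read -r line; do
      # %b expands the literal \n sequences embedded in the message.
      printf '%b\n' "$line"
    done
}
```

For example, show_ignored_resources logs.txt prints one block per matching log entry, in the same layout as the sample output above.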
The anjuna-cloud-adaptor Pod is failing with an ErrImagePull status
This indicates an error when the cluster tries to pull the anjuna-cloud-adaptor image.
This image is managed by the cluster administrator, who is in charge of installing the Anjuna Kubernetes Toolset.
First, identify the Anjuna Cloud Adaptor image that is causing the failure, which can be done with the following command:
$ kubectl get daemonset anjuna-cloud-adaptor-daemonset \
-o jsonpath='{..image}' \
-n anjuna-system
Here are some possible causes for this issue:
- The image was not correctly pushed to the remote registry. You can verify this by pulling the image on a separate machine with access to the container registry. If you confirm that the image was not pushed correctly, refer to Load and push the Anjuna Kubernetes Toolset image for instructions on how to push it again.
  The image must be named anjuna-cloud-adaptor. If you picked a different image name, uninstall the Anjuna Kubernetes Toolset and re-install it with the correct image name.
- The cluster does not have permission to pull the image (e.g., it is stored in a private container repository). This can be fixed by assigning a role to the cluster Nodes so that they can pull images from your Azure Container Registry. Refer to Load and push the Anjuna Kubernetes Toolset image for more information.
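The image name requirement above can be checked with a short script. This sketch assumes that "named anjuna-cloud-adaptor" refers to the last path component of the image reference, without its tag or digest. That reading is an assumption, not something this page states explicitly. The helper names image_repo_name and check_adaptor_image_name are hypothetical.

```shell
# Sketch only: both function names are hypothetical helpers.
image_repo_name() {
  # $1: image reference such as registry/path/name:tag or name@sha256:...
  ref="${1%%@*}"      # drop a digest suffix, if present
  last="${ref##*/}"   # keep the last path component
  echo "${last%%:*}"  # drop the tag, if present
}

check_adaptor_image_name() {
  # Same query as shown above; it may return several space-separated images.
  images=$(kubectl get daemonset anjuna-cloud-adaptor-daemonset \
    -o jsonpath='{..image}' \
    -n anjuna-system)
  for img in $images; do
    if [ "$(image_repo_name "$img")" = "anjuna-cloud-adaptor" ]; then
      echo "$img"
      return 0
    fi
  done
  echo "no image named anjuna-cloud-adaptor in: $images" >&2
  return 1
}
```

If check_adaptor_image_name prints an image reference, you can then try pulling that reference from a separate machine to test registry access.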
The Confidential Pod is failing with an ErrImagePull status
If you are experiencing an ErrImagePull error when deploying an Anjuna Confidential Pod to Kubernetes, make sure that the image specified in the Pod specification matches the one you used when building the Anjuna Confidential Container disk image through anjuna-k8s-cli build.
As you create and measure the disk image, anjuna-k8s-cli build will pull, unpack, and measure your specified container image, and only that image can be deployed to the cluster.
Specifying anything different in the Pod spec will lead to errors like this.
The container image running on the cluster will match the state of the container image at disk build time. If the container image is updated in the registry, the changes will only be deployable to Kubernetes once you rebuild the disk image and redeploy your application.
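To compare the two image references, you can read the image from the Pod spec and check it against the reference you passed at build time. The sketch below uses an exact string comparison, which is an assumption. This page does not specify how strictly the references must match. The helper name pod_uses_image is hypothetical.

```shell
# Sketch only: pod_uses_image is a hypothetical helper name.
pod_uses_image() {
  # $1: Pod name
  # $2: image reference used when building the disk image
  images=$(kubectl get pod "$1" -o jsonpath='{.spec.containers[*].image}')
  for img in $images; do
    if [ "$img" = "$2" ]; then
      return 0
    fi
  done
  echo "Pod $1 uses [$images], expected $2" >&2
  return 1
}
```

For example, pod_uses_image anjuna-nginx-5f59f64769-g2xlc docker.io/library/nginx:latest returns a non-zero status and prints the mismatch if the Pod spec references a different image.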
The Confidential Pod stays in ContainerCreating state forever
Check the Pod events for any errors, such as missing mounts, and also check the anjuna-cloud-adaptor logs for more information, as the adaptor might be having issues related to your cloud service provider.
Note that Anjuna Confidential Pods take about a minute or two to change to a Running state in Kubernetes.
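Because a short delay is expected, it can be useful to wait for the Pod with a timeout before treating ContainerCreating as a failure. The following sketch uses kubectl wait and prints the Events section if the Pod does not become Ready in time. The 180s default is an assumed value that leaves some margin over the one to two minutes mentioned above, and the helper name wait_for_confidential_pod is hypothetical.

```shell
# Sketch only: wait_for_confidential_pod is a hypothetical helper name.
wait_for_confidential_pod() {
  # $1: Pod name
  # $2: timeout (optional, defaults to 180s, an assumed value)
  if kubectl wait --for=condition=Ready "pod/$1" --timeout="${2:-180s}"; then
    return 0
  fi
  echo "Pod $1 not Ready after ${2:-180s}; recent events:" >&2
  # Print only the Events section at the bottom of the describe output.
  kubectl describe pod "$1" | sed -n '/^Events:/,$p' >&2
  return 1
}
```

If the helper times out, continue with the anjuna-cloud-adaptor logs for the Node that hosts the Pod.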
Other errors
If you do not see your error documented on this page, try searching the documentation using the search bar at the upper-right corner of the page. Otherwise, contact support@anjuna.io with the error message, Anjuna Kubernetes Toolset version number, and relevant context about the action you were trying to perform.