Introduzione
In questo documento viene descritto il problema e la soluzione dei supporti del Registro di sistema con stato ImagePullBackOff.
Problema
I pod di registro in Cluster Manager (CM) di Ultra Cloud Core Subscriber Microservices Infrastructure (SMI) sono nello stato ImagePullBackOff.
cloud-user@lab-deployer-cm-primary:~$ kubectl get pods -A -o wide | grep -v "Running"
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
registry charts-cee-2020-02-2-1-1-0 0/1 ImagePullBackOff 0 100d 10.10.10.178 lab-deployer-cm-primary <none> <none>
registry charts-cluster-deployer-2020-02-2-35-0 0/1 ImagePullBackOff 0 100d 10.10.10.180 lab-deployer-cm-primary <none> <none>
registry registry-cee-2020-02-2-1-1-0 0/1 ImagePullBackOff 0 100d 10.10.10.198 lab-deployer-cm-primary <none> <none>
registry registry-cluster-deployer-2020-02-2-35-0 0/1 ImagePullBackOff 0 100d 10.10.10.152 lab-deployer-cm-primary <none> <none>
registry software-unpacker-0 0/1 ImagePullBackOff 0 100d 10.10.10.160 lab-deployer-cm-primary <none> <none>
In Distribuzione Common Execution Environment (CEE) viene visualizzato lo zero percento del sistema pronto perché la sincronizzazione del sistema in sospeso è true.
[deployer/cee] cee# show system
system uuid 012345678-9abc-0123-4567-000011112222
system status deployed true
system status percent-ready 0.0
system ops-center repository https://charts.10.192.1.1.nip.io/cee-2020.02.2.35
system ops-center-debug status false
system synch running true
system synch pending true.
Per la connessione a CEE, usare il protocollo SSH (Secure Shell Protocol). Viene segnalato l'errore 404 Not Found.
[deployer/cee] cee#
Message from confd-api-manager at 2022-05-05 01:01:01...
Helm update is ERROR. Trigger for update is CHANGE. Message is:
WebApplicationException: HTTP 404 Not Found
com.google.common.util.concurrent.UncheckedExecutionException: javax.ws.rs.WebApplicationException: HTTP 404 Not Found
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2052)
at com.google.common.cache.LocalCache.get(LocalCache.java:3943)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3967)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4952)
at com.broadhop.confd.config.proxy.dao.HelmRepositoryDAO.getChartVersion(HelmRepositoryDAO.java:638)
at com.broadhop.confd.config.proxy.dao.HelmRepositoryDAO.installRelease(HelmRepositoryDAO.java:359)
at com.broadhop.confd.config.proxy.dao.HelmRepositoryDAO.sendConfiguration(HelmRepositoryDAO.java:254)
at com.broadhop.confd.config.proxy.service.ConfigurationSynchManager.run(ConfigurationSynchManager.java:233)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.ws.rs.WebApplicationException: HTTP 404 Not Found
at com.broadhop.confd.config.proxy.dao.HelmRepositoryDAO.retrieveHelmIndex(HelmRepositoryDAO.java:620)
at com.broadhop.confd.config.proxy.dao.HelmRepositoryDAO$2.load(HelmRepositoryDAO.java:114)
at com.broadhop.confd.config.proxy.dao.HelmRepositoryDAO$2.load(HelmRepositoryDAO.java:112)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2273)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2156)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2046)
Analisi
- Controllare la configurazione del repository del timone in CEE Deployer.
[deployer/cee] cee# show running-config helm
helm default-repository base-repos
helm repository base-repos
url https://charts.10.192.1.1.nip.io/cee-2020.02.2.35
exit
- Eseguire una query sul file index.yaml dell'URL da Gestione cluster primario per verificare che la risposta 404 sia stata inviata.
cloud-user@deployer-cm-primary:~$ curl -k https://charts.10.192.1.1.nip.io/cee-2020.02.2.35/index.yaml
default backend - 404
- Eseguire una query sull'elenco delle immagini con
kubectl describe pod
Nessuna immagine basata sull'errore di descrizione.
cloud-user@lab-deployer-cm-primary:~$ kubectl describe pod ops-center-cee-labcluster-ops-center-df69975c7-gzszg -n cee-labcluster | grep Image
Image: docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-apps/cee-ops-center/2020.02.2/confd_init:0.7.0-00001111
Image ID: docker-pullable://docker.10.192.1.1.nip.io/cee-2020.02.2.33/smi-apps/cee-ops-center/2020.02.2/confd_init@sha256:0123456789012345678901234567890123456789012345678901234567890123
Image: docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-libraries/ops-center/2020.02.2/crd_registry:0.7.1-00002222
Image ID: docker-pullable://docker.10.192.1.1.nip.io/cee-2020.02.2.27/smi-libraries/ops-center/2020.02.2/crd_registry@sha256:0123456789012345678901234567890123456789012345678901234567890123
Image: docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-libraries/ops-center/2020.02.2/local_storage_init:0.7.1-00003333
Image ID: docker-pullable://docker.10.192.1.1.nip.io/cee-2020.02.2.27/smi-libraries/ops-center/2020.02.2/local_storage_init@sha256:0123456789012345678901234567890123456789012345678901234567890123
Image: docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-libraries/ops-center/2020.02.2/confd:0.7.1-00004444
Image ID: docker-pullable://docker.10.192.1.1.nip.io/cee-2020.02.2.27/smi-libraries/ops-center/2020.02.2/confd@sha256:0123456789012345678901234567890123456789012345678901234567890123
Image: docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-libraries/ops-center/2020.02.2/confd_api_bridge:0.7.1-00005555
Image ID: docker-pullable://docker.10.192.1.1.nip.io/cee-2020.02.2.33/smi-libraries/ops-center/2020.02.2/confd_api_bridge@sha256:0123456789012345678901234567890123456789012345678901234567890123
Image: docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-apps/cee-ops-center/2020.02.2/product_confd_callback:0.7.0-00006666
Image ID: docker-pullable://docker.10.192.1.1.nip.io/cee-2020.02.2.27/smi-apps/cee-ops-center/2020.02.2/product_confd_callback@sha256:0123456789012345678901234567890123456789012345678901234567890123
Image: docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-libraries/ops-center/2020.02.2/ssh_ui:0.7.1-00007777
Image ID: docker-pullable://docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-libraries/ops-center/2020.02.2/ssh_ui@sha256:0123456789012345678901234567890123456789012345678901234567890123
Image: docker.10.192.1.1.nip.io/cee-2020.02.2.35/smi-libraries/ops-center/2020.02.2/confd_notifications:0.7.1-00008888
Image ID: docker-pullable://docker.10.192.1.1.nip.io/cee-2020.02.2.27/smi-libraries/ops-center/2020.02.2/confd_notifications@sha256:0123456789012345678901234567890123456789012345678901234567890123
- Eseguire la
kubectl describe pod
per il registro di sistema name state.
- Eseguire la
kubectl get pods -A -o wide | grep -v "Running"
per controllare lo stato dei pod in tutti gli spazi dei nomi nel cluster Kubernetes. cloud-user@lab-deployer-cm-primary:~$ kubectl describe pod charts-cee-2020-02-2-1-1-0 -n registry
Volumes:
charts-volume:
Type: HostPath (bare host directory volume)
Path: /data/software/packages/cee-2020.02.2.1.1/data/charts
HostPathType: DirectoryOrCreate
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 9m3s (x104861 over 16d) kubelet Back-off pulling image
"dockerhub.cisco.com/smi-fuse-docker-internal/smi-apps/distributed-registry/2020.02.2/apache:0.1.0-abcd123"
Warning Failed 3m59s (x104884 over 16d) kubelet Error: ImagePullBackOff
cloud-user@lab-deployer-cm-primary:$ kubectl describe pod charts-cluster-deployer-2020-02-2-35-0 -n registry
Name: charts-cluster-deployer-2020-02-2-35-0
Namespace: registry
Priority: 1000000000
Priority Class Name: infra-critical
Node: lab-deployer-cm-primary/10.192.1.1
Start Time: Thu, 01 Jan 2022 13:05:03 +0000
Labels: chart-app=charts-cluster-deployer-2020-02-2-35
component=charts
controller-revision-hash=charts-cluster-deployer-2020-02-2-35-589fdf57b8
registry=cluster-deployer-2020.02.2.35
statefulset.kubernetes.io/pod-name=charts-cluster-deployer-2020-02-2-35-0
Annotations: cni.projectcalico.org/podIP: 10.10.10.180/32
cni.projectcalico.org/podIPs: 10.10.10.180/32
sidecar.istio.io/inject: false
Status: Pending
IP: 10.10.10.180
IPs:
IP: 10.10.10.180
Controlled By: StatefulSet/charts-cluster-deployer-2020-02-2-35
Containers:
charts:
Container ID:
Image: dockerhub.cisco.com/smi-fuse-docker-internal/smi-apps/distributed-registry/2020.02.2/apache:0.1.0-abcd123
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qcmhx (ro)
/var/www/html/cluster-deployer-2020.02.2.35 from charts-volume (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
charts-volume:
Type: HostPath (bare host directory volume)
Path: /data/software/packages/cluster-deployer-2020.02.2.35/data/charts
HostPathType: DirectoryOrCreate
default-token-qcmhx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qcmhx
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 30s
node.kubernetes.io/unreachable:NoExecute op=Exists for 30s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 118s (x104949 over 16d) kubelet Back-off pulling image
"dockerhub.cisco.com/smi-fuse-docker-internal/smi-apps/distributed-registry/2020.02.2/apache:0.1.0-abcd123"
cloud-user@lab-deployer-cm-primary:/data/software/packages/cluster-deployer-2020.02.2.35/data/charts$
cloud-user@lab-deployer-cm-primary:$ kubectl get pods -A -o wide | grep -v "Running"
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
registry charts-cee-2020-02-2-1-1-0 0/1 ImagePullBackOff 0 100d 10.10.10.178 lab-deployer-cm-primary <none> <none>
registry charts-cluster-deployer-2020-02-2-35-0 0/1 ErrImagePull 0 100d 10.10.10.180 lab-deployer-cm-primary <none> <none>
registry registry-cee-2020-02-2-1-1-0 0/1 ErrImagePull 0 100d 10.10.10.198 lab-deployer-cm-primary <none> <none>
registry registry-cluster-deployer-2020-02-2-35-0 0/1 ImagePullBackOff 0 100d 10.10.10.152 lab-deployer-cm-primary <none> <none>
registry software-unpacker-0 0/1 ImagePullBackOff 0 100d 10.10.10.160 lab-deployer-cm-primary <none> <none>
- Confermare i file nel server di distribuzione cluster.
cloud-user@lab-deployer-cm-primary:/data/software/packages$ cd cluster-deployer-2020.02.2.35/
cloud-user@lab-deployer-cm-primary:/data/software/packages/cluster-deployer-2020.02.2.35$ ll
total 12
drwxrwxr-x 3 303 303 4096 Jan 1 2021 ./
drwxrwxrwt 5 root root 4096 Mar 1 11:39 ../
drwxrwxr-x 5 303 303 4096 Jan 1 2021 data/
cloud-user@lab-deployer-cm-primary:/data/software/packages/cluster-deployer-2020.02.2.35$ cd data/
cloud-user@lab-deployer-cm-primary:/data/software/packages/cluster-deployer-2020.02.2.35/data$ ll
total 20
drwxrwxr-x 5 303 303 4096 Jan 1 2021 ./
drwxrwxr-x 3 303 303 4096 Jan 1 2021 ../
drwxr-xr-x 2 303 303 4096 Mar 1 12:55 charts/
drwxr-xr-x 4 303 303 4096 Aug 10 2021 deployer-inception/
drwxr-xr-x 3 303 303 4096 Aug 10 2021 docker/
cloud-user@lab-deployer-cm-primary:/data/software/packages/cluster-deployer-2020.02.2.35/data$ cd charts/
cloud-user@lab-deployer-cm-primary:/data/software/packages/cluster-deployer-2020.02.2.35/data/charts$ ll
total 116
drwxr-xr-x 2 303 303 4096 Mar 1 12:55 ./
drwxrwxr-x 5 303 303 4096 Jan 1 2021 ../
-rw-r--r-- 1 303 303 486 Aug 10 2021 index.yaml
-rw-r--r-- 1 303 303 102968 Mar 1 12:55 smi-cluster-deployer-1.1.0-2020-02-2-1144-210826141421-15f3d5b.tgz
cloud-user@lab-deployer-cm-primary:/tmp$
cloud-user@lab-deployer-cm-primary:/tmp$ ls /tmp/k8s-* -al
-rw-r--r-- 1 root root 2672 Sep 7 2021 /tmp/k8s-offline.tgz.txt
Soluzione
Il problema è probabilmente causato da un errore di sincronizzazione del cluster. La soluzione consiste nell'eseguire la sincronizzazione di un cluster dal server di avvio al server ad alta disponibilità (HA, High Availability) di CM.
- Utilizzare SSH per connettersi al server di ispezione.
- Usare SSH per collegarsi alla porta 2022 del centro operativo.
cloud-user@all-in-one-vm:~$ ssh admin@localhost -p 2022
- Verificare che il cluster si trovi nel server di avvio.
[all-in-one-base-vm] SMI Cluster Deployer# show clusters
- Verificare e confermare che la configurazione del cluster sia corretta. In questo esempio il nome del cluster è lab-deployer.
[all-in-one-base-vm] SMI Cluster Deployer# show running-config clusters lab-deployer
- Eseguire la sincronizzazione del cluster.
[all-in-one-base-vm] SMI Cluster Deployer# clusters lab-deployer actions sync run debug
- Monitorare i registri di sincronizzazione.
[all-in-one-base-vm] SMI Cluster Deployer# monitor sync-logs lab-deployer
Successful cluster sync logs example below :
Wednesday 01 December 2021 01:01:01 +0000 (0:00:00.080) 0:33:08.600 ****
===============================================================================
2021-12-01 01:01:01.230 DEBUG cluster_sync.ca-deployer: Cluster sync successful
2021-12-01 01:01:01.230 DEBUG cluster_sync.ca-deployer: Ansible sync done
2021-12-01 01:01:01.231 INFO cluster_sync.ca-deployer: _sync finished. Opening lock
- Utilizzare SSH per connettersi a Cluster Manager e verificare che i pod siano in stato "running".
cloud-user@lab-deployer-cm-primary:~$ kubectl get pods -A -o wide | grep -v "Running"