Overview
This chapter describes the Key Performance Indicators (KPIs) available to monitor and analyze the performance of the CDL.
The label name and description of the metrics used in CDL are defined in the following table:
Metric Label Name | Label Description |
---|---|
db | The name of the datastore. |
operation | The name of the operation performed on the CDL pods. |
errorCode | The error code sent in response. |
errorMessage | The error message sent in response. |
slot_shard_id | The Slot map or Shard id where the operation is performed. |
slot_instance_id | The Slot instance id where the operation is performed. |
shardId | The Slot or Index map or Shard id where the metric is pegged. |
instanceId | The Slot or Index map or Instance id where the metric is pegged. |
session_type | The type of session data present in the record. |
bucket | The size bucket into which the session falls. The current buckets are <=1kb, <=2kb, <=4kb, <=8kb, <=16kb, <=32kb, >32kb |
notification_type | The type of notification sent from CDL. Values: TIMER_EXPIRED, RECORD_CONFLICT, BULK_TASK_NOTIFICATION |
topic | The Kafka topic to which the message is published. |
not_found_in | The pod from where the data was not found. Values: Index/Slot |
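The bucket label in the table above groups sessions by size. The following minimal sketch shows how a record size could map to one of the listed bucket values. The function name `bucket_for_size` is hypothetical, and the sketch assumes that 1 kb means 1024 bytes, which the table does not state.

```python
# Upper bounds (in kb) of the session-size buckets listed in the label table.
BUCKET_LIMITS_KB = [1, 2, 4, 8, 16, 32]


def bucket_for_size(size_bytes: int, bytes_per_kb: int = 1024) -> str:
    """Return the bucket label value for a record of the given size.

    Assumption: 1 kb = 1024 bytes. Pass bytes_per_kb=1000 if the
    deployment uses decimal kilobytes instead.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    for limit in BUCKET_LIMITS_KB:
        if size_bytes <= limit * bytes_per_kb:
            return f"<={limit}kb"
    # Anything larger than the last bound falls into the open-ended bucket.
    return ">32kb"
```

For example, under these assumptions a 3000-byte record falls into the `<=4kb` bucket.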
CDL Category
bulk_task_total
Description: Total number of bulk tasks with processing status
Sample Query: bulk_task_total
Labels:
-
Label:
db
Label Description: DB name
Example: session
-
Label:
slot_shard_id
Label Description: The slot shard id
Example: 1, 2
-
Label:
slot_instance_id
Label Description: The slot instance id
Example: 1, 2
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
-
Label:
status
Label Description: Processing status of bulk task
Example: timeout, skipped, completed_last_record, completed
cdl_ep_to_slot_request_tps
Description: Recording rule for endpoint to slot request TPS measurement
Sample Query: cdl_ep_to_slot_request_tps
Labels:
-
Label:
namespace
Label Description: Kubernetes namespace from which the metric is generated
Example: cdl-global
-
Label:
pod
Label Description: Endpoint pod name from which the metric is generated
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, UpdateFlags
-
Label:
errorCode
Label Description: The errorCode in the DB response for deletion
Example: 0, 502
cdl_ep_to_slot_response_time
Description: Recording rule for endpoint to slot response time measurement
Sample Query: cdl_ep_to_slot_response_time
Labels:
-
Label:
namespace
Label Description: Kubernetes namespace from which the metric is generated
Example: cdl-global
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, UpdateFlags
-
Label:
errorCode
Label Description: The errorCode in the DB response for deletion
Example: 0, 502
cdl_geo_replication_enabled
Description: Gauge metric that indicates the geo replication status. The value is 1 if geo replication is enabled, and 0 otherwise
Sample Query: cdl_geo_replication_enabled
cdl_index_record_capacity
Description: Total index record capacity of CDL
Sample Query: cdl_index_record_capacity{db="session"}
Labels:
-
Label:
db
Label Description: DB name
Example: session
cdl_slot_record_capacity
Description: Total slot record capacity of CDL
Sample Query: cdl_slot_record_capacity{db="session"}
Labels:
-
Label:
db
Label Description: DB name
Example: session
cdl_slot_size_capacity
Description: Total slot size capacity of CDL
Sample Query: cdl_slot_size_capacity{db="session"}
Labels:
-
Label:
db
Label Description: DB name
Example: session
consumer_kafka_nonprocessed_records_total
Description: Total count of Kafka records left unprocessed because they originated from the same pod
Sample Query: sum(consumer_kafka_nonprocessed_records_total)by(shardId,instanceId)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Get, Multi
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
instanceId
Label Description: The instance id
Example: 1, 2
-
Label:
reason
Label Description: The reason for skipping the consumed kafka record
Example: old_timestamp
consumer_kafka_records_duration_seconds
Description: Time taken to process consumed kafka records
Sample Query: sum(irate(consumer_kafka_records_duration_seconds[5m]))by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
origin_instance_id
Label Description: The index instance id from which the kafka request originated
Example: 1.1, 1.2
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
consumer_kafka_records_total
Description: Total count of records consumed from kafka
Sample Query: sum(irate(consumer_kafka_records_total[5m]))by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
origin_instance_id
Label Description: The index instance id from which the kafka request originated
Example: 1.1, 1.2
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
datastore_internal_requests_duration_seconds
Description: Time taken for processing of internal datastore requests
Sample Query: sum(datastore_internal_requests_duration_seconds)by(operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: RemoteBulkRead, RemoteBulkReadIndexing, GetChecksumRemoteSlot
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 1401
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
datastore_requests_duration_seconds
Description: Total time taken for processing requests at cdl-ep
Sample Query: sum(irate(datastore_requests_duration_seconds{errorCode="0",local_request="1"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, Find, FindByUK, GetCdlStatus, UpdateFlags
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 400, 403, 404, 409, 413, 501, 502, 503, 507, 508
Labels:
-
Label:
local_request
Label Description: Whether the DB request is local or GR (geo replication). If local_request = 1, the request is local; otherwise it is GR.
Example: 1, 0
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
datastore_requests_total
Description: Total count of requests received at cdl-ep
Sample Query: sum(irate(datastore_requests_total{errorCode="0",local_request="1"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, Find, FindByUK, GetCdlStatus, UpdateFlags
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 400, 403, 404, 409, 413, 501, 502, 503, 507, 508
Labels:
-
Label:
local_request
Label Description: Whether the DB request is local or GR (geo replication). If local_request = 1, the request is local; otherwise it is GR.
Example: 1, 0
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
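Many sample queries in this chapter share one shape: a metric name, a set of label matchers, an `irate` over a window, and a `sum ... by (...)` grouping. The sketch below assembles that shape from its parts. The helper name `build_rate_query` is hypothetical and not part of CDL or Prometheus, and it assumes that label values contain no quotes or backslashes that would need escaping.

```python
def build_rate_query(metric: str, matchers: dict, by: list, window: str = "5m") -> str:
    """Build a 'sum(irate(metric{matchers}[window])) by (labels)' query string.

    Assumption: label values are plain strings that need no escaping.
    """
    # Render each matcher as name="value", in the order the caller supplied.
    selector = ",".join(f'{name}="{value}"' for name, value in matchers.items())
    grouping = ", ".join(by)
    return f"sum(irate({metric}{{{selector}}}[{window}])) by ({grouping})"


# Reproduces the sample query shown for datastore_requests_total.
query = build_rate_query(
    "datastore_requests_total",
    {"errorCode": "0", "local_request": "1"},
    by=["operation"],
)
print(query)
```

Changing the `local_request` matcher to `"0"` would select the GR requests instead, per the label description above.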
db_records_softdelete_total
Description: Total count of records in the db that are in the soft delete/purge state because purgeOnEval is set
Sample Query: sum(avg(db_records_softdelete_total{notify="1"})by(notify))
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
Labels:
-
Label:
notify
Label Description: Whether purgeOnNotify is set. 1 indicates purgeOnNotify=true, 0 otherwise.
Example: 1
db_records_total
Description: Total count of records for the db. The following views can be derived from this metric:
1. Total record count - Query: sum(avg(db_records_total{namespace="$namespace",session_type="total",appInstanceId="0"})by(systemId,sliceName))
2. Slice-wise record count - Query: sum(avg(db_records_total{namespace="$namespace",session_type="total",appInstanceId="0"})by(systemId,sliceName))by(sliceName)
3. System ID based count - Query: sum(avg(db_records_total{namespace="$namespace",session_type="total",appInstanceId="0"})by(systemId,sliceName))by(systemId)
4. Sessions grouped by session type - Query: avg(db_records_total{namespace="$namespace",session_type!="total"}) by (session_type)
Sample Query: sum(avg(db_records_total{namespace="$namespace",session_type="total",appInstanceId="0"})by(systemId,sliceName))
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
session_type
Label Description: The session type stored in the data
Example: GX, RX, total
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
Labels:
-
Label:
appInstanceId
Label Description: The app instance id populated by app in the record.
Example: 1
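When a db_records_total query is run through the Prometheus HTTP API rather than a dashboard, the instant-query result arrives as JSON with one entry per label combination. The sketch below totals such a result by one label, for example sliceName. The function name `sum_by_label` is hypothetical, and the sample response contains made-up values purely for illustration.

```python
from collections import defaultdict


def sum_by_label(response: dict, label: str) -> dict:
    """Total the sample values of an instant-query response per label value."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    totals = defaultdict(float)
    for sample in response["data"]["result"]:
        # Each vector sample carries [unix_timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        totals[sample["metric"].get(label, "")] += float(value)
    return dict(totals)


# Illustrative response only; the numbers are not real measurements.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"systemId": "1", "sliceName": "session"}, "value": [1700000000, "100"]},
            {"metric": {"systemId": "2", "sliceName": "session"}, "value": [1700000000, "50"]},
            {"metric": {"systemId": "1", "sliceName": "other"}, "value": [1700000000, "25"]},
        ],
    },
}

print(sum_by_label(sample_response, "sliceName"))
```

Passing `"systemId"` as the label gives the per-system view instead.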
dpapp_internal_requests_total
Description: Total count of internal dp app requests
Sample Query: sum(dpapp_internal_requests_total)by(operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: RemoteBulkRead, RemoteBulkReadIndexing, GetChecksumRemoteSlot
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 1401
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
duplicate_slot_records_deleted
Description: Total slot records deleted due to duplicate slot data found
Sample Query: duplicate_slot_records_deleted
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 502
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
find_no_record_total
Description: Total count of find requests for which no records are sent back
Sample Query: sum(find_no_record_total)by(not_found_in,operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: FindByUk, FindTagsByUk, Find
Labels:
-
Label:
not_found_in
Label Description: Whether the data was not found in the index or in the slot
Example: index, slot
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
findall_records_bucket
Description: The total number of findAll requests received which can be grouped into the number of records sent in response
Sample Query: sum(irate(findall_records_bucket[5m]))by(bucket)
Labels:
-
Label:
bucket
Label Description: Buckets grouped by number of records
Example: =0, <=10, <=20, <=50, <=100, >100
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
index_init_sync_duration_seconds
Description: Time taken by the index to sync with local and remote peers during startup
Sample Query: sum(index_init_sync_duration_seconds)by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
indexing_audit_deleted_keys_total
Description: Total number of unique keys and primary keys deleted during index auditing
Sample Query: sum(irate(indexing_audit_deleted_keys_total{errorCode="0",key_type="unique"}[5m]))by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
key_type
Label Description: The type of key
Example: primary, unique
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 302, 403
indexing_audit_duration_seconds
Description: Total time taken for performing the indexing audit
Sample Query: indexing_audit_duration_seconds{pod=~".*"}
indexing_audit_total
Description: Total times the indexing audit was run
Sample Query: indexing_audit_total{pod=~".*"}
indexing_is_leader
Description: Indicates whether the indexing instance is the leader or a follower
Sample Query: indexing_is_leader
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
indexing_kafka_replication_delay_seconds
Description: Total delay in replicating index records from Kafka at the index
Sample Query: sum(irate(indexing_kafka_replication_delay_seconds[5m]))by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
origin_instance_id
Label Description: The index instance id from which the kafka request originated
Example: 1.1, 1.2
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
indexing_operation_duration_seconds
Description: Time taken for response of indexing operations sent from cdl ep to index app
Sample Query: sum(irate(indexing_operation_duration_seconds{errorCode="0"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, GetByPk, GetByUk
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 404, 500, 503
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
indexing_operation_total
Description: Total count of indexing operations sent from cdl ep to index app
Sample Query: sum(irate(indexing_operation_total{errorCode="0"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, GetByPk, GetByUk
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 404, 500, 503
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
indexing_overwrites_total
Description: Total number of indexing set operations in which an existing index record is overwritten
Sample Query: sum(indexing_overwrites_total)by(key_type,shardId)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
key_type
Label Description: The type of key
Example: primary, unique
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
indexing_records_total
Description: Total count of records in the indexing
Sample Query: indexing_records_total{pod=~".*"}
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
indexing_requests_duration_seconds
Description: Time taken for response of indexing requests received
Sample Query: sum(irate(indexing_requests_duration_seconds{errorCode="0",isKafka="1"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Set, Delete
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 404, 1408
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
Labels:
-
Label:
isKafka
Label Description: Whether the request came from Kafka or gRPC. If isKafka = 1, the request came from Kafka; otherwise it came from gRPC.
Example: 1, 0
indexing_requests_total
Description: Total number of requests received at index pod
Sample Query: sum(irate(indexing_requests_total{errorCode="0",isKafka="1"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Set, Delete
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 404, 1408
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
Labels:
-
Label:
isKafka
Label Description: Whether the request came from Kafka or gRPC. If isKafka = 1, the request came from Kafka; otherwise it came from gRPC.
Example: 1, 0
inmemory_indexing_operation_duration_seconds
Description: Total time taken for responses to requests from cdl-ep to cdl-index pod
Sample Query: sum(inmemory_indexing_operation_duration_seconds{errorCode="0"})by(operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Get, Multi
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
instanceId
Label Description: The instance id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 404
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
inmemory_indexing_operation_total
Description: Total count of operations from cdl-ep to cdl-index pod
Sample Query: sum(inmemory_indexing_operation_total)by(operation,shardId)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Get, Multi
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
instanceId
Label Description: The instance id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 404
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
kafka_connection_status
Description: Kafka connection status
Sample Query: kafka_connection_status
Labels:
-
Label:
topic
Label Description: Kafka topic name
Example: kv.kafka.shard.1.1.1
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
kafka_producer_pending_publish_total
Description: Total count of messages pending to be published to kafka
Sample Query: kafka_producer_pending_publish_total{pod=~".*"}
kafka_producer_requests_duration_seconds
Description: Total time taken by kafka producer to process requests
Sample Query: sum(irate(kafka_producer_requests_duration_seconds[5m])) by (topic)
Labels:
-
Label:
topic
Label Description: Kafka topic name
Example: kv.kafka.shard.1.1.1
kafka_producer_requests_total
Description: Total count of requests sent towards kafka
Sample Query: kafka_producer_requests_total
Labels:
-
Label:
topic
Label Description: Kafka topic name
Example: kv.kafka.shard.1.1.1
kafka_records_replayed_total
Description: Total number of records published to kafka due to leader-switchover or kafka-reconnection
Sample Query: sum(kafka_records_replayed_total{reason="leader_switchover"})by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
reason
Label Description: The reason for replaying kafka records
Example: leader_switchover, kafka_reconnection
overwritten_index_records_deleted
Description: Total number of records deleted due to overwritten/duplicate unique keys at index
Sample Query: overwritten_index_records_deleted
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response for deletion
Example: 0, 502
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
overwritten_index_records_skipped
Description: Total number of unprocessed stale records due to queue being full
Sample Query: overwritten_index_records_skipped
Labels:
-
Label:
action
Label Description: The action that was to be performed on the stale record
Example: delete, notify
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
records_notification_duration_seconds
Description: Time taken for notification sent towards notification endpoint
Sample Query: sum(irate(records_notification_duration_seconds[5m])) by (shardId,instance_id,notification_type)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
notification_type
Label Description: Type of the notification
Example: TIMER_EXPIRED, RECORD_CONFLICT, BULK_TASK_NOTIFICATION
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 1406
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
records_notification_retry_count
Description: Total notification retries by the slot app
Sample Query: sum(irate(records_notification_retry_count[5m])) by (shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
records_notification_total
Description: Total count of notifications sent towards notification endpoint
Sample Query: sum(irate(records_notification_total[5m])) by (shardId,instance_id,notification_type)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
notification_type
Label Description: Type of the notification
Example: TIMER_EXPIRED, RECORD_CONFLICT, BULK_TASK_NOTIFICATION
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 1406
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
remote_requests_dropped_total
Description: Total number of remote requests that have been dropped
Sample Query: remote_requests_dropped_total
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, UpdateFlags
Labels:
-
Label:
reason
Label Description: The reason for dropping the remote requests
Example: queue_full
remote_site_connection_status
Description: CDL endpoint to remote site cdl-ep connection count
Sample Query: sum(remote_site_connection_status)by(pod,systemId)
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
remote_site_connections_total
Description: Total number of remote site connections configured per endpoint pod
Sample Query: remote_site_connections_total
Labels:
-
Label:
systemId
Label Description: The system id of the remote site
Example: 1, 2
slot_checksum_mismatch_total
Description: Total number of checksum mismatches
Sample Query: sum(irate(slot_checksum_mismatch_total[5m]))by(slot_shard_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
slot_shard_id
Label Description: The slot shard id
Example: 1, 2
slot_geo_replication_requests_duration_seconds
Description: Time taken to send the response of slot geo replication
Sample Query: sum(irate(slot_geo_replication_requests_duration_seconds[5m]))by(systemId,operation)
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: CREATE, DELETE, UPDATE, UPDATEFLAGS
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 503
slot_geo_replication_requests_total
Description: Total number of requests for slot geo replication
Sample Query: sum(irate(slot_geo_replication_requests_total[5m]))by(systemId,operation)
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: CREATE, DELETE, UPDATE, UPDATEFLAGS
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 503
slot_init_sync_duration_seconds
Description: Time taken by the slot to sync with local and remote peers during startup
Sample Query: sum(slot_init_sync_duration_seconds)by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
slot_operation_duration_seconds
Description: Time taken for response of operations sent from cdl ep to slot app
Sample Query: sum(irate(slot_operation_duration_seconds{errorCode="0",local_request="1"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, Find, UpdateFlags
Labels:
-
Label:
slot_shard_id
Label Description: The slot shard id
Example: 1, 2
Labels:
-
Label:
slot_instance_id
Label Description: The slot instance id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 104, 105
Labels:
-
Label:
local_request
Label Description: Whether the DB request is local or GR (geo replication). If local_request = 1, the request is local; otherwise it is GR.
Example: 1, 0
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
slot_operation_total
Description: Total count of operations sent from cdl ep to slot app
Sample Query: sum(irate(slot_operation_total{errorCode="0",local_request="1"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Update, Delete, Find, UpdateFlags
Labels:
-
Label:
slot_shard_id
Label Description: The slot shard id
Example: 1, 2
Labels:
-
Label:
slot_instance_id
Label Description: The slot instance id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 104, 105
Labels:
-
Label:
local_request
Label Description: Whether the DB request is local or GR (geo replication). If local_request = 1, the request is local; otherwise it is GR.
Example: 1, 0
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
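The slot_operation_duration_seconds and slot_operation_total metrics form a duration/count pair, so dividing the rate of the first by the rate of the second gives a rough average time per operation. The sketch below shows only that arithmetic. It assumes that the duration metric accumulates total seconds spent, which this chapter does not state, and the function name `average_latency_seconds` is hypothetical.

```python
from typing import Optional


def average_latency_seconds(duration_rate: float, count_rate: float) -> Optional[float]:
    """Return average seconds per operation, or None when there is no traffic.

    duration_rate: per-second rate of the *_duration_seconds metric.
    count_rate:    per-second rate of the matching *_total metric.
    Assumption: the duration metric is a cumulative sum of seconds.
    """
    if count_rate <= 0:
        # Avoid dividing by zero when no operations were recorded.
        return None
    return duration_rate / count_rate


# Illustrative numbers: 0.5 s of work per second across 100 operations per second.
print(average_latency_seconds(0.5, 100.0))
```

Both rates should be taken with the same label matchers and the same window, otherwise the ratio compares different populations.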
slot_purged_sessions_duration_seconds
Description: Time taken for purging sessions at slot due to next eval timer expiry and purge=true
Sample Query: sum(irate(slot_purged_sessions_duration_seconds{errorCode="0"}[5m]))by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 501, 508
-
Example: Get index record failure, Invalid Slice Name received
Labels:
-
Label:
notify
Label Description: Whether purgeOnNotify is set. 1 indicates purgeOnNotify=true, 0 otherwise.
Example: 1
slot_purged_sessions_total
Description: Total number of sessions purged at slot due to next eval timer expiry and purge=true
Sample Query: sum(irate(slot_purged_sessions_total{errorCode="0"}[5m]))by(shardId,instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 501, 508
Labels:
-
Label:
notify
Label Description: Whether purgeOnNotify is set. 1 indicates purgeOnNotify=true, 0 otherwise.
Example: 1
slot_reconciled_records_total
Description: Total number of reconciled records
Sample Query: sum(slot_reconciled_records_total)by(systemId,slot_shard_id,slot_instance_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
slot_shard_id
Label Description: The slot shard id
Example: 1, 2
Labels:
-
Label:
slot_instance_id
Label Description: The slot instance id
Example: 1, 2
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Create, Delete
slot_reconciliation_duration_seconds
Description: Total time taken to execute reconciliation
Sample Query: sum(slot_reconciliation_duration_seconds{isError="0"})by(slot_shard_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
slot_shard_id
Label Description: The slot shard id
Example: 1, 2
Labels:
-
Label:
isError
Label Description: Whether any error occurred while reconciling. If isError = 1, an error occurred
Example: 0, 1
slot_reconciliation_total
Description: Total number of reconciliations triggered by checksum mismatch
Sample Query: sum(slot_reconciliation_total{isError="0"})by(slot_shard_id)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
slot_shard_id
Label Description: The slot shard id
Example: 1, 2
Labels:
-
Label:
isError
Label Description: Whether any error occurred while reconciling. If isError = 1, an error occurred
Example: 0, 1
slot_records_size_total
Description: Total size of records in bytes in the slot
Sample Query: sum(slot_records_size_total)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
slot_records_total
Description: Total count of records in the slot
Sample Query: sum(slot_records_total{session_type="total"}) by(pod)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
Labels:
-
Label:
session_type
Label Description: The session type stored in the data
Example: GX, RX, total
Labels:
-
Label:
systemId
Label Description: The id of the system
Example: 1, 2
Labels:
-
Label:
bucket
Label Description: The bucket grouped by size
Example: <=1kb, <=2kb, <=4kb, <=8kb
Labels:
-
Label:
appInstanceId
Label Description: The app instance id populated by app in the record.
Example: 1
slot_requests_duration_seconds
Description: Time taken for response of requests received at slot app
Sample Query: sum(irate(slot_requests_duration_seconds{errorCode="0"}[5m])) by (errorCode)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Get, Create, Delete
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 1406
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
slot_requests_total
Description: Total count of requests received at slot app
Sample Query: sum(irate(slot_requests_total{errorCode="0"}[5m])) by (operation)
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
operation
Label Description: The type of DB operation
Example: Get, Create, Delete
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 1406
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
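Because slot_requests_total carries an errorCode label, grouping its rate by errorCode allows a success ratio to be derived. The sketch below assumes that errorCode "0" denotes success, which the sample queries suggest but this chapter does not state outright. The function name `success_ratio` and the input numbers are illustrative.

```python
from typing import Dict, Optional


def success_ratio(rates_by_error_code: Dict[str, float]) -> Optional[float]:
    """Return the share of requests with errorCode "0", or None if idle.

    rates_by_error_code maps each errorCode label value to its request rate,
    for example the result of grouping a slot_requests_total rate by errorCode.
    Assumption: errorCode "0" means the request succeeded.
    """
    total = sum(rates_by_error_code.values())
    if total <= 0:
        return None
    return rates_by_error_code.get("0", 0.0) / total


# Illustrative rates: 90 successful and 10 failed requests per second.
print(success_ratio({"0": 90.0, "1406": 10.0}))
```

A ratio that drops while the total rate stays flat points at a growing share of non-zero error codes rather than a change in load.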
slot_stale_record_duration_seconds
Description: Time taken by the slot to process the stale slot records
Sample Query: slot_stale_record_duration_seconds
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
delete
Label Description: Whether the stale record was sent for deletion or skipped. If delete = 1, the record was sent for deletion; otherwise it was skipped
Example: 1, 0
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 502
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
Labels:
-
Label:
reason
Label Description: The reason for stale record deletion
Example: find_all_notify, stale_check_enabled
slot_stale_record_total
Description: Total count of stale slot record deletions processed
Sample Query: slot_stale_record_total
Labels:
-
Label:
db
Label Description: DB name
Example: session
Labels:
-
Label:
delete
Label Description: Whether the stale record was sent for deletion or skipped. If delete = 1, the record was sent for deletion; otherwise it was skipped
Example: 1, 0
Labels:
-
Label:
shardId
Label Description: The shard id
Example: 1, 2
Labels:
-
Label:
errorCode
Label Description: The errorCode in the DB response
Example: 0, 502
Labels:
-
Label:
sliceName
Label Description: The name of the logical sliceName
Example: session
Labels:
-
Label:
reason
Label Description: The reason for stale record deletion
Example: find_all_notify, stale_check_enabled