CDL Metrics

Overview

This chapter describes the Key Performance Indicators (KPIs) available to monitor and analyze the performance of the CDL.

The label name and description of the metrics used in CDL are defined in the following table:

Table 1. Metrics Label Description
Metric Label Name    Label Description
db                   The name of the datastore.
operation            The name of the operation performed on the CDL pods.
errorCode            The error code sent in the response.
errorMessage         The error message sent in the response.
slot_shard_id        The Slot map or Shard id where the operation is performed.
slot_instance_id     The Slot instance id where the operation is performed.
shardId              The Slot or Index map or Shard id where the metric is pegged.
instanceId           The Slot or Index map or Instance id where the metric is pegged.
session_type         The type of session data present in the record.
bucket               The size bucket in which the session falls. The current buckets are <=1kb, <=2kb, <=4kb, <=8kb, <=16kb, <=32kb, >32kb.
notification_type    The type of notification sent from CDL. Values: TIMER_EXPIRED, RECORD_CONFLICT, BULK_TASK_NOTIFICATION
topic                The topic used when publishing to Kafka.
not_found_in         The pod in which the data was not found. Values: Index/Slot

CDL Category

bulk_task_ongoing

Description: Gauge metric that indicates the number of bulk tasks being processed at any given point in time

Sample Query: bulk_task_ongoing

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: slot_shard_id

    Label Description: The slot shard id

    Example: 1, 2

  • Label: slot_instance_id

    Label Description: The slot instance id

    Example: 1, 2

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

bulk_task_total

Description: Total number of bulk tasks, by processing status

Sample Query: bulk_task_total

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: slot_shard_id

    Label Description: The slot shard id

    Example: 1, 2

  • Label: slot_instance_id

    Label Description: The slot instance id

    Example: 1, 2

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

  • Label: status

    Label Description: Processing status of bulk task

    Example: timeout, skipped, aborted, completed_last_record, completed

cdl_ep_to_slot_request_tps

Description: Recording rule for endpoint to slot request TPS measurement

Sample Query: cdl_ep_to_slot_request_tps

Labels:

  • Label: namespace

    Label Description: Kubernetes namespace from which the metric is generated

    Example: cdl-global

Labels:

  • Label: pod

    Label Description: Endpoint pod name from which the metric is generated

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, UpdateFlags

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 502

cdl_ep_to_slot_response_time

Description: Recording rule for endpoint to slot response time measurement

Sample Query: cdl_ep_to_slot_response_time

Labels:

  • Label: namespace

    Label Description: Kubernetes namespace from which the metric is generated

    Example: cdl-global

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, UpdateFlags

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 502

cdl_geo_replication_enabled

Description: Gauge metric that indicates the geo replication status. The value is 1 if geo replication is enabled and 0 otherwise

Sample Query: cdl_geo_replication_enabled

cdl_index_record_capacity

Description: Total index record capacity of CDL

Sample Query: cdl_index_record_capacity{db="session"}

Labels:

  • Label: db

    Label Description: DB name

    Example: session
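
The sample queries in this chapter filter on labels with the form metric{label="value"}. As a minimal sketch of how such a selector can be assembled in a script, the Python helper below renders a metric name and label values into that form. The function name promql_selector is hypothetical and is not part of CDL; the only behavior assumed is the standard PromQL rule that backslashes and double quotes inside a quoted label value must be escaped.

```python
def promql_selector(metric: str, **labels: str) -> str:
    """Render metric{label="value",...} as a PromQL selector string.

    Hypothetical helper for illustration only; it is not a CDL API.
    """
    def escape(value: str) -> str:
        # Escape backslash first so the later escapes are not doubled.
        return (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    if not labels:
        return metric
    body = ",".join(f'{name}="{escape(value)}"' for name, value in labels.items())
    return f"{metric}{{{body}}}"


if __name__ == "__main__":
    print(promql_selector("cdl_index_record_capacity", db="session"))
```

Calling promql_selector("cdl_index_record_capacity", db="session") yields the same text as the sample query for cdl_index_record_capacity.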

cdl_slice_state

Description: Active state of the CDL slice under GR instance-awareness. The value is 1 if the slice is active

Sample Query: cdl_slice_state

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: shardId

    Label Description: The slot shard id

    Example: 1, 2

cdl_slot_record_capacity

Description: Total slot record capacity of CDL

Sample Query: cdl_slot_record_capacity{db="session"}

Labels:

  • Label: db

    Label Description: DB name

    Example: session

cdl_slot_size_capacity

Description: Total slot size capacity of CDL

Sample Query: cdl_slot_size_capacity{db="session"}

Labels:

  • Label: db

    Label Description: DB name

    Example: session

consumer_kafka_nonprocessed_records_total

Description: Total count of Kafka records that were not processed because they originated from the same pod

Sample Query: sum(consumer_kafka_nonprocessed_records_total)by(shardId,instanceId)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Get, Multi

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: instanceId

    Label Description: The instance id

    Example: 1, 2

Labels:

  • Label: reason

    Label Description: The reason for skipping the consumed kafka record

    Example: old_timestamp

consumer_kafka_records_duration_seconds

Description: Time taken to process consumed kafka records

Sample Query: sum(irate(consumer_kafka_records_duration_seconds[5m]))by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: origin_instance_id

    Label Description: The index instance id from which the kafka request originated

    Example: 1.1, 1.2

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

consumer_kafka_records_total

Description: Total count of records consumed from kafka

Sample Query: sum(irate(consumer_kafka_records_total[5m]))by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: origin_instance_id

    Label Description: The index instance id from which the kafka request originated

    Example: 1.1, 1.2

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session
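
Several sample queries in this chapter, including the one for consumer_kafka_records_total, wrap a counter in irate(...[5m]). As a rough illustration of what that function computes, the Python sketch below derives a per-second rate from the last two samples in a window and treats a drop in value as a counter reset. It is a simplified model of the documented Prometheus behavior, not CDL code, and the sample data is made up.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def irate(samples: Sequence[Sample]) -> float:
    """Per-second rate computed from the last two samples of a counter."""
    if len(samples) < 2:
        raise ValueError("irate needs at least two samples in the window")
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        raise ValueError("samples must be ordered by increasing timestamp")
    # A counter that went down is assumed to have restarted from zero,
    # so the increase is the last value itself.
    increase = v_last if v_last < v_prev else v_last - v_prev
    return increase / (t_last - t_prev)


if __name__ == "__main__":
    # Illustrative values only: three scrapes 15 seconds apart.
    print(irate([(0.0, 100.0), (15.0, 130.0), (30.0, 190.0)]))
```

Because only the last two samples are used, irate reacts quickly to short bursts; the [5m] range in the sample queries only bounds how far back those two samples may lie.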

datastore_internal_requests_duration_seconds

Description: Time taken for processing of internal datastore requests

Sample Query: sum(datastore_internal_requests_duration_seconds)by(operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: RemoteBulkRead, RemoteBulkReadIndexing, GetChecksumRemoteSlot

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 1401

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

datastore_requests_duration_seconds

Description: Total time taken for processing requests at cdl-ep

Sample Query: sum(irate(datastore_requests_duration_seconds{errorCode="0",local_request="1"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, Find, FindByUK, GetCdlStatus, UpdateFlags

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 400, 403, 404, 409, 413, 501, 502, 503, 507, 508

Labels:

  • Label: local_request

    Label Description: Whether the DB request is local or GR. If local_request = 1, the request is local; otherwise it is GR.

    Example: 1, 0

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

datastore_requests_total

Description: Total count of requests received at cdl-ep

Sample Query: sum(irate(datastore_requests_total{errorCode="0",local_request="1"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, Find, FindByUK, GetCdlStatus, UpdateFlags

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 400, 403, 404, 409, 413, 501, 502, 503, 507, 508

Labels:

  • Label: local_request

    Label Description: Whether the DB request is local or GR. If local_request = 1, the request is local; otherwise it is GR.

    Example: 1, 0

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session
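
The datastore_requests_duration_seconds and datastore_requests_total metrics can be combined to estimate an average request latency at cdl-ep. The Python sketch below shows the arithmetic under two assumptions that this chapter does not state explicitly: that the duration metric is a cumulative sum of seconds, and that both metrics count the same set of requests. The function name and the sample values are hypothetical.

```python
from typing import Optional


def average_latency_seconds(
    duration_start: float,
    duration_end: float,
    count_start: float,
    count_end: float,
) -> Optional[float]:
    """Average seconds per request between two scrapes.

    Assumes the duration metric accumulates seconds and the total metric
    counts the same requests. Returns None when no requests were seen.
    """
    requests = count_end - count_start
    if requests <= 0:
        return None
    return (duration_end - duration_start) / requests


if __name__ == "__main__":
    # Illustrative values: 2.5 s of accumulated time over 500 requests.
    print(average_latency_seconds(10.0, 12.5, 1000, 1500))
```

In PromQL the same idea would divide the rate of the duration metric by the rate of the total metric, grouped by the same labels.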

db_records_softdelete_total

Description: Total count of records for the db that are in soft delete/purge state because purgeOnEval is set

Sample Query: sum(avg(db_records_softdelete_total{notify="1"})by(notify))

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: notify

    Label Description: Whether purgeOnNotify is set. 1 indicates purgeOnNotify=true, 0 otherwise.

    Example: 1

db_records_total

Description: Total count of records for the db. The following values can be derived from this metric:

  1. Total record count - Query: sum(avg(db_records_total{namespace="$namespace",session_type="total",appInstanceId="0"})by(systemId,cdl_slice))

  2. Slice wise record count - Query: sum(avg(db_records_total{namespace="$namespace",session_type="total",appInstanceId="0"})by(systemId,cdl_slice))by(cdl_slice)

  3. System ID based count - Query: sum(avg(db_records_total{namespace="$namespace",session_type="total",appInstanceId="0"})by(systemId,cdl_slice))by(systemId)

  4. Sessions grouped by session type - Query: avg(db_records_total{namespace="$namespace",session_type!="total"}) by (session_type)

Sample Query: sum(avg(db_records_total{namespace="$namespace",session_type="total",appInstanceId="0"})by(systemId,cdl_slice))

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: session_type

    Label Description: The session type stored in the data

    Example: GX, RX, total

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: appInstanceId

    Label Description: The app instance id populated by app in the record.

    Example: 1
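
The db_records_total queries use a $namespace placeholder that a dashboard normally fills in. As a sketch of running the total record count query outside a dashboard, the Python code below substitutes a namespace and builds a URL for the Prometheus instant-query endpoint (/api/v1/query). The base URL and the helper names are assumptions made for illustration, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Inner aggregation shared by the first three db_records_total queries.
INNER_TEMPLATE = (
    'avg(db_records_total{{namespace="{namespace}",'
    'session_type="total",appInstanceId="0"}})by(systemId,cdl_slice)'
)


def total_record_count_query(namespace: str) -> str:
    """Build the total record count query for one Kubernetes namespace."""
    return "sum(" + INNER_TEMPLATE.format(namespace=namespace) + ")"


def instant_query_url(base_url: str, query: str) -> str:
    """Build a Prometheus instant-query URL (GET /api/v1/query)."""
    params = urlencode({"query": query})
    return base_url.rstrip("/") + "/api/v1/query?" + params


if __name__ == "__main__":
    query = total_record_count_query("cdl-global")
    # "prometheus.example:9090" is a placeholder host, not a real endpoint.
    print(instant_query_url("http://prometheus.example:9090", query))
```

The other three queries can be built the same way by appending by(cdl_slice) or by(systemId) to the outer sum, or by swapping in the session_type!="total" selector.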

dpapp_internal_requests_total

Description: Total count of internal dp app requests

Sample Query: sum(dpapp_internal_requests_total)by(operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: RemoteBulkRead, RemoteBulkReadIndexing, GetChecksumRemoteSlot

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 1401

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

duplicate_slot_records_deleted

Description: Total slot records deleted due to duplicate slot data found

Sample Query: duplicate_slot_records_deleted

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 502

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

find_no_record_total

Description: Total count of find requests for which no records are sent back

Sample Query: sum(find_no_record_total)by(not_found_in,operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: FindByUk, FindTagsByUk, Find

Labels:

  • Label: not_found_in

    Label Description: Whether the data was not found in the index or in the slot

    Example: index, slot

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

findall_records_bucket

Description: Total number of findAll requests received, grouped by the number of records sent in the response

Sample Query: sum(irate(findall_records_bucket[5m]))by(bucket)

Labels:

  • Label: bucket

    Label Description: Buckets grouped by number of records

    Example: =0, <=10, <=20, <=50, <=100, >100

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session
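
To make the bucket values of findall_records_bucket concrete, the Python sketch below maps the number of records returned by a findAll request to one of the bucket values listed above. The function is hypothetical and only mirrors the listed boundaries; it assumes each upper bound is inclusive, as the <= notation suggests.

```python
# Upper bounds of the non-zero buckets listed for findall_records_bucket.
FINDALL_BOUNDS = (10, 20, 50, 100)


def findall_bucket(record_count: int) -> str:
    """Return the bucket label value for a findAll response size."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count == 0:
        return "=0"
    for bound in FINDALL_BOUNDS:
        if record_count <= bound:
            return f"<={bound}"
    return ">100"


if __name__ == "__main__":
    for n in (0, 7, 20, 21, 250):
        print(n, findall_bucket(n))
```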

index_init_sync_duration_seconds

Description: Time taken by the index to sync with local and remote peers during startup

Sample Query: sum(index_init_sync_duration_seconds)by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

index_rebalanced_keys_total

Description: Total number of index keys that have been rebalanced

Sample Query: index_rebalanced_keys_total

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

indexing_audit_deleted_keys_total

Description: Total number of unique keys and primary keys deleted during index auditing

Sample Query: sum(irate(indexing_audit_deleted_keys_total{errorCode="0",key_type="unique"}[5m]))by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: key_type

    Label Description: The type of key

    Example: primary, unique

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 302, 403

indexing_audit_duration_seconds

Description: Total time taken for performing the indexing audit

Sample Query: indexing_audit_duration_seconds{pod=~".*"}

indexing_audit_total

Description: Total times the indexing audit was run

Sample Query: indexing_audit_total{pod=~".*"}

indexing_is_leader

Description: Indicates whether the indexing pod is the leader or a follower

Sample Query: indexing_is_leader

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

indexing_kafka_replication_delay_seconds

Description: Total delay in replicating index records from Kafka to the index

Sample Query: sum(irate(indexing_kafka_replication_delay_seconds[5m]))by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: origin_instance_id

    Label Description: The index instance id from which the kafka request originated

    Example: 1.1, 1.2

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

indexing_operation_duration_seconds

Description: Response time of indexing operations sent from cdl-ep to the index app

Sample Query: sum(irate(indexing_operation_duration_seconds{errorCode="0"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, GetByPk, GetByUk

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 404, 500, 503

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

indexing_operation_total

Description: Total count of indexing operations sent from cdl ep to index app

Sample Query: sum(irate(indexing_operation_total{errorCode="0"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, GetByPk, GetByUk

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 404, 500, 503

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session
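
Many sample queries filter on errorCode="0", and the errorCode label makes it possible to estimate how often an operation fails. The Python sketch below computes an error ratio from per-errorCode request rates, such as those returned by grouping indexing_operation_total by errorCode. It assumes that errorCode 0 means success, which the sample queries suggest but this chapter does not state; the function name and the sample rates are hypothetical.

```python
from typing import Mapping


def error_ratio(rate_by_error_code: Mapping[str, float]) -> float:
    """Fraction of requests whose errorCode is not "0".

    Assumes errorCode "0" marks a successful response.
    Returns 0.0 when there is no traffic at all.
    """
    total = sum(rate_by_error_code.values())
    if total == 0:
        return 0.0
    failed = sum(
        rate for code, rate in rate_by_error_code.items() if code != "0"
    )
    return failed / total


if __name__ == "__main__":
    # Illustrative per-second rates keyed by errorCode label value.
    print(error_ratio({"0": 95.0, "404": 4.0, "503": 1.0}))
```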

indexing_overwrites_total

Description: Total number of indexing set operations in which an index record is overwritten

Sample Query: sum(indexing_overwrites_total)by(key_type,shardId)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: key_type

    Label Description: The type of key

    Example: primary, unique

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

indexing_records_total

Description: Total count of records in the indexing

Sample Query: indexing_records_total{pod=~".*"}

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

indexing_requests_duration_seconds

Description: Time taken for response of indexing requests received

Sample Query: sum(irate(indexing_requests_duration_seconds{errorCode="0",isKafka="1"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Set, Delete

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 404, 1408

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: isKafka

    Label Description: Whether the request came from Kafka or gRPC. If isKafka = 1, the request came from Kafka.

    Example: 1, 0

indexing_requests_total

Description: Total number of requests received at index pod

Sample Query: sum(irate(indexing_requests_total{errorCode="0",isKafka="1"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Set, Delete

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 404, 1408

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: isKafka

    Label Description: Whether the request came from Kafka or gRPC. If isKafka = 1, the request came from Kafka.

    Example: 1, 0

inmemory_indexing_operation_duration_seconds

Description: Total time taken for responses to requests from cdl-ep to cdl-index pod

Sample Query: sum(inmemory_indexing_operation_duration_seconds{errorCode="0"})by(operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Get, Multi

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: instanceId

    Label Description: The instance id

    Example: 1, 2

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 404

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

inmemory_indexing_operation_total

Description: Total count of operations from cdl-ep to cdl-index pod

Sample Query: sum(inmemory_indexing_operation_total)by(operation,shardId)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Get, Multi

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: instanceId

    Label Description: The instance id

    Example: 1, 2

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 404

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

kafka_connection_status

Description: Kafka connection status

Sample Query: kafka_connection_status

Labels:

  • Label: topic

    Label Description: Kafka topic name

    Example: kv.kafka.shard.1.1.1

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

kafka_producer_downtime_op_total

Description: Total number of operations added to the downtime cache while the Kafka producer was unavailable

Sample Query: sum(kafka_producer_downtime_op_total) by (pod,reason)

Labels:

  • Label: operation

    Label Description: The operation which failed

    Example: Set, Delete

Labels:

  • Label: reason

    Label Description: The reason why the operation failed

    Example: error, queue_full

Labels:

  • Label: success

    Label Description: Whether the addition to the downtime cache succeeded or failed

    Example: 0, 1

kafka_producer_pending_publish_total

Description: Total count of messages pending to be published to kafka

Sample Query: kafka_producer_pending_publish_total{pod=~".*"}

kafka_producer_republished_total

Description: Total count of requests republished by kafka producer

Sample Query: kafka_producer_republished_total

Labels:

  • Label: operation

    Label Description: CDL Kafka operation

    Example: Delete, Set

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: topic

    Label Description: Kafka topic name

    Example: kv.kafka.shard.1.1.1

kafka_producer_requests_duration_seconds

Description: Total time taken by kafka producer to process requests

Sample Query: sum(irate(kafka_producer_requests_duration_seconds[5m])) by (topic)

Labels:

  • Label: topic

    Label Description: Kafka topic name

    Example: kv.kafka.shard.1.1.1

kafka_producer_requests_total

Description: Total count of requests sent towards kafka

Sample Query: kafka_producer_requests_total

Labels:

  • Label: topic

    Label Description: Kafka topic name

    Example: kv.kafka.shard.1.1.1

kafka_records_replayed_total

Description: Total number of records published to kafka due to leader-switchover or kafka-reconnection

Sample Query: sum(kafka_records_replayed_total{reason="leader_switchover"})by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: reason

    Label Description: The reason for replaying kafka records

    Example: leader_switchover, kafka_reconnection

notification_ep_connection_total

Description: Total number of connections from CDL to the notification endpoint

Sample Query: notification_ep_connection_total

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: appInstanceId

    Label Description: The app instance id

    Example: 1

notification_streaming_enabled

Description: Status of the streaming connection from CDL to the notification endpoint. The value is 1 if streaming is enabled

Sample Query: notification_streaming_enabled

overwritten_index_records_deleted

Description: Total number of records deleted due to overwritten/duplicate unique keys at index

Sample Query: overwritten_index_records_deleted

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response for deletion

    Example: 0, 502

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: prefix

    Label Description: The unique key prefix pattern that detected the stale record

    Example: uk1

overwritten_index_records_skipped

Description: Total number of unprocessed stale records due to queue being full

Sample Query: overwritten_index_records_skipped

Labels:

  • Label: action

    Label Description: The action that was supposed to be performed for the stale record

    Example: delete, notify

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

Labels:

  • Label: prefix

    Label Description: The unique key prefix pattern that detected the stale record

    Example: uk1

records_notification_duration_seconds

Description: Time taken to send notifications to the notification endpoint

Sample Query: sum(irate(records_notification_duration_seconds[5m])) by (shardId,instance_id,notification_type)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: notification_type

    Label Description: Type of the notification

    Example: TIMER_EXPIRED, RECORD_CONFLICT, BULK_TASK_NOTIFICATION

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 1406

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

records_notification_retry_count

Description: Total notification retries by the slot app

Sample Query: sum(irate(records_notification_retry_count[5m])) by (shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

records_notification_total

Description: Total count of notifications sent towards notification endpoint

Sample Query: sum(irate(records_notification_total[5m])) by (shardId,instance_id,notification_type)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: notification_type

    Label Description: Type of the notification

    Example: TIMER_EXPIRED, RECORD_CONFLICT, BULK_TASK_NOTIFICATION

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 1406

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

remote_requests_dropped_total

Description: Total number of remote requests that have been dropped

Sample Query: remote_requests_dropped_total

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, UpdateFlags

Labels:

  • Label: reason

    Label Description: The reason for dropping the remote requests

    Example: queue_full

remote_site_connection_status

Description: CDL endpoint to remote site cdl-ep connection count

Sample Query: sum(remote_site_connection_status)by(pod,systemId)

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

remote_site_connections_total

Description: Total number of remote site connections configured per endpoint pod

Sample Query: remote_site_connections_total

Labels:

  • Label: systemId

    Label Description: The system id of the remote site

    Example: 1, 2

slot_checksum_mismatch_total

Description: Total number of checksum mismatches

Sample Query: sum(irate(slot_checksum_mismatch_total[5m]))by(slot_shard_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: slot_shard_id

    Label Description: The slot shard id

    Example: 1, 2

slot_geo_replication_requests_duration_seconds

Description: Time taken to send the response of slot geo replication

Sample Query: sum(irate(slot_geo_replication_requests_duration_seconds[5m]))by(systemId,operation)

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: CREATE, DELETE, UPDATE, UPDATEFLAGS

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 503

slot_geo_replication_requests_total

Description: Total number of requests for slot geo replication

Sample Query: sum(irate(slot_geo_replication_requests_total[5m]))by(systemId,operation)

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: CREATE, DELETE, UPDATE, UPDATEFLAGS

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 503

slot_init_sync_duration_seconds

Description: Time taken by the slot to sync with local and remote peers during startup

Sample Query: sum(slot_init_sync_duration_seconds)by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

Labels:

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

slot_operation_duration_seconds

Description: Time taken for response of operations sent from cdl ep to slot app

Sample Query: sum(irate(slot_operation_duration_seconds{errorCode="0",local_request="1"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, Find, UpdateFlags

Labels:

  • Label: slot_shard_id

    Label Description: The slot shard id. Empty string if metric verbosity is production

    Example: 1, 2

Labels:

  • Label: slot_instance_id

    Label Description: The slot instance id. Empty string if metric verbosity is production

    Example: 1, 2

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 104, 105

Labels:

  • Label: local_request

    Label Description: Whether the DB request is local or GR. If local_request = 1, the request is local; otherwise it is GR.

    Example: 1, 0

Labels:

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

slot_operation_total

Description: Total count of operations sent from cdl ep to slot app

Sample Query: sum(irate(slot_operation_total{errorCode="0",local_request="1"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

Labels:

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Update, Delete, Find, UpdateFlags

Labels:

  • Label: slot_shard_id

    Label Description: The slot shard id. Empty string if metric verbosity is production

    Example: 1, 2

Labels:

  • Label: slot_instance_id

    Label Description: The slot instance id. Empty string if metric verbosity is production

    Example: 1, 2

Labels:

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 104, 105

  • Label: local_request

    Label Description: Whether the DB request is Local or GR. If local_request = 1, the request is Local; otherwise it is GR.

    Example: 1, 0

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session
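
Because slot_operation_duration_seconds and slot_operation_total carry the same labels, dividing the increase of the first by the increase of the second over the same interval yields a mean response time per operation. The sketch below assumes that both metrics behave as cumulative counters (this document does not state the metric type) and uses made-up scrape values; the function name is hypothetical.

```python
from typing import Dict

def mean_latency_by_operation(
    duration_start: Dict[str, float],
    duration_end: Dict[str, float],
    count_start: Dict[str, float],
    count_end: Dict[str, float],
) -> Dict[str, float]:
    """Mean seconds per operation between two scrapes, keyed by the operation label."""
    result: Dict[str, float] = {}
    for op, end_count in count_end.items():
        ops_done = end_count - count_start.get(op, 0.0)
        if ops_done <= 0:
            # No new operations (or a counter reset): skip instead of dividing by zero.
            continue
        seconds_spent = duration_end.get(op, 0.0) - duration_start.get(op, 0.0)
        result[op] = seconds_spent / ops_done
    return result

# Illustrative values only
latency = mean_latency_by_operation(
    duration_start={"Create": 10.0, "Find": 4.0},
    duration_end={"Create": 11.0, "Find": 4.5},
    count_start={"Create": 1000.0, "Find": 800.0},
    count_end={"Create": 1500.0, "Find": 1000.0},
)
print(latency)
```

In PromQL the same idea is usually written as a ratio of two rate expressions grouped by operation.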

slot_purged_sessions_duration_seconds

Description: Time taken for purging sessions at slot due to next eval timer expiry and purge=true

Sample Query: sum(irate(slot_purged_sessions_duration_seconds{errorCode="0"}[5m]))by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 501, 508

Example: Get index record failure, Invalid Slice Name received

  • Label: notify

    Label Description: Whether purgeOnNotify is set. 1 indicates purgeOnNotify=true, 0 otherwise.

    Example: 1

slot_purged_sessions_total

Description: Total number of sessions purged at slot due to next eval timer expiry and purge=true

Sample Query: sum(irate(slot_purged_sessions_total{errorCode="0"}[5m]))by(shardId,instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 501, 508

  • Label: notify

    Label Description: Whether purgeOnNotify is set. 1 indicates purgeOnNotify=true, 0 otherwise.

    Example: 1

slot_reconciled_records_total

Description: Total number of reconciled records

Sample Query: sum(slot_reconciled_records_total)by(systemId,slot_shard_id,slot_instance_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: slot_shard_id

    Label Description: The slot shard id

    Example: 1, 2

  • Label: slot_instance_id

    Label Description: The slot instance id

    Example: 1, 2

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

  • Label: operation

    Label Description: The type of DB operation

    Example: Create, Delete

slot_reconciliation_duration_seconds

Description: Total time taken to execute reconciliation

Sample Query: sum(slot_reconciliation_duration_seconds{isError="0"})by(slot_shard_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: slot_shard_id

    Label Description: The slot shard id

    Example: 1, 2

  • Label: isError

    Label Description: Whether an error occurred during reconciliation. isError = 1 indicates that an error occurred.

    Example: 0, 1

slot_reconciliation_total

Description: Total number of reconciliations triggered by checksum mismatch

Sample Query: sum(slot_reconciliation_total{isError="0"})by(slot_shard_id)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: slot_shard_id

    Label Description: The slot shard id

    Example: 1, 2

  • Label: isError

    Label Description: Whether an error occurred during reconciliation. isError = 1 indicates that an error occurred.

    Example: 0, 1

slot_records_size_total

Description: Total size of records in bytes in the slot

Sample Query: sum(slot_records_size_total)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

slot_records_total

Description: Total count of records in the slot

Sample Query: sum(slot_records_total{session_type="total"}) by(pod)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

  • Label: session_type

    Label Description: The session type stored in the data

    Example: GX, RX, total

  • Label: systemId

    Label Description: The id of the system

    Example: 1, 2

  • Label: bucket

    Label Description: The bucket grouped by size

    Example: <=1kb, <=2kb, <=4kb, <=8kb

  • Label: appInstanceId

    Label Description: The app instance id populated by app in the record.

    Example: 1
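
Combining slot_records_size_total with slot_records_total gives a rough mean record size per shard. The sketch below is illustrative only: it assumes that the session_type="total" series holds the overall record count (as the sample query suggests), and the function name and all values are invented.

```python
from typing import Dict, List, Tuple

LabeledValue = Tuple[Dict[str, str], float]

def mean_record_size(
    size_by_shard: Dict[str, float],
    count_series: List[LabeledValue],
) -> Dict[str, float]:
    """Mean record size in bytes per shardId."""
    totals: Dict[str, float] = {}
    for labels, value in count_series:
        # Assumption: only the session_type="total" series is summed,
        # so per-type series such as GX or RX are not counted twice.
        if labels.get("session_type") != "total":
            continue
        shard = labels["shardId"]
        totals[shard] = totals.get(shard, 0.0) + value
    return {
        shard: size_by_shard[shard] / count
        for shard, count in totals.items()
        if count > 0 and shard in size_by_shard
    }

# Illustrative values only
sizes = {"1": 4096000.0, "2": 1024000.0}
counts = [
    ({"shardId": "1", "session_type": "total"}, 2000.0),
    ({"shardId": "1", "session_type": "GX"}, 1500.0),
    ({"shardId": "2", "session_type": "total"}, 1000.0),
]
print(mean_record_size(sizes, counts))
```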

slot_requests_duration_seconds

Description: Time taken to respond to requests received at the slot app

Sample Query: sum(irate(slot_requests_duration_seconds{errorCode="0"}[5m])) by (errorCode)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: operation

    Label Description: The type of DB operation

    Example: Get, Create, Delete

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 1406

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

slot_requests_total

Description: Total count of requests received at slot app

Sample Query: sum(irate(slot_requests_total{errorCode="0"}[5m])) by (operation)

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: operation

    Label Description: The type of DB operation

    Example: Get, Create, Delete

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 1406

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session
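
Sample queries such as the one for slot_requests_total can also be submitted programmatically. The sketch below builds a URL for the Prometheus instant-query endpoint (/api/v1/query) with the expression percent-encoded, so that the quotes and braces in the label selector survive transport. The host name is a placeholder, and how Prometheus is exposed in a given CDL deployment is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def instant_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

QUERY = 'sum(irate(slot_requests_total{errorCode="0"}[5m])) by (operation)'
url = instant_query_url("http://prometheus.example:9090", QUERY)  # placeholder host
print(url)

# Decoding the URL recovers the original expression unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```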

slot_stale_record_duration_seconds

Description: Time taken by the slot to process the stale slot records

Sample Query: slot_stale_record_duration_seconds

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: delete

    Label Description: Whether the stale record was sent for deletion or skipped. If delete = 1, the record was sent for deletion; otherwise it was skipped.

    Example: 1, 0

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 502

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

  • Label: reason

    Label Description: The reason for stale record deletion

    Example: find_all_notify, stale_check_enabled

slot_stale_record_total

Description: Total count of stale slot record deletions processed

Sample Query: slot_stale_record_total

Labels:

  • Label: db

    Label Description: DB name

    Example: session

  • Label: delete

    Label Description: Whether the stale record was sent for deletion or skipped. If delete = 1, the record was sent for deletion; otherwise it was skipped.

    Example: 1, 0

  • Label: shardId

    Label Description: The shard id

    Example: 1, 2

  • Label: errorCode

    Label Description: The errorCode in the DB response

    Example: 0, 502

  • Label: cdl_slice

    Label Description: The name of the logical cdl slice

    Example: session

  • Label: reason

    Label Description: The reason for stale record deletion

    Example: find_all_notify, stale_check_enabled
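
The delete label splits stale records into those sent for deletion (1) and those skipped (0). The following sketch, using invented counter values and a hypothetical helper name, turns that split into a deletion share per reason.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

LabeledValue = Tuple[Dict[str, str], float]

def deletion_share(series: Iterable[LabeledValue]) -> Dict[str, float]:
    """Fraction of stale records sent for deletion, grouped by the reason label."""
    deleted: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    for labels, value in series:
        reason = labels.get("reason", "")
        total[reason] += value
        if labels.get("delete") == "1":
            deleted[reason] += value
    return {r: deleted[r] / total[r] for r in total if total[r] > 0}

# Illustrative slot_stale_record_total values only
samples = [
    ({"reason": "find_all_notify", "delete": "1"}, 30.0),
    ({"reason": "find_all_notify", "delete": "0"}, 10.0),
    ({"reason": "stale_check_enabled", "delete": "1"}, 5.0),
]
print(deletion_share(samples))
```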