About Telemetry
Collecting data for analysis and troubleshooting has always been an important aspect of monitoring the health of a network.
Cisco NX-OS provides several mechanisms such as SNMP, CLI, and Syslog to collect data from a network. These mechanisms have limitations that restrict automation and scale. One limitation is the use of the pull model, where the initial request for data from network elements originates from the client. The pull model does not scale when there is more than one network management station (NMS) in the network. With this model, the server sends data only when clients request it. To initiate such requests, continual manual intervention is required. This continual manual intervention makes the pull model inefficient.
A push model continuously streams data out of the network and notifies the client. Telemetry enables the push model, which provides near-real-time access to monitoring data.
Telemetry Components and Process
Telemetry consists of four key elements:
-
Data Collection—Telemetry data is collected from the Data Management Engine (DME) database in branches of the object model specified using distinguished name (DN) paths. The data can be retrieved periodically (frequency-based) or only when a change occurs in any object on a specified path (event-based). You can use the NX-API to collect frequency-based data.
-
Data Encoding—The telemetry encoder encapsulates the collected data into the desired format for transporting.
NX-OS encodes telemetry data in the Google Protocol Buffers (GPB) and JSON formats.
-
Data Transport—NX-OS transports telemetry data using HTTP for JSON encoding and the Google remote procedure call (gRPC) protocol for GPB encoding. The gRPC receiver supports message sizes greater than 4 MB. (Telemetry data using HTTPS is also supported if a certificate is configured.)
UDP and secure UDP (DTLS) are supported as telemetry transport protocols. You can add destinations that receive UDP. The encoding for UDP and secure UDP can be GPB or JSON.
Telemetry supports streaming to IPv6 destinations and IPv4 destinations.
Use the following commands to configure the UDP transport to stream data through a datagram socket in either JSON or GPB:
destination-group num
ip address xxx.xxx.xxx.xxx port xxxx protocol UDP encoding {JSON | GPB}
Example for an IPv4 destination:
destination-group 100
ip address 171.70.55.69 port 50001 protocol UDP encoding GPB
Example for an IPv6 destination:
destination-group 100
ipv6 address 10:10::1 port 8000 protocol gRPC encoding GPB
UDP telemetry data is sent with the following header:
#include <stdint.h>
typedef enum tm_encode_ {
    TM_ENCODE_DUMMY,
    TM_ENCODE_GPB,
    TM_ENCODE_JSON,
    TM_ENCODE_XML,
    TM_ENCODE_MAX,
} tm_encode_type_t;
typedef struct tm_pak_hdr_ {
    uint8_t version; /* 1 */
    uint8_t encoding;
    uint16_t msg_size;
    uint8_t secure;
    uint8_t padding;
} __attribute__ ((packed, aligned (1))) tm_pak_hdr_t;
The first 6 bytes of each UDP payload carry this header. To process telemetry data received over UDP, use one of the following methods:
-
If the receiver is meant to receive different types of data from multiple endpoints, read the encoding field in the header to determine which decoder (JSON or GPB) to use.
-
If you expect only one encoding (JSON or GPB), strip the header before decoding.
Note
Depending on the receiving operating system and the network load, using the UDP protocol may result in packet drops.
-
Telemetry Receiver—A telemetry receiver is a remote management system or application that stores the telemetry data.
The GPB encoder stores data in a generic key-value format. The encoder requires metadata in the form of a compiled .proto file to translate the data into GPB format.
To receive and decode the data stream correctly, the receiver requires the .proto file that describes the encoding and the transport services. The receiver uses this file to decode the binary stream into key-value string pairs.
A telemetry .proto file that describes the GPB encoding and gRPC transport is available on Cisco's GitHub: https://github.com/CiscoDevNet/nx-telemetry-proto
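The two UDP header-handling methods described above can be sketched in C. This is a minimal illustration and not Cisco receiver code. The helper names (tm_parse_udp_header, tm_payload_decoder) are hypothetical. The bytes are parsed individually instead of casting the buffer to tm_pak_hdr_t, which avoids alignment issues. The sketch also assumes that msg_size is in network (big-endian) byte order, which the header definition does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Encoding values, in the same order as tm_encode_type_t. */
enum {
    TM_ENCODE_DUMMY = 0,
    TM_ENCODE_GPB,
    TM_ENCODE_JSON,
    TM_ENCODE_XML,
    TM_ENCODE_MAX
};

/* version, encoding, msg_size (2 bytes), secure, padding */
#define TM_UDP_HDR_LEN 6

typedef struct {
    uint8_t version;
    uint8_t encoding;
    uint16_t msg_size;
    uint8_t secure;
} tm_udp_hdr;

/* Hypothetical helper: parse the 6-byte header at the start of a datagram.
 * ASSUMPTION: msg_size is big-endian. It is reported but not validated,
 * because it is not stated whether it includes the header.
 * On success, *body points just past the header (the header is stripped).
 * Returns 0 on success, -1 if the datagram is too short or malformed. */
int tm_parse_udp_header(const uint8_t *buf, size_t len, tm_udp_hdr *hdr,
                        const uint8_t **body, size_t *body_len)
{
    if (buf == NULL || len < TM_UDP_HDR_LEN)
        return -1;
    hdr->version = buf[0];
    hdr->encoding = buf[1];
    hdr->msg_size = (uint16_t)((buf[2] << 8) | buf[3]);
    hdr->secure = buf[4];
    /* buf[5] is padding and is ignored. */
    if (hdr->encoding >= TM_ENCODE_MAX)
        return -1;
    *body = buf + TM_UDP_HDR_LEN;
    *body_len = len - TM_UDP_HDR_LEN;
    return 0;
}

/* Choose a decoder from the encoding field. */
const char *tm_payload_decoder(const tm_udp_hdr *hdr)
{
    switch (hdr->encoding) {
    case TM_ENCODE_GPB:  return "gpb";
    case TM_ENCODE_JSON: return "json";
    default:             return "unsupported";
    }
}
```

A receiver that accepts both encodings would call tm_parse_udp_header on each datagram and pass the body to the decoder named by tm_payload_decoder. A single-encoding receiver can ignore the encoding field and use the body directly.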
High Availability of the Telemetry Process
High availability of the telemetry process has the following behaviors:
-
System Reload—During a system reload, the telemetry configuration and streaming services are restored.
-
Supervisor Failover—Although telemetry is not on hot standby, the telemetry configuration and streaming services are restored when the new active supervisor is running.
-
Process Restart—If the telemetry process freezes or restarts for any reason, configuration and streaming services are restored when telemetry restarts.
Licensing Requirements for Telemetry
Product | License Requirement
---|---
Cisco NX-OS | Telemetry requires no license. Any feature that is not included in a license package is bundled with the Cisco NX-OS image and is provided at no extra charge to you. For a complete explanation of the Cisco NX-OS licensing scheme, see the Cisco NX-OS Licensing Guide.
Installing and Upgrading Telemetry
Installing the Application
The telemetry application is packaged as a feature RPM and included with the NX-OS release. The RPM is installed by default as part of the image bootup. After installation, you can start the application using the feature telemetry command. The RPM file is in the /rpms directory and has the following name:
telemetry-version-build_ID.lib32_n3000.rpm
For example:
telemetry-2.0.0.lib32_n3000.rpm
Installing Incremental Updates and Fixes
Copy the RPM to the device bootflash. Then enable the Bash shell and enter it with root privileges by using the following commands:
feature bash
run bash sudo su
From the bash prompt, use the following command:
yum upgrade telemetry_new_version.rpm
The upgrade takes effect when the application restarts.
Downgrading to a Previous Version
To downgrade, run the following command from the bash prompt:
yum downgrade telemetry
Verifying the Active Version
To verify the active version, run the following command from the switch exec prompt:
show install active
Note
The show install active command shows the active installed RPM only after an upgrade has occurred. The default RPM that is bundled with NX-OS is not displayed.
Guidelines and Limitations
Telemetry has the following configuration guidelines and limitations:
-
Telemetry is supported in Cisco NX-OS releases that support the data management engine (DME) Native Model.
-
Telemetry supports DME data collection, NX-API data sources, Google protocol buffer (GPB) encoding over Google Remote Procedure Call (gRPC) transport, and JSON encoding over HTTP.
-
The smallest supported sending interval (cadence) is five seconds for a depth of 0. For depth values greater than 0, the minimum cadence depends on the size of the data being streamed out. Configuring cadences below the minimum value may result in undesirable system behavior.
-
Up to five remote management receivers (destinations) are supported. Configuring more than five remote receivers may result in undesirable system behavior.
-
If a telemetry receiver goes down, data flow to the other receivers is interrupted. Restart the failed receiver, and then start a new connection with the switch by unconfiguring and reconfiguring the failed receiver's IP address under the destination group.
-
Telemetry can consume up to 20% of CPU resources.
-
To configure SSL certificate-based authentication and encryption of streamed data, you can provide a self-signed SSL certificate with the certificate ssl cert path hostname "CN" command.
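A configuration can be checked against the two fixed numeric limits above before it is pushed. The following C sketch is illustrative only. The structure and function names are hypothetical, and the sketch assumes that sample intervals are in milliseconds (matching values such as sample-interval 7000 used in this chapter). The minimum cadence for depths greater than 0 depends on data size, so the sketch enforces only two limits: the five-second floor at depth 0 and the five-destination maximum. It covers frequency-based subscriptions only.

```c
#include <stddef.h>

#define TM_MAX_DESTINATIONS 5    /* up to five receivers are supported */
#define TM_MIN_CADENCE_MS   5000 /* five seconds, for depth 0 */

/* Hypothetical description of one frequency-based subscription. */
typedef struct {
    unsigned depth;              /* sensor path depth */
    unsigned sample_interval_ms; /* ASSUMPTION: milliseconds */
} tm_subscription;

/* Returns a message describing the first violated guideline, or NULL if the
 * configuration stays within the documented limits. Depths greater than 0
 * are not checked, because their minimum cadence depends on data size. */
const char *tm_check_guidelines(const tm_subscription *subs, size_t n_subs,
                                size_t n_destinations)
{
    if (n_destinations > TM_MAX_DESTINATIONS)
        return "more than five destinations configured";
    for (size_t i = 0; i < n_subs; i++) {
        if (subs[i].depth == 0 &&
            subs[i].sample_interval_ms < TM_MIN_CADENCE_MS)
            return "cadence below five seconds at depth 0";
    }
    return NULL;
}
```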
Configuration Commands After Downgrading to an Older Release
After a downgrade to an older release, some configuration commands or command options can fail because the older release may not support them. As a best practice when downgrading to an older release, unconfigure and reconfigure the telemetry feature after the new image comes up. By doing so, you avoid possible failure of unsupported commands or command options.
The following example shows this procedure:
-
Copy the telemetry configuration to a file:
switch# show running-config | section telemetry
feature telemetry
telemetry
  destination-group 100
    ip address 1.2.3.4 port 50004 protocol gRPC encoding GPB
    use-chunking size 4096
  sensor-group 100
    path sys/bgp/inst/dom-default depth 0
  subscription 600
    dst-grp 100
    snsr-grp 100 sample-interval 7000
switch# show running-config | section telemetry > telemetry_running_config
switch# show file bootflash:telemetry_running_config
feature telemetry
telemetry
  destination-group 100
    ip address 1.2.3.4 port 50004 protocol gRPC encoding GPB
    use-chunking size 4096
  sensor-group 100
    path sys/bgp/inst/dom-default depth 0
  subscription 600
    dst-grp 100
    snsr-grp 100 sample-interval 7000
switch#
-
Execute the downgrade operation. When the image comes up and the switch is ready, copy the telemetry configuration back to the switch:
switch# copy telemetry_running_config running-config echo-commands
switch# config terminal
switch(config)# feature telemetry
switch(config)# telemetry
switch(config-telemetry)# destination-group 100
switch(conf-tm-dest)# ip address 1.2.3.4 port 50004 protocol gRPC encoding GPB
switch(conf-tm-dest)# sensor-group 100
switch(conf-tm-sensor)# path sys/bgp/inst/dom-default depth 0
switch(conf-tm-sensor)# subscription 600
switch(conf-tm-sub)# dst-grp 100
switch(conf-tm-sub)# snsr-grp 100 sample-interval 7000
switch(conf-tm-sub)# end
Copy complete, now saving to disk (please wait)...
Copy complete.
switch#
gRPC Error Behavior
The switch client disables the connection to the gRPC receiver if the gRPC receiver sends 20 errors. To re-enable the gRPC receiver, unconfigure and then reconfigure the receiver's IP address under the destination group. Errors include:
-
The gRPC client sends the wrong certificate for secure connections.
-
The gRPC receiver takes too long to handle client messages and incurs a timeout. Avoid timeouts by processing messages using a separate message processing thread.
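The advice to process messages on a separate thread can be sketched with POSIX threads. In this model, the code that reads from the gRPC stream only enqueues each message and returns, and a worker thread does the slow decoding. This is a generic producer/consumer sketch and not part of any gRPC API. The queue type and all function names are hypothetical, and messages are modeled as plain byte buffers.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TM_QUEUE_CAP 64

/* Hypothetical message descriptor: points at bytes read from the stream.
 * The caller must keep the bytes valid until the worker has processed them. */
typedef struct {
    const unsigned char *data;
    size_t len;
} tm_msg;

/* Bounded FIFO shared by the receive thread and the worker thread. */
typedef struct {
    tm_msg items[TM_QUEUE_CAP];
    size_t head, count;
    bool closed;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} tm_queue;

void tm_queue_init(tm_queue *q)
{
    q->head = q->count = 0;
    q->closed = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Receive path: enqueue and return quickly. Blocks only while the queue is
 * full. Returns false if the queue has been closed. */
bool tm_queue_push(tm_queue *q, tm_msg m)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == TM_QUEUE_CAP && !q->closed)
        pthread_cond_wait(&q->not_full, &q->lock);
    bool ok = !q->closed;
    if (ok) {
        q->items[(q->head + q->count) % TM_QUEUE_CAP] = m;
        q->count++;
        pthread_cond_signal(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Worker side: wait for the next message. Returns false once the queue is
 * closed and every queued message has been taken. */
bool tm_queue_pop(tm_queue *q, tm_msg *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    bool ok = q->count > 0;
    if (ok) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % TM_QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Stop accepting new messages and wake every waiting thread. */
void tm_queue_close(tm_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_cond_broadcast(&q->not_full);
    pthread_mutex_unlock(&q->lock);
}

typedef struct {
    tm_queue *queue;
    size_t bytes_processed;
} tm_worker_ctx;

/* Worker thread: slow decode/store work belongs here, off the receive path.
 * This placeholder only counts the bytes it was handed. */
void *tm_worker(void *arg)
{
    tm_worker_ctx *ctx = arg;
    tm_msg m;
    while (tm_queue_pop(ctx->queue, &m))
        ctx->bytes_processed += m.len;
    return NULL;
}
```

The queue is bounded so that a stalled worker applies back-pressure instead of growing memory without limit. If the receive path must never block, the push could drop or count overflowing messages instead.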
Telemetry Compression for gRPC Transport
Telemetry compression support is available for gRPC transport. You can use the use-compression gzip command to enable compression. (Disable compression with the no use-compression gzip command.)
The following example enables compression:
switch(config)# telemetry
switch(config-telemetry)# destination-profile
switch(config-tm-dest-profile)# use-compression gzip
The following example shows that compression is enabled:
switch(conf-tm-dest)# show telemetry transport 0 stats
Session Id: 0
Connection Stats
Connection Count 0
Last Connected: Never
Disconnect Count 0
Last Disconnected: Never
Transmission Stats
Compression: gzip
Source Interface: loopback1(1.1.3.4)
Transmit Count: 0
Last TX time: None
Min Tx Time: 0 ms
Max Tx Time: 0 ms
Avg Tx Time: 0 ms
Cur Tx Time: 0 ms
The following example shows that compression is disabled:
switch2(config-if)# show telemetry transport 0 stats
Session Id: 0
Connection Stats
Connection Count 0
Last Connected: Never
Disconnect Count 0
Last Disconnected: Never
Transmission Stats
Compression: disabled
Source Interface: loopback1(1.1.3.4)
Transmit Count: 0
Last TX time: None
Min Tx Time: 0 ms
Max Tx Time: 0 ms
Avg Tx Time: 0 ms
Cur Tx Time: 0 ms
switch2(config-if)#
The following example enables gzip compression through the NX-API REST:
{
"telemetryDestProfile": {
"attributes": {
"adminSt": "enabled"
},
"children": [
{
"telemetryDestOptCompression": {
"attributes": {
"name": "gzip"
}
}
}
]
}
}
Support for gRPC Chunking
For streaming to succeed when gRPC must send more than 12 MB of data to the receiver, you must enable chunking.
gRPC chunking must be handled by the gRPC user: fragmentation occurs on the gRPC client side, and reassembly occurs on the gRPC server side. Telemetry is still bound by memory, and data can be dropped if the memory size exceeds the 12 MB limit allowed for telemetry. To support chunking, use the telemetry .proto file available on Cisco's GitHub, which has been updated for gRPC chunking, as described in Telemetry Components and Process.
The chunking size is from 64 through 4096 bytes.
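The following C sketch illustrates the division of work described above: the sender fragments a serialized message into chunks of a configured size, checked against the 64 to 4096 byte range, and the receiver places each chunk back at its offset. It is a generic illustration only. It does not use the chunking fields defined in the Cisco telemetry .proto file, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TM_CHUNK_MIN 64
#define TM_CHUNK_MAX 4096

/* Number of chunks needed to carry msg_len bytes, or 0 if chunk_size is
 * outside the supported 64-4096 byte range. */
size_t tm_chunk_count(size_t msg_len, size_t chunk_size)
{
    if (chunk_size < TM_CHUNK_MIN || chunk_size > TM_CHUNK_MAX)
        return 0;
    return (msg_len + chunk_size - 1) / chunk_size; /* ceiling division */
}

/* Sender side (fragmentation): locate chunk `index` within msg.
 * Returns the chunk length (the last chunk may be short), or 0 past the end. */
size_t tm_chunk_slice(const unsigned char *msg, size_t msg_len,
                      size_t chunk_size, size_t index,
                      const unsigned char **chunk)
{
    size_t off = index * chunk_size;
    if (off >= msg_len) {
        *chunk = NULL;
        return 0;
    }
    *chunk = msg + off;
    return (msg_len - off < chunk_size) ? msg_len - off : chunk_size;
}

/* Receiver side (reassembly): every chunk except the last is full size, so
 * chunk `index` belongs at offset index * chunk_size. Returns the number of
 * bytes copied, or 0 if the chunk would not fit in the output buffer. */
size_t tm_chunk_place(unsigned char *out, size_t out_cap, size_t chunk_size,
                      size_t index, const unsigned char *chunk, size_t len)
{
    size_t off = index * chunk_size;
    if (len > chunk_size || off > out_cap || len > out_cap - off)
        return 0;
    memcpy(out + off, chunk, len);
    return len;
}
```

For example, a 1000-byte message with a chunk size of 64 is carried in 16 chunks, and the last chunk holds 40 bytes.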
The following shows a configuration example through the NX-API CLI:
feature telemetry
!
telemetry
  destination-group 1
    ip address 171.68.197.40 port 50051 protocol gRPC encoding GPB
    use-chunking size 4096
  destination-group 2
    ip address 10.155.0.15 port 50001 protocol gRPC encoding GPB
    use-chunking size 64
  sensor-group 1
    path sys/intf depth unbounded
  sensor-group 2
    path sys/intf depth unbounded
  subscription 1
    dst-grp 1
    snsr-grp 1 sample-interval 10000
  subscription 2
    dst-grp 2
    snsr-grp 2 sample-interval 15000
The following shows a configuration example through the NX-API REST:
{
"telemetryDestGrpOptChunking": {
"attributes": {
"chunkSize": "2048",
"dn": "sys/tm/dest-1/chunking"
}
}
}
The following error message appears on systems that do not support gRPC chunking:
switch-1(conf-tm-dest)# use-chunking size 200
ERROR: Operation failed: [chunking support not available]
NX-API Sensor Path Limitations
NX-API can use show commands to collect and stream switch information that is not yet in the DME. However, using NX-API instead of streaming data from the DME has the following inherent scale limitations:
-
The switch backend dynamically processes NX-API calls such as show commands.
-
NX-API spawns several processes that can consume up to a maximum of 20% of the CPU.
-
NX-API data translates from the CLI to XML to JSON.
The following is a suggested user flow to help limit excessive NX-API sensor path bandwidth consumption:
-
Check whether the show command has NX-API support. You can confirm whether NX-API supports the command from the VSH by using the pipe option:
show <command> | json
or
show <command> | json pretty
Note
Avoid commands that take the switch more than 30 seconds to return JSON output.
-
Refine the show command to include any filters or options.
-
Avoid enumerating the same command for individual outputs, for example, show vlan id 100, show vlan id 101, and so on. Instead, use the CLI range options whenever possible to improve performance, for example, show vlan id 100-110,204.
If you need only the summary or counter, avoid dumping a whole show command output. By doing so, you limit the bandwidth and data storage that is required for data collection.
-
Configure telemetry with sensor groups that use NX-API as their data sources. Add the show commands as sensor paths.
-
Configure telemetry with a cadence of five times the processing time of the respective show command to limit CPU usage.
-
Receive and process the streamed NX-API output as part of the existing DME collection.
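Two of the steps above involve simple arithmetic that can be automated when you generate NX-API sensor paths. The following C sketch is illustrative, and its function names are hypothetical. One helper collapses a sorted list of VLAN IDs into the range syntax shown above (such as 100-110,204). The other derives a cadence of five times a measured show-command processing time and rejects commands that take more than 30 seconds. It assumes the cadence is expressed in milliseconds, like the sample-interval values in this chapter.

```c
#include <stddef.h>
#include <stdio.h>

#define TM_MAX_SHOW_MS 30000UL /* avoid commands slower than 30 seconds */

/* Collapse sorted, de-duplicated VLAN IDs into CLI range syntax such as
 * "100-110,204". Returns 0 on success, -1 if the buffer is too small. */
int tm_vlan_ranges(const unsigned *ids, size_t n, char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j + 1 < n && ids[j + 1] == ids[j] + 1)
            j++; /* extend the run of consecutive IDs */
        const char *sep = used ? "," : "";
        int w = (j > i)
            ? snprintf(out + used, cap - used, "%s%u-%u", sep, ids[i], ids[j])
            : snprintf(out + used, cap - used, "%s%u", sep, ids[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1; /* output was truncated */
        used += (size_t)w;
        i = j + 1;
    }
    return 0;
}

/* Recommended cadence: five times the measured processing time of the show
 * command. Returns 0 to signal that the command should be avoided. */
unsigned long tm_nxapi_cadence_ms(unsigned long show_ms)
{
    if (show_ms > TM_MAX_SHOW_MS)
        return 0;
    return 5 * show_ms;
}
```

For example, a show command that takes 2 seconds to return JSON output yields a cadence of 10000 ms.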
Telemetry VRF Support
Telemetry VRF support allows you to specify a transport VRF. The telemetry data stream can then egress through front-panel ports and avoid possible competition with SSH/NGINX control sessions.
You can use the use-vrf vrf-name command to specify the transport VRF.
The following example specifies the transport VRF:
switch(config)# telemetry
switch(config-telemetry)# destination-profile
switch(config-tm-dest-profile)# use-vrf test_vrf
The following example specifies the default VRF as the transport VRF through the NX-API REST:
{
"telemetryDestProfile": {
"attributes": {
"adminSt": "enabled"
},
"children": [
{
"telemetryDestOptVrf": {
"attributes": {
"name": "default"
}
}
}
]
}
}
Support for Streaming of YANG Models
The YANG ("Yet Another Next Generation") data modeling language is supported as part of telemetry. Streaming of both device YANG and OpenConfig YANG model data is supported.