Service Provisioning Framework
This guide explains the steps involved in device and fabric provisioning in Cisco DNA Center, describes the important APIs, and explains how to troubleshoot provisioning issues.
The service provisioning framework (SPF) facilitates the provisioning and configuration of network services and policies on network devices through a RESTful interface. The SPF uses the orchestration engine to implement the internal execution flow and to map customer-facing services to resource-facing services (CFS-to-RFS mapping).
The SPF is a set of microservices that run on top of the Maglev cluster.
SPF Terminology and Core Services
The SPF uses the following terminology:
-
Customer-facing service: The commercial view of a service that is exposed to customers. It groups a set of resource-facing services that together provide the necessary technical functionality.
A VPN is an example of a customer-facing service.
-
Resource-facing service: A service that is indirectly part of a product but is invisible to the customer. It exists to support one or more customer-facing services.
A VPN might require BGP to support it. Customers don't purchase BGP and typically aren't even aware that BGP is running. Therefore, BGP is an example of a resource-facing service.
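To make the CFS-to-RFS relationship concrete, the following minimal Python sketch models a CFS as a container of the RFSs that implement it, using the VPN/BGP example above. The class and method names are hypothetical illustrations, not SPF types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceFacingService:
    """A supporting service the customer never purchases directly (for example, BGP)."""
    name: str


@dataclass
class CustomerFacingService:
    """The commercial view of a service; it groups the RFSs that implement it."""
    name: str
    resource_services: List[ResourceFacingService] = field(default_factory=list)

    def rfs_names(self) -> List[str]:
        # Names of the RFSs that together deliver this CFS.
        return [rfs.name for rfs in self.resource_services]


# The VPN/BGP example from the text: the customer buys the VPN; BGP supports it.
vpn = CustomerFacingService("VPN", [ResourceFacingService("BGP")])
print(vpn.rfs_names())  # prints ['BGP']
```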
The SPF includes the following core services:
-
spf-service-manager-service: Initiates and is responsible for the provisioning workflow.
-
spf-device-manager-service: Generates the device-level RFS specification, acquires the device-level lock, and initiates the configuration deployment.
-
apic-em-network-programmer-service: Converts the RFS to CLI commands and pushes them to the device.
-
task-service: Creates and maintains provisioning workflow tasks.
-
orchestration-engine-service: Executes the provisioning workflow.
-
rabbitmq-service: Handles asynchronous messaging between the different services during the provisioning workflow.
SPF Architecture Diagram
The following figure shows the microservice view of the automation component, including the service and device framework layers.
SPF Diagnostics
SPF Diagnostics APIs
The SPF northbound interface includes diagnostic APIs that collect all the steps executed for a provisioning request. These APIs take a taskid or workflowid query parameter and return the details required to debug provisioning issues.
https://<server-ip>/api/v2/data/spf-diagnostics/workflowinfo?taskid=d54b03b6-71bd-46ae-84b0-50412eab...
https://<server-ip>/api/v2/data/spf-diagnostics/workflowinfo?workflowid=f5a66f1e-b680-46ce-b362-860d...
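The two URLs above differ only in which identifier they pass. A minimal Python sketch of building either form follows; it assumes you already have a task ID or workflow ID, and it deliberately omits authentication and the HTTP call itself. The function name, server address, and IDs are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def spf_diagnostics_url(server_ip: str, *, task_id: Optional[str] = None,
                        workflow_id: Optional[str] = None) -> str:
    """Build the SPF diagnostics workflowinfo URL for exactly one identifier."""
    # Require exactly one of the two query parameters.
    if (task_id is None) == (workflow_id is None):
        raise ValueError("pass exactly one of task_id or workflow_id")
    params = {"taskid": task_id} if task_id is not None else {"workflowid": workflow_id}
    return (f"https://{server_ip}/api/v2/data/spf-diagnostics/workflowinfo?"
            + urlencode(params))


print(spf_diagnostics_url("10.0.0.1", task_id="abc-123"))
```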
Task API
The task API retrieves the tasks created for SPF provisioning operations.
https://<server-ip>/api/v1/task?serviceType=SPFService&progress=TASK_MODIFY_PUT&isError=true&sortB...
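A minimal Python sketch of querying for failed SPF tasks is shown next. It uses only the query parameters visible in the example URL (the truncated sort parameter is left out), and it assumes each returned task record is a dictionary with "id" and "isError" keys; verify the actual response shape on your release. The function names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlencode


def spf_failed_tasks_url(server_ip: str) -> str:
    # Only the parameters shown in the example URL above; no sort parameter.
    params = {"serviceType": "SPFService", "progress": "TASK_MODIFY_PUT", "isError": "true"}
    return f"https://{server_ip}/api/v1/task?" + urlencode(params)


def failed_task_ids(tasks: List[Dict]) -> List[str]:
    """Return the IDs of task records whose isError flag is set.

    Assumes each record has "id" and "isError" keys (an assumption, not a
    documented schema).
    """
    return [task["id"] for task in tasks if task.get("isError")]
```

The task IDs returned this way can be passed to the SPF diagnostics API as the taskid parameter.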
Access the RabbitMQ Console
# First, enable the RabbitMQ management plugin inside the rabbitmq container.
magctl service attach rabbitmq
rabbitmq-plugins enable rabbitmq_management
# Then exit the container and, from the root shell, expose port 15672. The command returns a randomly assigned port.
magctl service expose rabbitmq 15672
# Open the RabbitMQ console in a browser.
http://<ip>:<port returned by the expose command>/
# Credentials
To get the password, run the following from the root shell:
$ magctl service exec rabbitmq "cat /vault/rabbitmq-appuser"
Defaulting container name to rabbitmq.
Use 'kubectl describe pod/rabbitmq-0' to see all of the containers in this pod.
<password>
The username is always appuser.
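Besides the browser UI, the RabbitMQ management plugin serves an HTTP API on the same port, including GET /api/queues, which lists queues and their message counters. The following Python sketch only builds an authenticated request for that endpoint using the appuser credentials; the host, port, and password values are placeholders, and sending the request is left as a comment.

```python
import base64
import urllib.request


def rabbitmq_queues_request(host: str, port: int, password: str,
                            username: str = "appuser") -> urllib.request.Request:
    """Build (but do not send) a GET request for the management API queue list."""
    # The management API uses HTTP basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:{port}/api/queues",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


# Sending it would look like this (port = the value returned by magctl service expose):
# import json
# with urllib.request.urlopen(rabbitmq_queues_request("<ip>", <port>, "<password>")) as resp:
#     queues = json.load(resp)
```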
Provisioning Request Generation
The following diagram shows the different services involved in creating a provisioning request from the Cisco DNA Center GUI. The services are used in the device provisioning and fabric provisioning workflow.
If any of these services is unavailable, the provisioning request is affected and can fail or stall.
Provisioning Workflow
Any provisioning operation involves the following main steps. If the provisioning operation doesn't need to push a configuration to the switches, the second step is skipped.
-
Translate CFS (UI/API input) to RFS/BCS (configuration to push).
-
Deploy RFS to the target devices.
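The two-step flow, including the skipped deploy step when there is nothing to push, can be sketched in Python as follows. The translate and deploy callables are hypothetical stand-ins for the SPF services that perform each step; this only illustrates the control flow, not the SPF implementation.

```python
from typing import Callable, List


def run_provisioning(cfs_request: dict,
                     translate: Callable[[dict], List[dict]],
                     deploy: Callable[[List[dict]], None]) -> List[str]:
    """Run the two main provisioning steps and return the names of the steps executed."""
    executed = ["translate"]
    # Step 1: translate the CFS (UI/API input) into RFS/BCS configuration.
    rfs_configs = translate(cfs_request)
    # Step 2: deploy only when there is configuration to push; otherwise skip it.
    if rfs_configs:
        deploy(rfs_configs)
        executed.append("deploy")
    return executed
```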
Step 1: Translate CFS to BCS
Translating the CFS to a device-based RFS that can be pushed to the devices involves the following steps:
-
CFS preprocessor: Creates the service provisioning request snapshot and creates a lock for the CFS so that subsequent provisioning operations must wait until the current changes are complete.
-
CFS validator: Validates the service provisioning request and ensures that all prerequisites are met.
For example, during device provisioning, validation ensures that the device exists in the inventory and is not part of an active LAN automation.
-
CFS processor: Takes a snapshot of the CFS models to provision and stores it in the corresponding CFS and serialized snapshot database tables. The snapshot is taken under a global lock so that only one source-of-truth snapshot is generated, without any race conditions. The lock is released immediately after the snapshot is persisted in the database.
During device provisioning, an entry is created in the CustomerFacingService table using the ID from Deviceinfo.
-
CFS target resolver: Finds the target device ID to provision. If you choose multiple devices to provision or perform a VN update, this step determines the list of relevant devices.
-
CFS translator: Reads the data stored in the serialized snapshot database table and translates the user input into the device-specific RFS to provision. This information is sent to spf-device-manager-service.
At this stage, Cisco DNA Center reads the corresponding RFS entries from the inventory-filled database tables. Then, Cisco DNA Center updates the RFS objects with the changes made in the CFS.
Step 2: Deploy the Configuration to Target Devices
Deploying the RFS to the target devices involves the following steps:
-
RFS translator: Reads the bulkconfigspec and creates device-level RFS objects to push to the device.
At this stage, Cisco DNA Center creates a device-level database lock on all provisioning-related database tables.
-
RFS deployer: Creates a per-device feature document to send to the network programmer, including populating the feature name (device pack writer name) for a given RFS. The RFS deployer handles rollback across multiple devices based on defined rollback rules (such as rollback of a successfully configured device if the configuration to another device fails). The RFS deployer also updates the results of the per-device deploy.
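The rollback rule described above (roll back successfully configured devices if the configuration of another device fails) can be sketched in Python as follows. The push and rollback callables are hypothetical stand-ins for the network programmer; the sequential order, reverse-order rollback, and result labels are illustrative choices, not documented SPF behavior.

```python
from typing import Callable, Dict, List


def deploy_with_rollback(device_configs: Dict[str, str],
                         push: Callable[[str, str], None],
                         rollback: Callable[[str], None]) -> Dict[str, str]:
    """Push each device's config in order; if one push fails, roll back the
    devices already configured and skip the rest. Returns per-device results."""
    results: Dict[str, str] = {}
    succeeded: List[str] = []
    for device, config in device_configs.items():
        try:
            push(device, config)
        except Exception as exc:  # any push error triggers the rollback rule
            results[device] = f"FAILED: {exc}"
            for done in reversed(succeeded):  # undo in reverse order
                rollback(done)
                results[done] = "ROLLED_BACK"
            break
        succeeded.append(device)
        results[device] = "SUCCESS"
    for device in device_configs:  # devices never attempted after a failure
        results.setdefault(device, "SKIPPED")
    return results
```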
Other Services Involved in Provisioning
Provisioning involves the following services:
-
network-design-service: Handles network settings and device credential CRUD operations. Creates network profiles and associates sites.
-
orchestration-engine-service: Executes the provisioning workflow.
-
identity-manager-pxgrid-service: If Cisco ISE is configured as a AAA server, this service pulls all Cisco ISE server information.
-
apic-em-inventory-manager-service: Ensures that the devices are discovered in the Cisco DNA Center inventory and that their sync status is not Partial Collection Failure.
During the CFS-to-RFS translation phase, the provisioning app reads the values from the inventory database tables, updates those models, and provisions them.
-
grouping-service: Handles site creation and modification. If device provisioning fails with missing sites, check the grouping-service.
If any sites are missing network settings or have incorrect inheritance, check the grouping-service and confirm that site update notification is working correctly.
-
template-programmer-service: Executes any templates that are attached to the provisioning workflow.
Important APIs
The following APIs are useful for inspecting provisioning data:
-
api/v2/data/customer-facing-service/<CFSType>
For example, api/v2/data/customer-facing-service/deviceinfo retrieves all device-related data.
-
api/v1/siteprofile
For example, api/v1/siteprofile?namespace=authentication lists all authentication profiles.
-
api/v2/ippool/group?siteId=<siteId>
Lists all IP pools reserved under the selected site.
-
api/v1/commonsetting/
For example, api/v1/commonsetting/global lists all global network settings.
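The four API paths above can be assembled programmatically. The following Python sketch fills in the path templates and appends URL-encoded query parameters; the dictionary keys and the function name are hypothetical labels, and authentication and the HTTP call are not shown.

```python
from typing import Optional
from urllib.parse import urlencode

# Path templates for the APIs listed above; the dictionary keys are hypothetical labels.
ENDPOINTS = {
    "cfs": "api/v2/data/customer-facing-service/{cfs_type}",
    "site_profile": "api/v1/siteprofile",
    "ip_pools": "api/v2/ippool/group",
    "common_setting": "api/v1/commonsetting/{scope}",
}


def build_url(server_ip: str, endpoint: str, query: Optional[dict] = None,
              **path_args: str) -> str:
    """Fill in the path template and append URL-encoded query parameters."""
    url = f"https://{server_ip}/" + ENDPOINTS[endpoint].format(**path_args)
    if query:
        url += "?" + urlencode(query)
    return url


print(build_url("10.0.0.1", "site_profile", {"namespace": "authentication"}))
```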
The following table lists, for each provisioning operation, the CFS service type and the CFS and RFS database tables involved.
Service Name | CFS Service Type | CFS Table | RFS Table |
---|---|---|---|
Device Provisioning | DeviceInfo | DeviceInfo, Networkwidesettingscfs | Networkdevice, AAANeSettings |
Fabric Creation, Fabric Transit Creation | ConnectivityDomain | ConnectivityDomain, Virtualnetwork | — |
Add Site to Fabric, Update Authentication Template for Virtual Network | ConnectivityDomain | ConnectivityDomain, Virtualnetwork, authenticationprofile | Vrf, Dot1xNeSettings |
Add Edge to Fabric | ConnectivityDomain | ConnectivityDomain, Deviceinfo | Lispcomponent, Networkvlan, DhcpServerSettings, LispItrMapCacheEntry |
Add Border to Fabric | ConnectivityDomain | ConnectivityDomain, Deviceinfo | Lispcomponent, Networkvlan, DhcpServerSettings, BgpProcessSettings, RoutePolicyMap, BgpNeighborInfo |
Enable Multicast | ConnectivityDomain | ConnectivityDomain, Deviceinfo | igmpNrSettings, MsdpNrSettings, IpV4PimNrSettings, IpV4MulticastNrSettings |
Add Virtual Network IP Pool | ConnectivityDomain | ConnectivityDomain, Virtualnetwork | Segment, Vrf, protocolendpoint |
Static Port Assignment | ConnectivityDomain | ConnectivityDomain, Deviceinfo, deviceinterfaceinfo | protocolendpoint |
Troubleshooting
Device provisioning failures are not uncommon. To troubleshoot a failure, do the following:
Procedure
Step 1: For the problematic device, under Provisioning Status, click See details to see a list of recent provisioning tasks and their status. Click See details for the most recent provisioning task. A window similar to the following is displayed.
Step 2: If the status of any of the following provisioning tasks is FAILED, check the spf-service-manager log file:
Step 3: If provisioning fails during the validation phase, examine the JSON request that is returned as part of the provisioning request. The JSON request contains a "Service Request" string followed by a task ID. For example, the following string indicates that the AAA server is missing from the provisioning request or CFS tables:
Step 4: If provisioning fails during the phase that converts business intent to network intent, check the exception stack trace for the task ID. Look for the following error:
Example log messages:
Step 5: If the status of any of the following provisioning tasks is FAILED, check the spf-device-manager and network-programmer log files:
For example:
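Steps 4 and 5 both come down to finding the log lines for one task ID, plus the stack trace that follows an error. Once you have saved a service's log output as text, a minimal Python filter such as the following can help. The function name and the sample log lines in the usage are hypothetical and do not reflect the actual log format.

```python
from typing import List


def lines_for_task(log_text: str, task_id: str, after: int = 0) -> List[str]:
    """Return log lines that mention task_id, each followed by up to `after`
    subsequent lines (useful for capturing a stack trace after an error line)."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if task_id in line:
            # Keep the matching line and the next `after` lines, within bounds.
            keep.update(range(i, min(i + after + 1, len(lines))))
    return [lines[i] for i in sorted(keep)]


sample = "start\nERROR task=t1 failed\n  at Foo\n  at Bar\nINFO task=t2 done\n"
print(lines_for_task(sample, "t1", after=2))
```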
Aggregated Provisioning Status
The provisioning status for each device is aggregated across different applications.
For example, for the following provisioning operations, the aggregated status is Failed:
-
Latest device provisioning status: Failed
-
Previous device provisioning status: Success
-
Latest fabric provisioning status: Success
-
Latest policy provisioning status: Success
In Cisco DNA Center 1.1.x, some operations (namespaces) create a new unique ID for the same device during each provisioning operation. Because the record of an earlier failed attempt is not overwritten, the aggregated device provisioning status remains Failed even if a later attempt succeeds.
The same behavior persists in Cisco DNA Center 1.2.x and later, but the latest provisioning status is shown separately.
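The following Python sketch illustrates why a stale record keeps the aggregated status at Failed (or Configuring). It models provisioning records keyed by (application, record ID) and assumes a precedence of Failed over Configuring over Success; that precedence, the record layout, and the function name are assumptions for illustration, not documented Cisco DNA Center logic.

```python
from typing import Dict, Tuple

# Assumed precedence: any Failed record wins, then any Configuring record.
_PRECEDENCE = ("Failed", "Configuring", "Success")


def aggregate_status(records: Dict[Tuple[str, str], str]) -> str:
    """Aggregate statuses; records maps (application, record_id) -> status."""
    statuses = set(records.values())
    for status in _PRECEDENCE:
        if status in statuses:
            return status
    return "Unknown"  # no records at all


# Retry that creates a new record ID: the old Failed record is never overwritten.
new_id = {("device", "r1"): "Failed", ("device", "r2"): "Success"}
# Retry that reuses the same record ID: the Failed status is overwritten.
same_id = {("device", "r1"): "Success"}
print(aggregate_status(new_id), aggregate_status(same_id))  # prints Failed Success
```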
In the following cases, the device provisioning status can remain "Configuring" indefinitely:
-
A provisioning message hangs at a service.
Because provisioning is an asynchronous operation, the affected services (spf-service-manager, spf-device-manager, orchestration-engine, and network-programmer) communicate through RabbitMQ messages. If a service can't receive a message, the message remains in the queue in the unacknowledged state.
For example, if spf-device-manager-service never receives and processes a message from spf-service-manager-service, expose the RabbitMQ console as described earlier and log in. Filter the queues by the string "spf" and check for any unacknowledged messages.
-
The postgres service fails or crashes during provisioning.
When a provisioning task is created for a device, the system creates a database entry with the status "Configuring." After provisioning completes, spf-service-manager updates the entry to "Failed" or "Success." If the database crashes when spf-service-manager tries to update the status, device provisioning hangs in "Configuring."
Because Cisco DNA Center always shows the aggregated device provisioning status, the status remains "Configuring" even after the postgres service is restored and a subsequent provisioning operation succeeds. In this case, check whether the postgres restart count increased recently.
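The "spf" queue check described above can also be done against the RabbitMQ management API. The following Python sketch filters the decoded JSON list returned by GET /api/queues, whose entries include a "name" and, once statistics are available, a "messages_unacknowledged" counter. The function name and the queue names in the usage are hypothetical.

```python
from typing import Dict, List


def stuck_spf_queues(queues: List[dict], name_filter: str = "spf") -> Dict[str, int]:
    """Map queue name -> unacknowledged message count for matching queues that
    currently hold unacknowledged messages."""
    result: Dict[str, int] = {}
    for queue in queues:
        # The counter can be absent until the broker has collected stats.
        unacked = queue.get("messages_unacknowledged") or 0
        if name_filter in queue["name"] and unacked > 0:
            result[queue["name"]] = unacked
    return result


sample = [
    {"name": "spf-service-queue", "messages_unacknowledged": 3},
    {"name": "spf-device-queue", "messages_unacknowledged": 0},
    {"name": "other-queue", "messages_unacknowledged": 5},
]
print(stuck_spf_queues(sample))  # prints {'spf-service-queue': 3}
```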