Published: January 2019
When you have thousands of network devices--we’ve got 15,000--efficient management pays big dividends in terms of operational costs and agility. Today we have excellent tools for zero-touch deployment, software image management, and fabric management. "But network management and automation have developed more slowly than server and storage management--and we wanted more," said Stephen Hoover, IT manager. High on our wish list are two capabilities:
Automation is what makes us so excited about intent-based networking. "We get the most value from our engineers when they use their brain power to solve complex problems," said Hoover. "It’s inefficient to assign them to tasks that can and should be automated."
That's the idea behind intent-based networking. You express your business intent--for example, "Restrict access to this database to engineers on Team A"--and then the network does the work to realize that intent. Rolling up many individual steps into an automated workflow reduces errors, improves consistency, and saves time.
We're in the early stages of achieving both goals--a unified view of the network and intent-based networking--using Cisco Catalyst Center. Cisco Catalyst Center combines the functions of the multiple management tools we use today. Not only is it a centralized management dashboard, it also automates provisioning and change management, checks compliance against our policies, and captures asset logs that we analyze for troubleshooting, problem resolution, and predictive maintenance.
Cisco IT is "customer zero" for our new products, like Cisco Catalyst Center. That means the product that Cisco ships to customers reflects the suggestions we make as we put the product through its paces in our business. As customer zero for Cisco Catalyst Center, we're working with the business unit to share new capabilities we'd like to see, fine-tune the user interface, and report bugs.
We're currently using Cisco Catalyst Center in production for software-defined access and wireless assurance. We're also actively working with the business unit to codevelop a more dynamic contextual dashboard for incident resolution, automated ticketing in our service management tool, and automated software image management. Plug-and-play deployment (also known as zero-touch deployment) is coming soon.
This table summarizes the benefits we're expecting from Cisco Catalyst Center. The remainder of this article explains what we’re doing today and our progress so far.
We analyze historical and real-time activity logs to identify the cause of network problems and, even better, predict problems before they happen. Until now we've stored these logs on the devices themselves. The trouble is that device memory fills up quickly, limiting how much information we can gather and analyze. And if a failed device needs to restart, we lose the logs, making it harder to identify the cause of an outage.
Now we're starting to use Cisco Catalyst Center to gather and analyze information on servers rather than on devices. This improves forensics and expedites troubleshooting. We're conducting a pilot with 15 sites in the United States, Australia, Canada, England, India, Japan, Norway, Poland, and Singapore. In the pilot sites, the wireless controller continually sends telemetry data to Cisco Catalyst Center. If someone reports a connectivity problem, a help desk engineer checks Cisco Catalyst Center to quickly see whether the problem originates in the network or the user device.
Later we'd like to offer self-service troubleshooting. Say you're having a poor wireless experience; maybe the coverage is spotty or an access point is failing. One of our ideas is that you will enter a Webex Teams room for your building and type something like, “Why can’t I connect to the network?” A bot will retrieve data from Cisco Catalyst Center and respond, “Based on the data, it seems the problem is your laptop. I’ve opened a case with the desktop team, and here’s the case number.” Alternatively, it might respond, “Looks like everyone on your floor is having the same problem. I’ve opened a case with the network team, and here’s the number.” The result is faster resolution for you and less overhead for the help desk.
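The bot's decision can be sketched in a few lines of Python. This is only an illustration of the idea, not the bot we plan to build: the `ClientHealth` record, the `triage` function, and the 50 percent threshold are all hypothetical, and a real bot would read telemetry from Cisco Catalyst Center rather than from an in-memory list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientHealth:
    """Hypothetical per-user record built from wireless telemetry."""
    user: str
    floor: str
    connected: bool


def triage(reporter: str, clients: List[ClientHealth], threshold: float = 0.5) -> str:
    """Guess whether a reported problem is in the network or the user's device.

    Returns "network" when at least `threshold` of the other users on the
    reporter's floor are also disconnected, "device" when they are mostly
    fine, and "unknown" when there is nothing to compare against.
    """
    me = next((c for c in clients if c.user == reporter), None)
    if me is None:
        return "unknown"
    peers = [c for c in clients if c.floor == me.floor and c.user != reporter]
    if not peers:
        return "unknown"
    failing = sum(1 for c in peers if not c.connected)
    return "network" if failing / len(peers) >= threshold else "device"
```

The answer determines which team gets the case: "device" goes to the desktop team and "network" goes to the network team.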
We currently automate access control based on user identity. If you attempt to connect to our network, Cisco Identity Services Engine (ISE) looks at both your identity and your device to decide whether to connect you to the production, lab, or guest network.
Using Cisco Catalyst Center, we're starting to control access based on the specific application or time of day. For example, a contractor working on a new web interface can be granted access to the web front end but not the database. An executive webinar can get the highest priority on the network for 30 minutes. "Until now, giving priority to a certain traffic flow for a limited time took so much work that we rarely did it," said Dipesh Patel, enterprise network architect.
The software-defined access pilot is underway in our North Sydney, Australia, office. We use ISE to set the policy for each user, which can include limiting access to specific applications permanently or for a specified time. For example, we can grant a contractor access to the lab network next Monday from 8:00 a.m. to 12:00 p.m. only.
We can also allow users on a certain team to access certain applications on the production network but not others. ISE shares the policy with Cisco Catalyst Center, which instructs all the switches to put the policy in place. The switches tag each user’s traffic. "The tag works something like an employee badge, letting you in some doors but not others," said Patel.
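To make the badge analogy concrete, here is a minimal Python sketch of a default-deny policy check that combines a group tag, a resource, and an optional time window. The `Rule` class, the tag names, and the resource names are hypothetical; they are not how ISE or Cisco Catalyst Center represent policy.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule: which tag may reach which resource, and when."""
    group_tag: str                      # tag attached to the user's traffic
    resource: str                       # for example "lab-network"
    start: Optional[datetime] = None    # None means no start limit
    end: Optional[datetime] = None      # None means no end limit


def is_allowed(rules: List[Rule], group_tag: str, resource: str, now: datetime) -> bool:
    """Return True if any rule admits this tag to this resource at time `now`."""
    for rule in rules:
        if rule.group_tag != group_tag or rule.resource != resource:
            continue
        if rule.start is not None and now < rule.start:
            continue
        if rule.end is not None and now >= rule.end:
            continue
        return True
    return False  # default deny: no matching rule, no access


# A contractor may use the lab network on one Monday from 8:00 a.m. to 12:00 p.m.
rules = [
    Rule("contractor", "lab-network",
         start=datetime(2019, 1, 7, 8, 0), end=datetime(2019, 1, 7, 12, 0)),
    Rule("team-a", "web-frontend"),     # permanent access, no time window
]
```

With these rules, `is_allowed(rules, "contractor", "lab-network", datetime(2019, 1, 7, 9, 0))` is true, while the same request at 1:00 p.m., or a request for a resource with no matching rule, is denied.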
Until recently we used a combination of applications to monitor network health and resolve issues. Now we're working with the business unit to move the same capabilities into Cisco Catalyst Center. The immediate improvement will be the ability to see service flows anywhere--from source to destination--from one pane of glass. (Currently we’re limited to one theater or region at a time.) Seeing everything from one pane of glass will help our engineers more quickly understand the problem and its context so they can resolve it sooner.
We're also working with the business unit to add more capabilities for troubleshooting and incident resolution. First, if Cisco Catalyst Center reports that a device has an issue, we'll be able to create a dynamic incident dashboard with a few clicks. The dashboard will show information for the affected device and its neighbors, making it easier to see and resolve problems.
If the remedy is a configuration change, we'll be able to enter the change once and then apply it to a group of switches with one click. Faster configuration changes are valuable if, say, we discover a new security threat and need to quickly change the security settings for hundreds of devices in a building.
What's next? Cisco Catalyst Center already includes the information that engineers need for initial troubleshooting when a case is opened. The business unit is working on our recommendation to also show the probable cause and recommended actions, based on machine learning and machine reasoning. The recommendations will become increasingly accurate as we and our customers continue using Cisco Catalyst Center. At some point Cisco Catalyst Center might even give the engineer the option to take the recommended actions to remediate an issue with the click of a button.
We want the single pane of glass to also include third-party management tools. To that end, we're currently integrating Cisco Catalyst Center with our ticketing system, ServiceNow. Later we'll add other popular systems, including NetSuite and Remedy. Here’s what we have in mind: When Cisco Catalyst Center detects an outage, it will automatically create a case, flag it for Tier 1, 2, or 3 support, and attach logs from the affected device and neighboring devices. No more delays while support engineers pull up the logs.
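As a rough sketch of that workflow, the Python below assembles a case record from an outage event. Everything here is an assumption made for illustration: the `Outage` fields, the severity scale, the severity-to-tier mapping, and the dictionary layout are hypothetical and do not reflect the ServiceNow or Cisco Catalyst Center APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outage:
    """Hypothetical outage event."""
    device: str
    severity: int          # assumed scale: 1 is the most severe
    neighbors: List[str]


def build_case(outage: Outage, logs: Dict[str, List[str]]) -> dict:
    """Build a case with a support tier and logs for the device and its neighbors."""
    # Assumed mapping: the most severe outages go straight to Tier 3.
    tier = {1: 3, 2: 2}.get(outage.severity, 1)
    devices = [outage.device] + outage.neighbors
    return {
        "summary": f"Outage detected on {outage.device}",
        "support_tier": tier,
        # Attach whatever logs exist; a device with no logs gets an empty list.
        "attachments": {name: logs.get(name, []) for name in devices},
    }
```

Because the logs are attached when the case is created, the support engineer starts with the evidence in hand.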
Today, our network engineers spend 20 to 40 percent of their time on code upgrades. We currently don't have an efficient bulk upgrade solution, so engineers manually upgrade devices one by one, following a playbook containing 30 to 50 steps. Manual upgrades cause two problems, according to Hoover. "First, there isn't enough time to upgrade every device whenever a new image is certified, so top priority goes to security upgrades," he said. "That's why only about 55 to 60 percent of devices are compliant with the latest image at any given time." The other problem is that manual steps inevitably lead to network errors. In fact, most network-related outages can be traced back to network changes.
As customer zero for Cisco Catalyst Center, we're working with the business unit to automate code upgrades and eliminate outages. We're planning to pilot Cisco Catalyst Center for software image management in 10 to 15 sites. Cisco Catalyst Center will identify devices with out-of-date images, and then stage the current code to those devices hours or days before activation. We'll wait to activate the new code until after normal business hours or over the weekend, when the change won’t disrupt work for people in that office.
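The stage-now, activate-later pattern can be sketched as two small Python functions. This is an illustration under stated assumptions, not the product's logic: the function names are hypothetical, image versions are compared as plain strings, and business hours are assumed to be 8:00 a.m. to 6:00 p.m., Monday through Friday, in the office's local time.

```python
from datetime import datetime
from typing import Dict, List


def out_of_date(inventory: Dict[str, str], certified: str) -> List[str]:
    """Return the devices whose running image differs from the certified one."""
    return sorted(name for name, image in inventory.items() if image != certified)


def next_activation(now: datetime, open_hour: int = 8, close_hour: int = 18) -> datetime:
    """Return the earliest time at or after `now` that is outside business hours."""
    if now.weekday() >= 5:                      # Saturday or Sunday
        return now
    if now.hour < open_hour or now.hour >= close_hour:
        return now                              # already before opening or after closing
    # During business hours: wait until the office closes today.
    return now.replace(hour=close_hour, minute=0, second=0, microsecond=0)
```

Staging can happen at any time because it does not interrupt service; only activation is deferred to the window that `next_activation` returns.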
We're projecting that automated software image management will increase compliance from the current 55 to 60 percent to more than 70 percent. Mean time to repair (MTTR) for patching security vulnerabilities will improve from 2 hours today to an estimated 1 hour per office. That is, we’ll reduce our exposure time by 50 percent and also save 570 man-hours per upgrade for our 285 small and medium-size offices.
Plug and Play is what we used to call zero-touch fabric deployment. We're expecting significant cost savings when it's available. Today we spend $8 million annually for switch and router hardware and code upgrades through our fleet program. Say a South American office is scheduled to receive a new Catalyst 9000. Currently, we first send it to a regional shipping depot, where an engineer stages the code and applies the configurations. Someone at the depot re-boxes the device and ships it to the destination for racking and stacking by an engineer we’ve dispatched to the site. "We incur double shipping charges, plus the engineer's time and travel costs," David Maksim said.
With Plug and Play, we'll be able to ship each new device directly to the branch office, where a local vendor will power it up. The switch will automatically connect to Cisco Catalyst Center to retrieve the correct code based on its serial number. "With automation and centralized management, fewer engineers can manage fleet upgrades at more sites--with zero travel," said Hoover.
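Conceptually, the serial-number lookup works like the Python sketch below. The `Provisioning` record, the `claim` function, and the sample serial number are hypothetical placeholders; the actual exchange between the switch and Cisco Catalyst Center is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Provisioning:
    """Hypothetical plan prepared before the device ships."""
    image: str
    config_template: str


def claim(serial: str, planned: Dict[str, Provisioning]) -> Optional[Provisioning]:
    """Look up the plan for a device that has just powered on.

    Serial numbers are normalized (trimmed and upper-cased) before the lookup.
    Returns None for a device nobody planned for, so it is never provisioned
    by accident.
    """
    return planned.get(serial.strip().upper())


# Placeholder serial number and names, for illustration only.
planned = {"SERIAL-0001": Provisioning("certified-image", "branch-small")}
```

The plan is keyed by serial number, so nothing about the device has to be configured by hand before it leaves the warehouse.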
We project that Plug and Play will reduce fleet program costs by about 25 percent, or more than $1.6 million annually. Savings include:
Once we've finished codeveloping the Cisco Catalyst Center use cases described here, we'll put them in production, starting with remote offices with up to 500 people. During the process, we'll continue acting in our role as customer zero, working closely with the business unit to suggest new use cases and share our experiences to keep adding business value for our customers.