Cisco Crosswork Situation Manager 8.0.x Implementer Guide
(Powered by Moogsoft AIOps 8.0)
The Implementer Guide contains instructions to help you plan, install, configure, and maintain Cisco Crosswork Situation Manager.
System Requirements lists operating systems, browsers and third-party software required to run Cisco Crosswork Situation Manager. It also provides sizing recommendations for small, medium and large Cisco Crosswork Situation Manager systems.
Install Cisco Crosswork Situation Manager tells you how to install Cisco Crosswork Situation Manager using the various deployment options, how to install add-ons, and how to troubleshoot an installation.
Secure Your Installation tells you how to apply various security measures to your Cisco Crosswork Situation Manager system, including SSL certificates, external authentication, single sign-on with LDAP and SAML, and how to encrypt database communications. It also tells you how to manage users, roles and teams in Cisco Crosswork Situation Manager.
Before Ingesting Data outlines the steps to take before your Cisco Crosswork Situation Manager system can begin to ingest data. These include configuring logging, changing passwords for default users, analyzing your data and performing a business analysis to determine your Situation design goals.
Ingest Event Data from Monitoring Tools tells you how to prepare your data for ingestion, including how to select, clean, format, integrate and construct the data. It tells you how to map, parse and normalize data and describes the types of Link Access Module (LAM) and LAMbots you will use to achieve this.
Configure Alert Creation tells you how to configure the Alert Builder, which creates alerts by processing event data from the Message Bus.
Process Alerts describes the components responsible for adding information to alerts and reducing noise. It tells you how to use enrichment processes to add supplemental data to alerts and Situations, and how to use topologies to view alerts and Situations according to the relationships that are important to your users.
Situation Design tells you how to use Cisco Crosswork Situation Manager features to create insightful, informative Situations for your users and teams. These features include Cookbook, Tempus, Merge groups, and topologies.
Process Situations tells you how to create a Situation Action Workflow Engine to trigger workflows based on Situation actions, for example, when a Situation is created, updated, or closed.
Integrate with Ticketing Services tells you how to integrate with ticketing services including ServiceNow.
Configure Operator Experience tells you how to configure the UI to best suit your operators, including the landing page, hotkeys, alert and Situation columns, and ChatOps. It also tells you how to configure and retrain Probable Root Cause (PRC).
Reporting and Dashboards tells you how to use Insights to analyze trends in operational performance. You can use the default dashboard or you can use Grafana to create a custom dashboard.
Customize Cisco Crosswork Situation Manager Further tells you how to use customization options in Cisco Crosswork Situation Manager including server and client tools. It also contains information on how to troubleshoot problems in Cisco Crosswork Situation Manager and how to run diagnostic tools.
Housekeeping Tasks provides instructions on maintaining your Cisco Crosswork Situation Manager system, including upgrading the software, maintaining Situation design, configuring historic data retention, archiving Situations and alerts, and scheduling system downtime.
You can think of the Cisco Crosswork Situation Manager architecture in different ways:
1. The data processing modules that handle the various processing tasks like data ingestion and core data processing. To learn about data processing, see Data Processing Flow.
2. The individual software components that comprise and support data processing and the User Interface, like the Percona database, RabbitMQ, Apache Tomcat, Nginx, etc. To learn about the software components and their relationship to data processing, see Server Roles.
Before you configure or customize data processing in Cisco Crosswork Situation Manager, take some time to learn the components that comprise the basic flow for processing event, alert, and Situation data.
Except for the Link Access Modules (LAMs) that perform data ingestion, the rest of the data processing components are individual Moolets that run as part of Moogfarmd. For more information, see Configure Data Processing.
The LAMs or Integrations ingest raw event data from your monitoring sources. LAMs do one of the following with the event data:
1. Map raw events into Cisco Crosswork Situation Manager events.
2. Discard events based upon system configuration, for example, a blacklisting rule.
See Introduction to Integrations for more information.
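As a sketch of the mapping step, the fragment below maps a hypothetical pipe-delimited raw event line into the kind of normalized fields a LAM produces. Both the raw input format and the target field names here are illustrative assumptions, not the actual LAM configuration:

```shell
# Hypothetical raw event: "timestamp|host|check|severity|description".
# awk splits the line and emits the fields a LAM might map it to;
# the input layout and field names are assumptions for illustration.
echo '2020-06-01T12:00:00|host1|cpu_load|critical|CPU at 98%' |
awk -F'|' '{
    printf "source=%s check=%s severity=%s description=\"%s\"\n", $2, $3, $4, $5
}'
# Prints: source=host1 check=cpu_load severity=critical description="CPU at 98%"
```

In a real deployment, the equivalent mapping is defined in the LAM configuration and LAMbot for each integration rather than in an ad hoc script like this.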
See Workflow Engine for an overview of how the Workflow Engine UI works. See Workflow Engine Moolets for information on the Moolet.
The Alert Builder deduplicates events into alerts and calculates the entropy value for alerts. Deduplicated events are visible in the UI after passing through the Alert Builder.
See Configure Event De-duplication in the Alert Builder for more information.
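To illustrate the idea behind deduplication, the conceptual sketch below groups repeated events that share a signature built from their source and check fields. This is only a model of the behavior, not the Alert Builder's actual implementation, and the signature fields are assumptions:

```shell
# Three raw events, two of which share the same source and check.
# awk groups them by that signature, mimicking how repeated events
# deduplicate into a single alert that carries an event count.
printf '%s\n' \
  'host1|cpu_load|CPU high' \
  'host1|cpu_load|CPU high' \
  'host2|disk_free|Disk full' |
awk -F'|' '{ count[$1 "|" $2]++ }
           END { for (sig in count) print sig, "->", count[sig], "event(s)" }'
```

The two `host1|cpu_load` events collapse into one line with a count of 2, which is the essence of how deduplication reduces event noise into alerts.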
The Enricher is an optional Moolet that you can use to enrich alert data from external data sources such as a CMDB. See Enrichment Overview for information about the enrichment process.
See Enricher Moolet for information on the Moolet.
The Enrichment Workflow Engine listens for alerts on the message bus and processes them based upon any active workflows. See Workflow Engine for an overview of how the Workflow Engine UI works. See Workflow Engine Moolets for information on the Moolet.
See Workflow Engine for an overview of how the Workflow Engine UI works. See Workflow Engine Moolets for information on the Moolet.
To learn how to create a maintenance window, see Maintenance Window Manager for information on the Moolet.
See Workflow Engine for an overview of how the Workflow Engine UI works. See Workflow Engine Moolets for information on the Moolet.
If you upgraded from a previous version, you may have data processing configurations that use the Alert Rules Engine. The Alert Rules Engine lets you define criteria to process alerts according to different Transitions to move these alerts to different Action States. Before you start an implementation with the Alert Rules Engine, see if the Workflow Engine meets your needs.
See Alert Rules Engine for more information.
See the Clustering Algorithm Guide for an overview of the algorithms. To configure a clustering algorithm, see Configure Clustering Algorithms.
The Situation Manager listens for Situation creation, update, and closure actions and lets you automate processes like data enrichment, assignment, or notification to a ticketing system.
The Labeler is part of the Situation Manager. See Situation Manager for more information.
The Teams Manager Moolet listens for new Situation creation, update, and closure actions. It handles the team assignments you create in the Settings UI. See Manage Teams.
See Teams Manager Moolet for information on the Moolet.
See Workflow Engine for an overview of how the Workflow Engine UI works. See Workflow Engine Moolets for information on the Moolet.
In order to plan your Cisco Crosswork Situation Manager deployment, it helps to understand the different components of Cisco Crosswork Situation Manager and the options for distributing them among multiple physical or virtual machines.
A server role within a Cisco Crosswork Situation Manager installation is a functional entity containing components that must be installed on the same machine. You can distribute different roles to different machines.
The following diagram illustrates the typical deployment strategy for the components of Cisco Crosswork Situation Manager in a Highly Available (HA) configuration:
The architecture is built upon two clusters with software components that serve several roles. See also HA Reference Architecture.
In the case of a single-server installation, you install all the roles on one machine.
The UI role comprises Nginx and Apache Tomcat, represented in the diagram as numbers 1 and 2. The Cisco Crosswork Situation Manager servlet groups run in an active / active configuration.
Nginx is the proxy for the web application server and for integrations.
Tomcat is the web application server. It reads and writes to the Message Bus and the database.
Percona XtraDB Cluster serves the database role, represented in the diagram as numbers 3, 4, and 5. The cluster runs in active / active standby / active standby mode.
Percona XtraDB Cluster is the system datastore that handles transactional data from other parts of the system: LAMs (integrations), data processing, and the web application server.
HA Proxy handles database query routing and load balancing.
The Core role, represented by numbers 6 and 7 in the diagram, comprises the following:
1. Moogfarmd, the Cisco Crosswork Situation Manager data processing component. Moogfarmd consumes messages from the Message Bus. It processes event data in a series of servlet-like modules called Moolets. Moogfarmd reads and writes to the database and publishes messages to the bus.
2. RabbitMQ, which provides the message queue. It receives published messages from integrations and publishes messages destined for data processing (Moogfarmd) and the web application server.
3. Elasticsearch, which provides the UI search capability. It indexes documents from the indexer Moolet in the data processing series and returns search results to Tomcat.
In HA deployments, Moogfarmd automatically runs in active / passive mode. See High Availability Overview for more information.
In concert with the Redundancy role server, RabbitMQ and Elasticsearch run in active / active / active mode.
The redundancy role, represented by number 8 in the diagram, provides the third node required for true HA for RabbitMQ and Elasticsearch.
Link Access Modules (LAMs) make up the data ingestion role represented by numbers 9 and 10 in the diagram. Receiving LAMs listen for events from monitoring sources and Polling LAMs poll monitoring sources for events. Both parse and encode raw events into discrete events, and then write the discrete events to the Message Bus.
In HA deployments, receiving LAMs run in active / active mode, but polling LAMs run in active / passive mode.
The load balancers in front of the UI server role and the data ingestion server role are the customer's responsibility.
Cisco Crosswork Situation Manager supports several options to help you scale your implementation to meet your performance needs. See Monitor and Troubleshoot Cisco Crosswork Situation Manager for how to monitor your system for signs that it is time to scale.
For information on the performance tuning capabilities of individual Cisco Crosswork Situation Manager components, see Monitor Component Performance.
Cisco Crosswork Situation Manager currently supports horizontal scaling at the integration (LAM) and visualization (Nginx + Tomcat) layers.
1. You can add more LAMs, either on additional servers or on the same server, to achieve higher event rates. In this case, you have the option to configure event sources to send to the parallel LAMs separately or to implement a load balancer in front of the LAMs.
2. You can add Nginx/Tomcat UI "stacks" behind a load balancer to increase performance for UI users. Adding UI stacks does not always provide better performance. It can degrade performance by adding more connection pressure to the database.
The following are typical horizontal scaling scenarios:
1. You can add an additional LAM to process incoming events if you see that, despite attempts to tune the number of threads for an individual LAM, its event rate hits a plateau. This is a sign that the LAM is the bottleneck, so adding other instances of the LAM behind a load balancer will allow a higher event processing rate.
2. You can add an additional UI stack if database pool diagnostics for Tomcat suggest that all or most of the database connections are constantly busy with long running connections, but the database itself is performing fine.
The data processing layer (moogfarmd) is not currently well suited to horizontal scaling. Moolets of the same type cannot currently share processing. Adding more Moolets like the AlertBuilder in an attempt to increase the event processing rate is likely to lead to database problems.
All Cisco Crosswork Situation Manager components ultimately benefit from being run on the best available hardware, but the data processing layer (Moogfarmd) benefits most from this approach. Depending on the number and complexity of Moolets in your configuration, you will see performance benefits in data processing on servers having the fastest CPUs with numerous cores and a large amount of memory. This enables you to increase the number of threads for Moogfarmd to improve processing speed. You should also locate the database on the most powerful server feasible (clock speed, number of cores, and memory) with the biggest/fastest disk.
In some cases you can distribute Cisco Crosswork Situation Manager components among different hosts to gain performance, because doing so reduces resource contention on a single server. The most common distribution is to install the database on a separate server, ideally within the same fast network to minimize the risk of latency. An additional benefit of this move is that it allows you to run a clustered or master/slave database for redundancy.
Another common distribution is to install the UI stack (Nginx) on a separate server within the same fast network.
Some integrations (LAMs) benefit from being closer to the event source, so they are candidates for distribution.
See Server Roles and HA Installation for more information.
The following operating systems, browsers and third-party software are either supported or required to run Cisco Crosswork Situation Manager.
Any operating systems and browsers not listed in the sections below are not officially recommended or supported.
You can run Cisco Crosswork Situation Manager on the following versions of Red Hat Enterprise Linux®(RHEL) and CentOS Linux:
OS | Version
CentOS | v7
RHEL | v7
Note
No other Linux distributions are currently supported.
You can use the following browsers for the Cisco Crosswork Situation Manager UI:
Browser | Version
Chrome | Latest
Firefox | Latest
Safari | Latest
Edge | Latest
Note
Due to a known issue in the Safari web browser, you must take additional steps if you've enabled the enhanced Content Security Policy in v8.0 and you want to access the UI with Safari. For more information, see RPM - Upgrade UI components or Tarball - Upgrade UI components depending on your implementation type.
After upgrading macOS to Catalina, the UI is inaccessible in Chrome, Safari and Edge browsers because self-signed certificates are no longer trusted. For workaround instructions see Catalina Browser Certificate Workaround.
Cisco Crosswork Situation Manager v8.0 ships with the following third-party applications:
Application | Version
Apache Tomcat® | v9.0.35
Elasticsearch | v6.8.1 (LTS version)
Nginx | v1.14.0 or above
RabbitMQ | v3.7.4
Percona XtraDB Cluster | v5.7.28
Percona XtraBackup | v2.4.20
HA Proxy | v1.5.18-9.el7.x86_64
Other supported application packages include:
Application | Version
Erlang | v20.1.7
JDK | java-11-openjdk-devel >= 1:11.0.5
Apache Tomcat® Native | v1.2.23 or above
MySQL | v5.7.28
You can do a minor upgrade of the JDK (for example, from 11.0.4 to 11.0.5) without having to also update the Cisco Crosswork Situation Manager RPM files.
Note
MySQL is supported for upgrading customers only. New Cisco Crosswork Situation Manager installations use Percona XtraDB Cluster for database management.
The following table outlines the vendor-supported integrations for the current version of Cisco Crosswork Situation Manager alongside the corresponding supported software versions.
Integrations support IPv6 connectivity.
Integration Version | Supported Software / Version
Ansible Tower Integration v1.11 | Ansible Tower v3.0, 3.1
Apache Kafka Integration v1.14 | Apache Kafka v0.9, 1.1, 2.2
AppDynamics Integration v2.2 | AppDynamics v4.0, 4.1
AWS CloudWatch Integration v2.1 | aws-java-sdk v1.11
AWS SNS Integration v1.3 | Runtime Node.js 8.10, Node.js 10.x, and Node.js 12.x
BMC Remedy Integration v1.9 | Remedy v9.1
CA UIM Integration v1.8 | CA Nimsoft UIM v8.4
CA Spectrum Integration v2.3 | CA Spectrum v10.2
Catchpoint Integration v1.1 | Catchpoint v2019
Cherwell Service Management Integration v1.6 | Cherwell v9.3
Datadog Polling Integration v1.4 | Datadog v2018
Datadog Webhook Integration v1.12 | Datadog v5.21
Dynatrace APM Plugin Integration v1.9 | Dynatrace v6.5, 7.0
Dynatrace APM Polling Integration v2.4 | Dynatrace v7.2.0.1697
Dynatrace Notification Integration v1.6 | Dynatrace v1.187.132.20200224-165652
Email Integration v2.6 | IMAP, IMAPS, POP3, POP3S
EMC Smarts Integration v1.5 | RabbitMQ v3.7.4 and Smarts v9.5
ExtraHop Integration v1.2 | ExtraHop v2018
AWS FireLens v1.01 | AWS FireLens (new in 8.0)
FluentD Integration v1.11 | FluentD v0.12
Grafana Integration v1.2 | Grafana v5.2.4
HP NNMi Integration v2.6 | HP NNMi v10.30
HP OMi Plugin Integration v1.9 | HP OMi v10.1
HP OMi Polling Integration v2.6 | HP OMi v10.1
JIRA Service Desk Integration v1.12 | JIRA Service Desk v4.7.1 and JIRA Cloud REST API v2
JIRA Software Integration v1.12 | JIRA Software v7, v8.7.1 and JIRA Cloud REST API v2
JMS Integration v1.12 | ActiveMQ v5.14, JBoss v10, WebLogic v12.0
Moogsoft Express Polling Integration v1.0 | Moogsoft Express (new in 8.0)
Moogsoft Express Webhook Integration v1.0 | Moogsoft Express (new in 8.0)
Microsoft Azure Integration v1.2 | Microsoft Azure Monitor v2018
Microsoft Azure Classic Integration v1.2 | Microsoft Azure Classic v2018
Microsoft SCOM Integration v2.7 | Microsoft SCOM v2012, 2016 and 2019
Microsoft Teams Integration v1.1 | Microsoft Teams v1.2.00.3961
Nagios Integration v2.10 | Nagios vXI
New Relic Integration v1.10 | New Relic v2016
New Relic Polling Integration v2.1 | New Relic v2.3
New Relic Insights Polling Integration v1.1 | New Relic v2.3
Node.js Integration v1.10 | Node.js v1.6
NodeRED Integration v1.10 | Node-RED v0.16, 0.17
OEM Integration v2.3 | Oracle Enterprise Manager v12c, 13c
Office 365 Email Integration v1.0 |
OpsGenie v1.0 | Opsgenie's Alerts v2 REST API (new in 8.0)
PagerDuty v1.0 | PagerDuty SaaS (new in 8.0)
Pingdom Integration v1.9 | Pingdom v2017
Sensu Integration v1.0 | Sensu Core v1.8
ServiceNow Integration v4.5 | ServiceNow New York, Madrid, London, Kingston
SevOne Integration v1.5 | SevOne v5.7.2.0
Site24x7 Integration v1.1 | Site24x7 June-2019
Slack Integration v1.7 | Slack v3.1
SolarWinds Integration v3.3 | SolarWinds v11.5, v12.2, v12.3, v2019.4
Splunk Integration v2.6 | Splunk v6.5, v6.6, v7.0, v7.1, v7.2, v7.3, v8.0
Splunk Streaming Integration v1.1 | Splunk v7.1, v7.2, v7.3, v8.0
Sumo Logic Integration v1.2 | Sumo Logic v2018
VMware vCenter Integration v2.4 | VMware vCenter v6.0, v6.5, v6.7
VMware vROps Integration v2.4 | VMware vROps v6.6, v7.5.0
VMware vSphere Integration v2.5 | VMware vSphere v6.0, v6.5, v6.7
VMware vRealize Log Insight Integration v2.5 | VMware vRealize Log Insight v4.3
WebSphere MQ Integration v1.13 | WebSphere MQ v8
xMatters Integration v2.0 | xMatters v5.5
Zabbix Integration v1.0 | Zabbix v3.4
Zabbix Polling Integration v3.5 | Zabbix v3.2, v4.0, v4.4
Zenoss Integration v2.6 | Zenoss v4.2, v6.3.2
Cisco Crosswork Situation Manager v8.x supports Cisco Add-ons v2.0 and later. See Add-ons.
The sizing recommendations below are guidelines for small, medium and large Cisco Crosswork Situation Manager systems based on input data rate and volume. Event calculations depend on the number of events sent to the Alert Builder.
In the context of this guide, Managed Devices (MDs) are all of the components in the network infrastructure that generate and emit events:
Small system:
· Environment: 1000 to 5000 Managed Devices (MDs); fewer than 20 users; up to 5 integrations; fewer than 20 events per second to the Alert Builder
· Hardware: 8 cores, 32GB RAM, 2 x 1GB Ethernet, physical or virtual server
· File system: 1 TB local or SAN. See Retention policy below.
Medium system:
· Environment: 5000 to 20,000 MDs; between 20 and 40 users; between 6 and 10 integrations; between 20 and 100 events per second to the Alert Builder
· Hardware: 16 cores, 64GB RAM, 2 x 1GB Ethernet, physical or virtual server
· File system: 1 TB local or SAN. See Retention policy below.
Large system:
· Environment: more than 20,000 MDs; more than 40 users; more than 10 integrations; more than 100 events per second to the Alert Builder
· Hardware: 24+ cores, 128GB RAM, 2 x 1GB Ethernet, physical or virtual server
· File system: 1 TB local or SAN. See Retention policy below.
Virtualization restrictions
Consider the following restrictions for virtual environments:
· Ideally all Cisco Crosswork Situation Manager servers (guests) should be on the same compute node (host) sharing a hypervisor or virtual machine monitor. This minimizes latency between Cisco Crosswork Situation Manager guests.
· If servers are liable to automated resource balancing (for example vMotion) and liable to move compute nodes, then all Cisco Crosswork Situation Manager servers should be moved at the same time. If this is not possible, then Cisco Crosswork Situation Manager servers should be constrained to movements that minimize the resulting network distance.
· If Cisco Crosswork Situation Manager servers are distributed amongst compute nodes then the network “distance” (logical hops) between the nodes should be minimized.
· Network latency between components may affect event processing throughput. This is especially true between the Core and database servers.
Shared storage
On any shared compute platform Cisco makes the following recommendations:
· Increase the minimum resource requirements by at least 33% to account for shared resource usage and allocation.
· Storage latency reduces effective throughput at the core processing layer and should be minimized within the available constraints of a SAN.
· Treat Cisco Crosswork Situation Manager as a highly transactional system and do not place it on the same compute node as other highly transactional applications that may cause SAN resource contention.
· Minimize SAN port and array port contention.
· Use the fastest storage medium possible to minimize transaction times to the database.
Retention policy
You can determine the amount of disk space in GB required for the database server using the following calculation:
(es x eps x d x 86,400) x 1.2 / 1,000,000
For this calculation: es = average event size in KB, eps = average events per second, d = number of days of retention and 86,400 represents the number of seconds per day.
For the majority of event sources, you can reasonably estimate a 2KB event size; however, some sources, such as Microsoft SCOM, have larger than average events. A 2KB base accounts for the other event and alert based storage, such as an alert's Situation membership and Situation room thread sizes.
The average event rate is across all LAMs and integrations.
Note
If you do not enable the Archiver tool, the historic database will grow indefinitely. See Archive Situations and Alerts for more information.
For example, the following calculation represents a 400 day retention period with an average event size of 2KB at 300 events per second:
(2 x 300 x 400 x 86,400) x 1.2 / 1,000,000 = 24,883.2 GB.
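You can reproduce this calculation in a shell; awk handles the floating-point multiplication by 1.2:

```shell
# Disk space (GB) = (es x eps x d x 86,400) x 1.2 / 1,000,000
# es = 2 (KB), eps = 300 (events/sec), d = 400 (days), as in the example.
es=2; eps=300; d=400
awk -v es="$es" -v eps="$eps" -v d="$d" \
    'BEGIN { printf "%.1f GB\n", es * eps * d * 86400 * 1.2 / 1000000 }'
# Prints: 24883.2 GB
```

Substitute your own average event size, event rate, and retention period to size the database server's file system.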
Use this guide to learn how to install Cisco Crosswork Situation Manager:
If you are installing another version, see Welcome to the Cisco Docs! for more information. Refer to the following topics to help choose the right environment for your Cisco Crosswork Situation Manager deployment:
1. The Cisco Crosswork Situation Manager 8.0 Supported Environments topic details supported operating systems and system requirements.
2. The Sizing Recommendations topic helps you select hardware to support your data ingestion and user requirements.
If you are upgrading Cisco Crosswork Situation Manager, see Upgrade Cisco Crosswork Situation Manager.
You have the option to install all Cisco Crosswork Situation Manager packages on a single machine. However, the modular approach of the Cisco Crosswork Situation Manager distribution means fewer dependencies between individual packages. This means you have the flexibility to install different components to different machines. See Server Roles for a description of how you can distribute the different components amongst multiple machines.
1. For smaller deployments, you can run all the components on a single machine.
— If you have root access to the machine and want to use Yum to install, see RPM Installation.
2. For most production deployments, you can install different components on different machines in order to distribute the workload. See High Availability Overview for more information.
Cisco periodically releases add-ons to extend and enhance the core Cisco Crosswork Situation Manager functionality, for example, new Workflow Engine functions, new Workflow Engines, or Integrations tiles. All add-on releases are cumulative and include the fixes from previous releases.
Once you have finished upgrading or installing Cisco Crosswork Situation Manager, you should install the Cisco Crosswork Situation Manager add-ons to ensure you have the latest version.
See Install Cisco Add-ons for more information on how to install the Cisco Crosswork Situation Manager add-ons.
You may encounter issues when installing or upgrading Cisco Crosswork Situation Manager.
See Troubleshoot Installation and Upgrade for more information on how to resolve these issues.
For more information on message bus or Elasticsearch configuration, see Configure the Message Bus or Configure Search and Indexing.
Before you start to install Cisco Crosswork Situation Manager v8.0.x, you must perform certain pre-installation tasks.
The instructions to follow depend on your preferred mode of deployment:
1. RPM: Use this method if you have root access to your Cisco Crosswork Situation Manager server(s) and you do not want to change the default installation locations.
2. Offline RPM: Use this method if your Cisco Crosswork Situation Manager server(s) do not have access to the internet.
For pre-installation instructions, refer to one of the following topics:
1. Online RPM pre-installation
2. Cisco Crosswork Situation Manager - Offline RPM pre-installation
You must perform certain preparatory tasks before you install Cisco Crosswork Situation Manager v8.0.x.
Follow these steps if you have root access to the machine or machines on which you will install or upgrade Cisco Crosswork Situation Manager, but you cannot connect to Yum repositories outside your network from those machines.
If you are performing another type of installation, see:
a. Online RPM pre-installation: #.
b. Online Tarball pre-installation: #.
c. Offline Tarball pre-installation: #.
Before you begin to prepare for the installation, verify the following:
1. You have root access to the system where you plan to install Cisco Crosswork Situation Manager.
2. You are familiar with the supported versions of third party software, as outlined in Cisco Crosswork Situation Manager 8.0 Supported Environments.
Complete the following steps before you perform an offline RPM installation of Cisco Crosswork Situation Manager v8.0.x:
· Download the Percona and dependency packages using cURL on an internet-connected host:
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/Percona-XtraDB-Cluster-shared-57-5.7.28-31.41.1.el7.x86_64.rpm
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/Percona-XtraDB-Cluster-client-57-5.7.28-31.41.1.el7.x86_64.rpm
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/Percona-XtraDB-Cluster-server-57-5.7.28-31.41.1.el7.x86_64.rpm
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/Percona-XtraDB-Cluster-shared-compat-57-5.7.28-31.41.1.el7.x86_64.rpm
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/percona-xtrabackup-24-2.4.19-1.el7.x86_64.rpm
· Copy the Percona install_percona_nodes.sh install script and RPM install files to all servers that will house a database node.
· Copy the tar.gz files to all servers that will run Cisco Crosswork Situation Manager components.
· Download the HA Proxy RPM on an internet-connected host (requires root permissions):
yum install --downloadonly --downloaddir ./ haproxy
Copy the HA Proxy RPM to the servers that will have the Core, UI and LAM server roles.
See Server Roles for more information on the Core, UI and LAM server roles.
Follow these steps to create local Yum repositories to house the installation packages. If you are running a distributed installation, perform these steps on each machine that will run Cisco Crosswork Situation Manager components.
· Create two directories to house the repositories. For example:
sudo mkdir -p /media/localRPM/BASE/
sudo mkdir -p /media/localRPM/ESR/
· Extract the two Tarball files into separate directories and move the HA Proxy RPM to /media/localRPM/BASE/. For example:
tar xzf *-MoogsoftBASE7_offline_repo.tar.gz -C /media/localRPM/BASE/
tar xzf *-MoogsoftESR_8.0.0.1_offline_repo.tar.gz -C /media/localRPM/ESR/
mv haproxy*rpm /media/localRPM/BASE/
· Back up the existing /etc/yum.repos.d directory. For example:
mv /etc/yum.repos.d /etc/yum.repos.d-backup
· Create an empty /etc/yum.repos.d directory. For example:
mkdir /etc/yum.repos.d
· Create a local.repo file ready to contain the local repository details:
vi /etc/yum.repos.d/local.repo
· Edit local.repo and configure the baseurl paths for BASE and ESR to point to your directories. For example:
[BASE]
name=MoogCentOS-$releasever - MoogRPM
baseurl=file:///media/localRPM/BASE/RHEL
gpgcheck=0
enabled=1
[ESR]
name=MoogCentOS-$releasever - MoogRPM
baseurl=file:///media/localRPM/ESR/RHEL
gpgcheck=0
enabled=1
· Clean the Yum cache:
yum clean all
· Verify that Yum can detect the newly created repositories. For example:
yum info "moogsoft-*"
Available Packages
Arch : x86_64
Version : 8.0.0.1
Release : XYZ
Size : 76 M
Repo : ESR
Summary : Algorithmic Intelligence for IT Operations
URL : https://www.moogsoft.com
License : Proprietary
Description : Moogsoft AIOps (8.0.0.1) - Build: XYZ - (Revision: XYZ)
The results should include the following packages:
Name : moogsoft-db
Name : moogsoft-integrations
Name : moogsoft-integrations-ui
Name : moogsoft-mooms
Name : moogsoft-search
Name : moogsoft-server
Name : moogsoft-ui
Name : moogsoft-utils
Name : moogsoft-common
Name : moogsoft-ccsm
· Install the downloaded Percona RPMs on all servers that will house a database node:
yum -y install Percona-XtraDB-Cluster-*.rpm percona-xtrabackup-24-2.4.19-1.el7.x86_64.rpm
· Install Java 11:
VERSION=11.0.7.10; yum -y install java-11-openjdk-headless-${VERSION} java-11-openjdk-${VERSION} java-11-openjdk-devel-${VERSION}
· Set SELinux to permissive mode or disable it completely. For example, to set SELinux to permissive mode:
setenforce 0
If you want to disable SELinux at boot time, edit the file /etc/sysconfig/selinux.
· Ensure the current user, or the user that will be running the Moogsoft processes, has sufficient resource limits by running the following commands as that user:
ulimit -n
ulimit -u
If either of the values returned is less than 65536, add the following to the /etc/security/limits.conf file as root:
moogsoft soft nofile 65536
moogsoft hard nofile 65536
moogsoft soft nproc 65535
moogsoft hard nproc 65535
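The limit checks above can be wrapped in a small script. This is a sketch only, not part of the product; check_limit is a hypothetical helper name:

```shell
# Sketch only: compare the current soft limits against the 65536 requirement.
# check_limit is a hypothetical helper, not a Moogsoft utility.
required=65536

check_limit() {
  # $1 = limit name, $2 = current value ("unlimited" or a number)
  if [ "$2" = "unlimited" ] || [ "$2" -ge "$required" ]; then
    echo "$1 OK ($2)"
  else
    echo "$1 TOO LOW ($2 < $required) - raise it in /etc/security/limits.conf"
  fi
}

check_limit nofile "$(ulimit -n)"
check_limit nproc "$(ulimit -u)"
```

Run this as the user that will run the Moogsoft processes, since ulimit reports the limits of the current user.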
Optional: GPG key validation of the RPMs
To validate the RPMs before installation:
· Download the key. For 8.0.0.1 and prior:
https://keys.openpgp.org/vks/v1/by-fingerprint/2529C94A49E42429EDAAADAEC7A2253BFC50512A
· Copy the key to the server onto which the RPMs or tarball will be installed (it will be an .asc file)
· Import the key. For example, for 8.0.0.1 and prior:
gpg --import 2529C94A49E42429EDAAADAEC7A2253BFC50512A.asc
· You can download the CCSM RPMs from Cisco eDelivery.
· Move the RPMs and .sig files into the same folder. The validation loop below assumes the repository directory /media/localRPM/ESR/RHEL/.
· Copy the following code into a bash terminal and run it to perform the validation:
while read RPM
do
echo "Current RPM: $RPM"
gpg --verify ${RPM}.sig ${RPM} 2>&1
done < <(find /media/localRPM/ESR/RHEL/ -name '*.rpm');
· Confirm that all the commands for each RPM report:
Good signature from "Moogsoft Information Security Team <security@moogsoft.com>"
Your local Yum repositories are now ready. Proceed with your offline installation or upgrade. See the upgrade instructions relevant to your deployment.
This topic describes how to install Cisco Crosswork Situation Manager v8.0.x on a single host.
Follow these steps if you have root access to the machine or machines on which you will install Cisco Crosswork Situation Manager, and you can connect to Yum repositories outside your network from those machines.
To install Cisco Crosswork Situation Manager in a highly available distributed environment, see HA Installation.
Before you start to install Cisco Crosswork Situation Manager, complete all steps in one of the following documents:
· Online RPM pre-installation: If you have root access to the machine or machines on which you will install Cisco Crosswork Situation Manager, and you can connect to Yum repositories outside your network from those machines.
· Cisco Crosswork Situation Manager - Offline RPM pre-installation: If you have root access to the machine or machines on which you will install or upgrade Cisco Crosswork Situation Manager, but you cannot connect to Yum repositories outside your network from those machines.
To complete an RPM installation of Cisco Crosswork Situation Manager v8.0.x, perform the following steps:
· Download and install the Cisco Crosswork Situation Manager RPM packages, using one of the following methods according to your deployment type:
· If you are performing an RPM installation:
VERSION=8.0.0.1; yum -y install moogsoft-server-${VERSION} \
moogsoft-db-${VERSION} \
moogsoft-utils-${VERSION} \
moogsoft-search-${VERSION} \
moogsoft-ui-${VERSION} \
moogsoft-ccsm-${VERSION} \
moogsoft-common-${VERSION} \
moogsoft-mooms-${VERSION} \
moogsoft-integrations-${VERSION} \
moogsoft-integrations-ui-${VERSION}
· If you are performing an offline RPM installation, navigate to the location where you copied the RPM files and install them:
yum install *.rpm
· Edit your ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
· Source the ~/.bashrc file:
source ~/.bashrc
· Run the Percona install script:
bash install_percona_nodes.sh;
The script guides you through the installation process. To configure a single database node on the same server as Cisco Crosswork Situation Manager, use these settings:
· Configure Percona as "Primary".
· Do not set the server to "DB only".
· Set the first database node IP address to the server IP address.
· When prompted to enter the IP addresses of the second and third nodes, press Enter to skip these settings.
When the installation process is complete, initialize Cisco Crosswork Situation Manager as follows:
· Run the initialization script moog_init.sh, replacing <zone_name> with your desired RabbitMQ virtual host (vhost):
$MOOGSOFT_HOME/bin/utils/moog_init.sh -I <zone_name> -u root
The script prompts you to accept the End User License Agreement (EULA) and guides you through the initialization process.
When asked if you want to change the configuration hostname, answer yes and enter the public URL for the server. The public URL is the URL users enter in a browser to reach the instance.
Note
When prompted for a password, enter the password for the root database user (not the UNIX system user). If you are installing Percona on this machine for the first time, leave the password blank and press Enter to continue.
The zone_name sets up a virtual host for the Message Bus. If you have multiple systems sharing the same bus, set a different zone name for each.
If you are deploying more than one database, configure HA Proxy to load-balance the database nodes. The following script requires root privileges. Run this script on any host running any Cisco Crosswork Situation Manager components after you have installed the RPMs:
$MOOGSOFT_HOME/bin/utils/haproxy_installer.sh
· Restart Moogfarmd:
service moogfarmd restart
The minimum and maximum JVM heap sizes must be large enough to ensure that Elasticsearch starts.
See Finalize and Validate the Upgrade for more information.
Cisco has provided an optional enhanced Content Security Policy (CSP) as part of this release. CSP is a security standard introduced to prevent Cross Site Scripting (XSS) and other data injection attacks. For more information, see the Mozilla document on Content Security Policy.
The CSP is controlled by Nginx and is disabled by default. To enable it:
· Edit the following file:
/etc/nginx/conf.d/moog-ui-headers.conf
· Uncomment the line that starts with add_header Content-Security-Policy and save the file.
· Reload Nginx:
service nginx reload
Note
If you enable the enhanced CSP, you must follow the steps below to allow access to external domains. If you also want to access the UI with the Safari web browser, you must follow the additional steps below to configure Cisco Crosswork Situation Manager for use with Safari.
If you enable the enhanced CSP, the following features require additional configuration to allow access to external domains:
1. Situation Room plugins to external domains
2. Situation client tools to external URLs
To allow access to required external domains:
· Edit the following file:
/etc/nginx/conf.d/moog-ui-headers.conf
· Add a frame-src directive to the Content-Security-Policy header for the required domain. For example, run the following command to allow Google domains:
sed -i "s/add_header Content-Security-Policy\(.*\)\" always/add_header Content-Security-Policy\1; frame-src 'self' *.google.com\" always/" /etc/nginx/conf.d/moog-ui-headers.conf
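To see what the substitution does, you can run the same sed expression against a sample header line. The sample line below is illustrative; the shipped moog-ui-headers.conf may differ:

```shell
# Illustrative only: apply the frame-src substitution to a hypothetical
# Content-Security-Policy line rather than the real configuration file.
line='add_header Content-Security-Policy "default-src '\''self'\''" always;'
result=$(echo "$line" | sed "s/add_header Content-Security-Policy\(.*\)\" always/add_header Content-Security-Policy\1; frame-src 'self' *.google.com\" always/")
echo "$result"
```

The substitution appends a frame-src directive inside the quoted header value while leaving the rest of the line intact.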
· Reload Nginx:
service nginx reload
Note
Cisco Crosswork Situation Manager allows access to Pendo and WalkMe domains by default.
Due to a known issue in the Safari web browser, you must take additional steps if you've enabled the enhanced CSP and you want to access the UI with Safari:
· Edit the following file:
/etc/nginx/conf.d/moog-ui-headers.conf
· Add the following websocket URLs to the Content-Security-Policy section of the file. Substitute your hostname for <webhost>:
wss://<webhost>/moogpoller/ws
wss://<webhost>/integrations/ws/v1
You can update the configuration using a command similar to the following. Substitute your hostname for <webhost>:
sed -i.bak "s;connect-src 'self' app;connect-src 'self' wss://<webhost>/moogpoller/ws wss://<webhost>/integrations/ws/v1 app;g" /etc/nginx/conf.d/moog-ui-headers.conf
· Reload Nginx:
service nginx reload
Ensure the 'moogsoft' system user has sufficient limits by running the following command as root:
runuser -l moogsoft -c 'ulimit -n; ulimit -u;'
If either of the values returned is less than 65536, add the following to the /etc/security/limits.conf file as root:
moogsoft soft nofile 65536
moogsoft hard nofile 65536
moogsoft soft nproc 65535
moogsoft hard nproc 65535
To verify that the installation has completed successfully, follow the steps outlined in Validate the Installation.
When the installation is complete, it is critical that you change the passwords for the default users created during the installation process. See Change passwords for default users for more information.
Cisco periodically releases add-ons to extend and enhance the core Cisco Crosswork Situation Manager functionality. For example, new Workflow Engine functions, new Workflow Engines, or Integrations tiles. All add-ons releases are cumulative and include the fixes from previous releases.
Once you have finished upgrading or installing Cisco Crosswork Situation Manager, you should install the Cisco Crosswork Situation Manager add-ons to ensure you have the latest version.
See Install Cisco Add-ons for more information on how to install the Cisco Crosswork Situation Manager add-ons.
Cisco Crosswork Situation Manager supports high availability (HA) architectures to improve fault tolerance. Each component supports a multi-node architecture to enable redundancy, failover, or both, minimizing the risk of data loss in the case of a hardware failure.
This topic covers the architectures you can use to achieve HA with Cisco Crosswork Situation Manager. For an example of how to set up a single site HA system, see HA Installation. See HA Reference Architecture for a detailed diagram of the components in a single site HA configuration.
Cisco Crosswork Situation Manager supports high availability in distributed architectures where different machines host a subset of the stack. You can run one or more of the server roles on its own machine.
See Server Roles for details of the HA architecture server roles in Cisco Crosswork Situation Manager.
If you run more than one server role on a machine, choose a primary role for the server. The primary role dictates which additional roles are supported on the machine as follows:
Primary Role   | Supported Secondary Roles
Core           | UI, Data ingestion and Database
UI             | Data ingestion
Data Ingestion | UI
Database       | Redundancy
Redundancy     | Database
See Scale Your Cisco Crosswork Situation Manager Implementation for information on how to increase capacity within the HA architecture.
Contact your Cisco technical representative to discuss scaling your deployment.
See Sizing Recommendations for more information on hardware sizes and capacity.
After you decide on the best HA architecture for your environment, you can plan your implementation.
Cisco Crosswork Situation Manager provides support for automatic failover between the two nodes within an HA pair. For example, from one instance of Moogfarmd to another, or from one instance of a LAM to another. When an active instance in an HA pair fails, Cisco Crosswork Situation Manager persists any affected messages. The passive instance of the HA pair automatically takes over and processes those messages without any interruption or loss of data. See Message Persistence for more information.
There is no automatic failover between multiple HA pairs. For example, there is no failover from a primary site to a second site, such as a disaster recovery replica.
Cisco Crosswork Situation Manager does not support automated fail-back for any architecture. For example, consider an HA pair of Moogfarmd instances. When the instance of Moogfarmd in cluster 1 becomes unavailable, the instance in cluster 2 enters an active state. When the instance from cluster 1 recovers and becomes available, the instance in cluster 2 remains active.
Cisco Crosswork Situation Manager deployments use a tiered hierarchy of clusters, groups, instances and roles to achieve High Availability.
A cluster is a collection of Cisco Crosswork Situation Manager components that can deliver an uninterrupted processing workflow. To achieve HA, you need at least two clusters that include all the Cisco Crosswork Situation Manager components. You also need an additional, third machine for the message queue and search components.
A group comprises a single component or two identical components that provide resilience over two or more clusters. Cisco Crosswork Situation Manager automatically controls the active or passive behavior and failover of the instances within a group.
An example of a group is a Socket LAM configured for the same source in two separate clusters. Other groups include the following:
· Servlets for the UI.
· Moogfarmd for data processing.
· Individual LAMs for data ingestion. For example, the REST LAM.
An instance is an individual component running within a group. Each instance in a group provides resilience for the other instance. For example, the primary instance of a Socket LAM pairs with a secondary instance in the second cluster to make a group.
A role within a Cisco Crosswork Situation Manager installation is a functional entity containing components that must be installed on the same machine. You can distribute different roles to different machines. For example, the Core role.
The diagram in this topic represents a Cisco Crosswork Situation Manager High Availability deployment to a single site: one datacenter, LAN, or availability zone. To support this architecture, all servers must have sufficient connection speed amongst themselves so that latency between hosts does not exceed 5 ms.
All Cisco Crosswork Situation Manager components have their own HA mechanism that provides failover capabilities, but it is also a best practice to use one or more load balancers. You can use either software or hardware load balancers with the following requirements and recommendations:
· Load balancers must use TCP.
· You must implement health checks using your preferred approach to remove unhealthy servers from the cluster.
· The load balancer should provide load balancing capabilities and a VIP for each server role. For example: one UI VIP per site, one LAM VIP per site.
· Sticky sessions are recommended.
· You can choose your preferred load balancing approach. For example, round robin or least-connection.
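As a sketch of these recommendations, a TCP-mode HAProxy frontend and backend for the UI role might look like the following. The addresses, server names, and stick-table settings are assumptions for illustration, not shipped configuration:

```
# Hypothetical HAProxy fragment: TCP mode, a VIP-style frontend for the UI
# role, health checks on each server, round-robin balancing, and source-IP
# stickiness as a simple sticky-session approach.
frontend ui_vip
    bind *:443
    mode tcp
    default_backend ui_servers

backend ui_servers
    mode tcp
    balance roundrobin
    stick-table type ip size 200k expire 30m
    stick on src
    server ui1 10.0.0.11:443 check
    server ui2 10.0.0.12:443 check
```

A similar frontend/backend pair would be defined per server role, for example one VIP for the LAM role per site.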
The Cisco Crosswork Situation Manager UI comprises the following components:
1. Nginx: The web server that provides static UI content and acts as a proxy for the application server. For HA deployments, install a minimum of two Nginx instances on separate servers and optionally cluster the Nginx instances.
2. Apache Tomcat: The web server that provides servlet and API support. For HA deployments, install a minimum of two Tomcat instances on separate servers and optionally cluster the instances.
The UI components run in active/active configuration, so configure servlet instances to run in separate groups.
Required Ports: 80, 443
Cisco Crosswork Situation Manager uses Percona XtraDB as the system database. HA requires a minimum of three server nodes configured in each cluster with latency between them not exceeding 5 ms.
Required Ports: 3306
Cisco Crosswork Situation Manager uses Elasticsearch to store active alert and Situation data to provide search functionality within the product. For HA deployments, install a cluster of at least three data servers with one active master server.
Required Ports: 9200, 9300
Moogfarmd is the core data processing application that controls all other services in Cisco Crosswork Situation Manager. It manages the clustering algorithms and other applets (Moolets) that run as part of the system. For HA deployments, install a minimum of two Moogfarmd services on separate servers. Moogfarmd can only run as a two-instance group in an active/passive mode.
Required Ports: 5701, 8901 for Hazelcast: the in-memory data grid that provides fault tolerance.
Cisco Crosswork Situation Manager uses RabbitMQ as the system Message Bus. It requires a minimum of three servers for HA. RabbitMQ relies on its native clustering functionality and mirrored queues to handle failover; it does not use the Cisco Crosswork Situation Manager load balancing feature.
Required Ports: 5672, 4369, 15672, 25672
Cisco Crosswork Situation Manager uses the following types of Link Access Modules (LAMs) to ingest data:
· Polling LAMs that periodically connect to a data source using an integration API to collect event data.
· Receiving LAMs that provide an endpoint for data sources to post event data.
For HA deployments:
1. Install two instances of each LAM. When both instances are in the same group, they run in active/passive mode.
2. For LAMs deployed over an unreliable link such as a WAN, or across data centers, you should deploy a caching LAM strategy that includes a database and message queue on the LAM Servers.
3. You can load balance receiving LAMs and configure them as active/active to increase capacity.
The Cisco Crosswork Situation Manager HA architecture provides increased resilience against LAM and server restarts by caching ingested data to the disk. It requires installing a local RabbitMQ cluster which is used by LAMs for publishing.
A remote caching LAM, located next to the Core role, connects to the local RabbitMQ cluster, picks the events from the queue and publishes them to the central RabbitMQ cluster for Moogfarmd to process.
If no caching LAM is available to consume the events from the local RabbitMQ cluster, the data is cached to disk until the server runs out of memory.
This architecture is recommended for hybrid installations, where the core processing is located in the cloud and LAMs are on-premise, or for a full on-premise configuration where LAMs are housed remotely to the core components.
Polling LAMs run in an active/passive mode and must connect to a local database in order to negotiate their state. This requires a local MySQL instance that runs with master/master replication.
If you are installing the LAMs in a non-SaaS version of Cisco Crosswork Situation Manager, see Install LAMs (non-SaaS).
If you are installing the LAMs in a SaaS version of Cisco Crosswork Situation Manager, see Install LAMs (SaaS).
You can configure Cisco Crosswork Situation Manager dependencies such as Percona XtraDB Cluster, Elasticsearch, RabbitMQ, and Grafana to work effectively in highly available deployments.
See High Availability for details on high availability deployments of Cisco Crosswork Situation Manager and deployment scenarios.
For an example Percona XtraDB Cluster configuration, see Set Up the Database for HA. For further information, refer to the documentation about Percona XtraDB Cluster.
You can improve the performance and reliability of your Cisco Crosswork Situation Manager deployment by:
1. Distributing your RabbitMQ brokers on different hosts.
2. Clustering your multiple RabbitMQ brokers.
3. Mirroring your message queues across multiple nodes.
See Set Up the Core Role for HA and Set Up the Redundancy Server Role for an example configuration. For more information, see Message System Deployment. Refer to the RabbitMQ documentation on Clustering and Mirrored Queues.
There are different ways to configure Elasticsearch for distributed installations. See Set Up the Core Role for HA and Set Up the Redundancy Server Role for an example configuration.
Refer to the Elasticsearch documentation on Clustering for more details.
To set up Grafana for distributed installations, you should configure each Grafana instance to connect to a Cisco Crosswork Situation Manager UI load balancer such as HA Proxy rather than the Cisco Crosswork Situation Manager UI stack.
Alternatively, you can point it at the Apache Tomcat server or Nginx server. Refer to the Grafana documentation on Setting Up Grafana for High Availability.
The Cisco Crosswork Situation Manager HA Control Utility ha_cntl is a command line utility to:
1. Control instance, process group, or cluster failover. For example, to switch from passive to active mode.
2. View the current status of all clusters, process groups, and instances. See High Availability Configuration Hierarchy for more information.
Normally you should configure groups in HA to use automatic failover in production. Use the HA Control utility to check the status of the HA system or to initiate failover in non-production scenarios.
ha_cntl [ --activate cluster[.group[.instance]] | --deactivate cluster[.group[.instance]] | --diagnostics cluster[.group[.instance]] [ --assumeyes ] | --view ] [ --loglevel (INFO|WARN|ALL) ] [ --time_out <seconds> ] | --help
Argument          | Input                                     | Description
-a, --activate    | String <cluster[.group[.instance_name]]>  | Activate all groups within a cluster, a specific group within a cluster, or a single instance.
-d, --deactivate  | String <cluster[.group[.instance_name]]>  | Deactivate all groups within a cluster, a specific group within a cluster, or a single instance.
-i, --diagnostics | String <arg>                              | Print additional diagnostics, where available, to the process log file.
-l, --loglevel    | String, one of INFO | WARN | ALL          | Log level controlling the amount of information logged by the utility.
-t, --time_out    | String <number of seconds>                | Amount of time in seconds to wait for the last answer. Defaults to 2.
-v, --view        | -                                         | View the current status of all instances, process groups, and clusters.
-y, --assumeyes   | -                                         | Answer "yes" for all prompts.
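For example, to manually fail the moog_farmd group over from one cluster to another in a non-production scenario. The cluster and group names here are illustrative, not fixed values:

```
# Deactivate the group in one cluster, then activate it in the other.
# PRIMARY, SECONDARY, and moog_farmd are example names.
$MOOGSOFT_HOME/bin/ha_cntl --deactivate PRIMARY.moog_farmd
$MOOGSOFT_HOME/bin/ha_cntl --activate SECONDARY.moog_farmd --assumeyes
```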
$MOOGSOFT_HOME/bin/ha_cntl -v
Getting system status
Cluster: [SECONDARY] passive
Process Group: [UI] Passive (no leader - all can be active)
Instance: [servlets] Passive
Component: moogpoller - not running
Component: moogsvr - not running
Component: toolrunner - not running
Process Group: [moog_farmd] Passive (only leader should be active)
Instance: FARM Passive Leader
Moolet: AlertBuilder - not running (will run on activation)
Moolet: AlertRulesEngine - not running (will run on activation)
Moolet: Cookbook - not running (will run on activation)
Moolet: Speedbird - not running (will run on activation)
Moolet: TemplateMatcher - not running
Process Group: [rest_lam] Passive (no leader - all can be active)
Instance: REST2 Passive
Process Group: [socket_lam] Passive (only leader should be active)
Instance: SOCK2 Passive Leader
Cluster: [PRIMARY] active
Process Group: [UI] Active (no leader - all can be active)
Instance: [servlets] Active
Component: moogpoller - running
Component: moogsvr - running
Component: toolrunner - running
Process Group: [moog_farmd] Active (only leader should be active)
Instance: FARM Active Leader
Moolet: AlertBuilder - running
Moolet: AlertRulesEngine - running
Moolet: Cookbook - running
Moolet: Default Cookbook - running
Moolet: Speedbird - running
Moolet: TemplateMatcher - not running
Process Group: [rest_lam] Active (no leader - all can be active)
Instance: REST1 Active
Process Group: [socket_lam] Active (only leader should be active)
Instance: SOCK1 Active Leader
This topic summarizes the different types of high availability (HA) installation available. There are three types of HA installation:
1. Basic.
2. Minimally distributed.
3. Fully distributed.
Before you start an HA installation:
1. Read the High Availability Overview section to familiarize yourself with HA concepts.
2. Complete the appropriate Prepare to Install Cisco Crosswork Situation Manager process.
This installation configuration has three servers: two for the primary and secondary clusters, and a redundancy server.
You can do a basic HA installation using either RPM or tarball.
See Basic HA (RPM) Install and Basic HA Installation - Tarball for more information.
For a minimally distributed HA installation, follow the fully distributed installation steps.
The instructions list the steps for a specific role installation. If you collocate multiple roles on the same server for a minimally distributed installation, run the sets of instructions for each collocated role on that server. Where steps overlap between roles, perform them only once. For instance, if you collocate the Core 1 and UI 1 roles, you only need to configure HA Proxy once.
This installation splits the different roles across different servers or virtual machines. To perform a fully distributed HA install:
· Set up Percona XtraDB Cluster. See Set Up the Database for HA for more information.
· Set up Core 1 and 2 roles. See Set Up the Core Role for HA for more information.
· Set up HAProxy on the Core, UI and LAM nodes. See Set Up HA Proxy for the Database Role for more information.
· Set up UI 1 and 2 roles. See Set Up the User Interface Role for HA for more information.
· Set up the Redundancy server role. See Set Up the Redundancy Server Role for more information.
· Set up the LAM 1 and 2 roles for an on-premise version of Cisco Crosswork Situation Manager. See Install LAMs (on-premise) for more information.
Set up the LAM 1 and 2 roles for a SaaS version of Cisco Crosswork Situation Manager. See Install LAMs (SaaS) for more information.
Cisco periodically releases add-ons to extend and enhance the core Cisco Crosswork Situation Manager functionality. For example, new Workflow Engine functions, new Workflow Engines, or Integrations tiles. All add-ons releases are cumulative and include the fixes from previous releases.
Once you have finished upgrading or installing Cisco Crosswork Situation Manager, you should install the Cisco Crosswork Situation Manager add-ons to ensure you have the latest version.
See Install Cisco Add-ons for more information on how to install the Cisco Crosswork Situation Manager add-ons.
The basic HA installation configuration has three servers: two for the primary and secondary clusters, and a redundancy server.
You can do a basic HA install using either RPM or tarball.
Before you start, you must complete the appropriate Prepare to Install Cisco Crosswork Situation Manager process.
See Basic HA Installation - Tarball and Basic HA (RPM) Install for instructions on how to complete the basic HA install.
This topic describes the basic High Availability (HA) installation for Cisco Crosswork Situation Manager using RPM. This installation configuration has three servers: two for the primary and secondary clusters, and a redundancy server. A three-server installation is good for user acceptance testing (UAT) or pre-production. For production installations, Cisco recommends five, seven, or nine servers.
This topic describes how to perform the following tasks for the core Cisco Crosswork Situation Manager components:
1. Install the Cisco Crosswork Situation Manager packages and set the environment variables.
2. Set up the Percona XtraDB database and HA Proxy.
3. Configure the RabbitMQ message broker and Elasticsearch search service.
4. Configure high availability for the Cisco Crosswork Situation Manager core processing components.
5. Initialize the user interface (UI).
6. Configure high availability for data ingestion.
Before you start to configure your highly available deployment of Cisco Crosswork Situation Manager:
1. Familiarize yourself with the single-server deployment process: Install Cisco Crosswork Situation Manager and Upgrade Cisco Crosswork Situation Manager.
2. Read the High Availability Overview and review the HA Reference Architecture.
3. Verify that the hosts can access the required ports on the other hosts in the group. See HA Reference Architecture for more information.
4. Verify that you have root access to all three servers. You must perform this installation as the root user.
5. Complete either the Online RPM pre-installation or Cisco Crosswork Situation Manager - Offline RPM pre-installation instructions.
Before you install the Cisco Crosswork Situation Manager packages, perform the pre-installation tasks on all three servers.
Install the Cisco Crosswork Situation Manager packages on all three servers. Make sure you install the version you want by changing the VERSION number (8.0.0.1 in the following example):
Primary, Secondary and Redundancy servers:
VERSION=8.0.0.1; yum -y install moogsoft-server-${VERSION} \
moogsoft-db-${VERSION} \
moogsoft-ccsm-${VERSION} \
moogsoft-utils-${VERSION} \
moogsoft-search-${VERSION} \
moogsoft-ui-${VERSION} \
moogsoft-common-${VERSION} \
moogsoft-mooms-${VERSION} \
moogsoft-integrations-${VERSION} \
moogsoft-integrations-ui-${VERSION}
Edit the ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the .bashrc file:
source ~/.bashrc
Install the Percona nodes and initialize the database on the primary server. Substitute the IP addresses of your servers and choose the password for the sstuser. Press <Enter> at the password prompt during initialization.
Primary server:
bash install_percona_nodes.sh -p -i <PRIMARY_IP>,<SECONDARY_IP>,<REDUNDANCY_IP> -u sstuser -w <SSTPASSWORD>
moog_init_db.sh -qIu root
Install the Percona nodes on the secondary and redundancy servers. Substitute the IP addresses of your servers and use the same sstuser password as the primary server. Do not initialize the database on these servers.
Secondary and Redundancy servers:
bash install_percona_nodes.sh -i <PRIMARY_IP>,<SECONDARY_IP>,<REDUNDANCY_IP> -u sstuser -w <SSTPASSWORD>
To verify that the Percona initialization was successful, run the following command on all three servers. Substitute the IP address of your primary server:
curl http://<PRIMARY_IP>:9198
If successful, you see the following message:
Percona XtraDB Cluster Node is synced
Install HA Proxy on the primary and secondary servers. Substitute the IP addresses of your servers.
Primary and Secondary servers:
$MOOGSOFT_HOME/bin/utils/haproxy_installer.sh -l 3309 -c -i <PRIMARY_IP>:3306,<SECONDARY_IP>:3306,<REDUNDANCY_IP>:3306
Run the following script to confirm successful installation:
$MOOGSOFT_HOME/bin/utils/check_haproxy_connections.sh
If successful, you see a script output similar to the following example:
HAProxy Connection Counts
Frontend:
0.0.0.0:3309 : 27
Backend:
mysql_node_1 172.31.82.211:3306 : 27
mysql_node_2 172.31.82.133:3306 : 0
mysql_node_3 172.31.85.42:3306 : 0
Initialize and configure RabbitMQ on all three servers.
Primary, Secondary and Redundancy servers:
Substitute a name for your zone.
moog_init_mooms.sh -pz <MY_ZONE>
The primary Erlang cookie is located at /var/lib/rabbitmq/.erlang.cookie. The Erlang cookie must be the same for all RabbitMQ nodes. Replace the Erlang cookie on the secondary and redundancy servers with the Erlang cookie from the primary server. Make the cookies on the secondary and redundancy servers read-only:
chmod 400 /var/lib/rabbitmq/.erlang.cookie
You may need to change the file permissions on the secondary and redundancy Erlang cookies first to allow those files to be overwritten. For example:
chmod 406 /var/lib/rabbitmq/.erlang.cookie
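One way to replace the cookies is to copy the primary cookie over with scp. The hostname below is a placeholder for your own server:

```
# Run on the primary server; <SECONDARY_HOST> is a placeholder.
scp /var/lib/rabbitmq/.erlang.cookie root@<SECONDARY_HOST>:/var/lib/rabbitmq/.erlang.cookie
```

Repeat for the redundancy server, then make each copied cookie read-only again as described above.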
Restart RabbitMQ on the secondary and redundancy servers and join the cluster. Substitute the short hostname of your primary server and the name of your zone.
The short hostname is the full hostname excluding the DNS domain name. For example, if the hostname is ip-172-31-82-78.ec2.internal, the short hostname is ip-172-31-82-78. To find out the short hostname, run rabbitmqctl cluster_status on the primary server.
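The short-hostname rule can be expressed with bash parameter expansion; the FQDN below is the example from the text:

```shell
# Strip the DNS domain: remove everything from the first dot onwards.
fqdn="ip-172-31-82-78.ec2.internal"
short_hostname="${fqdn%%.*}"
echo "$short_hostname"   # ip-172-31-82-78
```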
Secondary and Redundancy servers:
systemctl restart rabbitmq-server
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@<PRIMARY_SHORT_HOSTNAME>
rabbitmqctl start_app
rabbitmqctl set_policy -p <MY_ZONE> ha-all ".+\.HA" '{"ha-mode":"all"}'
Run rabbitmqctl cluster_status to get the cluster status. Example output is as follows:
Cluster status of node rabbit@ip-172-31-93-201 ...
[{nodes,[{disc,['rabbit@ip-172-31-82-211','rabbit@ip-172-31-85-42','rabbit@ip-172-31-93-201']}]},
{running_nodes,['rabbit@ip-172-31-85-42','rabbit@ip-172-31-82-211','rabbit@ip-172-31-93-201']},
{cluster_name,<<"rabbit@ip-172-31-93-201.ec2.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-172-31-85-42',[]},{'rabbit@ip-172-31-82-211',[]},{'rabbit@ip-172-31-93-201',[]}]}]
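A quick way to confirm that all three nodes joined is to count the entries in the running_nodes tuple. The sample line below is taken from the output above; on a live system, pipe in the output of rabbitmqctl cluster_status instead.

```shell
#!/bin/sh
# Count running nodes in rabbitmqctl cluster_status output.
# Sample line inlined for illustration.
sample="{running_nodes,['rabbit@ip-172-31-85-42','rabbit@ip-172-31-82-211','rabbit@ip-172-31-93-201']}"
count=$(echo "$sample" | grep -o "rabbit@[^']*" | wc -l | tr -d ' ')
echo "running nodes: $count"
```

For a three-server cluster, the count must be 3 before you continue.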
Initialize and configure Elasticsearch on all three servers.
Primary, Secondary and Redundancy servers:
moog_init_search.sh
Uncomment and edit the properties in the Elasticsearch YAML file /etc/elasticsearch/elasticsearch.yml on all three servers as follows:
cluster.name: aiops
node.name: <SERVER_HOSTNAME>
...
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["<PRIMARY_HOSTNAME>","<SECONDARY_HOSTNAME>","<REDUNDANCY_HOSTNAME>"]
discovery.zen.minimum_master_nodes: 1
gateway.recover_after_nodes: 1
node.master: true
Restart Elasticsearch:
systemctl restart elasticsearch
Get the health status of the cluster.
Primary server:
curl -X GET "localhost:9200/_cat/health?v&pretty"
Example cluster health status:
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1580490422 17:07:02  aiops   green           3         3      0   0    0    0        0             0                  -                100.0%
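You can extract the status and node.total columns programmatically to gate the next step on a green, three-node cluster. The sample below mirrors the example output; on a live system, pipe in the output of the curl command above.

```shell
#!/bin/sh
# Pull the status and node.total columns out of _cat/health output.
# Sample mirrors the example; in practice pipe in:
#   curl -s "localhost:9200/_cat/health?v&pretty"
sample='epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1580490422 17:07:02  aiops   green  3          3         0      0   0    0    0        0             -                  100.0%'

# Row 2 is the data row; field 4 is status, field 5 is node.total.
set -- $(echo "$sample" | awk 'NR==2 { print $4, $5 }')
status=$1
nodes=$2
echo "status=$status nodes=$nodes"
```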
The minimum and maximum JVM heap sizes must be large enough to ensure that Elasticsearch starts.
See Finalize and Validate the Install for more information.
You can enable password authentication on Elasticsearch by editing the $MOOGSOFT_HOME/config/system.conf configuration file. You can use either an unencrypted password or an encrypted password, but you cannot use both.
You should use an encrypted password in the configuration file if you do not want users with configuration access to be able to access integrated systems.
To enable unencrypted password authentication on Elasticsearch, set the following properties in the system.conf file:
"search":
{
...
“username” : <username>,
“password” : <password>,
...
}
To enable encrypted password authentication on Elasticsearch, set the following properties in the system.conf file:
"search":
{
...
“username” : <username>,
“encrypted_password” : <encrypted password>
...
}
To initialize Elasticsearch with password authentication, run:
moog_init_search.sh -a username:password
or:
moog_init_search.sh --auth username:password
If you run moog_init_search without the -a/--auth parameters, you will not enable password authentication in Elasticsearch.
See Moog Encryptor for more information on how to encrypt passwords stored in the system.conf file.
You can also manually add authentication to the Elasticsearch configuration. You should do this if you have your own local Elasticsearch installation. See the Elasticsearch documentation on configuring security for more information.
Configure Cisco Crosswork Situation Manager by editing the Moogfarmd and system configuration files.
Primary and Secondary servers:
Edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone, the server hostnames, and the cluster names.
"mooms" :
{
...
"zone" : "<MY_ZONE>",
"brokers" : [
{"host" : "<PRIMARY_HOSTNAME>", "port" : 5672},
{"host" : "<SECONDARY_HOSTNAME>", "port" : 5672},
{"host" : "<REDUNDANCY_HOSTNAME>", "port" : 5672}
],
...
"cache_on_failure" : true,
...
"search" :
{
...
"nodes" : [
{"host" : "<PRIMARY_HOSTNAME>", "port" : 9200},
{"host" : "<SECONDARY_HOSTNAME>", "port" : 9200},
{"host" : "<REDUNDANCY_HOSTNAME>", "port" : 9200}
]
...
"failover" :
{
"persist_state" : true,
"hazelcast" :
{
"hosts" : ["<PRIMARY_HOSTNAME>","<SECONDARY_HOSTNAME>"],
"cluster_per_group" : true
},
"automatic_failover" : true
}
...
"ha":
{ "cluster": "<CLUSTER_NAME, PRIMARY or SECONDARY>" }
Uncomment and edit the following properties in $MOOGSOFT_HOME/config/moog_farmd.conf. Note the importance of the initial comma. Delete the cluster line in this section of the file.
Primary server
,
ha:
{
group: "moog_farmd",
instance: "moog_farmd",
default_leader: true,
start_as_passive: false
}
Secondary server
,
ha:
{
group: "moog_farmd",
instance: "moog_farmd",
default_leader: false,
start_as_passive: false
}
Start Moogfarmd on the primary and secondary servers:
systemctl start moogfarmd
After starting Moogfarmd on the primary and secondary servers, run the HA Control command line utility ha_cntl -v to check the status of Moogfarmd. Example output is as follows:
Moogsoft AIOps Version 8.0.0.1
(C) Copyright 2012-2020 Moogsoft, Inc.
All rights reserved.
Executing: ha_cntl
Getting system status
Cluster: [PRIMARY] active
Process Group: [moog_farmd] Active (only leader should be active)
Instance: [primary] Active Leader
Component: Alert Workflows - running
Component: AlertBuilder - running
Component: AlertMgr - not running
Component: AlertRulesEngine - not running
Component: Default Cookbook - running
Component: Enricher - not running
Component: Enrichment Workflows - running
Component: Event Workflows - running
Component: Feedback - not running
Component: Housekeeper - running
Component: Indexer - running
Component: MaintenanceWindowManager - running
Component: Notifier - not running
Component: Scheduler - not running
Component: Situation Workflows - running
Component: SituationMgr - running
Component: SituationRootCause - running
Component: TeamsMgr - running
Cluster: [SECONDARY] partially active
Process Group: [moog_farmd] Passive (only leader should be active)
Instance: [secondary] Passive Leader
Component: Alert Workflows - not running (will run on activation)
Component: AlertBuilder - not running (will run on activation)
Component: AlertMgr - not running
Component: AlertRulesEngine - not running
Component: Enricher - not running
Component: Enrichment Workflows - not running (will run on activation)
Component: Event Workflows - not running (will run on activation)
Component: Feedback - not running
Component: Housekeeper - not running (will run on activation)
Component: Indexer - not running (will run on activation)
Component: MaintenanceWindowManager - not running (will run on activation)
Component: Notifier - not running
Component: Scheduler - not running
Component: Situation Workflows - not running (will run on activation)
Component: SituationMgr - not running (will run on activation)
Component: SituationRootCause - not running (will run on activation)
Component: TeamsMgr - not running (will run on activation)
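To script a quick check against this output, you can extract which cluster ha_cntl reports as fully active (as opposed to partially active). The sample lines are taken from the output above; in practice, pipe in the output of ha_cntl -v.

```shell
#!/bin/sh
# Extract the fully active cluster from ha_cntl -v output.
# Sample lines inlined; pipe in real ha_cntl -v output on a live system.
sample='Cluster: [PRIMARY] active
Cluster: [SECONDARY] partially active'

active=$(echo "$sample" | awk '
/^Cluster:/ && $NF == "active" && $(NF-1) != "partially" {
  gsub(/[][]/, "", $2)   # strip the brackets around the cluster name
  print $2
}')
echo "fully active cluster: $active"
```

Exactly one cluster should be fully active at any time; two fully active clusters indicate a split-brain condition.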
For more information, see the HA Control Utility Command Reference.
Run the initialization script moog_init_ui.sh on the primary server. Substitute the name of your RabbitMQ zone and primary hostname.
When asked if you want to change the configuration hostname, say yes and enter the public URL for the server.
Primary server:
moog_init_ui.sh -twfz <MY_ZONE> -c <PRIMARY_HOSTNAME>:15672 -m <PRIMARY_HOSTNAME>:5672 -s <PRIMARY_HOSTNAME>:9200 -d <PRIMARY_HOSTNAME>:3309 -n
Edit the servlets settings on the primary server in the file $MOOGSOFT_HOME/config/servlets.conf. Note the importance of the initial comma.
,ha :
{
cluster: "primary",
instance: "servlets",
group: "servlets_primary",
start_as_passive: false
}
Start Apache Tomcat on the primary server:
systemctl start apache-tomcat
Restart Moogfarmd:
systemctl restart moogfarmd
Run the initialization script moog_init_ui.sh on the secondary server. Substitute the name of your RabbitMQ zone.
When asked if you want to change the configuration hostname, say yes and enter the public URL for the server.
Secondary server:
moog_init_ui.sh -twfz <MY_ZONE> -c <SECONDARY_HOSTNAME>:15672 -m <SECONDARY_HOSTNAME>:5672 -s <SECONDARY_HOSTNAME>:9200 -d <SECONDARY_HOSTNAME>:3309 -n
Edit the servlets settings in the secondary server $MOOGSOFT_HOME/config/servlets.conf file. Note the importance of the initial comma.
,ha :
{
cluster: "secondary",
instance: "servlets",
group: "servlets_secondary",
start_as_passive: false
}
Start Apache Tomcat on the secondary server:
systemctl start apache-tomcat
Restart Moogfarmd:
systemctl restart moogfarmd
Run the HA Control command line utility ha_cntl -v to check the status of the UI:
Moogsoft AIOps Version 8.0.0.1
(C) Copyright 2012-2020 Moogsoft, Inc.
All rights reserved.
Executing: ha_cntl
Getting system status
Cluster: [PRIMARY] active
...
Process Group: [servlets_primary] Active (no leader - all can be active)
Instance: [servlets] Active
Component: moogpoller - running
Component: moogsvr - running
Component: situation_similarity - running
Component: toolrunner - running
Cluster: [SECONDARY] partially active
...
Process Group: [servlets_secondary] Active (no leader - all can be active)
Instance: [servlets] Active
Component: moogpoller - running
Component: moogsvr - running
Component: situation_similarity - running
Component: toolrunner - running
For more information, see the HA Control Utility Command Reference.
There are two types of HA configuration for LAMs: Active/Active and Active/Passive.
1. Receiving LAMs that listen for events are configured as Active/Active. For example, the REST LAM.
2. Polling LAMs are configured as Active/Passive. For example, the SolarWinds LAM.
Every LAM has its own configuration file under $MOOGSOFT_HOME/config/. This example references rest_lam.conf and solarwinds_lam.conf.
Primary and Secondary servers
Edit the HA properties in the primary and secondary servers' LAM configuration files. Cisco Crosswork Situation Manager automatically manages the active and passive role for the LAMs in a single process group:
# Receiving LAM (Active / Active)
# Configuration on Primary
ha:
{
group : "rest_lam_primary",
instance : "rest_lam",
duplicate_source : false
},
...
# Configuration on Secondary
ha:
{
group : "rest_lam_secondary",
instance : "rest_lam",
duplicate_source : false
},
# Polling LAM (Active / Passive)
# Configuration on Primary
ha:
{
group : "solarwinds_lam",
instance : "solarwinds_lam",
only_leader_active : true,
default_leader : true,
accept_conn_when_passive : false,
duplicate_source : false
},
...
# Configuration on Secondary
ha:
{
group : "solarwinds_lam",
instance : "solarwinds_lam",
only_leader_active : true,
default_leader : false,
accept_conn_when_passive : false,
duplicate_source : false
},
Start the LAMs:
systemctl start restlamd
systemctl start solarwindslamd
Run the HA Control command line utility ha_cntl -v to check the status of the LAMS:
Moogsoft AIOps Version 8.0.0.1
(C) Copyright 2012-2020 Moogsoft, Inc.
All rights reserved.
Executing: ha_cntl
Getting system status
Cluster: [PRIMARY] active
...
Process Group: [rest_lam_primary] Active (no leader - all can be active)
Instance: [rest_lam] Active
...
Process Group: [solarwinds_lam] Active (only leader should be active)
Instance: [solarwinds_lam] Active Leader
Cluster: [SECONDARY] partially active
...
Process Group: [rest_lam_secondary] Passive (no leader - all can be active)
Instance: [rest_lam] Active
...
Process Group: [solarwinds_lam] Passive (only leader should be active)
Instance: [solarwinds_lam] Passive
For more information, see the HA Control Utility Command Reference.
This topic describes the basic High Availability (HA) installation of Cisco Crosswork Situation Manager from the tarball distribution. This configuration uses three servers: two for the primary and secondary clusters, and a redundancy server.
This topic describes how to perform the following tasks for the core Cisco Crosswork Situation Manager components:
· Stage the installation files on all servers.
· Install the Cisco Crosswork Situation Manager packages and set the environment variables.
· Set up the Percona XtraDB database and HA Proxy.
· Configure the RabbitMQ message broker and Elasticsearch search service.
· Configure high availability for the Cisco Crosswork Situation Manager core processing components.
· Initialize the user interface (UI).
· Configure high availability for data ingestion.
Before you start the tarball basic HA installation of Cisco Crosswork Situation Manager:
· Familiarize yourself with the single-server deployment process: Install Cisco Crosswork Situation Manager and Upgrade Cisco Crosswork Situation Manager.
· Read the High Availability Overview and the HA Reference Architecture.
· Verify that the hosts can access the required ports on the other hosts in the group. See HA Reference Architecture for more information.
· Complete either the # or # instructions.
· Log in as the Linux user for the primary, secondary and redundancy servers. If you have not yet done so, create a working directory in the overall installation directory, and move the pre-install files to this directory.
Refer to the # or # instructions for information on how big the working directory should be.
For the following example, /opt/moogsoft is the installation directory and /opt/moogsoft/v800wd is the working directory.
mkdir /opt/moogsoft/{VERSION} (recommend making the Linux user the owner of /opt/moogsoft)
cd /opt/moogsoft/{VERSION}
mv /tmp/preinstall_files{VERSION}.tgz /opt/moogsoft/{VERSION}
tar xzf preinstall_files{VERSION}.tgz
· Install Kernel Asynchronous I/O (AIO) Support for Linux. For example:
mkdir -p ~/install/libraries/
mv ./libaio-0.3.109-13.el7.x86_64.rpm ~/install/libraries
cd ~/install/libraries
rpm2cpio ./libaio-0.3.109-13.el7.x86_64.rpm | cpio -idmv && \
rm -f ./libaio-0.3.109-13.el7.x86_64.rpm && \
rm -f ~/install/libraries/lib64/libaio.so.1 && \
ln -s ~/install/libraries/lib64/libaio.so.1.0.1 ~/install/libraries/lib64/libaio.so.1 && \
echo "export LD_LIBRARY_PATH=`pwd`/lib64:\$LD_LIBRARY_PATH" >> ~/.bashrc && \
source ~/.bashrc
cd -
· Install libgfortran. For example:
mv ./libquadmath-4.8.5-39.el7.x86_64.rpm ./libgfortran-4.8.5-39.el7.x86_64.rpm ~/install/libraries/
cd ~/install/libraries/
for PACKAGE in libquadmath-4.8.5-39.el7.x86_64.rpm libgfortran-4.8.5-39.el7.x86_64.rpm; do
rpm2cpio $PACKAGE | cpio -idmv && \
rm -f $PACKAGE
done
echo "export LD_LIBRARY_PATH=$(pwd)/usr/lib64:\$LD_LIBRARY_PATH" >> ~/.bashrc
source ~/.bashrc
cd -
· Install Percona dependencies on all servers that will house a database node (requires root permissions):
yum install *.rpm
Before you install Cisco Crosswork Situation Manager:
· Log in as the Linux user on the primary, secondary and redundancy servers and go to the working directory on each server.
· Make sure that the directories where you will install Cisco Crosswork Situation Manager meet the size requirements stated in the # or # instructions. Some files such as the Percona database and Elasticsearch files will be installed into the Linux user's HOME/install directory.
· Remove existing environment variables such as $MOOGSOFT_HOME from previous installations.
· Run the following commands on the primary, secondary and redundancy servers that will house a database node:
cp percona-xtrabackup-2.4.14-Linux-x86_64.libgcrypt153.tar.gz ~/install
cp Percona-XtraDB-Cluster-5.7.26-rel29-31.37.1.Linux.x86_64.ssl102.tar.gz ~/install
cp socat-1.7.3.2-2.el7.x86_64.rpm ~/install
· Run the Percona install script on the primary database server. Substitute the IP addresses of your servers and choose the password for the sstuser.
bash install_percona_nodes_tarball.sh -p -i <primary_ip>,<secondary_ip>,<redundancy_ip> -u sstuser -w <sstpassword>
· To verify that the Percona install was successful, run the following command on the primary server. Substitute the IP address of your primary server:
curl http://<primary_ip>:9198
If successful, you see the following message:
Percona XtraDB Cluster Node is synced.
· Now that you've installed the Percona database, you can install Cisco Crosswork Situation Manager. Go to your working directory on the primary server, and extract the Cisco Crosswork Situation Manager distribution archive:
source ~/.bashrc
cd /opt/moogsoft/v{VERSION}
tar -xf moogsoft-aiops-{VERSION}.tgz
· Run the installation script in your primary server working directory to install Cisco Crosswork Situation Manager:
bash moogsoft-aiops-install-{VERSION}.sh
· When prompted, enter the directory in which to install Cisco Crosswork Situation Manager on the primary server. For example, /opt/moogsoft/aiops. The script guides you through the installation process. You can modify the default installation directory displayed for your environment.
Set the $MOOGSOFT_HOME environment variable to point to your primary server installation directory. In this example, /opt/moogsoft/aiops is the $MOOGSOFT_HOME directory.
echo "export MOOGSOFT_HOME=/opt/moogsoft/aiops" >> ~/.bashrc
Insert the following directories after PATH= in ~/.bashrc and source the file:
.:/opt/moogsoft/aiops/bin:/opt/moogsoft/aiops/bin/utils:/opt/moogsoft/aiops/cots/erlang/bin:/opt/moogsoft/aiops/cots/rabbitmq-server/sbin:
source ~/.bashrc
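The same PATH change can be made programmatically rather than by hand-editing ~/.bashrc. This sketch uses the example installation directory /opt/moogsoft/aiops and verifies that the utils directory ends up on the PATH:

```shell
#!/bin/sh
# Programmatic equivalent of the PATH edit above.
# /opt/moogsoft/aiops is the example installation directory.
MOOGSOFT_HOME=/opt/moogsoft/aiops
PATH=".:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils:$MOOGSOFT_HOME/cots/erlang/bin:$MOOGSOFT_HOME/cots/rabbitmq-server/sbin:$PATH"
export MOOGSOFT_HOME PATH

# Confirm the utils directory is now resolvable on PATH.
case ":$PATH:" in
  *":$MOOGSOFT_HOME/bin/utils:"*) echo "PATH updated" ;;
  *)                              echo "PATH update failed" ;;
esac
```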
· Configure the Tool Runner to execute locally:
sed -i 's/# execute_locally: false,/,execute_locally: true/1' $MOOGSOFT_HOME/config/servlets.conf
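To see exactly what that sed command changes, here is a dry run on a sample line taken from servlets.conf: the commented-out setting is replaced with an active one, with a leading comma so that it chains onto the preceding property.

```shell
#!/bin/sh
# Dry run of the Tool Runner sed substitution on a sample servlets.conf line.
result=$(printf '# execute_locally: false,\n' \
  | sed 's/# execute_locally: false,/,execute_locally: true/')
echo "$result"
```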
Initialize the database on the primary server:
moog_init_db.sh -qIu root
When prompted for a password, enter the password for the root database user instead of the Linux user. If you are installing Percona on this machine for the first time, leave the password blank and press Enter to continue. The script prompts you to accept the End User License Agreement (EULA) and guides you through the initialization process.
· Run the Percona install script on the secondary and redundancy servers. Substitute the IP addresses of your servers and use the same password as for the primary server.
bash install_percona_nodes_tarball.sh -i <primary IP address>,<secondary IP address>,<redundancy IP address> -u sstuser -w <sstpassword>
· To verify that the Percona install was successful, run the following command on the secondary and redundancy servers. Substitute the IP address of your primary server:
curl http://<primary IP address>:9198
If successful, you see the following message:
Percona XtraDB Cluster Node is synced
It takes a moment for the secondary and redundancy servers to sync. If you do not get this message, wait for a few moments and then try again.
· Now that you've installed the Percona database, you can install Cisco Crosswork Situation Manager. Go to your working directory on the secondary and redundancy servers, and extract the Cisco Crosswork Situation Manager distribution archive:
source ~/.bashrc
cd /opt/moogsoft/v{VERSION}
tar -xf moogsoft-aiops-{VERSION}.tgz
· Run the installation script in your working directory on both servers to install Cisco Crosswork Situation Manager:
bash moogsoft-aiops-install-{VERSION}.sh
· When prompted, enter the directory in which to install Cisco Crosswork Situation Manager. For example, /opt/moogsoft/aiops. The script guides you through the installation process. You can modify the default installation directory displayed for your environment.
· Set the $MOOGSOFT_HOME environment variable to point to your installation directory, and add $MOOGSOFT_HOME/bin/utils to the path. For example:
echo "export MOOGSOFT_HOME=/opt/moogsoft/aiops" >> ~/.bashrc
· Insert the following code after PATH= in ~/.bashrc:
.:/opt/moogsoft/aiops/bin:/opt/moogsoft/aiops/bin/utils:/opt/moogsoft/aiops/cots/erlang/bin:/opt/moogsoft/aiops/cots/rabbitmq-server/sbin:
· Source the ~/.bashrc file:
source ~/.bashrc
· Configure the Tool Runner to execute locally:
sed -i 's/# execute_locally: false,/,execute_locally: true/1' $MOOGSOFT_HOME/config/servlets.conf
Install HA Proxy on the primary and secondary servers (root permission required).
1. Run the following command, using your chosen value for MOOGSOFT_HOME. Substitute the IP addresses of your servers.
export MOOGSOFT_HOME=/opt/moogsoft/aiops
$MOOGSOFT_HOME/bin/utils/haproxy_installer.sh -l 3309 -c -i <primary_ip>:3306,<secondary_ip>:3306,<redundancy_ip>:3306
2. Restart Apache Tomcat and Moogfarmd to start using HA Proxy:
process_cntl --process_name apache-tomcat restart
process_cntl --process_name moog_farmd restart
3. Run the following script to confirm successful installation:
$MOOGSOFT_HOME/bin/utils/check_haproxy_connections.sh
If successful, you see a script output similar to the following example:
HAProxy Connection Counts
Frontend:
0.0.0.0:3309 : 27
Backend:
mysql_node_1 172.31.82.211:3306 : 27
mysql_node_2 172.31.82.133:3306 : 0
mysql_node_3 172.31.85.42:3306 : 0
Initialize and configure RabbitMQ on all three servers.
Primary, Secondary and Redundancy servers:
1. Run the following command. Substitute a name for your zone. You must use the same zone name for all servers.
moog_init_mooms.sh -pz <my_zone>
2. Stop RabbitMQ on the secondary and redundancy servers:
process_cntl --process_name rabbitmq stop
3. The primary server erlang cookie is located in the Linux user's HOME directory: ~/.erlang.cookie. The erlang cookie must be the same for all RabbitMQ nodes.
Replace the erlang cookie on the secondary and redundancy servers with the erlang cookie from the primary server. You may need to change the file permissions on the secondary and redundancy erlang cookies first to allow those files to be overwritten.
After copying the cookie, as the Linux user, make the cookies on the secondary and redundancy servers read-only. For example:
cd ~
chmod 406 ~/.erlang.cookie
mv .erlang.cookie .erlang.cookie.orig
scp moogsoft@<PRIMARY_IP>:.erlang.cookie .
chmod 400 .erlang.cookie
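Every node's cookie must be byte-identical for RabbitMQ clustering to work. On the real servers, you would compare ~/.erlang.cookie across nodes (for example with md5sum or cmp over copies fetched via scp). The sketch below simulates this with temporary files standing in for two nodes' cookies; the cookie value is a made-up example.

```shell
#!/bin/sh
# Sanity check: the erlang cookie must be identical on every node.
# Temporary files stand in for two nodes' ~/.erlang.cookie files;
# the cookie value is illustrative only.
primary=$(mktemp)
secondary=$(mktemp)
printf 'SWQOKODSQALRPCLNMEQG' > "$primary"
printf 'SWQOKODSQALRPCLNMEQG' > "$secondary"

if cmp -s "$primary" "$secondary"; then
  status="match"
else
  status="differ"
fi
echo "cookies $status"
rm -f "$primary" "$secondary"
```

If the cookies differ, join_cluster fails with an authentication error; re-copy the primary cookie before retrying.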
Secondary and Redundancy servers:
· Restart RabbitMQ on the secondary and redundancy servers and join those servers to the cluster. Substitute the short hostname of your primary server and the name of your zone.
The short hostname is the full hostname excluding the DNS domain name. For example, if the hostname is ip-172-31-82-78.ec2.internal, the short hostname is ip-172-31-82-78. To find out the short hostname, run rabbitmqctl cluster_status on the primary server.
The restart command differs depending on whether you are a root or a non-root user:
1. Root: systemctl restart rabbitmq-server
2. Non-root: process_cntl --process_name rabbitmq restart
<restart command>
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@<primary_short_hostname>
rabbitmqctl start_app
rabbitmqctl set_policy -p <my_zone> ha-all ".+\.HA" '{"ha-mode":"all"}'
1. Run rabbitmqctl cluster_status to get the cluster status. Example output is as follows:
Cluster status of node rabbit@secondary ...
[{nodes,[{disc,[rabbit@primary,rabbit@secondary]}]},
{running_nodes,[rabbit@primary,rabbit@secondary]},
{cluster_name,<<"rabbit@secondary">>},
{partitions,[]},
{alarms,[{rabbit@primary,[]},{rabbit@secondary,[]}]}]
You must start Elasticsearch with specific memory parameters to support HA. The process_cntl utility starts Elasticsearch.
· Run the following on the primary, secondary and redundancy servers to adjust the process_cntl utility:
cp -p $MOOGSOFT_HOME/bin/utils/process_cntl $MOOGSOFT_HOME/bin/utils/process_cntl.orig
sed -i 's/-Xms256m/-Xms2g/' $MOOGSOFT_HOME/bin/utils/process_cntl
· Initialize and configure Elasticsearch on all three servers:
· Uncomment and edit the properties in $MOOGSOFT_HOME/cots/elasticsearch/config/elasticsearch.yml on all three servers as follows:
cluster.name: aiops
node.name: <server_hostname>
...
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["<PRIMARY_HOSTNAME>","<SECONDARY_HOSTNAME>","<REDUNDANCY_HOSTNAME>"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 1
node.master: true
· Restart Elasticsearch on all three servers:
process_cntl --process_name elasticsearch restart
· Get the health status of the cluster by running the following on the primary server:
curl -X GET "localhost:9200/_cat/health?v&pretty"
Example cluster health status:
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1580490422 17:07:02  aiops   green           3         3      0   0    0    0        0             0                  -                100.0%
If you've configured all three servers, the node.total will be 3.
Configure Cisco Crosswork Situation Manager by editing the Moogfarmd and system configuration files on the primary and secondary servers.
Primary and Secondary servers:
1. Make a copy of the $MOOGSOFT_HOME/config/system.conf file:
cp -p $MOOGSOFT_HOME/config/system.conf $MOOGSOFT_HOME/config/system.conf.orig
2. Edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone, the server hostnames, and the cluster names:
"mooms" :
{
...
"zone" : "<my_zone>",
"brokers" : [
{"host" : "<primary_hostname>", "port" : 5672},
{"host" : "<secondary_hostname>", "port" : 5672},
{"host" : "<redundancy_hostname>", "port" : 5672}
],
...
"cache_on_failure" : true,
...
"search" :
{
...
"nodes" : [
{"host" : "<primary_hostname>", "port" : 9200},
{"host" : "<secondary_hostname>", "port" : 9200},
{"host" : "<redundancy_hostname>", "port" : 9200}
]
...
"failover" :
{
"persist_state" : true,
"hazelcast" :
{
"hosts" : ["<primary_hostname>","<secondary_hostname>"],
"cluster_per_group" : true
},
"automatic_failover" : true
}
...
"ha":
{ "cluster": "<cluster_name, primary or secondary>" }
3. Make a copy of the $MOOGSOFT_HOME/config/moog_farmd.conf file:
cp -p $MOOGSOFT_HOME/config/moog_farmd.conf $MOOGSOFT_HOME/config/moog_farmd.conf.orig
4. Uncomment and edit the following properties in $MOOGSOFT_HOME/config/moog_farmd.conf. Note the importance of the initial comma. Delete the cluster line in this section of the file:
Primary server
,
ha:
{
group: "moog_farmd",
instance: "moog_farmd",
default_leader: true,
start_as_passive: false
}
Secondary server
,
ha:
{
group: "moog_farmd",
instance: "moog_farmd",
default_leader: false,
start_as_passive: false
}
5. Start Moogfarmd on the primary and secondary servers:
process_cntl --process_name moog_farmd start
Run the initialization script moog_init_ui.sh on the primary server. Substitute the name of your RabbitMQ zone and primary hostname.
When asked if you want to change the configuration hostname, say yes and enter the public hostname or domain name used in the URL to access Cisco Crosswork Situation Manager in a browser. If the URL uses a different hostname or domain name (for example, an alias), the system rejects your login.
Primary server:
moog_init_ui.sh -twfz <my_zone> -c <primary_hostname>:15672 -m <primary_hostname>:5672 -s <primary_hostname>:9200 -d <primary_hostname>:3309 -n
Edit the servlets settings on the primary server in the file $MOOGSOFT_HOME/config/servlets.conf. Note the importance of the initial comma.
,ha :
{
instance: "servlets",
group: "servlets_primary",
start_as_passive: false
}
Restart Apache Tomcat and Moogfarmd on the primary server:
process_cntl --process_name apache-tomcat restart
process_cntl --process_name moog_farmd restart
Run the initialization script moog_init_ui.sh on the secondary server. Substitute the name of your RabbitMQ zone.
When asked if you want to change the configuration hostname, say yes and enter the public hostname or domain name used in the URL to access Cisco Crosswork Situation Manager in a browser.
Secondary server:
moog_init_ui.sh -twfz <MY_ZONE> -c <SECONDARY_HOSTNAME>:15672 -m <SECONDARY_HOSTNAME>:5672 -s <SECONDARY_HOSTNAME>:9200 -d <SECONDARY_HOSTNAME>:3309 -n
Edit the servlets settings in the secondary server $MOOGSOFT_HOME/config/servlets.conf file. Note the importance of the initial comma.
,ha :
{
instance: "servlets",
group: "servlets_secondary",
start_as_passive: false
}
Restart Apache Tomcat and Moogfarmd on the secondary server:
process_cntl --process_name apache-tomcat restart
process_cntl --process_name moog_farmd restart
There are two types of HA configuration for LAMs: Active/Active and Active/Passive.
· Receiving LAMs that listen for events are configured as Active/Active. For example, the REST LAM.
· Polling LAMs are configured as Active/Passive. For example, the SolarWinds LAM.
Every LAM has its own configuration file under $MOOGSOFT_HOME/config/. This example references rest_lam.conf and solarwinds_lam.conf.
Primary and Secondary servers
Edit the HA properties in the primary and secondary servers' LAM configuration files. Cisco Crosswork Situation Manager automatically manages the active and passive role for the LAMs in a single process group:
# Receiving LAM (Active / Active)
# Configuration on Primary
ha:
{
group : "rest_lam_primary",
instance : "rest_lam",
duplicate_source : false
},
...
# Configuration on Secondary
ha:
{
group : "rest_lam_secondary",
instance : "rest_lam",
duplicate_source : false
},
# Polling LAM (Active / Passive)
# Configuration on Primary
ha:
{
group : "solarwinds_lam",
instance : "solarwinds_lam",
only_leader_active : true,
default_leader : true,
accept_conn_when_passive : false,
duplicate_source : false
},
...
# Configuration on Secondary
ha:
{
group : "solarwinds_lam",
instance : "solarwinds_lam",
only_leader_active : true,
default_leader : false,
accept_conn_when_passive : false,
duplicate_source : false
},
Start the LAMs:
process_cntl --process_name restlamd restart
process_cntl --process_name solarwindslamd restart
Run the HA Control command line utility ha_cntl -v to check the status of the LAMS:
Moogsoft AIOps Version {VERSION}
(C) Copyright 2012-2020 Moogsoft, Inc.
All rights reserved.
Executing: ha_cntl
Getting system status
Cluster: [PRIMARY] active
...
Process Group: [rest_lam_primary] Active (no leader - all can be active)
Instance: [rest_lam] Active
...
Process Group: [solarwinds_lam] Active (only leader should be active)
Instance: [solarwinds_lam] Active Leader
Cluster: [SECONDARY] partially active
...
Process Group: [rest_lam_secondary] Passive (no leader - all can be active)
Instance: [rest_lam] Active
...
Process Group: [solarwinds_lam] Passive (only leader should be active)
Instance: [solarwinds_lam] Passive
For more information, see the HA Control Utility Command Reference.
To verify that the installation has completed successfully, follow the steps outlined in Validate the Installation.
This topic summarizes the installation steps for a fully distributed system running with HA, as shown in the following diagram.
The installation assumes an HA configuration across two clusters, Cluster1 and Cluster2.
Note that the Core instances and polling LAMs are part of the same respective Cisco Crosswork Situation Manager process groups, because they run in an active/passive configuration with auto-failover enabled.
UI stacks and receiving LAMs should run as two distinct Cisco Crosswork Situation Manager process groups, because both instances in each HA pair are active.
Percona XtraDB Cluster is the database product provided with Cisco Crosswork Situation Manager. HAProxy supports features such as query routing to available database targets and load balancing.
If you use the MySQL database, Cisco strongly recommends you migrate from MySQL to Percona XtraDB Cluster and HAProxy. See Post-upgrade steps for more information.
To perform a fully distributed HA installation:
· Set up Percona XtraDB Cluster. See Set Up the Database for HA for more information.
· Set up Core 1 and 2 roles. See Set Up the Core Role for HA for more information.
· Set up HAProxy on the Core, UI and LAM nodes. See Set Up HA Proxy for the Database Role for more information.
· Set up UI 1 and 2 roles. See Set Up the User Interface Role for HA for more information.
· Set up the Redundancy server role. See Set Up the Redundancy Server Role for more information.
· Set up the LAM 1 and 2 roles for an on-premises version of Cisco Crosswork Situation Manager. See Install LAMs (non-SaaS) for more information.
· Set up the LAM 1 and 2 roles for a SaaS version of Cisco Crosswork Situation Manager. See Install LAMs (SaaS) for more information.
To view a list of connectivity ports for a fully distributed HA architecture see Distributed HA system Firewall.
Connectivity within a fully distributed HA architecture:
| Source | Destination | Ports | Bi-directional |
| --- | --- | --- | --- |
| UI 1, UI 2 | Core 1, Core 2 | 3309, 5672, 9200 | - |
| UI 1, UI 2 | RedServ | 5672, 9200 | - |
| UI 1, UI 2 | DB 1, DB 2, DB 3 | 3306, 3309, 9198 | - |
| Core 1 | Core 2 | 5701, 9300, 4369, 5672 | Yes |
| Core 1, Core 2 | RedServ | 9300, 4369, 5672 | Yes |
| Core 1 | Core 2, RedServ | 25672 | - |
| Core 2 | Core 1, RedServ | 25672 | - |
| RedServ | Core 1, Core 2 | 25672 | - |
| Core 1, Core 2 | DB 1, DB 2, DB 3 | 3306, 9198 | - |
| LAM 1, LAM 2 | Core 1, Core 2, RedServ | 5672 | - |
| LAM 1, LAM 2 | DB 1, DB 2, DB 3 | 3306, 9198 | - |
| DB 1 | DB 2, DB 3 | 3306, 4567, 4444, 4568 | Yes |
If any of the default ports have been changed, substitute the new values in the tables above. The ports are used as follows:
| Port | Purpose |
| --- | --- |
| 9200 | Inbound Elasticsearch REST API |
| 9300 | Communication between Elasticsearch nodes within a cluster |
| 5672 | Access to the mooms bus (RabbitMQ) |
| 15672 | Access to the mooms (RabbitMQ) console |
| 4369 | Required for the mooms (RabbitMQ) cluster |
| 5701 | Required for the Hazelcast cluster |
| 8091 | Access to Hazelcast cluster info via Hazelcast's REST API |
| 3309 | Used for initializing UI servers |
| 3306 | Standard MySQL port |
| 4567 | Group communication in Percona XtraDB Cluster |
| 4444 | State Snapshot Transfer in Percona XtraDB Cluster |
| 4568 | Incremental State Transfer in Percona XtraDB Cluster |
| 9198 | Allows HAProxy to check a node's Percona XtraDB Cluster status over HTTP |
| 25672 | RabbitMQ inter-node and CLI tools communication |
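The port assignments above can be sanity-checked from each source host before installation. The following is a hypothetical helper, not part of the product; `core1` and the port list are placeholders that you should replace with your own hostnames and the rows from the tables above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the product: test whether a TCP port
# is reachable using bash's /dev/tcp pseudo-device.
check_port() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example: from a UI node, confirm the Core ports listed in the table.
# "core1" is a placeholder; substitute your own hostnames and ports.
for target in core1:3309 core1:5672 core1:9200; do
  host=${target%:*} port=${target#*:}
  if check_port "$host" "$port"; then
    echo "$target reachable"
  else
    echo "$target BLOCKED or closed"
  fi
done
```

Run the check in both directions for rows marked bi-directional.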
See HA Installation for the full installation steps for a fully distributed system running with HA.
The database layer of Cisco Crosswork Situation Manager for HA uses the Percona XtraDB Cluster mechanism.
In our distributed HA installation, the database components are installed on servers DB 1, DB 2, and DB 3:
Refer to the Distributed HA system Firewall for more information on connectivity within a fully distributed HA architecture.
The sections below detail how to build a Percona XtraDB cluster.
On servers DB 1, 2 and 3, install the following Cisco Crosswork Situation Manager components:
VERSION=8.0.0; yum -y install moogsoft-utils-${VERSION} moogsoft-ccsm-${VERSION} moogsoft-db-${VERSION};
Edit the ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the .bashrc file:
source ~/.bashrc
On DB 1, run install_percona_nodes.sh to install, configure and start Percona Cluster node 1. Substitute the IP addresses of your servers and choose the password for the sstuser. Press <Enter> at the password prompt during initialization.
install_percona_nodes.sh -p -i <DB 1 server ip address>,<DB 2 server ip address>,<DB 3 server ip address> -u sstuser -w <sstpassword>
Cisco advises that you provide IP addresses instead of hostnames for servers running the Percona Cluster in order to reduce network latency. The "sstuser" in the command above is the user that the Percona nodes use to communicate with each other. The script performs the following tasks:
· Disables SELinux and sets the vm.swappiness property to 1.
· Installs the Percona Yum repository.
· Installs the Percona compatibility package.
· Installs Percona XtraDB cluster.
· Installs the Extended Internet Service Daemon (xinetd).
· Creates a my.cnf configuration file based on the server's hardware.
· Configures a mysqlchk service on port 9198 and restarts the xinetd service.
· Starts the first Percona node in bootstrap mode.
· Reconfigures my.cnf to ensure the node will restart in non-bootstrap mode.
On DB 1, run the following command to create the Cisco Crosswork Situation Manager databases (moogdb, moog_reference, historic_moogdb, moog_intdb) and populate them with the required schema:
$MOOGSOFT_HOME/bin/utils/moog_init_db.sh -qIu root --accept-eula <<-EOF
EOF
Note
You do not need to run this command on any of the other nodes. The new schema is replicated automatically around the cluster.
On DB 2, run install_percona_nodes.sh. Substitute the IP addresses of your servers and use the same sstuser password as DB 1. The script will perform the same actions, only this time starting the second Percona node to join the first node as a cluster.
install_percona_nodes.sh -d -i <DB 1 server IP address>,<DB 2 server IP address>,<DB 3 server IP address> -u sstuser -w <sstpassword>
On DB 3, run install_percona_nodes.sh as you did for DB 2. Substitute the IP addresses of your servers and use the same sstuser password as DB 1. The script performs the same actions, this time starting the third Percona node, which joins the existing two-node cluster.
install_percona_nodes.sh -d -i <DB 1 server IP address>,<DB 2 server IP address>,<DB 3 server IP address> -u sstuser -w <sstpassword>
To verify the replication status of each node, run the following commands from a remote server:
curl http://<DB 1 server IP address/hostname>:9198
curl http://<DB 2 server IP address/hostname>:9198
curl http://<DB 3 server IP address/hostname>:9198
If successful, you see the following message:
Percona XtraDB Cluster Node is synced.
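The per-node checks above can be scripted. A minimal sketch, assuming the hostnames db1, db2 and db3 (placeholders for your DB servers) and the standard "is synced" response text; the helper name is ours, not part of the product:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (our naming, not part of the product) around the
# port-9198 replication status check.
is_synced() { grep -q 'is synced'; }   # reads the HTTP response on stdin

# db1/db2/db3 are placeholders; substitute your DB hostnames or IPs.
for db in db1 db2 db3; do
  if curl -sf --max-time 5 "http://${db}:9198" | is_synced; then
    echo "${db}: synced"
  else
    echo "${db}: NOT synced"
  fi
done
```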
In Cisco Crosswork Situation Manager HA architecture, Core 1 and Core 2 run in an active / passive HA pair.
In our distributed HA installation, the Core components are installed on Core 1, Core 2, and Redundancy servers:
Core 1: Core Data Processing 1 (Moogfarmd), Elastic Node 1, RabbitMQ Node 1.
Core 2: Core Data Processing 2 (Moogfarmd), Elastic Node 2, RabbitMQ Node 2.
Redundancy: Elastic Node 3, RabbitMQ Node 3.
Refer to the Distributed HA system Firewall for more information on connectivity within a fully distributed HA architecture.
1. Install the required Cisco Crosswork Situation Manager packages:
VERSION=8.0.0.1; yum -y install moogsoft-server-${VERSION} \
moogsoft-search-${VERSION} \
moogsoft-common-${VERSION} \
moogsoft-mooms-${VERSION} \
moogsoft-integrations-${VERSION} \
moogsoft-integrations-ui-${VERSION}
· Edit your ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
· Source the ~/.bashrc file:
source ~/.bashrc
· Initialize RabbitMQ Cluster Node 1 on the Core 1 server. Substitute a name for your zone.
moog_init_mooms.sh -pz <zone>
· Initialize, configure and start Elasticsearch Cluster Node 1 on the Core 1 server.
1. Initialize Elasticsearch on Core 1:
moog_init_search.sh
2. Uncomment and edit the properties in the /etc/elasticsearch/elasticsearch.yml file on Core 1 as follows:
cluster.name: aiops
node.name: <Core 1 server hostname>
...
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: [ "<Core 1 server hostname>","<Core 2 server hostname>","<Redundancy server hostname>"]
discovery.zen.minimum_master_nodes: 1
gateway.recover_after_nodes: 1
node.master: true
See Finalize and Validate the Install for more information.
You can enable password authentication on Elasticsearch. See Elasticsearch Encryption for more information.
· On Core 1, edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone, the server hostnames, and the cluster names.
"mooms" :
{
...
"zone" : "<zone>",
"brokers" : [
{"host" : "<Core 1 server hostname>", "port" : 5672},
{"host" : "<Core 2 server hostname>", "port" : 5672},
{"host" : "<Redundancy server hostname>", "port" : 5672}
],
...
"cache_on_failure" : true,
...
"search" :
{
...
"nodes" : [
{"host" : "<Core 1 server hostname>", "port" : 9200},
{"host" : "<Core 2 server hostname>", "port" : 9200},
{"host" : "<Redundancy server hostname>", "port" : 9200}
]
...
"failover" :
{
"persist_state" : true,
"hazelcast" :
{
"hosts" : ["<Core 1 server hostname>","<Core 2 server hostname>"],
"cluster_per_group" : true
},
"automatic_failover" : true,
}
...
"ha":
{ "cluster": "PRIMARY" }
Restart Elasticsearch:
systemctl restart elasticsearch
· Uncomment and edit the following properties in $MOOGSOFT_HOME/config/moog_farmd.conf. Note the importance of the initial comma. Delete the cluster line in this section of the file.
,
ha:
{
group: "moog_farmd",
instance: "moog_farmd",
default_leader: true,
start_as_passive: false
}
Start Moogfarmd:
systemctl start moogfarmd
· Install, configure and start HA Proxy on the Core 1 server to connect to Percona XtraDB Cluster.
· Install Cisco Crosswork Situation Manager components on the Core 2 server.
On Core 2 install the following Cisco Crosswork Situation Manager components:
VERSION=8.0.0.1; yum -y install moogsoft-server-${VERSION} \
moogsoft-search-${VERSION} \
moogsoft-common-${VERSION} \
moogsoft-mooms-${VERSION} \
moogsoft-integrations-${VERSION}
· Edit your ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
· Source the ~/.bashrc file:
source ~/.bashrc
· On Core 2 initialize RabbitMQ. Use the same zone name as Core 1:
moog_init_mooms.sh -pz <zone>
· Initialize, configure and start Elasticsearch Cluster Node 2 on the Core 2 server.
— Initialize Elasticsearch on Core 2:
moog_init_search.sh
— Uncomment and edit the properties of the /etc/elasticsearch/elasticsearch.yml file on Core 2 as follows:
cluster.name: aiops
node.name: <Core 2 server hostname>
...
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: [ "<Core 1 server hostname>","<Core 2 server hostname>","<Redundancy server hostname>"]
discovery.zen.minimum_master_nodes: 1
gateway.recover_after_nodes: 1
node.master: true
The minimum and maximum JVM heap sizes must be large enough to ensure that Elasticsearch starts.
See Finalize and Validate the Install for more information.
You can enable password authentication on Elasticsearch. See Elasticsearch Encryption for more information.
1. On Core 2, edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone, the server hostnames, and the cluster names.
"mooms" :
{
...
"zone" : "<zone>",
"brokers" : [
{"host" : "<Core 1 server hostname>", "port" : 5672},
{"host" : "<Core 2 server hostname>", "port" : 5672},
{"host" : "<Redundancy server hostname>", "port" : 5672}
],
...
"cache_on_failure" : true,
...
"search" :
{
...
"nodes" : [
{"host" : "<Core 1 server hostname>", "port" : 9200},
{"host" : "<Core 2 server hostname>", "port" : 9200},
{"host" : "<Redundancy server hostname>", "port" : 9200}
]
...
"failover" :
{
"persist_state" : true,
"hazelcast" :
{
"hosts" : ["<Core 1 server hostname>","<Core 2 server hostname>"],
"cluster_per_group" : true
},
"automatic_failover" : true,
}
...
"ha":
{ "cluster": "SECONDARY" }
Restart Elasticsearch:
systemctl restart elasticsearch
2. Uncomment and edit the following properties in $MOOGSOFT_HOME/config/moog_farmd.conf. Note the importance of the initial comma. Delete the cluster line in this section of the file.
,
ha:
{
group: "moog_farmd",
instance: "moog_farmd",
default_leader: false,
start_as_passive: false
}
Start Moogfarmd:
systemctl start moogfarmd
3. The erlang cookies must be the same for all RabbitMQ nodes. Replace the erlang cookie on Core 2 with the Core 1 erlang cookie located at /var/lib/rabbitmq/.erlang.cookie. Make the Core 2 cookie read-only:
chmod 400 /var/lib/rabbitmq/.erlang.cookie
You may need to change the file permissions on the Core 2 erlang cookie first to allow this file to be overwritten. For example:
chmod 406 /var/lib/rabbitmq/.erlang.cookie
4. Restart the rabbitmq-server service and join the cluster. Substitute the Core 1 short hostname and zone:
systemctl restart rabbitmq-server
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@<Core 1 server short hostname>
rabbitmqctl start_app
The short hostname is the full hostname excluding the DNS domain name. For example, if the hostname is ip-172-31-82-78.ec2.internal, the short hostname is ip-172-31-82-78. To find out the short hostname, run rabbitmqctl cluster_status on Core 1.
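The short hostname can also be derived directly on the node; a quick sketch:

```shell
# Derive the short hostname expected by join_cluster: everything before
# the first dot of the fully qualified hostname.
fqdn=$(hostname -f 2>/dev/null || hostname)
short=${fqdn%%.*}
echo "rabbit@${short}"
```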
5. Apply HA mirrored queues policy. Use the same zone name as Core 1.
rabbitmqctl set_policy -p <zone> ha-all ".+\.HA" '{"ha-mode":"all"}'
6. Run rabbitmqctl cluster_status to verify the cluster status and queue policy. Example output is as follows:
Cluster status of node rabbit@ip-172-31-93-201 ...
[{nodes,[{disc,['rabbit@ip-172-31-82-211','rabbit@ip-172-31-85-42','rabbit@ip-172-31-93-201']}]},
{running_nodes,['rabbit@ip-172-31-85-42','rabbit@ip-172-31-82-211','rabbit@ip-172-31-93-201']},
{cluster_name,<<"rabbit@ip-172-31-93-201.ec2.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-172-31-85-42',[]},{'rabbit@ip-172-31-82-211',[]},{'rabbit@ip-172-31-93-201',[]}]}]
7. Install, configure and start HA Proxy on the Core 2 server to connect to Percona XtraDB Cluster.
You can enable password authentication on Elasticsearch by editing the $MOOGSOFT_HOME/config/system.conf configuration file. You can use either an unencrypted password or an encrypted password, but you cannot use both.
You should use an encrypted password in the configuration file if you do not want users with configuration access to be able to access integrated systems.
To enable unencrypted password authentication on Elasticsearch, set the following properties in the system.conf file:
"search":
{
...
"username" : <username>,
"password" : <password>,
...
}
To enable encrypted password authentication on Elasticsearch, set the following properties in the system.conf file:
"search":
{
...
"username" : <username>,
"encrypted_password" : <encrypted password>
...
}
To initialize Elasticsearch with password authentication, run:
moog_init_search.sh -a username:password
or:
moog_init_search.sh --auth username:password
If you run moog_init_search.sh without the -a/--auth parameter, password authentication is not enabled in Elasticsearch.
See Moog Encryptor for more information on how to encrypt passwords stored in the system.conf file.
You can also manually add authentication to the Elasticsearch configuration. You should do this if you have your own local Elasticsearch installation. See the Elasticsearch documentation on configuring security for more information.
On Core 1, confirm that the moog_farmd process is active:
ha_cntl -v
You should see output indicating that the moog_farmd process is active on Core 1. If the moog_farmd process is active on Core 2, stop moog_farmd on Core 2 and restart it on Core 1.
On Core 1, deactivate moog_farmd on the primary cluster:
ha_cntl --deactivate primary.moog_farmd
Enter "y" when prompted.
Run ha_cntl -v to monitor the moog_farmd process. You will see this process stop on Core 1 and start on Core 2.
To fail the cluster back to its default state, run the following command on Core 2 to deactivate moog_farmd on the secondary cluster:
ha_cntl --deactivate secondary.moog_farmd
On Core 1, activate moog_farmd on the primary cluster:
ha_cntl --activate primary.moog_farmd
Run ha_cntl -v to confirm that moog_farmd is active on the primary cluster.
The UI role includes the Nginx and Apache Tomcat components. There are also a number of Cisco Crosswork Situation Manager webapps (servlets) installed and running within Tomcat, responsible for the following processes:
1. graze: Graze API
2. moogpoller: Dynamic updates to UI
3. moogsvr: Services HTTP requests
4. situation_similarity: Calculates the situation similarity and pushes to UI
5. toolrunner: Services Server Tools
In our distributed HA installation, the UI components are installed on the UI 1 and UI 2 servers.
Refer to the Distributed HA system Firewall for more information on connectivity within a fully distributed HA architecture.
· Install Cisco Crosswork Situation Manager components on the UI primary server.
On UI 1 install the following Cisco Crosswork Situation Manager components:
VERSION=8.0.0.1; yum -y install moogsoft-common-${VERSION} \
moogsoft-integrations-ui-${VERSION} \
moogsoft-ui-${VERSION} \
moogsoft-utils-${VERSION}
Edit the ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the .bashrc file
source ~/.bashrc
· Run the initialization script moog_init_ui.sh on UI 1 to initialize the UI stack. Substitute the name of your RabbitMQ zone and the Core 1 server hostname:
moog_init_ui.sh -twfz <zone> -c <Core 1 server hostname>:15672 -m <Core 1 server hostname>:5672 -s <Core 1 server hostname>:9200 -d <Core 1 server hostname>:3309 -n
· Uncomment and edit the servlets settings on UI 1 in the file $MOOGSOFT_HOME/config/servlets.conf. Note the importance of the initial comma.
,ha :
{
cluster: "primary",
instance: "servlets",
group: "servlets_primary",
start_as_passive: false
}
· On UI 1, edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone, the server hostnames, and the cluster names.
"mooms" :
{
...
"zone" : "<zone>",
"brokers" : [
{"host" : "<UI 1 server hostname>", "port" : 5672},
{"host" : "<UI 2 server hostname>", "port" : 5672},
{"host" : "<Redundancy server hostname>", "port" : 5672}
],
...
"cache_on_failure" : true,
...
"search" :
{
...
"nodes" : [
{"host" : "<UI 1 server hostname>", "port" : 9200},
{"host" : "<UI 2 server hostname>", "port" : 9200},
{"host" : "<Redundancy server hostname>", "port" : 9200}
]
...
"failover" :
{
"persist_state" : true,
"hazelcast" :
{
"hosts" : ["<UI 1 server hostname>","<UI 2 server hostname>"],
"cluster_per_group" : true
},
"automatic_failover" : true,
}
...
"ha":
{ "cluster": "PRIMARY" }
· On UI 1, restart the Apache Tomcat service:
systemctl restart apache-tomcat
· Install, configure and start HA Proxy on UI 1 to connect to the Percona XtraDB Cluster.
1. Install Cisco Crosswork Situation Manager components on the UI secondary server.
On UI 2 install the following Cisco Crosswork Situation Manager components:
VERSION=8.0.0.1; yum -y install moogsoft-common-${VERSION} \
moogsoft-integrations-ui-${VERSION} \
moogsoft-ui-${VERSION} \
moogsoft-utils-${VERSION}
Edit the ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the .bashrc file
source ~/.bashrc
1. Initialize the UI stack. Run the initialization script moog_init_ui.sh on UI 2. Substitute the name of your RabbitMQ zone and the Core 2 server hostname:
moog_init_ui.sh -twfz <zone> -c <Core 2 server hostname>:15672 -m <Core 2 server hostname>:5672 -s <Core 2 server hostname>:9200 -d <Core 2 server hostname>:3309 -n
2. Uncomment and edit the servlets settings on UI 2 in the file $MOOGSOFT_HOME/config/servlets.conf. Note the importance of the initial comma.
Caution
The secondary server group must be different from the primary server group.
,ha :
{
cluster: "secondary",
instance: "servlets",
group: "servlets_secondary",
start_as_passive: false
}
3. On UI 2, edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone, the server hostnames, and the cluster names.
"mooms" :
{
...
"zone" : "<zone>",
"brokers" : [
{"host" : "<Core 1 server hostname>", "port" : 5672},
{"host" : "<Core 2 server hostname>", "port" : 5672},
{"host" : "<Redundancy server hostname>", "port" : 5672}
],
...
"cache_on_failure" : true,
...
"search" :
{
...
"nodes" : [
{"host" : "<Core 1 server hostname>", "port" : 9200},
{"host" : "<Core 2 server hostname>", "port" : 9200},
{"host" : "<Redundancy server hostname>", "port" : 9200}
]
...
"failover" :
{
"persist_state" : true,
"hazelcast" :
{
"hosts" : ["<Core 1 server hostname>","<Core 2 server hostname>"],
"cluster_per_group" : true
},
"automatic_failover" : true,
}
...
"ha":
{ "cluster": "SECONDARY" }
4. On UI 2, restart the Apache Tomcat service:
systemctl restart apache-tomcat
5. Install, configure and start HA Proxy on UI 2 to connect to the Percona XtraDB Cluster.
A user session must be served by the same UI stack; that is, users need to stay connected to the same UI server for the duration of their session, or until that UI server becomes unavailable (in which case the load balancer redirects them to the secondary). This is because requests are routed via moogsvr and data is received from moogpoller over WebSockets.
Configure the UI load balancer with the following attributes:
1. Since both UI stacks are active, you can implement either the round robin or least connection balancing method.
2. Route web traffic only to the Nginx instance behind which there is an active UI. Base this decision on a moogsvr servlet check via the 'hastatus' Tomcat endpoint, which returns a 204 if the UI stack is up. Note that this endpoint does not report on the health of other roles, i.e. Core (Moogfarmd, RabbitMQ and Elasticsearch clusters), Database (Percona Cluster), or LAMs.
3. Sticky sessions are preferred: traffic needs to be routed to the same backend server based on the MOOGSESS cookie.
You can send the following example cURL command from the command line to check moogsvr servlet status:
curl -k https://server1/moogsvr/hastatus -v
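As one illustration of the three attributes above, here is a hypothetical HAProxy configuration fragment. HAProxy is only one option for the UI load balancer, and the server names, ports and certificate path below are placeholders; any load balancer that supports HTTP health checks and cookie-based stickiness can be configured equivalently.

```
# Sketch only - assumes HAProxy in front of the two Nginx instances.
frontend ui_front
    bind *:443 ssl crt /etc/haproxy/ui.pem
    default_backend ui_back

backend ui_back
    balance roundrobin
    # Only route to a stack whose moogsvr reports active (HTTP 204)
    option httpchk GET /moogsvr/hastatus
    http-check expect status 204
    # Sticky sessions keyed on the MOOGSESS cookie
    stick-table type string len 64 size 100k expire 8h
    stick on req.cook(MOOGSESS)
    server ui1 <UI 1 hostname>:443 check ssl verify none
    server ui2 <UI 2 hostname>:443 check ssl verify none
```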
This topic tells you how to install and configure HA Proxy on a Cisco Crosswork Situation Manager server to connect to a remote Percona XtraDB cluster.
Percona XtraDB Cluster must be run as a 3-node (minimum) cluster distributed across the database roles.
Before you install and configure HA Proxy, you must configure Percona XtraDB as described in Set Up the Database for HA.
You must also set up the Core and UI nodes before installing and configuring HA Proxy on these nodes. See the following documentation for more information:
· Set Up the Core Role for HA.
· Set Up the User Interface Role for HA.
Once you have set up the Core and UI nodes, you install and configure HA Proxy on these nodes:
Refer to the Distributed HA system Firewall for more information on connectivity within a fully distributed HA architecture.
You install and configure HA Proxy on the following Core and UI nodes:
1. Core 1 and Core 2: Core primary and secondary.
2. UI 1 and UI 2: UI primary and secondary.
Complete the following steps for the Core and UI primary and secondary servers:
1. Install and configure HA Proxy to listen on 0.0.0.0:3309 and route connections to one of three Percona XtraDB nodes. Use the Percona XtraDB server IP addresses instead of hostnames to reduce network latency.
$MOOGSOFT_HOME/bin/utils/haproxy_installer.sh -l 3309 -c -i <DB 1 server IP address>:3306,<DB 2 server IP address>:3306,<DB 3 server IP address>:3306
See Set Up the Database for HA for more information.
2. Restart the running services that use Percona XtraDB on the primary and secondary servers so that those services connect on 3309:
a. Moogfarmd on the Core servers.
b. Apache Tomcat on the UI servers.
3. Run the following script to confirm successful installation:
$MOOGSOFT_HOME/bin/utils/check_haproxy_connections.sh
If successful, you see a script output similar to the following example:
HAProxy Connection Counts
Frontend:
0.0.0.0:3309 : 27
Backend:
mysql_node_1 172.31.82.211:3306 : 27
mysql_node_2 172.31.82.133:3306 : 0
mysql_node_3 172.31.85.42:3306 : 0
In the Cisco Crosswork Situation Manager HA architecture, both RabbitMQ and Elasticsearch run as three-node clusters. Three-node clusters prevent issues with ambiguous data state, such as "split-brain" scenarios.
RabbitMQ is the Message Bus used by Cisco Crosswork Situation Manager. Elasticsearch delivers the search functionality.
The three nodes are distributed across the two Core roles and the redundancy server.
In our distributed HA installation, the RabbitMQ and Elasticsearch components are installed on the Core 1, Core 2 and Redundancy servers.
1. Core 1: RabbitMQ Node 1, Elasticsearch Node 1
2. Core 2: RabbitMQ Node 2, Elasticsearch Node 2
3. Redundancy server: RabbitMQ Node 3, Elasticsearch Node 3
Refer to the Distributed HA system Firewall for more information on connectivity within a fully distributed HA architecture.
1. Install the Cisco Crosswork Situation Manager components on the Redundancy server.
On the Redundancy server install the following Cisco Crosswork Situation Manager components:
VERSION=8.0.0.1; yum -y install moogsoft-common-${VERSION} \
moogsoft-mooms-${VERSION} \
moogsoft-search-${VERSION} \
moogsoft-utils-${VERSION}
Edit the ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the .bashrc file:
source ~/.bashrc
1. Initialize RabbitMQ cluster node 3 on the Redundancy server and join the cluster.
— On the Redundancy server, initialize RabbitMQ. Use the same zone name as Core 1 and Core 2:
moog_init_mooms.sh -pz <zone>
— The erlang cookies must be the same for all RabbitMQ nodes. Replace the erlang cookie on the Redundancy server with the Core 1 erlang cookie located at /var/lib/rabbitmq/.erlang.cookie. Make the Redundancy server cookie read-only:
chmod 400 /var/lib/rabbitmq/.erlang.cookie
You may need to change the file permissions on the Redundancy server erlang cookie first to allow this file to be overwritten. For example:
chmod 406 /var/lib/rabbitmq/.erlang.cookie
— Restart the rabbitmq-server service and join the cluster. Substitute the Core 1 server short hostname:
systemctl restart rabbitmq-server
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@<Core 1 server short hostname>
rabbitmqctl start_app
The short hostname is the full hostname excluding the DNS domain name. For example, if the hostname is ip-172-31-82-78.ec2.internal, the short hostname is ip-172-31-82-78. To find out the short hostname, run rabbitmqctl cluster_status on Core 1.
— Apply the HA mirrored queues policy. Use the same zone name as Core 1:
rabbitmqctl set_policy -p <zone> ha-all ".+\.HA" '{"ha-mode":"all"}'
— Run rabbitmqctl cluster_status to verify the cluster status and queue policy. Example output is as follows:
Cluster status of node rabbit@ldev02 ...
[{nodes,[{disc,[rabbit@ldev01,rabbit@ldev02]}]},
{running_nodes,[rabbit@ldev01,rabbit@ldev02]},
{cluster_name,<<"rabbit@ldev02">>},
{partitions,[]},
{alarms,[{rabbit@ldev01,[]},{rabbit@ldev02,[]}]}]
[root@ldev02 rabbitmq]# rabbitmqctl -p MOOG list_policies
Listing policies for vhost "MOOG" ...
MOOG ha-all .+\.HA all {"ha-mode":"all"} 0
2. Initialize, configure and start Elasticsearch cluster node 3 on the Redundancy server.
a. Initialize Elasticsearch on the Redundancy server:
moog_init_search.sh
b. Uncomment and edit the properties of the /etc/elasticsearch/elasticsearch.yml file on the Redundancy server as follows:
cluster.name: aiops
node.name: <Redundancy server hostname>
...
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: [ "<Core 1 hostname>","<Core 2 hostname>","<Redundancy server hostname>"]
discovery.zen.minimum_master_nodes: 1
gateway.recover_after_nodes: 1
node.master: true
c. Restart Elasticsearch on the Core 1, Core 2 and Redundancy servers:
systemctl restart elasticsearch
3. Verify that the Elasticsearch nodes are working correctly:
curl -X GET "localhost:9200/_cat/health?v&pretty"
Example cluster health status:
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1580490422 17:07:02  aiops   green  3          3         0      0   0    0    0        0             -                  100.0%
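The health check can be wrapped in a small script for monitoring. A minimal sketch, assuming the default port 9200 and the standard _cat/health column names status and node.total; the helper name is ours, not part of the product:

```shell
#!/usr/bin/env bash
# Sketch: classify the cluster health from the _cat/health API.
# cluster_ok reads the "status node.total" columns on stdin.
cluster_ok() {
  read -r status nodes
  [ "$status" = "green" ] && [ "${nodes:-0}" -ge 3 ]
}

if curl -s --max-time 5 "localhost:9200/_cat/health?h=status,node.total" | cluster_ok; then
  echo "cluster healthy"
else
  echo "cluster degraded or unreachable - inspect _cat/health?v on each node"
fi
```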
You can enable password authentication on Elasticsearch by editing the $MOOGSOFT_HOME/config/system.conf configuration file. You can use either an unencrypted password or an encrypted password, but you cannot use both.
You should use an encrypted password in the configuration file if you do not want users with configuration access to be able to access integrated systems.
To enable unencrypted password authentication on Elasticsearch, set the following properties in the system.conf file:
"search":
{
...
"username" : <username>,
"password" : <password>,
...
}
To enable encrypted password authentication on Elasticsearch, set the following properties in the system.conf file:
"search":
{
...
"username" : <username>,
"encrypted_password" : <encrypted password>
...
}
To initialize Elasticsearch with password authentication, run:
moog_init_search.sh -a username:password
or:
moog_init_search.sh --auth username:password
If you run moog_init_search.sh without the -a/--auth parameter, password authentication is not enabled in Elasticsearch.
See Moog Encryptor for more information on how to encrypt passwords stored in the system.conf file.
You can also manually add authentication to the Elasticsearch configuration. You should do this if you have your own local Elasticsearch installation. See the Elasticsearch documentation on configuring security for more information.
This topic describes how to install LAMs in a distributed HA system where all components in the HA architecture are on-premise.
In HA architecture, LAM 1 and LAM 2 run in active / passive mode as an HA polling pair, and in active / active mode as an HA receiving pair.
In our distributed HA installation, the LAM components are installed on the LAM 1 and LAM 2 servers:
Refer to the Distributed HA system Firewall for more information on connectivity within a fully distributed HA architecture.
Follow these instructions to install Cisco Crosswork Situation Manager on the LAM 1 server.
1. On LAM 1 install the following Cisco Crosswork Situation Manager components:
VERSION=8.0.0.1; yum -y install moogsoft-common-${VERSION}* \
moogsoft-integrations-${VERSION}* \
moogsoft-utils-${VERSION}*
Edit your ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the ~/.bashrc file:
source ~/.bashrc
2. On LAM 1, edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone, the server hostnames, and the cluster names:
"mooms" :
{
...
"zone" : "<zone>",
"brokers" : [
{"host" : "<Core 1 server hostname>", "port" : 5672},
{"host" : "<Core 2 server hostname>", "port" : 5672},
{"host" : "<Redundancy server hostname>", "port" : 5672}
],
...
"cache_on_failure" : true,
...
"search" :
{
...
"nodes" : [
{"host" : "<Core 1 server hostname>", "port" : 9200},
{"host" : "<Core 2 server hostname>", "port" : 9200},
{"host" : "<Redundancy server hostname>", "port" : 9200}
]
...
"failover" :
{
"persist_state" : true,
"hazelcast" :
{
"hosts" : ["<Core 1 server hostname>","<Core 2 server hostname>"],
"cluster_per_group" : true
},
"automatic_failover" : true,
}
...
"ha":
{ "cluster": "PRIMARY" }
Follow these instructions to install Cisco Crosswork Situation Manager on the LAM 2 server.
· On LAM 2 install the following Cisco Crosswork Situation Manager components:
VERSION=8.0.0.1; yum -y install moogsoft-common-${VERSION}* \
moogsoft-integrations-${VERSION}* \
moogsoft-utils-${VERSION}*
Add the following code to the ~/.bashrc file:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the ~/.bashrc file:
source ~/.bashrc
· On LAM 2, edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone, the server hostnames, and the cluster names:
"mooms" :
{
...
"zone" : "<zone>",
"brokers" : [
{"host" : "<Core 2 server hostname>", "port" : 5672},
{"host" : "<Core 1 server hostname>", "port" : 5672},
{"host" : "<Redundancy server hostname>", "port" : 5672}
],
...
"cache_on_failure" : true,
...
"search" :
{
...
"nodes" : [
{"host" : "<Core 2 server hostname>", "port" : 9200},
{"host" : "<Core 1 server hostname>", "port" : 9200},
{"host" : "<Redundancy server hostname>", "port" : 9200}
]
...
"failover" :
{
"persist_state" : true,
"hazelcast" :
{
"hosts" : ["<Core 2 server hostname>","<Core 1 server hostname>"],
"cluster_per_group" : true
},
"automatic_failover" : true
}
...
"ha":
{ "cluster": "SECONDARY" }
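Filled in with example values (the zone name and the core1/core2/redundancy.example.com hostnames are invented placeholders, not defaults), the substituted LAM 2 fragments look like this; the search and failover sections follow the same substitution pattern:

```json
"mooms" :
{
    "zone" : "production",
    "brokers" : [
        {"host" : "core2.example.com", "port" : 5672},
        {"host" : "core1.example.com", "port" : 5672},
        {"host" : "redundancy.example.com", "port" : 5672}
    ]
},
"ha":
{ "cluster": "SECONDARY" }
```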
Follow the instructions in Set Up LAMs for HA.
This topic describes how to install LAMs in a distributed HA system in a SaaS version of Cisco Crosswork Situation Manager. This is a hybrid installation. The core processing is located in the cloud, and the LAMs and event sources are on-premise.
You must enable either Cisco Bridge or WebSockets LAMs before the LAMs can deliver events:
· Cisco Bridge uses a store and forward architecture to push events and other messages from a local RabbitMQ cluster to the Message Bus. Cisco recommends that you use Cisco Bridge in production environments.
· WebSockets LAMs enables LAMs to communicate with Cisco Crosswork Situation Manager using WebSockets instead of RabbitMQ or Cisco Crosswork Situation Manager Bridge. Cisco recommends that you use WebSockets LAMs in non-production environments only.
In HA architecture, LAM 1 and LAM 2 run in an active / passive mode for a HA polling pair, and in active / active mode for a HA receiving pair.
In our distributed HA installation, the LAM components are installed on the LAM 1 and LAM 2 servers:
Refer to the Distributed HA system Firewall for more information on connectivity within a fully distributed HA architecture.
Follow these instructions to install Cisco Crosswork Situation Manager on the LAM 1 server.
· Install Cisco Crosswork Situation Manager components on the LAM 1 server.
On LAM 1 install the following Cisco Crosswork Situation Manager components:
yum -y install moogsoft-common-${VERSION}* \
moogsoft-db-${VERSION}* \
moogsoft-mooms-${VERSION}* \
moogsoft-integrations-${VERSION}* \
moogsoft-utils-${VERSION}*
Edit your ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the ~/.bashrc file:
source ~/.bashrc
· Initialize the local Cisco Crosswork Situation Manager RabbitMQ cluster node on the LAM 1 server. Substitute a value into <zone> that is different from the value chosen for the main RabbitMQ cluster.
moog_init_mooms.sh -pz <zone>
· On LAM 1, edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone and the LAM 1 server hostname. Set automatic failover to true and the cluster name to PRIMARY:
"mooms" :
{
...
"zone" : "<zone>",
"brokers" : [
{"host" : "<LAM 1 server hostname>", "port" : 5672}
],
...
"failover" :
{
...
"automatic_failover" : true,
}
...
"ha":
{ "cluster": "PRIMARY" }
Follow these instructions to install Cisco Crosswork Situation Manager on the LAM 2 server.
· On LAM 2 install the following Cisco Crosswork Situation Manager components:
yum -y install moogsoft-common-${VERSION}* \
moogsoft-mooms-${VERSION}* \
moogsoft-integrations-${VERSION}* \
moogsoft-utils-${VERSION}*
Edit your ~/.bashrc file to contain the following lines:
export MOOGSOFT_HOME=/usr/share/moogsoft
export APPSERVER_HOME=/usr/share/apache-tomcat
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$MOOGSOFT_HOME/bin:$MOOGSOFT_HOME/bin/utils
Source the ~/.bashrc file:
source ~/.bashrc
· Initialize the local Cisco Crosswork Situation Manager RabbitMQ cluster node on the LAM 2 server. Use the same zone that you specified for LAM 1.
moog_init_mooms.sh -pz <zone>
· On LAM 2, edit $MOOGSOFT_HOME/config/system.conf and set the following properties. Substitute the name of your RabbitMQ zone and the LAM 2 server hostname. Set automatic failover to true and the cluster name to SECONDARY:
"mooms" :
{
...
"zone" : "<zone>",
"brokers" : [
{"host" : "<LAM 2 server hostname>", "port" : 5672}
],
...
"failover" :
{
...
"automatic_failover" : true,
}
...
"ha":
{ "cluster": "SECONDARY" }
You must set up either Cisco Bridge or WebSockets LAMs to enable event delivery:
· See Cisco Bridge for information on how to enable Cisco Bridge.
· See Enable WebSockets LAMs for information on how to enable WebSockets LAM.
Set up a backend LAM HA configuration on LAM 1 and LAM 2
See Set Up LAMs for HA for instructions.
Cisco Bridge uses a store and forward architecture to push events and other messages from a local RabbitMQ cluster to the Message Bus.
The connection to the Message Bus is through a WebSocket connection to the Integrations Controller. The Integrations Controller publishes the events directly to the Message Bus. The following diagram shows this process.
Cisco Bridge offers the following advantages:
· You do not have to open a port for local RabbitMQ clusters or the Message Bus.
· You do not need a database.
· You have an outbound HTTP connection with optional proxy support.
· Events persist in local RabbitMQ clusters even if the connection to the Message Bus is disrupted.
· Enabling Cisco Bridge on every server with access to a local RabbitMQ cluster increases redundancy: if one bridge is killed or loses its connection to the cluster, another can continue to forward events.
· The connection between RabbitMQ and the Message Bus is bi-directional. In a HA installation, this means:
— You can send ha_cntl messages to LAMs on a remote LAM server. Using the UI stack, you can query the status of your remote LAMs and activate or deactivate them.
— Your remote LAMs follow automatic failover across different LAM servers within the same group. For this to take effect, you must enable automatic failover in each LAM server's system.conf.
Before you enable Cisco Bridge, verify the following:
· You are a SaaS user who needs on-premise data ingestion using LAMs instead of a UI integration.
· You have a SaaS production environment that requires guaranteed event delivery.
You create a WebSockets authentication token to use when enabling either WebSockets LAMs or Cisco Bridge.
You must have the grazer_login and manage_integrations permissions to create a WebSocket authentication token. The Grazer role has these permissions with a new install of Cisco Crosswork Situation Manager v8.x.
To create a WebSockets authentication token, run the following:
curl -u <username:password> -X POST 'https://<instance>/integrations/api/v1/auth/integrations'
Substitute the username and password of the user with the Grazer role and manage_integrations permission.
See Role Permissions for more information.
Install the Cisco Crosswork Situation Manager RPM or TGZ files on the on-premise LAM server that has access to the local RabbitMQ cluster:
VERSION=8.0.0; yum -y install moogsoft-integrations-${VERSION}* \
moogsoft-common-${VERSION}* \
moogsoft-utils-${VERSION}* \
moogsoft-mooms-${VERSION}*
Edit the Cisco Bridge configuration file
Edit the Cisco Bridge configuration file located at $MOOGSOFT_HOME/config/moogsoft_bridge.conf:
{
"group": <group>,
"webhost": <base URL>,
"websocket_token": <WebSockets token>,
"proxy": {
"host" : <Proxy host>,
"port" : <Proxy port>,
"username": <Username of proxy for basic authentication>,
"password": <Password for the proxy for basic authentication>
}
}
Substitute in the group, webhost, websocket_token and proxy values:
· group: A unique identifier for a local RabbitMQ cluster.
You should set up multiple bridges for each local RabbitMQ cluster. This ensures that the cronjob will be able to automatically replace disconnected bridges.
If Cisco Crosswork Situation Manager detects a missing bridge, it raises a critical event; when the bridge reappears, it clears the event.
· webhost: Your base URL.
· websocket_token: Your WebSocket token.
· proxy: An optional object for HTTP communication. The proxy object contains the following parameters:
— host: Host of the proxy (mandatory).
— port: Port of the proxy (mandatory).
— username: Username of proxy for basic authentication (optional).
— password: Password for the proxy (mandatory if username is used).
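Pulling these parameters together, a minimal sketch of moogsoft_bridge.conf might look like the following. The group name and URL are invented examples, and the optional proxy object is omitted:

```json
{
    "group": "dc1-rabbit-cluster",
    "webhost": "https://example.moogsoft.com",
    "websocket_token": "<your WebSockets token>"
}
```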
Start the Cisco Bridge process
Run the LAMs initialization utility moog_init_lams.sh --bridge to start the Cisco Bridge as a background process.
Running this command also creates a crontab to restart the bridge if it stops.
If all bridges in a group stop, Cisco Crosswork Situation Manager creates an event.
Cisco Bridge outputs logs to:
1. /var/log/moogsoft/moogsoft_bridge.log for root users.
2. $MOOGSOFT_HOME/log/moogsoft_bridge.log for non-root users.
See Configure Logging for more information on logging.
You can configure LAMs to communicate with Cisco Crosswork Situation Manager using WebSockets instead of RabbitMQ or Cisco Crosswork Situation Manager Bridge.
If you are a SaaS customer and cannot use the UI for integrations with an on-premise data source, you should use WebSockets LAMs.
Cisco recommends that you use WebSockets LAMs in non-production environments only, and use Cisco Bridge in production environments.
If you use WebSockets LAMs:
· You do not have to open a port for local RabbitMQ clusters.
· You have an outbound HTTP connection with optional proxy support.
You can enable WebSockets LAMs in any of your environments where you have installed the Cisco Crosswork Situation Manager RPM or tarball files.
If you enable WebSockets LAMs in an environment, all LAMs in that environment use WebSockets. You cannot enable multiple communication methods in a single environment.
Before you enable WebSockets LAMs, you must create a WebSockets authentication token.
Create a WebSockets authentication token
You create a WebSockets authentication token to use when enabling either WebSockets LAMs or Cisco Bridge.
You must have the grazer_login and manage_integrations permissions to create a WebSocket authentication token. The Grazer role has these permissions with a new install of Cisco Crosswork Situation Manager v8.x.
To create a WebSockets authentication token, run the following:
curl -u <username:password> -X POST 'https://<instance>/integrations/api/v1/auth/integrations'
Substitute the username and password of the user with the Grazer role and manage_integrations permission.
See Role Permissions for more information.
To enable WebSockets LAMs, edit the integrations configuration file located at $MOOGSOFT_HOME/config/integrations.conf.
Substitute in your base URL and WebSockets authentication token.
{
"controller_url":<Base URL>,
"Websocket_token":<WebSockets authentication token>
}
You can optionally configure a proxy object for LAM HTTP communication. The proxy object contains the following parameters:
· host: Host of the proxy (mandatory).
· port: Port of the proxy (mandatory).
· username: Username of proxy for basic authentication (optional).
· password: Password for the proxy (mandatory if username is used).
The following example $MOOGSOFT_HOME/config/integrations.conf file contains a proxy object:
{
"controller_url":"https://example.moogsoft.com",
"Websocket_token":"awkdnawdnawidawk123",
"proxy": {
"host" :"host",
"port" :8080,
"username":"user",
"password":"password"
}
}
Restart or start the LAMs to finish enabling the WebSockets LAMs.
To configure a new backend LAM integration for HA on LAM 1 and LAM 2:
· Make a copy of the corresponding LAM configuration file and rename it accordingly.
· Make a copy of the corresponding LAMbot file and rename it accordingly.
· If applicable, amend the LAM configuration file to point to the LAMbot file (under the Presend section).
· Create the service script pointing to the configuration file.
· Configure the ha section of the configuration file according to the type of LAM and its corresponding HA setup.
Configure a Polling LAM for HA
To configure a polling LAM for HA, you must set the LAMs as active / passive, and therefore in the same Cisco Crosswork Situation Manager process group. If the system detects an issue with the active LAM, the passive instance will automatically take over.
To enable automatic failover:
· On LAM 1 and LAM 2, edit the $MOOGSOFT_HOME/config/system.conf file and set the automatic_failover property to true:
# Allow a passive process to automatically become active if
# no other active processes are detected in the same process group
"automatic_failover" : true,
· Restart the polling LAMs to finish enabling automatic failover.
Configure a Receiving LAM for HA
For a HA configuration, the receiving LAMs must always run as active / active, which means they belong to different Cisco Crosswork Situation Manager process groups behind a load balancer of your choice.
There are two methods you can use to implement your load balancer: chained failover, or multiplexing (which sends events to both active receiving LAMs).
If you choose to implement using multiplexing, ensure the following:
· The duplicate_event_source parameter in the LAM config is set to true. The parameter lets Moogfarmd know to silently drop any event duplicates arriving within a configurable period.
· The configuration files for both active Receiving LAMs, running as an HA pair, are identical, apart from their ha sections. This ensures that Moogfarmd is able to detect the event duplicates correctly.
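For example, both LAM configuration files in a multiplexed pair would carry the same property, shown here in isolation (its exact position within your LAM's configuration file depends on the LAM, so check the file's comments):

```json
"duplicate_event_source" : true,
```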
The following example cURL command is a call from the command line to check on the status of the LAM instance:
[root@server1 moogsoft]# curl -X GET "http://server9:8888"
{"success":true,"message":"Instance is active","statusCode":0}
Follow the steps below to validate that the installation was successful.
Elasticsearch requires Java 11, which is included as a dependency when you install the RPM packages.
If Elasticsearch fails to start due to an incorrect Java/JDK version, follow these steps.
· Run the following command to configure the system to use the new Java version:
alternatives --config java
This command prompts you to select which 'java' should be in the system PATH. At the prompt, type the number that corresponds to the Java 11 installation. For example, if the prompt includes:
Selection Command
-----------------------------------------------
*+ 1 java-8-openjdk.x86_64 (/usr/lib/jvm/java-8-openjdk-8.1.0.7-0.el7_6.x86_64/bin/java)
+ 2 java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.5.10-0.el7_7.x86_64/bin/java)
Press 2 and hit Enter. To confirm the change has taken effect, run the following command:
java -version
The output should show an OpenJDK version of at least 11.0.5.
· Restart Elasticsearch:
service elasticsearch restart
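The version check above can be scripted. The helper below is our own sketch, not a Cisco utility: it takes a dotted version string such as the one printed by java -version and succeeds only if the major version is at least 11. Note that Java 8 reports itself as "1.8.0", so its major version parses as 1 and correctly fails the check.

```shell
#!/bin/sh
# is_java_11_or_later is a hypothetical helper name, not part of any package.
# Takes a dotted version string (e.g. "11.0.5") and succeeds if major >= 11.
is_java_11_or_later() {
  major=${1%%.*}        # strip everything after the first dot
  [ "$major" -ge 11 ]
}

if is_java_11_or_later "11.0.5"; then
  echo "OK: Java 11 or later"
else
  echo "Too old: rerun 'alternatives --config java'"
fi
```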
Perform the following steps to ensure that Cisco Crosswork Situation Manager v8.x has been successfully installed or upgraded:
· Check that the UI login page displays "Version 8.x" at the bottom.
· Log into the UI with username "admin" and password "admin". You should change the default username and password when you have logged in for the first time.
· Select the Help icon (question mark) > Support Information. Check that the System Information shows "Version 8.x" and the correct schema upgrade history if you have performed an upgrade.
Note
If you have already completed this step previously (as part of this upgrade process) on the current host, you can skip this step.
Run the Install Validator utility to ensure that all Cisco Crosswork Situation Manager files were deployed correctly in $MOOGSOFT_HOME:
$MOOGSOFT_HOME/bin/utils/moog_install_validator.sh
Run this utility to confirm that all Apache Tomcat files were deployed correctly in $MOOGSOFT_HOME:
$MOOGSOFT_HOME/bin/utils/tomcat_install_validator.sh
If there are webapp differences, run the following command to extract the webapps with the correct files:
$MOOGSOFT_HOME/bin/utils/moog_init_ui.sh -w
Note
If you have already completed this step previously (as part of this upgrade process) on the current host, you can skip this step.
Run the Database Validator utility to validate the database schema:
$MOOGSOFT_HOME/bin/utils/moog_db_validator.sh
Note
Some schema differences are valid, for example those related to custom_info (new columns added etc).
An additional schema upgrade step is required if you are upgrading from v7.0.x, v7.1.x, or v7.2.x.
This additional step is documented on the Post-upgrade steps page.
Until you have completed this step, you should expect to see the following differences in the output of the Database Validator utility:
Differences found in 'historic_moogdb' tables:
41,49c41,43
< primary key (`alert_id`),
< unique key `idx_signature` (`signature`),
< key `idx_first_event_time` (`first_event_time`),
< key `idx_state_last` (`state`,`last_state_change`),
< key `idx_severity` (`severity`,`state`),
< key `idx_agent` (`agent`(12)),
< key `idx_source` (`source`(12)),
< key `idx_type` (`type`(12)),
< key `idx_manager` (`manager`(12))
---
> primary key (`signature`),
> key `alert_id` (`alert_id`),
> key `first_event_time` (`first_event_time`,`alert_id`)
93,94c87
< key `timestamp` (`timestamp`,`type`),
< key `idx_type_time` (`type`,`timestamp`)
---
> key `timestamp` (`timestamp`,`type`)
241,242c234
< key `sig_id` (`sig_id`,`action_code`,`timestamp`),
< key `idx_action_sig` (`action_code`,`sig_id`)
---
> key `sig_id` (`sig_id`,`action_code`,`timestamp`)
The differences above will not have any functional impact, but you must complete the rest of the upgrade to ensure the system is performant and the schema is ready for future upgrades.
If you have performed an upgrade and you see errors similar to the following:
Differences found in 'moogdb' tables:
57a58
> key 'filter_id' ('filter_id'),
194a196
> key 'enrichment_static_mappings_ibfk_1' ('eid'),
1196a1199
> key 'sig_id' ('sig_id'),
1325a1329
> key 'filter_id' ('filter_id'),
Run the following commands to resolve these index-related problems:
mysql moogdb -u root -e "alter table alert_filters_access drop key filter_id"
mysql moogdb -u root -e "alter table situation_filters_access drop key filter_id"
mysql moogdb -u root -e "alter table enrichment_static_mappings drop key enrichment_static_mappings_ibfk_1"
mysql moogdb -u root -e "alter table sig_stats_cache drop key sig_id"
Follow these instructions to validate a high availability (HA) installation:
· Run the Apache Tomcat install validator utility on any UI server where apache-tomcat is installed and running. In a standard HA installation, apache-tomcat is running on the UI 1 and UI 2 servers.
See Set Up the User Interface Role for HA for more information.
· For a non-root deployment of Cisco Crosswork Situation Manager, run the database validator utility once on any server.
For an RPM deployment, run the utility on the server with the moogsoft-db package installed.
See Set Up the Database for HA for more information.
· Run the install validator utility on any server that has a running Cisco Crosswork Situation Manager component.
· Run the ha_cntl utility on all HA servers before data is ingested into the system to make sure all expected HA components are running successfully.
When you have finished installing Cisco Crosswork Situation Manager, you should tune and validate your database settings. The appropriate settings for your database will vary depending on your system, deployment, database type and other factors. This topic provides general guidance on the following scenarios:
· Scenario 1: You are setting up a production environment with a Percona XtraDB Cluster and have used the install_percona_nodes script to create the cluster.
· Scenario 2: You are setting up a production environment with a Percona XtraDB Cluster, but have not used the install_percona_nodes script to create the cluster.
· Scenario 3: You are setting up a non-production or QA environment with MySQL Community.
In the install_percona_nodes script, the Percona nodes are deployed with the following my.cnf file:
[mysqld]
# GENERAL #
basedir = /usr/
user = mysql
default-storage-engine = InnoDB
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
port = 3306
# MyISAM #
key-buffer-size = 32M
myisam-recover-options = FORCE,BACKUP
# SAFETY #
max-allowed-packet = 128M
max-connect-errors = 1000000
# DATA STORAGE #
datadir = /var/lib/mysql/
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 500
thread-cache-size = 128
open-files-limit = 65535
table-definition-cache = 1024
table-open-cache = 8000
max_prepared_stmt_count = 1048576
group_concat_max_len = 1048576
# INNODB #
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = <VARIABLE>
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table = 1
innodb-buffer-pool-size = <VARIABLE>
innodb_buffer_pool_instances = <VARIABLE>
innodb_autoinc_lock_mode = 2
# LOGGING #
log-timestamps = SYSTEM
log-error = /var/log/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log-file = /var/log/mysql-slow.log
slow-query-log = 0
# REPLICATION #
binlog_format = ROW
sync_binlog = 0
# PERF #
performance_schema = ON
# Moogsoft MySQL 5.6 optimizations
default_tmp_storage_engine = MyISAM
# wsrep settings
wsrep_provider = /usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_address = gcomm://
wsrep_log_conflicts
wsrep_cluster_name = pxc-cluster
wsrep_node_name = pxc-node-$(hostname)
pxc_strict_mode = ENFORCING
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = "<VARIABLE>:<VARIABLE>"
wsrep_slave_threads = <VARIABLE>
wsrep_sync_wait = 0
wsrep_provider_options = "gcache.size=4G"
wsrep_retry_autocommit = 10
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
[client]
default-character-set=utf8
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysql]
[isamchk]
If you have used the install_percona_nodes script to create the cluster, you should not need to tune your database settings.
If you do want to tune your settings, Cisco recommends that you set the following variables based on system resources:
1. innodb-buffer-pool-size: Set this to 80% of system RAM if you set the -d (dedicated) flag. Otherwise, set this to 50% of system RAM.
2. innodb_buffer_pool_instances:
a. If the buffer pool size is greater than 16 GB, set this to a number between "2" and "64" to divide the buffer pool size into chunks of approximately 8 GB in size.
b. If the buffer pool size is less than 16 GB, set this to "8".
3. innodb-log-file-size: Set this to the larger of the following two values:
a. Approximately 2% of the buffer pool size.
b. 1 GB.
4. wsrep_slave_threads: If the number of processor cores is greater than or equal to 32, set this to the number of processor cores divided by four. Otherwise, set this to "8".
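The sizing rules above can be sketched as a small script. This is our own illustration: the function names are invented, and it only echoes suggested values rather than editing my.cnf.

```shell
#!/bin/sh
# Suggest Percona/InnoDB settings from system resources, per the rules above.

# 80% of RAM when the server is dedicated to the database, else 50%.
buffer_pool_gb() {                       # args: <ram_gb> <dedicated: yes|no>
  [ "$2" = "yes" ] && pct=80 || pct=50
  echo $(( $1 * pct / 100 ))
}

# Pools > 16 GB are split into ~8 GB chunks (clamped to 2..64); otherwise 8.
buffer_pool_instances() {                # args: <pool_gb>
  if [ "$1" -gt 16 ]; then
    n=$(( $1 / 8 ))
    [ "$n" -lt 2 ] && n=2
    [ "$n" -gt 64 ] && n=64
    echo "$n"
  else
    echo 8
  fi
}

# The larger of 1 GB and ~2% of the buffer pool size.
log_file_gb() {                          # args: <pool_gb>
  n=$(( $1 * 2 / 100 ))
  [ "$n" -lt 1 ] && n=1
  echo "$n"
}

# cores/4 when the host has 32 or more cores; otherwise 8.
wsrep_slave_threads() {                  # args: <cores>
  if [ "$1" -ge 32 ]; then echo $(( $1 / 4 )); else echo 8; fi
}

pool=$(buffer_pool_gb 64 yes)            # 51 on a dedicated 64 GB host
echo "innodb-buffer-pool-size      = ${pool}G"
echo "innodb_buffer_pool_instances = $(buffer_pool_instances "$pool")"
echo "innodb-log-file-size         = $(log_file_gb "$pool")G"
echo "wsrep_slave_threads          = $(wsrep_slave_threads 48)"
```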
If you have set up a production environment with a Percona XtraDB Cluster, but have not used the install_percona_nodes script to create the cluster, you should tune your database settings in line with Scenario 1.
If you have set up a non-production or QA environment with MySQL Community, you should tune your database settings by following the guidance in the default my.cnf.
The default moog_init_db script does not attempt to automatically set any of these variables. You must manually set these variables post-install.
# INNODB #
# Important: innodb-buffer-pool performance tuning
# On servers with >= 16GB RAM that run MySQL and Moog applications, consider setting
# innodb-buffer-pool-size to 50% of system RAM.
# On servers where only MySQL is running, consider setting innodb-buffer-pool-size
# to 80% of system RAM.
innodb-buffer-pool-size = 2G
# If innodb-buffer-pool-size >= 16G, then uncomment and set innodb_buffer_pool_instances
# to divide buffer pool into ~8G chunks. Otherwise leave commented to use default of 8
# e.g: innodb-buffer-pool-size=32G innodb_buffer_pool_instances=4
# e.g: innodb-buffer-pool-size=80G innodb_buffer_pool_instances=10
#innodb_buffer_pool_instances = 8
# Set innodb-log-file-size to 1G or ~2% of buffer pool size, whichever is the larger
# e.g: innodb-buffer-pool-size=64G innodb_log_file_size=1280M
innodb-log-file-size = 1G
# Set innodb-flush-log-at-trx-commit to 0 on systems where throughput is valued
# over risk of data loss in the event of a mysql crash
innodb-flush-log-at-trx-commit = 1
Cisco periodically provides updates to Cisco Crosswork Situation Manager as add-ons. Add-ons may comprise updates to the Workflow Engine, new Workflow Engine functions, integrations tiles, and other features.
This topic tells you how to install the latest version of the Cisco Add-ons. For information on the latest add-ons, see Add-ons.
If you haven't already, you need to create the Enrichment API data store to use the Enrichment API Integration. See Create the Enrichment API Data Store for instructions. If you upgraded from Cisco Crosswork Situation Manager v7.3.0 and were using the Integrations API with Cisco Add-ons v1.4.0 or later, you do not need to create the Enrichment API data store.
Before you install Cisco Add-ons:
· Verify you have SSH access to your Cisco Crosswork Situation Manager machines. For distributed or highly available installations, update the add-ons on all core role machines where you run Moogfarmd and on the machines where you run the UI. See Server Roles.
· Download the latest add-ons bundle and transfer it to the machines where you are performing the update.
· Verify the credentials for the operating system user that runs Moogfarmd or the UI and perform all steps as that same user.
You can download the current add-ons (Crosswork-Situation-Manager-Addons-2.1.0.tar.gz) from Cisco eDelivery.
To install add-ons, you add them to machines running Moogfarmd and the UI components. This procedure replaces existing components.
1. Create a backup of the moobots, integrations, and other add-on files on the instance. For example:
tar -czf $MOOGSOFT_HOME/addons_backup.tar.gz -C $MOOGSOFT_HOME \
bots/moobots \
config \
contrib \
etc/integrations
2. Make a copy of SimilarSigConfig.conf:
cp $MOOGSOFT_HOME/config/SimilarSigConfig.conf $MOOGSOFT_HOME
3. Extract the add-ons to $MOOGSOFT_HOME:
tar -xzhf Crosswork-Situation-Manager-Addons-2.1.0.tar.gz -C $MOOGSOFT_HOME
4. Move SimilarSigConfig.conf back into place:
mv $MOOGSOFT_HOME/SimilarSigConfig.conf $MOOGSOFT_HOME/config
5. On core role machines, restart Moogfarmd:
service moogfarmd restart
You do not need to restart Nginx or Apache Tomcat on UI machines.
Before you install the Enrichment API Integration for Cisco Crosswork Situation Manager, set up the data store for the API.
The add-ons bundle includes the Enrichment API Integration, which lets you store enrichment data in a database for use with the Enrichment Workflow Engine. Before you can use the integration, create the data store from one machine with access to the database as follows:
1. Extract the utilities. For example:
unzip -x $MOOGSOFT_HOME/contrib/moog_enrichment_utils.zip -d $MOOGSOFT_HOME/contrib/
2. Run the script to create the database and the enrichment user:
$MOOGSOFT_HOME/contrib/moog_enrichment_utils/bin/moog_init_enrichmentdb.sh \
-u <database root user> \
-p <optional database root user password> \
-d <database host>:<database port> \
-e <enrichment user password>
For example, to initialize the moog_enrichment database on a database host named my_database_host with an enrichment user password of 'password123':
$MOOGSOFT_HOME/contrib/moog_enrichment_utils/bin/moog_init_enrichmentdb.sh \
-u root \
-d my_database_host:3306 \
-e password123
In this case, when prompted for the password, enter the root database user password.
In addition to the database script, the Add-ons include the following utilities to help you load and test enrichment data:
· moog_send_event is a node.js script that sends test events to the Enrichment API after you set up the integration. For usage information, run ./moog_send_event -h.
· moog_enrichment_util is a node.js script that loads data into the Enrichment API after you set up the integration. You can supply enrichment data in JSON, CSV, or TSV format. For usage information, run ./moog_enrichment_util -h.
· moog_enrichment_util.sh is a Bash version of the data loader. It is slower than the node.js scripts, but works on systems where you can't install node.js. For usage information, run ./moog_enrichment_util.sh -h.
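For illustration, a minimal CSV input for the data loaders might look like the following. The column names and values are invented; match them to whatever enrichment attributes you have defined:

```csv
hostname,location,owner
server1,London,network-team
server2,New York,database-team
```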
After you have installed the latest add-ons and created the data store for the Enrichment API, you can install the Enrichment API Integration.
Cisco Crosswork Situation Manager includes a self-signed certificate by default. If you want to add your own certificates to Nginx, follow the instructions below.
A valid SSL certificate is required if you want to use Cisco Crosswork Situation Manager for Mobile on an iPhone. This is because WebSockets do not work on iOS with self-signed certificates. If a valid root CA certificate is not added, a 'Connection Error' appears at login and Cisco Crosswork Situation Manager for Mobile does not work.
For more information, see the Nginx documentation.
To apply a valid certificate to Nginx, edit $MOOGSOFT_HOME/etc/cots/nginx/conf.d/moog-ssl.conf :
vi $MOOGSOFT_HOME/etc/cots/nginx/conf.d/moog-ssl.conf
Change the default self-signed certificate and key locations to point to the valid root certificate and key:
#ssl_certificate /etc/nginx/ssl/certificate.pem;
#ssl_certificate_key /etc/nginx/ssl/certificate.key;
ssl_certificate /etc/certificates/your_company_certificate.crt;
ssl_certificate_key /etc/certificates/your_company_certificate.key;
Reload Nginx with the command:
systemctl reload nginx
You can use Service Manager to control process startup for Cisco Crosswork Situation Manager.
The utility can keep processes associated with Cisco Crosswork Situation Manager services alive if your system fails or restarts. For every new install of Cisco Crosswork Situation Manager, a cronjob entry deployed by the Moogsoft initialization script (moog_init.sh) runs the process manager script (process_keepalive.sh) every minute. This script attempts to restart the processes of any services listed in the process manager configuration file at $MOOGSOFT_HOME/config/keep_procs_alive.conf. By default, the following services are set to restart:
1. RabbitMQ
2. MySQL
3. Nginx
4. Elasticsearch
5. Apache-Tomcat
6. Moogfarmd
You can either use the service_manager utility to edit the configuration file, or edit the file manually: enter a '1' for each service you want restarted and a '0' for each service you want left alone. If you have a custom LAM, you can add it to the configuration file with a flag to determine its restart behavior. For example:
Customlamd=1
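A sketch of what entries in $MOOGSOFT_HOME/config/keep_procs_alive.conf look like under this scheme. The exact list in your file will differ; Customlamd is the invented custom LAM example from above:

```shell
# 1 = restart the service if it stops, 0 = leave it alone
rabbitmq=1
mysql=1
nginx=1
elasticsearch=1
apache-tomcat=1
moog_farmd=1
email_lam=0
Customlamd=1
```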
If your system fails or restarts, the Service Manager utility automatically starts after one minute and attempts to restart the configured processes and services up to three times. If a service does not start after three attempts, the utility disables future restart attempts for that service.
Service Manager is useful for ensuring non-core processes start and run if Cisco Crosswork Situation Manager fails or restarts. For example, you might want to ensure specific LAMs remain alive so no events are missed if your system reboots.
You can configure Service Manager to control which services, and their associated processes, start when Cisco Crosswork Situation Manager reboots:
· Run the Service Manager:
$MOOGSOFT_HOME/bin/utils/service_manager
The default utility settings are shown as follows:
------------------------------------------------ Page [1/3] ---
# Service Manager #
---------------------------------------------------------------
Check the services that will be started and kept alive...
[x] rabbitmq
[x] mysql
[x] nginx
[x] elasticsearch
[x] apache-tomcat
[x] moog_farmd
[ ] ansibletower_lam
[ ] appdynamics_lam
[ ] aws_lam
[ ] azure_classic_lam
[ ] azure_lam
[ ] ca_spectrum_lam
[ ] datadog_client_lam
[ ] datadog_lam
[ ] dynatrace_apm_lam
[ ] dynatrace_apm_plugin_lam
[ ] dynatrace_notification_lam
[ ] dynatrace_synthetic_lam
[ ] email_lam
[ ] emc_smarts_lam
[ ] extrahop_lam
[ ] fluentd_lam
[ ] hp_nnmi_lam
[ ] hp_omi_lam
Use arrows, page up/down and ENTER to navigate
· Navigate through the list of available services using the directional arrows. There are multiple pages to scroll through.
· Press Space or Enter to add or remove services you want to restart or do not want to restart. An [ x ] appears next to any services you select.
· You can enable or disable Service Manager from restarting all services on the last page.
· Select 'Apply Changes' and press Enter to make the changes.
After you exit, the process_keepalive.sh script keeps selected services alive if Cisco Crosswork Situation Manager fails or restarts.
Service Manager Command Line Reference
You can also configure the Service Manager using command line arguments. The utility uses the following syntax:
service_manager --service=<service_name> --command=<action>
The Service Manager resides at $MOOGSOFT_HOME/bin/utils/service_manager. You can configure the utility using the following arguments:
Argument |
Input |
Description |
-h,--help |
- |
Displays the syntax and arguments available in Service Manager. |
-s,--service= |
service name |
Name of the service you want to execute the Service Manager command on. Apart from the core services, these include all of the LAMs. For example: email_lam, rest_lam, trapd_lam etc. |
-c, --command= |
enable | disable | enable_start | disable_stop |
Commands the Service Manager can execute against the specified service: enable - Enables the service auto start option. disable - Disables the service auto start option. enable_start - Starts and enables the service. disable_stop - Stops and disables the service. |
-a, --autostart-all= |
enable | disable |
Disables or enables the auto start option for all services. |
You can only run a command against a single service at a time using the command line arguments. For example, if you wanted to enable the Service Manager to restart the Email LAM in the event of a failover:
service_manager --service=email_lam --command=enable
There are two ways to disable the default service restart functionality.
Disable theprocess_keepalive.shcronjob by removing it from the cron table, the commands scheduled to run on your system. To edit the cron table:
crontab -e
Delete or comment out/usr/share/moogsoft/bin/utils/process_keepalive.sh 2>&1 from the file.
Alternatively, disable the Service Manager utility:
$MOOGSOFT_HOME/bin/utils/service_manager -a disable
To check the Service Manager logs you can view:
/var/log/moogsoft/process/keepalive.log
This contains all logs relating to the Service Manager utility (service_manager) and to the process that attempts to keep the services alive (process_keepalive).
You can enable SSL to encrypt communications between all Cisco Crosswork Situation Manager components and the MySQL database.
For information on creating SSL keys and certificates for MySQL, see Creating SSL and RSA Certificates and Keys using MySQL.
To establish trust for the MySQL database certificate, create a truststore to house the root certificate for the Certificate Authority that signed the MySQL Server certificate.
1. If you upgraded from a previous version of Cisco Crosswork Situation Manager, run the following command to extract the certificate for the root CA for MySQL:
mysql_ssl_rsa_setup
The command generates new keys and writes them to the /var/lib/mysql directory.
2. Run the java keytool command to create a trust store containing the certificate for the root CA for MySQL.
keytool -import -alias mysqlServerCACert -file /var/lib/mysql/ca.pem -keystore $MOOGSOFT_HOME/etc/truststore
a. When keytool prompts you, enter a password for the keystore. You will need this password to configure Cisco Crosswork Situation Manager.
b. Answer 'yes' to "Trust this certificate."
Keytool creates a truststore at the path $MOOGSOFT_HOME/etc/truststore.
After you have created the truststore, edit the Cisco Crosswork Situation Manager configuration to enable SSL.
· Edit $MOOGSOFT_HOME/config/system.conf.
· Inside the MySQL property, uncomment the SSL property and the properties that comprise it. Make sure to uncomment the opening "{" and closing braces "}". For example:
,“ssl” :
{
# # The location of the SSL truststore.
# #
# # Relative pathing can be used, i.e. ‘.’ to mean current directory,
# # ‘../truststore’ or ‘../../truststore’ etc. If neither relative
# # nor absolute (using ‘/’) path is used then $MOOGSOFT_HOME is
# # prepended to it.
# # i.e. “config/truststore” becomes “$MOOGSOFT_HOME/config/truststore”
# #
# #
# # Specify the server certificate.
# #
“trustStorePath” : “etc/truststore”,
# “trustStoreEncryptedPassword” : “vQj7/yom7e5ensSEb10v2Rb/pgkaPK/4OcUlEjYNtQU=“,
“trustStorePassword” : “moogsoft”
}
· Provide the path to the truststore you created. For example:
"trustStorePath" : "etc/truststore",
· Edit the password for the truststore. For example:
"trustStorePassword" : "moogsoft"
See Moog Encryptor if you want to use an encrypted password. Uncomment trustStoreEncryptedPassword and provide the encrypted password for the value. For example:
“trustStoreEncryptedPassword” : “vQj7/yom7e5ensSEb10v2Rb/pgkaPK/4OcUlEjYNtQU=“
· Save your changes and restart the following components:
— Moogfarmd
— Apache Tomcat
— All LAMs
After you restart, all Cisco Crosswork Situation Manager components encrypt communications with the MySQL database.
This topic describes the commands for starting, stopping or restarting individual Cisco Crosswork Situation Manager processes.
To configure process startup when Cisco Crosswork Situation Manager fails or restarts see Configure Services to Restart.
Note
When restarting all nodes in a distributed system, the node to start first is the last one that you stopped.
LAMs, Moogfarmd, and Apache Tomcat depend on the following system processes: MySQL RabbitMQ, Nginx, and Elasticsearch. When starting Cisco Crosswork Situation Manager processes, follow these steps:
1. Start or verify the following are started:
— MySQL
— RabbitMQ
— Nginx
— Elasticsearch
2. Start or restart LAMs, Moogfarmd, and/or Apache Tomcat.
Similarly, if you plan to stop any one of MySQL, RabbitMQ, Nginx, or Elasticsearch, stop LAMs, Moogfarmd, and Apache Tomcat first.
If you performed an RPM installation as root, use the service init script to start and stop Cisco Crosswork Situation Manager processes:
service <service-name> start|stop|restart
The service names are as follows:
· MySQL: mysql
· RabbitMQ: rabbitmq
· Nginx: nginx
· Elasticsearch: elasticsearch
· Tomcat: apache-tomcat
· Moogfarmd: moog_farmd
· For LAMs , refer to the individual LAM references for the service names.
For more information, see the documentation on managing system services for your operating system.
You can configure the various components of Cisco Crosswork Situation Manager using the system configuration file.
Edit the configuration file to control the behavior of the different components in your Cisco Crosswork Situation Manager system. You can find the file at $MOOGSOFT_HOME/config/system.conf.
See the System Configuration Reference for a full description of all properties. Some properties in the file are commented out by default. Uncomment properties to configure and enable them.
You can edit your Message Bus and RabbitMQ configuration in the mooms section of the file. It allows you to:
· Configure your Message Bus zones and brokers.
· Control and minimize message loss during a failure.
· Control how senders handle Message Bus failures.
· Control what happens during periods of extended Message Bus unavailability.
· Configure the SSL protocol you want to use.
· Specify the number of connections to use for each message sender pool.
For more information see the Message Bus documentation.Configure the Message Bus
You can edit your database configuration in the mysql section of the file:
1. Configure your host name, database names and database credentials:
a. host: Name of your host.
b. moogdb_database_name: Name of the Moogdb database.
c. referencedb_database_name: Name of the Cisco Crosswork Situation Manager reference database.
d. intdb_database_name: Name of the Cisco Crosswork Situation Manager integrations database.
e. username:Username for the MySQL user that accesses the database.
f. encrypted_password: Encrypted password for the MySQL user.
g. password: Password for the MySQL user.
h. port: Default port that Cisco Crosswork Situation Manager uses to connect to MySQL.
2. Configure the port, deadlock retry attempts and multi-host connections:
a. maxRetries: Maximum number of retries in the event of a MySQL deadlock.
b. retryWait: Number of milliseconds to wait between each retry attempt.
c. failover_connections: Hosts and ports for the different servers that are connected to the main host.
3. Configure the SSL connections to the MySQL database:
a. trustStorePath: Path to location that stores the server certificate.
b. trustStoreEncryptedPassword: Path to location that stores your encrypted trustStore password.
c. trustStorePassword: Path to location that stores your trustStore password.
You can edit your search configuration in the search section of the file:
· Configure the Elasticsearch connection timeouts:
— connection_timeout: Length of time in milliseconds before the connection times out.
— request_timeout: Length of time in milliseconds before the request times out.
· Configure the Elasticsearch limit and nodes:
— refresh_interval: Defines how often an Elasticsearch index refreshes. A newly indexed document is not visible in search results until the next time the index refreshes. Default is 30 seconds.
— limit: Maximum number of search results that Elasticsearch returns from a search query.
— nodes: Hosts and ports for the Elasticsearch servers connected in a cluster.
You can edit failover configuration in the failover section of the file:
1. Configure persistence in the event of a failover:
— persist_state: Enable or disable the persistence of the state of all Moolets in the event of a failover.
2. Configure the Hazelcast cluster, this is Cisco Crosswork Situation Manager implementation of persistence:
— network_port: Port to connect to on each specified host.
— auto_increment: Enable for Hazelcast to attempt to the next incremental available port number if the configured port is unavailable.
— hosts: List of hosts that can participate in the cluster.
— man_center: Configures the cluster information that you can view in the Hazelcast Management Center UI.
— cluster_per_group: Enable the stateful information from each process group to persist in a dedicated Hazelcast cluster.
3. Configure failover options that apply to Moogfarmd and the LAMs:
— keepalive_interval: Time interval in seconds at which processes report their active/passive status and check statuses of other processes.
— margin: Amount of time in seconds after keepalive_intervalbefore Cisco Crosswork Situation Manager considers processes that do not report their status to be dead.
— failover_timeout: Number of seconds to wait for previously active process to become passive during a manual failover.
— automatic_failover: Allow a passive process to automatically become active if no other active processes are detected in the same process group.
— heartbeat_failover_after: Number of consecutive heartbeats that a process fails to send before Moogfarmd considers it inactive.
You can edit the process monitor configuration in the process_monitor section of the file:
· Configure the heartbeat interval and delay:
— heartbeat: Interval in milliseconds between heartbeats sent by processes.
— max_heartbeat_delay: Number of milliseconds to wait before declaring heartbeat as missing.
· Configure the Moogfarmd and which processes you can control from the UI:
— group: Name of the group of processes and subcomponent processes that you want to control from the UI.
— instance: Name of the instance of Cisco Crosswork Situation Manager you want to configure.
— service_name: Name of the service you want to control.
— process_type: Type of process you want to control.
— reserved: Determines if Cisco Crosswork Situation Manager considers the process as critical in process monitoring.
You can edit the encryption configuration in the encryption section of the file:
· encryption_key_file: Default location of the encryption key file.
You can edit the high availability configuration in the ha section of the file.
· cluster: Default HA cluster name
You can edit the port range that Cisco Crosswork Situation Manager services use when they look for open ports.
port_range_min: Minimum port number in the range.
port_range_max: Maximum port number in the range.
The following example shows system.conf with the default configuration and all available properties enabled:
{
"mooms": {
"zone": "",
"brokers": [{
"host": "localhost",
"port": 5672
}],
"username": "moogsoft",
"password": "m00gs0ft",
"encrypted_password": "e5uO0LY3HQJZCltG/caUnVbxVN4hImm4gIOpb4rwpF4=",
"threads": 10,
"message_persistence": false,
"message_prefetch": 100,
"max_retries": 100,
"retry_interval": 200,
"cache_on_failure": false,
"cache_ttl": 900,
"connections_per_producer_pool": 2,
"confirmation_timeout": 2000,
"ssl": {
"ssl_protocol": "TLSv1.2",
"server_cert_file": "server.pem",
"client_cert_file": "client.pem",
"client_key_file": "client.key"
}
},
"mysql": {
"host": "localhost",
"moogdb_database_name": "moogdb",
"referencedb_database_name": "moog_reference",
"intdb_database_name": "moog_intdb",
"username": "ermintrude",
"encrypted_password": "vQj7/yom7e5ensSEb10v2Rb/pgkaPK/4OcUlEjYNtQU=",
"password": "m00",
"port": 3306,
"maxRetries": 10,
"retryWait": 50,
"failover_connections": [
{
"host": "193.221.20.24",
"port": 3306
},
{
"host": "143.47.254.88",
"port": 3306
},
{
"host": "234.118.117.132",
"port": 3306
}
],
"ssl": {
"trustStorePath": "etc/truststore",
"trustStoreEncryptedPassword": "vQj7/yom7e5ensSEb10v2Rb/pgkaPK/4OcUlEjYNtQU=",
"trustStorePassword": "moogsoft"
}
},
"search": {
"connection_timeout": 1000,
"request_timeout": 10000,
"refresh_interval": 30,
"limit": 1000,
"nodes": [{
"host": "localhost",
"port": 9200
}]
},
"failover": {
"persist_state": false,
"hazelcast": {
"network_port": 5701,
"auto_increment": true,
"hosts": ["localhost"],
"man_center":
{
"enabled": false,
"host": "localhost",
"port": 8091
},
"cluster_per_group": false
},
"keepalive_interval": 5,
"margin": 10,
"failover_timeout": 10,
"automatic_failover": false,
"heartbeat_failover_after": 2
},
"process_monitor": {
"heartbeat": 10000,
"max_heartbeat_delay": 1000,
"processes": [{
"group": "moog_farmd",
"instance": "",
"service_name": "moogfarmd",
"process_type": "moog_farmd",
"reserved": true,
"subcomponents": [
"AlertBuilder",
"Default Cookbook",
"TeamsMgr",
"Housekeeper",
"AlertRulesEngine",
"SituationMgr",
"Notifier"
]},
{
"group": "servlets",
"instance": "",
"service_name": "apache-tomcat",
"process_type": "servlets",
"reserved": true,
"subcomponents": [
"moogsvr",
"moogpoller",
"toolrunner",
"situation_similarity"
]},
{
"group": "logfile_lam",
"instance": "",
"service_name": "logfilelamd",
"process_type": "LAM",
"reserved": false
},
{
"group": "rest_lam",
"instance": "",
"service_name": "restlamd",
"process_type": "LAM",
"reserved": false
},
{
"group": "socket_lam",
"instance": "",
"service_name": "socketlamd",
"process_type": "LAM",
"reserved": false
},
{
"group": "trapd_lam",
"instance": "",
"service_name": "trapdlamd",
"process_type": "LAM",
"reserved": false
},
{
"group": "rest_client_lam",
"instance": "",
"service_name": "restclientlamd",
"process_type": "LAM",
"reserved": false
}
]
},
"encryption": {
"encryption_key_file": "/location/of/.key"
},
"ha": {
"cluster": "MOO"
},
"port_range_min": 50000,
"port_range_max": 51000
}
Restart the Moogfarmd service to activate any changes you make to the system configuration file.
The service name is moogfarmd.
See Control Moogsoft AIOps Processes for further details.
This is a reference for the system configuration file located at $MOOGSOFT_HOME/config/system.conf. It contains the following sections and properties:
The number of connections to use for each message sender pool. For example, if a message sender pool has 20 channels and this property is set to 2, the channels are split across both connections so that each has 10 channels. To configure this property, you must manually add it to the mooms section.
Type |
Integer |
Required |
No |
Default |
2 |
Name of the zone.
Type |
String |
Required |
No |
Default |
N/A |
Hostname and port number of the RabbitMQ broker.
Type |
Array |
Required |
No |
Default |
"host" : "localhost", "port" : 5672 |
Username of the RabbitMQ user. This needs to match the RabbitMQ broker configuration. If commented out, it uses the default "guest" user.
Type |
String |
Required |
No |
Default |
guest |
Password for the RabbitMQ user. You can choose to either have a password or an encrypted password, you cannot use both.
Type |
String |
Required |
Yes. If you are not using encrypted password. |
Default |
guest |
Encrypted password for the RabbitMQ user. You can choose to either have a password or an encrypted password, you cannot use both. See Moog Encryptor if you want to encrypt your password.
Type |
String |
Required |
Yes. If you are not using password. |
Default |
N/A |
Number of threads a process can create in order to consume the messages from the Message Bus. If not specified, the thread limit = (Number of processors x 2) + 1. Altering this limit affects the performance of Cisco Crosswork Situation Manager processes such as Moogfarmd and Moogpoller.
If your logs indicate an issue in creating threads, Cisco advises that you increase the ulimit, the maximum number of file descriptors each process can use, for the Cisco Crosswork Situation Manager user. You can set this limit in /etc/security/limits.conf.
Type |
Integer |
Required |
No |
Default |
10 |
Controls whether RabbitMQ persists important messages. Message queues are durable by default and data is replicated between nodes in High Availability mode. Setting this value to false means that replicated data is not stored to disk.
Type |
Boolean |
Required |
No |
Default |
true |
Controls how many messages a process can take from the Message Bus and store in memory as a buffer for processing. This configuration allows processes to regulate message consumption which can ease backlog and memory consumption issues. The higher the number, the more messages held in the process's memory. Set to 0 for unlimited processing. To achieve high availability of messages and ensure messages are processed, the value of this should be higher than 0.
Type |
Integer |
Required |
No |
Default |
0 |
Maximum number of attempts to resend a message that failed to send. Cisco Crosswork Situation Manager only attempts a retry when there is a network outage or if cache_on_failure is enabled.
You can use this in conjunction with the retry_interval property. For example, a combination of 100 maximum retries and 200 milliseconds for retry interval leads to a total of 20 seconds. The combined default value for these properties was chosen to handle the typical time for a broker failover in a clustered environment.
Type |
Integer |
Required |
No |
Default |
100 |
Maximum length of time to wait in milliseconds between each attempt to retry and send a message that failed to send.
You can use this in conjunction with the max_retries property. The combined value for these properties was chosen to handle the typical time for broker failover in a clustered environment.
Type |
Integer |
Required |
No |
Default |
200 |
Controls whether Cisco Crosswork Situation Manager caches the message internally and resends it if there is an initial retry failure. The system attempts to resend any cached messages in the order they were cached until the time-to-live value, defined by the cache_ttl property, is reached.
Type |
Boolean |
Required |
No |
Default |
false |
Length of time in seconds that Cisco Crosswork Situation Manager keeps cached messages in the cache list before discarding them. If a message is not successfully resent within this timeframe it is still discarded.
This defaults to 900 seconds (15 minutes). Increasing this value has a direct impact on sender process memory.
Type |
Integer |
Required |
No |
Default |
900 |
Length of time in milliseconds to wait for the Message Bus to confirm that a broker has received a message. Cisco does not advise changing this value.
Type |
Integer |
Required |
No |
Default |
2000 |
SSL protocol you want to use. JRE 8 supports "TLSv1.2", "TLSv1.1", "TLSv1" or "SSLv3".
Type |
String |
Required |
No |
Default |
TLSv1.2 |
Path to the directory that contains the SSL certificates. You can use a relative path based upon the $MOOGSOFT_HOME directory. For example, config indicates $MOOGSOFT_HOME/config.
Type |
String |
Required |
No |
Default |
server.pem |
Enables client authentication if you provide a client certificate and key file.
Type |
String |
Required |
No |
Default |
client.pem |
Enables client authentication if you provide a client key file. The file must be in PKCS#8 format.
Type |
String |
Required |
No |
Default |
client.key |
Host name or server name of the server that is running MySQL.
Type |
String |
Required |
No |
Default |
localhost |
Name of the primary Cisco Crosswork Situation Manager database.
Type |
String |
Required |
No |
Default |
moogdb |
Name of the Cisco Crosswork Situation Manager reference database.
Type |
String |
Required |
No |
Default |
moog_reference |
Name of the integrations database.
Type |
String |
Required |
No |
Default |
moog_intdb |
Username of the MySQL user.
Type |
String |
Required |
No |
Default |
ermintrude |
Password for the MySQL user. You can choose to either have a password or an encrypted password, you cannot use both.
Type |
String |
Required |
Yes, if you are not using encrypted password. |
Default |
m00 |
Encrypted password for the MySQL user. You can choose to either have a password or an encrypted password, you cannot use both. See Moog Encryptor if you want to encrypt your password.
Type |
String |
Required |
Yes, if you are not using password. |
Default |
N/A |
Port that MySQL uses.
Type |
Integer |
Required |
No |
Default |
3306 |
Maximum number of MySQL query retries to attempt in the event of a deadlock.
Type |
Integer |
Required |
No |
Default |
10 |
Length of time in milliseconds to wait between retry attempts.
Type |
Integer |
Required |
No |
Default |
50 |
Hosts and ports for the different servers that are connected to the main host. For example, master-master, master-slave. In the event of connection failover, the connection cannot be read-only (slave).
Type |
List |
Required |
No |
Default |
N/A |
Path to tNohe directory that contains the trustStore you want to use for SSL connections to your MySQL database. You can use a relative path based upon the $MOOGSOFT_HOME directory. For example, config indicates $MOOGSOFT_HOME/config/truststore.
Type |
String |
Required |
No |
Default |
etc/truststore |
Your encrypted trustStore password. You can choose to either have a password or an encrypted password, you cannot use both. See Moog Encryptor if you want to encrypt your password.
Type |
String |
Required |
Yes, if you are not using trustStorePassword. |
Default |
N/A |
Your trustStore password. You can choose to either have a password or an encrypted password, you cannot use both.
Type |
String |
Required |
No, if you are not using trustStoreEncryptedPassword. |
Default |
moogsoft |
Length of time in milliseconds before the connection to the Elasticsearch server times out.
Type |
Integer |
Required |
No |
Default |
1000 |
Hosts and ports for the different Elasticsearch servers connected in a cluster.
Type |
Array |
Required |
No |
Default |
"host" : "localhost", "port" : 9200 |
Enable or disable the persistence of the state of all Moolets in the event of a failover.
Type |
Boolean |
Required |
No |
Default |
false |
Port to connect to on each specified host in your Hazelcast cluster.
Type |
Integer |
Required |
No |
Default |
5701 |
Enable for Hazelcast to attempt to connect to the next incremental available port number if the configured port is unavailable.
Type |
Boolean |
Required |
No |
Default |
true |
List of hosts that can participate in the cluster.
Type |
Array |
Required |
No |
Default |
localhost |
Specifies the cluster information that you can view in the Hazelcast Management Center UI.
Type |
List |
Required |
No |
Default |
"enabled" : false, "host" : "localhost", "port" : 8091 |
Enable the stateful information from each process group to persist in a dedicated Hazelcast cluster.
Type |
Boolean |
Required |
No |
Default |
false |
Time interval in seconds at which processes report their active or passive status and check statuses of other processes.
Type |
Integer |
Required |
No |
Default |
5 |
Amount of time in seconds after keepalive_interval before Cisco Crosswork Situation Manager considers processes that do not re_port their status to be dead.
Type |
Integer |
Required |
No |
Default |
10 |
Amount of time in seconds to wait for previously active process to become passive during manual failover.
Type |
Integer |
Required |
No |
Default |
10 |
Allow a passive process to automatically become active if no other active processes are detected in the same process group.
Type |
Boolean |
Required |
No |
Default |
false |
Interval in milliseconds between heartbeats sent by processes.
Type |
Integer |
Required |
Yes |
Default |
10000 |
Number of milliseconds to wait before declaring heartbeat as missing. Defaults to 10% of the heartbeat.
Type |
Integer |
Required |
No |
Default |
1000 |
Groups of processes that you want to be able to stop, start and restart from Self Monitoring in the Cisco Crosswork Situation Manager UI. For each group you can configure the following options:
Name of the process group that Cisco Crosswork Situation Manager uses when it starts and stops the service.
Type |
String |
Required |
Yes |
Default |
N/A |
Name of the instance for the process.
Type |
String |
Required |
Yes |
Default |
N/A |
Additional identification label that appears in the UI.
Type |
String |
Required |
No |
Default |
N/A |
Name of the process's cluster. This overrides the default cluster for a process. If left empty, the Cisco Crosswork Situation Manager uses the process's default cluster.
Type |
String |
Required |
No |
Default |
N/A |
Name of the service script that Cisco Crosswork Situation Manager uses to control the process. If you do not configure a service name, Cisco Crosswork Situation Manager uses the group name, removing underscores and appending a 'd'. For example, "traplam" becomes "traplamd".
Type |
String |
Required |
No |
Default |
N/A |
Type of process. If left empty, Cisco Crosswork Situation Manager calculates the type based on the group name.
Type |
String |
Required |
No |
Default |
N/A |
Valid Values |
moog_farmd, servlet, LAM |
Determines if the process produces a warning in the UI when it is running. Processes that are unreserved do not produce a warning.
Type |
Boolean |
Required |
No |
Default |
true |
Specifies which Moolets are reserved for the Moogfarmd process. If left empty, no Moolets are reserved for the Moogfarmd process.
Type |
Array |
Required |
No |
Default |
N/A |
Default location of the encryption key file.
Type |
String |
Required |
No |
Default |
/location/of/.key |
Default HA cluster name.
Type |
String |
Required |
No |
Default |
MOO |
Minimum port number in the range that the Cisco Crosswork Situation Manager services use when they look for open ports.
Type |
String |
Required |
No |
Default |
50000 |
Maximum port number in the range that the Cisco Crosswork Situation Manager services use when they look for open ports.
Type |
String |
Required |
No |
Default |
51000 |
Cisco Crosswork Situation Manager includes a self-signed certificate by default. If you want to add your own certificates to Nginx, follow the instructions below.
A valid SSL certificate is required if you want to use Cisco Crosswork Situation Manager for Mobile on an iPhone. This is because WebSockets do not work on iOS with self-signed certificates. If a valid root CA certificate is not added, a 'Connection Error' appears at login and Cisco Crosswork Situation Manager for Mobile does not work.
For more information, see the Nginx documentation.
To apply a valid certificate to Nginx, edit $MOOGSOFT_HOME/etc/cots/nginx/conf.d/moog-ssl.conf :
vi $MOOGSOFT_HOME/etc/cots/nginx/conf.d/moog-ssl.conf
Change the default self-signed certificate and key locations to point to the valid root certificate and key:
#ssl_certificate /etc/nginx/ssl/certificate.pem;
#ssl_certificate_key /etc/nginx/ssl/certificate.key;
ssl_certificate /etc/certificates/your_company_certificate.crt;
ssl_certificate_key /etc/certificates/your_company_certificate.key;
Reload Nginx with the command:
systemctl reload nginx
You can enable SSL to encrypt communications between all Cisco Crosswork Situation Manager components and the MySQL database.
For information on creating SSL keys and certificates for MySQL, see Creating SSL and RSA Certificates and Keys using MySQL.
To establish trust for the MySQL database certificate, create a truststore to house the root certificate for the Certificate Authority that signed the MySQL Server certificate.
1. If you upgraded from a previous version of Cisco Crosswork Situation Manager, run the following command to extract the certificate for the root CA for MySQL:
mysql_ssl_rsa_setup
The command generates new keys and writes them to the /var/lib/mysql directory.
2. Run the java keytool command to create a trust store containing the certificate for the root CA for MySQL.
keytool -import -alias mysqlServerCACert -file /var/lib/mysql/ca.pem -keystore $MOOGSOFT_HOME/etc/truststore
— When keytool prompts you, enter a password for the keystore. You will need this password to configure Cisco Crosswork Situation Manager.
— Answer 'yes' to "Trust this certificate."
Keytool creates a truststore at the path $MOOGSOFT_HOME/etc/truststore.
After you have created the truststore, edit the Cisco Crosswork Situation Manager configuration to enable SSL.
· Edit $MOOGSOFT_HOME/config/system.conf.
· Inside the MySQL property, uncomment the SSL property and the properties that comprise it. Make sure to uncomment the opening "{" and closing braces "}". For example:
,“ssl” :
{
# # The location of the SSL truststore.
# #
# # Relative pathing can be used, i.e. ‘.’ to mean current directory,
# # ‘../truststore’ or ‘../../truststore’ etc. If neither relative
# # nor absolute (using ‘/’) path is used then $MOOGSOFT_HOME is
# # prepended to it.
# # i.e. “config/truststore” becomes “$MOOGSOFT_HOME/config/truststore”
# #
# #
# # Specify the server certificate.
# #
“trustStorePath” : “etc/truststore”,
# “trustStoreEncryptedPassword” : “vQj7/yom7e5ensSEb10v2Rb/pgkaPK/4OcUlEjYNtQU=“,
“trustStorePassword” : “moogsoft”
}
· Provide the path to the truststore you created. For example:
"trustStorePath" : "etc/truststore",
· Edit the password for the truststore. For example:
"trustStorePassword" : "moogsoft"
See Moog Encryptor if you want to use an encrypted password. Uncomment trustStoreEncryptedPassword and provide the encrypted password for the value. For example:
“trustStoreEncryptedPassword” : “vQj7/yom7e5ensSEb10v2Rb/pgkaPK/4OcUlEjYNtQU=“
· Save your changes and restart the following components:
— Moogfarmd
— Apache Tomcat
— All LAMs
After you restart, all Cisco Crosswork Situation Manager components encrypt communications with the MySQL database.
Cisco Crosswork Situation Manager includes an encryptor utility so you can encrypt passwords stored in the system.conf configuration file. Encrypted passwords in configuration files are more secure because someone with access to the configuration cannot necessarily gain access to integrated systems.
If you run in a distributed environment, run the encryptor utility on one host to create an encryption key (.key). Then copy the key to the $MOOGSOFT_HOME/etc/ directory on the remaining hosts.
To encrypt a password, execute the moog_encryptor command as follows:
$MOOGSOFT_HOME/bin/moog_encryptor -p <password>
For example, to encrypt the password "Abacus":
/usr/share/moogsoft/bin/moog_encryptor -p Abacus
The moog_encryptor displays the encrypted password:
The encrypted password is:
KfFJGilmGGJP/qTrJV6SBs0HTTy3NpCqvGaYKviDbLQ=
When using the encrypted password within JavaScript code or a JSON file, use:
{"encrypted_password":"KfFJGilmGGJP/qTrJV6SBs0HTTy3NpCqvGaYKviDbLQ="}
Note
Each time you run moog_encryptor, it generates a different encrypted password.
You can use passwords encrypted with moog_encryptor in the system.conf file as follows:
1. Edit $MOOGSOFT_HOME/config/system.conf.
2. Identify the password you want to replace and uncomment the encrypted_password property. Comment out the password property. For example:
"username" : "moogsoft",
#"password" : "Abacus",
"encrypted_password" : "e5uO0LY3HQJZCltG/caUnVbxVN4hImm4gIOpb4rwpF4=",
3. Set the value of the encrypted_password property to the value returned from the moog_encryptor. For example:
"encrypted_password":"KfFJGilmGGJP/qTrJV6SBs0HTTy3NpCqvGaYKviDbLQ=",
4. Change the value of the password property so that it does not match the unencrypted value of the password.
By default, the encryptor utility uses a key at the following location:
$MOOGSOFT_HOME/etc/.key
The encryptor utility creates a new key if one does not already exist.
If you want to use a different location for the key, uncomment the encryption section in system.conf. Set the value of the encryption_key_file property to a new path for the key. For example:
# Uncomment the encryption section if you want to specify the location
# for the encryption key file.
,
"encryption" :
{
# Use this to change the default location of the encryption key file
"encryption_key_file" : "/usr/share/example/.key"
}
Note
You must configure Cisco Crosswork Situation Manager to use the same .key file you used to encrypt passwords. If you encrypt a password using one key and then change the configuration to use another key, decryption fails.
You can configure different login authentication and security methods with Cisco Crosswork Situation Manager. For more information see:
1. Configure Single Sign-On with LDAP
2. Configure Single Sign-On with SAML
3. Security Configuration Reference
You can configure Cisco Crosswork Situation Manager so users from an external directory can log in by Single Sign-On (SSO) using Security Assertion Markup Language (SAML).
When you enable the SAML integration, your SAML identity provider (IdP) can exchange authorization and authentication data securely with your service provider (SP), Cisco Crosswork Situation Manager. The integration redirects you from the standard Cisco Crosswork Situation Manager login page to the IdP's login page. You can log in to Cisco Crosswork Situation Manager if you provide the IdP with valid authentication details.
See SAML Strategies and Tips for strategies to help you decide how to configure the SAML integration.
Cisco Crosswork Situation Manager implements SAML 2.0 using the OpenSAML v3 library. See SAML 2.0 for information on supported bindings and OpenSAML v3 for information on the library.
Before you start to set up SAML, ensure you have met the following requirements:
· You have an active SAML IdP account with administrator privileges.
· The webhost URL is the same as your Cisco Crosswork Situation Manager instance URL. For example:
webhost: "https://example.com"
Configure your IdP to integrate with Cisco Crosswork Situation Manager and enable SSO. Refer to your IdP's documentation for instructions.
Configuration differs for each IdP but common settings include:
· SSO URL: The URL that sends a SAML login request to the IdP. For example:
https://example.com/moogsvr/mooms?request=samlRequest
· Assertion Consumer Service URL: The Cisco Crosswork Situation Manager URL that receives the IdP response to each SAML assertion:
https://example.com/moogsvr/mooms?request=samlResponse
· Entity ID: A unique identifier for the SP SAML entity. For example:
https://example.com/moogsvr/mooms
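All three endpoints above follow a single pattern derived from the webhost URL. The sketch below is illustrative only (the function name saml_endpoints is an assumption, not part of the product):

```python
def saml_endpoints(webhost):
    # The common IdP settings are all built on the /moogsvr/mooms endpoint.
    base = webhost.rstrip("/") + "/moogsvr/mooms"
    return {
        "sso_url": base + "?request=samlRequest",    # SSO URL
        "acs_url": base + "?request=samlResponse",   # Assertion Consumer Service URL
        "entity_id": base,                           # Entity ID
    }

print(saml_endpoints("https://example.com")["sso_url"])
# https://example.com/moogsvr/mooms?request=samlRequest
```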
After you complete your IdP configuration, the IdP generates a metadata file in .xml format. Some IdPs also allow you to generate an X509 self-signed certificate.
Save the certificate and add it to your SP metadata file if you want your IdP to encrypt SAML assertions.
The .xml metadata file generated by the IdP provides Cisco Crosswork Situation Manager with a security certificate, endpoints and other processing requirements.
To add this file to your SAML configuration:
· Save the IdP metadata file to your local machine.
· Copy the file to the location $MOOGSOFT_HOME/etc/saml.
· Grant the Apache Tomcat user read permissions to the file. For example:
chmod 644 my_idp_metadata.xml
You enable SAML authentication in Cisco Crosswork Situation Manager by creating and configuring a SAML realm. You can only configure and use one SAML realm at a time. See Security Configuration Reference for a full description of the available properties.
To configure your SAML realm:
· Edit the file $MOOGSOFT_HOME/config/security.conf and uncomment the "my_saml_realm" section. Rename the realm to meet your requirements.
· Configure the locations of your metadata files:
— idpMetadataFile: Location of the IdP's metadata file.
— spMetadataFile: Location of the service provider's metadata file. When Apache Tomcat generates the metadata file at the end of this procedure, it saves the file to this location.
· Configure the roles, teams and primary group mappings for new users that log in to Cisco Crosswork Situation Manager using SAML. These are all required:
— defaultRoles: Default roles that Cisco Crosswork Situation Manager assigns to new users at first login.
— defaultTeams: Default teams that Cisco Crosswork Situation Manager assigns to new users at first login.
— defaultGroup: Default primary group that Cisco Crosswork Situation Manager assigns to new users at first login.
· Configure the mappings for existing users that log in to Cisco Crosswork Situation Manager using SAML. You can choose either username or email:
— existingUserMappingField: Defines the field that Cisco Crosswork Situation Manager uses to map existing users to your IdP users.
· Configure the mapping of the IdP's provided attributes. These are all required:
— username: Defines the IdP user attribute that maps to username in Cisco Crosswork Situation Manager.
— email: Defines the IdP user attribute that maps to email in Cisco Crosswork Situation Manager.
— fullname: Defines the IdP user attribute that maps to full name in Cisco Crosswork Situation Manager.
· Optionally configure additional IdP attribute mappings:
— contactNumber: Defines the IdP attribute that maps to contact number in Cisco Crosswork Situation Manager.
— department: Defines the IdP attribute that maps to department in Cisco Crosswork Situation Manager.
— primaryGroup: Defines the IdP attribute that maps to primary group in Cisco Crosswork Situation Manager.
— timezone: Defines the IdP attribute that maps to timezone in Cisco Crosswork Situation Manager.
— teamAttribute: Defines the IdP attribute that maps to teams in Cisco Crosswork Situation Manager.
— teamMap: Maps IdP team values to team names in Cisco Crosswork Situation Manager.
— createNewTeams: Creates a team or teams if they do not already exist in Cisco Crosswork Situation Manager.
— roleAttribute: Defines the IdP attribute containing role information.
— roleMap: Maps IdP role values to roles in Cisco Crosswork Situation Manager.
· Optionally configure your keystore and private key passwords if you want to use encryption with SAML. You can have either an unencrypted keystore password or an encrypted keystore password, but you cannot use both.
a. keystorePassword: Your unencrypted keystore password.
b. encryptedKeystorePassword: Your encrypted keystore password.
c. privateKeyPassword: Your private key password.
See Moog Encryptor for more information on encrypting passwords.
· Optionally configure the lifetime of each SAML assertion:
— maximumAuthenticationLifetime: Maximum time in seconds for Cisco Crosswork Situation Manager to receive an IdP's SAML assertion before it becomes invalid.
· Optionally configure the Service Provider Entity ID:
— serviceProviderEntityId: Service Provider Entity ID assertion number.
· Restart the Apache Tomcat service:
service apache-tomcat restart
When Apache Tomcat restarts it generates the Service Provider metadata file. The file is saved to the location specified in the spMetadataFile property.
You can configure the following additional properties when setting up SAML for Cisco Crosswork Situation Manager. Restart Apache Tomcat after you make any of these changes.
To enable encrypted assertion for SAML with Cisco Crosswork Situation Manager, log in to your SAML IdP and enable encrypted assertions. Refer to your IdP's documentation for information.
Once enabled, the IdP encrypts all SAML assertions made with Cisco Crosswork Situation Manager.
The assertion time limit is the period of time between the IdP providing the SAML assertion and Cisco Crosswork Situation Manager accepting it.
Cisco Crosswork Situation Manager accepts a delay of up to one hour by default. You can specify a different period of time in seconds using the maximumAuthenticationLifetime property in the security configuration file for your SAML realm. For example:
"maximumAuthenticationLifetime": 3600
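The freshness check this property controls can be sketched as follows. This is an illustrative model, not the product's actual implementation; the function name and the rejection of future-dated assertions are assumptions.

```python
from datetime import datetime, timedelta, timezone

def assertion_is_fresh(issue_instant, now, lifetime_seconds=3600):
    # Accept only assertions issued within the configured lifetime,
    # and not issued in the future.
    age = now - issue_instant
    return timedelta(0) <= age <= timedelta(seconds=lifetime_seconds)

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(assertion_is_fresh(now - timedelta(minutes=30), now))  # True
print(assertion_is_fresh(now - timedelta(hours=2), now))     # False
```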
You can enable entity ID assertion, also known as audience restriction, to restrict SAML assertions to Cisco Crosswork Situation Manager.
To do this, specify the serviceProviderEntityID property in $MOOGSOFT_HOME/config/security.conf. You must also configure this in your IdP. The values must match for successful SAML authorization. For example:
"serviceProviderEntityId": "MySystemName"
When you create your SAML realm, you can configure the attributes your IdP passes to Cisco Crosswork Situation Manager at SAML authentication.
By default, the IdP email attribute maps to both the Cisco Crosswork Situation Manager username and email. The Cisco Crosswork Situation Manager full name maps to First Name and Last Name from the IdP. For example:
"username": "$Email",
"email": "$Email",
"fullname": "$FirstName.$LastName",
If something goes wrong at login, you may see errors indicating that an attribute mapping is not configured or that the IdP did not provide a configured attribute.
You can map other IdP user attributes such as contact number, department, primary group and time zone. For example:
"contactNumber": "phone",
"department": "department",
"primaryGroup": "primaryGroup",
"timezone": "timezone",
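A minimal sketch of how the $-prefixed tokens in mappings such as "$FirstName $LastName" could be expanded against the attributes in an IdP assertion. The function and sample attribute values are illustrative, not the product's implementation.

```python
import re

def apply_attribute_mapping(template, idp_attributes):
    # Replace each $Name token with the corresponding IdP attribute value.
    return re.sub(r"\$(\w+)", lambda m: idp_attributes[m.group(1)], template)

attrs = {"Email": "jdoe@example.com", "FirstName": "Jane", "LastName": "Doe"}
print(apply_attribute_mapping("$Email", attrs))                # jdoe@example.com
print(apply_attribute_mapping("$FirstName $LastName", attrs))  # Jane Doe
```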
If you already have users in Cisco Crosswork Situation Manager, you can map the user attributes to the IdP using the existingUserMappingField property. For example:
"existingUserMappingField": "username",
When a user logs in via the IdP for the first time but does not map to an existing user entry, Cisco Crosswork Situation Manager creates a new user.
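The matching behavior can be sketched as follows; the helper, the local user records, and the field names are assumptions for illustration only.

```python
def find_existing_user(idp_user, local_users, mapping_field="username"):
    # Match on the configured existingUserMappingField ("username" or "email").
    # Returning None models the case where a new user is created instead.
    value = idp_user[mapping_field]
    return next((u for u in local_users if u[mapping_field] == value), None)

local_users = [{"username": "jdoe", "email": "jdoe@example.com"}]
print(find_existing_user({"username": "jdoe"}, local_users))
# {'username': 'jdoe', 'email': 'jdoe@example.com'}
print(find_existing_user({"username": "new.user"}, local_users))
# None
```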
You can define which primary group, roles and teams to assign to users using the following properties in the SAML realm configuration:
· defaultRoles: Default roles to assign to users.
· defaultTeams: Default teams to assign to users.
· defaultGroup: Default group to assign to users.
· teamAttribute: The IdP's attribute for team names.
· teamMap: Map IdP team names to Cisco Crosswork Situation Manager teams.
· roleAttribute: The IdP's attribute for roles.
· roleMap: Map IdP role names to Cisco Crosswork Situation Manager roles.
For example:
"assignTeams":
{
"teamAttribute": "groups",
"teamMap":
{
"IdP Team": "Networks",
"Another IdP Team": "Application Support"
}
}
"assignRoles":
{
"roleAttribute": "groups",
"roleMap":
{
"IdP Standard User": "Operator",
"IdP Manager User": "Manager"
}
}
Note
You must map both roles and teams through the IdP to prevent users from being assigned to the default role and team.
Enable the createNewTeams property to create new teams and assign newly created users to these teams as part of the SAML login process, instead of assigning new users to the default teams.
"createNewTeams": true
Note
Enable this property with caution. If a user logs in to Cisco Crosswork Situation Manager and createNewTeams is set to true, a new team is defined in Cisco Crosswork Situation Manager for every value found in the teamAttribute property in the user's profile. If you are using the "groups" attribute to determine team membership, this could result in the creation of hundreds of teams that are not referenced by Cisco Crosswork Situation Manager.
Cisco recommends that you enable createNewTeams only with a custom profile attribute that you use specifically to determine Cisco Crosswork Situation Manager team membership and that contains a very limited set of values.
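The team-assignment behavior described above, including the effect of createNewTeams, can be sketched as follows. This is an illustrative model; in particular, falling back to the default teams when no attribute values map is an assumption based on the defaultTeams description.

```python
def resolve_teams(attribute_values, team_map, default_teams, create_new_teams=False):
    # Map each IdP attribute value through teamMap.
    teams = [team_map[v] for v in attribute_values if v in team_map]
    if create_new_teams:
        # Every unmapped value becomes a brand-new team -- the behavior the
        # note above warns about when used with a broad attribute like "groups".
        teams += [v for v in attribute_values if v not in team_map]
    return teams if teams else list(default_teams)

team_map = {"IdP Team": "Networks", "Another IdP Team": "Application Support"}
print(resolve_teams(["IdP Team"], team_map, ["Cloud DevOps"]))
# ['Networks']
print(resolve_teams(["Payroll"], team_map, ["Cloud DevOps"], create_new_teams=True))
# ['Payroll']
```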
After you enable SAML, you can configure a different logout page to display when a Cisco Crosswork Situation Manager user ends their session.
To configure the logout URL:
· Edit the configuration file: $MOOGSOFT_HOME/ui/web.conf.
· Configure the logout property to meet your requirements and save the changes.
An example web configuration file is as follows:
"authentication":
{
"pages":
{
"login": "/login/",
"logout": "/logout/",
"failedLogin": "/login/?error=true",
"sessionTimeout": "/logout/?error=session",
"dbFailure": "/login/?error=dbfailure"
},
"paramNames":
{
"userId": "userid",
"password": "password"
}
}
An example SAML realm in $MOOGSOFT_HOME/config/security.conf is as follows:
"my_saml_realm":
{
"realmType": "SAML2",
"idpMetadataFile": "/usr/share/moogsoft/etc/saml/my_idp_metadata.xml",
"spMetadataFile": "/usr/share/moogsoft/etc/saml/my_sp_metadata.xml",
"defaultRoles": [ "Operator" ],
"defaultTeams": [ "Cloud DevOps" ],
"defaultGroup": "End-User",
"existingUserMappingField": "username",
"username": "$Email",
"email": "$Email",
"fullname": "$FirstName $LastName",
"contactNumber": "phoneNumber",
"department": "dept",
"primaryGroup": "group",
"timezone": "timezone",
"assignTeams":
{
"teamAttribute": "groups",
"createNewTeams": true,
"teamMap":
{
"Cloud Team": "Cloud DevOps",
"Database Team": "Database DevOps"
}
},
"assignRoles" :
{
"roleAttribute": "groups",
"roleMap":
{
"Standard User": "Operator",
"Manager User": "Manager"
}
},
"keystorePassword": "my_realm_secret",
"privateKeyPassword": "my_realm_secret",
"maximumAuthenticationLifetime": 60,
"serviceProviderEntityId": "MySystemName"
}
When you have configured the SAML realm, copy your SP metadata file and send it to the administrator of your IdP. For example:
$MOOGSOFT_HOME/etc/saml/my_sp_metadata.xml
Your IdP must import the metadata file. Note that all certificates are self-signed.
See Troubleshoot SAML for ideas to help you debug SAML connection and configuration problems.
You must build a Service Provider (SP) metadata file in order to configure SAML-based Single Sign-On with Cisco Crosswork Situation Manager.
The SP metadata .xml file contains all of the keys, services and URLs defining the SAML endpoints. You can use your IdP's SP metadata file generator if it has one. If not you can create the file manually.
To manually create your SP metadata file:
1. Copy the .xml template from the code block:
<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
entityID="https://localhost/moogsvr/mooms">
<md:SPSSODescriptor AuthnRequestsSigned="true" WantAssertionsSigned="true"
protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
<md:KeyDescriptor>
<ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<ds:X509Data>
<ds:X509Certificate> MIIC/jCCAeagAwIBAgIQCGehfcnv6r5My/fnrbfDejANBgkqhkiG9w0BAQsFADAVMRMwEQYDVQQDEwp3d3cuc3AuY29tMB4XDTEzMTEyMjA4MjMyMVoXDTQ5MTIzMTE0MDAwMFowFTETMBEGA1UEAxMKd3d3LnNwLmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAMPm/ew9jaGWpQS1C7KtpvgzV4nSOIFPgRt/nlRYR+pUWdDEfSKmyjK28nkQ1KKujRJTnvnmZydmUrmEFpVv+giBiUkvCJY3PxZ/EDSsF3R/OzWhkUv5nfAXPnqkX9x22b6+vUof6WiLGyAW6lOYMCVADjTSl9pSaUtIaANdx9maERcT9eQbGSnjim0WurFRYs9ZE8ttErrMH9+Su4246YDqOPAkz6La4cHHMPQdcFQT5p/cuXBfU1vl1tWdBEgAY3xHYZE8u5TTJ/vp9UxyU1MwfeO2g9VDRcokLQHrj6wFxtvufA+WtUKYJGUu2p/qSuaw7eS6UFjUn49aVqg9OacCAwEAAaNKMEgwRgYDVR0BBD8wPYAQ1/S0ibdvfdFkJ9T9oIPluKEXMBUxEzARBgNVBAMTCnd3dy5zcC5jb22CEAhnoX3J7+q+TMv35623w3owDQYJKoZIhvcNAQELBQADggEBAAHlmVoAZUt6paeFvtQbc/iaJe/Fhd+JG1U0jyjlFDcCn8erLihEbhb3mFBBMF25oO67gfA1JJXZrmHry3NlOZuovqRqm8v7wg8n0nQa1HUWkUC2TBgfg1HE8/2rmSF2PngiEi18VOxRDxx0WXMNZX6JebJ1kCOCpT/x7aupS7T1GrIPmDLxjnC9Bet7pRynfomjP/6iU21/xOIF6xB9Yf1a/kQbYdAVt2haYKIfvaF3xsq1X5tCXc9ijhBMgyaoqA+bQJD/l3S8+yCmMxEYZjAVLEkyGlU4Uwo01cKEYbXIG/YVq+4CaIRxIfMvV+j8gzTLHTXI+pHEMfMhyYa0pzM=
</ds:X509Certificate>
</ds:X509Data>
</ds:KeyInfo>
</md:KeyDescriptor>
<md:SingleLogoutService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
Location="https://localhost:44360/SAML/SingleLogoutService"/>
<md:AssertionConsumerService index="0" isDefault="true"
Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
Location="https://localhost/moogsvr/mooms?request=samlResponse"/>
</md:SPSSODescriptor>
</md:EntityDescriptor>
2. Configure the mandatory elements in the metadata file:
a. entityID: Unique identifier or name for the SP. This should be a URL or a URN.
b. AssertionConsumerService: URL or endpoint that receives SAML responses from the IdP.
3. Add the X509 self-signed certificate you create when you configure your IdP.
4. Configure the other elements to meet your requirements. See Service Provider Metadata Reference for full descriptions of the available elements.
5. Save the SP metadata file to a path on your local machine.
After you have created the metadata file, you must copy it to your Cisco Crosswork Situation Manager machine to continue with the SAML configuration. See Configure Single Sign-On with SAML.
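Before handing the file over, you can sanity-check the two mandatory elements with a short script. This helper is illustrative only and not part of the product; the sample metadata string is a trimmed stand-in for a real SP metadata file.

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

def check_sp_metadata(xml_text):
    # Return the entityID and AssertionConsumerService Location, the two
    # mandatory elements described in step 2.
    root = ET.fromstring(xml_text)
    entity_id = root.get("entityID")
    acs = root.find(f".//{{{MD_NS}}}AssertionConsumerService")
    acs_location = acs.get("Location") if acs is not None else None
    return entity_id, acs_location

sample = (
    f'<md:EntityDescriptor xmlns:md="{MD_NS}" entityID="https://localhost/moogsvr/mooms">'
    '<md:SPSSODescriptor>'
    '<md:AssertionConsumerService index="0" '
    'Location="https://localhost/moogsvr/mooms?request=samlResponse"/>'
    '</md:SPSSODescriptor>'
    '</md:EntityDescriptor>'
)
print(check_sp_metadata(sample))
# ('https://localhost/moogsvr/mooms', 'https://localhost/moogsvr/mooms?request=samlResponse')
```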
You can configure Cisco Crosswork Situation Manager so that users from an external directory can log in by Single Sign-On (SSO) using Security Assertion Markup Language (SAML). This topic covers some strategies to help you decide how to configure the SAML integration.
See Configure Single Sign-On with SAML for instructions on configuring the SAML integration and Troubleshoot SAML for information on how to address configuration and connection problems.
You can employ a number of strategies to map your SAML identity provider (IdP) attribute values to Cisco Crosswork Situation Manager teams and roles. The strategy you choose depends on a number of factors, including:
· Whether your IdP contains identifiers that can equate to roles and teams in Cisco Crosswork Situation Manager.
For example, you have an IdP group attribute value that identifies the "Automation" team for which there is a corresponding team in Cisco Crosswork Situation Manager. This team has administrative authority over Cisco Crosswork Situation Manager. In this situation you could use the "groups" attribute, map the Automation team to a Cisco Crosswork Situation Manager team and assign the Administrator role to members of that team in the roles mapping.
· Whether you would prefer to create new identifiers for the Cisco Crosswork Situation Manager teams and roles.
· Whether you want to use an existing attribute name, for example "groups", in your user profiles, or you would prefer to create new attributes for Cisco Crosswork Situation Manager.
The following use cases show examples of these scenarios.
Use existing "groups" attribute values
You already have a granular set of IdP "groups" that you use to assign permissions to your users. The values in each user's "groups" attribute identify the teams the user is associated with, and the role they play in each team.
One group, "Monitoring Tools", has complete administrative authority over the Cisco Crosswork Situation Manager platform.
In this case, you could use the pre-existing "groups" attribute as the source for both the teams mapping and the roles mapping within Cisco Crosswork Situation Manager.
An example configuration file is as follows:
"assignTeams":
{
"teamAttribute": "groups",
"teamMap":
{
"Monitoring_Tools": "Monitoring Tools",
"Application_A_Support": "Application A",
"Application_B_Support": "Application B",
"Network_Support": "Network"
},
"createNewTeams": false
},
"assignRoles":
{
"roleAttribute": "groups",
"roleMap":
{
"Monitoring": "Super User"
}
}
Create new "groups" attribute values
You have reviewed the "groups" assigned to your IdP user profiles, and are unable to identify values that you could use to assign team and role membership to users in Cisco Crosswork Situation Manager.
You want to continue to use the "groups" attribute as a single location to hold permissions information for your users, and therefore you do not want to create another attribute within your user profiles.
In this case, you could add values to the "groups" attribute to identify the team and role to assign to the user in Cisco Crosswork Situation Manager.
In the configuration file for this example shown below, the "EnterpriseSuperUser", "EnterpriseTestUser", and "EnterpriseAdmin" IdP roles in the "groups" attribute map to the "Super User", "Test" and "Administrator" roles in Cisco Crosswork Situation Manager.
"assignTeams":
{
"teamAttribute": "groups",
"teamMap":
{
"Monitoring_Tools": "Monitoring Tools",
"Application_A_Support": "Application A",
"Application_B_Support": "Application B",
"Network_Support": "Network"
},
"createNewTeams": false
},
"assignRoles":
{
"roleAttribute": "groups",
"roleMap":
{
"EnterpriseSuperUser": "Super User",
"EnterpriseTestUser": "Test",
"EnterpriseAdmin": "Administrator"
}
}
You do not have appropriate teams and roles defined within your IdP user profiles, and would like to hold this information for Cisco Crosswork Situation Manager in a unique user profile attribute.
In this case, you could define the attributes in the user profile structure and use the values from these attributes as the source for team and role mappings.
In the configuration file for this example shown below, the new attribute "EnterpriseTeam" contains the IdP teams to map to Cisco Crosswork Situation Manager teams. The new attribute "EnterpriseRole" contains the IdP roles to map to Cisco Crosswork Situation Manager roles.
"assignTeams":
{
"teamAttribute": "EnterpriseTeam",
"teamMap":
{
"Monitoring_Tools": "Monitoring Tools",
"Application_A_Support": "Application A",
"Application_B_Support": "Application B",
"Network_Support": "Network"
},
"createNewTeams": false
},
"assignRoles":
{
"roleAttribute": "EnterpriseRole",
"roleMap":
{
"EnterpriseSuperUser": "Super User",
"EnterpriseTestUser": "Test",
"EnterpriseAdmin": "Administrator"
}
}
Map a single value to many teams or roles
You would like to use a single value in the "groups" attribute of your IdP user profiles to add the user to multiple Cisco Crosswork Situation Manager teams or roles.
All mappings are one to one, so to achieve this you must re-map the value from the user profile's "groups" membership multiple times. Each instance maps to an individual Cisco Crosswork Situation Manager team or role.
You can configure Cisco Crosswork Situation Manager so that users from an external directory can log in by Single Sign-On (SSO) using Security Assertion Markup Language (SAML). This topic contains ideas to help you debug SAML connection and configuration problems. See SAML Strategies and Tips for strategies to help you decide how to configure the SAML integration.
Most SAML integration issues occur as a result of misconfiguration. If checking your configuration using the instructions in Configure Single Sign-On with SAML does not solve the problem, there are two methods you can use to obtain the diagnostic data you require to debug SAML issues.
View the available add-ons for your browser to choose and install a SAML debugging tool. These tools typically show the outgoing request and the response received by Cisco Crosswork Situation Manager. If the payloads are not encrypted, you will be able to see the claims returned in the response from the SAML identity provider (IdP).
Enable "trace" logging for the moogsvr UI component. Once enabled, the $APPSERVER_HOME/logs/catalina.out log file shows the returned claim data as it is processed. Your system administrator can use this data to validate the claim data being returned by the IdP and ensure it is mapped correctly in $MOOGSOFT_HOME/config/security.conf.
See Configure Logging for information on log levels.
You can configure Cisco Crosswork Situation Manager so users from an external directory can log in by Single Sign-On (SSO) using Lightweight Directory Access Protocol (LDAP).
See LDAP version 3 for more information.
Before you start to set up LDAP, ensure you have met the following requirements:
· You have the URL for your LDAP server.
· If you want to use a "lookup" DN (Distinguished Name) resolution method, you have the credentials for the LDAP user who has rights to look up other users and determine their roles.
· If you want to use SSL encryption, you have a valid SSL certificate.
Edit the configuration file to configure and enable LDAP for Cisco Crosswork Situation Manager. You can find the file at $MOOGSOFT_HOME/config/security.conf.
See the Security Configuration Reference for a full description of all properties. Some properties in the file are commented out by default. Uncomment properties to enable them.
· Configure the properties for the LDAP connection:
— url: URL of your LDAP server. This is required.
— connectionTimeout: Connection timeout in milliseconds.
— readTimeout: Read timeout in milliseconds.
— predefinedUser: Determines whether the user must already exist in the local database.
· Configure the user resolution and attribute search section:
— resolutionType: Type of DN resolution method. Valid options are "direct" and "lookup".
— attributeSearchFilter: Defines an optional attribute filter to retrieve all user attributes.
— attributeMap: Defines an attribute map between the LDAP user attributes and the user attributes in the Cisco Crosswork Situation Manager database.
· Configure the LDAP group search section:
— systemUser: Username of the system user to bind and search for user group information.
— systemPassword: Password of the system user to bind and search for user group information.
— groupBaseDn: Defines a group base DN to search for LDAP groups.
— memberAttribute: Attribute used to look for group members. Defaults to "member".
— groupNameAttribute: Attribute used to look for group name.
— roleMap: Defines the role mappings between the user directory and Cisco Crosswork Situation Manager.
— assignTeams: Synchronizes team assignment between the user directory and the teams in Cisco Crosswork Situation Manager.
· Optionally configure SSL if you want to enable TLS authentication:
— ssl_protocol: Defines the SSL protocol you want to use. Defaults to TLSv1.2.
— server_cert_file: SSL server certificate.
— client_cert_file: Client certificate file.
— client_key_file: Client key file.
· Restart Apache Tomcat to activate the changes:
service apache-tomcat restart
See Control Moogsoft AIOps Processes for further details.
An example LDAP configuration that uses direct DN resolution and SSL without client authentication:
"example_ldap":
{
"realmType": "LDAP",
"url": "ldap://mysaml:389",
"userDnResolution":
{
"resolutionType": "direct",
"direct":{
"usernameAttribute": "uid",
"userDnPostfix": "ou=People,dc=moogsoft,dc=com"
}
},
"attributeMap":{
"fullname": "cn",
"email": "mail"
},
"groupBaseDn": "ou=Group,dc=moogsoft,dc=com",
"memberAttribute": "member",
"groupNameAttribute": "cn",
"roleMap":{
"role-admin": "Super User",
"OperatorRole": "Operator"
},
"assignTeams":{
    "teamMap":{
        "CloudDevOps": "Cloud DevOps team",
        "DBDevOps": "Database DevOps team"
    },
    "useGroupName": true,
    "createNewTeams": true
},
"ssl":
{
"server_cert_file": "/usr/share/moogsoft/config/example.crt"
}
}
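For the direct resolution method shown in the example, the bind DN is composed from the login name, the usernameAttribute, and the userDnPostfix. The helper below is an illustrative sketch, not product code.

```python
def build_user_dn(username, username_attribute="uid",
                  user_dn_postfix="ou=People,dc=moogsoft,dc=com"):
    # Direct DN resolution: <usernameAttribute>=<login name>,<userDnPostfix>
    return f"{username_attribute}={username},{user_dn_postfix}"

print(build_user_dn("jdoe"))
# uid=jdoe,ou=People,dc=moogsoft,dc=com
```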
As an Administrator, you control user access to functions in the Cisco Crosswork Situation Manager UI. This ensures that authorized users can perform required functions and prevents unauthorized users from accessing the system and sensitive operations within it.
Functions you can restrict include the ability to assign alerts, assign Situation owners, mark alerts with Probable Root Cause (PRC) feedback, and access Integrations.
Follow these steps to manage user access:
· Create roles for specific job functions.
· Assign the permissions to perform operations to roles.
· Enable user authentication via SAML or LDAP, or manually create users.
· Specify a role for authenticating users, or manually assign roles to users.
· Set up teams (optional).
Roles group the permissions that users need to perform a set of tasks within Cisco Crosswork Situation Manager. You can create roles for specific job functions and assign them the permissions to perform certain operations. For example, you could create a role named Situation Manager and assign it the permissions to perform certain operations, such as managing alerts and Situations.
You must assign at least one role to every user. You can do this manually when you create the user or you can map roles to users if you use SAML.
Cisco Crosswork Situation Manager contains a set of predefined roles with a predefined set of permissions. See Role Permissions for details.
To view a list of roles, navigate to Settings > Roles. You can use the search box on the left to filter the list.
Select a role to view and edit its configuration. You can edit the following:
· Selected permissions: See Role Permissions for a description of each permission.
· Session timeout: You can use the system timeout of 60 minutes or define a custom timeout period.
· Landing page: You can choose to inherit the landing page from the system configuration or select an alternative page.
You can also perform the following operations:
· Click + to create a new role.
· Click a role name and then - to delete a role.
You cannot delete a role that is assigned to users. Remove the role from all users first.
Cisco Crosswork Situation Manager contains the following predefined roles: Super User, Administrator, Manager, Operator, Customer, and Grazer. The REST LAM Sender role is designed for use with REST LAM integrations. The role restricts UI functionality, so you should not assign it to UI users. The default permissions selected for these roles are as follows:
Permission | Description | Super User | Administrator | Manager | Operator | Customer | Grazer
--- | --- | --- | --- | --- | --- | --- | ---
add_media | Attach files to the collaborate tab in Situations and Team Room. Upload a photo to a user avatar. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎
alert_assign | Assign alerts. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
alert_close | Close alerts. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
alert_modify | Manage alerts including changes to significance and severity. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
all_data | View all data. Users without this permission can only view Situation and alert data related to their teams. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎
collab_read | View content, such as files and comments, added by other users, on the Collaborate tab within Team and Situation Rooms. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎
collab_write | Add content, such as files and comments, on the Collaborate tab within Team and Situation Rooms. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
collect_insights | Collect statistics for users with this role for Insights dashboards. | | | | ✔︎ | |
filters | Create and edit the user's own personal Situation and alert filters. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎
graze_login | Log into the Graze API. | | | | | | ✔︎
manage_integrations | Access the Integrations tab. | ✔︎ | | | | |
manage_maint | Manage maintenance windows. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
mobile | Access the Cisco Crosswork Situation Manager application on a mobile device. | | | | | |
moderator_assign | Assign a Situation owner. | ✔︎ | ✔︎ | | ✔︎ | | ✔︎
moolet_informs | Send a Moolet Inform message using Situation and alert tools or the API. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
prc_feedback | Mark alerts with Probable Root Cause feedback. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
share_filters_public | Share filters publicly with all Cisco Crosswork Situation Manager users. | ✔︎ | ✔︎ | | | |
share_filters_teams | Share filters with teams. | ✔︎ | ✔︎ | | | |
sig_close | Close Situations. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
sig_create | Create Situations manually and from alerts. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
sig_modify | Manage Situations including changes to descriptions, queues and alerts. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
sig_resolve | Resolve a Situation. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | | ✔︎
sig_visualize | Access information on what created a Situation. | ✔︎ | ✔︎ | | | | ✔︎
super_privileges | Super user privileges. Enables access to all system settings and the ability to manage dashboards, alert and Situation filters, templates, users, and roles. | ✔︎ | | | | | ✔︎
thread_create | Deprecated permission. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎
view_summary | View the Summary screen. If you remove this permission, the user can no longer access the Summary screen that appears on the system's default landing page. If you remove this permission and the user's landing page is the Summary screen, Cisco Crosswork Situation Manager redirects the user to the Open Situations screen instead. | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎
As with most software systems, you use user credentials to provide secure access to Cisco Crosswork Situation Manager for your personnel. You can use the System Settings UI to manage the various attributes that define users and the actions they are allowed to perform inside Cisco Crosswork Situation Manager.
As an alternative to managing users in the UI, you can configure the system to allow Single Sign-On (SSO) via the Security Assertion Markup Language (SAML) protocol. If you have a large number of users, enabling SSO saves you from setting them up individually. It also improves security by requiring users to remember a single complex password instead of multiple credentials for multiple systems.
Within the SAML configuration you can specify a role, primary group and team to assign to users when they authenticate for the first time. See Configure Single Sign-On with SAML for more information. You can also authenticate users with Lightweight Directory Access Protocol (LDAP). See Configure Single Sign-On with LDAP for more information.
To view a list of users, navigate to Settings > Users. Use the search box on the left to filter the list. You can click the person icon to toggle the display of inactive users. Click + to create a new user or select a user to view and edit their attributes.
You cannot delete users. The system retains a history of all user activity, including collaboration posts and ownership of alerts and Situations. You can set obsolete users to inactive in the Personal tab. Cisco Crosswork Situation Manager includes the following predefined users:
· Administrator: Super user role. For information on creating and editing roles, see Manage Roles.
· Graze: Grazer role. This user is intended for system integration purposes; it is not a UI user.
· System Owner: Super user role.
· Moog: An anonymous system account used for unassigned alerts and Situations.
We recommend that you change the default password for each predefined user. Once you have set up your own users or enabled user authentication you may wish to deactivate the predefined users.
Navigate to the Personal tab to view or edit the following user details:
· Username with 32 characters maximum. Mandatory.
· Full name.
· Password. If you do not enter a password the user is created as an LDAP user.
· Primary group. Mandatory.
· Department.
· Time zone if different to the system time zone.
· Active status. Inactive users cannot log into the UI. New users are active by default.
· Timeout. You can use the role timeout of 60 minutes or define a custom timeout period between 60 and 720 minutes (12 hours).
You can view or edit the user's email address or telephone number on the Contact tab.
Roles group the permissions users need to perform a set of tasks within Cisco Crosswork Situation Manager. See Manage Roles for information on creating and editing roles. You must assign at least one role to each user.
You can optionally group users into teams, to ensure that users working together view the Situations that are relevant to them. You can configure Cisco Crosswork Situation Manager to assign Situations to a particular team if they impact selected services or meet other criteria. See Manage Teams for further information.
See Graze API Endpoint Reference for the Graze API endpoints you can use to create and update users, and to return a list of all users in Cisco Crosswork Situation Manager.
See MoogDb V2 Method Reference for the MoogDb v2 methods you can use to manage users.
You can use the optional Teams feature in Cisco Crosswork Situation Manager to allow users working together to view the Situations that are relevant to them. You can configure the system to automatically create teams based on certain Situation data, or you can manually create teams.
To view a list of teams, navigate to Settings > Teams. Use the search box on the left to filter the list. You can click the people icon to toggle the display of inactive teams.
· Click + to create a new team.
· Click a team name and then the copy icon to duplicate a team.
To create a team, you must assign the team a name with a maximum of 64 characters. You can optionally provide a description of the team.
By default, teams are active. Deselect the Active checkbox to deactivate a team. Inactive teams don't appear in team rooms and cannot have Situations assigned to them.
Navigate to the Users tab to select the users that belong to this team. You can define an alternative landing page for the team members on the Settings tab.
To delete a team, select it and click -. When you delete a team Cisco Crosswork Situation Manager removes it from any alerts, Situations, services and notifications that it was assigned to or associated with. The system retains a record of all team activity. When you view a Situation assigned to a deleted team, the team name is shown with an "inactive" label.
If a deleted team is recreated by SAML/LDAP, it is assigned a status of inactive.
You can configure Cisco Crosswork Situation Manager to create teams based on current Situation data.
Note
Creating automatic teams sets all existing teams to inactive.
You can create teams based on any of the following Situation fields:
· Description
· Services Impacted
· Process Impacted
· Queue
After Cisco Crosswork Situation Manager creates a team, you can view and edit the team settings and membership.
You can filter data visible to team members according to services, Situations or alerts:
· Create a Service Filter in General to select services for which the team views affected Situations. Adding more than 200 services may affect system performance and stability.
· Create a Situation Filter in General to select the Situations that you can assign to members of the team. If you only want team members to be able to see these Situations, ensure they do not have the 'all_data' permission. See Role Permissions for details.
· Create an Alert Filter in Settings to select the alerts outside the team's Situation Filter that you want the team members to see.
Permissions are additive. Therefore, if you set several filters, Situations and alerts that meet any one of them are included. For example, you could set up a team's permissions so that the team views critical Situations affecting a database service, a messaging service, and a web service.
The all_data permission in a user role overrides the filters in team settings. Users with the 'all_data' permission can view all Situations and alerts.
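The additive filter behavior and the all_data override can be sketched as follows. This is an illustrative model, not the product's actual implementation; the filter functions and field names are invented for the example from the text (a team that views critical Situations affecting a database, messaging, or web service):

```python
def is_visible(item, team_filters, has_all_data):
    """Illustrative model of team visibility: the all_data permission
    overrides team filters; otherwise an item is visible if it matches
    ANY of the team's filters (filters are additive, not restrictive)."""
    if has_all_data:
        return True
    return any(matches(item) for matches in team_filters)

# Hypothetical filters for the example in the text.
critical_db = lambda s: s["severity"] == "critical" and "database" in s["services"]
critical_msg = lambda s: s["severity"] == "critical" and "messaging" in s["services"]
critical_web = lambda s: s["severity"] == "critical" and "web" in s["services"]
filters = [critical_db, critical_msg, critical_web]

situation = {"severity": "critical", "services": ["messaging"]}
print(is_visible(situation, filters, has_all_data=False))  # True: matches one filter
```

A Situation only needs to satisfy one of the configured filters to become visible to the team, which is why adding filters always widens, never narrows, what the team sees.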
If you want users to see only the alerts and Situations that are assigned to their team, configure roles and users as follows:
1. Create a role without the all_data permission.
2. Assign the role to the users.
3. Add the users to the team.
See Manage Roles and Manage Users for more information.
See Graze API Endpoint Reference for the Graze API endpoints you can use to create and update users, to return a list of all users in Cisco Crosswork Situation Manager, and other team-related requests.
See MoogDb V2 Method Reference for the MoogDb v2 methods you can use to manage teams.
Before Ingesting Data outlines the steps to take before your Cisco Crosswork Situation Manager system can begin to ingest data. These include configuring logging, changing passwords for default users, analyzing your data and performing a business analysis to determine your Situation design goals.
Ingest Event Data from Monitoring Tools tells you how to prepare your data for ingestion, including how to select, clean, format, integrate and construct the data. It tells you how to map, parse and normalize data and describes the types of Link Access Module (LAM) and LAMbots you will use to achieve this.
Cisco Crosswork Situation Manager components generate log files to report their activity. As a Cisco Crosswork Situation Manager administrator, you can refer to the logs to audit system usage or diagnose issues. In certain cases you may want to change logging levels based upon your specific environment or needs. See the Log Levels Reference for details.
Cisco Crosswork Situation Manager uses Apache Log4j for logging. See the Log4j configuration documentation for more information.
You can edit the log configuration files at $MOOGSOFT_HOME/config/logging/.
There is a configuration file for every component or servlet in Cisco Crosswork Situation Manager. These files can be found in $MOOGSOFT_HOME/config/logging/servlets/ and follow the naming convention <servlet_name>.log.json. These configuration files control the logs for the following:
1. events.log.json: Logs for the proxy LAM.
2. graze.log.json: Graze request logs.
3. moogpoller.log.json: Moogpoller logs.
4. moogsvr.log.json: Logs relating to SAML/LDAP authentication and internal API calls.
5. situation_similarity.log.json: Situation Similarity servlet logs.
6. toolrunner.log.json: Toolrunner servlet logs.
The other default configuration files include:
1. moog_farmd.log.json: Configures logs for Moogfarmd process.
2. moogsoft.log.json: Configures logs for all of the utilities.
3. integrations.log.json: Configures logs for LAMs and integrations.
You can change log levels and make other configuration changes to components while they are running. Cisco Crosswork Situation Manager reads any changes and applies them every two seconds.
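The two-second refresh corresponds to the monitorInterval property that appears at the top of each log configuration file, as shown in the default configurations later in this guide:

```json
{
  "configuration": {
    "monitorInterval": 2
  }
}
```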
You can configure these files to meet your requirements. Refer to the Log4j documentation to see the available properties or see Log Configuration File Examples.
The following reference provides information about the log files for the various Cisco Crosswork Situation Manager components.
Log location: /usr/share/apache-tomcat/logs
· RPM: /usr/share/apache-tomcat/logs/catalina.out
· Tarball: $MOOGSOFT_HOME/cots/apache-tomcat/logs/catalina.out
Primary log file: catalina.out
To change the logging level for the Cisco Crosswork Situation Manager servlets which run in Tomcat, edit the relevant files in $MOOGSOFT_HOME/config/logging/servlets.
vi $MOOGSOFT_HOME/config/logging/servlets/moogsvr.log.json
...
"loggers": {
"Logger": {
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [{
"ref": "STDOUT"
}],
"level": "info"
}
}
...
Log location: /var/log/nginx
Primary log file: error.log
To change the logging level for Nginx:
1. Edit /etc/nginx/nginx.conf.
2. Set the severity level on the error_log directive. For example, to enable debug logging:
error_log /var/log/nginx/error.log debug;
3. Restart Nginx.
By default, Moogfarmd and ticketing integrations write logs to a file stored in /var/log/moogsoft if the user running the process has write permissions for this directory. Otherwise, the logs are written to $MOOGSOFT_HOME/log. By default the log file takes the name of the HA address of the process. For example, MOO.moog_farmd.farmd_instance1.log.
MOO is the default HA cluster name in $MOOGSOFT_HOME/config/system.conf. If you change it, the Moogfarmd log file name changes accordingly.
Restart Moogfarmd after making any of the following configuration changes.
To use a custom log configuration file for Moogfarmd:
1. Make a copy of the default Moogfarmd log configuration file and rename it, for example:
cd $MOOGSOFT_HOME/config/logging
cp moog_farmd.log.json mymoog_farmd.log.json
2. Edit the new file according to your Moogfarmd logging requirements.
3. Edit the configuration_file property in the log_config section of moog_farmd.conf to point to the new file. For example:
log_config:
{
configuration_file: "mymoog_farmd.log.json"
}
To change the logging level for Moogfarmd, edit the file $MOOGSOFT_HOME/config/logging/moog_farmd.log.json. For example:
"configuration":
{
"ThresholdFilter":
{
"level": "trace"
},
}
You can also modify the log level from the command line. There are two options:
· Use farmd_cntl --loglevel <level> to change the level of a running Moogfarmd instance. This is a runtime change; restarting Moogfarmd resets the log level to match the logger configuration.
· Adjust the logger configuration itself. For example, to adjust the Moogfarmd log level in moog_farmd.log.json:
vi $MOOGSOFT_HOME/config/logging/moog_farmd.log.json
...
"Logger": [
{
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [{
"ref":"${sys:MoogsoftLogAppender}"
}],
"level": "info"
}
...
See Moogfarmd Reference for more information.
To save Moogfarmd logs to a different location and/or filename, edit the Moogfarmd log configuration file located at $MOOGSOFT_HOME/config/logging/moog_farmd.log.json. For example:
"RollingFile":
{
"name" : "FILE",
"fileName" : "/var/log/moogsoft/Moogfarmd_test.log"
}
LAMs and monitoring integrations log their processing and data ingestion to two types of log files: process and capture. Ticketing integrations do not have dedicated log files; instead they log their processing and data to /var/log/moogsoft/MOO.moog_farmd.log. For more information, refer to the preceding section on Moogfarmd.
Process logs
LAMs and integrations record their activities as they ingest raw data. By default these process logs are written to a log file stored in /var/log/moogsoft if the user running the LAM has write permissions for this directory. Otherwise, the logs are written to $MOOGSOFT_HOME/log. By default the log file takes the name of the LAM or integration. For example, MOO.solarwinds_lam.log.
The configuration of LAM process logs is specified in a file located at $MOOGSOFT_HOME/config/logging/integrations.log.json.
To specify the log configuration for a particular LAM:
· Make a copy of the default LAM log configuration file and rename it with the name of the LAM, for example:
cd $MOOGSOFT_HOME/config/logging
cp integrations.log.json solarwinds_lam.log.json
· Edit the file according to your LAM logging requirements.
· Edit the configuration_file property in the log_config section of the LAM configuration file to point to the new file. For example:
log_config:
{
configuration_file: "$MOOGSOFT_HOME/config/logging/solarwinds_lam.log.json"
}
If a polling integration or LAM fails to connect to the target system using the connection details in the UI or configuration file, Cisco Crosswork Situation Manager creates an alert with critical severity and writes the details to the process log. The following example shows a log file entry for a failed Zabbix Polling integration with an invalid URL:
WARN : [target1][20190117 13:03:33.942 +0000] [CZabbixPollingTask.java:129] +|40001: An error response received
from Zabbix REST server: [Invalid URL provided [http://zabbixserver1/zabbix/api_jsonrpc.php] for User Login request]|+
The following error codes raise Cisco Crosswork Situation Manager alerts. The alert details are listed below:
External ID | Type | Class | Severity | Example Alert Description
40001 | Internal Integrations Error | Failed Connection Attempt | Critical | Failed Connection Attempt for target [target1] and destination [http://zabbixserver1/zabbix/api_jsonrpc.php]. This is attempt [1] out of [infinite].
40002 | Internal Integrations Error | Failed Connection Error | Critical | Failed Connection Error [rabbitmq-host.com: nodename nor servname provided, or not known]. This is attempt [2] out of [infinite].
If the integration or LAM polls successfully on the next attempt, the alert is cleared. If the integration or LAM is restarted to resolve the connection issue the alert is not cleared and must be handled manually.
Capture logs
In addition to process logs, all LAMs except the Logfile LAM allow you to capture the raw data they receive. This feature is disabled by default. To enable it, edit the LAM's configuration file and uncomment the capture_log property in the agent section. The default path to the capture log files is $MOOGSOFT_HOME/log/data-capture/<lam_name>.log.
An example agent section in a LAM configuration file is as follows:
agent:
{
name : "SolarWinds",
capture_log : "$MOOGSOFT_HOME/log/data-capture/solarwinds_lam.log"
}
Log location: /var/log/mysqld.log
MySQL logging defaults to the highest level. To remove warnings from the MySQL log:
· Edit /etc/my.cnf .
· Add the following line:
log_warnings = 0
· Restart the MySQL service.
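Note that on MySQL 5.7 and later the log_warnings variable is deprecated (and removed in MySQL 8.0) in favor of log_error_verbosity. Depending on your MySQL version, the equivalent change may look like the following sketch; check the documentation for your MySQL release:

```ini
[mysqld]
# MySQL 5.6 and earlier: suppress warnings in the error log
log_warnings = 0
# MySQL 5.7 and later: 1 = errors only, 2 = errors and warnings, 3 = also notes
log_error_verbosity = 1
```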
Log location: /var/log/rabbitmq
Refer to the RabbitMQ documentation for information on how to configure RabbitMQ.
Log location: /var/log/elasticsearch/elasticsearch.log.
Refer to the Elasticsearch documentation for information on how to configure Elasticsearch.
Cisco Crosswork Situation Manager uses two libraries for persistence: Hazelcast and Kryo. You can configure the logging for these components in the file $MOOGSOFT_HOME/config/logging/moog_farmd.log.json.
The logging level is set to WARN by default. Logs are written to the process log file.
When you create, update and delete topologies and their nodes and links, Apache Tomcat logs the details in its primary log file catalina.out. An example log entry at INFO level is as follows:
INFO : [Topologies Reporter Thread][20200225 16:48:25.105 +0000] [CReporterThread.java:142] +|Topologies server handled [200] topologies requests in the last [60] seconds.|+
Example errors at WARN level:
"Unable to replace topology as topology to be replaced is not valid: [physical]" (WARN)
"Unable to replace topology [physical] as replacing topology is not valid: [network]" (WARN)
"Unable to replace topology [%s] - topology [%s] has not been updated" (WARN)
"Failed to get all nodes for topology [physical] as it does not exist" (WARN)
The Graph Analyser process runs automatically as part of the Housekeeper Moolet, to calculate Vertex Entropy for your topological nodes. The Graph Analyser process logs details of its processing to the Moogfarmd log file. Example log entries:
Starts processing for a topology named "physical":
Starting graph analysis of topology [physical]
...
Setting topology state from [OUTDATED] -> [PROCESSING] for [physical]
Processes nodes in the "physical" topology:
Performing analysis on 15023 nodes in topology [physical]
Example errors:
"Skipping topology [physical] as it has no nodes." (INFO)
"Failed to update vertex entropy values for topology [physical]" (WARN)
"Topology with name [physical] does not exist" (WARN)
"Topology [physical] state is [OUTDATED], skipping pending re-analysis" (INFO)
Completes processing:
Completed graph analysis of topology [physical], time elapsed 10s
Moogfarmd, LAMs and integrations use a Java-based logging utility that automatically runs at startup to prevent log files from becoming unmanageably large. The utility also prevents the loss of log data when you restart Cisco Crosswork Situation Manager.
The logging utility rotates a log when the file size reaches 500MB and retains up to 40 rotated files by default, so each component can keep up to 40 x 500MB = 20GB of rotated logs. This behavior is controlled by two properties of the RollingFile appender, size (under Policies) and max (under DefaultRolloverStrategy), in $MOOGSOFT_HOME/config/logging/<component_log_file_name>.log.json.
size
The size limit of the log file in megabytes that triggers a log rotation.
Type: Integer
Default: 500M
max
The maximum number of files that Cisco Crosswork Situation Manager can rotate.
Type: Integer
Default: 40
The default logger configuration appears in $MOOGSOFT_HOME/config/logging/<component_log_file_name>.log.json as follows:
"Policies":
{
"SizeBasedTriggeringPolicy":
{
"size": "500M"
}
},
"DefaultRolloverStrategy":
{
"max": "40"
}
Cisco Bridge uses a store and forward architecture to push events and other messages from a local RabbitMQ cluster to the Message Bus.
Cisco Bridge outputs logs to:
/var/log/moogsoft/moogsoft_bridge.log for root users
$MOOGSOFT_HOME/log/moogsoft_bridge.log for non-root users
See Cisco Bridge for more information.
You can customize each log configuration file to control the logging behavior of the different components in Cisco Crosswork Situation Manager.
See Configure Logging for more information on logging.
Default Configuration Files
The default log configuration file for servlets and utilities is as follows:
{
"configuration": {
"packages": "com.moogsoft",
"monitorInterval": 2,
"ThresholdFilter": {
"level": "info"
},
"appenders": {
"Console": {
"name": "STDOUT",
"PatternLayout": {
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
}
}
},
"loggers": {
"Logger": {
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [{
"ref": "STDOUT"
}],
"level": "info"
}
}
}
}
The default log configuration file for Moogfarmd is:
{
"configuration": {
"packages": "com.moogsoft",
"monitorInterval": 2,
"ThresholdFilter": {
"level": "trace"
},
"appenders": {
"Console": {
"name": "STDOUT",
"PatternLayout": {
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
}
},
"RollingFile": {
"name": "FILE",
"fileName": "${sys:MoogsoftLogFilename}",
"filePattern": "${sys:MoogsoftLogFilename}-%d{MM-dd-yy}-%i.gz",
"PatternLayout": {
"header": "${sys:MoogsoftLogHeader}",
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
},
"Policies": {
"SizeBasedTriggeringPolicy": {
"size": "500M"
}
},
"DefaultRolloverStrategy": {
"max": "40"
}
}
},
"filters" : {
"MarkerFilter" : {
"marker": "MOOG_ESSENTIAL_INFO",
"onMatch": "ACCEPT",
"onMismatch": "NEUTRAL"
}
},
"loggers": {
"root": {
"additivity": false,
"AppenderRef": [{
"ref": "${sys:MoogsoftLogAppender}"
}],
"level": "warn"
},
"Logger": [
{
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [{
"ref": "${sys:MoogsoftLogAppender}"
}],
"level": "warn"
},
{
"name": "com.moogsoft.persistence.serialization.CMiniLogToSlf4jLogger",
"additivity": false,
"AppenderRef": [{
"ref": "${sys:MoogsoftLogAppender}"
}],
"level": "error"
}
]
}
}
Asynchronous Appender
You can configure a log file to use an asynchronous appender, which allows events to be logged asynchronously. See AsyncAppender in the Log4j documentation for details.
{
"configuration": {
"packages": "com.moogsoft",
"monitorInterval": 2,
"ThresholdFilter": {
"level": "info"
},
"appenders": {
"Console": {
"name": "STDOUT",
"PatternLayout": {
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
}
},
"RollingFile": {
"name": "FILE",
"fileName": "${sys:MoogsoftLogFilename}",
"filePattern": "${sys:MoogsoftLogFilename}-%d{MM-dd-yy}-%i.gz",
"PatternLayout": {
"header": "${sys:MoogsoftLogHeader}",
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%c]: %message%n"
},
"Policies": {
"SizeBasedTriggeringPolicy": {
"size": "500M"
}
},
"DefaultRolloverStrategy": {
"max": "40"
}
},
"Async" : {
"name": "Async",
"AppenderRef": {"ref": "FILE"}
}
},
"loggers": {
"Logger": {
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [{
"ref": "Async"
}],
"level": "info"
}
}
}
}
Post Logs to Elasticsearch
You can configure logs to be posted to Elasticsearch using an "Http" appender whose "url" property directs them to an Elasticsearch server:
{
"configuration": {
"packages": "com.moogsoft",
"monitorInterval": 2,
"ThresholdFilter": {
"level": "info"
},
"appenders": {
"Console": {
"name": "STDOUT",
"PatternLayout": {
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
}
},
"RollingFile": {
"name": "FILE",
"fileName": "${sys:MoogsoftLogFilename}",
"filePattern": "${sys:MoogsoftLogFilename}-%d{MM-dd-yy}-%i.gz",
"PatternLayout": {
"header": "${sys:MoogsoftLogHeader}",
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
},
"Policies": {
"SizeBasedTriggeringPolicy": {
"size": "500M"
}
},
"DefaultRolloverStrategy": {
"max": "40"
}
},
"Http": {
"name": "Elastic",
"url": "http://localhost:9200/logs/farmdlogs/",
"JsonLayout": {
"compact": true,
"eventEol": true,
"locationInfo": true,
"properties": true
}
}
},
"loggers": {
"Logger": {
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [{
"ref": "${sys:MoogsoftLogAppender}"
}, {
"ref": "Elastic"
}],
"level": "info"
}
}
}
}
Save Logs to the Database
You can configure your logs to be saved to the database with a configuration similar to the following:
/*
To create the table for the logs to use:
CREATE TABLE IF NOT EXISTS logs(time TIMESTAMP, message TEXT, level TEXT, logger TEXT, exception TEXT);
*/
{
"configuration": {
"packages": "com.moogsoft",
"monitorInterval": 2,
"ThresholdFilter": {
"level": "info"
},
"appenders": {
"Console": {
"name": "STDOUT",
"PatternLayout": {
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
}
},
"RollingFile": {
"name": "FILE",
"fileName": "${sys:MoogsoftLogFilename}",
"filePattern": "${sys:MoogsoftLogFilename}-%d{MM-dd-yy}-%i.gz",
"PatternLayout": {
"header": "${sys:MoogsoftLogHeader}",
"pattern": "%-5level: [%thread][%date{yyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
},
"Policies": {
"SizeBasedTriggeringPolicy": {
"size": "500M"
}
},
"DefaultRolloverStrategy": {
"max": "40"
}
},
"JDBC" : {
"name": "DB",
"tableName": "logs",
"DriverManager": {
"connectionString": "jdbc:mysql://localhost:3306/moogdb?useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8&connectTimeout=5000&rewriteBatchedStatements=true&cacheCallableStmts=true&cachePrepStmts=true&callableStatementCacheSize=1000&prepStmtCacheSize=1000&prepStmtCacheSqlLimit=100000&useCursorFetch=true&useSSL=false",
"userName": "ermintrude",
"password": "m00"
},
"Column": [{
"name": "time",
"isEventTimestamp": true
}, {
"name": "message",
"pattern": "%message"
}, {
"name": "level",
"pattern": "%level"
}, {
"name": "logger",
"pattern": "%logger"
}, {
"name": "exception",
"pattern": "%ex{full}"
}
]
}
},
"loggers": {
"Logger": {
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [{
"ref": "${sys:MoogsoftLogAppender}"
}, {"ref": "DB"}],
"level": "info"
}
}
}
}
Cisco Crosswork Situation Manager creates users for Linux, RabbitMQ, the Cisco Crosswork Situation Manager UI and Graze API during the installation process. As a security measure, you must change the default passwords for these users. After you change the passwords you may need to update the Cisco Crosswork Situation Manager configuration to use the new passwords.
If you run in a distributed environment, you can set unique passwords for all components on each host.
Cisco recommends you encrypt passwords for use in Cisco Crosswork Situation Manager configuration files. See Moog Encryptor for more information. In distributed or high availability environments, encrypt passwords on each machine.
The Cisco Crosswork Situation Manager installation package creates the following Linux users with login privileges:
· moogsoft
· moogadmin
· moogtoolrunner
Execute the passwd command to change the password of these Linux users. For example, to change the password for the moogtoolrunner user:
passwd moogtoolrunner
The Cisco Crosswork Situation Manager installation package creates the following Linux users without login privileges:
· elasticsearch
· nginx
Update Cisco Crosswork Situation Manager Configuration
After you change the password for moogtoolrunner, update its password in $MOOGSOFT_HOME/config/servlets.conf. You can use either the toolrunnerpassword or encrypted_toolrunnerpassword property. For example:
#toolrunnerpassword: "MyNewPassword",
encrypted_toolrunnerpassword: "rmW2daCwMyI8JGZygfEJj0MZdbIkUqX3tT/OIVfMGyI=",
Restart Apache Tomcat to apply the configuration change:
service apache-tomcat restart
Note
You do not need to update the Cisco Crosswork Situation Manager configuration after you change the password for other Linux users with login privileges.
The Cisco Crosswork Situation Manager installation process creates a RabbitMQ user called moogsoft. Execute the rabbitmqctl change_password command to change the moogsoft user password. For example:
rabbitmqctl change_password moogsoft <new-password>
Update Cisco Crosswork Situation Manager configuration
After you change the moogsoft user password, update the password in $MOOGSOFT_HOME/config/system.conf. You can use either the password or encrypted_password property. For example:
"username" : "moogsoft",
#"password" : "MyNewPassword",
"encrypted_password" : "e5uO0LY3HQJZCltG/caUnVbxVN4hImm4gIOpb4rwpF4=",
If you are running in a distributed environment, update the password configuration on every host.
The installation process creates the following default users for the UI:
1. admin
2. graze
3. super
You can also use the graze user to log into the Graze API.
To change the default passwords for these users, log into the UI and go to Settings > Users.
Architecting a solution in Cisco Crosswork Situation Manager is, in a sense, a "backward" design process. You need to ingest data to start the assessment and build strategies, but the resulting Situation design defines further requirements for data ingestion.
Throughout the deployment process, you will come back to the ingestion step multiple times. As you determine your clustering strategy and design Situations, you will identify additional information that you need to extract from the source event payload.
For example, you may need to bundle alerts by the floor where the server is located, and that information is available as part of the server name. In that case, you come back to the ingestion step and parse the floor out of the server name field.
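As a sketch of that kind of extraction, consider the following. The hostname scheme and the "flr" token are invented for illustration; in practice you would perform this parsing in the LAM mapping or LAMbot:

```python
import re

def floor_from_hostname(hostname):
    """Extract a floor number embedded in a hypothetical naming scheme
    such as 'nyc-flr03-web01', where 'flr03' means floor 3."""
    match = re.search(r"-flr(\d+)-", hostname)
    return match.group(1) if match else None

print(floor_from_hostname("nyc-flr03-web01"))  # prints: 03
```

Once the floor is extracted into its own field, it becomes available downstream as a clustering attribute.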
Remember your solution design process is a cyclical, iterative one.
Situation design is the practice of identifying the alert clustering strategy to achieve the business objectives for using Cisco Crosswork Situation Manager.
Situation design is not just about identifying how you want to cluster your alerts. It dictates data ingestion, alert creation, and processing requirements.
It is also not a linear process. Rather, you repeat an iterative cycle, incrementally improving the configuration until you achieve your goal.
The business analysis stage breaks down into four steps.
The data analysis phase takes four steps.
Integrations enable you to connect applications and other tools to Cisco Crosswork Situation Manager.
You can integrate with applications such as ticketing, monitoring and collaboration tools. You can also create your own custom webhook integrations.
You can integrate with the following monitoring applications:
· Apache Kafka
· AppDynamics
· Amazon Web Services (AWS)
· CA Technologies
· Catchpoint
· DataDog
· Dynatrace
· Email
· EMC Smarts
· ExtraHop
· Fluentd
· HP
· JMS
· Microsoft Azure
· Microsoft SCOM
· Moogsoft Express
· Nagios
· New Relic
· Node.js
· Node-RED
· Office 365 Email
· Oracle Enterprise Manager
· Pingdom
· RabbitMQ
· REST LAM
· Sensu
· SevOne
· Site24x7
· SolarWinds
· Splunk Integrations
· Sumo Logic
· Tivoli EIF LAM
· SNMP Trapd LAM
· VMware
· Webhook
· WebSphere MQ
· Lenovo XClarity LAM
· Zabbix
· Zenoss
Data ingestion is the process that takes in raw event data from your infrastructure and converts the relevant data fields into processed Cisco Crosswork Situation Manager events. Cisco Crosswork Situation Manager can ingest a wide variety of formats: plain-text status messages, binary SNMP data, and JSON-formatted strings.
As part of your data ingestion setup, you examine your incoming data stream, identify data fields that do not correspond to standard Cisco Crosswork Situation Manager event fields, and decide which of these fields you want to preserve. Some data might be useful further downstream, for example for clustering alerts into Situations or providing operators with diagnostic information.
As a best practice, do not try to get your data ingestion settings right on the first try. You are dealing with a bit of a paradox: you need to ingest data in order to uncover the data processing requirements. By ingesting the real data, you can conduct discovery sessions effectively, which in turn identifies the data ingestion requirements. So expect to update the data ingestion settings throughout the deployment process.
Data ingestion takes 4 steps.
Watch the video to learn these steps in detail.
There are two types of LAMs:
· Generic LAMs – Based on a specific protocol or communication type, but not specific to a particular product.
· Vendor-specific LAMs – Configured or customized versions of the generic LAMs that are set up to work with a specific product.
In the following section, we will step through the LAM configuration process using the REST LAM as an example. Consult the developer guide for other types of LAMs; the considerations discussed in the following pages apply to them as well.
In Cisco Crosswork Situation Manager, event ingestion settings are configured in two places.
· LAM (lam.conf configuration file) - handles data acquisition, tokenization, and mapping. In some cases, it can also perform some normalization.
· LAMbot - optional JavaScript to handle additional normalization processes.
First, we will take a look at the LAMs.
In some cases, you could ingest the same data using separate mechanisms. When you have the choice, receiving LAMs are typically preferred over polling LAMs.
Also, when given multiple options, consider them in the following order of preference:
· REST LAM or webhook
a. Implemented as a direct forward from the underlying monitoring tool or via a messaging bus such as RabbitMQ, Kafka, JMS.
b. Provides a more reliable delivery mechanism and has no dependency on a polling cycle, unlike a polling LAM in Cisco Crosswork Situation Manager.
· REST Client LAM
· SNMP LAM if MIB conversion already exists.
· Socket LAM if event raw payload is structured to allow tokenization.
· Syslog LAM if messages are structured and invariable and require little or no ongoing maintenance
UDP Socket and SNMP protocols do not offer guaranteed delivery of forwarded events as opposed to webhook or messaging bus mechanisms.
Tip
If you decide to swap the ingestion mechanisms for a monitoring system - for example, swap a webhook with a RabbitMQ LAM - make sure to use a consistent approach to data parsing in the LAMbot so that there is no impact on downstream processing in Cisco Crosswork Situation Manager.
There are three types of LAMs shipped with Cisco Crosswork Situation Manager and each requires a different degree of configuration:
The generic adapters have no specific default configuration; they can be customized to any suitable single data source that sends events over the supported protocol. The associated config and LAMbot files provide only a framework for acquiring, tokenizing, and mapping the raw data, and require that you configure the logic for the normalization, including deriving an appropriate signature.
There are three types of generic LAMs:
· Socket: accepts data over a TCP or UDP network socket, and lets you specify how to parse the incoming data stream.
· REST/webhook: listens for data in JSON format over a REST protocol.
· REST client: polls a REST server and accepts JSON data.
The aggregation LAMs are specific to a protocol or vendor platform and have a configuration and a LAMbot for generically consuming events from these sources, but do little or no processing of the event contents themselves. As with generic LAMs, you must configure the normalization yourself.
Since these are often aggregated event sources, there may be multiple event formats within the single ingestion source. For example, a customer may send all of their event data - events from both Netcool and Nagios - to a Kafka bus. When we consume these using the Kafka LAM they still remain Netcool and Nagios events, containing the different attributes and lifecycle behavior associated with those underlying event sources. They will need the appropriate mapping, routing, processing, and normalization associated with these event sources. Without normalization at the aggregation layer, this work has to be done in the LAMbot.
Another example of aggregated ingestion is consuming syslog data arriving from the Splunk platform. In this case, the out-of-the-box Splunk LAM would have to be enhanced to construct an appropriate signature and to parse the syslog string into attributes such as hostname and severity, unless this has already been done at the aggregation layer.
The biggest effort in setting up an aggregation LAM lies in identifying the number of separate event formats arriving via the adapter and assessing the normalization workload needed for each. You may need to add routing logic to handle each format separately. Use centrally administered modules to manage the complexity within the LAMbot.
The aggregation LAMs are:
· Logfile
· Kafka
· JDBC
· Splunk
· Syslog (UDP or TCP socket)
· Trapd (UDP)
· RabbitMQ
· WebSphere MQ
Vendor-specific LAMs are the most complete, because they are product specific, with the data in a known format that can be handled in a consistent manner regardless of customer. We would expect only light customization of the configuration, and of the LAMbot only if the customer has specific requirements.
Consult the Integrations section of the documentation for the complete list of vendor-specific LAMs.
Before you configure data ingestion, explore the format and quality of the incoming data from each source. The following are the recommended practices around data ingestion.
Cisco Crosswork Situation Manager is designed to deal with fault data. That means you should ingest events that may be worthy of operator attention. Do not forward continuous metrics (time series data) into Cisco Crosswork Situation Manager. Instead, set up those data sources to send events when performance metrics reach a specific threshold of interest to your operations team.
Don't send events with critical data, such as the fault description or hostname, missing from the event payload. For example, if the source event data is missing a value for the description field, you won't be able to cluster with the "Events with similar descriptions" cookbook.
Make sure to populate data consistently across fields that will be used in any downstream processes. For example, attributes you plan to use for clustering or for maintenance window filters clearly need to be consistent. Cookbooks are remarkably flexible clustering tools, but always be mindful to feed them consistent data for maximum accuracy.
Also, because Cisco Crosswork Situation Manager deduplicates events that share the same context, make sure that any subsequent event updates to the original occurrence have their fields consistently populated. Otherwise, you will need a mechanism to backfill subsequent updates based on the initial event occurrence. Additional data processing introduces overhead and can slow down the event processing rate.
Define your strategy for capturing alerts with missing data. For example, highlight any alerts with missing data and cluster them into specific situations. Later, an administrator can review and refine the ingestion configuration as needed.
Cisco Crosswork Situation Manager tokenizes incoming data. After it has divided the data into tokens, Cisco Crosswork Situation Manager assembles the tokens into an event. This topic covers the tokenizing options so you can control how tokenizing works.
The first two parsing settings are the start and end characters. The square brackets [] are the JSON notation for a list. You can have multiple start and end characters. The system considers an event to be all of the tokens between any start and end character.
start: [],
end: ["\n"],
The above example specifies:
· There is nothing defined in start; however, a carriage return (new line) is defined as the end character
In the example above, the LAM is expecting an entire line to be written followed by a return, and it will process the entire line as one event.
With careful setup, you can accept multi-line events.
Regular expressions can be used to extract relevant data from the input data. Here's an example definition:
parsing:
{
type: "regexp",
regexp:
{
pattern : "(?m)^START: (.*?)$",
capture_group: 1,
tokeniser_type: "delimiters",
delimiters:
{
ignoreQuotes: true,
stripQuotes: true,
ignores: "",
delimiter: ["||","\r"]
}
}
}
Delimiters define how strings are split into tokens for processing. For example, to process a comma-separated file, where a comma separates each value, define the comma as a delimiter.
Tokens are referenced by position, starting at one (not zero).
For example, for the input string “the,cat,sat,on,the,mat” where the delimiter is a comma, token 1 is “the”, token 2 “cat” and so on.
Combining tokenization and parsing can be complex. For example, if you use a comma delimiter and the token contains a comma, the token is split into two. To avoid this you can quote strings. You can then define whether to strip or ignore quotes.
An example delimiters section in a configuration file is as follows:
delimiters:
{
ignoreQuotes: true,
stripQuotes: false,
ignores: "",
delimiter: [",","\r"]
}
When ignoreQuotes is set to true, all quotes are ignored and inputs are tokenized on the delimiters only.
When ignoreQuotes is false, delimiting does not occur until the matching end quote is found. This allows tokens to include delimiters. For example, given the following input when the delimiter is a comma:
hello world, "goodbye, cruel world".
Found tokens when ignoreQuotes is true: [hello world, goodbye, cruel world] (3).
Found tokens when ignoreQuotes is false: [hello world, "goodbye, cruel world"] (2).
Set stripQuotes to true to remove start and end quotes from tokens. For example, "hello world" results in a single token: [hello world].
Ignores is a list of characters to ignore. Ignored characters are never included in tokens.
Delimiter is the list of valid delimiters used to split strings into tokens.
For each event in the file, there is a positioned collection of tokens. Cisco Crosswork Situation Manager lets you name these positions. If a line has a large number of tokens of which you are interested in only five or six, instead of remembering that a value is token number 32, you can give token 32 a meaningful name.
variables:
[
{ name: "Identifier", position: 1 },
{ name: "Node", position: 4 },
{ name: "Serial", position: 3 },
{ name: "Manager", position: 6 },
{ name: "AlertGroup", position: 7 },
{ name: "Class", position: 8 },
{ name: "Agent", position: 9 },
{ name: "Severity", position: 5 },
{ name: "Summary", position: 10 },
{ name: "LastOccurrence",position: 1 }
]
The above example specifies:
· position 1 is assigned to Identifier; position 4 is assigned to Node, and so on
· Positions start at 1, and go up rather than array index style counting from 0
This is important because at the bottom of the socket_lam.conf file there is a mapping object that configures how Cisco Crosswork Situation Manager assigns values from the parsed tokens to the attributes of the event sent to the Message Bus. For example, mapping contains a value called rules, which is a list of assignments.
mapping:
{
catchAll: "overflow",
rules:
[
{ name: "signature", rule: "$Node:$Serial" },
{ name: "source_id", rule: "$Node" },
{ name: "external_id", rule: "$Serial" },
{ name: "manager", rule: "$Manager" },
{ name: "source", rule: "$Node" },
{ name: "class", rule: "$Class" },
{ name: "agent", rule: "$LamInstanceName" },
{ name: "agent_location", rule: "$Node" },
{ name: "type", rule: "$AlertGroup" },
{ name: "severity", rule: "$Severity", conversion: "sevConverter" },
{ name: "description", rule: "$Summary" },
{ name: "first_occurred", rule: "$LastOccurrence" ,conversion: "stringToInt"},
{ name: "agent_time", rule: "$LastOccurrence",conversion: "stringToInt"}
]
}
In the example above, the first assignment, name: "signature", rule: "$Node:$Serial" (where "$Node:$Serial" is a string with $ substitution syntax), means: take the tokens named Node and Serial, join the value of Node and the value of Serial with a colon, and assign the result to the signature attribute of the event sent to Cisco Crosswork Situation Manager.
You define a number of these rules covering the base attributes of an event. For reference, Cisco Crosswork Situation Manager expects the minimum set of event attributes shown in this section.
Using braces within mapping definitions allows you to include URLs and special characters. For example:
mapping:
{
[
{ name: "type", rule: "${https://url}" },
{ name: "type", rule: "${https://url} customText" },
{ name: "type", rule: "${https://url}${keyA\\b\\c}" }
]
}
Escape backslashes (\\) and note that you cannot embed variables.
Attributes that are never referenced in a rule, for example an "enterprise trap number" that is never mapped into an event attribute, are collected into a JSON object, stored in the variable named by catchAll, and passed as part of the event.
You can define custom_info mapping in LAM configuration files. This allows you to configure a hierarchical structure. An example mapping configuration is:
mapping:
{
rules:
[
{ name: "custom_info.eventDetails.branch", rule: "$branch" },
{ name: "custom_info.eventDetails.location", rule: "$location" },
{ name: "custom_info.ticketing.id", rule: "$incident_id" }
]
}
This produces the following custom_info structure:
"custom_info": {
"eventDetails":
{
"branch":"Kingston",
"location":"KT1 1LF"
},
"ticketing":
{
"id":94111
}
}
See Configure Polling LAMs to Poll More Than One Target Data Source.
The filter defines whether a LAM uses a LAMbot. A LAMbot moves overflow properties to custom info and performs any actions that are configured in its LAMbot file. The LAMbot processing is defined in the presend property in the filter section of the LAM configuration file.
For example, the SolarWinds LAM configuration file contains this filter section:
filter:
{
modules : ["CommonUtils.js"],
presend : "SolarWindsLam.js"
}
This indicates that SolarWindsLam.js processes the events and then sends them to the Message Bus.
If you don’t want to map overflow properties, you can comment out the presend property to bypass the LAMbot and send events straight to the Message Bus. This speeds up processing if you have a high volume of incoming alerts. Alternatively, you can define a custom stream to receive events. See Alert Builder for details.
See LAMbot Configuration for more information on the presend function.
The optional modules property can be used to provide a list of JavaScript files that are loaded into the context of the LAMbot and executed. It allows LAMs to share modules. For example, you can write a generic Syslog processing module that is used in both the Socket LAM and the Logfile LAM. This reduces the need for duplicated code in each LAMbot.
Conversion rules are used by Cisco Crosswork Situation Manager to convert received data into a usable format, including severity levels and timestamps.
The following example looks up the value of severity and returns the mapped integer.
conversions:
{
sevConverter:
{
lookup: "severity",
input: "STRING",
output: "INTEGER"
},
},
constants:
{
severity:
{
"CLEAR": 0,
"INDETERMINATE": 1,
"WARNING": 2,
"MINOR": 3,
"MAJOR": 4,
"CRITICAL": 5,
moog_lookup_default: 3
}
}
In the above example:
1. conversions receives a text value for severity.
2. sevConverter uses the lookup "severity" to reference the table named severity defined in the constants section.
3. The integer value matching the text value is returned.
4. moog_lookup_default is used to specify a default value when a received event does not map to a listed value.
For example, the text value "MINOR" is received and the integer value 3 is returned.
If moog_lookup_default is not used and a received event severity does not map to a specifically listed value, the event is not processed.
See Severity Reference for more information about the severity levels in Cisco Crosswork Situation Manager.
Time conversion in Cisco Crosswork Situation Manager supports the Java platform standard API specification. See Simple Date Format for more information.
Some Unix time formats are indirectly supported and LAM logging indicates any automatic conversion that occurred at startup.
The only PCRE/Perl modifier automatically converted is the lone 'U' ungreedy modifier; PCRE's '-U' is not supported. If the pattern contains a -U, remove it manually.
You can specify a time zone configuration so the LAM parses the incoming timestamps with the expected time zone. For example:
conversions:
{
timeUnitConverter:
{
timeUnit: "MILLISECONDS",
input: "STRING",
output: "INTEGER"
},
timeConverter:
{
timeFormat: "%Y-%m-%dT%H:%M:%S",
timeZone: "UTC",
input: "STRING",
output: "INTEGER"
}
}
You can specify the timezone name or abbreviation. See List of TZ Database Time Zones for the full list.
All LAMs also have the native ability to consume JSON events. Define a carriage return (new line) as the end character and delimiter, because the LAM expects a whole JSON object followed by a carriage return.
Under parsing you have:
end: ["\n"]
For the delimiter you have:
delimiter: ["\r"]
JSON is a sequence of attribute/value pairs, and the attribute is used as a name. Under mapping, you must define the attribute builtInMapper: "CJsonDecoder". It automatically populates all of the values contained in the JSON object before the rules are run.
For example if the JSON object to be parsed was:
{"Node" : "acmeSvr01","Severity":"Major"...}\n
The attributes available to the rules in the mapping section would be $Node="acmeSvr01", $Severity="Major", and so on.
This topic is about how to map event data in a Link Access Module (LAM) configuration file.
Mapping options vary based on the tokenization method you used.
· For JSON formatted raw events you can map the incoming keys directly to Cisco Crosswork Situation Manager event fields under the mapping section.
· For data parsed using start_and_end or regexp, the resulting tokens must first be given variable names based on their positions. For example, if the incoming event is parsed into 4 tokens, you need 4 variables defined. You can then use these variable names in the mapping section. If you do not use a variable in the mapping section, its value ends up in the "overflow" field.
variables:
[
#
# Note that positions start at 1, and go up
# rather than array index style counting from zero
#
# Names of fields in input data can be substituted here, which is
# useful for removing illegal characters when building rules.
#
{ name: "Signature", position: 1 },
{ name: "SourceId", position: 2 },
{ name: "ExternalId", position: 3 },
{ name: "Manager", position: 4 },
{ name: "Class", position: 5 },
{ name: "Agent", position: 6 },
{ name: "AgentLocation", position: 7 },
{ name: "Type", position: 8 },
{ name: "Severity", position: 9 },
{ name: "Description", position: 10 },
{ name: "LastOccurrence", position: 11 },
{ name: "AgentTime", position: 12 },
{ name: "Trigger Node", substitute: "TriggerNode" }
],
During the mapping stage, the LAM places any unmapped raw event data into a catchall "overflow" field. You can then access the "overflow" field in a LAMbot, perform further parsing, and retain selected variables from it in the custom_info object. An example implementation is described in the next section.
After the source data is tokenized, converted, and parsed as needed, then mapped to the Cisco Crosswork Situation Manager fields, you may have leftover data. The LAM places any unmapped raw event data into the catchall "overflow" field.
You can then access the "overflow" field in a LAMbot, perform further parsing, and retain selected variables from it in the custom_info object.
As you have read a few times by this point, implementation of Cisco Crosswork Situation Manager is a cyclical, iterative process. You will revisit this overflow field to extract data as you uncover more requirements. Begin by mapping the minimum number of custom info fields and add more as downstream activities identify the need.
In Cisco Crosswork Situation Manager, there isn't always a corresponding field for everything you want to keep from the source system. Custom Info is a field that allows you to extend the Cisco Crosswork Situation Manager alert schema. You can store additional information that has not been mapped to any standard Alert field attributes. Store data in the custom_info field as a JSON-formatted tree.
You need to be strategic and selective about adding custom_info. Keep the following points in mind as you create custom info fields.
Do NOT overload the event custom_info object with unnecessary information. When you are just beginning the ingestion stage, it is likely that you do not know all the fields you need. Do NOT create custom fields for everything at this stage. Consult the operators who will be addressing these alerts. Ask them what additional information they need in the alert payload in order to diagnose issues, and keep only those values. Also, when you get to the alert clustering process, you will identify the custom information needed (if any) for clustering.
The maximum allowed size of an event is 64KB. If an event exceeds the limit, it does not get created in the system. Be mindful of the limit and truncate field values as needed. For example, if you decide to add a list of values such as impacted applications, enforce a length limit so you do not risk exceeding the event size limit.
The size of the event directly impacts the amount of disk space required for the database server. Each time an event is deduplicated, or the alert is updated in the system, a complete copy of it is saved in the database. This includes the custom_info object. Suppose you have a 20KB alert, and it gets updated and actioned 100 times. Its database footprint will be about 2MB. See Retention Policy under Sizing Recommendations.
It is best practice to enforce the same custom_info base model across all of your ingestions. Use the example model below. You can expand it as you see fit, but always add defaults.
var baseCustomInfo = {
enrichment : {},
mooghandling : {
isEnriched : false,
archiveOnly : false,
toolFlags : {},
},
services : [],
location : {},
eventDetails : {},
ticketing : {
ticketNumber : null,
ticketStatus : null
}
}
For information on Workflow Engine Functions you can use to modify custom_info, see Workflow Engine Functions Reference.
For complex processing of event data you can use a LAMbot.
Some event fields may have embedded information that can be used later as enrichment and alert correlation. For example, a hostname may include regional information that can be used to cluster alerts based on physical location. As a best practice, parse incoming data fields which include this type of information. A number of utilities, for example, the Bot utility, are available to simplify data parsing in the LAMbot.
Any event field can potentially be used in correlation. It is important to identify early on which fields will be used and assign them accordingly. As a rule, any fields identified for correlation should have reliable data. Avoid unreliable event fields.
Assume a simple incoming raw event of the following format:
{
"ip_address":"10.42.63.74",
"event_id":"e3562",
"manager":"MAN",
"host":"lon35sql04",
"eventID":"e4268",
"check":
{
"name": "database",
"type": "availability"
},
"region":"EMEA",
"datacenter":"London",
"priority":"crit",
"message":"Database is down",
"app" : ["App A", "App B"]
}
The following example shows a mapping to support the incoming event. Note the new severityConverter transformation. In the LAM configuration file:
constants:
{
custom_severity:
{
"crit": 5,
"ok": 0,
"warn": 2,
"minor": 3,
"error": 4,
moog_lookup_default: 1
}
},
conversions:
{
severityConverter:
{
lookup: "custom_severity",
input: "STRING",
output: "INTEGER"
}
},
mapping:
{
catchAll: "overflow",
rules:
[
{ name: "signature", rule: "$host::$check.name::$check.type" },
{ name: "source_id", rule: "$ip_address" },
{ name: "external_id", rule: "$external_id" },
{ name: "manager", rule: "$manager" },
{ name: "source", rule: "$host" },
{ name: "class", rule: "$check.name" },
{ name: "agent", rule: "$LamInstanceName" },
{ name: "agent_location", rule: "$region" },
{ name: "type", rule: "$check.type" },
{ name: "severity", rule: "$priority", conversion: "severityConverter" },
{ name: "description", rule: "$message" },
{ name: "agent_time", rule: "$moog_now" }
]
}
filter:
{
presend: "lambot.js"
}
Nested fields are supported. If the incoming field does not exist, the actual string "$incomingField" is used.
Any unmapped data from the raw event ends up in the catchAll field, which by default is called "overflow". The overflow is a JSON object that maps the remaining event fields to keys accessible using standard JavaScript object functions. If you need to access this data to perform further parsing, you can do so as shown below in the LAMbot. Cisco advises using the Bot utility to achieve this.
Here is an excerpt from a LAMbot that further parses the event payload from above. Note the usage of Bot utility methods to build a custom info base and add fields to it.
Always use botUtil.setCustomInfo to set your custom_info in a LAMbot. The logic behind the method also nullifies the overflow before the event is dispatched to the Message Bus. This reduces the event size and minimizes any size-related issues on the bus.
Also note the botUtil.checkEvent.validateEvent method, which validates the consistency of the event. In the LAMbot JavaScript file:
function presend(event)
{
// Create a base model for custom_info, consistent across all ingestions
var custom_info = botUtil.createBaseCustomInfo();
// Default value to be used in case the mapped field does not exist
var default_value = "unknown";
// Get the overflow object
var overflow = botUtil.getOverflow(event);
// Print the overflow object in the logs at the Info level.
// Useful during initial setup.
botUtil.printOverflow(overflow);
// Check overflow for a list of expected mandatory attributes
var attributeList = ["app", "datacenter"];
var boolean_flag = botUtil.checkOverflow(overflow,attributeList);
// If the mandatory fields are present, tag the event as complete.
// Otherwise set it to false so that we can filter for alerts with an incomplete set of data
if (boolean_flag)
{
custom_info.eventDetails.complete = true;
}
else
{
custom_info.eventDetails.complete = false;
}
// Retain app and region from overflow as these will be used during clustering.
// Use a default value in case the field is missing.
custom_info.eventDetails.app = overflow.app ? overflow.app : default_value;
custom_info.eventDetails.region = overflow.region ? overflow.region : default_value;
// Set custom info
botUtil.setCustomInfo(event,custom_info);
// Check the event for any invalid fields and print the entire payload to the logs
botUtil.checkEvent.validateEvent(event,botUtil);
botUtil.printCEvent(event);
return true;
}
After the LAMbot completes data normalization it exits with one of the following options:
· Return true: Event is sent to the Message Bus on the default events stream
· Return { stream: "stream_name", passed: true }: Event is sent to the Message Bus on a separate stream from the main one. You need to specifically configure an Alert Builder to listen to stream_name.
· Return false: Event is dropped and not sent to the Message Bus (usually implemented when you want to blacklist certain events such as audit events from being processed by Cisco Crosswork Situation Manager at all).
Sending data on a separate stream is useful if you need to set up separate functionality to the main AlertBuilder. If you choose to do so, remember that you need to explicitly configure an AlertBuilder to accept the data on the configured stream.
{
name : "AlertBuilder_Stream_",
classname : "CAlertBuilder",
run_on_startup : true,
moobot : "AlertBuilder._Stream_.js",
process_output_of : "Event Workflows",
# metric_path_moolet - a Moolet included in the
# calculation of the time taken for events to complete
# their path through the system from initial ingestion
# through to complete processing.
#
metric_path_moolet : true,
# Specify a list of streams to create alerts for. Reference the
# streams set in the filter section of the LAM configuration.
# Defaults to the generic event stream.
# If you are using the Event Workflow Moolet, configure the
# process_output_of property instead.
event_streams : [ "stream_name" ],
threads : 4,
events_analyser_config : "events_analyser.conf",
priming_stream_name : null,
priming_stream_from_topic : false,
moolet_queue_size_limit: 0
}
Severity is a measure of the seriousness of an event and indicates how urgently it requires corrective action.
Cisco Crosswork Situation Manager LAMs and integrations use six industry-standard severity levels as follows:
· 0: Clear - One or more events have been reported but then subsequently cleared, either manually or automatically.
· 1: Indeterminate - The severity level could not be determined.
· 2: Warning - A number of faults with the potential to affect services have been detected.
· 3: Minor - A fault that is not affecting services has been detected. Action may be required to prevent it from becoming a more serious issue.
· 4: Major - A fault is affecting services and corrective action is required urgently.
· 5: Critical - A serious fault is affecting services and corrective action is required immediately.
The severity mapping is set in each LAM configuration file:
severity:
{
"CLEAR" : 0,
"INDETERMINATE" : 1,
"WARNING" : 2,
"MINOR" : 3,
"MAJOR" : 4,
"CRITICAL" : 5,
}
The LAM takes the severity string in a received event and translates it into one of the above integer values using the mapping in its configuration file:
sevConverter:
{
lookup : "severity",
input : "STRING",
output : "INTEGER"
},
mapping:
rules:
[
{ name: "severity", rule: "$severity",conversion:"sevConverter"},
]
You can customize the severity section of the LAM configuration file according to the severities used in the system sending events to Cisco Crosswork Situation Manager. In the following example, events sent to the LAM with non-standard severities 'info' and 'Information' are mapped to 'INDETERMINATE' in Cisco Crosswork Situation Manager:
severity:
{
"info" : 1,
"Information" : 1,
"user" : 1,
"warning" : 2,
"Warning" : 2,
"error" : 5,
moog_lookup_default : 1
}
The moog_lookup_default property specifies a default value to use when the severity does not match any of the defined strings. If you do not set a default, events with an unmapped severity are not processed. For more information on mapping see "Conversion Rules" in Data Parsing.
Cisco Crosswork Situation Manager determines a Situation's severity from the member alert with the highest severity level.
Tip
It is good practice to use moog_lookup_default in all of the configured lookups as it prevents the event from being dropped when it encounters a conversion error.
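The lookup behavior, including the moog_lookup_default fallback, can be sketched in plain JavaScript. This is a simplified illustration of what the LAM does with the severity map, not the LAM's actual implementation:

```javascript
// Simplified illustration of a LAM severity lookup with a default fallback.
// The map mirrors the customized "severity" section shown above.
var severityMap = {
  "info": 1,
  "Information": 1,
  "user": 1,
  "warning": 2,
  "Warning": 2,
  "error": 5
};
var MOOG_LOOKUP_DEFAULT = 1; // unmapped severities become INDETERMINATE

function convertSeverity(raw) {
  // Fall back to the default instead of dropping the event
  return severityMap.hasOwnProperty(raw) ? severityMap[raw] : MOOG_LOOKUP_DEFAULT;
}
```

With this mapping, an event arriving with severity "error" becomes Critical (5), while an unexpected value such as "debug" is mapped to Indeterminate (1) rather than being discarded.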
Signature is the value Moogsoft AIOps uses to deduplicate source events with the same context. Moogsoft AIOps assigns a signature value to each event it ingests, constructed from a subset of the event fields. If Moogsoft AIOps finds an event signature to be unique, it creates a new alert. Otherwise, it adds the event to an existing alert with a matching signature.
After Moogsoft AIOps deduplicates events into alerts, you can still access the individual event information from the alert timeline.
Most LAMs and integrations include a default signature mapping. If you are building a custom data ingestion or tweaking the default, you can use the fields of your choice to define the signature.
The composition of the signature is very important because it has a significant impact on what you see in the alert list.
The first time Moogsoft AIOps ingests an event with a specific signature it creates a unique alert. If it ingests another event with a matching signature it deduplicates it into the same alert. Moogsoft AIOps updates the alert timestamp and increments the alert count. This is very useful in reducing the number of alerts in the system.
To view and edit default signatures for integrations configured in the Moogsoft AIOps UI:
· Go to Integrations and click the name of your installed integration in the left panel.
· Click the Alert Noise Reduction tab and scroll down to the Signature Editor section.
This section displays the fields that can be used to create a baseline signature for this integration. You can edit the signature here to select different or additional fields. Click Use Recommended Fields to restore the recommended default.
You can view and edit default LAM signatures in LAM configuration files. For example, the SevOne configuration file $MOOGSOFT_HOME/config/sevone_lam.conf contains the following signature definition in the mapping rules:
{ name: "signature", rule: "$origin::$deviceId::$objectId" }
A signature is made up of a subset of event properties. Different types of events require different signatures.
In general, fields to consider using in the signature are:
· Source, such as hostname
· Event type or class
· Static unique IDs
· Error code
· Impacted entities
Do not include fields in the signature that may change between events with the same context. For example:
· Timestamp
· State changes such as up or down
· Event count
· Variable unique IDs
· Severity
· Descriptions with changing content such as metrics
For example, every event has a different timestamp so including it in the signature effectively disables deduplication.
A perfect signature contains just enough information to identify the context of an event.
There is no restriction imposed on the length of signatures in raw events. Signatures longer than 746 characters are hashed at the alert level. This improves the manageability of signatures in the database but does not affect deduplication. The hashed signature length is 40 characters.
If you edit the signature in a LAM configuration file, concatenate multiple fields with two colons "::" to prevent misleading results. For example, if you concatenate source "Node A" and unique ID "1234" as "NodeA1234" this could potentially also match Node A1 and unique ID 234.
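The ambiguity is easy to demonstrate with the hypothetical field values from the example:

```javascript
// Without a separator, different field combinations can collide:
var a = "Node A".replace(/\s/g, "") + "1234"; // source "Node A",  ID "1234"
var b = "Node A1".replace(/\s/g, "") + "234"; // source "Node A1", ID "234"
// a and b are both "NodeA1234" -- two different contexts, one signature

// With the "::" separator the signatures stay distinct:
var c = "Node A" + "::" + "1234";  // "Node A::1234"
var d = "Node A1" + "::" + "234";  // "Node A1::234"
```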
The Email LAM uses the following default signature mapping:
$hostname::$subject
The Email LAM retrieves three email messages in this order and sends them to Moogsoft AIOps as events with the following signatures:
Event 1:
ip-172-22-97-140.ec2.internal::TDM 18 Remote Loss of Signal
Event 2:
ip-172-22-97-140.ec2.internal::TDM 18 Remote Loss of Signal
Event 3:
ip-172-22-99-144.ec3.internal::TDM 18 Remote Loss of Signal
Events 1 and 2 have an identical signature. Moogsoft AIOps creates an alert for event 1 and deduplicates event 2 into the same alert. It creates a separate alert for event 3 which has the same subject but a different hostname.
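The deduplication of these three events can be sketched as follows. This is a simplified model; the real Alert Builder also updates timestamps and records each event on the alert timeline:

```javascript
// Simplified deduplication: one alert per unique signature,
// with a count of how many events rolled up into it.
function deduplicate(events) {
  var alerts = {};
  events.forEach(function (ev) {
    var sig = ev.hostname + "::" + ev.subject;
    if (alerts[sig]) {
      alerts[sig].count += 1; // existing alert: increment the count
    } else {
      alerts[sig] = { signature: sig, count: 1 };
    }
  });
  return alerts;
}

var events = [
  { hostname: "ip-172-22-97-140.ec2.internal", subject: "TDM 18 Remote Loss of Signal" },
  { hostname: "ip-172-22-97-140.ec2.internal", subject: "TDM 18 Remote Loss of Signal" },
  { hostname: "ip-172-22-99-144.ec3.internal", subject: "TDM 18 Remote Loss of Signal" }
];
var alerts = deduplicate(events);
// Two alerts: events 1 and 2 share a signature; event 3 is separate.
```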
Signature is the context to be shared by events that belong to the same alert. For example, the shared context might be the same fault, on the same hardware or software asset, in the same network location.
Consider an example of events where only the timestamp or severity varies from event to event. All these events should have the same signature and be deduplicated into a single alert.
If, in another case, events involve different equipment or different problems, they should have different signatures and become separate alerts. For example, database replication failures with differing severities share the same signature value, but a database replication failure and a database query failure have different signatures.
Polling LAMs that support multiple target data sources contain the targets property in the configuration file. The following Polling LAMs have multiple target support:
· CA Spectrum
· DataDog Client
· Dynatrace APM
· HP NNMi
· HP OMi
· JDBC
· New Relic
· New Relic Insight
· Rest Client
· SevOne
· SolarWinds
· VMware vCenter
· VMware vRealize Log Insight
· VMware vSphere
· Zabbix
· Zenoss
For these LAMs, the event payload includes the target name and target URL. These are written to custom_info.eventDetails.moog_target_name and custom_info.eventDetails.moog_target_url:
var overflow = commonUtils.getOverflow(event);
event.set("overflow", null);
var eventDetails = {
"moog_target_name": overflow.moog_target_name,
"moog_target_url": overflow.moog_target_url
};
event.setCustomInfoValue("eventDetails", eventDetails);
These values are available in the LAMbot functions, and you can enable or disable them if required.
The first step of Situation design is to identify the operators' needs. Identify the teams that will be onboarded to Cisco Crosswork Situation Manager, and audit the operators within the teams about the information they need to see in Situations and the corresponding operational workflow.
Your goal for the discovery session is to identify the content and context, as well as the primary and secondary data required to produce the result. Investing time in understanding exactly what information is required to respond to incoming events also gives your implementation a clear purpose, which in turn reduces the likelihood of impairing system performance with unnecessary enrichment queries.
DO
Speak to the operators or SMEs who understand the alert content and will be directly involved in working on Situations in Cisco Crosswork Situation Manager.
DO NOT
Don't interview system administrators. This is a typical mistake at this stage. The team members responsible for maintaining Cisco Crosswork Situation Manager typically can't articulate the information operators will need to resolve Situations.
Tip
The operators you audit at this stage are not likely to understand Cisco Crosswork Situation Manager yet. This is perfectly fine. As long as they understand the high-level concepts of what an alert and a Situation are in Cisco Crosswork Situation Manager, they will be able to provide the input you need.
The goal of the discovery session is ultimately to identify the content and context necessary to design Situations.
A Situation has associated content and context. The content is the list of alerts shown in a Situation, while the context is the interpretation of that list of alerts. For the example below, the content of the Situation is the 11 alerts of the same severity. The Situation was assembled in response to the contextual need of the organization. They want to bundle the alerts with a similar description, with a Major severity level and assign them to the Cloud DevOps team.
Start with identifying the organizational context. What teams will use Cisco Crosswork Situation Manager? And how should alerts be routed to different teams? Answers to these questions can provide you with the big picture requirements.
For example, where are their servers located? How distributed are they? Do they have edge locations?
What is the scope Cisco Crosswork Situation Manager is expected to handle? If they want to feed the monitoring data from both dev and production, then it is likely that you want to cluster alerts separately for each environment. In this way, by learning the organizational context you learn the clustering requirements.
Also, learning where the organization is on the journey to transform its ITOps practices should inform you what level of implementation will be most successful. For example, an organization may want to keep their existing workflow and only introduce Cisco Crosswork Situation Manager as a system for noise reduction. In that case you can focus your discovery sessions on identifying which alerts matter and which ones the operators consider to be noise.
Consult this video to learn about the adoption stages and identify the proper context Cisco Crosswork Situation Manager can fit in at a given organization.
After identifying the overarching context, move on to the teams' and operators' needs. What will help the operators perform their tasks better? What are they hoping to achieve with Cisco Crosswork Situation Manager? Do they want to view items that go wrong simultaneously at the same location? Or would they rather receive notifications when the infrastructure components underlying a specific business service fail? Answers to these inquiries lead you to identify the context you want to use to cluster alerts.
For example, if your operations teams are organized around applications, they are interested in seeing alerts clustered around the impacted application. In this case, the context of a Situation is "alerts that are affecting the same application". A Situation that says "here is the list of broken items that are impacting the application you are responsible for" aligns nicely with this workflow.
But what if applications are not granular enough for your organization? If you need to organize the alerts around the impacted services, Situations need a different context: "here is the list of broken items that are impacting the service you are responsible for". So learning the specific context your organization needs is critical for successful Situation design.
Along with interviewing, it often helps to sit down with your operators and observe how they work. Ask them questions on specific ways they handled the event at hand. Even if they struggle to articulate their needs in an abstract sense, they can often explain what is going on and what they need to resolve the case at hand. Your observation will provide a blueprint for the higher-level clustering pattern.
Tip
While Situations are organized around one context, an alert can have multiple contexts. As a result, it can belong to multiple Situations. For example, an alert can be grouped together with other alerts that are impacting the same network, but also the same alert can belong to a Situation that is organized by the floor location of the rack. An alert can be represented within both the microscopic view like an application failure and the macroscopic view like a datacenter failure.
Situation descriptions are one of the primary means that operators use to find and identify relevant Situations. For this reason, you should carefully consider the alert content to include in your descriptions. For example, imagine you have multiple separate Situations that impact separate environments occurring at the same time. How does an operator prioritize and distinguish these if the description does not include the environment, such as Test, Production, or UAT, where the Situation occurred?
Good descriptions can also help operators diagnose and assign Situations. Suppose a team assignment depends on the physical location where the Situation occurred. If the description includes the location, an operator can quickly assign a Situation to the correct team.
Consider how you might label the Situations and ask clarifying questions during the discovery sessions. The criteria for defining "good" descriptions, like the criteria for "good" data and "good" Situation design, is highly dependent on the specific needs of your organization and operators. Always consult with your operators and users when planning and maintaining your deployments.
Along with content and context, use operator audits to identify primary and secondary data requirements.
Primary workflow data is the information required to correlate alerts and label Situations. Identifying primary data helps you design Situations. Secondary workflow data is the information that supports diagnostic activities such as ticketing and team assignment. Identifying secondary workflow data helps you identify the enrichment requirements, which in turn helps Situation design.
Ask operators what context they want to see presented within Situations. They may want to see items that go wrong together at the same location, or they may want to be notified when the things that enable a business service fail.
To help, look into past incidents and identify what manual correlation the operator had to do in order to relate the events describing the incident. Is that information available within the event payload itself or available somewhere externally? If externally, you want to know if the data is accessible so you can use it to enrich the events in Cisco Crosswork Situation Manager.
The operators may not be able to readily spell out all of the content and context requirements for you. Here are some examples of questions to ask the operator to guide their thinking:
· What sort of incidents have you seen previously and what sort of correlation did you have to do to figure out the impact? If the correlation is manual, you can see if it can be done programmatically via enrichment. This question also highlights potential external enrichment sources.
· Any site-specific issues you would like to highlight? This exposes their current pain points.
· What kind of manual correlation did you have to do in your head in the past few months? Can you show it to me and elaborate? The answer to this question will help you identify the enrichment requirements.
· If you would like to identify issues by their location - is there a way to know the location from the alert payload? Does the hostname naming convention allow me to infer it? Can I look up a device's network address in an inventory and get the location from that? The answer to this question can help you determine whether you need to parse the incoming information or you need to look up a CMDB to source the information.
· Can we use a relationship lookup to aid clustering? For example, a job or application dependency, a device type, or some other parent-child relationship. The answer tells you whether topology information can be used for clustering, and also identifies enrichment requirements.
· How do you prioritize alerts apart from their severity? Are there any areas carrying higher importance? Perhaps SLA bound? Or by the customer or particular locations?
· How many environments do you have? For example UAT, DEV, Production. How would you like them to be prioritized? This will help you identify the clustering requirements. If they have a production environment and DEV environment, most likely the alerts with the same attributes should still be clustered separately if one is from the production environment and one is from the DEV environment.
· What alerts would you route to your team? How do you identify these alerts and what is the common element within them? Common elements can include services, region, application, and data center. Knowing the answer helps you determine the clustering and routing strategy.
· What specific cause-effect type of incidents would you like Moogsoft AIOps to identify? Some events on their own do not represent much interest to the operators, but in combination with a separate occurrence, these can perhaps be an indication of a more critical issue.
· If, for example, the context of a Situation is the impacted business service, would you also want to break the clusters down further by the impacted technology stack? If the answer is yes, you need to figure out how to source that information. Can you parse the source data to extract this? Or do you need to perform a lookup?
· Would you like to cluster alerts across the technological stack as well?
· Situations are evolving entities, so it is acceptable for a Situation to change its context throughout its lifecycle. What contexts should be merged together based on the alert overlap?
· Should you be notified only when multiple things fail, or is even a single alert of concern? This helps you identify the alert threshold.
· How long after the initial candidate cluster creation should alerts be considered in the scope of the cluster? The answer to this question directly impacts the choice of cook_for time and its extension.
By default, the first component in Cisco Crosswork Situation Manager that listens for new events arriving on the Message Bus is the Event Workflows moolet. The Event Workflow Engine is part of the Workflow Engine framework. It listens for all new events on the standard event stream, to which LAMs and LAMbots send events.
You can create alternate event streams, but if you do, you must write a new Event Workflows component to process these events differently. This is better than writing conditional logic into every component when you need different processing. Configure the event streams an Event Workflow moolet listens for in the event_streams property of its configuration file. See Create a WFE Moolet for more information on creating a new Workflow Engine moolet.
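A new Event Workflows moolet definition might look like the following sketch, which follows the property style of the Alert Builder configuration example later in this guide. The moolet name "AppAEventWorkflows", the stream name "AppA", and the thread count are illustrative assumptions; see Create a WFE Moolet for the authoritative settings:

```
{
  name : "AppAEventWorkflows",
  classname : "CEmptyMoolet",
  run_on_startup : true,
  moobot : "WorkflowEngine.js",
  event_streams : [ "AppA" ],
  threads : 2,
  standalone_moolet : true
}
```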
For all Workflow Engines, the ones delivered with the product and new ones you add, you configure the programmatic data processing in the Cisco Crosswork Situation Manager UI. See Workflow Engine for more information.
See the following topics for more detail on the Workflow Engine:
· For a list of Workflow Engine functions, see Workflow Engine Functions Reference.
· For general information on the Workflow Engine, see Workflow Engine.
Configure Alert Creation tells you how to configure the Alert Builder, which creates alerts by processing event data from the Message Bus.
Process Alerts describes the components responsible for adding information to alerts and reducing noise. It tells you how to use enrichment processes to add supplemental data to alerts and Situations, and how to use topologies to view alerts and Situations according to the relationships that are important to your users.
The Cisco Crosswork Situation Manager Alert Builder creates alerts by processing event data from the Message Bus. It does the following:
· Deduplicates events into alerts.
· Calculates the entropy of alerts.
The following diagram illustrates where the Alert Builder operates within the overall data processing flow for Cisco Crosswork Situation Manager:
The Alert Builder is a Moolet that you can configure in the alert_builder.conf file. See Configure Event De-duplication in the Alert Builder.
See the following topics to learn more about the Alert Builder:
· Configure Event De-duplication in the Alert Builder.
· Alert Builder Reference.
The Alert Builder Moolet assembles alerts from incoming events, sent by the LAMs across the Message Bus. These alerts are visible through the Alert View in the User Interface (UI). The Alert Builder Moolet is also responsible for:
· Updating all the necessary data structures.
· Ensuring copies of the old alert state are stored in the snapshot table in MoogDb, relevant events are created and the old alert record is updated to reflect the new events arriving into Cisco Crosswork Situation Manager.
Configure Alert Builder
Edit the configuration file at $MOOGSOFT_HOME/config/moolets/alert_builder.conf.
See Alert Builder Reference for a full description of all properties. Some properties in the file are commented out by default.
Example Configuration
The following example demonstrates a simple Alert Builder configuration:
{
name : "AlertBuilder",
classname : "CAlertBuilder",
run_on_startup : true,
moobot : "AlertBuilder.js",
event_streams : [ "AppA" ],
threads : 4,
metric_path_moolet : true,
events_analyser_config : "events_analyser.conf",
priming_stream_name : null,
priming_stream_from_topic : false
}
Alert Builder Moobot
The Moobot, AlertBuilder.js, is associated with the Alert Builder Moolet. It undertakes most of the activity of the Alert Builder. When the Alert Builder Moolet processes an event, it calls the JavaScript function, newEvent:
events.onEvent ( "newEvent" , constants.eventType( "Event" )).listen();
The function newEvent contains a call to create an alert. The newly created alert is broadcast on the Message Bus.
See the following topics for more information:
· Alert Builder Reference for all Alert Builder properties.
· Moobot Modules for further information about Moobots.
Cisco Crosswork Situation Manager processes alerts using the following backend components. For alert processing capabilities using Workflow Engine in the Cisco Crosswork Situation Manager UI, see Workflow Engine and its related topics.
These components are responsible for performing analysis, adding information to alerts, and noise reduction techniques.
· Alert Analyzer: A standalone process that analyses tokens in events and assigns each token an entropy value. The Alert Analyzer can use any text field in an event but, by default, it uses the event's description. This process runs periodically and does not form a part of the alert processing workflow. See Configure Entropy to Reduce Operational Noise for more information on setting entropy thresholds to remove noisy alerts.
· Enricher: Enriches alerts with additional information.
· Maintenance Window Manager: Marks alerts as 'In maintenance' if they match a scheduled maintenance window filter. You can set up maintenance windows for planned maintenance, such as scheduling a fix or regular maintenance of a system.
· Alert Rules Engine: Allows conditional processing of alerts, such as managing link up/link down processing. Before you configure the Alert Rules Engine, consider the Workflow Engine, a powerful and flexible data processing tool available in the Cisco Crosswork Situation Manager UI.
· Empty Moolet: An optional component that enables further processing of alerts or Situations. It usually runs as a standalone process but it can also be embedded in the processing chain. Cisco Crosswork Situation Manager provides an example Empty Moolet in the form of an Alert Manager.
The following diagram shows the alert processing components in a typical implementation of a workflow chain in Cisco Crosswork Situation Manager:
Each component comprises a Moolet supplemented by Moobots.
Based upon your business requirements and your situation design, you may need to set up optional alert processing for Cisco Crosswork Situation Manager. For example:
· You may need to add data to alerts using one of the various processes for enrichment. See Enrichment Overview.
· If you want to use alert data to create and manage topology data, see Topology Overview.
· Refer to the Alert Rules Engine documentation for a Link Up-Link Down Example and a Heartbeat Monitor.
If you need to customize data processing for alerts, you should first try the Workflow Engine. See Process Alerts with the Workflow Engine.
If you cannot address your alert processing needs with the Workflow Engine, use one of the following:
1. Alert Rules Engine.
2. Empty Moolet.
The Alert Workflow Engine in Cisco Crosswork Situation Manager processes alerts after they pass through the Maintenance Window Manager, and before they are sent to a clustering algorithm. In the data processing flow, it is the last opportunity to perform actions on alerts before clustering.
Typical use cases include:
· Handling time-based or delay-based alert processing.
· Heartbeat monitoring.
· Handling "flapping" behavior and metric-based failures.
· Correlation for alerts where de-duplication failed.
For all Workflow Engines, the ones delivered with the product and new ones you add, you configure the programmatic data processing in the Cisco Crosswork Situation Manager UI. See Workflow Engine for more information.
See the following topics for more detail on the Workflow Engine:
· For a list of Workflow Engine functions, see Workflow Engine Functions Reference.
· For general information on the Workflow Engine, see Workflow Engine.
Situations in Cisco Crosswork Situation Manager are built from data ingested from your monitoring systems. You may have use cases for your Situations that require more information than is contained in the raw data. If this is the case, you can use a process called enrichment to add supplemental data to alerts or Situations. Enrichment can:
· Improve accuracy for clustering alerts into Situations.
· Improve readability of alerts for operators.
· Aid operators in investigating Situations.
· Provide critical reporting data.
The first step is to identify whether your existing data is sufficient. If it is lacking, identify the type of enrichment data that meets your requirements and the data source that can provide it. You can then choose the most effective and efficient method of enrichment for your specific needs.
The need to enrich depends on whether the data from your data source or monitoring system fulfils your requirements. Examine the use cases for your data to identify any omissions.
For example, an organization sets up Cisco Crosswork Situation Manager to ingest the following event data:
"node_name": "U0039-router01"
"description": "Router down"
The data must fulfil these use cases:
· Operators need the site name to understand where they need to take action to fix the problem.
· Management needs the region for reporting requirements.
For this company, node names follow the convention <site>-<component>, so "U0039" identifies the site. There is no need to enrich for this use case.
The site name is not enough to determine the region, and the event data does not include region data. To satisfy the second use case, the company needs to enrich the event data.
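The first use case can be met with simple parsing of the node name, as in this sketch using the example event data:

```javascript
// The site is encoded in the node name as <site>-<component>,
// so a lookup-free parse satisfies the first use case.
function siteFromNodeName(nodeName) {
  return nodeName.split("-")[0];
}
// siteFromNodeName("U0039-router01") yields "U0039"
```

No equivalent parse can produce the region, which is why the second use case requires enrichment from an external source.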
The purpose of the enrichment indicates whether to enrich at alert or Situation level. Enrichment is expensive in terms of processing time and resource use. Inefficient enrichment can slow the processing of alerts, so it is important to enrich at the appropriate level.
Enrichment data can be broadly categorized to fulfil one of the following purposes:
· Operational: Functionally modifies behavior within Cisco Crosswork Situation Manager to drive processes such as clustering. Ideally performed on alert creation.
· Informational: Assists a consumer (operator or external system) to differentiate between Situations. Typically performed at Situation level. Includes updates to Situation description, services and processes.
· Diagnostic: Assists operators to investigate Situations and can be performed at either alert or Situation level. Examples include updates to alert and Situation custom_info and updates to Situation discussion threads.
The region data in our example is informational.
If the required data exists externally, identify its type:
· Static: Data that changes infrequently, for example a country code lookup to a country name.
· Dynamic: Data that may be subject to change, for example a database query to match a hostname to a service.
In our example, the company database stores the site number and relates it to the site address and region. The data is static:
site   address          city           state  region
U0039  1265 Battery St  San Francisco  CA     US-WEST
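Because the data is static, the enrichment can be a simple in-memory lookup. The following sketch uses the example row above; the function name and the custom_info field are illustrative assumptions, not product APIs:

```javascript
// Static site-to-region lookup built from the company database extract.
var siteRegions = {
  "U0039": { city: "San Francisco", state: "CA", region: "US-WEST" }
};

// Hypothetical enrichment step: copy the region into the alert's custom_info.
function enrichRegion(alert) {
  var site = alert.node_name.split("-")[0];
  var entry = siteRegions[site];
  alert.custom_info.region = entry ? entry.region : "UNKNOWN";
  return alert;
}
```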
Dynamic enrichment on every de-duplication has a greater performance impact than enrichment on alert creation. If the enrichment data is unlikely to change during the lifetime of an alert, enrich once on alert creation. See Enrich on Alert Creation for more details.
You can enrich from a static file in a LAMbot. All other enrichment is performed in a Moobot.
Enrichment consumes processing time and resources, so limit it and use it only when necessary. Parallel processing processes an alert in two or more enrichers simultaneously, while serial processing hands the alert to one enricher at a time; the alert is passed on to the next enricher once it has been processed. If two enrichment sources are required for a single alert, use serial processing. With parallel processing, one enricher could disrupt the other while processing the alert.
Some enrichment methods are available in the UI:
· Situation Room Plugins
· UI Enrichment using a static data file
Other methods are manually configured or accessed via the command line. The most common are:
· REST.V2 module to retrieve data through HTTP.
· ExternalDb module to retrieve data from supported SQL databases.
· Graze API to update Situations and alerts statically.
· Situation Manager Labeler to update Situations and alerts dynamically.
In our example, depending on the database specification, the company might use JDBC to add the region data into alert custom_info and the Situation Manager Labeler to add the region data to Situations.
If your enrichment data is unlikely to change during the lifetime of an alert, enrich once on alert creation.
To enrich on alert creation:
1. Create a custom alert enricher Moolet.
2. Configure your alert enricher to use caching.
3. Configure the Alert Builder to send data to your custom Moolet on alert creation.
4. Define your custom Moolet in Moogfarmd.
See Enrichment Overview for more information on enrichment methods and processes.
Create an Alert Enricher Moobot
Create an Alert Enricher Moobot to obtain enrichment data from your external source, for example via JDBC.
Use Caching
The Bot utility is included with the Situation Manager Labeler.
You can configure your Alert Enricher Moobot to use the caching facilities in the Bot utility. This is optional but good practice if the data is relatively static. It reduces the time required to repeatedly process data from a third party system. For example:
// Set USE_CACHE to true to enable caching; the retention period is in seconds
var USE_CACHE = false;
var CMDB_CACHE_RETENTION = 3600;
// cmdb_cache_exists holds a previously cached entry for this host, if one exists
if (USE_CACHE && cmdb_cache_exists && cmdb_cache_exists.enrichment)
{
customInfo.enrichment = cmdb_cache_exists.enrichment;
}
else
{
botUtil.addObject(customInfo, "enrichment", ci_enrichment, false);
var cmdb_cache = {};
cmdb_cache.enrichment = customInfo.enrichment;
botUtil.setCacheValue(botModules.constants, "CMDB"+host, cmdb_cache, CMDB_CACHE_RETENTION);
}
Configure the Alert Builder
In the following example the Alert Builder sends newly created alerts to the Alert Enricher Moolet and updated alerts to the Maintenance Window Manager:
if(alert)
{
var alertAction = alert.payload().getAction() === "Alert Created" ? "create" : "update";
if ( alertAction === "create" ) {
logger.info("createAlert: Created Alert Id: " + alert.value("alert_id"));
alert.forward("AlertEnricher");
}
else {
logger.info("createAlert: Updated Alert Id: " + alert.value("alert_id"));
alert.forward("MaintenanceWindowManager");
}
}
Configure Moogfarmd
Define the Alert Enricher Moolet in Moogfarmd. For example:
{
name : "AlertEnricher",
classname : "CEmptyMoolet",
run_on_startup : true,
persist_state : true,
metric_path_moolet : false,
moobot : "AlertEnricher.js",
standalone_moolet : true,
threads : 5
}
A topology is a set of connected entities. For example, nodes in a network for a hardware topology or connected microservices in an application topology.
Topologies let you view alerts and Situations according to the relationships that are important to you. Cisco Crosswork Situation Manager supports multiple named topologies, so different teams can have their own topological view. Network teams can use the topology to relate impacted nodes for a service outage, whereas DevOps engineers can see the impact of one service outage on other services in the topology. You could also base a topology on customers, applications, alert types, or physical locations.
Cisco Crosswork Situation Manager can cluster alerts into Situations based on the relationships between the alerts. For example, a Managed Service Provider may want to cluster alerts related to customers or a network support team may want alerts clustered by location.
You can represent alert attributes using points in the topology called nodes. Connections between the nodes are represented as links. Links in Cisco Crosswork Situation Manager are bidirectional. This means that if node A is connected to node B, B is also connected to A without the need to explicitly define that connection.
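The bidirectional behavior can be sketched in plain JavaScript. This is an illustrative adjacency map, not the internal storage format Cisco Crosswork Situation Manager uses:

```javascript
// Minimal adjacency-map sketch of a topology with bidirectional links.
// Illustrative only; not the product's internal representation.
function createTopology() {
  return { links: new Map() };
}

function addLink(topology, nodeA, nodeB) {
  // Record both directions so that defining A -> B also implies B -> A.
  for (const [from, to] of [[nodeA, nodeB], [nodeB, nodeA]]) {
    if (!topology.links.has(from)) topology.links.set(from, new Set());
    topology.links.get(from).add(to);
  }
}

function isConnected(topology, nodeA, nodeB) {
  const neighbours = topology.links.get(nodeA);
  return neighbours !== undefined && neighbours.has(nodeB);
}

const topo = createTopology();
addLink(topo, "host_a1", "host_a2"); // define the link only once...
console.log(isConnected(topo, "host_a2", "host_a1")); // ...the reverse direction is implied
```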
If you configured a topology in a previous Cisco Crosswork Situation Manager version, the topology no longer exists once you upgrade to v8.0. You can recreate a topology using the Topologies API endpoints. You can also use the Topology Loader utility to load a large topology from a .csv file.
If you configured a large topology in a previous Cisco Crosswork Situation Manager version, consider breaking it up. Smaller topologies are more manageable and enable more effective alert clustering.
See the following links for further information on topologies in Cisco Crosswork Situation Manager:
· Create and manage topologies
· Topologies API
· Graph Topology
This topic outlines how to create and manage topologies in Cisco Crosswork Situation Manager. Topologies let you view alerts and Situations according to the relationships that are important to you. See Topology Overview for more information.
If you configured a topology in a previous Cisco Crosswork Situation Manager version, the topology no longer exists once you upgrade to v8.0. You can recreate a topology using the Topologies API endpoints. You can also use the Topology Loader utility to load a large topology from a .csv file.
Before you begin to create and configure topologies in Cisco Crosswork Situation Manager, ensure you have met the following requirements:
· You know the details of the topologies, nodes and links you want to create.
· If you want to use the Topologies API or the Topology Loader utility to create topologies, your Cisco Crosswork Situation Manager user has the super_privileges role permission. See Role Permissions for more information.
· If you want to load one or more large topologies, you have generated maps of the connected nodes in .csv files. See Load a Topology for more information.
You can use the Topologies API to create and manage small topologies, but this is impractical for large topologies. If your topology .csv file is larger than 40 MB Cisco recommends using the Topology Loader utility.
Each topology has a status:
· Active: You can use the topology as a source in a Recipe's topology filter.
· Inactive: You cannot use the topology to filter a Recipe. You can set a topology to inactive while you are creating or updating it. See Maintain topologies.
See Topologies API and Graph Topology for a full description of the API endpoints and Moobot module methods.
See Load a Topology for more information about the Topology Loader utility.
Cisco Crosswork Situation Manager includes mechanisms for keeping topologies up to date.
You can use the clone and replace endpoints in the Topologies API to clone an active topology, make changes to the inactive clone, and then replace the active topology with the updated version. This optional feature means that you do not need to take your topologies offline in order to update them.
To cluster alerts in a single topology, filter on a named topology.
If you want to cluster alerts in several topologies from a single Recipe, perform the following steps. Note that alerts in separate topologies are clustered into separate Situations.
You may find this useful if you have split a single large topology into several smaller topologies, and you do not want to create a Recipe for each one.
· Configure a workflow to set the corresponding name of the topology in the custom_info.moog_topology attribute of your alerts. See populateNamedTopology for more information.
· Configure a Recipe to filter on an inferred topology. The Recipe obtains the topology name from custom_info.moog_topology.
· Set the Recipe's Node Field to the alert field that contains the topology node. The Recipe checks whether the node exists in the topology named in custom_info.moog_topology. If it does, the alert is included in the filter.
· Set the Recipe's Match property as follows:
— Any node: The Recipe checks whether the alert is from any node in the same topology as the node represented by the reference alert in the Situation.
— Nodes within: The Recipe checks whether the alert is from a node within a specified hop limit of the node represented by the reference alert in the Situation.
Note
To cluster all alerts from the same node, add a clustering attribute at 100% similarity. Use the same attribute that you are using for your Topology Node Field.
See Configure a Cookbook Recipe for more information on the Recipe topology filter.
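The workflow step that populates custom_info.moog_topology can be sketched as follows. This is plain JavaScript with a hypothetical host-to-topology mapping; in a real deployment the populateNamedTopology workflow action performs this task:

```javascript
// Hedged sketch: derive the topology name for an alert so that a Recipe
// filtering on an inferred topology can pick it up.
// The siteOfHost mapping below is a hypothetical example, not product data.
const siteOfHost = {
  "host_a1": "london_network",
  "host_b7": "newyork_network"
};

function setInferredTopology(alert) {
  alert.custom_info = alert.custom_info || {};
  const topology = siteOfHost[alert.source];
  if (topology) {
    // The Recipe reads the topology name from custom_info.moog_topology.
    alert.custom_info.moog_topology = topology;
  }
  return alert;
}

const enriched = setInferredTopology({ source: "host_a1", custom_info: {} });
console.log(enriched.custom_info.moog_topology); // "london_network"
```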
Vertex Entropy is a Cisco Crosswork Situation Manager algorithm that indicates the critical nodes within your topologies and their tendency to produce important events.
The Graph Analyser process calculates the Vertex Entropy for all nodes in your topologies. It processes topology changes every 30 seconds as part of Housekeeper.
For more information on configuring Vertex Entropy, see Configure Topology-based Clustering with Vertex Entropy.
If you do not want to use the Vertex Entropy feature, you can turn off the Graph Analyser process in Housekeeper to reduce load on your Cisco Crosswork Situation Manager system. To do this, go to Settings > Vertex Entropy in the Cisco Crosswork Situation Manager UI.
When a Situation affects more than one topological node, Cisco Crosswork Situation Manager presents a visual representation of the topologies impacted by the Situation. See View Situation Topology for more details.
This topic outlines how to load topologies in Cisco Crosswork Situation Manager. Topologies let you view alerts and Situations according to the relationships that are important to you. See Topology Overview for more information.
You can use the Topologies API to create and manage small topologies, but this is impractical for large topologies. If your topology .csv file is larger than 40 MB Cisco recommends using the Topology Loader utility.
To use the topology loader, you create a comma-separated value (.csv) file of the node-to-node links. The utility builds and caches the topology in the topologies, topo_nodes and topo_links tables in the moogdb database. You can also add an optional description for each link in the topology.
Before you load your topology data into Cisco Crosswork Situation Manager, ensure you have met the following requirements:
1. You have created each topology for which you want to load nodes and links, using the Topologies API or the Graph Topology Moobot module.
2. You have generated a map of the connected nodes in a .csv file. Create a separate file for each topology.
3. Your .csv file contains all of the nodes that are expected to send events.
4. The lines in your .csv file follow the format: <node1>,<node2>,<optional description>. For example:
host_a3,host_a1,Link description
host_a3,host_a2,Link description
host_a4,host_a1,Link description
Note
If you use a .csv file with the format required by the deprecated Topology Builder utility, the optional <weight> data will be saved as link descriptions.
If a node name includes a comma, you must enclose the name in double quotes.
Ensure there are no extra spaces in the file, around the fields and at the beginning and end of each line.
The topology loader ignores any lines in your file that are not in the correct format. It also ignores loops such as 'host_x,host_x'. Node names are case-insensitive.
5. You have access to the Cisco Crosswork Situation Manager server that runs Nginx. The utility matches the hostname you provide against the name on the SSL certificate in the Nginx configuration.
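Before running the loader, you can sanity-check the file yourself. This hedged sketch applies the same rules described above — the <node1>,<node2>,<optional description> shape and the rejection of self-referencing loops — but it is a local pre-check, not the loader's own validation code:

```javascript
// Hedged pre-check for a topology .csv file: keep only lines that the
// loader would accept. This mirrors the documented rules; it is not the
// Topology Loader's implementation.
// Note: this simple split does not handle quoted node names containing commas.
function validTopologyLines(csvText) {
  return csvText.split("\n").filter(line => {
    const fields = line.split(",");
    if (fields.length < 2 || fields.length > 3) return false;
    const [a, b] = fields;
    if (a === "" || b === "") return false;
    // The loader ignores loops such as 'host_x,host_x' (names are case-insensitive).
    if (a.toLowerCase() === b.toLowerCase()) return false;
    return true;
  });
}

const csv = [
  "host_a3,host_a1,Link description",
  "host_x,HOST_X,loop is ignored",
  "not-a-link"
].join("\n");
console.log(validTopologyLines(csv)); // only the first line survives
```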
Load your topology into Cisco Crosswork Situation Manager as follows:
· Log into the Cisco Crosswork Situation Manager server that runs Nginx.
· Load the .csv topology file into the database using the topology loader found at $MOOGSOFT_HOME/bin:
./topology_loader -t=host -f=host_topology.csv --hostname=example.com --credentials=phil:password123
The -t option is the name of the topology. You must create the topology before you run the utility.
The -f option specifies the file that contains the topology data.
The --hostname option specifies the Cisco Crosswork Situation Manager host running Nginx.
The --credentials option specifies the Graze username and password, if you are not using the defaults.
For a description of all available options see Topology Loader Utility Command Reference.
· The topology loader utility uses the source data to add nodes and links to the specified topology, recording the information in the topologies, topo_nodes and topo_links database tables. The utility logs any errors to the console.
After you have loaded your topological data, the graph analyser task calculates the Vertex Entropy of the nodes. See Topology Overview for more information.
You cannot use the Topology Loader utility to modify or delete existing links.
To modify a topology, use the endpoints in the Topologies API or the methods in the Graph Topology Moobot module.
Situation Design tells you how to use Cisco Crosswork Situation Manager features to create insightful, informative Situations for your users and teams. These features include Cookbook, Tempus, Merge groups, and topologies.
Process Situations tells you how to create a Situation action Workflow engine to trigger workflows based on Situation actions. For example, when a Situation is created, updated, or closed.
Integrate with Ticketing Services tells you how to integrate with ticketing services including ServiceNow.
Clustering algorithms in Cisco Crosswork Situation Manager group alerts based on factors such as time, language, similarity and proximity. See the Clustering Algorithm Guide for more information.
You can configure the following Cisco Crosswork Situation Manager clustering algorithms:
1. Cookbook: A powerful clustering algorithm that creates clusters defined by the relationships between alerts and their attributes. To configure Cookbook and Recipes, see Configure Cookbooks and Recipes.
2. Tempus: A time-based algorithm that clusters alerts into Situations based on the similarity of their timestamps. To configure Tempus, see Configure Tempus.
3. Merge Groups: Enables you to set a similarity threshold so that Cisco Crosswork Situation Manager merges Situations created by different clustering algorithms that meet this threshold. To configure merge groups, see Merge Groups.
You can also configure Cookbooks and Recipes, Tempus, and Merge Groups via the Graze API.
The Settings > Import/Export option in the Cisco Crosswork Situation Manager UI enables you to export clustering algorithm configurations from one Cisco Crosswork Situation Manager system and import them into another, for example, from a test environment to your production environment.
Cookbook is a deterministic clustering algorithm in Cisco Crosswork Situation Manager that creates Situations defined by the relationships between alerts.
You can configure Cookbook to cluster alerts into Situations if they have specific characteristics such as temporal or topological proximity. Cookbook filters can include characteristics such as the following:
· Class or type
· Description
· Server priority
· Geographical location
· Environment classification
Each Cookbook is a collection of Recipes: sets of configurable filters, triggers, and other calculations such as priority ordering and entropy threshold. A Cookbook can run multiple Recipes concurrently to process the incoming event stream and produce a variety of Situations. A Cisco Crosswork Situation Manager deployment may include multiple instances of Moogfarmd, each of which can run multiple Cookbooks.
Use the following steps to configure a Cookbook and its Recipes via the Cisco Crosswork Situation Manager UI:
· First, you must configure the Recipes that you want to use in a Cookbook. See Configure a Cookbook Recipe for details.
· Then, you must create the Cookbook that will contain those Recipes. See Configure a Cookbook for details.
· Finally, you must activate the Cookbook so that it starts to cluster alerts into Situations. See Configure a Cookbook for details.
You can also configure a Cookbook and its Recipes via the Graze API.
If you change a Cookbook, see Cookbook Configuration Changes for information on how these changes affect the clusters that Cookbook creates.
Cookbooks configured in the UI and the Graze API can run concurrently.
To learn about how you can use macro language for Situation descriptions, see Situation Manager Labeler.
Cookbook is a deterministic clustering algorithm in Cisco Crosswork Situation Manager that creates Situations defined by the relationships between alerts.
Cookbook requires at least one active Recipe to function and cluster alerts into Situations. See Configure a Cookbook Recipe for more details.
Before you set up your Cookbook via the UI, ensure you have met the following requirements:
· You have set up the Recipes you want your Cookbook to use. See Configure a Cookbook Recipe for details.
· Your LAMs or integrations are running and Cisco Crosswork Situation Manager is receiving events.
To create a new Cookbook:
· Go to Settings > Cookbooks.
· Click the + icon to create a new Cookbook.
· Fill in the properties to name and describe the Cookbook:
Name: Name of the Cookbook.
Description: Text description of the Cookbook.
· Configure the Cookbook's input and clustering behavior:
Process Output Of: Defines the source of the alerts for the Cookbook.
Cluster By: Determines Cookbook's clustering behavior. You can select one of the following:
· First Matching Cluster: Cookbook adds alerts to the first cluster in a Recipe over the similarity threshold value. This is the default behavior for Cookbook.
· Closest Matching Cluster: Cookbook adds alerts to the cluster with the highest similarity greater than the similarity threshold value. This option may be less efficient because Cookbook needs to compare alerts against each cluster in a Recipe.
Entropy Threshold: Select the type of entropy threshold that you want Cookbook to use:
· Use the Global Entropy Threshold: This is a single entropy threshold that Cookbook applies to all alerts to eliminate noisy alerts with a lower entropy value.
· Use the Manager-Specific Entropy Thresholds: Use entropy thresholds set up for individual managers. If the manager for an alert has an entropy threshold set, Cookbook uses this value to eliminate noisy alerts with a lower entropy value. If an alert's manager does not have an entropy threshold, Cookbook uses the global entropy threshold to filter out alerts.
· Use a Specific Entropy Threshold: Set a specific entropy threshold value that you want Cookbook to use to eliminate noisy alerts with a lower entropy value. Enter the value you want to use. Unlike the other two dynamic thresholds that react to changes in the distribution of events, this threshold is static and you should periodically revise it.
· Do Not Use an Entropy Threshold: Select this option if you do not want Cookbook to filter out any alerts based on their entropy value.
See Configure Entropy Thresholds with Alert Analyzer for more information on setting global and manager-specific entropy thresholds.
Cook For: Maximum time period that Cookbook clusters alerts for before the Recipe resets and starts a new cluster. See Cookbook and Recipe Examples for more information.
If you set a different Cook For time for a Recipe, it overrides the Cookbook value. Recipes without a Cook For time inherit the value from the Cookbook.
Cook For Extension: Time period that Cookbook can extend clustering alerts for before the Recipe resets and starts a new cluster. Setting this value enables the cook for auto-extension feature for this Cookbook. As Cookbook receives related alerts, it continues to extend the total clustering time until the Max Cook For period is reached. You can use this time period in conjunction with the Max Cook For value to ensure that Cookbook continues to cluster alerts together that are related to the same failure. It only applies to new related alerts, not to existing alerts that are updated with new events. See Cookbook and Recipe Examples for more information.
If you set a different Cook For Extension time for a Recipe, it overrides the Cookbook value. Recipes without a Cook For Extension time inherit the value from the Cookbook.
Max Cook For: Maximum time period that Cookbook clusters alerts for before the Recipe resets and starts a new cluster. It works in conjunction with the Cook For Extension time to help ensure that Cookbook continues to cluster alerts together that are related to the same failure. If Cook For Extension is set and this value is not set, it defaults to three times the Cook For value. See Cookbook and Recipe Examples for more information.
If you set a different Max Cook For time for a Recipe, it overrides the Cookbook value. Recipes without a Max Cook For value inherit the value from the Cookbook.
Scale By Severity: If checked, Cookbook ignores alerts with a severity of 0 (Clear).
· Configure which Recipes the Cookbook uses and how it uses them:
a. Single Recipe Matching: Enables you to set a priority order for Recipes in the Cookbook. If you select this check box, Cookbook assigns each alert to the highest priority Recipe where it satisfies the clustering criteria. If unselected, Cookbook assigns an alert to all Recipes where the alert satisfies the clustering criteria.
b. Selected Recipes: Move the Recipes from the Available column to the Selected column to include them in the Cookbook. If you have selected Single Recipe Matching, put the Recipes in priority order so that Cookbook can determine which Recipe an alert should be assigned to. Place the highest priority Recipe at the top of the list.
· Click Save Changes to create the Cookbook.
After completing the configuration, activate the new Cookbook to run alongside any existing active Cookbooks:
· Go to Settings > Cookbook Selection.
· Move the new Cookbook from the Available Cookbooks column to the Active Cookbooks column to make it active.
· Click the Advanced tab if you want to configure Cisco Crosswork Situation Manager to remove closed and superseded Situations from Moogfarmd. Define how often you want the removal to occur in hours and minutes.
· Click Save Changes to activate the Cookbook.
Cisco Crosswork Situation Manager applies the changes to the Cookbook as soon as you save the configuration.
If you change a Cookbook, see Cookbook Configuration Changes for information on how these changes affect the clusters that Cookbook creates.
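The difference between the First Matching Cluster and Closest Matching Cluster options described above can be sketched in plain JavaScript. The similarity scores here are illustrative; the actual similarity calculation is performed by Cookbook:

```javascript
// Hedged sketch of the two Cluster By strategies. Each value is the
// precomputed similarity of a new alert to one existing cluster.
function firstMatchingCluster(similarities, threshold) {
  // Take the first cluster at or above the threshold.
  return similarities.findIndex(s => s >= threshold);
}

function closestMatchingCluster(similarities, threshold) {
  // Compare against every cluster and take the highest qualifying one.
  let best = -1;
  similarities.forEach((s, i) => {
    if (s >= threshold && (best === -1 || s > similarities[best])) best = i;
  });
  return best;
}

const scores = [0.72, 0.91, 0.85]; // similarity to three existing clusters
console.log(firstMatchingCluster(scores, 0.7));   // 0 (first over threshold)
console.log(closestMatchingCluster(scores, 0.7)); // 1 (highest similarity)
```

Closest Matching Cluster must score the alert against every cluster in the Recipe, which is why the documentation notes it may be less efficient.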
A Cookbook Recipe is a set of configurable filters, triggers, and calculations that defines the type of alerts and the alert relationships that Cookbook detects and clusters into Situations.
Cookbook requires at least one active Recipe in order to function and cluster alerts into Situations.
You can configure the following two Recipe types from the UI:
· Value Recipe v2: Default Recipe that extracts and analyzes groups of consecutive characters, called shingles, to measure text similarity between alerts.
· Value Recipe: First version of the Value Recipe that uses a string comparison mechanism to determine text similarity between alerts.
See Recipe Types for more details on the different types of Recipes available in Cookbook. If you want to implement a Bot Recipe that allows you to call Moobot functions, you can use the Graze API.
Before you set up your Recipe via the UI, ensure you have met the following requirements:
· Your LAMs or integrations are running and Cisco Crosswork Situation Manager is receiving events.
· If you want to cluster on topology information or use Vertex Entropy in your Recipes, you have created one or more topologies. See Topology Overview.
To create a new Cookbook Recipe from the Cisco Crosswork Situation Manager UI:
· Navigate to Settings > Cookbook Recipes.
· Click the + icon to create a new Recipe.
· On the Recipe tab, enter the properties to name and describe the Recipe:
— Name: Name of the Recipe. Use a unique and descriptive name.
— Situation Description: Description that appears in Situations that the Recipe creates.
— Recipe Type: Type of Recipe. The options are Value Recipe and Value Recipe v2. See Recipe Types for more information.
Configure the Recipe behavior and filters that define the alert relationships:
— Trigger Filter: Determines the alerts that Cookbook considers for Situation creation. Cookbook includes alerts that match the trigger filter. For details on creating a filter, see Filter Search Data. To set a Vertex Entropy trigger filter, see Configure Topology-based Clustering with Vertex Entropy for more information.
— Exclusion Filter: Determines the alerts to exclude from Situation creation. Cookbook ignores alerts that match the exclusion filter. For details on creating a filter, see Filter Search Data. To set a Vertex Entropy exclusion filter, see Configure Topology-based Clustering with Vertex Entropy for more information.
— Seed Alert Filter: Determines whether to create a Situation from a seed alert. The seed alert must meet the Trigger Filter, Exclusion Filter, and Seed Alert Filter criteria to create a Situation. Cookbook considers subsequent alerts for clustering if they meet the trigger and exclusion filter criteria. Alerts that meet the trigger and exclusion filter criteria but arrive before the seed alert do not form Situations. For details on creating a filter, see Filter Search Data. To set a Vertex Entropy seed alert filter, see Configure Topology-based Clustering with Vertex Entropy for more information.
The seed alert filter is a mechanism to ensure that only specific events create Situations. For example, if you create a seed alert filter where the description matches 'Switch failure', alerts are eligible for clustering into a Situation only after a seed alert with the matching description arrives.
— Rate Filter: Determines whether Cookbook clusters alerts into Situations based on the rate the alerts arrive and the minimum and maximum sample size. To add a rate filter, check the checkbox and complete the following fields:
· Rate: Rate, in number of alerts per second. Cookbook clusters alerts if they arrive at the rate specified here or higher.
· Min Sample Size: Number of alerts that must arrive before the Cookbook starts to calculate the alert rate.
· Max Sample Size: Maximum number of alerts that are considered in the alert rate calculation. When more than this number of alerts have arrived, Cookbook discards the oldest alerts and calculates the alert rate based on the number of alerts in the Max Sample Size.
— Topology Filter: Determines whether Cookbook clusters alerts into Situations based on topology information. This section is only enabled if you have one or more topologies in your system. To add a topology filter, check the checkbox and complete the following fields:
— Source: The source of the topology information on which to cluster. Choices are:
· Infer topology from alert: The Recipe obtains the topology name from custom_info.moog_topology. You can use this option to cluster alerts related to several topologies, without needing to create an individual Recipe for each named topology. For more information, see Create and Manage Topologies.
· Named topology: The name of the topology from which to obtain topology information.
· Node Field: The alert field that contains the topology node information. You must define a node field for both named and inferred topologies.
· Match: Determines which nodes qualify alerts for clustering, based on the number of hops between the alert source nodes. A hop is the distance between two directly connected nodes. Cisco Crosswork Situation Manager measures the hop limit from the first alert that formed the Situation and always follows the shortest possible route. For more information on Vertex Entropy and hops, see Vertex Entropy and Configure Topology-based Clustering with Vertex Entropy. To change the default hop limit of 2, select the Nodes within checkbox and set a different limit:
· Any node: The Recipe checks whether the alert is from any node in the same topology as the node represented by the reference alert in the Situation.
· Nodes within: The Recipe checks whether the alert is from a node within a specified hop limit of the node represented by the reference alert in the Situation.
Note
To cluster all alerts from the same node, add a clustering attribute at 100% similarity. Use the same attribute that you are using for your Topology Node Field.
For more information on topologies see Topology Overview.
— Alert Threshold: Minimum number of alerts in a candidate cluster required before Cookbook creates a Situation. If left as '1', a single alert can generate a new Situation.
To determine the number of alerts required to create a Situation, Cookbook compares the alert threshold value in the Cookbook Recipe to that of the merge group the Recipe belongs to, and uses the higher value.
If you are using the default merge group which has an alert threshold of 2, Cookbook will never create a Situation containing a single alert. If you want Cisco Crosswork Situation Manager to create Situations with a single alert, change the alert threshold in the default merge group to 1 or create a custom merge group. See Merge Groups for more information on updating the default merge group and setting up custom merge groups.
— Cook For: Minimum time period, in seconds, that Cookbook clusters alerts for before the Recipe resets and starts a new cluster. See Cookbook and Recipe Examples for more information.
If you set a different Cook For time for a Recipe, it overrides the Cookbook value. Recipes without a Cook For time inherit the value from the Cookbook.
— Cook For Extension: Time period that Cookbook can extend clustering alerts for before the Recipe resets and starts a new cluster. Setting this value enables the cook for auto-extension feature for this Recipe. As Cookbook receives related alerts, it continues to extend the total clustering time until the Max Cook For period is reached. Used in conjunction with the Max Cook For value, the Cook For Extension period helps to ensure that Cookbook continues to cluster alerts together that are related to the same failure. The Cook For Extension period only applies to new related alerts; it does not apply to existing alerts that are updated with new events. See Cookbook and Recipe Examples for more information.
If you set a different Cook For Extension time for a Recipe, it overrides the Cookbook value. Recipes without a Cook For Extension time inherit the value from the Cookbook.
— Max Cook For: Maximum time period that Cookbook clusters alerts for before the Recipe resets and starts a new cluster. It works in conjunction with the Cook For Extension time to help to ensure that Cookbook continues to cluster alerts together that are related to the same failure. If Cook For Extension is set and this value is not set, it defaults to three times the Cook For value. See Cookbook and Recipe Examples for more information.
If you set a different Max Cook For time for a Recipe, it overrides the Cookbook value. Recipes without a Max Cook For value inherit the value from the Cookbook.
a. Configure the alert matching property for the Recipe:
— Cluster By: Defines how Cookbook matches alerts to clusters. Select the default option to inherit the clustering behavior from the Cookbook's own Cluster By setting. The First Matching Cluster option adds alerts to the first cluster above the similarity threshold value. The alternative, Closest Matching Cluster, adds alerts to the cluster with the highest similarity greater than the similarity threshold value; this option may be less efficient because Cookbook needs to compare alerts against each cluster in a Recipe.
· On the Clustering tab, add the fields that you want Cookbook to factor in when clustering alerts:
· Click the + icon and select a field in the drop-down list.
· Use the slider to set the similarity threshold for each field. The value determines the required percentage similarity for Cookbook to cluster a set of alerts.
· If you want to use custom info fields, configure the Match List Items option. See Match List Items in Recipes for details.
· If you are configuring a Value Recipe, check Case Sensitive if you want the text similarity calculation to factor in case sensitivity. See Recipe Types for more information.
· If you are configuring a Value Recipe V2, select whether you want Cookbook to calculate text similarity using shingles or words. You can select Shingles from the drop-down list in the Language Processing field and enter a Shingle Size. The default value is the optimal shingle size for that field. Alternatively, you can select Words. See Recipe Types for more information.
· Click Save Changes.
When you have completed the configuration, Cisco Crosswork Situation Manager applies the changes to any active Cookbooks that use the Recipe as soon as you save the changes. If the Recipe has not been added to an active Cookbook, go to Settings > Cookbook and move the Recipe under Selected Recipes for that Cookbook.
If you change a Cookbook Recipe, see Cookbook Configuration Changes for information on how these changes affect the clusters that Cookbook creates.
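To illustrate what shingle-based comparison in a Value Recipe v2 measures, the following plain JavaScript sketch compares two alert descriptions using the overlap of their character shingles. The shingle size and the Jaccard-style overlap used here are illustrative assumptions, not Cookbook's exact algorithm:

```javascript
// Hedged sketch: shingle-based text similarity. Extract every group of
// consecutive characters of a fixed size, then compare the two sets.
function shingles(text, size) {
  const set = new Set();
  for (let i = 0; i + size <= text.length; i++) {
    set.add(text.slice(i, i + size));
  }
  return set;
}

function shingleSimilarity(a, b, size) {
  const sa = shingles(a.toLowerCase(), size);
  const sb = shingles(b.toLowerCase(), size);
  let overlap = 0;
  for (const s of sa) if (sb.has(s)) overlap++;
  const unionSize = sa.size + sb.size - overlap;
  return unionSize === 0 ? 0 : overlap / unionSize;
}

const similar = shingleSimilarity("Link down on eth0", "Link down on eth1", 3);
const different = shingleSimilarity("Link down on eth0", "Disk full on /var", 3);
console.log(similar > different); // near-identical descriptions score higher
```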
Use the following steps to configure a Cookbook and its Recipes via the Cisco Crosswork Situation Manager UI:
a. First, you must configure the Recipes that you want to use in a Cookbook. See Configure a Cookbook Recipe for details.
b. Then, you must create the Cookbook that will contain those Recipes. See Configure a Cookbook for details.
c. Finally, you must activate the Cookbook so that it starts to cluster alerts into Situations. See Configure a Cookbook for details.
You can also configure a Cookbook and its Recipes via the Graze API.
If you change a Cookbook, see Cookbook Configuration Changes for information on how these changes affect the clusters that Cookbook creates.
Cookbooks configured in the UI and the Graze API can run concurrently.
If you change the configuration of a Cookbook or Cookbook Recipe, Cisco Crosswork Situation Manager may re-evaluate any Cookbook clusters, depending on your persistence setting and the severity of the configuration change. This applies to clusters of alerts that Cisco Crosswork Situation Manager holds in memory and has not yet formed into Situations. It does not affect clusters that have already become Situations that users can see and have been saved in the database.
Your persistence setting affects whether Cookbook re-evaluates clusters as follows:
· If persistence is turned off, Cookbook resets every cluster and all new incoming alerts will form new Situations.
· If persistence is turned on, Cookbook updates existing clusters depending on which configuration parameters have changed, as described below. Cookbook may remove clusters, create new Situations, or persist the clusters so that new alerts continue to be added after the configuration change.
Cisco Crosswork Situation Manager groups Cookbook and Recipe configuration changes into three categories:
· Cosmetic changes: These configuration properties are non-functional and do not affect how Cookbook creates and maintains clusters. When you change these properties, there are no functional changes to existing clusters.
· Property changes: These configuration properties affect how Cookbook maintains clusters and generates Situations.
· Core changes: These configuration properties fundamentally govern how Cookbook creates clusters and groups alerts into Situations.
Cookbook and Recipe configuration properties are grouped into the following categories:

| Category | Cookbook Property | Recipe Property |
| --- | --- | --- |
| Cosmetic changes | Name | Name |
| | Description | Situation Description |
| Property changes | | Alert Threshold |
| | Cook For | Cook For |
| | Cook For Extension | Cook For Extension |
| | Max Cook For | Max Cook For |
| Core changes | | Trigger Filter |
| | | Exclusion Filter |
| | | Seed Alert Filter |
| | | Rate Filter |
| | | Topology Filter |
| | Cluster By | Cluster By |
| | | Entropy Threshold |
| | | Scale by Severity |
| | Recipe Matching | Recipe Matching |
| | | Clustering |
Cosmetic changes to Cookbook and Recipe properties have the following effects on clusters:
If you change the name of a Cookbook or a Recipe, Cookbook makes no operational changes to clustering.
If you change the Situation Description for a Recipe, Cookbook applies the new description to all Situations created after the change. Cookbook maintains the old description for all Situations created before the change, regardless of whether new alerts are added to the Situation.
Property changes to Cookbook and Recipe properties have the following effects on clusters:
Changes to the Alert Threshold
For changes to the Alert Threshold, the main behavioral difference occurs when the Alert Threshold is reduced. In the example below, before the configuration change, the Alert Threshold was set to 3 and two alerts had arrived and formed a cluster in memory. If you change the Alert Threshold from 3 to 1, the cluster satisfies the new configuration, so Cookbook automatically creates a Situation in the database containing these two alerts. New alerts coming into the system can continue to be added to this cluster.
Increasing the Alert Threshold
If you increase the Alert Threshold configuration, the cluster will persist in memory until the higher Alert Threshold is reached.
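The re-evaluation described above can be sketched as a simple decision. This is an illustrative model only; the function and value names are assumptions, not product code:

```python
# Illustrative sketch (not product code): how an in-memory cluster might be
# re-evaluated when the Alert Threshold configuration changes.

def reevaluate_cluster(cluster_size: int, new_threshold: int) -> str:
    """Return the action taken for an in-memory cluster after a threshold change."""
    if cluster_size >= new_threshold:
        # Threshold reduced (or already met): the cluster becomes a Situation.
        return "create_situation"
    # Threshold raised: the cluster persists in memory until enough alerts arrive.
    return "keep_in_memory"

# Two alerts in memory, threshold lowered from 3 to 1 -> Situation is created.
print(reevaluate_cluster(cluster_size=2, new_threshold=1))  # create_situation
# Threshold raised from 3 to 5 -> cluster keeps cooking in memory.
print(reevaluate_cluster(cluster_size=2, new_threshold=5))  # keep_in_memory
```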
Cook For/Cook For Extension/Max Cook For
Cookbook adopts a similar logic for changes to all three of these attributes because they all affect the cluster's expiry time.
Extending Cook For/Cook For Extension/Max Cook For
If you extend any of these properties, the cluster expiry time is extended from the first event time. In the example below, before the configuration change, the Cook For time is 30 seconds and alerts 1 and 2 arrive at 0 and 25 seconds, respectively. After the Cook For time changes from 30 seconds to 40 seconds, alert 3 arrives at 35 seconds. Cookbook clusters this alert with the persisted cluster from the previous configuration. When alert 4 arrives at 45 seconds, Cookbook creates a new cluster because it satisfies the newly defined Cook For time. Cookbook behaves similarly if Cook For Extension or Max Cook For properties are extended.
Reducing Cook For/Cook For Extension/Max Cook For
If you reduce any of these properties, Cookbook relies on the first event time to establish whether clusters are still valid. In the example below, before the configuration change, the Cook For time is 30 seconds and alerts 1 and 2 arrive at 0 and 25 seconds, respectively. If you reduce the Cook For time from 30 seconds to 20 seconds, the arrival time of the most recent (second) alert exceeds the new Cook For time so Cookbook expires and removes the cluster.
In the second example below, before the configuration change, the Cook For time is 60 seconds and alerts 1 and 2 arrive at 0 seconds and 25 seconds, respectively. If you reduce the Cook For time from 60 seconds to 40 seconds, the cluster still persists because it is still within the new Cook For time so that, when alert 3 arrives at 35 seconds, it joins the existing cluster. Alert 4 arrives at 45 seconds which exceeds the new Cook For time so Cookbook places it in a new cluster.
In the same example, if alerts 3 and 4 had arrived (at 35 and 45 seconds) and the configuration change occurred at 50 seconds, Cookbook would close the cluster immediately with the four alerts in it, as shown below.
For all properties that are in the core changes group, any changes cause Cookbook to remove the associated clusters. This is because the fundamental rules on which Cookbook clusters alerts have changed, and it is no longer meaningful to cluster new alerts with old ones.
As an example, consider the scenario below in which you change the similarity of a Recipe. Initially, the Recipe uses a 50% similarity on source ID. Alerts 1 and 2 arrive and Cookbook clusters them together. If you increase the similarity from 50% to 100%, Cookbook removes the cluster from memory. The diagram below shows how confusing it would be if the cluster persisted: users would see a cluster containing alerts that clearly contradict a 100% match on source ID.
This behavior, to remove any old clusters and start new clusters when new alerts arrive, is consistent across all core configuration changes.
You can restore an earlier version of a Cookbook or a Recipe.
You must have the super_privileges permission to restore earlier versions of Cookbooks and Recipes.
· Go to Settings > System > Change History.
The left hand panel is a Cookbook and Recipe version timeline. Each version in the timeline has the creation date and time, the user who created that version, and the category (recipe or cookbook).
The right hand panel shows the changes made to the Cookbook or Recipe for that version. For example, Recipe type, Situation description or alert threshold.
See Configure a Cookbook Recipe and Recipe Types for more information on these settings.
· Select a Cookbook or Recipe version.
· Select Restore to restore the selected version of the Cookbook or Recipe.
In the Restore Saved Configuration box, select Yes.
In the Restore Successful box, select OK.
Warning
When you restore an earlier Cookbook or Recipe version, Cisco Crosswork Situation Manager deletes all later versions of that Cookbook or Recipe.
The version timeline automatically updates, and the version you restored will be at the top of the timeline.
If you restore a version of a Recipe attached to a Cookbook instead of restoring the entire Cookbook, Cisco Crosswork Situation Manager detaches the Recipe from the Cookbook before restoring the earlier version. You must then re-attach the restored version of the Recipe to the Cookbook.
See Configure a Cookbook and Configure a Cookbook Recipe for more information.
Another user changes or restores a Cookbook or Recipe
If you are looking at a Recipe or Cookbook in the Change History page and another user changes that Cookbook or Recipe, you will see the following warning message:
There are new changes available. Refresh this page to see the changes.
Refresh the page to remove the message and see the updated version timeline.
If you’re configuring a Recipe or Cookbook and another user restores an earlier version of that Recipe or Cookbook, that Recipe or Cookbook configuration automatically updates to the restored version. You will see the following warning message:
System is restored by the other user. Updated the current list of changes.
Merge groups always use the latest version of a Cookbook. If you restore an earlier version of a Cookbook that a merge group uses, that merge group automatically updates to use the restored Cookbook as this version is now the latest version.
If restoring an earlier Cookbook version means that a merge group is empty, Cisco Crosswork Situation Manager automatically deletes that merge group.
If you delete a Cookbook that is part of a merge group, Cisco Crosswork Situation Manager removes that Cookbook from the merge group. If you then restore the deleted Cookbook, Cisco Crosswork Situation Manager does not automatically add that Cookbook back into the merge group.
See Configure Merge Groups for more information.
To restore a Recipe that filters on a named topology, the topology must still exist in Cisco Crosswork Situation Manager.
The following examples provide further explanation of the functionality within your Cookbook and Recipe configurations. See Configure a Cookbook and Configure a Cookbook Recipe for further details.
The Cook For auto-extension functionality uses the Cook For, Cook For Extension and Max Cook For properties in the Cookbook and Recipe configurations. These configuration properties help to ensure that Cookbook continues to cluster alerts together that are related to the same failure.
Consider the following example:
· Cook For is set to 1 hour.
· Cook For Extension is set to 30 minutes.
· Max Cook For is set to 2 hours.
Cisco Crosswork Situation Manager receives an alert which meets the Cookbook and Recipe criteria so Cookbook starts a cluster. If Cisco Crosswork Situation Manager receives a new related alert 40 minutes after Cookbook started clustering alerts, Cookbook extends the total clustering time by 30 minutes from that time to 1 hour and 10 minutes, then:
1. If Cisco Crosswork Situation Manager receives another alert 1 hour and 5 minutes after Cookbook started clustering, because Cisco Crosswork Situation Manager received it within the extended time of 1 hour and 10 minutes, Cookbook further extends the total clustering time to 1 hour and 35 minutes. Cookbook continually extends the total clustering time as it receives more related alerts, provided that they are received within the extended time. Cookbook can extend the total clustering time until the Max Cook For time is reached. If Cookbook receives further related alerts after the Max Cook For time of 2 hours has elapsed, the Recipe resets and Cookbook adds them to a new cluster.
2. If Cisco Crosswork Situation Manager does not receive any further alerts, Cookbook stops clustering alerts after the extended time of 1 hour and 10 minutes elapses. If Cisco Crosswork Situation Manager then receives another alert after this time has elapsed, Cookbook starts a new cluster.
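The auto-extension example above can be sketched as follows. This is an illustrative model only; the constant and function names are assumptions, not product code:

```python
# Illustrative sketch (not product code) of the Cook For auto-extension
# example: Cook For 1 hour, Cook For Extension 30 minutes, Max Cook For 2 hours.
# Times are seconds measured from the first (seed) alert.

COOK_FOR = 3600       # 1 hour
EXTENSION = 1800      # 30 minutes
MAX_COOK_FOR = 7200   # 2 hours

def cluster_timeline(arrivals):
    """Feed alert arrival times in order; return (accepted alerts, final expiry)."""
    expiry = COOK_FOR  # initial clustering window opened by the seed alert
    accepted = []
    for t in arrivals:
        if t <= min(expiry, MAX_COOK_FOR):
            accepted.append(t)
            # Each accepted alert extends the window, capped by Max Cook For.
            expiry = min(max(expiry, t + EXTENSION), MAX_COOK_FOR)
        # Alerts outside the window would start a new cluster (not modeled here).
    return accepted, expiry

# Seed at 0 s, then alerts at 40 min (2400 s) and 1 h 5 min (3900 s):
accepted, expiry = cluster_timeline([0, 2400, 3900])
print(accepted)  # [0, 2400, 3900] - all three alerts join the cluster
print(expiry)    # 5700 seconds (1 h 35 min), matching the example above
```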
The Cookbook clustering algorithm uses the following Recipe types to define alert relationships and control how it clusters alerts:
Value Recipe v2 and Value Recipe use different methods to calculate the textual similarity between alerts. The Bot Recipe is a customizable Recipe that allows you to call specific functions from a Moobot.
Value Recipe v2 extracts and analyzes groups of consecutive characters to measure text similarity between alerts. It is the default Recipe in Cookbook for new Cisco Crosswork Situation Manager v7 installations and for any new Cookbooks you create.
This recipe uses the bag-of-words model and shingling natural language processing methods to calculate the text similarity between alerts. Shingling is the process in which Cookbook extracts groups of consecutive characters called shingles from a source string. Potential sources include the alert source ID or description. To measure similarity, Cookbook calculates the number of identical shingles. You can control the calculation using the shingle size property.
In the Clustering tab in the Cookbook Recipes window in Settings, you select whether Cookbook treats string values as shingles or words for each field you use to cluster alerts. If you select shingles, you can choose what you want the shingle size to be. The default shingle size settings in the Value Recipe v2 are optimal for most use cases.
For example, if you set the shingle size for source IDs to 2 and Cookbook receives two alerts with the source IDs:
webserver0100
webserver0200
Cookbook extracts the following shingles from the source ID strings:
we eb bs se er rv ve er r0 01 10 00
we eb bs se er rv ve er r0 02 20 00
Ten out of the 12 shingles are identical which indicates a high similarity.
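The shingle extraction above can be reproduced with a short sketch. The comparison here is a simplified position-by-position check for illustration; the names are assumptions, not the product's implementation:

```python
# Illustrative sketch (not product code) of shingle-based similarity for
# Value Recipe v2 with a shingle size of 2.

def shingles(text: str, size: int):
    """Split a string into overlapping groups of `size` consecutive characters."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]

a = shingles("webserver0100", 2)
b = shingles("webserver0200", 2)
print(a)  # ['we', 'eb', 'bs', 'se', 'er', 'rv', 've', 'er', 'r0', '01', '10', '00']

# Count the positionally matching shingles between the two source IDs.
identical = sum(1 for x, y in zip(a, b) if x == y)
print(f"{identical} of {len(a)} shingles match")  # 10 of 12 shingles match
```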
If you set the shingle size to 0 or less, Cookbook treats the string values as words in its text similarity calculation.
For example, if Cookbook receives two alerts with the source IDs: "database01" and "database02", it treats them as:
database01
database02
These two words are not identical so the two alerts would be given a low similarity.
The first version of the Value Recipe uses a string comparison mechanism to cluster alerts by textual similarity.
Value Recipe uses string metric algorithms to calculate similarity. The calculation breaks strings up into partitions and performs a character-by-character comparison of each partition to measure similarity.
For example, suppose you set a Cookbook Recipe to cluster alerts on source ID and description with a Similarity Threshold of 100%, and Cookbook receives the following alerts:
| Alert | source_id | description |
| --- | --- | --- |
| A | 001 | database |
| B | 001 | webserver |
| C | 002 | database |
| D | 002 | database |
Cookbook creates three clusters: one containing alert A, one containing alert B and one containing alerts C and D which have identical source IDs and descriptions. The string may contain non-alphabetical characters. Value Recipe can also convert numeric values to strings for comparison.
The Value Recipe uses the Case Sensitive property to enable or disable case sensitivity as a factor in text similarity matching. For example, you can enable the property for source ID so that Cookbook only matches when the case is identical, but disable it for descriptions if you do not want descriptions to be case sensitive.
If you enable case sensitivity, then an alert from a source called "WebServer1" and an alert from a source called "webserver1" would have a lower similarity.
To make Cookbook match each value in a list individually in custom info fields, check the Match List Items check box in the Cookbook Recipe Clustering tab. See Match List Items in Recipes for details.
Bot Recipe is a customizable Cookbook Recipe that allows you to call certain functions from the Cookbook.js Moobot. You can configure the Bot Recipe using the Graze API. Bot Recipes are not available in the UI.
You can configure the Bot Recipe to call functions defined in the Cookbook.js Moobot, which defines two functions: an initialization function called initialize_function and a member_function.
You can call the initialize_function once to set up any necessary initialization of the algorithms you want to write in the Moobot.
Cookbook calls the member_function once for every event that passes the trigger, for every candidate cluster in the system. For example, if there are 100 candidate clusters, Cookbook calls the member_function 100 times for each alert that comes through the system. Cookbook compares the alert to candidate clusters that are potential Situations. If the alert's similarity matches or exceeds the matcher value, Cookbook adds the alert to the candidate cluster.
You can create Cookbook Recipes and configure clustering around the use of list-based fields in alert custom info. You can also set whether list-based clustering of a custom field is applied. If not, the field is treated as a string.
A list in custom info is a properly formed JavaScript array. To see if a custom info item is a list, examine the custom info details in the UI. If the item can be expanded and shows a value of x items at the top level, then it is a list. A text field containing comma-separated values is not considered a list.
To match list items for a custom_info field:
· On the Settings tab, select Cookbook Recipes from the Algorithms section, select the Recipe you want to configure, and click on the Clustering tab.
· In the Cluster By field, select the custom_info attribute from the drop-down list. Enter the custom_info field name in the box below.
· Check the Match List Items check box to match individual items in custom_info lists and use the slider to select the similarity threshold for this custom_info field.
The Cookbook Recipe applies the similarity threshold that you set to each individual item in the list, not to the list as a whole.
For example, you have the following lists in two alerts and the similarity threshold is 100%:
Alert 1: [ ABC , DEF ]
Alert 2: [ ABC123, DEF123, ABC, DEF ]
This results in similarity comparisons between:
· ABC and ABC123
· ABC and DEF123
· ABC and ABC
· ABC and DEF
· DEF and ABC123
· DEF and DEF123
· DEF and ABC
· DEF and DEF
Since there are two identical matches, [ ABC and ABC ] and [ DEF and DEF ], the Cookbook Recipe clusters these alerts together.
If you want to calculate the total similarity of the list items, that is, how many items in list 1 also appear in list 2, do not select Match List Items, and set Language Processor to Words so that the Cookbook Recipe treats each list as a string. In the above example, 50% of the items match across the two lists (ABC and DEF), so if the similarity threshold is 100%, the Cookbook Recipe does not cluster these alerts together.
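The two comparison modes can be sketched as follows, using exact string equality to stand in for a 100% similarity threshold. This is illustrative only, not product code:

```python
# Illustrative sketch (not product code) of the Match List Items comparison.
from itertools import product

alert1 = ["ABC", "DEF"]
alert2 = ["ABC123", "DEF123", "ABC", "DEF"]

# With Match List Items: every item in one list is compared to every item
# in the other, and any identical pair is a match.
matches = [(x, y) for x, y in product(alert1, alert2) if x == y]
print(matches)  # [('ABC', 'ABC'), ('DEF', 'DEF')] - the alerts cluster together

# Without Match List Items: the lists are compared as whole collections of
# values; here only half the distinct items are shared.
overlap = len(set(alert1) & set(alert2)) / len(set(alert1) | set(alert2))
print(overlap)  # 0.5 - a 50% match, below a 100% threshold, so no clustering
```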
For example, you configure your Recipe to treat the custom_info field 'cities' as a list and set the similarity threshold to 100%.
After configuring the Recipe, Cisco Crosswork Situation Manager receives the following four alerts:
Alert 1: custom_info.cities = ["London"]
Alert 2: custom_info.cities = ["London", "San Francisco", "Venice", "Bangalore"]
Alert 3: custom_info.cities = ["Venice", "Bangalore"]
Alert 4: custom_info.cities = ["Bangalore"]
This configuration would produce four candidate clusters:
· Cluster A: Alert 1 and alert 2 match on "London".
· Cluster B: Alert 2 matches on "San Francisco".
· Cluster C: Alert 2 and alert 3 match on "Venice".
· Cluster D: Alerts 2, 3 and 4 match on "Bangalore".
Cookbook creates two Situations because cluster D contains all the alerts in clusters B and C:
1. Cluster A (alerts 1 and 2) becomes Situation X.
2. Clusters B, C, and D (alerts 2, 3, and 4) become Situation Y.
You must be careful when setting the similarity threshold if you are using list-based clustering. If the similarity threshold is low enough, you may end up with Situations containing blended list similarity. In the above example, alert 2 is common to both Situation X (London) and Situation Y (Bangalore). If the similarity were set to 25%, these two Situations would merge.
If the Recipe does not see 'custom_info.cities' field as a list, it treats the field as a single string. This means that, in this example, all four alerts would end up in separate Situations with no clustering.
This topic outlines how to load topologies in Cisco Crosswork Situation Manager. Topologies let you view alerts and Situations according to the relationships that are important to you. See Topology Overview for more information.
You can use the Topologies API to create and manage small topologies, but this is impractical for large topologies. If your topology .csv file is larger than 40 MB, Cisco recommends using the Topology Loader utility.
To use the topology loader, you create a comma-separated value (.csv) file of the node-to-node links. The utility builds and caches the topology in the topologies, topo_nodes and topo_links tables in the moogdb database. You can also add an optional description for each link in the topology.
Before you load your topology data into Cisco Crosswork Situation Manager, ensure you have met the following requirements:
1. You have created each topology for which you want to load nodes and links, using the Topologies API or the Graph Topology Moobot module.
2. You have generated a map of the connected nodes in a .csv file. Create a separate file for each topology.
3. Your .csv file contains all of the nodes that are expected to send events.
4. The lines in your .csv file follow the format: <node1>,<node2>,<optional description>. For example:
host_a3,host_a1,Link description
host_a3,host_a2,Link description
host_a4,host_a1,Link description
Note
If you use a .csv file with the format required by the deprecated Topology Builder utility, the optional <weight> data will be saved as link descriptions.
If a node name includes a comma, you must enclose the name in double quotes.
Ensure there are no extra spaces in the file, either around the fields or at the beginning and end of each line.
The topology loader ignores any lines in your file that are not in the correct format. It also ignores any loops, such as 'host_x,host_x'. Node names are case insensitive.
5. You have access to the Cisco Crosswork Situation Manager server that runs Nginx. The utility matches the hostname you provide against the name on the SSL certificate in the Nginx configuration.
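Before running the loader, you can sanity-check a .csv file against the rules above. This is an illustrative pre-flight check, not part of the product:

```python
# Illustrative sketch (not product code): pre-checking topology .csv lines
# against the format rules described above.
import csv
import io

def valid_links(csv_text: str):
    """Yield (node1, node2, description) for well-formed, non-loop lines."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) not in (2, 3):
            continue  # the loader ignores malformed lines
        node1, node2 = row[0], row[1]
        if node1.lower() == node2.lower():
            continue  # the loader ignores loops; names are case insensitive
        yield node1, node2, row[2] if len(row) == 3 else ""

# A node name containing a comma must be enclosed in double quotes.
sample = 'host_a3,host_a1,Link description\n"host,b1",host_a2\nhost_x,host_x,loop\n'
for link in valid_links(sample):
    print(link)
# ('host_a3', 'host_a1', 'Link description')
# ('host,b1', 'host_a2', '')
```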
Load your topology into Cisco Crosswork Situation Manager as follows:
· Log into the Cisco Crosswork Situation Manager server that runs Nginx.
· Load the .csv topology file into the database using the topology loader found at $MOOGSOFT_HOME/bin:
./topology_loader -t=host -f=host_topology.csv --hostname=example.com --credentials=phil:password123
The -t option is the name of the topology. You must create the topology before you run the utility.
The -f option specifies the file that contains the topology data.
The --hostname option specifies the Cisco Crosswork Situation Manager host running Nginx.
The --credentials option specifies the Graze username and password, if you are not using the defaults.
For a description of all available options see Topology Loader Utility Command Reference.
· The topology loader utility uses the source data to add nodes and links to the specified topology. The utility records the topological information in the topologies, topo_nodes and topo_links tables in the moogdb database, and logs any errors to the console.
After you have loaded your topological data, the graph analyser task calculates the Vertex Entropy of the nodes. See Topology Overview for more information.
You cannot use the Topology Loader utility to modify or delete existing links.
To modify a topology, use the endpoints in the Topologies API or the methods in the Graph Topology Moobot module.
Situation descriptions are one of the primary means that operators use to find and identify relevant Situations. For this reason, you should carefully consider the alert content to include in your descriptions. For example, imagine you have multiple separate Situations that impact separate environments occurring at the same time. How does an operator prioritize and distinguish these if the description does not include the environment, such as Test, Production, or UAT, where the Situation occurred?
Good descriptions can also help operators diagnose and assign Situations. Suppose a team assignment depends on the physical location where the Situation occurred. If the description includes the location, an operator can quickly assign a Situation to the correct team.
Consider how you might label the Situations and ask clarifying questions during the discovery sessions. The criteria for defining "good" descriptions, like the criteria for "good" data and "good" Situation design, are highly dependent on the specific needs of your organization and operators. Always consult with your operators and users when planning and maintaining your deployments.
See Situation Manager Labeler.
Tempus is a time-based algorithm in Cisco Crosswork Situation Manager which clusters alerts into Situations based on the similarity of their timestamps.
The underlying premise of Tempus is that when things go wrong, they go wrong together. For example, if a core element of your network infrastructure such as a switch fails and disconnects then it affects a lot of other interconnected elements which send events at a similar time.
Tempus uses the Jaccard index to calculate the similarity of different alerts. It also uses community detection methods to identify which alerts with similar arrival patterns it should cluster into Situations.
As Tempus is time-based, you should not use it to detect events relating to the slow or gradual degradation of a service, such as disks filling up or steadily increasing CPU usage.
One advantage of Tempus is it only uses event timestamps for clustering so no alert enrichment is required.
Cisco Crosswork Situation Manager applies Tempus incrementally to alerts as it ingests them so that it can create Situations in real-time.
The diagrams below show how Tempus sorts and then groups alerts with similar timestamps into Situations.
Raw alerts from the Moolet chain, for example, the Alert Builder or Alert Workflows, arrive over a period of time. These are shown as gray dots in the diagram below:
Tempus identifies and sorts which alerts have similar arrival patterns:
Tempus clusters alerts with similar arrival patterns into Situations:
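The grouping idea can be sketched with time buckets and the Jaccard index. This is an illustrative simplification; the names and bucketing details are assumptions, not the product's implementation:

```python
# Illustrative sketch (not product code): arrival times are mapped into
# fixed-size time buckets, and the Jaccard index measures how strongly two
# alerts' arrival patterns overlap.

BUCKET_SIZE = 5  # seconds, matching the default bucket size

def buckets(arrival_times):
    """Map event arrival times (in seconds) to the set of buckets they land in."""
    return {t // BUCKET_SIZE for t in arrival_times}

def jaccard(a, b):
    """Jaccard index: |intersection| / |union| of two bucket sets."""
    return len(a & b) / len(a | b)

switch_alert = buckets([0, 6, 12, 31])
router_alert = buckets([1, 7, 13, 32])  # fails at almost the same moments
disk_alert   = buckets([0, 300, 600])   # slow degradation, different pattern

print(jaccard(switch_alert, router_alert))  # 1.0 - identical arrival pattern
print(jaccard(switch_alert, disk_alert))    # low overlap, unlikely to cluster
```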
To configure Tempus via the Cisco Crosswork Situation Manager UI, see Configure Tempus.
You can also configure Tempus using the Graze API.
Tempus is an algorithm in Cisco Crosswork Situation Manager that clusters alerts based on the similarity of their arrival and occurrence patterns. See Time-based Clustering with Tempus for more information on how the clustering algorithm works.
Before you set up Tempus via the UI, ensure that your LAMs or integrations are running and Cisco Crosswork Situation Manager is receiving events.
To configure a basic Tempus algorithm from the UI, using the default settings:
· Go to Settings > Tempus.
· Click the Active switch to turn on the Tempus algorithm that uses the default settings.
· Accept the default description or enter your own description of the Tempus algorithm.
To configure an advanced Tempus algorithm from the UI, using your own settings:
· Go to Settings > Tempus.
· Accept the default description or enter your own description of the Tempus algorithm.
· Click Show Advanced to display the Tempus settings.
· Configure the entropy settings for Tempus:
— Entropy Threshold: Select the type of entropy threshold that you want Tempus to use:
· Use Global Entropy Threshold: This is a single entropy threshold that Tempus applies to all alerts to eliminate noisy alerts with a lower entropy value.
· Use Manager-Specific Entropy Thresholds: Use entropy thresholds set up for individual managers. If the manager for an alert has an entropy threshold set, Tempus uses this value to eliminate noisy alerts with a lower entropy value. If an alert's manager does not have an entropy threshold, Tempus uses the global entropy threshold to filter out alerts.
· Use an Ad Hoc Explicit Entropy Threshold: Set an actual entropy threshold value that you want Tempus to use to eliminate noisy alerts with a lower entropy value. Enter the value you want to use.
· Do Not Use an Entropy Threshold: Select this option if you do not want Tempus to filter out any alerts based on their entropy value.
See Configure Entropy Thresholds with Alert Analyzer for more information on setting global and manager-specific entropy thresholds.
· Configure the trigger settings for Tempus:
— Execution Interval: Executes the Tempus algorithm after a defined number of seconds.
· Configure the correlation time period settings for Tempus:
— Time Period: Determines the length of time over which Tempus analyzes alerts and clusters them into a Situation each time it runs. The default time period is 1200 seconds (20 minutes).
— Bucket Size: Determines the time span of each bucket in which alerts are captured. The default bucket size is 5 seconds. The default time period and bucket size provide 240 buckets per time period (1200 / 5 = 240).
Note
Cisco does not recommend changing the bucket size. If you do change it, proceed with caution because Tempus is designed to use small bucket sizes.
— Arrival Spread: Sets the acceptable latency or arrival window for each alert, in seconds. Use this to reduce the impact of related alerts arriving within a short period of time but landing in separate buckets. A larger value distributes each alert across more time buckets, and hence gives more tolerance on arrival time; too large a spread can reduce algorithmic precision.
· Click the Active switch to turn on the Tempus algorithm that uses these settings.
· If you want to configure other settings for Tempus, such as the alert threshold, use the Graze API endpoint updateTempus.
Click Show Advanced and then click Reset to Defaults to override any changes you have made to the settings and return to the default values.
Vertex Entropy is a Cisco Crosswork Situation Manager algorithm that indicates the critical nodes within your topology and their tendency to produce important events. Vertex Entropy pertains to topologies; see Topology Overview. Do not confuse it with event entropy.
You can use Vertex Entropy if you want to cluster alerts into Situations based on their topological importance. Once the calculation runs against a topological map of the connected nodes in your network, it assigns a Vertex Entropy value to each node, or "vertex".
This diagram shows the Vertex Entropy values for a network of 12 connected nodes. A Vertex Entropy value of 1.0 indicates a node of highest topological importance.
Note
Node: A device or base unit that forms part of a larger network. This is known as a 'vertex' in graph theory.
Link: A connection between two directly connected nodes. This is known as an 'edge' in graph theory.
Hop: The distance between two directly connected nodes.
To set up Vertex Entropy within Cookbook Recipes, see Configure Topology-based Clustering with Vertex Entropy.
Vertex Entropy is a Cisco Crosswork Situation Manager algorithm that indicates the critical nodes within your topologies and their tendency to produce important events. See Vertex Entropy for more information.
Before you can use the Vertex Entropy feature, you must create one or more topologies in Cisco Crosswork Situation Manager.
You can create topologies in a number of ways. If your topologies are small you can use:
· Topologies API
· Graph Topology Moobot module
You can load large topologies into Cisco Crosswork Situation Manager from a .csv file using the Topology Loader utility. For more information see Load a Topology.
The Graph Analyser process calculates the Vertex Entropy for all nodes in your topologies. It processes topology changes every 30 seconds as part of Housekeeper.
See Create and Manage Topologies for more information.
If you have created one or more topologies in Cisco Crosswork Situation Manager you can use Vertex Entropy as a trigger or an exclusion filter in a Cookbook Recipe. These filters include or exclude alerts with similar topological importance. For example, you can set a trigger so Cookbook considers alerts with a Vertex Entropy value of over 0.5 for Situation creation.
To add a trigger filter:
In your Cookbook Recipe, enter the following in the Trigger Filter:
'vertex_entropy' > 0.5
Alternatively, you can create an exclusion filter to prevent alerts with a Vertex Entropy value of less than 0.3 from creating a Situation:
'vertex_entropy' < 0.3
You can add a seed alert filter to a Recipe to ensure Cookbook only creates a new Situation if an alert matches the provided filter value.
For example, if you only want to create Situations when there is an issue with the most critical nodes in your network, you can set the seed alert filter to only create Situations from alerts with a Vertex Entropy value of 1.0:
In your Cookbook Recipe, enter the following in the Seed Alert Filter:
'vertex_entropy' = 1
If enabled, the initial seed alert must meet both the trigger and seed alert conditions. For more information, see Configure a Cookbook Recipe.
The seed alert filter is not specific to Vertex Entropy and can be used for other conditions such as severity.
You can add a hop limit so that Cookbook clusters alerts from nodes within a certain number of hops from each other. You can use this alongside Vertex Entropy trigger or exclusion filters.
In this diagram, a hop limit of '3' means Cookbook includes alerts from all nodes between node A and node D.
To add a hop limit:
· In your Cookbook Recipe, check the Match > Nodes Within checkbox in the Topology Filter section.
· Enter '3' in the Hops field.
The hop limit ensures that Cookbook only clusters alerts that originated from nodes that are close together. For more information, see Configure a Cookbook Recipe.
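The hop-count check described above can be sketched as a breadth-first search over the topology. This is a minimal illustration, not the actual implementation; the adjacency list and node names mirror the A-to-D diagram but are otherwise hypothetical.

```python
from collections import deque

# Hypothetical topology as an adjacency list; node names are illustrative.
topology = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"],
}

def hop_distance(topo, src, dst):
    """Breadth-first search: number of hops between two nodes, or None
    if no path exists."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nbr in topo.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

def within_hop_limit(topo, src, dst, limit=3):
    """True if the two nodes are close enough to cluster together."""
    d = hop_distance(topo, src, dst)
    return d is not None and d <= limit

print(within_hop_limit(topology, "A", "D"))  # True: 3 hops
print(within_hop_limit(topology, "A", "E"))  # False: 4 hops
```

With a hop limit of 3, alerts from nodes A and D can cluster, but an alert from node E is one hop too far.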
Two types of merging occur when Cisco Crosswork Situation Manager clusters alerts into Situations. The following video explains the clustering stages and how Cisco Crosswork Situation Manager handles automatic merging.
Clustering is done at the event level. Each event is associated with an alert, and this drives the alert's Situation membership. Each time an event passes through a Sigaliser it can potentially end up in a candidate cluster that will produce a new Situation. As a result, if configured in a certain manner, an alert with 10 deduplicated events can potentially become a member of 10 distinct Situations, because each time an individual event was evaluated by the system it had the opportunity to be added to a new or existing Situation. This case is unlikely unless you specifically design for it; however, to avoid multiple duplicate situations, the system is designed to catch similarities through the Merge and Resolve process.
There are a number of stages the initial cluster, as produced by a Sigaliser, goes through until it emerges as a Situation. At a high level, the stages are as follows:
· Candidate Cluster
— Generated by a Sigaliser such as Tempus or a Cookbook Recipe.
— Must be triggered by the seed alert if configured by the Recipe.
— Remains active in moogfarmd memory - regardless of whether it has been promoted to a Candidate Situation or not - until its cook_for time expires.
· Candidate Situation
— A candidate cluster becomes a candidate situation when its Alert Threshold has been breached.
— A candidate situation goes through a Merge and Resolve phase: the candidate situation is evaluated against all still-open Situations to determine whether it should be promoted into a new Situation or merged into an existing one.
— Do not confuse the Merge and Resolve phase with the operational workflow action of resolving the Situation once the incident has been addressed.
· Created or automatically merged Situation as a result of Merge and Resolve.
Merging can be one of the following types:
· Automatic
— Similarity
— Superseding
· Manual
The following diagram illustrates the merging logic:
A superseding merge happens if all the alerts in a candidate situation are a complete set, a subset or a superset of the alerts contained in an already existing open situation. As a result, a recurring incident does not trigger a new situation if it can associate itself with a still-active one.
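The subset/superset rule maps directly onto set operations. The following Python sketch is an illustration of that rule only, not the product's implementation; the alert IDs are hypothetical.

```python
# Sketch of the superseding rule using Python sets: a candidate
# situation merges into an open situation when its alert set is
# identical to, a subset of, or a superset of the open situation's
# alerts. Alert IDs are illustrative.

def superseding_merge(candidate, open_situation):
    """Return True if the candidate should merge into the open
    Situation rather than create a new one."""
    return candidate <= open_situation or candidate >= open_situation

open_sig = {59, 60}            # alerts in an existing open Situation
candidate = {59, 60, 61}       # candidate cluster: a superset

print(superseding_merge(candidate, open_sig))  # True: merge
print(superseding_merge({61, 62}, open_sig))   # False: new Situation
```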
As another example, below is an excerpt from the Moogfarmd log describing the stage when the Candidate Cluster is created and its corresponding Candidate Situation is merged into an existing Situation. Note that the Candidate Situation contains 3 alerts while the already created Situation has 2 alerts. Although the logs may seem confusing at first, here is how to interpret them: the Candidate Situation (SIG 1 in the log) is actually a superset of the already existing Situation (SIG 20 in the log). In the UI this appears as the additional alert from the candidate situation - in this case, the alert with id [61] and signature [SIGNATURE::3] - being added to the already existing Situation. This can explain, for example, why an alert is sometimes added to an already existing Situation beyond its cook_for time expiry.
DEBUG: [8:Default Cookbook][20190503 16:34:59.895 +0100] [CAbstractRecipe.java:574] +|An event for alert is a NEW cluster [59]:[SIGNATURE::1]|+
DEBUG: [9:Default Cookbook][20190503 16:35:00.022 +0100] [CAbstractRecipe.java:609] +|An event for alert JOINS existing cluster [60]:[SIGNATURE::2]|+
DEBUG: [0:Default Cookbook][20190503 16:35:00.045 +0100] [CAbstractRecipe.java:609] +|An event for alert JOINS existing cluster [61]:[SIGNATURE::3]|+
DEBUG: [Default Cookbook-SituationProcessing][20190503 16:35:00.054 +0100] [CDetectedSitnDb.java:1256] +|SIG 20 is a superset of SIG 1: 2 vs 3|+
DEBUG: [Default Cookbook-SituationProcessing][20190503 16:35:00.054 +0100] [CDetectedSitnDb.java:1294] +|Completed Merge of new & active SIGs...|+
DEBUG: [Default Cookbook-SituationProcessing][20190503 16:35:00.054 +0100] [CDetectedSitnDb.java:650] +|Starting self-resolution of active SIGs... |+
DEBUG: [Default Cookbook-SituationProcessing][20190503 16:35:00.054 +0100] [CDetectedSitnDb.java:929] +|Self Resoltion updated 0 SIGs|+
Note that you might hear this type of merging interchangeably referred to as a Superset, Subset or Superseding merge. Under the Visualize tab in the UI the merge type is set to Superseding Merge. Do not confuse this with the Superseded category, which all situations take - regardless of the type of merge - when they get merged into the parent situation. This type of merging cannot be disabled.
A Similarity merge - also known as an Overlap merge - between two or more situations occurs when they share a number of alerts that satisfies the configured similarity level. This type of merge is configurable via merge groups. Only Situations produced by the Sigalisers listed within the same group can merge into each other. Note that the same applies to Situations produced by the same Cookbook regardless of the merge_groups configured, i.e. all situations created by Recipes within the same Cookbook are eligible to merge into each other.
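The similarity check can be sketched in Python. Note the product's exact similarity formula is not reproduced here; this illustration assumes overlap is measured as the fraction of shared alerts relative to the smaller Situation, and all names and alert IDs are hypothetical.

```python
# Sketch of a similarity (overlap) merge check, assuming the overlap
# score is the fraction of shared alerts relative to the smaller
# Situation. The actual formula used by the product may differ.

def similarity(sig_a, sig_b):
    """Fraction of alerts the two Situations share."""
    shared = len(sig_a & sig_b)
    return shared / min(len(sig_a), len(sig_b))

def similarity_merge(sig_a, sig_b, limit=0.7):
    """True if the Situations meet the configured similarity limit."""
    return similarity(sig_a, sig_b) >= limit

sig_1 = {1, 2, 3, 4}
sig_2 = {2, 3, 4, 5}           # shares 3 of 4 alerts: similarity 0.75

print(similarity_merge(sig_1, sig_2))          # True at the 0.7 default
print(similarity_merge(sig_1, {4, 5, 6, 7}))   # False: similarity 0.25
```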
Situation merge works differently for the sigalizers configured in the file system and in the UI. Learn the difference as well as how to make a change to the default settings.
By default, all back-end configured Sigalisers are part of the default merge group, which has a situation similarity limit of 0.7. This means that any situations produced by Sigalisers that have not been listed in a separate merge group are candidates for a similarity merge if they share a similarity score of 0.7 or higher.
....
# Expanded sig_resolution section containing merge_groups configuration.
sig_resolution :
{
# The default section is mandatory and must contain both
# "alert_threshold" and "sig_similarity_limit" values. Any moolet
# not defined as a member of a merge group will belong to the
# default group.
default:
{
alert_threshold : 1,
sig_similarity_limit : 0.7
},
....
During a similarity merge, it is difficult to predict which situation will become the parent and incorporate the rest of the merged situations. You might consider lowering the default situation similarity value; however, this might result in unrelated alerts becoming part of the same situation.
During Situation design, carefully consider which situation contexts you might need to merge together given a minimum amount of alert overlap. Here is an example of how you might configure a specific group to merge Location-centered and Application-centered situations while leaving the default similarity limit in place for situations produced by any other Sigalisers.
....
sig_resolution :
{
default:
{ alert_threshold : 1,
sig_similarity_limit : 0.7
},
merge_groups:
[
{
name: "LocationApplicationOverlap",
moolets: ["LocationCookbook", "ApplicationCookbook"],
sig_similarity_limit : 0.5
}
]
....
You can enable a similarity merge between situations produced by any of the UI-enabled Cookbooks as shown below. Unlike with the Sigalisers in the file system, you cannot set up merge_groups in the UI. The similarity value from the slider applies to all enabled Cookbooks.
Note
You cannot combine UI based Cookbooks and the config file Sigalisers into the same merge group. A single Sigaliser can also only exist in one merge group.
If you want the resulting situations of two or more Recipes to never merge, you can put them into separate Cookbooks and place these Cookbooks in separate merge groups. For example, you might have two Recipes with separate contexts, one indicating a technology incident, the other clustering around the same location. Even if the situations produced by the two Recipes share the same alerts, you never want the results to merge.
Here is an example of how you could configure merge groups to disallow inter-Recipe merging. LocationCookbook and TechnologyCookbook each contain the corresponding Recipe and are placed in separate merge groups. The configuration below means that any situations produced by the LocationCookbook can merge with each other if they have a similarity of 0.65, while the Tech merge group only allows situations produced by the TechnologyCookbook to merge if they are exact matches.
....
merge_groups:
[
{
name: "Location",
moolets: ["LocationCookbook"],
sig_similarity_limit : 0.65
},
{
name: "Tech",
moolets: ["TechnologyCookbook"],
sig_similarity_limit : 1.0
}
]
....
Similarly, to disable the merging of situations produced by separate UI-based Cookbooks, configure the settings as below. Note that this does not stop situations produced by the same Cookbook from being merged based on the situation similarity value taken from the masked-out slider (the default value is 80). The setup below reads as: only merge situations produced by the same Cookbook based on a 100% match.
Note that any time you change the value on the similarity slider, you must save your changes before they take effect; otherwise, the changes will not be applied.
A merge between multiple already existing situations results in a parent situation that incorporates all the alerts from the merged situations. The type of merge is indicated under the Visualize tab in the parent Situation Room. The other situations that have been merged into the parent have their status set to Dormant and their category set to Superseded.
The merged situations share the same story ID, which is set to the Situation ID of the parent Situation. Note that unless it happens within a single Cookbook with single Recipe matching enabled, the Superseding merge might not be very apparent in the UI. To access the merge tree, click the merge icon by the Situation ID. Here is an example of a merge in which Situation ID 108 incorporated Situations 109 and 110. The story ID across all three situations is 108.
Cisco Crosswork Situation Manager uses merge groups to control the minimum number of alerts in a Situation and the merge behavior of Situations from different clustering algorithms.
Use merge groups to control:
· How Cisco Crosswork Situation Manager merges similar Situations together.
· The minimum number of alerts to cluster into a Situation.
· The percentage of alerts two Situations must share to be merged.
You can use the default merge group or you can create custom merge groups. If you use the default merge group, Cisco Crosswork Situation Manager merges Situations from all your clustering algorithms if they meet the alert and Situation similarity threshold criteria.
You can create custom merge groups to override the behavior of the default merge group. Custom merge groups are useful when you want to adjust the alert threshold and the Situation similarity threshold. They also enable you to control which clustering algorithms you want to be merged together. For example, the default merge group combines Situations that meet the alert and Situation similarity threshold criteria, regardless of which clustering algorithm created them. If you have three Cookbooks and Tempus, you might want to have a custom merge group that combines Situations from two Cookbooks together, another custom merge group that combines Situations from the third Cookbook and Tempus together.
In addition to the alert threshold in a merge group, you can also set an alert threshold in Tempus (via the Graze API) and in Cookbook Recipes (using the Cisco Crosswork Situation Manager UI or the Graze API). When a clustering algorithm considers whether or not to cluster alerts into a Situation, it compares the alert threshold in the merge group and the clustering algorithm. The higher value determines how many alerts are required to create a Situation.
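The "higher value wins" rule is simple enough to state in one line of Python. This is an illustration of the rule described above; the function name and example values are hypothetical.

```python
# Sketch: the effective alert threshold is the higher of the merge
# group's value and the clustering algorithm's own value. The
# function name and values are illustrative.

def effective_alert_threshold(merge_group_threshold, algorithm_threshold):
    """How many alerts are required before a Situation is created."""
    return max(merge_group_threshold, algorithm_threshold)

# Merge group requires 3 alerts; the Recipe only requires 1.
print(effective_alert_threshold(3, 1))  # 3: the merge group value wins
```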
See Configure Merge Groups for information on how to configure default and custom merge groups.
See Field Behavior in Merged Situations for details of the behavior of individual fields in Situations which are merged.
If you do not create any custom merge groups, all of the clustering algorithms use the default merge group settings.
The default merge group has a Situation similarity threshold of 0.7. This means that Cisco Crosswork Situation Manager merges two Situations if they have at least 70% of the same alerts. The default merge group has an alert threshold of 1, meaning that the clustering algorithms will create Situations containing a single alert. You must use the Graze API endpoint updateDefaultMergeGroup if you want to change these values.
Custom merge groups only affect Situations created by the specified clustering algorithms. They do not merge Situations created by clustering algorithms outside their own merge group.
You can configure custom merge groups in the Cisco Crosswork Situation Manager UI or using the Graze API.
Cisco Crosswork Situation Manager provides a Cookbook called "Default Cookbook" and a custom merge group also called "Default Cookbook". This merge group has a similarity threshold of 0.8 and an alert threshold of 1. You can change the similarity threshold in the Cisco Crosswork Situation Manager UI. You must use the Graze API if you want to change the alert threshold for this custom merge group. You can also delete this custom merge group using the Graze API if you do not want to use it.
For example, suppose you have defined the following clustering algorithms:
· Tempus algorithm that clusters alerts that arrive in Cisco Crosswork Situation Manager at a similar time.
· Cookbook 1 with three Recipes; one Recipe clusters alerts on 'Description', another Recipe clusters alerts on 'Host', and the third Recipe clusters alerts with a 'Severity' of Critical (5).
· Cookbook 2 with a single Recipe that clusters alerts on 'Impacted Services'.
· Cookbook 3 with a single Recipe that creates Situations containing a single alert with a high entropy value.
If you use the default merge group only, all the Situations created by all these clustering algorithms are merged if they meet the alert threshold and Situation similarity threshold criteria. If you want more granular control of the merge behavior, you can create the following custom merge groups:
· Custom merge group 1 - Cookbooks 1 and 2: Merges clusters created by Cookbook 1 and Cookbook 2 if they meet the following criteria:
— Alert threshold = null, so it uses the default merge group value of 1. If you create a custom merge group in the UI, the alert threshold is set to null so it automatically uses the default merge group value.
— Situation similarity threshold = 80%, so it will only merge clusters from Cookbook 1 and Cookbook 2 if they have 80% or more of the same alerts.
· Custom merge group 2 - Cookbook 3: You want to keep these Situations with a single alert separate so you configure this merge group as follows:
— Alert threshold = 1, so a single alert clusters into a Situation. Use the Graze API endpoint updateMergeGroup to change this value.
— Situation similarity threshold = 100%, so unless the alerts in two Situations are identical, the Situations will not be merged.
· You do not create a custom merge group for Tempus so it will use the default merge group values of:
— Alert threshold = 1.
— Situation similarity threshold = 70%.
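As a hedged sketch, the file-based equivalent of the two custom merge groups in this example might look like the following sig_resolution excerpt. The merge group and Cookbook names are hypothetical, the fields follow the merge_groups examples earlier in this section, and the per-group alert_threshold shown for Cookbook 3 is an assumption based on the Graze API behavior described above.

```
sig_resolution :
{
    default:
    {
        alert_threshold      : 1,
        sig_similarity_limit : 0.7
    },
    merge_groups:
    [
        {
            # Cookbooks 1 and 2 merge at 80% overlap; no per-group
            # alert threshold, so the default group's value of 1 applies.
            name                 : "Cookbooks1and2",
            moolets              : ["Cookbook1", "Cookbook2"],
            sig_similarity_limit : 0.8
        },
        {
            # Cookbook 3: single-alert Situations that only merge on a
            # 100% match. The alert_threshold field here is an assumption.
            name                 : "Cookbook3Only",
            moolets              : ["Cookbook3"],
            alert_threshold      : 1,
            sig_similarity_limit : 1.0
        }
    ]
}
```

Tempus is deliberately left out of both groups, so it falls back to the default merge group values.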
You can update the default merge group and you can create custom merge groups.
You must use the Graze API to update the default merge group. You cannot update it using the UI. See the following topics for instructions on updating or viewing the details of the default merge group:
a. getDefaultMergeGroup: Returns details of the default merge group in Cisco Crosswork Situation Manager.
b. updateDefaultMergeGroup: Updates the default merge group in Cisco Crosswork Situation Manager.
You can configure a custom merge group in the Cisco Crosswork Situation Manager UI or using the Graze API.
Before you create a custom merge group, ensure you have met the following requirements:
· You have configured at least two different clustering algorithms, for example, Cookbook and Tempus.
· Your LAMs or integrations are running and Cisco Crosswork Situation Manager is receiving events.
Configure a custom merge group in the Cisco Crosswork Situation Manager UI
To configure a custom merge group in the Cisco Crosswork Situation Manager UI:
· Navigate to the Settings tab.
· Click Merge Groups in the Algorithms section.
· Select an existing custom merge group and click Edit, or click Add Merge Group to add a new one.
· Configure the custom merge group settings:
· Name: Enter a name for the custom merge group.
· Sigaliser: Select the clustering algorithms to include in the custom merge group. To include additional clustering algorithms, click Add Sigaliser.
· Similarity Threshold: The percentage of alerts two Situations must share before they are merged. Enter a value between 0 and 100. The default similarity threshold in the default merge group is 70%. Set a lower value if you want Cisco Crosswork Situation Manager to merge Situations with a lower percentage of alerts shared between them, which is likely to increase the number of Situations that will merge. Set a higher value if you want to decrease the number of Situations that Cisco Crosswork Situation Manager will merge. If you set this value to 0, Cisco Crosswork Situation Manager uses the default merge group value.
· Click Save to finish configuring the custom merge group.
· If you want to change the alert threshold for this custom merge group, use the Graze API endpoint updateMergeGroup.
After you configure a custom merge group, Moogfarmd automatically restarts and begins using it.
Configure a custom merge group using the Graze API
Use the following Graze API endpoints to configure custom merge groups:
a. addMergeGroup: Adds a new custom merge group.
b. deleteMergeGroup: Deletes an existing custom merge group.
c. getMergeGroups: Returns details of all the custom merge groups in Cisco Crosswork Situation Manager.
d. updateMergeGroup: Updates a custom merge group.
Attribute similarity allows you to dictate the context of the Situation. You can combine multiple attributes that alerts should have in common to join a cluster together, and each attribute can have its own configured similarity. For some attributes you will need to do a full match. For others, you will implement fuzzy matching which allows you to configure the extent of the similarity of the unifying attribute between separate alerts.
As an example, consider an organization that has multiple sites, currently located in Paris and London, that are functionally separate. They want to be able to see the alerts and situations sorted by site so teams can focus on issues specific to their site. This means that they do not want to have alerts from one location clustered with alerts from the other location. In the future they intend to set up additional sites but still retain this separation.
Cisco Crosswork Situation Manager can get the site from the CMDB, however the entries are inconsistent. For example, London may be labeled "LONDON" or "LON". Some of the data was manually entered so there are some misspellings like "LonDDon". And Paris can be "Par" or "PARIS". You might think you need to normalize the data, but you can rely on a Cookbook to handle these data variances.
You can use shingle size similarity to differentiate the sites as this helps account for variances in data entry.
You can use the Moogfarmd log to establish what similarity you should set. In the current example, the Moogfarmd log snippet below shows the output from a Cookbook that evaluates the similarity between the three variants: 'LOND' and 'LONdres' are evaluated against the 'LON' value from the reference alert. From this you can extrapolate that the similarity must be set to 0.5 or lower to allow the three variants to drive the clustering of the corresponding alerts.
DEBUG: [7:Default Cookbook][20190326 14:44:10.476 +0000] [CValueRecipe.java:262] +|Text [LOND] matches [LON], similarity [0.8]|+
DEBUG: [9:Default Cookbook][20190326 14:45:41.621 +0000] [CValueRecipe.java:262] +|Text [LONdres] matches [LON], similarity [0.5]|+
You may need to run a number of tests to discover the optimal similarity value. To identify the similarity cutoff, set your similarity very low at first so that different values can cluster together. For the example above, we initially tried a shingle size of 2 with a similarity of 0.0.
The smaller the shingle size, the more granular the similarity comparison. Use 'Shingles' when you need to compare single values or very short strings. If you want to compare long descriptions, it is better to use 'Words' as the Language Processing option.
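To make shingle comparison concrete, here is a minimal sketch using character bigrams and a Jaccard ratio. The actual metric Cookbook uses is not documented here and will not reproduce the similarity scores in the log above; this is only an illustration of why small spelling variants of the same site score higher against each other than against a different site.

```python
# Illustrative shingle similarity (the metric Cookbook uses may
# differ): split each value into overlapping character shingles of a
# given size and compare the sets with a Jaccard ratio.

def shingles(text, size=2):
    """Set of overlapping character shingles, case-insensitive."""
    text = text.lower()
    return {text[i:i + size] for i in range(len(text) - size + 1)}

def shingle_similarity(a, b, size=2):
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Variants of the same site overlap; a different site does not, so a
# low similarity threshold clusters the variants without mixing sites.
print(shingle_similarity("LON", "LONDON"))   # non-zero: shared shingles
print(shingle_similarity("LON", "PARIS"))    # 0.0: no shared shingles
```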
When you enable single Recipe matching in a Cookbook, the set of Recipes has a priority order. When an event enters a Cookbook, the Cookbook first evaluates it against the top Recipe in the list. If the Recipe is not able to produce a Situation, the Cookbook evaluates the event against the next Recipe in the list. Once the event satisfies the criteria of a Recipe and becomes part of a Situation, there is no subsequent evaluation.
If an event deduplicates into an existing alert that is already part of a Situation created by a Recipe, the new event is only evaluated by Recipes, if any, with a higher priority than the Recipe that initially created the Situation, unless that alert has been closed. In other words, the alert can only exist in one Situation, or subsequently in a Situation generated by a Recipe of higher priority.
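The eligibility rule above can be sketched as a simple lookup over the priority-ordered Recipe list. The Recipe names and function are hypothetical; only the priority rule comes from the text.

```python
# Sketch of the single Recipe matching rule: once an alert belongs to
# a Situation created by a given Recipe, new events that deduplicate
# into that alert are only evaluated against higher priority Recipes
# (earlier in the list). Recipe names are illustrative.

recipes = ["by Floor", "by Rack", "by Server"]  # highest priority first

def eligible_recipes(current_recipe):
    """Recipes a deduplicated event may still be evaluated against."""
    if current_recipe is None:
        return recipes                  # alert not yet in a Situation
    return recipes[:recipes.index(current_recipe)]

print(eligible_recipes("by Server"))  # ['by Floor', 'by Rack']
print(eligible_recipes("by Floor"))   # []: no higher priority Recipes
```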
Here is an example:
The Network team wants to group alerts by the level of impact. We can set up multiple Recipes to handle this.
For the "by Server" Recipe, you can set a low alert threshold so that we catch every alert and add the details to the Situation description. We have additional recipes for "by Rack" and "by Floor". Any alert added to a "by Server" cluster, is also added to candidate clusters for the "by Rack" and "by Floor" Recipes. If within the cook for time, another alert arrives with matching rack values, then the Cookbook creates a Situation from the "by Rack" candidate cluster and merges it with the "by Server" Situation. Similarly, if three more alerts arrive with matching floor values, Cisco Crosswork Situation Manager promotes the candidate cluster for "by Floor" to a Situation that supersedes any of the "by Rack" or "by Server" Situations.
Events can be processed concurrently by one or more different instances of Cookbook depending on operational requirements. Within each Cookbook an inbound event passes to each Recipe in turn, in priority order, starting with the highest priority Recipe. A Recipe may return 0, 1, or more matching clusters, so-called “Candidate Clusters”. When single Recipe matching is enabled, the Cookbook transforms only the highest scoring candidate cluster into an actionable Situation. Otherwise, if single Recipe matching is disabled, all candidate clusters become Situations.
You should only use single Recipe matching if:
· Recipes target very different types of alerts that do not need to be shared between separate Recipe contexts. In this case, Recipes should be unrelated. Otherwise, you might end up with a scenario where unrelated alerts merge into the same situation.
· The contexts in individual Recipes are supersets of each other in the direction of merging, which is upwards. For example: Application→Site→Country or Server→Rack→Datacenter.
This is demonstrated in the following diagram:
The animation below presents an example of clustering flow within a Cookbook with single Recipe matching enabled.
To understand this example, assume the following:
1. There are two recipes configured within the same Cookbook with the following priority order:
— Recipe A
— Recipe B
2. All of the arriving alerts are in the scope of both Recipes.
3. Recipe A has an alert threshold of 4 while Recipe B has an alert threshold of 1.
4. A typical scenario that fits this configuration is the Server→Datacenter context. In this example, the operators want to be notified about problems on individual servers, but if there is an underlying, wider-impacting problem they would like to be alerted to that as well. So the Recipes could be set up as follows:
— Recipe A clusters alerts from the same datacenter. Given that this is wider-impacting clustering and we expect more alerts if it is truly a datacenter issue, we need to set the alert threshold high enough.
— Recipe B clusters alerts based on the same server name and datacenter. Even a single alert might be an indication of a developing problem so we set the alert threshold to 1.
— Recipe B - with clustering by same server name and datacenter - has a more granular context; however, anything produced by this Recipe is a subset of the context in Recipe A - where clustering is done by the same datacenter.
The clustering flow is as follows:
· Alert_1 reaches the Cookbook which evaluates it against the top Recipe_A. Cookbook creates a candidate cluster containing Alert_1, but because it has not reached the alert threshold of 4 (AT: 4) the candidate cluster is not promoted to a Situation. However, the candidate cluster still remains in memory.
· Cookbook then evaluates Alert_1 against Recipe_B. Cookbook creates a candidate cluster which is promoted to Situation_1 because it has reached the alert threshold of 1 (AT: 1).
· Alert_2 enters the Cookbook and again Cookbook first evaluates it against Recipe_A. Alert_2 joins the existing candidate cluster according to the attribute similarity configured in Recipe_A. The candidate cluster only contains two alerts so it is still unable to produce a Situation.
· Cookbook then evaluates Alert_2 against Recipe_B. There is already a candidate cluster against which the alert is evaluated. It matches the configured attribute similarity (note the same color), so it is added to the candidate cluster and subsequently to the already existing Situation_1.
· Alert_3 enters the Cookbook and Cookbook evaluates it against Recipe_A. Alert_3 joins the existing candidate cluster according to the attribute similarity configured in Recipe_A. The candidate cluster only contains three alerts so it is still unable to produce a Situation.
· Alert_3 reaches Recipe_B and Cookbook evaluates it against the existing candidate cluster. The alert, however, is not similar enough to join the existing candidate cluster so it creates a separate one on its own. Because it reaches the alert threshold of 1, it creates a corresponding Situation_2.
· Alert_4 enters the Cookbook and is evaluated against Recipe_A. It joins the candidate cluster and creates Situation_3 as it reaches the alert threshold of 4. Because Alert_4 is now part of a Situation, Cookbook stops evaluating against any lower priority Recipes.
· Because Situation_1 and Situation_2 are complete subsets they become dormant and merge into Situation_3.
In the context of the real-life Server→Datacenter scenario, the initial contexts of Situation_1 and Situation_2 have morphed from "here are the alerts that indicate a problem with this particular server" into "here are the alerts that indicate a wider datacenter problem" in Situation_3.
The Cookbook configuration below matches the example above.
In the Server > Datacenter Cookbook file-based example configuration:
{
name : "Datacenter",
classname : "CCookbook",
run_on_startup : true,
metric_path_moolet : true,
moobot : "Cookbook.js",
process_output_of : "MaintenanceWindowManager",
membership_limit : 1,
scale_by_severity : false,
entropy_threshold : 0.0,
single_recipe_matching : true,
recipes :[
{
chef : "CValueRecipeV2",
name : "by Datacenter",
description : "Multiple issues impacted in $UNIQUE(custom_info.eventDetails.datacenter) datacenter. $CLASS(network_datacenter)",
recipe_alert_threshold : 4,
exclusion : null,
trigger : null,
seed_alert : null,
rate : 0, # Given in events per minute
min_sample_size : 5,
max_sample_size : 10,
cluster_match_type : "first_match",
matcher : {
components: [
{ name: "custom_info.eventDetails.datacenter", similarity: 1.0}
]
}
},
{
chef : "CValueRecipeV2",
name : "by Server",
description : "Issue impacting server $UNIQUE(source) housed in $UNIQUE(custom_info.eventDetails.datacenter) datacenter. $CLASS(network_source)",
recipe_alert_threshold : 1,
exclusion : null,
trigger : null,
seed_alert : null,
rate : 0, # Given in events per minute
min_sample_size : 5,
max_sample_size : 10,
cluster_match_type : "first_match",
matcher : {
components: [
{ name: "source", similarity: 1.0 },
{ name: "custom_info.eventDetails.datacenter", similarity: 1.0}
]
}
}
],
cook_for : 20000
}
To enable the feature in a UI Cookbook, tick the 'First Recipe Match Only' parameter in the Cookbook configuration tab.
If you set up your Cookbook via backend file-based configuration, set the single_recipe_matching parameter to true.
In cookbook.conf:
{
# Moolet
name : " DatacenterCookbook",
.......
# Setting single_recipe_matching to true causes the
# cookbook to treat the recipes as being in an order of
# priority, based on the order of configuration in this
# file, highest priority first.
#
# Individual alerts may only:
#
# a) appear in a single situation generated by a
# particular recipe, and
# b) subsequently may only appear in a situation
# generated by a higher priority recipe.
#
# An alert is treated as being a new alert after it has
# been closed (and reappeared) and is once again
# available for inclusion in situations generated by any
# recipe.
single_recipe_matching : true,
.......
You can use a seed alert in a Recipe as a technique to disregard certain alerts until the associated key alert happens.
A seed alert is useful to create Situations for cause and effect scenarios. You can ignore the symptomatic alerts except in cases where they arrive after a much more important and potentially causal alert. In this case, you want to surface them as a Situation requiring operator attention. You may need to implement alert classification in order to identify which alerts qualify as seed alerts.
A seed alert filter allows you to restrict which event can start a candidate cluster and become the reference event within it. Any subsequent events that match the in-scope filters can then join the cluster based on the attribute similarities. Only events arriving after the seed alert can join the same candidate cluster. If you need to look back and catch symptomatic events that occur before the seed alert, you may need to consider other options such as the Alert Rules Engine.
An example of when you may want to use a seed alert is if you have very chatty syslog data. You want to ignore it in general but, when your monitoring system observes an alert from the same device, you want to bundle it with the associated syslog information and create a Situation. In this case, if you set the alert from your monitoring system as the seed alert, Cisco Crosswork Situation Manager ignores the peripheral syslog alerts until it observes the seed alert.
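The syslog scenario above can be sketched as a simple gate: symptomatic alerts arriving before the seed are ignored, while anything arriving after it joins the cluster. The `is_seed` flag, alert shapes, and function name are all hypothetical.

```python
# Sketch of seed alert gating: symptomatic alerts are ignored until a
# seed alert arrives, after which later matching alerts join the
# cluster. No look-back: pre-seed alerts stay ignored.

def cluster_with_seed(alerts):
    cluster, seed_seen = [], False
    for alert in alerts:
        if alert.get("is_seed"):
            seed_seen = True
            cluster.append(alert["id"])
        elif seed_seen:
            cluster.append(alert["id"])   # joins after the seed
        # alerts arriving before the seed are ignored
    return cluster

stream = [
    {"id": "syslog-1"},                   # ignored: arrives pre-seed
    {"id": "monitor-1", "is_seed": True}, # seed alert from monitoring
    {"id": "syslog-2"},                   # clustered with the seed
]
print(cluster_with_seed(stream))  # ['monitor-1', 'syslog-2']
```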
As illustrated in the Merging section, cook_for can be described as the lifespan of a candidate cluster. It does not control how alerts are clustered but dictates how long alerts should be considered in scope of the candidate cluster after its initial creation.
When deciding on a Cook For time, think about how long it takes for events relating to the same incident to occur. For example, when an underlying database fails supporting an application, how long does it take for monitoring to report on the failed database as well as the symptomatic application-related alerts? If it is roughly 30 minutes then you should set your Cook For time to this value.
There are use cases when a longer Cook For time would make sense but you are still unsure by exactly how long you need to extend it. For instance, in the example above until the database gets fixed, the system will continue to report application and transaction failures until the underlying issue is fixed, and that can take hours or even days. Ideally, you would like all these alerts clustered together. But remember that given the Cook For time of only 30 minutes, any alerts beyond this period, even if they are in scope of the original cluster, which has now already expired, will have to form a new cluster. This produces two or more Situations that actually relate to the same incident.
To address this you can enable the Cook For auto-extension feature.
If you add an extension time of 1 hour and an alert arrives during the extension time, Cookbook adds it to the existing Situation and extends the time by another hour in case further alerts come in. The Max Cook For time lets you cap the total length of time that Cookbook will continue to add alerts to the existing Situation.
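The auto-extension behavior described above can be sketched as follows. This is an illustrative model, not product code; the function and parameter names are hypothetical, and times are in minutes.

```python
# Illustrative sketch of Cook For auto-extension (not product code).
# A cluster stays open while alerts keep arriving within the current
# window; each qualifying arrival extends the window, capped at
# max_cook_for after cluster creation.

def cluster_open(alert_times, cook_for=30, extension=60, max_cook_for=240):
    """Return the time (minutes) at which the candidate cluster closes."""
    start = alert_times[0]
    close = start + cook_for
    hard_cap = start + max_cook_for
    for t in alert_times[1:]:
        if t <= close:                               # alert arrives in window
            close = min(max(close, t + extension), hard_cap)
        # alerts arriving after close would start a new cluster
    return close
```

For example, with the defaults above, alerts at 0, 25, and 80 minutes keep one cluster open until minute 140, instead of forcing a second Situation at minute 30.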
This feature is available at Cookbook and Recipe level. See Configure a Cookbook Recipe and Configure a Cookbook for more information on setting up this feature. See Cookbook and Recipe Examples for an example of the Cook For Auto-Extension feature.
Here is the feedback from the application team: when a business service like the "Customer Portal" goes down, it generates a flood of "failed transactions" alerts until the portal is back up again. We want to suppress that flood. We don't always know how to set the Cook For time because we don't know how long it will take to fix the issue. For example, if two business services fail at the same time and we can only address one, it may take longer than the Cook For time.
We've tried Cook For times of 1 hour and 2 hours. Sometimes it takes longer than the Cook For time to fix the issue, so we need an extension. If we add an extension time of 1 hour and an alert arrives during the extension time, Cookbook adds it to the existing Situation and extends the time by another hour in case further alerts come in. The Max Cook For time caps the total length of time that the Cookbook continues to add alerts to the existing Situation.
List-based matching is a technique used to cluster alerts on the intersection of a list attribute.
For example, you would like to create Situations clustered around the same impacted business services because they relate directly to your teams' organization. The common factor in a Situation is business services. However, a server can impact multiple business services, so you need to cluster on the intersection of these. Since an alert is attached to multiple services, it can appear in multiple Situations. This is still acceptable behavior: it is easy to see which other Situations an alert is part of, and it may be worth consulting the knowledge base of the parallel Situation as well.
The type of attribute used in list matching must be an array.
· See Match List Items in Recipes for how to configure list-based matching.
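The intersection behavior described above can be sketched as follows; the function names and the business service data are hypothetical illustrations, not product code.

```python
# Illustrative sketch of list-based (intersection) matching.
# An alert carries an array attribute, e.g. impacted business services;
# it joins a cluster if the arrays share at least one item.

def matches_cluster(alert_services, cluster_services):
    """True if the alert shares at least one service with the cluster."""
    return bool(set(alert_services) & set(cluster_services))

def matching_clusters(alert_services, clusters):
    """Because an alert can list several services, it can match (and
    therefore appear in) more than one cluster."""
    return [name for name, services in clusters.items()
            if matches_cluster(alert_services, services)]
```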
First Matching Cluster:
Closest Matching Cluster:
In Cookbook, you can specify the cluster match type. A typical Cookbook contains multiple Recipes, and alerts are evaluated against the Recipes in the priority order you define. If you select closest_match, alerts are evaluated against all Recipes in the Cookbook and processed by the best matching Recipe.
If you choose first_match, an alert is added to a candidate cluster as soon as it meets the criteria of one Recipe. At that point the evaluation of that alert stops, even though another Recipe may match the alert better.
If a Recipe has all of the attribute similarities set to 100% match, then when an alert matches a candidate cluster, there is no need to keep checking it against other clusters. In this case, you can set Cluster By as First Matching Cluster. Otherwise, choose Closest Matching Cluster to evaluate an alert against all the candidate clusters to determine the best match.
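The difference between the two match types can be sketched as follows. This is a simplified model, not product code: similarity scoring is reduced to a stand-in function per Recipe, and the qualifying threshold is hypothetical.

```python
# Illustrative contrast of first_match vs closest_match.
# Each recipe is (name, score_fn), listed in priority order;
# score_fn stands in for the Recipe's attribute-similarity evaluation.

def first_match(alert, recipes, threshold=0.5):
    """Return the first recipe (in priority order) the alert qualifies for."""
    for name, score_fn in recipes:
        if score_fn(alert) >= threshold:
            return name          # evaluation stops here
    return None

def closest_match(alert, recipes, threshold=0.5):
    """Evaluate all recipes and return the best qualifying one."""
    qualifying = [(score_fn(alert), name) for name, score_fn in recipes
                  if score_fn(alert) >= threshold]
    return max(qualifying)[1] if qualifying else None
```

With scores of 0.6 for the first Recipe and 0.9 for the second, first_match returns the first Recipe while closest_match returns the second.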
Ask the operators whether the content and context of the presented Situation are correct. If not, what exactly does not make sense? Is the content of the Situation correct, that is, the alerts have been clustered meaningfully, but the context wrong, for example the Situation description does not match the content of the Situation? Or vice versa?
· Are the content and context of the presented Situation correct? Does this Situation contain the alerts that you recognize?
· Does this Situation contain the alerts that you action today? Are there any alerts you could drop from clustering altogether, for example in the Alert Rules Engine or Workflow Engine?
· Does this Situation give you enough information, including the Situation description and additional Situation custom_info?
· Should you look into increasing the cook_for period to extend the scope of Situations? Be mindful that increasing the cook_for time decreases the number of Situations but can cause unrelated alerts to be clustered together.
· What is the percentage of outliers in Situations? A few outliers do not necessarily make a bad Situation as long as the content and context match.
· Use Cisco Crosswork Situation Manager Operator Training to onboard operators. Not all operators participating at this stage may have been part of the POV. As they validate your alert clustering, they may be logging into Cisco Crosswork Situation Manager for the first time. The operator training is self-paced and a quick way to onboard them to the product.
· Interview users to identify more specific use cases missed in the initial design. Test and refine further.
Once you confirm the result generally captures what the teams are interested in seeing, move on to the specific cases missed in the initial design.
Going through the audit, design, and implementation cycle once never completes the Situation design. Expect to repeat the discovery and implementation process multiple times.
As you keep using Cisco Crosswork Situation Manager, it makes sense to calibrate, adjust, or fine tune your clustering settings to make sure that you see the expected behavior for alerts and Situations.
For Cookbooks and recipes, you can use the Review and Adjust Clustering Settings with Situation Visualization feature to see the way clustering algorithms influenced a Situation. You can update a recipe based upon any changes you want to make.
You can also use the Alert Analyzer to view and adjust the global default entropy threshold settings and the settings for specific managers. See Configure Entropy to Reduce Operational Noise.
The Visualize tab in Situation Rooms allows administrators and implementers to see:
· The Cookbook and Recipe used to create the Situation.
· A visual representation of the similarity of the alerts within the Situation to the reference alert. The reference alert is either the seed alert, if a Cookbook Recipe is configured this way, or it is the first alert that the clustering algorithm assigned to the Situation. For more information on seed alerts see Configure Topology-based Clustering with Vertex Entropy.
· A list of all the associated alerts in the Situation.
You can use this information to adjust your Cisco Crosswork Situation Manager configuration to improve the relevance of the Situations it creates.
The Visualize tab automatically updates when new alerts are added to a Situation.
To view the Visualize feature, go to a Situation Room and click the Visualize tab.
Note
Currently, Cisco Crosswork Situation Manager does not fully handle alerts that a user has manually added to a Situation. For example, manually added alerts do not display in a similarity diagram if their similarity to the reference alert is below the threshold for a component but they will appear in another diagram if their similarity is above the threshold for that component.
The Visualize tab shows diagrams of the alerts in the Situation according to how the Cookbook Recipe has clustered them. In the example below, this Situation has ten alerts that are clustered by two components: Description and Host. The Cookbook Recipe clusters alerts whose description is at least 50% similar to the reference alert and whose host is also at least 50% similar to the reference alert. The reference alert may be a seed alert or the first alert that the Cookbook Recipe added to the cluster.
Each diagram shows the similarity of the alert to the reference alert for one of the components. Each alert displays as a dot on the diagram on a spoke representing the sequence it was clustered into the Situation. The reference alert has a similarity of 100% and displays at the center of the circle. Alerts with a high similarity display closer to the center of the circle and alerts with a low similarity display nearer the edge of the circle. In the example below, alerts that are only 20% similar would display at the edge of the circle.
The alert at the center of each diagram is represented as follows:
· Yellow dot: Single reference alert, with no other alerts having 100% similarity.
· Blue dot with a single concentric blue circle: Reference alert plus one alert which has a 100% similarity match to the reference alert.
· Blue dot with two concentric blue circles: Reference alert plus two or more alerts which have a 100% similarity match to reference alert.
You can perform the following actions on the similarity diagrams:
· Hover over an alert in a diagram to display the similarity of that alert to the reference alert for that field.
· Click on an alert in a diagram to display the details of that alert in a pane on the right hand side of the window.
· Use the sliders below each diagram, or click and drag the concentric rings in the diagrams, to increase the similarity value. Alerts that are outside the selected similarity appear gray. This feature enables you to determine whether a higher similarity would improve the Situation. In the example above, the Cookbook Recipe clusters alerts into a Situation if the Host has more than 50% similarity to the reference alert. You may find that alerts with a similarity of less than 80% are not really relevant to the Situation. In this case, you could consider changing the Host similarity to 80%.
· Step to the next highest or lowest alert similarity value using the quick jump arrows at either end of the sliders.
Once you adjust the slider values you can save the changes to the Cookbook Recipe that generated the Situation by clicking Update Recipe.
When you are using the sliders to move the threshold around, a vertical line appears to indicate the saved similarity value. If you update the recipe, the line relocates to the newly saved similarity value.
The Visualize tab displays a list of all the alerts in the Situation. The reference alert is pinned to the top of the alert list and highlighted with an orange marker. Alert fields used for Situation clustering are pinned to the left side of the list.
Any alerts that are grayed out when you adjust the similarity threshold are also filtered out of the alert list. Alerts are filtered out when they are grayed out in any of the diagrams.
You can enter a filter in the Filter field to display alerts in the Situation that match it. See Filter Search Data for information on creating filters.
You can use entropy to control clustering in Cookbooks.
Entropy is a measure of how unexpected or unpredictable an alert is.
Cisco Crosswork Situation Manager assigns every alert an entropy score that is a value between 0 and 1. An event that re-occurs frequently receives a low entropy score and is deemed operationally insignificant. Meanwhile, a more rare event receives a high score and is considered to be operationally significant.
Entropy is a key noise reduction feature. By filtering out the alerts with low entropy score, you can keep the important alerts from getting buried under the flood of common alerts.
The default entropy calculation uses the description field to determine the similarity between alerts, but you can choose different alert attributes.
Cisco Crosswork Situation Manager analyzes the textual aspects of the incoming event by tokenizing the value of the description field. Items such as numbers and timestamps are masked and therefore excluded from the entropy calculation, and the score is derived based on the aggregation of token entropies from within the string. The entropy score is calculated up to 16 decimal places. However, note that in the Cookbook UI, you can configure the entropy threshold value only up to 2 decimal places.
The resulting score becomes just another attribute of the alert, and you can use it to filter out alerts below a certain entropy value. By removing the common, insignificant alerts, you enable your operators to focus on more significant events. Entropy is language-agnostic: it works with English as well as any other UTF-8 based language.
Using entropy to filter out noise events can be an impactful differentiator, so it is important to set it up correctly for your environment.
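A conceptual sketch of this scoring approach follows. The product's exact entropy formula is internal, so this only conveys the idea: mask volatile numbers, score each token by rarity over a corpus of previously seen tokens, and aggregate (here, a simple mean) into a score between 0 and 1.

```python
import math
import re

# Conceptual sketch of description-based entropy scoring (the product's
# exact formula is internal; this only conveys the idea).

def tokenize(description):
    # Mask volatile numeric content so it is excluded from the calculation.
    masked = re.sub(r"\d+", "<num>", description)
    return masked.split()

def alert_entropy(description, token_counts, total_events):
    """Score 0..1: frequent tokens pull the score down, rare ones up."""
    scores = []
    for tok in tokenize(description):
        seen = token_counts.get(tok, 0)
        if seen == 0:
            scores.append(1.0)                 # unseen token: maximally rare
        else:                                  # frequent token approaches 0
            scores.append(min(1.0, -math.log(seen / total_events)
                              / math.log(total_events)))
    return sum(scores) / len(scores) if scores else 0.0
```

A description built from tokens the system has seen many times scores lower than one containing tokens it has never seen.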
There is no formula to tell you what your threshold number should be. Entropy distribution varies from organization to organization, or even between two sub-environments within the same organization.
Each environment will have its own specific entropy profile which is usually non-transferable as no two environments are identical. Noise for one organization might actually be deemed as important and actionable events by another. Consider failed authentication alerts when attempting network discovery, which is business as usual, versus failed logins during hacking attempts.
So, in short, there is no specific entropy score by which you can set the entropy threshold. Always check the distribution of the scores, and fine-tune until you strike a balance.
Setting up this filter does not mean you have to lose the lower-entropy alerts completely. You can direct them to a separate clustering algorithm that groups them together as a low priority Situation.
For example, if you group a queue of low entropy alerts coming from the same source within an hour, operators can review the queue periodically to decide if they require action.
Also, if you find a specific kind of alert with a low entropy score that is actually important, you can increase the entropy of such alerts based on a configurable list of keywords that always receive an entropy value of 1.
Entropy is a measure of how unexpected or unpredictable an event or an alert is. The guiding principle of entropy in Cisco Crosswork Situation Manager is that a more unpredictable alert with a higher entropy value is of more interest because it probably indicates unexpected behavior from your environment.
The Alert Analyzer utility assigns each alert a numeric entropy value between 0 and 1 to indicate how common or unusual the words in certain attributes are. This enables you to visualize and understand the benefits of using entropy to reduce operational noise. See Entropy for an overview of entropy in Cisco Crosswork Situation Manager and Alert Analyzer for more information on how it calculates entropy values.
You can use entropy thresholds to reduce operational noise in Cisco Crosswork Situation Manager. The interactive entropy threshold graphs in the Alert Analyzer show a summary of the distribution of alerts in the system by entropy value. Adjust the slider for the entropy threshold to visualize how the threshold will reduce operational noise without omitting alerts of interest.
Entropy thresholds can be a value or a percentage. You can set a global default entropy threshold and you can set specific entropy thresholds for specific managers. The graphs display how many alerts exist with a given entropy value and how many exist below a given entropy value. You can select an entropy value and review it to ensure you are including and excluding the appropriate alerts. See Configure Entropy Thresholds with Alert Analyzer for details.
You can configure the information that you want to include in your entropy generation calculations. See Configure Entropy Generation for more information.
Entropy is defined as the degree of disorder or randomness in a system. In Cisco Crosswork Situation Manager, entropy is a measure of how unexpected or unpredictable an event or an alert is. According to information theory, the more unpredictable or unexpected an event is, the more information it is deemed to carry. Therefore, entropy is a measure of the amount of information contained in an event.
The Alert Analyzer utility is a standalone process that assigns an entropy value to an event token based on its uniqueness. The Alert Builder (see Configure Event De-duplication) assigns an entropy value to each alert based on the token entropies. The entropy value is a numeric value between 0 and 1 (accurate to 16 decimal places). It provides an indication of how important an alert is. An entropy value of 0 means that the alert is just 'noise' and a value of 1 means that the alert is significant. You can configure the clustering algorithms to ignore common alerts with a low entropy value; this reduces 'noise' in Cisco Crosswork Situation Manager. See the Clustering Algorithm Guide for more information.
The Alert Analyzer analyzes the text attributes of events to assign a semantic entropy value. In the default Cisco Crosswork Situation Manager implementation, the Alert Analyzer uses the description field, but you can configure it to use other text fields. The Alert Analyzer splits the text at spaces into tokens. For example, the following description has five tokens:
Link down on port 2/32
The Alert Analyzer calculates the entropy of each token and stores the token in the Cisco Crosswork Situation Manager reference database with its associated entropy value. Initially, a new token has a value of 1. The Alert Analyzer reduces this entropy value as more events occur which contain the same token.
You can configure the Alert Analyzer to mask volatile token types, such as dates, times, numbers, URLs or IP addresses, so that they are not included in the tokens. See the Alert Analyzer for further details of the analysis it performs.
The Alert Builder uses the entropy value of the tokens within an alert to calculate the entropy of that alert.
The Alert Analyzer calculates entropy values in real-time based on any tokens it has encountered before. The Alert Builder assigns the entropy of an alert based on the entropy value of the tokens within the alert rather than the entire database. Tokens within an alert which occur frequently contribute negatively to the entropy of an alert, indicating that the alert may not be as significant as an alert with tokens that are seen less frequently.
If the Alert Builder receives an event with a token that it has encountered before, from a previous run of the Alert Analyzer, it sets the alert entropy to match the value saved in the reference database. If the Alert Builder receives an event with a token that it has not encountered before, it calculates the entropy value in real-time and applies this value to the alert. The Alert Builder also saves the entropy value in the reference database for future retrieval.
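The look-up-or-compute-and-save behavior described above can be sketched as follows, with the reference database modelled as an in-memory dictionary (illustrative only, not product code).

```python
# Illustrative sketch of token entropy retrieval: reuse a saved value
# from the reference database if the token has been seen before,
# otherwise compute it in real time and save it for future retrieval.

class TokenEntropyStore:
    def __init__(self, compute_fn):
        self._db = {}               # stands in for the reference database
        self._compute = compute_fn  # real-time entropy calculation

    def entropy(self, token):
        if token not in self._db:                  # first encounter:
            self._db[token] = self._compute(token)  # compute and save
        return self._db[token]                     # otherwise reuse saved value
```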
You can set a global entropy threshold that Cisco Crosswork Situation Manager uses for all alerts, or you can set entropy thresholds for different managers within Cisco Crosswork Situation Manager. You can set these entropy thresholds using the Cisco Crosswork Situation Manager UI or the Graze API.
See Configure Entropy Thresholds with Alert Analyzer for more information.
You can configure the Alert Analyzer to include or exclude information in the entropy calculations. See Configure Entropy Generation Details for more information.
The Alert Analyzer is preset to perform an incremental run at 3 am every day, with a keep age of two weeks. Cisco recommends this as the optimum entropy generation schedule. However, if you want to change this schedule, see Configure Entropy Generation Schedule.
Vertex Entropy uses a different form of entropy, topological entropy, to establish how critical the nodes are in your network topology. You can use Vertex Entropy calculations within Cookbook to create Situations which cluster alerts from important nodes. See Vertex Entropy for more information.
The Cisco Crosswork Situation Manager Alert Analyzer provides an interactive graph that shows the impact of various entropy thresholds to reduce noise in your system. By experimenting with the impact of various threshold values, you can make an informed decision about the number of alerts included or excluded from Situations based upon entropy threshold. For a general overview of entropy, see Entropy.
The entropy threshold is a value that you can use in a Cookbook or in Tempus to qualify an alert for inclusion in a Situation. The types of entropy threshold you can set are as follows:
· Global Default Threshold: Applies to all alerts from all managers in the system. If an alert exceeds the threshold, it is a candidate for inclusion in a Situation cluster according to the clustering algorithm definition.
· Manager-specific Thresholds: Apply to alerts from a single manager. If a manager-specific threshold exists, the clustering algorithm prevents alerts from clustering if they do not surpass it. If no manager-specific threshold exists for an alert's manager, the clustering algorithm applies the global default threshold.
· Specific Threshold: Use this option to set a specific threshold value.
· Do not use a Threshold: Select this option if you do not want to apply a threshold.
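The fallback rule above can be sketched as follows; the function names, manager names, and values are hypothetical.

```python
# Illustrative sketch of threshold resolution: a manager-specific
# threshold wins if one exists, otherwise the global default applies.

def effective_threshold(manager, manager_thresholds, global_default):
    return manager_thresholds.get(manager, global_default)

def qualifies_for_clustering(alert_entropy, manager,
                             manager_thresholds, global_default):
    """An alert is a clustering candidate only if it exceeds the
    threshold that applies to its manager."""
    return alert_entropy > effective_threshold(manager,
                                               manager_thresholds,
                                               global_default)
```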
You can set either type of threshold as one of the following:
· Percentage of System Excluded: Controls the entropy threshold dynamically using a percentage value of the number of alerts to eliminate from Situation clustering. Normally it is a good idea to use a percentage value because it accounts for expected changes to entropy values over time.
· Threshold Entropy Value: A static entropy value to qualify alerts for inclusion in Situation clustering. If you set a static value for an entropy threshold, review the value and its impact on excluded alerts every few weeks.
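One plausible way a percentage setting maps to a concrete entropy value is to take a percentile of the observed entropy distribution. This sketch is an assumption about the mechanism, for illustration only, not the product's actual implementation.

```python
# Illustrative mapping from "Percentage of System Excluded" to a static
# entropy value: the entropy below which that percentage of alerts falls.

def threshold_from_percentage(entropies, pct_excluded):
    """Return the entropy value that excludes pct_excluded percent of
    the observed alerts."""
    ordered = sorted(entropies)
    k = int(len(ordered) * pct_excluded / 100)
    return ordered[max(k - 1, 0)]
```

This also illustrates why a percentage setting tracks drift over time: as the distribution shifts, the equivalent static value shifts with it.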
When you use Alert Analyzer, you can toggle between the two settings. The UI displays a locked padlock for the active threshold configuration, for example 26.5% Percentage of System Excluded. The unused configuration shows as unlocked because it varies based upon the fixed value of the active configuration.
After you have about two weeks' worth of data, you can use the Alert Analyzer to view the alert distribution and see how best to reduce noise for your system. The Alert Analyzer displays alert model data based on your entropy generation configuration, by default two weeks. See Configure Entropy Generation Schedule for more information.
The following figure highlights the primary features of the Threshold Statistics graph for the Global Default Entropy Threshold:
A. Alert Count: The left axis shows the alert count for the histogram line, which represents the number of alerts at each entropy level. For example there are about 100 alerts with an entropy value just above 0.34.
B. Entropy: The bottom axis of the graph shows the calculated entropy value, which ranges from 0 to 1. When you use Threshold Entropy Value, the slider sets the value according to placement on this axis. You can also enter the value using the boxes in the threshold statistics bar above the graph.
C. Cumulative Alert Count: The right axis of the graph shows the percentage of total alerts in the model, represented by the line graph. It shows the cumulative percentage of alerts below each entropy level. For example, about 26.5% of all alerts in the model have an entropy value of 0.34 or lower. When you use Percentage of System Excluded, the slider sets the value according to placement on this axis.
D. Shaded Area: The shaded area to the left of the slider represents alerts that would be excluded by the current entropy threshold setting.
E. Histogram: The histogram shows the number of alerts at a given entropy level. For example there are about 100 alerts with an entropy value just above 0.34.
F. Line Graph: The line graph shows the percentage of total alerts in the model for each entropy level. For example about 26.5% of all alerts in the model have an entropy value of 0.34 or lower.
G. Slider Bar: The slider bar lets you visualize the impact of various entropy threshold settings on noise reduction. Alerts that fall into the shaded area are excluded from clustering based upon the Percentage of System Excluded or the Threshold Entropy Value. For example, the slider in the figure shows 26.5% of alerts excluded. When you set the slider, the UI updates the total values for Excluded Alerts and Included Alerts. It also updates the alerts for review in the Review Window.
The UI to configure manager-specific subset thresholds is similar to the Global Default Entropy Threshold, with a few modifications:
· The Subset Definition lets you pick the manager to which the threshold applies.
· There are two statistics bars: Global Statistics, which show how the manager threshold affects the overall system alerts, and Subset Statistics, which show how the manager threshold affects alerts from the manager subset.
· The chart displays graphs for both the global statistics and the statistics for the manager subset. Un-check the All Alerts checkbox to hide the global statistics.
Review affected alert details in the Review Window
When you set an entropy threshold, either in the Global Default Threshold or a Subset Threshold, the UI displays a list of affected alerts in the Review Window. This way you can drill-down to identify the impact of your entropy threshold selection on specific types of alerts.
By default the window displays excluded alerts from the reference model. You can toggle between reference data and the potential impact on live data. You can also toggle to view included alerts instead of excluded alerts. Use the filter to drill into a specific set of alerts affected by the threshold without changing the threshold itself.
See the following topics to learn more about entropy in Cisco Crosswork Situation Manager:
· Entropy
· Configure Entropy Generation
· Alert Analyzer
Configure Entropy Generation
You can configure:
1. Entropy Generation Schedule: How often you want the Alert Analyzer to run. The default schedule is appropriate for most environments and Cisco recommends that you do not change it.
2. Entropy Generation Details: What information you want to include in your entropy generation calculations.
Cisco recommends that you perform an incremental run of the Alert Analyzer at 3 am every day, with a keep age of two weeks. This is the preset entropy generation schedule.
However, you can configure the Alert Analyzer to run as frequently as you want. If you want to change the entropy generation schedule, you have the following options:
Clear the Enable Entropy Generation check box if you do not want to use entropy calculations at all. If you do this, Cisco Crosswork Situation Manager will no longer remove any noisy alerts.
You can change to a custom entropy generation schedule. Cisco recommends that you do not change the preset entropy generation schedule.
1. Change the Preset Schedule Period if you want to change to an hourly or weekly run.
2. Clear the Use Schedule Preset check box if you want to set up a different, more complex, entropy generation schedule.
To add an Alert Analyzer run, follow these steps:
1. Click Add Item at the bottom of the Events Analyser Execution Schedule configuration. A new Entropy Generation Run window opens.
2. The Incremental Run check box is automatically selected. If you want to add a full run, clear this check box.
3. If you have selected an incremental run, the Keep Age Value and Keep Age Unit are set to 2 weeks. Cisco recommends this as the optimum length of time for keeping entropy calculation data. All older data is deleted.
You can enter different values to keep more or less entropy calculation data. For example, if you set a Keep Age Value of "3" and a Keep Age Unit of "Weeks", the Alert Analyzer keeps the last three weeks of entropy calculation data.
· If you have selected a full run, the Read Age Value and the Read Age Unit are set to 2 weeks. Cisco recommends this as the optimum amount of event data to use in your entropy calculations.
You can enter different values to read more or less event data. For example, if you set a Read Age Value of "5" and a Read Age Unit of "Weeks", the Alert Analyzer analyzes the last five weeks of alerts.
· Select the Repeat This Run Every unit. The default is every day. If you select "Week", the Alert Analyzer runs every week.
· Enter the time that you want to run the Alert Analyzer in the Run At Time field. If you have selected "Week" above, select the day of the week. If you have selected "Month", enter the day of the month.
See Entropy for an overview of the concept of entropy in Cisco Crosswork Situation Manager. See Alert Analyzer for more information on how Cisco Crosswork Situation Manager calculates entropy values.
To configure the Alert Analyzer generation details, follow these steps:
Before you begin
Ensure that entropy generation is enabled. See Configure Entropy Generation Schedule for more information.
Entropy Generation Details
Priority and Stop Words
Stop Words
The Alert Analyzer ignores stop words in its entropy calculation. To use stop words:
· Cisco Crosswork Situation Manager enables stop words by default. If you want to disable stop words, clear the Use Stop Words check box.
· Enter an integer in the Stop Word Minimum Length field to exclude all words of that length or less from the entropy calculation. For example, if you enter 2, the Alert Analyzer automatically ignores words like 'in', 'at', and 'to' in its entropy calculations.
· Cisco Crosswork Situation Manager provides a list of stop words. You can add words to this list, or remove them, by editing the list using the comma-separated format.
Priority Words
The Alert Analyzer gives any events containing a priority word a maximum entropy value of 1 in its calculation to ensure that they are included in a Situation. To use priority words:
· Cisco Crosswork Situation Manager disables priority words by default. If you want to use priority words, select the Use Priority Words check box.
· Enter a list of priority words using a comma-separated format. You can add words to an existing list, or remove them, by editing the list using the comma-separated format.
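The stop-word and priority-word rules above can be sketched as follows. The word lists, minimum length, and aggregation used here are hypothetical simplifications, not product configuration.

```python
# Illustrative sketch of stop-word and priority-word handling.
# Stop words and short words are excluded from the calculation; any
# priority word forces the maximum entropy value of 1.

STOP_WORDS = {"the", "a", "of"}          # hypothetical stop-word list
PRIORITY_WORDS = {"outage", "critical"}  # hypothetical priority-word list
MIN_LENGTH = 2                           # ignore words this length or less

def filtered_tokens(description):
    return [w for w in description.lower().split()
            if len(w) > MIN_LENGTH and w not in STOP_WORDS]

def description_entropy(description, score_fn):
    words = description.lower().split()
    if any(w in PRIORITY_WORDS for w in words):
        return 1.0                       # priority word forces max entropy
    return max((score_fn(t) for t in filtered_tokens(description)),
               default=0.0)
```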
Masking
Addresses
You can exclude certain types of addresses from the entropy calculation. Select the check boxes for the addresses that you want to exclude:
· File Paths: The Alert Analyzer excludes file paths from its entropy calculation. For example: $MOOGSOFT_HOME/config/system.conf.
· IP Addresses: The Alert Analyzer excludes IP addresses from its entropy calculation. For example: 77.131.131.77.
· MAC Addresses: The Alert Analyzer excludes MAC addresses from its entropy calculation. For example: e0:2b:13:e5:89:43.
· URLs: The Alert Analyzer excludes URLs from its entropy calculation. For example: https://www.moogsoft.com/.
· Emails: The Alert Analyzer excludes email addresses from its entropy calculation. For example: david.bowie@blackstar.com.
Date/Time Values
If you want to exclude date and time values from the entropy calculation, select the Dates and Times check box.
Numbers
You can exclude numbers from the entropy calculation. Select the check boxes for the number types that you want to exclude:
· Numbers: The Alert Analyzer excludes ordinary numbers, such as 12345, from its entropy calculation.
· Hex-Formatted Numbers: The Alert Analyzer excludes hex numbers, such as 3ADE68B1, from its entropy calculation.
IDs
You can exclude certain IDs from the entropy calculation. Select the check boxes for the IDs that you want to exclude:
· OIDs: The Alert Analyzer excludes object identifiers (OIDs) from its entropy calculation. See here for more information on OIDs.
· GUIDs: The Alert Analyzer excludes globally unique identifiers (GUIDs), also known as universally unique identifiers (UUIDs), from its entropy calculation. See here for more information.
Use the following Graze API endpoints to configure the Alert Analyzer:
· getEventsAnalyserConfig: Returns the list of priority words or stop words used by the Alert Analyzer.
· updateEventsAnalyserConfig: Updates the Alert Analyzer configuration.
If you want to set up priority word or stop word lists in the Alert Analyzer, use the following Graze API endpoints:
1. addEventsAnalyserWord: Adds a single word to a list of priority words or stop words in the Alert Analyzer configuration.
2. getEventsAnalyserWords: Returns the list of priority words or stop words used by the Alert Analyzer.
3. removeEventsAnalyserWord: Removes a single word from the list of priority words or stop words in the Alert Analyzer configuration.
4. updateEventsAnalyserWords: Updates an existing list of priority words or stop words in the Alert Analyzer configuration.
If you want to set up partitions in the Alert Analyzer, use the following Graze API endpoints:
1. getEventsAnalyserPartitionOverrides: Returns the partition override details in the Alert Analyzer configuration.
2. removeEventsAnalyserPartitionOverrides: Removes all the partition overrides from the Alert Analyzer configuration.
3. updateEventsAnalyserPartitionOverrides: Updates the partition overrides in the Alert Analyzer configuration.
See Alert Analyzer for more information on partitions in the Alert Analyzer.
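As an illustration of how these endpoints are called, the sketch below builds a Graze request URL for getEventsAnalyserWords. The base path /graze/v1/, the auth_token parameter and the type value shown here are assumptions for illustration only; check the Graze API reference for the exact signature.

```python
from urllib.parse import urlencode

def graze_url(host, endpoint, **params):
    """Build a Graze API request URL (assumed base path /graze/v1/)."""
    return "https://{}/graze/v1/{}?{}".format(host, endpoint, urlencode(params))

# Hypothetical call: fetch the stop word list using a session auth token.
# The parameter names 'auth_token' and 'type' are illustrative assumptions.
url = graze_url("localhost", "getEventsAnalyserWords",
                auth_token="example-token", type="stop_word")
print(url)
```

You would then issue an HTTP GET against this URL with your preferred client, using the token returned by the Graze authenticate endpoint.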
The Import/Export option enables you to migrate your algorithm configurations from one Cisco Crosswork Situation Manager instance to another. For example, from a test system to your production environment. This option exports Cookbooks and their Recipes, Tempus, the default merge group and all custom merge groups, from one Cisco Crosswork Situation Manager instance and imports them into another.
In the Cisco Crosswork Situation Manager instance that you want to export the algorithm configurations from:
· Click Import/Export in the Algorithms section of the Settings tab in Cisco Crosswork Situation Manager and select the Export tab.
· Click Download to download a file containing all the configurations.
The file name format is systemconfig_<date><time>.export.
In the Cisco Crosswork Situation Manager instance that you want to import the algorithm configurations into:
· Click Import/Export in the Algorithms section of the Settings tab in Cisco Crosswork Situation Manager to display the Import tab.
· Click Select File ... and find the export file that you want to use.
A list of configuration items that will be imported displays.
· Select whether you want to overwrite existing configurations or leave them unaltered. The import function uses the name of the configuration to determine whether it already exists. For example, if it imports Cookbook1 and Cookbook1 already exists: with Overwrite duplicates selected, the imported Cookbook1 overwrites the existing one; with Skip duplicates selected, the imported Cookbook1 is ignored. If an imported Cookbook has a different name but identical properties to an existing one, the new one is imported.
· Click Import to import your algorithm configurations.
Note
If you have customized your process_output_of value for any Cookbook or Tempus algorithm so that it processes the output of non-default Moolets, it is best practice to ensure that your imported configuration matches your data flow in Moogfarmd.
Enrichment for Situations in Cisco Crosswork Situation Manager is for adding contextual data to Situations. This can include the following types of information:
· Operational data to modify the behavior within Cisco Crosswork Situation Manager such as ownership or geographic info to drive automatic assignment.
· Diagnostic data to help investigation efforts. For example, appending the results of a runbook automation to a Situation discussion thread.
· Informational data for efficiency, such as ITSM ticketing information and bidirectional updates.
You can enrich the following:
· Situation description from:
— related alert properties
— context of the clustering algorithm that created the Situation.
· processes_affected and services_affected from data in an external data source, to drive team assignment. In this case you can use core Situation fields or custom_info fields to query the external data source.
· Other Situation core fields or custom_info fields.
You can use the following options for Situation enrichment:
· Built-in enrichment that doesn't require any custom code:
· The Situation Workflow Engine functions let you modify Situation custom_info properties and change the Situation description. You can also assign or change the services_affected and processes_affected properties. See Process Situations with the Situation Workflow Engine.
· Recipe settings that dynamically substitute alert property aggregates and other data into Situations based on special commands embedded into the Situation description. For information on how to use substitution in recipes, see Situation Manager Labeler.
· Custom enrichment that requires coding a custom Enricher Moolet. This requires you to write JavaScript using the provided Moobot modules. See Moobot Modules.
For more information on enrichment, see Enrichment Overview.
For more information on the Workflow Engine, see Workflow Engine.
The Situation Workflow Engine in Cisco Crosswork Situation Manager processes Situations after the Situation Manager. Typical use cases include:
1. Situation Enrichment. See Situation Enrichment.
2. Integrations with ticketing systems. See Integrate with Ticketing Services.
For all Workflow Engines, the ones delivered with the product and new ones you add, you configure the programmatic data processing in the Cisco Crosswork Situation Manager UI. See Workflow Engine for more information.
For an introduction to the Workflow Engine, see Introduction to the Workflow Engine.
For a list of Workflow Engine functions, see Workflow Engine Functions Reference.
For general information on the Workflow Engine, see Workflow Engine.
You can create a new Workflow Engine to trigger workflows based upon Situation actions (sigAction), for example, when a Situation is created, updated, or closed. For a full list of Situation actions, see Situation Action Codes.
Before you start to create a Situation Action Workflow Engine:
a. Verify you have SSH access to all core machines that run Moogfarmd.
b. Make sure you have the credentials for the user that runs the Cisco processes on the core machine so you can restart Moogfarmd.
To create the Situation Action Workflow Engine, add a new Moolet to Moogfarmd on all core machines as follows:
· Create and edit a new Moolet configuration file:
$MOOGSOFT_HOME/config/moolets/situation_action_workflows.conf
· Add the following text to the configuration file:
{
    name: "Situation Action Workflows",
    classname: "com.moogsoft.farmd.moolet.workflowengine.CWorkflowEngine",
    message_type: "situation",
    run_on_startup: true,
    metric_path_moolet: true,
    moobot: "WorkflowEngine.js",
    standalone_moolet: true,
    mooms_event_handler: false,
    event_handlers: [ "SigAction" ]
}
The Situation Action Workflow Engine runs outside the data processing flow. Situation actions (SigAction) trigger the workflows in the standalone engine.
· Add the Situation Action Workflow Engine Moolet to the array of Moolets in the Moogfarmd configuration file: $MOOGSOFT_HOME/config/moog_farmd.conf. For example:
moolets : [
    ...
    {
        include : "alert_inform_workflows.conf"
    },
    {
        include : "situation_inform_workflows.conf"
    },
    {
        include : "situation_action_workflows.conf"
    }
]
· Restart Moogfarmd.
After Moogfarmd restarts, the Situation Action Workflow appears in the list of available Workflow Engines in the Cisco Crosswork Situation Manager UI under Settings > Automation > Workflow Engine.
Create a workflow with an action that uses the sigActionFilter function to trigger the workflow when the Situation action occurs.
This topic covers the options to integrate Cisco Crosswork Situation Manager with third-party service ticketing software.
You can integrate with the following ticketing applications:
· Atlassian
· BMC Remedy
· Cherwell
· PagerDuty
· ServiceNow
After you set up the integration, you can use the createServiceTicket function for the Situation Workflow Engine to open incidents in the ticketing software.
If the UI integrations and the Workflow Engine do not meet your requirements, you can write custom code to perform the integration. See Moobot Modules for more information.
You can create URLs to open filtered alert and Situation Views. This is useful for context linking from a third-party application directly to Cisco Crosswork Situation Manager.
For example, you can context link a device in a topology mapping product to an alert view for that device. This mechanism also lets you create powerful dynamic alert and Situation Client Tools.
A valid URL must contain the location of Cisco Crosswork Situation Manager, the view type and the filter query syntax that defines the filter.
A new URL should contain the following components:
| Component | Example | Description |
|---|---|---|
| The host server | https://<localhost>/ | Host name of your Cisco Crosswork Situation Manager instance. |
| The view type | #/alerts, #/situations or #/situationalerts | Defines whether an alert view, a Situation view, or alerts assigned to Situations are displayed. |
| The filter type | ?filtereditor=basic or ?filtereditor=advanced | Defines whether the filter uses basic or advanced query syntax. Place a question mark (?) at the start of your query parameters and separate each subsequent parameter with an ampersand (&). |
| The parameters | &filter-active_sig_list=, &filter-alert_id=, &filter-agent= | The parameters, operators and values used in the filter, for example &filter-alert_id=12 or &filter-count=5. All parameter names must be prefixed with 'filter-'. See the parameter tables below for all available parameters. |
Basic Example
The example below shows a URL-based filter using basic filter syntax:
https://<localhost>/#/situationalerts/172?filtereditor=basic&filter-type=CPUHigh&filter-severity=2
The components of this URL are described in the table below:
| Example Component | Description |
|---|---|
| https://<localhost>/ | Host name of your Cisco Crosswork Situation Manager instance. |
| #/situationalerts/172 | Specifies an alert view of all alerts assigned to Situation 172. |
| ?filtereditor=basic | Specifies that you want to use basic filter query syntax. |
| &filter-type=CPUHigh&filter-severity=2 | Specifies that the filter displays all alerts with the alert type 'CPUHigh' and a severity of 'Warning' (severity level 2). |
Note
When using these URLs, users who are not currently logged in to Cisco Crosswork Situation Manager are prompted to log in. Alert and Situation Views opened this way are live and update with the latest data, with filter, action and navigation functions available as normal for that user.
Custom_info Field
You can use URL-based filters on custom_info fields if you have added any to your instance of Cisco Crosswork Situation Manager.
Note
For more information on adding custom_info fields, see Custom Info.
The example below shows a URL-based filter:
https://<localhost>/#/situations?filtereditor=basic&filter-custom_info.something_new=5
| Component | Description |
|---|---|
| https://<localhost>/ | Host name of your Cisco Crosswork Situation Manager instance. |
| #/situations? | Directs the filter to a Situation view. |
| ?filtereditor=basic | Specifies that you want to use basic filter query syntax. |
| &filter-custom_info.something_new=5 | Filters on the custom_info field 'something_new'; this must be a column field. The basic filter treats the field as text and uses 'matches' rather than 'equals'. |
To create a valid URL using the Advanced Filter, it must contain the same components as before, but they must be URL-encoded. See "Advanced Filter Syntax" in Filter Search Data.
Note
To encode a filter query, open your browser's DevTools (right-click and select Inspect) and go to the Console.
Type encodeURI, then enter the query in parentheses and double quotation marks:
Console Entry
encodeURI ("Severity = 'Warning' AND Type = 'DBFail'")
Press Enter to continue and encode the entry:
Encoded Result
"Severity%20=%20'Warning'%20AND%20Type%20=%20'DBFail'"
Advanced Example
The example below shows a URL-based filter using advanced filter query syntax:
https://<localhost>/#/situationalerts/172?filtereditor=advanced&filter-query=Severity%20=%20'Critical'%20AND%20Severity%20=%20'Minor'
This URL can be broken down into the following components:
| Component | Description |
|---|---|
| https://<localhost>/ | Host name of your Cisco Crosswork Situation Manager instance. |
| #/situationalerts/172 | Directs the filter to a view of all alerts assigned to Situation 172. |
| ?filtereditor=advanced | Specifies that you want to use advanced filter query syntax. |
| &filter-query=Severity%20=%20'Critical'%20AND%20Severity%20=%20'Minor' | Specifies that the filter displays all alerts with 'Critical' and 'Minor' severity. |
Custom_info field
You can also filter custom_info fields if you have added any to your instance of Cisco Crosswork Situation Manager.
Example:
https://<servername>/#/situations?filtereditor=advanced&filter-query=%60Something%20new%60%20MATCHES%20%225%22
| Example Component | Description |
|---|---|
| https://<localhost>/ | Host name of your Cisco Crosswork Situation Manager instance. |
| #/situations? | Directs the filter to a Situation view. |
| ?filtereditor=advanced | Specifies that you want to use advanced filter query syntax. |
| &filter-query=%60Something%20new%60%20MATCHES%20%225%22 | Filters on the custom_info field 'Something new'. This must be a column field. |
There is a method to quickly pop out a URL for Situation alerts.
Note
The Popout action can currently only be performed on alerts that are assigned to Situations.
To do this, go to the Situation Room for the Situation you are interested in and select the Alerts tab.
Create a Basic or Advanced filter then go to Tools > Popout:
This launches a new browser tab with the URL for the filter. For example:
https://<servername>/#/situationalerts/172?filter-severity=4&filter-type=DBTrans&filtereditor=basic&sort=severity%3ADESC%2Calert_id%3ADESC
The filter URL will include the default sort order of alerts in descending severity order then by descending alert ID:
sort=severity%3ADESC%2Calert_id%3ADESC
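Decoding that sort parameter with any URL decoder recovers the underlying sort expression; for example, using Python's standard library:

```python
from urllib.parse import unquote

# %3A decodes to ':' and %2C decodes to ','.
sort_param = "severity%3ADESC%2Calert_id%3ADESC"
print(unquote(sort_param))
# severity:DESC,alert_id:DESC
```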
Note
If you use the Popout method to generate a URL-based filter with advanced filter query syntax, the URL is automatically URL-encoded.
You can create powerful dynamic alert and Situation Client Tools. URLs created using this mechanism can be added to alert and Situation client tools, so that their functionality is available from right-click menus in Situation and alert Views in Cisco Crosswork Situation Manager.
Examples
In an alert client tool, with the HTTP Method GET selected, the following code in the URL field shows an alert View with all alerts that have the same host as the alert the tool was run from:
https://<servername>/#/alerts?filtereditor=basic&filter-source=$source
In a Situation client tool, with the HTTP Method GET selected, the following code in the URL field shows a Situation View of all Situations that are impacting the same services as the Situation the tool was run from:
https://<servername>/#/situations?filtereditor=basic&filter-service_list=$service_list
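As a sketch of what such a client tool produces, the hypothetical helper below (the server name and function are illustrative, not part of the product) shows how the $source substitution yields a per-alert link:

```python
from urllib.parse import quote

def client_tool_url(server, source):
    """Build a basic-filter alert View URL for a given alert's host,
    mirroring the $source substitution a client tool performs."""
    return "https://{}/#/alerts?filtereditor=basic&filter-source={}".format(
        server, quote(source))

# Hypothetical server and host values for illustration.
print(client_tool_url("example.moog.local", "webserver01"))
```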
The tables below list the available parameters and the associated operators for alerts and Situations:
Warning
Use the operators listed below in the filter query only, before using either Popout or URI encoding to form your URL.
Alert Parameters
| UI/Display Name | Filter Parameter | Operator |
|---|---|---|
| Active Situations | active_sig_list | IN |
| Alert ID | alert_id | > >= < <= != = |
| Agent Name | agent | MATCHES |
| Agent Host | agent_location | MATCHES |
| Class | class | MATCHES |
| Count | count | > >= < <= != = |
| Description | description | MATCHES |
| Entropy | entropy | > >= < <= != = |
| External ID | external_id | MATCHES |
| First Event Time | first_event_time | >= AND <=* |
| Host | source | MATCHES |
| Internal Last Event Time | int_last_event_time | >= AND <=* |
| Last Change | last_state_change | >= AND <=* |
| Last Event Time | last_event_time | >= AND <=* |
| Manager | manager | MATCHES |
| Owned By | owner | IN |
| Severity | severity | IN |
| Significance | significance | IN |
| Situations | sig_list | IN |
| Source ID | source_id | MATCHES |
| Status | state | IN |
| Type | type | MATCHES |
Situation Parameters
| UI/Display Name | Filter Parameter | Operator |
|---|---|---|
| Category | category | MATCHES |
| Created At | created_at | >= AND <=* |
| Description | description | MATCHES |
| First Event Time | first_event_time | >= AND <=* |
| ID | sig_id | > >= < <= != = |
| Last Change | last_state_change | >= AND <=* |
| Last Event Time | last_event_time | >= AND <=* |
| Owned By | owner | IN |
| Participants | participants | > >= < <= != = |
| Process Impacted | process_list | CONTAINS |
| Scope Trend | delta_entities | >0 <=0 |
| Services Impacted | service_list | CONTAINS |
| Sev Trend | delta_priority | >0 <=0 |
| Severity | severity | IN |
| Status | state | IN |
| Story | story_id | > >= < <= != = |
| Teams | teams | IN |
| Total Alerts | total_alerts | > >= < <= != = |
| User Comments | user_comments | > >= < <= != = |
* These parameters take two times, a 'from' and a 'to', separated by a colon. The times must be epoch/Unix times.
For example, First Event Time from May 3, 2017 00:45:00 to May 3, 2017 00:45:00 appears as (`First Event Time` >= 1493768700) AND (`First Event Time` <= 1493768700) in advanced filter query syntax. In the URL, this appears as follows:
?filter-first_event_time=1493768700000%3A1493768700000
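The conversion can be sketched as follows. Note that the advanced filter syntax uses epoch seconds while the URL form uses milliseconds, and the guide's example timestamp 1493768700 corresponds to 2017-05-02 23:45:00 UTC (May 3, 2017 00:45:00 in UTC+1):

```python
from datetime import datetime, timezone

# Derive the example epoch value from its UTC instant.
dt = datetime(2017, 5, 2, 23, 45, 0, tzinfo=timezone.utc)
epoch_s = int(dt.timestamp())   # seconds, used in the advanced filter syntax
epoch_ms = epoch_s * 1000       # milliseconds, used in the URL

# 'from' and 'to' are joined by a colon, URL-encoded as %3A.
param = "?filter-first_event_time={}%3A{}".format(epoch_ms, epoch_ms)
print(param)
```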
Configure Operator Experience tells you how to configure the UI to best suit your operators, including the landing page, hotkeys, alert and Situation columns, and ChatOps. It also tells you how to configure and retrain Probable Root Cause (PRC).
Reporting and Dashboards tells you how to use Insights to analyze trends in operational performance. You can use the default dashboard or you can use Grafana to create a custom dashboard.
Customize Cisco Crosswork Situation Manager Further tells you how to use customization options in Cisco Crosswork Situation Manager including server and client tools. It also contains information on how to troubleshoot problems in Cisco Crosswork Situation Manager and how to run diagnostic tools.
Housekeeping Tasks provides instructions on maintaining your Cisco Crosswork Situation Manager system, including upgrading the software, maintaining Situation design, configuring historic data retention, archiving Situations and alerts, and scheduling system downtime.
You can configure the landing page that users see when they open Cisco Crosswork Situation Manager. You can set the landing page for each user's role, for each team and for the system. The default system landing page is the Summary screen. The landing page a user adopts follows the priority order: role > team > system.
To configure the landing page for a role:
· Go to System Settings > Roles.
· Select the Role you want to edit.
· Set the 'Landing Page' and save your changes.
See Manage Roles for more details.
You can also configure landing pages for teams and for the Cisco Crosswork Situation Manager system in the System Settings. To add a landing page for a team, see Manage Teams. To add a landing page for your system, see Customize User Experience and Workflow.
You can configure the functionality and default tabs of Cisco Crosswork Situation Manager for Situation Rooms, Workflow, System Settings and Interface Settings. To access these options, click Customization in the System section of the Settings tab.
Use the Situation Room tab to select the default tab that displays when any user opens the Situation Room and the columns that appear in the Situation Room header.
You can choose between Next Steps, Alerts, Collaborate, Topology, and Visualize as the default tab that displays. By default, the tab is Next Steps. To change the default tab, select the default tab that you want from the drop-down list.
You can select up to eight columns to display in the Situation Room header. To change the columns that you want to display in the Situation Room header:
· To add a new column, click Add Field and select a column from the drop-down list.
· To delete an existing column, click x next to the column name.
· To move a column, click on a column and drag it to the place where you want it to display.
The Workflow tab allows you to alter the standard workflow of all Situations.
When enabled, this mirrors any actions on the Situation down to its alerts. Click Enable to continue and then define the scope:
| Setting | Description |
|---|---|
| Any unassigned Alerts | Action applies to any currently unassigned alerts in the Situation. |
| Any unassigned Alerts and any assigned but not acknowledged Alerts | Action applies to any unassigned alerts and any assigned but not acknowledged alerts. |
| All Alerts | Action applies to all of the alerts in the Situation. |
These settings determine the behavior when closing Situations:
| Setting | Description |
|---|---|
| Close Situation and Open Alerts | This closes the Situation and all open alerts within it. |
| Close Situation and Unique Open Alerts | This closes the Situation and all unique open alerts. |
| Close Situation Only (Not recommended for normal operation) | This closes only the Situation, not its alerts. |
The System Settings tab enables you to set a session timeout and block file extensions.
You can set when Cisco Crosswork Situation Manager times out, either with the system default or a custom timeout:
| Setting | Description |
|---|---|
| Use system defined timeout (60 minutes) | Use the default system timeout of 60 minutes. |
| Custom timeout | Enter a value between 60 and 720 for the number of minutes you want before Cisco Crosswork Situation Manager times out. |
You can define a comma-separated list of file extensions that you want to block users from uploading in Cisco Crosswork Situation Manager, such as in a Situation Room or Team Room, or when uploading a profile or branding logo. These are file extensions that are potentially dangerous or could pose a security threat. Cisco Crosswork Situation Manager provides a default list, which you can change.
The Interface Settings tab enables you to configure the default time format, landing page and theme for the Cisco Crosswork Situation Manager interface.
| Setting | Input | Description |
|---|---|---|
| Default Time Format | US (hh:mm:ss MM/DD/YYYY), International (hh:mm:ss DD/MM/YYYY)*, Sortable (YYYY-MM-DD hh:mm:ss) | The default time format that applies throughout Cisco Crosswork Situation Manager. |
| Use Friendly Time Format | Check box | Displays times in a 'friendly' format rather than an actual date format. 'Last Thursday 15:22', '12 minutes ago', and 'a few seconds ago' are examples of friendly time formats. Disabled by default. |
| Landing Page | Summary*, Management Dashboard, Cisco Crosswork Situation Manager Dashboard, My Situations, Open Situations, Open Situations with Impacted Services | The default landing page that opens when you launch Cisco Crosswork Situation Manager or click the Cisco logo in the top left corner. |
| Default Theme | Light*, Dark | The color scheme of Cisco Crosswork Situation Manager (dark or light). |
| Allow Users to Select Theme | Check box | Allows users to select their own color scheme. Enabled by default. |
| Default Time Zone | Use Client Time Zone* | Defines the default time zone for new users logging in to Cisco Crosswork Situation Manager. The default, 'Use Client Time Zone', means Cisco Crosswork Situation Manager adopts the local time zone. |
| Allow Users to Select Time Zone | Check box | Allows users to set their own time zone under the My Account settings. Enabled by default. |
| Select Custom Logo | - | Upload an image such as a logo to display on the login page and on the top bar of the workbench. See Upload a logo for details. |
Note
* These are the default settings for Cisco Crosswork Situation Manager.
You can upload a custom image such as a company or brand logo to appear on the login page and on the top bar of the workbench:
· Click Upload and select the desired image file from your computer. Currently PNG, JPG and GIF files up to 5MB in size are supported.
· Click Save Changes for the changes to be applied.
· Click the image to go to the default or configured landing page.
Hotkeys are keyboard shortcuts you can use on the Alert View, Situation View and Situation Room screens in Cisco Crosswork Situation Manager. The default hotkeys are as follows:
| Key | Action |
|---|---|
| A | Assign |
| D | Show Details |
| I | Invite User |
| M | Own |
| T | Open Server Tools |
You can add more custom hotkeys for additional actions to make navigation easier for you and your team.
Click + Create Hotkey in the top left corner of the window to add a new hotkey shortcut.
Under 'Key', select a number from 0-9 or a letter from A-Z, then select an action for it to represent. The new hotkey is highlighted with orange markers.
Click Save Changes to continue.
To reassign a hotkey, select any assigned action from the list, click the KEY drop-down menu and select any available alphanumeric character.
The new key assigned to the action shows a red indicator, informing you that the change is ready to be saved.
Select any unwanted hotkey from the list, default hotkeys included, and click - Remove Hotkey. The selected hotkey disappears from the list.
Click Save Changes to continue or click Revert Changes to undo the action.
In Cisco Crosswork Situation Manager you can use the Tool Runner to run non-interactive command-line tools that do not require root permission, such as ping and cat. It cannot be used to execute Cisco Crosswork Situation Manager client tools. The Tool Runner is a servlet running under Apache Tomcat whose output is sent back to the web browser asynchronously.
All commands are run in a Linux shell via ssh and executed as a specified user@host.
When a tool is run from the UI, a user/password@host can be specified. If not provided, the tool runs on a default user@host (configured in web.xml).
From the alert view:
1. Right-click an alert to open the menu.
2. Select Tools > Server Tools to open Tool Runner.
For each alert, based on alert type, a list of configured tools is displayed with the following fields:
| Field | Description |
|---|---|
| Tool | Display name of the tool. |
| Run from host | If left blank, the tool runs either on localhost or the default machine (which you can configure). If specified, the tool runs on the specified host. |
| Username | If you specify a run host, use this field to specify the user to ssh into the run host. |
| Password | If a username is specified, this is the password used to log in to the run host. If left blank, the system attempts to use ssh public/private keys. |
| Fire & forget | If checked, the tool runs and all output is ignored (the equivalent of 'nohup &'). |
Tool Runner is implemented as a servlet which can be configured using MySQL or web.xml.
Depending upon the exact environment, some supporting configuration may be required. Tool Runner uses ssh to log in to a system with a given username and password to run tools and integrations. As a result, 'PasswordAuthentication' must be set to 'yes' in /etc/ssh/sshd_config. Changes to that file require a restart of the sshd service before they take effect.
The database contains one table for the configuration of tools. The schema is as follows:
| Element | Type | Description |
|---|---|---|
| tid | [INT AUTO_INCREMENT] | Alert tool ID. |
| displayname | [VARCHAR(100)] | Tool name displayed in the UI. |
| cmd | [VARCHAR(250)] | Tool command (with no arguments). |
| args | [VARCHAR(500)] | Arguments to the command. |
| alerttype | [TEXT] | Alert type of the alert that is used to determine the list of valid tools. |
| description | [TEXT] | A textual description. |
When first installed, the table does not contain any tools. Tools can be added using the following SQL:
insert into moogdb.alert_tools value( 0,'Ping localhost', 'ping', '-c 5 localhost','SWBFence', 'This is an example tool');
The arguments can be null if no argument is required. In addition, you can configure the argument to substitute information from the alert. For example:
cmd = ping
args = -c 5 $source -> -c 5 polyanna (after substitution)
You can use any alert internal field name in the argument substitution, for example $source, $source_id, $severity. The case must match the internal field name.
When an argument is substituted into the argument string, it is also escaped using a backslash. This lets you substitute values that contain special characters, and also provides some security against command insertion. Be careful: the combination of the escaped value and the original argument string can produce unexpected results. For example, the command echo with the following argument strings produces slightly different results:
argsV1 = $source
argsV2 = "$source"
if $source = local.host
resultV1= local.host
resultV2 = local\.host
Only the values are escaped, not the original argument string; in the unquoted version, the shell consumes the escaping backslash when the command runs, while inside double quotes the backslash is preserved. For further information on how to build the full argument string, see Combining multiple arguments into a single argument string.
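The substitution-and-escape behavior described above can be modeled as follows. This is an illustrative sketch of the documented behavior (backslash-escaping each special character in the substituted value only), not the product's actual implementation:

```python
import re

def escape_value(value):
    """Backslash-escape every character that is not a letter, digit or
    underscore, mirroring the documented value escaping."""
    return re.sub(r'([^A-Za-z0-9_])', r'\\\1', value)

def substitute(args, fields):
    """Replace $name tokens in the argument string with escaped field
    values. Only the values are escaped, not the argument string itself."""
    return re.sub(r'\$(\w+)',
                  lambda m: escape_value(fields[m.group(1)]),
                  args)

print(substitute('-c 5 $source', {'source': 'local.host'}))
# -c 5 local\.host
```

Applied to $source = local.host, the helper yields local\.host, matching the escaped value in the example above.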
The following command has two arguments:
printf "%s\n" "hello world"
If 'hello' and 'world' were substituted from $x and $y, the configuration for the command would be as follows:
cmd = printf
args = "%s\n" $x\ $y (version 1)
Alternatively:
args = "%s\n" "$x $y" (version 2)
Due to the argument value escaping, version 1 is probably the better choice, although version 2 is probably more readable/natural.
Here is an example web.xml (/usr/share/apache-tomcat/webapps/toolrunner/WEB-INF):
<web-app>
    <servlet>
        <servlet-name>toolrunner</servlet-name>
        <servlet-class>com.moogsoft.toolrunner.CToolRunner</servlet-class>
        <async-supported>true</async-supported>
        <init-param>
            <param-name>webhost</param-name>
            <param-value>https://localhost</param-value>
        </init-param>
        <init-param>
            <param-name>polluri</param-name>
            <param-value>/poll</param-value>
        </init-param>
        <init-param>
            <param-name>toolrunuri</param-name>
            <param-value>/run</param-value>
        </init-param>
        <init-param>
            <param-name>toolstopuri</param-name>
            <param-value>/stop</param-value>
        </init-param>
        <init-param>
            <param-name>sshtimeout</param-name>
            <param-value>900000</param-value>
        </init-param>
        <init-param>
            <param-name>toolrunnerhost</param-name>
            <param-value>localhost</param-value>
        </init-param>
        <init-param>
            <param-name>toolrunneruser</param-name>
            <param-value>toolrunner</param-value>
        </init-param>
        <init-param>
            <param-name>toolrunnerpassword</param-name>
            <param-value>toolrunner</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>toolrunner</servlet-name>
        <url-pattern>/poll</url-pattern>
        <url-pattern>/run</url-pattern>
        <url-pattern>/stop</url-pattern>
    </servlet-mapping>
    <listener>
        <listener-class>com.moogsoft.toolrunner.CSessionListener</listener-class>
    </listener>
</web-app>
The following parameters can be edited; however, these are 'global' settings that apply to all tools:
webhost: Defines the value of the HTTP header 'Access-Control-Allow-Origin', which is used to control cross-origin resource sharing (CORS). The default value is https://localhost.
sshtimeout: If a tool running via SSH stops producing output for a period of time, because the tool is hanging or there is a problem with the connection, sshtimeout is used to shut down any resources associated with the tool. The value is specified in milliseconds and the default is 900000 (1000 * 60 * 15 = 15 minutes).
toolrunnerhost, toolrunneruser and toolrunnerpassword: These three settings relate to running tools 'locally'. If only a tool name is specified in the UI, this configuration comes into play. There are two valid configurations:
· All three configured: any tool with no run host specified runs on the configured run host. The system connects to the run host via SSH using the configured username and password.
· Run host and username configured: if no password is configured, the system attempts to SSH to the run host using the configured username with a private key, which must be located in a specific directory. For further information, refer to Passwordless SSH.
You must have SSH installed on the remote machine on which you want to run tools.
To log in to a remote machine successfully, you must configure the remote SSH daemon to allow password authentication. In /etc/ssh/sshd_config, set the following:
PasswordAuthentication yes
To set up a user account that does not require a password:
· On the web server machine, generate a secure SSH key pair by typing: ssh-keygen.
· Copy the generated key to the remote server for each user account you plan to use to run tools with passwordless logins. Use the following command, or alternatively use scp: cat ~/.ssh/id_rsa.pub | ssh user@remotehost 'cat >> ~/.ssh/authorized_keys'. This command takes the generated SSH public key from the local machine, connects to the remote host via SSH, and uses cat to append the key to the remote user's authorized keys list. Because this connects to the remote machine over SSH, you need to enter the password one last time.
· Confirm that you can now login to the remote SSH server without a password: ssh user@remotehost.com.
· Copy the private key (id_rsa) to a directory called keys, which you need to create in the Apache Tomcat home directory, for example, /export/apache-tomcat-7.0.29.
· At present, stdout and stderr are combined and displayed in the web UI.
· If a tool produces a large amount of output, it can take a significant amount of time for the output to be transferred to the web UI, so expect a time lag.
Tool Runner allows an administrator to set up custom scripts to run on a server. It uses SSH to run tools and integrations. You must edit the servlets configuration file in Cisco Crosswork Situation Manager in order to use Tool Runner in the UI.
Warning
The Tool Runner user can run any command on the operating system. Only implement Tool Runner if absolutely necessary and follow the security-related recommendations closely.
Before you begin to configure Tool Runner, ensure you have met the following requirements:
1. You have created or identified an operating system user that you will use to run tools:
— Do not run Tool Runner as root.
— Run Tool Runner in a user-restricted shell, for example, bash --restricted. See also https://www.gnu.org/software/bash/manual/html_node/.
— Run Tool Runner as a non-privileged user.
— Allow specific permissions to Tool Runner so that it only accesses the tools it needs.
· You have identified a separate host or a sandboxed environment. Cisco recommends that you do not run Tool Runner locally.
· You have the permissions to modify Cisco Crosswork Situation Manager configuration files.
· You have set the PasswordAuthentication property to yes in the /etc/ssh/sshd_config file on the Cisco Crosswork Situation Manager server and restarted the sshd service.
To manually configure Tool Runner, edit the servlets configuration file located at $MOOGSOFT_HOME/config/servlets.conf:
· You can configure these properties in the toolrunner section of the file:
— toolrunnerhost: The host on which Tool Runner runs commands. This should not be the host on which Cisco Crosswork Situation Manager is installed.
— toolrunneruser: The Tool Runner username. The user must exist on the toolrunnerhost system and have the appropriate permissions to run the required tools.
— toolrunnerport: SSH port on which to run Tool Runner. Default is 22.
— toolrunnerpassword: The Tool Runner user password on the toolrunnerhost system.
If the password is not defined, Tool Runner uses the private key defined in ssh_key_file or, if that is not set, $MOOGSOFT_HOME/etc/keys/id_rsa.
For ssh_key_file, a relative path is assumed to be relative to $MOOGSOFT_HOME/etc. Your SSH key should have a passphrase set. You can specify the passphrase in the configuration file in encrypted form under encrypted_ssh_passphrase, or in plaintext (not recommended) under ssh_passphrase.
If neither is set, the assumed passphrase is keyPwd.
— encrypted_toolrunnerpassword: An encrypted Tool Runner password. Use either the password or encrypted password property. See Moog Encryptor for more information.
— execute_locally: If set to true, Tool Runner executes commands on the server where the Tool Runner servlet is hosted and ignores toolrunnerhost. Otherwise, commands are run on toolrunnerhost. Default is false.
— webhost: Not used.
— sshtimeout: SSH timeout period in milliseconds. If set to 0, timeout will never occur. Default is 0.
· Restart Apache Tomcat.
· Restart Moogfarmd.
Once you have completed the configuration, Tool Runner is available in the Cisco Crosswork Situation Manager UI.
An example toolrunner section in the servlets configuration file:
toolrunner:
{
toolrunnerhost : "localhost",
toolrunneruser : "moogtoolrunner",
toolrunnerport : 22,
toolrunnerpassword : "moogtoolrunner",
#encrypted_toolrunnerpassword : "rmW2daCwMyI8JGZygfEJj0MZdbIkUqX3tT/OIVfMGyI=",
#execute_locally : false,
#webhost : "https://localhost",
sshtimeout : 900000
}
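A key-based variant of the section above, with no toolrunnerpassword, might look like the following sketch; the host name, key path, and encrypted passphrase values are illustrative assumptions for your environment:

```text
toolrunner:
{
    toolrunnerhost : "tools.example.com",
    toolrunneruser : "moogtoolrunner",
    toolrunnerport : 22,
    # no toolrunnerpassword: the private key below is used instead
    ssh_key_file : "keys/id_rsa",
    # passphrase generated with the Moog Encryptor
    encrypted_ssh_passphrase : "<encrypted passphrase>",
    sshtimeout : 900000
}
```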
Link Definitions allow you to format alert and Situation view custom info values as links to external tools for individual alerts or Situations.
You can add Link Definitions to columns in alert and Situation views. You can configure links using Situation and alert data and custom info fields. You can use them to perform queries and to link directly to corresponding tickets in third-party ticketing systems.
To create a new Link Definition:
· Go to Settings > Link Definitions and click + to create a new definition. Complete the details for the Link Definition as follows:
·Link. Link format, including query syntax if required. It can include a $reference to one or more alert or Situation fields.
·Display. Link text to appear in Situation and alert views. It can include the $value of a
— in a Situation or alert view column, go to Settings > Situation Columns and Settings - Alert Columns. See Configure Alerts and Situation Columns for more information.
The following screenshot displays two example Link Definitions, ServiceNow Ticket and Google Link.
The Link Definition for ServiceNow Ticket is configured to take the value of the custom info servicenow field, and also displays this value as part of the link itself:
· Name: ServiceNow
· Link: https://instance.service-now.com/nav_to.do?uri=incident.do?number=$value
· Display: ServiceNow $value
The Situation column is configured to display data from the custom_info servicenow field and links to the ServiceNow link definition:
· Field: custom_info.servicenow
· Header: ServiceNow Ticket
· Type: Text
· Link Definition: ServiceNow
Situation #25 has the following custom_info:
· Name: servicenow
· Value: INC1190341
In the screenshot example, the ServiceNow INC1190341 link for Situation #25 contains the following URL: https://instance.service-now.com/nav_to.do?uri=incident.do?number=INC1190341
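The substitution can be mimicked in a shell to check the resulting URL; here $value is an ordinary shell variable standing in for the custom_info field value:

```shell
value="INC1190341"
link="https://instance.service-now.com/nav_to.do?uri=incident.do?number=${value}"
echo "$link"
# prints https://instance.service-now.com/nav_to.do?uri=incident.do?number=INC1190341
```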
The Link Definition for Google Link is configured to perform a google query for the text strings in the $description Situation field and the custom info impact field:
· Name: Google Description
· Link: https://www.google.com/search?q=$description&q=$custom_info.impact
· Display: Google Description
The Situation column is configured to link to the Google Description link definition. In this example, the Field setting is ignored, but you must still enter a valid custom info attribute into it to make the link appear:
1. Field: custom_info.impact
2. Header: Google Link
3. Type: Text
4. Link Definition: Google Description
Situation #26 has the following custom_info:
1. Name: impact
2. Value: documentation
In the screenshot example, the Google Description link for Situation #26 contains the following URL: https://www.google.com/search?q=Service%20Pack%20Update%20SP21&q=documentation
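Note that the $description value is URL-encoded before substitution, so spaces become %20. A minimal shell sketch of that encoding step (handling spaces only, for illustration):

```shell
description="Service Pack Update SP21"
encoded=$(printf '%s' "$description" | sed 's/ /%20/g')
echo "https://www.google.com/search?q=${encoded}&q=documentation"
# prints https://www.google.com/search?q=Service%20Pack%20Update%20SP21&q=documentation
```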
The ChatOps feature enables Cisco Crosswork Situation Manager users to run tools, such as executing utilities on remote hosts, from the Collaborate tab in a Situation Room. Tools that can be run include Generic, Alert and Situation Server tools. The output of the tool being executed is displayed in the Situation Room. For more information, see Take Additional Actions in the Operator Guide.
For certain tools, the diagnostic results appear as a comment under that entry so that all collaborators and team members can see them. If the ChatOps tool successfully resolves the root cause of the incident, it should be marked as a "Resolving Step". Examples of tools that can be run are scripts that perform issue diagnostics and remediation, as well as integration activities such as creating tickets in third-party ticketing tools.
The ChatOps shortcuts can be set up to give your users quick access to the available tools. To do this, go to System Settings > ChatOps Shortcuts:
Any existing shortcuts are listed under the Tools panel to the right of the window.
Create a ChatOps Shortcut
Click the + Create Shortcut button to get started.
Next enter a name for your new ChatOps shortcut in the Shortcut text box. This is what users enter to run the ChatOps tool.
Note
You can use 0-9, lowercase a-z, dash (-), underscore (_) and period (.). If the name already exists, it highlights in red in the list on the left and you cannot save until the name is edited to be unique.
Select the checkbox for the tool you want. To search for a tool, type text to the right of the search icon. Default alert and Situation workflow tools, such as ping and nslookup, are available.
Repeat the steps above to create as many shortcuts as required.
Click Save Changes when you are finished.
Remove a ChatOps Shortcut
Select the ChatOps shortcut you want to delete, from the list on the left.
Click the - Remove Shortcut button to remove the shortcut from the list.
If you accidentally remove the wrong shortcut, you can click Revert Changes to undo this. This discards all changes since the last save.
Cisco Crosswork Situation Manager for Mobile uses a cloud-based notification service that allows users with an active Twilio account to receive SMS messages from the REST API.
To configure the mobile version of Cisco Crosswork Situation Manager to send and receive SMS messages:
· In the Cisco Crosswork Situation Manager UI, navigate to System Settings > Users and add phone numbers for the users you want to receive SMS messages.
Note
Phone numbers must include the international dial code in E.164 format. For example +1, +44, +61.
1. Requests to the notification service's REST API require authentication, which you configure in $MOOGSOFT_HOME/config/system.conf.
Add your details in the following format:
"sms_sender":
{
"account_sid" : "AB1234c6d91dfe4eef9b3899dkrw34aabbab2b",
"auth_token" : "4eee23d50de54333f6949366fe0882a4",
"phone_number" : "+12345678910",
"character_limit" : 1600
}
Note
The API limits text messages to 1600 characters and truncates messages exceeding this limit.
· Restart Apache Tomcat to activate the changes. See Control Moogsoft AIOps Processes for details.
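As noted above, phone numbers must be in E.164 format (a leading + followed by up to 15 digits). A loose sanity check for that shape can be sketched in shell; this is illustrative only, not the product's actual validation:

```shell
phone="+12345678910"
if echo "$phone" | grep -Eq '^\+[0-9]{1,15}$'; then
    echo "looks like E.164"
else
    echo "invalid"
fi
```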
At present only the default configuration is available, which sends SMS notifications about all invitations and assignments.
You can use filters to configure which alerts and Situations you want to access and display from the Cisco Crosswork Situation Manager Workbench.
You can choose whether the filter is available to all users, specified teams, or yourself only.
To create an alert or Situation filter:
· Click Alert Filters or Situation Filters in the Display Options section of the Settings tab.
· On the Filter tab, click the + icon to create a new filter.
· Fill in the available fields to configure the filter:
Field |
Input |
Description |
Name |
String (Required) |
Name of the new filter (up to a maximum of 100 characters). |
Description |
String |
Text description of the filter. |
Show in |
Navigation, Dashboards |
Select whether the filter is shown in Navigation and/or Dashboards. |
· Define the filter that you want to use:
· Click Add Clause to start building the filter.
· Click the drop-down menu arrow and select a parameter.
· Click the drop-down menu below this and select an operator.
· Depending on the parameter selected, enter or select a value in the final box and then click Apply.
· To add more clauses, click the clause and then click AND, OR, or NOT and fill in the boxes as before.
· On the Shared With tab, select whether you want to share this filter with everyone, specific teams, or only yourself.
· If you select to share the filter with specific teams, add the teams you want to share the filter with to the list below.
· Click Save Changes to create the new filter.
See Filter Search Data for filter examples.
You can change the columns to display on Situation and Alert Views and add new columns based on custom_info fields. Optionally add link definitions to custom_info columns, for example, to link the custom_info data to a third-party system. See Link Definitions.
Click the Columns drop-down menu in the top right corner to see which columns are displayed by default. Check or uncheck columns to add or remove them from the default column layout.
Icon |
Description |
Click the border of any column and drag left or right to make the column narrower or wider. Double-click to auto-resize the column to the current content. |
Click and drag any column to another position to change the order the columns appear in. |
Click any column and edit the text next to 'Header' to change the column header.
Click Columns > Add Column to add a new custom column to the default layout. Edit the available fields to configure the column:
Field |
Input |
Description |
Field |
String |
Enter the custom_info field you want to use or show in the column. Note: this entry must start with custom_info. (added when creating a new column). For example, to use a custom_info field 'TPS_ID', enter: custom_info.TPS_ID |
Header |
String |
This is the header name of the column. |
Type |
Number OR Text OR List |
Select 'Number' if the column content is numeric. Select 'Text' if the column content is a text string. Select 'List' if the column content is based on a custom_info field containing a list. You filter this using CONTAINS. For example, filtering using 'Youtube CONTAINS ("5z508TD97Sg","i9gQ7TFIpEA")' finds Situations where the list field contains both of those items. |
Link Definition |
Selection |
Select the Link Definition from the list (if required). |
Indexed |
Boolean |
If enabled, the column data is indexed in the database; new columns are filterable and sortable by default. Indexing improves the performance of filtering and sorting, but may affect the performance of additions. If you plan to use this custom_info field in alert or Situation filters, or to sort using this column, we recommend you enable the indexed option to aid filter loading performance. Too many indexed columns may impact performance. |
Adjust the column width as required and change the order by dragging and dropping the new column where you would like it to be. Click Save Changes to continue and confirm when prompted. Alternatively, click Revert Changes to discard your changes.
This example walks you through setting up an alert column with custom_info data from a prompt. The custom_info field 'TPSLEVEL' is added to alerts using a client tool with a prompt variable: 'Set TPS Level'. See Client Tools for information on how to set up the TPS Level client tool.
1. Right-click and select Tools > Set TPS Level tool or Tools > Tools > Set TPS Level to run the tool on an alert.
2. Select the TPS level on the prompt window.
3. Right-click on the alert, select Show Details... and Custom Info...
4. Navigate to System Settings > Columns > Alert Columns to create the custom_info column.
5. Click Columns > Add Column and then configure the column.
6. Click Save Changes to continue.
The Alert Views window displays the TPS Level column with the custom_info data in the second column.
Probable Root Cause (PRC) is a machine learning process in Cisco Crosswork Situation Manager that identifies which Alerts are likely responsible for causing a Situation. PRC looks for patterns in user-supplied feedback. It does not use 'Root Cause Analysis' techniques.
With Probable Root Cause:
· You can immediately determine where to begin troubleshooting and diagnosis as soon as you open a Situation by looking at the Probable Root Cause Alerts.
· You can resolve Situations quickly by examining the top three Probable Root Cause Alerts, which appear under Next Steps in a Situation Room.
You manually label Alerts as either a Root Cause Alert or a Symptom Alert; the Cisco Crosswork Situation Manager PRC model uses this data to predict Situation root causes.
Subsequently, when Cisco Crosswork Situation Manager generates Situations, it labels an Alert or Alerts as having a Root Cause Estimate. A Root Cause Estimate is always assigned, even if the data set is small. Generally, the more data Cisco Crosswork Situation Manager has, the more accurate it is. However, the data needs to be consistent, and the model is only as effective as the data it is supplied with. For example, two conflicting labels will confuse the model. If you do not know the status of an Alert, do not label it. You do not have to label every Alert.
Machine learning uses features such as Severity, Host, Description and Class. It takes the values of those features for all labelled Alerts and uses a neural network to estimate the Root Cause for all the Alerts in a newly created Situation, even if that Situation has not been seen before, based on the model and the labelled data.
See Configure and Retrain Probable Root Cause for more information on training your model.
This column (Situation, Alerts tab) shows the Probable Root Cause estimate as a percentage for each Alert in that Situation and is useful as a prioritization aid. For example, the higher the value an Alert has, the higher the probability that the Alert is the root cause of the Situation.
As Alerts are added to a Situation, the Root Cause is recalculated (Situation, Alerts list) and therefore the PRC column may change. The more accurate and consistent the data you feed your model, the more accurate the estimate.
Probable root cause (PRC) for Cisco Crosswork Situation Manager is enabled by default. This means that you can mark Situation alerts as having a root cause or not and Cisco Crosswork Situation Manager shows a root cause estimate in Next Steps in the Situation Room.
Navigate to System Settings > Configure & Retrain to enable or disable PRC.
To configure which users can mark alerts for PRC, go to Security > Roles. Select the role you want to edit and under Permissions, move 'prc_feedback' to 'Selected' using the direction arrows. This permission is enabled for Administrator and Super User roles by default.
The PRC model gives each alert within a Situation a probable root cause estimate. Retrain recalculates the estimates with the current data. You can choose which features your model uses when predicting the probable root cause of an alert. The default PRC configuration uses two severity features; all available features are listed below:
Feature |
Description |
Agent |
The agent of the alert represented as an enumeration. Each value of 'agent' is considered to be independent from all other values. |
Alert Arrival Order |
Represents the arrival order of the alert in a Situation. |
Alert Time |
Represents the alert time as the components of the 'time of day', for example, hours of day, minutes of hour. |
Class |
The class of the alert, represented in a way that identifies naming conventions in the class name. |
Description |
Tokenizes the description into words and uses those words to identify key words and phrases that may indicate root cause. |
Host |
The host of the alert, represented in a way that identifies naming conventions in the host name. |
Manager |
The manager of the alert represented as an enumeration. Each value of 'manager' is considered to be independent from all other values. |
Severity & Arrival Order (default) |
The severity of the alert represented as independent values and when the alert arrived for each value of severity. For best results use in conjunction with 'Severity Raw'. |
Severity Enum |
The severity of the alert represented as independent values. For best results use in conjunction with 'Severity Raw'. |
Severity Raw (default) |
The severity of the alert represented as a continuous value such that 'Warning' < 'Major' < 'Critical'. For best results use in conjunction with 'Severity Enum' or 'Severity & Arrival Order'. |
Situation Alert Time |
Represents the alert time as the components of time, for example, hours of day, minutes of hour, relative to the first alert in the Situation. |
Type |
The type of the alert, represented in a way that identifies naming conventions. |
Custom info fields are customizable fields relating to either an alert or a Situation that can be added to Cisco Crosswork Situation Manager during configuration. Custom info fields hold data in addition to the core event fields for events and alerts. See Alert and Event Field Reference.
These fields are displayed in the UI as columns in the Alert and Situation Views and can be configured with optional sorting and filtering.
Custom info commands can be found in the /usr/share/moogsoft/bin/utils folder.
You can use the following commands to add either alert or Situation custom info fields:
Command |
Description |
moog_add_alert_custom_field |
Adds a new alert custom info field. |
moog_add_sitn_custom_field |
Adds a new Situation custom info field. |
You can use the following options to configure the display name, the field name and indexing:
Option |
Description |
-d, --display_name <arg> |
Name of the field displayed in the UI. |
-f, --field <arg> |
Custom info field name. |
-i, --index |
Indicates the field is indexed for filtering and sorting. This option cannot be used with display-only fields. If you plan to use this custom info field in alert or Situation filters, or you plan to sort using this column, Cisco recommends that you use the --index option to aid filter loading performance. Too many indexed columns may affect the performance of additions. |
-l, --loglevel <arg> |
Specify INFO, WARN, or ALL to choose the level of debug output. |
-o, --display_only |
Indicates the field is for display only and cannot be used to filter, sort or search. |
-s, --size <arg> |
Index size (the number of characters). This is valid for indexed text fields only. Default is 50. |
-t, --type <arg> |
Type of field (number or text). Default is number. |
Example: add an alert custom info text field that is also indexed so that you can sort or filter on it:
[root@moogsoft ~]# moog_add_alert_custom_field -d newfield -f new_field -i -t TEXT
A message similar to the following confirms that the new custom info field was added successfully:
Field newfield was added to UI successfully
Filterable field custom_info.new_field was added successfully
A utility enables you to fill the alert or Situation filterable custom info fields using retrospective data:
Command |
Description |
moog_fill_alert_custom_fields |
Fills the filterable alert custom info fields using retrospective data. |
moog_fill_sitn_custom_fields |
Fills the filterable Situation custom info fields using retrospective data. |
You can configure the amount of time the fill utility goes back and the log level using the following options:
Option |
Description |
-b, --back <arg> |
Defines how far back the fill utility goes, with 's' for seconds, 'm' for minutes, 'h' for hours, 'd' for days and 'w' for weeks. You can leave this empty to fill all data, but this might take some time. For example: -b 2w for two weeks. |
-l, --loglevel <arg> |
Specify INFO, WARN, or ALL to choose the level of debug output. |
Example to fill Situation custom info fields with retrospective data from the past three days:
[root@centos7 ~]# moog_fill_sitn_custom_fields -b 3d
Filterable custom info data was filled successfully
You can use the following commands to remove previously configured alert or Situation custom info fields:
Command |
Description |
moog_remove_alert_custom_field |
Removes an alert custom info field. |
moog_remove_sitn_custom_field |
Removes a Situation custom info field. |
You can specify the custom info field you want to remove using the following option:
Option |
Description |
-f <arg> |
Defines the custom info field name to select the field you want to remove. |
Example: remove a custom info field called 'new_field':
[root@moogsoft ~]# moog_remove_alert_custom_field -f new_field
Field custom_info.new_field was removed successfully
If you add a custom info column and existing alerts or Situations contain values in that column, you must run a utility to make those values filterable in the UI. Alerts or Situations that are new or updated after the new column has been added are filterable automatically.
If you added an alert custom info field, run $MOOGSOFT_HOME/bin/utils/moog_fill_alert_custom_fields.
If you added a Situation custom info field, run $MOOGSOFT_HOME/bin/utils/moog_fill_sitn_custom_fields.
You can add Situation Manager Labeler macros to the Situation description in a Sigaliser. When Cisco Crosswork Situation Manager creates a Situation based upon the Sigaliser configuration, it automatically creates services based on the function and the alert data.
Before you begin
Before you start to create services with Situation Manager Labeler, ensure you have met the following requirements:
· You have the Situation Manager Labeler installed on your Cisco Crosswork Situation Manager system. The utility is installed by default with Cisco Crosswork Situation Manager v7.3 and higher. For previous releases, contact Cisco Customer Support to obtain installers and instructions.
· You have identified the service data in your alerts or set up an Enrichment method to import the data into custom_info. See Enrichment Overview for further information.
For example, an alert contains the following custom_info data:
{"services":["WinTel","Lync"]}
Configure a recipe to automatically create services
The following example demonstrates how to use labeler utility macros in a Cookbook Recipe to create services.
· In the Cisco Crosswork Situation Manager UI, go to Settings > Cookbook Recipes.
· Create a new Recipe, completing the mandatory fields such as name, type and clustering information. See Configure a Cookbook Recipe for more information.
· Add a Situation Description that utilizes labeler macros. See the example below.
· Activate the Recipe. See Configure a Cookbook Recipe for further information.
An example Situation description is as follows:
Issue affecting the service:$$UNIQUE(custom_info.services). $$SERVICES(custom_info.services)
This instructs Cisco Crosswork Situation Manager to perform the following actions as it ingests data:
· Parse the data in the custom_info.services field of incoming events.
· Obtain unique service names for the alerts the system adds to a Situation.
· Preface the Situation's "Description" field with the text "Issue affecting the service: ".
· Create a service in the Cisco Crosswork Situation Manager database.
· Assign the service to the Situation.
· Append the Situation's Description field with the names of the impacted services.
When you are parsing a list of values stored as a JavaScript array, use $$ to prefix the macro. If the data is a string use the prefix $.
Once configured, Cisco Crosswork Situation Manager processes alerts using the Situation Description and the resulting Situation appears in the Cisco Crosswork Situation Manager UI.
For example:
You can take advantage of Cisco Crosswork Situation Manager Insights to analyze trends in operational performance. You can use the default dashboard or you can use Grafana to create a custom dashboard.
Insights is built on the Stats API that exposes time-series data so you can report on:
· Active Situations, the teams they're assigned to, and the services they impact.
· Recurring Situations that could indicate deeper systemic issues.
· Key performance indicators like Mean Time To Resolution.
You can use the reporting tool of your choice to take advantage of the Stats API. Otherwise, check out the Cisco Crosswork Situation Manager app for Grafana to view and modify the default dashboard. See Grafana Dashboards for more information.
Insights exposes a variety of statistics and metrics to help you understand your operations. For example, consider the valuable data in the default dashboard:
You can see how your teams are managing active Situations:
· View the number of open Situations system-wide and see how many open Situations are unassigned, unacknowledged or reassigned.
· Identify the number of open recurring Situations, which can highlight areas of impact that need increased attention or resource allocation.
You can monitor the distribution of your Situations over time, to see which teams handle the most Situations and which services are most impacted by Situations. Key Performance Indicator metrics reveal how quickly Cisco Crosswork Situation Manager detects Situations and how quickly teams address open Situations over time:
· Mean Time To Detect: the mean time to detect a Situation from the first event time.
· Mean Time To Acknowledge: the mean time to acknowledge a Situation from the first event time.
· Mean Time To Resolution: the mean time to resolve a Situation from the first event time.
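As a worked illustration of the three metrics for a single Situation (times in epoch seconds; the timestamps are invented for the example, and the dashboard reports the mean of these values across Situations over the selected period):

```shell
first_event=1000    # time of the first event in the Situation
detected=1060       # time the Situation was created
acknowledged=1300   # time a user acknowledged the Situation
resolved=2800       # time the Situation was resolved
echo "Time to detect:      $((detected - first_event))s"      # 60s
echo "Time to acknowledge: $((acknowledged - first_event))s"  # 300s
echo "Time to resolve:     $((resolved - first_event))s"      # 1800s
```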
For details on all the available Insights, see the Stats API.
You can view Cisco Crosswork Situation Manager statistics and reports in dashboards using Grafana.
To set this up, you need to install Grafana, install the Moogsoft AIOps plugin and install the Grafana integration in Cisco Crosswork Situation Manager.
Before you install the Moogsoft AIOps plugin, ensure you have met the following requirements:
· You have installed Grafana or you have a hosted instance of Grafana.
· Enable HTTPS if you are using an on-premises instance of Grafana. See the Grafana docs for how to edit the protocol, cert_file, and cert_key properties in the Grafana configuration .ini file.
· The port for Cisco Crosswork Situation Manager is open and accessible from Grafana.
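For the HTTPS requirement above, the relevant grafana.ini fragment looks like the following sketch; the certificate and key paths are assumptions for your environment (consult the Grafana docs for the authoritative settings):

```ini
[server]
protocol = https
cert_file = /etc/grafana/ssl/grafana.crt
cert_key = /etc/grafana/ssl/grafana.key
```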
To install the Moogsoft AIOps app in Grafana, follow these steps:
· Install the app.
· Find it under Apps in your Grafana plugins and enter the following settings:
Field |
Value |
URL |
<Your Cisco Crosswork Situation Manager URL> |
User |
Your Graze username |
Password |
Your Graze password |
1. Enable the app. A "Test Success" message appears if successful.
After you have set up the app, you can configure your dashboards.
You can configure the statistics that Cisco Crosswork Situation Manager collects depending on which dashboards you want to use. You can also create a custom dashboard.
The Global Situation Overview dashboard provides a broad overview of your Situation statistics, teams insights, and mean times to acknowledge, detect and resolve.
The dashboard's panels display the number of open Situations, unassigned Situations, reassigned Situations and recurring Situations. Other panels include the top 10 teams by open Situations, the top 10 services by open Situations, the number of Situations by status, the number of Situations by severity and a graph view of MTTA, MTTD and MTTR.
To edit the dashboard, click the header of any panel and edit the statistic endpoint or add a query. Alternatively, click Add Row at the bottom of the screen.
The Team Situation Overview dashboard displays a broad overview of your team's Situation statistics, team insights and mean time to acknowledge and resolve.
The dashboard's panels display the number of reassigned Situations per team, the number of recurring Situations per team, the number of Situations impacting each service per team, the number of Situations by status per team and the MTTA/MTTR for the team.
To edit the dashboard, click the header of any panel and edit the statistic endpoint or add a query. Alternatively, click Add Row at the bottom of the screen.
The Team Workload dashboard provides an overview of how an individual team is performing in Cisco Crosswork Situation Manager.
The dashboard's panels pull statistical data about the MTTA/MTTR per team, the status of the team's Situations, and the number of comments made by the team.
To edit the dashboard, click the header of any panel and edit the statistic endpoint or add a query. Alternatively, click Add Row at the bottom of the screen.
The Noise Reduction dashboard displays an overview of the noise reduction performance of Cisco Crosswork Situation Manager.
This dashboard shows statistics including the number of accumulated events reduced into alerts and Situations over a period of time, the percentage reduction of events to alerts, the percentage reduction of alerts to Situations and the overall reduction.
To edit the dashboard, click the header of any panel and edit the statistic endpoint or add a query. Alternatively, click Add Row at the bottom of the screen.
The Individual Stats Overview dashboard allows you to view and compare statistics for multiple Cisco Crosswork Situation Manager users.
The dashboard includes Situation metrics, MTTA and MTTR, user activity, Situation stats by user, user performance overview and user activity overview.
By default, Cisco Crosswork Situation Manager collects statistical data for this dashboard for all users with the Operator role. You can add the 'collect_insights' permission to other roles if you want to include other users in the dashboard. See Role Permissions for more information.
The Individual User Deep Dive dashboard provides an in-depth summary of different performance metrics for a single user.
The dashboard includes a user Situation metric overview, a user activity overview, a user performance overview and Situation stats by user. You can view a breakdown of specific statistics such as the number of Situations the user has acknowledged, assigned, closed, reassigned and how many they have open. You can also see which ChatOps tools the user has executed, the number of comments they have made, the number of invitations they have received, the number of alerts they have marked with PRC and the individual's MTTA and MTTR.
By default, Moogsoft AIOps collects statistical data for this dashboard for all users with the Operator role. You can add the 'collect_insights' permission to other roles if you want to include other users in the dashboard. See Role Permissions for more information.
You can add and configure custom dashboards to display different Cisco Crosswork Situation Manager statistics in Grafana. For more information see the Grafana documentation. To create a dashboard, follow these steps:
· Log in to your Grafana instance.
· Click + and Dashboard.
· Configure the dashboard to meet your requirements. For more information on the available API endpoints and statistics see the Stats API.
Once created, statistics from Cisco Crosswork Situation Manager appear in the dashboard.
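A custom panel's data source query is ultimately a Stats API request. The sketch below only shows how such a request URL is assembled; the endpoint name and parameters are placeholders, so look up the real names in the Stats API reference:

```python
from urllib.parse import urlencode

# Placeholder sketch: the endpoint name below is hypothetical.
base = "https://moog.example.com"
endpoint = "/example/statsEndpoint"          # hypothetical endpoint
params = {
    "auth_token": "YOUR_TOKEN",              # session token placeholder
    "from": 1577836800,                      # start of the query window
    "to": 1577923200,                        # end of the query window
}

url = base + endpoint + "?" + urlencode(params)
print(url)
```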
You can enable and disable the collection of different statistics for Insights. See Grafana Dashboards for more details.
Navigate to Settings > System > Insights to configure statistics collection. Available collections that populate the default Grafana dashboards include:
· Ops Insights
· Noise Reduction Insights
· Team Ops Insights
· Team Performance Insights
· Individual Performance Insights
You can also enable and disable Team Room overview charts.
All collections are active by default. Uncheck a collection if you do not want to use that particular dashboard. You might want to disable collections and their related processes if you are only interested in retrieving a specific set of statistics or you want to keep the load on your system to a minimum.
Access the following helpful videos on reports and dashboards in Moogsoft University.
As an Administrator, you can set up tools that enable operators to troubleshoot and diagnose Situations.
You can configure server tools to execute utilities on a remote host. These can be generic tools, or specific to Situations or alerts.
You can also set up client tools, to retrieve or send information to a client, to help diagnose problems in Situations or alerts.
Cisco Crosswork Situation Manager provides a number of default hotkeys but you can set up additional hotkey shortcuts to make navigation easier for Cisco Crosswork Situation Manager users.
You can also set up ChatOps shortcuts to enable users to run tools from the Collaborate tab in the Situation Room. This gives Cisco Crosswork Situation Manager users quick access to the available tools.
Generic server tools in Cisco Crosswork Situation Manager are tools that allow a user to execute a utility on a remote host.
These tools specify a command that is run using ToolRunner, which is configured to connect to the remote host. The command can be anything that is executable from the Linux command line: for example, ping, cat, or a custom bash script. See Configure Tool Runner for more information.
In Cisco Crosswork Situation Manager, the generic server tools managed here are available from the Situation Room ChatOps feature. See Take Additional Actions for details.
The steps below describe how to create a generic server tool and its command. Any arguments required are defined by the user when the tool is run.
You can make tools available to all users, specified teams, specified roles, or yourself only.
To create a new generic server tool:
1. Click Generic Server Tools in the Tools section of the Settings tab.
2. Click + to create a new tool.
3. Fill in the available fields to define the tool:
Field | Input | Description
Name | String (Required) | Name for the generic server tool (up to 100 characters). This appears in ChatOps when accessing the tool.
Description | String | Text description of the generic server tool.
Command | String | The file path of the command. This must be an accessible path on the host system. The host system and access information are defined in the Tool Runner servlet.
Run For | Boolean + String | Select a duration with the spin box (minimum of 5 seconds). This sets how long the tool is allowed to run before it is stopped. If no time is set, the tool runs until it completes (or indefinitely).
4. On the Shared With tab, select whether you want to share this tool with everyone, specific teams, specific roles or only yourself. You must have the permission share_filters_public to share a tool with all users. You must have the permission share_filters_teams to share a tool with specific teams. See Manage Roles for more information.
5. If you share the tool with specific teams or specific roles, add the teams or roles you want to share it with to the list below.
6. Click Save Changes to add the tool to the list of generic server tools in the left-hand pane.
A generic server tool with the following command runs the script myTests.sh which is located on the remote host at the path /home/moog/bin, using remote host access information defined in the ToolRunner servlet:
/home/moog/bin/myTests.sh
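The script itself can be anything executable on the remote host. A hypothetical myTests.sh might look like the following; the name and contents are illustrative only:

```shell
#!/bin/sh
# Hypothetical diagnostic script; the real myTests.sh is whatever you need.
run_diagnostics() {
    echo "Running diagnostics on $(hostname)"
    uptime
    df -h /tmp | tail -n 1
    echo "Diagnostics complete"
}
run_diagnostics
```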
Situation server tools in Cisco Crosswork Situation Manager are tools that enable a user to execute a utility on a remote host.
These tools specify a command that is run using ToolRunner, which is configured to connect to the remote host. The command can be anything that is executable from the Linux command line: for example, ping, cat, or a custom bash script. See Configure Tool Runner for more information.
· The command can be anything you can run on the host in a Linux terminal, such as an operating system built-in like ping, or your own script.
· The arguments are extracted from Situation attributes by prefixing the attribute name with '$', such as $description for the Situation description.
In Cisco Crosswork Situation Manager, the Situation server tools managed here are only available from ChatOps in the Situation Room. See Take Additional Actions in the Operator Guide for more information.
The steps below describe how to create a Situation server tool, its availability filter, command and arguments. You can also create Situation server tools via a command prompt.
You can make tools available to all users, specified teams, specified roles, or yourself only.
To create a new Situation server tool:
· Click Situation Server Tools in the Tools section of the Settings tab.
· On the Tool tab, click the + to create a new tool.
· Fill in the available fields to define the tool:
Field | Input | Description
Name | String (Required) | Name for the Situation server tool (up to 100 characters). This appears in ChatOps when accessing the tool.
Description | String | Text description of the Situation server tool.
Context Filter | Filter | Click the pencil icon to create a filter for specific criteria which Situations must match for this tool to be available.
Command | String (Required) | File path of the command. This must be an accessible path on the host system. The host system and access information are defined in the Tool Runner servlet.
Arguments | String | The specific input for the command, which can use Situation attributes. To use Situation attributes, type '$' as a prefix and select the attribute you want from the drop-down list.
Run For | Boolean + Integer | If you select this check box, you can define the number of seconds the tool runs for. The minimum value for this field is 5 seconds.
To prevent substitution with potentially malicious commands, arguments are escaped using a backslash.
For example:
Command: echo
Argument: $args, where $args is echo_something; rm file.txt
This results in the following command being executed:
echo echo_something\; rm file.txt
The semi-colon is escaped to prevent the rm command from being run.
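As a rough sketch of the escaping behavior described above (illustrative Python only, not Moogsoft's implementation; the metacharacter list is an assumption for this example):

```python
# Backslash-escape shell metacharacters in a substituted argument so
# that, for example, ';' cannot chain an extra command onto the tool.
SHELL_METACHARACTERS = ";|&$`<>(){}!"

def escape_argument(arg):
    """Return arg with each shell metacharacter backslash-escaped."""
    return "".join("\\" + ch if ch in SHELL_METACHARACTERS else ch
                   for ch in arg)

escaped = escape_argument("echo_something; rm file.txt")
print("echo " + escaped)  # echo echo_something\; rm file.txt
```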
· On the Shared With tab, select whether you want to share this tool with everyone, specific teams, specific roles or only yourself. You must have the permission share_filters_public to share a tool with all users. You must have the permission share_filters_teams to share a tool with specific teams. See Manage Roles for more information.
· If you share the tool with specific teams or specific roles, add the teams or roles you want to share it with to the list below.
· Click Save Changes to add the tool to the list of Situation server tools in the left-hand pane.
The screenshot below shows a Situation server tool called 'LogSitnDetails' with the Command: /home/moog/bin/logger.sh.
This tool runs the script logger.sh on the remote host, which logs Situation details to a file. The details logged are the Situation ID, created time, description and total number of alerts, which are defined with the Arguments: $sig_id $created_at $description $total_alerts.
Each Situation attribute name is prefixed with $. The Context Filter makes this tool available only for Closed Situations.
You can create Situation server tools via a command prompt. This is useful for efficient creation of multiple tools using a scripted process, for example:
· Open a new Terminal window on the Cisco Crosswork Situation Manager system and type the following:
moog_add_sitn_server_tool
· Type any flags and arguments for the tool settings. See the examples below.
Note
Cisco Crosswork Situation Manager command line tools are located here:
$MOOGSOFT_HOME/bin/utils
To display the help information for this tool, type the following and press Enter:
moog_add_sitn_server_tool
Use a double-dash prefix "--" to define all following text as arguments. This ensures arguments are not misinterpreted as flags.
For example, "-- -c" to define the argument "-c", which would otherwise be interpreted as the command flag.
If included, the Run For time must be 5 seconds or longer.
· When you have defined the tool, press Enter. If successful, "Tool was added" appears.
Once the UI is refreshed, newly created tools appear in the Situation server tools configuration window.
The following example creates a Situation server tool to return the Situation ID:
moog_add_sitn_server_tool --name "Sitn Id" --desc "Get the Situation ID" --cmd echo --args "Situation ID = \$sig_id" --run_for 42
This command creates a tool with the following settings:
· Name: Sitn Id (--name "Sitn Id").
· Description: Get the Situation ID (--desc "Get the Situation ID")
· Context Filter: none
· Command: echo (--cmd echo)
· Arguments: display 'Situation ID = <ID>' (--args "Situation ID = \$sig_id"). The backslash is required to escape the '$' so the shell does not expand it as an environment variable.
· Run for: 42 seconds (--run_for 42)
The following example creates a Situation server tool that pings the server five times:
moog_add_sitn_server_tool -d "five pings" -m "sig_id<10" -c ping -a -- -c 5
This command creates a tool with the following settings:
1. Description: five pings (-d "five pings")
2. Context Filter: ID < 10 (-m "sig_id<10")
3. Command: ping (-c ping)
4. Arguments: ping five times (-- -c 5). The argument starts with -c which is itself a tool flag. Therefore the "--" double-dash prefix is used to interpret -c 5 as an argument, and not a flag.
5. Run for: no time set (no -r flag and argument)
6. Name: ping. The name is not defined here (no -n flag and argument) so the command is used as the name by default.
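For the scripted bulk creation mentioned earlier, one approach is a small wrapper that prints the moog_add_sitn_server_tool invocations for review before you run them. The tool names, descriptions and entry format below are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch: generate (rather than execute) one
# moog_add_sitn_server_tool command per entry, for review first.
# Entry format (made up for this example): name|description|command
TOOLS='Sitn Id|Get the Situation ID|echo
Sitn Desc|Get the description|echo'

generate_tool_commands() {
    printf '%s\n' "$TOOLS" | while IFS='|' read -r name desc cmd; do
        [ -z "$name" ] && continue
        printf 'moog_add_sitn_server_tool --name "%s" --desc "%s" --cmd %s\n' \
            "$name" "$desc" "$cmd"
    done
}

# Review the output, then pipe it to sh once you are happy with it.
generate_tool_commands
```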
Alert server tools allow you to execute a utility on a remote host. You define the host when you run the tool. You can control which tools are available for different types of alerts.
There are two ways of running alert server tools:
1. If you run them via ChatOps shortcuts, they always run on the ToolRunner host.
2. If you run them from the Tools menu in an Alert List, they run on the host of your choice.
These tools specify a command that is run using ToolRunner, which is configured to connect to the remote host. The command can be anything that is executable from the Linux command line: for example, ping, cat, or a custom bash script. See Configure Tool Runner for more information.
Alert server tools pass arguments to utilities based upon alert attributes. For example, testing the reachability (ping) of hardware using the source attribute of the alert.
You can make tools available to all users, specified teams, specified roles, or yourself only.
To create a new alert server tool:
1. Click Alert Server Tools in the Tools section of the Settings tab.
2. On the Tool tab, click the + icon to create a new tool.
3. Fill in the available fields to define the tool:
Field | Input | Description
Name | String (Required) | Name for the alert server tool (up to 100 characters).
Description | String | Text description of the alert server tool.
Alert Type Filter | String | Alert types for which the alert server tool is available. Enter .* to make it available for all alert types.
Filter Using Regex | Boolean | If you select this check box, the Alert Type Filter uses a regular expression.
Command | String (Required) | Command to carry out on alerts. The host system is the ToolRunner host if you run the tool via a ChatOps shortcut; you can define it yourself when running the tool from the Tools menu on an Alert List.
Arguments | String | Specific input for the command.
Run For | Boolean + Integer | If you select this check box, you can define the number of seconds the tool runs for. The minimum value for this field is 5 seconds.
4. On the Shared With tab, select whether you want to share this tool with everyone, specific teams, specific roles or only yourself. You must have the permission share_filters_public to share a tool with all users. You must have the permission share_filters_teams to share a tool with specific teams. See Manage Roles for more information.
5. If you share the tool with specific teams or specific roles, add the teams or roles you want to share it with to the list below.
6. Click Save Changes to add the tool to the list of alert server tools in the left-hand pane.
The following screenshot shows an alert server tool that tests the reachability of the source alert and returns the results.
The Command ping is used with the Arguments $source and -c5, which specify the source, from the alert attribute, and the number of times to ping (five).
The Alert Type Filter uses a regular expression '.*' to make the tool available for all alerts.
You can create client tools in Cisco Crosswork Situation Manager to execute actions through a specified URL. The client tools can send Situation and alert data to the URL.
Client tools can return HTTP response data, including the HTTP response status and the HTTP message body. See HTTP Response for more information. Showing this more detailed response in the UI can surface useful information. For example, a tool that links to an external trouble ticket system (via its URL) can open a new ticket using data from the selected Situation.
There are two client tool types: alert client tools and Situation client tools.
You can make tools available to all users, specified teams, specified roles, or yourself only.
To set up a new alert client tool or a new Situation client tool, follow these steps.
· Click Alert Client Tools or Situation Client Tools in the Tools section of the Settings tab.
· On the Tools tab, click the + icon to create a new tool.
· Complete the fields to define the tool:
Field | Input | Description
Name | String (Required) | Name for the Client Tool (up to 100 characters).
Description | String | Text description of the tool.
Context Filter | Filter | Click the pencil icon to create a filter for specific criteria which the alerts or Situations must match for this tool to be available.
· Select whether you want to create a URL Tool or Merge Custom Info.
When creating a client tool, you can enter Situation or alert attributes, and prompt variables, in the URL, Custom Info, and URL Formatted Content fields. For example, you can enter $description for the contents of the Situation or alert description field.
· To create a tool that uses a URL, complete the following fields:
Field | Input | Description
Show All Response Data | Boolean | If enabled, the tool returns a more detailed response in the UI, including the response status and data.
HTTP Method | GET / POST | Select GET if the tool retrieves information or POST if the tool sends information.
Open Window | Boolean | If enabled, this opens a new browser window when using the GET HTTP method. Disables the Show All Response Data option. If you are creating a tool to perform a GET request against a Graze API endpoint, you cannot open the response in a new window; select Show All Response Data instead.
URL | String | URL of the client tool. You can add prompt variables: type '$' and select an existing variable or enter a new variable. If you enter a new prompt variable, see Add prompt variables for details. If you are creating a tool to connect to the Graze API, the URL must start with /graze. For example, use /graze/v1/getAlertDetails?alert_id=$alert_id instead of https://example.com/graze/v1/getAlertDetails?alert_id=$alert_id.
URL Formatted Content | String | Payload data to be posted when the tool is run, when using the POST HTTP method. The payload data must be URL encoded and can include Situation and alert attributes and prompt variables. To add a prompt variable, type '$' and select an existing variable or enter your own variable. If you enter a new prompt variable, see Add prompt variables for details.
· To create a tool that uses custom info fields, enter valid JSON for the custom info you want the tool to add. The JSON can include Situation and alert attributes and prompt variables. For example, the following JSON adds a set of custom info called "TPS data" that contains a string "From system", the Situation ID and the timestamp for when the Situation was created:
{ "TPS data": [ "From system", "$sig_id", "$created_at" ] }
To add a prompt variable, type '$' and select an existing variable or enter your own variable. If you enter a new prompt variable, see Add prompt variables for details.
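The '$' attribute substitution in the Custom Info template behaves much like ordinary template substitution. As an illustrative sketch (not the product's implementation), with made-up attribute values:

```python
import json
from string import Template

# Hypothetical sketch of '$' attribute substitution in a Custom Info
# template -- not how Cisco Crosswork Situation Manager does it.
template = Template('{ "TPS data": [ "From system", "$sig_id", "$created_at" ] }')

# Example attribute values for a Situation (made up for illustration)
rendered = template.substitute(sig_id="1234", created_at="1577836800")
custom_info = json.loads(rendered)
print(custom_info)  # {'TPS data': ['From system', '1234', '1577836800']}
```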
· Prompts appear in the Prompts table if you have added them in the URL, URL Formatted Content, or Custom Info fields. See Add prompt variables for details.
· On the Shared With tab, select whether you want to share this tool with everyone, specific teams, specific roles or only yourself. You must have the permission share_filters_public to share a tool with all users. You must have the permission share_filters_teams to share a tool with specific teams. See Manage Roles for more information.
· If you select to share the tool with specific teams or specific roles, add the teams or roles you want to share the tool with to the list below.
· Click Save Changes to add the tool to the list of alert client tools or Situation client tools in the left hand pane.
Prompt variables open a message box when the tool is run, prompting the user to type text, a number, or select from a list.
To add a new prompt variable:
· In the URL, URL Formatted Content, or Custom Info fields, enter prompt variables in the following format:
$<prompt_name>
The prompt name cannot be any of the existing Situation or alert attribute names. The prompt name appears in the Prompts table.
· To edit the prompt, double-click on it, or select it and then click Edit Prompt. A new window displays.
· Enter a Display Name. This is what appears in the prompt message.
· Select whether the user will be prompted for text, a number or a list.
If you select Text or Number, enter the default value and minimum and maximum length of the prompt. Numbers can be integers or floating point, in which case they are truncated to two decimal places.
If you select List, enter the default value and add the available list options.
· Click Save Changes. The new tool appears in the list in the left-hand pane.
You can configure client tools to change custom info fields. For example, you can configure a tool to raise a ticket on a third party system to prompt for entries of pre-defined (custom info) values to provide more information in the ticket.
To create a client custom info tool with a prompt variable, select the Merge Custom Info option:
In this example, the custom info is:
{ "LEVEL": $prompt1 }
The screenshot below shows how the prompt variable settings can be configured:
The client tools can be accessed from the following areas:
· Alert Client Tools: On the Alert Tools menu (see Alerts Overview, right-click menu), or via "Situation Alerts" in a Situation Room.
· Situation Client Tools: From the Tools menu in the Situation Room, or via ChatOps in the Collaborate tab.
If you want to run client tools using Safari, go to Safari > Preferences > Security and uncheck 'Block pop-up windows' as this is checked by default.
To run a tool 'Set LEVEL data for TPS':
· Go to an alert, right-click or click Tools > Tools > Set LEVEL data for TPS.
· The following prompt appears:
· Click OK to run the tool.
Your monitored environment and business practices are constantly evolving, and your alert clustering settings need to evolve with them. This section discusses what you can do to detect changes and update your clustering logic.
You may have produced a perfect Situation design for a given environment, but environments and requirements change every day in the busy ITOps world. With these changes, your entropy threshold may become obsolete and, as a result, your Entropy filter may block an important alert from becoming a Situation. If operators do not want to miss these low-entropy alerts, you can forward them to a separate 'low entropy alerts' Cookbook. For example, you could cluster these alerts into Situations based on the same hostname. Ask the teams to examine the content of these Situations periodically to make sure nothing important has been missed. If an operator identifies a loose alert that has low entropy but should actually have been part of a Situation, whether on its own or as part of a wider Situation, you can do a few things:
· You could create a specific Cookbook Recipe just for that alert type. Having a single alert Situation is a valid use case.
· Look at lowering the entropy threshold for that particular event stream.
· Create a list of priority words that bias the entropy of the corresponding alerts. Events containing a word in the priority words list are always given an entropy value of 1, so choose priority words with care: they should be very uncommon, because a widely used word will bump up the entropy for all alerts containing it.
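The priority-word rule above can be sketched as follows (illustrative only, not the Events Analyser's actual code; the word list is hypothetical):

```python
# Any event whose text contains a priority word gets entropy 1.0;
# everything else keeps its computed entropy. Words are hypothetical.
PRIORITY_WORDS = {"meltdown", "fiber-cut"}   # choose rare words with care

def effective_entropy(description, computed_entropy):
    """Return 1.0 if the description contains a priority word."""
    words = set(description.lower().split())
    if words & PRIORITY_WORDS:
        return 1.0
    return computed_entropy

print(effective_entropy("core fiber-cut on link 7", 0.31))  # 1.0
print(effective_entropy("link flap detected", 0.31))        # 0.31
```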
This is also a good way to validate your Situation design after the initial configuration. During the tuning phase, check if any of the loose alerts (alerts that have not been clustered into any Situations) should actually be creating Situations on their own or be part of existing Situations created at the time.
If Situations are not actioned and are left open with no intention of ever closing them, your system can end up overrun with aged Situations still in an open state. This is not only distracting for operators but also has memory and performance implications, because all active Situations are retained in memory for Merge and Resolve activities. For alerts, leaving them open means they never get the chance to renew their enrichment.
You can set up auto-close on unworked alerts and Situations of a certain age to avoid such problems.
Very high count alerts impact performance when data is moved from the active to the historic database during the database split. Implement an auto-close in the Alert Rules Engine so that overly noisy alerts are closed automatically once they exceed a certain count limit.
See below for a sample setup in the Alert Rules Engine (ARE). In this case, alerts with a count higher than 10,000 are transitioned to the CloseAlert state. Once an alert enters that state, it is processed by the closeAlert function in ARE.
The benefit of using ARE instead of the usual auto-close rule is that the latter requires the alert to be older than a certain period (a minimum of 5 minutes, given how often the task runs), whereas ARE closes the alert instantly once it reaches the count limit.
Note
This function will only close alerts that are not part of any active Situations.
In AlertRulesEngine.js:
// ...
// ARE function to close an alert that is not part of any active Situations
function closeAlert(alert, associated) {
    var alert_id = alert.value("alert_id");
    // Only close the alert if it does not belong to any active Situation
    if (alert.value("active_sig_list").length === 0) {
        var closed_flag = moogdb.closeAlert(alert_id);
        if (closed_flag) {
            logger.debug("Alert closed by ARE. Alert_id: " + alert_id);
        } else {
            logger.warning("closeAlert: Alert failed to close by ARE. Alert_id: " + alert_id);
        }
    } else {
        // Alert is in an active Situation; pass it along unchanged
        alert.forward(this);
    }
}
This topic describes how to upgrade Cisco Crosswork Situation Manager to v8.0 from any of the following versions:
· v7.1.x
· v7.2.x
· v7.3.x
For information on how to upgrade from other versions, see Releases.
For instructions on how to install a distributed high availability (HA) configuration, see HA Installation.
Your upgrade path depends on your preferred mode of deployment:
· RPM: Use this method if you have root access to your Cisco Crosswork Situation Manager server(s) and you do not want to change the default installation locations.
· Offline RPM: If you have root access but your Cisco Crosswork Situation Manager servers do not have access to the internet, see "Prepare for an offline upgrade" in RPM - Prepare to upgrade.
Your Cisco Crosswork Situation Manager deployment is broken up into a set of roles. A role is a functional entity containing components that should always reside on the same server:
1. UI: Nginx, Apache Tomcat, UI integrations.
2. Core: Elasticsearch, Moogfarmd, RabbitMQ, Events Analyser.
3. Databases: MySQL, Cisco Crosswork Situation Manager databases.
4. Data ingestion: Server side LAMs.
This process enables you to upgrade the components in each role, whether your Cisco Crosswork Situation Manager system is distributed on several servers or installed on a single host.
Note
The implementation of a new Topology API means any old topologies are removed as part of the upgrade. After the upgrade, recreate your topologies in v8.0.x and update any Recipes that use topology in any way before events are sent into Moogfarmd.
Hop limit has been removed from Recipe configurations in Cisco Crosswork Situation Manager v8.0. You can reconfigure a hop limit if you are using a named or inferred Topology to filter a Recipe.
To perform the RPM upgrade to Cisco Crosswork Situation Manager v8.0.x, complete the steps in the following documents, in this order:
4. Upgrade database components.
5. Upgrade data ingestion components.
6. Final steps and validate the upgrade.
8. Troubleshooting.
To minimize the amount of downtime required for the upgrade process, follow this process:
· Disable historic data retention.
· Perform the upgrade according to your chosen method of deployment.
· Re-enable the Historic Data Utility.
· Let the historic data retention utility process any alerts that have accumulated. This should not take long if the process has been disabled for only a few hours.
Cisco periodically releases add-ons to extend and enhance the core Cisco Crosswork Situation Manager functionality: for example, new Workflow Engine functions, new Workflow Engines, or Integrations tiles. All add-on releases are cumulative and include the fixes from previous releases.
Once you have finished upgrading or installing Cisco Crosswork Situation Manager, you should install the Cisco Crosswork Situation Manager add-ons to ensure you have the latest version.
See Install Cisco Add-ons for more information on how to install the Cisco Crosswork Situation Manager add-ons.
Follow these steps before you perform an RPM upgrade to Cisco Crosswork Situation Manager v8.0.x from v7.0.x, v7.1.x, v7.2.x, or v7.3.x.
Refer to Upgrade Cisco Crosswork Situation Manager for general information and upgrade instructions for other components and versions.
Note
Cisco has provided an optional enhanced Content Security Policy (CSP) as part of this release. If you enable the CSP, you must follow some additional steps to allow access to external domains. If you want to access the UI with the Safari web browser, there are additional steps to configure Cisco Crosswork Situation Manager for use with Safari. For more information see RPM - Prepare to Upgrade.
The upgrade process truncates the incremental_token_data table in the moog_reference database. If you are using entropy in your deployment, it is important to initialize a full Events Analyser job towards the end of the upgrade process. See the Finalize and validate the upgrade step for details.
To prepare for an offline upgrade, where the Cisco Crosswork Situation Manager packages reside in a local Yum repository, complete the steps in the document Cisco Crosswork Situation Manager - Offline RPM pre-installation and then continue with the rest of the steps in this document.
To back up the existing system:
1. Back up $MOOGSOFT_HOME, particularly the following folders:
— $MOOGSOFT_HOME/config/
— $MOOGSOFT_HOME/bots/
— $MOOGSOFT_HOME/etc/
— $MOOGSOFT_HOME/contrib/
— For RPM deployments: /var/lib/moogsoft/moog-data/
— For Tarball deployments: $MOOGSOFT_HOME/moog-data/
2. Take a snapshot (for VMs).
3. Back up MySQL.
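The backup steps above can be scripted; the following is a minimal sketch that assumes a default RPM layout, and the `backup_moog` helper name and destination paths are illustrative only:

```shell
# Minimal backup sketch (paths assume a default RPM layout; the
# destination used in the examples below is illustrative).
backup_moog() {
  local dest="$1"
  mkdir -p "$(dirname "$dest")"
  # Archive the key $MOOGSOFT_HOME folders listed above.
  tar -czf "$dest" -C "$MOOGSOFT_HOME" config bots etc contrib
}

# Example usage (the MySQL dump assumes the root account):
#   backup_moog /var/backups/moogsoft-pre-8.0.tgz
#   mysqldump -u root -p --all-databases > /var/backups/moog-mysql-pre-8.0.sql
```

For RPM deployments, also archive /var/lib/moogsoft/moog-data/ as noted above.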
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x or v7.1.x.
If you are using any of the following UI integrations and are upgrading from v7.0.x or v7.1.x, do the following before you start the upgrade:
· Note the integrations' connection and configuration information.
· Use the UI to uninstall the integrations.
The post-upgrade steps have information on how to reconfigure the integrations. You must do this because of UI changes in Cisco Crosswork Situation Manager v8.0.x.
· AWS CloudWatch
· Cherwell
· JIRA Service Desk and JIRA Software
· JMS
· New Relic
· Remedy
· ServiceNow
· SevOne
· Slack
· SolarWinds
· VMware vCenter and vSphere
· vRealize Log Insight
· WebSphere MQ
· xMatters
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.2.x.
You must reinstall the xMatters integration after upgrading. Make a note of all configuration settings before you start the upgrade process.
To continue with the upgrade, see RPM - Upgrade UI components.
Follow these steps to perform an RPM upgrade on the Cisco Crosswork Situation Manager UI components to v8.0.x from v7.1.x, v7.2.x, or v7.3.x:
· Nginx
· Apache Tomcat
· UI integrations
These components should always reside on the same server.
Refer to Upgrade Cisco Crosswork Situation Manager for general information and upgrade instructions for other components and versions.
Run the following command as root to stop the default tomcat service on any servers with the moogsoft-ui package installed and where the Apache Tomcat service is running. Change the service name if you are not using the default.
service apache-tomcat stop
Run the following commands as root to query/stop the default LAM/integrations services on any servers with the moogsoft-integrations/moogsoft-integrations-ui package installed and where the LAMs/integrations are running.
Check for running LAM and integration processes and stop them using the relevant service scripts:
systemctl status | grep lamd
service <lam_service_name> stop
Run the following command to stop any remaining active LAM and integration processes:
kill -9 $(ps -ef | grep java | grep _lam | awk '{print $2}') 2>/dev/null
Note
Complete the Elasticsearch steps below if you have installed Elasticsearch on the same server as your UI components. Cisco recommends that you move Elasticsearch to your Core server (the server running Moogfarmd) to optimize index performance.
Run this command on the moogsoft-search/Elasticsearch server to remove the old Elasticsearch indexes:
curl -XDELETE 'http://localhost:9200/alerts/' && curl -XDELETE 'http://localhost:9200/situations/'
If the command completes successfully, the following message is displayed:
{"acknowledged":true}{"acknowledged":true}
Note
You can skip this section if you are following the 'Offline RPM' upgrade process, as the Elasticsearch package is obtained from the local Yum repository instead.
Run the following command to modify the Elasticsearch Yum repository to point to v6 instead of v5:
sed -i 's/5.x/6.x/g' $(grep 'artifacts.elastic' /etc/yum.repos.d/* | awk -F: '{ print $1 }' | sort -u | head -1)
To upgrade Cisco Crosswork Situation Manager, run the upgrade command below that corresponds to your chosen upgrade mechanism.
If you have already run this step on the current host as part of this upgrade (for example, during a single-host upgrade), you can skip it.
· If you are using a remote or offline Yum repository, run the following command on every host where a Cisco Crosswork Situation Manager RPM package is installed:
yum -y upgrade $(rpm -qa --qf '%{NAME}\n' | grep moogsoft | sed 's/$/-8.0.0.1/')
· If you are using downloaded RPM files on a host, run the following command from the location where the files are installed:
yum -y upgrade moogsoft-*8.0.0.1*.rpm
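After the upgrade command completes, you may want to confirm that no moogsoft package was left at an older version. The `check_versions` helper below is an illustrative sketch; feed it the output of the `rpm -qa` query shown in the usage comment:

```shell
# Sketch: exit non-zero and name any package whose version does not
# match the expected release (8.0.0.1 here).
check_versions() {
  awk -v want="$1" '$2 != want { bad = 1; print "stale package:", $1 }
                    END { exit bad }'
}

# Example usage:
#   rpm -qa --qf '%{NAME} %{VERSION}\n' | grep moogsoft | check_versions 8.0.0.1
```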
Note
In Cisco Crosswork Situation Manager v7.3.x and 8.0.x, the Cookbooks, Tempus, and merge groups (default and custom) are imported into the database by default, enabling you to access and configure them via the UI and API. The migration occurs once when Moogfarmd is restarted.
A file_only_config=true flag has been added to the 7.3.x and 8.0.x versions of moog_farmd.conf that you can use to prevent the migration from taking place. If this flag is missing or is set to false, Moogfarmd attempts to perform the import when it starts.
Note
If the file_only_config flag is set to true, UI-based Cookbooks will not run.
The following moolets are no longer supported in v8.0.x and should be removed from the moog_farmd.conf file as part of the upgrade:
Sigaliser Classic
Nexus
Speedbird
AlertRootCause
Version specific config file differences:
v7.1.x-v7.2.x
$MOOGSOFT_HOME/config/system.conf
message_persistence is now enabled by default
$MOOGSOFT_HOME/config/security.conf
The 'Google' realm has been deprecated and removed
$MOOGSOFT_HOME/config/servlets.conf
The toolrunner servlet now optionally supports SSH key authentication as well as username/password-based authentication
$MOOGSOFT_HOME/config/moog_farmd.conf
alert_workflows.conf moolet has been added
enrichment_workflows.conf moolet has been added
event_workflows.conf moolet has been added
situations_workflows.conf moolet has been added
v7.2.x-v7.3.x
$MOOGSOFT_HOME/config/system.conf
New integration database property has been added: intdb_database_name
$MOOGSOFT_HOME/config/moog_farmd.conf
The entire sig_resolution block containing merge groups, retention_period, and so on, has been removed but is still supported as long as file_only_config is set to true
alert_root_cause.conf moolet has been removed and is no longer supported
nexus.conf moolet has been removed and is no longer supported
speedbird.conf moolet has been removed and is no longer supported
sigaliser.conf moolet has been removed and is no longer supported
cookbook.conf has been removed but is still supported as long as file_only_config is set to true
tempus.conf has been removed but is still supported as long as file_only_config is set to true
v7.3.x-v8.0.x
$MOOGSOFT_HOME/config/system.conf
ElasticSearch now supports basic authentication
$MOOGSOFT_HOME/config/security.conf
There is a new global_settings block which allows control of the CSRF protection feature
$MOOGSOFT_HOME/config/servlets.conf
It is now possible to configure the toolrunner to run on a port other than 22 using toolrunnerport property
$MOOGSOFT_HOME/config/moog_farmd.conf
alert_inform_workflows.conf moolet has been added
situation_inform_workflows.conf moolet has been added
The 'modules' block has been removed; it only contained Topology-related functionality, which has been deprecated in the v8.0.x release because the Topology feature now works differently.
Compare and manually merge the .rpmsave versions of files with the new versions of those files, adding any new properties to the older versions. You can skip this step if you have already completed it as part of this upgrade process on the current host.
Do not copy the config and bot files from the previous version over the new versions of those files, as they are not always forward-compatible; some config/bot lines must be added for the new version to work.
To find files that have been changed, moved or deleted, run these commands:
find $MOOGSOFT_HOME -name '*.rpmsave'
find /etc/init.d/ -name '*.rpmsave'
For example, the following command displays the differences in the new version of the system.conf file:
diff -u $MOOGSOFT_HOME/config/system.conf $MOOGSOFT_HOME/config/system.conf.rpmsave
Follow this process to merge the file differences:
· Rename the new versions of the files, without the .rpmsave extension, to end with .bak.
· Merge the .rpmsave file with the new .bak file by adding new properties/configuration where needed (from the new version of the file into the old version), so the structure matches the new version of the file.
· Rename the .rpmsave file to remove the .rpmsave extension.
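The rename steps above can be sketched for a single file as follows; the `prepare_merge` name is illustrative, and the actual merging of properties between the two files remains a manual editing step:

```shell
# Sketch: set up one config file for manual merging.
prepare_merge() {
  local f="$1"              # e.g. $MOOGSOFT_HOME/config/system.conf
  mv "$f" "$f.bak"          # keep the new RPM-shipped version as .bak
  mv "$f.rpmsave" "$f"      # promote the preserved old version
  # Now manually copy any new properties from "$f.bak" into "$f".
}
```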
Install the latest Java packages using the command below. The yum upgrade of the moogsoft packages may already have done this, and may even have installed a newer version of Java than the one shown; in that case, yum reports that the packages are already installed at the latest version. Continue with the subsequent steps regardless.
yum -y install java-11-openjdk-11.0.7.10 \
java-11-openjdk-devel-11.0.7.10 \
java-11-openjdk-headless-11.0.7.10
On each server with a Cisco Crosswork Situation Manager RPM package installed, run the following command to replace the /usr/java/latest symlink so it points at the JDK11 JAVA_HOME directory:
source $MOOGSOFT_HOME/bin/utils/moog_init_functions.sh
If there are non-Cisco Crosswork Situation Manager packages on this server that do not support JDK11, you must update those applications to use a different JAVA_HOME symlink (not /usr/java/latest).
You can skip this step if you have already completed this step as part of this upgrade process on the current host.
To confirm this has worked, run the following command:
$JAVA_HOME/bin/java -version
It should return (as a minimum version):
openjdk version "11.0.7" 2020-04-14 LTS
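If you script this check, the major version can be extracted from that output; a minimal sketch, assuming the version string format shown above (the `java_major` name is illustrative):

```shell
# Sketch: pull the major version out of `java -version` style output.
java_major() {
  sed -n 's/.*version "\([0-9]*\)\..*/\1/p'
}

# Example usage (java -version writes to stderr, hence 2>&1):
#   $JAVA_HOME/bin/java -version 2>&1 | java_major
```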
You can also use the 'alternatives' command to point the system 'java' shortcut to the new version:
alternatives --config java
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x or v7.1.x.
The MySQL connector is upgraded in this release.
The original connector may be used by the External Database module in the current deployment, configured in $MOOGSOFT_HOME/config/moog_external_db_details.conf.
If this file is configured in the current deployment, update it to reference the new mariadb connector here: $MOOGSOFT_HOME/lib/cots/mariadb-java-client-2.4.0.jar.
You can skip this step if you have already completed this step as part of this upgrade process on the current host.
Apache Tomcat is now run as the 'moogsoft' system user, which requires a change in ownership for the folders previously owned by Apache Tomcat.
Run the following commands to change ownership:
chown -R moogsoft:moogsoft /var/lib/moogsoft
chown -R moogsoft:moogsoft /var/run/apache-tomcat
chown -R moogsoft:moogsoft $MOOGSOFT_HOME/etc/saml
If SAML SSO is in use in the deployment, the IDP metadata file specified in $MOOGSOFT_HOME/config/security.conf needs to be readable by the 'moogsoft' system user. Use an appropriate chmod/chown command to ensure the readability and ownership is correct.
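One way to do that is sketched below; the metadata path in the usage comment and the 640 mode are illustrative, and the owner argument defaults to the 'moogsoft' user described above:

```shell
# Sketch: make the IdP metadata file owned by and readable for 'moogsoft'.
fix_saml_perms() {
  local metadata="$1"
  local owner="${2:-moogsoft:moogsoft}"
  chown "$owner" "$metadata"
  chmod 640 "$metadata"
}

# Example usage (path is illustrative -- use the one from security.conf):
#   fix_saml_perms "$MOOGSOFT_HOME/etc/saml/idp-metadata.xml"
```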
Cisco Crosswork Situation Manager v8.0.x ships with Apache Tomcat version 9.0.22 and includes changes to the Nginx configuration files.
Note
Cisco Crosswork Situation Manager v7.3 no longer runs Apache Tomcat as the 'tomcat' UNIX user. When you follow the instructions below, the new version of Apache Tomcat is deployed to run as the 'moogsoft' user instead. As more threads and processes are now used by the moogsoft UNIX system user, you may need to increase ulimits for this user.
Run the following commands in this section on the server with the moogsoft-ui RPM package installed on it.
· Stop Apache Tomcat on any servers where it is running:
service apache-tomcat stop;
ps -ef | grep java | grep tomcat | awk '{print $2}' | xargs kill -9 2>/dev/null
· Remove the existing Apache Tomcat:
rm -rf /etc/init.d/apache-tomcat;
rm -rf $APPSERVER_HOME
rm -rf /usr/share/apache-tomcat
· Back up the Nginx configuration files and any certificates: copy the files in /etc/nginx/ to another location before continuing. This folder is the default Nginx installation location.
· Deploy the new version of Apache Tomcat and Nginx using the command below. The script asks for a hostname; this hostname or IP must match the one used to access the instance in a browser:
$MOOGSOFT_HOME/bin/utils/moog_init_ui.sh -tfn
· If you made any changes to the original Apache Tomcat service script, apply the same changes to the new version.
· Update /etc/nginx/conf.d/moog-ssl.conf with the locations of any certificates used and then restart Nginx:
service nginx restart
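The Nginx backup step above can be sketched as a small helper; the `backup_nginx` name and the destination in the usage comment are illustrative:

```shell
# Sketch: copy an Nginx configuration tree (certificates included)
# to a safe location before redeploying.
backup_nginx() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  cp -a "$src"/. "$dest"/
}

# Example usage:
#   backup_nginx /etc/nginx /root/nginx-backup-pre-8.0
```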
Cisco has provided an optional enhanced Content Security Policy (CSP) as part of this release. CSP is a security standard introduced to prevent Cross Site Scripting (XSS) and other data injection attacks. For more information, see the Mozilla document on Content Security Policy.
The CSP is controlled by Nginx and is disabled by default. To enable it:
1. Edit the following file:
/etc/nginx/conf.d/moog-ui-headers.conf
2. Uncomment the line that starts with add_header Content-Security-Policy and save the file.
3. Restart Nginx:
service nginx reload
Note
If you enable the enhanced CSP you must follow the steps below to allow access to external domains. If you want to access the UI with the Safari web browser, you must follow the steps below to configure Cisco Crosswork Situation Manager for use with Safari.
Allow access to external domains
If you enable the enhanced CSP, the following features require additional configuration to allow access to external domains:
· Situation Room plugins to external domains
· Situation client tools to external URLs
To allow access to required external domains:
1. Edit the following file:
/etc/nginx/conf.d/moog-ui-headers.conf
2. Add a frame-src directive to the Content-Security-Policy header for the required domain. For example, run the following command to allow Google domains:
sed -i "s/add_header Content-Security-Policy\(.*\)\" always/add_header Content-Security-Policy\1; frame-src 'self' *.google.com\" always/" /etc/nginx/conf.d/moog-ui-headers.conf
3. Restart Nginx:
service nginx reload
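You can sanity-check the sed expression on a throwaway copy before touching the live file. The sample CSP header line below is illustrative, not the shipped file's exact contents:

```shell
# Sketch: apply the frame-src edit to a sample CSP header line.
f="$(mktemp)"
echo 'add_header Content-Security-Policy "default-src '\''self'\''" always;' > "$f"
sed -i "s/add_header Content-Security-Policy\(.*\)\" always/add_header Content-Security-Policy\1; frame-src 'self' *.google.com\" always/" "$f"
# Show the rewritten line; the frame-src directive should now be present.
grep "frame-src 'self' \*.google.com" "$f"
```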
Note
Cisco Crosswork Situation Manager allows access to Pendo and WalkMe domains by default.
Configure Cisco Crosswork Situation Manager for use with Safari
Due to a known issue in the Safari web browser, you must take additional steps if you've enabled the enhanced CSP and you want to access the UI with Safari:
· Edit the following file:
/etc/nginx/conf.d/moog-ui-headers.conf
· Add the following websocket URLs to the Content-Security-Policy section of the file. Substitute your hostname for <webhost>:
wss://<webhost>/moogpoller/ws
wss://<webhost>/integrations/ws/v1
You can update the configuration using a command similar to the following. Substitute your hostname for <webhost>:
sed -i.bak "s;connect-src 'self' app;connect-src 'self' wss://<webhost>/moogpoller/ws wss://<webhost>/integrations/ws/v1 app;g" /etc/nginx/conf.d/moog-ui-headers.conf
· Restart Nginx:
service nginx reload
To continue with the upgrade, see RPM - Upgrade Core components.
Follow these steps to perform an RPM upgrade on the Cisco Crosswork Situation Manager Core components to v8.0.x from v7.1.x, v7.2.x, or v7.3.x:
· Elasticsearch
· Moogfarmd
· RabbitMQ
· Events Analyser
These components should always reside on the same server.
Refer to Upgrade Cisco Crosswork Situation Manager for general information and upgrade instructions for other components and versions.
Run the following command as root to stop the default Moogfarmd service on any servers with the moogsoft-server package installed and where the Moogfarmd service is running. Change the service name if you are not using the default.
service moogfarmd stop
Ensure no more Moogfarmd processes are running with the following command. This will force-kill any remaining java processes which have failed to stop cleanly.
kill -9 $(ps -ef | grep java | grep farm | awk '{print $2}') 2>/dev/null
Run the following commands as root to stop the Events Analyser and Graph Analyser processes on any servers with the moogsoft-server package installed and where the Events Analyser or Graph Analyser is configured to run.
· Comment out the relevant lines in crontab:
(crontab -l | sed -e 's/^\(.*events_analyser.*\)$/#\1/') | crontab -
(crontab -l | sed -e 's/^\(.*graph_analyser.*\)$/#\1/') | crontab -
· Stop any active Events Analyser processes:
ps -ef | grep java | egrep 'events_analyser|graph_analyser' | awk '{print $2}' | xargs kill 2>/dev/null
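The inverse of the crontab edit above, which you will need when re-enabling the jobs later in the upgrade, can be sketched as a helper (the `uncomment_analyser` name is illustrative):

```shell
# Sketch: strip the leading '#' that the crontab commands above added
# to the events_analyser and graph_analyser lines.
uncomment_analyser() {
  sed -e 's/^#\(.*events_analyser.*\)$/\1/' \
      -e 's/^#\(.*graph_analyser.*\)$/\1/'
}

# Example usage:
#   (crontab -l | uncomment_analyser) | crontab -
```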
Note
Complete the Elasticsearch steps below if you have installed Elasticsearch on the same server as your Core components. Cisco recommends this in order to optimize index performance.
Run this command on the moogsoft-search/Elasticsearch server to remove the old Elasticsearch indexes:
curl -XDELETE 'http://localhost:9200/alerts/' && curl -XDELETE 'http://localhost:9200/situations/'
If the command completes successfully, the following message is displayed:
{"acknowledged":true}{"acknowledged":true}
Note
You can skip this section if you are following the 'Offline RPM' upgrade process, as the Elasticsearch package is obtained from the local Yum repository instead.
Run the following command to modify the Elasticsearch Yum repository to point to v6 instead of v5:
sed -i 's/5.x/6.x/g' $(grep 'artifacts.elastic' /etc/yum.repos.d/* | awk -F: '{ print $1 }' | sort -u | head -1)
To upgrade Cisco Crosswork Situation Manager, run the upgrade command below that corresponds to your chosen upgrade mechanism.
If you have already run this step on the current host as part of this upgrade (for example, during a single-host upgrade), you can skip it.
· If you are using a remote or offline Yum repository, run the following command on every host where a Cisco Crosswork Situation Manager RPM package is installed:
yum -y upgrade $(rpm -qa --qf '%{NAME}\n' | grep moogsoft | sed 's/$/-8.0.0.1/')
· If you are using downloaded RPM files on a host, run the following command from the location where the files are installed:
yum -y upgrade moogsoft-*8.0.0.1*.rpm
Note
In Cisco Crosswork Situation Manager v7.3.x and 8.0.x, the Cookbooks, Tempus, and merge groups (default and custom) are imported into the database by default, enabling you to access and configure them via the UI and API. The migration occurs once when Moogfarmd is restarted.
A file_only_config=true flag has been added to the 7.3.x and 8.0.x versions of moog_farmd.conf that you can use to prevent the migration from taking place. If this flag is missing or is set to false, Moogfarmd attempts to perform the import when it starts.
Note
If the file_only_config flag is set to true, UI-based Cookbooks will not run.
The following moolets are no longer supported in v8.0.x and should be removed from the moog_farmd.conf file as part of the upgrade:
Sigaliser Classic
Nexus
Speedbird
AlertRootCause
Version specific config file differences:
v7.1.x-v7.2.x
$MOOGSOFT_HOME/config/system.conf
message_persistence is now enabled by default
$MOOGSOFT_HOME/config/security.conf
The 'Google' realm has been deprecated and removed
$MOOGSOFT_HOME/config/servlets.conf
The toolrunner servlet now optionally supports SSH key authentication as well as username/password-based authentication
$MOOGSOFT_HOME/config/moog_farmd.conf
alert_workflows.conf moolet has been added
enrichment_workflows.conf moolet has been added
event_workflows.conf moolet has been added
situations_workflows.conf moolet has been added
v7.2.x-v7.3.x
$MOOGSOFT_HOME/config/system.conf
New integration database property has been added: intdb_database_name
$MOOGSOFT_HOME/config/moog_farmd.conf
The entire sig_resolution block containing merge groups, retention_period, and so on, has been removed but is still supported as long as file_only_config is set to true
alert_root_cause.conf moolet has been removed and is no longer supported
nexus.conf moolet has been removed and is no longer supported
speedbird.conf moolet has been removed and is no longer supported
sigaliser.conf moolet has been removed and is no longer supported
cookbook.conf has been removed but is still supported as long as file_only_config is set to true
tempus.conf has been removed but is still supported as long as file_only_config is set to true
v7.3.x-v8.0.x
$MOOGSOFT_HOME/config/system.conf
ElasticSearch now supports basic authentication
$MOOGSOFT_HOME/config/security.conf
There is a new global_settings block which allows control of the CSRF protection feature
$MOOGSOFT_HOME/config/servlets.conf
It is now possible to configure the toolrunner to run on a port other than 22 using toolrunnerport property
$MOOGSOFT_HOME/config/moog_farmd.conf
alert_inform_workflows.conf moolet has been added
situation_inform_workflows.conf moolet has been added
The 'modules' block has been removed; it only contained Topology-related functionality, which has been deprecated in the v8.0.x release because the Topology feature now works differently.
Compare and manually merge the .rpmsave versions of files with the new versions of those files, adding any new properties to the older versions. You can skip this step if you have already completed it as part of this upgrade process on the current host.
Do not copy the config and bot files from the previous version over the new versions of those files, as they are not always forward-compatible; some config/bot lines must be added for the new version to work.
To find files that have been changed, moved or deleted, run these commands:
find $MOOGSOFT_HOME -name '*.rpmsave'
find /etc/init.d/ -name '*.rpmsave'
For example, the following command displays the differences in the new version of the system.conf file:
diff -u $MOOGSOFT_HOME/config/system.conf $MOOGSOFT_HOME/config/system.conf.rpmsave
Follow this process to merge the file differences:
· Rename the new versions of the files, without the .rpmsave extension, to end with .bak.
· Merge the .rpmsave file with the new .bak file by adding new properties/configuration where needed (from the new version of the file into the old version), so the structure matches the new version of the file.
· Rename the .rpmsave file to remove the .rpmsave extension.
Note
In Cisco Crosswork Situation Manager v8.0.x, hop limit has been removed from existing Recipe configurations. You can reconfigure a hop limit if you are using a named or inferred Topology to filter a Recipe.
Install the latest Java packages using the command below. The yum upgrade of the moogsoft packages may already have done this, and may even have installed a newer version of Java than the one shown; in that case, yum reports that the packages are already installed at the latest version. Continue with the subsequent steps regardless.
yum -y install java-11-openjdk-11.0.7.10 \
java-11-openjdk-devel-11.0.7.10 \
java-11-openjdk-headless-11.0.7.10
On each server with a Cisco Crosswork Situation Manager RPM package installed, run the following command to replace the /usr/java/latest symlink so it points at the JDK11 JAVA_HOME directory:
source $MOOGSOFT_HOME/bin/utils/moog_init_functions.sh
If there are non-Cisco Crosswork Situation Manager packages on this server that do not support JDK11, you must update those applications to use a different JAVA_HOME symlink (not /usr/java/latest).
You can skip this step if you have already completed this step as part of this upgrade process on the current host.
To confirm this has worked, run the following command:
$JAVA_HOME/bin/java -version
It should return (as a minimum version):
openjdk version "11.0.7" 2020-04-14 LTS
You can also use the 'alternatives' command to point the system 'java' shortcut to the new version:
alternatives --config java
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x or v7.1.x.
The MySQL connector is upgraded in this release.
The original connector may be used by the External Database module in the current deployment, configured in $MOOGSOFT_HOME/config/moog_external_db_details.conf.
If this file is configured in the current deployment, update it to reference the new mariadb connector here: $MOOGSOFT_HOME/lib/cots/mariadb-java-client-2.4.0.jar.
You can skip this step if you have already completed this step as part of this upgrade process on the current host.
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x.
Cisco recommends that you update the RabbitMQ configuration to support autoheal and other processes that improve stability in HA environments.
Run the commands below that are appropriate for your deployment type. Replace <VHOST> with your desired RabbitMQ VHOST/Zone.
· RPM:
cp -f $MOOGSOFT_HOME/etc/cots/rabbitmq/rabbitmq.config /etc/rabbitmq/
bash $MOOGSOFT_HOME/bin/utils/moog_init_mooms.sh -z <VHOST> -p
· Tarball:
cp -f $MOOGSOFT_HOME/etc/cots/rabbitmq/rabbitmq.config $MOOGSOFT_HOME/cots/rabbitmq-server/etc/rabbitmq/
bash $MOOGSOFT_HOME/bin/utils/moog_init_mooms.sh -z <VHOST> -p
To continue with the upgrade, see RPM - Upgrade database components.
Follow these steps to perform an RPM upgrade on the Cisco Crosswork Situation Manager database components to v8.0.x from v7.1.x, v7.2.x, or v7.3.x:
· Cisco Crosswork Situation Manager databases
· MySQL/Percona
Refer to Upgrade Cisco Crosswork Situation Manager for general information and upgrade instructions for other components and versions.
To upgrade Cisco Crosswork Situation Manager, run the upgrade command below that corresponds to your chosen upgrade mechanism.
If you have already run this step on the current host as part of this upgrade (for example, during a single-host upgrade), you can skip it.
· If you are using a remote or offline Yum repository, run the following command on every host where a Cisco Crosswork Situation Manager RPM package is installed:
yum -y upgrade $(rpm -qa --qf '%{NAME}\n' | grep moogsoft | sed 's/$/-8.0.0.1/')
· If you are using downloaded RPM files on a host, run the following command from the location where the files are installed:
yum -y upgrade moogsoft-*8.0.0.1*.rpm
You must update the /etc/my.cnf file to reflect the new variables for MySQL. Run the following command to update the file; it assumes my.cnf is under the /etc/ directory.
sed -i '/\[mysqld\]/a log_bin_trust_function_creators = 1\nthread_stack = 524288\n' /etc/my.cnf
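Before restarting, you can confirm the two variables were actually inserted; a minimal sketch (the `check_mycnf` helper name is illustrative):

```shell
# Sketch: verify both new [mysqld] variables are present in my.cnf.
check_mycnf() {
  local cnf="$1"
  grep -q '^log_bin_trust_function_creators = 1' "$cnf" &&
  grep -q '^thread_stack = 524288' "$cnf"
}

# Example usage:
#   check_mycnf /etc/my.cnf && echo "my.cnf updated"
```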
Restart MySQL:
service mysqld restart
Note
Use the following instructions if the database deployed is Percona instead of MySQL Community. Instructions for MySQL Community follow this section.
You should upgrade Percona to v5.7.28 to address a number of bugs and security vulnerabilities.
You should back up your database before upgrading.
· Check if Percona is configured to run with --gtid-mode=ON using the following command in the MySQL CLI:
show variables like 'gtid_mode';
Remember what the value is because you will need it later.
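If you prefer to capture the value in a script rather than note it by hand, a sketch follows; it assumes the tab-separated output produced by `mysql -N -B`, and the `parse_gtid_mode` name is illustrative:

```shell
# Sketch: extract just the ON/OFF value from the gtid_mode query output.
parse_gtid_mode() {
  awk '$1 == "gtid_mode" { print $2 }'
}

# Example usage:
#   GTID_MODE=$(mysql -u root -p -N -B -e "show variables like 'gtid_mode';" | parse_gtid_mode)
```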
· Download and install the Percona packages using the appropriate step below:
— Stop Percona:
service mysqld stop
— If the server running Percona has internet access, download and install the packages from the Percona Yum repository:
yum -y upgrade Percona-XtraDB-Cluster-shared-compat-57-5.7.28 \
Percona-XtraDB-Cluster-client-57-5.7.28 \
Percona-XtraDB-Cluster-server-57-5.7.28 \
Percona-XtraDB-Cluster-shared-57-5.7.28
— If the server running Percona does not have internet access, download the packages on a server that does have internet access and copy the files to the server running Percona:
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/Percona-XtraDB-Cluster-shared-57-5.7.28-31.41.1.el7.x86_64.rpm
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/Percona-XtraDB-Cluster-client-57-5.7.28-31.41.1.el7.x86_64.rpm
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/Percona-XtraDB-Cluster-server-57-5.7.28-31.41.1.el7.x86_64.rpm
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/Percona-XtraDB-Cluster-shared-compat-57-5.7.28-31.41.1.el7.x86_64.rpm
curl -L -O http://repo.percona.com/percona/yum/release/7/RPMS/x86_64/percona-xtrabackup-24-2.4.19-1.el7.x86_64.rpm
Then manually install the packages:
yum -y upgrade Percona-*5.7.28*.rpm
· Restart Percona:
service mysqld restart
· If the gtid-mode was OFF (based on the command run earlier in the upgrade), run the MySQL upgrade utility. Provide the MySQL root password when prompted, or press Enter if you have not set a password.
mysql_upgrade -u root -p
If the gtid-mode was ON, you do not need to do this step.
See the MySQL documentation on restrictions on replication with GTIDs for more information.
· Restart Percona to save any changes to system tables:
service mysqld restart
See the Upgrading MySQL documentation for more information.
Note
Use the following instructions if the database deployed is MySQL Community instead of Percona.
You should upgrade MySQL to 5.7.28 to address a number of bugs and security vulnerabilities.
Before you start the upgrade process, you should back up your database.
· Check if MySQL is configured to run with --gtid-mode=ON using the following command in the MySQL CLI:
show variables like 'gtid_mode';
Remember what the value is because you will need it later.
· Stop MySQL:
service mysqld stop
· Download and install the MySQL packages using the appropriate step below:
— If the server running MySQL has internet access, download and install the packages from the MySQL Yum repository:
yum -y upgrade mysql-community-libs-5.7.28 \
mysql-community-libs-compat-5.7.28 \
mysql-community-server-5.7.28 \
mysql-community-common-5.7.28 \
mysql-community-client-5.7.28
— If the server running MySQL does not have internet access, download the packages on a server that does have internet access and copy the files to the server running MySQL:
curl -L -O https://repo.mysql.com/yum/mysql-5.7-community/el/7/x86_64/mysql-community-libs-5.7.28-1.el7.x86_64.rpm;
curl -L -O https://repo.mysql.com/yum/mysql-5.7-community/el/7/x86_64/mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm;
curl -L -O https://repo.mysql.com/yum/mysql-5.7-community/el/7/x86_64/mysql-community-server-5.7.28-1.el7.x86_64.rpm;
curl -L -O https://repo.mysql.com/yum/mysql-5.7-community/el/7/x86_64/mysql-community-common-5.7.28-1.el7.x86_64.rpm;
curl -L -O https://repo.mysql.com/yum/mysql-5.7-community/el/7/x86_64/mysql-community-client-5.7.28-1.el7.x86_64.rpm;
Then manually install the packages:
yum -y upgrade mysql-*5.7.28*.rpm
· Restart MySQL:
service mysqld restart
· If the gtid-mode was OFF (based on the command run earlier in the upgrade), run the MySQL upgrade utility. Provide the MySQL root password when prompted, or press Enter if you have not set a password.
mysql_upgrade -u root -p
If the gtid-mode was ON, you do not need to do this step.
See the MySQL documentation on restrictions on replication with GTIDs for more information.
· Restart MySQL to save any changes to system tables:
service mysqld restart
See the Upgrading MySQL documentation for more information.
Note
In Cisco Crosswork Situation Manager v7.3.x and 8.0.x, the Cookbooks, Tempus, and merge groups (default and custom) are imported into the database by default, enabling you to access and configure them via the UI and API. The migration occurs once when Moogfarmd is restarted.
A file_only_config=true flag has been added to the 7.3.x and 8.0.x versions of moog_farmd.conf that you can use to prevent the migration from taking place. If this flag is missing or is set to false, Moogfarmd attempts to perform the import when it starts.
Note
If the file_only_config flag is set to true, UI-based Cookbooks will not run.
The following moolets are no longer supported in v8.0.x and should be removed from the moog_farmd.conf file as part of the upgrade:
Sigaliser Classic
Nexus
Speedbird
AlertRootCause
Version specific config file differences:
v7.1.x-v7.2.x
$MOOGSOFT_HOME/config/system.conf
message_persistence is now enabled by default
$MOOGSOFT_HOME/config/security.conf
The 'Google' realm has been deprecated and removed
$MOOGSOFT_HOME/config/servlets.conf
The toolrunner servlet now optionally supports ssh key authentication as well as username/password-based authentication
$MOOGSOFT_HOME/config/moog_farmd.conf
alert_workflows.conf moolet has been added
enrichment_workflows.conf moolet has been added
event_workflows.conf moolet has been added
situations_workflows.conf moolet has been added
v7.2.x-v7.3.x
$MOOGSOFT_HOME/config/system.conf
New integration database property has been added: intdb_database_name
$MOOGSOFT_HOME/config/moog_farmd.conf
The entire sig_resolution block, containing merge groups, retention_period and so on, has been removed but is still supported as long as file_only_config is true
alert_root_cause.conf moolet has been removed and is no longer supported
nexus.conf moolet has been removed and is no longer supported
speedbird.conf moolet has been removed and is no longer supported
sigaliser.conf moolet has been removed and is no longer supported
cookbook.conf has been removed but is still supported as long as file_only_config is set to true
tempus.conf has been removed but is still supported as long as file_only_config is set to true
v7.3.x-v8.0.x
$MOOGSOFT_HOME/config/system.conf
ElasticSearch now supports basic authentication
$MOOGSOFT_HOME/config/security.conf
There is a new global_settings block which allows control of the CSRF protection feature
$MOOGSOFT_HOME/config/servlets.conf
It is now possible to configure the toolrunner to run on a port other than 22 using the toolrunnerport property
$MOOGSOFT_HOME/config/moog_farmd.conf
alert_inform_workflows.conf moolet has been added
situation_inform_workflows.conf moolet has been added
The 'modules' block has been removed as it only contained Topology-related functionality, which has been deprecated in the v8.0.x release; the Topology feature now works differently.
Manually merge and compare .rpmsave versions of files with the new versions of those files. Add any new properties to the older versions of the files. You can skip this step if you have already completed this step as part of this upgrade process on the current host.
Do not copy the config and bot files from the previous version on top of (replacing) the new versions of those files in 7.3.x, as they are not always forwards-compatible, and some config/bot lines must be added for the new version to work.
To find files that have been changed, moved or deleted, run these commands:
find $MOOGSOFT_HOME -name '*.rpmsave'
find /etc/init.d/ -name '*.rpmsave'
For example, the following command displays the differences in the new version of the system.conf file:
diff -u $MOOGSOFT_HOME/config/system.conf $MOOGSOFT_HOME/config/system.conf.rpmsave
Follow this process to merge the file differences:
· Rename the new versions of the files, without the .rpmsave extension, to end with .bak.
· Merge each .rpmsave file with its new .bak counterpart, adding any new properties and configuration from the new version of the file into the old version, so the structure matches the new version of the file.
· Rename the .rpmsave file to remove the .rpmsave extension.
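The renames above can be sketched as a small helper (illustrative only; run it per file, or loop over the find output shown earlier). After it runs, the new RPM-installed version is kept as .bak and your old edited copy is back in place, ready for a manual merge.

```shell
# Stage one <file>.rpmsave / <file> pair for manual merging:
# keep the freshly installed file as <file>.bak, then restore the
# pre-upgrade copy as <file>.
stage_rpmsave_pair() {
  saved="$1"                      # e.g. .../config/system.conf.rpmsave
  target="${saved%.rpmsave}"      # e.g. .../config/system.conf
  mv "$target" "$target.bak"      # new RPM version becomes .bak
  mv "$saved" "$target"           # old version back in place
}

# Apply to every saved file under $MOOGSOFT_HOME:
#   find "$MOOGSOFT_HOME" -name '*.rpmsave' | while read -r f; do
#     stage_rpmsave_pair "$f"
#   done
```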
Install the latest Java packages using the command below. The yum upgrade of the moogsoft packages may have already done this, possibly installing a newer Java version than the one below. In that case, yum reports that the packages are already installed at the latest version. Continue with the subsequent steps either way.
yum -y install java-11-openjdk-11.0.7.10 \
java-11-openjdk-devel-11.0.7.10 \
java-11-openjdk-headless-11.0.7.10
On each server with a Cisco Crosswork Situation Manager RPM package installed, run the following command to replace the /usr/java/latest symlink so it points at the JDK11 JAVA_HOME directory:
source $MOOGSOFT_HOME/bin/utils/moog_init_functions.sh
If there are non-Cisco Crosswork Situation Manager packages on this server that do not support JDK11, you must update those applications to use a different JAVA_HOME symlink (not /usr/java/latest).
You can skip this step if you have already completed this step as part of this upgrade process on the current host.
To confirm this has worked, run the following command:
$JAVA_HOME/bin/java -version
It should return (as a minimum version):
openjdk version "11.0.7" 2020-04-14 LTS
You can also use the 'alternatives' command to point the system 'java' shortcut to the new version:
alternatives --config java
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x or v7.1.x.
The MySQL connector is upgraded in this release.
The original connector may be used by the External Database module in the current deployment, configured in $MOOGSOFT_HOME/config/moog_external_db_details.conf.
If this file is configured in the current deployment, update it to reference the new mariadb connector here: $MOOGSOFT_HOME/lib/cots/mariadb-java-client-2.4.0.jar.
You can skip this step if you have already completed this step as part of this upgrade process on the current host.
Before upgrading the schema, check if the MySQL log_bin_trust_function_creators variable is enabled:
mysql -e "show variables like '%log_bin_trust_function_creators%';"
The value must be ON or 1. If the value is OFF or 0, you must enable this variable and restart the database:
· Open the /etc/my.cnf file on the host running Percona MySQL.
· If the variable is not in this file, run the following to add the variable to the file:
sed -i 's/\(innodb_autoinc_lock_mode.*\)/\1\nlog_bin_trust_function_creators = 1\n/' /etc/my.cnf
If the variable is already in the /etc/my.cnf file, set the variable to 1.
· Once you have enabled the variable, restart the database:
service mysqld restart
To upgrade the Cisco Crosswork Situation Manager database, provide the Auto Upgrader utility with the credentials of a database user with super privileges. For single-host installations where MySQL was installed as part of the Cisco Crosswork Situation Manager deployment, you can use the default 'root' user.
· Run the following command, replacing <MySQL-SuperUsername> with the username of your super user:
Note
Run this command on the server where the database is installed (on an RPM deployment, this is where the moogsoft-db package is deployed).
bash $MOOGSOFT_HOME/bin/utils/moog_db_auto_upgrader -t 8.0.0 -u <MySQL-SuperUsername>
· Enter the password for the user.
Note
You can provide the password to the utility with the -p flag but Cisco does not recommend this in non-test deployments for security reasons.
Import the latest version of the CCSM Licence:
$MOOGSOFT_HOME/bin/utils/moog_mysql_client < <(cat $MOOGSOFT_HOME/etc/moog/moog_sigdb/default_data/default_data_cisco.sql | grep -v features)
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x.
Run the following commands to drop two tables from the historic database that are no longer used. If you do not drop the tables at this stage, the Database Validator utility that you run during the validation step reports their presence as a delta.
Note
Run these commands on the server where the database or, on RPM deployments, the moogsoft-db package is installed.
These commands may fail if DBSplit has never been enabled. You can ignore these errors.
bash $MOOGSOFT_HOME/bin/utils/moog_mysql_client -i -e "drop table room_post_sigs" 2>/dev/null;
bash $MOOGSOFT_HOME/bin/utils/moog_mysql_client -i -e "drop table room_posts" 2>/dev/null;
To continue with the upgrade, see RPM - Upgrade data ingestion components.
Follow these steps to perform an RPM upgrade on the data ingestion LAMs to v8.0.x from v7.1.x, 7.2.x or 7.3.x.
To upgrade UI integrations, see RPM - Upgrade UI Components.
Refer to Upgrade Cisco Crosswork Situation Manager for general information and upgrade instructions for other components and versions.
Run the following commands as root to query and stop the default LAM and integration services on any server where the moogsoft-integrations or moogsoft-integrations-ui package is installed and LAMs or integrations are running.
Check for running LAM and integration processes and stop them using the relevant service scripts:
systemctl status | grep lamd
service <lam_service_name> stop
Run the following command to stop any remaining active LAM and integration processes:
kill -9 $(ps -ef | grep java | grep _lam | awk '{print $2}') 2>/dev/null
To upgrade Cisco Crosswork Situation Manager, run the upgrade command below that corresponds to your chosen upgrade mechanism.
If you have already run this step on the current host as part of this upgrade (for single-host upgrade for example), you can skip this step.
· If you are using a remote or offline Yum repository, run the following command on every host where a Cisco Crosswork Situation Manager RPM package is installed:
yum -y upgrade $(rpm -qa --qf '%{NAME}\n' | grep moogsoft | sed 's/$/-8.0.0.1/')
· If you are using downloaded RPM files on a host, run the following command from the location where the files are installed:
yum -y upgrade moogsoft-*8.0.0.1*.rpm
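The repository variant above builds its package list by appending the target version to each installed moogsoft package name. The sed step can be illustrated in isolation (the package name below is just an example):

```shell
# Append the target release to each package name read from stdin,
# producing the name-version arguments passed to yum.
to_target_version() {
  sed 's/$/-8.0.0.1/'
}

# Live usage:
#   yum -y upgrade $(rpm -qa --qf '%{NAME}\n' | grep moogsoft | to_target_version)
```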
Note
In Cisco Crosswork Situation Manager v7.3.x and 8.0.x, the Cookbooks, Tempus, and merge groups (default and custom) are imported into the database by default, enabling you to access and configure them via the UI and API. The migration occurs once when Moogfarmd is restarted.
A file_only_config=true flag has been added to the 7.3.x and 8.0.x versions of moog_farmd.conf that you can use to prevent the migration from taking place. If this flag is missing or is set to false, Moogfarmd attempts to perform the import when it starts.
Note
If the file_only_config flag is set to true, UI-based Cookbooks will not run.
The following moolets are no longer supported in v8.0.x and should be removed from the moog_farmd.conf file as part of the upgrade:
Sigaliser Classic
Nexus
Speedbird
AlertRootCause
Version specific config file differences:
v7.1.x-v7.2.x
$MOOGSOFT_HOME/config/system.conf
message_persistence is now enabled by default
$MOOGSOFT_HOME/config/security.conf
The 'Google' realm has been deprecated and removed
$MOOGSOFT_HOME/config/servlets.conf
The toolrunner servlet now optionally supports ssh key authentication as well as username/password-based authentication
$MOOGSOFT_HOME/config/moog_farmd.conf
alert_workflows.conf moolet has been added
enrichment_workflows.conf moolet has been added
event_workflows.conf moolet has been added
situations_workflows.conf moolet has been added
v7.2.x-v7.3.x
$MOOGSOFT_HOME/config/system.conf
New integration database property has been added: intdb_database_name
$MOOGSOFT_HOME/config/moog_farmd.conf
The entire sig_resolution block, containing merge groups, retention_period and so on, has been removed but is still supported as long as file_only_config is true
alert_root_cause.conf moolet has been removed and is no longer supported
nexus.conf moolet has been removed and is no longer supported
speedbird.conf moolet has been removed and is no longer supported
sigaliser.conf moolet has been removed and is no longer supported
cookbook.conf has been removed but is still supported as long as file_only_config is set to true
tempus.conf has been removed but is still supported as long as file_only_config is set to true
v7.3.x-v8.0.x
$MOOGSOFT_HOME/config/system.conf
ElasticSearch now supports basic authentication
$MOOGSOFT_HOME/config/security.conf
There is a new global_settings block which allows control of the CSRF protection feature
$MOOGSOFT_HOME/config/servlets.conf
It is now possible to configure the toolrunner to run on a port other than 22 using the toolrunnerport property
$MOOGSOFT_HOME/config/moog_farmd.conf
alert_inform_workflows.conf moolet has been added
situation_inform_workflows.conf moolet has been added
The 'modules' block has been removed as it only contained Topology-related functionality, which has been deprecated in the v8.0.x release; the Topology feature now works differently.
Manually merge and compare .rpmsave versions of files with the new versions of those files. Add any new properties to the older versions of the files. You can skip this step if you have already completed this step as part of this upgrade process on the current host.
Do not copy the config and bot files from the previous version on top of (replacing) the new versions of those files in 7.3.x, as they are not always forwards-compatible, and some config/bot lines must be added for the new version to work.
To find files that have been changed, moved or deleted, run these commands:
find $MOOGSOFT_HOME -name '*.rpmsave'
find /etc/init.d/ -name '*.rpmsave'
For example, the following command displays the differences in the new version of the system.conf file:
diff -u $MOOGSOFT_HOME/config/system.conf $MOOGSOFT_HOME/config/system.conf.rpmsave
Follow this process to merge the file differences:
· Rename the new versions of the files, without the .rpmsave extension, to end with .bak.
· Merge each .rpmsave file with its new .bak counterpart, adding any new properties and configuration from the new version of the file into the old version, so the structure matches the new version of the file.
· Rename the .rpmsave file to remove the .rpmsave extension.
Install the latest Java packages using the command below. The yum upgrade of the moogsoft packages may have already done this, possibly installing a newer Java version than the one below. In that case, yum reports that the packages are already installed at the latest version. Continue with the subsequent steps either way.
yum -y install java-11-openjdk-11.0.7.10 \
java-11-openjdk-devel-11.0.7.10 \
java-11-openjdk-headless-11.0.7.10
On each server with a Cisco Crosswork Situation Manager RPM package installed, run the following command to replace the /usr/java/latest symlink so it points at the JDK11 JAVA_HOME directory:
source $MOOGSOFT_HOME/bin/utils/moog_init_functions.sh
If there are non-Cisco Crosswork Situation Manager packages on this server that do not support JDK11, you must update those applications to use a different JAVA_HOME symlink (not /usr/java/latest).
You can skip this step if you have already completed this step as part of this upgrade process on the current host.
To confirm this has worked, run the following command:
$JAVA_HOME/bin/java -version
It should return (as a minimum version):
openjdk version "11.0.7" 2020-04-14 LTS
You can also use the 'alternatives' command to point the system 'java' shortcut to the new version:
alternatives --config java
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x or v7.1.x.
The MySQL connector is upgraded in this release.
The original connector may be used by the External Database module in the current deployment, configured in $MOOGSOFT_HOME/config/moog_external_db_details.conf.
If this file is configured in the current deployment, update it to reference the new mariadb connector here: $MOOGSOFT_HOME/lib/cots/mariadb-java-client-2.4.0.jar.
You can skip this step if you have already completed this step as part of this upgrade process on the current host.
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x.
If you have RabbitMQ installed and running on this server, complete the following step.
Cisco recommends that you update the RabbitMQ configuration to support autoheal and other processes that improve stability in HA environments.
Run the commands below that are appropriate for your deployment type. Replace <VHOST> with your desired RabbitMQ VHOST/Zone.
· RPM:
cp -f $MOOGSOFT_HOME/etc/cots/rabbitmq/rabbitmq.config /etc/rabbitmq/
bash $MOOGSOFT_HOME/bin/utils/moog_init_mooms.sh -z <VHOST> -p
· Tarball:
cp -f $MOOGSOFT_HOME/etc/cots/rabbitmq/rabbitmq.config $MOOGSOFT_HOME/cots/rabbitmq-server/etc/rabbitmq/
bash $MOOGSOFT_HOME/bin/utils/moog_init_mooms.sh -z <VHOST> -p
To continue with the upgrade, see Finalize and validate upgrades.
This topic describes the RPM migration procedure from MySQL to Percona XtraDB Cluster.
Follow these steps to perform this process with root privileges.
Cisco Crosswork Situation Manager uses Percona XtraDB Cluster, which implements the Galera replication protocol to provide high availability (HA). Percona XtraDB is similar to MySQL, with improvements for HA, scalability, and usability.
Cisco strongly recommends you perform the migration from MySQL to Percona XtraDB Cluster and HAProxy. See Post-upgrade steps for more information.
Before you begin the migration process, ensure you have met the following requirements:
1. You have root access to the servers you will use for your database nodes, and to any servers HAProxy will be installed on.
To migrate to Percona XtraDB Cluster, perform the following steps on the server where the moogsoft-db package is installed.
Note
If you have installed the moogsoft-db package on more than one server, for example in a master/standby configuration, use the master as the donor node.
1. Navigate to the desired location and install the Percona repository:
yum -y install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
2. Install Percona XtraBackup:
yum -y install percona-xtrabackup-24.x86_64
3. Back up MySQL using the Percona XtraBackup tool:
innobackupex --user=root --password= /backups/mysql
This is a precaution. You do not have to back up MySQL before setting up the Percona XtraDB cluster.
4. When the backup is complete, repeat the process using the "apply log" setting to apply any transactions that may have occurred during the backup process:
innobackupex --apply-log /backups/mysql
5. Stop MySQL:
service mysqld stop
6. Uninstall MySQL:
yum -y remove mysql-*community*
Stopping and uninstalling MySQL does not delete the backup data. The backup data is available at /backups/mysql and the data directory used by MySQL remains intact.
7. Install Percona Cluster Binary 5.7:
yum -y install Percona-Server-shared-compat-57-5.7.28
yum -y install Percona-XtraDB-Cluster-server-57-5.7.28
8. Configure the Percona cluster to start in bootstrap mode.
· Edit the /etc/percona-xtradb-cluster.conf.d/wsrep.cnf file so that wsrep_cluster_address is not set to an IP address. For example:
[mysqld]
wsrep_cluster_address=gcomm://
· Uncomment the following property, to enable the Percona user to be used for replication.
wsrep_sst_auth="username:password"
Note
The Percona SST (State Transfer) user in the wsrep_sst_auth property does not exist yet. Set the authentication details here and you will create the user below.
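Putting the two edits above together, the relevant part of the donor node's wsrep.cnf might look like the following. The sstuser/sstpassword values are placeholder credentials you choose; you create this user in the next step.

```ini
[mysqld]
# Bootstrap mode: no cluster node addresses yet
wsrep_cluster_address=gcomm://
# SST credentials used for replication state transfer (user created later)
wsrep_sst_auth="sstuser:sstpassword"
```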
· Start the Percona donor node:
service mysqld start
1. Create a Percona SST user, grant the required permissions and add the user to the bootstrapped node.
Run the following command, replacing the username and password with your chosen user credentials:
mysql -u root -e "GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'username'@'localhost' identified by 'password';"
· Run the following commands to create the clustercheck user required by HA Proxy:
mysql -u root -e "GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost' IDENTIFIED BY 'clustercheckpassword\!';"
mysql -u root -e "FLUSH PRIVILEGES;"
· Add the IP addresses for the remaining nodes to the Percona configuration file.
Configure the wsrep_cluster_address property in the /etc/percona-xtradb-cluster.conf.d/wsrep.cnf file to contain a comma-separated list of all of the nodes you will add to the cluster. For example:
[mysqld]
wsrep_cluster_address=gcomm://1.1.1.1,1.1.1.2,1.1.1.3
· Copy or move the backup data from the MySQL data directory at /backups/mysql to the Percona data directory at /var/lib/mysql.
For more information on restoring from backup data, see the Percona documentation on restoring a full backup.
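Once the donor node is up with the restored data, you can confirm it reports a single-node cluster before joining further nodes. The helper below is an illustrative wrapper around the standard wsrep status query:

```shell
# Extracts the numeric value from a "show status" row such as:
#   wsrep_cluster_size   1
cluster_size() {
  awk '{print $2}'
}

# Live usage on the donor node (expect 1 at this stage):
#   mysql -N -e "show status like 'wsrep_cluster_size';" | cluster_size
```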
The Percona clustercheck script is distributed as part of Percona XtraDB Cluster. It works with HA Proxy to monitor nodes in the cluster and performs health checks on backend servers.
To configure and deploy the script:
· Install the Extended Internet Service Daemon xinetd:
yum -y install xinetd
· Create a script to launch clustercheck on request, on port 9198:
cat > /etc/xinetd.d/mysqlchk << EOF
# default: on
# description: mysqlchk
service mysqlchk
{
disable = no
flags = REUSE
socket_type = stream
port = 9198
wait = no
user = nobody
server = /usr/bin/clustercheck
log_on_failure += USERID
only_from = 0.0.0.0/0
per_source = UNLIMITED
}
EOF
· Add the script mysqlchk as a service to be run on request on port 9198 by xinetd.
echo "mysqlchk 9198/tcp # mysqlchk" >> /etc/services
· Restart xinetd:
systemctl restart xinetd
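You can then probe the listener to confirm clustercheck responds. The helper below classifies the response body; the "is synced" text is what the stock clustercheck script returns for a healthy node (treat the exact wording as an assumption and verify against your version of the script):

```shell
# Classify a clustercheck response body as healthy or unhealthy.
check_node() {
  case "$1" in
    *"is synced"*) echo "healthy" ;;
    *)             echo "unhealthy" ;;
  esac
}

# Live usage against the xinetd service configured above on port 9198:
#   body=$(curl -s http://localhost:9198/)
#   check_node "$body"
```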
When you have configured the donor node, follow these steps to add your second and subsequent nodes.
· Stop MySQL on the node:
service mysqld stop
· Uninstall MySQL on the node:
yum -y remove mysql-*community*
· Run the install_percona_nodes script to install the Percona Cluster Binary 5.7, add the node to the configuration file and start the node.
The script syntax is as follows:
RPM:
install_percona_nodes.sh -p|--primary -i|--ips [IP1,IP2,IP3] -u|--username -w|--password -h|--help
Tarball:
install_percona_nodes_tarball.sh -p|--primary -i|--ips [IP1,IP2,IP3] -u|--username -w|--password -h|--help
· -p: Set this node as primary, so it starts with default bootstrap enabled.
· -i: Comma-separated list of node IP addresses or hostnames.
· -u: Username of your Percona SST user.
· -w: Password of your Percona SST user.
· -h: Display the help information for the script.
Note
For more information on how nodes join the cluster and state transfer (SST), see the Galera Cluster documentation on Node Provisioning.
Copy and execute the Percona install script to additional nodes
Copy the moogsoft-db RPM to the servers where you want to install Percona. For example:
rpm2cpio moogsoft-db-8.0.*.rpm | cpio -i --to-stdout ./usr/share/moogsoft/bin/utils/install_percona_nodes.sh > install_percona_nodes.sh;
chmod a+x install_percona_nodes.sh;
Run the install_percona_nodes.sh script. For example:
./install_percona_nodes.sh -i 1.1.1.1,1.1.1.2,1.1.1.3 -u mysstusername -w mysstpassword
If rpm2cpio is not available on the server, you can install the moogsoft-common, moogsoft-db and moogsoft-ccsm packages as follows:
yum install moogsoft-db-8.0.*.rpm moogsoft-common-8.0.*.rpm moogsoft-ccsm-8.0.*.rpm
The Percona install script is then available under ${MOOGSOFT_HOME}/bin/utils (/usr/share/moogsoft by default).
In a Disaster Recovery configuration with two or more data centres, you can enable synchronization of the binary log to disk before transactions are committed.
1. Edit the ~/.my.cnf file and set the following property:
sync_binlog = 1
· Restart the databases to apply the change:
RPM:
systemctl restart mysqld
Tarball:
$MOOGSOFT_HOME/bin/utils/process_cntl mysql restart
Note
Do not change this setting in a standard configuration, as it negatively impacts performance.
An administrator must install HA Proxy on all Cisco Crosswork Situation Manager servers on your network. Root privileges are required. The HA Proxy installer script syntax is as follows:
haproxy_installer.sh -l|--listener-port [3306] -i|--ips [IP1:PORT1,IP2:PORT2,IP3:PORT3] -c|--configure-aiops -h|--help
· -i: Comma-separated list of node IP addresses or hostnames and ports.
· -l: Port on which HA Proxy listens for connections. Default is 3306.
· -c: Configure the local Cisco Crosswork Situation Manager to use HA Proxy's port.
· -h: Display the help information for the script.
To install HA Proxy:
· Set $MOOGSOFT_HOME to the correct location. For example:
export MOOGSOFT_HOME=/home/admin/moogsoft
1. Run the HA Proxy installer script. For example:
rpm2cpio moogsoft-db-8.0.*.rpm | cpio -i --to-stdout ./usr/share/moogsoft/bin/utils/haproxy_installer.sh > haproxy_installer.sh;
chmod a+x haproxy_installer.sh;
Run the haproxy_installer.sh script. For example:
./haproxy_installer.sh -c -i 10.101.10.109:3306,10.101.10.110:3306,10.101.10.111:3306 -u mysstusername -w mysstpassword
If you have already copied the RPMs, the script can be found in $MOOGSOFT_HOME/bin/utils.
· Restart Moogfarmd, Apache Tomcat and any integrations you are using. See Control Cisco Crosswork Situation Manager Processes for details.
The Configuration Migration utility upgrades your existing Cisco Crosswork Situation Manager configuration files for merge groups and the Cookbook and Tempus moolets to make them compatible with Cisco Crosswork Situation Manager 7.3.
The utility is embedded in Moogfarmd and runs automatically the first time a 7.3 instance of Moogfarmd is started.
· The file_only_config property in moog_farmd.conf serves two purposes related to the configuration migration utility:
· Setting the property to true before upgrading prevents the configuration migration utility from running.
· Setting the property to true after upgrading causes Cisco Crosswork Situation Manager to ignore all database configurations for Cookbook and Tempus clustering algorithms as well as merge groups, and only load their file configurations instead.
See Moogfarmd Reference for more information on Moogfarmd properties.
The following also applies to the Configuration Migration utility:
· The utility migrates the default merge group to your database. Any moolets that do not have a merge group explicitly defined will use the default merge group.
· The utility only migrates Tempus and Cookbook moolets. If a merge group contains Feedback or Classic Sigaliser, the utility still migrates the merge group, but without these moolets within it.
· During the upgrade process, the utility tarballs and copies all 7.2 configuration files to a backup folder. The utility provides you with the location of these during the upgrade.
· The utility notifies you of any conflicts during the upgrade. To avoid loss of data, it renames any moolets, merge groups and cookbook recipes in the file it is attempting to migrate to the database.
Follow these steps to perform the final tasks for an upgrade to Cisco Crosswork Situation Manager v8.0.x from v7.0.x, v7.1.x or v7.2.x.
Refer to Upgrade Cisco Crosswork Situation Manager for general information and upgrade instructions for other components and versions.
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x or v7.1.x.
If you are using a ServiceNow MID server installed on the same host as Cisco Crosswork Situation Manager, you must configure it to point to Java 8. The MID server requires Java 8 (update 152 or later); it does not work with Java 9 or later. To do this:
· Install the latest version of Java 8. See the ServiceNow MID server system requirements for more information.
· Stop the MID server by running the appropriate command. For example:
kill -9 $(ps -ef | grep mid_server | grep -v grep | awk '{print $2}')
· Configure the wrapper.java.command property to point to the Java 8 binary in the following file:
/usr/local/servicenow/moog_mid_server/agent/conf/wrapper-override.conf
For example:
wrapper.java.command=/usr/java/jre1.8.0_171-amd64/bin/java
Note that in v7.3, the MID Server is not deployed or managed by Apache Tomcat, and all MID Server configuration is now a manual task.
The minimum and maximum JVM heap sizes must be large enough to ensure that Elasticsearch starts.
To set the minimum and maximum JVM heap sizes:
· For RPM, edit the /etc/elasticsearch/jvm.options file.
· For Tarball, edit the $MOOGSOFT_HOME/cots/elasticsearch/config/jvm.options file.
These heap sizes must be the same value. For example, to set the heap to 4 GB:
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms4g
-Xmx4g
If you change the heap size, you must restart Elasticsearch:
· For RPM, run service elasticsearch restart.
· For Tarball, run $MOOGSOFT_HOME/bin/utils/process_cntl elasticsearch restart.
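Before restarting, you can sanity-check that the file sets both flags to the same value. A small illustrative helper:

```shell
# Succeeds when jvm.options sets -Xms and -Xmx to the same value.
heap_sizes_match() {
  opts_file="$1"
  xms=$(grep '^-Xms' "$opts_file" | tail -1 | sed 's/^-Xms//')
  xmx=$(grep '^-Xmx' "$opts_file" | tail -1 | sed 's/^-Xmx//')
  [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}

# Live usage (RPM path):
#   heap_sizes_match /etc/elasticsearch/jvm.options && echo "heap OK"
```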
Cisco periodically releases add-ons to extend and enhance the core Cisco Crosswork Situation Manager functionality. For example, new Workflow Engine functions, new Workflow Engines, or Integrations tiles. All add-ons releases are cumulative and include the fixes from previous releases.
Upgrading the Cisco Crosswork Situation Manager core components replaces any manually installed add-ons with versions of those add-ons that come with the upgrade by default. These default versions can be an earlier version than the manually installed versions of the add-ons.
Once you have finished upgrading the core components, you should install the Cisco Crosswork Situation Manager add-ons to ensure you have the latest versions of these add-ons.
See Install Cisco Add-ons for more information on how to install the Cisco Crosswork Situation Manager add-ons.
It is important to restart processes in the correct order. Follow each section below on the relevant servers, and run the commands in the specified order.
Database
Only do this step if the database is not running. If you have followed the documented upgrade process until this point, the database should be running.
On the database server (and any node where the database is configured to run), run the command appropriate to your deployment type:
RPM:
service mysqld restart
Tarball:
$MOOGSOFT_HOME/bin/utils/process_cntl mysql restart
Message Bus (RabbitMQ)
On the Core server (or any node where RabbitMQ is configured), run the command appropriate to your deployment type:
RPM:
service rabbitmq-server restart
Tarball:
$MOOGSOFT_HOME/bin/utils/process_cntl rabbitmq restart
Elasticsearch
On the Core server (or where Elasticsearch is configured), run the command appropriate to your deployment type:
RPM:
service elasticsearch restart
Tarball:
$MOOGSOFT_HOME/bin/utils/process_cntl elasticsearch restart
Moogfarmd
On the Core server (where Moogfarmd is configured), run the command appropriate to your deployment type:
RPM:
service moogfarmd restart
Tarball:
$MOOGSOFT_HOME/bin/utils/process_cntl moog_farmd restart
Warning
When you upgrade to Cisco Crosswork Situation Manager v8.0.x, Moogfarmd may not run if you are using Workflow Engine actions that have either changed from v7.x or are not in v8.0.x.
When you restart Moogfarmd, check the moog_farmd.log file for errors such as the following which indicates a missing function:
ERROR: [main][20200416 17:07:03.791 +0100] [CMooletCntr.java:1664] +|Error during initialization: [No such function: 'myWFEdoSomethingFunction' in the bot file of moolet: 'Situation Workflows']|+
If the log reports that Moogfarmd is not running or there is a missing function, contact Moogsoft support for more information.
Data Ingestion
On the Data Ingestion server (where back-end LAMs are configured), run the commands appropriate to your deployment type:
RPM:
Run the relevant service command for each of the LAMs deployed on the server. Follow this format replacing <LAM_service_name> as appropriate:
service <LAM_service_name> restart
For example:
service restlamd restart
Tarball:
Run the relevant process_cntl command for each of the LAMs deployed on the server. Follow this format replacing <LAM_instance_name> as appropriate:
$MOOGSOFT_HOME/bin/utils/process_cntl <LAM_instance_name> restart
For example:
$MOOGSOFT_HOME/bin/utils/process_cntl rest_lam restart
UI
On the UI server, run the commands appropriate to your deployment type:
RPM:
Rebuild the webapps and start Apache Tomcat:
$MOOGSOFT_HOME/bin/utils/moog_init_ui.sh -w
Restart Nginx:
service nginx restart
— Tarball:
Rebuild the webapps and start Apache Tomcat:
$MOOGSOFT_HOME/bin/utils/moog_init_ui.sh -w
Restart Nginx:
$MOOGSOFT_HOME/bin/utils/process_cntl nginx restart
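The RPM restart order above can be captured as a single sketch. Service names are the ones used in this guide; the LAM and UI steps vary per deployment, so only the four core components are included here.

```shell
# Restart the core components in the documented order:
# database, message bus, Elasticsearch, then Moogfarmd.
restart_core() {
  for svc in mysqld rabbitmq-server elasticsearch moogfarmd; do
    service "$svc" restart
  done
}

# On the Core server (database may be on its own host):
#   restart_core
# then restart LAMs and rebuild/restart the UI as shown above.
```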
You can configure the Events Analyser in v8.0.x in two ways:
· Use the Events Analyser database configuration and schedule. This is the default method for Cisco Crosswork Situation Manager v8.0.x.
· Use the event_analyser.conf file (for example in the case of significant customisations) and schedule through cronjob. This is the pre-v8.0.x method. See the 'Important' section below for steps if this is needed.
We recommend that you use the default Events Analyser database configuration and schedule method. Upgrading to Cisco Crosswork Situation Manager v8.0.x automatically creates the database configuration and schedule.
After you have finished the upgrade, you must update the cronjob so that it checks each minute whether the job needs to run. Remove any existing events_analyser cronjobs by running the following:
(crontab -l | grep -v events_analyser) | crontab -
Add the new cronjob:
$MOOGSOFT_HOME/bin/utils/moog_init_server.sh -e
Important
If you are using a significantly customised Events Analyser configuration file or are running the Events Analyser using the stream or partition flags, you should continue to use the pre-v8.0.x method.
Uncomment the existing events_analyser cronjob(s):
(crontab -l | sed -e 's/^\#\+\(.*events_analyser.*\)/\1/') | crontab -
Use the crontab -e command (which opens a 'vi'-like editor) to update any events_analyser cronjobs.
Specify the events_analyser configuration file at the end of each events_analyser command. The following example schedules a run every day at 3am; each run analyzes two weeks of data. We recommend this schedule if you do not have an existing schedule that works for your deployment.
0 3 * * * /usr/share/moogsoft/bin/events_analyser --incremental --readage 2w --config /usr/share/moogsoft/config/events_analyser.conf
If you want to migrate from the pre-v8.0.x method to the v8.0.x database method, contact Moogsoft Support.
Note
Run the following command on the server that houses the Core components. See Upgrade Cisco Crosswork Situation Manager for a description of the component groups.
Ensure that Moogfarmd is running, then run the following command to reindex alerts and Situations:
$MOOGSOFT_HOME/bin/utils/moog_indexer -f -n
The reindex process occurs in the background and may take a while to complete, depending on the number of alerts and Situations in your system.
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x or v7.1.x.
If you are using any of the following UI integrations and are upgrading from v7.0.x or v7.1.x, reconfigure them using the details you noted prior to the upgrade. This is required due to UI changes in the new version.
· AWS CloudWatch
· Cherwell
· JIRA Service Desk and JIRA Software
· JMS
· New Relic
· Remedy
· ServiceNow
· SevOne
· Slack
· SolarWinds
· VMware vCenter and vSphere
· vRealize Log Insight
· WebSphere MQ
· xMatters
The xMatters and Dynatrace Synthetic UI integrations are deprecated in 8.0.x. If you do not have either of these UI integrations installed, you can skip the section below.
Important
This section applies only to offline deployments that had a configured xMatters or Dynatrace Synthetic UI integration installed.
These steps must be run on the UI server only.
Download the following two files so that you can still view the integrations' configuration; the commands below place them in $MOOGSOFT_HOME/etc/integrations/deprecated.
· https://integrations-downloads.s3.amazonaws.com/deprecated/dynatrace_synthetic_lam.zip
· https://integrations-downloads.s3.amazonaws.com/deprecated/xMatters.zip
Run the following commands to ensure the old integrations can be safely removed:
mkdir -p $MOOGSOFT_HOME/etc/integrations/deprecated
chown -R moogsoft:moogsoft $MOOGSOFT_HOME/etc/integrations/deprecated
cp dynatrace_synthetic_lam.zip $MOOGSOFT_HOME/etc/integrations/deprecated
cp xMatters.zip $MOOGSOFT_HOME/etc/integrations/deprecated
Restart apache-tomcat:
· For non-root tarball deployments: $MOOGSOFT_HOME/bin/utils/process_cntl apache-tomcat restart
· For RPM deployments: service apache-tomcat restart
You can now uninstall the xMatters and Dynatrace Synthetic UI integrations from the UI.
Note
Run the following commands/tasks on the server that houses the UI components. See Upgrade Cisco Crosswork Situation Manager for a description of the component groups.
Run this utility to confirm that all Apache Tomcat files were deployed correctly in $MOOGSOFT_HOME:
$MOOGSOFT_HOME/bin/utils/tomcat_install_validator.sh
If there are webapp differences, run the following command to extract the webapps with the correct files:
$MOOGSOFT_HOME/bin/utils/moog_init_ui.sh -w
If you set the file_only_config property to false (or the property is missing) in $MOOGSOFT_HOME/config/moog_farmd.conf when you merged the latest configuration file changes, Moogfarmd will attempt to import the following items from your Moogfarmd configuration into the MoogDb database:
· Cookbooks (including value and bot Recipes)
· Tempus configurations
· The default Merge Group and any custom Merge Groups
After the import operation, Moogfarmd backs up the entire $MOOGSOFT_HOME/config folder to $MOOGSOFT_HOME/config_backup_720.tgz; the version number at the end varies depending on the version you are upgrading to. The import process does not remove any entries from the configuration files on disk. You can check that the import process completed successfully as follows:
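For reference, the file_only_config property is set in moog_farmd.conf. The fragment below is illustrative only; the exact nesting and surrounding properties in your merged file may differ:

config :
{
    # Illustrative fragment. Set to false (or omit the property) so that
    # Moogfarmd imports Sigalisers and Merge Groups into MoogDb on startup.
    file_only_config : false,
    ...
}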
Tempus configurations:
· The Cisco Crosswork Situation Manager UI now has a new UI System Settings panel for Tempus which displays the same settings that were used by your file-based Tempus configurations.
· If more than one Tempus is defined or referenced in your Moogfarmd configuration, they are all imported, but the UI is only designed to display the configuration for one. You can use the Graze API endpoint getTempus to confirm that the configuration is correct.
· Run the ha_cntl -v utility to confirm whether the correctly-named Tempus is running.
Cookbooks:
· Run the ha_cntl -v utility to confirm whether the correctly-named Cookbook is running.
· Run the new Graze API endpoints getCookbooks and getRecipes to check whether the Cookbooks and Recipes were imported successfully.
Default Merge Group:
· Run the new Graze API endpoint getDefaultMergeGroup to check whether the default merge group was successfully imported.
Custom Merge Groups:
· Run the new Graze API endpoint getMergeGroups to check whether the custom merge groups were imported successfully.
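You can call these endpoints with curl in the same way as the addTempus example elsewhere on this page. The sketch below assumes the default graze:graze credentials and a UI server on localhost; any required parameters may vary by endpoint:

# Illustrative only: replace the credentials and hostname for your deployment.
curl -k -u graze:graze "https://localhost/graze/v1/getCookbooks"
curl -k -u graze:graze "https://localhost/graze/v1/getRecipes"
curl -k -u graze:graze "https://localhost/graze/v1/getDefaultMergeGroup"
curl -k -u graze:graze "https://localhost/graze/v1/getMergeGroups"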
If the import was successful and the database-based Sigalisers and merge groups work as expected over a period of time, you can delete their entries in $MOOGSOFT_HOME/config/moog_farmd.conf. Moogfarmd will no longer attempt to import or run the Sigalisers or merge groups defined in the configuration file after the import.
If one or more of the configuration items did not successfully migrate or are not present at all:
· Confirm that the moog_farmd.conf property file_only_config is not set to true.
· Confirm that your Sigalisers and Merge Groups are defined in the upgraded moog_farmd.conf file, or imported via 'included' Moolet configuration files.
· Look for any startup errors in the Moogfarmd log file (the default log file is MOO.moog_farmd.log).
· You can attempt the import again by truncating the config_migration table in the MoogDb database. To do this, run the following command:
$MOOGSOFT_HOME/bin/utils/moog_mysql_client -e "truncate config_migration"
· Restart Moogfarmd.
· Re-validate the import and check the Moogfarmd log for any errors on startup.
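If you re-attempt the import, you can first confirm that the truncate took effect by querying the config_migration table with the same client used above (standard SQL):

$MOOGSOFT_HOME/bin/utils/moog_mysql_client -e "select count(*) from config_migration"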
Note
If you have already completed this step previously (as part of this upgrade process) on the current host, you can skip this step.
Run the Install Validator utility to ensure that all Cisco Crosswork Situation Manager files were deployed correctly in $MOOGSOFT_HOME:
$MOOGSOFT_HOME/bin/utils/moog_install_validator.sh
Note
Run the following command on the server that houses the core components. See Upgrade Cisco Crosswork Situation Manager for a description of the component groups.
Note
If you have already completed this step previously (as part of this upgrade process) on the current host, you can skip this step.
Run the Install Validator utility to ensure that all Cisco Crosswork Situation Manager files were deployed correctly in $MOOGSOFT_HOME:
$MOOGSOFT_HOME/bin/utils/moog_install_validator.sh
Note
Run the following command on the server that houses the data ingestion components. See Upgrade Cisco Crosswork Situation Manager for a description of the component groups.
Note
If you have already completed this step previously (as part of this upgrade process) on the current host, you can skip this step.
Run the Install Validator utility to ensure that all Cisco Crosswork Situation Manager files were deployed correctly in $MOOGSOFT_HOME:
$MOOGSOFT_HOME/bin/utils/moog_install_validator.sh
Note
Run the following command on the server that houses the database components. See Upgrade Cisco Crosswork Situation Manager for a description of the component groups.
Note
If you have already completed this step previously (as part of this upgrade process) on the current host, you can skip this step.
Run the Database Validator utility to validate the database schema:
$MOOGSOFT_HOME/bin/utils/moog_db_validator.sh
Note
Some schema differences are valid, for example those related to custom_info (such as new columns added).
An additional schema upgrade step is required if you are upgrading from v7.0.x, v7.1.x, or v7.2.x.
This additional step is documented on the Post-upgrade steps page.
Until you have completed this step, you should expect to see the following differences in the output of the Database Validator utility:
Differences found in 'historic_moogdb' tables:
41,49c41,43
< primary key (`alert_id`),
< unique key `idx_signature` (`signature`),
< key `idx_first_event_time` (`first_event_time`),
< key `idx_state_last` (`state`,`last_state_change`),
< key `idx_severity` (`severity`,`state`),
< key `idx_agent` (`agent`(12)),
< key `idx_source` (`source`(12)),
< key `idx_type` (`type`(12)),
< key `idx_manager` (`manager`(12))
---
> primary key (`signature`),
> key `alert_id` (`alert_id`),
> key `first_event_time` (`first_event_time`,`alert_id`)
93,94c87
< key `timestamp` (`timestamp`,`type`),
< key `idx_type_time` (`type`,`timestamp`)
---
> key `timestamp` (`timestamp`,`type`)
241,242c234
< key `sig_id` (`sig_id`,`action_code`,`timestamp`),
< key `idx_action_sig` (`action_code`,`sig_id`)
---
> key `sig_id` (`sig_id`,`action_code`,`timestamp`)
The differences above will not have any functional impact, but you must complete the rest of the upgrade to ensure the system is performant and the schema is ready for future upgrades.
If you have performed an upgrade and you see errors similar to the following:
Differences found in 'moogdb' tables:
57a58
> key 'filter_id' ('filter_id'),
194a196
> key 'enrichment_static_mappings_ibfk_1' ('eid'),
1196a1199
> key 'sig_id' ('sig_id'),
1325a1329
> key 'filter_id' ('filter_id'),
Run the following commands to resolve these index-related problems:
mysql moogdb -u root -e "alter table alert_filters_access drop key filter_id"
mysql moogdb -u root -e "alter table situation_filters_access drop key filter_id"
mysql moogdb -u root -e "alter table enrichment_static_mappings drop key enrichment_static_mappings_ibfk_1"
mysql moogdb -u root -e "alter table sig_stats_cache drop key sig_id"
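To confirm that the indexes were dropped, you can list the remaining keys on each table using standard MySQL syntax, for example:

mysql moogdb -u root -e "show index from alert_filters_access"
mysql moogdb -u root -e "show index from enrichment_static_mappings"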
A new install of Cisco Crosswork Situation Manager v8.0.x includes the Tempus clustering algorithm by default. You will need to add Tempus manually when you upgrade to v8.0.x if you did not already add Tempus to an earlier version of your Cisco Crosswork Situation Manager deployment.
Run the following command to add the default configuration of Tempus to Cisco Crosswork Situation Manager. Replace the following parameters' default values if required:
· run_on_startup: Set this value to true to enable Tempus.
You can also enable Tempus by going to System Settings in the UI after you have run this command.
· localhost: If you run the command from a different server to your UI server, replace localhost with the hostname of your UI server.
· graze:graze: Replace graze:graze with the credentials of a user with the grazer role in your deployment if needed.
· MaintenanceWindowManager: If the non-Tempus clustering algorithms do not process events from the Maintenance Window Manager Moolet, replace this default value with the Moolet that the non-Tempus clustering algorithms process events from.
See Clustering Algorithm Guide for more information on the non-Tempus clustering algorithms.
See Maintenance Window Manager for more information on this Moolet.
curl -X POST -u graze:graze -k "https://localhost/graze/v1/addTempus" -H "Content-Type: application/json; charset=UTF-8" --data '{
"name" : "Time Based (Tempus)",
"description" : "A Tempus Situation",
"run_on_startup" : false,
"entropy_threshold" : 0.0,
"threshold_type" : "global",
"process_output_of" : [ "MaintenanceWindowManager" ],
"execution_interval" : 60,
"window_size" : 1200,
"bucket_size" : 5,
"arrival_spread" : 15,
"minimum_arrival_similarity" : 0.6667,
"alert_threshold" : 4,
"pre_partition" : null,
"partition_by" : null,
"significance_test" : "Poisson1",
"significance_threshold" : 1,
"detection_algorithm" : "Louvain"
}'
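After running the addTempus command, you can confirm the stored configuration with the getTempus Graze endpoint mentioned earlier on this page. A sketch assuming the same credentials and host; any required parameters may differ in your deployment:

curl -k -u graze:graze "https://localhost/graze/v1/getTempus"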
See Time-based Clustering with Tempus for more information on this clustering algorithm.
Now that the upgrade is complete, follow the instructions in Post upgrade steps.
Follow these steps to perform the post-upgrade tasks for an upgrade to Cisco Crosswork Situation Manager v8.0.x from v7.0.x, v7.1.x, or 7.2.x.
Refer to Upgrade Cisco Crosswork Situation Manager for general information and upgrade instructions for other components and versions.
Note
Only perform this step if you are upgrading from Cisco Crosswork Situation Manager v7.0.x, v7.1.x or v7.2.x.
You must upgrade the historic database. This can be done after the main upgrade process has been completed and while the system is running.
The upgrade process runs in the background. This process is separate from the main schema upgrade because for very large databases it can take several hours to complete. You can upgrade the historic database either before or after you migrate from MySQL to Percona XtraDB Cluster.
The Events Analyser utility should not run at the same time as the migration utility. To temporarily disable the Events Analyser:
· Disable the events_analyser crontab:
(crontab -l | sed -e 's/^\(.*events_analyser.*\)$/#\1/') | crontab -
· Stop the events_analyser if it is running:
kill $(ps -ef | grep events_analyser | grep -v grep | grep java | awk '{print $2}')
To upgrade the historic database schema:
· Run the moog_historic_post_migration script with the valid arguments for your system, in a terminal dedicated to the process. Use the -h flag to see all available options. Allow the process time to finish; we recommend running this script during a period of low event throughput (for example, at midnight). For example:
$MOOGSOFT_HOME/bin/utils/moog_historic_post_migration.sh -H localhost -P 3306 -u root
This script attempts to add indexes to a number of tables to improve performance. Check the script output for errors; if you see any, contact Moogsoft Support.
· Validate that the migration is complete using the Database Validator utility:
$MOOGSOFT_HOME/bin/utils/moog_db_validator.sh
· Optionally, enable compression on the historic database snapshots table. Cisco recommends this if disk space is a concern.
This utility is documented here: Table Compression Utility
Note
This utility requires a number of Perl dependencies. Run the following commands to install them (root is required for the final 'yum' command):
for PACKAGE in perl-Compress-Raw-Bzip2-2.061-3.el7.x86_64.rpm perl-Compress-Raw-Zlib-2.061-4.el7.x86_64.rpm perl-DBD-MySQL-4.023-6.el7.x86_64.rpm perl-DBI-1.627-4.el7.x86_64.rpm perl-Data-Dumper-2.145-3.el7.x86_64.rpm perl-Digest-1.17-245.el7.noarch.rpm perl-Digest-MD5-2.52-3.el7.x86_64.rpm perl-IO-Compress-2.061-2.el7.noarch.rpm perl-Net-Daemon-0.48-5.el7.noarch.rpm perl-PlRPC-0.2020-14.el7.noarch.rpm perl-IO-Socket-IP-0.21-5.el7.noarch.rpm perl-IO-Socket-SSL-1.94-7.el7.noarch.rpm perl-Mozilla-CA-20130114-5.el7.noarch.rpm perl-Net-LibIDN-0.12-15.el7.x86_64.rpm perl-Net-SSLeay-1.55-6.el7.x86_64.rpm
do
curl -L -O http://mirror.centos.org/centos/7/os/x86_64/Packages/${PACKAGE};
done;
Then install the packages downloaded:
yum -y install *.rpm
After you have run these steps, you can re-run the Database Validator utility to confirm that the schemas are now in sync. See the Database Validator steps earlier on this page for details on how to run the utility.
You can now re-enable the Events Analyser utility using the command below. However, if your deployment does not use entropy, do not re-enable the Events Analyser, because it is a CPU and memory-intensive process.
(crontab -l | sed -e 's/^\#\+\(.*events_analyser.*\)/\1/') | crontab -
Cisco Crosswork Situation Manager uses Percona XtraDB Cluster which supports the Galera replication protocol to support high availability (HA). Percona XtraDB is similar to MySQL with improvements for HA, scalability, and usability.
Cisco strongly recommends you perform the migration from MySQL to Percona XtraDB Cluster and HAProxy. See Post-upgrade steps for more information.
To perform the migration, refer to the appropriate instructions, depending on your deployment type:
· Migrate from MySQL to Percona - RPM
· Migrate from MySQL to Percona - Tarball
Download and deploy the latest version of the Cisco Add-ons. See Install Cisco Add-ons for more information.
The following topics describe the available health and performance indicators included with the Cisco Crosswork Situation Manager system. They also provide some guidance on how to monitor your system and how to troubleshoot performance problems.
Note
For the locations of specific installation and log files, see Configure Logging.
· Monitor Component Performance
· Monitor Moogfarmd Data Processing Performance
· Monitor Moogfarmd Health Logs
· Monitor Component CPU and Memory Usage
· Monitor System Performance Metrics
· Monitor RabbitMQ Message Bus Performance
The following topics provide troubleshooting advice and guidance:
· Troubleshoot Installation and Upgrade
· Troubleshoot Slow Alert/Situation Creation
· Troubleshoot Required Services for a Functional Production System
Cisco Crosswork Situation Manager features the ability to ingest large amounts of event data from various sources, process the data using configurable logic, and display the data to multiple concurrent users. This document outlines the various system components and how their interactions can impact system performance. It includes performance tuning suggestions where applicable.
To learn about opportunities to plan your implementation for increased performance capabilities, see Scale Your Cisco Crosswork Situation Manager Implementation.
For information on monitoring your system performance and handling performance issues, see Monitor and Troubleshoot Cisco Crosswork Situation Manager.
Cisco Crosswork Situation Manager comprises several components which have tuning and configuration options available:
· Integrations and LAMs that listen or poll for data, parse and encode them into discrete events, and then pass the events to the Message Bus.
· The Message Bus (RabbitMQ) that receives published messages from integrations and LAMs. It publishes messages destined for data processing (Moogfarmd) and the web application server.
· The system datastore (MySQL) that handles transactional data from other parts of the system: integrations and LAMs, data processing, and the web application server.
· The data processing component (Moogfarmd), an application that consumes messages from the Message Bus. It processes event data in a series of servlet-like modules called Moolets. Moogfarmd reads and writes to the database and publishes messages to the bus. See Monitor Moogfarmd Health Logs for more details.
· The web application server (Apache Tomcat) that reads and writes to the bus and the database.
The diagram below shows the general data flow of the components:
Other components include:
A proxy (Nginx) for the web application server and for integrations. See the Nginx documentation for more information.
The search engine (Elasticsearch) for the UI that indexes documents from the indexer Moolet in the data processing series. It returns search results to Apache Tomcat. See the Elasticsearch documentation for more information.
Event data enters the system via integrations and LAMs. Integrations and LAMs running on a large system can normally process up to 10,000 events per second and publish them to the Message Bus at an equivalent rate. Integrations can buffer events under event storm conditions. The following factors affect the capacity to process events:
· CPU clock speed and number of available cores.
· Threads setting.
· Complexity of the LAMbot logic.
· Whether you have enabled "guaranteed delivery settings". For example, rest_response_mode for the REST LAM.
· Log level. For example, DEBUG is the slowest.
You can specify a value for the number of threads in the LAM's configuration file to control the number of active threads for the integration. To tune the socket LAM, for example, edit socket_lam.conf. Increasing the number of threads can improve ingestion performance. However, it also results in higher CPU usage and may cause internal buffering. Buffering increases memory usage until the buffer is cleared.
RabbitMQ is very lightweight and, in all known cases, has been able to process the incoming event rate from Integrations. Refer to the RabbitMQ documentation for its performance tuning options.
Manage and tune your MySQL instance as you would any other database system in your enterprise. In addition to the standard tuning options in the MySQL documentation, consider the following recommendations for settings in /etc/my.cnf:
· On servers with >= 16 GB RAM that run MySQL and Cisco Crosswork Situation Manager applications, set innodb-buffer-pool-size to 50% of system RAM.
· On servers where only MySQL is running, set innodb-buffer-pool-size to 80% of system RAM.
· If innodb-buffer-pool-size > 8 GB, increase innodb-buffer-pool-instances to divide the buffer pool into chunks of at least 1 GB each, up to the maximum supported value of 64 instances. For example, if your buffer pool size is 64 GB:
innodb-buffer-pool-size=64G
innodb_buffer_pool_instances=64
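After editing /etc/my.cnf and restarting MySQL, you can verify the active values with standard MySQL syntax (the root credentials here are an assumption; use an account appropriate to your deployment):

mysql -u root -e "show variables like 'innodb_buffer_pool%'"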
Integrations and LAMs, Moogfarmd, and Tomcat are all Java processes, so you can tune the JVM memory allocation pool settings to optimize performance. The -Xmx setting defines the maximum allowed Java heap size of the process. The default memory allocation for a Java process is one quarter of the server's RAM.
For LAMs, integrations and Moogfarmd, you can add the -Xmx argument to the line in $MOOGSOFT_HOME/bin/<lam name> or $MOOGSOFT_HOME/bin/moogfarmd where the JVM is launched.
For example, to set the maximum Java heap size for the Moogfarmd process to 16 GB, add "-Xmx16g" to the java_vm command line in $MOOGSOFT_HOME/bin/moogfarmd as follows:
#Run app
$java_vm -server -Xmx16g -DprocName=$proc_name -DMOOGSOFT_HOME=$MOOGSOFT_HOME -classpath $java_classpath $java_main_class "$@" &
For Tomcat, the default setting is 2 GB. If you need to change it, edit the service script /etc/init.d/apache-tomcat.
MySQL Tuner provides useful diagnostics and recommendations on MySQL settings. See MySQL Tuner for more information.
To monitor the CPU and memory usage of the running components of a Cisco Crosswork Situation Manager system, you can run the following script that offers simple CPU and memory monitoring of the RabbitMQ, Socket LAM, Moogfarmd, Tomcat and MySQL processes:
#!/bin/bash
SLEEPTIME=$1
f_return_metrics() {
PROCPID=$1
TOPOUTPUT=`top -p $PROCPID -n1 | tail -2 | head -1 |sed 's/[^ ]\+\s\(.*\)/\1/g'`
PROCICPU=`echo $TOPOUTPUT| awk '{print $8}'`
if [ "$PROCICPU" == "S" ]; then PROCICPU=`echo $TOPOUTPUT| awk '{print $9}'`;fi
PROCPCPU=`ps -p $PROCPID -o pcpu|tail -1|awk '{print $1}'`
PROCMEM=`ps -p $PROCPID -o rss|tail -1|awk '{print $1}'`
echo $PROCICPU,$PROCPCPU,$PROCMEM
}
#Capture PIDs
RABBITPID=`ps -ef|grep beam|grep -v grep|awk '{print $2}'`
LAMPID=`ps -ef|grep socket_lam|grep java|grep -v grep|awk '{print $2}'`
MYSQLPID=`ps -ef|grep mysqld|grep -v mysqld_safe|grep -v grep|awk '{print $2}'`
TOMCATPID=`ps -ef|grep tomcat|grep java|grep -v grep|awk '{print $2}'`
FARMDPID=`ps -ef|grep moog_farmd|grep java|grep -v grep|awk '{print $2}'`
echo "DATE,TIME,RABBITICPU(%),RABBITPCPU(%),RABBITRSS(Kb),LAMICPU(%),LAMPCPU(%),LAMRSS(Kb),FARMDICPU(%),FARMDPCPU(%),FARMDRSS(Kb),TOMCATICPU(%),TOMCATPCPU(%),TOMCATRSS(Kb),MYSQLICPU(%),MYSQLPCPU(%),MYSQLRSS(Kb)"
while [ true ]; do
DATENOW=`date +"%m-%d-%y"`
TIMENOW=`date +"%T"`
RABBITMEAS=$(f_return_metrics $RABBITPID)
LAMMEAS=$(f_return_metrics $LAMPID)
FARMDMEAS=$(f_return_metrics $FARMDPID)
TOMCATMEAS=$(f_return_metrics $TOMCATPID)
MYSQLMEAS=$(f_return_metrics $MYSQLPID)
echo "$DATENOW,$TIMENOW,$RABBITMEAS,$LAMMEAS,$FARMDMEAS,$TOMCATMEAS,$MYSQLMEAS"
sleep $SLEEPTIME
done
Example usage and output:
[root@ldev04 640]# ./perfmon.sh 5
DATE,TIME,RABBITICPU(%),RABBITPCPU(%),RABBITRSS(Kb),LAMICPU(%),LAMPCPU(%),LAMRSS(Kb),FARMDICPU(%),FARMDPCPU(%),FARMDRSS(Kb),TOMCATICPU(%),TOMCATPCPU(%),TOMCATRSS(Kb),MYSQLICPU(%),MYSQLPCPU(%),MYSQLRSS(Kb)
05-10-18,22:44:26,28.0,8.5,203068,2.0,1.0,557092,20.0,13.5,2853408,4.0,2.1,5680584,28.0,17.4,9657152
05-10-18,22:44:34,14.0,8.5,183492,4.0,1.0,557092,16.0,13.5,2850484,0.0,2.1,5680584,33.9,17.4,9657152
05-10-18,22:44:43,0.0,8.5,181072,0.0,1.0,557092,0.0,13.5,2850484,0.0,2.1,5680584,4.0,17.4,9658312
05-10-18,22:44:51,12.0,8.5,181040,0.0,1.0,557092,0.0,13.5,2850484,0.0,2.1,5680584,4.0,17.4,9658312
05-10-18,22:44:59,0.0,8.5,181040,0.0,1.0,557092,0.0,13.4,2850484,0.0,2.1,5680584,0.0,17.4,9658312
Notes:
· The script only outputs to the console, so you should redirect the output to a file for logging results.
· Output is in csv format.
· ICPU = "Instantaneous CPU Usage (%)"
· PCPU = "Percentage of CPU usage since process startup (%)"
· RSS = "Resident Set Size i.e. Memory Usage in Kb"
· For CPU measurements, a value of 100% represents all of one processor, so results greater than 100% are possible for multi-threaded processes.
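Because the output is CSV, a captured log lends itself to quick analysis with standard tools. A small sketch using two sample rows from the output above (the file name perfmon.csv is an assumption) that averages the Moogfarmd resident set size:

```shell
#!/bin/bash
# Write two sample data rows in the CSV layout produced by the monitoring
# script; FARMDRSS(Kb) is the 11th comma-separated column.
cat > perfmon.csv <<'EOF'
DATE,TIME,RABBITICPU(%),RABBITPCPU(%),RABBITRSS(Kb),LAMICPU(%),LAMPCPU(%),LAMRSS(Kb),FARMDICPU(%),FARMDPCPU(%),FARMDRSS(Kb),TOMCATICPU(%),TOMCATPCPU(%),TOMCATRSS(Kb),MYSQLICPU(%),MYSQLPCPU(%),MYSQLRSS(Kb)
05-10-18,22:44:26,28.0,8.5,203068,2.0,1.0,557092,20.0,13.5,2853408,4.0,2.1,5680584,28.0,17.4,9657152
05-10-18,22:44:34,14.0,8.5,183492,4.0,1.0,557092,16.0,13.5,2850484,0.0,2.1,5680584,33.9,17.4,9657152
EOF
# Skip the header row, then average column 11 (Moogfarmd RSS in Kb).
awk -F, 'NR > 1 { sum += $11; n++ } END { printf "avg FARMDRSS(Kb): %.0f\n", sum / n }' perfmon.csv
```

The same pattern works for any other column; change `$11` to the field you want to summarize.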
Cisco Crosswork Situation Manager features the ability to print the current state of the database pool in both Moogfarmd and Apache Tomcat. You can use this information to diagnose problems with slow event processing or UI response.
To trigger logging, run the ha_cntl utility and pass the cluster name using the -i argument. For example:
ha_cntl -i MOO
The utility displays a confirmation prompt:
This will perform task "diagnostics" all groups within the [MOO] cluster.
Diagnostics results will be in the target process log file.
Are you sure you want to continue? (y/N)
The ha_cntl utility triggers logging to the Moogfarmd log file, for example /var/log/moogsoft/MOO.moog_farmd.log. An example log entry in a performant system is as follows:
WARN : [pool-1-][20180511 10:06:07.690 +0100] [CDbPool.java]:792 +|[farmd] DATABASE POOL DIAGNOSTICS:|+
WARN : [pool-1-][20180511 10:06:07.690 +0100] [CDbPool.java]:793 +|[farmd] Pool created at [20180510 17:54:48.911 +0100].|+
WARN : [pool-1-][20180511 10:06:07.690 +0100] [CDbPool.java]:797 +|[farmd] [2] invalid connections have been removed during the lifetime of the pool.|+
WARN : [pool-1-][20180511 10:06:07.690 +0100] [CDbPool.java]:833 +|[farmd] Pool size is [10] with [10] available connections and [0] busy.|+
The ha_cntl utility also triggers logging to /usr/share/apache-tomcat/logs/catalina.out. For example:
WARN : [0:MooMS][20180511 10:06:07.690 +0100] [CDbPool.java]:792 +|[SituationSimilarity] DATABASE POOL DIAGNOSTICS:|+
WARN : [0:MooMS][20180511 10:06:07.690 +0100] [CDbPool.java]:793 +|[SituationSimilarity] Pool created at [20180510 17:55:04.262 +0100].|+
WARN : [3:MooMS][20180511 10:06:07.690 +0100] [CDbPool.java]:792 +|[MoogPoller] DATABASE POOL DIAGNOSTICS:|+
WARN : [3:MooMS][20180511 10:06:07.690 +0100] [CDbPool.java]:793 +|[MoogPoller] Pool created at [20180510 17:55:01.990 +0100].|+
WARN : [0:MooMS][20180511 10:06:07.690 +0100] [CDbPool.java]:833 +|[SituationSimilarity] Pool size is [5] with [5] available connections and [0] busy.|+
WARN : [3:MooMS][20180511 10:06:07.691 +0100] [CDbPool.java]:833 +|[MoogPoller] Pool size is [10] with [10] available connections and [0] busy.|+
WARN : [1:MooMS][20180511 10:06:07.693 +0100] [CDbPool.java]:792 +|[ToolRunner] DATABASE POOL DIAGNOSTICS:|+
WARN : [1:MooMS][20180511 10:06:07.694 +0100] [CDbPool.java]:793 +|[ToolRunner] Pool created at [20180510 17:55:00.183 +0100].|+
WARN : [1:MooMS][20180511 10:06:07.694 +0100] [CDbPool.java]:792 +|[MoogSvr : priority] DATABASE POOL DIAGNOSTICS:|+
WARN : [1:MooMS][20180511 10:06:07.694 +0100] [CDbPool.java]:833 +|[ToolRunner] Pool size is [5] with [5] available connections and [0] busy.|+
WARN : [1:MooMS][20180511 10:06:07.694 +0100] [CDbPool.java]:793 +|[MoogSvr : priority] Pool created at [20180510 17:54:56.800 +0100].|+
WARN : [1:MooMS][20180511 10:06:07.695 +0100] [CDbPool.java]:797 +|[MoogSvr : priority] [5] invalid connections have been removed during the lifetime of the pool.|+
WARN : [1:MooMS][20180511 10:06:07.695 +0100] [CDbPool.java]:833 +|[MoogSvr : priority] Pool size is [25] with [25] available connections and [0] busy.|+
WARN : [1:MooMS][20180511 10:06:07.695 +0100] [CDbPool.java]:792 +|[MoogSvr : normal priority] DATABASE POOL DIAGNOSTICS:|+
WARN : [1:MooMS][20180511 10:06:07.695 +0100] [CDbPool.java]:793 +|[MoogSvr : normal priority] Pool created at [20180510 17:54:56.877 +0100].|+
WARN : [1:MooMS][20180511 10:06:07.695 +0100] [CDbPool.java]:833 +|[MoogSvr : normal priority] Pool size is [50] with [50] available connections and [0] busy.|+
In both of these examples, the connections are "available" and none show as busy. However, in a busy system with flagging performance, the Moogfarmd log shows different results. In the example below, all connections are busy and have been held for a long time. This type of critical issue causes Moogfarmd to stop processing:
WARN : [pool-1-][20180309 16:49:30.031 +0000] [CDbPool.java]:827 +|[farmd] Pool size is [10] with [0] available connections and [10] busy.|+
WARN : [pool-1-][20180309 16:49:30.031 +0000] [CDbPool.java]:831 +|The busy connections are as follows:
1: Held by 5:SituationMgrLOGFILECOOKBOOK for 173603 milliseconds. Checked out at [CArchiveConfig.java]:283.
2: Held by 7:SituationMgrSYSLOGCOOKBOOK for 173574 milliseconds. Checked out at [CArchiveConfig.java]:283.
3: Held by 8:SituationMgrSYSLOGCOOKBOOK for 173658 milliseconds. Checked out at [CArchiveConfig.java]:283.
4: Held by 9:SituationMgrSYSLOGCOOKBOOK for 173477 milliseconds. Checked out at [CArchiveConfig.java]:283.
5: Held by 8:TeamsMgr for 173614 milliseconds. Checked out at [CArchiveConfig.java]:283.
6: Held by 4:SituationMgrSYSLOGCOOKBOOK for 173514 milliseconds. Checked out at [CArchiveConfig.java]:283.
7: Held by 5:PRC Request Assign - SituationRootCause for 173485 milliseconds. Checked out at [CArchiveConfig.java]:283.
8: Held by 2:SituationMgrSYSLOGCOOKBOOK for 173661 milliseconds. Checked out at [CArchiveConfig.java]:283.
9: Held by 6:SituationMgrSYSLOGCOOKBOOK for 173631 milliseconds. Checked out at [CArchiveConfig.java]:283.
10: Held by 6:TeamsMgr for 172661 milliseconds. Checked out at [CArchiveConfig.java]:283.|+
It is expected that some connections will occasionally be busy; as long as they are not held for long periods of time, the system functions normally.
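As an illustrative sketch, a short shell filter can flag long-held connections in "Held by" log lines like those above. The threshold here is arbitrary, not a product default, and the heredoc stands in for real log lines:

```shell
#!/bin/bash
# Illustrative sketch: flag database connections held longer than THRESHOLD_MS.
# The threshold is arbitrary, not a product default. In practice, feed real
# lines in, e.g.:  grep "Held by" /var/log/moogsoft/moog_farmd.log
THRESHOLD_MS=60000
LONG_HOLDS=0
while read -r line; do
  # Pull the hold time in milliseconds out of the log line.
  ms=$(printf '%s\n' "$line" | sed -n 's/.*for \([0-9][0-9]*\) milliseconds.*/\1/p')
  if [ -n "$ms" ] && [ "$ms" -gt "$THRESHOLD_MS" ]; then
    echo "LONG HOLD: $line"
    LONG_HOLDS=$((LONG_HOLDS + 1))
  fi
done <<'EOF'
1: Held by 5:SituationMgrLOGFILECOOKBOOK for 173603 milliseconds. Checked out at [CArchiveConfig.java]:283.
2: Held by 3:TeamsMgr for 1200 milliseconds. Checked out at [CArchiveConfig.java]:283.
EOF
echo "Connections held over ${THRESHOLD_MS}ms: $LONG_HOLDS"
```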
You can use the following bash script to automatically gather database pool diagnostics:
#!/bin/bash
# Get the cluster name
CLUSTER=$($MOOGSOFT_HOME/bin/utils/moog_config_reader -k ha.cluster)
# Record the current length of each log so only new lines are printed later
FARMLINES=$(wc -l < /var/log/moogsoft/moogfarmd.log)
TOMLINES=$(wc -l < /usr/share/apache-tomcat/logs/catalina.out)
# Run ha_cntl -i <cluster> to trigger the diagnostics
ha_cntl -i "$CLUSTER" -y > /dev/null
sleep 5
# Print the results
echo "moog_farmd:"
tail -n +$((FARMLINES + 1)) /var/log/moogsoft/moogfarmd.log | egrep "CDbPool|Held by"
echo "tomcat:"
tail -n +$((TOMLINES + 1)) /usr/share/apache-tomcat/logs/catalina.out | egrep "CDbPool|Held by"
To run the script, make it executable and then execute it:
chmod +x get_dbpool_diag.sh
./get_dbpool_diag.sh
See HA Control Utility Command Reference for more information on the utility.
The getSystemStatus endpoint returns useful information about running processes within the system. For example:
curl -u graze:graze -k "https://localhost/graze/v1/getSystemStatus"
See Graze API for more detail.
The CFarmdHealth class in Moogfarmd logs detailed health information in JSON format once a minute. The log file provides the following information:
· totals: running totals since Moogfarmd was started.
· interval_totals: running totals for the last 60-second interval.
· current_state: a snapshot of the important queues in Moogfarmd.
· garbage_collection: JVM garbage collection data.
· JVM_memory: JVM memory usage data.
· message_queues: queue usage and capacity.
· locked_thread_count: the total number of locked threads.
· total_thread_count: the total number of threads.
· top_blocking_query: The query that is blocking the most other queries in the database. Only logged if there are blocking queries in the database at the time.
· top_blocking_query_count: The number of other queries being blocked by the top blocking query. Only logged if there are blocking queries in the database at the time.
For example, you can search on "CFarmdHealth" in the Moogfarmd log to view health messages:
WARN : [0:HLog][20190730 14:48:28.524 +0100] [CFarmdHealth.java:566] +|{"db_stats":
{"top_blocking_query_count":12,"locked_thread_count":18,"total_thread_count":35,"top_blocking_query":"DELETE FROM notification WHERE sig_id = i_sig_id"}, "garbage_collection":{"total_collections_time":12827,"last_minute_collections":0,"last_minute_collections_time":0,"total_collections":1244},"current_state":{"pending_changed_situations":0,"total_in_memory_situations":4764,"situations_for_resolution":0,"event_processing_metric":0.047474747474747475,"message_queues":{"AlertBuilder":0,"TeamsMgr":0,"Housekeeper":0,"Indexer":0,"bus_thread_pool":0,"Cookbook3":0,"Cookbook1":0,"SituationMgr":0,"SituationRootCause":0,"Cookbook2":0},"in_memory_entropies":452283,"cookbook_resolution_queue":0,"total_in_memory_priority_situations":0,"active_async_tasks_count":0},"interval_totals":{"created_events":1782,"created_priority_situations":0,"created_external_situations":0,"created_situations":10,"messages_processed":{"TeamsMgr":182,"Housekeeper":0,"AlertBuilder":1782,"Indexer":2082,"Cookbook3":1782,"SituationRootCause":172,"Cookbook1":1782,"SituationMgr":172,"Cookbook2":1782},"alerts_added_to_priority_situations":0,"alerts_added_to_situations":111,"situation_db_update_failure":0},"JVM_memory":{"heap_used":1843627096,"heap_committed":3007840256,"heap_init":2113929216,"nonheap_committed":66912256,"heap_max":28631367680,"nonheap_init":2555904,"nonheap_used":64159032,"nonheap_max":-1},"totals":{"created_events":453252,"created_priority_situations":0,"created_external_situations":0,"created_situations":4764,"alerts_added_to_priority_situations":0,"alerts_added_to_situations":36020,"situation_db_update_failure":0}}|+
The message_queues block contains string values and queue limits. "-" represents an unlimited queue. An example message_queues block is as follows:
"message_queues":{"AlertBuilder":"0/-","Cookbook":"0/-","Housekeeper":"0/-","Indexer":"0/-","bus_thread_pool":"0/-","SituationMgr":"0/-"}
In a healthy system that is processing data:
· The count of created events and created Situations increases.
· The messages_processed shows that Moolets are processing messages.
· The current_state.message_queues does not accumulate (there may be spikes).
· The total_in_memory_situations count increases over time but reduces periodically due to the retention_period setting.
· The situation_db_update_failure count remains zero.
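As a rough sketch, you can pull a single metric out of a CFarmdHealth entry with standard shell tools. The sample line below is abridged from the example above; against a live system you would grep the latest entry from the Moogfarmd log instead:

```shell
#!/bin/bash
# Sketch: extract one metric from a CFarmdHealth health entry. The sample
# line is abridged from the documentation example; on a live system use
# something like:  grep CFarmdHealth /var/log/moogsoft/moog_farmd.log | tail -1
LINE='WARN : [0:HLog][20190730 14:48:28.524 +0100] [CFarmdHealth.java:566] +|{"totals":{"created_events":453252,"situation_db_update_failure":0}}|+'
# Pull the situation_db_update_failure value out of the JSON payload.
FAILURES=$(printf '%s\n' "$LINE" | sed -n 's/.*"situation_db_update_failure":\([0-9][0-9]*\).*/\1/p')
echo "situation_db_update_failure: $FAILURES"
```

In a healthy system this value stays at zero.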
The data processing component for Cisco Crosswork Situation Manager, Moogfarmd, is the most complex and configurable system component. It offers a range of performance capabilities depending on which Moolets you use and the workload for those Moolets. The following factors affect Moogfarmd performance:
· Incoming event rate from integrations and LAMs.
· CPU clock speed and number of available cores.
· Available memory and -Xmx setting of Moogfarmd process.
· Top-level and per-Moolet threads setting.
· Number of Moolets and their complexity and/or interaction with external services.
· Database load from other parts of the system, for example API requests.
· Incoming messages from the Message Bus or other parts of the system.
Moogfarmd buffers messages in message storm conditions. Each Moolet has its own message queue that enables it to handle message storms and process the backlog once the storm has cleared.
You can configure thread allocation for Moogfarmd in the $MOOGSOFT_HOME/config/moog_farmd.conf file as follows:
· The top-level threads property controls the following:
— The default number of threads for each Moolet, unless you specify a setting for a particular Moolet.
— The size of the database pool for Moogfarmd-to-database connections.
· The per-Moolet threads property allows individual control of the number of threads for a particular Moolet.
Increasing either setting can improve processing performance but will likely increase CPU and memory usage. Too many threads can overload the database with connections or transactions and impact other areas of the system. For example, increasing the number of threads for the Alert Builder Moolet can improve the event processing rate, but it increases load on the database and can potentially cause deadlocks.
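A minimal sketch of the two levels in moog_farmd.conf is shown below. The values are illustrative only, and the surrounding structure of your shipped file may differ by version:

```
{
    # Top-level default: thread count per Moolet, and the size of the
    # Moogfarmd database connection pool (illustrative value).
    threads : 10,

    moolets :
    [
        {
            name    : "AlertBuilder",
            # Per-Moolet override of the top-level threads setting.
            threads : 4
        }
    ]
}
```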
Alert Builder Moolet
The Alert Builder is the main performance gate for Moogfarmd because it interacts with MySQL the most. In simple configurations with a tuned MySQL database and no other load, you can increase the number of Alert Builder threads to process up to 2,500 events per second and write them to the database at an equivalent rate. The following graph illustrates the performance impact of adding Alert Builder threads for a Moogfarmd instance running only the Alert Builder.
This scenario does not account for other database load, other Moolets, or any custom logic added to the Alert Builder Moobot. In a real-world deployment, expect event processing to run at about half this rate.
Sigalisers
Cisco Crosswork Situation Manager clustering algorithms, or Sigalisers, employ complex calculations. Depending on its settings, a Sigaliser can account for a large share of the processing time and memory within Moogfarmd. It is impossible to predict a processing rate for these algorithms because they vary greatly according to configuration and workload. Sigalisers normally add little load to the database, except during a burst of Situation creation. Moogfarmd retains previously created active Situations in memory according to the retention_period setting in the Moogfarmd configuration file, so you can expect Moogfarmd memory usage to grow under a high rate of Situation generation.
Other Moolets
The performance of other Moolets varies based upon configuration and the rate at which they receive messages to process. Moolets that interact with external services may introduce processing delay to Moogfarmd when there is network or processing latency associated with the external service.
Web Application Server Performance
The Apache Tomcat servlets provide the backend for the Cisco Crosswork Situation Manager UI which drives the end-user experience. Scalability tests show that a single Tomcat instance can support up to 500 concurrent UI users before response times degrade. Tomcat performance depends on the following factors:
· Incoming event rate from integrations and LAMs.
· Incoming messages from other parts of the system, such as Moogfarmd.
· CPU clock speed and number of available cores.
· Available memory and -Xmx setting of the Tomcat process.
· Database load from other parts of the system.
· Number and complexity of alert and Situation filters being used.
· Activities of the users.
To provide quicker load times for users, the UI caches filtered views. Tomcat writes to the Message Bus to cope with event or update storms.
The db_connections and priority_db_connections settings in $MOOGSOFT_HOME/config/servlets.conf control the size of the database connection pool that Tomcat uses to connect to MySQL. You can increase either setting to potentially improve UI performance. Exercise caution when changing these values: increases typically raise CPU and memory usage of both the Tomcat and database processes, and too many connections can overload the database with transactions, which impacts other areas of the system.
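A minimal sketch of the relevant servlets.conf settings is shown below. The values mirror the pool sizes in the catalina.out example earlier in this topic and are illustrative, not recommendations:

```
{
    # Pool for normal-priority UI and Graze requests (illustrative value).
    db_connections : 50,

    # Separate pool reserved for high-priority requests (illustrative value).
    priority_db_connections : 25
}
```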
RabbitMQ includes an admin UI that gives performance information about the message bus. By default this is accessible via http://hostname:15672 with credentials moogsoft/m00gs0ft. Check for the following scenarios:
· Early warning of any system resource issues in the "Nodes" section on the Overview page. For example, file/socket descriptors, Erlang processes, memory and disk space.
· A build-up of "Ready" messages in a queue. This indicates that a message queue is forming and that the associated Cisco Crosswork Situation Manager process is not consuming messages from it. It could also point to an orphaned queue that no longer has a consumer, which can happen if message_persistence is enabled in system.conf and Moogfarmd or Tomcat has been reconfigured with a different HA process group name.
See the RabbitMQ docs for information on how to use the admin UI.
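You can also sketch a queue-backlog check from the command line. The heredoc below stands in for real output from `rabbitmqctl list_queues name messages_ready`; the queue names are hypothetical examples:

```shell
#!/bin/bash
# Sketch: flag queues with "Ready" messages. The heredoc stands in for real
# output from:  rabbitmqctl list_queues name messages_ready
# The queue names below are hypothetical examples.
BACKLOGGED=0
while read -r name ready; do
  case "$ready" in
    ''|*[!0-9]*) continue ;;   # skip headers and non-numeric rows
  esac
  if [ "$ready" -gt 0 ]; then
    echo "Backlog on queue '$name': $ready ready messages"
    BACKLOGGED=$((BACKLOGGED + 1))
  fi
done <<'EOF'
Listing queues ...
alerts.queue 0
events.queue 1523
EOF
echo "Queues with a backlog: $BACKLOGGED"
```

A steadily growing "Ready" count for the same queue across runs is the signal to investigate the consumer.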
Navigate to System Settings > Self Monitoring > Processing Metrics to see a breakdown of the current state of the system based on the metrics received from the running components.
· The Moogfarmd process and all LAMs publish detailed performance information.
· A bullet chart at the top of the page shows the key performance metric for the system: Current Maximum Event Processing Time. The defined performance ranges are color coded: good (green), marginal (yellow) and bad (red). As the metric changes the bullet chart updates to reflect good, marginal or bad performance.
· The system calculates Current Maximum Event Processing Time as the approximate 95th percentile of the current maximum time, in seconds, that it takes for an event to travel through the system, from its arrival at a LAM until its final processing by a Moolet in Moogfarmd.
· By default, the AlertBuilder, AlertRulesEngine and all Sigalisers are used to calculate the Current Maximum Event Processing Time metric.
· You can configure the metric_path_moolet property in moog_farmd.conf to specify the Moolets to use to calculate Current Maximum Event Processing Time.
By default, the good, marginal and bad ranges of the bullet chart are set to 0-10 secs, 10-15 secs and 15-20 secs respectively. You can change this configuration in the eventProcessingTimes section in the portal block of $MOOGSOFT_HOME/ui/html/web.conf.
Good performance means LAMs are consuming and publishing events without problem as indicated by:
· Message Queue Size is 0.
· Socket Backlog (if relevant) is not increasing.
Additionally, Moogfarmd is consuming and processing events successfully as indicated by all of:
· Total Abandoned Messages is 0 for the majority of the time.
· Asynchronous Task Queue Size is 0 for the majority of the time.
· Cookbook Resolution Queue is 0 for the majority of the time.
· Message backlogs for all Moolets are 0 for the majority of the time.
· The Messages Processed count is the same for all running Moolets (unless custom configuration causes event routing through different Moolets), i.e. no Moolet is falling behind.
The above should lead to a stable low Current Maximum Event Processing Time depending on the complexity of the system.
Marginal or Bad performance means LAMs are not consuming and publishing events at the rate at which they receive them, as indicated by:
· Message Queue Size is > 0 and likely increasing.
· Socket Backlog is increasing.
Additionally, Moogfarmd is not consuming and processing events in a timely fashion as indicated by some or all of:
· Total Abandoned Messages is constantly > 0 and likely increasing.
· Asynchronous Task Queue Size is > 0 and likely increasing.
· Cookbook Resolution Queue is constantly > 0 and likely increasing.
· Message backlogs for all Moolets is constantly > 0 and likely increasing.
· The Messages Processed count is not the same for all running Moolets, indicating that some Moolets are falling behind. This does not apply where custom configuration routes events through different Moolets.
The above will likely lead to an unstable high Current Maximum Event Processing Time depending on the complexity of the system.
See Self Monitoring for more detail.
Tomcat writes counter information from each of the main servlets to its catalina.out once a minute.
Example output:
WARN : [Thread-][20180510 20:57:05.501 +0100] [CReporterThread.java]:136 +|MoogPoller read [16722] MooMs messages in the last [60] seconds.|+
WARN : [Thread-][20180510 20:57:07.169 +0100] [CReporterThread.java]:136 +|Graze handled [55] requests in the last [60] seconds.|+
WARN : [Thread-][20180510 20:57:10.181 +0100] [CReporterThread.java]:136 +|MoogSvr handled [86] requests in the last [60] seconds.|+
WARN : [Thread-][20180510 20:58:03.197 +0100] [CReporterThread.java]:136 +|Situations similarity component calculated similarities for [264] situations in the last [60] seconds.|+
The counters are:
· Number of MoogSvr requests in the last minute (i.e. number of standard UI requests made).
· Number of Moogpoller MooMs messages in the last minute (i.e. number of messages read from the bus).
· Number of Graze requests in the last minute.
· Number of similar Situations calculated in the last minute.
In a healthy system that is processing data:
· The Moogpoller count should always be non-zero.
· The MoogSvr and Graze counters may be zero, but should reflect the amount of UI and Graze activity.
· The similar Situations counter may be zero but should reflect the number of similar Situations that are occurring in the system.
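As a quick check, you can extract a counter from one of these lines with a short shell filter. The sample line below is copied from the example output above; against a live system you would grep catalina.out for CReporterThread:

```shell
#!/bin/bash
# Sketch: pull the last-minute MoogPoller count out of a servlet counter line.
# The sample line is copied from the documentation example; on a live system
# grep /usr/share/apache-tomcat/logs/catalina.out for CReporterThread.
LINE='WARN : [Thread-][20180510 20:57:05.501 +0100] [CReporterThread.java]:136 +|MoogPoller read [16722] MooMs messages in the last [60] seconds.|+'
# Extract the bracketed message count that follows "read".
COUNT=$(printf '%s\n' "$LINE" | sed -n 's/.*read \[\([0-9][0-9]*\)\] MooMs.*/\1/p')
echo "MoogPoller messages in the last minute: $COUNT"
```

A count of zero on a live, data-processing system would warrant investigation, per the guidance above.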
Administrators can use Self Monitoring to view the status, health, and processing metrics of the Cisco Crosswork Situation Manager processes. The different tabs show the state of Processing Metrics, Event Processing, Web Services, Event Ingestion and Message Bus.
Heartbeats are one of the key concepts in Self Monitoring. A heartbeat is an internal message sent by a process every 10 seconds to inform Self Monitoring that it is still running.
All data displayed in this screen is live and updates continually.
The table below describes the possible states for a package:
Icon |
Description |
Green circle with a white check. |
The process is running (reserved or unreserved*). |
Yellow circle with a white exclamation mark. |
The reserved process has missed some heartbeats. This could indicate a potential problem and should be investigated. |
Red circle with a white cross. |
The reserved process is either not running or has missed its last heartbeat. This could indicate the process has failed, has not started or that Cisco Crosswork Situation Manager is not working properly. |
Gray circle with a white backslash. |
The unreserved process is not running. |
White circle with a green check. |
The process is in passive mode. This is for High Availability deployments only. See High Availability Overview for more information. |
You can set processes as reserved or unreserved in the system.conf file ($MOOGSOFT_HOME/config/system.conf). If a package's 'reserved' setting is 'true', Self Monitoring reports a warning if the package is not running. Stopped unreserved processes do not generate warnings.
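For example, a minimal reserved entry in the processes list of system.conf might look like the abridged sketch below (a fuller example appears later in this guide):

```
{
    "group" : "moog_farmd",
    "service_name" : "moogfarmd",
    "process_type" : "moog_farmd",
    # Self Monitoring warns if a reserved process is not running.
    "reserved" : true
},
```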
There are a number of controls in Self Monitoring that can be used to stop, start and restart Moogfarmd and the LAM services:
Button |
Description |
Refresh symbol. |
Restart. |
Stop symbol. |
Stop - only works if Moogfarmd is running as a process rather than a service. |
Play symbol. |
Start. |
Users with the Super User role can configure these controls.
The Self Monitoring screen is divided into a number of tabs. Each section displays the states of the various processes, indicating which are running or which have issues:
· Processing Metrics
· Event Processing
· Web Services
· Event Ingestion
· Message Bus
This tab, which is open by default when Self Monitoring is launched, displays event processing times and other metrics.
The icon in the top left corner indicates the overall state of event processing. This is determined by the Current Maximum Event Processing Time in seconds. This time is indicated by the position of the gray bar on the colored bullet graph shown below. The Current Maximum Event Processing Time is 1.917s in this example:
The default bullet chart color values are as follows:
· GREEN (0 - 10 seconds) Good performance
· YELLOW (10 - 15 seconds) Marginal performance
· RED (15 - 20 seconds) Poor performance
The time values are configurable in the web.conf file.
To use the Processing Metrics tab, open the LAMs and moog_farmd folders and look for deviations from normal values.
The numeric value itself may not be an absolute measurement of health, so as a general rule, look for unusual or sudden changes in the values or behavior. See the examples below:
· If a particular LAM becomes a data flow bottleneck, expect to see substantial increases in the values for the Message Queue Size and/or Socket Backlog metrics for that LAM. This leads to an increasing Event Processing Time for the appropriate Moogfarmd (which is expecting data from the LAM).
· If an AlertRulesEngine in a Moogfarmd instance becomes a data flow bottleneck, expect to see a substantial increase in the Message Backlog and possibly the Messages Processed decreasing for that AlertRulesEngine. This also leads to an increasing Event Processing Time for the Moogfarmd.
Both of these result in the bullet chart (at the top) showing increasing Current Maximum Event Processing Time, from green to yellow to red.
This tab contains a process group including Moogfarmd (the core Cisco Crosswork Situation Manager application) and the Moolets, such as the AlertBuilder, Alert Rules Engine and Sigalisers.
The icon in the top left corner indicates the overall state (running normally in the example above). The group and cluster names are displayed in the top right corner. The time and date of the last heartbeat is displayed above the list of Moolet processes.
This tab contains all processes related to Tomcat web applications: moogsvr, moogpoller, toolrunner and Graze.
Each row displays the following information:
Column |
Description |
+ |
Click this button to expand or collapse the row for further information. For example 'No reported problems'. |
State |
This shows an indicator icon showing whether or not the process is running as normal. |
Process |
The name of the Cisco Crosswork Situation Manager component. |
*Instance |
The name of the instance (in High Availability there are multiple instances of Cisco Crosswork Situation Manager). |
*Group |
The name of the Process Group the component belongs to. |
*Cluster |
The name of the Cluster the component's Process Group belongs to. |
Last Heartbeat |
The time of the last received heartbeat. A heartbeat indicates a healthy component. |
Note
* These only apply to High Availability deployments where there is more than one instance of Cisco Crosswork Situation Manager and its component processes.
This tab displays information about the state of all processes relating to the LAMs and the individual processes which process raw data and create events:
The controls in the far right column can be used to stop and restart active LAM processes or to start inactive LAMs.
The final tab provides a link to the Message Bus Console, also known as the MooMs (Moogsoft Messaging System). This is hosted by message-queueing software RabbitMQ.
Click the link to proceed to the RabbitMQ management console.
The username and password to log in are specified, and can be configured, in $MOOGSOFT_HOME/config/system.conf (under mooms.username and mooms.password in the JSON) and correspondingly in RabbitMQ. See Configure the Message Bus for more information.
Once logged in, RabbitMQ displays information about message rates, connections, channels, queued messages, etc.
The 'Restart/Stop/Start' feature uses the moogfarmd/LAM service scripts under /etc/init.d, for example, /etc/init.d/moogfarmd and /etc/init.d/logfilelamd, in combination with the Apache Tomcat 'toolrunner'.
You need Super User role permissions to configure this feature. Create a user in the 'moogsoft' group; the toolrunner and the services must use this user in order to start and stop services via the UI. For example:
· /etc/init.d/moogfarmd - PROCESS_OWNER set to 'controluser'
· $APPSERVER_HOME/webapps/toolrunner/WEB-INF/web.xml - toolrunneruser set to 'controluser' (toolrunnerpassword needs to be the password for that user)
Cisco recommends that you do not use the default 'moogsoft' user because it is a system user and does not allow password login. Update the /etc/init.d/ service scripts to have the correct:
· SERVICE_NAME (to make the services unique)
· PROCESS_OWNER (must be the same user as the toolrunner user)
· INSTANCE/CLUSTER/GROUP (unless already configured via the relevant LAM/Moogfarmd/system.conf configuration file). These need to be provided to the 'daemon' lines as command line parameters. For example --instance MY_INSTANCE --group MY_GROUP --cluster MY_CLUSTER.
Add the name of the service script into the 'service_name' field in $MOOGSOFT_HOME/config/system.conf for that Cisco Crosswork Situation Manager process. To ensure the service appears in the right Self Monitoring tab, the process_type field must be set. See the default system.conf file for examples.
If a Moogfarmd or LAM service runs that does not match a configuration block in the 'processes' section of system.conf, it still appears in the Self Monitoring UI, but it is not possible to start, stop or restart it.
The 'toolrunner' is used to control the services (requires configuring $APPSERVER_HOME/webapps/toolrunner/WEB-INF/web.xml):
1. The 'toolrunneruser' must match the PROCESS_OWNER specified within the relevant service script. This is because only root can run services as a different user.
2. The 'toolrunnerpassword' must be the password of the 'toolrunneruser'.
3. The 'toolrunnerhost' value must match the host of the machine which contains the moogfarmd/LAM services and the PROCESS_OWNER user.
In upgrade scenarios it is likely that an existing LAM/Moogfarmd service has already been run. If the service is one that needs to be controlled via the UI, you must 'chown' the service log file and PID file (if present) to the new service script PROCESS_OWNER/toolrunner user before this will work. For example:
chown toolrunneruser /var/log/moogsoft/moogfarmd.log
See the example of a $MOOGSOFT_HOME/config/system.conf file below:
{
"group" : "moog_farmd",
"instance" : "",
"service_name" : "moogfarmd",
"process_type" : "moog_farmd",
"reserved" : true,
"subcomponents" :
[
"AlertBuilder",
"Sigaliser",
"Default Cookbook",
"Journaller",
"TeamsMgr"
#"AlertRulesEngine",
#"SituationMgr",
#"Notifier"
]
},
This topic outlines troubleshooting steps for issues you may encounter when installing or upgrading Cisco Crosswork Situation Manager.
1. Moogfarmd log files are located in /var/log/moogsoft.
2. As with LAMs, Moogfarmd logging is configured via the Log4j-based log_config section of the moog_farmd.conf file.
3. Each log line references the Moolet or process that wrote the message.
Yum: Incorrect Cisco Crosswork Situation Manager version
If an incorrect or outdated version is offered when you install Cisco Crosswork Situation Manager, your Yum cache may need cleaning.
Run the following command and then re-attempt the installation:
yum clean all
If an attempt to install Cisco Crosswork Situation Manager fails with an error such as the following, it may be caused by a conflict with the MySQL libraries on the host.
Running rpm_check_debug
Running Transaction Test
Transaction Check Error:
file /usr/lib64/mysql/libmysqlclient.so.16.0.0 from install of mysql-community-libs-compat-5.7.22-2.el6.x86_64 conflicts with file from package compat-mysql51-5.1.54-1.el6.remi.x86_64
file /usr/lib64/mysql/libmysqlclient_r.so.16.0.0 from install of mysql-community-libs-compat-5.7.22-2.el6.x86_64 conflicts with file from package compat-mysql51-5.1.54-1.el6.remi.x86_64
Error Summary
-------------
Run the following bash commands to allow the product to be installed successfully:
echo "remove compat-mysql51" > /tmp/moog_yum_shell.txt
echo "install mysql-community-libs-compat-5.7.22" >> /tmp/moog_yum_shell.txt
echo "install mysql-community-client-5.7.22" >> /tmp/moog_yum_shell.txt
echo "install mysql-community-libs-5.7.22" >> /tmp/moog_yum_shell.txt
echo "install mysql-community-server-5.7.22" >> /tmp/moog_yum_shell.txt
echo "install mysql-community-common-5.7.22" >> /tmp/moog_yum_shell.txt
echo "groupinstall moogsoft" >> /tmp/moog_yum_shell.txt
echo "run" >> /tmp/moog_yum_shell.txt
cat /tmp/moog_yum_shell.txt | yum shell -y
The above error is most likely to occur on hosts where some MySQL components are already installed, for example when trying to install moogsoft-db on a system with an existing MySQL installation.
Error: Package: moogsoft-ui-7.2.0-123.x86_64 (moogsoft-aiops) Requires: nginx >= 1.14.0
If you encounter the following error when attempting to install Cisco Crosswork Situation Manager:
Requires: nginx >= 1.14.0
---> Package moogsoft-ui.x86_64 0:7.2.0-123 will be an update
--> Processing Dependency: nginx >= 1.14.0 for package: moogsoft-ui-7.2.0-123.x86_64
--> Finished Dependency Resolution
Error: Package: moogsoft-ui-7.2.0-123.x86_64 (moogsoft-aiops)
Requires: nginx >= 1.14.0
Try using --skip-broken to work around the problem, or try:
rpm -Va --nofiles --nodigest
Alternatively, you can manually install the Nginx repository with the following command and then re-attempt the Cisco Crosswork Situation Manager installation:
rpm -Uvh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-2.el7.ngx.noarch.rpm
If you cannot access the UI from your host machine, check SELinux and your firewall, and confirm that the required ports are open:
1. To check whether SELinux is enabled:
sestatus
This returns the status disabled if SELinux is disabled.
2. To set SELinux to permissive mode:
setenforce 0
3. To check whether a port is open:
firewall-cmd --zone=public --query-port=8443/tcp
firewall-cmd --zone=public --query-port=8080/tcp
4. To open a port:
firewall-cmd --permanent --zone=public --add-port=8080/tcp
firewall-cmd --permanent --zone=public --add-port=8443/tcp
firewall-cmd --reload
If you do not uninstall certain UI integrations before upgrading to Cisco Crosswork Situation Manager v8.0.x, you may see a 502 gateway error when trying to access integrations in the Cisco Crosswork Situation Manager UI.
To solve the problem, follow these steps to delete the UI integrations and perform some cleanup tasks.
1. Run the following SQL against the moogdb database:
use moogdb
DELETE t1 FROM system_config t1 JOIN integration_url_tools t2 ON t1.id = t2.system_config_id;
DELETE t1 FROM system_config t1 JOIN integration_moolets t2 ON t1.id = t2.system_config_id;
DELETE t1 FROM sitroom_plugins t1 JOIN integration_sitroom_plugins t2 ON t1.id = t2.plugin_id;
DELETE t1 FROM link_definitions t1 JOIN integration_link_definitions t2 ON t1.id = t2.link_id;
DELETE t1 FROM alert_column_names t1 JOIN integration_custom_fields t2 ON t1.internal_name = CONCAT('custom_info.',t2.field);
DELETE t1 FROM situation_column_names t1 JOIN integration_custom_fields t2 ON t1.internal_name = CONCAT('custom_info.',t2.field);
2. Run the following SQL against the moog_intdb database:
use moog_intdb
DELETE IGNORE FROM integration_migration;
3. Go to the Integrations tab in the UI and reconfigure your integrations.
Situation Room plugins or Situation client tools that point to external URLs do not work
A Situation Room plugin or Situation client tool shows an error similar to the following after upgrade:
Refused to frame 'https://site.com/' because it violates the following Content Security Policy directive: "default-src 'self'
This release includes a Content Security Policy that restricts the automatic loading of resources from external domains. You must follow the steps to configure Cisco Crosswork Situation Manager to allow access to required external domains.
For instructions see RPM - Upgrade UI components or Tarball - Upgrade UI components according to your implementation type.
The Integrations Controller provides basic configurations for all of the brokers and integrations in your Cisco Crosswork Situation Manager instance beyond the configurations assigned through broker profiles.
This document provides guidance on how to deal with Integrations Controller-related issues.
In the unlikely event that the Integrations Controller exits during DB table creation (this is only possible on the first startup after install/upgrade), the Controller will fail to initialize on subsequent attempts as there is a lock registered in the database.
You can detect this issue if the Controller hangs for 5 minutes and then exits with the exception "liquibase.exception.LockException". You can then confirm that the lock exists by running the following query against the integrations database:
<integrations_database_name>: SELECT * FROM DATABASECHANGELOGLOCK WHERE LOCKED=TRUE;
If the query returns any results, the lock exists. To correct the issue, complete the following:
· Ensure that no other Integrations Controllers are currently starting up and performing the DB table creation. If there are, allow them to finish.
· Assuming no Controllers are performing the creation, run the following against the integrations database:
<integrations_database_name>: UPDATE DATABASECHANGELOGLOCK SET LOCKED=FALSE, LOCKGRANTED=null, LOCKEDBY=null where ID=1;
· Restart Tomcat.
Cisco Crosswork Situation Manager includes a self-signed certificate by default. If you want to use Cisco Crosswork Situation Manager for Mobile on an iPhone, you need to add a valid SSL certificate. This is because WebSockets do not work on iOS with self-signed certificates.
If a valid root CA certificate is not added, a 'Connection Error' appears at login and Cisco Crosswork Situation Manager for Mobile does not work.
Nginx: SSL configuration
To apply a valid certificate to Nginx, edit the moog-ssl.conf file in one of the following locations, depending on your implementation type:
RPM: /etc/nginx/conf.d/moog-ssl.conf
Tarball: /usr/share/moogsoft/etc/cots/nginx/moog-ssl.conf
Change the default self-signed certificate and key locations to point to the valid root certificate and key:
#ssl_certificate /etc/nginx/ssl/certificate.pem;
#ssl_certificate_key /etc/nginx/ssl/certificate.key;
ssl_certificate /etc/certificates/GeoTrust_Universal_CA.crt;
ssl_certificate_key /etc/certificates/GeoTrust_Universal_CA.key;
Restart Nginx.
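Before restarting Nginx, it can save a failed reload to confirm that the certificate and key files are actually a pair: an RSA certificate and key belong together when their moduli match. A hedged sketch using a throwaway self-signed pair purely for illustration (point the two modulus commands at your real files):

```shell
# Generate a throwaway self-signed pair just to demonstrate the check.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=example" \
    -keyout /tmp/example.key -out /tmp/example.crt 2>/dev/null

# Compare the moduli: identical output means the files are a matching pair.
CERT_MOD=$(openssl x509 -noout -modulus -in /tmp/example.crt)
KEY_MOD=$(openssl rsa -noout -modulus -in /tmp/example.key)
[ "$CERT_MOD" = "$KEY_MOD" ] && echo "certificate and key match"
```

It is also worth running nginx -t after editing moog-ssl.conf, so that any syntax or file-path error is reported before the restart.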
Percona XtraDB Cluster is the database clustering solution installed with this version of Cisco Crosswork Situation Manager. If you are upgrading from an earlier version, Cisco strongly recommends that you upgrade to Percona.
This document provides guidance on how to deal with Percona-related issues.
If two nodes in the Percona cluster go down simultaneously, it is a critical failure. It can produce the following symptoms:
· The Alert Builder can appear to become "stuck". It may be consuming events from the Message Bus but is not writing them to the database, causing a message queue to form.
Example output using the HA Control utility is as follows:
WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:214] +|[moog_farmd] POOL DIAGNOSTICS:|+
WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:216] +|[moog_farmd] Pool created at [20190613 16:24:53.302 +0100].|+
WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:222] +|[moog_farmd] [12] invalid resources have been removed during the lifetime of the pool.|+
WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:227] +|[moog_farmd] Pool size is [30] with [23] available connections and [4] busy.|+
WARN : [0:HA Controller][20190614 10:47:20.198 +0100] [CAbstractPool.java:244] +|The busy resources are as follows:
0: Held by 1:AlertBuilder for 584832 milliseconds. Currently in
java.net.SocketInputStream#socketRead0 - SocketInputStream.java:-2
java.net.SocketInputStream#socketRead - SocketInputStream.java:115
java.net.SocketInputStream#read - SocketInputStream.java:168
java.net.SocketInputStream#read - SocketInputStream.java:140
To resolve, try the following:
· Ensure that you have bootstrapped the first node (started the node without any known cluster addresses). Depending on your installation type, see one of the following guides for more information:
· Ensure that the other nodes do not have a file named grastate.dat in the MySQL data directory. If the file is present, delete it.
· Restart one of the secondary nodes and wait for it to sync from the bootstrapped node. Note that this can create a temporary write lock on the bootstrapped node.
· Once the second node is up and running, start the remaining nodes.
You can also refer to the Percona documentation on how to recover a PXC cluster in various scenarios.
The following sections include troubleshooting advice for Cisco Crosswork Situation Manager, Moogfarmd, RabbitMQ, LAMs, Nginx, and alert and Situation processing issues.
Check the following:
· Check the file system with the command df -m and look for partitions that are full.
· The environment variables in your shell might not be set up correctly. Run the env command and check the location set for $MOOGSOFT_HOME.
The message +|No config present|+ in /var/log/moogsoft/moogfarmd.log indicates a syntax error in moog_farmd.conf. Do the following:
· Check the config file for punctuation mistakes. Look for:
— Missing commas
— Unbalanced quotes
— Missing '{' or '}'
In this case, use # to comment out code instead of /* and */.
· Edit moog_farmd.conf and then restart the service.
See also Message System Deployment.
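The punctuation checks above can be partially automated. The following is a heuristic sketch, not a parser: it strips #-comments and compares the counts of opening and closing braces. The /tmp sample file stands in for your real moog_farmd.conf ($MOOGSOFT_HOME/config/moog_farmd.conf is an assumed location; use the path on your system):

```shell
# Sample config standing in for the real moog_farmd.conf.
cat > /tmp/moog_farmd.conf <<'EOF'
config :
{
    name : "moog_farmd",   # hash comments are valid here
    threads : 10
}
EOF

CONF=/tmp/moog_farmd.conf
# Strip '#' comments, then count '{' versus '}'.
OPEN=$(sed 's/#.*//' "$CONF" | grep -o '{' | wc -l)
CLOSE=$(sed 's/#.*//' "$CONF" | grep -o '}' | wc -l)
if [ "$OPEN" -eq "$CLOSE" ]; then
    echo "braces balanced"
else
    echo "braces unbalanced: $OPEN open vs $CLOSE close"
fi
```

An unbalanced count pinpoints the class of error quickly, though a balanced count does not guarantee the file is valid; unbalanced quotes and missing commas still need a visual check.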
"No such user" Message on Startup
If you see the message No such user in /var/log/rabbitmq/startup_err, do the following:
· Check /etc/passwd for user rabbitmq with the following command:
grep rabbitmq /etc/passwd
· If no user is found, add the following to /etc/passwd:
rabbitmq:x:491:488:RabbitMQ messaging server:/var/lib/rabbitmq:/bin/bash
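The check-then-add logic above can be sketched as a script. This works on a copy of /etc/passwd for safety; the real fix edits /etc/passwd as root (or uses a tool such as vipw). The UID/GID values come from the entry above and may differ on your system:

```shell
# Work on a copy so the sketch never touches the real /etc/passwd.
cp /etc/passwd /tmp/passwd.check

# Append the documented entry only if the rabbitmq user is missing.
if ! grep -q '^rabbitmq:' /tmp/passwd.check; then
    echo 'rabbitmq:x:491:488:RabbitMQ messaging server:/var/lib/rabbitmq:/bin/bash' >> /tmp/passwd.check
fi

# Prints 1: exactly one rabbitmq entry exists after the check.
grep -c '^rabbitmq:' /tmp/passwd.check
```

The idempotent guard means the script can be run repeatedly without creating duplicate entries.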
"Failed to create aux thread" Message
If you see the message "Failed to create aux thread" in /var/log/rabbitmq/startup_err, this is most likely a ulimit issue for the RabbitMQ user.
Do the following:
· Check ulimit settings for the RabbitMQ user by running the following command as root:
su - rabbitmq
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515675
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
· The above example shows ulimit settings that are likely too low for RabbitMQ.
· It may be appropriate to increase the ulimit settings for "open files" and "max user processes" to at least 4096 for development/QA environments and 65536 for production environments.
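One way to raise the limits persistently is via /etc/security/limits.conf; a sketch using the production values above. This assumes PAM's pam_limits module is active for login sessions, which is the default on most distributions:

```
rabbitmq  soft  nofile  65536
rabbitmq  hard  nofile  65536
rabbitmq  soft  nproc   65536
rabbitmq  hard  nproc   65536
```

After editing, restart the rabbitmq-server service and re-run ulimit -a as the rabbitmq user to confirm the new values took effect.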
"Unable to create connection" Message
If you see the message "Unable to create connection" in LAM, Moogfarmd, or Apache Tomcat logs, it indicates that the process is unable to connect to the Message Bus zone in RabbitMQ.
· Check that the RabbitMQ server is running:
service rabbitmq-server status
· If the service is not running, start it:
service rabbitmq-server start
· If the service is running, check that the zone used in Cisco Crosswork Situation Manager matches the vhost in RabbitMQ. List the zones (vhosts) added in RabbitMQ:
rabbitmqctl list_vhosts
· Check the MooMS section in /usr/share/moogsoft/config/system.conf for the zone used in Cisco Crosswork Situation Manager.
· If the zone is missing, add the zone (vhost) to RabbitMQ manually (see Message System Troubleshooting).
· Restart the affected process.
If a LAM run from the command line or as a service produces the following error:
[root@moogbox2 bin]# ./socket_lam
./socket_lam: error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory
it may be because /usr/java/jdk1.8.0_171/jre/lib/amd64/server has not been added to the LD_LIBRARY_PATH.
To run the LAMs from a command line, change the LD_LIBRARY_PATH as follows (the default init.d files contain this setting):
export LD_LIBRARY_PATH=$MOOGSOFT_HOME/lib:/usr/GNUstep/Local/Library/Libraries:/usr/GNUstep/System/Library/Libraries:$JAVA_HOME/jre/lib/amd64/server
"Unable to parse configuration file" Error
Check the following:
· An "Unable to parse configuration file" message in /var/log/moogsoft/<lamd_name>.log indicates a syntax error in the LAM configuration file.
· Check the config file for syntax mistakes. Look for missing commas and unbalanced quotes.
· Compare the configuration file to a default configuration file for the same LAM. Use the following command to locate differences in the files:
diff -y <current_lam>.conf <default_lam>.conf | less
· Edit <current_lam>.conf to resolve any syntax errors and restart the LAM.
Check the following:
· "Failed to connect to [host:port]: Connection refused" error in /var/log/moogsoft/<lamd_name>.log means that the port specified in the LAM configuration file is already in use.
· Use the following command to check that the LAM is not already started:
ps -ef | grep <lamd_name>
· Check that another process is not already bound to the port.
· If required, edit the port setting for the LAM in the configuration file and restart the LAM.
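To check whether another process is already bound to the port, a quick sketch using bash's /dev/tcp pseudo-device can help when netstat or ss is unavailable. The port number 8411 is purely illustrative; use the port from your LAM configuration file:

```shell
# Attempt a local TCP connection: success means something is listening.
PORT=8411
if timeout 1 bash -c "exec 3<>/dev/tcp/127.0.0.1/$PORT" 2>/dev/null; then
    echo "port $PORT is in use"
else
    echo "port $PORT is free"
fi
```

If the port is in use, ps -ef | grep <lamd_name> (as above) tells you whether the occupant is a previously started instance of the same LAM.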
Check the following:
· "Host [hostname] unresolvable" error in /var/log/moogsoft/<lamd_name>.log means that the LAM is unable to resolve the host, or the hostname is incorrectly set.
· In the LAM config file, check the address property and correct any errors.
· Check that the /etc/hosts file contains an entry for the specified hostname.
Check the following:
· "Failed to open file [<path to file>] error in /var/log/moogsoft/<lamd_name>.log means that the LAM is unable to locate a file specified in the LAM configuration file.
· Locate the missing file in $MOOGSOFT_HOME/bots/lambots or $MOOGSOFT_HOME/contrib.
· Update the LAM configuration file with the correct file path and restart the LAM.
Check the following:
· In the LAM configuration file, locate the presend filter file name.
· Open the filter file in a JavaScript editor and check the code for syntax errors.
· If you have Node.js installed, you can run the following command to locate the incorrect line in the code:
node $MOOGSOFT_HOME/bots/lambots/<path_to_filter_file>.js
· Edit $MOOGSOFT_HOME/bots/lambots/<path_to_filter_file>.js to resolve the error and restart the LAM.
Check the following:
· Empty columns in alert lists may indicate incorrect field mapping assignments.
· Check field mappings at the bottom of the LAM configuration file.
· Edit the configuration file to properly map the field to the column name and then restart the LAM.
Check the following:
· The LAM may be set to Server mode rather than Client mode in the configuration file. For a description of mode types, see Socket LAM.
· Set the mode correctly in the configuration file and restart the LAM.
Check the following:
· The JSON string might be incorrectly formatted. For example, event data contains nested JSON.
· Run the LAM in debug mode and look for nested JSON.
· Either modify the event data or edit the presend filter to match values in the nested JSON.
Check the following:
· A "Could not stat file [-1] error: [Bad file descriptor]" error in /var/log/moogsoft/<lamd_name>.log shows that the Logfile LAM cannot locate the log file to read.
· In the LAM configuration file, check the target setting and confirm the file path to the target log file.
If the REST LAM does not start and you are using SSL, the SSL path might be missing. Do the following:
· Verify that the following properties are correctly set:
path_to_ssl_files
ssl_cert_filename
ssl_key_filename
· Verify that the value of the use_ssl property has been set correctly.
· A +|No file path specified|+ message in /var/log/moogsoft/<lamd_name>.log indicates a missing SSL path in the LAM configuration file.
Comment out the following references in two configuration files:
· Go to /etc/nginx/conf.d
· Comment out the IPv6 references with a hash (#):
Configuration File | Section
moog-default.conf | listen 80 default_server; #listen [::]:80 default_server;
moog-ssl.conf | listen 443 ssl default_server; #listen [::]:443 ssl;
Do the following:
· Ensure that the product license has been applied.
· Check that the RabbitMQ server is running:
service rabbitmq-server status
· Check that the "run on startup" setting for Alert Builder in the Moogfarmd configuration file is set to true.
· Check the Moogfarmd log:
/var/log/moogsoft/moogfarmd.log
· Check the Alert Builder Moolet for syntax errors or errors in logic.
· Check that the LAM is correctly parsing/mapping the data feed.
· Check that the LAM is not performing any post-event processing that may be filtering out the events in the associated LAMbot.
Do the following:
· Ensure that the product license has been applied.
· Check the Moogfarmd configuration file to ensure that the "run on startup" property for the Sigaliser used is set to true.
· In the Moogfarmd configuration file, check that the "process output of" setting for the Sigaliser used lists the correct Moolet.
· Check that the Moolet listed in the "process output of" property is running.
· If Moolet settings are too restrictive or too open they may not produce Situations.
These services support the following commands:
service <service-name> status
service <service-name> stop
service <service-name> start
service <service-name> restart
Service name | Description
apache-tomcat | Web server that contains the servlets that provide the Cisco Crosswork Situation Manager user interface.
nginx | Web server that handles security, such as Cisco Crosswork Situation Manager login, PHP and HTTP/SSL implementation.
socketlamd, trapdlamd, newreliclamd | Link Access Modules used for data ingestion. These LAM names are provided as examples; the specific set of service names might differ in your system. At least one instance of a LAM is required for a data feed.
moogfarmd | Core Cisco Crosswork Situation Manager system application.
mysqld | Database containing Cisco Crosswork Situation Manager data (database schemas etc.).
rabbitmq-server | Message system for Cisco Crosswork Situation Manager.
elasticsearch | Elasticsearch service for the UI search feature.
This topic outlines troubleshooting steps for issues you may encounter with the topologies feature in Cisco Crosswork Situation Manager.
See Topologies for information on setting up topologies.
If you fail to delete a topology or cannot make a topology inactive:
· Check whether the topology is being used to filter a Recipe. You must remove the topology from all Recipes before you can delete it or change its status.
413 error when adding nodes or links
If you receive a 413 error when trying to add a large number of nodes or links to a topology:
· There may be too much data for the endpoint to process. Try shortening the names of your nodes and links.
· If your topology data is larger than 40 MB when saved in a .csv file, Cisco recommends using the Topology Loader utility. See Load a Topology for more information.
400 error when modifying a topology
If you receive a 400 "bad request" error when attempting to create, retrieve, modify or delete nodes or links for a topology:
· Check that the topology in the request exists. You must create a topology before configuring its nodes and links.
Missing Topologies API functionality
The Topologies API and the Graph Topology Moobot module do not contain identical functionality. Refer to the Graph Topology topic for functionality that may not be available in the Topologies API, for example distance methods.
Topology loader utility fails to run
If the Topology loader utility fails to run, check your .csv file as follows:
· The lines must follow the format <node1>,<node2>,<optional description>.
· Remove any extraneous characters, lines and spaces.
· Enclose any commas in node names in double quotes.
· The source and sink nodes cannot be set to the same node.
See Load a Topology for more information on the utility.
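The format rules above lend themselves to a quick pre-flight check. The following is a heuristic sketch (the sample file and node names are illustrative): it flags rows with fewer than two fields and rows where source and sink are the same node. Note that a simple comma split does not handle node names with quoted commas, so treat its output as advisory:

```shell
# Sample topology file standing in for your real .csv.
cat > /tmp/topology.csv <<'EOF'
switch1,router1,uplink
router1,router1
EOF

# Flag malformed rows: too few fields, or source equal to sink.
awk -F',' '
    NF < 2   { printf "line %d: too few fields\n", NR }
    $1 == $2 { printf "line %d: source equals sink\n", NR }
' /tmp/topology.csv
```

For the sample above, the check reports line 2, where the source and sink are both router1.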
Topology not available to filter a Recipe
If you've created a topology but you can't use it to filter a Recipe, check that the topology is active. You can update its active status using the /topologies endpoint.
Note that cloned topologies are set to inactive. Activate a cloned topology when you want to use it to filter a Recipe.
File-based Cookbook Recipes can no longer cluster on topology. You may see an error similar to the following in the Moogfarmd log file:
“File-based recipe with name [{MyRecipe}] contains topology filter configuration - this is not supported. Topology configuration for this recipe will be ignored.”
To cluster on topology, create a replacement Recipe in the UI with a topology filter.
See Configure Logging for more information on Moogfarmd log files.
Clustering by topology not working
If your alerts are failing to cluster by topology as configured in your Recipe, check that your Recipe alert matching attribute is an event field.
You can also check the contents of a custom info field for an alert you expect to be clustered, and ensure that your Recipe is correctly configured to include the data.
Vertex Entropy values are incorrect
If you create a large topology with thousands of links, the Graph Analyser process starts to calculate Vertex Entropy when it next runs as part of Housekeeper (every 30 seconds).
If Cisco Crosswork Situation Manager is processing events and updating the topology simultaneously, you may temporarily see unexpected results. Values will be correct once processing completes.
As a workaround, you can use the clone and replace endpoints in the Topologies API to clone an active topology, make changes to the inactive clone, and then replace the active topology with the updated version. See Create and Manage Topologies for more information.
Vertex Entropy values are not being updated
If Vertex Entropy values are not updating, check that the process is turned on. In the Cisco Crosswork Situation Manager UI, go to Settings > Vertex Entropy. This process is enabled by default.
The topology tab does not appear
If a Situation was created by a Recipe using a topology filter, the topology tab is active in the Situation Room. If the Situation was created in any other way, for example Tempus, manual merge, or a Recipe without a topology filter, the topology tab is disabled.
The topology tab does not display the expected information
If an alert in a Situation is also in another Situation for a different topology (that is, it has been clustered by another Recipe using a different topology), both topologies are shown as impacted. Select another topology from the drop-down list to view it.
Manually created alerts are not visible
Manually created alerts are not added to topologies by design.
If you configured a topology in a previous Cisco Crosswork Situation Manager version, the topology no longer exists once you upgrade to v8.0. You can recreate a topology using the Topologies API endpoints. You can also use the Topology Loader utility to load a large topology from a .csv file.
Archived Situations don't show topology details
Archived Situations in the historic database do not retain topological data.
If the system is showing signs of latency in alert or Situation creation then the problem is likely with Moogfarmd and/or the database. The following diagnostic steps will help you track down the cause:
Step | Description | Possible Cause and Resolution
1 | Check the Moogfarmd log for any obvious errors or warnings. | Cause may be evident from any warnings or errors.
2 | Check the Self Monitoring > Processing Metrics page. | If the event_process_metric is large and/or increasing then something is backing up. Check Moogfarmd health logging also for signs of message_queue build-up in any of the Moolets.
3 | Check the CPU/memory usage of the server itself. | If the server, as a whole, is running close to its CPU or memory limit and no other issues can be found (e.g. rogue processes or memory leaks in the Cisco Crosswork Situation Manager components), then consider adding more resource to the server or distributing the Cisco Crosswork Situation Manager components.
4 | Check whether the Moogfarmd java process is showing constant high CPU/memory usage. | Moogfarmd may be processing an event or Situation storm. Check Moogfarmd health logging also for signs of message_queue build-up in any of the Moolets. The backlog should clear assuming the storm subsides.
5 | Has the memory of the Moogfarmd java process reached a plateau? | Moogfarmd may have reached its java heap limit. Check the -Xmx settings of Moogfarmd. If not specified, has Moogfarmd reached approximately a quarter of the RAM on the server? Increase the -Xmx settings as appropriate and restart the Moogfarmd service.
6 | Is the database tuned? | Check the innodb-buffer-pool-size and innodb_buffer_pool_instances settings in /etc/my.cnf as per the Tuning section above. Ensure they are set appropriately and restart mysql if changes are made.
7 | Check the server for any other high CPU or memory processes, or anything else that might be impacting the database. | Something may be hogging CPU/memory on the server and starving Moogfarmd of resources. The events_analyser utility may be running, or a sudden burst of UI or Graze activity may be putting pressure on the database and affecting Moogfarmd.
8 | Run DBPool Diagnostics (see previous section) several times to assess the current state of Moogfarmd-to-database connections. | Moogfarmd database connections may be maxed out with long-running connections; this may indicate a processing deadlock. Perform a kill -3 <pid> on the Moogfarmd java process to generate a thread dump (in the Moogfarmd log) and send it to Moogsoft Support. Alternatively, Moogfarmd may be very busy with lots of short but frequent connections to the database. Consider increasing the number of DBPool connections for Moogfarmd by increasing the top-level "threads" property in the Moogfarmd configuration file and restarting the Moogfarmd service.
9 | Turn on MySQL slow query logging (see the earlier section on how to do this). | Slow queries from a Moobot in Moogfarmd may be causing problems; review them for efficiency. Alternatively, slow queries from other parts of the system may be causing problems (e.g. inefficient UI filters). Slow queries may also be down to the sheer amount of data in the system. Consider enabling Database Split to move old data and/or using the Archiver to remove old data.
10 | Check Moogfarmd Situation resolution logging using: grep "Resolve has been running for" /var/log/moogsoft/moogfarmd.log | If this logging shows a non-zero upward trend in "Resolve" time, then Moogfarmd is struggling with the number of "in memory" Situations for its calculations. Check the Moogfarmd health logging for the current count of "in memory" Situations and consider reducing the retention_period setting in the Moogfarmd configuration (this requires a Moogfarmd restart) and/or closing more old Situations.
11 | Is Moogfarmd memory constantly growing over time, suggesting a memory leak? Note that Moogfarmd memory does typically increase for periods of time and is then trimmed back via Java garbage collection and Sigaliser memory purge (via the retention_period property). | Take periodic heap dumps from the Moogfarmd java process and send them to Moogsoft Support so they can analyse the growth, for example to a file such as: DUMPFILE=/tmp/farmd-heapdump-$(date +%s).bin Notes: jmap needs the Java JDK to be installed ("yum install jdk" should suffice). Generating a heap dump is likely to make the target process very busy for a period of time and also triggers a garbage collection, so the memory usage of the process may well reduce. Heap dump files may be very large.
The following sections outline potential problems and solutions related to the Cisco Crosswork Situation Manager user interface.
If the system is showing signs of slow UI performance, such as long login times or spinning summary counters, then the problem is likely with Apache Tomcat and/or the database. The following diagnostic steps will help you track down the cause:
Step | Description | Possible cause and resolution
1 | Check catalina.out for any obvious errors or warnings. | Cause may be evident from any warnings or errors.
2 | Check the browser console for any errors or timed-out requests. | Possibly a bug, or more likely that the query to the database associated with the request is taking longer than 30 seconds (the default browser timeout). Investigate the root cause.
3 | Check network latency between the browser client machine and the server using ping. | Latency of >=100 ms can make login noticeably slower.
4 | Check the CPU/memory usage of the server itself. | If the server, as a whole, is running close to its CPU or memory limit and no other issues can be found (e.g. rogue processes or memory leaks in the Cisco Crosswork Situation Manager components), then consider adding more resource to the server or distributing the Cisco Crosswork Situation Manager components.
5 | Check MoogSvr/Moogpoller/Graze counter logging in catalina.out. | Tomcat may be processing a high number of requests or Message Bus updates. If the Moogpoller count is zero then something may be wrong with the Tomcat > RabbitMQ connection. Check the RabbitMQ admin UI for signs of message queue build-up.
6 | Check whether the Tomcat java process is showing constant high CPU/memory usage. | Tomcat may be processing the updates from an event or Situation storm. The backlog should clear assuming the storm subsides.
7 | Has the memory of the Tomcat java process reached a plateau? | Tomcat may have reached its java heap limit. Check the -Xmx setting in /etc/init.d/apache-tomcat. Increase the -Xmx settings as appropriate and restart the apache-tomcat service.
8 | Is the database tuned? | Check the innodb-buffer-pool-size and innodb_buffer_pool_instances settings in /etc/my.cnf as per the Tuning section above. Ensure they are set appropriately and restart mysql if changes are made.
9 | Check the server for any other high CPU or memory processes, or anything else that might be impacting the database. | Something may be hogging CPU/memory on the server and starving Tomcat of resources. The Events Analyser utility may be running, or a sudden burst of Moogfarmd or Graze activity may be putting pressure on the database and affecting the UI.
10 | Run DBPool Diagnostics (see previous section) several times to assess the current state of Tomcat-to-database connections. | Tomcat database connections may be maxed out with long-running connections; this may indicate a processing deadlock. Perform a kill -3 <pid> on the Tomcat java process to generate a thread dump (in catalina.out) and send it to Cisco Crosswork Situation Manager Support. Alternatively, Tomcat may be very busy with lots of short but frequent connections to the database. A Graze request bombardment is another possibility (Graze does not currently have a separate DB Pool). Consider increasing the number of DBPool connections for Tomcat by increasing the related properties in servlets.conf and restarting the apache-tomcat service.
11 | Turn on MySQL slow query logging (see the earlier section on how to do this). | Slow queries from inefficient filters in the UI may be causing problems; review them for efficiency. Alternatively, slow queries from other parts of the system may be causing problems (e.g. inefficient Moobot code). Slow queries may also be down to the sheer amount of data in the system. Consider enabling Database Split to move old data and/or using the Archiver to remove old data.
12 | Is Tomcat memory constantly growing over time, suggesting a memory leak? Note that Tomcat memory does typically increase for periods of time and is then trimmed back via java garbage collection. | Take periodic heap dumps from the Tomcat java process and send them to Cisco Support so they can analyse the growth, for example to a file such as: DUMPFILE=/tmp/tomcat-heapdump-$(date +%s).bin Notes: jmap needs the Java JDK to be installed ("yum install jdk" should suffice). Generating a heap dump is likely to make the target process very busy for a period of time and also triggers a garbage collection, so the memory usage of the process may well reduce. Heap dump files may be very large.
See Configure Search and Indexing for more information.
Elasticsearch is not running or generating errors (such as MySQL connection problems)
· Check that the Elasticsearch service is running:
service elasticsearch status
· Check for errors; they are written to /var/log/elasticsearch/elasticsearch.log.
Elasticsearch does not start or restart using process_cntl
· Check that the Elasticsearch heap size in /etc/elasticsearch/jvm.options is large enough.
See Finalize and Validate the Install for more information on setting the JVM heap sizes.
See the Elasticsearch JVM Options documentation for more information on setting JVM options.
Tomcat cannot connect to Elasticsearch
· Check /usr/share/apache-tomcat/logs/catalina.out for any errors when attempting a search from the UI.
Cron job errors
· Check that the cron job that runs the moog_indexer (created by the moog_init_search.sh script to re-index against the Cisco Crosswork Situation Manager database once a minute) exists and is not generating any warnings or errors.
· List the configured cron jobs:
crontab -l
· Check for errors; they are written to /var/log/cron.
· Depending on the intervals at which Elasticsearch re-indexes against the Cisco Crosswork Situation Manager database, it is possible that new alerts, Situations, threads or comments have not yet been indexed, and so will not be searchable.
· To change the interval manually:
crontab -e
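For example, an entry of the following shape runs the indexer every five minutes instead of every minute. The command path shown is purely illustrative: keep the command portion that crontab -l shows on your system and change only the five schedule fields. Note that cron does not inherit your shell environment, so the real entry should use an absolute path:

```
# Illustrative path only; copy the real command from 'crontab -l'.
*/5 * * * * /usr/share/moogsoft/bin/utils/moog_indexer
```

Lengthening the interval reduces database load at the cost of a longer delay before new alerts, Situations, threads and comments become searchable.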
Elasticsearch fails to start with /tmp directory permission problems
Elasticsearch fails to start with a "java.lang.UnsatisfiedLinkError: /tmp/jna--<text>" error. For example:
[2017-08-07T14:14:31,173][WARN ][o.e.b.Natives] unable to load JNA native support library, native methods will be disabled.
java.lang.UnsatisfiedLinkError: /tmp/jna--1985354563/jna3872404023206022895.tmp: /tmp/jna--1985354563/jna3872404023206022895.tmp: failed to map segment from shared object: Operation not permitted
at java.lang.ClassLoader$NativeLibrary.load(Native Method) ~[?:1.8.0_171]
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941) ~[?:1.8.0_171]
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824) ~[?:1.8.0_171]
at java.lang.Runtime.load0(Runtime.java:809) ~[?:1.8.0_171]
at java.lang.System.load(System.java:1086) ~[?:1.8.0_171]
at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:851) ~[jna-4.2.2.jar:4.2.2 (b0)]
at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:826) ~[jna-4.2.2.jar:4.2.2 (b0)]
at com.sun.jna.Native.<clinit>(Native.java:140) ~[jna-4.2.2.jar:4.2.2 (b0)]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_171]
at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_171]
at org.elasticsearch.bootstrap.Natives.<clinit>(Natives.java:45) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:104) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:203) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.6.9.jar:5.6.9]
This is most likely due to the noexec directive in the /tmp mount. The solution is to remove the noexec directive, if it is practical to do so:
sudo mount /tmp -o remount,exec
Or set the following in /etc/sysconfig/elasticsearch:
ES_JAVA_OPTS="-Djna.tmpdir=/var/lib/elasticsearch/tmp"
Restart the Elasticsearch service after either of the above changes.
Unavailable UI login page
· Check that port 443 is not being blocked by the firewall on the server.
· Check that the Nginx service is running with command:
service nginx status
· Check that Nginx is listening on port 443. Example expected output:
netstat -anp|grep 443
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 42356/nginx
tcp 0 0 :::443 :::* LISTEN 42356/nginx
UI login fails
A possible error is "You could not be logged in. Please try again". First check /usr/share/apache-tomcat/logs/catalina.out to understand the error better. Possible causes are as follows:
· Apache Tomcat is not running. Check its status:
service apache-tomcat status
· There is a communication problem between the UI and the MySQL database. Check the MySQL service is running:
service mysqld status
If MySQL is running on a different server, check that it is accessible from the Cisco Crosswork Situation Manager web server and the required permissions have been applied.
· There is an authentication problem between the UI and the MySQL database.
— Check that the user exists in the MySQL moogdb.users table.
— Check that the username and password used for authentication are correct.
· If you're using a load balancer, the hostname in the URL you're using to access the UI does not match the webhost in the servlet configuration files.
— Set the "webhost" in $MOOGSOFT_HOME/config/servlets.conf on each UI server to the hostname of the load balancer.
"Your connection is not private"
The message "Your connection is not private" appears in your browser and you are unable to proceed to the UI.
After upgrading macOS to Catalina, the Cisco Crosswork Situation Manager UI is inaccessible in Chrome, Safari and Edge browsers because self-signed certificates are no longer trusted. For workaround instructions see Catalina Browser Certificate Workaround.
Empty column in alert views
If a column in your alert views is empty whenever the alert comes from a particular event source, and you have checked that:
· The events are being processed by the LAM.
· Moogfarmd is running.
then the best place to look for the issue is the relevant LAM configuration file.
Cisco Crosswork Situation Manager employs two databases, an active database and a historic database to enhance performance of the UI and extend data retention capabilities.
Note
If you are upgrading from Cisco Crosswork Situation Manager v. 6.4 or earlier, manually split the database if you want to benefit from using a separate historic database. See Historic Database Benefits for further details.
See Historic Database Benefits for information on the performance and scalability advantages of separating the active and historic databases.
You can use various closing strategies to help maintain your active database at an optimal size:
· Manually close alerts.
· Programmatically close alerts.
· Use the Auto Close feature.
The historic database can grow without affecting performance. When it is time to retire data from the historic database, you can archive selected Situations and alerts.
Historic data retention requires the Housekeeper Moolet. Verify that you have configured the Housekeeper Moolet and that it is enabled within Moogfarmd. The Housekeeper periodically identifies eligible closed alerts and Situations in the active database and moves them into the historic database.
You can control historic data retention using the database split configurer at $MOOGSOFT_HOME/bin/utils/moog_db_split_configurer.
See the Historic Data Utility Command Reference for a full list of available arguments.
By default, the database split configurer creates a database called historic_moogdb within the same MySQL instance as the active database: moogdb.
The historic database contains alert and Situation related tables that have the same structure as their equivalents in the active database.
You can access this historic data in the UI in read-only format:
1. Search results (when "include closed" is selected)
2. Direct URL (for Situation Room, Alert Timeline etc.)
3. Situation Room (including Situation Alert View and Situation Timeline)
4. Similar Situations (historical closed Situations will feature in the Similar Situations list for a Situation)
5. PRC feedback can be set for closed alerts from the historic database (as accessed from the Alerts Tab of a Situation Room)
You can also access historic data from various Graze API endpoints and MoogDb V2 methods. See Graze API Endpoint Reference for details of individual Graze API endpoints or MoogDb V2 Method Reference for details of individual MoogDb v2 methods.
The active database retains event and snapshot data for two weeks by default. At the end of the retention period, the Housekeeper Moolet removes the event and snapshot data from the active database; after this point you can only access it from the historic database. You can change the default retention period using $MOOGSOFT_HOME/bin/utils/moog_db_split_configurer.
Cisco Crosswork Situation Manager rejects Graze API requests and MoogDb v2 method calls that attempt to modify historic alerts and Situations.
To split the database at 2am every day, with a grace period of 1 hour and an alerts_batch_size and a sigs_batch_size of 1000, run this command:
moog_db_split_configurer -e -r 02:00 -g 1 -a 1000 -s 1000
During the splitting process you can see entries such as the following in the Moogfarmd log:
WARN : [0:House][20180315 15:58:44.207 +0000] [CSplitterService.java]:143 +|Data Splitter started|+
WARN : [0:House][20180315 15:58:44.503 +0000] [CSplitterTask.java]:205 +|Splitter will copy [17] alerts and will move [1983] alerts and [500] situations|+
WARN : [0:House][20180315 15:58:44.706 +0000] [CSplitterTask.java]:205 +|Splitter will copy [152] alerts and will move [1848] alerts and [0] situations|+
WARN : [0:House][20180315 15:58:46.434 +0000] [CSplitterTask.java]:205 +|Splitter will copy [78] alerts and will move [917] alerts and [0] situations|+
WARN : [0:House][20180315 15:58:47.280 +0000] [CSplitterTask.java]:201 +|Nothing more to split|+
WARN : [0:House][20180315 15:58:47.282 +0000] [CSplitterService.java]:145 +|Data Splitter completed|+
If there is no eligible data to move to the historic database, the following entry is logged:
WARN : [0:House][20180315 15:12:22.547 +0000] [CSplitterService.java]:143 +|Data Splitter started|+
WARN : [0:House][20180315 15:12:22.666 +0000] [CSplitterTask.java]:201 +|Nothing more to split|+
WARN : [0:House][20180315 15:12:22.667 +0000] [CSplitterService.java]:145 +|Data Splitter completed|+
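To gauge how much data a splitting run handled, you can total the figures reported in these log entries. The following Python sketch is illustrative only (not a product utility) and assumes the "Splitter will copy [N] alerts and will move [N] alerts and [N] situations" message format shown in the sample output:

```python
import re

# Total the work reported by splitter log entries like those above. The
# regex matches only the "Splitter will copy [N] alerts and will move [N]
# alerts and [N] situations" message format from the sample output.
LINE = re.compile(
    r"copy \[(\d+)\] alerts and will move \[(\d+)\] alerts and \[(\d+)\] situations")

def totals(log_lines):
    copied = moved = situations = 0
    for line in log_lines:
        m = LINE.search(line)
        if m:
            copied += int(m.group(1))
            moved += int(m.group(2))
            situations += int(m.group(3))
    return copied, moved, situations

sample = [
    "+|Splitter will copy [17] alerts and will move [1983] alerts and [500] situations|+",
    "+|Splitter will copy [152] alerts and will move [1848] alerts and [0] situations|+",
    "+|Splitter will copy [78] alerts and will move [917] alerts and [0] situations|+",
]
print(totals(sample))  # (247, 4748, 500)
```

Running this over a saved Moogfarmd log gives a quick summary of copied alerts, moved alerts, and moved Situations per run.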
The following table illustrates how the Housekeeper Moolet splits an example set of Situations and alerts.
Pre-Split Active Database:
· Situation 1 (closed) with member alerts: Alert 1 (closed), Alert 2 (closed), Alert 3 (closed)
· Situation 2 (closed) with member alerts: Alert 4 (closed), Alert 5 (closed), Alert 6 (closed)
· Situation 3 (open) with member alerts: Alert 5 (closed), Alert 6 (closed), Alert 7 (closed), Alert 8 (open)
· Situation 4 (open) with member alerts: Alert 8 (open), Alert 9 (open)
· Loose alerts: Alert 10 (closed), Alert 11 (open)
→ Split occurs
Post-Split Active Database:
· Situation 3 (open) with member alerts: Alert 5 (closed), Alert 6 (closed), Alert 7 (closed), Alert 8 (open)
· Situation 4 (open) with member alerts: Alert 8 (open), Alert 9 (open)
· Loose alerts: Alert 11 (open)
Post-Split Historic Database:
· Situation 1 (closed) with member alerts: Alert 1 (closed), Alert 2 (closed), Alert 3 (closed)
· Situation 2 (closed) with member alerts: Alert 4 (closed), Alert 5 (closed), Alert 6 (closed)
· Loose alerts: Alert 10 (closed)
· Other alerts: Alert 7 (closed)
Notes on the above:
· The process copies closed alerts 5 and 6 to the historic database because they are related to open Situation 3. In the historic database they retain their relationship to closed Situation 2, but not to open Situation 3; that relationship is restored when Situation 3 is closed and the Housekeeper Moolet moves it to the historic database.
· The process copies closed Alert 7 to the historic database, but within that database its relationship to open Situation 3 is removed until Situation 3 is closed and the Housekeeper Moolet moves it to the historic database.
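The copy and move behavior in this example reduces to a simple rule: every closed Situation and every closed alert is written to the historic database, while the active database keeps open items plus any closed alert that is still a member of an open Situation. The following Python fragment sketches that rule; the data structures are hypothetical and do not reflect the product schema:

```python
# Sketch of the copy/move rule described in the notes above. Data
# structures here are illustrative, not the product schema.
def split(situations, alert_open):
    """situations: {sit_id: (is_open, set_of_member_alert_ids)}
    alert_open: {alert_id: True if the alert is open}"""
    open_members = set()
    for is_open, members in situations.values():
        if is_open:
            open_members |= members
    active_sits = {s for s, (is_open, _) in situations.items() if is_open}
    historic_sits = set(situations) - active_sits
    # Open alerts stay active; closed alerts stay active only while they
    # are members of an open Situation.
    active_alerts = {a for a, is_open in alert_open.items()
                     if is_open or a in open_members}
    # Every closed alert is written to the historic database.
    historic_alerts = {a for a, is_open in alert_open.items() if not is_open}
    return active_sits, historic_sits, active_alerts, historic_alerts

# The worked example above: Situations 1 and 2 closed, 3 and 4 open;
# alerts 8, 9 and 11 open, all others closed.
situations = {1: (False, {1, 2, 3}), 2: (False, {4, 5, 6}),
              3: (True, {5, 6, 7, 8}), 4: (True, {8, 9})}
alert_open = {a: a in {8, 9, 11} for a in range(1, 12)}
active_sits, historic_sits, active_alerts, historic_alerts = split(situations, alert_open)
print(sorted(active_alerts))    # [5, 6, 7, 8, 9, 11]
print(sorted(historic_alerts))  # [1, 2, 3, 4, 5, 6, 7, 10]
```

The output matches the post-split state in the example: the active database keeps alerts 5, 6 and 7 only because Situation 3 is still open.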
Prior to the release of Cisco Crosswork Situation Manager 6.4, the single-database architecture limited the amount of data you could retain while running a performant system. Systems with tens of millions of alerts could be slow to display the left navigation filters, especially alert filters. Sluggish performance could also affect the rendering of filter data or a refresh after a filter modification. The threshold for acceptable performance hovered around 15M alerts. In very large systems this typically represented two months of data. Aggressive archive management was the only solution to improve performance.
With Cisco Crosswork Situation Manager 6.5, separate active and historic databases are created as part of the installation process. The Housekeeper Moolet periodically migrates closed Situation and alert data from the active to the historic database.
The segmentation of open Situations and alerts from closed ones yields a number of performance and scalability improvements. The extent of the benefits depends on your system size and the size of your database.
If you are upgrading from Cisco Crosswork Situation Manager 6.4 or an earlier release, manually split the database if you want to benefit from using a separate historic database. See Configure Historic Data Retention for further details.
Performance Improvement Metrics
Cisco engineering used the following baselines for testing performance differences between split-database and single-database systems:
· Single Linux server with 64 x 2.3GHz cores
· 128 GB RAM
· 13 months of data from a very large system (100M events,14M alerts and 1M Situations)
· 3% open alerts and Situations.
Splitting the database improved all filtering and loading times. The bar chart below displays the loading times before and after the database split on 13 months of data. Note: most customers could not have operated in production for 13 months with such performance.
The percentage improvements to loading times for different areas of Cisco Crosswork Situation Manager on the 13 months of data were as follows:
Other improvements included:
· 12% reduction in the time to run a full search re-index
· 12% reduction in the time taken for the Alert Builder to process 1M raw events from a Socket LAM to 600,000 events and 4,000 alerts in the database
· Improved performance of the Cookbook Sigaliser algorithms.
In this test environment example, the open:closed ratio for alerts and Situations was 3:97. The database split only impacts closed alerts and Situations, so you might see different results if your system has a higher proportion of open alerts and Situations.
Next Steps
If you are upgrading to Cisco Crosswork Situation Manager v. 6.5 and your system has performance issues related to the amount of data in your database, you can learn more about the Configure Historic Data Retention feature and evaluate its benefits:
1. Read the documentation: Configure Historic Data Retention
2. Follow the instructions to implement a database split in a non-production environment so you can test the performance improvements and understand any potential impacts to your workflow.
3. If you had previously implemented aggressive archiving, relax it incrementally and observe the system and user performance over time.
Be aware of the following impacts when you implement database splitting:
· General system performance depends on the database splitting settings. Aggressive splitting can lead to an increased CPU load because both MySQL and moogfarmd are consuming CPU. You can run the Housekeeper Moolet in its own moogfarmd on a separate host to potentially mitigate any performance impacts.
· Retaining more data means that your database will increase in size. Implement monitors to check your database growth.
The Historic Data utility, moog_db_split_configurer, is a command line tool that configures the retention of historic data.
The utility is located at $MOOGSOFT_HOME/bin/utils.
See Configure Historic Data Retention for more information.
Cisco Crosswork Situation Manager v8.0.0.2 adds a flag to configure the number of days to keep the splitter process logs before deleting them. For example, to set the aged logs period to 30 days:
moog_db_split_configurer --aged_logs_period 30
moog_db_split_configurer [ --user <user> --password <password> ] [ --grace_period <period> ] [ --database_name <name> ] [ --alerts_batch_size <size> ] [ --sigs_batch_size <size> ] [ --run_at <hour:minute> ] [ --aged_snapshots_period <days> ] [ --aged_logs_period <days> ] [ --loglevel (WARN|INFO|DEBUG|TRACE) ]
Argument |
Input |
Description |
-a, --alerts_batch_size |
Integer <number of alerts> |
Number of alerts per batch when moving data to the historic database. Increasing this value can speed up the process but places more load on the database. Default is 2000. |
-g, --grace_period |
Integer <number of hours> |
Time in hours since the last update after which closed alerts and Situations are eligible to be included in historic data retention. Default is 24 (one day). |
-l, --loglevel |
String, one of INFO | WARN | DEBUG | TRACE |
Log level controlling the amount of information logged by the utility. Default is INFO. |
-n, --database_name |
String <database name> |
The name of the historic database. Default is historic_moogdb. |
-p, --password |
String <password> |
Password for the MySQL user specified in the --user argument. |
-s, --sigs_batch_size |
Integer <number of Situations> |
Number of Situations to include in each batch when moving data to the historic database. Default is 500. |
-u, --user |
String <username> |
MySQL user to run the utility. The user must have schema creation privileges. Default is root. |
-r, --run_at |
String <hour:minute> |
The time of day at which to run the utility. Default is 01:00. The time is interpreted in the system timezone. |
-as, --aged_snapshots_period |
Integer <days> |
Number of days after which snapshot data is removed from the active moogdb database. Default is 14. |
-al, --aged_logs_period |
Integer <days> |
Available in Cisco Crosswork Situation Manager v8.0.0.2 and later. Number of days to retain the splitter process logs. Default is 90 days. |
-h, --help |
- |
Display the utility options and syntax. |
moog_db_split_configurer --alerts_batch_size 1000 --loglevel INFO --run_at 12:30 --sigs_batch_size 250
+-------------- DB split configurations --------------------------------+
Grace period (hours): [24]
Alerts batch size: [1000]
Situations batch size: [250]
Run at: [12:30]
Aged snapshots period (seconds): [1209600]
Historic database: [historic_moogdb]
+-------------- DB split configured ------------------------------------+
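Note that the "Aged snapshots period (seconds)" line in the output reports the --aged_snapshots_period argument (given in days) converted to seconds; the [1209600] shown is the 14-day default. The conversion is simply:

```python
# --aged_snapshots_period is given in days but reported by the
# configurer in seconds; 14 days is the default.
DAY_SECONDS = 24 * 60 * 60
print(14 * DAY_SECONDS)  # 1209600
```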
The Table Compression utility, moog_snapshots_online_table_change.sh, is a command line utility. Using Percona table compression, it optimizes the snapshots table in the MoogDb database in order to minimize the size of your historic database.
It runs online, avoiding downtime, and can reduce the size of your historic database by up to 85%.
The utility is compatible with all database types.
Note
This utility requires a number of Perl packages to run. See the section at the bottom of this page for instructions on how to install the packages if the utility reports issues with Perl dependencies.
The command you use to run the utility depends on your system configuration.
If you are installing or upgrading, run the following command:
moog_snapshots_online_table_change.sh -H hostname -P port -d historic_database_name -u username -p password
Argument |
Input |
Description |
--host, -H |
String <host name> |
Hostname of the database server. |
--port, -P |
Integer <port number> |
Port number of the database server. |
--database, -d |
String <database name> |
Name of the database. |
--user, -u |
String <username> |
Database username. |
--password, -p |
String <password> |
Database password. |
--help, -h |
- |
Display the utility syntax and options. |
$MOOGSOFT_HOME/bin/utils/moog_snapshots_online_table_change.sh -H localhost -P 3306 -d historic_moogdb -u root -p Password123
No slaves found. See --recursion-method if host ldev11 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
analyze_table, 10, 1
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `moogdb`.`snapshots`...
Creating new table...
Created new table moogdb._snapshots_new OK.
Altering new table...
Altered `moogdb`.`_snapshots_new` OK.
2019-07-12T15:47:49 Creating triggers...
2019-07-12T15:47:49 Created triggers OK.
2019-07-12T15:47:49 Copying approximately 36064233 rows...
Copying `moogdb`.`snapshots`: 0% 03:05:15 remain
Copying `moogdb`.`snapshots`: 0% 02:56:29 remain
Copying `moogdb`.`snapshots`: 0% 02:50:16 remain
Copying `moogdb`.`snapshots`: 1% 02:53:31 remain
Copying `moogdb`.`snapshots`: 1% 02:54:20 remain
Copying `moogdb`.`snapshots`: 1% 02:58:53 remain
Copying `moogdb`.`snapshots`: 1% 02:59:46 remain
Copying `moogdb`.`snapshots`: 2% 03:03:08 remain
…..
…..
Copying `moogdb`.`snapshots`: 94% 09:07 remain
Copying `moogdb`.`snapshots`: 94% 08:37 remain
Copying `moogdb`.`snapshots`: 94% 08:06 remain
Copying `moogdb`.`snapshots`: 95% 07:40 remain
Copying `moogdb`.`snapshots`: 95% 07:07 remain
Copying `moogdb`.`snapshots`: 95% 06:37 remain
Copying `moogdb`.`snapshots`: 96% 06:06 remain
Copying `moogdb`.`snapshots`: 96% 05:36 remain
Copying `moogdb`.`snapshots`: 96% 05:05 remain
Copying `moogdb`.`snapshots`: 97% 04:35 remain
Copying `moogdb`.`snapshots`: 97% 04:05 remain
Copying `moogdb`.`snapshots`: 97% 03:33 remain
Copying `moogdb`.`snapshots`: 98% 03:02 remain
Copying `moogdb`.`snapshots`: 98% 02:32 remain
Copying `moogdb`.`snapshots`: 98% 02:01 remain
Copying `moogdb`.`snapshots`: 99% 01:29 remain
Copying `moogdb`.`snapshots`: 99% 00:58 remain
Copying `moogdb`.`snapshots`: 99% 00:26 remain
2019-07-12T19:28:57 Copied rows OK.
2019-07-12T19:28:57 Analyzing new table...
2019-07-12T19:28:57 Swapping tables...
2019-07-12T19:28:57 Swapped original and new tables OK.
2019-07-12T19:28:57 Dropping old table...
2019-07-12T19:29:05 Dropped old table `moogdb`.`_snapshots_old` OK.
2019-07-12T19:29:05 Dropping triggers...
2019-07-12T19:29:05 Dropped triggers OK.
Successfully altered `moogdb`.`snapshots`.
Run the following commands (requires root permissions for the Yum command) to install the Perl packages the utility requires:
for PACKAGE in perl-Compress-Raw-Bzip2-2.061-3.el7.x86_64.rpm perl-Compress-Raw-Zlib-2.061-4.el7.x86_64.rpm perl-DBD-MySQL-4.023-6.el7.x86_64.rpm perl-DBI-1.627-4.el7.x86_64.rpm perl-Data-Dumper-2.145-3.el7.x86_64.rpm perl-Digest-1.17-245.el7.noarch.rpm perl-Digest-MD5-2.52-3.el7.x86_64.rpm perl-IO-Compress-2.061-2.el7.noarch.rpm perl-Net-Daemon-0.48-5.el7.noarch.rpm perl-PlRPC-0.2020-14.el7.noarch.rpm perl-IO-Socket-IP-0.21-5.el7.noarch.rpm perl-IO-Socket-SSL-1.94-7.el7.noarch.rpm perl-Mozilla-CA-20130114-5.el7.noarch.rpm perl-Net-LibIDN-0.12-15.el7.x86_64.rpm perl-Net-SSLeay-1.55-6.el7.x86_64.rpm
do
curl -L -O http://mirror.centos.org/centos/7/os/x86_64/Packages/${PACKAGE};
done;
yum install *.rpm
The Auto Close feature lets you define criteria for automatically closing alerts and Situations. Auto Close enables you to use filtering rules to organize your data and keep it current so you can focus on the most important active alerts and Situations.
You also see performance improvements because automatically closing old alerts and Situations reduces the amount of data involved in statistic calculations.
Auto Close lets you define the conditions using filters and determine how often Cisco Crosswork Situation Manager checks which alerts and Situations to close. Any alerts and Situations older than a certain time and that meet the defined criteria are closed.
The Housekeeper Moolet must be configured and running within Moogfarmd in order for Auto Close to work.
To configure which Situations should be auto closed, create a filter as follows:
1. In the Auto Close > System Settings window, click Edit Filter to open the filter editor.
2. Clear the filter using Empty Filter and Add Clause. Alternatively, you can manually type in your filter rules. You can set up as many auto close rules as you like. In a rule, you can either include or exclude Situations for auto closing but you cannot use both together. See Filter Search Data for reference.
3. Apply the changes to continue and click Done.
After you add the filter, define the behavior for automatically closing Situations as follows:
· Close the Situation and all the alerts it contains.
· Close the Situation and all the unique alerts it contains. Unique alerts are any alerts that are not part of any other Situations.
· Close the Situation only.
You can create tasks to configure:
· The age at which Situations become eligible for Auto Close.
· The number of Situations to close in each Auto Close run.
· Whether Situations only close if all of their associated alerts are closed.
Edit the default task or click Add Task. The available settings are as follows:
Setting |
Input |
Options |
Description |
Situation Age |
Integer |
Minutes Hours Days |
Defines how old a Situation must be for Cisco Crosswork Situation Manager to auto close it. To calculate age, the system looks at both the Situation's last_event_time and last_state_change. Must be a number greater than 1. |
All Alerts closed |
Boolean |
- |
If enabled, only Situations with no open alerts qualify for automatic closure. |
Match filter |
Filter |
- |
Defines the criteria a Situation must meet to qualify for automatic closure. |
Batch size |
Integer |
- |
Defines the maximum number of Situations to auto close in each Auto Close run. Defaults to 1000. Must be a number greater than 1. |
Once saved, the Auto Close task runs after a set period of time. This period is between five minutes and four hours, depending on the age of the Situation. The older the Situation age, the closer the frequency of the task gets to four hours (see the example below). There is no limit on the number of tasks, so you can add as many as you need to meet your requirements.
The example below demonstrates how you can configure an Auto Close task to close a maximum of 1000 Situations per run that meet the following criteria:
· Older than 23 hours.
· All associated alerts are closed.
· Have a clear severity.
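The eligibility test this example task describes can be sketched as follows. The field names (last_event_time, last_state_change, severity, alerts) are illustrative stand-ins modeled on the Situation attributes this page mentions, and taking the more recent of the two timestamps is one interpretation of the system "looking at both" timestamps when calculating age:

```python
# Illustrative sketch only: field names and the max-of-two-timestamps age
# calculation are assumptions, not the product implementation.
CLEAR = 0  # severity 0 is Clear

def eligible(situation, now, max_age_seconds):
    newest = max(situation["last_event_time"], situation["last_state_change"])
    return (now - newest > max_age_seconds                        # old enough
            and situation["severity"] == CLEAR                    # matches filter
            and all(not a["open"] for a in situation["alerts"]))  # all alerts closed

now = 1_000_000
old_clear = {"last_event_time": now - 24 * 3600,   # last event 24 hours ago
             "last_state_change": now - 30 * 3600,
             "severity": CLEAR,
             "alerts": [{"open": False}, {"open": False}]}
print(eligible(old_clear, now, 23 * 3600))  # True: older than 23 hours
```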
To configure which alerts should be auto closed, on the System Settings > Auto Close window, select the Alerts tab and create a filter as follows:
· Click Edit Filter to open the filter editor.
· Clear the filter using Empty Filter and Add Clause. Alternatively, you can manually type in your filter rules. You can set up as many auto close rules as you like. In a rule, you can either include or exclude alerts for auto closing but you cannot use both together. See Filter Search Data for reference.
· Apply the changes to continue and click Done.
You can create tasks to configure:
· The age at which alerts become eligible for Auto Close.
· The number of alerts to close in each Auto Close run.
Edit the default task or click Add Task. The available settings are:
Setting |
Input |
Options |
Description |
Alert age |
Integer |
Minutes Hours Days |
Defines how old the alert must be for Cisco Crosswork Situation Manager to auto close it. To calculate age, the system looks at the last time an event was received from a LAM for that alert. Must be a number greater than one. |
Match filter |
Filter |
- |
Defines which alerts to include in the batch being auto closed. |
Batch size |
Integer |
- |
Defines the maximum number of alerts to auto close in each Auto Close run. Must be a number greater than one. This is 1000 by default. |
Once saved, the Auto Close task runs after a set period of time. This time period is between five minutes and four hours depending on the age of the alert. The older the alert age, the closer the frequency of the task gets to four hours.
There is no limit on the number of tasks, so you can add as many as you need to meet your requirements.
The example below demonstrates how you can configure a task to Auto Close a maximum of 1000 alerts per run that meet the following criteria:
· Older than 45 minutes.
· Have a clear severity or a minor severity.
Administrators can use the audit trail feature to review actions that have been performed on Situations or alerts. For example, you might want to view Situations that were resolved and closed within the last two weeks. You can use the Graze API endpoints getAlertActions and getSituationActions to achieve this. See Graze API for details.
When a user performs an action on a Situation or an alert, Cisco Crosswork Situation Manager records the user that performed this action and the time that it occurred. When a user resolves a Situation, Cisco Crosswork Situation Manager does not mark the alerts within it as resolved and no audit information is recorded for the individual alerts. When a user closes a Situation, the effect on the alerts within it depends on the setting in the Systems > Configure > Workflow tab. Cisco Crosswork Situation Manager records audit information for any alerts that are closed when the Situation is closed.
If a user manually resolves or closes individual alerts, Cisco Crosswork Situation Manager records the user that performed the action and time that it occurred.
You can use the Cisco Crosswork Situation Manager UI, Graze API, or MoogDb V2 to resolve or close Situations and alerts. See Graze API and MoogDb V2 for details of the resolveSituation, closeSituation, resolveAlerts, and closeAlert endpoints/methods.
Cisco Crosswork Situation Manager provides logging of the following information:
· User sessions and authentication
· Configuration changes by administrators
You can also view changes in the severity of Situations.
Audit log entries contain the following markers:
· CONFIG_AUDIT: Configuration changes including creating and updating system configuration properties, and updating user properties.
· SESSION_AUDIT: User sessions and authentication.
· PERMISSIONS_AUDIT: Creating and updating users, changing roles and teams.
Cisco Crosswork Situation Manager logs all of the following for user sessions and for all uses of authentication subsystems, regardless of whether authentication succeeds, whether the system is acting as a client or a server, and regardless of the underlying protocol.
These log entries contain the SESSION_AUDIT marker:
· User (DB/SAML/Graze) session creation.
· User (DB/SAML/Graze) session expiry.
· User logging out from UI.
· User login failure.
Access session log information using Graze and MoogDb v2
You can use the following Graze endpoints or MoogDb v2 methods to access session audit log information:
Description |
Graze Endpoint |
MoogDb V2 Method |
Return session information for users over a period of time. |
getAllSessionInfo |
getAllSessionInfo |
Return session information for an individual user. |
getUserSessionInfo |
getUserSessionInfo |
Cisco Crosswork Situation Manager logs all of the following authorization changes with the PERMISSIONS_AUDIT marker:
· Creating or deleting principals, including usernames, teams, and roles.
· Modifying privileges, team assignments, or other "marks of authority" associated with principals.
· Creating or deleting teams, roles, or other names for authorization categories.
· Modifying privileges or access associated with teams, roles, or users.
Cisco Crosswork Situation Manager logs all of the following administrative changes with the CONFIG_AUDIT marker:
· Creating a new user and updating a user.
· Creating a new team and updating a team.
· Adding and editing a Cookbook.
· Adding, deleting, and editing a Recipe.
· Toggling/editing Tempus.
· Toggling and retraining Probable Root Cause.
· Adding, deleting, and updating merge groups.
· Creating, editing and deleting link definitions.
· Adding, editing, and disabling Situation and alert columns.
· Creating, editing, and deleting the Workflow Engine.
· Customization changes.
· Online help and support changes.
· Changes to Hotkeys.
· Changes to Chatops.
· Adding, editing, and deleting action states.
· Creating, editing, and deleting Situation client tools, Situation server tools, alert client tools, alert server tools, and generic server tools.
· Updating the system config file.
· Restarting Moogfarmd
Audit logging is disabled by default. To enable it, follow these steps:
· Locate the log file of the component for which you want to add audit logging. See Configure Logging for details.
· Add a RollingFile section to the configuration.appenders section of the file. For example:
"RollingFile": {
"name": "AUDIT",
"fileName": "/tmp/audit.log",
"filePattern": "audit.log-%d{MM-dd-yy}-%i.gz",
"PatternLayout": {
"pattern": "%-5level: [%thread][%date{yyyyMMdd HH:mm:ss.SSS Z}] [%file:%line] +|%message|+%n"
},
"Policies": {
"SizeBasedTriggeringPolicy": {
"size": "500M"
}
},
"DefaultRolloverStrategy": {
"max": "40"
},
"filters": {
"MarkerFilter": {
"marker": "AUDIT",
"onMatch": "ACCEPT",
"onMismatch": "DENY"
}
}
}
Replace the original loggers block in the same configuration file:
"loggers":
{
"Logger":
{
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [
{
"ref": "STDOUT"
}],
"level": "info"
}
}
with the following:
"loggers":
{
"Logger": [
{
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [
{
"ref": "STDOUT"
}],
"level": "info"
},
{
"name": "com.moogsoft",
"additivity": false,
"AppenderRef": [
{
"ref": "AUDIT"
}],
"level": "trace"
}]
}
You can use the Graze endpoint getSituationSeverityChanges to return the changes in severity for a Situation. The highest severity of any of the alerts in a Situation determines the severity of the Situation. This endpoint returns increases in severity and a change to a severity of 0 (Clear).
If a Situation has been closed, this endpoint returns a severity of 0 (Clear) and the timestamp of when the Situation was closed. The endpoint does not return any further changes in severity after it has returned to 0 (Clear).
See getSituationSeverityChanges for more information.
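For illustration, a response from this endpoint could be processed as follows to find when a Situation cleared. The response shape shown here is a simplified, hypothetical reduction to two fields; the real endpoint returns more detail:

```python
# Hypothetical, simplified severity-change list: the real
# getSituationSeverityChanges response returns more fields than these.
changes = [
    {"severity": 3, "timestamp": 1570186800},
    {"severity": 5, "timestamp": 1570187100},
    {"severity": 0, "timestamp": 1570188000},
]
# The first entry with severity 0 marks the point at which the
# Situation cleared (or was closed).
cleared_at = next((c["timestamp"] for c in changes if c["severity"] == 0), None)
print(cleared_at)  # 1570188000
```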
Example log file entries with the CONFIG_AUDIT marker:
DEBUG: [http-apr-8085-exec-1][20191004 11:40:44.703 +0100] [CMoogSvr.java:703] +|Admin request: [createUser] called by [admin]|+
DEBUG: [http-apr-8085-exec-11][20191004 11:44:21.078 +0100] [CCreateSystemConfig.java:105] +|Admin request: [createSystemConfig] called by [admin]|+
DEBUG: [http-apr-8085-exec-3][20191004 11:44:28.721 +0100] [CDeleteSystemConfig.java:111] +|Admin request: [deleteSystemConfig] called by [admin]|+
DEBUG: [http-apr-8085-exec-8][20191004 11:44:39.936 +0100] [CUpdateSystemConfig.java:109] +|Admin request: [updateSystemConfig] called by [admin]|+
DEBUG: [http-apr-8085-exec-7][20191004 11:54:49.054 +0100] [CMoogSvr.java:703] +|Admin request: [manageAlertColumns] called by [admin]|+
DEBUG: [http-apr-8085-exec-8][20191004 11:56:05.710 +0100] [CMoogSvr.java:703] +|Admin request: [getTempus] called by [admin]|+
DEBUG: [http-apr-8085-exec-7][20191004 12:08:31.005 +0100] [CUpdateSystemConfig.java:109] +|Admin request: [updateSystemConfig] called by [admin]|+
WARN : [http-apr-8085-exec-11][20191004 12:09:34.142 +0100] [CSecurityUtils.java:373] +|User [admin] login failed|+
DEBUG: [http-apr-8085-exec-3][20191004 12:09:46.173 +0100] [CSecurityUtils.java:368] +|Create session: started [username: [admin]; session: [...6f40ab08]]|+
DEBUG: [http-apr-8085-exec-1][20191004 12:11:16.243 +0100] [CMoogSvr.java:703] +|Admin request: [setFeatureToggleState] called by [admin]|+
Example log file entries with the SESSION_AUDIT marker:
WARN : [http-apr-8085-exec-4][20191001 15:12:15.399 +0100] [CSecurityUtils.java:385] +|User [aa] account unknown|+
WARN : [http-apr-8085-exec-9][20191001 15:12:24.018 +0100] [CSecurityUtils.java:373] +|User [admin] login failed|+
DEBUG: [http-apr-8085-exec-9][20191001 15:23:23.666 +0100] [CSecurityUtils.java:498] +|Create session: started [username: [ava]; session: [...141fe68b]]|+
DEBUG: [0:AdapterHandler][20191001 15:25:06.983 +0100] [CSecurityUtilsConfig.java:279] +|Session [username: [ava]; session: [...141fe68b]] expired.|+
WARN : [http-apr-8085-exec-6][20191001 15:25:21.720 +0100] [CSecurityUtils.java:398] +|User [admin] login failed|+
DEBUG: [http-apr-8085-exec-3][20191001 15:26:58.446 +0100] [CSecurityUtils.java:498] +|Create session: started [username: [isaac]; session: [...e22b6fa0]]|+
DEBUG: [http-apr-8085-exec-4][20191001 15:29:10.686 +0100] [CSubject.java:263] +|Session username: [isaac]; session: [...e22b6fa0] was closed - user logged out.|+
WARN : [http-apr-8085-exec-7][20191001 15:29:34.546 +0100] [CSecurityUtils.java:398] +|User [aloo] login failed|+
DEBUG: [http-apr-8085-exec-8][20191001 15:35:16.658 +0100] [CSecurityUtils.java:368] +|Create session: started [username: [admin]; session: [...1f8a886f]]|+
DEBUG: [http-apr-8085-exec-1][20191001 15:35:21.893 +0100] [CSubject.java:263] +|Session username: [admin]; session: [...1f8a886f] was closed - user logged out.|+
WARN : [http-apr-8085-exec-7][20191001 15:35:28.212 +0100] [CSecurityUtils.java:398] +|User [admin] login failed|+
Example log file entries with the PERMISSIONS_AUDIT marker:
DEBUG: [http-apr-8085-exec-8][20200512 17:19:42.525 +0100] [CMoogSvr.java:727] +|Admin request: [createUser] called by [admin]|+
DEBUG: [http-apr-8085-exec-8][20200512 17:19:42.755 +0100] [CDbUserDAO.java:2116] +|User with Id: [5] and name: [newuser] roles changed, new roles are: [[4]].|+
DEBUG: [http-apr-8085-exec-8][20200512 17:19:42.758 +0100] [CDbUserDAO.java:396] +|Created user [newuser]|+
DEBUG: [http-apr-8085-exec-9][20200512 17:20:12.953 +0100] [CMoogSvr.java:727] +|Admin request: [createTeam] called by [admin]|+
DEBUG: [http-apr-8085-exec-9][20200512 17:20:12.997 +0100] [CTeamUpdateServices.java:698] +|Team [my new team 123] created with ID [2]|+
DEBUG: [http-apr-8085-exec-10][20200512 17:20:35.153 +0100] [CMoogSvr.java:727] +|Admin request: [updateRole] called by [admin]|+
DEBUG: [http-apr-8085-exec-10][20200512 17:20:35.156 +0100] [CRoleDao.java:189] +|Updated role: ID: [4], name: [Operator], permissions: [[sig_visualize, sig_modify, sig_resolve, thread_create, add_media, alert_assign, alert_modify, alert_close, filters, prc_feedback, all_data, manage_maint, moolet_informs, view_summary, collect_insights, collab_write, collab_read]].|+
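If you need to post-process these audit entries, for example to report who called which admin request, a small parser along the following lines can help. The regular expression is derived from the example lines above; treat it as a starting point rather than a guaranteed log-format contract.

```python
import re

# Matches audit lines such as:
# DEBUG: [http-apr-8085-exec-8][20200512 17:19:42.525 +0100] [CMoogSvr.java:727] +|Admin request: [createUser] called by [admin]|+
AUDIT_LINE = re.compile(
    r"^(?P<level>\w+)\s*: \[(?P<thread>[^\]]+)\]"
    r"\[(?P<timestamp>[^\]]+)\] \[(?P<source>[^\]]+)\] "
    r"\+\|(?P<message>.*)\|\+$"
)

def parse_audit_line(line):
    """Return the fields of an audit log line as a dict, or None."""
    match = AUDIT_LINE.match(line.strip())
    return match.groupdict() if match else None

line = ("DEBUG: [http-apr-8085-exec-8][20200512 17:19:42.525 +0100] "
        "[CMoogSvr.java:727] +|Admin request: [createUser] called by [admin]|+")
fields = parse_audit_line(line)
print(fields["level"], fields["source"])  # → DEBUG CMoogSvr.java:727
```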
You can run the command-line archiver utility included with Cisco Crosswork Situation Manager to archive and delete Situations, alerts, and statistical data. The benefits of archiving data include improved system performance, faster backup and recovery, reduced maintenance, and lower storage costs.
The archiver utility archives and deletes a single day's worth of data at a time to reduce the impact on the database. After you launch the archiver, it automatically processes data in batches, which you can configure using the -b, -y and -z options described in the Archiver Command Reference.
Both the moogsoft-db and moogsoft-utils packages include the archiver utility. You can find it at:
$MOOGSOFT_HOME/bin/utils/moog_archiver
The archiver exports and deletes data from the historic database. By default it writes files to the /usr/local/archived directory.
To launch the archiver, execute the moog_archiver command and pass the -e option to export data or the -r option to delete it.
To export all data older than 28 days to the default directory and retain the data in the database:
./moog_archiver -e
To delete all data older than 28 days:
./moog_archiver -r
See the Archiver Command Reference for a full list of available arguments.
You can modify the selection criteria for loose alerts and for Situations and their member alerts. To archive and delete only loose alerts, use the last example below.
Export loose alerts that have not been modified in the past 28 days, and closed/dormant/superseded Situations and their member alerts that have not been modified in the past 4 days, and then delete the data from the database:
./moog_archiver -e -r -s 4
Export loose alerts that have not been modified in the past 2 days, and closed/dormant/superseded Situations and their member alerts that have not been modified in the past 7 days, and then delete the data from the database:
./moog_archiver -e -r -l 2 -s 7
Export loose alerts that have not been modified in the past 28 days, and then delete the data from the database:
./moog_archiver -e -r -t
You can use global Situation and alert filters to limit the data that is eligible for archiving and deletion.
To export loose alerts that have not been modified in the past 28 days, and Situations and their member alerts that have not been modified in the past 7 days and match the global filter "My Global Alert Filter", and then delete the data from the database:
./moog_archiver -e -r -s 7 -i "My Global Alert Filter"
To delete all Situations that match the filter "My Global Situation Filter" and their member alerts, and delete all loose alerts that match the filter "My Global Alert Filter":
./moog_archiver -r -s 0 -l 0 -i "My Global Situation Filter" -a "My Global Alert Filter"
Use filters that extract data based on age with caution, as they can conflict with specified (or default) age constraints. For example, if you use a filter that selects Situations created during the past day and apply an option to archive Situations older than 28 days, no data will be archived.
You can use the archiver to delete Situations, alerts and statistical data that match specified criteria from the database.
Delete all Situation and alert data:
./moog_archiver -r -s 0 -l 0
Delete statistical data older than 15 days:
./moog_archiver -m -n 15
Delete files older than 7 days from the default directory:
./moog_archiver -f 7
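The file cleanup in the last example follows a simple age-based rule. The sketch below models that selection logic as a pure function over (filename, mtime) pairs; the seven-day cutoff and sample data are illustrative, not the archiver's internal implementation.

```python
SECONDS_PER_DAY = 86400

def files_to_delete(files, now, max_age_days):
    """Select the names of (name, mtime) pairs older than max_age_days."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [name for name, mtime in files if mtime < cutoff]

# Illustrative data: modification times relative to an arbitrary "now"
now = 1_700_000_000
files = [
    ("alerts-20150410.143637.csv", now - 10 * SECONDS_PER_DAY),  # 10 days old
    ("sigs-20150417.090000.csv", now - 3 * SECONDS_PER_DAY),     # 3 days old
]
print(files_to_delete(files, now, 7))  # → ['alerts-20150410.143637.csv']
```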
Archive files are named and structured as follows:
· Archive files containing Situation data including alerts, events and snapshots have the filename format <table name>-<yyyymmdd>.<hhmmss>.csv.
For example alerts-20150410.143637.csv
· Archive files containing loose alert data have the filename format <table name>-loose-<yyyymmdd>.<hhmmss>.csv.
For example alerts-loose-20150410.143637.csv
· Cell values are quoted to handle occurrences of the delimiter, and quote characters inside a cell are escaped by doubling them. Null values from the database are written as \N.
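That quoting convention matches what the standard csv module produces and consumes, so reading the archive files back is straightforward. The snippet below parses a sample row and converts the \N marker back to None; the row contents are illustrative, not taken from a real archive.

```python
import csv
import io

# A sample archive row (illustrative): a quoted cell containing the
# delimiter, a doubled quote character, and a database NULL written as \N.
raw = 'alert one,"says ""hi"", twice",\\N\n'

reader = csv.reader(io.StringIO(raw))
# Map the \N null marker back to None while reading
rows = [[None if cell == r"\N" else cell for cell in row] for row in reader]
print(rows)  # → [['alert one', 'says "hi", twice', None]]
```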
The following tips can help you plan your archiving strategy:
· Cisco recommends running the archiver outside core operational hours to minimize the impact on users. UI users should refresh their sessions after the utility has been used to delete data.
· Archiving often in small quantities allows for fast execution and minimal impact.
· You can set up a cron job to run the archiver daily, outside core operational hours.
· You can use a specific alert or Situation filter to remove targeted events.
· Exporting and/or removing large amounts of data on a running system can be slow.
· Exporting from a remote machine is slower because of network latency.
· The archiver tool can export data from the prc_earliest_highest_severity_event table but it cannot delete this data.
· You do not need to re-run the indexer after using the archiver tool to delete data. The -r option deletes records from Elasticsearch to keep the search feature synchronized with the database.
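The daily cron job suggested in the tips above might look like the following crontab entry; the schedule, retention values, and log destination are placeholders to adapt to your environment.

```shell
# Illustrative crontab entry: export and delete data nightly at 02:00,
# keeping loose alerts for 28 days and Situations for 7 days.
# Replace $MOOGSOFT_HOME with the literal installation path, because cron
# does not read shell profiles and will not expand the variable.
0 2 * * * $MOOGSOFT_HOME/bin/utils/moog_archiver -e -r -l 28 -s 7 >> /var/log/moogsoft/archiver_cron.log 2>&1
```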
As an administrator, you can monitor your Cisco Crosswork Situation Manager system. You can use:
· Self Monitoring to view the status, health, and processing metrics of the Cisco Crosswork Situation Manager processes.
· Audit Trail to review actions that have been performed on Situations or alerts.
· Audit Logging to monitor actions within Cisco Crosswork Situation Manager.
You can schedule maintenance windows so that events created during these maintenance periods are not included in Situations. Defining maintenance windows for scheduled downtimes, such as during server or software upgrades, reduces unnecessary noise.
During a maintenance window, events continue to be correlated into alerts and the alerts are labeled as 'In Maintenance', but you can choose not to group them into Situations. If an alert under a maintenance schedule receives an event, the alert is tagged accordingly.
To view maintenance windows, click the Maintenance Schedule heading in the Side Menu. If you are an Administrator, you can edit one of the displayed windows by double-clicking it.
Historical, expired and manually deleted windows are not displayed here.
Click Create Maintenance Window to create a new window. Complete the following:
Field | Input | Description
Name the Maintenance Window | Mandatory String | Text name for the new maintenance window.
Describe the Maintenance Window | Mandatory String | Description of the new maintenance window.
Define a filter for the Maintenance Window | - | Defines a filter to target a specific alert or a group of alerts.
Start date and time | Date/Time | Sets the start time and date of the new maintenance window.
End date and time | Date/Time | Sets the end time and date of the new maintenance window.
How frequently the Maintenance Window should recur | Never / Daily / Weekly / Monthly | Selects whether the maintenance window will never recur, or will recur on a daily, weekly or monthly basis.
Allow Situation Membership for Alerts under Maintenance | Boolean | Allows alerts created during a maintenance schedule to be included in Situations. By default, alerts under maintenance are omitted from Situations.
Cisco Crosswork Situation Manager adds the new maintenance window in the time zone of the user who created it. When scheduling recurring maintenance windows, Cisco Crosswork Situation Manager takes into account any daylight saving time changes for the time zone. See the IANA Time Zone Database for more information on valid time zones.
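To illustrate what honoring daylight saving time means for a recurrence, the sketch below pins a weekly window to the same local wall-clock time across a DST transition. The America/New_York zone and the times chosen are arbitrary examples, and the recurrence logic is a simplified model, not the product's scheduler.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo("America/New_York")

def next_weekly_occurrence(start):
    """Same local wall-clock time seven days later, across DST changes."""
    # Do the arithmetic on the naive local time, then re-attach the zone,
    # so the wall-clock time is preserved rather than the absolute instant.
    naive = start.replace(tzinfo=None) + timedelta(days=7)
    return naive.replace(tzinfo=tz)

# A window starting 22:00 local on 2024-03-08; US DST begins 2024-03-10.
first = datetime(2024, 3, 8, 22, 0, tzinfo=tz)
second = next_weekly_occurrence(first)

print(first.utcoffset())   # → -1 day, 19:00:00  (UTC-05:00, EST)
print(second.utcoffset())  # → -1 day, 20:00:00  (UTC-04:00, EDT)
print(second.hour)         # → 22 — local wall time is preserved
```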
Follow the instructions in this topic if you need to uninstall Cisco Crosswork Situation Manager and its supporting packages.
Be sure to back up any files that you may need again for another installation.
· Stop all core Cisco Crosswork Situation Manager services:
service moogfarmd stop
service logfilelamd stop
service restlamd stop
service socketlamd stop
service trapdlamd stop
· Stop any additional moog_farmd or lam instances running as services.
service <service name> stop
As a precaution, forcibly kill any remaining core Cisco Crosswork Situation Manager processes:
kill -9 $(ps -ef|grep java|grep lam|awk '{print $2}') 2>/dev/null
kill -9 $(ps -ef|grep java|grep moog_farmd|awk '{print $2}') 2>/dev/null
· Stop all supporting services:
service nginx stop
service elasticsearch stop
service apache-tomcat stop
service mysqld stop
service rabbitmq-server stop
· Uninstall Core Cisco Crosswork Situation Manager Packages
yum remove $(rpm -qa|grep moogsoft)
If the above command produces errors such as 'Error in PREUN scriptlet', run the following command to bypass the script errors:
yum -y --setopt=tsflags=noscripts remove $(rpm -qa|grep moogsoft)
1. Remove Core Cisco Crosswork Situation Manager Directories
rm -rf /usr/share/moogsoft
rm -rf /var/lib/moogsoft
rm -rf /var/log/moogsoft
rm -rf /var/run/moogsoft
2. Remove any Cisco crontab entries with this command:
crontab -l | egrep -v "moog|JAVA_HOME" | crontab -
3. Remove Cisco system users (and their home directories):
userdel -r moogsoft
userdel -r moogadmin
userdel -r moogtoolrunner
groupdel moogsoft
Follow these steps to remove the supporting applications Apache Tomcat, Elasticsearch, MySQL, Nginx and RabbitMQ.
Uninstalling Apache Tomcat
Note
Assumption: You have already stopped the Apache Tomcat service, as described above.
To uninstall Apache Tomcat remove the installation directories and the service script:
Note
Apache Tomcat is not installed as an RPM package; it is deployed as a tarball by the moog_init_ui.sh script.
rm -rf /usr/share/apache-tomcat
rm -rf /var/run/apache-tomcat
rm -f /etc/init.d/apache-tomcat
To remove the tomcat system user and its home directory:
userdel -r tomcat
Uninstalling Elasticsearch
Note
Assumption: You have already stopped the Elasticsearch service, as described above.
To remove the Elasticsearch package:
yum remove elasticsearch
To remove related directories run the following commands:
rm -rf /usr/share/elasticsearch
rm -rf /var/lib/elasticsearch
Uninstalling MySQL
Note
Assumption: You have already stopped the mysqld service, as described above.
Remove the MySQL community packages with the following command:
yum remove $(rpm -qa|grep mysql)
To remove the related directories:
rm -rf /usr/share/mysql
rm -rf /var/lib/mysql
To remove the MySQL system user and its home directory and group:
userdel -r mysql
Uninstalling Nginx
Note
Assumption: You have already stopped the Nginx service, as described above.
Remove the nginx and supporting packages with the following command:
yum remove nginx
To remove related directories:
rm -rf /etc/nginx
rm -rf /usr/lib64/nginx
rm -rf /usr/share/nginx
rm -rf /var/log/nginx
rm -rf /var/lib/nginx
To remove the Nginx system user and its home directory and group:
userdel -r nginx
Uninstalling RabbitMQ
Note
Assumption: You have already stopped the rabbitmq-server service, as described above.
Remove the rabbitmq-server package with the following command:
yum remove rabbitmq-server
To remove related directories:
rm -rf /etc/rabbitmq
rm -rf /usr/lib/ocf/resource.d/rabbitmq
rm -rf /var/log/rabbitmq
rm -rf /var/lib/rabbitmq
To stop the erlang epmd daemon:
epmd -kill
Note
The above command may not be necessary on EL7 installs.
To remove the RabbitMQ system user and its home directory and group:
userdel -r rabbitmq
Optionally, follow these steps to remove the remaining packages that are typically added during a Cisco Crosswork Situation Manager installation and clean up the Yum repositories:
Remove remaining packages:
Warning
Important: The list of packages below is based on reverting to a "minimal" installation of CentOS 6.9 and will vary with different versions of Linux and installation levels.
Take care not to remove packages that other important applications installed on the server depend on.
Review the yum summary carefully before proceeding with the removal, specifically any other packages listed in the "Removing for dependencies:" output.
In the list of removal packages below, the libX* and perl* packages are the most likely to impact other applications.
To remove the remaining packages, run these commands:
yum remove GeoIP GeoIP-GeoLite-data GeoIP-GeoLite-data-extra \
apr compat-readline5 erlang fontconfig freetype gd geoipupdate jdk1.8.0_121 \
libX11 libX11-common libXau libXpm libgfortran libjpeg-turbo libkqueue libpng libxcb libxslt \
nginx-filesystem \
perl perl-DBI perl-Module-Pluggable perl-Pod-Escapes perl-Pod-Simple perl-libs perl-version \
socat tomcat-native
Remove Yum Repositories:
Remove EPEL Yum Repository:
yum remove epel-release
rm -f /etc/yum.repos.d/epel*
Remove MySQL Community Yum Repository:
yum remove mysql-community-release
rm -f /etc/yum.repos.d/mysql*
Remove remaining Yum Repositories:
rm -f /etc/yum.repos.d/elasticsearch.repo
rm -f /etc/yum.repos.d/moog.repo
rm -f /etc/yum.repos.d/rabbitmq_rabbitmq-server.repo