GR Reference Models
The CPS solution stores session data in MongoDB, a document-oriented database. The key advantage is that the transactional session data handled by the application layer is stored in MongoDB and replicated to help guarantee data integrity. MongoDB refers to its replication configuration as a replica set, as opposed to the Master/Slave terminology typically used in Relational Database Management Systems (RDBMS).
A replica set is a group of database nodes that work together to provide data resilience: one primary (the master) and 1 to n secondaries (the slaves), distributed across multiple physical hosts.
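For illustration, the following minimal sketch (Python with the pymongo driver) shows how an application connects to such a replica set; the host names, port, replica set name, and database/collection names are assumptions, not CPS defaults. The driver locates the current primary and re-routes operations if a failover promotes a secondary.

    from pymongo import MongoClient

    # Seed list of replica set members; the driver discovers the rest of the
    # set and tracks which member is currently primary.
    client = MongoClient(
        "mongodb://sessionmgr01:27717,sessionmgr02:27717/",
        replicaSet="set01",                  # assumed replica set name
        readPreference="primaryPreferred",   # fall back to a secondary for reads
        retryWrites=True,                    # retry a write once after failover
    )

    # Assumed database and collection names, purely for illustration.
    sessions = client["session_cache"]["session"]
    sessions.insert_one({"sessionId": "diam-1234", "state": "ACTIVE"})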
MongoDB also offers sharding, which improves the scalability and speed of a cluster. Shards partition the database into indexed sets, which allow much greater write speed and thus improve overall database performance. Sharded databases are often set up so that each shard is itself a replica set.
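As a hedged sketch of that layout (assumed host names, set names, and shard key, not CPS defaults), the admin commands below register two replica sets as shards through a mongos router and partition one collection on an indexed key:

    from pymongo import MongoClient

    # Connect to the mongos query router, not to a data-bearing node.
    mongos = MongoClient("mongodb://mongos01:27017/")

    # Each shard is identified by its replica set: "setName/host:port,host:port".
    mongos.admin.command("addShard", "shard01/sessionmgr01:27717,sessionmgr02:27717")
    mongos.admin.command("addShard", "shard02/sessionmgr03:27717,sessionmgr04:27717")

    # Enable sharding on the database and split the collection on an indexed key.
    mongos.admin.command("enableSharding", "session_cache")
    mongos.admin.command("shardCollection", "session_cache.session",
                         key={"sessionId": "hashed"})

Writes are then routed by the shard key across the shards, which is what gives the improved write throughput described above.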
The replica set model can be easily extended to a geo-redundant deployment by stretching the set across two sites. In that scenario an Arbiter node is required. The Arbiter is a non-data-bearing node that helps decide which node becomes the primary in the case of a failure. For example, with four nodes (primary, secondary1, secondary2, and the arbiter), if the primary fails, the remaining nodes “vote” for which of the secondaries becomes the new primary. With only the two secondaries voting there would be a tie and failover would not occur; the arbiter breaks the tie by casting the deciding vote.
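The following sketch shows how such a stretched set could be initiated; the member host names, ports, and priorities are illustrative assumptions, and the arbiter is marked as a voting-only member that holds no data:

    from pymongo import MongoClient

    # Connect directly to one not-yet-initialized member to run replSetInitiate.
    seed = MongoClient("mongodb://sessionmgr01-site1:27717/", directConnection=True)

    seed.admin.command("replSetInitiate", {
        "_id": "set01",
        "members": [
            {"_id": 0, "host": "sessionmgr01-site1:27717", "priority": 3},   # Site 1
            {"_id": 1, "host": "sessionmgr02-site1:27717", "priority": 2},   # Site 1
            {"_id": 2, "host": "sessionmgr01-site2:27717", "priority": 1},   # Site 2
            {"_id": 3, "host": "arbiter-site3:27717", "arbiterOnly": True},  # tie-breaker
        ],
    })

With this configuration, losing the Site 1 primary still leaves two secondaries plus the arbiter, enough votes to form a majority and elect a new primary.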
Without Session Replication
The following list provides information related to GR without session replication:
- If PCEF elements need to switch over clusters, the current Diameter session between the PCEF and PCRF is terminated and a new session must be re-established.
- Simplifies the architecture and reduces complexity.
- Quota data is not reported. Currently, this is a limitation.
Active/Standby
In active/standby mode, one CPS system is active while the other CPS system, often referred to as the Disaster Recovery (DR) site, is in standby mode. In the event of a complete failure of the primary CPS cluster or the loss of the data center hosting the active CPS site, the standby site takes over as the active CPS cluster. All PCEFs use the active CPS system as primary, and have the standby CPS system configured as secondary.
The backup CPS system is in standby mode; it does not receive any requests from connected PCEFs unless the primary CPS system fails, or in the event of a complete loss of the primary site.
If an external load balancer or Diameter Routing Agent (DRA) is used, the CPS nodes in the active cluster are typically configured in one group and the CPS nodes in the standby cluster in a secondary group. The load balancer/DRA can then be configured to fail over automatically from the active cluster to the standby cluster.
Active/Active
- Traffic from the network is distributed to the two CPS clusters concurrently.
- PCEFs are divided within the Service Provider’s network to achieve a 50/50 traffic split.
- Session data is not replicated across sites (see the connection sketch after this list).
- SPR (subscriber information) data is replicated across both sites.
- Balance data is replicated across both sites.
- Diameter sessions must be re-established if a failover occurs; outstanding balance reservations time out and are released.
- In case of a failure, all traffic is routed to the remaining CPS site.
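One way to picture this split is as separate database connections per data type; the sketch below (assumed replica set and host names, not CPS defaults) keeps session data in a site-local replica set while SPR and Balance connect to replica sets whose members span both sites:

    from pymongo import MongoClient

    # Site-local session store: its members live only at this site, so session
    # data is not replicated to the other site.
    session_client = MongoClient(
        "mongodb://sessionmgr01-site1:27717,sessionmgr02-site1:27717/",
        replicaSet="session-site1")

    # SPR and Balance replica sets include members at the remote site, so this
    # data survives the loss of either site.
    spr_client = MongoClient(
        "mongodb://sessionmgr01-site1:27720,sessionmgr01-site2:27720/",
        replicaSet="spr-set01")
    balance_client = MongoClient(
        "mongodb://sessionmgr01-site1:27718,sessionmgr01-site2:27718/",
        replicaSet="balance-set01")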
With Session Replication
Active/Standby
- The solution protects against a complete site outage as well as a link failure toward one or more PCEF sites.
- If a PCEF fails over to the Secondary site while the Primary site is still active (for example, on a link failure):
  - SPR data is retrieved from the local SPR replica members at the Secondary site.
  - Session and Balance data is read from, and written to, the Primary site across the inter-site link.
- A complete outage of the Policy Director layer results in database failover to the Secondary site.
- On recovery from a failure, a CPS node does not accept traffic until the databases are known to be in a good state (as sketched below).
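As a minimal sketch of the kind of readiness check implied by the last bullet (the host name is an assumption), the replica set status can be queried and traffic admitted only when a primary exists and every member is in a healthy state:

    from pymongo import MongoClient

    def replica_set_healthy(uri: str) -> bool:
        """Return True when the replica set has a primary and no degraded members."""
        status = MongoClient(uri).admin.command("replSetGetStatus")
        states = [member["stateStr"] for member in status["members"]]
        return "PRIMARY" in states and all(
            state in ("PRIMARY", "SECONDARY", "ARBITER") for state in states)

    if replica_set_healthy("mongodb://sessionmgr01-site2:27717/"):
        print("Databases are in a good state; the node can start accepting traffic.")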
Active/Active
- Traffic from the network is distributed to the two clusters concurrently.
- PCEFs are divided within the Service Provider’s network to achieve a 50/50 traffic split.
- Session data is replicated across sites (two-way replication; see the sketch after this list).
- SPR (subscriber information) data is replicated across both sites.
- Balance data is replicated across both sites.
- Diameter sessions do not need to be re-established if a failover occurs; there is no loss of profile or balance information.
- Load balancer VMs use only local VMs for traffic processing.
- In case of a failure, all traffic is routed to the remaining site.
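The two-way session replication is what lets a Diameter session written at one site be served from the other after a failover; a hedged sketch of that behavior (assumed set, host, database, and collection names) is shown below:

    from pymongo import MongoClient, WriteConcern

    # The session replica set spans both sites in this model.
    client = MongoClient(
        "mongodb://sessionmgr01-site1:27717,sessionmgr01-site2:27717/",
        replicaSet="session-set01")

    # Wait for acknowledgement from a majority of members before returning,
    # so the session is durable even if one site is lost immediately afterwards.
    sessions = client["session_cache"].get_collection(
        "session", write_concern=WriteConcern(w="majority"))

    # Written while Site 1 holds the primary ...
    sessions.insert_one({"sessionId": "diam-1234", "state": "ACTIVE"})

    # ... and still readable after a failover promotes a member at Site 2.
    print(sessions.find_one({"sessionId": "diam-1234"}))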