Create an AMI
The EC2 hosts on which XRd runs must be configured to meet XRd's requirements.
To ensure high XRd Router performance and consistency across the cluster, generate an Amazon Machine Image (AMI) that meets all of XRd's requirements. This AMI can then be used as a template for launching all worker nodes in the cluster.
An AMI can be shared among all the worker nodes in a single region, so you can save time by following these instructions only once. You can then copy the AMIs to different regions.
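As a sketch of the cross-region copy, the aws ec2 copy-image command can be used; the region names below are illustrative, and <xrd-ami-id> is the AMI ID produced at the end of this document.

```shell
# Illustrative sketch: copy the finished AMI from us-east-1 to eu-west-1.
aws ec2 copy-image \
  --source-image-id <xrd-ami-id> \
  --source-region us-east-1 \
  --region eu-west-1 \
  --name xrd-vrouter-ami
```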
Find the Base AMI
The base AMI that is used to create an AMI for XRd worker nodes must be Amazon EKS optimized Amazon Linux (based on Amazon Linux 2). Amazon provides separate images for every supported version of Kubernetes.
You can find the latest AMI for each Kubernetes version in each region by running the following command.
aws ssm get-parameter \
--name /aws/service/eks/optimized-ami/<k8s-version>/amazon-linux-2/recommended/image_id \
--region <region> --query "Parameter.Value" --output text
Note |
The Kubernetes version specified here must be the same as the version that was used to create the cluster. |
Make a note of the AMI ID, <al2-ami-id>.
For XRd Control Plane, you can use the base AMI with sysctl settings applied to the user data file when creating the worker node. For details, see the Create a Worker Node section.
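As a hypothetical sketch of that approach, the worker node user data might append the required sysctl settings at boot; the values shown here are the inotify settings from the Kernel Parameters section later in this document, and the full list should be taken from there.

```shell
#!/bin/bash
# Hypothetical user-data sketch for an XRd Control Plane worker node.
# Append the required kernel parameters and apply them immediately.
cat <<'EOF' >> /etc/sysctl.conf
fs.inotify.max_user_instances=64000
fs.inotify.max_user_watches=64000
EOF
sysctl -p
```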
Create an EC2 Instance
This EC2 instance must have internet access to install packages and pull in resources; otherwise, you must provide all of these resources locally. Run the following command to create an EC2 instance.
aws ec2 run-instances --image-id <al2-ami-id> --count 1 \
--instance-type m5.24xlarge \
--key-name <key-pair-name> \
--security-group-ids <security-group-ids> \
--subnet-id <subnet-id>
You must specify security group and subnet IDs that give the EC2 instance access to the internet. These are not necessarily the security group and subnet IDs created earlier. <key-pair-name> is the name of the key pair that you created earlier.
Make a note of the instance ID returned from the command, <ami-instance-id>.
When the EC2 instance is created, connect to it over SSH using the specified SSH key and the username ec2-user.
Copy XRd Resources
A set of resources for XRd worker nodes is available in the public xrd-packer GitHub repository.
Note |
HashiCorp Packer is a tool used for building virtual machine images, and the xrd-packer GitHub repository provides resources and Packer templates to create an AMI suitable for XRd vRouter. The manual steps described in this document use the resources from the repository as the source for some of the host OS settings, but the Packer tool is not used. The README file in the xrd-packer GitHub repository provides a description on how to build an AMI using Packer and the provided templates, as an alternative to the manual steps provided in this document. |
Copy all of the resources under the files folder to the EC2 instance, mirroring the structure in the repository. For example:
- Copy files/etc/modprobe.d/igb_uio.conf to /etc/modprobe.d/igb_uio.conf on the EC2 instance.
- Copy files/etc/tuned/xrd-eks-node-variables.conf to /etc/tuned/xrd-eks-node-variables.conf on the EC2 instance.
If you have access to the internet, you can accomplish this task by running the following commands:
sudo yum install -y git
git clone https://github.com/ios-xr/xrd-packer
sudo cp -r xrd-packer/files/* /
Clean up the unnecessary data by running the following commands:
sudo yum remove -y git
rm -rf xrd-packer
Tuning
TuneD is a daemon for monitoring and tuning system devices. The files copied in the previous step contain a TuneD profile tailored for XRd hosts. This profile sets kernel parameters and tuning options that help XRd achieve high performance.
Note |
Previously, TuneD was installed from the Amazon Linux 2 repositories. However, the version provided in those repositories has reached end-of-life status, so TuneD must be installed from source. |
Perform the following steps to install TuneD from the source.
Install the prerequisite packages using the following commands:
sudo yum install -y \
dbus \
dbus-python \
ethtool \
gawk \
polkit \
python-configobj \
python-decorator \
python-gobject \
python-linux-procfs \
python-perf \
python-pyudev \
python-schedutils \
tuna \
util-linux \
virt-what
Download the TuneD v2.20.0 source tarball using the following commands:
mkdir tuned
curl -L https://github.com/redhat-performance/tuned/archive/refs/tags/v2.20.0.tar.gz | tar -xz -C tuned --strip-components 1
Install TuneD using the following command:
sudo make -C tuned PYTHON=/usr/bin/python2 install
Note that the TuneD Makefile attempts to install a desktop file; because the desktop-file-install tool is not available, the make command fails with the following output:
# desktop file
install -dD /usr/share/applications
desktop-file-install --dir=/usr/share/applications tuned-gui.desktop
make: desktop-file-install: Command not found
make: *** [install] Error 127
This is expected and can be ignored.
You can safely remove the TuneD sources afterward using the following command:
rm -r tuned
After installing TuneD, some of the files copied from the XRd resources must be adjusted for the individual deployment. Update /etc/tuned/xrd-eks-node-variables.conf with the appropriate isolated cores and required number of hugepages.
This documentation assumes deployment on an m5.24xlarge or m5n.24xlarge instance, and in this scenario, the file should appear as follows:
isolated_cores=16-23
hugepages_gb=12
Note |
When running on an m5.24xlarge instance, the number of hugepages is doubled from the Cisco IOS XRd requirement. This is because the configured hugepages are distributed across all non-uniform memory access (NUMA) nodes in the system, whereas Cisco IOS XRd necessitates all its hugepages to be on the single NUMA node it operates on. |
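Once the profile is applied (in a later step), one way to check how the hugepages were distributed is to read the counters the kernel exposes; this is a generic Linux check, not an XRd-specific tool.

```shell
# Show the system-wide hugepage counters.
grep -i huge /proc/meminfo
# Per-NUMA-node distribution of 1GiB pages (1048576kB = 1GiB);
# ignore errors if 1GiB pages are not configured on this host.
cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages 2>/dev/null || true
```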
Lastly, start the TuneD service and select the XRd profile by running the following commands:
sudo systemctl start tuned
sleep 10
sudo tuned-adm profile xrd-eks-node
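To confirm that the profile took effect, tuned-adm can report the active profile and verify the running system against it; this is standard TuneD functionality, shown here as a suggested sanity check.

```shell
# Report the currently active TuneD profile.
tuned-adm active
# Verify that the system settings match the active profile.
tuned-adm verify
```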
Kernel Parameters
Setting kernel parameters (sysctl settings) on the host is either required or recommended when running Cisco IOS XRd containers. These parameters address the needs of both the XRd Control Plane and XRd vRouter platforms.
Set the kernel parameters by running the following commands:
sudo -i
echo "fs.inotify.max_user_instances=64000" >> /etc/sysctl.conf
echo "fs.inotify.max_user_watches=64000" >> /etc/sysctl.conf
echo "kernel.randomize_va_space=2" >> /etc/sysctl.conf
echo "net.core.rmem_max=67108864" >> /etc/sysctl.conf
echo "net.core.wmem_max=67108864" >> /etc/sysctl.conf
echo "net.core.rmem_default=67108864" >> /etc/sysctl.conf
echo "net.core.wmem_default=67108864" >> /etc/sysctl.conf
echo "net.core.netdev_max_backlog=300000" >> /etc/sysctl.conf
echo "net.core.optmem_max=67108864" >> /etc/sysctl.conf
echo "net.ipv4.udp_mem=1124736 10000000 67108864" >> /etc/sysctl.conf
sysctl -p --system
exit
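A quick way to confirm the settings took effect is to read them back from /proc/sys; two of the values configured above are shown as examples.

```shell
# Read back a couple of the configured kernel parameters.
for key in fs/inotify/max_user_instances net/core/rmem_max; do
  printf '%s = %s\n' "$key" "$(cat /proc/sys/$key)"
done
```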
Configuring Core Dumping
Core files must be collected for detailed debugging. To prevent the node's disk from being exhausted, you must configure the system carefully. The recommended solution is to use the systemd-coredump tool.
Configure the kernel to use the systemd-coredump tool when a process crashes.
sudo -i
echo "kernel.core_pattern=|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h" >> /etc/sysctl.conf
sysctl -p --system
exit
Copy the following contents to /etc/systemd/coredump.conf.
[Coredump]
# Storage=external and Compress=yes are the defaults.
Storage=external
Compress=yes
# MaxUse is twice the sum of RAM given to XRd; KeepFree is the sum of RAM given to XRd.
MaxUse=32G
KeepFree=16G
# ProcessSizeMax and ExternalSizeMax are required if systemd is older than v251.
ProcessSizeMax=32G
ExternalSizeMax=32G
These values are applicable for a single XRd vRouter installation, described in Install XRd vRouter. For the XRd Control Plane installation, described in Install XRd Control Plane, the copied values must be MaxUse=12G and KeepFree=6G.
In this configuration, systemd-coredump writes the core files to /var/lib/systemd/coredump on the host.
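With this setup, core files can be listed and retrieved using the standard coredumpctl tool; the PID below is a placeholder for an actual crashed process.

```shell
# List the crashes recorded by systemd-coredump.
coredumpctl list
# Extract one core file for offline analysis (e.g. with gdb).
coredumpctl dump <PID> --output core.dump
```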
Building and Installing the Interface Driver
You must not use the interface driver bundled with Amazon Linux 2, because it does not support the Write Combining feature on ENAv2 interfaces.
Perform the following steps to build and install the interface driver that has Write Combining support.
- Install the required packages:
sudo yum install -y kernel-devel-$(uname -r)
- Pull the DPDK source code, extract it, and build igb_uio:
curl -O https://git.dpdk.org/dpdk-kmods/snapshot/dpdk-kmods-e721c733cd24206399bebb8f0751b0387c4c1595.tar.gz
tar zxvf dpdk-kmods-e721c733cd24206399bebb8f0751b0387c4c1595.tar.gz
cd dpdk-kmods-e721c733cd24206399bebb8f0751b0387c4c1595
make -C linux/igb_uio
- When the driver is built, copy the interface driver kernel module into the appropriate location and register it:
sudo cp linux/igb_uio/igb_uio.ko /lib/modules/"$(uname -r)"/kernel/drivers/uio
sudo depmod -a
cd ..
- Clean up the packages:
rm -f dpdk-kmods-e721c733cd24206399bebb8f0751b0387c4c1595.tar.gz
rm -rf dpdk-kmods-e721c733cd24206399bebb8f0751b0387c4c1595
sudo yum remove -y kernel-devel-$(uname -r)
sudo yum autoremove -y
- Load the kernel module:
sudo modprobe uio
sudo modprobe igb_uio wc_activate=1
- The kernel module is set to load on boot by the files copied earlier, so no further action is needed.
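You can confirm that the driver is loaded and that write combining was requested by inspecting the module state; these are generic Linux checks.

```shell
# Verify that the igb_uio module is loaded.
lsmod | grep igb_uio
# Confirm the wc_activate parameter was set (expected value: 1).
cat /sys/module/igb_uio/parameters/wc_activate
```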
Reboot and Check
Reboot the EC2 instance to allow all the settings to take effect.
When the EC2 instance is rebooted, copy the host-check script from the xrd-tools repository at xrd-tools.
The host-check tool verifies if the host environment is configured correctly for running XRd containers. For more information on the usage of the host-check tool, see the README file.
Run the host-check script.
The following is a sample output from the XRd vRouter. You can run the script similarly on the XRd Control Plane.
[ec2-user@ip-172-31-3-90 ~]$ ./host-check -p xrd-vrouter
==============================
Platform checks - xrd-vrouter
==============================
PASS -- CPU architecture (x86_64)
PASS -- CPU cores (96)
PASS -- Kernel version (5.4)
PASS -- Base kernel modules
Installed module(s): dummy, nf_tables
PASS -- Cgroups (v1)
PASS -- Inotify max user instances
64000 - this is expected to be sufficient for 16 XRd instance(s).
PASS -- Inotify max user watches
64000 - this is expected to be sufficient for 16 XRd instance(s).
PASS -- Socket kernel parameters (valid settings)
PASS -- UDP kernel parameters (valid settings)
INFO -- Core pattern (core files managed by the host)
PASS -- ASLR (full randomization)
INFO -- Linux Security Modules (No LSMs are enabled)
PASS -- CPU extensions (sse4_1, sse4_2, ssse3)
PASS -- RAM
Available RAM is 365.2 GiB.
This is estimated to be sufficient for 73 XRd instance(s), although memory
usage depends on the running configuration.
Note that any swap that may be available is not included.
PASS -- Hugepages (12 x 1GiB)
PASS -- Interface kernel driver
Loaded PCI drivers: vfio-pci, igb_uio
INFO -- IOMMU
vfio-pci is set up in no-IOMMU mode, but IOMMU is recommended for security.
PASS -- Shared memory pages max size (17179869184.0 GiB)
==================================================================
Host environment set up correctly for xrd-vrouter
==================================================================
Delete the Cloud Folder
For the AMI to run the User Data passed into an EC2 instance, you must delete the /var/lib/cloud folder.
sudo rm -rf /var/lib/cloud
Take a Snapshot
Stop the EC2 instance by running the following command:
aws ec2 stop-instances --instance-ids <ami-instance-id>
Note |
This command returns immediately, but it may take a minute for the instance to be fully stopped. |
You can check the instance state by running the following command:
aws ec2 describe-instances --instance-ids <ami-instance-id> --query 'Reservations[0].Instances[0].State.Name' --output text
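Alternatively, the AWS CLI can block until the instance reaches the stopped state:

```shell
# Wait until the instance is fully stopped before taking the snapshot.
aws ec2 wait instance-stopped --instance-ids <ami-instance-id>
```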
Once the EC2 instance has stopped, take a snapshot using the following command:
aws ec2 create-image --instance-id <ami-instance-id> --name xrd-vrouter-ami
Make a note of the AMI ID, <xrd-ami-id>.
Terminate the EC2 instance.
aws ec2 terminate-instances --instance-ids <ami-instance-id>