Customizable Ansible pipelines for Telco partners using DCI

A telco context

The role of Telco Partner CI is to ensure that OpenShift changes do not break partner network functions, and that network function changes from partners do not break OpenShift. That’s why the team has built and maintains a CI system that automates a Telco-specific OCP installation and tests it with real-world scenarios. This article deals with the installation of the platform. The network functions built on top of it will be the topic of a future article.

To support the automation, DCI is the core of the CI system: it fetches the components generated by Red Hat (in this case, a component can be a non-GA version of OCP), and the agent on the partner side then installs the component on the partner’s local hardware. The logs of every installation pipeline are shared with Red Hat, which makes supporting partners easier. If you are not familiar with DCI, you can check this blog post.

In a Telco environment, DCI is focused on OpenShift and its plugins. That’s why there is a dedicated agent, the DCI OpenShift Agent, to handle OCP installation. It contains all the logic to set up and launch an OpenShift installation and send the installation logs to the DCI server. One of the main differences between DCI and a classic CI system is that jobs are not managed by the main DCI server: only the components and the logs are centralized there, not the execution. This means, for example, that partners can schedule jobs with the agent however they want, with as few restrictions as possible.

The rest of this article focuses on a particular lab in Dallas, built with the same hardware as that used by one of our Telco partners. As the plugins and configuration needed by Telco CNFs are very specific, the idea is to stay as close as possible to the partner environment, including keeping it disconnected. The final goal is to continuously test the future Telco infrastructure with significant workloads, finding bugs and blockers in upcoming versions in advance for this particular case.

A typical DCI pipeline for a Telco lab

Deploying the core infrastructure

CI pipelines in Dallas are composed of two main stages. The first one is the installation of the bare-metal OpenShift platform. In this specific context, DCI shows its strengths. The DCI OpenShift Agent contains all the Ansible roles and tasks needed to install any version of OCP on bare-metal servers, along with everything required to handle the specific constraints of Telco environments. For example, it includes an Ansible role that mirrors OCP dependencies for disconnected environments, another that manages SR-IOV cards for better network performance, and another that automatically installs a list of operators from the Red Hat Marketplace.

The core of this first step is an Ansible playbook split into several files, each containing a set of tasks. Here is a list of the main ones:

pre-run.yml: it prepares the environment of the jumpbox before the installation of OCP. It includes tasks like mirroring binaries locally or getting the desired version of OCP.
install.yml: it launches the roles that install a vanilla OCP platform. This is followed by the installation of some important operators, like the ‘performance-operator’, to set up the specific configuration needed by Telco workloads. In the Dallas lab, we are talking about bare-metal installs, but the agent covers IPI and SNO installs as well.
tests.yml: it includes tasks that check that the cluster is working properly. The test suites can include conformance tests or CSI tests. The final results are then published in the DCI UI.
failure.yml: these tasks collect extra logs and publish them to DCI to help troubleshoot and understand why a job failed.
teardown.yml: these tasks clean up the cluster when the job is finished.
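
As a rough illustration of how the stage files listed above could be chained, here is a minimal sketch of a top-level playbook. It is not the agent's actual playbook: the plays/ directory, the play and task names, and the choice to run the teardown in an ‘always’ section are assumptions made for the example.

```yaml
---
# Hypothetical top-level playbook; the plays/ paths are assumptions.
- name: Staged OCP installation pipeline (sketch)
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Run the stages in order
      block:
        - name: Prepare the jumpbox
          ansible.builtin.include_tasks: plays/pre-run.yml

        - name: Install OCP and the required operators
          ansible.builtin.include_tasks: plays/install.yml

        - name: Check that the cluster works
          ansible.builtin.include_tasks: plays/tests.yml
      rescue:
        # Only reached when one of the stages above fails.
        - name: Collect extra logs
          ansible.builtin.include_tasks: plays/failure.yml

        - name: Keep the job marked as failed
          ansible.builtin.fail:
            msg: "A pipeline stage failed, see the collected logs"
      always:
        # Assumption: the cleanup runs whether the job passed or failed.
        - name: Clean up
          ansible.builtin.include_tasks: plays/teardown.yml
```

The block/rescue/always construct is standard Ansible error handling: the failure tasks only run when a stage breaks, while the ‘always’ section runs in both cases.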

Parametrizing the pipeline

All the tasks and playbooks mentioned above are provided by Red Hat teams and are highly customizable! The DCI OpenShift Agent covers the common tasks needed in all OCP installations, and partners can then parameterize the pipeline to fit their needs. This is done with variables defined in a simple YAML file that describes the execution of the agent. Here is an example of such a file:

---
- name: Openshift-vanilla
  type: ocp
  ansible_extravars:
    cnf_test_suites: ['sctp','ptp','performance','sriov','dpdk']
    enable_cnv: true
    dci_disconnected: true
  topic: OCP-4.7
  components:
    - ocp=4.7.24
  outputs:
    kubeconfig: "kubeconfig"

In this example, a single short file drives and centralizes complex installation tasks:

‘ocp=4.7.24’ defines which version of OCP to install. It could also point to an RC version, to test upcoming releases in advance.
‘enable_cnv’ is a boolean that triggers the installation of the hyper-converged virtualization components (CNV) in the cluster.
‘dci_disconnected’ is a boolean that enables installing OCP without the cluster being connected to the internet. Only the host where the playbook is launched needs to be connected.
‘cnf_test_suites’ defines which test suites are run during the testing stage. The example is not exhaustive. If you want to see all the possibilities built into the DCI OpenShift Agent, check the code here.
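
To illustrate how variables like these can drive a playbook, here is a minimal sketch of Ansible tasks that consume them. This is not the agent's code: the install-cnv.yml file and the task names are hypothetical, and the debug task only stands in for a real test runner.

```yaml
---
# Hypothetical tasks; only the variable names come from the settings file above.
- name: Install the virtualization components when requested
  ansible.builtin.include_tasks: install-cnv.yml   # hypothetical task file
  when: enable_cnv | default(false) | bool

- name: Report the installation mode
  ansible.builtin.debug:
    msg: "Disconnected install: {{ dci_disconnected | default(false) | bool }}"

- name: Iterate over the requested CNF test suites
  ansible.builtin.debug:
    msg: "Would run the {{ item }} test suite"   # placeholder for a real runner
  loop: "{{ cnf_test_suites | default([]) }}"
```

Defaulting each variable keeps the tasks usable when a partner omits a setting, so a short settings file is enough to switch whole features on or off.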

Customizing the pipeline with hooks

As Telco partners have specific needs, some external plugins are required, such as a specific load balancer or a solution to manage storage and volumes inside the cluster. The setup of these external solutions is not handled by Red Hat teams but by the partners themselves. This is where the hook system comes into play!

The DCI OpenShift Agent allows more customization than just parameterizing the installation pipeline. It loads custom playbooks written by partners and runs them during the execution. It all starts with a simple include_tasks in Ansible:

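tasks:
  - block:
      # Run hooks/install.yml from every configuration directory, in order
      - name: "dci-Openshift-agent : Launch partner install"
        include_tasks: '{{ hookdir }}/hooks/install.yml'
        loop: "{{ dci_config_dirs }}"
        loop_control:
          loop_var: hookdir
    # YAML alias: reuses a rescue section anchored elsewhere in the playbook
    rescue: *teardown_failure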

With this mechanism, partners can customize the installation of their Telco infrastructure with their own Ansible tasks. From a single hook file, partner roles can be loaded dynamically with the ‘include_role’ keyword. Each step of the installation (pre-run.yml, install.yml, failure.yml, as mentioned above) can have a hook of the same name, which is executed after that step and before the next one.

Typically, after the installation, a hooks/install.yml could deploy a custom operator made by a partner, with its own deployment logic, without impacting the rest of the installation. Another example is the mirroring of external resources in hooks/pre-run.yml when the cluster is running in disconnected mode.
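
A partner hook along these lines could look like the following minimal sketch. Everything in it is hypothetical: the role names and the role variable are invented for the example and do not come from the agent or from any partner.

```yaml
---
# hooks/install.yml: hypothetical partner hook, run after the agent's install step.
- name: Deploy the partner load balancer
  ansible.builtin.include_role:
    name: partner_load_balancer        # hypothetical role name
  vars:
    lb_namespace: partner-lb           # hypothetical role variable

- name: Deploy the partner storage operator
  ansible.builtin.include_role:
    name: partner_storage_operator     # hypothetical role name
```

Keeping the partner logic in roles means the hook file stays a short list of entry points, and the roles can be versioned and tested on the partner side.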

With this feature, partners can rely on an automated way of deploying any version of OCP and focus on their own automation, which installs and tests their own software and plugins.

To be continued

This first part of the pipeline is the foundation of the core infrastructure that will support all the CNFs. Once the OCP cluster and the custom infrastructure are ready, another agent is used to launch the CNFs. This will be the topic of a future blog post.