We, the Medik8s team, are four Red Hatters with a strong passion for High Availability (HA) solutions in the Kubernetes world.
We develop open source Kubernetes operators, using the Operator Lifecycle Manager (OLM), that provide maintenance support, automatic node remediation, and high availability for singleton workloads:
- Node Healthcheck Operator (NHC) - Detects node failures and triggers remediation, which is performed by other operators such as Self Node Remediation or Machine Deletion.
- Self Node Remediation (SNR) - Remediates nodes by rebooting them, without needing a management interface such as IPMI or a node / machine provisioning API. Works standalone and / or with NHC.
- Node Maintenance Operator (NMO) - Declarative node cordoning and draining prior to potentially disruptive actions.
We plan to improve the existing operators by adding master (control plane) fencing capabilities to SNR and NHC, which currently only work for worker nodes. Furthermore, we have a few work-in-progress operators:
- Machine Deletion - Remediates nodes by deleting the associated OpenShift machine. Triggered by NHC.
- HA-SNO - High Availability based on two Single Node OpenShift clusters.
In 2018, the team behind Medik8s prototyped what would eventually ship as the Machine Healthcheck Controller for OpenShift 4.2.
Soon after, Red Hat brought the Machine Healthcheck Controller to sig-cluster for consideration as a general-purpose mechanism for detecting node failures and recovering compute power and affected workloads.
In 2019, we improved support for bare metal by shipping an annotation-based mechanism for rebooting nodes instead of going through a time-consuming reprovisioning cycle.
In 2020, we worked with Ericsson to design an official API for using alternative mechanisms to recover bad nodes. Since then, Ericsson has prototyped a metal3-based implementation, and we have implemented Poison Pill (PP) for shared-nothing environments.
In 2021, we created Medik8s to make general-purpose HA available to all Kubernetes clusters, not just those backed by an infrastructure API. Afterwards, we implemented the Node Healthcheck Operator, which detects node health and is deployed together with PP to remediate unhealthy nodes.
In 2022, we moved the Node Maintenance Operator, which provides a declarative way to cordon and drain nodes, from the KubeVirt project to the Medik8s project. We also renamed the Poison Pill project to Self Node Remediation.
Join our Google group to get more information, participate in discussions, and get notified of new releases.