Discovering the versatility of OpenEBS

Key points

  • Running stateful workloads on Kubernetes used to be a challenge, but the technology has matured. Today, up to 90% of enterprises believe Kubernetes is ready for production data workloads
  • OpenEBS provides storage for stateful applications running on Kubernetes, including dynamically provisioned local persistent volumes and replicated volumes, through multiple “data engines”
  • Local PV data engines offer excellent performance, but with the risk of data loss due to node failure
  • For replicated engines, there are three options available: Jiva, cStor, and Mayastor. Each engine supports different use cases and needs
  • OpenEBS can address a wide range of applications, from casual testing and experimentation to high-performance production workloads.

When I deliver a Kubernetes training, one chapter always comes at the end and never earlier: the chapter on stateful sets and persistent storage, i.e. running stateful workloads on Kubernetes. While running stateful workloads on Kubernetes used to be a no-go, up to 90% of enterprises now believe K8s is ready for data. The last lab in this chapter involves running a PostgreSQL benchmark (which keeps writing to disk) in a pod, then breaking the node running that pod and showing the various mechanisms involved in failover (such as taint-based evictions). Historically, I’ve been using Portworx for this demo. Recently, I decided to give OpenEBS a shot.

In this post, I’ll give you my first impressions of OpenEBS: how it works, how to get started with it, and what I like about it.

OpenEBS provides storage for stateful applications running on Kubernetes, including dynamically provisioned local persistent volumes (like Rancher’s local path provisioner) and replicated volumes, through multiple “data engines”. Just like Prometheus, which can be deployed on a Raspberry Pi to monitor the temperature of the beer or sourdough cultures in your basement, but can also be scaled up to monitor hundreds of thousands of servers, OpenEBS can be used for simple projects and quick demos, but also for large clusters with sophisticated storage needs.

OpenEBS supports many different “data engines”, and this can be a bit overwhelming at first. But these data engines are precisely what make OpenEBS so versatile. There are “local PV” engines that typically require little or no configuration and provide good performance, but whose volumes exist on a single node and become unavailable if that node goes down. And there are replicated engines that provide resiliency to node failures. Some of these replicated engines are very easy to set up, but the ones that offer the best performance and features need a bit more work.

Let’s start with a quick review of all these data engines. The following is not a replacement for the excellent OpenEBS documentation; it is simply my way of explaining these concepts.

Local PV data engines

Persistent volumes using one of the “local PV” engines are not replicated across multiple nodes. OpenEBS will use the node’s local storage. Multiple variants of local PV engines are available. They can use local directories (used as hostPath volumes), existing block devices (disks, partitions, or others), raw files, ZFS filesystems (which allow advanced features like snapshots and clones), or Linux LVM volumes (in which case OpenEBS works similarly to TopoLVM).

The obvious downside to local PV data engines is that a node failure will make the volumes on that node unavailable; and if the node is lost, so is the data that was on that node. However, these engines have excellent performance: since there is no overhead in the data path, the read/write performance will be the same as if we were using the storage directly, without containers. Another advantage is that the hostpath local PV works out of the box, without any additional configuration, as soon as OpenEBS is installed, similar to Rancher’s local path provisioner. That is extremely convenient when I need a storage class “right now” for a quick test!
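To make the “storage class right now” idea concrete, here is a minimal sketch that builds a PersistentVolumeClaim manifest in Python. The storage class name `openebs-hostpath`, the claim name `quick-test`, and the helper `make_pvc` are assumptions for illustration; check the storage classes actually present in your cluster before relying on them.

```python
import json


def make_pvc(name, storage_class, size):
    """Build a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            # A local volume lives on one node, so a single-node access mode fits.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Assumption: "openebs-hostpath" is the storage class created for hostpath
# local PVs; the claim name and size are arbitrary.
pvc = make_pvc("quick-test", "openebs-hostpath", "1Gi")
print(json.dumps(pvc, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` in a test cluster.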

Replicated engines

OpenEBS also offers multiple replicated engines: Jiva, cStor, and Mayastor. I’ll be honest, I was pretty confused at first: why do we need not one, not two, but three replicated engines? Let’s find out!

Jiva engine

The Jiva engine is the simplest. Its main advantage is that it does not require any additional configuration. Like the hostpath local PV engine, the Jiva engine works out of the box when you install OpenEBS. It provides strong data replication. By default, every time we provision a Jiva volume, three storage pods will be created, using a scheduling placement constraint to ensure they are placed on different nodes. This way, a single node outage will not take down more than one replica of the volume at a time. The Jiva engine is simple to use, but lacks the advanced features of the other engines (such as snapshots, clones, or adding capacity on the fly), and the OpenEBS docs mention that Jiva is suitable when “capacity requirements are small” (such as below 50 GB). In other words, this is great for testing, labs, or demos, but maybe not for that giant production database.
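The placement guarantee described above can be illustrated with a toy simulation. This is not how the Kubernetes scheduler or Jiva is implemented; the node names and the helpers `place_replicas` and `surviving_replicas` are hypothetical, and the sketch only shows why spreading three replicas over distinct nodes leaves at least two alive after any single node failure.

```python
def place_replicas(nodes, replica_count=3):
    """Assign each replica to a distinct node (a simple anti-affinity rule)."""
    if len(nodes) < replica_count:
        raise ValueError("not enough nodes to keep replicas apart")
    return {f"replica-{i}": node for i, node in enumerate(nodes[:replica_count])}


def surviving_replicas(placement, failed_node):
    """Return the replicas that are not on the failed node."""
    return [replica for replica, node in placement.items() if node != failed_node]


# Hypothetical four-node cluster.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(nodes)

for node in nodes:
    left = len(surviving_replicas(placement, node))
    print(f"{node} fails -> {left} replicas left")
```

Whichever node fails, the volume keeps at least two of its three replicas.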

cStor engine

Next on the list is the cStor engine. This one gives us the extra features mentioned above (snapshots, clones, and adding capacity on the fly), but requires a bit more work to get up and running. Namely, you need to involve NDM, the Node Disk Manager component of OpenEBS, and tell it which available block devices you want to use. This means you should have some free partitions (or even whole disks) to allocate to cStor.

If you don’t have an extra disk or partition available, you may be able to use loop devices. However, since loop devices carry a significant performance overhead, you might as well use the Jiva engine in that case, because it will achieve similar results while being much easier to set up.

Mayastor engine

Finally, there is the Mayastor engine. It is designed to work tightly with NVMe (Non-Volatile Memory Express) disks and protocols, although it can still use non-NVMe disks. I was wondering why this was a big deal, so I did some research.

In older storage systems, you could only send one command at a time: read this block or write this block. Then you had to wait until the command was completed before you could send another one. Later, it became possible to send multiple commands and let the disk reorder them to execute them faster; for example, to reduce the number of head seeks using an elevator algorithm. In the late 1990s, the ATA-4 standard introduced TCQ (Tagged Command Queuing) to the ATA specification. This was later greatly improved by NCQ (Native Command Queuing) with SATA disks. SCSI disks had supported command queuing for longer, which is partly why they were more expensive and more likely to be found in high-end servers and storage systems.
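To see why reordering helps, here is a small sketch of an elevator-style ordering (the LOOK variant: sweep upward from the current head position, then come back down). The block numbers, the starting head position, and the function names are made up for illustration; real disk firmware is far more sophisticated.

```python
def elevator_order(requests, head):
    """Serve requests at or above the head in ascending order, then the rest descending."""
    upward = sorted(block for block in requests if block >= head)
    downward = sorted((block for block in requests if block < head), reverse=True)
    return upward + downward


def total_seek_distance(order, head):
    """Sum the head travel needed to serve the requests in the given order."""
    distance = 0
    for block in order:
        distance += abs(block - head)
        head = block
    return distance


# Hypothetical queue of pending block requests and starting head position.
pending = [98, 183, 37, 122, 14, 124, 65, 67]
start = 53

fifo_distance = total_seek_distance(pending, start)
elevator_distance = total_seek_distance(elevator_order(pending, start), start)
print("arrival order:", fifo_distance)
print("elevator order:", elevator_distance)
```

With this particular queue, serving requests in arrival order moves the head across 640 blocks, while the elevator ordering needs only 299.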

Over time, queuing systems evolved a lot. The first standards allowed queuing a few dozen commands in a single queue; now we are talking about thousands of commands in thousands of queues. This makes multi-core systems more efficient, as queues can be bound to specific cores, which reduces contention. We can now also have priorities between queues, which can ensure fair disk access between queues. This is ideal for virtualized workloads, to ensure that one virtual machine does not starve the others. And importantly, NVMe also optimizes the CPU usage related to disk access, because it is designed to require less back-and-forth between the operating system and the disk controller. While there are certainly many other features in NVMe, this queuing business alone makes a big difference, and I understand why Mayastor would be relevant to people who want to design storage systems for maximum performance.
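The fairness argument can be shown with a toy model. The sketch below is not NVMe’s actual arbitration logic (that lives in the controller); it is a plain round-robin over hypothetical per-VM queues, with made-up queue and command names, showing that a quiet queue is served promptly even when a busy one holds many more commands.

```python
from collections import deque


def round_robin(queues):
    """Visit each queue in turn, taking one command per visit, until all are empty."""
    pending = {name: deque(commands) for name, commands in queues.items()}
    order = []
    while any(pending.values()):
        for name, queue in pending.items():
            if queue:
                order.append((name, queue.popleft()))
    return order


# Hypothetical queues: one busy virtual machine, one quiet one.
queues = {
    "vm-busy": [f"write-{i}" for i in range(6)],
    "vm-quiet": ["read-0", "read-1"],
}

order = round_robin(queues)
for name, command in order:
    print(name, command)
```

With a single shared first-in-first-out queue, the two reads from the quiet machine could end up waiting behind all six writes; with one queue per machine, they are served within the first four slots.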

If you want help figuring out which engine best suits your needs, you’re not alone; and the OpenEBS documentation has an excellent page on this.

Container attached storage

Another interesting thing in OpenEBS is the concept of CAS, or Container Attached Storage. The phrase made me raise an eyebrow at first. Is it a marketing ploy? Not exactly.

When using the Jiva replicated engine, I noticed that for each Jiva volume, I would get four pods and one service:

  • a “controller” pod (with “-ctrl-” in its name)
  • three “data replica” pods (with “-rep-” in their name)
  • a service that exposes (via different ports): an iSCSI target, a Prometheus metrics endpoint, and an API server

This is interesting because it mimics what you get when you deploy a SAN: multiple disks (the data replica pods) and a controller (to interface between a storage protocol like iSCSI and the disks themselves). These components are materialized by containers and pods, and the storage is actually in the containers, so the term “container attached storage” makes a lot of sense. (Note that the storage doesn’t necessarily use the container’s copy-on-write storage; in my setup it defaults to a hostPath volume, but this is configurable.)
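The naming convention from the list above is enough to group the pods of a volume by role. Here is a minimal sketch; the pod names are invented for illustration (only the “-ctrl-” and “-rep-” markers come from what I observed), and `classify_jiva_pods` is a hypothetical helper, not part of OpenEBS.

```python
def classify_jiva_pods(pod_names):
    """Group pod names by role, based on the '-ctrl-' and '-rep-' markers."""
    roles = {"controller": [], "replica": [], "other": []}
    for name in pod_names:
        if "-ctrl-" in name:
            roles["controller"].append(name)
        elif "-rep-" in name:
            roles["replica"].append(name)
        else:
            roles["other"].append(name)
    return roles


# Invented pod names following the observed naming pattern.
pods = [
    "pvc-1234-ctrl-5d9f7c-abcde",
    "pvc-1234-rep-1-6b8c4d-fghij",
    "pvc-1234-rep-2-6b8c4d-klmno",
    "pvc-1234-rep-3-6b8c4d-pqrst",
    "postgres-0",
]

roles = classify_jiva_pods(pods)
print(len(roles["controller"]), "controller,", len(roles["replica"]), "replicas")
```

Fed with the pod names of a real namespace, this kind of grouping makes the SAN analogy visible: one controller in front of three replicas for every volume.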

I mentioned iSCSI above. I was reassured that OpenEBS used iSCSI with cStor, because it is a solid and proven protocol widely used in the storage industry. This means that OpenEBS does not require a custom kernel module or anything like that. I think it does require some userland tools to be installed on the nodes, though. I say “I think” because on my Ubuntu test nodes, which run a very simple cloud image, I didn’t need to install or configure anything extra.

After this quick tour of OpenEBS, the most important question is: does it fit my needs? I found that its wide range of options meant that it could handle almost anything that I…
