Thanks to the effort of companies like Red Hat, Google, Netflix and many others, it’s safe to say that open source is no longer a mystery in today’s IT organizations. However, many struggle to understand the nuances that make a huge difference between vendors commercially supporting the same open source technologies.
Should the general public have any interest in understanding those nuances? A few years ago the answer would have been “no.” However, today, understanding those nuances is critical to select the right business partner when an IT organization wants to adopt open source.
As more vendors start offering commercial support for various projects, from Linux to OpenStack to Kubernetes, the need to understand the real difference between vendor A and vendor B becomes critical for CIOs and IT Directors.
In Red Hat, we have a TL;DR answer to the question “What makes you different from vendor XYZ?”. Our short answer is we have more experience supporting open source projects, and we participate and nurture the open source communities in a way most other industry players simply don’t.
This is a true statement, but what does it actually mean? How does that translate into a competitive advantage that a CIO can appreciate when selecting the best business partner to support her/his agenda? Today, I’ll try to provide the long version of that answer, with some simplifications, in a way that is hopefully easy to understand for business-oriented people.
To narrate this story, let’s take as example a fictitious open source project that we’ll call “Project-O” and divide it into three chapters:
- Chapter 1 – Innovation brings instability
- Chapter 2 – Instability is exponentially difficult to manage in large projects
- Chapter 3 – How vendors manage instability is their competitive advantage
Chapter 1: Innovation brings instability
At any given time during the lifecycle of Project-O, any individual in the world, can contribute a piece of code to:
- introduce, complete or fix a feature (innovate)
- improve performance (optimize)
- increase reliability (stabilize)
To serve the business, we need to innovate and optimize. To protect the business, we need to stabilize. The continuous tension between these two needs compels hundreds or thousands of code contributions to Project-O at any given time. The bigger the project, and the larger the community supporting it, the more code is submitted at any given time.
Let’s use an analogy: if Project-O is an existing house, each code contribution is a renovation proposal. Imagine having hundreds or thousands of renovation proposals per day.
Just like renovation proposals, new code, especially the one that introduces new features, can be written in a very conservative way or in a very disruptive way:
- It’s conservative code when adoption doesn’t break other parts of Project-O. In other words, the individual who wrote the code has been mindful of the “backwards compatibility.”
In our analogy, it’s when a renovation proposal doesn’t force the house owner to demolish existing walls or do some other major intervention to accommodate the proposed changes. Imagine painting a guest room.
- It’s disruptive code when adoption breaks other parts of Project-O and requires some major reworking.
In our analogy, it’s when a renovation proposal requires the house owner to make drastic changes to the plumbing system in the only bathroom. It can be done, of course, but it implies temporary instability and disruption inside the house.
Obviously, the more conservative the code, the fewer chances there are to innovate. And vice versa.
When an individual wants to improve Project-O, he or she has to submit the proposed code to a group of individuals, called “maintainers”, that govern the project and have the mandate to review the quality and impact of the code before accepting it.
A maintainer has the right to reject the code for various reasons (we’ll explain this in full in Chapter 3), and needs to make a fundamentally binary choice: requesting strong backwards compatibility or allowing disruptive code.
In our analogy, the maintainer is the house owner that has to carefully evaluate the pros and cons for each renovation proposal before approving or rejecting it.
If the house owner wants an amazing new wing of the house, he has to be ready to tear down walls, rework the plumbing system, and deal with a fair amount of redesign. In similar fashion, the maintainer that wants to innovate and quickly evolve Project-O has to allow more disruptive code and deal with the implications of that disruption.
To address the business demand, especially in a highly competitive market like the one we have today, the maintainer has no choice but to allow disruptive code wherever possible*.
How a vendor deals with that disruption makes the whole difference, and can truly define its competitive advantage. This is where things get nuanced and interesting.
Chapter 2: Instability is exponentially difficult to manage in large projects
As we said, the larger the community behind an open source project, the larger the number of code contributions submitted at any given time. In other words, the amount of things you can renovate in a standard apartment is infinitely smaller than the number of things that you can renovate in a castle.
Let’s say that Project-O is a fairly complicated open source project, equivalent to an hotel in our analogy. For the maintainer of Project-O, the challenge is to consider and approve enough code contributions to keep the project innovative, but not too many to be overwhelmed by the amount of things to fix at the same time. Imagine renovating the rooms in one wing, rather than all of them at the same time.
When very many functionalities of Project-O break simultaneously due to too many code contributions, the difficulty of fixing them all together in a reasonable amount of time grows exponentially. The problem is that the market cannot wait forever for Project-O to become stable again to be used. The innovation provided by the newly contributed code must be delivered within a reasonable amount of time to be used in a competitive way. Usually, large enterprises struggle to adopt a new version of Project-O if a stable release is provided faster than every 6 months. However, the same large enterprises won’t wait years before receiving a new stable release of Project-O.
Again, it would be like the hotel owner in our analogy would approve 10,000 renovation proposals all executed at the same time, each one breaking existing parts of the hotel. Imagine upgrading the electrical, plumbing, heating, and remodeling the restaurant all at the same time. Fixing the resulting disruption would be so incredibly difficult to render the hotel completely unusable for an excessive amount of time.
According to what we said so far, the maintainer sets goals and deadlines to stop accepting code contributions. After the deadline is met, no more code contributions are applied and the community works to stabilize the new version of Project-O enough to be usable.
However, “usable” doesn’t necessarily mean “tested” and “certified as reliable”. It’s the same difference that goes between “I tried to run the code a dozen times and everytime it worked” and “I tried to run the code thousands of times, under most disparate conditions, and I know that it will always work in the conditions I tested”. This is where competing vendors can make a business out of an open source technology that is fundamentally free to access and use for the entire world.
So, at a certain point, the maintainer freezes code contributions for Project-O. Subsequently, competing vendors look at all submitted code contributions and decide how much of it should be commercially supported** after their own extensive QA testing.
Because of this, the open source version of Project-O, called “upstream”, is not necessarily identical to the commercially supported version of Project-O provided by vendor A, which in turn is not necessarily identical to the version of Project-O provided by vendor B. There are small and big differences in between these three versions as they represent three discrete states of the same open source project.
Vendor A and vendor B need to make a decision on how much of Project-O they want to commercially support, trying to balance the need for innovation (addressed by newly disruptive code being accepted by the maintainer) and the exponential complexity of fixing the amount of instability caused by that innovation.
Chapter 3: How vendors manage instability is their competitive advantage
At this point, you may think that the differentiation between vendor A and vendor B is in how savvy or smart they are in “making the cut,” in how many new code contributions to Project-O they decide to support at any given time. In reality, that is only partially relevant. What really differentiates the two vendors is how they deal with the instability caused by the newly contributed code.
To manage this instability each vendor can leverage up to three resources:
- Deep knowledge
- Special tooling
- Strong credibility
When much of the newly contributed code is disruptive in nature, many things can break at the same time within Project-O. Sometimes the new code breaks dependencies in a domino effect that is very complicated to fully understand. Fixing all broken dependencies quickly and effectively requires a broad and deep knowledge of all aspects of Project-O. Like the hotel owner who intimately knows the property inside and out through many years of renovations, and has a very clear idea of all the areas, obvious and non-obvious, that the changes in a renovation plan imply.
This is why vendors involved in the open source world make a big deal of statistics like number of contributions to any given project, like the ones captured by Stackalytics. Knowing how much and how broadly a vendor is contributing to an open source project may seem a superficial and sometimes misleading metric, but it’s meant to measure how deep is the knowledge of that vendor. The deeper the knowledge, the more skilled the vendor is in managing the instability created by disruptive code.
No matter how deep the knowledge available, at the end of the day a vendor is an organization made of people, and people can make mistakes. Human error is unavoidable. Hence, to mitigate the risk of human error, some vendors develop internal special tooling that assists humans in understanding the impact of the instability created by newly contributed code, and operate the necessary changes across the board to make Project-O as stable as possible, as quickly as possible.
Without a deep knowledge about Project-O, it can be impossible to develop and maintain any special tooling. So, human capital is the biggest asset a vendor involved in open source has.
Through deep knowledge and/or specialized tooling, a vendor can identify and fix the broken dependencies in open source code faster than its competitors, but there’s a last challenge: submit the patches to the maintainers and be sure that each part of Project-O is fixed for the newly contributed code to work in time for it’s release. If vendors get fixes back “upstream,” they don’t have to maintain those fixes alone. But, for the fixes to be accepted, vendors have to prove their code helps Project-O, not just themselves.
Back to our analogy: the hotel owner accepted a certain number of renovation proposals to build a new wing, and compiled them in a renovation plan. The plan is ambitious and the contractors executing it will break the current plumbing system in the process. Nonetheless, the plan must be completed within 3 weeks or the hotel will not remain competitive enough to justify the renovation plan itself. The contractor that is building the new wing breaks the plumbing system, as expected, and must ask for modifications from the contractor that owns that system.The owner of the plumbing system is willing to help, of course, but to comply he has to review the new wing project and the proposal changes to the plumbing system, and, if he agrees with them, order new pipes. The whole process would normally take 5 weeks, enough to compromise the whole renovation plan.
The only way to save the day is if the contractor who is building the new wing has a strong credibility in plumbing. And that credibility is so strong that the requested modifications to the plumbing system are accepted without questioning, and the pipes are ordered with an express delivery. In other words, the owner of the plumbing system trusts the wing builder so much that a further review is not necessary.
Such credibility is not granted lightly in the open source world. Few individuals are granted that sort of trust, and that sort of trust is earned over years of continuous contribution of new code and demonstration of deep knowledge.
Thanks to these amazing open source contributors that decide to join a vendor, that vendor is more or less able to fix broken dependencies in a timely way. In fact, differently from what could be assumed, highly trusted open source contributors are not easily hired and retained through standard HR practices. They independently decide to join and stay with a vendor primarily because they believe in the mission of that vendor, in how that vendor conducts business.
So, in summary, the difference between two vendors operating in the open source world boils down to how capable they are in managing the instability caused by innovation. That differentiation is very subtle and hard to appreciate for anybody until it’s time to face the instability.
* The deeper you go into the computing stack, all the way down to the kernel of the operating system, the smaller the amount of disruptive code is allowed to compromise the reliability of mission critical systems, and their capability to integrate with a well established ecosystem of ISVs and IHVs. That’s why it’s much harder to innovate at the lowest level of the stack.
** Commercially supporting open source software means that the vendor performs the QA testing to verify code stability, provides technical support in case something doesn’t work, issues updates and patches for security and functionality improvements and certifies integration with third-party software and hardware components.