Podcast Transcript: greymatter.io CEO Chris Holmes featured on The Business of Cloud-Native Podcast
greymatter.io CEO Chris Holmes recently sat down with Emily Omier, host of The Business of Cloud Native, a podcast dedicated to helping guide end-users on their cloud-native journey. Chris and Emily covered a number of topics, from the origins and future of the greymatter.io enterprise microservices platform, to the value of accurately measuring and proving ROI and TCO, and the importance of brownfield and greenfield interoperability in today's business environment.
April 6, 2021
We’ve provided a lightly edited version of the transcript of the podcast below. However, if you’d prefer to listen to it in its entirety, please click the file below.
Welcome to The Business of Cloud Native. I’m Emily Omier, your host. And today we are chatting with Chris Holmes of Grey Matter. And Chris, thank you so much for joining me.
Yeah, for sure. Good to talk to you again.
And let’s just start with having you introduce yourself, both yourself and what you do. And then also talk a little bit about what Grey Matter is, and what Grey Matter does for customers.
Sure. I'm Chris Holmes, CEO and founder of Decipher Technology Studios, which is being rebranded to better reflect what we do around Grey Matter, our intelligent mesh networking platform, which is very relevant today in a hybrid multi-cloud deployment strategy. And really what we're focused on is building an end-to-end platform using AI to surface different types of anomalies across these multi-cloud, multi-tenant hybrid infrastructures, that are in some cases using lots of open-source cloud-native technologies, in some cases using Kubernetes, in some cases trying to bridge Kubernetes environments, cloud-native-type environments, to legacy systems or other cloud-type services. And we sit on top of all that and try to run some BI tools, provide some security tools, and look for anomalies.
On the Growth of Grey Matter
And can you tell me a little bit about Grey Matter’s evolution? How did it start, and how has the project evolved, and the company evolved over time?
Good question, and it’s been pretty significant. We’ve been in business since 2015, since before service mesh was a thing in the press for sure. We were asked by a fairly large global enterprise to help modernize their infrastructure. They needed to prepare for this multi-cloud environment as they started to look at their assets internally and wanted to bring some of that digital transformation into the next century.
So, we started studying what Netflix and Twitter had been doing back in 2014, 2015, and they were pretty much the only two that we know of that were really pushing hard into the microservices universe. When we first built Grey Matter, we were actually building some custom things on top of a project that Twitter had open-sourced called Finagle. And as we have gone through this journey from an enterprise perspective, certainly a global enterprise perspective, Grey Matter has changed fairly significantly. It has always maintained its focus on surfacing insights and BI from a business perspective to help an enterprise modernize its infrastructure, but we've now added some unprecedented security layers for a true, explicit zero-trust model.
I’m pretty sure we were the first to integrate the Open Policy Agent. I know that some of our competitors recently announced, just this week that they had done that, we’ve had that in our stack for quite a few… Well, almost two years at this point. We’ve been doing token-based security layers all the way down to data planes since we’ve redone our mesh around Envoy.
Envoy Proxy is now what we are based on. Plus, we have recently announced that we are supporting the Istio control plane, but we also package our own control plane. And we do that because of the AI insights that we want to surface across an enterprise. So, lots and lots of change has happened. I'm excited about where we're going next: the data network, the data mesh, has become our single focus for 2021.
On Grey Matter Customers
And then, I think you work with government customers, right?
We do a lot with government and public sectors, yes.
I was just wondering if you could talk a little bit about what the difference is between working with the public sector, working with the private sector, and if there’s any difference in the goals that these two types of customers tend to have in working with you and even in general, in their cloud-native deployments, in their modernization efforts?
Good question. It's actually really interesting. We have a few private customers, and obviously we've got a lot of good customers in the government and public sector. There are a lot of similarities, a tremendous amount of similarities; maybe not the acquisition cycles, but the implementations are somewhat similar. They all care about that cross-cloud service management governance model. They all care about visualizations. They all care about looking at their return on investment, surfacing any type of anomalies, and what things are actually costing them across these different environments, where we play a pretty significant role. But the government side, much like what we're finding in the BFSI side, I might add, because those two are very similar: lots of segmentation, and it's necessary segmentation. It's not segmentation just because you want to create a cybersecurity honeypot or something like that. There's legit segmentation that they have to create throughout their application networking stacks. And we quite frankly cut our teeth on that.
And that's why security is such a first-class citizen in Grey Matter. Because in that sector, you're dealing with clouds, you're dealing with the typical stuff that you'd see on the private side, in terms of Amazon Web Services, or Google Cloud Engine, or even Azure, any of those, but you're dealing with those in a totally different type of atmosphere. They're usually cut off from the universe.
They’re usually segmented themselves in terms of networking. In other words, the clouds that we deploy into sometimes are not the same clouds that you would just go to azure.com and look at. It’s not even the same clouds as the Azure Gov Cloud. Our customers are completely segmented off of those particular clouds and data centers, and the ingress and egress traffic in and out of those is completely managed, if allowed at all.
So we’ve had to develop a SaaS-like model that can be deployed in those environments quickly, easily, and operate almost like a fenced-walled SaaS layer in those infrastructures. So from that perspective, it’s very different, but it’s actually good for us because what we’re finding on the private side is, when you start talking to Enterprise 500 companies, you start talking to anybody larger than that, the Fortune 500 and up, they have similar use cases.
I often brief that cloud-native is a strategy right now that most CTOs are looking at. And most CISOs, the chief information security officers, want to enable it, but they want to enable it in a way that's going to work. And most of these companies still have legacy systems that are not necessarily cloud-native ready. In some cases it's too costly to migrate those legacy systems. And I hate calling it legacy; I like to refer to them as brownfield systems, because most of those brownfield systems are actually running their core business operations: their pick-pack-ship systems, their customer databases, their human resource databases, their financial databases.
These are things that are not necessarily going to be pushed hardcore into the cloud, but their channel deliveries are going cloud-native, the way they reach customers is going cloud-native. And one of the things that we bring to the table is connecting those brownfield environments to these cloud-native environments without having to incur the costs of migrating all those brownfield environments into some new cloud-native environment that may or may not have the necessary security, may or may not be able to have the necessary segmentation. So, being that bridge.
Yeah, I know. There's a couple of people in the industry, one is jumping to mind in particular, who says that "legacy application" is just another way to say "application that turns a profit."
Exactly. It really is true.
On Measuring ROI and TCO
I was actually curious to talk a little, or ask about the return on investment question. What is that conversation like for both private sector and public sector? And then, how would they measure, that we’re getting a return on our investment or not? What is the return that they’re looking for?
It's definitely multifaceted, and it's going to create a tremendous amount of data points that somebody is going to have to array, that are of particular importance to their business. So for example, you certainly can go in and look at your Amazon costs for your traditional EC2 layer, how much compute you're using, how much throughput you're using, potentially looking at the cost of moving data around in Cloud One. But to measure ROI, and even TCO, because measuring TCO is hard, not just ROI, you as an IT professional in any case, regardless of private or public, are looking at probably 25 different tools, multiple reports with hundreds of data points on them: "How much is my compute on this cloud? How much is my compute on that cloud? How much throughput am I pushing through these clouds? When I connect these two clouds together, I'm going to get more charges for the ingress and egress, then going into another cloud."
There's all of these points that you're looking at. And when you start to add things like Kubernetes and cloud-native software, that gets even harder to measure, because now you're trying to figure out… It's not as simple as looking at an EC2 instance and how much that EC2 instance costs. It's, "How much does a particular pod cost on top of that EC2 infrastructure? How much does it cost me to run multiple containers within that pod? How much is this particular application costing me?" And you've got to traverse multiple layers of networking in and out of this environment, to that environment.
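As a rough illustration of the pod-level attribution problem Chris describes here, one common approach is to split a node's hourly price across its pods in proportion to their CPU requests. The sketch below is purely illustrative; the function name, prices, and sizes are all invented for this example and are not drawn from Grey Matter itself.

```python
# Hypothetical sketch: attribute a share of an instance's hourly price to
# one pod, proportional to the pod's CPU request. Real cost tooling also
# weighs memory, storage, and network, which this ignores for simplicity.

def pod_hourly_cost(node_hourly_price, node_cpus, pod_cpu_request):
    """Attribute a fraction of the node's price to one pod by CPU request."""
    return node_hourly_price * (pod_cpu_request / node_cpus)

# Example: a $0.40/hr 8-vCPU instance running a pod that requests 2 vCPUs.
cost = pod_hourly_cost(0.40, 8, 2.0)
print(f"pod cost: ${cost:.2f}/hr")  # pod cost: $0.10/hr
```

Summing this figure per application, across every cloud and cluster, is one way to see why the "how much does this application cost me" question gets hard quickly.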
It's a very difficult thing to measure return on investment. And one of the tools we bring to the table is business criticality: tell us which of the services you're running, regardless of the cloud they're running in, are most important to your business.
And then we’ll look at it and we’ll monitor, and we’ll alert you when things tip above those thresholds, especially for those really urgent business-critical systems. We’ll measure your infrastructure layers from those base cloud infrastructure layers all the way up through your Kubernetes layers into the code. We’ll tell you things that could potentially take something down that is of business-critical importance.
One use case is, we'll alert you on a layer-three certificate that you may be using inside your infrastructure for a particular database or an index engine, when that thing is about to expire, so that it can be rotated. We can also rotate it automatically. But those little types of things could end up costing downtime, and all of a sudden, in terms of return on investment, the more you're down, the more your cost goes up, and the less you're making any kind of return.
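The expiry check described here can be sketched in a few lines. This is not Grey Matter's implementation, just a minimal stdlib illustration: given a certificate's notAfter timestamp (the format OpenSSL and Python's `ssl` module use), flag it once it falls inside an alerting window. The threshold is a hypothetical value.

```python
import ssl
import time

ROTATE_WITHIN_DAYS = 30  # hypothetical alerting threshold

def needs_rotation(not_after, now=None):
    """Return True if a cert with this notAfter field expires within the window."""
    # notAfter looks like "Jan  5 09:34:43 2025 GMT"
    expiry = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return expiry - now < ROTATE_WITHIN_DAYS * 24 * 3600

# An already-expired cert is trivially flagged for rotation:
print(needs_rotation("Jan  5 09:34:43 2020 GMT"))  # True
```

A mesh can run a check like this continuously against every certificate it knows about, which is what turns "a cert silently expired and took down a database" into an alert instead of downtime.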
And when you multiply that by N, because you’ve got your assets sprawled all over these different environments, you can see why people get a little skeptical at higher levels in these enterprises as to, “Well, wait a minute, I’ve got this brownfield system that is actually turning a profit. It’s working. Why am I changing this? And you’re telling me I have to cut this big brownfield system up into 1,000 little points? I’ll never be able to measure or manage it.” And that’s what we’re trying to solve.
On Grey Matter’s Security Model
Excellent. And then of course, also you have security use cases and measuring return on investment for security. I actually don’t even know where you would start with that.
Well yeah, by centralizing your policy, and I don't just mean network policy, but by centralizing your application-layer networking policies, by centralizing your policies around your TCP-type connections, by centralizing your policy on your data itself, you can manage that policy across a multi-cloud infrastructure. And by doing that, you can absolutely do two things: 1) guarantee that the policy is going to be in place across those agnostic environments, and you control those; 2) measure when things fall out of bounds of that policy, and fix it.
You can actually have command and control where you can go in and you’re not going to have to spend hours upon hours trying to figure out, “Where do I change policy A versus policy B?” And again, security policy is wrapped for us, in terms of network performance, network-type policies, things that you normally see: circuit breakers, rate limiters, but also getting into the very detailed rule-type-based policy models of, “this IP address is allowed to have access to that IP address”, or “this function is allowed to run under these circumstances by these users”.
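The "this IP address is allowed to have access to that IP address" rule model Chris mentions can be pictured as a centralized allow-list consulted at every hop. Real meshes express this declaratively (for example, as Open Policy Agent policies); the Python below is only an illustrative sketch, and every name and address in it is invented.

```python
# Hypothetical central policy: which source IPs may reach which destinations.
ALLOWED_ROUTES = {
    ("10.0.1.5", "10.0.2.9"),   # app tier -> database
    ("10.0.1.5", "10.0.3.14"),  # app tier -> index engine
}

def is_allowed(src_ip, dst_ip):
    """Central policy check: may traffic flow from src to dst?"""
    return (src_ip, dst_ip) in ALLOWED_ROUTES

print(is_allowed("10.0.1.5", "10.0.2.9"))   # True
print(is_allowed("10.0.9.9", "10.0.2.9"))   # False
```

The point of centralizing it is exactly what the transcript says: one place to change "policy A versus policy B," and one place to detect when traffic falls out of bounds.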
And then on the flip side of that, we audit all of that at a very granular level in this multi-cloud hybrid environment. So you can literally see everything that's happening across the environment, and you can put that into some repository on the side for all kinds of different use cases.
On the Importance of Existing IT Investments
So, when you talk to somebody and they have a brownfield application, pejoratively, we say it’s a legacy application, it’s doing $1 billion in revenue, why should they change it? Why are they even considering that?
Our message is: don't. Actually, that's one of our big differentiators; we don't walk in there and say, "Change that." Usually that conversation with us is, "I have this system and it's got really critical data, but it is making and turning a profit. However, for me to reach more channels, for me to be fully omni-channel delivery with all of my different customers, I do need to somehow expose elements of this brownfield system.
My OSS/BSS-type layers are holding this critical system that will let me build little apps that are going to run on different phones, or these pieces are going to be an aggregate app that gets delivered via a cloud." And we connect that cloud-native hemisphere with those OSS/BSS layers at a really granular level, with security. So we don't tell them, "Shut that down and modernize it," because it doesn't make sense until they have better insight on exactly how it's going to get used in a cloud environment.
So you’d look at the telemetry, you’d look at the audit calls, you’d look at the general usage of different aspects of those brownfield systems. And then you can put a business strategy together to say, “How would we modernize that old system, or relatively new system that we’ve just invested in, and what parts make sense to be put into a cloud-native infrastructure in the first place as a cloud-native application and what parts don’t? If it’s serving my need, why am I spending dollars on modernizing or migrating that?
Because somebody told me that the database I’m running on a virtual machine needs to be pushed into a container?” That’s not good enough. We do not subscribe to the, “Rewrite your applications from scratch.” We actually subscribe, and we’re all about connecting those applications that are actually moneymakers and have crucial data, and exposing those endpoints to cloud-native systems and bridging those cloud-native systems with those brownfield systems via a mesh network.
It’s almost like, sometimes the best approach to tackling your tech debt might be don’t pay it. That’s what it sounds like.
It might. Having gone through this for five-plus years with global enterprises at this point, we've seen the gamut of, "Yeah, we're going to migrate these capabilities to a cloud-native environment. We can scale them better. They become easier to upgrade or update," but it's only going to be these two subset capabilities in this much bigger system.
We’ve also seen, “We’re going to take this old monolithic application,” which I can’t stand the term monolithic, “But we’re going to take this old monolithic application, and it is heavily being used. Therefore we’re going to incur the cost to ‘modernize’ it and microservice-enable it, but we’re going to do that in a very methodical way where we’re looking at each component and we’re designing out each component and that can be a year to two-year transition, but we don’t necessarily have to rush into that.”
And then we’ve also seen, “We’re going to deprecate this. We’ve been running this, but it’s hooked to a bunch of different things now via something like Grey Matter, and there’s nobody really using it. So what are we incurring the cost for to even run it, whether we modernize it or not? Modernization there is, turn it off, get rid of it, minimize my overhead on it.
I’ve got these other competing capabilities that do relatively the same thing.” So that becomes a, “migrate the data out of that one particular thing into the places that it is needed and useful, and then deprecate that”. So we’ve seen all kinds of things happen at scale, as folks start going down this journey.
On The Real Enterprise Value of Cloud-Native Tech
And ultimately, what do you think is the end goal? Why are these companies or these government entities even interested in making the investment, investment in terms of money, time, human resources, to modernize, to adopt cloud-native for anything?
I really do think that there are a lot of pros. For one, I touched on one earlier: command and control. It is feasible to put command-and-control interfaces in where you can now see what's running on your infrastructure, regardless of the cloud it's running in, regardless of the segmentation you have in place. You have legit, template-based, derivative infrastructure up that you can now manage, which is difficult to do in some of these old brownfield systems, where it's all distributed. It's usually part of the application itself.
And it's very difficult to get reporting out of those things without heavy lifting. Speed to market is another good reason. It is much easier, when you have your full pipelines in place, to do faster delivery. It's well documented, and it's not rocket science, that when you start to put this automated infrastructure-as-a-service, platform-as-a-service, cloud-native-type technology pipeline in place, you're not spending days or weeks tearing down old servers and putting in new servers, and you're not spending days or hours rebuilding a server from scratch with OS patches.
So it does absolutely speed up market delivery, from that perspective. And then it's just a better development experience. I say this, but it's funny, I see all kinds of things. I have a bunch of developers that work for me, and I see all kinds of things on Twitter. I would say that it is an overall more pleasurable development experience than what I had to deal with when I was coding. Dealing with the old DLL nightmares and dependency trails, and dealing with the way you had to build apps, hour-long or days-long compile times depending on how big your projects were in old Visual Studio; the overall developer experience is much better.
The API ecosystem that you have at your fingertips now, that you can bring in different APIs from different cloud native environments is a wonderful thing. The amount of open-source libraries that are at your fingertips, that you can just pull into your project that handle some complexities that you might have in your applications, all of that is really a bunch of pros and why this is attractive.
On Lessons Learned
Now let’s talk a little bit about what you’ve learned since 2015, and how did you get the idea that Grey Matter was something you wanted to pursue? And then, what have you learned over the course of running a business built around it?
Patience. We bootstrapped; this is my first company ever. I've been talking to a lot of different people in the last couple of months in particular. I don't think there's a right or wrong way to do it. We are bootstrapped. We never took money, and we've been cutting our teeth for real, with real enterprise-production customers, some of them actually very large customers.
We don’t just deploy into pockets, we deploy at the enterprise level down, which is a different beast completely. I would certainly say that when we first got into this, we looked at the space, we knew that Docker and things were coming, we had been using them internally ourselves. And we knew that Kubernetes was going to become a thing, but it was going to become a thing for certain workflows, certainly not all workflows.
We knew that clouds were very cool and very real, and certainly saved a lot of hardships on the deployment models and things we just talked about. But at the same time, we've always been a crew that was very attuned to measuring what was put on the infrastructure at multiple layers, from layer-three TCP connections up through layer-seven HTTP connections. Some of us had a cybersecurity background, so we were always looking at the security of ingress and egress routes at really granular levels, both in a defense posture…
Well, mostly in defense posture, but we knew that something like a service mesh was going to become critical. I mean heck, in 2016, 2017, and I argue this now, although it’s getting much better, Kubernetes’ networking is pretty terrible. And then when you take that and you start looking at multiple clouds, networking in itself is really hard.
It's a hard problem set. And a lot of people don't like to do it. However, when you look at half of the things that get deployed, especially now, everything has APIs, everything is bundled with networking-like components. You'll have this proxy with this particular piece of software that you just bought, and you won't be able to go in and change it, because that's an off-the-shelf application that you just put in your infrastructure. Then you couple that with the amount of open source that has matured. The Cloud Native Computing Foundation, last I looked, had something like 876 total projects.
If I'm an enterprise CIO and I've got my staff on a global enterprise, trying to maintain my business, which is not an IT business by the way, and my staff is coming and telling me we have to pay for even a hundred different tools, which probably all have good features, but they're a hundred different tools, that's overwhelming to me as a CIO or a CTO.
I still want to modernize, but it's just overwhelming. And we feel strongly that some of these things are just features of a larger platform. We are very bullish on the idea that things have to be agnostic. I've always said that service mesh, the concept of service mesh, should be an agnostic layer, similar to software-defined networking. If I could only use my service mesh as part of Kubernetes, then I'd miss a whole big, huge population, because there are things that just don't make sense to push into a container tomorrow. There may be things that don't make sense to push into a container ever. And this technology really needs to be far-reaching across all my OSS and BSS implementations, and bridge my cloud-native tech. I mean, it really does.
And those are things that we've just learned. We've had to implement all of these types of patterns, and depending on what you read and who you read it from, some of them will be called anti-patterns, and some of them won't be. But the fact of the matter is that an enterprise is a mixed universe, and network topologies always change.
And you have to plan for downtime and failures, and you have to plan for security headaches and potential breaches. And you need to know where assets are running, and potentially why they're running there. And you need to have a better handle on detecting anomalies, and I'm not talking about just security anomalies, any anomalies: a service in region A hasn't been hit in six months and all of a sudden it's getting 100,000 hits.
That’s an anomaly that you’d want to drill into and understand what’s going on there. Could be very positive, it could be the time of day, it could be the time of year, it could be just all of a sudden that the data that you have in that region is something very popular, but you want to understand all of those things as a business, and capitalize when you can, and also minimize any risk, if there’s potential risk that you need to minimize.
What’s Different Now?
And then how has Grey Matter changed since you started in 2015? And as you’ve deployed it in the real life, what feedback have you gotten and how has actual usage played into changes in, either direction, minor or major, or just adjustments to the business strategy?
Two really big ones, and the third one is something I mentioned earlier, but I'll talk about the two that we actually focused on in the last year and a half, two years. Grey Matter was always an overwatch business-insight mesh network that we were putting in place. It always consisted of our own control plane, with a flavor of Envoy proxy as the data plane. Over the years, we've started to support more data planes, because our overwatch capability should manage not just one particular data plane. We love Envoy, but there are other data planes that we have to pay attention to.
We've really focused a lot on that business-insight and visualization layer, that multi-cloud management and catalog layer, but we've also focused on the other end of the spectrum: the DevOps engineers, the developers at scale, who potentially have to deal with horrendous configuration files for all of this cloud-native infrastructure.
And it's funny, what I think of first when somebody says, "I've got a bunch of cloud-native infrastructure running in my enterprise," is: you're going to have a tremendous amount of configuration files at every layer of the stack, everything from YAML configs with hundreds or thousands of lines to JSON configs. And then you add in the special-sauce glue scripts that these DevOps engineers and developers create just to manage this stuff.
It's a hairball, it's a nightmare. And we have really focused, in the latter part of 2020, on simplification. We actually just released a fully template-driven model to deploy a service mesh by answering maybe three or four questions, automatically generating all of that configuration behind the scenes before it gets deployed. That's one of two things that have happened in our tool: we've also augmented and made GitOps a first-class citizen in our tool.
Again, this is partly because of the enterprise setting. Nobody in the enterprise is just going to let you deploy a service to production with one simple button push. It demos really well, and we certainly have demos, but in a real production setting, you’re not going to go from, “I just wrapped up the code and I did some testing on my laptop,” to, “Deploy this into production now.”
So we made GitOps flows first-class citizens, where you've got verification checks and governance models: the enterprise guarantees what the templates are, these templates are being checked out, here's who just checked those templates out, here's who's applying those templates. The verification happens, step one, step two, step three, so that you know, before something gets into production, that somebody didn't just turn off mTLS, as an example, to let things talk over SSL. So we've really focused on that kind of simplicity in the toolchain.
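The mTLS example Chris gives is the kind of guardrail a GitOps verification step can enforce mechanically: before a proposed template is applied, an automated check rejects any change that disables mTLS. The sketch below is hypothetical; the field names and template shape are invented for illustration, not taken from Grey Matter's format.

```python
# Hypothetical GitOps verification step: scan a proposed mesh template and
# report any service whose config would turn mTLS off.

def verify_template(template):
    """Return a list of policy violations found in a proposed mesh template."""
    violations = []
    for svc, cfg in template.get("services", {}).items():
        if not cfg.get("mtls", False):
            violations.append(f"{svc}: mTLS disabled")
    return violations

proposed = {"services": {"catalog": {"mtls": True}, "billing": {"mtls": False}}}
print(verify_template(proposed))  # ['billing: mTLS disabled']
```

In a real pipeline a non-empty violation list would block the merge or deploy, which is the "verification happens, step one, step two, step three" flow described above.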
That's been a major focus for us. And now what we're actually really doing in 2021 is making data a first-class citizen inside the network. You have access at every point of the service mesh itself, underneath the network; you can see all traffic flowing across these different layers and different devices and different types of technology that you have in your system. And you can do all kinds of pretty neat things, including applying content-type policies as content flows from one service to another service.
And I'm not talking one REST API to another REST API; I'm talking content flowing from one piece of software, you're pulling pointers to data off of Kafka, that data happens to be sitting in some S3 bucket or some other backend object store, and that data is going to be munged and sent to three or four different service toolchains or systems, and we're applying content policy every step of the way. That's really what we're focused on right now: bringing that data network together with the actual mesh network.
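One way to picture "applying content policy every step of the way" is a per-hop transform that redacts fields tagged sensitive before a record is forwarded to the next service. This is only an illustrative sketch of the idea; the policy shape, field names, and redaction rule are all hypothetical.

```python
# Hypothetical content policy: which fields may pass through a hop as-is,
# and which must be redacted before the record is forwarded.
CONTENT_POLICY = {"ssn": "redact", "name": "allow", "balance": "allow"}

def apply_policy(record):
    """Apply the content policy to one record before forwarding it."""
    return {k: ("***" if CONTENT_POLICY.get(k) == "redact" else v)
            for k, v in record.items()}

record = {"name": "Ada", "ssn": "123-45-6789", "balance": 42}
print(apply_policy(record))  # {'name': 'Ada', 'ssn': '***', 'balance': 42}
```

Running a check like this at every hop, rather than only at the edge, is what distinguishes the data-mesh idea described here from ordinary API gateway filtering.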
A Day in the Life of a CEO
Excellent. And to ask something a little bit different now, what do you actually do on a day-to-day basis? What’s your actual tasks that you do every day?
So Decipher is a little more than 60 people, and I wear lots of hats. Most recently, I have spent my last two months building up an investment portfolio package, and we are actually looking for investment now. So for me, day-to-day, I'm in a tremendous amount of Zooms, or pick your meeting tool, going through my pitch and having some really good conversations. Those conversations are very active and have all been very good.
We received some phenomenal feedback. I’m excited about what the future brings there for us in particular. So that’s been my day-to-day recently, but we have a business to run. I talk to almost, or try to talk to almost all of our customers as much as possible. So a week doesn’t go by where I don’t have first-hand discussions with customers across the board, I make special time to do that.
I also follow that same mantra with my staff. I am a big proponent of doing one-on-ones with my staff, not just the management tier; I will have my staff set up one-on-ones with me, and we'll go through whatever they want to talk about. In some cases they want to talk to me about the detailed tickets they're working on; in other cases they want to ask, in general, how the company is doing, what the strategy is, and how the market is shaping up in my opinion, and all those good things. And I spend a lot of time talking to folks like you, and analysts, but that usually ebbs and flows depending on the press cycles.
Excellent. Fabulous. Well, I think we’re going to wrap up. And the last question is just how listeners can connect with you or learn more?
Sure. They can certainly go to greymatter.io. We're about to release a brand-new website in the coming weeks, so I hope everybody checks that out, but the current one is pretty good. You can also find me on LinkedIn, Christopher Holmes at Grey Matter IO; certainly look me up. And if you do send me a LinkedIn request, I usually will accept, and I try to answer the messages there. So feel free.
Fabulous. Well, thank you so much for taking the time to talk.
Yeah. Thank you very much, Emily. Enjoyed it.