Newsday: CrowdStrike and Delta Responsibility and Growth by Simplification with David Ting

August 14, 2024: David Ting, Founder and CTO at Tausight joins Bill Russell from This Week Health to discuss recent developments in cybersecurity, particularly focusing on the CrowdStrike incident. This conversation highlights the critical issues surrounding kernel-level coding vulnerabilities and the necessity for thorough testing to prevent significant disruptions. CIOs face the challenge of balancing rapid deployment with the risks associated with untested releases. The discussion also delves into the complexities of healthcare system architectures, the growing interdependence of applications, and the essential role of cyber resilience. As healthcare increasingly relies on technology, what strategies should healthcare systems implement to streamline their IT environments and minimize attack surfaces?

Key Points:

  • Discussion on CrowdStrike Incident
  • Impact and Response Strategies
  • Lessons in Cyber Resiliency
  • Apple’s Approach to Technology Releases
  • Simplifying Healthcare IT Architecture

Video Transcript:

This episode is brought to you by Tausight.

Tausight uses patented AI and advanced machine learning technology to discover both structured and unstructured PHI, going beyond DLP solutions to safeguard patient data from breaches. Their experienced team is dedicated to mitigating the financial losses and reputation challenges faced by healthcare organizations, allowing providers to focus on delivering quality care.

Tausight. Trust the experts. Trust Tausight. Visit thisweekhealth.com/Tausight for more information.

Today on Newsday.

I’ve always heard the expression, there’s no glory in cleanup. And I think that’s the truth.

My name is Bill Russell. I’m a former CIO for a 16-hospital system and creator of This Week Health, where we are dedicated to transforming healthcare, one connection at a time. Newsday discusses the breaking news in healthcare with industry experts.

Now, let’s jump right in.

It’s Newsday today, and we’re joined by David Ting with Tausight. David, how’s it going?

I’m good. How are you? Good. You’re in Florida with all the rain, so I’m hoping you’re doing all right.

I am doing all right. The rain seems to be the least of our challenges these days. We haven’t gotten a chance to talk since the CrowdStrike event.

And I want to talk about that with you a little bit, because it has to have changed the calculus for everybody in terms of how we think about things, right? So CrowdStrike has an agent that sits below everything, right? So it sits below the kernel?

It’s part of the kernel.

It gets loaded as part of the kernel. It’s like the core stuff that you count on. I feel bad for everyone involved. Having built kernels in my past, I know what happens when you make one stupid little error that just shuts down your whole machine and you have to physically restart it. In this case, you have to physically go and touch it, put it in safe boot mode, and delete that file.

I feel bad for the folks at CrowdStrike. I feel bad for all the hospitals that we know were affected. There’s a lot of work.

Does the Tausight agent load at that level or is it higher?

We’re higher than that. So, having done Imprivata as well, we load software onto all the running machines.

I had always had a fear that if we ever wrote code for the kernel, we would end up with a blue screen of death somewhere.

It makes sense for it to operate at that level, right? Given what it’s doing.

Given what it’s doing. If you can, do it at the user level, which is secured.

At least your operating system will function. Putting anything at the kernel, you’ve got to be so rigorous in making sure your testing has covered every issue. And the George Kurtz note about why it occurred is just, oh, somebody dereferenced the pointer that probably was not... they had 21 parameters, one missing parameter, and it broke.

And that’s all it takes. I think they said the release was live for 70 minutes, 60 minutes. It was a small amount of time, but it hit everywhere around the world. Very.

8 million, I thought, was the number. It’s some ginormous number, which I would never want to repeat.

Yeah. So I guess my question to you is this. I’m a sitting CIO. I know I have this thing at the kernel. I’ve trusted it up until this point. There’s a reason it’s there, because it does protect at a certain level. There’s a reason that the releases are done this way too, because the whole idea of CrowdStrike is crowdsourcing.

So we see the attacks, we engineer a solution to those attacks, and then we release them very quickly so that if they’re attacking one place, we can then get there pretty quick. So that’s the theory behind it. We just saw what could potentially go wrong with the practice.

I’m now a sitting CIO. How do I think about these releases, these automated releases of software that is there for a purpose? It’s there for a reason. Do I insist on, hey, you know what, release it to me a half hour after or an hour later? I want to see what you do to the rest of the world before it comes to me.

I was going to say, the whole notion of automatically pushing a critical change like that into an enterprise should be really carefully thought through: how do I at least establish some runtime on it in my test environment before I turn on the switch to say release it to 95 percent of my machines?

I would love to have that air gap testing done on a selected set of IT people, to see if it failed within an hour.
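The staged release described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ring names, the fleet shares, and the soak times are hypothetical, and nothing here reflects how CrowdStrike or any other vendor actually gates updates. The idea is simply that a small IT-owned group runs the update first, and the 95 percent only gets it once the earlier groups have run cleanly for a set time.

```python
from dataclasses import dataclass


@dataclass
class Ring:
    name: str          # hypothetical ring label
    share: float       # fraction of the fleet in this ring
    soak_minutes: int  # minutes the update must run cleanly before moving on


# Hypothetical plan: a small IT-owned canary group first, the bulk of the fleet last.
PLAN = [
    Ring("it-canary", 0.01, 60),
    Ring("non-clinical", 0.04, 120),
    Ring("everyone-else", 0.95, 0),
]


def rings_to_release(plan, healthy_minutes_by_ring):
    """Return the names of the rings that may receive the update right now.

    healthy_minutes_by_ring maps a ring name to the minutes it has run the
    update without a crash, or to None if a crash was seen in that ring.
    A ring is released only after every earlier ring has finished its soak.
    """
    released = []
    for ring in plan:
        released.append(ring.name)
        healthy = healthy_minutes_by_ring.get(ring.name, 0)
        if healthy is None or healthy < ring.soak_minutes:
            break  # hold here: this ring crashed or is still soaking
    return released
```

With no health data yet, only the canary ring is released. A crash in the canary ring (recorded as None) keeps the update away from every other machine, which is the trade-off discussed next: a longer exposure window in exchange for a much smaller blast radius.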

So if I’m a nefarious actor, and some people have called me a nefarious actor in the past, but let’s assume I am a hacker. Isn’t that good news for me? I’m like, all right, so I now have a window. So they’re going to release it into a test environment. They’re going to test it, whatever.

These things are released at midnight, one o’clock in the morning, two o’clock. I don’t have staff. My staff isn’t going to be testing this until the morning. So now I’ve opened up a window of six hours, five hours. Is that an acceptable... all we’re doing is mitigating risk. Is that an acceptable risk?

I think that is. After watching your machines go down, blue screening right and left, and having to physically touch them to restart them, that’s a risk that you’re going to have to calculate. Is it worth it? Or is there something I can do? Or do I push back on the vendor to say, I want to see all your test plans for what you did with this new change?

Come on. All my engineers are saying, didn’t their unit test code pick this thing up? Did they not cover this exception case? This is exception handling, I would think.

I have that same thought. It’s like, don’t they release this into a little test bed and see if Windows will boot?

It sounded like it impacted every single Windows system.

Yes. Uniformly. I suspect that the logic was probably okay in the test case. It’s when they actually put it into their production run that they go, gee, we forgot one of those data sets. And then the released code didn’t like the fact that there was one missing piece of data in the installed package, and it croaked.

Would this have happened on a Unix machine, or are different operating systems designed differently?

Oh, if you have that error show up in the kernel code, it doesn’t matter what your operating system is. Your operating system is not going to like it if you didn’t defensively code to say, I am not going to take down the operating system, no matter what.

That’s the harder part of writing code. My engineering manager used to tell me, everybody can write code that does logically what it’s supposed to do, but then you have to beef it up to make it resilient. And then you have to beef it up again to make it handle all the environmental exceptions.

So he said, when you get a piece of code to work, you’ve only done one third of the amount of work you need. It’s going to go up by 3x. The harder parts are the two other pieces: make it resilient, and make it handle all the exceptions. And he was right. He told me this decades and decades ago.
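The three layers described here (logic, resilience, environmental exceptions) can be shown in a small Python sketch. It is a hypothetical illustration, not the actual CrowdStrike code: the only detail taken from the conversation is the count of 21 parameters with one missing. The function names, the exception class, and the idea that the last field holds a match pattern are all assumptions made for the example.

```python
EXPECTED_FIELDS = 21  # the count mentioned in the conversation; the rest is hypothetical


class RuleRejected(Exception):
    """Raised when a content update cannot be applied safely."""


def read_field(values, index):
    # Layer 2 (resilience): check bounds instead of trusting the caller.
    if index < 0 or index >= len(values):
        raise RuleRejected(f"field {index} missing: only {len(values)} supplied")
    return values[index]


def apply_rule(values, active_rules):
    """Add a rule only if it is complete; otherwise keep the last good state."""
    try:
        # Layer 1 (logic): this made-up rule uses its last field as the pattern.
        pattern = read_field(values, EXPECTED_FIELDS - 1)
    except RuleRejected:
        # Layer 3 (environment): a bad update is skipped and the host keeps running.
        return False
    active_rules.append(pattern)
    return True
```

Given a 20-field update, apply_rule returns False and leaves the active rules untouched rather than reading past the end of the data. In kernel code the equivalent unchecked read is what takes the whole machine down, which is why the bounds check and the fallback path are the larger share of the work.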

And as I look back, good code worries about what am I going to do to the environment that I’m running in.

And so some steps were skipped.

Some steps were skipped. That was my immediate thought when I heard about it.

You know what, David, I was talking to some CIOs and some CISOs about this.

When it first hit, they thought they were ransomed and they immediately went into that mode. Now CISOs have a pretty good network and, when they see it’s not only my health system, but three, five, six, whatever, they immediately said no, this is something else.

And so they figured that out pretty quickly. I’m glad that we’re starting to generate those kinds of networks. Let’s talk about response. So I talked to one CIO. They didn’t miss a single appointment. They didn’t miss a single surgery. It was tedious, but it was manageable.

Like, they had the staff to get around and hit all those machines, or at least the critical machines, so that they were able to keep functioning. But then we read Delta is suing CrowdStrike. They’re going after them. And my flight got canceled 48 hours after, no, not even 48, it was longer than that after the CrowdStrike incident.

And it’s a case where they had created this environment that was pretty centrally managed, push everything out to the edge. And it was a very distributed environment. And now it’s like, all right, we’ve got to get to all of these locations and actually physically touch these machines, and they couldn’t do it.

They just...

Look at the number of machines where you actually have to go behind it, power it up in safe mode, go and delete that file, restart the machine, and do that one after another. Even if it took five minutes, just getting physically to those machines in an airport...

Oh, I don’t think you’ll be called as an expert witness on this, but do you think that Delta has a case here?

Hey, we lost X amount of dollars, we’re coming after CrowdStrike. I’m sure the CrowdStrike agreement sort of states, hey, this is a potential. And then the other thing is, if it does go to court, doesn’t CrowdStrike just line up organization after organization that had decent recovery plans in place?

I thought somebody said another airline got back up much faster than Delta did, so it must be their own procedures for coming back online. So typically you would expect that your machines, if they go into a bad state, at least would come back in a mode where you can remotely manage them.

For you to have to physically go and touch them is incredibly complicated. In a distributed environment like Delta Airlines, with machines all over the world, I can’t imagine the scale of that work. Unbelievable. When I heard about it, I thought, this is incredibly bad.

Look, we went Change Healthcare, Ascension, CrowdStrike in, was that the span of 30 days? It couldn’t have been more than 45.

Within a month and a half, two months at least. I’m losing track because they just keep coming.

They do just keep coming. What’s the lesson, what’s the lesson to be learned?

There’s a resiliency and a recovery focus that I think is now becoming just paramount. We used to always say it’s not if but when. And, okay, great. By the way, I hate that phrase, because it’s almost like you’re throwing up your hands, but...

Okay. The phrase we’d like to talk about is cyber resiliency, but cyber resiliency has always been thought of as, gee, I am going to be attacked and compromised.

I don’t have the same kind of processes in place for something that a vendor does. That tends to be the I-hope-it-never-happens plan versus, gee, I’m under attack and I can quarantine, segment, isolate, restart. Here it is across the board, and this is by a trusted vendor, by a most trusted vendor.

I’ve been in the room now with CrowdStrike and healthcare partners on several occasions since this transpired. And it’s interesting. The response is right. It’s humble. It’s contrite. It’s been very transparent.

Yeah, we’ve really messed up here.

I think their CTO said, we gained trust in drops and we just lost it in buckets. Yeah. And he goes, and I realize we have a long way to go to get this back. I actually liked their approach. It was transparent. They communicated very well, I felt, through the process. Again, just from a change management standpoint,

knowing how much pain we went through for change management within the health system. Like, we had everybody on the call. We had the storage people on the call, the networking people, the what, and we went through what’s going to happen. We’re going to do this release. And we did those calls weekly for any changes we were doing.

And they were rigorous calls and that kind of stuff. I have to believe those processes existed.

I’m sure they did or do, but I’m sure they’re being beefed up, because you read the latest RCA, the root cause analysis that they’ve put together. And my engineers and I were just saying, come on, they must have had a process, or somebody skipped a process, or the testing was too optimistic.

It’s, oh, it did what it’s supposed to do, but you didn’t do the final testing, or this would have never happened.

I’m going to pivot, because I’m going to transition to talking about Apple here for a minute. I remember a couple of debacles. There was AntennaGate. I don’t know if you remember AntennaGate, but on the exterior of the iPhone, if you touched two different things, you interfered with the antenna and the signal.

That was one of the things that Apple released. That was a mess. The other was, if you remember, Apple Maps, their map app. When they first released it, there were all sorts of stories from people. It was just poorly released.

And I remember the guy who was in charge and responsible for that did get fired. They moved him out of Apple. And I think that’s interesting to look at in this context, and then look at how they delayed the release of their Apple Intelligence.

So they announced it. It’s in the development environment. You can get access to it. Coders are working with it, but as they’re playing with it, for whatever reason, they’re pushing it back three months and whatnot. It’s interesting when larger companies start to take an approach that, you know what, it’s more important to get it right than to move fast.

And I think Apple is one of the companies that epitomizes that now. Because for, oh my gosh, at least a year, people were saying Apple’s falling behind, Apple’s falling behind on generative AI, OpenAI, they’re just falling behind. And they took their time, they put it together.

And now they’re slowing down the release to, I think, make sure they get it right. Is there too much of a push on these organizations to get it out in the market?

I think the whole model for agile development is: let’s build it, test it, get it to work, put it into the marketplace, and see what happens to it.

This is a form of testing: get it into the market.

Yes, it has been for a lot of startups. It’s a, hey, I could deal with 95 percent capability. It speaks to how we are now so dependent on technology. We’re so dependent on these software systems. We believe we have criteria that adequately test the software that we put out, but the combinations, the input variables, are just so huge these days.

There are so many things that can disrupt it. And so the complexity of software systems is getting to the point where it’s harder and harder to test to make it more reliable. We have so many layers in our stack that we don’t know about, and foundationally, has something changed at the very bottom of the stack, at the kernel level?

Hi, it’s Sarah Richardson here, president of the 229 Executive Development Community for This Week Health. I’m excited to share details about the upcoming SOAR conference, happening September 18th through the 20th, 2024 in Midtown Atlanta. SOAR is your opportunity to elevate your career in healthcare IT.

Join us for dynamic sessions, interactive workshops, and keynotes from trailblazing women in the industry. This event offers actionable strategies and fosters genuine connections. Whether you’re a health system employee or a vendor partner, SOAR provides unique networking and growth opportunities.

Sponsorship packages are available, offering visibility and engagement with industry leaders. Don’t miss out. Register today at bluebirdleaders.org/2024SOARAtlanta. Let’s empower the next generation of women leaders in healthcare technology together. See you in Atlanta.

Talk to me about architecture then. So I’m, again, a sitting CIO, and architecture is not something we talk about all that often, although it’s coming up more and more. And so you’re saying the individual systems are getting more and more complex. And the interplay between those systems is very complex.

The interoperability of data is very complex. We have APIs and all sorts of other things. If you were sitting in front of, I don’t know, 50 CTOs of health systems, what’s the message?

My first message is simplify. That’s way too complex. Your attack surface is too big. Too many applications, too many...

One of the things we found in looking at the telemetry we collect from different customers and prospects is the huge number of software products that they use. Huge. You go, who knows what goes into the system? Who actually keeps track of, gee, the number of programs, the number of version differences between them, the patching level differences?

How do you even start to get a stable base to say, hey, let’s slow down the changes? Let’s figure out, do we really need all these programs? Because they’re just being introduced and added at a rate that I think is going to swamp IT’s ability to manage them.

And if they have side effects, you’re going to destabilize your entire environment. And I think if this were a physical manufacturing facility or a precision manufacturing facility, we’d be in big trouble. There are just too many variables.
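Counting programs and the version differences between them is easy to sketch once install records exist. The Python below is a minimal illustration, assuming telemetry arrives as (host, product, version) rows. The record shape, the function name, and the sample rows are all invented for the example and do not describe Tausight’s telemetry or any real product.

```python
from collections import defaultdict


def version_sprawl(installs):
    """Summarize installed software from (host, product, version) records.

    Returns {product: (host_count, sorted list of distinct versions)}.
    """
    hosts = defaultdict(set)
    versions = defaultdict(set)
    for host, product, version in installs:
        hosts[product].add(host)
        versions[product].add(version)
    return {p: (len(hosts[p]), sorted(versions[p])) for p in hosts}


# Hypothetical telemetry rows, invented for illustration only.
sample = [
    ("pc-01", "pdf-reader", "11.0"),
    ("pc-02", "pdf-reader", "9.3"),
    ("pc-03", "pdf-reader", "11.0"),
    ("pc-01", "label-printer-tool", "2.1"),
]
report = version_sprawl(sample)
```

A product that shows up with many distinct versions, or on a single host, is a natural first candidate for the "do we really need all these programs" conversation.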

If you can’t count them and you can’t give me an accurate inventory, you need to get rid of some. It was the same thing I would say to my family. Hey, we have too many of this. Like, we’ve lost track. We don’t use them. If you can’t keep track of them, you can’t own them. And I still remember when I came into St. Joe’s and I asked for, again, this is 2012, an inventory, just a physical inventory of machines. And I wasn’t talking biomed and stuff. That would have been insane. So I just asked for, give me the PCs, and they gave me it and they said plus or minus 10 percent.

Like, plus or minus 10 percent? What do you mean, plus or minus 10 percent? I didn’t even understand the concept of plus or minus 10 percent in a health system that has to be secure. And they said, you have to realize some of these machines haven’t been turned on in months, and they just gave me all these different things.

And I’m like, I’ll accept... let’s try to get to plus or minus 1 percent.

One percent would be pretty good.
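The "plus or minus" figure can be made concrete by comparing the asset register with what is actually seen on the network. The Python below is a rough sketch under stated assumptions: the device identifiers are hypothetical, and the way the gap is defined (devices found on only one side, as a share of everything known or seen) is one reasonable choice, not a standard metric.

```python
def inventory_gap(recorded, observed):
    """Compare the asset register with what was actually seen on the network.

    recorded and observed are sets of device identifiers (hypothetical here).
    Returns (unknown_devices, missing_devices, gap_percent), where gap_percent
    is the share of all known-or-seen devices that appear on only one side.
    """
    unknown = observed - recorded  # on the network, not in the register
    missing = recorded - observed  # in the register, never seen
    total = len(recorded | observed)
    gap_percent = 100.0 * (len(unknown) + len(missing)) / total if total else 0.0
    return unknown, missing, gap_percent
```

The two lists matter more than the percentage. Unknown devices are unmanaged machines that need an owner, and missing devices may be the PCs that have not been turned on in months and so have not been patched either.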

Look, if it has an IP address, we should know what the heck it is.

We should know what it is. I completely agree. It should be, but there’s no glory in doing that.

There’s no glory. It used to be there’s no glory in it, but now that we’ve had all these attacks and all these things, maybe there is some glory in it. Maybe we should be elevating these systems.

Exactly. I’ve always heard the expression, there’s no glory in cleanup. And I think that’s the truth.

Typically, I see hospitals with 2,000 different programs running. You look at them, you look by vendors, by their date of install, and you go, who’s using these things? Somebody’s using them, but who’s managing them?

Ah, yeah, this is gonna sound self-serving, but it takes courage.

When I came into St. Joe’s, we had this cycle going. They would audit the IT department, they would tell them all the things they’re doing wrong, the team would try to respond to it, then they’d do another audit and tell them again. And so this thing was just going on, and I’m like, all right, I’m gonna call timeout, and I went to the executive team.

I’m like, I’m gonna call timeout. I don’t want another audit for at least 12 months, until we can put something in place that they should audit. Like, we already know it’s broken. And every time they come in, they make money and our team just spirals, and now they don’t have time to actually put something in place.

You said there’s no glory in cleanup, but we should ask for that. We should ask, Hey, I need some time to clean this up.

I think that is like the tech debt in software development. You have IT debt. IT debt is all the stuff that you know should be done to make your systems more resilient, to make them more manageable, to clean up the clutter that you know will bite you eventually.

It’s the same thing in software development, same thing in hardware development. We all have debt. Then we just keep saying, hey, the business has to keep running. We have to get this release out. We have to support this one feature that a customer has asked for. We just know that Dr. So-and-so uses that machine and it’s really obsolete and we should replace it.

But nobody’s going to go and do it.

Did you see that story on Southwest, that they had like a Windows... oh, what was it? Was it XP? It was ancient. Oh no, it might’ve even been before XP. I don’t know what it was, but they’re like, hey, it saved our butts because it’s still running.

It’s still running. Oh my gosh. You know what this reminds me of? Have you ever done plumbing?

Yes, I have. There are bad analogies here. I dislike plumbing. If you make a mistake in plumbing, it will find you out.

I was going to say, as an electrical engineer, I can do anything with wires and electricity. I hate plumbing.

Water is the worst thing to deal with, whether it’s clean or dirty. It’s the worst job.

It’s like that. We need to think about healthcare IT the same way. It will find you out. Like, somebody is trying to find that really old PC that hasn’t been patched or that application that hasn’t been patched or whatever.

That’s what they’re looking for. And it’s just one hole in and now they’re in.

Yes. Your attackers do a very good job of finding all those vulnerabilities. So it’s been interesting. We’ve been looking at a lot of the attack vectors that bring ransomware into different environments, and how relatively easy it is to craft a piece of software that will evade your EDRs and your AVs to dump out admin credentials or to elevate privileges to get access to your files.

Those are things where, again, there’s no glory in it, but if we can stop those... it’s like that dripping leak that turns into a fractured solder joint. It just pops out. You want to stop those if you can.

Last question. What’s the distinction between the health systems they’re not getting into and the health systems they’re getting into.

Because I’m not going to call them out by name, but there’s health systems that, I would be shocked if I heard that they were breached. Not that it’s not possible, but it’s just, there’s certain health systems that they’re not getting into. What’s the difference?

I think it’s a combination of, gee, it’s just more difficult, it’s harder to get a foothold. Once you get a foothold into a system, then you can launch that reconnaissance, you can evade, you can look for different entryways to escalate your attack. It’s getting that initial foothold.

Whether it’s training of your staff or whether you have better defenses, or you just have more restrictive policies. The more restrictive you are, the better off you’re going to be.

David, always fun to catch up with you.

In the middle of summer, we’re still looking at, gee, all the things that we just talked about.

And can we ever get a break?

I don’t think so. Somebody once said to me, there’s a global cyber war going on at all times, like the kind of thing that if it were happening in the physical world, we would say we’re in the middle of World War III, but in the cyber world, we’re not as in touch and as in tune with it.

But the attacks go on every day. And people put up their best defenses or whatever. I think it’s going to be perpetual, and that’s the world we live in, and we’ve got to adjust.

Exactly. We’re so dependent on that technology and it’s so pervasive.

That, and it’s getting way more sophisticated, much more complex. Now we have AI models and generative AI models that need to be secured and tested, and all this dependency now on the data and the data analysis to create more data is going to create yet another layer of cyber concerns.

Yep. Absolutely.

David, thanks for your time.

You’re welcome. Great talking to you.

Thanks for listening to Newsday. There’s a lot happening in our industry, and while Newsday covers interesting stuff, another way to stay informed is by subscribing to our daily insights email, which delivers expertly curated health IT news straight to your inbox. Sign up at thisweekhealth.com/news.

Thanks for listening. That’s all for now.