Tech failures like the CrowdStrike outage are the new normal. Here’s how leaders can prep for them
PagerDuty CEO Jennifer Tejada says companies can’t stop incidents from happening, but they can reduce the risks.
Hello and welcome to Modern CEO! I’m Stephanie Mehta, CEO and chief content officer of Mansueto Ventures. Each week this newsletter explores inclusive approaches to leadership drawn from conversations with executives and entrepreneurs, and from the pages of Inc. and Fast Company. If you received this newsletter from a friend, you can sign up to get it yourself every Monday morning.
Do you remember where you were when you heard about the CrowdStrike outage last month?
I was on a pre-dawn jog in New York’s Central Park when I looked at my phone and saw that our company’s chief operating officer (COO) had sent the entire company a message about the incident and urged users of Microsoft Windows devices to keep their computers off. I immediately trotted back to my computer—a Mac—to catch up on the news, monitor our IT team’s response, and figure out how disruptive the outage would be.
An update gone awry
Luckily for our organization, the impact was manageable: Some computers needed to be rebooted, and some servers and software were down for a few working hours, slowing down work internally. But others weren’t so lucky. The outage—caused by a faulty software update—resulted in major disruptions to air travel, healthcare, and other industries.
Alas, companies need to gird themselves for more non-cybersecurity tech failures, says Jennifer Tejada, chairperson and CEO of PagerDuty, maker of operations management technology that detects disruptive events and helps companies respond. A recent PagerDuty survey of 500 IT leaders found that nearly two-thirds of companies saw an increase in customer-facing incidents, with those incidents growing an average of 43% in the past 12 months. “You have to anticipate these failures because they are the new normal,” Tejada says.
Modernization’s patchwork progress
Tejada says that while companies have invested in automation and digital transformation, embracing everything from chatbots to moving mission-critical software to the cloud, few have modernized the way they operate. Too often, she says, much of incident response is not automated. That certainly was the case for our company, where the alert and updates came in the form of messages from our COO and later from our IT managers, not via automated notifications.
That lack of modernization can be costly for organizations. PagerDuty’s IT leader survey estimates that for respondents, the human costs of working on non-automated processes (things such as internal and external communications, remediation, and documenting problems) run an average of nearly $800,000 a year.
Ready, set, react
Tejada urges CEOs and other leaders to treat the threat of tech failures the same way they handle other potential crises. “We all run tabletop exercises in our boardrooms and in our management teams for cybersecurity events or physical emergencies like storms,” she says. “You have to run the same practice drills for technology failures.”
I asked Tejada if companies could reduce the number of failures and incidents by improving the way they deploy technology or asking their vendors for more uptime assurances. She was skeptical. “We’re able to deploy more technology at a faster rate than ever, at a lower cost than ever. That complexity is going to proliferate at a rate that far outpaces humans’ ability to manage it,” she says. “Rather than trying to manage and harness it, it’s about how you put smart guardrails and mechanisms in place to reduce your risk of a major failure.”
Do you have a digital disaster plan?
Does your organization have an emergency plan for tech disruptions and incidents? How automated is your response? Send your examples and stories to me at stephaniemehta@mansueto.com. We can share top ideas in a future newsletter.