Food for Thought: Rethinking Maintenance to Decrease Human Error
by Josh Anderson
TORONTO, CANADA – Sometimes the best way to solve a problem is to challenge conventional wisdom, and the data center industry knows that as well as any. CapRate recently spoke with several data center industry insiders about reducing risk in data center operations, and Mark Hurley, Solutions Architect for Data Centers at Schneider Electric, provided some interesting food for thought regarding maintenance and human error.
To begin, Hurley wants you to remember one thing: less is more. “One thing they say is that 70% of outages in data centers are due to human error,” he begins. “So there’s this school of thought – if [we] stop touching them, stop maintaining them, would we have fewer outages?”
He’s not so sure. “I don’t believe in lights out, never touch, never maintain, but I do believe that you’ve got to find the right balance of maintenance,” he continues. “Recommended maintenance on UPS systems for switches that Schneider provides is annual, but I see lots of companies that are doing maintenance twice a year or sometimes even quarterly. That may be overkill.”
However, Hurley does like to give the conventional wisdom the benefit of the doubt. “If you’ve got properly trained people, and you have MOPs [methods of procedure] in place – and I can’t tell you enough, we have executed, vetted, trained MOPs that are just flawless…” he muses. “But a guy who executes those year after year, well, after five or 10 years he could get complacent. He knows it off the top of his head, he doesn’t read the script, he skips a step, and boom. That’s when you get an outage.”
Hurley thinks it’s perhaps more important to focus on the how than the when, though, since he knows as well as anyone that human error leading to data center outages is directly tied to maintenance: not rounds, not readings, but maintenance.
“If outages are primarily due to human error, then why do we have our operating engineers always having to do the switching activity on a Saturday night or Sunday morning?” asks Hurley. “They’re probably not motivated to be there over the weekend, they’re probably not on their A-game. They like to sleep at night and work during the week just like the rest of us. It’s food for thought.”
Hurley admits that not all maintenance can be done during the day, but points out that work such as computer room air conditioning (CRAC) maintenance would be a strong candidate for better-staffed hours. “Why not have some done on Saturday morning?” he asks. “At that time, maybe some of these supply houses are open, if you run into a problem. It’s hard to get someone to respond at 2:00 in the morning.”