Avoiding Catastrophic Downtime Events at Data Centers
What happens when a critical data center server shuts down due to power loss? Like a tree falling in the forest, does it make a sound? How do data center operators know what happened?
Thanks to advanced control and monitoring tools, data center teams can identify issues through alerts and notifications before a problem triggers a catastrophic chain of events. These tools also provide historical context in case of an unintended shutdown, collecting essential data that allow teams to prepare for the future.
“When events, alarms, and faults happen, you need to trace the entire history of why it happened and what transpired throughout the operation of the data center,” says Jay David, Senior Manager, Solutions Marketing at Stratus Technologies. “If you lose that, you lose the ability to investigate things that are important to your operations.”
In data centers, traceability means having access to essential data. Without it, operating the facility is exceedingly difficult, managers can lose their ability to make real-time business decisions, and organizations can lose their competitiveness.
This was the case for a data center customer in Denver, Colorado. It lacked redundancy and failover, and its power monitoring performed poorly. Conditions were so poor that when a transfer test was performed on a circuit, the server plugged into it failed, and the customer lost visibility into everything.
The customer also faced an uphill battle with disparate systems, where different automation systems supervised the control and monitoring of the data center's air-conditioning, switchgear, power quality meters, battery monitoring, PDUs, AB circuits, CRAC units, fire alarm system, and lighting for the whole building.
It is common for data centers to have systems from different vendors. The problem is: How do you integrate them?
With a hodgepodge of different servers and machines, the data center customer had to log in to each system separately. This configuration, set up by the previous vendor, was not what the customer wanted, according to Rolla Starkey, Branch Manager at Dynamic Controls Inc. (DCI), an EcoStruxure systems integrator.
In a precarious situation and at risk of unplanned downtime, the data center customer decided to restart from scratch.
“It was so important to them that they were willing to invest in ripping out a system that was only two years old,” explains Starkey.
The data center customer also brought in new talent, including an engineer and a trusted local general contractor, to replace their existing vendor.
Ultimately, the customer felt it was necessary to build out a new specification to do a near-complete overhaul of the entire data center control system.
When DCI heard about the public project bid, it specified Schneider Electric solutions running on a Stratus ftServer.
The server safeguards the data center customer from losing visibility by protecting the EcoStruxure software from unplanned downtime.
DCI integrated EcoStruxure and all the other Schneider Electric software on the Stratus ftServer. This allowed the data center customer to consolidate all its automation and control software assets, which normally would be installed on multiple computers, into a single yet redundant virtualized edge platform.
“The beauty of what is happening is everything is coming into EcoStruxure. We are bringing all that into one platform, one place for them to go and only one platform they need to worry about,” says Starkey.
The solution provides workload consolidation and operational resilience to the 12,000-square-foot data center, protecting it from unplanned downtime with 99.999% availability, while integrating all its disparate islands of automation. Because it requires less hardware, panel space, and wiring, it also lowered costs and left more capacity for future growth.
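To put the 99.999% figure in perspective, the short sketch below converts an availability percentage into the downtime it allows per year. It is an illustrative calculation only, not part of the EcoStruxure or Stratus software: the function name and the 365.25-day year are assumptions made for this example.

```python
# Illustrative arithmetic only; names and the 365.25-day year are assumptions.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for level in (99.9, 99.99, 99.999):
        print(f"{level}% availability allows about "
              f"{annual_downtime_minutes(level):.2f} minutes of downtime per year")
```

Under these assumptions, "five nines" works out to a little over five minutes of downtime per year, compared with almost nine hours at 99.9%.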
David also explains that the thin client built into the EcoStruxure solution extends the data center capabilities beyond the control room or panel because it allows operators to use remote mobile devices like cell phones, tablets, and laptops for monitoring and control.
“With this solution, it doesn't matter where they are, they can be in the control room or at home, they'll be able to see the information from their mobile devices to address faults, problems, alarms, events, and anything that happens in that data center,” David says. “Whether it's failure or just an alarm or an event, they'll be able to go to address it immediately.”
“EcoStruxure with Stratus were the big winners because it could handle all of the data center operations, integrating all siloed operations on a single platform, and in the process, consolidating all the disparate data and islands of automation,” says Starkey.
The Value of a Trusted Partner
When DCI was preparing its specification, it became clear that the data center customer needed high-availability redundancy. Stratus was recommended to DCI, a company with a deep-seated relationship with Schneider Electric.
Stratus also has a very mature relationship with Schneider Electric. Together, they co-developed the EcoStruxure Micro Data Center solution.
“Think about it as a big data center, made into a compact version, equipped with Stratus’ continuous availability and Schneider Electric’s uninterrupted power in an enclosed rack that you can install in a plant floor or a facility to run your HMI-SCADA, together with your other advanced solutions like historian, MES, batch, asset performance, and AI,” says David, who is an active contributor on Schneider Electric Exchange, where he is often involved in conversations in forums about various data center topics.
He explains that Stratus Technologies’ solutions integrated with EcoStruxure provide customers with peace of mind. “Once these servers run, they almost always run forever, until they decide to replace it,” says David.
Even if a component fails, the customer can still operate because the system is redundant, explains David.
If a hard disk, CPU, memory module, or any other component in one of the redundant servers fails, the issue is flagged automatically: Stratus receives an email, and the customer or a service provider such as DCI is also notified by email that a component failed.
“We automatically send a replacement part to the customer,” he says. “In a lot of cases, the customer only finds out that something bad happens when they receive the spare part through the mail. That is really a testament to the reliability of our solutions, as well as our partnerships.”
In addition to solutions provided by Schneider Electric partners DCI and Stratus, a community for data center operators on Schneider Electric Exchange can help people find answers to address the significant challenges they face and collaborate on building innovative, reliable data center solutions.
So, are you ready to collaborate to help solve some of the most pressing challenges in the data center sector? Register on Exchange today.