A path to modernize the infrastructure of a data center facility
As 2020 begins, many data center operators have likely noticed that components of the infrastructure have aged past their warranties and that software management tools no longer reflect the actual systems. Alongside this, the ongoing operations and maintenance (O&M) program has grown outdated and under-staffed. The decline is gradual, yet the data center can suddenly seem at a higher risk of an interruption. An aging data center should either be brought up to more modern standards or reviewed for what should be relocated, perhaps outsourced to colocation facilities, in the coming year to reduce the risk of disruptions to critical needs. Postponing prolongs the risk of an outage every month while also missing out on the benefits of a more modern facility. Those updates not only reduce the risk of failure for the systems and components but also take advantage of advancements in technology for more efficient, simpler operations in the future. Improvements can also lead to facilities that are better integrated, easier to maintain, and less expensive to operate, which shortens payback and lets facility managers rest easier with a reduced risk of outages.
There are a few steps that can be taken to bring an aging facility into the modern age of 2020. The first step, if it has not already been done, is to define the expectations of performance and set standards of operation to meet them. Studies can then be performed to identify operational gaps, the severity of risks, and lists of needed improvements. The assessments and studies are then used to rate key areas of the facility, from the physical mechanical and electrical systems to the software management tools to the O&M programs. Potential costs and savings, of both downtime and modernizing, add the detail needed to decide which investments are best for the short and long term, which is critical for decision makers when modernizing a facility.
To analyze a data center for modernization, systematic steps are taken to determine what should be examined and how best to review it. Those steps can be targeted at specific systems or components, but when the process is applied methodically across the data center, a more holistic picture emerges that tells whether the data center is aging well or has had a hard life thus far.
Goals & Standards
What are the goals of the data center? Generally it might be to stay available for 100% of the next year, to operate with a lower power usage effectiveness (PUE), or perhaps something else. Beyond this, other specifics should be identified. Are O&M practices standard? Are there performance standards or goals to be met?
Also add what you think of as a modern data center: how does it look and behave? This gives a target for any modernization projects and shows what may be needed to match those expectations. What may have changed since the data center first opened or was last renovated is the density, redundancy, and capacity of the power and cooling systems. Modern trends suggest this growth is not expected to slow down, even if the pace may seem exaggerated in recent years. Picturing a modern data center and its new expectations also gives a good gauge of the staffing and O&M needs for the future.
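To make goals like these measurable, it can help to write them as numbers. The sketch below computes PUE (total facility energy divided by IT equipment energy) and converts an availability target into the downtime it allows per year. The function names and meter readings are hypothetical and used only for illustration.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def allowed_downtime_minutes(availability_pct: float,
                             hours_in_period: float = 8760.0) -> float:
    """Minutes of downtime an availability target permits in the period.

    The default period is a 365-day year (8760 hours).
    """
    return (1.0 - availability_pct / 100.0) * hours_in_period * 60.0


# Hypothetical annual meter readings, for illustration only.
current_pue = pue(total_facility_kwh=1_800_000, it_equipment_kwh=1_000_000)
print(f"Current PUE: {current_pue:.2f}")
print(f"99.99% availability allows "
      f"{allowed_downtime_minutes(99.99):.1f} minutes of downtime per year")
```

A goal of 100% availability allows zero minutes, which is why the supporting standards for redundancy and O&M need to be spelled out alongside it.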
Desire -> Goal -> Plan
Among the goals and standards might be a tier rating to be met, and the details should specify how those requirements are met for the data center, system by system. As those details are written down, they become the plan for how the data center will reach the desired modern state. More specifics can be broken out, such as security, fuel systems, or controls, to be scrutinized for updates with the experts and staff who have a leading role in those disciplines. Then, with input from the needed contributors, managers, and stakeholders (and partners too), decisions can be made to set those standards as the new expectations once the data center has been modernized. This should also cover whether more IT equipment is to be added in existing spaces, what IT equipment might be relocated (or outsourced), and the partners involved.
Plan -> Benchmark
After setting the goals and standards of what the modernized data center is aiming to be, the evaluations begin. This should dive into the details of the systems, components, as well as software tools and O&M practices, and how the complex systems interconnect. Benchmarks of performance may already exist but should be scrutinized to make sure they are accurate. Investigations should dive into the physical infrastructure to verify the age, status, load, capacity, etc. and should go beyond past drawings and reports. The O&M team should be interviewed to capture procedures, training, and any outstanding ‘tribal’ knowledge not captured elsewhere.
Data center infrastructure management (DCIM) software models need to be checked against the actual assets to understand whether they are a reliable depiction of the data center. As a data center is populated over time, gaps in the DCIM model can emerge and lead to incorrect assumptions that need to be corrected. An audit of those assets, readings, connections, and behavior can lead to improvements in the data center as well as in how much the operators trust the DCIM model.
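The comparison between the DCIM model and the physical audit can be sketched as a simple reconciliation. The example below assumes both sources can be exported as lists of records keyed by an asset ID; the `Asset` fields and the sample data are hypothetical and do not reflect any particular DCIM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    asset_id: str
    rack: str
    feed: str  # power feed the asset is connected to, e.g. "A" or "B"


def reconcile(dcim: list[Asset], audited: list[Asset]) -> dict[str, list[str]]:
    """Compare DCIM records with a physical audit, matching on asset_id."""
    dcim_by_id = {a.asset_id: a for a in dcim}
    audit_by_id = {a.asset_id: a for a in audited}
    shared = dcim_by_id.keys() & audit_by_id.keys()
    return {
        # Recorded in the model but not found on the floor.
        "missing_on_floor": sorted(dcim_by_id.keys() - audit_by_id.keys()),
        # Found on the floor but absent from the model.
        "missing_in_dcim": sorted(audit_by_id.keys() - dcim_by_id.keys()),
        # Present in both, but the rack or feed disagrees.
        "mismatched": sorted(i for i in shared if dcim_by_id[i] != audit_by_id[i]),
    }


# Hypothetical sample data, for illustration only.
dcim_records = [
    Asset("srv-001", "R01", "A"),
    Asset("srv-002", "R01", "B"),
    Asset("srv-003", "R02", "A"),
]
audit_records = [
    Asset("srv-001", "R01", "A"),
    Asset("srv-002", "R03", "B"),  # moved to another rack since the model was updated
    Asset("srv-004", "R02", "A"),  # installed but never entered in the model
]

report = reconcile(dcim_records, audit_records)
for category, asset_ids in report.items():
    print(category, asset_ids)
```

Each category maps to a different follow-up: missing records suggest decommissioning or onboarding steps that were skipped, while mismatches point to moves and changes that bypassed the model.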
Benchmark -> Improvements
With the existing data center information captured, the gaps, issues, and areas for improvement should be identified. With further analysis and review, and with the additional perspective of vendors and consulting engineers, the effort and costs to close those gaps and make beneficial improvements can be defined more clearly, along with options that may include incremental investments or changes. The time, investment, and labor needed to reach the new goals start to form a realistic picture of what can be achieved. The data center standards can also be revisited at this point, so that goals that seem out of reach are brought within grasp.
As the improvements and schedules are being planned, equipment warranties, age, and performance history can be added to the evaluation to weigh whether the effort is worthwhile now or can be postponed. Keep in mind that age, maintenance, and performance also gauge the risk of downtime for a data center due to equipment failures. The current loads and capacities can also be a factor, but often these have already been included in the initial goals and standards.
Once the possible actions and their related costs are listed, they too should be evaluated and prioritized. Cost, risk, and simple payback can be the main drivers for the action plan, which is the last step before implementing the upgrades and replacements. The action plan should also be reviewed one more time against the desired goals and standards of the data center to make sure they are still aligned.
Among the things discovered in the audit are gaps: items not covered or owned by any discipline or group, or not captured anywhere in the data center reviews. Those gaps should be addressed and responsibility for each should be assigned, as each may pose a risk to the data center's efficiency and overall reliability. Like the other audit items, these unplanned gaps should be assessed against cost, time, and potential risk to ongoing operations.
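As a rough illustration of prioritizing by risk and simple payback, the sketch below ranks candidate actions. Simple payback is taken as one-time cost divided by estimated annual savings. The `risk_reduction` score (1 to 5), the ordering rule, and all figures are assumptions made for the example, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    cost: float            # one-time cost of the action
    annual_savings: float  # estimated savings per year
    risk_reduction: int    # hypothetical score: 1 (minor) to 5 (major)

    @property
    def simple_payback_years(self) -> float:
        """Years to recover the cost; infinite if there are no savings."""
        if self.annual_savings <= 0:
            return float("inf")
        return self.cost / self.annual_savings


def prioritize(actions: list[Action]) -> list[Action]:
    """Order by largest risk reduction first, then by shortest payback."""
    return sorted(actions, key=lambda a: (-a.risk_reduction, a.simple_payback_years))


# Hypothetical candidate actions, for illustration only.
candidates = [
    Action("Add blanking panels", cost=2_000, annual_savings=4_000, risk_reduction=2),
    Action("Replace aging UPS batteries", cost=60_000, annual_savings=5_000, risk_reduction=5),
    Action("Seal containment gaps", cost=5_000, annual_savings=5_000, risk_reduction=2),
]

for action in prioritize(candidates):
    print(f"{action.name}: payback {action.simple_payback_years:.1f} years, "
          f"risk reduction {action.risk_reduction}")
```

Putting risk ahead of payback is only one possible weighting; a team focused on budget could reverse the sort key or combine the two into a single score.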
Consultants can assist with these evaluations, often with neutral yet forthright reporting on the potential gaps, issues, costs, and risks to the data center. They can provide high-level executive summaries followed by detailed reports tailored to specific goals and standards. This can benefit any data center team that holds strong opinions about aspects that may need attention. Being independent, consultants can provide more accurate reports as well as unbiased evaluations of the risks, costs, and timeline for making improvements.
Assessments -> Basic Enhancements
While working through the benchmarks and assessments, there are likely to be items that can be addressed without much cost or schedule impact. These are the things that can be corrected quickly and should not draw much pushback or shift the dominant paradigm of operations.
Common low-cost, quick improvement items include:
- Performing preventative maintenance on all equipment where it is past due (or its status is unknown)
- Removing unused components or out-of-date equipment
- Rebalancing loads (A-B-C or A-B) and PDUs to ensure redundancy
- Making air flow improvements such as blanking panels and containment seals
- Improving raised floor airflow and utilities stacking
- Updating as-built drawings
- Confirming procedures are up to date and staff have the latest training and a full understanding of emergency operations
- Keeping accurate lists and models of assets (DCIM) and verifying dependencies
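The load rebalancing item in the list above lends itself to a quick check. The sketch below assumes a dual-feed (A-B) arrangement in which either feed must be able to carry the combined load if the other is lost, and it measures three-phase imbalance as the largest deviation from the average phase current. The 80% derating factor, the function names, and the readings are assumptions for illustration; actual limits should come from the facility's own standards and electrical design.

```python
def survives_feed_loss(load_a_kw: float, load_b_kw: float,
                       feed_capacity_kw: float, derate: float = 0.8) -> bool:
    """True if one feed alone can carry both loads within its derated capacity.

    The 0.8 derating factor is an assumed planning margin, not a universal rule.
    """
    return load_a_kw + load_b_kw <= feed_capacity_kw * derate


def phase_imbalance_pct(phase_amps: list[float]) -> float:
    """Largest deviation from the average phase current, as a percentage."""
    average = sum(phase_amps) / len(phase_amps)
    if average == 0:
        return 0.0
    return max(abs(amps - average) for amps in phase_amps) / average * 100.0


# Hypothetical PDU readings, for illustration only.
print(survives_feed_loss(load_a_kw=3.0, load_b_kw=3.5, feed_capacity_kw=10.0))
print(survives_feed_loss(load_a_kw=5.0, load_b_kw=4.0, feed_capacity_kw=10.0))
print(f"{phase_imbalance_pct([12.0, 10.0, 8.0]):.1f}% phase imbalance")
```

Running checks like these per rack or per PDU gives a short list of where loads need to be moved before redundancy can be relied upon.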
Even following this simple list of improvements will help optimize performance and reduce the risk of the data center and its equipment suffering one unpredictable event after another. Operational costs will also fall as performance improves and as planned replacements, which are typically more efficient than their predecessors, are installed. Beyond this, the data center infrastructure should keep improving, not only to meet the original needs from when it was first built but also to meet the new business requirements of the future. Making those updates and changes now will add up to major savings and reduced risks into the next decade.