Perhaps your agency has an existing data center, and you are getting inquiries about doing some AI research, so you contemplate using the existing infrastructure. That would be good, right? Well, not quite. As you will quickly discover, there are better systems than your CPU-based infrastructure that are specifically designed for AI/machine learning research. Unfortunately, those systems, aside from being expensive, will require you to consider several other things.

To start the discussion, AI/ML research needs parallel processing capabilities — graphics processing units (GPUs), not the more familiar central processing units, or CPUs. High-end CPUs can be used for small AI/ML tasks such as cleaning or preparing data, but they’re designed for sequential work, whereas a GPU is designed for parallel work, that is, performing many calculations at the same time. As a result, a CPU could take 100 hours to complete a task that a GPU could conceivably finish in 5.
Depending on the configuration, the number of systems under consideration, and the state of your data center, creating your own on-premises AI/ML data center can range from expensive to outrageously expensive. Your chief financial officer may question your sanity.
So, for AI research, GPUs are the way to go. We already indicated they are expensive. What else do you need to know? Well, they run hot, need lots of power, and are heavy! Seriously. A heavy-duty AI/ML rack can currently be configured to hold four of these systems, and such a rack can weigh one to two tons. That is per rack. Unless your raised floor is exceptionally strong, you may have another item to consider — reinforcing your data center structure.
Well, now we know they are expensive and heavy. If your budget (and floor) can handle the load, then it is time to check your data center’s power supply. A typical CPU-based computer cabinet may draw 10kW to perhaps 25kW, depending on the equipment. The systems used for AI/ML need far more: a fully configured high-density AI/ML rack can consume 120kW to 135kW of power. A huge difference! And since, as we mentioned, they run hot, you must determine how to keep them cool. Most of them need more than traditional air cooling. Liquid cooling — whether direct-to-chip, rear-door heat exchangers, immersion cooling, or another technology — currently appears to be the best solution. But none of these is easy to install in an existing structure.
Meeting these requirements starts with a fair assessment of your data center. Validate your current power usage and the effort (cost) to increase it. How much depends on the number and type (brand) of systems. As an example, 50 fully populated AI racks drawing 125kW each works out to 6.25MW of power. But don’t forget the infrastructure overhead. Factor in an additional 10% to 30%, resulting in 8.125MW at the high end. Depending on the systems, additional separation space may be needed in the data center as well. There are many other factors to consider (e.g., networking) that we cannot delve into here. Hopefully, though, this provides a starting point for discussing whether your data center can serve your AI/ML research needs.
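The arithmetic above is simple enough to sketch in a few lines. This is a rough sizing aid, assuming the figures used in this article (50 racks at 125kW each, with 10% to 30% infrastructure overhead); the function name and structure are illustrative, not a standard sizing model.

```python
# Rough power-sizing sketch using the figures from the article.
# The rack count, per-rack draw, and overhead range are the article's numbers;
# the function name and structure here are illustrative, not a standard model.

def required_power_mw(racks, kw_per_rack, overhead_fraction):
    """Total facility power in megawatts, including infrastructure overhead."""
    it_load_kw = racks * kw_per_rack                 # raw IT load in kW
    total_kw = it_load_kw * (1 + overhead_fraction)  # add overhead (cooling, etc.)
    return total_kw / 1000                           # convert kW to MW

it_only = required_power_mw(50, 125, 0.0)   # 6.25 MW of IT load alone
low_end = required_power_mw(50, 125, 0.1)   # ~6.9 MW with 10% overhead
high_end = required_power_mw(50, 125, 0.3)  # ~8.1 MW with 30% overhead
print(f"{it_only:.2f} MW IT load, {low_end:.3f} to {high_end:.3f} MW total")
```

Plugging in your own rack counts and vendor power specifications gives a quick first-pass estimate to bring to your facilities team.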
Dan Kempton is the Sr. IT Advisor at North Carolina Department of Information Technology. An accomplished IT executive with over 35 years of experience, Dan has worked nearly equally in the private sector, including startups and mid-to-large scale companies, and the public sector. His Bachelor’s and Master’s degrees in Computer Science fuel his curiosity about adopting and incorporating technology to reach business goals. His experience spans various technical areas including system architecture and applications. He has served on multiple technology advisory boards, ANSI committees, and he is currently an Adjunct Professor at the Industrial & Systems Engineering school at NC State University. He reports directly to the CIO for North Carolina, providing technical insight and guidance on how emerging technologies could address the state’s challenges.


