Suppose you are called into a meeting and asked to help with some quick planning estimates for a new IT system that is being proposed in response to an important and urgent White House initiative.
The system will be used to collect and analyze massive amounts of real-time sensor data.
Your boss says, “We need to be able to manage up to one million transactions per second. We expect transactions to be a 50-50 mix of reads and writes, where each record consists of a unique 23-byte key and 120 bytes of data.”
“The system must operate with high availability – essentially fail-safe, non-stop.”
“The system must guarantee that each transaction is processed reliably; that ACID properties (atomicity, consistency, isolation, and durability) are met.”
“So what’s your ballpark estimate for the cost of the system software and hardware that we need to meet these performance requirements?”
Without skipping a beat, you look her in the eye and say “about $25K capital cost if we host it ourselves, or about $4.25 per hour to host it on a public cloud.”
The room gets real quiet.
The boss gives you a hard look. This is serious business. “Explain,” she says.
You explain. You tell her that this kind of requirement was faced, and met, years ago by Internet companies that needed to manage astonishing volumes of transactions: Facebook and Twitter updates, Internet auctions, cookie management, and advertising click-through tracking. Many of the solutions developed to meet these high transaction rates were released as open-source software.
These products are often referred to as “NoSQL” databases because they are not based on a general-purpose relational database model, but rather are highly optimized for key-value data management, effectively single tables.
In fact, your estimate is based on a product from a young Silicon Valley company, Aerospike, Inc. Its products (Aerospike Database software and Aerospike Management Console) are available as open source, as well as in a fully supported and enhanced Enterprise Edition with license fees based on the total volume of unique data managed.
Aerospike recently had its database benchmarked, and it achieved one million transactions per second (tps) on a single $5,000 server.
The company ran the same benchmark on an Amazon cloud server and achieved the same one million tps on a server that cost $1.68 per hour as an on-demand instance. (For a reserved instance, the cost falls to roughly $1.02 to $0.67 per hour for one- to three-year terms.)
So you actually provided a conservative estimate.
The $25K would give you a five-node cluster, capable not only of handling several times the required one million tps, but also of providing extremely high availability, since you could lose four of the five clustered servers and remain operational. Similarly, the $4.25 per hour for the cloud assumes five clustered servers, each slightly smaller than the single cloud server that was benchmarked, but together still enough to provide more than double the required throughput, and resilient enough to meet the one million tps requirement even if three of the five cloud servers fail.
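The arithmetic behind those figures is easy to check. Below is a minimal back-of-the-envelope sketch in Python that uses only the numbers quoted above. The variable and function names are illustrative, and the 500,000 tps per cloud node is an assumed lower bound inferred from the claim that two surviving cloud servers still meet the requirement; it is not a benchmarked figure. The bandwidth number counts key and data bytes only and ignores protocol and replication overhead.

```python
# Back-of-the-envelope check of the estimate. All inputs are the figures
# quoted in the story; names are illustrative only.
REQUIRED_TPS = 1_000_000
KEY_BYTES = 23
DATA_BYTES = 120
SERVER_COST_USD = 5_000            # benchmarked on-premises server
NODES = 5                          # cluster size in both scenarios
CLOUD_CLUSTER_USD_PER_HOUR = 4.25  # five cloud servers combined
HOURS_PER_YEAR = 24 * 365


def nodes_that_can_fail(nodes, per_node_tps, required_tps):
    """Number of nodes that can be lost while the survivors still
    deliver required_tps, assuming throughput scales linearly."""
    needed = -(-required_tps // per_node_tps)  # ceiling division
    return max(nodes - needed, 0)


capital_cost = NODES * SERVER_COST_USD
cloud_per_node_hour = CLOUD_CLUSTER_USD_PER_HOUR / NODES
cloud_per_year = CLOUD_CLUSTER_USD_PER_HOUR * HOURS_PER_YEAR

# Raw payload rate: key plus data bytes only, no protocol overhead.
payload_mb_per_sec = REQUIRED_TPS * (KEY_BYTES + DATA_BYTES) / 1_000_000

# On-premises: each node was benchmarked at one million tps.
on_prem_spare = nodes_that_can_fail(NODES, 1_000_000, REQUIRED_TPS)

# Cloud: ASSUMPTION, each smaller node sustains at least 500,000 tps.
cloud_spare = nodes_that_can_fail(NODES, 500_000, REQUIRED_TPS)

print(f"Capital cost:          ${capital_cost:,}")
print(f"Cloud cost per node:   ${cloud_per_node_hour:.2f}/hour")
print(f"Cloud cost per year:   ${cloud_per_year:,.0f}")
print(f"Payload rate:          {payload_mb_per_sec:.0f} MB/s")
print(f"On-prem nodes to lose: {on_prem_spare}")
print(f"Cloud nodes to lose:   {cloud_spare}")
```

Running the cloud cluster around the clock for a year comes to more than the one-time hardware cost, so the hosting choice depends on how long the system is expected to run, a point worth raising before the meeting ends.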
Your boss nods thoughtfully. “Excellent,” she says.
Jim Tyson – Word to the Wise
Jim Tyson (www.linkedin.com/in/jimtyson1/) is an IT senior executive with over 30 years of experience. He has a passion for human nature and information technology, working to understand the relationship between the two to create productive environments.
Please share your comments and thoughts below and tweet them to @JimT_SMDI.