In the last post we postulated that business value is created by users running applications that provide value-added transformations of data. Users need three things from applications:
- Availability
- Performance
- Integrity
Of course, applications depend on the entire IT infrastructure.
Let’s look at these in more detail.
Availability means that the user can use the application when they need to – that the application, data, and IT infrastructure work together to allow the user to generate business value.
Performance means that the application has an acceptable response time to user requests. “Acceptable response time” depends on the context. For a large simulation job, multiple hours may be entirely acceptable. For many interactive operations, only a response that feels instantaneous is acceptable. For example, when entering data into a form, there should be no perceptible delay in echoing user input or in moving from one field to the next. In general, performance needs to be at a level where the user is productive and not frustrated with the system.
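To make the "depends on the context" point concrete, here is a minimal sketch in Python. The operation names and the time budgets are my own illustrative assumptions, not measurements or recommendations; the only idea taken from the discussion above is that the same elapsed time can be fine for one kind of work and unacceptable for another.

```python
# Hypothetical response-time budgets, in seconds, per kind of operation.
# These numbers are assumptions chosen for illustration only.
ACCEPTABLE_SECONDS = {
    "form_field_echo": 0.1,           # typing into a form should feel instantaneous
    "large_simulation": 4 * 3600.0,   # a batch job may reasonably run for hours
}


def is_acceptable(operation: str, elapsed_seconds: float) -> bool:
    """Return True if the elapsed time is within the budget for this operation."""
    return elapsed_seconds <= ACCEPTABLE_SECONDS[operation]


# The same three-second wait is judged differently depending on context.
form_ok = is_acceptable("form_field_echo", 3.0)
simulation_ok = is_acceptable("large_simulation", 3.0)
```

With these assumed budgets, a three-second delay fails the form-entry check but easily passes for the simulation job, which is the sense in which acceptable performance is defined by what the user is trying to do.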
Integrity can have multiple meanings. In fact, integrity deserves a post of its own.
I think this is what you’re looking for. This has been a standard requirement in the industry for decades.
https://en.wikipedia.org/wiki/Reliability,_availability_and_serviceability_(computer_hardware)
Yes, RAS is an important part of the story – I’m planning to address it in more depth in a future article.
One of the interesting developments over the last few years is the use of scale-out across commodity servers to provide resilience. This works well for things like web application servers, but not as well for traditional database servers (the newer distributed databases such as MongoDB work well with scale-out).
One of the interesting questions when using commodity hardware for resilience is how to determine when something is failing; a big feature of RAS is knowing when a failure has occurred and when the best approach is to crash!