It seems like a no-brainer. Proactive operations tools can spot problems before they become disruptive and can make adjustments without human intervention.
For example, an AIops-driven transaction monitoring tool sees that a storage system is producing periodic I/O errors, a sign that it is likely to suffer a serious failure soon. Using predefined self-healing processes, the data is automatically migrated to another storage system, and the failing system is shut down and flagged for maintenance. No downtime.
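To make that concrete, here is a minimal sketch of what such a self-healing loop might look like. Everything in it (the threshold, the polling interval, and the helper functions) is a hypothetical stand-in, not the API of any real AIops product:

```python
import random
import time

# Hypothetical values; a real AIops platform supplies its own metrics
# feeds and remediation workflows.
IO_ERROR_THRESHOLD = 5      # errors per polling window before we intervene
POLL_INTERVAL_SECONDS = 60

def get_io_error_count(volume_id: str) -> int:
    """Stand-in for a metrics query; here we simply simulate a reading."""
    return random.randint(0, 10)

def migrate_data(source_volume: str, target_volume: str) -> None:
    """Stand-in for a predefined data-migration runbook."""
    print(f"Migrating data from {source_volume} to {target_volume}")

def mark_for_maintenance(volume_id: str) -> None:
    """Stand-in for taking the volume out of service and opening a ticket."""
    print(f"{volume_id} shut down and flagged for maintenance")

def watch_volume(volume_id: str, standby_volume: str) -> None:
    # Proactive loop: act on early warning signs, not on outright failure.
    while True:
        if get_io_error_count(volume_id) >= IO_ERROR_THRESHOLD:
            migrate_data(volume_id, standby_volume)
            mark_for_maintenance(volume_id)
            break
        time.sleep(POLL_INTERVAL_SECONDS)

watch_volume("vol-primary", "vol-standby")
```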
These types of proactive processes and automations happen thousands of times an hour, and the only way you know they're working is the absence of outages caused by failures in cloud services, applications, networks, or databases. All-knowing, all-seeing, tracking data over time, fixing problems before they become business-disrupting outages.
It's great to have technology that can drive downtime to nearly zero. However, like anything, there are trade-offs to consider.
Traditional reactive operations technology is just that: It reacts to failures and kicks off a chain of events, including alerting humans, to correct the problem. When something stops working, we quickly determine the root cause and fix it, either through an automated process or by dispatching a person.
The disadvantage of reactive operations is downtime. We usually don't know there's a problem until there is an outright failure; that's just part of the reactive approach. We don't typically monitor the finer details of a resource or service, such as storage I/O. We focus only on the binary: Is it working or not?
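For contrast, a purely reactive check often amounts to little more than the sketch below, which assumes a hypothetical health endpoint and simply reports up or down. There is nothing here that could warn of trouble in advance:

```python
import urllib.request

def is_up(health_url: str, timeout: float = 5.0) -> bool:
    """Reactive check: all we learn is whether the service answers."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# Only after a failure has already happened do we alert and investigate.
if not is_up("https://example.com/health"):
    print("Service is down -- start root-cause analysis or page someone")
```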
I'm not a fan of cloud-based downtime, so reactive operations would seem like something to avoid in favor of proactive operations. However, in many of the cases I see, even if you've purchased a proactive operations tool, that tool's monitoring systems may not be able to see the details needed to drive proactive automation.
The hyperscalers' core cloud services (storage, compute, databases, artificial intelligence, etc.) can be monitored at a granular level, such as current I/O usage, CPU saturation, and so on. Many of the other technologies you run on cloud-based platforms may offer only primitive APIs into their internal operations and can only tell you whether they are working or not. As you may have guessed, proactive operations tools, no matter how good, won't do much for those cloud resources and services.
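As an illustration of that gap, the sketch below pulls a granular metric (average CPU utilization) from AWS CloudWatch via boto3, one example of a hyperscaler metrics API; the instance ID and time window are placeholder values. A service that exposes only a primitive up/down API offers nothing comparable for a proactive tool to act on:

```python
from datetime import datetime, timedelta, timezone

import boto3  # assumes AWS credentials and region are already configured

def recent_cpu_average(instance_id: str) -> float | None:
    """Fetch a granular signal (average CPU) that proactive tooling can act on."""
    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(minutes=15),
        EndTime=end,
        Period=300,                 # five-minute aggregation window
        Statistics=["Average"],
    )
    points = stats.get("Datapoints", [])
    if not points:
        return None
    # Return the average from the most recent datapoint.
    return max(points, key=lambda p: p["Timestamp"])["Average"]

print(recent_cpu_average("i-0123456789abcdef0"))  # placeholder instance ID
```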
I find that more of these types of systems run on public clouds than you might think. We end up spending a great deal of money on proactive operations without the ability to monitor the internals of systems, which is what would give us the indications that a resource is likely to fail.
In addition, public cloud resources, such as large storage or compute systems, are already monitored and managed by the provider. You don't control the resources provisioned to you in a multitenant architecture, and cloud providers do a very good job of running proactive operations on your behalf. They see hardware and software problems long before you do and are in a much better position to fix things before you even know there's a problem. Even under the shared responsibility model for cloud resources, providers are committed to keeping their services up and running.
Proactive operations are the right way to go; don't get me wrong. The problem is that, in many cases, companies are making huge investments in proactive cloud operations with little ability to use them. Just saying.