System Manageability

The greatest need for improving the manageability of Linux systems is to provide a standard programming interface – an API – for system management functions.

The API should be a low-level interface that provides the control needed over managed systems. It should also support a higher-level abstraction that makes it easy for system administrators to use for routine tasks.

The API should support scripting and CLI use. Studies at Red Hat have shown that roughly 75% of Linux system administrators use the CLI and scripts. They manage remote systems by logging in through SSH and running the CLI and scripts locally.

This API should provide both a remote interface and a local interface. It should support managing remote systems without having to log in to each system directly. The API should support multiple language bindings, so that it can be used easily from the tools commonly used for system management.

The API should be supported by a formal object model. Many of the idiosyncrasies of managing Linux systems are due to the organic growth of independent subsystems over time – this makes it difficult to design and provide a consistent interface.

Finally, any system management interface should be built on top of existing Linux tools and capabilities. Any attempt to build entirely new subsystems from scratch is doomed by the sheer size of the task, and that is before considering the challenge of getting new tools accepted by the upstream community and by users!

From a real-world perspective, note that Linux subsystems like the I/O subsystem contain a tremendous amount of “institutional knowledge”: these systems are complex because they solve complex problems on a wide range of real-world hardware. In theory, it should be straightforward to replace the I/O subsystem with a clean design. In practice, by the time a new I/O subsystem could do everything the current one does, it would be roughly as complex and ugly.

Customer Challenges

Customer feedback indicates that the greatest immediate need is to manage the physical configuration of systems, with emphasis on configuring storage (especially local storage) and networking on production servers.

A “production server” can be loosely defined as a remote system (you don’t log in through a local keyboard, mouse, and monitor) with no graphics (no X Window System or desktop environment installed) and with multiple drives and NICs. A production server will often be configured with 4-8 NICs and may have several dozen local drives as well as network storage.

In a Nutshell

The system manageability challenge can be summarized as:

- Provide a standardized remote API to configure, manage and monitor production servers:
  - Physical servers
  - Virtual machines
- Support CLI and scripting.
- Provide language bindings for the main languages used for system management.
- Build on top of existing Linux subsystems.
- Be something that Linux system administrators can, and will, use.

About Russell Doty

A technology strategist and product manager at Red Hat, working on the next generation of open source systems.


This entry was posted in System Management.

4 Responses to System Manageability

oxtan says:

June 21, 2013 at 8:29 pm

we already have that: cfengine, puppet, chef, ansible, …

Is this some kind of NIH syndrome at Red Hat?

- Dennis Jacobfeuerborn says:
  
  June 22, 2013 at 2:52 pm
  
  These are high level orchestration tools and still rely on the low level management interfaces of the systems they manage.
  
Dennis Jacobfeuerborn says:

June 22, 2013 at 3:07 pm

One thing I’d like to see is the adoption of a model similar to what the new LIO-based targetcli does for iSCSI: a structured configuration file, plus a shell you can use to manipulate that configuration file in a controlled way.

Also I’d like to see a more dynamic type of configuration, similar to what network switches do today. There, I modify the settings of an interface and the changes take effect immediately but are not saved; I have to explicitly copy the configuration from the “running config” to the “startup config”. On Juniper systems you even get a transactional mechanism: you make all the changes to the config, but they are not applied until you do a “commit”. Before committing, you can even run a diff that shows every new value that will be applied alongside the value currently in use.
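The running-config/startup-config split and the commit-with-diff workflow described above can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not how any switch OS or OpenLMI implements it; the `ConfigStore` class, its method names, and flat keys such as `eth0.mtu` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ConfigStore:
    """Hypothetical model of a switch-style running/candidate/startup workflow."""

    running: Dict[str, str] = field(default_factory=dict)  # values in effect now
    startup: Dict[str, str] = field(default_factory=dict)  # values used after reboot
    candidate: Dict[str, str] = field(init=False)  # staged, not yet applied

    def __post_init__(self) -> None:
        # Begin editing from a copy of the running configuration.
        self.candidate = dict(self.running)

    def set(self, key: str, value: str) -> None:
        # Stage a change; nothing takes effect until commit().
        self.candidate[key] = value

    def diff(self) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        # Map each changed key to (current value, staged value); None = absent.
        keys = self.running.keys() | self.candidate.keys()
        return {
            k: (self.running.get(k), self.candidate.get(k))
            for k in sorted(keys)
            if self.running.get(k) != self.candidate.get(k)
        }

    def commit(self) -> None:
        # Apply every staged change at once (the Juniper-style "commit").
        self.running = dict(self.candidate)

    def rollback(self) -> None:
        # Discard staged changes.
        self.candidate = dict(self.running)

    def save(self) -> None:
        # Explicitly copy the running config to the startup config.
        self.startup = dict(self.running)


if __name__ == "__main__":
    store = ConfigStore(running={"eth0.mtu": "1500"})
    store.set("eth0.mtu", "9000")
    store.set("eth0.address", "192.0.2.10/24")
    print(store.diff())
    # {'eth0.address': (None, '192.0.2.10/24'), 'eth0.mtu': ('1500', '9000')}
    store.commit()  # now live, but not yet part of the startup config...
    store.save()    # ...until it is explicitly saved
```

Keeping three separate dictionaries makes the two explicit steps visible: `commit()` moves staged values into effect, and `save()` is the separate decision to make them survive a reboot.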

For example, if I want to add an IP address to a network interface, I can either add it live using “ip address add”, which is not reflected in the config file, or add it to the config file and then essentially restart the interface, which is very disruptive. What I would rather do is make a high-level call like “netiface add address ipv4 …” that adds the IP to the interface dynamically, without restarting it, and also adds it to the config file, so the live config and the config I would get after a reboot are always in sync.
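A minimal sketch of the “apply live and persist” behavior described above, assuming a hypothetical JSON configuration file rather than any real Linux network configuration format. `add_address` plays the role of the hypothetical `netiface add address` command: it runs the iproute2 command `ip address add <addr>/<prefix> dev <iface>` and then records the address in the file, removing the live address again with `ip address del` if the file cannot be written. The command runner is passed in so the logic can be exercised without root privileges.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, List

# A runner executes one command; injectable so the logic can be tested.
Runner = Callable[[List[str]], object]


def run_checked(cmd: List[str]) -> object:
    # Default runner: raises CalledProcessError if the command fails.
    return subprocess.run(cmd, check=True)


def add_address(iface: str, cidr: str, config_path: Path,
                run: Runner = run_checked) -> None:
    """Hypothetical 'netiface add address' sketch: apply live, then persist.

    The JSON layout {"eth0": {"addresses": [...]}} is an assumption,
    not the format of any real Linux network configuration tool.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    addresses = config.setdefault(iface, {}).setdefault("addresses", [])
    if cidr in addresses:
        return  # already recorded; assumed to be live as well

    # 1. Apply the change live, without restarting the interface.
    run(["ip", "address", "add", cidr, "dev", iface])

    # 2. Persist it so the config used after a reboot matches the live state.
    addresses.append(cidr)
    tmp = config_path.parent / (config_path.name + ".tmp")
    try:
        tmp.write_text(json.dumps(config, indent=2))
        tmp.replace(config_path)  # rename over the old file in one step
    except OSError:
        # Persisting failed: undo the live change so the two stay in sync.
        run(["ip", "address", "del", cidr, "dev", iface])
        raise
```

On a real system this would be called as root, e.g. `add_address("eth0", "192.0.2.10/24", Path("/etc/example-net.json"))`, where the path is purely illustrative. Rolling back the live change on a failed write keeps the two views consistent in the common failure case; a crash between the two steps would still need separate handling.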

Russ Doty says:

September 13, 2013 at 6:35 pm

Dennis, you raise some good points. We are building OpenLMI mainly as an interface layer on top of underlying tools. For networking, we are building on top of NetworkManager; for storage, we are building on top of the Blivet storage library, which was originally developed for Anaconda.
Thus, we have access to the capabilities of the underlying tools. New capabilities are first added to the underlying tool, and then exposed through OpenLMI.
The idea of dynamic vs. permanent changes is interesting.
We will need to take a close look at LIO when we start working on iSCSI support.