From Pandora FMS Wiki
Pandora FMS: What is it, exactly?
Pandora FMS is a network monitoring software package intended for all types of environments. Using the word 'monitoring' in its broad sense is somewhat risky, as there are hundreds of tools available, each one adapted to a particular type of environment: monitoring a couple of printers in a small office isn't the same as monitoring thousands of interfaces and switches with extremely high network traffic in a data center with thousands of servers.
Pandora FMS is designed to adapt to every role and to every organization. Its main aim is to be flexible enough to manage and control the complete infrastructure, without the need to invest more time or money in another monitoring tool.
FMS is an acronym for Flexible Monitoring System. Its purpose is to monitor both complex, new-generation tools and systems with outdated elements that are hard to access and poorly compatible, all on one platform.
Pandora FMS currently provides agents for every 'modern' operating system on the market, from Windows NT4 to Windows 2012, as well as all the modern Unix systems (AIX, Solaris, HP-UX, BSD, Linux) in every version and distribution.
Pandora FMS can, of course, be used not only as a systems monitoring tool but also with all sorts of network devices, whether through SNMP (versions 1, 2 and 3), TCP protocol probes (FTP, DNS, HTTP, HTTPS, etc.), ICMP or UDP.
About the Documentation
All of this power and flexibility comes with an implicit difficulty at the setup stage. Although Pandora's configuration is mostly graphical, we are aware that learning how to use it can seem complicated at first. That is why we have divided the 800 pages of the User's Guide into several chapters:
- Chapter I. Understanding Pandora FMS.
- Chapter II. Installation and Configuration.
- Chapter III. Monitoring with Pandora FMS.
- Chapter IV. Operating and Managing Pandora FMS.
- Chapter V. Complex Environments and Best Performance.
- Chapters VI & VII. References and Technical Appendices.
Besides the official documentation, you can access the users' forum at http://openideas.info/smf, where you can post queries in English, Spanish and Japanese to other users. If you require official training, there is a training program taught by the developers of Pandora.
We have compiled some quick reference guides to assist you in configuring Pandora FMS and implementing simple monitoring tasks. Quick reference manuals are also available for installing software agents on systems such as Windows and Linux. Short videos are available to help you through some of the more technical parts of the configuration and, if necessary, you can participate in our regularly scheduled workshops. More detailed information on all of the above can be found on our website at http://pandorafms.com
The Evolution of Pandora as a Project
Pandora was created by Sancho Lerena in 2003. Since then, it has gradually evolved to become the resilient, innovative and flexible monitoring tool we offer today.
Originally 100% open source, it went through years of experimentation and growth and, after strong demand for the product from large companies and corporations, we felt compelled to launch the Enterprise version. This version offers specific features designed for conditions that require processing large volumes of information while operating properly with thousands of devices.
The company financing and coordinating all the work behind Pandora FMS's development is Artica Soluciones Tecnologicas, a Spanish company founded in 2005. The open source version is, nonetheless, fully operational and functional as a production tool, and companies that do not require professional support, or that are very well staffed, get by well with the Open Source version.
Pandora FMS can be found to this day among SourceForge's top rated projects, with thousands of downloads and satisfied users all over the world. For more information on Pandora FMS's evolution and to see a road map of the project, please visit http://pandorafms.com
A Quick Glance at the Features of Pandora FMS
- Autodiscovery. On the local system, Pandora's agent plug-ins can detect hard disks, partitions and databases, among many other items, by default.
- Autoexploration. By using the web-based interface of Pandora FMS, we can detect active systems, and catalog them according to the target's operating system. By applying a profile, Pandora is able to commence monitoring the discovered targets. It can even detect the topology of the network and create a web-based map based on route distribution.
- Monitoring. The agents of Pandora FMS are the most powerful on the market. They can obtain information in many ways, from executing a command to calling the Windows API at its lowest level: events, logs, numerical data, process states, memory and CPU consumption. Pandora ships with a library of default monitors, but one of its greatest advantages is the ability to quickly add, edit and create new monitors.
- Remote access. The agents themselves can activate services, delete temporary files or execute processes. Commands, such as stopping or starting services, can also be executed remotely from the console. Furthermore, it's possible to schedule tasks that require periodic execution. It's also possible to use Pandora FMS as the launch point to access Windows machines remotely (via VNC), or to access web or Unix systems through Telnet or SSH from the Pandora web interface.
- Alerts and Notifications. Notifications are just as important as failure detection. Pandora FMS gives you an almost infinite variety of notification methods and formats. This includes, but is not limited to, escalation, correlation of alerts, and prevention and mitigation of cascading events.
- Analysis and Visualization. Monitoring is not just receiving a trap or visualizing a failing service. Within the Pandora environment, monitoring is also a way to present forecast reports and correlated summary charts of data gathered over the long term, to generate user portals, to delegate reports to third parties, or to let users define their own charts and tables. Pandora incorporates all of these tools within a Web interface.
- Inventory Creation. Contrary to other solutions where the idea of CMDB is just an afterthought, to Pandora it is an active option. The inventory is flexible and dynamic (it can auto-discover, accept remote input, etc.). It can notify observers of changes (e.g. uninstalled software) or simply be used to make listings.
Introduction to Monitoring
Right from the start, every technical manual for a software package will tell you about configuration, text files, databases, protocols, etc. We very often learn to configure at low levels while remaining ignorant of the full potential of the software under discussion - what can be done with it and in which situations. The purpose of this section is to explain the theory behind monitoring in a concise but systematic way, regardless of the software used for this purpose.
Types of Monitoring
When we wonder about the condition of a target item that we'd like to monitor, be it a server, a database, a web element, or a refrigerator, we can ask ourselves the following questions:
- How do we obtain the information from the target(s)? Do we have something in place to make this happen, or do we need to install infrastructure (software or hardware)?
- Do we want to ask the target for its status constantly, or to wait for the target to tell us something has happened?
- What sort of information does the target give us? Is it something we can measure graphically, observing its progression over time?
These three questions cover the key points that shape the essence of our monitoring model. The first question dictates whether we are going to use an agent-based monitor executed inside the device we are controlling or, on the contrary, whether our monitoring will be done externally, over the network. There are monitoring systems that operate one way or the other, and devices that can only be monitored via one of the two models. Pandora FMS supports both models.
The second question concerns whether the monitoring is synchronous (the monitor asks for the status every X seconds, regardless of whether anything has changed) or asynchronous (the monitor only receives information when something relevant has taken place). If I use synchronous monitoring with 10 million elements and collect data at 5-minute intervals, the load will be considerable. If I collect every 50 minutes instead, the load will be much more manageable, the downside being that if something happens in between, it can take up to 50 minutes before I notice. If I use asynchronous monitoring (e.g. with SNMP traps or logs) I can save many processing resources, but I will not be able to draw graphs or keep historical data, except for data directly related to the incidents that occurred. Many tools are based solely on one of the two models: some are known as 'performance' or 'capacity' tools, while others are based on event management. They are rarely interchangeable in their functions. Pandora FMS supports both approaches.
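The trade-off between polling interval and load can be put into numbers. The following is a minimal sketch of that arithmetic only, using the figures from the paragraph above; the function name is hypothetical and the calculation assumes checks are spread evenly over the interval.

```python
def checks_per_second(elements: int, interval_minutes: float) -> float:
    """Average number of polls per second needed to visit every element once per interval."""
    return elements / (interval_minutes * 60)


# Figures taken from the text: 10 million elements, polled every 5 or every 50 minutes.
fast = checks_per_second(10_000_000, 5)
slow = checks_per_second(10_000_000, 50)

print(f"5-minute interval:  about {fast:,.0f} checks per second")
print(f"50-minute interval: about {slow:,.0f} checks per second")
```

The longer interval divides the load by ten, but the worst-case delay before a change is noticed grows to the full 50 minutes.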
The third question refers to what we are looking for at a given moment in time. The result can be a text string (a descriptive event), a floating point number (to be able to draw graphs) or simply a status (down, up). Being able to work with different kinds of data allows more flexibility. Pandora FMS supports all types of data.
These three "paradigms" condition the monitoring environment greatly, and dictate which tool is appropriate to monitor it. Identify the type of information needed and the best approach to obtain it. Plan around the available information elements and how to monitor them.
When we speak of remote monitoring, we mean that the Pandora FMS server probes ('polls') the devices it intends to monitor in a synchronous way. Remote monitoring does not include 'local' monitoring, which is based on agents installed on the devices we wish to observe.
Generally speaking, when we monitor remotely, we do it with two different purposes:
- To make sure the target is 'alive' (e.g. an interface or an active system)
- To obtain a numerical value (e.g. to measure the web traffic or the number of active connections)
Synchronous monitoring is always conducted in the same direction: From the monitoring server to the monitored element (target).
We may also be interested in the opposite process: receiving a notification when an incident occurs. This is called asynchronous monitoring and, in the case of remote monitoring, it is usually done through SNMP traps.
Synchronous monitoring is usually done through the SNMP protocol, the most widely used method for observing and collecting status-related information. WMI, a similar technology from Microsoft, is an alternative.
Both work in a similar fashion: a server sends a request for a particular element to the 'SNMP agent' or 'WMI service' available on the target device. In SNMP this element is identified by an OID; in WMI it is identified by a WQL query. The request could be for the free available memory, the router's number of connections, the traffic on a given interface, or a wide variety of other reportable information.
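As an illustration of what is done with a polled value, the sketch below turns two successive readings of an interface byte counter into a traffic rate. It assumes the readings come from the standard IF-MIB object ifInOctets, which is a 32-bit counter that wraps around to zero; no SNMP request is actually sent here, and the function name and sample values are hypothetical.

```python
# IF-MIB ifInOctets column; a real query appends ".<ifIndex>" for the chosen interface.
IF_IN_OCTETS_OID = "1.3.6.1.2.1.2.2.1.10"
COUNTER32_MODULUS = 2 ** 32  # a Counter32 value wraps to 0 after 2**32 - 1


def bits_per_second(prev_octets: int, curr_octets: int, seconds: float) -> float:
    """Average inbound traffic between two counter readings taken `seconds` apart."""
    if seconds <= 0:
        raise ValueError("the two readings must be separated in time")
    # The modulo handles a single counter wrap between the two readings.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta_octets * 8 / seconds


# Two made-up readings taken 300 seconds (one 5-minute polling interval) apart.
print(bits_per_second(1_000, 301_000, 300))
```

The same pattern applies to any counter-type value: the monitoring server stores the previous reading and graphs the rate, not the raw counter.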
If the monitoring is mainly based on network environments, it is important to know SNMP in detail, as it will be the monitoring tool's most widely used function. Asynchronous monitoring through SNMP is also vital. Together with a monitoring tool, you'll need an external SNMP device explorer, access to the MIB collections from the makers of your target devices (which are like libraries of OIDs) and, of course, a lot of patience to investigate: each device usually has its own collection of OIDs, and among the thousands that each device exposes, you'll only be interested in a few.
If you are monitoring Windows servers and you're not interested in installing agents on the machines, WMI remote monitoring can be very powerful and well suited. The WMI interface is even more powerful and better organized than SNMP. With WMI, you'll be able to obtain practically any data, status or event on your Windows servers.
Unix and Windows systems can also utilize SNMP, but the information returned is limited. Further, you'll need to activate and configure the SNMP agents of the operating system, which can be much more complex than simply installing a Pandora FMS monitoring agent.
Finally, you can always monitor networked elements through the use of TCP or ICMP tests. ICMP is mainly used for two purposes:
- To verify if a system responds (ping)
- To find out the latency time of that device (in milliseconds)
Through TCP tests, it is possible to check whether a web server responds properly, or whether a mail server (SMTP) sends mail properly and in a timely fashion. These tests are not intended just to get the server to 'open the port' but also to get it to 'communicate': that is, the mail sending command receives an OK confirming that it works, or the web server answers '200 OK' (a valid reply in the HTTP protocol).
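The difference between 'opening the port' and 'communicating' can be sketched with the Python standard library. This is an illustrative example only, not one of the plugins shipped with Pandora FMS; both function names are hypothetical.

```python
import http.client
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Weak test: does the target accept a TCP connection on this port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_is_ok(host: str, port: int = 80, path: str = "/", timeout: float = 3.0) -> bool:
    """Stronger test: does the web server answer a GET request with status 200?"""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

A server can pass the first test and still fail the second, for example when the web service accepts connections but returns an error page, which is why the second kind of test is the more useful one.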
By default, Pandora FMS supports a series of plugins for TCP testing, but you can easily implement your own tests by adapting existing scripts or developing new ones. Integration with Pandora FMS does not require an API, complex structures or proprietary libraries.
Given the importance of the topic, Web Transaction Monitoring and remote monitoring receive a separate chapter.
Local Monitoring (by Agents)
When it comes to systems and applications, the best way to obtain information is definitely from the target system itself. This is done by executing commands or querying the system's data sources on the same machine we want to monitor. This means we have to execute a command or script, or inspect the system or the application. To that end, we use Pandora's monitoring agent, a specific software module that takes care of those small monitoring tasks.
The agents can only be installed on Unix and Windows operating systems. An agent cannot be installed on a Cisco device, for example. In the nomenclature used by Pandora FMS, 'agent' refers to the entity containing the information and 'software agent' to the software installed on that system to extract information and report it to the Pandora FMS server. The software agent runs constantly on the system (as a service) and reports information periodically.
The agents allow you to do more than obtain information through commands, for example to obtain inventory information. Agents can also be configured to react in case of a problem or a failure, interacting automatically with the system, deleting a temporary file or executing a given command.
To obtain 'precise' and specific information that we may be interested in, we will often have to refer to the manuals of the application we want to monitor, because even when we employ ‘generic’ monitors, what we are looking for may not be so trivial.
Under Windows, there's an almost infinite variety of ways to access the information: WMI, performance counters, event logs, system logs, registry, commands, PowerShell scripts, API (by Windows NT), etc. In fact, Microsoft's architecture is one of the easiest, most powerful and best documented when it comes to obtaining information from the system. On Unix / Linux systems, the software agent's ability to execute any command allows us to benefit from the full power of the shell.
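The core of local monitoring, running a command on the monitored machine and keeping its output as a named reading, can be sketched as follows. This is not the Pandora FMS software agent or its report format; the function name, the dictionary layout and the sample command are assumptions made for illustration.

```python
import subprocess
import sys


def run_check(name: str, command: list[str], timeout: float = 10.0) -> dict:
    """Run a local command and return the first line of its output as a named reading."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    lines = result.stdout.strip().splitlines()
    return {
        "name": name,
        "exit_code": result.returncode,
        "data": lines[0] if lines else "",
    }


# A portable stand-in for a real system command: ask the Python interpreter to print a number.
reading = run_check("sample_value", [sys.executable, "-c", "print(6 * 7)"])
print(reading)
```

In practice the command would be whatever the monitored application offers (a status tool, a log query, a shell pipeline), and the reading would be sent to the monitoring server at each interval.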
The Monitoring Procedure
What do you really want to monitor, and why? Have you given it some thought? Once you have obtained data from your servers, such as when they go down or how much they consume, have you thought about what you're going to do with it? You may wish to ask yourself: Which element is the most critical? What's my response plan? Answering these questions will save you valuable time that would otherwise be spent investigating issues that aren't going to be useful in your day-to-day work.
Please dedicate five minutes to answering some questions. In your case, which of the following best describes your monitoring needs?
- To avoid losses -> Availability.
- To analyze degradations -> Performance.
- To evaluate growth -> Capacity planning.
For each of those answers, the focus of your monitoring solution will be different in certain aspects.
Availability. You are mostly interested in event-based monitoring, and remote monitoring will probably be enough for your needs; it's faster to deploy and will give you relatively quick results. You are after SLA reports.
Performance. Your focus is graphs and numbers, collecting information through agents or remotely, even though you will probably require agents to get in-depth information on your systems. Group reports and combined graphs are your primary interest.
Capacity Planning. Much more specific. The monitor needs to obtain data, as in the second case, but also to parse and manipulate it, with predictive monitors and very specialized projection reports. Establishing early alerts will be of great help, and you need a good understanding of what the WARNING and CRITICAL statuses mean, as well as policies for managing series of events so that the problem is prevented before it happens. This is, without a doubt, the most complex and interesting case.
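To make the capacity planning case more concrete, the sketch below classifies a value against WARNING and CRITICAL thresholds and estimates when a growing value will reach a limit. It uses a plain least-squares straight line, which is an assumption chosen for illustration and not a description of Pandora FMS's predictive monitors; the threshold values, function names and sample data are hypothetical.

```python
from typing import Optional


def status_for(value: float, warning: float, critical: float) -> str:
    """Map a measured value to a status using two thresholds."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "NORMAL"


def days_until_limit(samples: list[tuple[float, float]], limit: float) -> Optional[float]:
    """Fit a straight line to (day, value) samples and estimate how many days after
    the last sample the value reaches `limit`. Returns None if the value is not growing."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope
    return max(0.0, day_at_limit - samples[-1][0])


# Made-up disk usage (percent) measured once a day for four days.
usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
print(status_for(usage[-1][1], warning=75.0, critical=90.0))
print(days_until_limit(usage, limit=90.0))
```

An early alert would be raised on the projected date, not on the current value, which is what distinguishes this case from plain performance monitoring.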
Once you know which model you will follow, you are left to wonder what to do when the system tells you the service is down, or worse, what will happen if the server's capacity reaches its limit next Friday?
You need to think in terms of action procedures.
By action procedures we mean something that no tool can do for you (so far): thinking about and planning how to notify the intended observer of incidents. In order to do that you'll need to consider several factors:
- Event Urgency. You need to be able to distinguish between something unusual and something critical.
- Notification Format. E-mail, SMS or, why not, a mild shock to the operator to stop him from falling asleep (we have yet to implement it, but it wouldn't be difficult...)
- Escalation. Notify someone first to get the incident resolved; if it is not resolved, a second person is informed, and if the problem persists, a third becomes involved. Maybe supervisory personnel need to be notified?
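The hand-off described in the last item, where more people are informed the longer an incident stays unresolved, can be written down as a simple table. The roles and delays below are hypothetical and only illustrate the idea; they are not Pandora FMS configuration.

```python
# (minutes the incident has been open, who gets notified from that point on)
ESCALATION_CHAIN = [
    (0, "on-call operator"),
    (30, "system administrator"),
    (60, "supervisor"),
]


def contacts_to_notify(minutes_unresolved: float) -> list[str]:
    """Return everyone who should have been notified by now, in escalation order."""
    return [who for after, who in ESCALATION_CHAIN if minutes_unresolved >= after]


print(contacts_to_notify(0))
print(contacts_to_notify(45))
print(contacts_to_notify(120))
```

Writing the chain down in this form, even on paper, forces a decision about who is responsible at each step before the first alert ever fires.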
Ideally, before any configuring is done, you should have these concepts in mind. Even better, gather some patience and a diagramming tool (Visio or OpenDraw, for example), draw the critical elements you want to monitor, and use arrows to show how the information is obtained and who will be notified or what will be done in response to that information.
By focusing on the most critical issues first, you reach a logical starting point that defines what the most important issues for your organization are. Once you know what the most critical elements are, you can define how to monitor the target(s), consider who will be responsible for the resolution of the reported problems in those systems and how to notify the appropriate people of the existence of a problem.
By supervision models, we mean that a monitoring system is designed to report information to an automated system, which is in turn watched by a human being, directly or indirectly. This person is often called the operator: the one who looks at the screen or otherwise receives the events, be it on a smartphone or similar device, by e-mail or through logs registered with another tool. The "how" doesn't matter; the important thing is that someone is minding the system.
On the other hand, there are the people we generally call system administrators or infrastructure personnel: those who, when something happens, receive a call from the operator saying "Hey, we have got a problem here," or a direct notification sent automatically by the system, frequently as an SMS or an e-mail.
Here we can already see the differences:
- The direct supervision model implies one or several people constantly watching the system, so if something critical occurs it will be detected immediately. The monitoring package can usually notice small, non-critical changes, and has much greater flexibility in how it reports this information. It is not necessary to define 'notifications' (alerts under Pandora) for each possible case; it's enough to examine the events (some sort of visual indicator to detect status changes) to have an idea of 'what is cooking' in the system at any given time. It is possible to define many screens and also to define alerts to support that supervision. This model is used in large environments, where no alert policy, however detailed, can cover every case.
- The indirect supervision model implies that there is no one permanently looking at the screen, so it is necessary to define, beforehand, the automatic notifications (alerts) that the system is going to send, given that the events, graphs and maps aren't going to be observed by anyone. This model is suitable when we have few devices, or when we have identified very precisely what's critical and how to confront the problem (solution and notification).
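Both models rely on events, that is, on detecting status changes in a stream of readings. A minimal sketch of that detection, with a hypothetical function name and made-up readings, could look like this:

```python
def status_change_events(readings: list[tuple[int, str]]) -> list[tuple[int, str, str]]:
    """Turn (timestamp, status) readings into (timestamp, old_status, new_status) events,
    emitting an event only when the status differs from the previous reading."""
    events = []
    previous = None
    for timestamp, status in readings:
        if previous is not None and status != previous:
            events.append((timestamp, previous, status))
        previous = status
    return events


# Made-up readings taken every 5 minutes (timestamps in minutes).
readings = [(0, "up"), (5, "up"), (10, "down"), (15, "down"), (20, "up")]
for event in status_change_events(readings):
    print(event)
```

Under direct supervision an operator watches this list of events; under indirect supervision each event has to be matched against a predefined alert, since nobody is looking at the screen.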
For teamwork that involves operators, administrators and third-level personnel, Pandora FMS provides useful tools such as event ticketing, incident creation, escalation of notifications, internal mail, a notice board and chat among the users of Pandora FMS.
And What Now?
The following chapters are exclusively dedicated to Pandora FMS. Up to this point, we have been discussing general matters which were probably important for you to know before we continue to explore Pandora FMS. You probably know many of these things already. You may have used other monitoring programs. You may have heard, perhaps, that this or that application is always monitored in a certain way because it's the best way possible.
Maybe, but in our experience each client does things a certain way, and regardless of how much we know about monitoring, we doubt we know more about how your infrastructure was configured than you do. Monitoring easy tasks presents no problems; the hard job is to adapt the monitoring to your business without having to adapt your business to the monitoring. It is not a trivial task. More than 800 pages await if you wish to discover the best way to monitor your organization with Pandora FMS. It is a challenge, but one we believe is well worth the effort.