Pandora: QuickGuides EN: Alert configuration

From Pandora FMS Wiki

1 Pandora FMS Alert Configuration Quick Guide

1.1 Alert structure

Esquema-alert-structure.png

Alerts comprise:

  • Commands
  • Actions
  • Templates

The command defines the operation that is executed when the alert is triggered. Examples of commands: writing a note to a log, sending an email or SMS, executing a script or program, etc.

An action relates a command to a template and allows the command execution to be customized via three generic parameters: Field1, Field2 and Field3. These are the values passed to the command as input parameters when it is executed.

Templates define the conditions for triggering an alert, whether there will also be a recovery action, and the default action to execute.

  • Trigger conditions: the conditions under which an alert is triggered, e.g. exceeding a threshold, going into critical status, etc. They are defined in the template.
  • Triggered actions: always associated with a command; they allow the execution of the command to be customized by passing it arguments via Field1, Field2, etc.
  • Alert recovery: actions to be carried out when the system recovers from an alert and is back to normal status.

1.1.1 Data flow in the alerts system

When defining actions and templates, some generic fields are available (Field1, Field2, etc.) that become the input parameters of the command execution. The values of these parameters propagate from the template to the action and finally to the command. Propagation from template to action only occurs if the corresponding action field has no assigned value. If the action field has a value, that value is kept and takes precedence over the one it would otherwise inherit from the template.

E.g. 1: Field1 and Field2 have content in the template, but NOT in the action. The action inherits the content for its own Field1 and Field2, which become the command parameters.

E.g. 2: Field1 and Field2 have content in the template, but ALSO in the action. The action doesn't inherit Field1 or Field2 from the template, as it has its own content.

Esquema-parameters-carrying.png

This would be an example of how to overwrite the values of the template using those of the action:

Alertas esquema6.png

E.g. create a template to trigger an alert and send an email by using the following fields:

  • Template:
    • Field1: [email protected]
    • Field2: [Alert] The alert was fired
    • Field3: The alert was fired!!! SOS!!!

The values which would reach the command would be:

  • Command:
    • Field1: [email protected]
    • Field2: [Alert] The alert was fired
    • Field3: The alert was fired!!! SOS!!!

For fields 2 and 3 the values defined in the template are kept, but Field1 uses the value defined in the action.
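The inheritance rule described in this section can be sketched in a few lines of Python. This is only an illustrative model, not Pandora FMS code: the function name `resolve_fields` and the example.com addresses are assumptions made up for the example.

```python
FIELD_NAMES = ("field1", "field2", "field3")


def resolve_fields(template, action):
    """Return the field values that would reach the command.

    An action field wins when it has content; an empty action field
    inherits the value of the same field in the template.
    """
    resolved = {}
    for name in FIELD_NAMES:
        action_value = action.get(name, "")
        resolved[name] = action_value if action_value else template.get(name, "")
    return resolved


# Hypothetical values: the addresses are placeholders, not taken from the guide.
template = {
    "field1": "operator@example.com",
    "field2": "[Alert] The alert was fired",
    "field3": "The alert was fired!!! SOS!!!",
}
action = {"field1": "boss@example.com", "field2": "", "field3": ""}

command_fields = resolve_fields(template, action)
for name in FIELD_NAMES:
    print(name, "=", command_fields[name])
```

Here field1 comes from the action, while field2 and field3 fall through from the template because the action leaves them empty.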

1.2 Defining an Alert

Now, suppose we have to monitor a module that has numerical values. In our case, it's a module that measures the system CPU. Let's first make sure that our module receives the data correctly:

Qgcpu1.png

In this screenshot, we see that we have a module called sys_cpu with a current value of 7. In our case, we want the system to fire an alert when the value reaches 20 or more. To do this, we're going to configure the module so that it goes to CRITICAL status when it reaches 20 or more. Click on the wrench icon to configure the monitor's behavior:

Qgcpu2.png

Modify the value selected in red as shown on the following screenshot:

Threshold.JPG

Accept and save the changes. Now, when the CPU module value goes up to 20 or higher, it will change status to CRITICAL and be marked in red, as seen here.
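As a rough mental model of this threshold, the status check can be written as a single comparison. The sketch below is an assumption-laden simplification: `module_status` is a hypothetical name, and it models only one minimum critical threshold.

```python
def module_status(value, critical_min):
    """Classify a numeric module value against one critical threshold.

    Simplified model: values at or above critical_min are CRITICAL,
    everything else is OK.
    """
    return "CRITICAL" if value >= critical_min else "OK"


print(module_status(7, 20))   # the current value shown in the example
print(module_status(20, 20))  # the threshold itself already counts as critical
```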

Qgcpu4.png

The system knows how to recognize when something is right (OK, green) and when it is wrong (CRITICAL, red). Now, what we want to do is have Pandora FMS send us an email when the module changes to this status. To do so, we will use the Pandora FMS alert system.

The first thing to do is to make sure that there is at least one command that does what we need (send an email). This example is easy because Pandora FMS includes a default command for sending emails, but we can also create any command we need, such as executing a script or opening a ticket on an external platform.

1.3 Configuring the Action

Now, we have to create an action called "Send an email to the operator". Let's do it: go to the menu -> Alerts -> Actions and click to create a new action:

Qgcpu5.png

This action uses the "Send email" command and is really simple: you only need to fill out one field (Field 1) and leave the other two empty. This is one of the most confusing parts of the Pandora FMS alert system: what are field1, field2 and field3?

These fields are used to "pass" information from the alert template to the command, so that both the template and the action can supply different information to the command line. In this case, the action only sets field 1, and we leave field 2 and field 3 to the template, as we can see below.

Field 1 is the one we use to define the operator's email, in this case, to "[email protected]".

1.4 Configuring the Template (Alert template)

Now we have to create an alert template that is as generic as possible, so that it can be reused later. For example, "This is wrong because I have a module in Critical status", which by default sends an email to the operator. Go to the administration menu -> Alerts -> Templates and click on the button to create a new alert template:

Qgcpu6.png


The first step is to define certain aspects of the alert. In this case, its priority is set to "Critical", although this has nothing to do with the "Critical" status of the module. Prioritizing alerts lets us display them in specific colors in the event viewer, but it does not affect how the alert works. Step 2 is where we find some of the most important parameters for the correct functioning of our alert:

Qgcpu7.png

The "Condition Type" field defines the conditions under which an alert is triggered. Here it's set to "Critical Status", so that this template, when associated with a module, triggers when that module is in critical status. The "cpu_sys" module must already have been configured to go into critical status when it reaches a value of 20 or over.

Step 2, apart from defining the triggering conditions, defines the alert's time range, by hours or by days of the week. This limits the alert to specific days and to a specific time period.

The most critical parameters here are the following:

  • Time threshold: one day by default. If a module stays down for a day and the time threshold is set to 5 minutes, we would receive an alert every 5 minutes. If we set it to one day (24 hours), the alert is sent only once, when it's triggered. If the module recovers and goes down again, the alert is sent again, but if it then stays down, the system won't send another alert until another 24 hours have passed.
  • Min. Number of alerts: the minimum number of times the condition must be met (in this case, the module being in CRITICAL status) before Pandora FMS triggers the alert. This avoids false positives, or many alerts being fired because of erratic behavior (bouncing). If we put 1 here, the first occurrence is not taken into account and the alert fires only if the condition occurs again. If we put 0, the alert fires the first time the module goes into critical status.
  • Max. Number of alerts: 1 means that it will execute the action only once. If we set it to 10, it'll execute the action 10 times. It's a way to limit the number of times an alert can be executed.
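The interaction of these three parameters can be sketched as a small state machine. This is a simplified model written for this guide, not the Pandora FMS server logic: the class `AlertThrottle`, its field names, and the assumption that a recovery resets all counters are illustrative choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertThrottle:
    time_threshold: int   # length of the counting window, in seconds
    min_alerts: int       # matches to skip before the alert may fire
    max_alerts: int       # maximum executions inside one window
    window_start: Optional[int] = None
    matches: int = 0
    fired: int = 0

    def check(self, now, critical):
        """Return True when the action should run for this module check."""
        if not critical:
            # Assumption: a recovery clears the window and the counters.
            self.window_start, self.matches, self.fired = None, 0, 0
            return False
        expired = (self.window_start is not None
                   and now - self.window_start >= self.time_threshold)
        if self.window_start is None or expired:
            self.window_start, self.matches, self.fired = now, 0, 0
        self.matches += 1
        if self.matches > self.min_alerts and self.fired < self.max_alerts:
            self.fired += 1
            return True
        return False


# A module checked every 5 minutes that stays critical for two days,
# with a one-day threshold, min 0 and max 1.
throttle = AlertThrottle(time_threshold=86400, min_alerts=0, max_alerts=1)
fired_at = [t for t in range(0, 2 * 86400, 300) if throttle.check(t, True)]
print(fired_at)
```

Under these assumptions the action runs once per 24-hour window, which matches the behavior described for a one-day time threshold.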

Now let's look at field1, field2 and field3 again. Field1 is blank in the template, because we already defined it when we configured the action. Field2 and field3 are used by the "Send email" command to define the message's subject and text, whereas field1 defines the recipients of the message (separated by commas). So the template, using some macros, defines the subject and the body of the alert message, so that in our case we would receive a message like the following (supposing that the agent that holds the module is called "Farscape"):

To: [email protected]
Subject: [PANDORA] Farscape cpu_sys is in CRITICAL status with value 20
email:
This is an automated alert generated by Pandora FMS
Please contact your Pandora FMS for more information. *DO NOT* reply to this email.
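To illustrate how a macro-based subject could turn into the line above, here is a minimal substitution sketch. The macro names `_agent_`, `_module_` and `_data_` are assumed for illustration, since the guide does not list the macros the template uses, and `expand_macros` is a hypothetical helper rather than a Pandora FMS function.

```python
import re


def expand_macros(text, values):
    """Replace _name_ placeholders with entries from values.

    Placeholders without a matching entry are left untouched.
    """
    def substitute(match):
        return str(values.get(match.group(1), match.group(0)))

    return re.sub(r"_([a-z0-9]+)_", substitute, text)


# Assumed macro names; the subject wording follows the sample message.
subject_template = "[PANDORA] _agent_ _module_ is in CRITICAL status with value _data_"
subject = expand_macros(
    subject_template,
    {"agent": "Farscape", "module": "cpu_sys", "data": 20},
)
print(subject)
```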

Given that the default action is the one we have defined previously, all the alerts that use this template will use this predefined action by default.

In the third step, we'll see that it's also possible to configure the alert system to notify us when the alert has stopped.

Qgcpu8.png Qgcpu8 1.png

You can use HTML or plain text in the body of the email alert Template.

It's almost the same, but field1 is not defined because the one defined in the previously executed action (when the alert was fired) is reused. In this case it only sends an email with a subject saying that the condition in the cpu_sys module has recovered.

Alert recovery is optional. Note that if field2 and field3 are defined in the alert recovery data, they take priority and overwrite the corresponding values. The only field that can't be modified is field1.

1.5 Associating the Alert to the Module

Now that we have all we need, we only have to associate the alert template to the module. To do this, go to the alert tab in the agent where the module is:

Qgcpu9.png

It's easy. In the following screenshot we can see an already configured alert for a module named "Last_Backup_Unixtime", using the same template that we defined earlier as "Module critical". Now, in the controls below, we are going to create an association between the module "cpu_sys" and the alert template "Module critical". By default it shows the action that we've defined on this template, "Send email to Sancho Lerena".

1.6 Scaling Alerts

The values entered in the "Number of alerts match from" parameter define the alert scaling. This refines the alert behavior a little further: if we've defined a maximum of 5 times for an alert to be fired, and we only want it to send us an email once, then we should enter 0 and 1, so that it only sends an email from firing 0 to 1 (meaning once).

We can also add more actions to the same alert, using the "Number of alerts match from" field to define the alert behavior depending on how many times it has been fired.

For example: we want it to send an email to XXXXX the first time it happens, and if the monitor continues to be down, it sends an email to ZZZZ. To do that, after associating the alert, in the assigned alerts table, you can add more actions to a previously defined alert, as we can see in the following screenshot:

Qgcpu9.png

Qgcpu10.png
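The escalation idea can be modeled as a list of actions, each with a firing range. The sketch below assumes that the firing counter starts at 1 and that both bounds are inclusive. Those semantics, like the names `ScaledAction` and `actions_for`, are assumptions made for illustration and are not confirmed by this guide.

```python
from dataclasses import dataclass


@dataclass
class ScaledAction:
    name: str
    fires_from: int  # lower bound of the "Number of alerts match from" range
    fires_to: int    # upper bound of that range


def actions_for(times_fired, actions):
    """Return the names of the actions whose range covers this firing."""
    return [a.name for a in actions if a.fires_from <= times_fired <= a.fires_to]


# First firing goes to one recipient, firings 2 to 5 go to another.
escalation = [
    ScaledAction("Send email to XXXXX", 0, 1),
    ScaledAction("Send email to ZZZZ", 2, 5),
]

for n in range(1, 7):
    print(n, actions_for(n, escalation))
```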

1.7 Standby alerts

Alerts can be enabled, disabled or in standby mode. The difference between disabled and standby alerts is that disabled alerts simply don't work and therefore are not shown in the alerts view. Standby alerts are shown in the alerts view and work, but only at a display level: they show whether they're triggered or not, but they don't execute their configured actions and don't generate events.

Standby alerts are useful for viewing alert status without executing actions or generating events.

1.8 Using Alert Commands other than email

The email command is internal to Pandora FMS and can't be configured: field1, field2 and field3 are predefined as the recipient, subject and text of the message. But what happens if we want a different, user-defined action?

We're going to define a new command, something completely defined by us. Imagine that we want to create a log file with each alert that we find. The format of this log file should be something like:

DATE_HOUR - NAME_AGENT - NAME_MODULE - VALUE - PROBLEM DESCRIPTION

Here VALUE is the value of the module at that moment. Several log files can be generated, depending on the action that calls the command. The action defines the description and the file the events are written to.

To carry it out, first create a command as follows:

Qgcpu11.png

And define an action:

Qgcpu12.png

If we take a look at the log that we've created:


2010-05-25 18:17:10 - farscape - cpu_sys - 23.00 - Custom alert for LOG#1

The alert was triggered at 18:17:10 on the "farscape" agent, on the "cpu_sys" module, with a value of "23.00" and with the description that we chose when we defined the action.
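The command and action used here are only shown in the screenshots, so the following is not a reproduction of them. It is a hypothetical Python sketch of a script that such a command could call to produce the same log format; the names `format_log_line` and `append_alert` are made up for the example.

```python
from datetime import datetime


def format_log_line(when, agent, module, value, description):
    """Build one line in the DATE_HOUR - AGENT - MODULE - VALUE - DESCRIPTION format."""
    return f"{when:%Y-%m-%d %H:%M:%S} - {agent} - {module} - {value:.2f} - {description}"


def append_alert(path, line):
    """Append one alert line to the log file chosen by the action."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(line + "\n")


line = format_log_line(
    datetime(2010, 5, 25, 18, 17, 10),
    "farscape",
    "cpu_sys",
    23.0,
    "Custom alert for LOG#1",
)
print(line)
```

In this model the description and the target file path would be the two values supplied by the action, so different actions can write to different log files.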

Because the command execution, the field order and other details can make it hard to understand how the command is finally executed, the easiest approach is to activate the Pandora server debug traces (verbose 10) in the Pandora server configuration file /etc/pandora/pandora_server.conf and restart the server (/etc/init.d/pandora_server restart). Then look in the file /var/log/pandora/pandora_server.log for the exact line with the execution of the alert command we've defined, to see how the Pandora FMS server is executing the command.
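Searching the server log by hand can be tedious, so a small helper may be useful. The sketch below makes no assumption about the log line format and simply keeps the lines that contain a given piece of text, such as part of the command you defined. The function names are hypothetical; the default path is the one mentioned above.

```python
def find_command_lines(lines, needle):
    """Return the lines that contain needle, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if needle in line]


def search_server_log(needle, path="/var/log/pandora/pandora_server.log"):
    """Scan the server log file for lines mentioning needle."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_command_lines(log, needle)
```

For example, calling `search_server_log("Custom alert for LOG#1")` on the server would list the log lines that mention that description, provided the debug traces include it.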