
Nagios Integration

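Nagios is a monitoring program for computer systems and networks. It checks hosts and services and notifies users when a problem occurs and when it is resolved. It is open-source software released under GPLv2 and is free to obtain and use. Nagios was originally named NetSaint; it was created by Ethan Galstad, who still maintains it.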


Steps to integrate Nagios with CA

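  1. In the 睿象云 Cloud Alert (CA) console, create a nagios application and obtain its appkey.

  Then install the CA probe (agent) on the nagios server, following steps 2 to 5.

  2. Download the CA probe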

    wget https://download.aiops.com/ca_agent/nagios/ca_agent-4.1.3.1-linux-x64.tar.gz
    
    # Download as the root or nagios user
    
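  3. Install the agent

    Note: the steps below use the default nagios installation path /usr/local/nagios/ as an example. If your nagios server is installed elsewhere, substitute your own path.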

    tar -xzf ca_agent-4.1.3.1-linux-x64.tar.gz
    cp -R ca_agent /usr/local/nagios/libexec/
    cp ca_agent/plugin/nagios-plugin/nagios /usr/local/nagios/libexec/
    chmod +x /usr/local/nagios/libexec/nagios
    cp ca_agent/plugin/nagios-plugin/cloudalert.cfg /usr/local/nagios/etc/objects/
    
  4. Edit the configuration

    Edit /usr/local/nagios/etc/objects/cloudalert.cfg and set pager to the appkey of your CA application.

    vi /usr/local/nagios/etc/objects/cloudalert.cfg
    
    define contact{
    contact_name                    cloudalert              ; The name of this contact
    alias                           ca
    service_notification_period     24x7                    ; service notifications can be sent anytime
    host_notification_period        24x7                    ; host notifications can be sent anytime
    service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
    host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
    service_notification_commands   notify-service-by-cloudalert ; send service notifications via Cloud Alert
    host_notification_commands      notify-host-by-cloudalert    ; send host notifications via Cloud Alert
    pager                           --                      ; replace -- with the appkey generated when you created the application
    }
    
    # 'notify-host-by-cloudalert' command definition
    define command{
    command_name notify-host-by-cloudalert
    command_line $USER1$/nagios "alarmName:$NOTIFICATIONTYPE$ Host Alert: $HOSTADDRESS$ is $HOSTSTATE$" "alarmContent:$HOSTNAME$/$HOSTADDRESS$ $HOSTOUTPUT$ Date/Time: $SHORTDATETIME$" "entityName:$HOSTADDRESS$" "priority:$HOSTSTATE$" "app:$CONTACTPAGER$" "eventType:$NOTIFICATIONTYPE$"
    }
    
    # 'notify-service-by-cloudalert' command definition
    define command{
    command_name notify-service-by-cloudalert
    command_line $USER1$/nagios "alarmName:$NOTIFICATIONTYPE$ Service Alert: $HOSTADDRESS$/$SERVICEDESC$ is $SERVICESTATE$" "alarmContent:$HOSTALIAS$/$HOSTADDRESS$/$SERVICEDESC$ $SERVICEOUTPUT$ Date/Time: $SHORTDATETIME$" "entityName:$HOSTADDRESS$/$SERVICEDESC$" "priority:$SERVICESTATE$" "app:$CONTACTPAGER$" "eventType:$NOTIFICATIONTYPE$"
    }
    

    Edit /usr/local/nagios/etc/objects/contacts.cfg and add cloudalert to the default contact group.

    vi /usr/local/nagios/etc/objects/contacts.cfg
    
    define contactgroup{
    contactgroup_name       admins
    alias                   Nagios Administrators
    members                 nagiosadmin,cloudalert
    }
    

    Edit /usr/local/nagios/etc/nagios.cfg and add cloudalert.cfg to it.

    vi /usr/local/nagios/etc/nagios.cfg
    
    cfg_file=/usr/local/nagios/etc/objects/cloudalert.cfg
    

    Optional: to make alert messages easier to read, change the date_format setting in nagios.cfg from us to iso8601.

    vi /usr/local/nagios/etc/nagios.cfg
    
    date_format=iso8601
    
  5. Restart nagios

    Before restarting, verify that the configuration is valid.

    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    

    Restart Nagios as root.

    service nagios restart
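
Before running the validation and restart in step 5, it can help to confirm that the three edits from step 4 are actually in place. The sketch below is one way to do that, assuming the default file layout used in this guide. `check_ca_config` is a hypothetical helper and is not shipped with the CA agent.

```shell
#!/bin/sh
# check_ca_config is a hypothetical helper, not part of the CA agent.
# $1: the nagios etc directory, e.g. /usr/local/nagios/etc
# Prints one message per missing edit and returns non-zero if any is missing.
check_ca_config() {
    etc_dir=$1
    status=0

    # nagios.cfg should contain a cfg_file line that loads cloudalert.cfg.
    if ! grep -q '^cfg_file=.*/objects/cloudalert\.cfg' "$etc_dir/nagios.cfg"; then
        echo "nagios.cfg does not load objects/cloudalert.cfg"
        status=1
    fi

    # Some contact group in contacts.cfg should list cloudalert as a member.
    if ! grep -Eq '^[[:space:]]*members[[:space:]].*cloudalert' "$etc_dir/objects/contacts.cfg"; then
        echo "contacts.cfg has no members line that includes cloudalert"
        status=1
    fi

    # The pager value should no longer be the -- placeholder.
    if grep -Eq '^[[:space:]]*pager[[:space:]]+--' "$etc_dir/objects/cloudalert.cfg"; then
        echo "cloudalert.cfg still has the -- placeholder as pager"
        status=1
    fi

    return "$status"
}

# Example (run as root):
#   check_ca_config /usr/local/nagios/etc \
#     && /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg \
#     && service nagios restart
```

Chaining the commands with `&&` means the restart only happens when both the helper and the `nagios -v` check succeed.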
    

Verifying the integration

Log in to the Nagios web console and send a notification.

Warning

Make sure notifications_enabled is set to 1 for the service in question.

define service{
use                             local-service         ; Name of service template to use
host_name                       localhost
service_description             Tomcat18080
check_command                   check_http18080
notifications_enabled           1
}
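
To find services where notifications are switched off, the object configuration can be searched for the setting. This is a minimal sketch, assuming a `grep` that supports `-r` (GNU and BSD grep do). `list_disabled_notifications` is a hypothetical helper name.

```shell
#!/bin/sh
# list_disabled_notifications is a hypothetical helper.
# $1: directory with nagios object files, e.g. /usr/local/nagios/etc/objects
# Prints file:line:text for every line that sets notifications_enabled to 0.
# Returns 0 when at least one such line exists, 1 when there is none.
list_disabled_notifications() {
    objects_dir=$1
    grep -rnE '^[[:space:]]*notifications_enabled[[:space:]]+0' "$objects_dir"
}

# Example:
#   list_disabled_notifications /usr/local/nagios/etc/objects
```

A service can inherit this setting from the template named in its `use` line, so a match inside a template definition matters as much as one inside the service itself.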

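Check the agent log. The word success in the log means the event was delivered. If the notification is an alert, it is also sent via WeChat, the mobile app, SMS, and email.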

tail -f /usr/local/nagios/libexec/ca_agent/log/agent.log

A normal run returns success, as in the following log output:

10-05-2015 15:48:53,056 CST INFO  [main] [com.upyoo.agent.NagiosClient@45] start to call alert ...
10-05-2015 15:48:53,063 CST INFO  [main] [com.upyoo.agent.CommandClient@82] alarmName:PROBLEM Service Alert: 127.0.0.1/Tomcat18080 is CRITICAL

10-05-2015 15:48:53,064 CST INFO  [main] [com.upyoo.agent.CommandClient@82] alarmContent:localhost/127.0.0.1/Tomcat18080 connect to address 127.0.0.1 and port 18080: Connection refused Date/Time: 2015-05-10 15:48:52

10-05-2015 15:48:53,064 CST INFO  [main] [com.upyoo.agent.CommandClient@82] entityName:127.0.0.1/Tomcat18080

10-05-2015 15:48:53,066 CST INFO  [main] [com.upyoo.agent.CommandClient@82] priority:CRITICAL
10-05-2015 15:48:53,066 CST INFO  [main] [com.upyoo.agent.CommandClient@82] app:9c4bc722-6677-9fc9-fbdc-003d8977d17e

10-05-2015 15:48:53,067 CST INFO  [main] [com.upyoo.agent.CommandClient@82]
10-05-2015 15:48:53,068 CST INFO  [main] [com.upyoo.agent.CommandClient@82]
10-05-2015 15:48:53,068 CST INFO  [main] [com.upyoo.agent.CommandClient@82]
10-05-2015 15:48:53,069 CST INFO  [main] [com.upyoo.agent.CommandClient@82]
10-05-2015 15:48:53,105 CST INFO  [main] [com.upyoo.agent.CommandClient@58] start to post url:http://api.aiops.com/alert/api/event

10-05-2015 15:48:53,180 CST INFO  [main] [com.upyoo.agent.CommandClient@65] body:{"app":"9c4bc722-6677-9fc9-fbdc-003d8977d17e","alarmContent":"localhost/127.0.0.1/Tomcat18080 connect to address 127.0.0.1 and port 18080: Connection refused Date/Time: 2015-05-10 15:48:52","eventId":"8G8OGOYUCOOLOENYOGGENOOOOONYNOLU","priority":"3","alarmName":"PROBLEM Service Alert: 127.0.0.1/Tomcat18080 is CRITICAL","eventType":"trigger","entityName":"127.0.0.1/Tomcat18080"}

10-05-2015 15:48:53,775 CST INFO  [main] [com.upyoo.agent.CommandClient@68] result:{"result":"success","message":null,"data":"3516","totalCount":0,"code":"200"}
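
Instead of reading the log by eye, the last API response can be checked with a short script. The sketch below assumes the log keeps the `result:{...}` line format shown in the sample above. `agent_last_result_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# agent_last_result_ok is a hypothetical helper.
# $1: path to the agent log,
#     e.g. /usr/local/nagios/libexec/ca_agent/log/agent.log
# Returns 0 if the most recent "result:{...}" line reports success,
# and 1 otherwise (including when the log has no result line at all).
agent_last_result_ok() {
    log_file=$1
    # Most recent line that records the API response.
    last=$(grep 'result:{' "$log_file" | tail -n 1)
    case $last in
        *'"result":"success"'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   agent_last_result_ok /usr/local/nagios/libexec/ca_agent/log/agent.log \
#     && echo "last event delivered"
```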

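Troubleshooting: no alerts received after integration

If the CA platform shows no alert after nagios triggers a new test alert, go to Alerts -> All Alerts (告警->所有告警) and check whether any alert is listed:

  1. If there is one, the alert reached the CA platform. Add a dispatch policy under Configuration -> Dispatch Policies (配置->分派策略).

  2. If there is none, the alert did not reach the CA platform. Troubleshoot as follows: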

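Open nagios.log, which contains the CA probe's log entries, and confirm that nagios sent the notification to the cloudalert contact. Then send a test alert with the following command, replacing -- with the appkey generated when you created the application:

./nagios app:"--" eventType:trigger eventId:1234 alarmName:"hello"

  1. If the command returns success, the deployment works. If alerts still do not arrive, the problem lies in the nagios environment.
  2. If the command fails, make the following adjustments and test again.

    Example:

      1. Set the owner of the ca_agent directory to nagios:nagios.
      2. Set the owner of the nagios script, which sits in the same directory as ca_agent, to nagios.
      3. Set the permissions of the bin directory under ca_agent and of the bin directory under jre to 777.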

  3. Check the command path in cloudalert.cfg.

    vi /usr/local/nagios/etc/objects/cloudalert.cfg

    Under the comments # 'notify-host-by-cloudalert' command definition and # 'notify-service-by-cloudalert' command definition, find the line

    command_line $USER1$/nagios

    and replace the $USER1$/nagios part with the absolute path of the nagios script, for example /usr/local/nagios/libexec/nagios.
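
The ownership and permission adjustments listed above can be applied in one pass. This is a sketch under stated assumptions: the jre directory is taken to live inside ca_agent, and the nagios script is given the same owner as the ca_agent directory. `fix_ca_agent_permissions` is a hypothetical helper name.

```shell
#!/bin/sh
# fix_ca_agent_permissions is a hypothetical helper.
# $1: libexec directory that holds ca_agent and the nagios script,
#     e.g. /usr/local/nagios/libexec
# $2: owner to apply (optional, defaults to nagios:nagios)
fix_ca_agent_permissions() {
    libexec_dir=$1
    owner=${2:-nagios:nagios}

    # 1. The ca_agent directory and its contents belong to the nagios user.
    chown -R "$owner" "$libexec_dir/ca_agent" || return 1

    # 2. The nagios script next to ca_agent gets the same owner.
    chown "$owner" "$libexec_dir/nagios" || return 1

    # 3. Both bin directories are set to 777.
    #    Assumption: jre sits inside ca_agent.
    chmod 777 "$libexec_dir/ca_agent/bin" "$libexec_dir/ca_agent/jre/bin" || return 1
}

# Example (run as root):
#   fix_ca_agent_permissions /usr/local/nagios/libexec
```

The `chmod` here touches only the two directories themselves. Use `chmod -R` if the files inside them need the same mode.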

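Mapping between Nagios and CA alert levels

| 睿象云 | nagios (alerts.status) |
| --- | --- |
| Fatal (致命) | down |
| Critical (严重) | critical |
| Warning (警告) | Warning |
| Reminder (提醒) | Unknown |
| Notification (通知) | -- |

| 睿象云 | Nagios |
| --- | --- |
| Event ID (eventId) | alerts.incident_key |

These are the steps for integrating Nagios in the alert settings.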