Nagios Integration

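Nagios is a system and network monitoring program that checks hosts and services and notifies users when a problem occurs and when it is resolved. It is open-source software released under GPLv2 and is free to obtain and use. Nagios was originally named NetSaint; it was created by Ethan Galstad, who has maintained it to this day.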


Steps to integrate Nagios with Cloud Alert (CA)

  1. In the 睿象云 Cloud Alert console, create a nagios application and obtain its appkey

  2. Install the CA agent on the nagios server

    1. Download the CA agent

      wget https://download.aiops.com/ca_agent/nagios/ca_agent-4.1.3.1-linux-x64.tar.gz 
      
      # Download as the root or nagios user
      
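    2. Install the agent

      Note: the steps below use the default nagios installation path /usr/local/nagios/ as an example. If your nagios server is installed elsewhere, substitute your own path.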

      tar -xzf ca_agent-4.1.3.1-linux-x64.tar.gz
      cp -R ca_agent /usr/local/nagios/libexec/
      cp ca_agent/plugin/nagios-plugin/nagios /usr/local/nagios/libexec/
      chmod +x /usr/local/nagios/libexec/nagios
      cp ca_agent/plugin/nagios-plugin/cloudalert.cfg /usr/local/nagios/etc/objects/
      
    3. Edit the configuration

      Edit /usr/local/nagios/etc/objects/cloudalert.cfg and set pager to the appkey of your CA application.

      vi /usr/local/nagios/etc/objects/cloudalert.cfg 
      
      define contact{
      contact_name                    cloudalert              ; The name of this contact
      alias                           ca                 ;
      service_notification_period     24x7                    ; service notifications can be sent anytime
      host_notification_period        24x7                    ; host notifications can be sent anytime
      service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
      host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
      service_notification_commands   notify-service-by-cloudalert ; send service notifications via Cloud Alert
      host_notification_commands      notify-host-by-cloudalert    ; send host notifications via Cloud Alert
      pager                           --                      ; replace -- with the appkey generated when you created the application
      }
      

      Edit /usr/local/nagios/etc/objects/contacts.cfg and add cloudalert to the default contact group.

      vi /usr/local/nagios/etc/objects/contacts.cfg
      
      define contactgroup{
      contactgroup_name       admins
      alias                   Nagios Administrators
      members                 nagiosadmin,cloudalert
      }
      

      Edit /usr/local/nagios/etc/nagios.cfg and add a cfg_file entry for cloudalert.cfg.

      vi /usr/local/nagios/etc/nagios.cfg
      
      cfg_file=/usr/local/nagios/etc/objects/cloudalert.cfg
      

      Optional: to make alert messages easier to read, change the date_format setting in nagios.cfg from us to iso8601 (date_format=iso8601).

      vi /usr/local/nagios/etc/nagios.cfg
      
    4. Restart nagios

      Before restarting, check that the configuration is valid.

      /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
      

      Restart Nagios as root.

      service nagios restart
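
In addition to nagios -v, the three edits from step 3 can be double-checked with a small script. The following is a minimal sketch and not part of the CA agent: check_ca_config is a hypothetical helper, and its grep patterns assume the file layouts shown in this guide (a cfg_file line in nagios.cfg, a members line in contacts.cfg, and a pager line in cloudalert.cfg that no longer holds the -- placeholder).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the CA agent): checks the three
# configuration edits from step 3. Pass the nagios etc directory if it
# differs from the default install path.
check_ca_config() {
    etc_dir="${1:-/usr/local/nagios/etc}"
    status=0

    # nagios.cfg must load cloudalert.cfg through a cfg_file entry.
    if ! grep -Eq '^[[:space:]]*cfg_file=.*cloudalert\.cfg' "$etc_dir/nagios.cfg"; then
        echo "nagios.cfg: no cfg_file entry for cloudalert.cfg"
        status=1
    fi

    # contacts.cfg must list cloudalert on a members line.
    if ! grep -Eq '^[[:space:]]*members[[:space:]].*cloudalert' "$etc_dir/objects/contacts.cfg"; then
        echo "contacts.cfg: cloudalert is not a contact group member"
        status=1
    fi

    # pager must start with a letter or digit, so the "--" placeholder fails.
    if ! grep -Eq '^[[:space:]]*pager[[:space:]]+[0-9A-Za-z]' "$etc_dir/objects/cloudalert.cfg"; then
        echo "cloudalert.cfg: pager still looks unset"
        status=1
    fi

    return "$status"
}
```

Calling check_ca_config with no argument inspects /usr/local/nagios/etc; a non-zero exit status means at least one of the edits is missing.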
      

Verifying the integration

Log in to the Nagios web console and send a notification.

Warning

Make sure notifications_enabled is set to 1 for the corresponding service.

define service{
use                             local-service         ; Name of service template to use
host_name                       localhost
service_description             Tomcat18080
check_command                   check_http18080
notifications_enabled           1
}

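Check the agent log. The word success indicates that the event was delivered. When an alert notification is sent, it is also pushed through WeChat, the mobile app, SMS, and email.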

tail -f /usr/local/nagios/libexec/ca_agent/log/agent.log 

A result of success in the final response line confirms delivery:

10-05-2015 15:48:53,056 CST INFO  [main] [com.upyoo.agent.NagiosClient@45] start to call alert ...
10-05-2015 15:48:53,063 CST INFO  [main] [com.upyoo.agent.CommandClient@82] alarmName:PROBLEM Service Alert: 127.0.0.1/Tomcat18080 is CRITICAL

10-05-2015 15:48:53,064 CST INFO  [main] [com.upyoo.agent.CommandClient@82] alarmContent:localhost/127.0.0.1/Tomcat18080 connect to address 127.0.0.1 and port 18080: Connection refused Date/Time: 2015-05-10 15:48:52

10-05-2015 15:48:53,064 CST INFO  [main] [com.upyoo.agent.CommandClient@82] entityName:127.0.0.1/Tomcat18080

10-05-2015 15:48:53,066 CST INFO  [main] [com.upyoo.agent.CommandClient@82] priority:CRITICAL
10-05-2015 15:48:53,066 CST INFO  [main] [com.upyoo.agent.CommandClient@82] app:9c4bc722-6677-9fc9-fbdc-003d8977d17e

10-05-2015 15:48:53,067 CST INFO  [main] [com.upyoo.agent.CommandClient@82]
10-05-2015 15:48:53,068 CST INFO  [main] [com.upyoo.agent.CommandClient@82]
10-05-2015 15:48:53,068 CST INFO  [main] [com.upyoo.agent.CommandClient@82]
10-05-2015 15:48:53,069 CST INFO  [main] [com.upyoo.agent.CommandClient@82]
10-05-2015 15:48:53,105 CST INFO  [main] [com.upyoo.agent.CommandClient@58] start to post url:http://api.aiops.com/alert/api/event

10-05-2015 15:48:53,180 CST INFO  [main] [com.upyoo.agent.CommandClient@65] body:{"app":"9c4bc722-6677-9fc9-fbdc-003d8977d17e","alarmContent":"localhost/127.0.0.1/Tomcat18080 connect to address 127.0.0.1 and port 18080: Connection refused Date/Time: 2015-05-10 15:48:52","eventId":"8G8OGOYUCOOLOENYOGGENOOOOONYNOLU","priority":"3","alarmName":"PROBLEM Service Alert: 127.0.0.1/Tomcat18080 is CRITICAL","eventType":"trigger","entityName":"127.0.0.1/Tomcat18080"}

10-05-2015 15:48:53,775 CST INFO  [main] [com.upyoo.agent.CommandClient@68] result:{"result":"success","message":null,"data":"3516","totalCount":0,"code":"200"} 
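
Rather than scanning the log by eye, the last result line can be extracted with standard tools. This is a minimal sketch that assumes the log format shown above, where each response is logged as result:{...} with a "result" field; last_ca_result is a hypothetical helper name, not part of the agent.

```shell
#!/bin/sh
# Hypothetical helper: prints the "result" field of the most recent
# response logged by the CA agent (for example "success").
last_ca_result() {
    log_file="${1:-/usr/local/nagios/libexec/ca_agent/log/agent.log}"
    # Keep only response lines, take the newest one, and pull out the
    # value of the "result" key.
    grep -F 'result:{' "$log_file" | tail -n 1 |
        sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}
```

Running last_ca_result with no argument reads the agent log at the path used in this guide; an empty output means no response line was found.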

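Troubleshooting: alerts not received after integration

If you trigger a new test alert in nagios but cannot see it on the CA platform, go to Alerts -> All Alerts (告警 -> 所有告警) and check whether the alert is listed:

  1. If it is listed, the alert reached the CA platform. Add a dispatch policy under Configuration -> Dispatch Policies (配置 -> 分派策略);

  2. If it is not listed, the alert did not reach the CA platform. Troubleshoot as follows:

    Open nagios.log, where the CA agent's log entries appear, and confirm that nagios sent the alert to the cloudalert contact. Then send a test alert with the following command:

    # Replace -- with the appkey generated when you created the application
    ./nagios app:"--" eventType:trigger eventId:1234 alarmName:"hello"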
    
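    1. If the command succeeds, the deployment itself works. If alerts still do not arrive, the problem lies in the nagios environment;
    2. If the command fails, make the following adjustments and test again.

      Example

      1. Set the ownership of the ca_agent directory to nagios:nagios
      2. Set the owner of the nagios script, which sits in the same directory as ca_agent, to nagios
      3. Set the permissions of the bin directory under ca_agent and of the bin directory under jre to 777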
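
For the manual test above, the appkey can be read from cloudalert.cfg instead of being pasted by hand, which also confirms that the value nagios uses is the one you expect. This is a minimal sketch: ca_appkey is a hypothetical helper, and it assumes the pager line has the layout shown earlier (the key as the second whitespace-separated field).

```shell
#!/bin/sh
# Hypothetical helper: prints the appkey stored on the pager line of
# cloudalert.cfg.
ca_appkey() {
    cfg="${1:-/usr/local/nagios/etc/objects/cloudalert.cfg}"
    # The first field is the directive name, the second is its value.
    awk '$1 == "pager" { print $2; exit }' "$cfg"
}

# Possible usage, assuming the nagios script was copied to libexec as in
# the installation step:
#   cd /usr/local/nagios/libexec
#   ./nagios app:"$(ca_appkey)" eventType:trigger eventId:1234 alarmName:"hello"
```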

Severity mapping between Nagios and CA

Nagios state       Parameter value   Cloud Alert severity
down & critical    3                 Critical (严重)
unknown            2                 Warning (警告)
warning            1                 Notice (提醒)
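
The mapping in the table can be expressed as a small lookup, which is handy when reading the priority field in the agent log. This is a sketch of the table only, not the agent's actual code; ca_priority is a hypothetical name, and states that the table does not list (such as recoveries) are treated as unmapped.

```shell
#!/bin/sh
# Hypothetical helper: maps a Nagios state name to the Cloud Alert
# parameter value from the table above. Returns 1 for unlisted states.
ca_priority() {
    # Accept the state in any letter case (Nagios logs use upper case).
    state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$state" in
        down|critical) echo 3 ;;
        unknown)       echo 2 ;;
        warning)       echo 1 ;;
        *)             return 1 ;;
    esac
}
```

For instance, ca_priority CRITICAL prints 3, which matches the "priority":"3" value in the request body of the sample log.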

These are the steps for integrating Cloud Alert into Nagios alert notifications.