Zabbix unreachable poller processes more than 90% busy

  • wyang
    Member
    • Mar 2016
    • 93

    #1

    We have a single Zabbix server deployment. On our Zabbix server, the alarm "Zabbix unreachable poller processes more than 75% busy" was being triggered and cleared frequently. The issue still occurs after we changed the threshold to 90%. The average over the past 3 days is 82%.

    We found a bug report stating that "unreachable pollers may "hang" when are doing ipmi checks", so we set StartIPMIPollers=0 in zabbix_server.conf and restarted the Zabbix server. However, the issue still occurs.

    Zabbix server version 2.4.8
    Ubuntu 14.04 LTS
    Bare metal server: CPU 12 core, RAM 64GB, Disk HDD 2TB

    MySQL
    max_connections = 1024

    Zabbix server
    StartPollers=150
    StartIPMIPollers=0
    StartPollersUnreachable=100
    StartPingers=10
    StartJavaPollers=2
    StartVMwareCollectors=3
    CacheSize=1G
    StartDBSyncers=25
    HistoryCacheSize=512M
    TrendCacheSize=512M
    HistoryTextCacheSize=512M
    ValueCacheSize=1G
    Timeout=20

    Any idea where I'd start to troubleshoot what's causing the issue? Thanks in advance.
  • batchenr
    Senior Member
    • Sep 2016
    • 440

    #2
    check this :

    • wyang
      Member
      • Mar 2016
      • 93

      #3
      Thanks very much for your help.

      The root cause was that some discovered interfaces no longer exist. The issue was resolved by setting "Keep lost resources period" to 0 and rerunning discovery.

      • batchenr
        Senior Member
        • Sep 2016
        • 440

        #4
        Originally posted by wyang
        Thanks very much for your help.

        The root cause is that some discovered interfaces do not exist any more. The issue has been resolved by setting "Keep lost resources period" to be 0 and redo discovery.
        You're right! I had an issue with the pollers, and the thread I gave you didn't help me, but what you said did!
        It went from 100% to 4%.

        How did you find out about it? Cool!

        • wyang
          Member
          • Mar 2016
          • 93

          #5
          The debugging procedure

          I started by looking into /var/log/zabbix/zabbix_server.log.

          Error message: Timeout while connecting to SNMP agents
          According to https://forums.manageengine.com/topi...gets-timed-out, this kind of issue may be caused by:
          a. a wrong community string
          b. a wrong listening port
          c. slow SNMP agent response

          In my case, from what I could tell at that time, the only possibility was slow SNMP agent response, so I increased the polling interval for the item prototypes on the discovery rule. While redoing discovery, I found that the number of discovered items on one host dropped significantly, from 1600 to 1000.

          At that point it was clear that objects that no longer exist on the hosts were causing the issue.

          In conclusion, on traditional switches/routers it is fair to leave "Keep lost resources period" at 7 (default). On software-defined networking hosts, where interfaces change rapidly, it is better to set "Keep lost resources period" to 0.
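
          The log check above could be scripted roughly as follows. This is only a sketch: it assumes that the relevant log lines name the host in the form on host "NAME", which is not something quoted in this thread, and the function names and search string are made up for illustration. Adjust both to whatever your zabbix_server.log actually contains.

```python
import re
from collections import Counter

# Assumption: matching log lines contain the host as: on host "NAME".
# Change this pattern if your log lines look different.
HOST_RE = re.compile(r'on host "([^"]+)"')


def count_matches_per_host(lines, needle):
    """Count lines containing `needle`, grouped by the host named on the line."""
    counts = Counter()
    for line in lines:
        if needle in line:
            found = HOST_RE.search(line)
            if found:
                counts[found.group(1)] += 1
    return counts


def top_hosts(path, needle, limit=10):
    """Return the `limit` hosts with the most matching lines in the log at `path`."""
    with open(path, errors="replace") as log:
        return count_matches_per_host(log, needle).most_common(limit)
```

          For example, top_hosts("/var/log/zabbix/zabbix_server.log", "Timeout while connecting") would list the hosts that show up most often with that message, which are the first candidates to check for stale discovered items.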

          • Gknives
            Junior Member
            • Aug 2017
            • 2

            #6
            Hello guys! I've been stuck on the same problem for days (I'm totally new to this). Would you explain step by step what you did to solve it? Please add some screenshots if you can. I would appreciate it so much!

            • wyang
              Member
              • Mar 2016
              • 93

              #7
              For example, to configure the setting on the template 'Template SNMP Device': click on 'Discovery rules', then click on 'Template SNMP Interfaces: Network interfaces'. The discovery rule window will open, where you can set "Keep lost resources period" to 0.

              Hopefully the screenshot is attached.
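
              If you have many discovery rules to change, the same setting can probably be applied through the Zabbix JSON-RPC API instead of the frontend. The sketch below only builds the request body. It assumes the discoveryrule.update method and the lifetime property of the LLD rule object (the API counterpart of "Keep lost resources period", as far as I know); the rule ID and auth token are placeholders, and the helper names are made up.

```python
import json


def build_lifetime_request(rule_id, auth_token, lifetime=0, request_id=1):
    """Build a JSON-RPC request that sets `lifetime` on one discovery rule.

    rule_id and auth_token are placeholders you must supply yourself;
    the `lifetime` field name is an assumption about the LLD rule object.
    """
    return {
        "jsonrpc": "2.0",
        "method": "discoveryrule.update",
        "params": {"itemid": str(rule_id), "lifetime": lifetime},
        "auth": auth_token,
        "id": request_id,
    }


def encode_request(request):
    """Serialize the request to UTF-8 bytes, ready to be sent as an HTTP POST body."""
    return json.dumps(request).encode("utf-8")
```

              The encoded body would then be POSTed with Content-Type application/json-rpc to the api_jsonrpc.php page of your frontend. Try it on a single rule first and check the result in the discovery rule window.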

              • Rudlafik
                Senior Member
                • Nov 2018
                • 144

                #8
                Hi, I resolved this problem here too: DISABLE THE RED HOSTS (the ones shown as unavailable).
