SNMP Troubles on an Alcatel 7750
Posted: 19 Jun 2014 16:25
All,
First post here. We're having an odd issue and hoping someone can help. We gather usage statistics every 5 minutes on all of our network gear via SNMP. We have a lot of Alcatel 7750s: half of them poll correctly, and the other half don't. SNMP itself works against all of them, so we know it's not a bad-credential issue.
On the half that don't work, we see the following: exactly at the 5-minute mark, as session counts build (up to about 6000), the Alcatel 7750 stops responding to SNMP and acts almost as if it's queuing our SNMP requests. Our agents time out unless we raise their timeouts to almost 60 seconds. In fact, no polls to the 'broken' boxes are answered unless we raise the timeouts. On the 'good' boxes we never see this happen; we can poll them with the default timeouts all through the 5-minute interval.
Our 'good' 7750s are actually busier and have higher CPU loads than the 'broken' ones.
We've not been able to pin down any difference in code, software, etc. We can poll other device types (Juniper, Cisco, etc.) on the same network segment as the Alcatels during the time that the 'broken' Alcatels don't respond. Ping also gets a reply 100% of the time while our SNMP polls are timing out.
We've tried polling from multiple network segments.
We've tried tracing the polls (on the 'broken' routers, the Alcatel appears to receive the SNMP packet about 50 s after we send it, but then replies immediately).
We've had Alcatel on the phone, but they've not really given us any clear direction, other than to insist that the Alcatel is not queuing up the request. They've asked us to get on a system and trace the SNMP call through the last hop before it reaches the Alcatel, but due to our network configuration we can't do that.
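For anyone who wants to reproduce the timing, here is a minimal sketch of how the per-poll latency could be logged across the 5-minute interval. It assumes net-snmp's `snmpget` is installed on the poller; the host address, community string, and the helper names (`build_snmpget_cmd`, `timed_poll`, `watch`) are placeholders for illustration, not our actual tooling.

```python
import subprocess
import time


def build_snmpget_cmd(host, community, oid="SNMPv2-MIB::sysDescr.0", timeout_s=60):
    """Build a net-snmp snmpget command: one long timeout, no retries."""
    return ["snmpget", "-v", "2c", "-c", community,
            "-t", str(timeout_s), "-r", "0", host, oid]


def timed_poll(cmd, runner=subprocess.run):
    """Run a single poll and return (elapsed_seconds, succeeded)."""
    start = time.monotonic()
    result = runner(cmd, capture_output=True, text=True)
    return time.monotonic() - start, result.returncode == 0


def watch(cmd, samples, interval_s=10, runner=subprocess.run, sleep=time.sleep):
    """Poll repeatedly and return a list of (elapsed_seconds, succeeded)."""
    results = []
    for _ in range(samples):
        elapsed, ok = timed_poll(cmd, runner=runner)
        print(f"{time.strftime('%H:%M:%S')} ok={ok} latency={elapsed:.1f}s")
        results.append((elapsed, ok))
        sleep(interval_s)
    return results


# Example (hypothetical host and community, 30 samples over ~5 minutes):
# watch(build_snmpget_cmd("192.0.2.10", "public"), samples=30)
```

Setting retries to 0 with a single long timeout means each printed latency is the delay of one request, rather than the sum of several retransmissions.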
Any other things to try? We're completely at a loss.
VARIOUS OUTPUTS LISTED BELOW :
BROKEN : SNMPv2-MIB::sysDescr.0 = STRING: TiMOS-C-10.0.R4 cpm/hops ALCATEL SR 7750 Copyright (c) 2000-2012 Alcatel-Lucent.
GOOD : SNMPv2-MIB::sysDescr.0 = STRING: TiMOS-C-10.0.R4 cpm/hops ALCATEL SR 7750 Copyright (c) 2000-2012 Alcatel-Lucent.
*Broken router*
A:BAD 7750# show snmp counters
==============================================================================
SNMP counters:
==============================================================================
in packets : 97160430
------------------------------------------------------------------------------
in gets : 756371
in getnexts : 2338
in getbulks : 96389555
in sets : 11952
out packets: 97160216
------------------------------------------------------------------------------
out get responses : 97160216
out traps : 0
variables requested: -1085720426
variables set : 21294
*Working router*
*A: GOOD 7750 # show snmp counters
==============================================================================
SNMP counters:
==============================================================================
in packets : 96646877
------------------------------------------------------------------------------
in gets : 2929839
in getnexts : 765
in getbulks : 93694878
in sets : 19687
out packets: 96645179
------------------------------------------------------------------------------
out get responses : 96645169
out traps : 0
variables requested: 947820440
variables set : 38794
==============================================================================
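To make the two counter dumps easier to compare, here is a small sketch that derives a few numbers from them. It assumes the negative "variables requested" value is a 32-bit counter being printed as a signed integer, which is a guess based on the output and not something Alcatel has confirmed; the dictionary keys and function names are only for illustration.

```python
def as_unsigned32(value):
    """Reinterpret a value printed as signed 32-bit as unsigned 32-bit."""
    return value & 0xFFFFFFFF


# Values copied from the 'show snmp counters' output above.
broken = {"in_packets": 97160430, "out_packets": 97160216,
          "in_getbulks": 96389555, "variables_requested": -1085720426}
good = {"in_packets": 96646877, "out_packets": 96645179,
        "in_getbulks": 93694878, "variables_requested": 947820440}


def summarize(counters):
    """Derive comparison figures from one router's SNMP counters."""
    return {
        # Requests received that have no matching response packet.
        "unanswered": counters["in_packets"] - counters["out_packets"],
        # Fraction of incoming packets that were GetBulk requests.
        "getbulk_share": counters["in_getbulks"] / counters["in_packets"],
        # Assumed unsigned reading of the displayed value.
        "variables_requested_u32": as_unsigned32(counters["variables_requested"]),
    }


for name, counters in (("BROKEN", broken), ("GOOD", good)):
    print(name, summarize(counters))
```

Under that assumption the 'broken' router's "variables requested" reads as 3209246870, and the in/out packet gap is 214 on the 'broken' router versus 1698 on the 'good' one. If the counter really is 32 bits wide, either value may have wrapped more than once, so the two routers cannot be compared on that field alone.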