In my network, two aggregation switches (OmniSwitch 9000) are working in load-balancing mode.
But recently, one of the switches keeps running into high CPU utilization. The CPU jumps to 100% utilization roughly every 80 seconds, stays there for about 20 seconds, and then drops back to normal.
I then checked the processes in dShell mode and found that the root cause is the "pthr11" process:
Working: [Kernel]->?
NAME       ENTRY      TID     PRI  total % (ticks)  delta % (ticks)
---------- ---------- ------- ---  ---------------  ---------------
tExcTask   excTask    fe8bf40   0    0% (    3730)    0% (  1)
tNetTask   netTask    f7fcaa0  50    2% (   66898)    1% (
ipct       ipctLoop   dee6000  50    2% (   83225)    1% (
bcmLINK.0  _bcm_esw_l db85000  50    0% (    8407)    0% (  1)
EsmDrv     esmDrv     bf8cdc0  80    0% (   28100)    0% (  3)
SwLogging  swLogTask  fbe0000 100    0% (    3844)    0% (  2)
PortMgr    pmMain     dd44f40 100    0% (    6964)    0% (  2)
VlanMgr    vmcControl dd1ee80 100    2% (   82929)    0% (  3)
SNMPagt    snmp_task  dbb9e20 100    0% (   16784)    0% (  3)
SNMP GTW   snmp_udp_g dbaf710 100    0% (    1593)    0% (  1)
SrcLrn     slCmmMain  bf78e20 100    0% (    4145)    0% (  1)
Ipedr      ipedrMain  ae2bee0 100    0% (    5486)    0% (  1)
tIpedrNiD  ipedrNiDMa a260e20 100    0% (   13160)    0% (  4)
pthr11                9432000 100   80% ( 2446073)   18% (130)
UdpRly     udpRlyMain 942a530 100    1% (   46826)    0% (  5)
Ipmem      ipmem_main 9007e80 100    0% (    1854)    0% (  1)
EthOAM     main_ethoa 8fe9e80 100    2% (   80312)    0% (  3)
Vrrp       vrrpMainTa 8b9a400 100    0% (   28814)    0% (  4)
tTelnetIn0 cmmtelnetI 8463e80 100    0% (      59)    0% (  2)
KERNEL                               0% (   47725)    0% (  5)
INTERRUPT                            0% (   37160)    0% (  3)
IDLE                                11% ( 8054316)   73% (514)
TOTAL                              100% (71994298)   93% (697)
Working: [Kernel]->i
NAME       ENTRY        TID      PRI STATUS     PC       SP       ERRNO   DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
......
pthr1      302c34       96b1000  100 PEND+T     33daa8   96b0b20   3d0002    20
pthr2      302c34       a0e8000  100 PEND       33daa8   a0e7ec0        0     0
pthr3      302c34       a0e0000  100 PEND       33daa8   a0dfec0        0     0
pthr4      302c34       a0d8000  100 PEND       33daa8   a0d7ec0        0     0
pthr5      302c34       a0d0000  100 PEND       33daa8   a0cfec0        0     0
pthr6      302c34       a0c8000  100 PEND       33daa8   a0c7ec0        0     0
pthr7      302c34       a0c0000  100 PEND       33daa8   a0bfec0        0     0
pthr8      302c34       944a000  100 PEND       33daa8   9449ec0        0     0
pthr9      302c34       9442000  100 PEND       33daa8   9441ec0        0     0
pthr10     302c34       943a000  100 DELAY      344384   9439ea8        0    90
pthr11     302c34       9432000  100 DELAY      344384   9431e78        0  2545
On this switch I found 11 pthr processes, and pthr11 always consumes a lot of CPU, whereas each of the other aggregation OmniSwitch 9000 units has only 9 pthr processes.
Every time the pthr11 process changes to READY status, CPU utilization jumps; after about 20 seconds the process changes back to DELAY status and CPU utilization returns to normal. This happens every 80 seconds.
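In case it helps anyone check their own switch for the same pattern, here is a minimal Python sketch that picks out the busy tasks from a captured copy of the task report above. It only assumes the report has been saved as plain text; the function name `busy_tasks` and the 10% threshold are my own illustrative choices, not anything provided by the switch software.

```python
import re

# One report row: first token is the task name, and the row ends with
# "total% (ticks) delta% (ticks)". Rows with truncated columns will not match.
ROW = re.compile(
    r"^(\S+)\s.*?(\d+)%\s*\(\s*(\d+)\)\s+(\d+)%\s*\(\s*(\d+)\)\s*$"
)

# Summary rows at the bottom of the report are not tasks.
SUMMARY = {"KERNEL", "INTERRUPT", "IDLE", "TOTAL"}


def busy_tasks(report, threshold=10):
    """Return (name, delta_percent) pairs at or above threshold, busiest first.

    Only the first whitespace-separated token is used as the task name,
    so a name containing a space (e.g. "SNMP GTW") is shortened.
    """
    hits = []
    for line in report.splitlines():
        m = ROW.match(line.strip())
        if not m or m.group(1) in SUMMARY:
            continue  # header, separator, truncated or summary row
        delta = int(m.group(4))
        if delta >= threshold:
            hits.append((m.group(1), delta))
    return sorted(hits, key=lambda h: h[1], reverse=True)


sample = """\
tExcTask excTask fe8bf40 0 0% ( 3730) 0% ( 1)
tNetTask netTask f7fcaa0 50 2% ( 66898) 1% (
pthr11 9432000 100 80% ( 2446073) 18% ( 130)
UdpRly udpRlyMain 942a530 100 1% ( 46826) 0% ( 5)
IDLE 11% ( 8054316) 73% ( 514)
"""
print(busy_tasks(sample))
```

Running this over several reports captured a few seconds apart would show whether pthr11 is the only task that rises during each spike.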
It seems that the pthr11 process keeps trying to restart and keeps failing. But I have no information about this process and could not find a solution on Google. I hope you can help me.
All of my aggregation switches are OmniSwitch 9000, and they all run the same microcode:
R2-5# show microcode
Package           Release         Size     Description
-----------------+---------------+--------+-----------------------------------
Jbase.img         6.4.3.907.R01   22207046 Alcatel-Lucent Base Software
Jadvrout.img      6.4.3.907.R01    2877749 Alcatel-Lucent Advanced Routing
Jos.img           6.4.3.907.R01    2223832 Alcatel-Lucent OS
Jeni.img          6.4.3.907.R01    6567144 Alcatel-Lucent NI software
Jsecu.img         6.4.3.907.R01     589033 Alcatel-Lucent Security Management
Jencrypt.img      6.4.3.907.R01       3437 Alcatel-Lucent Encryption Management
Thank you very much.
