OS-6850 Stack ---> CPU heavy loaded !
-
- Member
- Posts: 9
- Joined: 25 Aug 2008 06:09
OS-6850 Stack ---> CPU heavy loaded !
Hello everybody,
First of all, sorry for my bad English; I hope you'll understand my problem anyway.
Second, here is my problem.
Last weekend I installed an OmniSwitch 6850 stack (4x 6850-48), which replaces an old AOS7700 system. Only a basic configuration is in place (some ports, some VLANs, nothing complicated), and this Monday was the first day the stack had to work as our brand-new server backbone switch. The switch worked fine until 11:15 a.m., but then the first users called our helpdesk and reported that some servers were not available. I started a telnet session to the 6850 stack and the response was very slow. The "show health" command gave me the answer: CPU usage > 80%. I thought "hm, primary CMM frozen?" and started a "takeover". After 30 seconds the CPU load was between 19 and 25% and all problems were gone!
45 minutes later, the same problem: the CPU was under heavy load again. Another takeover, and 60 seconds later the switch worked fine again. So, does anybody have an idea what the reason could be? The log is empty (only the takeover command is logged, nothing before it). I have already entered the command "swlog appid HEALTH level debug3" to get more logging information about the CPU load, but there are no log events.
Here is some information about the system:
Uboot Version : 6.1.3.601.R01
Miniboot Version : 6.1.3.601.R01
Config:
! Stack Manager :
! Chassis :
system name xxx
system contact xxx
system location "xxx"
system timezone MET
! Configuration:
! VLAN :
ethernet-service mode vstk
vlan 1 enable name "VLAN 1"
vlan 2 enable name "VLAN 2"
vlan 2 port default 1/1
vlan 2 port default 1/2
vlan 2 port default 1/3
vlan 2 port default 1/4
vlan 2 port default 1/5
vlan 2 port default 1/7
vlan 2 port default 1/8
vlan 2 port default 1/9
vlan 2 port default 1/10
vlan 2 port default 1/11
vlan 2 port default 1/12
vlan 2 port default 1/13
vlan 2 port default 1/14
vlan 2 port default 1/18
vlan 2 port default 1/19
vlan 2 port default 1/20
vlan 2 port default 1/21
vlan 2 port default 1/32
vlan 2 port default 1/33
vlan 2 port default 1/34
vlan 2 port default 1/35
vlan 2 port default 1/36
vlan 2 port default 1/37
vlan 2 port default 1/40
vlan 2 port default 1/41
vlan 2 port default 1/42
vlan 2 port default 1/43
vlan 2 port default 1/44
vlan 2 port default 1/45
vlan 2 port default 1/46
vlan 2 port default 1/47
vlan 2 port default 1/48
vlan 2 port default 2/1
vlan 2 port default 2/2
vlan 2 port default 2/3
vlan 2 port default 2/4
vlan 2 port default 2/5
vlan 2 port default 2/7
vlan 2 port default 2/8
vlan 2 port default 2/9
vlan 2 port default 2/10
vlan 2 port default 2/11
vlan 2 port default 2/12
vlan 2 port default 2/13
vlan 2 port default 2/14
vlan 2 port default 2/18
vlan 2 port default 2/19
vlan 2 port default 2/20
vlan 2 port default 2/21
vlan 2 port default 2/32
vlan 2 port default 2/33
vlan 2 port default 2/34
vlan 2 port default 2/35
vlan 2 port default 2/36
vlan 2 port default 2/37
vlan 2 port default 2/40
vlan 2 port default 2/41
vlan 2 port default 2/42
vlan 2 port default 2/43
vlan 2 port default 2/44
vlan 2 port default 2/45
vlan 2 port default 2/46
vlan 2 port default 2/47
vlan 2 port default 2/48
vlan 2 port default 3/1
vlan 2 port default 3/2
vlan 2 port default 3/3
vlan 2 port default 3/4
vlan 2 port default 3/5
vlan 2 port default 3/6
vlan 2 port default 3/7
vlan 2 port default 3/8
vlan 2 port default 3/9
vlan 2 port default 3/10
vlan 2 port default 3/11
vlan 2 port default 3/12
vlan 2 port default 3/13
vlan 2 port default 3/14
vlan 2 port default 3/15
vlan 2 port default 3/16
vlan 2 port default 3/17
vlan 2 port default 3/18
vlan 2 port default 3/19
vlan 2 port default 3/20
vlan 2 port default 4/1
vlan 2 port default 4/2
vlan 2 port default 4/3
vlan 2 port default 4/4
vlan 2 port default 4/5
vlan 2 port default 4/6
vlan 2 port default 4/7
vlan 2 port default 4/8
vlan 2 port default 4/9
vlan 2 port default 4/10
vlan 2 port default 4/11
vlan 2 port default 4/12
vlan 2 port default 4/13
vlan 2 port default 4/14
vlan 2 port default 4/15
vlan 2 port default 4/16
vlan 2 port default 4/17
vlan 2 port default 4/18
vlan 2 port default 4/19
vlan 2 port default 4/20
vlan 3 enable name "VLAN 3"
vlan 3 port default 3/47
vlan 3 port default 3/48
vlan 3 port default 4/47
vlan 3 port default 4/48
! VLAN SL:
! IP :
ip service all
ip interface "2) xxx LAN" address xxx mask 255.255.0.0 vlan 2 ifindex 1
ip interface "3) TRANSIT - FW" address xxx mask 255.255.255.224 vlan 3 ifindex 2
ip interface "1) Server LAN RZ" address xxx mask 255.255.255.0 vlan 1 ifindex 3
! IPX :
! IPMS :
! AAA :
aaa authentication default "local"
aaa authentication console "local"
! PARTM :
! AVLAN :
! 802.1x :
! QOS :
! Policy manager :
! Session manager :
session timeout cli 60
session timeout http 60
session prompt default "xxx"
! SNMP :
snmp security authentication set
snmp authentication trap enable
snmp community map "xxx" user "xxx" on
snmp station xxx 162 "xxx" v1 enable
snmp station xxx 162 "xxx" v1 enable
! RIP :
! OSPF :
! ISIS :
! IPv6 :
! IP multicast :
ip static-route 0.0.0.0/0 gateway xxx metric 1
ip static-route 192.168.60.151/32 gateway xxx metric 1
ip static-route 192.168.70.161/32 gateway xxx metric 1
ip static-route 192.168.80.171/32 gateway xxx metric 1
ip static-route 192.168.90.181/32 gateway xxx metric 1
ip static-route 192.168.100.101/32 gateway xxx metric 1
ip static-route 192.168.120.141/32 gateway xxx metric 1
ip static-route 192.168.130.131/32 gateway xxx metric 1
ip static-route 192.168.190.191/32 gateway xxx metric 1
ip static-route 192.168.200.201/32 gateway xxx metric 1
ip static-route 192.168.210.212/32 gateway xxx metric 1
ip static-route 192.168.220.221/32 gateway xxx metric 1
ip static-route 192.168.230.231/32 gateway xxx metric 1
ip static-route 192.168.240.241/32 gateway xxx metric 1
! RIPng :
! OSPF3 :
! BGP :
! Health monitor :
! Interface :
! Udld :
! Netsec :
! Port Mapping :
! Link Aggregate :
! VLAN AGG:
! 802.1Q :
! Spanning tree :
bridge mode 1x1
! Bridging :
! Bridging :
! Port mirroring :
! UDP Relay :
ip helper address 128.11.64.111
ip helper address 128.11.160.59
ip helper address 172.30.0.1
ip helper address 172.30.0.16
ip helper address 172.30.0.19
ip helper pxe-support enable
! Server load balance :
! System service :
ip name-server xxx
ip domain-name xxx
ip domain-lookup
swlog appid HEALTH level debug3
! SSH :
! VRRP :
! Web :
ip http ssl
! AMAP :
! LLDP :
! Lan Power :
! NTP :
! RDP :
! VLAN STACKING:
! Ethernet-OAM :
->
I would be happy to get some hints from you. I thought that the OS6850 does the main L2 and L3 forwarding in ASICs, so the CPU should not be under such heavy load.
Kind regards!
René
Re: OS-6850 Stack ---> CPU heavy loaded !
Hi,
From the configuration parameters I can see that you are using AOS 6.3.1.R01, but not which build.
The config looks simple, so it should not be any of those config lines.
Please post the output of "show microcode", just to know which AOS build you are using.
(The miniboot/bootrom version was a good start, but it only gives limited information about the AOS.)
If you are using AOS 6.3.1.871.R01, then you are running the GA release, and I suggest you upgrade to the latest maintenance release (AOS 6.3.1.999.R01) and continue to monitor your network with that version.
-benny
Regards,
Benny
-
- Member
- Posts: 9
- Joined: 25 Aug 2008 06:09
Re: OS-6850 Stack ---> CPU heavy loaded !
Hi Benny,
Thanks so much for your first info. I also thought that the config can't be the reason for the CPU problems, but I didn't know whether some default parameters differ from the AOS of my old 7700 system. OK, "show microcode" says the following:
Package           Release         Size     Description
-----------------+---------------+--------+-----------------------------------
Kbase.img         6.3.1.871.R01   14716315 Alcatel Base Software
K2os.img          6.3.1.871.R01    1733068 Alcatel OS
Keni.img          6.3.1.871.R01    4896214 Alcatel NI software
Ksecu.img         6.3.1.871.R01     474897 Alcatel Security Management
Benny, you are right: 6.3.1.871.R01
So you think an update to 6.3.1.999.R01 may be a solution to fix the problem?
Last night we reactivated the old 7700 system and patched it to the new 6850 stack on 3 ports. The AOS7700 took over the IP addresses from the 6850 (the 6850 got some temporary IP addresses) and now does the Layer 3 routing between the 3 VLANs. Trial and error: maybe the high CPU load is caused by L3 routing? At this time there are no problems with the CPU. I'll let you know later today whether the workaround holds.
Best regards!
René
-
- Member
- Posts: 19
- Joined: 04 Oct 2007 03:16
Re: OS-6850 Stack ---> CPU heavy loaded !
Hi there - we normally only see high CPU utilisation due to 'external' problems e.g. layer 2 network loops etc. Are you sure you don't have a spanning-tree issue or similar ? I've found that quite often these types of problems can take quite a few minutes to 'show themselves'. I doubt that the 6850s are struggling with normal traffic as they are nearly as powerful as 7700s.
-
- Member
- Posts: 9
- Joined: 25 Aug 2008 06:09
Re: OS-6850 Stack ---> CPU heavy loaded !
Hi, thanks for the STP idea, but spanning tree is not the reason. I checked it on Monday this week: configured quite well, no loops detected.
But in some more lab tests I found out that the cluster has very big ARP problems (the switch won't refresh / create ARP entries correctly).
Next step to solve the case: I've tested version 6.3.1.999.R01 on another system, and it works fine without any problems detected. Next Monday I'll update the stack, and on Tuesday we'll see if the problem is gone with it.
Thanks for answering, folks. I'll let you know next week if the problem is fixed...
Regards
René
Re: OS-6850 Stack ---> CPU heavy loaded !
It is easier to start the analysis from a maintenance release, so in case of issues it is always a good idea to upgrade to the latest version and check whether you still see them. This way you can avoid opening an SR with your BP and losing time. (Provided you can get a maintenance window to upgrade your network fairly easily...)
Please note that the OS6850 is more powerful than the OS7700/7800.
Edit:
Please note that AOS 6.3.1.R01 has a new feature called "ARP defense". The switch will not process ARP requests for the same destination as long as the destination did not answer, so you might see some issues with specific applications.
(Something like: "The second icmp echo-reply from dst IP is always lost")
You can switch off "ARP defense" by entering the following command:
-> icmp unreachable host-unreachable enable
-benny
Regards,
Benny
Re: OS-6850 Stack ---> CPU heavy loaded !
While you notice the high CPU utilization, log on to the switch, enter
dshell
spyMax=3
Now the top three CPU utilizing tasks will be dumped to the screen and switch log every 5 seconds. Then it should be easy to figure out which process is taking up all of your CPU. To stop,
spyStop
exit
If I had to venture a guess at what was going on: routing loop. All of those static host routes in your config spell trouble if there was a typo.
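Following up on the typo theory, here is a minimal Python sketch that sanity-checks `ip static-route` lines copied out of a config dump. It is only an offline lint, not an AOS tool, and it cannot prove or rule out a routing loop. The function name `check_static_routes` and the gateway addresses are made up for illustration (the posted config masks the real gateways as "xxx"), and whether a given finding matters on the switch is an assumption to verify.

```python
import ipaddress
from collections import defaultdict

# Sample lines in the same format as the posted config.
# The gateway addresses are hypothetical; the original post masks them.
CONFIG = """\
ip static-route 0.0.0.0/0 gateway 10.0.3.1 metric 1
ip static-route 192.168.60.151/32 gateway 10.0.3.2 metric 1
ip static-route 192.168.60.151/32 gateway 10.0.3.9 metric 1
ip static-route 192.168.70.16/24 gateway 10.0.3.2 metric 1
"""


def check_static_routes(text):
    """Return a list of human-readable warnings for suspicious routes."""
    warnings = []
    gateways_by_prefix = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: ip static-route <prefix> gateway <gw> metric <n>
        if len(parts) < 5 or parts[:2] != ["ip", "static-route"] or parts[3] != "gateway":
            continue
        prefix, gateway = parts[2], parts[4]
        try:
            # strict=True rejects prefixes with host bits set (e.g. x.x.x.16/24)
            net = ipaddress.ip_network(prefix, strict=True)
        except ValueError:
            warnings.append(f"{prefix}: malformed prefix or host bits set")
            continue
        try:
            ipaddress.ip_address(gateway)
        except ValueError:
            warnings.append(f"{prefix}: malformed gateway {gateway}")
            continue
        gateways_by_prefix[net].add(gateway)
    # The same destination pointing at different gateways may be intended,
    # but it is worth a second look when hunting for typos.
    for net, gws in gateways_by_prefix.items():
        if len(gws) > 1:
            warnings.append(f"{net}: multiple gateways {sorted(gws)}")
    return warnings


if __name__ == "__main__":
    for warning in check_static_routes(CONFIG):
        print(warning)
```

Paste the real static-route lines into `CONFIG` (with the actual gateways) and review anything it prints by hand.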
-
- Member
- Posts: 9
- Joined: 25 Aug 2008 06:09
Re: OS-6850 Stack ---> CPU heavy loaded !
MWLosRios wrote: While you notice the high CPU utilization, log on to the switch, enter
dshell
spyMax=3
Now the top three CPU utilizing tasks will be dumped to the screen and switch log every 5 seconds. Then it should be easy to figure out which process is taking up all of your CPU. To stop,
spyStop
exit
If I had to venture a guess at what was going on: routing loop. All of those static host routes in your config spell trouble if there was a typo.
Hi,
Thanks for the hint. After the update to the latest AOS the problem went away. But if it happens again, I'll know how to start a dump (aren't these commands documented anywhere for end users?).
Best regards, folks, and thanks to all for helping me!
René
Re: OS-6850 Stack ---> CPU heavy loaded !
MWLosRios wrote: While you notice the high CPU utilization, log on to the switch, enter
dshell
spyMax=3
Now the top three CPU utilizing tasks will be dumped to the screen and switch log every 5 seconds. Then it should be easy to figure out which process is taking up all of your CPU. To stop,
spyStop
exit
Hi,
Hope everyone is fine ... I have a similar issue in a stacked 6850.
Code:
> show health all CPU
* - current value exceeds threshold
                                 1 Min  1 Hr  1 Hr
Cpu               Limit   Curr   Avg    Avg   Max
-----------------+-------+------+------+-----+----
01                   80     24     21     22    61
02                   80    100*   100     88   100
Code:
------- SPY TOP TICK USAGE -------
task ticks: taUdldNi 30
task ticks: bcmRX 10
task ticks: VlanMgr 7
idle total: 384 505
Is there a way to show the processes of stack member 02?
thanks in advance!
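For anyone triaging pasted output like this offline, here is a minimal Python sketch that picks out the stack members whose current CPU value is above the limit and ranks the tasks from a spy dump by ticks. The helper names `members_over_limit` and `busiest_tasks` are made up for illustration, the parsing assumes exactly the text layout pasted in this post, and the "idle total" line is deliberately left uninterpreted.

```python
import re

# Output copied from the post above (whitespace collapsed).
HEALTH = """\
Cpu Limit Curr Avg Avg Max
-----------------+-------+------+------+-----+----
01 80 24 21 22 61
02 80 100* 100 88 100
"""

SPY = """\
------- SPY TOP TICK USAGE -------
task ticks: taUdldNi 30
task ticks: bcmRX 10
task ticks: VlanMgr 7
idle total: 384 505
"""

# A data row is a member id followed by five numbers, each optionally
# marked with '*' by the switch when it exceeds the threshold.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\*?\s+(\d+)\*?\s+(\d+)\*?\s+(\d+)\*?\s*$"
)


def members_over_limit(text):
    """Return [(member, limit, current)] for rows whose current CPU > limit."""
    hot = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, or unrelated line
        member = match.group(1)
        limit, current = int(match.group(2)), int(match.group(3))
        if current > limit:
            hot.append((member, limit, current))
    return hot


def busiest_tasks(text):
    """Return (task, ticks) pairs from a spy dump, highest ticks first."""
    pairs = re.findall(r"^task ticks:\s+(\S+)\s+(\d+)\s*$", text, flags=re.MULTILINE)
    return sorted(((name, int(ticks)) for name, ticks in pairs),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(members_over_limit(HEALTH))
    print(busiest_tasks(SPY))
```

Note that a spy dump only describes the unit where dshell was started, so the task list has to be collected on the member that the health table flags.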
Re: OS-6850 Stack ---> CPU heavy loaded !
You can probably telnet into chassis/stack member two (telnet 127.2.2.1; as far as I know, log in with a local user or admin/switch) and enter the dshell there.
What code are you running?