OS-6850 Stack ---> CPU heavy loaded !

r.gruen_DUP
Member
Posts: 9
Joined: 25 Aug 2008 06:09

OS-6850 Stack ---> CPU heavy loaded !

Post by r.gruen_DUP »

Hello everybody,

First, sorry for my bad English; I hope you'll understand my problem anyway ;-)
Second, here is my problem.

Last weekend I installed an OmniSwitch 6850 stack (4x 6850-48), which replaces an old AOS 7700 system. Only a basic set of commands is configured (some ports, some VLANs, nothing complicated), and this Monday was the first day the stack had to work as our brand-new server backbone switch. The switch worked fine until 11:15 a.m., but then the first users called our helpdesk and reported that some servers were not available. I started a telnet session to the 6850 stack and the response was very slow. The "show health" command showed CPU usage above 80%. I thought "hm, is the primary CMM frozen?" and started a "takeover". After 30 seconds the CPU load was between 19 and 25%, and all problems were gone!

45 minutes later, the same problem: the CPU was under heavy load again. Another takeover, and 60 seconds later the switch worked fine again. So, does anybody have an idea what could cause this? The log is empty (only the takeover command is logged, nothing before it). I've already entered the command "swlog appid HEALTH level debug3" to get more logging information about the CPU load, but no events are logged.


Here is some info about the system:

Uboot Version : 6.1.3.601.R01
Miniboot Version : 6.1.3.601.R01



Config:

! Stack Manager :
! Chassis :
system name xxx
system contact xxx
system location "xxx"
system timezone MET
! Configuration:
! VLAN :
ethernet-service mode vstk
vlan 1 enable name "VLAN 1"
vlan 2 enable name "VLAN 2"
vlan 2 port default 1/1
vlan 2 port default 1/2
vlan 2 port default 1/3
vlan 2 port default 1/4
vlan 2 port default 1/5
vlan 2 port default 1/7
vlan 2 port default 1/8
vlan 2 port default 1/9
vlan 2 port default 1/10
vlan 2 port default 1/11
vlan 2 port default 1/12
vlan 2 port default 1/13
vlan 2 port default 1/14
vlan 2 port default 1/18
vlan 2 port default 1/19
vlan 2 port default 1/20
vlan 2 port default 1/21
vlan 2 port default 1/32
vlan 2 port default 1/33
vlan 2 port default 1/34
vlan 2 port default 1/35
vlan 2 port default 1/36
vlan 2 port default 1/37
vlan 2 port default 1/40
vlan 2 port default 1/41
vlan 2 port default 1/42
vlan 2 port default 1/43
vlan 2 port default 1/44
vlan 2 port default 1/45
vlan 2 port default 1/46
vlan 2 port default 1/47
vlan 2 port default 1/48
vlan 2 port default 2/1
vlan 2 port default 2/2
vlan 2 port default 2/3
vlan 2 port default 2/4
vlan 2 port default 2/5
vlan 2 port default 2/7
vlan 2 port default 2/8
vlan 2 port default 2/9
vlan 2 port default 2/10
vlan 2 port default 2/11
vlan 2 port default 2/12
vlan 2 port default 2/13
vlan 2 port default 2/14
vlan 2 port default 2/18
vlan 2 port default 2/19
vlan 2 port default 2/20
vlan 2 port default 2/21
vlan 2 port default 2/32
vlan 2 port default 2/33
vlan 2 port default 2/34
vlan 2 port default 2/35
vlan 2 port default 2/36
vlan 2 port default 2/37
vlan 2 port default 2/40
vlan 2 port default 2/41
vlan 2 port default 2/42
vlan 2 port default 2/43
vlan 2 port default 2/44
vlan 2 port default 2/45
vlan 2 port default 2/46
vlan 2 port default 2/47
vlan 2 port default 2/48
vlan 2 port default 3/1
vlan 2 port default 3/2
vlan 2 port default 3/3
vlan 2 port default 3/4
vlan 2 port default 3/5
vlan 2 port default 3/6
vlan 2 port default 3/7
vlan 2 port default 3/8
vlan 2 port default 3/9
vlan 2 port default 3/10
vlan 2 port default 3/11
vlan 2 port default 3/12
vlan 2 port default 3/13
vlan 2 port default 3/14
vlan 2 port default 3/15
vlan 2 port default 3/16
vlan 2 port default 3/17
vlan 2 port default 3/18
vlan 2 port default 3/19
vlan 2 port default 3/20
vlan 2 port default 4/1
vlan 2 port default 4/2
vlan 2 port default 4/3
vlan 2 port default 4/4
vlan 2 port default 4/5
vlan 2 port default 4/6
vlan 2 port default 4/7
vlan 2 port default 4/8
vlan 2 port default 4/9
vlan 2 port default 4/10
vlan 2 port default 4/11
vlan 2 port default 4/12
vlan 2 port default 4/13
vlan 2 port default 4/14
vlan 2 port default 4/15
vlan 2 port default 4/16
vlan 2 port default 4/17
vlan 2 port default 4/18
vlan 2 port default 4/19
vlan 2 port default 4/20
vlan 3 enable name "VLAN 3"
vlan 3 port default 3/47
vlan 3 port default 3/48
vlan 3 port default 4/47
vlan 3 port default 4/48
! VLAN SL:
! IP :
ip service all
ip interface "2) xxx LAN" address xxx mask 255.255.0.0 vlan 2 ifindex 1
ip interface "3) TRANSIT - FW" address xxx mask 255.255.255.224 vlan 3 ifindex 2
ip interface "1) Server LAN RZ" address xxx mask 255.255.255.0 vlan 1 ifindex 3
! IPX :
! IPMS :
! AAA :
aaa authentication default "local"
aaa authentication console "local"
! PARTM :
! AVLAN :
! 802.1x :
! QOS :
! Policy manager :
! Session manager :
session timeout cli 60
session timeout http 60
session prompt default "xxx"
! SNMP :
snmp security authentication set
snmp authentication trap enable
snmp community map "xxx" user "xxx" on
snmp station xxx 162 "xxx" v1 enable
snmp station xxx 162 "xxx" v1 enable
! RIP :
! OSPF :
! ISIS :
! IPv6 :
! IP multicast :
ip static-route 0.0.0.0/0 gateway xxx metric 1
ip static-route 192.168.60.151/32 gateway xxx metric 1
ip static-route 192.168.70.161/32 gateway xxx metric 1
ip static-route 192.168.80.171/32 gateway xxx metric 1
ip static-route 192.168.90.181/32 gateway xxx metric 1
ip static-route 192.168.100.101/32 gateway xxx metric 1
ip static-route 192.168.120.141/32 gateway xxx metric 1
ip static-route 192.168.130.131/32 gateway xxx metric 1
ip static-route 192.168.190.191/32 gateway xxx metric 1
ip static-route 192.168.200.201/32 gateway xxx metric 1
ip static-route 192.168.210.212/32 gateway xxx metric 1
ip static-route 192.168.220.221/32 gateway xxx metric 1
ip static-route 192.168.230.231/32 gateway xxx metric 1
ip static-route 192.168.240.241/32 gateway xxx metric 1
! RIPng :
! OSPF3 :
! BGP :
! Health monitor :
! Interface :
! Udld :
! Netsec :
! Port Mapping :
! Link Aggregate :
! VLAN AGG:
! 802.1Q :
! Spanning tree :
bridge mode 1x1
! Bridging :
! Bridging :
! Port mirroring :
! UDP Relay :
ip helper address 128.11.64.111
ip helper address 128.11.160.59
ip helper address 172.30.0.1
ip helper address 172.30.0.16
ip helper address 172.30.0.19
ip helper pxe-support enable
! Server load balance :
! System service :
ip name-server xxx
ip domain-name xxx
ip domain-lookup
swlog appid HEALTH level debug3
! SSH :
! VRRP :
! Web :
ip http ssl
! AMAP :
! LLDP :
! Lan Power :
! NTP :
! RDP :
! VLAN STACKING:
! Ethernet-OAM :
->
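With around a hundred "vlan ... port default" lines spread over four stack members, it can help to group them to see at a glance which ports on which member belong to which VLAN. A minimal Python sketch, assuming the config is available as plain text in the format shown above; the function name `ports_per_vlan` is hypothetical and the snippet is a shortened excerpt, not the full config.

```python
import re
from collections import defaultdict

# Matches lines such as "vlan 2 port default 1/13" from the config above.
PORT_RE = re.compile(r"^vlan\s+(\d+)\s+port\s+default\s+(\d+)/(\d+)$")


def ports_per_vlan(config_text):
    """Return {vlan: {stack_member: [ports]}} for 'port default' lines."""
    table = defaultdict(lambda: defaultdict(list))
    for line in config_text.splitlines():
        m = PORT_RE.match(line.strip())
        if m:
            vlan, member, port = (int(g) for g in m.groups())
            table[vlan][member].append(port)
    # Convert to plain dicts with sorted port lists for easy printing.
    return {
        vlan: {member: sorted(ports) for member, ports in members.items()}
        for vlan, members in table.items()
    }


if __name__ == "__main__":
    snippet = """\
vlan 2 port default 1/1
vlan 2 port default 1/2
vlan 2 port default 2/48
vlan 3 port default 3/47
vlan 3 port default 4/48
"""
    for vlan, members in sorted(ports_per_vlan(snippet).items()):
        for member, ports in sorted(members.items()):
            print(f"VLAN {vlan}, member {member}: {len(ports)} port(s) {ports}")
```

Other config lines (for example "vlan 1 enable name ...") simply do not match the pattern and are ignored.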



I would be happy to get some hints from you. I thought that the OS6850 does the main L2 and L3 forwarding in ASICs, so the CPU shouldn't be this heavily loaded.

Kind regards!

René
benny
Member
Posts: 750
Joined: 20 Oct 2007 14:51

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by benny »

Hi,

From the configuration I can see that you are using AOS 6.3.1.R01, but not which build.

The config looks simple, so the problem should not be caused by any of those config lines.

Please post the output of "show microcode", just so we know which AOS build you are using.
(The miniboot/bootrom version was already helpful, but it only gives limited information about the AOS.)

If you are using AOS 6.3.1.871.R01, then you are running the GA release, and I suggest you upgrade to the latest maintenance release (AOS 6.3.1.999.R01) and continue to monitor your network with that version.
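To check a stack against the recommended maintenance release, the release strings from "show microcode" can be compared field by field. A minimal Python sketch, assuming the release format seen in this thread (for example "6.3.1.871.R01") and assuming that a higher fourth field means a later build of the same 6.3.1.R01 line, as the GA (871) vs. maintenance (999) numbering here suggests; the helper names are hypothetical.

```python
import re

# Release strings as printed by `show microcode` in this thread,
# e.g. "6.3.1.871.R01". Other formats are deliberately rejected.
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)\.R(\d+)$")


def parse_release(release):
    """Return (major, minor, patch, build, train) as integers."""
    m = RELEASE_RE.match(release.strip())
    if m is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    major, minor, patch, build, train = (int(g) for g in m.groups())
    return major, minor, patch, build, train


def releases_from_microcode(text):
    """Map each *.img package in `show microcode` output to its release."""
    releases = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].endswith(".img"):
            releases[fields[0]] = fields[1]
    return releases


def older_than(release, target):
    """True if `release` is an earlier build of the same x.y.z.Rnn line."""
    cur, tgt = parse_release(release), parse_release(target)
    same_line = cur[:3] == tgt[:3] and cur[4] == tgt[4]
    return same_line and cur[3] < tgt[3]


if __name__ == "__main__":
    output = """\
Kbase.img        6.3.1.871.R01   14716315 Alcatel Base Software
K2os.img         6.3.1.871.R01    1733068 Alcatel OS
"""
    pkgs = releases_from_microcode(output)
    print({p: older_than(r, "6.3.1.999.R01") for p, r in pkgs.items()})
    # {'Kbase.img': True, 'K2os.img': True}
```

Releases from a different x.y.z or Rnn line are reported as not older, since their build numbers are not comparable under this assumption.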

-benny
Regards,
Benny
r.gruen_DUP
Member
Posts: 9
Joined: 25 Aug 2008 06:09

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by r.gruen_DUP »

Hi Benny,

Thanks so much for your first info. I also thought that the config can't be the reason for the CPU problems, but I didn't know whether some default parameters differ from the AOS on my old 7700 system. OK, show microcode says the following:

Package Release Size Description
-----------------+---------------+--------+-----------------------------------
Kbase.img 6.3.1.871.R01 14716315 Alcatel Base Software
K2os.img 6.3.1.871.R01 1733068 Alcatel OS
Keni.img 6.3.1.871.R01 4896214 Alcatel NI software
Ksecu.img 6.3.1.871.R01 474897 Alcatel Security Management

Benny - you are right: 6.3.1.871.R01

So you think an update to 6.3.1.999.R01 may fix that problem?

Last night we reactivated the old 7700 system and connected it to the new 6850 stack via 3 ports. The 7700 took over the IP addresses from the 6850 (the 6850 got some temporary IP addresses) and now does the Layer 3 routing between the 3 VLANs. Trial and error: maybe the high CPU load is caused by L3 routing? So far there are no problems with the CPU. I'll let you know later today whether this workaround helps.

Best Regards !

René
markwoodward
Member
Posts: 19
Joined: 04 Oct 2007 03:16

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by markwoodward »

Hi there, we normally only see high CPU utilisation due to 'external' problems, e.g. layer 2 network loops. Are you sure you don't have a spanning-tree issue or similar? I've found that these types of problems can quite often take a few minutes to 'show themselves'. I doubt that the 6850s are struggling with normal traffic, as they are nearly as powerful as 7700s.
r.gruen_DUP
Member
Posts: 9
Joined: 25 Aug 2008 06:09

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by r.gruen_DUP »

Hi, thanks for the STP idea, but spanning tree is not the reason; I checked it on Monday this week. It's configured quite well and no loops were detected.

But in some more lab tests I found out that the stack has very big ARP problems (the switch won't refresh or create ARP entries correctly).

Next step to solve the case: I've tested version 6.3.1.999.R01 on another system, and it works fine without any problems. Next Monday I'll update the stack, and on Tuesday we'll see whether the problems are gone.

Thanks for answering, folks. I'll let you know next week whether the problem is fixed...

Regards
René
benny
Member
Posts: 750
Joined: 20 Oct 2007 14:51

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by benny »

It is easier to start the analysis from a maintenance release, so when you hit issues it is always a good idea to upgrade to the latest version first and check whether you can still see them. This way you can avoid opening an SR with your BP and losing time (provided you can get a maintenance window to upgrade your network easily...).

Please note that the OS6850 is more powerful than the OS7700/7800.

Edit:

Please note that AOS 6.3.1.R01 has a new feature called "ARP defense". The switch will not process further ARP requests for the same destination until that destination has answered, so you might see issues with specific applications.
(Something like: "The second icmp echo-reply from dst IP is always lost")
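The behaviour described above (no further ARP requests for a destination while an earlier one is still unanswered) can be modelled as a small pending-request table. This is a toy Python model for illustration only, not the AOS implementation: the class name `PendingArpFilter` is hypothetical, and the optional `timeout` is an assumption, not a documented AOS parameter.

```python
class PendingArpFilter:
    """Toy model of the 'ARP defense' behaviour described above.

    While an ARP request for a destination is still unanswered, further
    requests for that destination are suppressed. The optional timeout
    (hypothetical) lets a new request through after a fixed time.
    """

    def __init__(self, timeout=None):
        self.timeout = timeout
        self.pending = {}  # destination IP -> time the request was sent

    def request(self, dst, now):
        """Return True if an ARP request for `dst` would be sent."""
        sent_at = self.pending.get(dst)
        if sent_at is not None:
            expired = self.timeout is not None and now - sent_at >= self.timeout
            if not expired:
                return False  # suppressed: destination has not answered yet
        self.pending[dst] = now
        return True

    def reply(self, dst):
        """An ARP reply from `dst` clears its pending entry."""
        self.pending.pop(dst, None)


if __name__ == "__main__":
    arp = PendingArpFilter()
    print(arp.request("10.0.0.5", now=0.0))  # True: first request goes out
    print(arp.request("10.0.0.5", now=0.5))  # False: still waiting for a reply
    arp.reply("10.0.0.5")
    print(arp.request("10.0.0.5", now=1.0))  # True: pending entry was cleared
```

In this model, a second packet that needs ARP resolution for the same unanswered destination gets no new request, which is one way to picture the "second echo-reply is lost" symptom.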

You can switch off "ARP defense" by entering the following command:
-> icmp unreachable host-unreachable enable

-benny
Regards,
Benny
MWLosRios
Member
Posts: 35
Joined: 24 Jul 2008 12:49

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by MWLosRios »

While you see the high CPU utilization, log on to the switch and enter

dshell
spyMax=3

The top three CPU-consuming tasks will then be dumped to the screen and the switch log every 5 seconds, so it should be easy to figure out which process is taking up all of your CPU. To stop, enter

spyStop
exit
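If the spy output is captured from the switch log over a longer period, a small parser makes it easier to compare intervals. A minimal Python sketch, based on the sample dump posted later in this thread ("task  ticks: taUdldNi  30", "idle  total: 384  505"); it assumes the "idle total:" line holds idle ticks followed by total ticks for the interval. That assumption is a guess, although for the sample it gives about 24% busy, which matches the "Curr" value shown for CPU 01 in the same post. The function names are hypothetical.

```python
import re

TASK_RE = re.compile(r"^task\s+ticks:\s+(\S+)\s+(\d+)$")
IDLE_RE = re.compile(r"^idle\s+total:\s+(\d+)\s+(\d+)$")


def parse_spy_block(text):
    """Parse one 'SPY TOP TICK USAGE' block into (tasks, idle, total).

    Assumes the 'idle total:' line is idle ticks followed by total ticks.
    """
    tasks, idle, total = [], None, None
    for line in text.splitlines():
        line = line.strip()
        m = TASK_RE.match(line)
        if m:
            tasks.append((m.group(1), int(m.group(2))))
            continue
        m = IDLE_RE.match(line)
        if m:
            idle, total = int(m.group(1)), int(m.group(2))
    return tasks, idle, total


def summarize(text):
    """Return busy percentage and each listed task's share of total ticks."""
    tasks, idle, total = parse_spy_block(text)
    if not total:
        raise ValueError("no usable 'idle total:' line found")
    busy = 100.0 * (total - idle) / total
    shares = [(name, 100.0 * ticks / total) for name, ticks in tasks]
    return busy, shares


if __name__ == "__main__":
    sample = """\
------- SPY TOP TICK USAGE -------
task  ticks: taUdldNi  30
task  ticks: bcmRX  10
task  ticks: VlanMgr  7
idle  total: 384  505
"""
    busy, shares = summarize(sample)
    print(f"CPU busy: {busy:.1f}%")  # CPU busy: 24.0%
    for name, pct in shares:
        print(f"{name}: {pct:.1f}%")
```

Only the top tasks are listed by the spy dump, so their shares will usually not add up to the busy percentage.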

If I had to guess what was going on: a routing loop. All of those static host routes in your config spell trouble if one of them has a typo.
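A typo in one of the many static host routes can be caught before it causes trouble by checking each gateway against the switch's own interfaces. A minimal Python sketch using the standard `ipaddress` module; because the real addresses are masked as "xxx" in the posted config, all interface and gateway addresses below are hypothetical. A clean result does not prove there is no routing loop (that would also need the neighbouring routers' tables); it only flags gateways that point at the switch itself, are not on a directly connected subnet, or conflict for the same destination.

```python
import ipaddress


def check_static_routes(interfaces, routes):
    """Return warnings for static routes whose gateway looks suspicious.

    interfaces: the switch's own IP interfaces as "address/prefix" strings.
    routes: (destination, gateway) pairs as written in `ip static-route`.
    """
    ifaces = [ipaddress.ip_interface(i) for i in interfaces]
    own = {i.ip for i in ifaces}
    first_gateway = {}
    warnings = []
    for dest, gw in routes:
        net = ipaddress.ip_network(dest)  # raises ValueError if host bits are set
        gw_ip = ipaddress.ip_address(gw)
        if gw_ip in own:
            warnings.append(f"{dest}: gateway {gw} is the switch itself")
        elif not any(gw_ip in i.network for i in ifaces):
            warnings.append(f"{dest}: gateway {gw} is not on a connected subnet")
        # The same destination configured twice with different gateways.
        earlier = first_gateway.setdefault(net, gw_ip)
        if earlier != gw_ip:
            warnings.append(f"{dest}: conflicting gateways {earlier} and {gw}")
    return warnings


if __name__ == "__main__":
    interfaces = ["10.1.0.1/16", "10.3.0.1/27"]  # hypothetical addresses
    routes = [
        ("0.0.0.0/0", "10.3.0.2"),
        ("192.168.60.151/32", "10.3.0.2"),
        ("192.168.70.161/32", "10.30.0.2"),  # typo: not on a connected subnet
    ]
    for warning in check_static_routes(interfaces, routes):
        print(warning)
```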
r.gruen_DUP
Member
Posts: 9
Joined: 25 Aug 2008 06:09

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by r.gruen_DUP »

MWLosRios wrote:While you notice the high CPU utilization, log on to the switch, enter

dshell
spyMax=3

Now the top three CPU utilizing tasks will be dumped to the screen and switch log every 5 seconds. Then it should be easy to figure out which process is taking up all of your CPU. To stop,

spyStop
exit

If I had to venture a guess at what was going on: routing loop. All of those static host routes in your config spell trouble if there was a typo.

Hi,

Thanks for the hint. After the update to the latest AOS the problem was gone. But if it happens again, I'll know how to start a dump (are these commands not documented anywhere for end users?).

Best regards, folks, and thanks to all of you for helping me!

René
mouthpiec
Member
Posts: 53
Joined: 27 Aug 2013 10:58

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by mouthpiec »

MWLosRios wrote:While you notice the high CPU utilization, log on to the switch, enter

dshell
spyMax=3

Now the top three CPU utilizing tasks will be dumped to the screen and switch log every 5 seconds. Then it should be easy to figure out which process is taking up all of your CPU. To stop,

spyStop
exit
Hi,
Hope everyone is fine ... I have a similar issue in a stacked 6850.


> show health all CPU
* - current value exceeds threshold

                                1 Min  1 Hr  1 Hr
Cpu                Limit   Curr   Avg    Avg   Max
-----------------+-------+------+------+-----+----
01                   80     24     21    22    61
02                   80    100*   100    88   100
But when running the dshell commands I get the following; I guess these are from stack member 01:


------- SPY TOP TICK USAGE -------
task  ticks: taUdldNi  30
task  ticks: bcmRX  10
task  ticks: VlanMgr  7
idle  total: 384  505  

Is there a way to show the processes of stack member 02?

thanks in advance!
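On a larger stack it is easy to miss the one member that is pinned at 100%, so the `show health all CPU` table can be parsed and filtered. A minimal Python sketch, assuming the column layout in the post above (Cpu, Limit, Curr, 1 Min Avg, 1 Hr Avg, 1 Hr Max), that a trailing "*" only marks a value over the limit, and that the Cpu numbers correspond to stack members; the `CpuRow` type and function names are hypothetical.

```python
from typing import NamedTuple


class CpuRow(NamedTuple):
    member: int
    limit: int
    curr: int
    min1_avg: int
    hr1_avg: int
    hr1_max: int


def parse_health_cpu(text):
    """Parse the data rows of `show health all CPU` into CpuRow tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has six numeric columns; '*' marks a value over the limit.
        if len(fields) != 6 or not all(f.rstrip("*").isdigit() for f in fields):
            continue
        rows.append(CpuRow(*(int(f.rstrip("*")) for f in fields)))
    return rows


def hot_members(rows):
    """Rows whose current or 1-hour average CPU is above the limit."""
    return [r for r in rows if r.curr > r.limit or r.hr1_avg > r.limit]


if __name__ == "__main__":
    sample = """\
                                1 Min  1 Hr  1 Hr
Cpu                Limit   Curr   Avg    Avg   Max
-----------------+-------+------+------+-----+----
01                   80     24     21    22    61
02                   80    100*   100    88   100
"""
    for row in hot_members(parse_health_cpu(sample)):
        print(f"member {row.member}: curr {row.curr}%, 1-hr avg {row.hr1_avg}%")
    # member 2: curr 100%, 1-hr avg 88%
```

Checking the 1-hour average as well as the current value separates a sustained overload (as on member 02 here) from a short spike.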
devnull
Alcatel Unleashed Certified Guru
Alcatel Unleashed Certified Guru
Posts: 976
Joined: 07 Sep 2010 10:16
Location: Germany

Re: OS-6850 Stack ---> CPU heavy loaded !

Post by devnull »

You can probably telnet into chassis/stack member two
(telnet 127.2.2.1, AFAIK with the local user or admin/switch)
and enter the dshell there.

What code are you running?
