mc-lag problems

m00n
Member
Posts: 28
Joined: 12 Apr 2011 08:26
Location: Poland

Post by m00n »

Hey,
Am I the only one having problems with MC-LAG on the 10k? It is really unstable in my configuration. About once a week, at a random moment, I have to restart the chassis because one linkagg changes state to down. I also have a lot of errors in swlog that I don't really understand, for example:

Jan 17 09:51:45 (none) local0.info swlogd: mcm vlan_mgr  info(5) CMM:mcmCMM_process_vlan_event_range@1501: Vlan event=x200 vlan=1-1 ifIdx=40000097-40000097 vpaFlg=1- duplicated 1 times!
Jan 17 09:51:45 (none) local0.info swlogd: mcm ipc       info(5) CMM:disconnect_callback@685: Reactor (port 37417)
Jan 17 09:51:45 (none) local0.info swlogd: VlanMgr main info(5) Slot is present, but no NI connection yet!!!
Jan 17 09:51:45 (none) local0.info swlogd: stp _AGRt info(5)  LA_AGGDOWN agg_ifdx=40000097
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcSendToNI:112] TX to NI 3 sock 28 length 184 - duplicated 2 times!
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcManageLagEvent:113] VFC Rx AGG down for lagId 97 gPort 256
Jan 17 09:51:45 (none) local0.info swlogd: stp _MSGt info(5) CS_NI_DOWN slot:3
Jan 17 09:51:45 (none) local0.info swlogd: stp _STPt info(5) slot 3 connection 0
Jan 17 09:51:45 (none) local0.info swlogd: source GENERAL info(5) NO_CONSOLE:(600031.283)upDesigLa[1100]send to all NIs that designated LA slot is 2
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcSendConfigAckToNi:345] TX VFC_CONFIG_REQEST to NI 2 - duplicated 1 times!
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcManageLagEvent:113] VFC Rx AGG down for lagId 98 gPort 257
Jan 17 09:51:45 (none) local0.err swlogd: VlanMgr main error(2) NI down from a slot I didnt know of
Jan 17 09:51:45 (none) local0.info swlogd: stp _AGRt info(5)  LA_AGGDOWN agg_ifdx=40000098
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcManageLagEvent:113] VFC Rx AGG down for lagId 99 gPort 258
Jan 17 09:51:45 (none) local0.info swlogd: stp _AGRt info(5)  LA_AGGDOWN agg_ifdx=40000099
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcHandleIncomingDisconnect:574] Removing connection on socket 29
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcAddNewConnection:601] New connection: 127.2.2.1:41075, Socket=29
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcHandleIncomingMsg:381] RX VFC_MSG_HELLO zNi 1 BOOTUP
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcPMEventsRegister:575] Port Manager Registrations Done
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcPMPortEvtRegistration:830] PM Registration done for zNi 1
Jan 17 09:51:45 (none) local0.info swlogd: vfc main info(5) [vfcHandlePMMsg:96] PM Registration success
Jan 17 09:51:46 (none) local0.warn swlogd: LinkAgg main warning(4) Peer Disconnect slot:3 up:0 reset:0
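
For triage, entries like the ones above can be grouped by severity and the aggregate-down events pulled out. This is a minimal Python sketch, assuming the field layout seen in these sample lines (timestamp, host, facility.level, `swlogd:`, message). The regexes and the `summarize` function are illustrative names only, not part of any AOS tool.

```python
import re
from collections import Counter

# Layout assumed from the sample lines:
# "Mon DD HH:MM:SS host facility.level swlogd: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+\w+\.(?P<level>\w+)\s+swlogd:\s+(?P<msg>.*)$"
)
AGG_DOWN_RE = re.compile(r"LA_AGGDOWN agg_ifdx=(\d+)")


def summarize(lines):
    """Return (count per syslog level, list of (timestamp, agg ifIndex) down events)."""
    levels = Counter()
    agg_down = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a swlog entry
        levels[m["level"]] += 1
        hit = AGG_DOWN_RE.search(m["msg"])
        if hit:
            agg_down.append((m["ts"], hit.group(1)))
    return levels, agg_down


# Three lines copied from the log excerpt above.
sample = [
    "Jan 17 09:51:45 (none) local0.info swlogd: stp _AGRt info(5)  LA_AGGDOWN agg_ifdx=40000097",
    "Jan 17 09:51:45 (none) local0.err swlogd: VlanMgr main error(2) NI down from a slot I didnt know of",
    "Jan 17 09:51:46 (none) local0.warn swlogd: LinkAgg main warning(4) Peer Disconnect slot:3 up:0 reset:0",
]
levels, agg_down = summarize(sample)
print(dict(levels))
print(agg_down)
```

In the excerpt, the trailing digits of `agg_ifdx` (97, 98, 99) match the `lagId` values in the neighbouring vfc lines, so the second list should point at the affected aggregates.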

Jan 10 11:13:28 (none) local0.err swlogd: LinkAgg main error(2) Range mismatch PEER : ours 0 47 Peers:48 95
Jan 10 11:13:28 (none) local0.err swlogd: LinkAgg main error(2) Range mismatch LOCAL : ours 48 95 Peers:0 47

B3-OS10K[MC-2]-> show linkagg range
                       Operational            Configured
                       Min       Max          Min     Max
--------------------+--------+--------+-------------+-------+
Local                   48        95            48        95
Peer                     0        47             0        47
Multi-Chassis           96       127            96       127


B3-OS10K[MC-1]-> show linkagg range
                       Operational            Configured
                       Min       Max          Min     Max
--------------------+--------+--------+-------------+-------+
Local                   48        95            48        95
Peer                     0        47             0        47
Multi-Chassis           96       127            96       127

Please help
Network & UnixAdministrator, The State School of Higher Professional Education in Elbląg, ACFE
ydeschoe
Member
Posts: 33
Joined: 30 Dec 2008 07:54

Re: mc-lag problems

Post by ydeschoe »

Which software version are you running?

We have a case open with Alcatel about problems between MC-LAG and HP equipment.

This is their answer:

***********************
Thank you for the logs. We could see that 10k is running code version 7.1.1.1696.R01. On further checking we could see "PR#162717 issue ::LACP is OUT_OF_SYNC between OS6900 and HP 10G-FLEXMODULE".

We suspect that we are getting the same issue here. Fix for PR#162717 is available from 7.1.1.1736.R01. Please upgrade the 10k to this code or higher and let us know if the issue is seen.
************************
I heard via my pre-sales manager that he had a client with the same problem, so I forwarded him this message. The client upgraded to the proposed version and the problem was solved on their 6900 switches.

We will upgrade to this latest version in 2 weeks.
-------
Do you have MC-LAG and normal LAG configured on the same device?

There seems to be a problem with mixing them if they carry the same VLANs.

My MC-LAGs are stable and have been running since 13 Nov 2011.

Regards,

Yves

Re: mc-lag problems

Post by m00n »

Hi, thanks for the answer.
My software version is:

Ros.img 7.1.1.1638.R01:relman 64246700 Alcatel-Lucent OS
Reni.img 7.1.1.1638.R01:relman 60482684 Alcatel-Lucent NI
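
Whether this build already contains the PR#162717 fix quoted earlier in the thread (available from 7.1.1.1736.R01) can be checked by comparing build numbers. This is a small Python sketch, assuming version strings of the form `major.minor.maint.build.Rnn` and that builds within one release line are ordered by the build field. The helper names are hypothetical.

```python
import re

VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)\.R(\d+)$")


def parse_version(text):
    """Split e.g. '7.1.1.1638.R01' into ((7, 1, 1, 1), 1638): release line and build."""
    m = VERSION_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maint, build, rel = (int(g) for g in m.groups())
    return (major, minor, maint, rel), build


def has_fix(running, fixed_in):
    """True if both strings share a release line and the running build >= the fixed build."""
    run_line, run_build = parse_version(running)
    fix_line, fix_build = parse_version(fixed_in)
    if run_line != fix_line:
        # Assumption: builds are only comparable within the same release line.
        raise ValueError("different release lines; compare manually")
    return run_build >= fix_build


print(has_fix("7.1.1.1638.R01", "7.1.1.1736.R01"))  # False
print(has_fix("7.1.1.1736.R01", "7.1.1.1736.R01"))  # True
```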

I don't have a normal LAG directly connected to the 10k, but I have an MC-LAG connected to a 6850X stack, and another stack of 6400s connected to the 6850 stack by a normal LAG.

P.S.
Were you in Calabasas in October :) ? I remember someone with an HP GBIC problem in the lab ;)

Re: mc-lag problems

Post by ydeschoe »

I was in Calabasas, but we didn't know about the problem yet, as we were still in the implementation phase and the blade systems were not yet in.

Maybe ask for an update to the intermediate software version; it may fix some of the problems.

On some older OS6850 stacks (software 6.3.1.999.R01) we also had the problem of the MC-LAG not coming up, but then we configured it as an MC static LAG on the 10k and as a static LAG on the OS6850, and this is working; links have been stable since the installation.

Yves
devnull
Alcatel Unleashed Certified Guru
Posts: 976
Joined: 07 Sep 2010 10:16
Location: Germany

Re: mc-lag problems

Post by devnull »

You have the same linkagg ranges configured on both switches? This is AFAIK wrong.

Peer and local need to be mirrored:
(for Chassis 1) linkagg range local 48-95 peer 0-47 multi-chassis 96-127
(for Chassis 2) linkagg range local 0-47 peer 48-95 multi-chassis 96-127

AFAIK the docs clearly state that. That way it is clear that LAG 62 is on chassis 1 and LAG 42 is on chassis 2.
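
The mirroring rule can be checked mechanically. This is a minimal Python sketch, assuming the ranges are copied by hand from `show linkagg range` on each chassis into plain tuples. The `LinkaggRange` type and `mirror_errors` function are hypothetical names, not part of AOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkaggRange:
    local: tuple          # (min, max) aggregate IDs used only on this chassis
    peer: tuple           # (min, max) aggregate IDs used only on the peer
    multi_chassis: tuple  # (min, max) aggregate IDs shared by both chassis


def mirror_errors(chassis1, chassis2):
    """Return a list of problems; an empty list means the ranges are mirrored."""
    errors = []
    if chassis1.local != chassis2.peer:
        errors.append(
            f"chassis 1 local {chassis1.local} != chassis 2 peer {chassis2.peer}"
        )
    if chassis1.peer != chassis2.local:
        errors.append(
            f"chassis 1 peer {chassis1.peer} != chassis 2 local {chassis2.local}"
        )
    if chassis1.multi_chassis != chassis2.multi_chassis:
        errors.append(
            f"multi-chassis ranges differ: {chassis1.multi_chassis} vs {chassis2.multi_chassis}"
        )
    return errors


# Both chassis configured identically, as in the outputs at the start of the thread.
same = LinkaggRange(local=(48, 95), peer=(0, 47), multi_chassis=(96, 127))
for problem in mirror_errors(same, same):
    print(problem)

# Mirrored as in the two commands above.
c1 = LinkaggRange(local=(48, 95), peer=(0, 47), multi_chassis=(96, 127))
c2 = LinkaggRange(local=(0, 47), peer=(48, 95), multi_chassis=(96, 127))
print(mirror_errors(c1, c2))  # []
```

With identical ranges on both sides, the check reports a local/peer mismatch in each direction, which lines up with the two "Range mismatch" swlog errors posted earlier.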

We encountered a hardware incompatibility between the 6900 and very old 6850s (revision 1 or so) where only one link becomes active. No (software) workaround is possible.

Re: mc-lag problems

Post by m00n »

Yes, I found it in the documentation ;) and changed it, but I still have some problems.

For example, some console traps:

[slot 3] Mon Jan 23 18:17:32  Udld Ni error udld_ni_decode_frame: plGetGportFromSlotUnitDportUnit Error :unit: 8 port: 18- duplicated 2 times!
[slot 2] Mon Jan 23 18:17:32  ipni hw warning 284: invalid gport 0- duplicated 2 times!
[slot 3] Mon Jan 23 18:17:32  ipni hw warning 284: invalid gport 0- duplicated 2 times!
[slot 3] Mon Jan 23 18:17:43  Udld Ni error udld_ni_decode_frame: plGetGportFromSlotUnitDportUnit Error :unit: 8 port: 20- duplicated 5 times!
[slot 3] Mon Jan 23 18:18:32  Udld Ni error udld_ni_decode_frame: plGetGportFromSlotUnitDportUnit Error :unit: 8 port: 18- duplicated 2 times!
[slot 2] Mon Jan 23 18:18:32  ipni hw warning 284: invalid gport 0- duplicated 2 times!
[slot 3] Mon Jan 23 18:18:32  ipni hw warning 284: invalid gport 0- duplicated 2 times!
[slot 3] Mon Jan 23 18:18:43  Udld Ni error udld_ni_decode_frame: plGetGportFromSlotUnitDportUnit Error :unit: 8 port: 20- duplicated 5 times!

The result is that some clients do not get an address from DHCP. In the DHCP server log I see the DHCPOFFER, but the client doesn't get the address. The funny thing is that this is absolutely random :/

Another batch of errors from the log:

Jan 23 15:14:40 (none) local0.err swlogd: portmgr main error(2) : [pmCmmPeerSendViolStatusUpdateToPeer:2172] Invalid Slot Number 1
Jan 23 15:14:40 (none) local0.err swlogd: portmgr main error(2) : [pmCmmPeerSendViolStatusUpdateToPeer:2172] Invalid Slot Number 4
Jan 23 15:14:40 (none) local0.err swlogd: portmgr main error(2) : [pmCmmPeerSendViolStatusUpdateToPeer:2172] Invalid Slot Number 5
Jan 23 15:14:40 (none) local0.err swlogd: portmgr main error(2) : [pmCmmPeerSendViolStatusUpdateToPeer:2172] Invalid Slot Number 6
Jan 23 15:14:40 (none) local0.err swlogd: portmgr main error(2) : [pmCmmPeerSendViolStatusUpdateToPeer:2172] Invalid Slot Number 7
Jan 23 15:14:40 (none) local0.err swlogd: portmgr main error(2) : [pmCmmPeerSendViolStatusUpdateToPeer:2172] Invalid Slot Number 8
bitbin
Member
Posts: 24
Joined: 31 Aug 2010 19:04

Re: mc-lag problems

Post by bitbin »

ydeschoe wrote: On some older OS6850 stacks (software 6.3.1.999.R01) we also had the problem of the MC-LAG not coming up, but then we configured it as an MC static LAG on the 10k and as a static LAG on the OS6850, and this is working; links have been stable since the installation.
I wasn't aware that MC static LAG was supported on the 10k. I tried it in our lab back in September (I don't recall the code version) and it wasn't functioning. Do you happen to have the part of the 10k config that sets up the static MC-LAG?

Thank you

Re: mc-lag problems

Post by devnull »

It may look random because only one specific chassis is responding.
Can you draw a picture of your setup?
Are both the clients and the DHCP servers "behind" MC-LAGs?
Have you realized that, according to the docs, you can't access VIP VLANs (which you probably have with MC-LAGs) from non-MC-LAG ports?
Have you opened a ticket?

Re: mc-lag problems

Post by m00n »

Hello, thanks for the reply.

I don't have Visio available right now to make a picture, so I'll describe it ;)

|10k-1| ---VFL---- |10k-2|
   |                  |      Mc-LAG
   |____6850E-48X_____|
           ||
           || LACP
          6400

So, for example, VLAN 10:

On 10k: IP interface for VLAN 10, tagged on the MC-LAG to the 6850
On 6850: LACP to the 6400, and VLAN 10 tagged on the MC-LAG and on the LACP LAG to the 6400
On 6400: LACP to the 6850, VLAN 10 tagged on this LAG, and an untagged port on the switch

The DHCP server is in another VLAN; the IP helper address is configured on both chassis.

Yesterday I changed the software on the 10k to 1722 and changed the LACP between the 6400 and the 6850 to a static linkagg. So far today it is good and no one has called me with problems, so I hope it helped.

Re: mc-lag problems

Post by devnull »

And is the DHCP server also on VLAN 10 (or is the server in the OS10k)?
Do you have a VIP address or a "normal" address on the OS10k?

Otherwise, there may be a delay problem:
show ip helper stats
-> if it shows delay violations, then set:
ip helper forward delay 0

Could you post the relevant config parts?