
Slow LSP failover on down links

Posted: 19 Aug 2015 13:43
by jwatki
Hey guys,

I have a core network with multiple 7750s. Recently a link went down between two of them, and the failover was either too slow or didn't occur at all; traffic simply recovered when the offending link was restored. Everything I read says switchover should take about 50 ms, but mine took over 2 seconds, slow enough for one of my major customers (a cell tower) to notice and tell me before I even knew about it. The only thing I can find that might be missing on my LSPs is the "adaptive" command. I'm not sure that's the cause, but these are production routers, so I can't go "testing" without a lot of headache and overnight work :) Any suggestions as to what I'm missing? Let me know if you need to see a different part of the config.

Currently running C-10.0.R5

Sample of my config:

Code:

        mpls
            path "igp"
                no shutdown
            exit
            path "follow-igp"
                no shutdown
            exit
            lsp "BSRA01-BSRA02"
                to 172.xx.xx.x
                cspf
                fast-reroute facility
                exit
                primary "igp"
                exit
                no shutdown

Re: Slow LSP failover on down links

Posted: 20 Aug 2015 03:25
by zeips
Hi,
The adaptive option is enabled by default. Can you verify that any protection actually exists for your LSP? If failover didn't occur at all, maybe there are no bypass tunnels. In that case, when the link fails, the LSP has to be recalculated (after waiting for the IGP) and is eventually re-established through a new path. Anyway, check this: show router mpls lsp "BSRA01-BSRA02" path detail
This command shows a lot: whether any protection exists for this LSP and, if so, whether it was ever used. During a failure you should also see the node that reports the failure, and so on...

Re: Slow LSP failover on down links

Posted: 25 Aug 2015 16:57
by jwatki
Thanks for that command, zeips. Here is the output for the LSP that the link failed on. It does show the MBB event on 8/19, which is when I had problems. Do you see anything out of the ordinary, or anything missing?

Code:

Adm State   : Up                                 Oper State  : Up
Path Name   : igp                                Path Type   : Primary
Path Admin  : Up                                 Path Oper   : Up
OutInterface: lag-102:0                          Out Label   : 261063
Path Up Time: 259d 15:12:22                      Path Dn Time: 0d 00:00:00
Retry Limit : 0                                  Retry Timer : 30 sec
RetryAttempt: 0                                  NextRetryIn : 0 sec 
 
Adspec      : Disabled                           Oper Adspec : Disabled
CSPF        : Enabled                            Oper CSPF   : Enabled
CSPF-FL     : Disabled                           Oper CSPF-FL: Disabled
Least Fill  : Disabled                           Oper LeastF*: Disabled
FRR         : Enabled                            Oper FRR    : Enabled
FRR NodePro*: Enabled                            Oper FRR NP : Enabled
FR Hop Limit: 16                                 Oper FRHopL*: 16
Prop Adm Grp: Disabled                           Oper PropAG : Disabled
 
Neg MTU     : 9190                               Oper MTU    : 9190
Bandwidth   : No Reservation                     Oper Bw     : 0 Mbps
Hop Limit   : 255                                Oper HopLim*: 255
Record Route: Record                             Oper RecRou*: Record
Record Label: Record                             Oper RecLab*: Record
SetupPriori*: 7                                  Oper SetupP*: 7
Hold Priori*: 0                                  Oper HoldPr*: 0
Class Type  : 0                                  Oper CT     : 0
Backup CT   : None                               
MainCT Retry: n/a                                
    Rem     :                                    
MainCT Retry: 0                                  
    Limit   :                                    
Include Grps:                                    Oper InclGr*:  
None                                           None
Exclude Grps:                                    Oper ExclGr*:  
None                                           None
 
Adaptive    : Enabled                            Oper Metric : 4
Preference  : n/a                                
Path Trans  : 14                                 CSPF Queries: 37830
Failure Code: noError                            Failure Node: n/a
ExplicitHops:                                    
    No Hops Specified
Actual Hops :                                    
    172.xx @ n               Record Label    : N/A
 -> 172.xx @                 Record Label    : 261063
 -> 172.xx                  Record Label    : 260988
ComputedHops:                                    
    172.xx   -> 172.xx    -> 172.xx  
ResigEligib*: False                              
LastResignal: 08/25/2015 16:42:35                CSPF Metric : 4
Last MBB    :
 MBB Type   : TimerBasedResignal                 MBB State   : Success
 Ended At   : 08/19/2015 03:55:48                Old Metric  : 8
 Signaled BW: 0 Mbps 

Re: Slow LSP failover on down links

Posted: 28 Aug 2015 03:53
by zeips
It looks fine actually:
Actual Hops :
172.xx @ n Record Label : N/A
-> 172.xx @ Record Label : 261063
-> 172.xx Record Label : 260988

You have both link and node protection on your head end, so signaling of the bypass tunnel was successful. The MBB succeeded as well.
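
If you want to run the same check across many LSPs without eyeballing each one, here is a rough Python sketch that reads the "Actual Hops" lines and lists hops with no protection flag. It assumes the flag meanings from the legend SR OS prints with this command ("@" protection available, "#" protection in use, "n" node protected), so verify them against the legend on your release. The helper names parse_hops and unprotected_hops are made up for this example, and the sample text is just the masked output from this thread.

```python
import re

# Matches one "Actual Hops" line, e.g. " -> 172.xx @ n   Record Label : 261063".
# Groups: hop address, optional flag characters, recorded label.
HOP_RE = re.compile(
    r"^\s*(?:->\s*)?(?P<addr>\S+)\s+(?P<flags>.*?)\s*"
    r"Record Label\s*:\s*(?P<label>\S+)\s*$"
)

# Masked sample taken from the output posted in this thread.
SAMPLE = """\
    172.xx @ n               Record Label    : N/A
 -> 172.xx @                 Record Label    : 261063
 -> 172.xx                  Record Label    : 260988
"""


def parse_hops(text):
    """Return one dict per hop line: address, list of flags, recorded label."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append({
                "addr": m.group("addr"),
                "flags": m.group("flags").split(),
                "label": m.group("label"),
            })
    return hops


def unprotected_hops(hops):
    """Addresses of hops showing neither '@' nor '#'.

    The last hop is the egress, which has nothing downstream to protect,
    so it is skipped.
    """
    return [
        h["addr"]
        for h in hops[:-1]
        if "@" not in h["flags"] and "#" not in h["flags"]
    ]


if __name__ == "__main__":
    hops = parse_hops(SAMPLE)
    for h in hops:
        print(h["addr"], h["flags"], h["label"])
    print("hops without protection:", unprotected_hops(hops))
```

Feed it only the "Actual Hops" section of the output. An empty list from unprotected_hops means every hop except the egress reported protection at the time the command was run; it says nothing about how fast the switchover was.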

So MPLS looks fine, but this doesn't prove there weren't any issues. Maybe the problem occurred somewhere at the service layer or in the application itself. It would be good to test this in the lab and check everything step by step.
If anything else comes to mind, I will let you know.