Monday, July 4, 2022

Wireless Catalyst 9800 WLC KPIs, Part 1


Part 1 of the 3-part Wireless Catalyst 9800 WLC KPIs series

When working in critical wireless infrastructures, it is important to be proactive and determine in advance whether there is any potential issue that could impact the end-client experience. Wireless Catalyst 9800 WLC KPIs will help with that task.

In this blog, I'll share a systematic approach plus a list of commands that I've used while providing support in the NOC for one of the biggest worldwide wireless events. The idea is to keep a close eye on the Catalyst 9800 WLC by monitoring its Key Performance Indicators (KPIs).

KPI outputs can be collected periodically to create a baseline while the network is working well. This makes it easier later to find any deviation by comparing new outputs with previously collected ones.
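As a minimal sketch of that baseline idea, assuming KPI values have already been extracted from the command outputs into name/value pairs, the Python below flags any KPI that moved by more than a relative tolerance. The function name, the KPI names and the 20% tolerance are hypothetical choices for illustration, not Cisco defaults.

```python
from __future__ import annotations


def find_deviations(baseline: dict[str, float], current: dict[str, float],
                    tolerance: float = 0.20) -> dict[str, tuple]:
    """Return {kpi: (baseline_value, current_value)} for every KPI that changed by
    more than `tolerance` (relative to the baseline), appeared, or disappeared."""
    deviations = {}
    for name in sorted(baseline.keys() | current.keys()):
        base, now = baseline.get(name), current.get(name)
        if base is None or now is None:
            deviations[name] = (base, now)  # KPI is new or missing
        elif base == 0:
            if now != 0:
                deviations[name] = (base, now)  # any change from zero counts
        elif abs(now - base) / abs(base) > tolerance:
            deviations[name] = (base, now)
    return deviations


# Hypothetical snapshots: baseline values taken from the outputs in this post,
# "current" values invented to show a detected change.
baseline = {"switchovers": 1, "cpu_5min_pct": 16, "mobility_peers_up": 2}
current = {"switchovers": 2, "cpu_5min_pct": 17, "mobility_peers_up": 2}
print(find_deviations(baseline, current))  # {'switchovers': (1, 2)}
```

A single relative tolerance is the simplest choice. For counters such as switchovers, any increase deserves a look, so a per-KPI threshold may fit better.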

I've divided WLC KPIs into six different buckets or areas:

  • WLC checks
  • Connection with other devices
  • AP checks
  • RF checks
  • Client checks
  • Packet Drops

KPIs will help us spot issues in any of these six areas. This blog covers WLC checks and Connection with other devices. Two more blogs will follow, covering AP checks, RF checks, Client checks, and Packet Drops.

WLC checks

I usually start by checking the WLC, since it is the most important part. Any issue on the controller will soon cascade into problems with APs and clients. In other words, the idea is to follow a top-down approach.

While reviewing the health state of the WLC, I first confirm that the WLC is running the intended version and is in install mode. Install mode ensures that the controller boots faster and with a reduced memory footprint. After that, I check the uptime of the WLC to see whether any reload has occurred. Use the command: "show version | i uptime|Installation mode|Cisco IOS Software"

Gladius1#show version | i uptime|Installation mode|Cisco IOS Software
Cisco IOS Software [Amsterdam], C9800 Software (C9800_IOSXE-K9), Version 17.3.5a, RELEASE SOFTWARE (fc2)
Gladius1 uptime is 2 weeks, 5 days, 21 hours, 30 minutes
Installation mode is INSTALL

Check the expected release, the uptime, and that the WLC is running in install mode.
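To track these values in a baseline, a small parser can pull the release, install mode and uptime out of the filtered output. This is a minimal sketch that assumes the three-line format shown above. The function and dictionary names are hypothetical, and the uptime conversion treats a year as 365 days.

```python
import re

# Sample output copied from the "show version" filter above.
SHOW_VERSION = """\
Cisco IOS Software [Amsterdam], C9800 Software (C9800_IOSXE-K9), Version 17.3.5a, RELEASE SOFTWARE (fc2)
Gladius1 uptime is 2 weeks, 5 days, 21 hours, 30 minutes
Installation mode is INSTALL
"""

# Minutes per unit in the uptime string (a year is approximated as 365 days).
UNIT_MINUTES = {"year": 525600, "week": 10080, "day": 1440, "hour": 60, "minute": 1}


def parse_show_version(output: str) -> dict:
    release = re.search(r"Version (\S+),", output)
    mode = re.search(r"Installation mode is (\S+)", output)
    uptime = re.search(r"uptime is (.+)", output)
    minutes = 0
    if uptime:
        for value, unit in re.findall(r"(\d+) (year|week|day|hour|minute)s?", uptime.group(1)):
            minutes += int(value) * UNIT_MINUTES[unit]
    return {
        "release": release.group(1) if release else None,
        "install_mode": bool(mode) and mode.group(1) == "INSTALL",
        "uptime_minutes": minutes,
    }


info = parse_show_version(SHOW_VERSION)
print(info)  # {'release': '17.3.5a', 'install_mode': True, 'uptime_minutes': 28650}
```

With uptime stored as minutes, detecting a reload becomes a simple comparison: if the new value is lower than the previous one, the WLC reloaded in between.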

For a Catalyst 9800 WLC deployed in High Availability, which is highly recommended for critical deployments, we first need to verify that the HA pair stack is formed and in a standby-hot state. Secondly, check the stack uptime and each member's individual uptime. Thirdly, identify the number of switchovers between active and standby. Use the command: "show redundancy | i ptime|Location|Current Software state|Switchovers".

Gladius1#show redundancy | i ptime|Location|Current Software state|Switchovers
       Available system uptime = 2 weeks, 1 day, 2 hours, 48 minutes
Switchovers system experienced = 1
               Active Location = slot 1
        Current Software state = ACTIVE
       Uptime in current state = 7 hours, 10 minutes
              Standby Location = slot 2
        Current Software state = STANDBY HOT
       Uptime in current state = 7 hours, 4 minutes

Check the stack uptime, the number of switchovers, and the uptime of each member. Here, a switchover occurred 7 hours ago: slot 1 is the new active, and slot 2 reloaded.
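The same checks can be scripted. The sketch below assumes the "key = value" layout shown above and that the state and uptime lines repeat once per member. It returns the switchover count, the member states and uptimes, and whether the pair is ACTIVE plus STANDBY HOT. The function name is hypothetical.

```python
# Sample output copied from the "show redundancy" filter above.
SHOW_REDUNDANCY = """\
       Available system uptime = 2 weeks, 1 day, 2 hours, 48 minutes
Switchovers system experienced = 1
               Active Location = slot 1
        Current Software state = ACTIVE
       Uptime in current state = 7 hours, 10 minutes
              Standby Location = slot 2
        Current Software state = STANDBY HOT
       Uptime in current state = 7 hours, 4 minutes
"""


def parse_show_redundancy(output: str) -> dict:
    # Each line is "key = value"; the state and uptime keys repeat once per member.
    pairs = [tuple(part.strip() for part in line.split("=", 1))
             for line in output.splitlines() if "=" in line]
    states = [value for key, value in pairs if key == "Current Software state"]
    uptimes = [value for key, value in pairs if key == "Uptime in current state"]
    switchovers = next((int(value) for key, value in pairs
                        if key == "Switchovers system experienced"), None)
    return {
        "switchovers": switchovers,
        "member_states": states,
        "member_uptimes": uptimes,
        # Healthy pair: one ACTIVE member and one STANDBY HOT member.
        "ha_ok": sorted(states) == ["ACTIVE", "STANDBY HOT"],
    }


print(parse_show_redundancy(SHOW_REDUNDANCY)["ha_ok"])  # True
```

If the switchover count is higher than in the baseline, a switchover happened since the last collection, even when the pair is healthy again.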

In HA deployments, the recommendation is to use the RMI (Redundancy Management Interface) feature. This allows monitoring both the active and the standby through the Wireless Management Interface (WMI) and the Redundancy Port (RP). After that, we should enable the Default-gateway Check to confirm that both the active and the standby can reach the gateway. Here is a link to the 9800 High Availability deployment guide.

The next step is to check for any WLC crashes and determine whether a crash matches the time of a switchover or an unexpected reload. When a WLC crash occurs, it should generate a core dump or a system report. These files are stored on the WLC harddisk for the 9800-40/80 or in bootflash for the 9800-L/CL. Use the commands: "dir harddisk:/core/ | i core|system-report" and "dir stby-harddisk:/core/ | i core|system-report", replacing harddisk with bootflash for the 9800-L/CL.

Gladius1#dir harddisk:/core/ | i core|system-report
Directory of harddisk:/core/
3661831  -rw-         11260562  Mar 25 2022 22:07:12 +01:00  Gladius1_1_RP_0_wncd_16574_20220325-220708-CET.core.gz
3661830  -rw-            48528  Mar 25 2022 21:57:20 +01:00  Gladius1_1_RP_0-system-report_20220325-215658-CET-info.txt
3661829  -rw-        126548098  Mar 25 2022 21:57:10 +01:00  Gladius1_1_RP_0-system-report_20220325-215658-CET.tar.gz
3661828  -rw-            57191   Mar 9 2021 16:21:48 +01:00  Gladius1_1_RP_0-system-report_20210309-161907-CET-info.txt
3661827  -rw-        504311304   Mar 9 2021 16:20:51 +01:00  Gladius1_1_RP_0-system-report_20210309-161907-CET.tar.gz
3661826  -rw-         11714625  Nov 19 2020 10:35:54 +01:00  Gladius1_1_RP_0_wncd_30240_20201119-103550-CET.core.gz

Check for cores and system reports. Here, two cores in the wncd process and two system reports have occurred.

If we observe any core dump, we can identify the impacted process from the file name. For example, the crash in WLC_1_RP_0_wncd_16574_20220325-220708-CET.core.gz occurred in the "wncd" process, and the crash in WLC_1_RP_0_dbm_14119_20201104-092800-CET.core.gz occurred in the "dbm" process. Open a TAC case to identify the root cause of the crash.
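To correlate cores with switchover or reload times, the process name and timestamp can be extracted from the file name. This is a minimal sketch. The layout in the regular expression is inferred only from the example file names in this post and is not a documented format. The timestamp is kept as local time, with the zone abbreviation returned separately.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the file names in this post (not a documented format):
# <host>_<slot>_RP_<n>_<process>_<pid>_<YYYYmmdd-HHMMSS>-<TZ>.core.gz
CORE_RE = re.compile(
    r"_RP_\d+_(?P<process>.+?)_(?P<pid>\d+)_(?P<stamp>\d{8}-\d{6})-(?P<tz>[A-Z]+)\.core\.gz$"
)


def parse_core_name(filename: str):
    """Return process, PID and local timestamp for a core file, or None otherwise."""
    m = CORE_RE.search(filename)
    if m is None:
        return None  # e.g. a system-report file
    return {
        "process": m.group("process"),
        "pid": int(m.group("pid")),
        "time": datetime.strptime(m.group("stamp"), "%Y%m%d-%H%M%S"),
        "tz": m.group("tz"),
    }


for name in ["WLC_1_RP_0_wncd_16574_20220325-220708-CET.core.gz",
             "WLC_1_RP_0_dbm_14119_20201104-092800-CET.core.gz"]:
    core = parse_core_name(name)
    print(core["process"], core["time"])
# wncd 2022-03-25 22:07:08
# dbm 2020-11-04 09:28:00
```

The returned time can then be compared with "Uptime in current state" from "show redundancy" to see whether a crash lines up with a switchover.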

Once we have verified crashes or unexpected reloads, we can proceed by reviewing WLC CPU and memory utilization. For CPU monitoring, we need to run the command several times and detect whether any process shows CPU above 80% consistently rather than as a spike. I prefer to execute the command with the sorted keyword, so that you can focus on processes with high CPU first. We have seen cases where consistently high CPU in the WNCD process led to AP disconnections. However, releases 17.3.5 and 17.6.3 have received additional hardening, with the objective of protecting AP CAPWAP connections when high CPU occurs. Use the command: "show processes cpu platform sorted | ex 0%      0%      0%"

Gladius1#show processes cpu platform sorted | ex 0%      0%      0%
CPU utilization for five seconds:  14%, one minute:  16%, five minutes:  16%
Core 0: CPU utilization for five seconds: 10%, one minute:  7%, five minutes: 11%
Core 1: CPU utilization for five seconds:  6%, one minute: 28%, five minutes: 12%
Core 2: CPU utilization for five seconds: 48%, one minute: 55%, five minutes: 68%
Core 3: CPU utilization for five seconds: 20%, one minute:  8%, five minutes: 11%
Core 4: CPU utilization for five seconds: 38%, one minute: 13%, five minutes: 17%
Core 5: CPU utilization for five seconds: 14%, one minute: 11%, five minutes: 13%
Core 6: CPU utilization for five seconds:  9%, one minute: 20%, five minutes: 23%
Core 7: CPU utilization for five seconds:  5%, one minute:  8%, five minutes: 18%
Core 8: CPU utilization for five seconds:  7%, one minute: 50%, five minutes: 34%
Core 9: CPU utilization for five seconds: 100%, one minute: 58%, five minutes: 27%
Core 10: CPU utilization for five seconds: 27%, one minute: 17%, five minutes: 25%
   Pid    PPid    5Sec    1Min    5Min  Status        Size  Name
--------------------------------------------------------------------------------
 19056   19037     99%     99%     99%  R          7525896  wncd_0               
 21922   21913     96%     97%     99%  R           127488  smand                
 19460   19451     37%     34%     33%  R          6363828  wncd_2               
 19604   19596     18%     19%     18%  R          4556132  wncd_3

Check CPU utilization per core and per process. Here, the wncd_0 and smand processes are close to 100% CPU utilization.
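The "consistently, not as a spike" rule can be expressed as "above 80% in all three windows". The sketch below applies that rule to the process table shown above. It assumes the 8-column layout from this output. The function name is hypothetical, and the 80% default comes from the text.

```python
from __future__ import annotations

# Process table copied from the "show processes cpu platform sorted" output above.
CPU_SORTED = """\
   Pid    PPid    5Sec    1Min    5Min  Status        Size  Name
--------------------------------------------------------------------------------
 19056   19037     99%     99%     99%  R          7525896  wncd_0
 21922   21913     96%     97%     99%  R           127488  smand
 19460   19451     37%     34%     33%  R          6363828  wncd_2
 19604   19596     18%     19%     18%  R          4556132  wncd_3
"""


def sustained_high_cpu(output: str, threshold: int = 80) -> list[str]:
    """Names of processes above `threshold` % in all three windows (5 s, 1 min,
    5 min), i.e. consistently high rather than a short spike."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Process rows have 8 columns and start with a numeric PID.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        windows = [int(f.rstrip("%")) for f in fields[2:5]]
        if min(windows) > threshold:
            flagged.append(fields[7])
    return flagged


print(sustained_high_cpu(CPU_SORTED))  # ['wncd_0', 'smand']
```

Since the command should be run several times, one option is to report only the processes that are flagged in every run, for example by intersecting the results.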

The Catalyst 9800-CL and 9800-L platforms use CPU cores for data forwarding, so high CPU in ucode_pkt_PPE0 is expected. To evaluate data plane performance on these platforms, use the command: "show platform hardware chassis active qfp datapath utilization | i Load"

Gladius1#show platform hardware chassis active qfp datapath utilization | i Load
CPP 0: Subdev 0            5 secs        1 min        5 min       60 min
Processing: Load (pct)            4            3            4            3
Check the datapath load percentage.

While checking memory utilization, we need to monitor whether device utilization is too high. Then, identify whether any processes are holding memory and not releasing it over time (leak). Use the commands: "show platform resources" (basic), "show processes memory platform sorted", and "show processes memory platform accounting" (advanced).

Gladius1#show platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource                 Usage                 Max             Warning         Critical        State
----------------------------------------------------------------------------------------------------
RP0 (ok, active)                                                                               H
Control Processor      0.79%                 100%            80%             90%             H
DRAM                   4839MB(15%)           31670MB         88%             93%             H
harddisk               0MB(0%)               0MB             80%             85%             H
ESP0(ok, active)                                                                               H
QFP                                                                                           H
TCAM                   68cells(0%)           1048576cells    65%             85%             H
DRAM                   420162KB(20%)         2097152KB       85%             95%             H
IRAM                   13738KB(10%)          131072KB        85%             95%             H
CPU Utilization        0.00%                 100%            90%             95%             H

Confirm that the state is healthy for all metrics. Review Control Processor and memory utilization.
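Because every row ends with its State acronym, checking for anything other than H is easy to automate. A minimal sketch, assuming the last whitespace-separated column of each row is the State field as shown above. The function name is hypothetical.

```python
from __future__ import annotations

# Rows copied from the "show platform resources" output above.
PLATFORM_RESOURCES = """\
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource                 Usage                 Max             Warning         Critical        State
RP0 (ok, active)                                                                               H
Control Processor      0.79%                 100%            80%             90%             H
DRAM                   4839MB(15%)           31670MB         88%             93%             H
harddisk               0MB(0%)               0MB             80%             85%             H
ESP0(ok, active)                                                                               H
QFP                                                                                           H
TCAM                   68cells(0%)           1048576cells    65%             85%             H
DRAM                   420162KB(20%)         2097152KB       85%             95%             H
IRAM                   13738KB(10%)          131072KB        85%             95%             H
CPU Utilization        0.00%                 100%            90%             95%             H
"""


def unhealthy_resources(output: str) -> list[str]:
    """Rows whose last column (State) is W (Warning) or C (Critical)."""
    return [line.strip() for line in output.splitlines()
            if line.split() and line.split()[-1] in ("W", "C")]


print(unhealthy_resources(PLATFORM_RESOURCES))  # []
```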

Gladius1#show processes memory platform sorted
System memory: 15869340K total, 6152000K used, 9717340K free,
Lowest: 9717340K
Pid    Text      Data   Stack   Dynamic       RSS              Name
----------------------------------------------------------------------
3546  367768   1404580     136       488   1404580   linux_iosd-imag
23602   22335    449968     136      1052    449968    ucode_pkt_PPE0
24525     847    437624     136     46628    437624            wncd_0
24004     160    373176    3956      6400    373176           wncmgrd
26358     128    344868     136    136628    344868         mobilityd

Check the free memory available. Identify the top processes holding the most memory.
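Both values from this output, the free-memory percentage and the largest processes by RSS, can be stored in the baseline. This is a sketch that assumes the "System memory" line and the 7-column table shown above. The function name and the top-3 default are hypothetical.

```python
import re

# Sample copied from the "show processes memory platform sorted" output above.
MEM_SORTED = """\
System memory: 15869340K total, 6152000K used, 9717340K free,
Lowest: 9717340K
Pid    Text      Data   Stack   Dynamic       RSS              Name
----------------------------------------------------------------------
3546  367768   1404580     136       488   1404580   linux_iosd-imag
23602   22335    449968     136      1052    449968    ucode_pkt_PPE0
24525     847    437624     136     46628    437624            wncd_0
24004     160    373176    3956      6400    373176           wncmgrd
26358     128    344868     136    136628    344868         mobilityd
"""


def memory_summary(output: str, top: int = 3) -> dict:
    m = re.search(r"(\d+)K total, (\d+)K used, (\d+)K free", output)
    total, used, free = (int(g) for g in m.groups())
    procs = []
    for line in output.splitlines():
        fields = line.split()
        # Process rows: Pid, Text, Data, Stack, Dynamic, RSS, Name.
        if len(fields) == 7 and fields[0].isdigit():
            procs.append((fields[6], int(fields[5])))  # (name, RSS value as reported)
    procs.sort(key=lambda p: p[1], reverse=True)
    return {"free_pct": round(100 * free / total, 1), "top_rss": procs[:top]}


print(memory_summary(MEM_SORTED)["free_pct"])  # 61.2
```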

Gladius1#show processes memory platform accounting
Hourly Stats
course of                 callsite_ID(bytes)  max_diff_bytes   callsite_ID(calls)  max_diff_calls   tracekey                                  timestamp(UTC)
------------------------------------------------------------------------------------------------------------------------------------------------------------
cpp_cp_svr_fp_0         2887897091          7243446          2887897092          1133             1#e4bd31e0c668be2b8786dec9fcc99486        2022-05-25 14:04
ndbmand_rp_0            3571094529          5453112          3570931712          1119             1#00c5632bf072231d06cf80b8ccc37392        2022-05-09 21:52
wncd_4_rp_0             2556049411          3059712          3028615169          227              1#9f4792f37292983824f5bb97d7e2167c        2022-05-10 14:54
wncd_0_rp_0             2556049411          1990656          3028615168          680              1#9f4792f37292983824f5bb97d7e2167c        2022-05-25 11:05
wncd_2_rp_0             2556049411          1953792          3028615169          682              1#9f4792f37292983824f5bb97d7e2167c        2022-05-13 14:01
smand_rp_0              2887895047          1491984          3028615168          89               1#eaf6dd665e73b1edeee32fb9c5ac8639        2022-05-10 14:54

Check the top processes and the number of calls. Stats are available hourly, daily, weekly, and monthly.
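To make the "holding memory and not releasing it over time" check concrete, RSS values from periodic "show processes memory platform sorted" snapshots can be compared across runs. This is a deliberately simple heuristic and a hypothetical helper: it flags a process only if its RSS grew in every consecutive snapshot. It does not replace the accounting output or a TAC analysis.

```python
from __future__ import annotations


def growing_processes(samples: list[dict[str, int]], min_samples: int = 3) -> list[str]:
    """samples: periodic {process_name: RSS} snapshots, oldest first.
    Returns processes whose RSS grew in every consecutive snapshot."""
    if len(samples) < min_samples:
        return []  # too few points to call it a trend
    common = set(samples[0]).intersection(*samples[1:])
    growing = []
    for name in sorted(common):
        values = [snapshot[name] for snapshot in samples]
        if all(later > earlier for earlier, later in zip(values, values[1:])):
            growing.append(name)
    return growing


# Hypothetical hourly snapshots; only the first matches the output above.
samples = [
    {"wncd_0": 437624, "mobilityd": 344868},
    {"wncd_0": 441200, "mobilityd": 344868},
    {"wncd_0": 452900, "mobilityd": 344100},
]
print(growing_processes(samples))  # ['wncd_0']
```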

As a final controller health check, we can validate the hardware: check the status of power supplies, fans, SFPs, and temperature (physical WLCs only). Likewise, review the license status and confirm that the correct number of licenses is in use. Use the commands: "show platform", "show inventory", "show environment", and "show license summary | i Status:"

Gladius1#show platform
Chassis type: C9800-40-K9
Slot      Type                State                 Insert time (ago)
--------- ------------------- --------------------- -----------------
0         C9800-40-K9         ok                    2w5d
0/0      BUILT-IN-4X10G/1G   ok                    2w5d
R0        C9800-40-K9         ok, active            2w5d
F0        C9800-40-K9         ok, active            2w5d
P0        C9800-AC-750W-R     ok                    2w5d
P1        Unknown             empty                 never
P2        C9800-40-K9-FAN     ok                    2w5d

Slot      CPLD Version        Firmware Version
--------- ------------------- ---------------------------------------
0         19030712            16.10(2r)
R0        19030712            16.10(2r)
F0        19030712            16.10(2r)

Gladius1#show inventory
NAME: "Chassis 1", DESCR: "Cisco C9800-40-K9 Chassis"
PID: C9800-40-K9       , VID: V03  , SN: TTM242504SR
NAME: "Chassis 1 Power Supply Module 0", DESCR: "Cisco Catalyst 9800-40 750W AC Power Supply Reverse Air"
PID: C9800-AC-750W-R   , VID: V01  , SN: ART2418F0GJ

NAME: "Chassis 1 Fan Tray", DESCR: "Cisco C9800-40-K9 Fan Tray"
PID: C9800-40-K9-FAN   , VID:      , SN:
NAME: "module 0", DESCR: "Cisco C9800-40-K9 Modular Interface Processor"
PID: C9800-40-K9       , VID:      , SN:
NAME: "SPA subslot 0/0", DESCR: "4-port 10G/1G multirate Ethernet Port Adapter"
PID: BUILT-IN-4X10G/1G , VID: N/A  , SN: JAE87654321
NAME: "subslot 0/0 transceiver 0", DESCR: "10GE LR"
PID: SFP-10G-LR          , VID: V02  , SN: AVD2141KCFB
NAME: "module R0", DESCR: "Cisco C9800-40-K9 Route Processor"
PID: C9800-40-K9       , VID: V03  , SN: TTM242504SR
NAME: "module F0", DESCR: "Cisco C9800-40-K9 Embedded Services Processor"
PID: C9800-40-K9       , VID:      , SN:
NAME: "Crypto Asic F0/0", DESCR: "Asic 0 of module F0"
PID: NOT               , VID: V01  , SN: JAE242711XF

Gladius1#show environment
Number of Critical alarms:  0
Number of Major alarms:     0
Number of Minor alarms:     0

Check power supplies, fan status, SFPs, SPAs, and any alarms.
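A quick way to review the "show platform" slot table and the "show environment" alarm counters is sketched below. It assumes table columns are separated by two or more spaces, as in the output above. Function names are hypothetical. The empty P1 power-supply slot is reported as well, because an empty slot may be expected or may mean a missing redundant PSU, and only you can decide which.

```python
from __future__ import annotations

import re

# Samples copied from the "show platform" and "show environment" outputs above.
SHOW_PLATFORM = """\
Slot      Type                State                 Insert time (ago)
--------- ------------------- --------------------- -----------------
0         C9800-40-K9         ok                    2w5d
0/0      BUILT-IN-4X10G/1G   ok                    2w5d
R0        C9800-40-K9         ok, active            2w5d
F0        C9800-40-K9         ok, active            2w5d
P0        C9800-AC-750W-R     ok                    2w5d
P1        Unknown             empty                 never
P2        C9800-40-K9-FAN     ok                    2w5d
"""

SHOW_ENVIRONMENT = """\
Number of Critical alarms:  0
Number of Major alarms:     0
Number of Minor alarms:     0
"""


def slots_not_ok(output: str) -> dict[str, str]:
    """Slots whose State column does not start with 'ok'."""
    flagged = {}
    for line in output.splitlines():
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) == 4 and cols[0] != "Slot":
            slot, _type, state, _inserted = cols
            if not state.startswith("ok"):
                flagged[slot] = state
    return flagged


def alarm_counts(output: str) -> dict[str, int]:
    return {sev: int(n) for sev, n in re.findall(r"Number of (\w+) alarms:\s+(\d+)", output)}


print(slots_not_ok(SHOW_PLATFORM))     # {'P1': 'empty'}
print(alarm_counts(SHOW_ENVIRONMENT))  # {'Critical': 0, 'Major': 0, 'Minor': 0}
```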

One example of these Catalyst 9800 WLC KPIs helping to identify an issue was a customer-facing High Availability setup problem between two WLCs. By reviewing the version and the hardware installed in both WLCs, we identified a difference in SPA adapters that was preventing the WLCs from pairing as HA.
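A hardware mismatch like this can be spotted by diffing the PIDs reported by "show inventory" on both controllers. The sketch assumes each NAME line is immediately followed by its PID line, as in the output above, and compares PIDs only, because serial numbers always differ. The second controller's SPA PID is a made-up placeholder, not a real part number. Function and variable names are hypothetical.

```python
from __future__ import annotations

import re


def inventory_pids(output: str) -> dict[str, str]:
    """Map each NAME in "show inventory" output to the PID on the following line."""
    return dict(re.findall(r'NAME: "([^"]+)".*\nPID: (\S+)', output))


def diff_inventories(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Components whose PID differs or exists on only one side (serials ignored)."""
    return {name: (a.get(name), b.get(name))
            for name in sorted(a.keys() | b.keys())
            if a.get(name) != b.get(name)}


# Excerpt from the "show inventory" output above.
WLC_A = """\
NAME: "SPA subslot 0/0", DESCR: "4-port 10G/1G multirate Ethernet Port Adapter"
PID: BUILT-IN-4X10G/1G , VID: N/A  , SN: JAE87654321
NAME: "module R0", DESCR: "Cisco C9800-40-K9 Route Processor"
PID: C9800-40-K9       , VID: V03  , SN: TTM242504SR
"""
# Hypothetical HA peer with a different (placeholder) SPA PID.
WLC_B = WLC_A.replace("PID: BUILT-IN-4X10G/1G", "PID: HYPOTHETICAL-SPA")

print(diff_inventories(inventory_pids(WLC_A), inventory_pids(WLC_B)))
# {'SPA subslot 0/0': ('BUILT-IN-4X10G/1G', 'HYPOTHETICAL-SPA')}
```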

Connection with other devices checks

In addition to WLC health, we can check the status of the WLC's connections. The most important connections are mobility with other WLCs for inter-WLC roams, telemetry with DNAC/PI for monitoring and automation, and NMSP with DNA Spaces/CMX for location services. We need to ensure that these connections are established and working well.

Verify that mobility tunnels with other WLCs are up and using the correct encryption and MTU, and that clients can roam or be anchored to other WLCs. If tunnels are down, we can determine whether the issue is in the control tunnel (UDP port 16666), in the data tunnel (UDP port 16667), or in both. Use the command: "show wireless mobility sum"

Gladius1#sh wireless mobility summary
Wireless Management VLAN: 25
Wireless Management IP Address: 192.168.25.25
Mobility Control Message DSCP Value: 48
Mobility Keepalive Interval/Count: 10/3
Mobility Group Name: eWLC3
Mobility Multicast Ipv4 address: 0.0.0.0
Mobility MAC Address: 001e.f62a.46ff
Mobility Domain Identifier: 0x2e47
Controllers configured in the Mobility Domain:
 IP          Public Ip    MAC Address      Group Name   Multicast IPv4    Multicast IPv6  Status  PMTU
----------------------------------------------------------------------------------------------------------
192.168.25.25  N/A          001e.f62a.46ff   eWLC3        0.0.0.0         ::              N/A     N/A
192.168.5.35  192.168.5.35  00b0.e1f2.f480   3500-2       0.0.0.0         ::              Up     1385
192.168.25.23 192.168.25.23 706d.1535.6b0b   DAO2         0.0.0.0         :: Control And Data Path Down
192.168.25.33 192.168.25.33 f4bd.9e57.ff6b   5500         0.0.0.0         ::              Up     1005

Check for mobility tunnels that are down and for low PMTU.
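Both conditions can be checked from the peer table. This is a sketch under a few assumptions: the column order is the one shown above, group names contain no spaces, and the 1300-byte PMTU floor is a hypothetical threshold that you should adapt to your network. The local controller's N/A row is ignored.

```python
from __future__ import annotations

# Peer rows copied from the "show wireless mobility summary" output above.
MOBILITY_PEERS = """\
192.168.25.25  N/A          001e.f62a.46ff   eWLC3        0.0.0.0         ::              N/A     N/A
192.168.5.35  192.168.5.35  00b0.e1f2.f480   3500-2       0.0.0.0         ::              Up     1385
192.168.25.23 192.168.25.23 706d.1535.6b0b   DAO2         0.0.0.0         :: Control And Data Path Down
192.168.25.33 192.168.25.33 f4bd.9e57.ff6b   5500         0.0.0.0         ::              Up     1005
"""


def mobility_issues(rows: str, min_pmtu: int = 1300) -> dict[str, str]:
    """Map peer IP to a problem description: tunnel not Up, or PMTU below min_pmtu."""
    issues = {}
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue
        ip, rest = fields[0], fields[6:]  # rest = status words + optional PMTU
        *status_words, last = rest
        if last.isdigit() or last == "N/A":
            status, pmtu = " ".join(status_words), (int(last) if last.isdigit() else None)
        else:
            status, pmtu = " ".join(rest), None  # e.g. "Control And Data Path Down"
        if status not in ("Up", "N/A"):
            issues[ip] = status
        elif pmtu is not None and pmtu < min_pmtu:
            issues[ip] = f"low PMTU {pmtu}"
    return issues


print(mobility_issues(MOBILITY_PEERS))
# {'192.168.25.23': 'Control And Data Path Down', '192.168.25.33': 'low PMTU 1005'}
```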

If we have DNAC for Assurance or Provisioning, we can confirm that the DNAC NETCONF connection is established, and afterward verify that telemetry statistics for the WLC, APs, and clients are updated in DNAC. Use the command: "show telemetry internal connection". After 17.7, this command has been replaced by "show telemetry connection all".

Gladius2#show telemetry internal connection
Load for five secs: 29%/5%; one minute: 4%; five minutes: 2%
Time source is NTP, 10:21:45.942 CET Wed Nov 4 2020
Telemetry connections
Index Peer Address               Port  VRF Source Address             State
----- -------------------------- ----- --- -------------------------- ----------
    1 192.168.0.105              25103   0 192.168.25.42              Active

Check the telemetry connection state.
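For the pre-17.7 output shown above, a connection is worth investigating whenever its State column is not Active. A minimal sketch, assuming the 6-column row layout from this output. The function name is hypothetical, and the newer "show telemetry connection all" output may need a different parser.

```python
from __future__ import annotations

# Table copied from the "show telemetry internal connection" output above.
TELEMETRY = """\
Telemetry connections
Index Peer Address               Port  VRF Source Address             State
----- -------------------------- ----- --- -------------------------- ----------
    1 192.168.0.105              25103   0 192.168.25.42              Active
"""


def non_active_peers(output: str) -> list[str]:
    """Peer addresses whose State column is anything other than Active."""
    peers = []
    for line in output.splitlines():
        fields = line.split()
        # Connection rows: index, peer, port, VRF, source, state.
        if len(fields) == 6 and fields[0].isdigit() and fields[5] != "Active":
            peers.append(fields[1])
    return peers


print(non_active_peers(TELEMETRY))  # []
```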

If we are using DNA Spaces for location, we can first confirm the NMSP connection status and the number of packets transmitted and received. Secondly, check the list of clients in the WLC probing database. Lastly, confirm that client location is updated in DNA Spaces. Use the command "show nmsp status"

Gladius1#show nmsp status
NMSP Status
-----------
DNA Spaces/CMX IP Address  Active    Tx Echo Resp  Rx Echo Req   Tx Data     Rx Data     Transport
----------------------------------------------------------------------------------------------------------
192.168.0.65                  Active    693870        693870        16833737    181084      TLS      
192.168.0.66                  Inactive  21            21            222         7           TLS

Check for inactive servers and for mismatches between echo Tx and Rx counters.
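Both NMSP checks, inactive servers and mismatched echo counters, fit in a few lines. This sketch assumes the 7-column rows shown above and uses a hypothetical function name. It compares "Tx Echo Resp" with "Rx Echo Req" for each server.

```python
from __future__ import annotations

# Table copied from the "show nmsp status" output above.
NMSP_STATUS = """\
DNA Spaces/CMX IP Address  Active    Tx Echo Resp  Rx Echo Req   Tx Data     Rx Data     Transport
----------------------------------------------------------------------------------------------------------
192.168.0.65                  Active    693870        693870        16833737    181084      TLS
192.168.0.66                  Inactive  21            21            222         7           TLS
"""


def nmsp_issues(output: str) -> dict[str, list[str]]:
    """Map server IP to its problems: not Active, or echo Tx/Rx counters differ."""
    issues = {}
    for line in output.splitlines():
        fields = line.split()
        # Server rows: IP, state, Tx Echo Resp, Rx Echo Req, Tx Data, Rx Data, Transport.
        if len(fields) != 7 or fields[0].count(".") != 3:
            continue
        ip, state = fields[0], fields[1]
        tx_echo_resp, rx_echo_req = int(fields[2]), int(fields[3])
        problems = []
        if state != "Active":
            problems.append(state)
        if tx_echo_resp != rx_echo_req:
            problems.append(f"echo mismatch tx={tx_echo_resp} rx={rx_echo_req}")
        if problems:
            issues[ip] = problems
    return issues


print(nmsp_issues(NMSP_STATUS))  # {'192.168.0.66': ['Inactive']}
```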

With these checks, we can proactively monitor the health of our 9800 WLC and its connections to other devices such as CMX/DNA Spaces, other WLCs, and DNAC. In the next blog, we will share KPIs to monitor APs and RF.

List of commands to use for KPIs and automation scripts

The document below also contains a link to a script that can automatically collect all the commands. It collects commands based on platform and release, saves them to a file, and exports the file. The script uses the "Guest-shell" feature, which for now is only available on the physical WLCs 9800-40/80 and 9800-L.

The document also provides an example EEM script that collects logs periodically. In conclusion, EEM together with the "Guest-shell" script will help you collect 9800 WLC KPIs and build a baseline for your Catalyst 9800 WLC.

For the list of commands used to monitor these KPIs
