Part 1 of the 3-part Wireless Catalyst 9800 WLC KPIs
When working in critical wireless infrastructures, it is very important to be proactive and determine early whether any potential issue could affect the end-client experience. Wireless Catalyst 9800 WLC KPIs help with that task.
In this blog, I’ll share a systematic approach plus a list of commands that I used while providing support at the NOC for one of the largest worldwide wireless events. The idea is to keep a close eye on the Key Performance Indicators (KPIs) for the Catalyst 9800 WLC.
KPI outputs can be collected periodically to create a baseline while the network is working fine. That makes it easier later to find any deviation by comparing new outputs with previously collected ones.
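As a rough illustration of the baseline idea, the sketch below compares two saved outputs of the same command and returns only the lines that changed. It uses Python's standard difflib module; the function name diff_against_baseline and the two sample strings are hypothetical and only serve the example.

```python
import difflib


def diff_against_baseline(baseline, current):
    """Return the lines removed (-) or added (+) since the baseline was taken."""
    diff = list(
        difflib.unified_diff(
            baseline.splitlines(),
            current.splitlines(),
            fromfile="baseline",
            tofile="current",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two entries are the ---/+++ file headers; "@@" lines mark hunks.
    return [line for line in diff[2:] if not line.startswith("@@")]


if __name__ == "__main__":
    # Hypothetical saved outputs of one command, collected at different times.
    baseline = "Switchovers system experienced = 0\nCurrent Software state = ACTIVE"
    current = "Switchovers system experienced = 1\nCurrent Software state = ACTIVE"
    for line in diff_against_baseline(baseline, current):
        print(line)
```

Counters and timestamps change on every run, so in practice it probably makes sense to diff only the filtered outputs (the ones produced with “| i ...”) rather than full command dumps.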
I’ve divided WLC KPIs into six different buckets or areas:
- WLC checks
- Connection with other devices
- AP checks
- RF checks
- Client checks
- Packet Drops
KPIs will help us spot issues in any of the six areas mentioned. In this blog, I cover WLC checks and Connection with other devices. There will be two more blogs where I’ll share AP checks, RF checks, Client checks, and Packet Drops.
WLC checks
I usually start by checking the WLC first, since it is the most critical part. If any issues are seen in the controller, they will cascade shortly afterward as problems with APs and clients. In other words, the idea here is to follow a top-down approach.
While reviewing the health state of the WLC, I first confirm that the WLC is running the intended version and is in install mode. Install mode ensures that the controller boots faster, with a reduced memory footprint. After that, I check the uptime of the WLC to see if any reload has occurred. Use the command: “show version | i uptime|Installation mode|Cisco IOS Software”
Gladius1#show version | i uptime|Installation mode|Cisco IOS Software
Cisco IOS Software [Amsterdam], C9800 Software (C9800_IOSXE-K9), Version 17.3.5a, RELEASE SOFTWARE (fc2)
Gladius1 uptime is 2 weeks, 5 days, 21 hours, 30 minutes
Installation mode is INSTALL
Check the expected release, the uptime, and that the WLC is running in install mode.
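A minimal sketch of how this check could be automated, assuming the filtered output has already been collected as text. The function name check_version and the returned dictionary keys are hypothetical; the regular expressions are written against the sample output shown here and may need adjusting for other releases.

```python
import re

# Filtered "show version" output, as shown in the example above.
SAMPLE = """\
Cisco IOS Software [Amsterdam], C9800 Software (C9800_IOSXE-K9), Version 17.3.5a, RELEASE SOFTWARE (fc2)
Gladius1 uptime is 2 weeks, 5 days, 21 hours, 30 minutes
Installation mode is INSTALL
"""


def check_version(output, expected_version):
    """Compare the running release and install mode against what is expected."""
    version = re.search(r"Version (\S+),", output)
    uptime = re.search(r"uptime is (.+)", output)
    mode = re.search(r"Installation mode is (\S+)", output)
    return {
        "version_ok": bool(version) and version.group(1) == expected_version,
        "install_mode": bool(mode) and mode.group(1) == "INSTALL",
        "uptime": uptime.group(1).strip() if uptime else None,
    }


if __name__ == "__main__":
    print(check_version(SAMPLE, "17.3.5a"))
```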
For a Catalyst 9800 WLC deployed in High Availability, which, by the way, is highly recommended for critical deployments, we first need to verify that the HA pair stack is formed and in a standby-hot state. Secondly, check the stack uptime and each member’s individual uptime. Thirdly, identify the number of switchovers between active and standby. Use the command: “show redundancy | i ptime|Location|Current Software state|Switchovers”.
Gladius1#show redundancy | i ptime|Location|Current Software state|Switchovers
Available system uptime = 2 weeks, 1 day, 2 hours, 48 minutes
Switchovers system experienced = 1
Active Location = slot 1
Current Software state = ACTIVE
Uptime in current state = 7 hours, 10 minutes
Standby Location = slot 2
Current Software state = STANDBY HOT
Uptime in current state = 7 hours, 4 minutes
Check the stack uptime, the number of switchovers, and the uptime of each member. Here a switchover occurred 7 hours ago: slot 1 is the new active and slot 2 reloaded.
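The HA check can be sketched in a few lines of Python, assuming the filtered output is available as text. The function name check_redundancy and the "ha_healthy" rule (exactly one ACTIVE followed by one STANDBY HOT) are assumptions made for this example, not an official health definition.

```python
import re

# Filtered "show redundancy" output, as shown in the example above.
SAMPLE = """\
Available system uptime = 2 weeks, 1 day, 2 hours, 48 minutes
Switchovers system experienced = 1
Active Location = slot 1
Current Software state = ACTIVE
Uptime in current state = 7 hours, 10 minutes
Standby Location = slot 2
Current Software state = STANDBY HOT
Uptime in current state = 7 hours, 4 minutes
"""


def check_redundancy(output):
    """Extract the switchover count and the software state of each member."""
    switchovers = re.search(r"Switchovers system experienced = (\d+)", output)
    states = [s.strip() for s in re.findall(r"Current Software state = (.+)", output)]
    return {
        "switchovers": int(switchovers.group(1)) if switchovers else None,
        "states": states,
        # Assumption: a healthy pair reports ACTIVE first, then STANDBY HOT.
        "ha_healthy": states == ["ACTIVE", "STANDBY HOT"],
    }


if __name__ == "__main__":
    print(check_redundancy(SAMPLE))
```

Comparing the "switchovers" value with the one stored in the baseline is a simple way to notice a switchover that happened between two collections.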
In HA deployments, the recommendation is to use the RMI (Redundancy Management Interface) feature. This allows monitoring of active and standby through both the Wireless Management Interface (WMI) and the Redundancy Port (RP). After that, we should enable Default-gateway Check to verify that both active and standby can reach the gateway. Here is a link to the 9800 High Availability deployment guide.
The next step is to check whether there have been any WLC crashes. Determine whether a crash matches the time of a switchover or an unexpected reload. When a WLC crash occurs, it should generate a core dump or a system report. These files are stored on the WLC harddisk for the 9800-40/80 or in bootflash for the 9800-L/CL. Use the commands “dir harddisk:/core/ | i core|system-report” and “dir stby-harddisk:/core/ | i core|system-report”, replacing harddisk with bootflash for the 9800-L/CL.
Gladius1#dir harddisk:/core/ | i core|system-report
Directory of harddisk:/core/
3661831  -rw-   11260562  Mar 25 2022 22:07:12 +01:00  Gladius1_1_RP_0_wncd_16574_20220325-220708-CET.core.gz
3661830  -rw-      48528  Mar 25 2022 21:57:20 +01:00  Gladius1_1_RP_0-system-report_20220325-215658-CET-info.txt
3661829  -rw-  126548098  Mar 25 2022 21:57:10 +01:00  Gladius1_1_RP_0-system-report_20220325-215658-CET.tar.gz
3661828  -rw-      57191   Mar 9 2021 16:21:48 +01:00  Gladius1_1_RP_0-system-report_20210309-161907-CET-info.txt
3661827  -rw-  504311304   Mar 9 2021 16:20:51 +01:00  Gladius1_1_RP_0-system-report_20210309-161907-CET.tar.gz
3661826  -rw-   11714625  Nov 19 2020 10:35:54 +01:00  Gladius1_1_RP_0_wncd_30240_20201119-103550-CET.core.gz
Check for cores and system reports. Here, two cores in the wncd process and two system reports have occurred.
If we observe any core dump, we can identify the impacted process by checking the file name. For example, WLC_1_RP_0_wncd_16574_20220325-220708-CET.core.gz indicates a crash in the “wncd” process, and WLC_1_RP_0_dbm_14119_20201104-092800-CET.core.gz indicates a crash in the “dbm” process. Open a TAC case to identify the root cause of the crash.
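The file-name convention can be decoded programmatically. The sketch below is inferred only from the example names shown here: it assumes the process name sits between “_RP_<n>_” and a numeric field, followed by a date-time stamp and a time zone label. The function name parse_core_name is hypothetical, and the numeric field is deliberately left uninterpreted.

```python
import re
from datetime import datetime

# Pattern inferred from the example file names; other releases may differ.
CORE_NAME = re.compile(
    r"_RP_\d+_(?P<process>.+)_(?P<number>\d+)_"
    r"(?P<stamp>\d{8}-\d{6})-(?P<tz>\w+)\.core\.gz$"
)


def parse_core_name(filename):
    """Return the crashed process and crash time, or None if not a core file."""
    match = CORE_NAME.search(filename)
    if match is None:
        return None
    return {
        "process": match.group("process"),
        "crashed_at": datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S"),
        "timezone": match.group("tz"),
    }


if __name__ == "__main__":
    for name in (
        "Gladius1_1_RP_0_wncd_16574_20220325-220708-CET.core.gz",
        "Gladius1_1_RP_0-system-report_20220325-215658-CET.tar.gz",
    ):
        print(name, "->", parse_core_name(name))
```

The extracted crash time can then be compared with the switchover or reload time obtained from the uptime checks.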
Once we have checked for crashes and unexpected reloads, we can proceed by reviewing WLC CPU and memory utilization. For CPU monitoring, we need to run the command several times and detect whether any processes show CPU above 80% consistently rather than as a spike. I prefer to execute the command with the sorted keyword, so that you can focus on the processes with high CPU first. We have seen cases where consistently high CPU in the WNCD process led to AP disconnections. However, releases 17.3.5 and 17.6.3 have received additional hardening, with the objective of protecting AP CAPWAP connections when high CPU occurs. Use command: “show processes cpu platform sorted | ex 0% 0% 0%”
Gladius1#show processes cpu platform sorted | ex 0% 0% 0%
CPU utilization for five seconds: 14%, one minute: 16%, five minutes: 16%
Core 0: CPU utilization for five seconds: 10%, one minute: 7%, five minutes: 11%
Core 1: CPU utilization for five seconds: 6%, one minute: 28%, five minutes: 12%
Core 2: CPU utilization for five seconds: 48%, one minute: 55%, five minutes: 68%
Core 3: CPU utilization for five seconds: 20%, one minute: 8%, five minutes: 11%
Core 4: CPU utilization for five seconds: 38%, one minute: 13%, five minutes: 17%
Core 5: CPU utilization for five seconds: 14%, one minute: 11%, five minutes: 13%
Core 6: CPU utilization for five seconds: 9%, one minute: 20%, five minutes: 23%
Core 7: CPU utilization for five seconds: 5%, one minute: 8%, five minutes: 18%
Core 8: CPU utilization for five seconds: 7%, one minute: 50%, five minutes: 34%
Core 9: CPU utilization for five seconds: 100%, one minute: 58%, five minutes: 27%
Core 10: CPU utilization for five seconds: 27%, one minute: 17%, five minutes: 25%
   Pid    PPid    5Sec    1Min    5Min  Status        Size  Name
--------------------------------------------------------------------------------
 19056   19037     99%     99%     99%  R          7525896  wncd_0
 21922   21913     96%     97%     99%  R           127488  smand
 19460   19451     37%     34%     33%  R          6363828  wncd_2
 19604   19596     18%     19%     18%  R          4556132  wncd_3
Check CPU utilization per core and per process. Here, processes wncd_0 and smand are running at close to 100% CPU utilization.
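To separate sustained high CPU from a spike, several collected outputs can be compared. The sketch below is one possible way to do it, assuming each sample is the text of the command above: it keeps only the processes whose 5-second value exceeds the threshold in every sample. The function names parse_cpu and sustained_high are hypothetical; the 80% default comes from the text.

```python
import re

# One process row: Pid PPid 5Sec 1Min 5Min Status Size Name
ROW = re.compile(
    r"^\s*\d+\s+\d+\s+(\d+)%\s+(\d+)%\s+(\d+)%\s+\S+\s+\d+\s+(\S+)\s*$"
)


def parse_cpu(output):
    """Map process name -> 5-second CPU percentage for one command output."""
    usage = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            usage[match.group(4)] = int(match.group(1))
    return usage


def sustained_high(samples, threshold=80):
    """Return the processes above the threshold in every collected sample."""
    per_sample = [
        {name for name, pct in parse_cpu(sample).items() if pct > threshold}
        for sample in samples
    ]
    return set.intersection(*per_sample) if per_sample else set()


if __name__ == "__main__":
    first = """\
 19056   19037     99%     99%     99%  R          7525896  wncd_0
 21922   21913     96%     97%     99%  R           127488  smand
"""
    # Hypothetical second collection, a few minutes later.
    second = """\
 19056   19037     95%     97%     99%  R          7525896  wncd_0
 21922   21913     12%     40%     80%  R           127488  smand
"""
    print(sustained_high([first, second]))
```

Per-core lines such as “Core 0: CPU utilization ...” do not match the row pattern and are simply ignored by this sketch.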
Catalyst 9800-CL and 9800-L platforms use CPU cores for data forwarding. Therefore, it is expected to see high CPU in ucode_pkt_PPE0. For these platforms, to evaluate data plane performance use command: “show platform hardware chassis active qfp datapath utilization | i Load”
Gladius1#show platform hardware chassis active qfp datapath utilization | i Load
  CPP 0: Subdev 0           5 secs    1 min    5 min   60 min
  Processing: Load (pct)         4        3        4        3
Check the datapath load percentage.
While checking memory utilization, we need to monitor whether the device utilization is too high. Subsequently, identify whether any processes are holding memory and not releasing it over time (a leak). Use commands: “show platform resources” (basic), “show process memory platform sorted”, and “show processes memory platform accounting” (advanced)
Gladius1#show platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource                 Usage                 Max             Warning         Critical        State
----------------------------------------------------------------------------------------------------
RP0 (ok, active)                                                                               H
 Control Processor       0.79%                 100%            80%             90%             H
  DRAM                   4839MB(15%)           31670MB         88%             93%             H
  harddisk               0MB(0%)               0MB             80%             85%             H
ESP0(ok, active)                                                                               H
 QFP                                                                                           H
  TCAM                   68cells(0%)           1048576cells    65%             85%             H
  DRAM                   420162KB(20%)         2097152KB       85%             95%             H
  IRAM                   13738KB(10%)          131072KB        85%             95%             H
  CPU Utilization        0.00%                 100%            90%             95%             H
Confirm that the state is healthy for all metrics. Review Control Processor and memory utilization.
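Because the last column already encodes the state (H, W, or C), a script only has to look for rows that are not H. The following is a small sketch under that assumption; the function name unhealthy_resources is hypothetical and the row pattern is written against the sample output shown here.

```python
import re

# A metric row: name, usage, max, warning %, critical %, state letter.
METRIC = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<usage>\S+)\s+(?P<max>\S+)\s+"
    r"(?P<warning>\d+)%\s+(?P<critical>\d+)%\s+(?P<state>[HWC])\s*$"
)

# Excerpt of the "show platform resources" output shown above.
SAMPLE = """\
RP0 (ok, active)                                                   H
 Control Processor       0.79%          100%       80%   90%       H
  DRAM                   4839MB(15%)    31670MB    88%   93%       H
  harddisk               0MB(0%)        0MB        80%   85%       H
ESP0(ok, active)                                                   H
  CPU Utilization        0.00%          100%       90%   95%       H
"""


def unhealthy_resources(output):
    """Return (name, usage, state) for every metric row that is not Healthy."""
    flagged = []
    for line in output.splitlines():
        match = METRIC.match(line)
        if match and match.group("state") != "H":
            flagged.append(
                (match.group("name"), match.group("usage"), match.group("state"))
            )
    return flagged


if __name__ == "__main__":
    print(unhealthy_resources(SAMPLE))
```

Note that the name DRAM appears under both RP0 and QFP, so a fuller version would also track the parent section of each row.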
Gladius1#show processes memory platform sorted
System memory: 15869340K total, 6152000K used, 9717340K free,
Lowest: 9717340K
   Pid    Text      Data   Stack   Dynamic       RSS  Name
----------------------------------------------------------------------
  3546  367768   1404580     136       488   1404580  linux_iosd-imag
 23602   22335    449968     136      1052    449968  ucode_pkt_PPE0
 24525     847    437624     136     46628    437624  wncd_0
 24004     160    373176    3956      6400    373176  wncmgrd
 26358     128    344868     136    136628    344868  mobilityd
Check the free memory available. Identify the top processes holding the most memory.
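One way to look for memory that is held and never released is to compare the RSS column across several collections. The sketch below flags processes whose RSS grew in every consecutive sample; this is only a hint worth investigating, not proof of a leak. The function names parse_rss and steadily_growing are hypothetical.

```python
import re

# One process row: Pid Text Data Stack Dynamic RSS Name
ROW = re.compile(
    r"^\s*\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(?P<rss>\d+)\s+(?P<name>\S+)\s*$"
)


def parse_rss(output):
    """Map process name -> RSS value for one command output."""
    table = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            table[match.group("name")] = int(match.group("rss"))
    return table


def steadily_growing(samples):
    """Return processes whose RSS increased between every pair of samples."""
    tables = [parse_rss(sample) for sample in samples]
    if len(tables) < 2:
        return []
    # Only consider processes that appear in all samples.
    common = set(tables[0]).intersection(*tables[1:])
    return sorted(
        name
        for name in common
        if all(a[name] < b[name] for a, b in zip(tables, tables[1:]))
    )


if __name__ == "__main__":
    # Hypothetical samples collected hours apart.
    samples = [
        " 24525 847 437624 136 46628 437624 wncd_0\n 26358 128 344868 136 136628 344868 mobilityd",
        " 24525 847 450000 136 46628 450000 wncd_0\n 26358 128 344868 136 136628 344868 mobilityd",
        " 24525 847 470000 136 46628 470000 wncd_0\n 26358 128 344000 136 136628 344000 mobilityd",
    ]
    print(steadily_growing(samples))
```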
Gladius1#show processes memory platform accounting
Hourly Stats
process          callsite_ID(bytes)  max_diff_bytes  callsite_ID(calls)  max_diff_calls  tracekey                            timestamp(UTC)
------------------------------------------------------------------------------------------------------------------------------------------------------------
cpp_cp_svr_fp_0  2887897091          7243446         2887897092          1133            1#e4bd31e0c668be2b8786dec9fcc99486  2022-05-25 14:04
ndbmand_rp_0     3571094529          5453112         3570931712          1119            1#00c5632bf072231d06cf80b8ccc37392  2022-05-09 21:52
wncd_4_rp_0      2556049411          3059712         3028615169          227             1#9f4792f37292983824f5bb97d7e2167c  2022-05-10 14:54
wncd_0_rp_0      2556049411          1990656         3028615168          680             1#9f4792f37292983824f5bb97d7e2167c  2022-05-25 11:05
wncd_2_rp_0      2556049411          1953792         3028615169          682             1#9f4792f37292983824f5bb97d7e2167c  2022-05-13 14:01
smand_rp_0       2887895047          1491984         3028615168          89              1#eaf6dd665e73b1edeee32fb9c5ac8639  2022-05-10 14:54
Check the top processes and the number of calls. Stats are hourly, daily, weekly, and monthly.
As a final controller health check, we can validate the hardware. Check the status of power supplies, fans, SFPs, and temperature (only for physical WLCs). Likewise, review the license status and confirm that the right number of licenses is in use. Use commands: “show platform”, “show inventory”, “show environment” and “show license summary | i Status:”
Gladius1#show platform
Chassis type: C9800-40-K9
Slot      Type                State                 Insert time (ago)
--------- ------------------- --------------------- -----------------
0         C9800-40-K9         ok                    2w5d
 0/0      BUILT-IN-4X10G/1G   ok                    2w5d
R0        C9800-40-K9         ok, active            2w5d
F0        C9800-40-K9         ok, active            2w5d
P0        C9800-AC-750W-R     ok                    2w5d
P1        Unknown             empty                 never
P2        C9800-40-K9-FAN     ok                    2w5d
Slot      CPLD Version        Firmware Version
--------- ------------------- ---------------------------------------
0         19030712            16.10(2r)
R0        19030712            16.10(2r)
F0        19030712            16.10(2r)
Gladius1#show inventory
NAME: "Chassis 1", DESCR: "Cisco C9800-40-K9 Chassis"
PID: C9800-40-K9 , VID: V03 , SN: TTM242504SR
NAME: "Chassis 1 Power Supply Module 0", DESCR: "Cisco Catalyst 9800-40 750W AC Power Supply Reverse Air"
PID: C9800-AC-750W-R , VID: V01 , SN: ART2418F0GJ
NAME: "Chassis 1 Fan Tray", DESCR: "Cisco C9800-40-K9 Fan Tray"
PID: C9800-40-K9-FAN , VID: , SN:
NAME: "module 0", DESCR: "Cisco C9800-40-K9 Modular Interface Processor"
PID: C9800-40-K9 , VID: , SN:
NAME: "SPA subslot 0/0", DESCR: "4-port 10G/1G multirate Ethernet Port Adapter"
PID: BUILT-IN-4X10G/1G , VID: N/A , SN: JAE87654321
NAME: "subslot 0/0 transceiver 0", DESCR: "10GE LR"
PID: SFP-10G-LR , VID: V02 , SN: AVD2141KCFB
NAME: "module R0", DESCR: "Cisco C9800-40-K9 Route Processor"
PID: C9800-40-K9 , VID: V03 , SN: TTM242504SR
NAME: "module F0", DESCR: "Cisco C9800-40-K9 Embedded Services Processor"
PID: C9800-40-K9 , VID: , SN:
NAME: "Crypto Asic F0/0", DESCR: "Asic 0 of module F0"
PID: NOT , VID: V01 , SN: JAE242711XF
Gladius1#show environment
Number of Critical alarms: 0
Number of Major alarms: 0
Number of Minor alarms: 0
Check power supplies, fan status, SFPs, SPAs, and any alarms.
One example of these Catalyst 9800 WLC KPIs helping to identify a problem was a customer facing a High Availability setup issue between two WLCs. By reviewing the version and the hardware installed in both WLCs, we identified a difference in SPA adapters that was preventing the WLCs from pairing as HA.
Connection with other devices checks
In addition to WLC health, we can check the status of the WLC’s connections. The most important connections are mobility with other WLCs for inter-WLC roams, telemetry with DNAC/PI for monitoring and automation, and NMSP with DNA Spaces/CMX for location services. We need to ensure that these connections are established and working fine.
Confirm that mobility tunnels with other WLCs are up and using the right encryption and MTU, and that clients can roam or be anchored to another WLC. If tunnels are down, we can find out whether the issue is occurring in the control tunnel (UDP port 16666), in the data tunnel (UDP port 16667), or in both. Use command: “show wireless mobility sum”
Gladius1#sh wireless mobility summary
Wireless Management VLAN: 25
Wireless Management IP Address: 192.168.25.25
Mobility Control Message DSCP Value: 48
Mobility Keepalive Interval/Count: 10/3
Mobility Group Name: eWLC3
Mobility Multicast Ipv4 address: 0.0.0.0
Mobility MAC Address: 001e.f62a.46ff
Mobility Domain Identifier: 0x2e47
Controllers configured in the Mobility Domain:
IP             Public Ip      MAC Address     Group Name  Multicast IPv4  Multicast IPv6  Status                      PMTU
----------------------------------------------------------------------------------------------------------
192.168.25.25  N/A            001e.f62a.46ff  eWLC3       0.0.0.0         ::              N/A                         N/A
192.168.5.35   192.168.5.35   00b0.e1f2.f480  3500-2      0.0.0.0         ::              Up                          1385
192.168.25.23  192.168.25.23  706d.1535.6b0b  DAO2        0.0.0.0         ::              Control And Data Path Down
192.168.25.33  192.168.25.33  f4bd.9e57.ff6b  5500        0.0.0.0         ::              Up                          1005
Check for mobility tunnels that are down and for low PMTU.
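Both conditions can be picked out of the peer table with a short script. This sketch assumes the local controller is the row whose Public Ip is N/A (as in the sample), and that group names contain no spaces. The function name check_mobility and the min_pmtu default of 1300 are hypothetical; pick a threshold that fits your own network.

```python
import re

# A peer row starts with two address fields and a MAC address.
PEER = re.compile(
    r"^\s*(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<public>\S+)\s+"
    r"(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+(?P<group>\S+)\s+"
    r"\S+\s+\S+\s+(?P<tail>.+?)\s*$"
)

# Peer table from the "show wireless mobility summary" output shown above.
SAMPLE = """\
192.168.25.25  N/A            001e.f62a.46ff  eWLC3   0.0.0.0  ::  N/A  N/A
192.168.5.35   192.168.5.35   00b0.e1f2.f480  3500-2  0.0.0.0  ::  Up   1385
192.168.25.23  192.168.25.23  706d.1535.6b0b  DAO2    0.0.0.0  ::  Control And Data Path Down
192.168.25.33  192.168.25.33  f4bd.9e57.ff6b  5500    0.0.0.0  ::  Up   1005
"""


def check_mobility(output, min_pmtu=1300):
    """List peers whose tunnel is not Up, and Up peers with a PMTU below min_pmtu."""
    down, low_pmtu = [], []
    for line in output.splitlines():
        match = PEER.match(line)
        if not match or match.group("public") == "N/A":
            continue  # not a peer row, or the local controller's own entry
        tokens = match.group("tail").split()
        # The PMTU column is empty when the tunnel is down.
        pmtu = int(tokens.pop()) if tokens[-1].isdigit() else None
        status = " ".join(tokens)
        if status != "Up":
            down.append((match.group("ip"), status))
        elif pmtu is not None and pmtu < min_pmtu:
            low_pmtu.append((match.group("ip"), pmtu))
    return {"down": down, "low_pmtu": low_pmtu}


if __name__ == "__main__":
    print(check_mobility(SAMPLE))
```

The status text distinguishes control path, data path, or both being down, which maps to UDP ports 16666 and 16667 respectively.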
If we have DNAC for Assurance or Provision, we can confirm that the DNAC NETCONF connection is established. Afterward, verify that telemetry statistics for the WLC, APs, and clients are updated in DNAC. Use command: “show telemetry internal connection”. After 17.7 this command has been replaced by “show telemetry connection all”
Gladius2#show telemetry internal connection
Load for five secs: 29%/5%; one minute: 4%; five minutes: 2%
Time source is NTP, 10:21:45.942 CET Wed Nov 4 2020
Telemetry connections
Index Peer Address               Port  VRF Source Address             State
----- -------------------------- ----- --- -------------------------- ----------
    1 192.168.0.105              25103   0 192.168.25.42              Active
Check the telemetry state.
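A minimal sketch for this check, assuming the column layout of the “show telemetry internal connection” sample shown here (the newer command may format its output differently). The function names telemetry_states and telemetry_ok are hypothetical. An empty table is treated as a failure, since no connection at all is also a problem.

```python
import re

# A connection row: Index, Peer Address, Port, VRF, Source Address, State.
ROW = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<peer>\S+)\s+(?P<port>\d+)\s+(?P<vrf>\S+)\s+"
    r"(?P<source>\S+)\s+(?P<state>\S+)\s*$"
)

# Output of the command as shown in the example above.
SAMPLE = """\
Load for five secs: 29%/5%; one minute: 4%; five minutes: 2%
Time source is NTP, 10:21:45.942 CET Wed Nov 4 2020
Telemetry connections
Index Peer Address               Port  VRF Source Address             State
----- -------------------------- ----- --- -------------------------- ----------
    1 192.168.0.105              25103   0 192.168.25.42              Active
"""


def telemetry_states(output):
    """Return a list of (peer address, state) for every connection row."""
    rows = (ROW.match(line) for line in output.splitlines())
    return [(m.group("peer"), m.group("state")) for m in rows if m]


def telemetry_ok(output):
    """True only if at least one connection exists and all are Active."""
    states = telemetry_states(output)
    return bool(states) and all(state == "Active" for _, state in states)


if __name__ == "__main__":
    print(telemetry_states(SAMPLE), telemetry_ok(SAMPLE))
```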
If we are using DNA Spaces for location, we can first confirm the NMSP connection status and the number of packets transmitted and received. Secondly, check the list of clients in the WLC probing database. Lastly, verify that the client location is updated in DNA Spaces. Use command “show nmsp status”
Gladius1#show nmsp status
NMSP Status
-----------
DNA Spaces/CMX IP Address  Active    Tx Echo Resp  Rx Echo Req  Tx Data   Rx Data  Transport
----------------------------------------------------------------------------------------------------------
192.168.0.65               Active    693870        693870       16833737  181084   TLS
192.168.0.66               Inactive  21            21           222       7        TLS
Check for inactive servers and any mismatch between echo Tx and Rx counters.
With the checks provided, we can proactively monitor the health of our 9800 WLC and its connections with other devices such as CMX/DNA Spaces, other WLCs, and DNAC. In the next blog, we will share KPIs to monitor APs and RF.
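The two conditions can be checked per server row, as in the sketch below. It assumes the column order of the sample output; the function name check_nmsp is hypothetical. An exact comparison of the echo counters may occasionally flag a request that is still in flight, so a small tolerance could be added in practice.

```python
import re

# A server row: IP, Active/Inactive, Tx Echo Resp, Rx Echo Req, Tx Data, Rx Data, Transport.
ROW = re.compile(
    r"^\s*(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<active>Active|Inactive)\s+"
    r"(?P<tx_echo>\d+)\s+(?P<rx_echo>\d+)\s+(?P<tx_data>\d+)\s+(?P<rx_data>\d+)\s+"
    r"(?P<transport>\S+)\s*$"
)

# Server rows from the "show nmsp status" output shown above.
SAMPLE = """\
192.168.0.65               Active    693870        693870       16833737  181084   TLS
192.168.0.66               Inactive  21            21           222       7        TLS
"""


def check_nmsp(output):
    """List inactive NMSP servers and servers whose echo counters differ."""
    inactive, echo_mismatch = [], []
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        if match.group("active") != "Active":
            inactive.append(match.group("ip"))
        if int(match.group("tx_echo")) != int(match.group("rx_echo")):
            echo_mismatch.append(match.group("ip"))
    return {"inactive": inactive, "echo_mismatch": echo_mismatch}


if __name__ == "__main__":
    print(check_nmsp(SAMPLE))
```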
List of commands to use for KPIs and automation scripts
In the document below, there is also a link to a script that can automatically collect all the commands. It collects commands based on platform and release, saves them in a file, and exports the file. The script uses the “Guest-shell” feature, which for now is only available in physical WLCs 9800-40/80 and 9800-L.
The document also provides an example of an EEM script to collect logs periodically. In conclusion, EEM together with the “Guest-shell” script will help to collect 9800 WLC KPIs and establish a baseline for your Catalyst 9800 WLC.
For the list of commands used to monitor these KPIs