ASR 1000 Series Router Memory Troubleshoot Guide
Document ID: 116777
Contributed by Vishnu Asok and Girish Devgan, Cisco TAC Engineers.
Nov 19, 2013
Contents
Introduction
Prerequisites
Requirements
Components Used
ASR Memory Layout Overview
Memory Allocation under the lsmpi_io pool
Memory Usage
Verify Memory Usage on IOS-XE
Verify Memory Usage on IOSd
Verify TCAM Utilization on an ASR1K
Verify Memory Utilization on QFP
Introduction

This document describes how to check system memory and troubleshoot memory issues on Cisco ASR 1000 Series Aggregation Services Routers (ASR1K).
Prerequisites

Requirements

Cisco recommends that you have basic knowledge of these topics:
• Cisco IOS-XE software
• ASR CLI

Note: You might need a special license in order to log in to the Linux shell on the ASR 1001 Series router.
Components Used

The information in this document is based on these software and hardware versions:
• All ASR1K platforms
• All Cisco IOS-XE software releases that support the ASR1K platform

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
ASR Memory Layout Overview

With most earlier Cisco router platforms, the majority of the internal software processes run within Cisco IOS® (IOS) memory. The ASR1K platform introduces a distributed software architecture that moves many Operating System (OS) responsibilities out of the IOS process. In this architecture, IOS, which was previously responsible for almost all of the internal software processes, now runs as one of many Linux processes. This allows other Linux processes to share responsibility for the operation of the router.

The ASR1K runs IOS-XE, not traditional IOS. In IOS-XE, Linux provides the kernel, and IOS runs as a daemon, referred to hereafter as IOSd (IOS-Daemon). Memory must therefore be split between the Linux kernel and the IOSd instance. This split is fixed at startup and cannot be modified. For a 4-GB system, IOSd is allocated approximately 2 GB; for an 8-GB system, IOSd is allocated approximately 3.8 GB (with software redundancy disabled).

Since the ASR1K has a 64-bit architecture, every pointer in every data structure in the system consumes double the memory that it would on a traditional single-CPU router (8 bytes instead of 4 bytes). In return, 64-bit addressing enables IOS to overcome the 2-GB addressable memory limitation of traditional IOS, which allows it to scale to millions of routes.

Note: Ensure that you have sufficient memory available before you activate any new features. Cisco recommends that you have at least 8 GB DRAM if you receive the entire Border Gateway Protocol (BGP) routing table when software redundancy is enabled in order to prevent memory exhaustion.
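The pointer-size effect described above can be illustrated with simple arithmetic. This is only a sketch: the entry count and the number of pointers per entry are hypothetical values chosen for illustration, not figures from any IOS data structure.

```python
import struct


def pointer_bytes(entries, pointers_per_entry, pointer_size):
    """Bytes consumed by pointers alone across a set of data structures."""
    return entries * pointers_per_entry * pointer_size


# Hypothetical workload: one million entries with four pointers each.
ENTRIES = 1_000_000
POINTERS_PER_ENTRY = 4

bytes_32 = pointer_bytes(ENTRIES, POINTERS_PER_ENTRY, 4)  # 4-byte pointers
bytes_64 = pointer_bytes(ENTRIES, POINTERS_PER_ENTRY, 8)  # 8-byte pointers

print(f"32-bit pointers: {bytes_32} bytes")
print(f"64-bit pointers: {bytes_64} bytes")
print(f"extra memory:    {bytes_64 - bytes_32} bytes")

# Pointer size of the machine that runs this script (8 on a 64-bit build).
print(f"local pointer size: {struct.calcsize('P')} bytes")
```

Only the pointer fields double in size; the rest of each data structure is unchanged, so the overall growth depends on how pointer-heavy the structures are.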
Memory Allocation under the lsmpi_io pool

The Linux Shared Memory Punt Interface (LSMPI) memory pool is used to transfer packets from the forwarding processor to the route processor. This memory pool is carved at router initialization into preallocated buffers, as opposed to the processor pool, where IOS-XE allocates memory blocks dynamically. On the ASR1K platform, the lsmpi_io pool normally has little free memory, generally less than 1000 bytes. Cisco recommends that you disable monitoring of the LSMPI pool by the network management applications in order to avoid false alarms.
ASR1000# show memory statistics
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor   2C073008  1820510884   173985240  1646525644  1614827804  1646234064
lsmpi_io    996481D0     6295088     6294120         968         968         968
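A monitoring script can apply the recommendation above by skipping the lsmpi_io pool when it evaluates free memory. The following is a minimal sketch that parses the output shown; the function names and the 10 percent threshold are assumptions made for illustration, not Cisco-defined values.

```python
SAMPLE = """\
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor   2C073008  1820510884   173985240  1646525644  1614827804  1646234064
lsmpi_io    996481D0     6295088     6294120         968         968         968
"""

# Pools that are expected to be nearly full and should not raise alarms.
IGNORED_POOLS = {"lsmpi_io"}


def parse_memory_statistics(text):
    """Return {pool name: {'total', 'used', 'free'}} from the table rows."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 7 columns: name, head, and five byte counters.
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        name, _head, total, used, free, _lowest, _largest = fields
        pools[name] = {"total": int(total), "used": int(used), "free": int(free)}
    return pools


def low_memory_pools(pools, min_free_pct=10.0):
    """List pools below the (hypothetical) free-memory threshold."""
    alarms = []
    for name, pool in pools.items():
        if name in IGNORED_POOLS:
            continue
        free_pct = 100.0 * pool["free"] / pool["total"]
        if free_pct < min_free_pct:
            alarms.append(name)
    return alarms


pools = parse_memory_statistics(SAMPLE)
print(low_memory_pools(pools))
```

Without the IGNORED_POOLS check, lsmpi_io would always be reported, because only 968 of its 6295088 bytes are free.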
If there are any issues in the LSMPI path, the Device xmit fail counter increments in this command output (some output omitted):
ASR1000-1# show platform software infrastructure lsmpi driver
LSMPI Driver stat ver: 3
Packets:
        In: 674572
       Out: 259861
Rings:
        RX: 2047 free      0 in-use   2048 total
        TX: 2047 free      0 in-use   2048 total
    RXDONE: 2047 free      0 in-use   2048 total
    TXDONE: 2047 free      0 in-use   2048 total
Buffers:
        RX: 7721 free    473 in-use   8194 total
Reason for RX drops (sticky):
    Ring full        : 0
    Ring put failed  : 0
    No free buffer   : 0
    Receive failed   : 0
    Packet too large : 0
    Other inst buf   : 0
    Consecutive SOPs : 0
    No SOP or EOP    : 0
    EOP but no SOP   : 0
    Particle overrun : 0
    Bad particle ins : 0
    Bad buf cond     : 0
    DS rd req failed : 0
    HT rd req failed : 0
Reason for TX drops (sticky):
    Bad packet len   : 0
    Bad buf len      : 0
    Bad ifindex      : 0
    No device        : 0
    No skbuff        : 0
    Device xmit fail : 0
    Device xmit rtry : 0
    Tx Done ringfull : 0
    Bad u->k xlation : 0
    No extra skbuff  : 0
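Because the drop counters are sticky, an LSMPI problem shows up as a change between two captures of this output rather than as a single value. The sketch below compares two captures and reports the TX drop counters that increased. The AFTER capture is invented for illustration, and the function names are hypothetical.

```python
import re

# "name : value" pairs; names are letters, digits, spaces, '-' and '>'.
COUNTER = re.compile(r"([A-Za-z][A-Za-z0-9>\- ]*?)\s*:\s*(\d+)")

BEFORE = """\
Reason for RX drops (sticky):
    Ring full        : 0
Reason for TX drops (sticky):
    Bad packet len   : 0
    No skbuff        : 0
    Device xmit fail : 0
    Device xmit rtry : 0
    Bad u->k xlation : 0
"""

# Hypothetical second capture, taken later, with one counter incremented.
AFTER = BEFORE.replace("Device xmit fail : 0", "Device xmit fail : 12")


def tx_drop_counters(text):
    """Return {counter name: value} for the TX drops section only."""
    text = text.replace("\u2212", "-")  # normalize PDF-style minus signs
    marker = "Reason for TX drops (sticky):"
    if marker not in text:
        return {}
    tx_section = text.split(marker, 1)[1]
    return {name: int(value) for name, value in COUNTER.findall(tx_section)}


def incremented(before, after):
    """Return {counter name: increase} for counters that grew."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


print(incremented(tx_drop_counters(BEFORE), tx_drop_counters(AFTER)))
```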
Memory Usage

The control CPUs in the ASR1K chassis, such as the Route Processor (RP), the Embedded Services Processor (ESP), and the Shared Port Adapter (SPA) Interface Processor (SIP), run IOS-XE software. This OS software consists of a Linux-based kernel and a common set of OS-level utility programs, which includes Cisco IOS software that runs as a user process on the RP card. Within IOS-XE, each child process operates in protected memory under each line card Linux kernel and embedded memory.
Verify Memory Usage on IOS-XE

Enter the show platform software status control-processor brief command in order to monitor the memory usage on the RP, the ESP, and the SIP. When you compare memory usage between samples, make sure that the system state is identical with regard to aspects such as the feature configuration and traffic.
ASR1K# show platform software status control-processor brief
Memory (kB)
Slot  Status      Total      Used (Pct)      Free (Pct)  Committed (Pct)
RP0   Healthy   3907744  1835628 (47%)  2072116 (53%)   2614788 (67%)
ESP0  Healthy   2042668   789764 (39%)  1252904 (61%)  3108376 (152%)
SIP0  Healthy    482544   341004 (71%)   141540 (29%)    367956 (76%)
SIP1  Healthy    482544   315484 (65%)   167060 (35%)    312216 (65%)
Note: Committed memory is an estimate of how much RAM you need in order to guarantee that the system is never Out of Memory (OOM) for this workload. Normally, the kernel overcommits memory. For example, when you run a 1−GB malloc, nothing really happens; you only receive true memory−on−demand when you begin to use that allocated memory, and only as much as you use.
Each processor listed in the previous output might report the status as Healthy, Warning, or Critical, which is dependent upon the amount of free memory. If any of the processors display the status as Warning or Critical, enter the monitor platform software process command in order to identify the top contributor.
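The status check described above can be automated. This sketch parses the memory table from the brief output, lists any slot that is not Healthy, and recomputes the committed percentage to show how a value above 100 percent arises. The function names are hypothetical; the sample rows are copied from the output in this document.

```python
import re

SAMPLE = """\
Slot  Status      Total      Used (Pct)      Free (Pct)  Committed (Pct)
RP0   Healthy   3907744  1835628 (47%)  2072116 (53%)   2614788 (67%)
ESP0  Healthy   2042668   789764 (39%)  1252904 (61%)  3108376 (152%)
SIP0  Healthy    482544   341004 (71%)   141540 (29%)    367956 (76%)
SIP1  Healthy    482544   315484 (65%)   167060 (35%)    312216 (65%)
"""

ROW = re.compile(
    r"^\s*(\S+)\s+(Healthy|Warning|Critical)\s+(\d+)"
    r"\s+(\d+)\s+\(\s*\d+%\)\s+(\d+)\s+\(\s*\d+%\)\s+(\d+)\s+\(\s*\d+%\)"
)


def parse_brief(text):
    """Return one dict per slot with status and memory values in kB."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        total, used, free, committed = (int(match.group(i)) for i in range(3, 7))
        rows.append({
            "slot": match.group(1), "status": match.group(2),
            "total": total, "used": used, "free": free, "committed": committed,
        })
    return rows


def slots_needing_attention(rows):
    """Slots that report Warning or Critical."""
    return [row["slot"] for row in rows if row["status"] != "Healthy"]


def committed_pct(row):
    """Committed memory as a percentage of total; can exceed 100."""
    return round(100 * row["committed"] / row["total"])


rows = parse_brief(SAMPLE)
print(slots_needing_attention(rows))
for row in rows:
    print(row["slot"], committed_pct(row))
```

In the sample, ESP0 commits more memory than it physically has, which is the overcommit behavior that the note above describes, while its status remains Healthy.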
BGL.J.16-ASR1000-4# monitor platform software process ?
  0   SPA-Inter-Processor slot 0
  1   SPA-Inter-Processor slot 1
  F0  Embedded-Service-Processor slot 0
  F1  Embedded-Service-Processor slot 1
  FP  Embedded-Service-Processor
  R0  Route-Processor slot 0
  R1  Route-Processor slot 1
  RP  Route-Processor
You might be prompted to set the terminal type before you can execute the monitor platform software process command:
BGL.J.16-ASR1000-4# monitor platform software process r0
Terminal type 'network' unsupported for command
Change the terminal type with the 'terminal terminal-type' command.
The terminal type is set to network by default. In order to set the appropriate terminal type, enter the terminal terminal-type command:
ASR1000# terminal terminal-type vt100
Once the correct terminal type is configured, you can enter the monitor platform software process command (some output omitted):
ASR1000# monitor platform software process r0
top - 00:34:59 up 5:02, 0 users, load average: 2.43, 1.52, 0.73
Tasks: 136 total, 4 running, 132 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.8%us, 2.3%sy, 0.0%ni, 96.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  2009852k total, 1811024k used, 198828k free, 135976k buffers
Swap: 0k total, 0k used, 0k free, 1133544k cached

  PID USER  PR  NI  VIRT   RES   SHR S %CPU %MEM    TIME+ COMMAND
25956 root  20   0  928m  441m  152m R  1.2 22.5  4:21.32 linux_iosd-imag
29074 root  20   0  106m   95m  6388 S  0.0  4.9  0:14.86 smand
24027 root  20   0  114m   61m   55m S  0.0  3.1  0:05.07 fman_rp
25227 root  20   0 27096   13m   12m S  0.0  0.7  0:04.35 imand
23174 root  20   0 33760   11m  9152 S  1.0  0.6  1:58.00 cmand
23489 root  20   0 23988  7372  4952 S  0.2  0.4  0:05.28 emd
24755 root  20   0 19708  6820  4472 S  1.0  0.3  3:39.33 hman
28475 root  20   0 20460  6448  4792 S  0.0  0.3  0:00.26 psd
27957 root  20   0 16688  5668  3300 S  0.0  0.3  0:00.18 plogd
14572 root  20   0  4576  2932  1308 S  0.0  0.1  0:02.37 reflector.sh
Note: In order to sort the output in descending order of memory usage, press Shift + M.
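When the output is captured to a file instead of viewed interactively, the same ordering can be reproduced offline. The sketch below parses process rows in the top format shown above and sorts them by resident memory (RES). The function names are hypothetical, and the sample is a subset of the rows in this document.

```python
SAMPLE = """\
  PID USER  PR  NI  VIRT   RES   SHR S %CPU %MEM    TIME+ COMMAND
23174 root  20   0 33760   11m  9152 S  1.0  0.6  1:58.00 cmand
14572 root  20   0  4576  2932  1308 S  0.0  0.1  0:02.37 reflector.sh
29074 root  20   0  106m   95m  6388 S  0.0  4.9  0:14.86 smand
25956 root  20   0  928m  441m  152m R  1.2 22.5  4:21.32 linux_iosd-imag
24027 root  20   0  114m   61m   55m S  0.0  3.1  0:05.07 fman_rp
"""


def to_kib(value):
    """Convert a top size field to KiB; top uses plain KiB or an m/g suffix."""
    units = {"m": 1024, "g": 1024 * 1024}
    suffix = value[-1].lower()
    if suffix in units:
        return float(value[:-1]) * units[suffix]
    return float(value)


def parse_top_rows(text):
    """Return one dict per process row; header and summary lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 12 and fields[0].isdigit():
            rows.append({
                "pid": int(fields[0]),
                "res_kib": to_kib(fields[5]),
                "mem_pct": float(fields[9]),
                "command": " ".join(fields[11:]),
            })
    return rows


def top_by_memory(rows, count=3):
    """Largest resident-memory holders first."""
    return sorted(rows, key=lambda row: row["res_kib"], reverse=True)[:count]


for row in top_by_memory(parse_top_rows(SAMPLE)):
    print(row["pid"], row["command"], row["res_kib"])
```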
Warning: Open a Cisco Technical Assistance Center (TAC) case if any of the processors report a Critical or Warning status, and you need assistance in order to identify the cause.
Verify Memory Usage on IOSd

If you notice that the linux_iosd-imag process holds an unusually large amount of memory in the monitor platform software process rp active command output, focus your troubleshooting efforts on the IOSd instance. It is likely that a specific process within IOSd is not freeing memory. Troubleshoot memory-related issues in the IOSd pool the same way that you troubleshoot any software-based forwarding platform, such as the Cisco 2800, 3800, or 3900 Series platforms.

ASR1000# monitor platform software process rp active
  PID USER  PR  NI  VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
25794 root  20   0 2929m  1.9g  155m R 99.9 38.9   1415:11 linux_iosd-imag
23038 root  20   0 33848   13m   10m S  5.9  0.4  30:53.87 cmand
 9599 root  20   0  2648  1152   884 R  2.0  0.0   0:00.01 top
Enter the show process memory sorted command in order to identify the problem process:
ASR1000# show process memory sorted
Processor Pool Total: 1733568032 Used: 1261854564 Free: 471713468
 lsmpi_io Pool Total:    6295088 Used:    6294116 Free:       972

 PID TTY   Allocated       Freed    Holding    Getbufs  Retbufs Process
 522   0  1587708188   803356800  724777608      54432        0 BGP Router
 234   0  3834576340  2644349464  232401568  286163388    15876 IP RIB Update
   0   0   263244344    36307492  215384208          0        0 *Init
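A process that does not free memory shows a Holding value that keeps growing across captures taken under the same load. This sketch compares two captures of the table above and lists the processes whose Holding value increased. The LATER capture and its values are invented for illustration, and the function names are hypothetical.

```python
EARLIER = """\
 PID TTY   Allocated       Freed    Holding    Getbufs  Retbufs Process
 522   0  1587708188   803356800  724777608      54432        0 BGP Router
 234   0  3834576340  2644349464  232401568  286163388    15876 IP RIB Update
   0   0   263244344    36307492  215384208          0        0 *Init
"""

# Hypothetical later capture: only BGP Router holds more memory than before.
LATER = """\
 PID TTY   Allocated       Freed    Holding    Getbufs  Retbufs Process
 522   0  1787708188   927930580  800000000      54432        0 BGP Router
 234   0  3834576340  2644349464  232401568  286163388    15876 IP RIB Update
   0   0   263244344    36307492  215384208          0        0 *Init
"""


def parse_holding(text):
    """Return {(pid, process name): holding bytes} from the process table."""
    holding = {}
    for line in text.splitlines():
        # Process names can contain spaces, so split off only 7 columns.
        fields = line.split(None, 7)
        if len(fields) == 8 and all(f.isdigit() for f in fields[:7]):
            holding[(int(fields[0]), fields[7].strip())] = int(fields[4])
    return holding


def holding_growth(before, after):
    """Return [(increase, (pid, name))], largest increase first."""
    deltas = (
        (after[key] - before[key], key) for key in after if key in before
    )
    return sorted((item for item in deltas if item[0] > 0), reverse=True)


for increase, (pid, name) in holding_growth(parse_holding(EARLIER), parse_holding(LATER)):
    print(pid, name, increase)
```

Growth alone does not prove a leak. A process such as BGP Router legitimately holds more memory as the routing table grows, which is why the captures need to be taken under comparable conditions.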
Note: Open a TAC case if you require assistance in order to troubleshoot or identify if the memory usage is legitimate.
Verify TCAM Utilization on an ASR1K

Traffic classification is one of the most basic functions found in routers and switches. Many applications and features require that infrastructure devices provide differentiated services to different users, based on quality or classification requirements. The traffic classification process must be quick, so that the throughput of the device is not greatly degraded. The ASR1K platform uses the fourth generation of Ternary Content Addressable Memory (TCAM4) for this purpose.

In order to determine the number of TCAM cells used on the platform, and the number of free entries that remain, enter this command:
ASR1000# show platform hardware qfp active tcam resource-manager usage
Total TCAM Cell Usage Information
----------------------------------
Name                         : TCAM #0 on CPP #0
Total number of regions      : 3
Total tcam used cell entries : 65528
Total tcam free cell entries : 30422
Threshold status             : below critical limit
Note: Cisco recommends that you always check the threshold status before you make any changes to Access−lists or Quality of Service (QoS) policies, so that the TCAM has sufficient free cells available in order to program the entries.
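The used and free cell counts can be turned into a utilization figure that is easier to track over time. This sketch assumes that the total cell count equals used plus free, which the output does not state explicitly; the function name is hypothetical.

```python
import re

SAMPLE = """\
Total TCAM Cell Usage Information
----------------------------------
Name                         : TCAM #0 on CPP #0
Total number of regions      : 3
Total tcam used cell entries : 65528
Total tcam free cell entries : 30422
Threshold status             : below critical limit
"""


def tcam_utilization(text):
    """Return (used, free, percent used), assuming total = used + free."""
    used = int(re.search(r"used cell entries\s*:\s*(\d+)", text).group(1))
    free = int(re.search(r"free cell entries\s*:\s*(\d+)", text).group(1))
    return used, free, 100.0 * used / (used + free)


used, free, pct = tcam_utilization(SAMPLE)
print(f"used={used} free={free} utilization={pct:.1f}%")
```

The Threshold status line remains the authoritative indicator; the percentage is only a convenience for trending.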
If the forwarding processor runs critically low on free TCAM cells, the ESP might generate logs similar to these, and then crash, which causes the traffic forwarding to stop (if there is no redundancy):
%CPPTCAMRM-6-TCAM_RSRC_ERR: SIP0: cpp_sp: Allocation failed because of insufficient TCAM resources in the system.
%CPPOSLIB-3-ERROR_NOTIFY: SIP0: cpp_sp: cpp_sp encountered an error -Traceback= 1#d7f63914d8ef12b8456826243f3b60d7 errmsg:7EFFC525C000+1175 cpp_common_os:7EFFC8D20000+D1E5 cpp_common_os:7EFFC8D20000+D12E
Verify Memory Utilization on QFP

In addition to the physical memory, there is also memory attached to the Quantum Flow Processor (QFP) ASIC. This memory holds forwarding data structures, such as the Forwarding Information Base (FIB) and QoS policies. The amount of DRAM available for the QFP ASIC is fixed at 256 MB, 512 MB, or 1 GB, dependent upon the ESP module. Enter the show platform hardware qfp active infrastructure exmem statistics command in order to determine the exmem memory usage. The sum of the IRAM and DRAM memory that is used gives the total QFP memory that is in use.
BGL.I.05-ASR1000-1# show platform hardware qfp active infra exmem statistics user
Type: Name: IRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ------------------------------------------------------
  1            115200       115712       CPP_FIA

Type: Name: DRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ------------------------------------------------------
  4            1344         4096         P/I
  9            270600       276480       CEF
  1            1138256      1138688      QM RM
  1            4194304      4194304      TCAM
  1            65536        65536        Qm 16
  3            15745024     15745024     ING_EGR_UIDB
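The IRAM plus DRAM sum can be computed from the per-user rows. This sketch adds up the Bytes-Total column for each memory type; treating Bytes-Total as the "used" figure is an assumption, and the function name is hypothetical. The sample contains only the rows shown in this document, so the result is not a full-system total.

```python
import re

SAMPLE = """\
Type: Name: IRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ------------------------------------------------------
  1            115200       115712       CPP_FIA

Type: Name: DRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ------------------------------------------------------
  4            1344         4096         P/I
  9            270600       276480       CEF
  1            1138256      1138688      QM RM
  1            4194304      4194304      TCAM
  1            65536        65536        Qm 16
  3            15745024     15745024     ING_EGR_UIDB
"""


def exmem_totals(text):
    """Return {memory type: sum of Bytes-Total} for each Type section."""
    totals = {}
    mem_type = None
    for line in text.splitlines():
        header = re.search(r"Name:\s*(\w+),", line)
        if header:
            mem_type = header.group(1)
            totals[mem_type] = 0
            continue
        # User names can contain spaces, so split off only 3 numeric columns.
        fields = line.split(None, 3)
        if mem_type and len(fields) == 4 and all(f.isdigit() for f in fields[:3]):
            totals[mem_type] += int(fields[2])
    return totals


totals = exmem_totals(SAMPLE)
print(totals)
print("IRAM + DRAM:", sum(totals.values()))
```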
The IRAM is the instruction memory for QFP software. In the event that DRAM is exhausted, available IRAM can be used. If the IRAM runs critically low on memory, you might see this error message:
%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha: QFP 0 IRAM resource low - 97 percent depleted
%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha: QFP 0 IRAM resource low - 98 percent depleted
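A log collector can extract the depleted resource and the percentage from these messages in order to track how quickly the resource is consumed. This is a minimal sketch; the function name is hypothetical, and the sample lines assume that each message is logged on a single line.

```python
import re

LOGS = [
    "%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha: QFP 0 IRAM resource low - 97 percent depleted",
    "%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha: QFP 0 IRAM resource low - 98 percent depleted",
]


def parse_low_resource(line):
    """Return QFP number, resource name, and percent depleted, or None."""
    line = line.replace("\u2212", "-")  # normalize PDF-style minus signs
    if "%QFPOOR-4-LOWRSRC_PERCENT" not in line:
        return None
    resource = re.search(r"QFP (\d+) (\w+) resource low", line)
    depleted = re.search(r"(\d+) percent depleted", line)
    if not (resource and depleted):
        return None
    return {
        "qfp": int(resource.group(1)),
        "resource": resource.group(2),
        "depleted_pct": int(depleted.group(1)),
    }


for entry in LOGS:
    print(parse_low_resource(entry))
```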
In order to determine the process that consumes most of the memory, enter the show platform hardware qfp active infra exmem statistics user command:
ASR1000# show platform hardware qfp active infra exmem statistics user
Type: Name: IRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ----------------------------------------------------
  1            115200       115712       CPP_FIA

Type: Name: DRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ----------------------------------------------------
  4            1344         4096         P/I
  9            270600       276480       CEF
  1            1138256      1138688      QM RM
  1            4194304      4194304      TCAM
  1            65536        65536        Qm 16
  3            15745024     15745024     ING_EGR_UIDB
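On a production router this table can be long, so it helps to rank the users by size. The sketch below orders the rows by Bytes-Total, largest first. The function name is hypothetical, and the sample is a subset of the rows shown above.

```python
import re

SAMPLE = """\
Type: Name: IRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  1            115200       115712       CPP_FIA
Type: Name: DRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  9            270600       276480       CEF
  1            1138256      1138688      QM RM
  1            4194304      4194304      TCAM
  3            15745024     15745024     ING_EGR_UIDB
"""


def users_by_bytes(text):
    """Return [(bytes total, memory type, user name)], largest first."""
    rows = []
    mem_type = None
    for line in text.splitlines():
        header = re.search(r"Name:\s*(\w+),", line)
        if header:
            mem_type = header.group(1)
            continue
        fields = line.split(None, 3)
        if mem_type and len(fields) == 4 and all(f.isdigit() for f in fields[:3]):
            rows.append((int(fields[2]), mem_type, fields[3].strip()))
    return sorted(rows, reverse=True)


for size, mem_type, user in users_by_bytes(SAMPLE)[:3]:
    print(f"{user:<14} {mem_type} {size}")
```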
Once you identify the feature that holds most of the memory, collect the output from the show platform hardware qfp active feature command, and contact the Cisco TAC in order to determine the root cause.

Updated: Nov 19, 2013