千家信息网

How to use ramoops and kdump to record kernel crash information on an x86 virtual machine

Published: 2025-01-20  Author: 千家信息网 editor  Last updated: 2025-01-20

This article describes how to use ramoops and kdump to record kernel crash information on an x86 virtual machine. The steps below were compiled from various references into a simple, practical procedure.

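1. ramoops

ramoops is an oops/panic logger that writes log messages to RAM before the system crashes. It needs persistent RAM, that is, a memory region whose contents survive a reboot.

ramoops can be built as a module; for convenience I built it directly into the kernel. The following configuration options need to be enabled: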

CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_RAM=y
# CONFIG_PSTORE_PMSG is not set    (recommended to enable)
# CONFIG_PSTORE_FTRACE is not set  (recommended to enable)
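As a quick sanity check, the options above can be compared against the config of the kernel that was actually built. The following is only a minimal sketch: the function name `missing_options` is hypothetical, and it assumes you have the config text at hand (for example the `.config` from the build tree). Options built as modules (`=m`) are reported as missing, because the article builds ramoops into the kernel.

```python
def missing_options(config_text, required):
    """Return the required CONFIG_ names that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set" and are skipped.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# The three options the article enables for ramoops.
RAMOOPS_OPTIONS = ["CONFIG_PSTORE", "CONFIG_PSTORE_CONSOLE", "CONFIG_PSTORE_RAM"]

# Illustrative config fragment (not from a real build) with one option disabled.
sample = """CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
# CONFIG_PSTORE_RAM is not set
"""
print(missing_options(sample, RAMOOPS_OPTIONS))
```

With the sample fragment, the only option reported is CONFIG_PSTORE_RAM.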

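With VirtualBox, even with a correct configuration, no record files are generated under /sys/fs/pstore after a kernel crash (a nasty pitfall), so I set up the environment on VMware instead, with 2 GB of memory. According to the official ramoops documentation, there are three ways to configure ramoops. We use the first one directly, passing boot parameters to the kernel: edit /boot/grub/grub.cfg and add the following parameters: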

mem=1920M ramoops.mem_address=0x78000000 ramoops.mem_size=0x4000000 ramoops.dump_oops=1 ramoops.ecc=1

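Here, mem is the amount of memory given to the kernel (1920 MB, which leaves the top of the 2 GB untouched by the kernel), mem_address is the start address of the memory used by ramoops (0x78000000, which is exactly 1920 MB), mem_size is the size of this reserved region (0x4000000, which is 64 MB), dump_oops=1 means both oopses and panics are recorded, and ecc=1 enables ECC protection; see the official documentation for details. After configuring the boot parameters, reboot for them to take effect.

After the system has rebooted, we crash the kernel on purpose by running the following command: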
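The relationship between these values is simple arithmetic, and a small sketch makes it easy to recompute them for a different VM size. The helper names `ramoops_cmdline` and `fits_in_ram` are hypothetical, and the sketch assumes the reserved region is placed directly above the memory given to the kernel, as in the parameters above.

```python
MIB = 1024 * 1024


def ramoops_cmdline(usable_mib, reserve_mib):
    """Build the mem= / ramoops address and size parameters.

    The ramoops region is assumed to start right where the kernel's
    usable memory (mem=) ends.
    """
    mem_address = usable_mib * MIB
    mem_size = reserve_mib * MIB
    return (f"mem={usable_mib}M "
            f"ramoops.mem_address={mem_address:#x} "
            f"ramoops.mem_size={mem_size:#x}")


def fits_in_ram(total_mib, usable_mib, reserve_mib):
    """True if the reserved region ends within the VM's physical memory."""
    return usable_mib + reserve_mib <= total_mib


# Values from the article: 2 GB VM, 1920 MB for the kernel, 64 MB for ramoops.
print(ramoops_cmdline(1920, 64))
print(fits_in_ram(2048, 1920, 64))
```

For the article's values this reproduces mem=1920M, 0x78000000 and 0x4000000, and confirms that the region ends at 1984 MB, below the 2048 MB of the VM.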

echo c > /proc/sysrq-trigger

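The system crashes and then reboots. Once back in a shell, look under /sys/fs/pstore: the corresponding record files are there. In this case they are console-ramoops-0 and dmesg-ramoops-1, with the following contents: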

console-ramoops-0
[  120.719799] CPU: 0 PID: 7475 Comm: sh Tainted: G           O    4.9.166+ #2
[  120.719942] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[  120.720192] task: ffff96770c862240 task.stack: ffffba30c4a80000
[  120.720324] RIP: 0010:[]  [] sysrq_handle_crash+0x12/0x20
[  120.720533] RSP: 0018:ffffba30c4a83e78  EFLAGS: 00010282
[  120.720680] RAX: 000000000000000f RBX: 0000000000000063 RCX: 0000000000000000
[  120.720819] RDX: 0000000000000000 RSI: ffff96771ba10648 RDI: 0000000000000063
[  120.720962] RBP: ffffffffad0bffc0 R08: 0000000000000001 R09: 0000000000059284
[  120.721131] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000004
[  120.721278] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  120.721421] FS:  0000000001ea0880(0000) GS:ffff96771ba00000(0000) knlGS:0000000000000000
[  120.721618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  120.721745] CR2: 0000000000000000 CR3: 000000004cb84000 CR4: 0000000000360670
[  120.721911] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  120.722108] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  120.722256] Stack:
[  120.722331]  ffffffffac832e31 0000000000000002 fffffffffffffffb ffffba30c4a83f08
[  120.722604]  0000000001ea4f00 ffffffffac83326b ffff9676f3effdd8 ffffffffac67c0dd
[  120.722849]  0000000000000002 ffff96770ffc8880 ffffffffac60cba0 ffff96770ffc8880
[  120.723105] Call Trace:
[  120.723188]  [] ? __handle_sysrq+0xf1/0x140
[  120.723316]  [] ? write_sysrq_trigger+0x2b/0x30
[  120.723443]  [] ? proc_reg_write+0x3d/0x60
[  120.723640]  [] ? vfs_write+0xb0/0x190
[  120.723761]  [] ? SyS_write+0x52/0xc0
[  120.723882]  [] ? do_syscall_64+0x87/0xf0
[  120.724002]  [] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[  120.724167] Code: 41 5c 41 5d 41 5e 41 5f e9 0c 8a ce ff 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 c7 05 c9 45 94 03 01 00 00 00 0f ae f8  04 25 00 00 00 00 01 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 8d
[  120.725651] RIP  [] sysrq_handle_crash+0x12/0x20
[  120.725801]  RSP
[  120.725894] CR2: 0000000000000000
[  120.726019] ---[ end trace 9d0e2c84273289ed ]---
[  120.730468] Kernel panic - not syncing: Fatal exception
[  120.730706] Kernel Offset: 0x2b400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  120.735485] Rebooting in 5 seconds..
[  125.739764] ACPI MEMORY or I/O RESET_REG.
No errors detected
===============================
dmesg-ramoops-1
<4>[    5.718737] hrtimer: interrupt took 9700 ns
<6>[    6.096115] igb_uio: Use MSIX interrupt by default
<3>[    6.169775] EXT4-fs (sda2): unable to read superblock
<3>[    6.170072] EXT4-fs (sda2): unable to read superblock
<3>[    6.170081] EXT4-fs (sda2): unable to read superblock
<6>[    6.695732] device eth0 entered promiscuous mode
<6>[    6.697183] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
<0>[   12.596101] TIPC: Started in network mode
<0>[   12.596217] TIPC: Own node identity 1001285, cluster identity 4711
<0>[   12.596345] TIPC: 32-bit node address hash set to 1001285
<0>[   12.638127] TIPC: Enabled bearer , priority 10
<0>[   18.791882] TIPC: Resetting bearer
<6>[   23.124506] ip_tables: (C) 2000-2006 Netfilter Core Team
<6>[   23.623557] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
<6>[   24.067819] Initializing XFRM netlink socket
<6>[   24.087773] Netfilter messages via NETLINK v0.30.
<6>[   24.522481] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
<6>[  120.714716] sysrq: SysRq : Trigger a crash
<1>[  120.714869] BUG: unable to handle kernel NULL pointer dereference at           (null)
<1>[  120.715080] IP: [] sysrq_handle_crash+0x12/0x20
<7>[  120.715239] PGD 800000004fe74067
<7>[  120.715296] PUD 4fe07067
<7>[  120.715393] PMD 0
<7>[  120.715417]
<7>[  120.715497] Oops: 0002 [#1] SMP
<7>[  120.715587] Modules linked in:
<7>[  120.718151]  mii mtd sd_mod ata_piix ahci libahci libata scsi_mod sdhci_pci sdhci mmc_block mmc_core squashfs vfat fat ext4 crc16 fscrypto jbd2 mbcache
<7>[  120.719799] CPU: 0 PID: 7475 Comm: sh Tainted: G           O    4.9.166+ #2
<7>[  120.719942] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
<7>[  120.720192] task: ffff96770c862240 task.stack: ffffba30c4a80000
<7>[  120.720324] RIP: 0010:[]  [] sysrq_handle_crash+0x12/0x20
<7>[  120.720533] RSP: 0018:ffffba30c4a83e78  EFLAGS: 00010282
<7>[  120.720680] RAX: 000000000000000f RBX: 0000000000000063 RCX: 0000000000000000
<7>[  120.720819] RDX: 0000000000000000 RSI: ffff96771ba10648 RDI: 0000000000000063
<7>[  120.720962] RBP: ffffffffad0bffc0 R08: 0000000000000001 R09: 0000000000059284
<7>[  120.721131] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000004
<7>[  120.721278] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<7>[  120.721421] FS:  0000000001ea0880(0000) GS:ffff96771ba00000(0000) knlGS:0000000000000000
<7>[  120.721618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<7>[  120.721745] CR2: 0000000000000000 CR3: 000000004cb84000 CR4: 0000000000360670
<7>[  120.721911] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<7>[  120.722108] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<7>[  120.722256] Stack:
<7>[  120.722331]  ffffffffac832e31 0000000000000002 fffffffffffffffb ffffba30c4a83f08
<7>[  120.722604]  0000000001ea4f00 ffffffffac83326b ffff9676f3effdd8 ffffffffac67c0dd
<7>[  120.722849]  0000000000000002 ffff96770ffc8880 ffffffffac60cba0 ffff96770ffc8880
<7>[  120.723105] Call Trace:
<7>[  120.723188]  [] ? __handle_sysrq+0xf1/0x140
<7>[  120.723316]  [] ? write_sysrq_trigger+0x2b/0x30
<7>[  120.723443]  [] ? proc_reg_write+0x3d/0x60
<7>[  120.723640]  [] ? vfs_write+0xb0/0x190
<7>[  120.723761]  [] ? SyS_write+0x52/0xc0
<7>[  120.723882]  [] ? do_syscall_64+0x87/0xf0
<7>[  120.724002]  [] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
<7>[  120.724167] Code: 41 5c 41 5d 41 5e 41 5f e9 0c 8a ce ff 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 c7 05 c9 45 94 03 01 00 00 00 0f ae f8  04 25 00 00 00 00 01 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 8d
<1>[  120.725651] RIP  [] sysrq_handle_crash+0x12/0x20
<7>[  120.725801]  RSP
<7>[  120.725894] CR2: 0000000000000000
<4>[  120.726019] ---[ end trace 9d0e2c84273289ed ]---
<0>[  120.730468] Kernel panic - not syncing: Fatal exception
<0>[  120.730706] Kernel Offset: 0x2b400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
No errors detected

As the log shows, this panic was triggered by a SysRq (system request).
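To pull the relevant lines out of a dmesg-ramoops record without reading the whole dump, a small parser helps. This is a rough sketch rather than a complete pstore parser: the names `parse_record` and `crash_summary` are hypothetical, and the line format (`<level>[ timestamp] message`) is assumed from the dmesg-ramoops-1 output shown above.

```python
import re

# "<level>[ timestamp] message", as seen in the dmesg-ramoops-1 record.
RECORD_LINE = re.compile(r"^<(\d)>\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_record(text):
    """Split a dmesg-ramoops record into (level, timestamp, message) tuples.

    Lines that do not match the format (for example the trailing ECC
    status line) are ignored.
    """
    entries = []
    for line in text.splitlines():
        m = RECORD_LINE.match(line)
        if m:
            entries.append((int(m.group(1)), float(m.group(2)), m.group(3)))
    return entries


def crash_summary(entries):
    """Keep only the SysRq trigger and the kernel panic messages."""
    return [msg for _level, _ts, msg in entries
            if "SysRq" in msg or msg.startswith("Kernel panic")]


# Three lines taken from the record above.
sample = """<6>[  120.714716] sysrq: SysRq : Trigger a crash
<7>[  120.719799] CPU: 0 PID: 7475 Comm: sh Tainted: G           O    4.9.166+ #2
<0>[  120.730468] Kernel panic - not syncing: Fatal exception
No errors detected
"""
for message in crash_summary(parse_record(sample)):
    print(message)
```

On a real system the text would come from reading a file under /sys/fs/pstore instead of the inline sample.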

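2. Using kdump

Whenever the memory of the system kernel (the working kernel) needs to be dumped (for example on a panic), kdump uses kexec to quickly boot a dump-capture kernel, bypassing the BIOS checks. While the second kernel boots, the working kernel's memory image is preserved and can be accessed by the capture kernel.

Configuration options to enable on x86-64 (strictly, these are split between the system kernel and the dump-capture kernel, see the official documentation; here we use a single kernel image, so the capture kernel that gets booted is the same as the working kernel):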

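CONFIG_KEXEC=y
CONFIG_SYSFS=y
CONFIG_DEBUG_INFO=y  # dump analysis tools need a vmlinux with debug symbols
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_START=0x1000000  # start of the memory region the kernel is loaded into
CONFIG_SMP=n  # this is a dump-capture kernel setting; we leave SMP enabled and adjust the kernel boot parameters instead

After rebuilding the kernel, modify the system kernel's boot parameters to reserve memory for the capture kernel. Here we reserve 512 MB: edit /boot/grub/grub.cfg and add the following: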

crashkernel=512M@16M
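The crashkernel value has the form size@offset, so the reserved range here is 512 MB starting at 16 MB, and 16 MB is the same address as the CONFIG_PHYSICAL_START=0x1000000 shown earlier. If the ramoops parameters from section 1 are used on the same command line, it is worth confirming that the two reservations do not overlap. The sketch below does only that arithmetic; the helper names are hypothetical, and it handles just the simple size@offset form with K/M/G suffixes, not the other crashkernel syntaxes the kernel accepts.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse '512M', '16M', '0x1000000' and similar into bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1], 0) * UNITS[suffix]
    return int(text, 0)


def parse_crashkernel(value):
    """Return (start, length) in bytes for a 'size@offset' value."""
    size, sep, offset = value.partition("@")
    if not sep:
        raise ValueError("only the size@offset form is handled in this sketch")
    return parse_size(offset), parse_size(size)


def overlaps(a_start, a_len, b_start, b_len):
    """True if the two half-open ranges [start, start + len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


start, length = parse_crashkernel("512M@16M")
print(hex(start), hex(start + length))
# ramoops region from section 1: 0x78000000, 0x4000000 bytes long.
print(overlaps(start, length, 0x78000000, 0x4000000))
```

For the article's values the crash kernel region ends at 528 MB, well below the ramoops region at 1920 MB, so the check reports no overlap.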

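Reboot for this to take effect. At this point the kexec tool needs to be built; then run the following command in a shell to load the capture kernel: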

kexec -p /boot/vmlinux-4.9.xxx --initrd=/boot/initrd-4.9.166.xxx --append="1 irqpoll maxcpus=1 reset_devices noapic recovery" --reuse-cmdline
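Before triggering a crash, it can be useful to confirm that the capture kernel was actually loaded. The kernel exposes this as /sys/kernel/kexec_crash_loaded, which reads 1 when a crash kernel is loaded. The following is a minimal sketch with a hypothetical function name; the path parameter exists only so the check can be pointed at a different file for testing.

```python
from pathlib import Path


def crash_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """Return True if the sysfs flag reports a loaded crash kernel.

    A missing or unreadable file is treated as "not loaded".
    """
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return False


print(crash_kernel_loaded())
```

If this prints False right after running kexec -p, the crash would not boot the capture kernel, so it is worth checking the kexec output and the crashkernel reservation first.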

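Then, when the kernel crashes, the capture kernel is booted. Once the system is up, the /proc/vmcore file is available; copy it to another directory and reboot, then analyze the file with kexec-x86/sbin/vmcore-dmesg or gdb.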
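The copy step can be scripted so that it runs the same way every time in the capture kernel. This is a sketch under assumptions: the function name `save_vmcore` and the default destination /var/crash are hypothetical choices, not something the article prescribes. The file is copied in chunks because a vmcore can be as large as the memory of the crashed kernel.

```python
import shutil
from pathlib import Path


def save_vmcore(src="/proc/vmcore", dest_dir="/var/crash"):
    """Copy the vmcore to dest_dir/vmcore in 1 MiB chunks and return the path.

    dest_dir is an assumed location; any directory on persistent
    storage with enough free space will do.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "vmcore"
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1 << 20)
    return dest


# Example call in the capture kernel (needs root and an existing /proc/vmcore):
#   save_vmcore()
```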

That concludes this walkthrough of using ramoops and kdump to record kernel crash information on an x86 virtual machine. Combining the theory with practice is the best way to learn it, so try the steps in your own environment.
