
CentOS 7.8.2003 on Hyper-V uses 80% of its memory while idle, and no process accounts for it. Very strange.
2021-06-17 19:20:43
idczone
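Summary:
I set up a CentOS system on Hyper-V. It does almost nothing, yet after a few minutes memory usage suddenly jumps to 80%, and neither ps nor top shows any process holding that memory.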

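Environment:

Host:
Windows 10 Pro version 1909, E5-2696 v2, 64 GB of ECC memory.
Hyper-V VM:
1 virtual processor, 6000 MB of assigned memory, Dynamic Memory enabled, minimum RAM 2000 MB, maximum RAM 6000 MB, memory weight: medium.
CentOS configuration:
CentOS 7.8.2003 x64 server edition, no GUI.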

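Symptoms:
While idle, CentOS uses about 200 MB during the first minute after boot, but 1 to 3 minutes later usage suddenly climbs to about 5000 MB, and no process can be found that is consuming it.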

Command output:
free -m
[[email protected] ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           5663        4151        1301           8         210        1280
Swap:          8063           0        8063

df -h
[[email protected] ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 2.8G     0  2.8G   0% /dev
tmpfs                    2.8G     0  2.8G   0% /dev/shm
tmpfs                    2.8G  9.0M  2.8G   1% /run
tmpfs                    2.8G     0  2.8G   0% /sys/fs/cgroup
/dev/mapper/centos-root   50G  1.7G   49G   4% /
/dev/sda1               1014M  191M  824M  19% /boot
/dev/mapper/centos-home  192G  918M  191G   1% /home
tmpfs                    567M     0  567M   0% /run/user/0


ps, sorted by memory usage
[[email protected] ~]# ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem
PID PPID CMD %MEM %CPU
689 1 /usr/bin/python2 -Es /usr/s 0.5 0.0
1044 1 /usr/bin/python2 -Es /usr/s 0.3 0.0
661 1 /usr/lib/polkit-1/polkitd - 0.2 0.0
690 1 /usr/sbin/NetworkManager -- 0.1 0.0
1047 1 /usr/sbin/rsyslogd -n 0.1 0.0
482 1 /usr/lib/systemd/systemd-jo 0.1 0.0
1 0 /usr/lib/systemd/systemd -- 0.1 0.0
1497 1045 sshd: [email protected]/0 0.0 0.0
502 1 /usr/sbin/lvmetad -f 0.0 0.0
819 690 /sbin/dhclient -d -q -sf /u 0.0 0.0
512 1 /usr/lib/systemd/systemd-ud 0.0 0.0
1045 1 /usr/sbin/sshd -D 0.0 0.0
663 1 /usr/bin/dbus-daemon --syst 0.0 0.0
1501 1497 -bash 0.0 0.0
670 1 /usr/sbin/chronyd 0.0 0.0
658 1 /usr/lib/systemd/systemd-lo 0.0 0.0
8255 1501 ps -eo pid,ppid,cmd,%mem,%c 0.0 0.0
681 1 /usr/sbin/crond -n 0.0 0.0
635 1 /sbin/auditd 0.0 0.0
687 1 /sbin/agetty --noclear tty1 0.0 0.0
2 0 [kthreadd] 0.0 0.0


[[email protected] ~]# ps -eo rss,pmem,pcpu,vsize,args | sort -k 1 -r -n
29524 0.5 0.0 358764 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
17432 0.3 0.0 574304 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
12956 0.2 0.0 613020 /usr/lib/polkit-1/polkitd --no-debug
10996 0.1 0.0 550336 /usr/sbin/NetworkManager --no-daemon
7332 0.1 0.0 218552 /usr/sbin/rsyslogd -n
6724 0.1 0.0 39080 /usr/lib/systemd/systemd-journald
6640 0.1 0.0 128024 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
5680 0.0 0.0 156796 sshd: [email protected]/0
5560 0.0 0.0 201104 /usr/sbin/lvmetad -f
5508 0.0 0.0 102904 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-acb65408-a188-4419-9d1c-5c58f456bc41-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
5220 0.0 0.0 48284 /usr/lib/systemd/systemd-udevd
4316 0.0 0.0 112924 /usr/sbin/sshd -D
3996 0.0 0.0 227552 /usr/libexec/nm-dispatcher
2588 0.0 0.0 66476 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
2152 0.0 0.0 115680 -bash
1844 0.0 0.0 117808 /usr/sbin/chronyd
1756 0.0 0.0 26384 /usr/lib/systemd/systemd-logind
1668 0.0 0.0 126384 /usr/sbin/crond -n
1488 0.0 0.0 153348 ps -eo rss,pmem,pcpu,vsize,args
912 0.0 0.0 116600 sort -k 1 -r -n
856 0.0 0.0 55532 /sbin/auditd
848 0.0 0.0 110204 /sbin/agetty --noclear tty1 linux

The dmesg output exceeded the length limit, so I trimmed some of the log entries from the 0-2 second range.
[[email protected] ~]# dmesg
[ 1.941334] sda: sda1 sda2
[ 1.945116] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.422141] ata1.01: NODEV after polling detection
[ 2.505749] ata1.00: host indicates ignore ATA devices, ignored
[ 2.506095] ata2.01: NODEV after polling detection
[ 2.592571] ata2.00: ATAPI: Virtual CD, , max MWDMA2
[ 2.595149] ata2.00: configured for MWDMA2
[ 2.597455] scsi 3:0:0:0: CD-ROM Msft Virtual CD/ROM 1.0 PQ: 0 ANSI: 5
[ 2.633490] sr 3:0:0:0: [sr0] scsi3-mmc drive: 0x/0x tray
[ 2.633494] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 2.633822] sr 3:0:0:0: Attached scsi CD-ROM sr0
[ 3.277974] SGI XFS with ACLs, security attributes, no debug enabled
[ 3.297495] XFS (dm-0): Mounting V5 Filesystem
[ 3.618658] XFS (dm-0): Ending clean mount
[ 3.954048] random: crng init done
[ 4.019475] systemd-journald[92]: Received SIGTERM from PID 1 (systemd).
[ 4.577895] type=1404 audit(1591358205.434:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
[ 4.634156] SELinux: 2048 avtab hash slots, 112685 rules.
[ 4.699424] SELinux: 2048 avtab hash slots, 112685 rules.
[ 4.730215] SELinux: 8 users, 14 roles, 5046 types, 316 bools, 1 sens, 1024 cats
[ 4.730219] SELinux: 130 classes, 112685 rules
[ 4.734688] SELinux: Completing initialization.
[ 4.734690] SELinux: Setting up existing superblocks.
[ 4.738802] type=1403 audit(1591358205.595:3): policy loaded auid=4294967295 ses=4294967295
[ 4.754962] systemd[1]: Successfully loaded SELinux policy in 173.104ms.
[ 4.965923] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 4.965951] systemd[1]: Inserted module 'ip_tables'
[ 4.994376] systemd[1]: Relabelled /dev, /run and /sys/fs/cgroup in 24.600ms.
[ 5.189157] psmouse serio1: trackpoint: IBM TrackPoint firmware: 0x01, buttons: 3/3
[ 5.190570] input: TPPS/2 IBM TrackPoint as /devices/platform/i8042/serio1/input/input3
[ 5.191489] input: AT Translated Set 2 keyboard as /devices/LNXSYSTM:00/device:00/PNP0A03:00/device:08/VMBUS:01/d34b2567-b9b6-42b9-8778-0a4ec0b955bf/serio2/input/input4
[ 6.309916] systemd-journald[482]: Received request to flush runtime journal from PID 1
[ 7.060905] piix4_smbus 0000:00:07.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
[ 7.755829] hv_vmbus: registering driver hv_balloon
[ 7.760236] hv_balloon: Using Dynamic Memory protocol version 2.0
[ 7.782817] pps_core: LinuxPPS API ver. 1 registered
[ 7.782819] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <[email protected]>
[ 7.806385] PTP clock support registered
[ 7.828141] hv_utils: Registering HyperV Utility Driver
[ 7.828145] hv_vmbus: registering driver hv_util
[ 7.828808] hv_utils: Heartbeat IC version 3.0
[ 7.830498] hv_utils: Shutdown IC version 3.0
[ 7.831212] hv_utils: TimeSync IC version 4.0
[ 7.832551] hv_utils: VSS IC version 5.0
[ 7.852739] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 7.853294] sr 3:0:0:0: Attached scsi generic sg1 type 5
[ 7.862171] input: PC Speaker as /devices/platform/pcspkr/input/input5
[ 7.939175] cryptd: max_cpu_qlen set to 1000
[ 8.153763] AVX version of gcm_enc/dec engaged.
[ 8.153766] AES CTR mode by8 optimization enabled
[ 8.165736] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[ 8.165774] alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni)
[ 8.301663] EDAC sbridge: Seeking for: PCI ID 8086:0ea0
[ 8.301668] EDAC sbridge: Ver: 1.1.2
[ 8.675971] Adding 8257532k swap on /dev/mapper/centos-swap. Priority:-2 extents:1 across:8257532k FS
[ 8.722279] XFS (sda1): Mounting V5 Filesystem
[ 8.846217] XFS (sda1): Ending clean mount
[ 10.421871] XFS (dm-2): Mounting V5 Filesystem
[ 10.518566] XFS (dm-2): Ending clean mount
[ 10.866185] type=1305 audit(1591358210.636:4): audit_pid=635 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[ 13.479185] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 13.626047] Ebtables v2.0 registered
[ 13.774889] Netfilter messages via NETLINK v0.30.
[ 13.817275] ip_set: protocol 7
[ 13.992945] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 13.994648] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 13.996028] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 14.017224] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 14.017908] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 14.072969] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 14.338697] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[ 14.543551] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 14.970829] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 14.998091] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 54.801674] hv_balloon: Max. dynamic memory size: 6000 MB

Please format your post with markdown, and put large logs on pastebin or a similar service. Laid out like this, it is completely unreadable.

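Updated. The first time I posted, I saw right away that the formatting was broken. After that I could not open V2EX at all; the site seems to be under intermittent attack.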

Not enough information. Please post the complete log on pastebin or a similar service.

The complete dmesg output is at https://pastebin.com/3Zn2LtGN

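This is Hyper-V memory ballooning; it is the last line of the dmesg output you pasted.
In short, a paravirtualized kernel can hand memory it is not using back to the host.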
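
To make the pointer to the dmesg output concrete, here is a minimal sketch that pulls the hv_balloon driver messages out of a dmesg dump. The function names and the `SAMPLE_DMESG` constant are hypothetical; the sample lines are copied from the log in the original post, and the regular expressions assume the message format shown there.

```python
import re

# Matches driver messages of the form "hv_balloon: <text>".
BALLOON_RE = re.compile(r"hv_balloon: (.*)")
# Matches the line that reports the configured maximum, as seen in the post.
MAX_RE = re.compile(r"hv_balloon: Max\. dynamic memory size: (\d+) MB")

# Lines copied from the dmesg output in the original post.
SAMPLE_DMESG = """\
[ 7.755829] hv_vmbus: registering driver hv_balloon
[ 7.760236] hv_balloon: Using Dynamic Memory protocol version 2.0
[ 7.782817] pps_core: LinuxPPS API ver. 1 registered
[ 54.801674] hv_balloon: Max. dynamic memory size: 6000 MB
"""


def balloon_messages(dmesg_text):
    """Return the text of every hv_balloon driver message, in order."""
    messages = []
    for line in dmesg_text.splitlines():
        match = BALLOON_RE.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


def max_dynamic_memory_mb(dmesg_text):
    """Return the reported maximum dynamic memory in MB, or None if absent."""
    match = MAX_RE.search(dmesg_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for message in balloon_messages(SAMPLE_DMESG):
        print(message)
    print("max dynamic memory (MB):", max_dynamic_memory_mb(SAMPLE_DMESG))
```

If such messages are present, the balloon driver is loaded and Dynamic Memory is active for the guest, which is a reasonable first thing to rule in or out before hunting for a rogue process.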

My guess is that memory ballooning is to blame.

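Update the kernel.
Turn off Dynamic Memory.
Try those first to narrow down the problem.
Also, Microsoft recommends that a Linux guest's memory be a multiple of 128 MB.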
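
As a quick check of the 128 MB guideline mentioned in the reply above, the following sketch tests the two sizes from the original post (2000 MB and 6000 MB) and rounds each up to the next multiple. The helper names are hypothetical, and the 128 MB figure is taken from the reply, not verified independently here.

```python
ALIGNMENT_MB = 128  # multiple cited in the reply above


def is_aligned(size_mb, alignment_mb=ALIGNMENT_MB):
    """True if size_mb is an exact multiple of alignment_mb."""
    return size_mb % alignment_mb == 0


def round_up(size_mb, alignment_mb=ALIGNMENT_MB):
    """Smallest multiple of alignment_mb that is >= size_mb."""
    return -(-size_mb // alignment_mb) * alignment_mb


if __name__ == "__main__":
    # Values from the VM settings in the original post.
    for label, size in (("minimum RAM", 2000), ("maximum RAM", 6000)):
        print(label, size, "aligned:", is_aligned(size),
              "next multiple:", round_up(size))
```

Neither 2000 nor 6000 is a multiple of 128; the nearest larger values are 2048 MB and 6016 MB.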


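The problem is solved. It was Hyper-V Dynamic Memory. Once I turned Dynamic Memory off for the VM, CentOS memory usage went back to normal.
At first my troubleshooting went in the wrong direction: I kept assuming some malicious script was running, but I could never find a process.
My guess at the root cause: Hyper-V and CentOS 7 do not work well together here. I set Dynamic Memory to a minimum of 2 GB and a maximum of 6 GB, so roughly 4 GB in between was probably never really allocated and was taken back by the host. CentOS, however, reports that 4 GB plus the memory actually in use as used, which comes out to about 80% memory usage.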
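
One rough way to check this explanation is to compare the `used` figure from free -m with the summed RSS of every process from `ps -eo rss,...`. The sketch below does that; the function names and `SAMPLE_PS` are hypothetical, and the sample rows are the first three lines of the ps output in the original post. Treat the result as an indicator only: RSS counts shared pages once per process, and ordinary kernel memory (slab, page tables) is also outside any process, so the gap is not an exact balloon size.

```python
# First three rows of the `ps -eo rss,pmem,pcpu,vsize,args` output in the post.
SAMPLE_PS = """\
29524 0.5 0.0 358764 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
17432 0.3 0.0 574304 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
12956 0.2 0.0 613020 /usr/lib/polkit-1/polkitd --no-debug
"""


def sum_rss_kib(ps_output):
    """Sum the first column (RSS in KiB), skipping headers and blank lines."""
    total = 0
    for line in ps_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            total += int(fields[0])
    return total


def unattributed_mib(used_mib, ps_output):
    """Memory reported as used (MiB) that no process RSS accounts for."""
    return used_mib - sum_rss_kib(ps_output) / 1024


if __name__ == "__main__":
    used_mib = 4151  # "used" column of free -m in the original post
    print("process RSS total (KiB):", sum_rss_kib(SAMPLE_PS))
    print("unattributed (MiB):", round(unattributed_mib(used_mib, SAMPLE_PS), 1))
```

When the unattributed figure is in the gigabytes while every process is tiny, as in the post, memory held by a kernel driver such as the balloon is a far more likely explanation than a hidden process.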

The title alone tells you it is ballooning.
Leaving it on usually does no harm, since the host gives the memory back when programs in the VM actually need it.

Is the CentOS kernel too old, with poor support for Hyper-V dynamic memory adjustment?
I suggest CentOS 8, Ubuntu 18.04, Debian 9, or Debian 10.
Sharing Debian 9.12, BitTorrent download:
https://cdimage.debian.org/cdimage/archive/9.12.0/amd64/bt-dvd/
