simoncion wrote,

Dear Internets: An LVM question

EDIT: The syslog record of the incident and other information are now at the bottom of the post.


I think that some of the LVM tools are flaky.

Case in point:
1)
My machine has 128MB of RAM, 1GB of swap, and a Slot 1 Pentium 2 333 (running at 300). I'm running a 2.6.16-ck12-r2 kernel, on Gentoo Linux.
I'm ssh'd into the box, doing a pvmove from a PV that's no larger than 270GB to a PV that's ~290GB.
I let this run for quite a while (more than an hour?). The pvmove stops progressing, my system load spikes (while my CPU utilization is nil), and after another thirty minutes, I check dmesg. The kernel has oops'd with a:

"Unable to handle kernel paging request at virtual address ..."

The gkrellm daemon shows that I'm using ~70% of my RAM and very little swap.
EDIT: Top told me that pvresize was using ~55% of available RAM.

The pvmove process is not responding to SIGTERM or SIGINT or SIGKILL.
I can make new ssh connections, run top, and people can initiate Samba transfers from the machine and all that, but when I try to kick off something like emerge, the process hangs and stops responding to signals. (Sending SIGKILL does *absolutely nothing*.)
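
A load spike with an idle CPU, plus processes that ignore even SIGKILL, is consistent with tasks stuck in uninterruptible sleep (state D): those count toward the load average, and signals are not acted on until the blocking kernel call returns. Here is a rough Python sketch for listing such processes from /proc. The function names are made up for illustration, and it assumes the usual `pid (comm) state ...` layout of /proc/<pid>/stat.

```python
import os

def parse_stat_state(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state

def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_state(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited mid-scan or the line was malformed.
            continue
        if state == "D":
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

If pvmove and the hung emerge both show up in that list, the hang is on the kernel side rather than in the userspace tools.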

I'm in Knoppix right now, and am babysitting the pvmove so that this doesn't happen again (I hope).

2)
A previous experience with lvresize, same machine, 32MB of RAM, and 1GB of swap:
I would attempt to resize an LV. lvresize would silently fail with a segfault (IIRC). I didn't look at the dmesg output, but I'll set up a scenario in VMware when I get settled into my new home in January.

Dear god, I'm sorry for the ramble. I'm a bit wired on caffeine at the moment. I'm babysitting the pvmove, terminating it when I'm about to run out of RAM, and restarting... : / UGH. This sucks, a lot. Why can't this program Just Work(TM)?
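
The babysitting routine could in principle be scripted. Below is a minimal Python sketch of just the "am I about to run out of RAM?" check, working on the text of /proc/meminfo. The function names and the 16MB floor are arbitrary assumptions for illustration, not anything taken from the LVM tools. Kernels of this vintage do not report MemAvailable, so the sketch approximates it as MemFree + Buffers + Cached.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def reclaimable_kb(meminfo):
    # No MemAvailable on old kernels: approximate it as free memory
    # plus buffers and page cache, which the kernel can usually reclaim.
    return sum(meminfo.get(key, 0) for key in ("MemFree", "Buffers", "Cached"))

def should_stop(meminfo, floor_kb=16 * 1024):
    """True when reclaimable memory has dropped below the chosen floor."""
    return reclaimable_kb(meminfo) < floor_kb
```

In a real loop you would feed it `open("/proc/meminfo").read()` every few seconds and interrupt pvmove once `should_stop` returns true. According to the pvmove man page, running pvmove again with no arguments restarts interrupted moves from the last checkpoint, but whether that holds for the lvm2 version installed here is an assumption.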

Anyhow,
My question:
Have any of you had similar experiences with the LVM tools? If so, did you resolve the issue, and how? And if you didn't resolve it, what do you think the root cause is?

If anyone wants more information... kernel config, uname output, anything really, leave a comment, and I'll try my best to provide. I'll be away from Dec 23 to Jan 3rd or so, though.

(Syslog and system info follow:)


Dec 19 23:08:43 server e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
Dec 19 23:09:11 server smartd[9210]: Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 114 to 109
Dec 19 23:09:11 server smartd[9210]: Device: /dev/hdc, SMART Usage Attribute: 194 Temperature_Celsius changed from 116 to 108
Dec 19 23:09:11 server smartd[9210]: Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 125 to 110
Dec 19 23:10:01 server cron[11297]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Dec 19 23:10:01 server cron[11298]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:15:01 server cron[11634]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:17:21 server sshd[11843]: Accepted publickey for root from XXXX port 39970 ssh2
Dec 19 23:17:21 server sshd(pam_unix)[11845]: session opened for user root by root(uid=0)
Dec 19 23:20:01 server cron[11974]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Dec 19 23:20:01 server cron[11975]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:25:01 server cron[12311]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:27:38 server sshd[12525]: Accepted publickey for root from XXXX port 49059 ssh2
Dec 19 23:27:38 server sshd(pam_unix)[12527]: session opened for user root by root(uid=0)
Dec 19 23:30:01 server cron[12645]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Dec 19 23:30:01 server cron[12647]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:35:01 server cron[12986]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:37:19 server Unable to handle kernel paging request at virtual address c9010138
Dec 19 23:37:19 server printing eip:
Dec 19 23:37:19 server c8ffb97e
Dec 19 23:37:19 server *pde = 020c2067
Dec 19 23:37:19 server *pte = 00000000
Dec 19 23:37:19 server Oops: 0000 [#1]
Dec 19 23:37:19 server Modules linked in: dm_mirror xt_tcpudp xt_tcpmss xt_string xt_state xt_sctp xt_realm xt_pkttype xt_physdev xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp xt_conntrack xt_connbytes xt_comment xt_NFQUEUE xt_MARK xt_CLASSIFY iptable_raw iptable_nat iptable_mangle iptable_filter ipt_ttl ipt_tos ipt_recent ipt_owner ipt_multiport ipt_layer7 ipt_iprange ipt_hashlimit ipt_esp ipt_ecn ipt_dscp ipt_ah ipt_addrtype ipt_ULOG ipt_TTL ipt_TOS ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_DSCP ipt_CLUSTERIP arptable_filter arpt_mangle arp_tables ip_nat_tftp ip_nat_snmp_basic ip_nat_irc ip_nat_ftp ip_nat ip_tables x_tables sch_teql sch_tbf sch_sfq sch_red sch_prio sch_netem sch_ingress sch_htb sch_hfsc sch_gred sch_dsmark sch_cbq cls_u32 cls_tcindex cls_route cls_fw cifs ipr 8139too e100 dm_snapshot dm_crypt dm_mod
Dec 19 23:37:19 server CPU:    0
Dec 19 23:37:19 server EIP:    0060:[<c8ffb97e>]    Not tainted VLI
Dec 19 23:37:19 server EFLAGS: 00010246   (2.6.16-ck12-r2 #6) 
Dec 19 23:37:19 server EIP is at core_in_sync+0xe/0x20 [dm_mirror]
Dec 19 23:37:19 server eax: c900f000   ebx: c380a480   ecx: c8fff5a0   edx: 000089da
Dec 19 23:37:19 server esi: c6518820   edi: 00000000   ebp: c56f7934   esp: c56f7890
Dec 19 23:37:19 server ds: 007b   es: 007b   ss: 0068
Dec 19 23:37:19 server Process smbd (pid: 12952, threadinfo=c56f6000 task=c0a78580)
Dec 19 23:37:19 server Stack: <0>c8ffd443 c7dfaae0 000089da 00000000 00001000 c6518820 c1e7ead8 c88a741d 
Dec 19 23:37:19 server c89ee070 c6518820 c1e7eae0 c56f7934 c77f7320 c1e7eae8 c56f7934 c88a7866 
Dec 19 23:37:19 server c89ee070 c6518820 c1e7ead8 00000001 00000008 00011210 c013c1c3 c13f66e0 
Dec 19 23:37:19 server Call Trace:
Dec 19 23:37:19 server [] mirror_map+0x53/0xd0 [dm_mirror]
Dec 19 23:37:19 server [] __map_bio+0x4d/0xd0 [dm_mod]
Dec 19 23:37:19 server [] __clone_and_map+0x2b6/0x2d0 [dm_mod]
Dec 19 23:37:19 server [] mempool_alloc+0x33/0xc0
Dec 19 23:37:19 server [] __split_bio+0xb8/0xf0 [dm_mod]
Dec 19 23:37:19 server [] dm_request+0x9a/0xd0 [dm_mod]
Dec 19 23:37:19 server [] generic_make_request+0xa9/0x120
Dec 19 23:37:19 server [] bio_clone+0x44/0x60
Dec 19 23:37:19 server [] __map_bio+0x4d/0xd0 [dm_mod]
Dec 19 23:37:19 server [] __clone_and_map+0x2b6/0x2d0 [dm_mod]
Dec 19 23:37:19 server [] mempool_alloc+0x33/0xc0
Dec 19 23:37:19 server [] __delay+0x12/0x20
Dec 19 23:37:19 server [] __split_bio+0xb8/0xf0 [dm_mod]
Dec 19 23:37:19 server [] radix_tree_insert+0xe1/0x110
Dec 19 23:37:19 server [] dm_request+0x9a/0xd0 [dm_mod]
Dec 19 23:37:19 server [] generic_make_request+0xa9/0x120
Dec 19 23:37:19 server [] mempool_alloc+0x33/0xc0
Dec 19 23:37:19 server [] find_get_page+0x21/0x40
Dec 19 23:37:19 server [] submit_bio+0x62/0x100
Dec 19 23:37:19 server [] bio_alloc_bioset+0x90/0x180
Dec 19 23:37:19 server [] end_buffer_read_sync+0x0/0x30
Dec 19 23:37:19 server [] bio_alloc+0x20/0x30
Dec 19 23:37:19 server [] submit_bh+0xd5/0x130
Dec 19 23:37:19 server [] ll_rw_block+0x76/0xc0
Dec 19 23:37:19 server [] search_by_key+0xc2/0xcb0
Dec 19 23:37:19 server [] qdisc_restart+0x23/0x180
Dec 19 23:37:19 server [] pathrelse+0x23/0x40
Dec 19 23:37:19 server [] init_inode+0x1b3/0x410
Dec 19 23:37:19 server [] ip_output+0x136/0x2a0
Dec 19 23:37:19 server [] reiserfs_read_locked_inode+0x71/0x110
Dec 19 23:37:19 server [] reiserfs_find_actor+0x0/0x30
Dec 19 23:37:19 server [] reiserfs_init_locked_inode+0x0/0x20
Dec 19 23:37:19 server [] reiserfs_iget+0xac/0xc0
Dec 19 23:37:19 server [] reiserfs_find_actor+0x0/0x30
Dec 19 23:37:19 server [] reiserfs_init_locked_inode+0x0/0x20
Dec 19 23:37:19 server [] reiserfs_lookup+0x123/0x170
Dec 19 23:37:19 server [] copy_to_user+0x42/0x60
Dec 19 23:37:19 server [] d_lookup+0x2e/0x60
Dec 19 23:37:19 server [] real_lookup+0xc5/0xf0
Dec 19 23:37:19 server [] do_lookup+0x9d/0xb0
Dec 19 23:37:19 server [] __link_path_walk+0x5f5/0xb10
Dec 19 23:37:19 server [] reiserfs_readdir+0x25d/0x4e0
Dec 19 23:37:19 server [] mntput_no_expire+0x23/0x80
Dec 19 23:37:19 server [] link_path_walk+0x47/0xd0
Dec 19 23:37:19 server [] buffered_rmqueue+0xf4/0x200
Dec 19 23:37:19 server [] do_path_lookup+0xfc/0x260
Dec 19 23:37:19 server [] __user_walk_fd+0x3c/0x60
Dec 19 23:37:19 server [] vfs_stat_fd+0x24/0x60
Dec 19 23:37:19 server [] vfs_stat+0x1f/0x30
Dec 19 23:37:19 server [] sys_stat64+0x1b/0x40
Dec 19 23:37:19 server [] syscall_call+0x7/0xb
Dec 19 23:37:19 server Code: 04 8b 54 24 08 8b 40 04 8b 40 18 0f a3 10 19 d2 31 c0 85 d2 0f 95 c0 c3 90 8d 74 26 00 8b 44 24 04 8b 54 24 08 8b 40 04 8b 40 1c <0f> a3 10 19 d2 31 c0 85 d2 0f 95 c0 c3 90 8d 74 26 00 31 c0 c3 
Dec 19 23:38:41 server smartd[9210]: Device: /dev/hda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200
Dec 19 23:38:41 server smartd[9210]: Device: /dev/hdc, SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 105
Dec 19 23:38:41 server smartd[9210]: Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 110 to 107
Dec 19 23:40:01 server cron[13318]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Dec 19 23:40:01 server cron[13319]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:45:02 server cron[13655]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:50:01 server cron[13983]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Dec 19 23:50:01 server cron[13984]: (root) CMD (nice -n 1 /usr/sbin/aggregator XXXX)
Dec 19 23:51:09 server sshd[14057]: Accepted publickey for XXXX from XXXX port 54172 ssh2
Dec 19 23:51:09 server sshd(pam_unix)[14059]: session opened for user XXXX by (uid=0)
Dec 19 23:57:55 server sshd[14567]: Accepted publickey for root from XXXX port 40989 ssh2
Dec 19 23:57:55 server sshd(pam_unix)[14569]: session opened for user root by root(uid=0)
Dec 19 23:58:13 server sshd(pam_unix)[14059]: session closed for user XXXX
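
The oops above faults in core_in_sync [dm_mirror], reached through mirror_map and dm_request while smbd was doing a stat on a reiserfs filesystem. To get a quick overview of which modules a trace like this passes through, something along these lines can help. It is only a sketch: the regular expression assumes the `symbol+0xoff/0xsize [module]` form seen in this log, and "vmlinux" is just a label picked here for frames with no module tag.

```python
import re
from collections import Counter

# Matches e.g. "mirror_map+0x53/0xd0 [dm_mirror]"; the module part is optional.
FRAME = re.compile(r"(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")

def frames(log_text):
    """Yield (symbol, module) for each symbol+offset/size entry found."""
    for line in log_text.splitlines():
        match = FRAME.search(line)
        if match:
            # Frames without a [module] tag belong to the core kernel image.
            yield match.group(1), match.group(2) or "vmlinux"

def frames_per_module(log_text):
    """Count how many trace entries each module contributes."""
    return Counter(module for _, module in frames(log_text))
```

Running `frames_per_module` over the pasted syslog gives a rough picture of how much of the trace sits in dm_mod and dm_mirror versus the filesystem and core kernel.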


/usr/sbin/aggregator is a Python script that does some network scanning and sets up a DFS tree for Samba to use.
This machine hosts several Samba shares in addition to the DFS tree.

*********
Some more system information: (If you need anything else, I'll try to provide, but you might have to wait till early January)

emerge -pv lvm2
These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild U ] sys-fs/device-mapper-1.02.10-r1 [1.01.03] USE="(-selinux)" 904 kB
[ebuild U ] sys-fs/lvm2-2.02.06 [2.01.09] USE="readline -clvm% -cman% -gulm% -nolvm1% -nolvmstatic -nomirrors% -nosnapshots% (-selinux)" 472 kB

uname -a
Linux server 2.6.16-ck12-r2 #6 Wed Nov 29 14:03:54 CST 2006 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux

cat /etc/make.conf
CFLAGS="-O2 -mcpu=pentium2 -march=pentium2 -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"

USE="-X -gtk -alsa -oss -arts -gnome -kde apache2 -jpeg -xml2 -cups"
FEATURES="usersandbox userpriv"
PORTDIR_OVERLAY="/usr/local/portage"
PORTAGE_NICENESS="5"

gcc --version
gcc (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)

lspci
00:00.0 Host bridge: Intel Corporation 440LX/EX - 82443LX/EX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440LX/EX - 82443LX/EX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
00:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 02)
00:0b.0 VGA compatible controller: S3 Inc. ViRGE/DX or /GX (rev 01)
