BTRFS: open_ctree failed after ubuntu image deploy

Oleg

Hello guys,
while uploading an Ubuntu 16.04 image with this partition layout (/etc/fstab):

# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda3 during installation
UUID=dfa3bf1a-9ca1-4fc4-863f-72815db61539 /               btrfs   defaults,discard,relatime,subvol=@ 0       1
# /boot was on /dev/sda1 during installation
UUID=0a46a7fe-23c2-4e4c-a76a-7d2410311a25 /boot           ext4    defaults        0       2
# /home was on /dev/sda5 during installation
UUID=dfa3bf1a-9ca1-4fc4-863f-72815db61539 /home           btrfs   defaults,discard,relatime,subvol=@home 0       2
# /opt was on /dev/sda6 during installation
UUID=719313d4-5d42-44e7-8cda-87c492b92ae6 /opt            btrfs   defaults,discard        0       2
# swap was on /dev/sda2 during installation
UUID=c5c9a2e6-885a-4ed1-aeb6-909043dae122 none            swap    sw              0       0

I get the above after the first partition has been uploaded.

FOG still finishes the job, though. But if I deploy the image to another computer, Ubuntu boots into a BusyBox shell.

I was not able to find any solution, so I hope someone can help. Thanks!

Oleg

I did a fresh installation of Ubuntu 16.04 with the default partition layout (ext4 + swap), and it seems to be a hardware/kernel problem.
The computer is a Fujitsu Q556 with a Skylake chipset and an SSD. I will try installing other kernels and hope the problem disappears.
Any other solutions?

Sebastian Roth

@Oleg That’s an interesting one. The message “random: nonblocking pool is initialized” is printed by the kernel on boot (if kernel messages are turned on, which we don’t do by default). It’s very strange that you see this message while imaging. My guess is that this message is not causing any trouble. But it seems we don’t properly set the filesystem UUIDs that are used in your fstab. We should definitely fix this.

From your description (you end up in a BusyBox shell), it should be the root partition’s filesystem UUID (on /dev/sda3) that we screw up. Can you please post the contents of /images/<imagename>/d1.original.uuids?
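To make the UUID point concrete: every UUID= entry in the fstab has to match a filesystem on the deployed disk, otherwise the initramfs cannot find it. A minimal sketch, assuming the fstab text from the first post, that maps each UUID to the mount points depending on it. It also shows that / and /home share one UUID (subvolumes @ and @home of the same btrfs filesystem), so one wrong UUID would break both mounts. The function name uuid_mounts is hypothetical, not part of FOG.

```python
# Minimal sketch (hypothetical helper, not FOG code): map each UUID= source
# in an fstab to the mount points that use it.
from collections import defaultdict

FSTAB = """\
UUID=dfa3bf1a-9ca1-4fc4-863f-72815db61539 /      btrfs defaults,discard,relatime,subvol=@ 0 1
UUID=0a46a7fe-23c2-4e4c-a76a-7d2410311a25 /boot  ext4  defaults 0 2
UUID=dfa3bf1a-9ca1-4fc4-863f-72815db61539 /home  btrfs defaults,discard,relatime,subvol=@home 0 2
UUID=719313d4-5d42-44e7-8cda-87c492b92ae6 /opt   btrfs defaults,discard 0 2
UUID=c5c9a2e6-885a-4ed1-aeb6-909043dae122 none   swap  sw 0 0
"""


def uuid_mounts(fstab_text):
    """Return {uuid: [mount points]} for every UUID=... entry."""
    mounts = defaultdict(list)
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        source, mount_point = fields[0], fields[1]
        if source.startswith("UUID="):
            mounts[source[len("UUID="):]].append(mount_point)
    return dict(mounts)


if __name__ == "__main__":
    for uuid, points in uuid_mounts(FSTAB).items():
        print(uuid, ", ".join(points))
```

Comparing this list with the output of blkid on the deployed machine would show which UUID no longer matches.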

Oleg

@Sebastian-Roth
I don’t have this file in my image folder.
There is a “d1.partitions” file:

label: dos
label-id: 0x173d9cfe
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=     5857280, type=83, bootable
/dev/sda2 : start=     5859328, size=    15624192, type=82
/dev/sda3 : start=    21483520, size=    19531776, type=83
/dev/sda4 : start=    41017342, size=   209051650, type=5
/dev/sda5 : start=    41017344, size=    19529728, type=83
/dev/sda6 : start=    60549120, size=   189519872, type=83

and the “d1.original.swapuuids”:

/dev/sda2 c5c9a2e6-885a-4ed1-aeb6-909043dae122

The “d1.has_grub” file is empty.

Sebastian Roth

@Tom-Elliott Seems like we only run saveUUIDInformation for the resizable image type. Do you know why?

@Oleg Until we get this sorted in the code, can you try this: create a new image definition, set the image type to resizable, upload again, and see if it works after deploying.

Oleg

@Sebastian-Roth
I changed the image to resizable and did an upload. After deploying, I am stuck at BusyBox again.
Here are the image files:
d1.fixed_size_partitions:

2:3:4:5:6

d1.minimum.partitions:

label: dos
label-id: 0x173d9cfe
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=      347702, type=83, bootable
/dev/sda2 : start=     5859328, size=    15624192, type=82
/dev/sda3 : start=    21483520, size=    19531776, type=83
/dev/sda4 : start=    41017342, size=   209051650, type=5
/dev/sda5 : start=    41017344, size=    19529728, type=83
/dev/sda6 : start=    60549120, size=   189519872, type=83

d1.partitions:

label: dos
label-id: 0x173d9cfe
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=     5857280, type=83, bootable
/dev/sda2 : start=     5859328, size=    15624192, type=82
/dev/sda3 : start=    21483520, size=    19531776, type=83
/dev/sda4 : start=    41017342, size=   209051650, type=5
/dev/sda5 : start=    41017344, size=    19529728, type=83
/dev/sda6 : start=    60549120, size=   189519872, type=83

d1.original.fstypes:

/dev/sda1 extfs

d1.original.swapuuids is the same as before.
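Comparing the two sfdisk dumps above, only /dev/sda1 (the ext4 /boot partition listed in d1.original.fstypes) differs between d1.minimum.partitions and d1.partitions, while d1.fixed_size_partitions lists 2:3:4:5:6. That suggests only the ext4 partition was shrunk and the btrfs partitions were kept at full size; this is an inference from the file contents, not something confirmed in the thread. A minimal sketch that performs this comparison (parse_dump and changed_partitions are hypothetical helpers):

```python
# Minimal sketch: compare two sfdisk-style dumps and report partitions whose
# size (in sectors) differs between them.
import re

# Matches lines like "/dev/sda1 : start=        2048, size=     5857280, type=83"
LINE_RE = re.compile(r"^(/dev/\S+)\s*:\s*start=\s*(\d+),\s*size=\s*(\d+)")


def parse_dump(text):
    """Return {device: (start_sector, size_in_sectors)}."""
    parts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            parts[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return parts


def changed_partitions(full_text, minimum_text):
    """Return [(device, full_size, minimum_size)] for partitions that differ."""
    full, minimum = parse_dump(full_text), parse_dump(minimum_text)
    return [
        (dev, full[dev][1], minimum[dev][1])
        for dev in sorted(full)
        if dev in minimum and full[dev][1] != minimum[dev][1]
    ]


FULL = """\
/dev/sda1 : start=        2048, size=     5857280, type=83, bootable
/dev/sda2 : start=     5859328, size=    15624192, type=82
/dev/sda3 : start=    21483520, size=    19531776, type=83
"""
MINIMUM = """\
/dev/sda1 : start=        2048, size=      347702, type=83, bootable
/dev/sda2 : start=     5859328, size=    15624192, type=82
/dev/sda3 : start=    21483520, size=    19531776, type=83
"""

if __name__ == "__main__":
    for dev, full_size, min_size in changed_partitions(FULL, MINIMUM):
        print(f"{dev}: {full_size} -> {min_size} sectors")
```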

And this is what my BusyBox screen looks like:

Sebastian Roth

@Oleg said:

and this is how i my busybox look like…

Good that you posted this picture because, oh well, I guess I was on the wrong track with this. Do you get the same error on non-resizable image type or is it a different error before getting to the busybox shell?

Sebastian Roth

@Oleg To me this looks like it could be an issue in the partclone.btrfs code. I hope it’s not, but you never know.

Here it says:

Luckily all decent systems that support btrfs (like Ubuntu 14.04) will have btrfs tools included in the initramfs environment, so you can run btrfs commands from there and try to recover from the situation without the need to boot the system from an alternative media, like a live CD.

So can you please boot the system from a live CD after deploying and try the following commands:

btrfs-show-super -a /dev/sda3
btrfs check /dev/sda3
btrfs-find-root /dev/sda3
mkdir -p /mnt/sda3 && mount -t btrfs -o ro,recovery /dev/sda3 /mnt/sda3
btrfs restore -F -i -D -v /dev/sda3 /dev/null

Oleg

@Sebastian-Roth said:

… Do you get the same error on non-resizable image type or is it a different error before getting to the busybox shell?

This is what I get:

The commands don’t work for me; I will try to fix that. But is it a btrfs problem?
My system is completely new, with only a couple of files added.

Oleg

I’ve just tried to create an image with the latest Clonezilla, but the result is the same.

Tom Elliott

Does it upload all the partitions or just /dev/sda1?

Tom Elliott

Also, just for clarification.

The first upload broke this system, from what I understand. Are you uploading the broken system or are you ensuring the system is operational between uploads?

Oleg

@Tom-Elliott
In FOG, yes: all partitions have been uploaded.

The system runs fine before uploading.
I mean: if I set up a clean Ubuntu 16.04 Server with btrfs and the partition layout I mentioned, I get the same error. I tested this with another Fujitsu computer that is a couple of years old.

Tom Elliott

Then the last question I have, I suppose.

Have you updated to the latest FOG version and retried uploading?

Oleg

@Tom-Elliott
Sorry, I should have mentioned that at the start: the last try was with trunk 8046.

Sebastian Roth

@Oleg said:

But is it btrfs-problem? … I’ve just tried to create an Image with latest clonezilla but it’s the same.

As you can see from your tests, it seems to be a partclone/Clonezilla issue. This confirms it as well. Still, I really wonder why I can’t find anything about this on the web. Other people must be running into this issue too!

I might be able to do some tests over the weekend. Keep us posted if you find anything new on this.

Sebastian Roth

@Oleg While setting up my VM to test your issue, I got a bit confused about the partition layout. I am not saying that this is causing the error, but I am wondering why:

sda1 (/boot) is about 3 GB - not bad, but usually you don’t need that much for it
sda3 (/) is around 9.5 GB - might be enough, but I’d use a little more
sda5 (/home) is around 9.5 GB - this is where users store all their data… it usually needs a lot more
sda6 (/opt) is around 90 GB - usually /opt is for optional software. Do you install that many custom tools?
I am not saying that this layout is wrong. Depending on your requirements it might be very useful this way. I’m just saying that this is not the way I’d partition my disk.
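The sizes above can be checked directly from d1.partitions, which is in units of sectors. A minimal sketch, assuming 512-byte sectors and taking the mount points from the “was on /dev/sdaN during installation” comments in the fstab; size_gib is a hypothetical helper:

```python
# Minimal sketch: convert the sector counts from d1.partitions to GiB.
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors
GIB = 1024 ** 3

# (device, mount point from the fstab comments, size in sectors from d1.partitions)
PARTITIONS = [
    ("/dev/sda1", "/boot", 5857280),
    ("/dev/sda2", "swap", 15624192),
    ("/dev/sda3", "/", 19531776),
    ("/dev/sda5", "/home", 19529728),
    ("/dev/sda6", "/opt", 189519872),
]


def size_gib(sectors, sector_bytes=SECTOR_BYTES):
    """Return a partition size in GiB."""
    return sectors * sector_bytes / GIB


if __name__ == "__main__":
    for dev, mount, sectors in PARTITIONS:
        print(f"{dev:10} {mount:6} {size_gib(sectors):7.2f} GiB")
```

Under that assumption the layout works out to roughly 2.8 GiB for /boot, 9.3 GiB each for / and /home, and 90.4 GiB for /opt.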

Quazz

@Sebastian-Roth this seems relevant to this discussion as well.

It should be fixed in the partclone version FOG uses, but it sounds like running btrfsck might be worth a try.

Sebastian Roth

@Quazz Good find, man! Although I wonder if this is exactly the same issue, as they are talking about “lzo compressed btrfs volumes”, which @Oleg does not seem to use according to his /etc/fstab…
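Whether an fstab enables btrfs compression can be read from its mount options (compress, compress=..., or compress-force=...). A minimal sketch, using the btrfs lines from the first post; compressed_btrfs_mounts is a hypothetical helper:

```python
# Minimal sketch: list btrfs fstab entries that request compression.

OLEG_FSTAB = """\
UUID=dfa3bf1a-9ca1-4fc4-863f-72815db61539 /     btrfs defaults,discard,relatime,subvol=@ 0 1
UUID=dfa3bf1a-9ca1-4fc4-863f-72815db61539 /home btrfs defaults,discard,relatime,subvol=@home 0 2
UUID=719313d4-5d42-44e7-8cda-87c492b92ae6 /opt  btrfs defaults,discard 0 2
"""


def compressed_btrfs_mounts(fstab_text):
    """Return [(mount_point, option)] for btrfs lines with a compress* option."""
    found = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and lines without an options field.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype != "btrfs":
            continue
        for opt in options.split(","):
            # Matches "compress", "compress=lzo", "compress-force=zlib", ...
            if opt == "compress" or opt.startswith(("compress=", "compress-force")):
                found.append((mount_point, opt))
    return found


if __name__ == "__main__":
    print(compressed_btrfs_mounts(OLEG_FSTAB))  # empty list: no compression options
```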

I had a bit of time today while waiting for some other installations, so I set up Ubuntu 16.04 server (which should be close enough to the scenario with Oleg’s Ubuntu desktop), booted it a couple of times without an issue, uploaded an image and deployed it again. My Ubuntu server comes up and seems normal, but looking at /var/log/kern.log I see a lot of these messages:

BTRFS error (device sda3): bad tree block start 0 40345712
BTRFS error (device sda3): bad tree block start 0 40484864
BTRFS error (device sda3): bad tree block start 0 40091648
BTRFS error (device sda3): bad tree block start 0 40108032
BTRFS error (device sda3): bad tree block start 0 40042496
...

Notice the different numbers at the end of the lines. I am not sure what they mean. I guess we need to do some more research on this, as it does not seem to be a showstopper in my case. I don’t see the “BTRFS: open_ctree failed” message, but I do get some other btrfs-related messages:

BTRFS info (device sda3): read error corrected: ino 1 off 125304832 (dev /dev/sda3 sector 261120)
BTRFS info (device sda3): read error corrected: ino 1 off 125308928 (dev /dev/sda3 sector 261128)
BTRFS info (device sda3): read error corrected: ino 1 off 125313024 (dev /dev/sda3 sector 261136)
BTRFS info (device sda3): read error corrected: ino 1 off 125317120 (dev /dev/sda3 sector 261144)

Is anyone keen to dig into this and take a look at the partclone code as well? I’d love to, but I don’t think I’ll find the time in the near future.
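To compare the numbers in these kernel messages more easily, a minimal sketch that extracts them from kern.log text. In the 4.x kernel sources the “bad tree block start” message appears to print the start address found in the block header, followed by the address btrfs expected; if that holds, a found value of 0 would mean the tree block read back as zeros. Treat that interpretation as an assumption. The corrected read errors in the quoted log step by 4096 bytes in “off” and by 8 in “sector”, which is consistent with 8 × 512-byte sectors per 4 KiB block. parse_btrfs_log is a hypothetical helper:

```python
# Minimal sketch: pull the numbers out of the two kinds of btrfs kernel
# messages quoted above so they can be sorted and compared.
import re

BAD_TREE_RE = re.compile(r"bad tree block start (\d+) (\d+)")
READ_FIX_RE = re.compile(
    r"read error corrected: ino (\d+) off (\d+) \(dev (\S+) sector (\d+)\)"
)

LOG = """\
BTRFS error (device sda3): bad tree block start 0 40345712
BTRFS error (device sda3): bad tree block start 0 40484864
BTRFS info (device sda3): read error corrected: ino 1 off 125304832 (dev /dev/sda3 sector 261120)
BTRFS info (device sda3): read error corrected: ino 1 off 125308928 (dev /dev/sda3 sector 261128)
"""


def parse_btrfs_log(text):
    """Return (bad_blocks, corrected).

    bad_blocks: [(first_number, second_number)] from "bad tree block start" lines
    corrected:  [(offset, sector)] from "read error corrected" lines
    """
    bad_blocks, corrected = [], []
    for line in text.splitlines():
        m = BAD_TREE_RE.search(line)
        if m:
            bad_blocks.append((int(m.group(1)), int(m.group(2))))
            continue
        m = READ_FIX_RE.search(line)
        if m:
            corrected.append((int(m.group(2)), int(m.group(4))))
    return bad_blocks, corrected


if __name__ == "__main__":
    bad, fixed = parse_btrfs_log(LOG)
    print("bad tree blocks:", bad)
    # Step between consecutive corrected reads, in bytes and in sectors.
    for (off_a, sec_a), (off_b, sec_b) in zip(fixed, fixed[1:]):
        print("step:", off_b - off_a, "bytes,", sec_b - sec_a, "sectors")
```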

Oleg

@Sebastian-Roth
Thanks for your suggestion! For normal use yours is better; in our case we only have a couple of applications, which store their data in /opt. For sda2 and sda3 I think I will follow your suggestion.
Yes, you’re right: in my setup I don’t have the “lzo compressed” options in the fstab.
In your case the system comes up; in mine it doesn’t. I will look into it further today to narrow down the issue.

If it’s a partclone code issue, the fix could take a while, right? I’m asking because in that case I would have to switch to another filesystem.
