Strange upload error
-
OK, just to restate what I think you said: all of your FOG servers are running Ubuntu 14.04 and the FOG version is the 1.2.0 base image. You have not applied any SVN trunk (a.k.a. 1.3 beta) images on top of the 1.2.0 base image.
Now here is where I’m a bit fuzzy. Are you trying to deploy to an HP microserver, or is the HP microserver the destination of the captured image? What I’m trying to dig at is this: if you are on the 1.2.0 base image and you have not updated the kernels since the base image was released (which is separate from applying an SVN upgrade), then the kernels from 1.2.0 may not support current hardware.
-
@Julianh said:
failed to enable AA (error_mask=0x1)
Take a look here for a detailed explanation: https://bugzilla.redhat.com/show_bug.cgi?id=907193#c5
You can try adding
libata.force=noncq
as a host kernel argument, but that will probably make things slow as well. Hopefully it won’t break anything.
PS: I see the first error on ‘calibration’ on several of my machines too. As far as I can tell it does no harm.
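To check that the argument really reached the kernel, something along these lines can be run from a debug shell on the host. This is only a minimal sketch: `has_noncq` is a made-up helper name, and it inspects the kernel command line only, so it does not prove that NCQ is actually disabled on the drive.

```shell
#!/bin/sh
# has_noncq: hypothetical helper (not part of FOG). Succeeds if the kernel
# command line passed as $1 contains a libata.force= option listing "noncq".
has_noncq() {
    # Split the command line into one argument per line, then look for a
    # libata.force= token whose comma/colon separated values include noncq.
    printf '%s\n' "$1" | tr ' ' '\n' |
        grep -Eq '^libata\.force=(.*[,:])?noncq(,|$)'
}

# Usage on the booted host:
#   if has_noncq "$(cat /proc/cmdline)"; then echo "noncq is set"; else echo "noncq is NOT set"; fi
```

If it reports that the option is not set, the argument was probably not saved for this host or was mistyped.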
-
As you can see in the bug report, some hard drives are blacklisted within the libata kernel source code. I thought about suggesting that you try FOG trunk, as newer kernel versions might have more drives on the blacklist. But comparing the kernel source code, I only see one new hard disk being added as BROKEN_FPDMA_AA. See for yourself:
kernel 3.15 (FOG 1.2.0 has 3.15.6): http://lxr.free-electrons.com/source/drivers/ata/libata-core.c?v=3.15#L4150
kernel 4.3 (FOG trunk): http://lxr.free-electrons.com/source/drivers/ata/libata-core.c#L4143
Take a look at the source code and see if you can find your specific hard drive. To find out which HD you have in this machine you can boot up any live Linux and run
hdparm -I /dev/sda
You should see model, serial and firmware information in the first few lines of the output. We might report this to the kernel developers. Or you can just try using a different hard drive in this server.
-
@Julianh Any news on this? Did you get this solved?
-
Hi Sebastian,
I’m working away, so I only had the weekend to try it. I swapped over the hard drive, and then tried the libata.force=noncq option, but unfortunately it didn’t work.
Do you have any suggestions I could try, or options I could enable?
I’m back home at the weekend, then at home for a couple of weeks. Hopefully we can find the solution then.
Is there any further information you want?
Thanks
Julian
-
@Julianh said:
unfortunately it didn’t work.
What exactly didn’t work? Do you still see the same thing happening? Same messages and failing upload? Do you get an error message when things fail? What happens if you use a different hard drive?
-
Hi Sebastian,
I swapped the hard drive and got the exact same error as in the video. I also left it running, but at the end of the supposed imaging process it failed.
What does the error code mean? I have the disk controller set to AHCI.
Thanks
Julian
-
@Julianh Does the microserver use RAID?
-
Hi Julian,
I have to admit that I haven’t had this error on one of my systems before. So I am just reading up on things, trying to understand what might be going wrong in your case. There is a lot of guesswork involved and I might be wrong…
Have you read through the bug report yet? What exactly do you see when it fails at the end of the imaging process? Error messages?
What sort of hard drives have you tried so far? Are those Seagate Momentus SpinPoint disks or HP VB0250EAVER (https://lkml.org/lkml/2015/7/1/235)?
Please start a debug session for this host or boot a live Linux via CD/DVD. Then run
hdparm -I /dev/sda
to see the model and serial number of the disk. Please let us know. By the way, do you see the same messages when booting a live Linux?
dmesg | grep "failed to enable AA"
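If the full hdparm output is too much to post, a small filter like the one below would cut it down to the three lines I’m after. It is just a sketch: `disk_identity` is a name I made up, and it assumes the usual "Model Number:", "Serial Number:" and "Firmware Revision:" labels that hdparm -I prints.

```shell
#!/bin/sh
# disk_identity: hypothetical helper. Reads `hdparm -I` output on stdin and
# prints only the model, serial and firmware lines with whitespace tidied up.
disk_identity() {
    grep -E 'Model Number:|Serial Number:|Firmware Revision:' |
        sed -e 's/^[[:space:]]*//' \
            -e 's/:[[:space:]]*/: /' \
            -e 's/[[:space:]]*$//'
}

# Usage in the debug session (needs root):
#   hdparm -I /dev/sda | disk_identity
#   dmesg | grep -c "failed to enable AA"   # how many times the message was logged
```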
-
I’m very very confused by this.
Is there a failure in actually uploading the image? Based on the video, it seems to be doing what it needs to, even with all the “errors” on the screen. Of course we only see the first partition calculating the bitmap and starting to upload, but does it not go any further?
-
Thank you all for getting back to me.
I’ve uploaded a slightly longer video to https://youtu.be/cg5tpB3kq9M
It shows the imaging start, but at the end it fails. I have to wait 12 hours for that though.
Tom, the image continues, as if it were imaging the server, then fails. The imaging will take 12 hours or so, it says. It’s 9pm here, so I’ll post the “end” error tomorrow morning.
Sebastian, there is no RAID in the microserver. The drive is a Samsung 1TB green. If I run the image in “upload - debug” mode, can I run the commands in that mode?
Thanks again for your help. I’ll post the image tomorrow.
Yours
Julian
-
@Julianh said:
If I run the image in “upload - debug” mode, can I run the commands in that mode?
Try host -> Basic tasks -> Debug. That should get you to a shell.
-
This is the end screen.
-
@Julianh Is it possible there’s an issue with RAM on this device? Or is this what’s happening on ALL devices?
-
This is the result of the hdparm -I /dev/sda command. I didn’t know what you wanted, so here’s the lot.
-
@Tom-Elliott
Hi Tom, the machine is behaving perfectly well, so I doubt there’s a memory issue, but I’ll run a check on it anyway. Thanks for the suggestion.
Julian
-
@Julianh I forgot to say, Tom, it’s only this one machine.
-
@Julianh said:
I forgot to say, Tom, it’s only this one machine.
Do other machines of this model work?
-
The disk model is pretty close to those I mentioned earlier but I don’t think that my suggestions on broken FPDMA_AA were correct in this case - now that I know that it’s only this one machine having issues. I guess you have the same disk in all or some of the other machines as well.
It might be memory, but from the things we have seen so far I reckon it’s the disk itself, as simple as that. Run another debug session and run
smartctl -A /dev/sda
If you post a picture of the values we’ll probably be able to tell you how “healthy” your disk is.
-
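For a quick first look at the smartctl -A values, a filter like this can pull out the attributes that most often point to a failing disk. It is a rough sketch under some assumptions: `smart_suspects` is a made-up helper name, the three attribute names are the ones smartctl commonly shows but they vary by vendor, and a clean result does not prove the disk is healthy.

```shell
#!/bin/sh
# smart_suspects: hypothetical helper. Reads `smartctl -A` output on stdin and
# prints NAME=RAW_VALUE for a few sector-related attributes whose raw value
# (10th column of the attribute table) is greater than zero.
smart_suspects() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}

# Usage in the debug session (needs root):
#   smartctl -A /dev/sda | smart_suspects
```

No output means none of those three counters is above zero.
-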
I fixed it. It was the graphics card. I swapped it over and now it’s working perfectly.
I spent not hours but days on this. Eventually I swapped every component with one that worked.
Thanks for all your help, what a weird one!
Could you change this to solved, please?
Thanks