Strange upload error



  • I have FOG working perfectly on all my servers, which are all identical. In the immortal words of Bill and Ted, it's “a most bodacious product”.

    However, I have one other server, an HP MicroServer. It's an old box with a dual-core 1.3 GHz CPU, no less! It shows an error at first, then starts cloning at speeds of 350-ish MB, and eventually fails.

    The first error is TSC: Fast Calibration failed
    then
    ata2.00: failed to enable AA (error_mask=0x1)
    ata1.00: failed to enable AA (error_mask=0x1)

    There is nothing else on the switch, just the FOG server, this MicroServer and two DrayTek 2860 routers.

    A picture is worth a thousand words, so I uploaded a video of it here:

    https://youtu.be/Bw8Dq6RJVik

    Does anyone have an idea what it is?

    Thanks

    Julian



  • Developer

    @Julianh WOW! Not something I’d have ever come up with. Thanks a lot for reporting back. I am really glad that we don’t need any kind of special kernel patch to make this work.



  • I fixed it: it was the graphics card. I swapped it over and now it’s working perfectly.

    I spent not hours but days on this. Eventually I swapped every component for one that worked.

    Thanks for all your help, what a weird one!

    Could you mark this as solved, please?

    Thanks


  • Developer

    The disk model is pretty close to those I mentioned earlier. But now that I know it’s only this one machine having issues, I don’t think my suggestions about broken FPDMA_AA were correct in this case. I guess you have the same disk in some or all of the other machines as well.

    It might be memory, but from what we have seen so far I reckon it’s simply the disk itself. Start another debug session and run smartctl -A /dev/sda. If you post a picture of the values, we’ll probably be able to tell you how “healthy” your disk is.
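    If it helps, here is a minimal shell sketch that pulls a few attributes out of the smartctl -A output mentioned above, so only the interesting lines need posting. It assumes smartmontools is available in the shell you boot into (not confirmed in this thread) and the usual ATA attribute table, where RAW_VALUE is the tenth column. The attribute shortlist and the function name smart_summary are my own illustrative choices, not an official health test.

```shell
# Minimal sketch: print selected SMART attributes from `smartctl -A` output.
# Assumed table layout (standard ATA attribute table):
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# The shortlist below is illustrative; non-zero raw values deserve a closer look.
smart_summary() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" ||
        $2 == "UDMA_CRC_Error_Count" {
            # Field 10 is RAW_VALUE in the assumed layout.
            print $2 "=" $10
        }
    '
}

# Usage (as root, disk assumed to be /dev/sda):
#   smartctl -A /dev/sda | smart_summary
```

    Matching on the attribute name rather than the ID keeps the sketch readable, at the cost of missing vendors that label these attributes differently.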



  • @Julianh said:

    I forgot to say, Tom, it’s only this one machine.

    Do other machines of this model work?



  • @Julianh I forgot to say, Tom, it’s only this one machine.



  • @Tom-Elliott
    Hi Tom, the machine is behaving perfectly well and I doubt there’s a memory issue, but I’ll run a check on it anyway. Thanks for the suggestion.

    Julian



  • This is the result of the hdparm -I /dev/sda command. I didn’t know what you wanted, so here’s the lot. [Five photos of the output attached: IMG_0544.JPG to IMG_0548.JPG]


  • Senior Developer

    @Julianh Is it possible there’s an issue with RAM on this device? Or is this what’s happening on ALL devices?



  • [Photo attached: IMG_0542.JPG]

    This is the end screen.


  • Developer

    @Julianh said:

    If I run the image in “upload - debug” mode, can I run the commands in that mode?

    Try host -> Basic tasks -> Debug. That should get you to a shell.



  • Thank you all for getting back to me.

    I’ve uploaded a slightly longer video to https://youtu.be/cg5tpB3kq9M
    It shows the imaging starting. It fails at the end, but I have to wait 12 hours for that.

    Tom, the imaging carries on as if it were imaging the server, then fails. It says it will take 12 hours or so, and it’s 9pm here, so I’ll post the “end” error tomorrow morning.

    Sebastian, there is no RAID in the MicroServer. The drive is a Samsung 1TB Green. If I run the image in “upload - debug” mode, can I run the commands in that mode?

    Thanks again for your help. I’ll post the image tomorrow.

    Yours

    Julian


  • Senior Developer

    I’m very very confused by this.

    Does the upload actually fail? Based on the video, it seems to be doing what it needs to, even with all the “errors” on the screen. Of course, we only see it calculating the bitmap for the first partition and starting to upload that partition. Does it not go further?


  • Developer

    Hi Julian,

    I have to admit that I haven’t had this error on one of my systems before, so I am just reading up on things, trying to understand what might be going wrong in your case. There is a lot of guesswork involved and I might be wrong…

    Have you read through the bug report yet? What exactly do you see when it fails in the end of imaging process? Error messages?

    What sort of hard drives have you tried so far? Are those Seagate Momentus SpinPoint disks or HP VB0250EAVER (https://lkml.org/lkml/2015/7/1/235)?

    Please start a debug session for this host or boot a live Linux from CD/DVD. Then run hdparm -I /dev/sda to see the model and serial number of the disk, and let us know. By the way, do you see the same messages when booting a live Linux? You can check with dmesg | grep "failed to enable AA"
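    To make the two checks above quicker to report, here is a minimal shell sketch. disk_identity trims hdparm -I output down to its model, serial and firmware lines, and aa_failures counts the "failed to enable AA" messages. Both function names are hypothetical helpers, and the sketch assumes hdparm, grep and sed are present in the debug shell or live Linux.

```shell
# Minimal sketch: keep only the identity lines of `hdparm -I` output
# (model, serial, firmware) and strip their leading indentation.
disk_identity() {
    grep -E 'Model Number:|Serial Number:|Firmware Revision:' |
        sed 's/^[[:space:]]*//'
}

# Minimal sketch: count "failed to enable AA" lines in kernel messages.
# `|| true` keeps a count of 0 from being reported as a failure.
aa_failures() {
    grep -c 'failed to enable AA' || true
}

# Usage (as root, disk assumed to be /dev/sda):
#   hdparm -I /dev/sda | disk_identity
#   dmesg | aa_failures
```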



  • @Julianh Does the microserver use RAID?



  • Hi Sebastian,

    I swapped the hard drive and got exactly the same errors as in the video. I also left it running, but at the end of the supposed imaging process it failed.

    What does the error code mean? I have the controller set to AHCI.

    Thanks

    Julian


  • Developer

    @Julianh said:

    unfortunately it didn’t work.

    What exactly didn’t work? Do you still see the same thing happening? Same messages and failing upload? Do you get an error message when things fail? What happens if you use a different hard drive?



  • Hi Sebastian,

    I’m working away, so I only had the weekend to try it. I swapped over the hard drive and then tried the libata.force=noncq option, but unfortunately it didn’t work.

    Do you have any suggestions I could try, or options I could enable?

    I’m back home at the weekend, and then I’ll be at home for a couple of weeks. Hopefully we can find the solution then.

    Is there any further information you want?

    Thanks

    Julian
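    Since libata.force=noncq didn’t seem to change anything, it may be worth confirming that the option actually reached the running kernel. The thread doesn’t say how it was passed, so this is only a minimal shell sketch: has_kernel_arg is a hypothetical helper that looks for an exact token on /proc/cmdline (or in a file given as the second argument, which makes it easy to test).

```shell
# Minimal sketch: check whether a kernel parameter is on the kernel
# command line. Reads /proc/cmdline unless another file is given.
has_kernel_arg() {
    arg=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put each whitespace-separated parameter on its own line,
    # then require an exact whole-line match.
    tr -s ' ' '\n' < "$cmdline_file" | grep -qx -- "$arg"
}

# Usage inside the debug shell:
#   if has_kernel_arg libata.force=noncq; then echo "noncq is set"; else echo "noncq NOT set"; fi
```

    An exact match is deliberate: a parameter with extra values (for example a comma-separated libata.force list) will not count as a hit, so adjust the pattern if you pass several options together.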


  • Developer

    @Julianh Any news on this? Did you get it solved?


  • Developer

    As you can see in the bug report, some hard drives are blacklisted within the libata kernel source code. I thought about suggesting you try FOG trunk, as newer kernel versions might have more drives on the blacklist. But comparing the kernel source code, I only see one new hard disk added as BROKEN_FPDMA_AA. See for yourself:
    kernel 3.15 (FOG 1.2.0 has 3.15.6): http://lxr.free-electrons.com/source/drivers/ata/libata-core.c?v=3.15#L4150
    kernel 4.3 (FOG trunk): http://lxr.free-electrons.com/source/drivers/ata/libata-core.c#L4143

    Take a look at the source code and see if you can find your specific hard drive. To find out which HD you have in this machine, you can boot any live Linux and run hdparm -I /dev/sda. You should see model, serial and firmware information in the first few lines of the output. We might report this to the kernel developers. Or you can just try using a different hard drive in this server.
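    Rather than eyeballing the blacklist, a minimal shell sketch like this could check a model string against the BROKEN_FPDMA_AA entries in a local copy of drivers/ata/libata-core.c (how you fetch the raw file is up to you). It assumes entries sit on one line in the form { "MODEL", "FIRMWARE", ATA_HORKAGE_BROKEN_FPDMA_AA }, and it ignores the firmware column. The kernel matches models with its own glob matching; shell case patterns are used here as a close approximation, not an exact equivalent. The function names are hypothetical.

```shell
# Minimal sketch: list model patterns flagged BROKEN_FPDMA_AA in a local
# copy of libata-core.c (only one-line table entries starting with "{").
fpdma_aa_patterns() {
    grep 'BROKEN_FPDMA_AA' "$1" | grep '^[[:space:]]*{' | cut -d'"' -f2
}

# Succeeds if model $1 matches any listed pattern in source file $2.
# The explicit ( ) subshell keeps `exit` from leaving the calling script.
model_is_listed() {
    fpdma_aa_patterns "$2" | (
        while IFS= read -r pat; do
            # Unquoted $pat so *, ? and [...] act as glob patterns.
            case $1 in
                $pat) exit 0 ;;
            esac
        done
        exit 1
    )
}

# Usage (model string as printed by hdparm -I):
#   model_is_listed "SAMSUNG HD103SI" libata-core.c && echo "listed" || echo "not listed"
```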

