Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors


  • OS: Debian
    FOG Version: 1.5.9-RC2

    I added one additional storage node to my FOG server. Sometimes (only with certain images), a client PXE boots normally but then throws the following error right before Partclone starts:

    [Screenshot: error shown right before Partclone starts]

    However, once it continues after one minute, it begins imaging.

    Then, usually during imaging it will display the following message:

    [Screenshot: message shown during imaging]

    But FOG still images the device correctly once Partclone completes. It’s bizarre to me that it would say it could not write the image due to “No such file or directory” and then image it anyway. I also double-checked on the node that the image file is in /images, and it’s definitely there.

    Perhaps one of my drives on the node is failing? It’s brand new though.

    I’m also curious what the “ata1.00: failed command” error could possibly mean, especially given that the client still images successfully.

    Any ideas on what may be going on?

    Thanks in advance; this community has been a lifesaver and is much appreciated!

  • Moderator

    @danieln Correct. As long as bzImage and bzImage32 are the kernels you want, the proper kernels will boot.


  • @george1421

    Thank you for this info! I downloaded those files, renamed them, and moved them to /var/www/html/fog/service/ipxe. I kept the old files, renaming them bzImageOLD and bzImage32OLD respectively. The new output of file /var/www/html/fog/service/ipxe/bzImage is:

    /var/www/html/fog/service/ipxe/bzImage:      Linux kernel x86 boot executable bzImage, version 4.19.123 (jenkins-agent@Tollana) #1 SMP Sun May 17 01:04:09 CDT 2020, RO-rootFS, swap_dev 0x8, Normal VGA
    /var/www/html/fog/service/ipxe/bzImage32:    Linux kernel x86 boot executable bzImage, version 4.19.123 (jenkins-agent@Tollana) #1 SMP Sat May 16 23:59:01 CDT 2020, RO-rootFS, swap_dev 0x7, Normal VGA
    /var/www/html/fog/service/ipxe/bzImage32OLD: Linux kernel x86 boot executable bzImage, version 4.19.145 (sebastian@Tollana) #1 SMP Sun Sep 13 05:43:10 CDT 2020, RO-rootFS, swap_dev 0x7, Normal VGA
    /var/www/html/fog/service/ipxe/bzImageOLD:   Linux kernel x86 boot executable bzImage, version 4.19.145 (sebastian@Tollana) #1 SMP Sun Sep 13 05:35:01 CDT 2020, RO-rootFS, swap_dev 0x8, Normal VGA

    Version 4.19.123 is what’s on the master as well as all the other nodes. I assume FOG will use these since they’re named properly, even though the renamed old files are in the same directory. I will run a recapture/deploy test and report back with my findings.

    Thank you both again!

  • Moderator

    @danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:

    is there an easy way to update this kernel

    FOG (FOS) kernels can be downloaded from https://fogproject.org/kernels/ (download both the 64-bit and 32-bit kernels). Save the 64-bit one as bzImage and the 32-bit one as bzImage32 (case is important). Then move the files to the /var/www/html/fog/service/ipxe directory on the FOG server. It probably wouldn’t hurt to rename the existing ones before you move the new kernels in. You can confirm the version of the bzImage files with file /var/www/html/fog/service/ipxe/bzImage; it should print out the version of the kernel.
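
    The steps above could be scripted roughly like this. It is a minimal Python sketch, not a FOG tool: install_kernels is a hypothetical helper, and the locations of the downloaded kernels are whatever you saved them as. It renames existing kernels to bzImageOLD/bzImage32OLD (the naming used later in this thread) before copying the new ones in under the exact, case-sensitive names.

```python
import shutil
from pathlib import Path

# Directory FOG serves the FOS kernels from (path as given in this thread).
IPXE_DIR = Path("/var/www/html/fog/service/ipxe")


def install_kernels(kernel64: Path, kernel32: Path, ipxe_dir: Path = IPXE_DIR) -> list[Path]:
    """Back up any existing kernels as *OLD, then copy the new ones in.

    Hypothetical helper. The target names bzImage (64-bit) and
    bzImage32 (32-bit) are case-sensitive and must match exactly.
    """
    installed = []
    for src, name in ((kernel64, "bzImage"), (kernel32, "bzImage32")):
        target = ipxe_dir / name
        if target.exists():
            # Keep the old kernel alongside, e.g. bzImage -> bzImageOLD.
            # On Linux this silently replaces an older *OLD backup.
            target.rename(ipxe_dir / f"{name}OLD")
        shutil.copy2(src, target)  # copies contents and file metadata
        installed.append(target)
    return installed
```

    Afterwards, running file /var/www/html/fog/service/ipxe/bzImage* shows the versions of both the new and the backed-up kernels.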


  • @sebastian-roth

    Well I’ll be damned. The kernel on the master and all of the working nodes is version 4.19.123 and the problematic node’s kernel is version 4.19.145. You think that may be what’s causing the issue? But why would some of the other images be working fine then?

    At any rate, is there an easy way to update this kernel without having to do a full reinstall?

  • Senior Developer

    Sure, taking a quick look doesn’t hurt. 1.5.9-RC2 was a release candidate published not long before 1.5.9 was released. It may use a different kernel, depending on when it was installed.

    You should be able to get the kernel version by running the following command on all your nodes: file /var/www/html/fog/service/ipxe/bzImage* (the * includes both the 64-bit and 32-bit kernels; the output should show kernel version 4.19.x or possibly 5.6.x).
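
    To compare the results across the master and all nodes without eyeballing them, the file output can be parsed. This is a hypothetical Python sketch (kernel_versions and mismatches are made-up names); it only relies on the "version X.Y.Z" field that file prints for a bzImage, as seen in the output posted in this thread.

```python
import re

# Matches "<path>: Linux kernel ... version X.Y.Z" as printed by `file` for a bzImage.
VERSION_RE = re.compile(r"^(?P<path>\S+?):\s+Linux kernel.*?version (?P<version>\d+(?:\.\d+)+)")


def kernel_versions(file_output: str) -> dict[str, str]:
    """Map each kernel path to its version, parsed from `file .../bzImage*` output."""
    versions = {}
    for line in file_output.splitlines():
        match = VERSION_RE.match(line.strip())
        if match:
            versions[match.group("path")] = match.group("version")
    return versions


def mismatches(master: dict[str, str], node: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {path: (master_version, node_version)} for every kernel that differs."""
    return {
        path: (version, node.get(path, "missing"))
        for path, version in master.items()
        if node.get(path) != version
    }
```

    Feeding in the master's output and the problematic node's output would flag the 4.19.123 vs 4.19.145 difference found in this thread.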


  • @sebastian-roth

    I’d like to circle back to this to get some more clarification. My master node has version 1.5.9-RC2, and this particular node (as well as all the other nodes) has version 1.5.9. I’m unsure what the differences between RC2 and 1.5.9 are, but do you think this is something worth investigating?

  • Senior Developer

    @danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:

    However, just for my own clarification, those two checksum outputs should be the same if it replicated correctly?

    Exactly!


  • @sebastian-roth

    I can’t overstate how useful this information will be for me in the future, so thanks a million! I will try that and report back. However, just for my own clarification, those two checksum outputs should be the same if it replicated correctly?

    I am using Zstd for compression. Do you recommend Gzip? What are the pros/cons of both?

    Both are fine. I tend to use Zstd more and more.

    Good to know.

  • Senior Developer

    That’s very interesting. I did not expect the checksums to be different, but it’s good that I asked. To me that means the file was not replicated from the master to the storage node properly. So please delete /images/DellE5450-80-Non-Office/d1p1.img on the storage node and wait until it has been re-replicated from the master. Then check the md5sums again. The FOG replication service checks file size and checksums (the checksum check only happens for smaller files, because calculating checksums for large files on every run would put too much load on the server), but this seems to be a rare case where the file size matches but the checksum doesn’t.
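
    The check described above can be modeled roughly as follows. This is a simplified Python sketch, not FOG’s actual replication code; the size threshold CHECKSUM_SIZE_LIMIT and the function names are hypothetical. It shows why a file whose size matches but whose content differs can be treated as already replicated when it is too large for the checksum step.

```python
import hashlib
import os

# Hypothetical cut-off: above this size the checksum is skipped to save load.
CHECKSUM_SIZE_LIMIT = 1024 * 1024 * 1024  # 1 GiB, illustrative only


def md5_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """MD5 hex digest computed in chunks, like md5sum, without loading the whole file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_replicated(master_path: str, node_path: str, limit: int = CHECKSUM_SIZE_LIMIT) -> bool:
    """Simplified sync check: sizes must match; checksums are compared only for small files."""
    if not os.path.exists(node_path):
        return False  # missing on the node, so it needs to be copied
    size = os.path.getsize(master_path)
    if size != os.path.getsize(node_path):
        return False
    if size <= limit:
        return md5_of(master_path) == md5_of(node_path)
    # Large file: size matched and the checksum was skipped, so corruption goes unnoticed.
    return True
```

    In this model, deleting the node’s copy makes the check return False, which is what forces a fresh copy from the master.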

    I am using Zstd for compression. Do you recommend Gzip? What are the pros/cons of both?

    Both are fine. I tend to use Zstd more and more.

    You’re thinking it’s more along the lines of hardware issues with the laptop and not with FOS or the node itself? I feel like it only throws those ATA errors when connecting to that one node, but I could be wrong. Maybe that’s the next thing I’ll test.

    Yes, I would say it’s very unlikely to be caused by FOS or the node unless you have different FOG/kernel versions installed.


  • @sebastian-roth Thanks very much for the response and for the very helpful info!

    That makes sense. It’s just weird to me that it would die on this node when it’s brand new, but I suppose it’s possible. Perhaps I’ll just try a recapture. Or, if I go into the node and manually delete the DellE5450-80 directory, will the Master know to replicate it again? If not, I could try a recapture and see if that works.

    The output of file /images/DellE5450-80-Non-Office/d1p1.img on both the Master node and the node I was having issues with was the following:

    /images/DellE5450-80-Non-Office/d1p1.img: Zstandard compressed data (v0.8+), Dictionary ID: None
    

    The output of md5sum /images/DellE5450-80-Non-Office/d1p1.img on the Master node was:

    e929a14a17c60b2b9a7dfdf18f526232  /images/DellE5450-80-Non-Office/d1p1.img
    

    The output of md5sum /images/DellE5450-80-Non-Office/d1p1.img on the problematic node was:

    1d4bf4ac2bcef83013fe4589149b0e30  /images/DellE5450-80-Non-Office/d1p1.img
    

    I am using Zstd for compression. Do you recommend Gzip? What are the pros/cons of both?

    I will say that I replaced the hard drive on one of the client laptops that was having this issue and it was resolved, but a hard drive replacement on a separate client didn’t help (it was still throwing the ATA errors), so maybe it was something else. But you’re thinking it’s more along the lines of hardware issues with the laptop and not with FOS or the node itself? I feel like it only throws those ATA errors when connecting to that one node, but I could be wrong. Maybe that’s the next thing I’ll test.

  • Senior Developer

    @danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:

    I only appear to get the “No such file or directory” error on the node I set up yesterday.

    I should have explained this in a bit more depth earlier. FOS (the Linux OS doing all the work) reads from the file (e.g. d1p1.img) and pipes it through a decompression FIFO. So if partclone says “No such file”, it’s very likely the decompression FIFO died for some reason (corrupted file, RAM issue, …) and partclone is no longer able to read from it.
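
    One way to test the “decompression died” theory on the server side is to push the whole image through the decompressor and see whether it errors out. The sketch below uses Python’s standard gzip module, so it only applies to Gzip-compressed images; for Zstd images (as used in this thread) the zstd command-line tool’s -t/--test option performs a comparable integrity check. gzip_stream_ok is a hypothetical name.

```python
import gzip
import zlib


def gzip_stream_ok(path: str, chunk_size: int = 1024 * 1024) -> bool:
    """Decompress a gzip image to the end, as a decompression pipe would.

    Returns False if the file is unreadable, truncated, or corrupted
    (bad header, invalid deflate data, or a CRC/length mismatch at the end).
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard; we only care whether decompression succeeds.
            while stream.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError; truncation raises EOFError.
        return False
```

    Because the whole stream is read, this also catches damage near the end of a file, which is where a partially failed copy often differs.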

    Please run file /images/DellE5450-80-Non-Office/d1p1.img and md5sum /images/DellE5450-80-Non-Office/d1p1.img on both your nodes and compare the output. Which compression do you use, Gzip or Zstd?
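
    The file half of that comparison essentially reads a format signature from the start of the image. A minimal Python sketch of the same idea for the two compression types discussed here (Gzip and Zstd) is below; compression_of is a hypothetical name. A matching signature does not prove the file is intact: in this thread file reported the same result on both nodes while md5sum differed.

```python
# Leading "magic" bytes of the two compression formats discussed in this thread.
MAGIC = {
    b"\x28\xb5\x2f\xfd": "Zstandard compressed data",  # zstd frame magic 0xFD2FB528, little-endian
    b"\x1f\x8b": "gzip compressed data",
}


def compression_of(path: str) -> str:
    """Identify an image's compression from its first bytes, roughly like `file` does."""
    with open(path, "rb") as f:
        head = f.read(4)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"
```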

    Do you think it’s maybe isolated to the image? I’d assume the ATA errors have something to do with the hard drive but I’m not sure what.

    The ATA errors also come from FOS (the FOG Linux OS), and I would read them as an issue between the Linux kernel and those particular notebooks. It is possible the deploy is fine despite the messages, but I am not sure. When you search the web for these ATA messages, people say that the SATA cable or even the power supply (in PCs) very often causes them. Windows is often less picky about these things, so I can imagine Linux complaining (while still trying hard) where Windows does not.
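
    To compare the ATA errors across devices and nodes, the client’s kernel log can be summarized by port and failed command. This is a hypothetical Python helper; it assumes you can save the client’s dmesg output as text, and the sample lines in its test are illustrative, not taken from the screenshots in this thread.

```python
import re
from collections import Counter

# Kernel log lines of the form "ata1.00: failed command: READ FPDMA QUEUED".
FAILED_CMD_RE = re.compile(r"\bata(\d+)\.(\d+): failed command: (.+)$")


def summarize_ata_failures(log_text: str) -> Counter:
    """Count failed ATA commands per (port.device, command) in dmesg-style text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_CMD_RE.search(line)
        if match:
            port, device, command = match.groups()
            counts[(f"ata{port}.{device}", command.strip())] += 1
    return counts
```

    If the same port and command dominate on every affected notebook regardless of which node it deploys from, that would fit the hardware explanation above better than a node problem.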


  • @sebastian-roth

    I only appear to get the “No such file or directory” error on the node I set up yesterday. However, I am now getting ATA errors on the other nodes too, with multiple Dell E5480s. Here’s a screenshot of one I am currently imaging from a different node:

    [Screenshot: ATA errors while imaging a Dell E5480 from a different node]

    And again, it finishes correctly. This is a picture of the same screen moments later:

    [Screenshot: the same screen moments later]

    Do you think it’s maybe isolated to the image? I’d assume the ATA errors have something to do with the hard drive but I’m not sure what.

  • Senior Developer

    @danieln Have you tried imaging that exact same machine from both servers and you only get the “No such file or directory” error on the later one?

    And asking again, do you have another notebook - exact same model - that you can deploy to, just to see if you get the same ATA errors?!


  • @sebastian-roth Weird, right?

    Here is the output of ls -al /images/DellE5450-80-Non-Office/ on the Master Node:

    [Screenshot: ls -al output on the Master Node]

    and here is that same output on the Node that was throwing those errors:

    [Screenshot: ls -al output on the Node that was throwing errors]

  • Senior Developer

    @danieln Please run ls -al /images/DellE5450-80-Non-Office/ on your FOG server console and post output here.

    It’s strange that you get so many ATA error messages and it still finishes. I would never expect that! Do you have another device of the exact same model? Does it show the same error messages when deploying to that?
