Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors
-
@sebastian-roth said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:
@danieln Have you tried imaging that exact same machine from both servers and you only get the “No such file or directory” error on the later one?
And asking again, do you have another notebook - exact same model - that you can deploy to, just to see if you get the same ATA errors?!
I only appear to get the “No such file or directory” error on the node I set up yesterday. However, I am now getting ATA errors on the other nodes too, with multiple Dell E5480s. Here’s a screenshot of what I’m seeing on one that I am currently imaging from a different node:
And again, it finishes correctly. This is a picture of the same screen moments later:
Do you think it’s maybe isolated to the image? I’d assume the ATA errors have something to do with the hard drive but I’m not sure what.
-
@danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:
I only appear to get the “No such file or directory” error on the node I set up yesterday.
I should have explained this in a bit more depth earlier. FOS (the Linux OS doing all the work) reads from the image file (e.g. d1p1.img) and pipes it through a decompression FIFO. So if partclone says “No such file”, it’s very likely that the decompression FIFO died for some reason (corrupted file, RAM issue, …) and partclone can no longer read from it.
Please run
file /images/DellE5450-80-Non-Office/d1p1.img
and
md5sum /images/DellE5450-80-Non-Office/d1p1.img
on both your nodes and compare the output. Which compression do you use, Gzip or Zstd?
Do you think it’s maybe isolated to the image? I’d assume the ATA errors have something to do with the hard drive but I’m not sure what.
The ATA errors come from the same FOS (FOG Linux OS), and I would read them as an issue between the Linux kernel and those particular notebooks. It is possible the deploy is fine despite the messages, but I am not sure. When you search the web for those ATA messages, people very often report that a SATA cable or even the power supply (in PCs) can cause them. Windows is often less picky about this kind of thing, so I can imagine Linux complaining (while still trying hard) and Windows not.
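To test whether an image file itself is damaged, independent of any client, a minimal sketch like the following may help. It assumes the zstd and gzip command-line tools are installed on the node and that the image is a plain Zstandard or gzip stream. check_image is a hypothetical helper name and not part of FOG.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): check that a compressed image
# file can be decompressed from start to finish.
check_image() {
    img="$1"
    if [ ! -r "$img" ]; then
        echo "missing: $img"
        return 2
    fi
    # `file -b` prints only the type description, without the file name.
    case "$(file -b "$img")" in
        Zstandard*) zstd -t -q "$img" ;;   # -t tests integrity, writes nothing
        gzip*)      gzip -t "$img" ;;      # -t tests integrity, writes nothing
        *)          echo "unknown format: $img"; return 3 ;;
    esac
}
```

For example, check_image /images/DellE5450-80-Non-Office/d1p1.img returns 0 if the whole stream decompresses cleanly. On a large image this reads the entire file, so it can take a while.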
-
@sebastian-roth Thanks very much for the response and for the very helpful info!
I should have explained this in a bit more depth earlier. FOS (the Linux OS doing all the work) reads from the image file (e.g. d1p1.img) and pipes it through a decompression FIFO. So if partclone says “No such file”, it’s very likely that the decompression FIFO died for some reason (corrupted file, RAM issue, …) and partclone can no longer read from it.
Please run
file /images/DellE5450-80-Non-Office/d1p1.img
and
md5sum /images/DellE5450-80-Non-Office/d1p1.img
on both your nodes and compare the output. Which compression do you use, Gzip or Zstd?
That makes sense. It’s just weird to me that it would die on this node when it’s brand new, but I suppose it’s possible. Perhaps I’ll just try a recapture. Or, if I go into the node and manually delete the DellE5450-80 directory, will the Master know to re-propagate it? If not, I could try a recapture and see if that works.
The output of
file /images/DellE5450-80-Non-Office/d1p1.img
on both the Master node and the node I was having issues with was the following:
/images/DellE5450-80-Non-Office/d1p1.img: Zstandard compressed data (v0.8+), Dictionary ID: None
The output of
md5sum /images/DellE5450-80-Non-Office/d1p1.img
on the Master node was:
e929a14a17c60b2b9a7dfdf18f526232 /images/DellE5450-80-Non-Office/d1p1.img
The output of
md5sum /images/DellE5450-80-Non-Office/d1p1.img
on the problematic node was:
1d4bf4ac2bcef83013fe4589149b0e30 /images/DellE5450-80-Non-Office/d1p1.img
I am using Zstd for compression. Do you recommend Gzip? What are the pros/cons of both?
Do you think it’s maybe isolated to the image? I’d assume the ATA errors have something to do with the hard drive but I’m not sure what.
The ATA errors come from the same FOS (FOG Linux OS), and I would read them as an issue between the Linux kernel and those particular notebooks. It is possible the deploy is fine despite the messages, but I am not sure. When you search the web for those ATA messages, people very often report that a SATA cable or even the power supply (in PCs) can cause them. Windows is often less picky about this kind of thing, so I can imagine Linux complaining (while still trying hard) and Windows not.
I will say that I replaced the hard drive on one of the client laptops that was having that issue and it was resolved, but I attempted a hard drive replacement on a separate client and it was still throwing the ATA errors, so maybe it was something else. But you’re thinking it’s more along the lines of hardware issues with the laptop and not with FOS or the Node itself? I feel like it only throws those ATA errors when connecting to that one node, but I could be wrong. Maybe that’s the next thing I’ll test.
-
@danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:
The output of md5sum /images/DellE5450-80-Non-Office/d1p1.img on the Master node was:
e929a14a17c60b2b9a7dfdf18f526232 /images/DellE5450-80-Non-Office/d1p1.img
The output of md5sum /images/DellE5450-80-Non-Office/d1p1.img on the problematic node was:
1d4bf4ac2bcef83013fe4589149b0e30 /images/DellE5450-80-Non-Office/d1p1.img
That’s very interesting. I did not expect the checksums to be different, but good that I asked. To me that means the file was not replicated properly from the master to the storage node. So please delete /images/DellE5450-80-Non-Office/d1p1.img on the storage node and wait until it has been replicated from the master again. Then check the md5sums again. The FOG replication service checks file size and checksums (the checksum check only happens for smaller files, because calculating checksums for large files on every run would put too much load on the server), but this seems to be one of the rare cases where the file size matches but the checksum doesn’t.
I am using Zstd for compression. Do you recommend Gzip? What are the pros/cons of both?
Both are fine. I tend to use Zstd more and more.
you’re thinking it’s more along the lines of hardware issues with the laptop and not with FOS or the Node itself? I feel like it only throws those ATA errors when connecting to that one node, but I could be wrong. Maybe that’s the next thing I’ll test.
Yes I would say it’s very unlikely to be caused by FOS or the node unless you have different FOG/kernel versions installed.
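The checksum comparison between master and storage node can be scripted. The following is a minimal sketch, assuming md5sum and awk are available on the node. verify_md5 is a hypothetical helper name, and the expected hash has to be taken from the master by hand.

```shell
#!/bin/sh
# Hypothetical helper: compare a local file's MD5 sum with the sum
# reported for the same file on the master node.
verify_md5() {
    file_path="$1"
    expected="$2"
    if [ ! -r "$file_path" ]; then
        echo "missing: $file_path"
        return 2
    fi
    # md5sum prints "<hash>  <path>"; keep only the hash field.
    actual=$(md5sum "$file_path" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK $file_path"
    else
        echo "MISMATCH $file_path (got $actual, expected $expected)"
        return 1
    fi
}
```

On the storage node this could be called as verify_md5 /images/DellE5450-80-Non-Office/d1p1.img e929a14a17c60b2b9a7dfdf18f526232, using the hash from the master shown above. A MISMATCH after replication has finished would point to a bad copy.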
-
@danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:
The output of md5sum /images/DellE5450-80-Non-Office/d1p1.img on the Master node was:
e929a14a17c60b2b9a7dfdf18f526232 /images/DellE5450-80-Non-Office/d1p1.img
The output of md5sum /images/DellE5450-80-Non-Office/d1p1.img on the problematic node was:
1d4bf4ac2bcef83013fe4589149b0e30 /images/DellE5450-80-Non-Office/d1p1.img
That’s very interesting. I did not expect the checksums to be different, but good that I asked. To me that means the file was not replicated properly from the master to the storage node. So please delete /images/DellE5450-80-Non-Office/d1p1.img on the storage node and wait until it has been replicated from the master again. Then check the md5sums again. The FOG replication service checks file size and checksums (the checksum check only happens for smaller files, because calculating checksums for large files on every run would put too much load on the server), but this seems to be one of the rare cases where the file size matches but the checksum doesn’t.
I cannot exaggerate how useful this information is for me to know for the future. So thanks a million! I will try that and report back. However, just for my own clarification, those two checksum outputs should be the same if it replicated correctly?
I am using Zstd for compression. Do you recommend Gzip? What are the pros/cons of both?
Both are fine. I tend to use Zstd more and more.
Good to know.
-
@danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:
However, just for my own clarification, those two checksum outputs should be the same if it replicated correctly?
Exactly!
-
@sebastian-roth said
you’re thinking it’s more along the lines of hardware issues with the laptop and not with FOS or the Node itself? I feel like it only throws those ATA errors when connecting to that one node, but I could be wrong. Maybe that’s the next thing I’ll test.
Yes I would say it’s very unlikely to be caused by FOS or the node unless you have different FOG/kernel versions installed.
I’d like to circle back around to this to get some more clarification. My master node has version 1.5.9-RC2 and this particular node (as well as all the other nodes) has version 1.5.9. I’m unsure what the differences between RC2 and the 1.5.9 release are, but do you think this is something worth investigating?
-
@danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:
I’d like to circle back around to this to get some more clarification. My master node has version 1.5.9-RC2 and this particular node (as well as all the other nodes) has version 1.5.9. I’m unsure what the differences between RC2 and the 1.5.9 release are, but do you think this is something worth investigating?
Sure, taking a quick look doesn’t hurt. 1.5.9-RC2 was a release candidate from not long before 1.5.9 was released. It possibly uses a different kernel, depending on when it was installed.
You should be able to get the kernel version by running the following command on all your nodes:
file /var/www/html/fog/service/ipxe/bzImage*
(The * will include both the 64-bit and the 32-bit kernel. The output should show kernel version 4.19.x or possibly 5.6.x.)
-
@sebastian-roth said
Sure, taking a quick look doesn’t hurt. 1.5.9-RC2 was a release candidate from not long before 1.5.9 was released. It possibly uses a different kernel, depending on when it was installed.
You should be able to get the kernel version by running the following command on all your nodes:
file /var/www/html/fog/service/ipxe/bzImage*
(The * will include both the 64-bit and the 32-bit kernel. The output should show kernel version 4.19.x or possibly 5.6.x.)
Well I’ll be damned. The kernel on the master and all of the working nodes is version 4.19.123 and the problematic node’s kernel is version 4.19.145. You think that may be what’s causing the issue? But why would some of the other images be working fine then?
At any rate, is there an easy way to update this kernel without having to do a full reinstall?
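Comparing kernel versions across several nodes by eye is easy to get wrong, so here is a minimal sketch that prints just the version for each kernel file. It assumes the `file` output contains ", version X.Y.Z" as in the outputs posted in this thread. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read one line of `file` output on stdin and print
# the token that follows ", version ".
kernel_version_from() {
    sed -n 's/.*, version \([^ ]*\).*/\1/p'
}

# Hypothetical helper: print "<path> <version>" for every bzImage* file
# in the given directory.
list_kernel_versions() {
    dir="$1"
    for k in "$dir"/bzImage*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$k" ] || continue
        printf '%s %s\n' "$k" "$(file -b "$k" | kernel_version_from)"
    done
}
```

Running list_kernel_versions /var/www/html/fog/service/ipxe on each node gives one short line per kernel, which makes a mismatch such as 4.19.123 versus 4.19.145 easy to spot.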
-
@danieln said in Clients imaging despite receiving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:
is there an easy way to update this kernel
FOG (FOS) kernels can be downloaded from https://fogproject.org/kernels/. Download both the 64-bit and the 32-bit kernel. Save the 64-bit one as bzImage and the 32-bit one as bzImage32 (case is important). Then you can just move the files to the /var/www/html/fog/service/ipxe directory on the FOG server. It probably wouldn’t hurt to rename the existing ones before you move the new kernels in. You can confirm the version of the bzImage files with
file /var/www/html/fog/service/ipxe/bzImage
It should print out the version of the kernel.
-
@george1421 said
FOG (FOS) kernels can be downloaded from https://fogproject.org/kernels/. Download both the 64-bit and the 32-bit kernel. Save the 64-bit one as bzImage and the 32-bit one as bzImage32 (case is important). Then you can just move the files to the /var/www/html/fog/service/ipxe directory on the FOG server. It probably wouldn’t hurt to rename the existing ones before you move the new kernels in. You can confirm the version of the bzImage files with
file /var/www/html/fog/service/ipxe/bzImage
It should print out the version of the kernel.
Thank you for this info! I downloaded those files, renamed them, and moved them to /var/www/html/fog/service/ipxe. I ended up keeping the old files and renaming them bzImageOLD and bzImage32OLD respectively. The new output of file /var/www/html/fog/service/ipxe/bzImage is this:
/var/www/html/fog/service/ipxe/bzImage: Linux kernel x86 boot executable bzImage, version 4.19.123 (jenkins-agent@Tollana) #1 SMP Sun May 17 01:04:09 CDT 2020, R0-rootFS, swap_dev 0x8, Normal VGA
/var/www/html/fog/service/ipxe/bzImage32: Linux kernel x86 boot executable bzImage, version 4.19.123 (jenkins-agent@Tollana) #1 SMP Sat May 16 23:59:01 CDT 2020, R0-rootFS, swap_dev 0x7, Normal VGA
/var/www/html/fog/service/ipxe/bzImage32OLD: Linux kernel x86 boot executable bzImage, version 4.19.145 (sebastian@Tollana) #1 SMP Sun Sep 13 05:43:10 CDT 2020, R0-rootFS, swap_dev 0x7, Normal VGA
/var/www/html/fog/service/ipxe/bzImageOLD: Linux kernel x86 boot executable bzImage, version 4.19.145 (sebastian@Tollana) #1 SMP Sun Sep 13 05:35:01 CDT 2020, R0-rootFS, swap_dev 0x8, Normal VGA
Version 4.19.123 is what is on the master as well as all the other nodes. I trust that it will use these files since they’re named properly, even though the renamed old files are still in the same directory. I will run a recapture/deploy test and report back with findings.
Thank you both again!
-
@danieln Correct. As long as bzImage and bzImage32 are the kernels you want, the proper kernels will boot.
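For anyone repeating the kernel swap described in this thread, the rename-then-copy step could be sketched roughly as below. This is only an illustration under stated assumptions: install_kernel is a hypothetical helper, the OLD suffix mirrors the naming used above, and file ownership and permissions are not handled.

```shell
#!/bin/sh
# Hypothetical helper: put a downloaded kernel in place under the given
# name (bzImage or bzImage32), keeping the previous file as <name>OLD.
install_kernel() {
    new="$1"    # path to the downloaded kernel file
    dir="$2"    # e.g. /var/www/html/fog/service/ipxe
    name="$3"   # bzImage or bzImage32 (case is important)
    [ -r "$new" ] || { echo "cannot read $new"; return 1; }
    # Keep the existing kernel so the change can be rolled back.
    if [ -e "$dir/$name" ]; then
        mv "$dir/$name" "$dir/${name}OLD" || return 1
    fi
    cp "$new" "$dir/$name"
}
```

A call might look like install_kernel ./downloaded-kernel /var/www/html/fog/service/ipxe bzImage, where ./downloaded-kernel stands for whatever file was fetched from the kernels page. Afterwards, file /var/www/html/fog/service/ipxe/bzImage* should show the expected version for both kernels.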