    Replication Issue

    Solved
    FOG Problems
    • mronh @Sebastian Roth

      @Sebastian-Roth

      Hi there, sadly the problem persists… As requested, the logs are below (my guess is it has to do with time spent vs. LAN speed, but that is only a guess).

      cheers 🙂

      1_1539088564934_fogreplicator.Sala209RebootRX.transfer.2 - Storage (YYY.YY.210.208).log

      0_1539088564933_fogreplicator.log

      • mronh @mronh

        @mronh What I’m doing now to try to work around this: I set the sleep time to 7200 and set a replication bandwidth limit of 10000 Kbps on both the server and the storage node (maybe, just maybe, lftp applies a default limit when none is set in the parameters, and that pulls the speed down).

        Meanwhile, I’ll look for a problem in my infrastructure…
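A quick sanity check on that limit, assuming the "Kbps" field means kilobits per second (1 kbit = 1000 bits); how FOG actually converts the value before handing it to lftp is not stated in this thread. Under that assumption 10000 Kbps is 1,250,000 bytes/s, about 1.19 MiB/s, which is the unit the replicator transfer logs report. The helper name kbps_to_mibps is hypothetical.

```shell
#!/bin/sh
# kbps_to_mibps KBPS (hypothetical helper): convert kilobits per second
# (1 kbit = 1000 bits, an assumption about FOG's "Kbps" field) into MiB/s
# (1 MiB = 1048576 bytes), printed with two decimals.
kbps_to_mibps() {
  awk -v kbps="$1" 'BEGIN { printf "%.2f\n", kbps * 1000 / 8 / 1048576 }'
}

kbps_to_mibps 10000   # prints 1.19
```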

        cheers

        • Sebastian Roth Moderator

          @mronh Please take a look at the apache and php-fpm logs on the storage node (see my signature on where to find those).

          The master node asks the storage node for the size and hash value of each file. I suspect this check is going wrong on your servers, so the master node often concludes that the files differ between the two nodes and transfers them again.
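One way to test this by hand is to print the size and a hash of the same image file on both nodes and compare the output. This is a minimal sketch: SHA-256 over the whole file is an assumption and not necessarily the hash FOG computes, and size_and_hash is a hypothetical helper name.

```shell
#!/bin/sh
# size_and_hash FILE (hypothetical helper): print "<bytes> <sha256> <path>".
# Run it for the same image file on the master and on the storage node;
# if the first two fields differ, the two copies really are different.
# SHA-256 of the whole file is an assumption, not necessarily FOG's method.
size_and_hash() {
  [ -r "$1" ] || return 1
  bytes=$(wc -c < "$1" | tr -d ' ')
  digest=$(sha256sum < "$1" | cut -d' ' -f1)
  printf '%s %s %s\n' "$bytes" "$digest" "$1"
}
```

For example, run for f in /images/Sala209RebootRX/*; do size_and_hash "$f"; done on each node and diff the two outputs.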

          Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

          Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

          • mronh @Sebastian Roth

            @Sebastian-Roth Hi, sorry for the delay again (the end of the year has added fuel to the fire here =/ )

            I’ve attached the logs you asked for. I’m leaving the storage node disabled until we find the bug (or whatever it is).

            Thanks in advance, dudes
            0_1539706517590_php7.1-fpm.log

            0_1539706561486_php7.1-fpm.2.log

            • mronh @mronh
              last edited by Sebastian Roth

              @mronh Apache error log pasted here, because the upload failed with an “entity error” (it is 1200 lines long):

              [Tue Oct 09 00:06:05.071128 2018] [mpm_prefork:notice] [pid 4797] AH00163: Apache/2.4.34 (Ubuntu) OpenSSL/1.1.0h configured -- resuming normal operations
              [Tue Oct 09 00:06:05.071170 2018] [core:notice] [pid 4797] AH00094: Command line: '/usr/sbin/apache2'
              [Tue Oct 09 10:03:55.253531 2018] [proxy_fcgi:error] [pid 16424] [client YYY.YY.211.13:60152] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 10:03:56.483605 2018] [proxy_fcgi:error] [pid 16420] [client YYY.YY.211.13:60160] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 10:03:57.679877 2018] [proxy_fcgi:error] [pid 16422] [client YYY.YY.211.13:60168] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 10:03:58.862615 2018] [proxy_fcgi:error] [pid 16421] [client YYY.YY.211.13:60192] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 10:04:00.075334 2018] [proxy_fcgi:error] [pid 16423] [client YYY.YY.211.13:60208] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 10:04:01.293366 2018] [proxy_fcgi:error] [pid 16424] [client YYY.YY.211.13:60224] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 10:04:02.526186 2018] [proxy_fcgi:error] [pid 16420] [client YYY.YY.211.13:60240] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 10:04:03.757965 2018] [proxy_fcgi:error] [pid 16422] [client YYY.YY.211.13:60248] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 10 ........
              
              ......... [Tue Oct 09 13:50:11.661757 2018] [proxy_fcgi:error] [pid 2761] [client YYY.YY.211.13:32864] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Tue Oct 09 13:50:12.895096 2018] [proxy_fcgi:error] [pid 1109] [client YYY.YY.211.13:32872] AH01071: Got error 'PHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice:  A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n'
              [Wed Oct 10 00:06:37.231462 2018] [mpm_prefork:notice] [pid 4797] AH00171: Graceful restart requested, doing restart
              AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
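With an error log this long, it can help to collapse the repeated PHP messages into counts before posting. A minimal sketch, assuming the line format shown above, where several messages share one Apache line separated by a literal \n; summarize_php_messages is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_php_messages LOGFILE (hypothetical helper): pull every
# "PHP Notice/Warning/..." message out of an Apache error log (messages on
# one line are separated by a literal backslash-n), then count duplicates,
# most frequent first.
summarize_php_messages() {
  grep -oE 'PHP (Notice|Warning|Deprecated|Parse error|Fatal error): [^\\]*' "$1" |
    sort | uniq -c | sort -rn
}
```

For example: summarize_php_messages /var/log/apache2/error.log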
              
              • Sebastian Roth Moderator

                @mronh OK, nothing serious in the Apache and php-fpm logs. Revisiting the other logs you posted, I just noticed this:

                2018-10-05 13:43:26 /images/Sala209RebootRX/d1p1.img -> ftp://fog@YYY.YY.210.208/%2Fimages/Sala209RebootRX/d1p1.img 0-8733059 1.77 MiB/s
                ...
                2018-10-05 13:53:51 /images/Sala209RebootRX/d1p2.img.084 -> ftp://fog@YYY.YY.210.208/%2Fimages/Sala209RebootRX/d1p2.img.084 0-91065975 1.99 MiB/s
                ....
                2018-10-05 17:39:43 /images/Sala209RebootRX/d1p2.img.084 -> ftp://fog@YYY.YY.210.208/%2Fimages/Sala209RebootRX/d1p2.img.084 0-64085945 101.21 MiB/s
                2018-10-05 17:39:43 /images/Sala209RebootRX/d1p1.img -> ftp://fog@YYY.YY.210.208/%2Fimages/Sala209RebootRX/d1p1.img 0-8290491 82.63 MiB/s
                

                To me it looks like the file sizes differ, and the transfer speeds are wildly different too. It doesn’t add up for me yet.
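To spot this pattern across a whole transfer log rather than by eye, the lines can be grouped by source file. A minimal sketch, assuming every transfer line has the layout shown above (date, time, source, ->, destination, start-end byte range, speed, unit) and that end minus start is the number of bytes sent; find_resized_transfers is a hypothetical helper name.

```shell
#!/bin/sh
# find_resized_transfers (hypothetical helper): read lftp transfer-log lines
# on stdin and print every source file that was sent with more than one
# distinct byte count, followed by those counts in log order.
# Assumed line layout: DATE TIME SRC -> DST START-END SPEED UNIT
find_resized_transfers() {
  awk '
    $4 == "->" && NF >= 8 {
      split($6, range, "-")
      bytes = range[2] - range[1]
      key = $3 SUBSEP bytes
      if (!(key in seen)) {          # first time this size shows up for this file
        seen[key] = 1
        count[$3]++
        sizes[$3] = sizes[$3] " " bytes
      }
    }
    END {
      for (file in count)
        if (count[file] > 1)
          print file ":" sizes[file]
    }
  '
}
```

For example: find_resized_transfers < "fogreplicator.Sala209RebootRX.transfer.2 - Storage (YYY.YY.210.208).log"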

                • mronh @Sebastian Roth

                  @Sebastian-Roth Yeah, I see this too. I put my expected “top speed” of the LAN into the replication speed limit (in the FOG GUI), thinking it was an issue with lftp (with no value set, it might take a default limit instead of being unlimited).

                  But since nothing changed for the better… I’ll check with the infrastructure guy here; maybe someone (without the knowledge) tried to “fix” some switch or BGP configs.

                  I’ll keep in touch, thanks!

                  • Sebastian Roth Moderator

                    @mronh You are on Ubuntu on your master node, right? We are tracking down a replication issue, but so far we have seen it only on CentOS and possibly Red Hat.

                    Please run ps ax | grep defunct on your master node and let us know the result.
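A variant of that check lists only processes in the zombie state, together with their parent PID (the parent is the process that has not reaped them), and never matches the grep command itself. A minimal sketch, assuming the procps ps shipped with Debian, Ubuntu and CentOS; list_zombies is a hypothetical helper name.

```shell
#!/bin/sh
# list_zombies (hypothetical helper): print the ps header plus every process
# whose STAT column starts with Z (zombie/defunct), including its parent PID.
# Filtering on STAT instead of grepping for "defunct" keeps the grep process
# itself out of the result.
list_zombies() {
  ps -eo pid,ppid,stat,args | awk 'NR == 1 || $3 ~ /^Z/'
}
```

The PPID column can then be looked up with ps -o pid,args -p PPID to see which process spawned the defunct shells.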

                    • mronh @Sebastian Roth

                      @Sebastian-Roth Yep, Debian on the server, Ubuntu on the storage node.

                      Result => 7174 pts/0 S+ 0:00 grep --color=auto defunct (edit: I’ll let the replication process complete a full round and then post the result here again)

                      • mronh @Sebastian Roth

                        @Sebastian-Roth OK, there are some defunct processes on the FOG server after some replication (deleting parts, etc.):

                        17914 ? Z 0:00 [sh] <defunct>
                        20702 ? Z 0:00 [sh] <defunct>
                        21632 ? Z 0:00 [sh] <defunct>
                        24658 ? Z 0:00 [sh] <defunct>
                        27305 ? Z 0:00 [sh] <defunct>
                        28423 ? Z 0:00 [sh] <defunct>
                        31260 pts/0 S+ 0:00 grep defunct

                        • mronh @Sebastian Roth

                          @Sebastian-Roth Hi again, one “dumb” question: why was lftp chosen instead of rsync?

                          cheers

                          • Sebastian Roth Moderator

                            @mronh Well, that’s interesting. I have only seen this issue on CentOS so far, and it still needs more investigation as I have not found its root cause yet. So possibly this happens on Debian as well?! Also interesting that your defunct processes are sh (shell) rather than lftp. I hope to find more time in the next few days to figure this out.

                            Hi again, one “dumb” question: why was lftp chosen instead of rsync?

                            I can’t say for sure, as this feature was added before I joined the FOG project. But since FTP is already used/needed (e.g. for moving an uploaded image), I guess the team decided to use the same protocol for replication as well.

                            While rsync is definitely a great tool, it needs a server side just as FTP does. So we would need an rsync daemon (or an SSH daemon for tunneled rsync) running to use it, which is just another component to maintain.

                            • Sebastian Roth Moderator

                              @mronh I have looked into this again and want to ask you to check the Apache error and php-fpm logs of one or two of your storage nodes again. In particular, the Apache logs you posted don’t look like they came from a storage node!! I need the logs from Storage (YYY.YY.210.208).

                              We have fixed two issues in the replication code since the 1.5.4 release, so you might want to try the latest working branch. The working branch also has optimized php-fpm settings, which may help on the storage node side as well.

                              • mronh @Sebastian Roth

                                @Sebastian-Roth Hello! Right, I’ll update both the server and the node, let the replication service complete a full round, and then send back the logs.

                                cheers

                                • mronh @Sebastian Roth

                                  @Sebastian-Roth Hi again, after a full round of the replication service, here they are.

                                  I uploaded logs from both the server and the storage node:

                                  1_1541078475992_STORAGE_php7.1-fpm.log

                                  0_1541078475992_STORAGE_error.log

                                  1_1541078503271_SERVER_php7.0-fpm.log

                                  0_1541078503271_SERVER_error.log

                                  0_1541078552345_SERVER_fogreplicator.log

                                  0_1541078653124_SERVER_fogreplicator.Sala209RebootRX.transfer.2 - Storage (YYY.YY.210.208).log

                                  • Sebastian Roth Moderator

                                    @mronh OK, I have dug through a lot of code in the last two days and found and fixed a couple of replication issues. All of that will be in the next release, hopefully coming soon. Let me know if you are keen to test those changes beforehand.

                                    • mronh @Sebastian Roth

                                      @Sebastian-Roth Sure, right now I’m using only the server because of this issue.

                                      If it all gets fixed, my summer here in the coming months will be sooooo much easier… hahaha

                                      What do I have to do?

                                      • Sebastian Roth Moderator

                                        @mronh The current changes are on a new branch, replication, which I will merge into working after a first round of feedback.

                                        Not sure if you have ever installed FOG unstable/testing. This is done by using git to check out the current code and installing from that:

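                                        git clone https://github.com/FOGProject/fogproject/   # fetch the FOG source
                                        cd fogproject
                                        git checkout replication                               # switch to the branch with the replication fixes
                                        cd bin
                                        ./installfog.sh                                        # run the installer from this checkout (as root)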
                                        

                                        Important notice: I had to change some of the hashing code too, so nodes running different versions (1.5.4 or working vs. the replication branch) will end up replicating images over and over again. You need to have all nodes on the replication branch or set up a separate test environment!!

                                        Please make sure you stop replication first (systemctl stop FOGImageReplicator), then update the storage node, and after that update the master node.
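Because mixed versions keep re-replicating images, it may be worth confirming the checked-out branch on each node before running the installer. A minimal sketch; check_fog_branch is a hypothetical helper, and ~/fogproject below is only an example location for the clone.

```shell
#!/bin/sh
# check_fog_branch DIR EXPECTED (hypothetical helper): succeed only if the git
# checkout in DIR is on branch EXPECTED. Run it on every node before the
# installer so that all nodes end up on the same branch.
check_fog_branch() {
  dir=$1
  expected=$2
  current=$(git -C "$dir" symbolic-ref --short HEAD 2>/dev/null) || {
    echo "not a git checkout on a branch: $dir" >&2
    return 2
  }
  if [ "$current" != "$expected" ]; then
    echo "$dir is on '$current', expected '$expected'" >&2
    return 1
  fi
  echo "$dir is on '$expected'"
}
```

For example: check_fog_branch ~/fogproject replication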

                                        • JGallo

                                          @Sebastian-Roth Will the hashing code changes you made help with Ubuntu servers, specifically 16.04? I remember that earlier this summer there were replication issues looping because of a hash not matching, which were resolved to an extent in the working branch. I’m curious because I have many storage nodes and can switch them over from the working branch to replication if your changes help.

                                          • Sebastian Roth Moderator

                                            @JGallo I have tested a fair bit and fixed a couple of issues that were still in the working branch. Also, the replication branch is based on working, so it has even more replication fixes since 1.5.4!

                                            I can’t promise that this is issue-free yet, as I don’t have a test setup with many nodes. But I am sure it’s better than 1.5.4 and the working branch were. So I would be very happy if you’d give it a try and post feedback, and maybe logs if you still see issues.

                                            Copyright © 2012-2024 FOG Project