    Replication Issue

      mronh @Sebastian Roth

      @Sebastian-Roth Yep, Debian on the server, Ubuntu on the storage node.

      Result: 7174 pts/0 S+ 0:00 grep --color=auto defunct (Edit: I’ll let the replication process make a full round and then post the result here again.)

        mronh @Sebastian Roth

        @Sebastian-Roth OK, there are some defunct processes on the FOG server after some replication (deleting parts, etc.):

        17914 ? Z 0:00 [sh] <defunct>
        20702 ? Z 0:00 [sh] <defunct>
        21632 ? Z 0:00 [sh] <defunct>
        24658 ? Z 0:00 [sh] <defunct>
        27305 ? Z 0:00 [sh] <defunct>
        28423 ? Z 0:00 [sh] <defunct>
        31260 pts/0 S+ 0:00 grep defunct

          mronh @Sebastian Roth

          @Sebastian-Roth Hi again, one “dumb” question: why was lftp chosen instead of rsync?

          cheers

            Sebastian Roth Moderator

            @mronh Well, that’s interesting. I have only seen this issue on CentOS so far, and it still needs more investigation as I have not found the root cause yet. So possibly it happens on Debian as well?! It is also interesting that you have defunct sh (shell) processes instead of lftp ones. I hope to find more time in the next few days to figure this out.
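
To see which parent is leaving those defunct entries behind, it can help to list the zombies together with their parent PID. This is only a minimal sketch, assuming a Linux procps-style ps; the function names are made up for illustration and are not part of FOG:

```shell
# Keep the header row plus every row whose STAT column starts with Z (zombie).
filter_zombies() {
    awk 'NR == 1 || $3 ~ /^Z/'
}

# Print PID, parent PID, state and command name for all zombie processes.
list_zombies() {
    ps -eo pid,ppid,stat,comm | filter_zombies
}
```

Running list_zombies and then ps -fp with one of the reported parent PIDs should show which service is not reaping its children.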

            Hi again, one “dumb” question: why was lftp chosen instead of rsync?

            I can’t say for sure, as this feature was added to FOG before I joined the project. But as FTP is used/needed anyway (e.g. for moving an uploaded image), I guess the team decided to use that same protocol for replication as well.

            While rsync is definitely a great tool, it does need a server part just as FTP does. So we would need an rsync daemon (or an SSH daemon for tunneled rsync) running to use it. That would be just another component to maintain.

            Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

            Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

              Sebastian Roth Moderator

              @mronh I have looked into this again and want to ask you to check the apache error and php-fpm logs of one or two of your storage nodes again. In particular, the apache logs you posted don’t look like they came from a storage node!! I need the logs from Storage (YYY.YY.210.208).

              We have fixed two issues in the replication code since the 1.5.4 release and you might want to try using the latest working branch. Within the working branch we also have optimized php-fpm settings. Possibly that will help on the storage node side as well.

                mronh @Sebastian Roth

                @Sebastian-Roth Hello! Right, I’ll do the update on both the server and the node, let the replication service make a full round and come back with the logs.

                cheers

                  mronh @Sebastian Roth

                  @Sebastian-Roth Hi again, after a full round of the replication service, here it is:

                  I uploaded logs from both the server and the storage node:

                  1_1541078475992_STORAGE_php7.1-fpm.log

                  0_1541078475992_STORAGE_error.log

                  1_1541078503271_SERVER_php7.0-fpm.log

                  0_1541078503271_SERVER_error.log

                  0_1541078552345_SERVER_fogreplicator.log

                  0_1541078653124_SERVER_fogreplicator.Sala209RebootRX.transfer.2 - Storage (YYY.YY.210.208).log

                    Sebastian Roth Moderator

                    @mronh Ok, I have dug through a lot of code in the last two days, found and fixed a couple of issues with replication. All that will be in the next release. Hopefully coming soon. Let me know if you are keen to test those changes beforehand.

                      mronh @Sebastian Roth

                      @Sebastian-Roth Sure, right now I’m using only the server due to this issue.

                      If it all gets fixed, my summer here in the next months will be sooooo much easier… hahaha

                      What do I have to do?

                        Sebastian Roth Moderator

                        @mronh The current changes are on a new branch called replication, which I will merge into working after a first round of feedback.

                        Not sure if you have ever installed FOG unstable/testing. This is done by using git to check out the current code and installing from that.

                        git clone https://github.com/FOGProject/fogproject/
                        cd fogproject
                        git checkout replication
                        cd bin
                        ./installfog.sh
                        

                        Important notice: I had to change some of the hashing code too, and therefore nodes on different versions (1.5.4 or working vs. the replication branch) will end up replicating images over and over again. So you need to have all nodes on the replication branch or set up a separate test environment!!

                        Please make sure you stop replication first (systemctl stop FOGImageReplicator), then update the storage node and after that update the master node.

                          JGallo

                          @Sebastian-Roth Will those hashing code changes you made help with Ubuntu servers, specifically 16.04? I remember that earlier this summer there were issues with replication looping due to a hash file not matching, which were resolved to an extent in the working branch. I’m curious because I have many storage nodes and I can switch over from the working branch to replication if your changes help.

                            Sebastian Roth Moderator

                            @JGallo I have tested a fair bit and fixed a couple of issues that were still in the working branch. Also, the replication branch is based on working, so it has even more replication issues fixed since 1.5.4!

                            I can’t promise you this is issue free yet, as I don’t have a test setup with many nodes. But I am sure it’s better than 1.5.4 and the working branch were. So I would be very happy if you’d give it a try and post feedback, and maybe logs, if you still see issues.

                              JGallo

                              @Sebastian-Roth Of course!! I will update server and nodes today to replication branch and get it ready for an image upload. I think the issue was with images being updated and then uploaded to existing images on the FOG server. Replication of a new image definition was fine even to the storage nodes. It will probably be a bit before I can have some concrete information since I don’t have many images that replicate across all nodes since I have storage groups defined in the image.

                                mronh @Sebastian Roth

                                @Sebastian-Roth Right. I’ll do it next week then.

                                This week I’m on the leash again… oh boy, how I hate the end of the year…=/ hahahaha

                                  JGallo

                                  @Sebastian-Roth I switched over to the replication branch and updated all storage nodes along with my FOG server. I uploaded an image and it seems to be working fine for the original image. I haven’t updated the image yet since it’s very new, but a project I’m currently working on will shortly give me the chance to upload an update on top of an existing image. When that happens I will update it and tail the replication log.

                                  I also noticed in a different post that another user did the same thing and tested replication. It looks like the changes in the replication branch have worked. I will report back once I upload an image on top of an existing one, to see if the updated image replicates properly to the storage nodes.

                                    Sebastian Roth Moderator

                                    @JGallo The more feedback I get on this the better. Looking forward to hearing from you.

                                      mronh @Sebastian Roth

                                      @Sebastian-Roth Hey man, I upgraded both the server and the storage node (following your previous instructions) and now we have some data to think about. Just to make it clear, I deleted all images on the storage node to force a full replication from the beginning.

                                      I’ll upload the logs from server and storage again, but in short: I see some “Erro fatal: max-retries exceeded (421 There are too many connections from your internet address.)” and “File size mismatch” messages.

                                      Server Side:

                                      2_1542111667714_SERVER_php7.0-fpm.log
                                      1_1542111667714_SERVER_fogreplicator.log
                                      0_1542111667714_SERVER_error.log

                                      Storage Side:

                                      1_1542111678025_STORAGE_php7.1-fpm.log
                                      0_1542111678024_STORAGE_error.log

                                      Besides that, the improvements are really good: the logs are more accurate and the steps of the algorithm are way more “solidified”. Way to go, man!

                                        Sebastian Roth Moderator

                                        @mronh Thanks for testing and reporting back. The first thing that jumps out at me in the logs is the many hash mismatch lines like this:

                                        File hash mismatch - d1p2.img.002: c8a2b5f37de6e0c7a5eeb0843b9164bac05cc984cada2cfb8da6132ba938bc2a != 7e56e1209070f2b8494e3d60cb6a27c103925bb442056ba43438c456126f027849baf5547ca1e0fec8accc309aae64ba1ae569e8698fe5e8041052cb627ed6b1
                                        

                                        Note the different lengths of the two hash sums (64 versus 128 hex characters). I am fairly sure the storage node is not updated to the latest replication commit!!
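
The length difference is quick to check: each hex character encodes 4 bits, so the digest size follows from the string length. A small sketch; the helper name is hypothetical, and which hash algorithm each FOG version uses is an assumption not confirmed here (the sizes are simply what SHA-256 and SHA-512 would produce):

```shell
# Print the digest size in bits implied by a hex-encoded hash string.
# Each hex character carries 4 bits.
digest_bits() {
    echo $(( ${#1} * 4 ))
}
```

Called with the two values from the log line above, it reports 256 bits for the first and 512 bits for the second, so the two sides are evidently not computing the same kind of checksum.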

                                        Please check your web directory, maybe there is some link issue and you have two different versions mixed up. Run ls -al /var/www /var/www/html /var/www/fog and post results here.

                                        Besides that, I’d stop replication on your master node for now and maybe try upgrading to the replication branch on the storage node again!

                                          mronh @Sebastian Roth

                                          @Sebastian-Roth on the server side

                                          ls -al /var/www /var/www/html /var/www/fog
                                          /var/www:
                                          total 20
                                          drwxr-xr-x  4 root     root     4096 nov 12 16:06 .
                                          drwxr-xr-x 12 root     root     4096 ago 28 13:16 ..
                                          drwxr-xr-x 10 www-data www-data 4096 nov 12 16:14 fog
                                          drwxr-xr-x  2 root     root     4096 ago 28 13:22 html
                                          -rw-r--r--  1 root     root       41 out 10 11:10 index.php
                                          
                                          /var/www/fog:
                                          total 408
                                          drwxr-xr-x 10 www-data www-data   4096 nov 12 16:14 .
                                          drwxr-xr-x  4 root     root       4096 nov 12 16:06 ..
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 16:06 api
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 16:06 client
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 16:06 commons
                                          -rw-r--r--  1 www-data www-data 370070 nov 12 16:06 favicon.ico
                                          lrwxrwxrwx  1 www-data www-data     13 nov 12 16:06 fog -> /var/www/fog/
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 16:06 fogdoc
                                          -rw-r--r--  1 www-data www-data    572 nov 12 16:06 index.php
                                          drwxr-xr-x 13 www-data www-data   4096 nov 12 16:06 lib
                                          drwxr-xr-x 10 www-data www-data   4096 nov 12 16:06 management
                                          drwxr-xr-x  3 www-data www-data   4096 nov 12 16:06 service
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 16:06 status
                                          
                                          /var/www/html:
                                          total 20
                                          drwxr-xr-x 2 root root  4096 ago 28 13:22 .
                                          drwxr-xr-x 4 root root  4096 nov 12 16:06 ..
                                          lrwxrwxrwx 1 root root    13 ago 28 13:22 fog -> /var/www/fog/
                                          -rw-r--r-- 1 root root 10701 ago 28 13:17 index.html
                                          

                                          on the storage side

                                          ls -al /var/www /var/www/html /var/www/fog
                                          /var/www:
                                          total 16
                                          drwxr-xr-x  4 root     root     4096 nov 12 15:59 .
                                          drwxr-xr-x 13 root     root     4096 jul 18 11:56 ..
                                          drwxr-xr-x 10 www-data www-data 4096 nov 12 16:00 fog
                                          drwxr-xr-x  2 root     root     4096 jul 18 12:03 html
                                          
                                          /var/www/fog:
                                          total 408
                                          drwxr-xr-x 10 www-data www-data   4096 nov 12 16:00 .
                                          drwxr-xr-x  4 root     root       4096 nov 12 15:59 ..
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 15:59 api
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 15:59 client
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 15:59 commons
                                          -rw-r--r--  1 www-data www-data 370070 nov 12 15:59 favicon.ico
                                          lrwxrwxrwx  1 www-data www-data     13 nov 12 15:59 fog -> /var/www/fog/
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 15:59 fogdoc
                                          -rw-r--r--  1 www-data www-data    572 nov 12 15:59 index.php
                                          drwxr-xr-x 13 www-data www-data   4096 nov 12 15:59 lib
                                          drwxr-xr-x 10 www-data www-data   4096 nov 12 15:59 management
                                          drwxr-xr-x  3 www-data www-data   4096 nov 12 15:59 service
                                          drwxr-xr-x  2 www-data www-data   4096 nov 12 15:59 status
                                          
                                          /var/www/html:
                                          total 20
                                          drwxr-xr-x 2 root root  4096 jul 18 12:03 .
                                          drwxr-xr-x 4 root root  4096 nov 12 15:59 ..
                                          lrwxrwxrwx 1 root root    13 jul 18 12:03 fog -> /var/www/fog/
                                          -rw-r--r-- 1 root root 10701 jul 18 11:56 index.html
                                          

                                          I’ll do a git pull on the replication branch and install again on the storage node, then report back here.

                                            mronh @Sebastian Roth

                                            @Sebastian-Roth right…look at this

                                            server side “git checkout replication
                                            Already on ‘replication’
                                            Your branch is up-to-date with ‘origin/replication’.”

                                            storage side “git checkout replication
                                            Already on ‘replication’
                                            Your branch is up-to-date with ‘origin/replication’.”
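
One caveat with that message: git compares against the locally cached origin/replication ref, so “up-to-date” only reflects the last fetch. A sketch for comparing the actual commits on both machines, assuming the path to the fogproject clone is passed in (the function names are hypothetical, not part of FOG):

```shell
# Print the commit hash a source checkout currently points at.
# $1 is the path to the clone (defaults to the current directory).
checkout_commit() {
    git -C "${1:-.}" rev-parse HEAD
}

# Refresh the remote-tracking refs, then succeed only if the local HEAD
# equals origin/<branch>. $2 is the branch name (defaults to replication).
# Returns 2 if the fetch itself fails.
is_current_with_remote() {
    git -C "${1:-.}" fetch --quiet origin || return 2
    [ "$(git -C "${1:-.}" rev-parse HEAD)" = "$(git -C "${1:-.}" rev-parse "origin/${2:-replication}")" ]
}
```

If checkout_commit prints the same hash on the server and on the storage node, and is_current_with_remote succeeds on both, the source trees match. The installer presumably still has to be re-run from that checkout before the files under /var/www/fog change.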

                                            Copyright © 2012-2024 FOG Project