Closed Thread
Results 1 to 6 of 6

Thread: cfmsync -p Problem

  1. #1
    admin_kusu is offline Junior Member
    Join Date
    March 6th, 2008
    Posts
    0
    Downloads
    0
    Uploads
    0

    Default cfmsync -p Problem


    Good day.
    We are working on a test cluster using kusu.

    At one point about a week ago cfmsync -p stopped working.
    It failed after some updates appeared on the kusu update repository, and
    not due to changes or actions on our part.

    This is obviously a critical problem for us.

    Some details:
    This was working, until an update sometime about 7-10 days ago.

    cfmsync -f DOES work.

    An earlier failure of cfmsync -p was observed, and that was traced to a
    problem where copied files were getting time stamps in the future. Yum
    simply rejected those files.

    This is a different problem, and we do not know where to start looking.



    --
    With our best regards,

    Maurice W. Hilarius Telephone: 01-780-456-9771
    Hard Data Ltd. FAX: 01-780-456-9772
    11060 - 166 Avenue email:maurice@harddata.com
    Edmonton, AB, Canada http://www.harddata.com/
    T5X 1Y3



  2. #2

    Default RE: cfmsync -p Problem


    Hi Maurice

    On the machine that is missing the packages, can you look at the contents
    of /opt/kusu/etc/package.lst? This list is what the CFM thinks it
    has installed already. It gets updated when "cfmsync -p" runs. I suspect it
    is out of sync with what's really installed. Can you confirm this?

    I've seen this happen when yum failed to install a package. In our
    case the yum failure was because two nodes had the same IP.
    /var/log/yum.log will also have useful troubleshooting info.

    Another thing to check is the node group name. We found a bug with
    nodegroups that had spaces in the name.


    Mark
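
    A minimal sketch of the check suggested above, in Python, assuming an
    RPM-based node and that the list lives at /opt/kusu/etc/package.lst (the
    path cfmclient is reported to consult in this thread). The helper names
    are hypothetical and are not part of Kusu. The sketch only compares the
    recorded list with what rpm reports as installed.

```python
import subprocess


def parse_package_list(text):
    """Return the package names in package.lst-style text, skipping comments."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def installed_package_names():
    """Ask rpm for the names of all installed packages (one per line)."""
    result = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME}\\n"],
        capture_output=True, text=True, check=True,
    )
    return set(result.stdout.split())


def listed_but_not_installed(recorded, installed):
    """Names the list records as installed that rpm does not know about."""
    return sorted(set(recorded) - set(installed))


# Example use on a node (path assumed from this thread):
#   with open("/opt/kusu/etc/package.lst") as fh:
#       recorded = parse_package_list(fh.read())
#   print(listed_but_not_installed(recorded, installed_package_names()))
```

    Entries in the list that are not plain RPM names (for example Kusu
    component names, if that is what they are) would also show up in the
    result and can be ignored.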

    -----Original Message-----
    From: Maurice Hilarius [mailto:maurice@harddata.com]
    Sent: Friday, March 14, 2008 3:02 PM
    To: kusu-users@osgdc.org
    Cc: Mark Black
    Subject: cfmsync -p Problem







  3. #3

    Default cfmsync -p Problem - more details


    Here is a list of additional packages we were expecting to
    install (from an output of 'ngedit -p compute-centos'):


    OpenIPMI OpenIPMI-libs
    blas ganglia
    ganglia-gmond libtorque
    torque torque-mom
    torque-pam

    On the single node that we used for testing this,
    we saw in our logs:
    (please ignore the dates/timestamps)
    Mar 04 12:10:57 Erased: OpenIPMI
    Mar 04 12:11:01 Erased: OpenIPMI-libs
    Mar 04 12:55:00 Erased: blas
    Mar 04 12:55:04 Erased: blas
    Mar 04 12:55:18 Erased: ganglia-gmond
    Mar 04 12:55:38 Erased: torque-mom
    Mar 04 12:55:41 Erased: torque-pam

    Those entries were interspersed with pieces of information such as:

    http://192.168.80.50/repos/1001/repodata/repomd.xml: [Errno 4] IOError: <urlopen error (113, 'No route to host')>

    I see no good explanation for this error.

    Without that route *nothing* would get installed.

    When I look at logs on that machine then I see the file:
    http://192.168.80.50/repos/1001/repodata/repomd.xml

    This file is perfectly accessible.

    We have no idea what process decided to erase those packages, or why.
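
    To see at a glance which of the expected packages yum erased, the log
    lines can be matched against the ngedit list. The following is a rough
    sketch, assuming the log entries look exactly like the "Erased:" lines
    quoted above; the function names are made up for illustration.

```python
import re

# Matches lines such as "Mar 04 12:10:57 Erased: OpenIPMI".
ERASED = re.compile(r"^\s*(\w{3} \d{2} \d{2}:\d{2}:\d{2}) Erased: (\S+)")


def erased_packages(log_lines):
    """Return (timestamp, package) pairs for every 'Erased:' entry, in order."""
    found = []
    for line in log_lines:
        match = ERASED.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def expected_but_erased(expected, log_lines):
    """Expected package names that appear in an 'Erased:' log entry."""
    erased = {name for _, name in erased_packages(log_lines)}
    return sorted(set(expected) & erased)


# Example use (log path taken from the yum.conf shown in this thread):
#   with open("/var/log/yum.log") as fh:
#       print(expected_but_erased(["torque-mom", "torque-pam"], fh))
```

    Other log lines, such as the urlopen error, are simply skipped.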

    However, the result of this change is very problematic.
    In our case, on this cluster, if the torque packages such as 'torque-mom' are not in place, then the moab scheduler cannot work.


    On the test node /opt/kusu/sbin/cfmclient appears to consult
    /opt/kusu/etc/package.lst. That list looks like this:

    # Generated automatically. Do not Edit!
    OpenIPMI
    OpenIPMI-libs
    blas
    centos-5-x86_64
    component-base-node
    component-gnome-desktop
    component-nagios-compute-v2_10
    ganglia
    ganglia-gmond
    libtorque
    torque
    torque-mom

    That seems to be ok.

    We enabled debugging for cfmclient.
    We then tried 'cfmsync -p -n compute-centos', and in /tmp we see:

    # cat yum.conf
    [main]
    cachedir=/var/cache/yum
    debuglevel=2
    logfile=/var/log/yum.log
    reposdir=/dev/null
    retries=20
    timeout=30
    assumeyes=1
    tolerant=1

    [kusu-installer]
    name=centos-5-x86_64 - Booger
    baseurl=http://192.168.80.50/repos/1001

    That looks ok too, although I would really hope to see
    gpgcheck=0 set here as well for
    our extra packages.

    But when we try to update:

    # cat cfm.log
    Updating Packages
    ++ Testing for: /opt/kusu/cfm/6.package.lst
    ++ CFMBaseDir: /opt/kusu/cfm
    ++ NGID = 6
    myIPs = [['192.168.80.110', '255.255.255.0']], installers = ['192.168.80.50']
    BestIPlist = ['192.168.80.50']
    Nothing to remove
    Nothing to add
    Running plugin: /opt/kusu/lib/plugins/cfmclient/S02KusuAutomount.sh
    Running plugin: /opt/kusu/lib/plugins/cfmclient/nrpe.sh

    There is nothing to add because /opt/kusu/cfm/6.package.lst does
    not exist.
    This is certainly a surprise.
    I would hope that there would be an _attempt_ to add
    back those erased packages.


    OTOH after 'cfmsync -u -p compute-centos' I get the following in logs:


    # cat cfm.log
    Updating Packages
    ++ Testing for: /opt/kusu/cfm/6.package.lst
    ++ CFMBaseDir: /opt/kusu/cfm
    ++ NGID = 6
    myIPs = [['192.168.80.110', '255.255.255.0']], installers = ['192.168.80.50']
    BestIPlist = ['192.168.80.50']
    Nothing to remove
    Nothing to add
    Running plugin: /opt/kusu/lib/plugins/cfmclient/S02KusuAutomount.sh
    Running plugin: /opt/kusu/lib/plugins/cfmclient/nrpe.sh
    Updating To New Repo Packages
    Running: /usr/bin/yum -y -c /tmp/yum.conf update

    That might even work, but then I see what is missing,
    as this time /tmp/yum.conf shows me this:


    [main]
    cachedir=/var/cache/yum
    debuglevel=2
    logfile=/var/log/yum.log
    reposdir=/dev/null
    retries=20
    timeout=30
    assumeyes=1
    tolerant=1

    [kusu-installer]
    name=centos-5-x86_64 - Booger
    baseurl=http:///repos/1001

    With 'baseurl' stated like this it fails.
    So, at a minimum I have identified that there is a problem in the generation of the "baseurl" path.
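
    A generated yum.conf can be checked for this kind of broken baseurl
    before yum is run. This is only a sketch using the Python standard
    library; the function name is hypothetical, and it assumes the file is
    plain INI text with unindented keys, as in the listings above.

```python
import configparser
from urllib.parse import urlsplit


def repos_without_host(yum_conf_text):
    """Return (section, url) pairs whose baseurl has no host part."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(yum_conf_text)
    bad = []
    for section in parser.sections():
        if section == "main":
            continue  # [main] holds global options, not a repository
        baseurl = parser.get(section, "baseurl", fallback="")
        # A baseurl may list several URLs; an empty value counts as broken.
        for url in baseurl.split() or [""]:
            parts = urlsplit(url)
            if parts.scheme == "file":
                continue  # file:// URLs legitimately have no host
            if not parts.hostname:
                bad.append((section, url))
    return bad


# Example use:
#   with open("/tmp/yum.conf") as fh:
#       print(repos_without_host(fh.read()))
```

    For the second listing above this reports the kusu-installer section,
    because "http:///repos/1001" parses with an empty host.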



    --
    With our best regards,

    Maurice W. Hilarius Telephone: 01-780-456-9771
    Hard Data Ltd. FAX: 01-780-456-9772
    11060 - 166 Avenue email:maurice@harddata.com
    Edmonton, AB, Canada http://www.harddata.com/
    T5X 1Y3



  4. #4

    Default cfmsync -p Problem - more details


    Hi Maurice

    To recap your email:
    1. The package.lst has the correct entries
    2. You saw "No route to host" messages
    3. You saw "Erased: Packagename" messages in some log file
    4. When running cfmsync -u the log shows the repo IP missing

    For item 1:
    1. Check the contents of /depot/repos/1001. It must contain all the
    rpm's listed in the package.lst. If it does not, then run "repoman -u -r
    ...."
    On the nodes, truncate the /opt/kusu/etc/package.lst file, e.g.
    # cat /dev/null > /opt/kusu/etc/package.lst
    Then run cfmsync -p. That should trigger a retry of the package install.

    2. For the "No route to host" messages, I'm not sure why you are seeing
    this. Is there anything unusual in the web server's access_log or
    error_log?

    3. If this is the yum.log, this can happen if you have the components
    selected in ngedit, but the repo does not have them. Ngedit will mark
    them for removal, because the repository did not contain them, and the
    next time cfmsync -p is run the packages will be removed. It's
    important to run "repoman -u -r ..." before ngedit.

    4. We logged this one a while ago, but have not had time to look at it.
    It's slated to be addressed before the final release.
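
    The repository check in item 1 can be roughly automated. The sketch
    below assumes the usual NAME-VERSION-RELEASE.ARCH.rpm file naming and
    that the rpm files sit somewhere under /depot/repos/1001; the helper
    names and the version numbers in the comments are made up for
    illustration and are not Kusu tooling.

```python
import os


def rpm_package_name(filename):
    """Extract NAME from a NAME-VERSION-RELEASE.ARCH.rpm file name."""
    if not filename.endswith(".rpm"):
        return None
    stem = filename[: -len(".rpm")]
    nvr, dot, _arch = stem.rpartition(".")
    if not dot:
        return None
    parts = nvr.rsplit("-", 2)  # NAME may itself contain dashes
    if len(parts) != 3:
        return None
    return parts[0]  # e.g. "torque-mom-2.1.9-1.x86_64.rpm" -> "torque-mom"


def repo_rpm_filenames(root):
    """Collect every *.rpm file name found under the repository root."""
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.extend(f for f in filenames if f.endswith(".rpm"))
    return names


def missing_from_repo(wanted, repo_filenames):
    """Wanted package names for which the repo holds no rpm file."""
    available = {rpm_package_name(f) for f in repo_filenames}
    return sorted(set(wanted) - available)


# Example use on the installer (path taken from the reply above):
#   wanted = ["torque", "torque-mom", "torque-pam"]
#   print(missing_from_repo(wanted, repo_rpm_filenames("/depot/repos/1001")))
```

    An empty result suggests the repository holds every listed rpm. List
    entries that are not plain RPM names would be reported as missing and
    need to be judged by hand.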


    Mark



    -----Original Message-----
    From: kusu-users-bounces@osgdc.org
    [mailto:kusu-users-bounces@osgdc.org]On Behalf Of Maurice Hilarius
    Sent: Friday, March 14, 2008 5:25 PM
    To: kusu-users@osgdc.org
    Subject: [Kusu-users] cfmsync -p Problem - more details




  5. #5

    Default Re: RE: cfmsync -p Problem - more details (Mark Black)


    Mark Black wrote:
    > Hi Maurice
    > To recap your email:
    > 1. The package.lst has the correct entries
    > 2. You saw "No route to host" messages
    > 3. You saw "Erased: Packagename" messages in some log file
    > 4. When running cfmsync -u the log shows the repo IP missing
    >
    > For item 1:
    > Check the contents of /depot/repos/1001
    > It must contain all the rpm's listed in the package.lst.
    It does.
    > On the nodes truncate the /opt/kusu/etc/package.lst file
    > e.g. # cat /dev/null > /opt/kusu/etc/package.lst
    > Then run cfmsync -p
    > That should trigger a retry of the package install.
    We will try that. In other words just make the package.lst file an empty
    file?

    > For item 2:
    > For the no route to host messages, I'm not sure why you are seeing this.
    > Is there anything unusual in the web server's access_log or error_log?
    We will check. I am not sure what would be "unusual".
    Should we just post the contents of those 2 files in response?

    > For item 3:
    > If this is the yum.log, this can happen if you have the components
    > selected in ngedit, but the repo does not have them.
    > Ngedit will mark them for removal, because the repository did not
    > contain them, and the next time cfmsync -p is run the packages will be
    > removed.
    > It's important to run "repoman -u -r ..." before ngedit.
    OK, can do. Will try again.
    Is the logging level we are providing sufficient?

    > 4. We logged this one a while ago, but have not had time to look at
    > it. It's slated to be addressed before the final release.
    How could yum work if it does not get a repo path?


    --
    With our best regards,

    Maurice W. Hilarius Telephone: 01-780-456-9771
    Hard Data Ltd.
    11060 - 166 Avenue
    Edmonton, AB, Canada
    T5X 1Y3


  6. #6

    Default Re: RE: cfmsync -p Problem - more details (Mark Black)


    Hi

    Answers below.

    Mark

    -----Original Message-----
    From: kusu-users-bounces@osgdc.org
    [mailto:kusu-users-bounces@osgdc.org]On Behalf Of Maurice Hilarius
    Sent: Saturday, March 15, 2008 2:17 AM
    To: kusu-users@osgdc.org
    Subject: [Kusu-users] Re: RE: cfmsync -p Problem - more details (Mark
    Black)


    Mark Black wrote:
    > Hi Maurice
    > To recap your email:
    > 1. The package.lst has the correct entries
    > 2. You saw "No route to host" messages
    > 3. You saw "Erased: Packagename" messages in some log file
    > 4. When running cfmsync -u the log shows the repo IP missing
    >
    > For item 1:
    > Check the contents of /depot/repos/1001
    > It must contain all the rpm's listed in the package.lst.
    It does.
    > On the nodes truncate the /opt/kusu/etc/package.lst file
    > e.g. # cat /dev/null > /opt/kusu/etc/package.lst
    > Then run cfmsync -p
    > That should trigger a retry of the package install.
    We will try that. In other words just make the package.lst file an empty
    file?

    Right.

    > For item 2:
    > For the no route to host messages, I'm not sure why you are seeing this.
    > Is there anything unusual in the web server's access_log or error_log?
    We will check. I am not sure what would be "unusual".
    Should we just post the contents of those 2 files in response?

    The access log is big, and 99.9% of it is not useful information. Post
    the error log, but I don't expect to get a lot out of it.

    > For item 3:
    > If this is the yum.log, this can happen if you have the components
    > selected in ngedit, but the repo does not have them.
    > Ngedit will mark them for removal, because the repository did not
    > contain them, and the next time cfmsync -p is run the packages will be
    > removed.
    > It's important to run "repoman -u -r ..." before ngedit.
    OK, can do. Will try again.
    Is the logging level we are providing sufficient?

    Yes, that's fine.

    > 4. We logged this one a while ago, but have not had time to look at
    > it. It's slated to be addressed before the final release.
    How could yum work if it does not get a repo path?

    I thought it did not work?




