-
March 14th, 2008 07:01 PM #1
cfmsync -p Problem
Good day.
We are working on a test cluster using Kusu.
At one point about a week ago cfmsync -p stopped working.
It failed after some updates appeared on the kusu update repository, and
not due to changes or actions on our part.
This is obviously a critical problem for us.
Some details:
This was working, until an update sometime about 7-10 days ago.
cfmsync -f DOES work.
An earlier failure of cfmsync -p was observed and traced to a
problem where copied files were getting time stamps in the future. Yum
simply rejected those files.
This is a different problem, and we do not know where to start looking.
--
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue email: maurice@harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
-
March 14th, 2008 07:57 PM #2
RE: cfmsync -p Problem
Hi Maurice
On the machine that is missing the packages, can you look at the contents
of /opt/kusu/etc/package.lst? This list is what the CFM thinks it
has installed already. It gets updated when "cfmsync -p" runs. I suspect it
is out of sync with what's really installed. Can you confirm this?
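One way to confirm this is to compare the list against what rpm reports on the node. The following is a minimal sketch, not Kusu code: the function names are made up for illustration, and it assumes the list holds one package name per line with `#` comment lines, and that every entry is an RPM name (entries that are not RPMs would show up as false positives).

```python
import subprocess


def read_package_list(text):
    """Return the names in a package.lst-style file, skipping blanks and '#' comments."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def installed_rpm_names():
    """Ask rpm for the names of all installed packages (needs rpm on the node)."""
    out = subprocess.run(
        ["rpm", "-qa", "--queryformat", "%{NAME}\n"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def recorded_but_missing(list_text, installed):
    """Names the list records as installed that are absent from `installed`."""
    return sorted(read_package_list(list_text) - set(installed))
```

On a node, passing the contents of /opt/kusu/etc/package.lst and the result of `installed_rpm_names()` to `recorded_but_missing()` would list every package that the CFM believes is present but rpm does not know about.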
I've seen this happen when yum failed to install the package. In our
case the yum failure was because two nodes had the same IP. The
/var/log/yum.log file will also have useful troubleshooting info.
Another thing to check is the node group name. We found a bug with
nodegroups that had spaces in the name.
Mark
-
March 14th, 2008 09:24 PM #3
cfmsync -p Problem - more details
Here is a list of additional packages we were expecting to
install (from an output of 'ngedit -p compute-centos'):
OpenIPMI OpenIPMI-libs
blas ganglia
ganglia-gmond libtorque
torque torque-mom
torque-pam
On the single node we used for testing this,
we saw the following in our logs
(please ignore the dates/timestamps):
Mar 04 12:10:57 Erased: OpenIPMI
Mar 04 12:11:01 Erased: OpenIPMI-libs
Mar 04 12:55:00 Erased: blas
Mar 04 12:55:04 Erased: blas
Mar 04 12:55:18 Erased: ganglia-gmond
Mar 04 12:55:38 Erased: torque-mom
Mar 04 12:55:41 Erased: torque-pam
Those entries were interspersed with messages such as:
http://192.168.80.50/repos/1001/repodata/repomd.xml: [Errno 4] IOError: <urlopen error (113, 'No route to host')>
I see no good explanation for this error.
Without that route *nothing* would get installed.
When I look at the logs on that machine I see this URL:
http://192.168.80.50/repos/1001/repodata/repomd.xml
This file is perfectly accessible.
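Since "No route to host" describes the node's view of the network, the fetch is worth repeating from the compute node itself rather than from the installer. A minimal sketch using only the Python standard library follows; `probe_url` and its `opener` parameter are illustrative names, not part of Kusu or yum.

```python
import urllib.error
import urllib.request


def probe_url(url, timeout=10, opener=urllib.request.urlopen):
    """Try to fetch `url`; return (True, status) or (False, reason text)."""
    try:
        with opener(url, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as exc:
        # The server was reached but answered with an error status.
        return False, "HTTP %d" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        # No route, connection refused, timeout, DNS failure and similar.
        return False, str(getattr(exc, "reason", exc))
```

Calling `probe_url("http://192.168.80.50/repos/1001/repodata/repomd.xml")` on the affected node at the time cfmsync runs would show whether the failure is reproducible or intermittent.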
We have no idea what process decided to erase those packages, or why.
However the result of this change is very problematic.
In our case, on this cluster, if the torque packages such as 'torque-mom' are not in place, then the Moab scheduler cannot work.
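To see at a glance which of the expected packages were erased, the "Erased:" lines can be pulled out of the log and intersected with the expected list. This is a rough sketch that assumes the line format shown in the excerpt above ("Mon DD HH:MM:SS Erased: name"); the function names are hypothetical.

```python
import re

# Matches lines such as "Mar 04 12:10:57 Erased: OpenIPMI".
ERASED = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) Erased: (\S+)$")


def erased_packages(log_text):
    """Return (timestamp, package) pairs for every 'Erased:' line in the log text."""
    hits = []
    for line in log_text.splitlines():
        match = ERASED.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


def erased_but_expected(log_text, expected):
    """Names from `expected` that the log shows being erased, sorted."""
    erased = {pkg for _, pkg in erased_packages(log_text)}
    return sorted(erased & set(expected))
```

Feeding it the contents of /var/log/yum.log and the package list from ngedit gives the set of packages that would need to be reinstalled.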
On the test node /opt/kusu/sbin/cfmclient appears to consult
/opt/kusu/etc/package.lst. That list looks like this:
# Generated automatically. Do not Edit!
OpenIPMI
OpenIPMI-libs
blas
centos-5-x86_64
component-base-node
component-gnome-desktop
component-nagios-compute-v2_10
ganglia
ganglia-gmond
libtorque
torque
torque-mom
That seems to be ok.
We enabled debugging for cfmclient.
We then tried 'cfmsync -p -n compute-centos', and in /tmp we see:
# cat yum.conf
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
reposdir=/dev/null
retries=20
timeout=30
assumeyes=1
tolerant=1
[kusu-installer]
name=centos-5-x86_64 - Booger
baseurl=http://192.168.80.50/repos/1001
That looks OK too, although I would also hope to see
gpgcheck=0 set here for
our extra packages.
But when we try to update:
# cat cfm.log
Updating Packages
++ Testing for: /opt/kusu/cfm/6.package.lst
++ CFMBaseDir: /opt/kusu/cfm
++ NGID = 6
myIPs = [['192.168.80.110', '255.255.255.0']], installers = ['192.168.80.50']
BestIPlist = ['192.168.80.50']
Nothing to remove
Nothing to add
Running plugin: /opt/kusu/lib/plugins/cfmclient/S02KusuAutomount.sh
Running plugin: /opt/kusu/lib/plugins/cfmclient/nrpe.sh
There is nothing to add because /opt/kusu/cfm/6.package.lst does
not exist.
This is certainly a surprise.
I would hope that there would be an _attempt_ to add
back those erased packages.
OTOH after 'cfmsync -u -p compute-centos' I get the following in logs:
# cat cfm.log
Updating Packages
++ Testing for: /opt/kusu/cfm/6.package.lst
++ CFMBaseDir: /opt/kusu/cfm
++ NGID = 6
myIPs = [['192.168.80.110', '255.255.255.0']], installers = ['192.168.80.50']
BestIPlist = ['192.168.80.50']
Nothing to remove
Nothing to add
Running plugin: /opt/kusu/lib/plugins/cfmclient/S02KusuAutomount.sh
Running plugin: /opt/kusu/lib/plugins/cfmclient/nrpe.sh
Updating To New Repo Packages
Running: /usr/bin/yum -y -c /tmp/yum.conf update
That might even work, but then I see what is missing,
as this time /tmp/yum.conf shows me this:
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
reposdir=/dev/null
retries=20
timeout=30
assumeyes=1
tolerant=1
[kusu-installer]
name=centos-5-x86_64 - Booger
baseurl=http:///repos/1001
With 'baseurl' stated like this it fails.
So, at a minimum I have identified that there is a problem in the generation of the "baseurl" path.
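A generated yum.conf can be checked for this kind of breakage before yum is run. The sketch below is only an illustration: it parses the file with Python's `configparser` and flags any repository section whose `baseurl` lacks a host part. It assumes a single URL per `baseurl` entry, as in the files shown above, and the function name is made up.

```python
import configparser
from urllib.parse import urlparse


def repos_with_bad_baseurl(conf_text):
    """Return {section: baseurl} for repo sections whose baseurl has no host."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    bad = {}
    for section in parser.sections():
        # [main] holds global yum settings, not a repository.
        if section == "main" or not parser.has_option(section, "baseurl"):
            continue
        url = parser.get(section, "baseurl")
        if not urlparse(url).hostname:
            bad[section] = url
    return bad
```

Run against the second /tmp/yum.conf above, this would report the `kusu-installer` section, because `http:///repos/1001` has an empty host.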
--
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue email: maurice@harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
-
March 14th, 2008 10:26 PM #4
cfmsync -p Problem - more details
Hi Maurice
To recap your email:
1. The package.lst has the correct entries
2. You saw "No route to host" messages
3. You saw "Erased: Packagename" messages in some log file
4. When running cfmsync -u the log shows the repo IP missing
For item 1:
Check the contents of /depot/repos/1001. It must contain all the
rpm's listed in the package.lst. If it does not, then run "repoman -u -r
...."
On the nodes, truncate the /opt/kusu/etc/package.lst file, e.g.
# cat /dev/null > /opt/kusu/etc/package.lst
Then run cfmsync -p. That should trigger a retry of the package install.
For item 2:
For the "No route to host" messages, I'm not sure why you are seeing
this. Is there anything unusual in the web server's access_log or
error_log?
For item 3:
If this is the yum.log, this can happen if you have the components
selected in ngedit, but the repo does not have them. Ngedit will mark
them for removal, because the repository did not contain them, and the
next time cfmsync -p is run the packages will be removed. It's
important to run "repoman -u -r ..." before ngedit.
For item 4:
We logged this one a while ago, but have not had time to look at it.
It's slated to be addressed before the final release.
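The reasoning behind items 1 and 3 can be pictured as a set difference between what the node has recorded and what the node group wants. The sketch below is a simplified model based only on the description in this thread; it is an assumption about the behaviour, not the actual cfmclient code, and `plan_changes` is a hypothetical name.

```python
def plan_changes(recorded, wanted):
    """Return (to_add, to_remove) as sorted lists of package names.

    recorded: names in the node's package.lst (what the CFM thinks is installed)
    wanted:   names the node group should have
    """
    recorded, wanted = set(recorded), set(wanted)
    return sorted(wanted - recorded), sorted(recorded - wanted)
```

Under this model, a package.lst that already matches the wanted list yields "Nothing to add" even if the RPMs are gone, an emptied package.lst puts every wanted package back in the add list, and a package dropped from the wanted list ends up in the remove list.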
Mark
-
March 15th, 2008 06:16 AM #5
Re: RE: cfmsync -p Problem - more details (Mark Black)
Mark Black wrote:
> Hi Maurice
> To recap your email:
> 1. The package.lst has the correct entries
> 2. You saw "No route to host messages"
> 3. You saw "Erased: Packagename" messages in some log file
> 4. When running cfmsync -u the log shows the repo IP missing
>
> For item 1:
> Check the contents of /depot/repos/1001
> It must contain all the rpm's listed in the package.lst.
It does.
> On the nodes truncate the /opt/kusu/etc/package.lst file
> e.g. # cat /dev/null > /opt/kusu/etc/package.lst
> Then run cfmsync -p
> That should trigger a retry of the package install.
We will try that. In other words just make the package.lst file an empty
file?
> For item 2:
> For the no route to host messages, I'm not sure why you are seeing this.
> Is there anything unusual in the web servers access_log, or error_log.
We will check. I am not sure what would be "unusual".
Should we just post the contents of those 2 files in response?
> For item 3:
> If this is the yum.log, this can happen if you have the components
> selected in ngedit, but the repo does not have them.
> Ngedit will mark then for removal, because the repository did not
> contain them, and the next time cfmsync -p is run the packages will be
> removed.
> It's important to run "repoman -u -r ..." before ngedit.
OK, can do. Will try again.
Is the logging level we are providing sufficient?
> 4. We logged this one a while ago, but have not had time to look at
> it. It's slated to be addressed before the final release.
How could yum work if it does not get a repo path?
--
With our best regards,
Maurice W. Hilarius
Hard Data Ltd.
11060 - 166 Avenue
Edmonton, AB, Canada
T5X 1Y3
-
March 17th, 2008 05:24 PM #6
Re: RE: cfmsync -p Problem - more details (Mark
Hi
Answers below.
Mark
Maurice Hilarius wrote:
> We will try that. In other words just make the package.lst file an empty
> file?
Right.
> We will check. I am not sure what would be "unusual".
> Should we just post the contents of those 2 files in response?
The access log is big, and 99.9% non useful information. Post the error
log, but I don't expect to get a lot out of it.
> OK, can do. Will try again.
> Is the logging level we are providing sufficient?
Yes, that's fine.
> How could yum work if it does not get a repo path?
I thought it did not work?