Installing, managing and running a Kusu cluster: Discuss the whole Kusu HPC stack here! (Hardware, cluster OS, cluster management, filesystems, schedulers and applications)
Hi,

I am trying to install a Kusu cluster with Kusu 1.0 on CentOS 5.2. The master node install went fine. Then I tried to install a compute node: I started addhost and chose "compute-centos-5-x86_64", "eth0", and rack number "0". addhost then waits for the clients to boot.

I started two clients, one at a time. The first one fails; the second one installs, reboots, and seems to be OK. Because the first one failed, I tried a different machine, but that failed as well.

Here is the error message I get (copied by hand):

Code:
Unresolved exception
Traceback (most recent call last)
  File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 221, in run
    self.selectScreen(0)
  File "/opt/kusu/ [...]

Now I am a bit puzzled (because one node could be installed), and also a bit lost. Any suggestions would be appreciated.
Hi Newton,

Thanks for trying out Kusu! It does seem strange that your second node succeeds where the first failed. Are they both identical in terms of hardware and configuration?

Could you send us the contents of /tmp/kusu/exception.dump (if it exists), or try to copy the entire exception message here? This would really help us pinpoint where the installation failed.

Also, the nodeinstaller sends its log messages back to the master installer's /var/log/messages. It would be helpful if you could attach that here as well (after pruning it, if you could :-)

Thanks,
George
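One way to prune /var/log/messages before attaching it is to keep only the lines that mention the failing node. The following is a minimal sketch, not part of Kusu: the function name `prune_log` is hypothetical, it is written for Python 3 (newer than the Python shipped with CentOS 5), and the MAC and IP address are simply the ones that appear in the log excerpt posted later in this thread.

```python
import re


def prune_log(lines, needles):
    """Return only the lines that mention one of the needles (MAC, IP, hostname).

    A needle must not be directly preceded or followed by a letter or digit,
    so "10.0.0.2" does not also select lines about "10.0.0.20".
    """
    if not needles:
        return list(lines)
    pattern = re.compile(
        "|".join(
            r"(?<![0-9A-Za-z])" + re.escape(n) + r"(?![0-9A-Za-z])"
            for n in needles
        ),
        re.IGNORECASE,
    )
    return [line for line in lines if pattern.search(line)]


# Two lines taken from this thread, plus one made-up unrelated line.
sample = [
    "Nov 12 20:10:45 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0",
    "Nov 12 20:10:53 kusu1 xinetd[3304]: START: tftp pid=8569 from=10.0.0.2",
    "Nov 12 20:10:50 kusu1 crond[1234]: unrelated entry (made up for this example)",
]

for line in prune_log(sample, ["00:1a:92:43:d5:3f", "10.0.0.2"]):
    print(line)
```

To run it against the real log, read the file with `open("/var/log/messages")` and pass the file object as `lines`.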
Hi George,

thanks for the quick reply!

So here is the complete error message (actually, it seems not to be complete, but as the terminal is frozen I cannot scroll, so it is complete in the sense of "all I can see"):

Code:
Unresolved exception
Traceback (most recent call last)
  File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 221, in run
    self.selectScreen(0)
  File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 143, in selectScreen
    contentGrid = self.setupContentGrid()
  File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 169, in setupContentGrid
    self.currentScreen.draw(self.mainScreen,

Code:
Nov 12 20:10:41 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0: network 10.0/16: no free leases
Nov 12 20:10:45 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
Nov 12 20:10:45 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:10:53 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
Nov 12 20:10:53 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:10:53 kusu1 xinetd[3304]: START: tftp pid=8569 from=10.0.0.2
Nov 12 20:10:53 kusu1 in.tftpd[8570]: tftp: client does not accept options
Nov 12 20:11:27 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:27 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:27 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:27 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:37 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:37 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:37 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:37 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:23) /dev/hda: Checking if node already exists in /dev.
Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:31) /dev/hda does not exist. Creating...
Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:50) FORMAT /dev/hda: Create block device, major: 3, minor: 0, path: /dev/hda
Hi,

Do you think it would be possible to get a screenshot (by camera) of the exception screen? A screenshot of the exception would help us debug this problem quickly.

Also, is it possible to switch to the alternate screen via Alt-F2? If so, you can try copying /tmp/kusu/exception.dump to the master node via scp. You will need to configure the IP address on the node first using ifconfig:

# ifconfig eth0 172.20.0.100 netmask 255.255.0.0 up

-Liming
Hi,

Are you able to install a stock CentOS on the affected node? From what you've described (same hardware/configuration as another working node, frozen screen), it seems like a hardware problem specific to that node.

-George
I will try installing CentOS on that node. I will also give the node that worked a second try, to see if it works again, and I will try a third node to see what happens there.

Thank you for the suggestions, I will be back with a report.
All right, after some more testing (installing CentOS from CD) it turns out that there was no hard disk in that node. The other failing nodes did have hard disks, but they may have other hardware problems. Well, for the moment I am happy with two nodes.

Now I am a bit embarrassed about bothering you with my hardware problems, sorry :/

Anyway, thank you for helping me to figure this out.
Actually, it seems that you have found a corner case that our nodeinstaller hasn't yet taken care of. In fact, it should exit gracefully when no disks are found.

Thanks for pointing out the issue and the cause of it. :-) We have filed it in our issue tracker [1].

[1] [#KUSU-1026] nodeinstaller does not handle gracefully when node has no disks - Open Source Grid Development Center
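The kind of graceful exit described here can be sketched as a disk check that runs before any partitioning screen is drawn. This is an illustration only and not Kusu's nodeinstaller code: the names `find_disks`, `require_disks` and `list_block_devices`, the list of excluded device-name prefixes, and the error text are all assumptions, and the code targets Python 3 on Linux. Filtering by name is a rough heuristic (an IDE CD-ROM that shows up as hdc, for example, would still be counted as a disk).

```python
import os

# Device-name prefixes under /sys/block that are not installable hard disks.
# This list is an illustrative assumption, not taken from Kusu.
NON_DISK_PREFIXES = ("loop", "ram", "fd", "sr", "dm-", "md")


def list_block_devices(sys_block="/sys/block"):
    """Return the block device names the kernel exposes (empty if sysfs is absent)."""
    return os.listdir(sys_block) if os.path.isdir(sys_block) else []


def find_disks(block_names):
    """Filter out loop, RAM, floppy, optical and virtual devices by name."""
    return sorted(n for n in block_names if not n.startswith(NON_DISK_PREFIXES))


def require_disks(block_names):
    """Return the usable disks, or stop with a readable message if there are none."""
    disks = find_disks(block_names)
    if not disks:
        # SystemExit with a string prints the message to stderr and exits with
        # status 1 when it is not caught, instead of an unhandled traceback.
        raise SystemExit("No hard disks detected on this node; aborting installation.")
    return disks


print(find_disks(["loop0", "ram0", "hda", "sr0"]))
```

An installer could call `require_disks(list_block_devices())` once at startup, so that a diskless node produces a one-line explanation rather than an unresolved exception.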
LinkBack to this Thread: http://www.hpccommunity.org/f20/node-install-unresolved-exception-447/