HPCCommunity.org
 

Installing, managing and running a Kusu cluster: Discuss the whole Kusu HPC stack here! (Hardware, cluster OS, cluster management, filesystems, schedulers and applications)

#1
November 12th, 2008, 03:50 PM
Junior Member
Join Date: November 12th, 2008
Posts: 12
node install: Unresolved exception

Hi,

I am trying to install a Kusu cluster with Kusu 1.0 on CentOS 5.2.
The master node install went fine.
Then I tried to install a compute node: I started addhost and chose "compute-centos-5-x86_64", "eth0" and rack number "0". addhost then waits for the clients to boot. I started two clients, one at a time.
The first one fails; the second one installs, reboots and seems to be OK.
Because the first one failed, I tried a different machine, but that failed as well.
Here is the error message I get (copied by hand):

Code:
Unresolved exception

Traceback (most recent call last):
  File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 221, in run
    self.selectScreen(0)
  File "/opt/kusu/  [...]
The message goes on; each entry names a function in a Python script along with a line number.

Now I am a bit puzzled (because one node could be installed), and also a bit lost.
Any suggestions would be appreciated.

#2
November 12th, 2008, 06:43 PM
Project Moderator
Join Date: February 29th, 2008
Location: Singapore
Posts: 24
Blog Entries: 5

Hi Newton,

Thanks for trying out Kusu!

It does seem strange that your second node succeeds where the first failed. Are they both identical in terms of hardware and configuration?

Could you send us the contents of /tmp/kusu/exception.dump (if it exists), or try to copy the entire exception message here, as this would really help us pinpoint where the installation failed.

Also, the nodeinstaller sends its log messages back to the master node's /var/log/messages. It would also help if you could attach that file here, pruned to the relevant entries if possible :-)
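One way to do that pruning, sketched in Python under some assumptions: keep only the lines that mention the failing node's MAC address or its IP address. `prune_log` is a hypothetical helper, not part of Kusu; the values 00:1a:92:43:d5:3f and 10.0.0.2 are simply the ones that appear in the log posted in reply #3, and the sketch assumes the forwarded nodeinstaller lines carry the node's IP in the hostname field, as they do there.

```python
import re
import sys
from typing import Iterable, Iterator


def prune_log(lines: Iterable[str], mac: str, ip: str) -> Iterator[str]:
    """Yield only the syslog lines that mention the node's MAC or IP address.

    Hypothetical helper for trimming /var/log/messages before posting it.
    """
    # MAC addresses may be logged in upper or lower case.
    mac_re = re.compile(re.escape(mac), re.IGNORECASE)
    # The lookarounds stop 10.0.0.2 from also matching 10.0.0.20 or 110.0.0.2.
    ip_re = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])")
    for line in lines:
        if mac_re.search(line) or ip_re.search(line):
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python prune_log.py 00:1a:92:43:d5:3f 10.0.0.2 < /var/log/messages
    sys.stdout.writelines(prune_log(sys.stdin, sys.argv[1], sys.argv[2]))
```

Reading from standard input keeps it usable as a plain filter on the master node.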

Thanks,
George

#3
November 12th, 2008, 07:23 PM
Junior Member
Join Date: November 12th, 2008
Posts: 12

Hi George,

Thanks for the quick reply!
Quote:
It does seem strange that your second node succeeds where the first failed.
Agreed.
Quote:
Are they both identical in terms of hardware and configuration?
Yes, even with the newest BIOS.
Quote:
Could you send us the contents of /tmp/kusu/exception.dump (if it exists), or try to copy the entire exception message here
/tmp/kusu/exception.dump does not exist on the frontend. If it's supposed to be on the node, it's not accessible, as the node is completely frozen.
So here is the complete error message (actually it does not seem complete, but since the terminal is frozen I cannot scroll, so "complete" in the sense of "all I can see"):
Code:
Unresolved exception

Traceback (most recent call last):
  File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 221, in run
    self.selectScreen(0)
  File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 143, in selectScreen
    contentGrid = self.setupContentGrid()
  File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 169, in setupContentGrid
    self.currentScreen.draw(self.mainScreen,

Here is my /var/log/messages:
Code:
Nov 12 20:10:41 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0: network 10.0/16: no free leases
Nov 12 20:10:45 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
Nov 12 20:10:45 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:10:53 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
Nov 12 20:10:53 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:10:53 kusu1 xinetd[3304]: START: tftp pid=8569 from=10.0.0.2
Nov 12 20:10:53 kusu1 in.tftpd[8570]: tftp: client does not accept options
Nov 12 20:11:27 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:27 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:27 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:27 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:37 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:37 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:37 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:37 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:23) /dev/hda: Checking if node already exists in /dev.
Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:31) /dev/hda does not exist. Creating...
Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:50) FORMAT /dev/hda: Create block device, major: 3, minor: 0, path: /dev/hda

#4
November 13th, 2008, 05:02 AM
Project Moderator
Join Date: March 4th, 2008
Posts: 59
Blog Entries: 3

Hi

Do you think it is possible to get a screenshot (with a camera) of the exception screen? That would help us debug this problem quickly.

Also, is it possible to go to the alternate screen via Alt-F2?

You can try copying /tmp/kusu/exception.dump to the master node via scp. To do that, you will need to configure an IP address on the node using ifconfig:

# ifconfig eth0 172.20.0.100 netmask 255.255.0.0 up
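The 172.20.0.100/255.255.0.0 address in that command is presumably just an example: the dhcpd entries in reply #3 show the master (10.0.0.1) serving the 10.0/16 network, so the temporary address would need to be on that network for scp to reach the master. A small check using Python's standard `ipaddress` module is sketched below; `usable_temp_address` is a hypothetical helper, and it cannot know which addresses dhcpd has already leased, so pick one you know is free.

```python
import ipaddress


def usable_temp_address(candidate: str, netmask: str, master: str) -> bool:
    """Return True if candidate/netmask shares a network with the master.

    Also rejects the master's own address and the network and broadcast
    addresses. It cannot tell whether the address is already leased.
    """
    iface = ipaddress.ip_interface(f"{candidate}/{netmask}")
    net = iface.network
    master_ip = ipaddress.ip_address(master)
    if master_ip not in net:
        return False
    return iface.ip not in (master_ip, net.network_address, net.broadcast_address)


print(usable_temp_address("172.20.0.100", "255.255.0.0", "10.0.0.1"))  # False
print(usable_temp_address("10.0.0.100", "255.255.0.0", "10.0.0.1"))    # True
```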

-Liming

#5
November 13th, 2008, 09:33 AM
Junior Member
Join Date: November 12th, 2008
Posts: 12

Hi,

A photo of the screen is attached.
Quote:
Also, is it possible to go to the alternate screen via Alt-F2?
Nope. The node is completely frozen: no alternate screens, no ssh, nothing.
Attached image: SD530429.jpg (101.3 KB)

#6
November 13th, 2008, 10:54 AM
Project Moderator
Join Date: February 29th, 2008
Location: Singapore
Posts: 24
Blog Entries: 5

Hi,

Are you able to install a stock CentOS on the affected node? From what you've described (same hardware/configuration as another working node, frozen screen), it seems like a hardware problem specific to that node.

-George

#7
November 13th, 2008, 10:59 AM
Junior Member
Join Date: November 12th, 2008
Posts: 12

I will try installing CentOS on that node. I will also give the node that worked a second try, to see if it works again. And I will also try a third node, to see what happens there.

Thank you for the suggestions, I will be back with a report.

#8
November 13th, 2008, 05:04 PM
Junior Member
Join Date: November 12th, 2008
Posts: 12

All right, after some more testing (installing CentOS from CD), it turns out that there was no hard disk in that node. The other failing nodes did have hard disks, but they may have other hardware problems. Well, for the moment I am happy with two nodes.

Now I am a bit embarrassed about bothering you with my hardware problems, sorry :/

Anyway, thank you for helping me figure this out.

#9
November 13th, 2008, 05:12 PM
Project Moderator
Join Date: February 29th, 2008
Location: Singapore
Posts: 24
Blog Entries: 5

Actually, it seems that you have found a corner case that our nodeinstaller does not yet handle: it should exit gracefully when no disks are found.

Thanks for pointing out the issue and its cause. :-) We have filed it in our issue tracker [1].

[1] [#KUSU-1026] nodeinstaller does not handle gracefully when node has no disks - Open Source Grid Development Center
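To illustrate the graceful exit described above, here is a minimal Python sketch of a pre-partitioning check. It is not the actual KUSU-1026 fix: `find_disks`, `require_disks` and the ignored name prefixes are hypothetical, and the sketch assumes the Linux sysfs layout in which each `/sys/block/<device>` directory has a `size` file (in 512-byte sectors) and a `removable` flag. The log in reply #3 suggests why some such check helps: the partition tool reports that /dev/hda does not exist and then creates the device node (major 3, minor 0) itself, apparently without confirming that a disk is attached.

```python
import os
import sys
from typing import List, Optional

# Assumed list of virtual or optical device-name prefixes to ignore.
IGNORED_PREFIXES = ("loop", "ram", "fd", "sr", "md", "dm-")


def _read_int(path: str) -> Optional[int]:
    """Read a single integer from a sysfs attribute file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def find_disks(sys_block: str = "/sys/block") -> List[str]:
    """Return block devices under sys_block that look like usable fixed disks."""
    disks = []
    for name in sorted(os.listdir(sys_block)):
        if name.startswith(IGNORED_PREFIXES):
            continue
        dev_dir = os.path.join(sys_block, name)
        # Skip removable media such as IDE CD-ROM drives (removable == 1).
        if _read_int(os.path.join(dev_dir, "removable")) == 1:
            continue
        # "size" is reported in 512-byte sectors; 0 or missing means no usable disk.
        sectors = _read_int(os.path.join(dev_dir, "size"))
        if sectors:
            disks.append(name)
    return disks


def require_disks(sys_block: str = "/sys/block") -> List[str]:
    """Exit with a readable message instead of a traceback when no disk exists."""
    disks = find_disks(sys_block)
    if not disks:
        sys.exit("No hard disk detected on this node; aborting installation.")
    return disks
```

Calling `require_disks()` before any partitioning step would turn a frozen node into a one-line error message; how the installer UI should then present that message is left open here.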
Thread URL: http://www.hpccommunity.org/f20/node-install-unresolved-exception-447/

All times are GMT.