
Thread: node install: Unresolved exception

  1. #1
    newton is offline Junior Member
    Join Date
    November 12th, 2008
    Posts
    12
    Downloads
    0
    Uploads
    0

    node install: Unresolved exception

    Hi,

    I am trying to install a Kusu cluster with Kusu 1.0 on CentOS 5.2.
    The master node install went fine.
    Then I tried to install a compute node: I started addhost and chose "compute-centos-5-x86_64", "eth0", and rack number "0". addhost then waits for the clients to boot. I started two clients, one at a time.
    The first one failed; the second one installed, rebooted, and seems to be OK.
    Because the first one failed, I tried a different machine, but that failed as well.
    Here is the error message I get (copied by hand):

    Code:
    Unresolved exception

    Traceback (most recent call last):
      File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 221, in run
        self.selectScreen(0)
      File "/opt/kusu/  [...]
    
    The message goes on, with each entry naming a function in a Python script and a line number.

    Now I am a bit puzzled (because one node could be installed), and also a bit lost.
    Any suggestions would be appreciated.

  2. #2
    George Goh is offline Project Moderator
    Join Date
    February 29th, 2008
    Location
    Singapore
    Posts
    26
    Blog Entries
    5
    Downloads
    16
    Uploads
    11


    Hi Newton,

    Thanks for trying out Kusu!

    It does seem strange that your second node succeeds where the first failed. Are they both identical in terms of hardware and configuration?

    Could you send us the contents of /tmp/kusu/exception.dump (if it exists), or try to copy the entire exception message here, as this would really help us in pinpointing where the installation failed.

    Also, the nodeinstaller sends its log messages back to the master installer's /var/log/messages. It would be helpful if you could attach that file here as well (after pruning it, if you could :-).

    Thanks,
    George
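One way to do the pruning asked for above is to keep only the lines that mention the failing node. The following is a minimal Python 3 sketch, not part of Kusu; the function name `prune_log` and the choice of keywords (the node's IP or MAC address) are assumptions for illustration.

```python
def prune_log(lines, keywords):
    """Return only the log lines that contain at least one of the keywords.

    `lines` is any iterable of strings (for example an open file);
    `keywords` is a list of plain substrings such as an IP or MAC address.
    """
    return [line for line in lines if any(k in line for k in keywords)]


def prune_log_file(path, keywords):
    """Convenience wrapper: read `path` and return the matching lines."""
    # errors="replace" keeps the sketch from failing on odd bytes in the log.
    with open(path, errors="replace") as handle:
        return prune_log(handle, keywords)
```

For example, `prune_log_file("/var/log/messages", ["10.0.0.2"])` would keep only the entries that mention that address. Note that a keyword such as "kusu" would be too broad if the master's hostname itself contains it.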

  3. #3
    newton is offline Junior Member
    Join Date
    November 12th, 2008
    Posts
    12
    Downloads
    0
    Uploads
    0


    Hi George,

    Thanks for the quick reply!
    "It does seem strange that your second node succeeds where the first failed."
    Agreed.
    "Are they both identical in terms of hardware and configuration?"
    Yes. Even with the newest BIOS.
    "Could you send us the contents of /tmp/kusu/exception.dump (if it exists), or try to copy the entire exception message here"
    /tmp/kusu/exception.dump does not exist on the frontend. If it's supposed to be on the node, it's not accessible, as the node is completely frozen.
    So here is the complete error message (actually, it does not seem to be complete, but as the terminal is frozen I cannot scroll, so "complete" in the sense of "all I can see"):
    Code:
    Unresolved exception

    Traceback (most recent call last):
      File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 221, in run
        self.selectScreen(0)
      File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 143, in selectScreen
        contentGrid = self.setupContentGrid()
      File "/opt/kusu/lib/python/kusu/ui/text/navigator.py", line 169, in setupContentGrid
        self.currentScreen.draw(self.mainScreen,
    
    Here is my /var/log/messages:
    Code:
    Nov 12 20:10:41 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0: network 10.0/16: no free leases
    Nov 12 20:10:45 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
    Nov 12 20:10:45 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
    Nov 12 20:10:53 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
    Nov 12 20:10:53 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
    Nov 12 20:10:53 kusu1 xinetd[3304]: START: tftp pid=8569 from=10.0.0.2
    Nov 12 20:10:53 kusu1 in.tftpd[8570]: tftp: client does not accept options
    Nov 12 20:11:27 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
    Nov 12 20:11:27 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
    Nov 12 20:11:27 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
    Nov 12 20:11:27 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
    Nov 12 20:11:37 kusu1 dhcpd: DHCPDISCOVER from 00:1a:92:43:d5:3f via eth0
    Nov 12 20:11:37 kusu1 dhcpd: DHCPOFFER on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
    Nov 12 20:11:37 kusu1 dhcpd: DHCPREQUEST for 10.0.0.2 (10.0.0.1) from 00:1a:92:43:d5:3f via eth0
    Nov 12 20:11:37 kusu1 dhcpd: DHCPACK on 10.0.0.2 to 00:1a:92:43:d5:3f via eth0
    Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:23) /dev/hda: Checking if node already exists in /dev.
    Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:31) /dev/hda does not exist. Creating...
    Nov 12 20:11:40 10.0.0.2 2008-11-13 02:42:12 INFO kusu.partitiontool.nodes (nodes.py:50) FORMAT /dev/hda: Create block device, major: 3, minor: 0, path: /dev/hda
    

  4. #4
    ltsai is offline Project Moderator
    Join Date
    March 4th, 2008
    Posts
    61
    Blog Entries
    3
    Downloads
    3
    Uploads
    0


    Hi

    Do you think it is possible to get a screenshot (by camera) of the exception screen? A photo of the exception screen would help us debug this problem quickly.

    Also, is it possible to switch to the alternate screen via Alt-F2?

    If so, you can try copying /tmp/kusu/exception.dump to the master node via scp. You will first need to configure an IP address on the node using ifconfig:

    # ifconfig eth0 172.20.0.100 netmask 255.255.0.0 up

    -Liming
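A point worth checking before the scp: the /var/log/messages excerpt in post #3 shows the master at 10.0.0.1 handing out 10.0.0.2 on network 10.0/16, so the 172.20.0.100 address in the ifconfig line is presumably only an example and would need to be replaced by a free address in the cluster's own range. A minimal Python 3 sketch of that check, using the standard `ipaddress` module (the helper name `same_subnet` is made up for illustration):

```python
import ipaddress


def same_subnet(node_ip, netmask, master_ip):
    """Return True if master_ip lies in the network implied by node_ip/netmask."""
    # strict=False lets us pass a host address rather than a network address.
    network = ipaddress.ip_network(f"{node_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(master_ip) in network
```

With the addresses seen in this thread, `same_subnet("172.20.0.100", "255.255.0.0", "10.0.0.1")` is false, while an address such as 10.0.0.100 with the same netmask would be on the master's network. Whether 10.0.0.100 is actually unused is an assumption that would have to be verified on the cluster.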

  5. #5
    newton is offline Junior Member
    Join Date
    November 12th, 2008
    Posts
    12
    Downloads
    0
    Uploads
    0


    Hi,

    A photo of the screen is attached.
    "Also, is it possible to go to the alternate screen via Alt-F2."
    Nope. The node is completely frozen: no alternate screens, no ssh, nothing.
    Attached Images

  6. #6
    George Goh is offline Project Moderator
    Join Date
    February 29th, 2008
    Location
    Singapore
    Posts
    26
    Blog Entries
    5
    Downloads
    16
    Uploads
    11


    Hi,

    Are you able to install a stock CentOS on the affected node? From what you've described (same hardware and configuration as another working node, frozen screen), it seems like a hardware problem specific to that node.

    -George

  7. #7
    newton is offline Junior Member
    Join Date
    November 12th, 2008
    Posts
    12
    Downloads
    0
    Uploads
    0


    I will try installing CentOS on that node. I will also give the node that worked a second try, to see if it works again. And I will also try a third node, to see what happens there.

    Thank you for the suggestions, I will be back with a report.

  8. #8
    newton is offline Junior Member
    Join Date
    November 12th, 2008
    Posts
    12
    Downloads
    0
    Uploads
    0


    All right, after some more testing (installing CentOS from CD) it turns out that
    there was no hard disk in that node. The other failing nodes did have hard
    disks, but they may have other hardware problems. Well, for the moment I am
    happy with two nodes.

    Now I am a bit embarrassed about bothering you with my hardware problems, sorry :/

    Anyway, thank you for helping me to figure this out.

  9. #9
    George Goh is offline Project Moderator
    Join Date
    February 29th, 2008
    Location
    Singapore
    Posts
    26
    Blog Entries
    5
    Downloads
    16
    Uploads
    11


    Actually, it seems that you have found a corner case that our nodeinstaller hasn't yet taken care of. In fact, it should gracefully exit when no disks are found.

    Thanks for pointing out the issue and the cause of it. :-) We have filed it in our issue tracker[1].

    [1] [#KUSU-1026] nodeinstaller does not handle gracefully when node has no disks - Open Source Grid Development Center
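As an illustration of what a graceful exit on a diskless node could look like, here is a minimal Python 3 sketch. It is not Kusu's code and not the fix tracked in KUSU-1026; the function names, the use of /sys/block for detection, and the list of prefixes treated as non-disk devices are all assumptions.

```python
import os
import sys

# Assumed prefixes of block devices that are not installable hard disks
# (loopback, RAM disks, device-mapper, software RAID, SCSI CD-ROM, floppy).
VIRTUAL_PREFIXES = ("loop", "ram", "dm-", "md", "sr", "fd")


def find_disks(sys_block="/sys/block"):
    """Return the sorted names of block devices that look like real disks."""
    if not os.path.isdir(sys_block):
        return []
    return sorted(
        name for name in os.listdir(sys_block)
        if not name.startswith(VIRTUAL_PREFIXES)
    )


def require_disks(sys_block="/sys/block"):
    """Return the detected disks, or exit with a readable message if none exist."""
    disks = find_disks(sys_block)
    if not disks:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # which is friendlier than an unhandled traceback on a frozen screen.
        sys.exit("No hard disks detected on this node; aborting installation.")
    return disks
```

A caller would run `require_disks()` before any partitioning step. The prefix filter is deliberately crude: an IDE CD-ROM can also show up under an hdX name, so a real check would need to look at the device type as well.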
