Monday, August 27, 2012

Topspin 2.1 and RHEL 6.3 - how to get out of this bind

The bug had bitten me again. When I decided to upgrade my ageing computer (9+ years old now) that runs Topspin 2.1 PL6 with an Avance console, I wanted to go for Red Hat Enterprise Linux 6.3. A bit of nostalgia: in the days when SGI workstations roamed the earth, XWINNMR was the software ruling Brukerland, and it had the special requirement that the graphics card support 8-bit colour depth, while computer hardware and operating systems had moved on to typically support 24-bit depth. The graphics card that did double duty has become so obsolete of late that its 'mga' driver seemed to be slowing down my system. That is why I decided to move on to newer hardware. That, and I also wanted to move away from ATAPI hard drives.

Instead of making life easy by going with RHEL 5, I decided to go with 6.3.

Problem : CCU won't boot.

Ending : happy ending :-)   

I will quickly summarize what the matter was.

Background : 

Digging a bit with the Wireshark packet sniffer, we could clearly see that the initial conversation between 'spect', which is the CCU11 board, and 'ASP_ST2', which is the Linux box, does take place. It proceeds to the point where spect, the diskless client, asks the ASP_ST2 server for the port number on which bootparamd is listening. bootparamd, like nfs or rquotad, belongs to the RPC family of servers. Normally, these servers register themselves with a program called portmap, which in turn tells a connecting client such as spect which port number to use to talk to a particular server, in this case bootparamd.

I show here a packet in which this request for a port number from spect (IP 149.236.99.99) to ASP_ST2 (IP 149.236.99.1) is rejected. In the upper half of the window, which summarizes the traffic, note the last line: "Portmap GETPORT Reply Port:0 PROGRAM_NOT_AVAILABLE". Also note the 3rd line, Internet Protocol, which shows who is sending this message to whom, i.e. 149.236.99.1 sends this packet to 149.236.99.99.
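For readers who want to reproduce the exchange without a CCU at hand, here is a minimal Python sketch of the GETPORT conversation described above. It is an illustration under stated assumptions, not what the CCU11 firmware actually runs: the function names are hypothetical, the call uses portmapper protocol version 2 with null authentication, and the program number 100026 for bootparam is taken from the rpcinfo -p output shown in this post. A returned port of 0 corresponds to the "PROGRAM_NOT_AVAILABLE" reply in the capture.

```python
import socket
import struct

PMAP_PROG = 100000        # portmapper's own RPC program number
PMAP_VERS = 2             # assumption: query with portmapper protocol version 2
PMAPPROC_GETPORT = 3      # procedure number of GETPORT in that protocol
BOOTPARAM_PROG = 100026   # bootparam, as listed by rpcinfo -p
IPPROTO_UDP = 17


def build_getport_call(xid, prog, vers, proto=IPPROTO_UDP):
    """Build an ONC RPC call asking the portmapper where (prog, vers) listens."""
    # xid, msg_type=0 (CALL), rpc version 2, then target program/version/procedure
    header = struct.pack(">6I", xid, 0, 2, PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    # Credentials and verifier, both AUTH_NONE (flavor 0, length 0)
    null_auth = struct.pack(">2I", 0, 0) * 2
    # GETPORT arguments: program, version, protocol, port (ignored, sent as 0)
    args = struct.pack(">4I", prog, vers, proto, 0)
    return header + null_auth + args


def parse_getport_reply(data, xid):
    """Return the port from an accepted reply; 0 means 'program not registered'."""
    if len(data) < 20:
        raise ValueError("reply too short")
    rxid, msg_type, reply_stat, _flavor, verf_len = struct.unpack(">5I", data[:20])
    if rxid != xid or msg_type != 1 or reply_stat != 0:
        raise ValueError("not an accepted reply to this call")
    # Skip the verifier body, which is padded to a multiple of 4 bytes
    offset = 20 + ((verf_len + 3) // 4) * 4
    if len(data) < offset + 8:
        raise ValueError("reply truncated")
    accept_stat, port = struct.unpack(">2I", data[offset:offset + 8])
    if accept_stat != 0:
        raise ValueError("portmapper did not execute the call")
    return port


def query_port(server, prog, vers, timeout=2.0):
    """Send one GETPORT call over UDP to port 111 on `server`."""
    xid = 0x42525552  # arbitrary transaction id
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_getport_call(xid, prog, vers), (server, 111))
        data, _addr = sock.recvfrom(1024)
    return parse_getport_reply(data, xid)
```

Run from a machine on the spectrometer network, query_port("149.236.99.1", BOOTPARAM_PROG, 1) should return 0 in the broken state and a non-zero port once bootparamd is properly registered.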


 

Problem and resolution : 

With RHEL 6.3 (which is derived from Fedora 12 and 13), the newer program rpcbind is used in place of the conventional portmap. The Wikipedia page on portmap underlines the fact that these two programs are different avatars of the same entity: portmap is the older version and rpcbind is the newer one. With RHEL 6.3, for some inexplicable reason, both portmap and rpcbind are installed and turned on by default.

On a system where the portmapper function is working, you can enter the command rpcinfo -p and get output similar to this :

 program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  34528  status
    100024    1   tcp  52726  status
    100011    1   udp    875  rquotad
    100011    2   udp    875  rquotad
    100011    1   tcp    875  rquotad
    100011    2   tcp    875  rquotad
    100005    1   udp  52514  mountd
    100005    1   tcp  55703  mountd
    100005    2   udp  50364  mountd
    100005    2   tcp  48481  mountd
    100005    3   udp  58813  mountd
    100005    3   tcp  53255  mountd
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    2   tcp   2049  nfs_acl
    100227    3   tcp   2049  nfs_acl
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    2   udp   2049  nfs_acl
    100227    3   udp   2049  nfs_acl
    100021    1   udp  46281  nlockmgr
    100021    3   udp  46281  nlockmgr
    100021    4   udp  46281  nlockmgr
    100021    1   tcp  46143  nlockmgr
    100021    3   tcp  46143  nlockmgr
    100021    4   tcp  46143  nlockmgr
    100026    1   udp    721  bootparam

Note that the portmapper (whichever implementation it is, i.e. rpcbind or portmap) always listens on port 111.
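Scanning that listing by eye for bootparam gets tedious when you are rebooting the CCU repeatedly. The following is a small Python sketch that does the check for you. The function names are hypothetical, and it assumes the five-column layout shown above (program, vers, proto, port, service).

```python
import subprocess


def parse_rpcinfo(text):
    """Turn `rpcinfo -p` output into (program, version, proto, port, service) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line, blank lines and anything that is not a data row
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        service = fields[4] if len(fields) > 4 else ""
        rows.append((int(fields[0]), int(fields[1]), fields[2], int(fields[3]), service))
    return rows


def ports_for(rows, service):
    """Return the sorted (proto, port) pairs on which `service` is registered."""
    return sorted({(proto, port) for _, _, proto, port, name in rows if name == service})


def rpcinfo_output():
    """Run `rpcinfo -p` on the local machine and return its text output."""
    result = subprocess.run(["rpcinfo", "-p"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the Linux box, ports_for(parse_rpcinfo(rpcinfo_output()), "bootparam") returns an empty list when bootparamd has not registered, which is the state in which spect cannot boot.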

When both rpcbind and portmap are running, you instead get an error message saying: No RPC program registered.

I tried turning off rpcbind using chkconfig and left the older portmap running. The problem remained. But when I turned off portmap and left the newer rpcbind running, I could see that the RPC servers were registering with the portmapper, as shown by the above listing.
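To check which of these services are set to start at boot, the output of chkconfig --list can be scanned with a short Python sketch like the one below. The function names are hypothetical, and the sketch assumes the usual chkconfig --list line format (a service name followed by 0:off ... 6:off fields) and that runlevel 3 is the one of interest. Adjust the runlevel if your box boots into 5.

```python
def enabled_at(text, runlevel):
    """Return the set of services that `chkconfig --list` shows as on at `runlevel`."""
    enabled = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # Keep only fields of the form "<digit>:on" or "<digit>:off"
        levels = dict(f.split(":", 1) for f in fields[1:] if f[:1].isdigit() and ":" in f)
        if levels.get(str(runlevel)) == "on":
            enabled.add(fields[0])
    return enabled


def diagnose(enabled):
    """List the boot-time problems described in this post, if any are present."""
    problems = []
    if {"portmap", "rpcbind"} <= enabled:
        problems.append("portmap and rpcbind are both enabled; turn portmap off")
    if "bootparamd" not in enabled:
        problems.append("bootparamd is not enabled at boot")
    return problems
```

The post does not record the exact commands used for the fix; on a RHEL 6 box it would be something along the lines of chkconfig portmap off followed by service portmap stop, after which rpcinfo -p should show the registrations again.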

With this setup, the CCU11, i.e. spect, boots correctly. Now, let us look at the same packet, where ASP_ST2 sends its reply to spect's GETPORT request. As before, in the upper half of the packet image, the last line shows :
 Portmap GETPORT Reply Port:724. Now, if you run rpcinfo -p, you can confirm that port 724 is where bootparamd is listening.


A sidetrack on bootparamd and dhcpd :

bootparamd predates the DHCP protocol. It serves the /usr/diskless/client file tree to the CCU11, which then bootstraps from the code contained therein, and a minimal UNIX environment takes shape.

From TS2.1 onwards, installing diskless from the Topspin DVD also automatically installs a dhcpd.conf under the /etc/ directory. A couple of points on that :
  • As long as your console does not have the newer IPSO, you don't need this DHCP server to be running. The entire diskless boot happens via bootparamd.
  • If you upgrade to a newer AVANCE-II or III console that has an IPSO, you need the dhcpd daemon to be running.
  • With RHEL 6.3, you should place the Bruker-supplied dhcpd.conf in the /etc/dhcp/ directory, since that is where the script expects the conf file. The dhcpd daemon will not start if the conf file is left in the old default /etc/dhcpd.conf location.

1 comment:

  1. Recently, I built a RHEL 5.9 box and ended up with exactly the same symptoms, but the cause turned out to be much simpler. When I ran 'rpcinfo -p', I did not see bootparam anywhere. Running 'chkconfig', I found that bootparamd was not set to start at boot. Once 'bootparamd' was started, the communication with 'spect' succeeded.
