Oracle 10G RAC / CRS install issues (Sun cluster 3.2/ Solaris 10): Diagnostics

Hello all. I want to share an unusual RAC/CRS install issue we hit in an Oracle 10g R2 RAC / Sun Cluster 3.2 / Solaris 10 environment.

Problem:

We recently ran into a problem while installing Oracle Cluster Ready Services (CRS).
Oracle provides the runcluvfy utility to check whether the system is ready for CRS
and whether shared storage is configured properly.

As a pre-check, we ran the following command:

$ /runcluvfy.sh stage -pre crsinst -n node1,node2  -verbose

The final output was:

“Pre-check for cluster services setup was successful on all the nodes”

The next step is to install CRS. In our case, the installer's node-specification screen could not identify the nodes, even though cluvfy had passed all the prerequisites.

Diagnostics:

Nothing unusual showed up in the /etc/hosts file, the ifconfig -a output, or the installActions *.err and *.out files that the GUI installer wrote when run with tracing enabled:

./runInstaller -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2

Initially we suspected that we had an old version of the ORCLudlm package (which contains the Oracle UDLM), which Sun Cluster needs in order to integrate with Oracle RAC.

We installed the latest ORCLudlm package and restarted it by rebooting the cluster node in cluster mode.

Before starting OUI for the CRS installation, we verified the cluster status, including the current cluster nodes, and the ORCLudlm version with the following commands:

/usr/cluster/bin/scstat -q

/usr/cluster/bin/scstat -g

pkginfo -l ORCLudlm | grep VERSION
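
The version check above can be wrapped in a small helper so the value is easy to compare across nodes. This is only a sketch: the function name pkg_version is made up for illustration, and it assumes pkginfo -l prints the field as "VERSION:  <value>".

```shell
#!/bin/bash
# Hypothetical helper (not part of Sun Cluster or Oracle tooling).
# Reads `pkginfo -l` output on stdin and prints only the VERSION value.
# Assumes the field looks like "VERSION:  <value>", with optional leading spaces.
pkg_version() {
    sed -n 's/^ *VERSION: *//p'
}

# Usage on each cluster node:
#   pkginfo -l ORCLudlm | pkg_version
```

Running it on both nodes and comparing the two strings is a quick way to spot a node that still has an older package.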

That left another Sun Cluster daemon that matters for Oracle RAC: “ucmmd”.

In our case, the attempt to start the ucmmd daemon failed.

The output from attempting to start ucmmd:

bash-3.00# clresourcegroup online -emM -n node1 rac-fmwk-rg

rac-fmwk-rg: invalid resource group

clresourcegroup: (C918779) Invalid resource group “rac-fmwk-rg” specified.

bash-3.00# clresourcegroup online -emM -n node2 rac-fmwk-rg

rac-fmwk-rg: invalid resource group

clresourcegroup: (C918779) Invalid resource group “rac-fmwk-rg” specified.
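
The "Invalid resource group" error suggests that rac-fmwk-rg was never registered on this cluster. One way to confirm that before trying to bring it online is to look for the name in the output of clresourcegroup list. The sketch below assumes that command prints one resource group name per line; the function name rg_registered is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the resource group name given as $1
# appears as a whole line on stdin (one name per line is assumed).
rg_registered() {
    local wanted=$1 name
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r name || [ -n "$name" ]; do
        [ "$name" = "$wanted" ] && return 0
    done
    return 1
}

# Usage on a cluster node:
#   if clresourcegroup list | rg_registered rac-fmwk-rg; then
#       echo "rac-fmwk-rg is registered"
#   else
#       echo "rac-fmwk-rg is not registered on this cluster"
#   fi
```
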
The text in the brackets below is taken from the Sun docs, for those who want more detail:
http://docs.sun.com/app/docs/doc/819-0583/6n30h631j?l=en&a=view#ch8_ops-118
[[[The UCMM daemon, ucmmd, manages the reconfiguration of Sun Cluster Support for Oracle Real Application Clusters.

When a cluster is booted or rebooted, this daemon is started only after all components of Sun Cluster Support for Oracle Real Application Clusters are validated.

If the validation of a component on a node fails, the ucmmd fails to start on the node.

To determine the cause of the problem, examine the following files:

The UCMM reconfiguration log file can be found at /var/cluster/ucmm/ucmm_reconf.log

The system messages file

The most common causes of this problem are as follows:

The ORCLudlm package that contains the Oracle UDLM is not installed.

An error occurred during a previous reconfiguration of a component of Sun Cluster Support for Oracle Real Application Clusters.

A step in a previous reconfiguration of Sun Cluster Support for Oracle Real Application Clusters timed out, causing the node on which the timeout occurred to panic.

To correct the problem, perform the appropriate recovery action for the cause of the problem and reboot the node on which ucmmd failed to start.]]]
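
The two files named in the quoted documentation can be scanned with a small helper. This is a sketch only: recent_matches is a made-up function, /var/adm/messages is assumed to be the system messages file (the usual Solaris location), and the keyword ucmmd is just a starting point, since the exact log wording is not shown here.

```shell
#!/bin/bash
# The reconfiguration log path comes from the Sun documentation quoted above.
# /var/adm/messages is an assumption (the usual Solaris system messages file).
UCMM_LOG=${UCMM_LOG:-/var/cluster/ucmm/ucmm_reconf.log}
SYS_LOG=${SYS_LOG:-/var/adm/messages}

# Hypothetical helper: print the last COUNT lines of FILE that mention
# KEYWORD (case-insensitive). COUNT defaults to 20.
recent_matches() {
    local file=$1 keyword=$2 count=${3:-20}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    grep -i "$keyword" "$file" | tail -"$count"
}

# Usage on the node where ucmmd failed to start:
#   pgrep -x ucmmd >/dev/null || echo "ucmmd is not running"
#   recent_matches "$UCMM_LOG" ucmmd
#   recent_matches "$SYS_LOG" ucmmd
```

Nothing runs when the file is sourced; the cluster commands are left as commented usage so the helper can be tried against a copy of the logs first.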

We believe that in our case some important ucmmd-related messages were overlooked during the Sun Cluster 3.2 installation and reconfiguration. We performed the recovery action to start the ucmmd daemon and rebooted the node on which ucmmd had failed to start.

Once the ucmmd daemon was up and running, the CRS GUI was able to identify the nodes.
In summary, a successful Oracle cluvfy pre-check does not by itself guarantee that the CRS installation can proceed. In addition to lsnodes, the ucmmd daemon must be working properly for Oracle CRS to run. I hope this note is useful as a pointer on where to look when troubleshooting CRS installation problems on a Sun Cluster / Solaris combination.
