Thursday, May 10, 2012

Scipt to automatically move RAC 11gR2 services back to preferred instances

When instance fails in Oracle RAC, the services that were using this instance as a preferred instance are automatically relocated to instances marked as available for this service. But after the failed instance recovers and starts up again, the relocated services are not moved back and need manual srvctl relocate service command from administrator to move them back.

Here is a little Bash script to automate this process. Oracle Clusterware (Grid Infrastructure) can execute user callout scripts on FAN events, like INSTANCE up/down. Place this script under $GRID_HOME/racg/usrco/ and set the execute bits on the file. Then clusterware will execute that script for all FAN events, but the script will start processing only for instance up event.

Why is it needed? We just switched over to 4-node RAC consisting of many different applications, almost each of them connecting to its own schema. We created each application its own service restricting it to 1 (or max 2) nodes (1 or 2 nodes as preferred, all other nodes listed as available). After the first rolling patching, I noticed that the connection count and load on each node was very unbalanced, vast majority of the connections were connected to node1 and the last patched node had almost none and it did not get better over a few hours. This was because most of the services ended up on node1 and I had to manually look over each service and relocate it back where it belongs. This script attempts to automate this process.

Please use the download link for downloading, seems that blogger escapes some characters in the code below.

#!/bin/bash
#
# GI callout script to catch INSTANCE up event from clusterware and relocate services to preferred instance
# Copy or symlink this script to $GRID_HOME/racg/usrco
# Tested on Oracle Linux 5.8 with 11.2.0.3 Oracle Grid Infrastructure and 11.2.0.2 & 11.2.0.3 Oracle Database Enterprise Edition
# 2012 Ilmar Kerm ilmar.kerm@gmail.com
#

LOGFILE=/u02/app/oracle/grid_callout/log.txt
SCRIPTDIR=`dirname $0`

# Determine grid home
if [[ "${SCRIPTDIR:(-11)}" == "/racg/usrco" ]]; then
  CRS_HOME=""${SCRIPTDIR:0:$(( ${#SCRIPTDIR} - 11 ))}""
  export CRS_HOME
fi

# Only execute script for INSTANCE events
if [ "$1" != "INSTANCE" ]; then
  exit 0
fi

STATUS=""
DATABASE=""
INSTANCE=""

# Parse input arguments
args=("$@")
for arg in ${args[@]}; do
  if [[ "$arg" == *=* ]]; then
    KEY=${arg%=*}
    VALUE=${arg#*=}
    
    case "$KEY" in
      status)
        STATUS="$VALUE"
        ;;
      database)
        DATABASE="$VALUE"
        ;;
      instance)
        INSTANCE="$VALUE"
        ;;
    esac
    
  fi
done

# If database, status and instance values are not set, then exit
# status must be up
if [[ -z "$DATABASE" || -z "$INSTANCE" || "$STATUS" != "up" ]]; then
  exit 0
fi

echo "`date`" >> "$LOGFILE"
echo "[$DATABASE][`hostname`] Instance $INSTANCE up" >> "$LOGFILE"

#
# Read database software home directory from clusterware
#
DBCONFIG=`$CRS_HOME/bin/crsctl status res ora.$DATABASE.db -f | grep "ORACLE_HOME="`
if [ -z "$DBCONFIG" ]; then
  exit 0
fi
declare -r "$DBCONFIG"
echo "ORACLE_HOME=$ORACLE_HOME" >> "$LOGFILE"

# Array function
in_array() {
    local hay needle=$1
    shift
    for hay; do
        [[ $hay == $needle ]] && return 0
    done
    return 1
}

#
# Read information about services
#
for service in `$CRS_HOME/bin/crsctl status res | grep -E "ora\.$DATABASE\.(.+)\.svc" | sed -rne "s/NAME=ora\.$DATABASE\.(.+)\.svc/\1/gip"`; do
  SERVICECONFIG=`$ORACLE_HOME/bin/srvctl config service -d $DATABASE -s $service`
  echo "Service $service" >> "$LOGFILE"
  if [[ `echo "$SERVICECONFIG" | grep "Service is enabled" | wc -l` -eq 1 ]]; then
    echo " enabled" >> "$LOGFILE"
    PREFERRED=( `echo "$SERVICECONFIG" | grep "Preferred instances:" | sed -rne "s/.*\: ([a-zA-Z0-9]+)/\1/p" | tr "," "\n"` )
    #
    # Check if current instance is preferred for this service
    #
    if in_array "$INSTANCE" "${PREFERRED[@]}" ; then
      echo " preferred" >> "$LOGFILE"
      #
      # Check if service is already running on current instance
      #
      SRVSTATUS=`$ORACLE_HOME/bin/srvctl status service -d $DATABASE -s $service`
      if [[ "$SRVSTATUS" == *"is not running"* ]]; then
          #
          # if service is not running, then start it
          #
        echo " service stopped, starting" >> "$LOGFILE"
        $ORACLE_HOME/bin/srvctl start service -d "$DATABASE" -s "$service" >> "$LOGFILE"
      else
        #
        # Service is running, but is it running on preferred instance?
        #
        RUNNING=( `echo "$SRVSTATUS" | sed -rne "s/.* ([a-zA-Z0-9]+)/\1/p" | tr "," "\n"` )
        echo "${RUNNING[@]} = ${PREFERRED[@]}"
        if ! in_array "$INSTANCE" "${RUNNING[@]}" ; then
          echo " not running on preferred $INSTANCE" >> "$LOGFILE"
          #
          # Find the first non-preferred running instance
          #
          CURRENT=""
          for inst in "${RUNNING[@]}"; do
            if ! in_array "$inst" "${PREFERRED[@]}" ; then
              CURRENT="$inst"
              break
            fi
          done
          #
          # Relocate
          #
          if [[ -n "$CURRENT" ]]; then
            echo " relocate $CURRENT -> $INSTANCE" >> "$LOGFILE"
            $ORACLE_HOME/bin/srvctl relocate service -d "$DATABASE" -s "$service" -i "$CURRENT" -t "$INSTANCE" >> "$LOGFILE"
          fi
        else
          #
          # Service is already running on preferred instance, no need to do anything
          #
          echo " running on preferred $INSTANCE" >> "$LOGFILE"
        fi
      fi
    fi
  fi
done

Download: relocate_services_callout.sh

Tested on Oracle Linux 5.8 with Oracle Grid Infrastructure 11.2.0.3 and Oracle Database 11.2.0.2 and 11.2.0.3.

8 comments:

  1. Thanks Ilmar, the script worked like a charm !

    ReplyDelete
  2. Hi Ilmar,

    Could you please let me know how i need to run this script to test . I ran the script as shell script it doesnt work . I have placed the script $CRS_HOME/racg/usrco

    ksh -x service_callout.sh
    + LOGFILE=/tmp/grid_callout.txt
    + + dirname service_callout.sh
    SCRIPTDIR=.
    service_callout.sh[11]: "${SCRIPTDIR:(-11)}": bad substitution


    ReplyDelete
    Replies
    1. This is a bash script not ksh script and you need to set the executable bits on this file also:
      chmod a+x $CRS_HOME/racg/usrco/relocate_services_callout.sh

      Then you can just execute $CRS_HOME/racg/usrco/relocate_services_callout.sh with the parameters you need.

      Delete
  3. Can you please give an detail example , as which all parameters to provide to your script for service relocation.

    ReplyDelete
  4. please give an example , as which all parameters to provide to your script for service relocation

    ReplyDelete
    Replies
    1. The parameters are defined by Grid Infrastructure. For example to simulate instance up event for instance "db1" in database "db" execute:

      $CRS_HOME/racg/usrco/relocate_services_callout.sh INSTANCE status=up instance=db1 database=db

      Delete
  5. Hi,
    Thanks a lot .

    Do you have simiilar script for 10g ?

    ReplyDelete
    Replies
    1. I have not tested this script on 10g.

      Delete