Showing posts with label RAC. Show all posts

Monday, 2 March 2015

oracleasm createdisk marking disk FAILED with Unable to open device Device or resource busy

While adding a new LUN to the ASM storage, createdisk fails and the error below is logged in /var/log/oracleasm.

Error:
/etc/init.d/oracleasm createdisk ASM_DISK02 /dev/sdf1
Marking disk "ASM_DATA02" as an ASM disk: [FAILED]

and /var/log/oracleasm shows the error below:

Unable to open device "/dev/sdf1": Device or resource busy
Unable to open device "/dev/sdf1": Device or resource busy


Solution:

We use multipath storage for backup, and the new LUNs were being picked up by multipath as well. So we blacklisted the WWIDs of the new LUNs in /etc/multipath.conf.

After that we were able to run createdisk at the ASM level.
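
As a rough illustration of that blacklist entry, the sketch below prints a blacklist stanza for a list of WWIDs. The function name and the WWIDs are made-up placeholders; the real WWIDs have to be taken from the multipath output on your own host.

```shell
# Hypothetical helper: print a multipath.conf blacklist stanza
# for the WWIDs passed as arguments.
make_wwid_blacklist() {
    echo "blacklist {"
    for wwid in "$@"; do
        printf '        wwid "%s"\n' "$wwid"
    done
    echo "}"
}

# Example with placeholder WWIDs (not real values).
make_wwid_blacklist 3600a0b80000000000000000000000001 3600a0b80000000000000000000000002
```

The printed stanza would still have to be merged by hand into the existing /etc/multipath.conf.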

Friday, 27 February 2015

Moving RAC database from one disk group to other new ASM disk group

We had an I/O issue in the DB, so we planned to move it to new, faster disks. On the DB side we had to copy the whole database, as per the requirement and standard.

The layout is as below.

Old Disk group = +ASM_DATA01
New Disk groups = +BW_DATA_01, +BW_REDO_01, +BW_CONTROL_01, +BW_FRA_01, +BW_UNDO_01 and +BW_TEMP_01

The files are split as below.

BW_DATA_01  ==> Data file
BW_REDO_01 ==>  Redo log files
BW_CONTROL_01 ==> Controlfiles
BW_FRA_01  ==>  Archive and FRA
BW_UNDO_01  ==>  Undo files
BW_TEMP_01 ==>  Temp files

Below are the steps followed
+ Add the new disk groups to ASM
+ Stop the DB and start it in mount state
+ Copy the init file and edit it to point to the new controlfile
+ Copy the controlfile to the new disk group
+ Mount the DB with the new init file
+ Using RMAN, copy the datafiles to the new ASM disk group
+ Switch the datafiles to the new location
+ Open the database and create the spfile
+ Start the DB on both RAC nodes
+ Create the new temp file in the new ASM disk group and drop the old temp file
+ Create the new UNDO tablespace in the new ASM disk group and drop the old undo files
+ Change the archive log and FRA location to the new ASM disk group
+ Create the new redo logs in the new ASM disk group and drop the old redo logs


++ Brief steps:
+ Adding disk group
/dev/sda1  ==> BW_DATA_01  ==> 1.5Tb
/dev/sdb1  ==> BW_REDO_01 ==> 50Gb
/dev/sdc1 ==> BW_CONTROL_01 ==> 50Gb
/dev/sdd1 ==> BW_FRA_01  ==> 500Gb
/dev/sde1 ==> BW_UNDO_01  ==> 100Gb
/dev/sdf1 ==> BW_TEMP_01 ==> 300Gb

/etc/init.d/oracleasm createdisk BW_DATA_01 /dev/sda1
/etc/init.d/oracleasm createdisk BW_REDO_01 /dev/sdb1
/etc/init.d/oracleasm createdisk BW_CONTROL_01 /dev/sdc1
/etc/init.d/oracleasm createdisk BW_FRA_01 /dev/sdd1
/etc/init.d/oracleasm createdisk BW_UNDO_01 /dev/sde1
/etc/init.d/oracleasm createdisk BW_TEMP_01 /dev/sdf1

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks
Use asmca to create the disk groups in ASM.

####################################################################
++ Check the current structure of the DB.

select name from v$controlfile;
select name from v$datafile;
select name from v$tempfile;
select member from v$logfile;

++ Starting the activity
Log in to the DB and switch the logs.
sqlplus

ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER SYSTEM ARCHIVE LOG CURRENT;

alter system checkpoint;
alter system checkpoint;

alter session set nls_date_format='YYYY/MON/DD hh24:mi:ss';
select checkpoint_time,fuzzy,count(*),status
from ( select checkpoint_time,fuzzy,status
       from v$datafile_header
       union all
       select controlfile_time,'CTL',null from v$database)
group by checkpoint_time,fuzzy,status;


####################################################################
++ Disable the Standby
alter system set log_archive_dest_state_2=DEFER scope=BOTH sid='*';

++ shut down and mount the db
shut immediate
startup mount

++ Copying control file to new disk group.
rman target /

copy current controlfile to '+BW_CONTROL_01/TESTDB/CONTROLFILE/BW_control.ctl';

+ Changing the parallelism to the max for parallel copy of db 
CONFIGURE DEVICE TYPE DISK PARALLELISM  60 BACKUP TYPE TO BACKUPSET;

+ Create the init file in text format (from sqlplus).
create pfile='/tmp/initTESTDB1.ora' from spfile;
shut immediate

++ Edit the pfile /tmp/initTESTDB1.ora as below
- Point the controlfile location to the new controlfile: *.control_files=
- Set the cluster value to false: *.cluster_database=FALSE

++ mount the DB with updated initfile.
startup mount pfile='/tmp/initTESTDB1.ora';

++ Copy the datafiles from the old to the new disk group
(asmcmd cp from +ASM_DATA01 to +BW_DATA_01, the rman copy command, or rman "backup as copy database format +BW_DATA_01")

rman target /
backup as copy database format '+BW_DATA_01';
switch database to copy;
alter database open resetlogs;

Ignore this warning: RMAN-06497: WARNING: control file is not current, control file AUTOBACKUP skipped

####################################################################
++ Moving initfile to ASM

create SPFILE='+BW_DATA_01/TESTDB/spfileTESTDB.ora' from pfile='/u01/app/oracle/product/11.2.0/db_1/dbs/initTESTDB1.ora';
startup force;
####################################################################

+++Change Undo tablespace to BW_UNDO_01 disk group.

Follow the link

<< http://oracletechdba.blogspot.com/2015/02/moving-undo-tablespace-to-new-asm.html >>

####################################################################

+++Control file multiplex
Follow the link

<< http://oracletechdba.blogspot.com/2015/02/multiplexing-control-file-in-asm.html>> 
####################################################################

+++ Changing Temp tablespace Location to new disk group. 
Follow the link
<< http://oracletechdba.blogspot.com/2015/02/changing-temp-tablespace-location-to.html >>

####################################################################
+++ Changing Redo Log location:
Follow the link
<< http://oracletechdba.blogspot.com/2015/02/changing-redo-log-to-new-disk-group-in.html>>

####################################################################
++ Changing Archive log location

Follow the link
<< http://oracletechdba.blogspot.com/2015/02/changing-archive-log-location.html >>

####################################################################

Restart the DB

####################################################################
Verify all the files are in the new location.

select name from v$controlfile
union
select name from v$datafile
union
select name from v$tempfile
union
select member from v$logfile
union
select filename from v$block_change_tracking;

show parameter log
show parameter recover
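
If the output of the query above is spooled to a file, a small filter can point out anything still sitting outside the new disk groups. This is only a sketch: the function name is hypothetical and the sample file names are made up, and it assumes all new disk groups start with +BW_ as in this post.

```shell
# Hypothetical helper: read file names on stdin and print any that are
# not in one of the new +BW_* disk groups.
files_in_old_location() {
    grep -v '^[+]BW_' || true
}

# Sample file names (made up for illustration).
printf '%s\n' \
    "+BW_DATA_01/testdb/datafile/system.256.1" \
    "+ASM_DATA01/testdb/tempfile/temp.264.1" \
    "+BW_REDO_01/testdb/onlinelog/group_1.257.1" |
    files_in_old_location
```

An empty result means every listed file is already in a +BW_ disk group.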

Changing Redo Log to new disk group in ASM RAC

+++ Changing the redo log location. This can be done by adding new redo logs in the new disk group and dropping the old ones.

Check the current groups and redo logs.

select group#,thread#,members,status from v$log;
++ Add the new redo logs for node-1
ALTER DATABASE ADD LOGFILE THREAD 1
GROUP 1 ('+BW_REDO_01','+BW_CONTROL_01') SIZE 512m,
GROUP 2 ('+BW_REDO_01','+BW_CONTROL_01') SIZE 512m,
GROUP 3 ('+BW_REDO_01','+BW_CONTROL_01') SIZE 512m,
GROUP 4 ('+BW_REDO_01','+BW_CONTROL_01') SIZE 512m;
++ Add the new redo logs for node-2
ALTER DATABASE ADD LOGFILE THREAD 2
GROUP 51 ('+BW_REDO_01','+BW_CONTROL_01') SIZE 512m,
GROUP 52 ('+BW_REDO_01','+BW_CONTROL_01') SIZE 512m,
GROUP 53 ('+BW_REDO_01','+BW_CONTROL_01') SIZE 512m,
GROUP 54 ('+BW_REDO_01','+BW_CONTROL_01') SIZE 512m;

select group#,thread#,members,status from v$log ;
alter system switch logfile;

++ Drop the redo log groups in the old disk group
Alter database drop logfile group 10;
Alter database drop logfile group 11;
Alter database drop logfile group 12;
Alter database drop logfile group 13;

+ We can drop a redo group only when it is in INACTIVE state, so we have to switch the logs until the old group becomes inactive.
ALTER SYSTEM ARCHIVE LOG CURRENT;
select group#,thread#,members,status from v$log where thread#=2;
Alter database drop logfile group 13;
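
The INACTIVE check can be scripted roughly as below. This is a sketch under assumptions: the function name is hypothetical, the rows are made-up sample data, and the input is expected as plain "group# thread# members status" lines spooled from v$log.

```shell
# Hypothetical helper: read "group# thread# members status" rows on stdin and,
# for each old group number passed as an argument, print a DROP statement if
# the group is INACTIVE, or a reminder comment otherwise.
drop_inactive_groups() {
    old_groups=" $* "
    while read -r grp thread members status; do
        case "$old_groups" in
            *" $grp "*)
                if [ "$status" = "INACTIVE" ]; then
                    echo "ALTER DATABASE DROP LOGFILE GROUP $grp;"
                else
                    echo "-- group $grp is $status: switch logs and retry"
                fi
                ;;
        esac
    done
}

# Sample rows (values are made up); groups 10 to 13 are the old ones.
printf '%s\n' "10 1 2 INACTIVE" "13 2 2 CURRENT" "51 2 2 UNUSED" |
    drop_inactive_groups 10 11 12 13
```

Only the statements it prints for INACTIVE groups are safe to run; the others need another log switch first.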

Drop all the standby redo log files from the old disk group and add them to the new disk group if necessary.
Then verify that all the members are in the new location.

select member from v$logfile;

Moving UNDO tablespace to NEW ASM diskgroup in RAC

+++ Change the undo tablespace to the new BW_UNDO_01 disk group.
This can be done by adding a new UNDO tablespace and dropping the old undo tablespace.

+ Adding undo for Node-1 
create undo tablespace UNDOTBS01 DATAFILE '+BW_UNDO_01' SIZE 8g;
alter tablespace UNDOTBS01 add datafile '+BW_UNDO_01' SIZE 8G;
alter tablespace UNDOTBS01 add datafile '+BW_UNDO_01' SIZE 8G;
alter tablespace UNDOTBS01 add datafile '+BW_UNDO_01' SIZE 8G;
+ Adding undo for Node-2
create undo tablespace UNDOTBS02 DATAFILE '+BW_UNDO_01' SIZE 8G;
alter tablespace UNDOTBS02 add datafile '+BW_UNDO_01' SIZE 8G;
alter tablespace UNDOTBS02 add datafile '+BW_UNDO_01' SIZE 8G;
alter tablespace UNDOTBS02 add datafile '+BW_UNDO_01' SIZE 8G;

ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS01 SCOPE=BOTH SID='TESTDB1';
ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS02 SCOPE=BOTH SID='TESTDB2';

show parameter undo ==> this should show the new undo tablespaces.

++ Check whether any old rollback segment is still online or in use.
select owner, segment_name, tablespace_name, status from dba_rollback_segs where tablespace_name='UNDOTBS1' and status='ONLINE';

SELECT a.name,b.status , d.username , d.sid , d.serial# FROM v$rollname a,v$rollstat b, v$transaction c , v$session d WHERE a.usn = b.usn
AND a.usn = c.xidusn AND c.ses_addr = d.saddr AND a.name IN ( SELECT segment_name FROM dba_segments WHERE tablespace_name = 'UNDOTBS1');

++ If the above returns zero rows, we can drop the old undo tablespace
DROP TABLESPACE undotbs1 ;
select file_name from dba_data_files where tablespace_name='UNDOTBS1';

++ For Node 2
SELECT a.name,b.status , d.username , d.sid , d.serial# FROM v$rollname a,v$rollstat b, v$transaction c , v$session d WHERE a.usn = b.usn
AND a.usn = c.xidusn AND c.ses_addr = d.saddr AND a.name IN ( SELECT segment_name FROM dba_segments WHERE tablespace_name = 'UNDOTBS2');

++ Drop the old undo tablespace
DROP TABLESPACE undotbs2 ;
select file_name from dba_data_files where tablespace_name='UNDOTBS2';

Wednesday, 3 December 2014

ocrcheck PROC-26: Error while accessing the physical storage

After a server restart on node-1, CRS did not come up. During diagnosis we got errors saying the physical storage could not be accessed, but that turned out not to be the real issue.
To verify whether gpnpd would be able to read the profile on the OCR disk,
check the below from the node.

++ OCR check fails

[root@ node-1 bin]# ./ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-29701: unable to connect to Cluster Synchronization Service

++ But we are able to access the Disk.

[root@ node-1 bin]# ls -lr /dev/oracleasm/disks/*
brw-rw---- 1 grid asmadmin  8, 225 Nov 27 23:31 /dev/oracleasm/disks/OCR_VOTE02
brw-rw---- 1 grid asmadmin  8, 241 Nov 27 23:31 /dev/oracleasm/disks/ASM_FRA02
brw-rw---- 1 grid asmadmin 65,   1 Nov 27 20:12 /dev/oracleasm/disks/ASM_DATA05
brw-rw---- 1 grid asmadmin 65,  17 Nov 27 20:12 /dev/oracleasm/disks/ASM_DATA04

[root@ node-1 bin]#  /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]

[root@ node-1 bin]# /etc/init.d/oracleasm listdisks
ASM_DATA04
ASM_DATA05
ASM_FRA02
OCR_VOTE02
[root@ node-1 bin]#
[root@ node-1 bin]# /etc/init.d/oracleasm querydisk OCR_VOTE02
Disk "OCR_VOTE02" is a valid ASM disk
[root@ node-1 bin]#

++ We are able to read the data as well using kfed.

[root@ node-1 bin]# ./kfed  read  /dev/oracleasm/disks/OCR_VOTE02 |more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
kfbh.check:                   344414986 ; 0x00c: 0x14875b0a
kfbh.fcn.base:                     1473 ; 0x010: 0x000005c1
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000


So the issue is something else; follow the next post.
<<http://oracletechdba.blogspot.com/2014/12/112-grid-crs-is-not-starting-after.html>>

/var/log/messages error with multipathd: asm!.asm_ctl_spec: add path (uevent)

When we rescan iSCSI, we get the errors below in /var/log/messages

multipathd: asm!.asm_ctl_vbg7: add path (uevent)
multipathd: asm!.asm_ctl_vbg7: failed to store path info
multipathd: uevent trigger error
multipathd: asm!.asm_ctl_vbg8: add path (uevent)
multipathd: asm!.asm_ctl_vbg8: failed to store path info
multipathd: uevent trigger error
multipathd: ofsctl: add path (uevent)
multipathd: ofsctl: failed to store path info
multipathd: uevent trigger error
multipathd: ofsctl: remove path (uevent)
kernel: ACFSK-0039: Module unloaded.
multipathd: ofsctl: spurious uevent, path not in pathvec
multipathd: uevent trigger error
multipathd: asm!.asm_ctl_vbg8: remove path (uevent)
multipathd: asm!.asm_ctl_vbg8: spurious uevent, path not in pathvec
multipathd: uevent trigger error
multipathd: asm!.asm_ctl_vbg7: remove path (uevent)
multipathd: asm!.asm_ctl_vbg7: spurious uevent, path not in pathvec
multipathd: uevent trigger error


Solution :

Modify /etc/multipath.conf and add the two lines devnode "^asm/*" and devnode "ofsctl" to the blacklist section, as below.
blacklist {
        devnode         "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode         "^hd[a-z][[0-9]*]"
        devnode         "^asm/*"
        devnode         "ofsctl"

}
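
To get a feel for which device names the two added patterns cover, they can be tried against the names from the log. This is only a sketch: the function name is hypothetical, and grep -E is used here as a stand-in for multipath's own regex matching, which may differ in detail.

```shell
# Hypothetical check: does a device node name match one of the two
# added blacklist patterns?
is_blacklisted() {
    printf '%s\n' "$1" | grep -Eq -e '^asm/*' -e 'ofsctl'
}

# Two names from the log above plus an ordinary disk name for contrast.
for dev in 'asm!.asm_ctl_vbg7' 'ofsctl' 'sdf'; do
    if is_blacklisted "$dev"; then
        echo "$dev: blacklisted"
    else
        echo "$dev: not blacklisted"
    fi
done
```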

Monday, 27 October 2014

ORACLE RAC Performance reference

This post collects references for RAC database performance.


++ Top 5 Database and/or Instance Performance Issues in RAC Environment (Doc ID 1373500.1)
++ Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
++ SRDC - Data Collection for the RAC Database/Instance Performance (not Hang) Issues (Doc ID 1675272.1)
++ SQLT Diagnostic Tool (Doc ID 215187.1)
++ Script to Collect RAC Diagnostic Information (racdiag.sql) (Doc ID 135714.1)

Monday, 15 September 2014

RAC database installation INFO: PRVF-9652 : Cluster Time Synchronization Services check failed

Error:
During the RAC installation of the Oracle software I am getting "INFO: PRVF-9652 : Cluster Time Synchronization Services check failed".

Running the clock synchronization check with cluvfy fails as below.

[grid@node-01 bin]$ ./cluvfy comp clocksync

Verifying Clock Synchronization across the cluster nodes

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed


Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP


Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
PRVF-5402 : Warning: Could not find NTP configuration file "/etc/ntp.conf" on node "node-01"
PRVF-5405 : The NTP configuration file "/etc/ntp.conf" does not exist on all nodes
PRVF-5414 : Check of NTP Config file failed on all nodes. Cannot proceed further for the NTP tests

Checking daemon liveness...
Liveness check failed for "ntpd"
Check failed on nodes:
        node-01
PRVF-5494 : The NTP Daemon or Service was not alive on all nodes
PRVF-5415 : Check to see if NTP daemon or service is running failed
Clock synchronization check using Network Time Protocol(NTP) failed


PRVF-9652 : Cluster Time Synchronization Services check failed

Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes.


Solution:

mv  /etc/sysconfig/ntpd  /etc/sysconfig/ntpd_bk

mv /etc/ntp.conf /etc/ntp.conf_bk

Then run "cluvfy comp clocksync" on both nodes. With the NTP configuration moved away, CTSS switches from Observer to Active state.
[grid@node-01 ~]$ cd $GRID_HOME/bin
[grid@node-01 ~]$ cluvfy comp clocksync

Verifying Clock Synchronization across the cluster nodes

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed


Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Check of clock time offsets passed


Oracle Cluster Time Synchronization Services check passed

Verification of Clock Synchronization across the cluster nodes was successful.
[grid@node-01 ~]$
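
Since cluvfy reported that /etc/ntp.conf was missing on node-01, a plain mv would fail on that node. The sketch below renames a file only when it exists. The function name is hypothetical, and the demo runs on a scratch directory instead of the real /etc files.

```shell
# Hypothetical helper: rename a file to <name>_bk if it exists.
move_aside() {
    if [ -e "$1" ]; then
        mv "$1" "${1}_bk" && echo "moved $1 to ${1}_bk"
    else
        echo "skipped $1 (not present)"
    fi
}

# Demo on a scratch directory rather than the real /etc files.
scratch=$(mktemp -d)
touch "$scratch/ntp.conf"
move_aside "$scratch/ntp.conf"
move_aside "$scratch/ntpd"
rm -rf "$scratch"
```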


RAC GRID Root.sh ORA-27091 ORA-15081 unable to queue I/O

ERROR:

[root@node-01 install]# /u01/app/11.2.0/grid/root.sh
Running Oracle 11g root script...

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-2672: Attempting to start 'ora.mdnsd' on 'node-01'
CRS-2676: Start of 'ora.mdnsd' on 'node-01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node-01'
CRS-2676: Start of 'ora.gpnpd' on 'node-01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node-01'
CRS-2672: Attempting to start 'ora.gipcd' on 'node-01'
CRS-2676: Start of 'ora.cssdmonitor' on 'node-01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'node-01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node-01'
CRS-2672: Attempting to start 'ora.diskmon' on 'node-01'
CRS-2676: Start of 'ora.diskmon' on 'node-01' succeeded
CRS-2676: Start of 'ora.cssd' on 'node-01' succeeded

ASM created and started successfully.

Disk Group OCR_VOTE created successfully.

Errors in file :
ORA-27091: unable to queue I/O
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at line 4
PROT-1: Failed to initialize ocrconfig
PROC-26: Error while accessing the physical storage
ORA-27091: unable to queue I/O
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at line 4

Failed to create Oracle Cluster Registry configuration, rc 255
Oracle Clusterware Repository configuration failed at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6471.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
[root@node-01 install]#


Solution:

The ASMLib configuration was done with the oracle user, but I am doing the Grid installation as the grid user.

[root@node-02 11202stage]# /usr/sbin/oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=oracle
ORACLEASM_GID=dba
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

[root@node-02 11202stage]#

Update the ASMLib configuration so that the disks are owned by grid:asmadmin

/usr/sbin/oracleasm configure -u grid -g asmadmin

[root@node-02 11202stage]# /usr/sbin/oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

[root@node-02 11202stage]#

For a fresh installation, delete the disks in ASMLib.

/usr/sbin/oracleasm deletedisk OCR_VOTE01
/usr/sbin/oracleasm deletedisk ASM_DATA01
/usr/sbin/oracleasm deletedisk ASM_FRA01

And recreate them

/usr/sbin/oracleasm createdisk OCR_VOTE01 /dev/sde1
/usr/sbin/oracleasm createdisk ASM_DATA01 /dev/sdf1
/usr/sbin/oracleasm createdisk ASM_FRA01 /dev/sdg1

/usr/sbin/oracleasm listdisks

/usr/sbin/oracleasm scandisks

Now rerun root.sh.
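
The owner mismatch can be spotted before running root.sh by reading the oracleasm configure output. A minimal sketch, assuming the output format shown in this post; the function name is hypothetical and the sample text is copied from the listing before the fix.

```shell
# Hypothetical helper: read `oracleasm configure` output on stdin and print
# the value of one setting, e.g. ORACLEASM_UID.
asmlib_setting() {
    sed -n "s/^$1=//p" | tr -d '"'
}

# Sample output as it looked before the fix.
sample='ORACLEASM_ENABLED=true
ORACLEASM_UID=oracle
ORACLEASM_GID=dba'

owner=$(printf '%s\n' "$sample" | asmlib_setting ORACLEASM_UID)
if [ "$owner" != "grid" ]; then
    echo "ASMLib owner is '$owner', expected 'grid'"
fi
```

On a real node the sample text would be replaced by piping the output of /usr/sbin/oracleasm configure into the function.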

Monday, 9 June 2014

Drop database for RAC and NON RAC

Drop database for a non-RAC database (for cloning).

To know which files will be dropped:
select name from v$datafile;
select member from v$logfile;
select name from v$controlfile;

In SQL*Plus:
SQL> shutdown abort;
SQL> startup mount exclusive restrict;
SQL> DROP DATABASE; -- to delete datafile,logfile and controlfile.


RMAN> DROP DATABASE INCLUDING BACKUPS; -- Will delete archivelogs and backup pieces.

Drop database for a RAC database (for cloning)

srvctl stop database -d <DB NAME> -o abort
From one node:
sqlplus "/as sysdba"
SQL>alter system set cluster_database=false scope=spfile sid='*'; 

SQL> shutdown abort;
SQL> startup mount exclusive restrict;
SQL> DROP DATABASE; -- to delete datafile,logfile and controlfile.


RMAN> DROP DATABASE INCLUDING BACKUPS; -- Will delete archivelogs and backup pieces.

Sunday, 8 June 2014

PRVF-5184 : check of following Udev attributes

Error:
======
PRVF-5184 : check of following Udev attributes of /dev/oracleasm/disks/ORC_VOTE01 failed:
"[Owner: found='root' Expected='grid',Group:found='root' Expected='asmadmin',permissions:found='0600'
Expected='0660']"

Solution:
=========
Ignore this and proceed with the installation.

PRVF-4661 : Found inconsistent hosts entry in /etc/nsswitch.conf

RAC Grid post-installation screen fails with PRVF-4661 : Found inconsistent hosts entry in /etc/nsswitch.conf
Error:
======
Oracle RAC 11gR2 fails at "Configure Oracle Grid Infrastructure for a Cluster".
Oracle RAC 11gR2: Oracle Cluster Verification Utility failed

INFO: Post-check for cluster services setup was unsuccessful.
INFO: Checks did not pass for the following node(s):
INFO:   Node-01,Node-02
INFO:
WARNING:
INFO: Completed Plugin named: Oracle Cluster Verification Utility

Checks did not pass for the following node Post-check for cluster services setup was unsuccessful

ERROR:
PRVF-4661 : Found inconsistent hosts entry in /etc/nsswitch.conf on node Node-01

Solution:
=========
Change the hosts setting in /etc/nsswitch.conf. It should be the same on both nodes.
Like below.
 hosts:    files   dns   nis

Then run the post cluster verify as below.

 ./cluvfy stage -post crsinst -n all -verbose
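
Before rerunning cluvfy, the hosts entries of the two nodes can be compared directly. The sketch below is an illustration only: the function name is hypothetical, and the two sample files stand in for copies of /etc/nsswitch.conf taken from each node.

```shell
# Hypothetical helper: print the "hosts:" entry of an nsswitch.conf-style
# file with whitespace collapsed, so entries can be compared as strings.
hosts_entry() {
    awk '$1 == "hosts:" { $1 = $1; print; exit }' "$1"
}

# Sample files standing in for the copies from node 1 and node 2.
tmp=$(mktemp -d)
printf 'passwd: files\nhosts:    files   dns   nis\n' > "$tmp/node1.conf"
printf 'passwd: files\nhosts: files dns\n' > "$tmp/node2.conf"

if [ "$(hosts_entry "$tmp/node1.conf")" = "$(hosts_entry "$tmp/node2.conf")" ]; then
    echo "hosts entries match"
else
    echo "hosts entries differ"
fi
rm -rf "$tmp"
```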

RAC Grid install hangs at oracle.cluster.deployment.ractrans.ClientHandlerSupervisor.threadCleanup

RAC Grid installation hangs as below.

Error:
======
Node-2 hangs as below

Exception in thread "Install API Thread" java.lang.NullPointerException
        at oracle.cluster.deployment.ractrans.ClientHandlerSupervisor.threadCleanup(ClientHandlerSupervisor.java:981)
        at oracle.cluster.deployment.ractrans.RACTransfer.cleanup(RACTransfer.java:1757)
        at oracle.cluster.deployment.ractrans.RACTransfer.transferDirStructureToNodes(RACTransfer.java:747)
        at oracle.cluster.deployment.ractrans.RACTransfer.transferDirToNodes(RACTransfer.java:253)
        at oracle.ops.mgmt.cluster.ClusterCmd.transferDirToNodes(ClusterCmd.java:3119)
        at oracle.ops.mgmt.cluster.ClusterCmd.transferDirToNodes(ClusterCmd.java:3038)
        at oracle.sysman.oii.oiip.oiipg.OiipgClusterOps.transferDirToNodes(OiipgClusterOps.java:947)
        at oracle.sysman.oii.oiif.oiifw.OiifwClusterCopyWCCE.doOperation(OiifwClusterCopyWCCE.java:544)
        at oracle.sysman.oii.oiif.oiifb.OiifbCondIterator.iterate(OiifbCondIterator.java:171)
        at oracle.sysman.oii.oiif.oiifw.OiifwActionsPhaseWCDE.doOperation(OiifwActionsPhaseWCDE.java:633)
        at oracle.sysman.oii.oiif.oiifb.OiifbLinearIterator.iterate(OiifbLinearIterator.java:147)
        at oracle.sysman.oii.oiic.OiicInstallAPISession$OiicAPISelCompsInstall.doOperation(OiicInstallAPISession.java:1072)
        at oracle.sysman.oii.oiif.oiifb.OiifbCondIterator.iterate(OiifbCondIterator.java:171)
        at oracle.sysman.oii.oiic.OiicInstallAPISession.doInstallAction(OiicInstallAPISession.java:656)
        at oracle.sysman.oii.oiic.OiicInstallAPISession.access$000(OiicInstallAPISession.java:91)
        at oracle.sysman.oii.oiic.OiicInstallAPISession$OiicActionsThread.run(OiicInstallAPISession.java:948)

Solution:
=========
Confirm that you have disabled "iptables".

Check if the firewall is running
# service iptables status
If it is running, stop it and retry the installation.
# service iptables stop

INS 40912 - RAC Grid installation fails

RAC Grid installation fails with INS 40912

Error:
--------
INS 40912 virtual host name is assigned to another system on the network


Solution:
--------
This could be because the virtual host name is already used by another system, or because the virtual host is configured incorrectly.
ping <virtual hostname> or <virtual ip>
If the ping gets a reply, check the network configuration and restart the network.

# service network restart

Root.sh fails with crsconfig_lib.pm line 1016

RAC installation: root.sh fails with crsconfig_lib.pm line 1016
Error:
=========
Node-2: root.sh fails as below.

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node za-rac-uat-01, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 1016.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed


Solution:
========
As per note: Oracle Grid Infrastructure 11.2.0.2 Installation or Upgrade may fail due to Multicasting Requirement [ID 1212703.1]
Apply patch 9974223
Download the patch and apply as below.
As root user:
#/u01/app/11.2.0/grid/crs/install/rootcrs.pl -unlock

#cd <GRID_HOME>/OPatch/
#./opatch napply -local -oh /u01/app/11.2.0/grid -id 9974223
 

#opatch lsinventory -detail -oh /u01/app/11.2.0/grid
#./opatch lsinventory -oh /u01/app/11.2.0/grid

As root, run the below.
#<GRID_HOME>/crs/install/rootcrs.pl -patch

Repeat this on Node-2. Then check the CRS status.

crsctl stat res -t

Node-2 root.sh fails with CRS-2674

During RAC 11gR2 installation, root.sh on node-2 may fail with the network error below when the subnet mask is not consistent between the nodes.

ERROR:
======
root.sh fails on node-2

2011-08-04 18:26:05: output for start nodeapps is  PRCR-1013 : Failed to start resource ora.net1.network PRCR-1064 : Failed to start resource ora.net1.network on node node-lab-02 CRS-2674: Start of 'ora.net1.network' on 'node-lab-02' failed PRCR-1013 : Failed to start resource ora.ons PRCR-1064 : Failed to start resource ora.ons on node node-lab-02 CRS-2674: Start of 'ora.net1.network' on 'node-lab-02' failed
2011-08-04 18:26:05: output of startnodeapp after removing already started mesgs is PRCR-1013 : Failed to start resource ora.net1.network PRCR-1064 : Failed to start resource ora.net1.network on node node-lab-02 CRS-2674: Start of 'ora.net1.network' on 'node-lab-02' failed PRCR-1013 : Failed to start resource ora.ons PRCR-1064 : Failed to start resource ora.ons on node node-lab-02 CRS-2674: Start of 'ora.net1.network' on 'node-lab-02' failed
2011-08-04 18:26:05: /u01/app/11.2.0/grid/bin/srvctl start nodeapps -n node-lab-02 ... failed
2011-08-04 18:26:05: Running as user grid: /u01/app/11.2.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_NODECONFIG -state FAIL
2011-08-04 18:26:05: s_run_as_user2: Running /bin/su grid -c ' /u01/app/11.2.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_NODECONFIG -state FAIL '
2011-08-04 18:26:05: Removing file /tmp/file7zp6I4
2011-08-04 18:26:05: Successfully removed file: /tmp/file7zp6I4
2011-08-04 18:26:05: /bin/su successfully executed

 PRCR-1013 : Failed to start resource ora.net1.network PRCR-1064 :
 Failed to start resource ora.net1.network on node node-lab-02
 CRS-2674: Start of 'ora.net1.network' on 'node-lab-02' failed
 PRCR-1013 : Failed to start resource ora.ons
 PRCR-1064 : Failed to start resource ora.ons on node node-lab-02
 CRS-2674: Start of 'ora.net1.network' on 'node-lab-02' failed

GSD exists

ASM terminated ORA-00020: maximum number of processes 100 exceeded



Log in to ASM as sysasm and change the processes value. processes is a static parameter, so the change goes into the spfile and takes effect only after a restart. We need to restart the complete CRS stack, since ASM hosts the database's data as well.

We can do it in a rolling fashion, one instance at a time.



SQL> alter system set processes=300 scope=SPFILE sid='*';

System altered.

SQL>


++ Restart the ASM instances one by one.

Saturday, 7 June 2014

OS Kernel upgrade on RAC ASM system


How to upgrade the OS without reinstalling the RAC Grid Infrastructure.

1. Stop the database and Grid on the node being upgraded.
    <GRID_HOME>/bin/crsctl stop crs
2. Take an OCR backup.
    <GRID_HOME>/bin/ocrconfig -export <backup_file>
3. Disable auto start of Grid Infrastructure.
    <GRID_HOME>/bin/crsctl disable crs
4. Upgrade the OS (the server team will take care of this).
    Once the system is up, check for and apply the ASMLib that is compatible with the upgraded kernel.

Adding ASM disk group in VM lab


We received the following message

ERROR at line 1:
ORA-01119: error in creating database file '+DATA'
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup space exhausted

Check the diskspace in ASM:
SQL> select GROUP_NUMBER, NAME,TOTAL_MB, FREE_MB, USABLE_FILE_MB from V$ASM_DISKGROUP;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB USABLE_FILE_MB
------------ ------------------------------ ---------- ---------- --------------
           1 ARCHIVELOGS                          1349       1283           1283
           2 DATA                                 2039          4              4
           3 ONLINELOGS                            705        581            581
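
A quick way to flag a nearly full disk group from such a listing is to compute the free percentage per row. This is a sketch under assumptions: the function name is hypothetical, the 10 percent threshold is arbitrary, and the input is expected as plain "group_number name total_mb free_mb usable_file_mb" rows without the header.

```shell
# Hypothetical helper: read disk group rows on stdin and print the names of
# the groups whose free space is below the given percentage.
low_space_groups() {
    awk -v pct="$1" '$3 > 0 && ($4 * 100 / $3) < pct { print $2 }'
}

# Rows taken from the listing above.
printf '%s\n' \
    "1 ARCHIVELOGS 1349 1283 1283" \
    "2 DATA 2039 4 4" \
    "3 ONLINELOGS 705 581 581" |
    low_space_groups 10
```

With these rows only DATA is reported, which matches the ORA-15041 error on +DATA.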

SQL> select substr(name,1,10) name,substr(path,1,20) path, REDUNDANCY, TOTAL_MB, os_mb, free_mb from V$ASM_DISK where GROUP_NUMBER = 2;

NAME       PATH                 REDUNDA   TOTAL_MB      OS_MB    FREE_MB
---------- -------------------- ------- ---------- ---------- ----------
VOL1       ORCL:VOL1            UNKNOWN       2039       2039          4

First try to extend current diskgroup.

SQL> ALTER DISKGROUP data RESIZE DISK VOL1 size 4G;
ALTER DISKGROUP data RESIZE DISK VOL1 size 4G
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15289: ASM disk VOL1 cannot be resized beyond 2039 M

Diskgroup cannot be extended. Only solution is to add a new disk.
After adding the new disk, the new disk device is /dev/sde

List all current disks on Linux:
sfdisk -l

Now create a partition on /dev/sde to span the whole disk.

[root@dbvisit32 /]# fdisk /dev/sde

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-261, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-261, default 261):
Using default value 261

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@dbvisit32 /]# ls -la /dev/sde*
brw-r----- 1 root disk 8, 64 Jun 7 11:59 /dev/sde
brw-r----- 1 root disk 8, 65 Jun 7 11:59 /dev/sde1

Partition /dev/sde1 is now created.

Now make the disk available to ASM.

[root@dbvisit31 ~]# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
[root@dbvisit31 ~]# /etc/init.d/oracleasm createdisk VOL4 /dev/sde1
Marking disk "/dev/sde1" as an ASM disk: [ OK ]
[root@dbvisit31 ~]# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
VOL4

Go back to ASM and add the new disk.

SQL> select substr(name,1,10) name,substr(path,1,20) path, REDUNDANCY, TOTAL_MB, os_mb, free_mb from V$ASM_DISK;

NAME       PATH                 REDUNDA   TOTAL_MB      OS_MB    FREE_MB
---------- -------------------- ------- ---------- ---------- ----------
           ORCL:VOL4            UNKNOWN          0       2047          0
VOL1       ORCL:VOL1            UNKNOWN       2039       2039        215
VOL2       ORCL:VOL2            UNKNOWN        705        705        606
VOL3       ORCL:VOL3            UNKNOWN       1349       1349       1214

SQL> ALTER DISKGROUP DATA ADD DISK 'ORCL:VOL4';

SQL> select substr(name,1,10) name,substr(path,1,20) path, REDUNDANCY, TOTAL_MB, os_mb, free_mb from V$ASM_DISK where GROUP_NUMBER = 2;

NAME       PATH                 REDUNDA   TOTAL_MB      OS_MB    FREE_MB
---------- -------------------- ------- ---------- ---------- ----------
VOL1       ORCL:VOL1            UNKNOWN       2039       2039        368
VOL4       ORCL:VOL4            UNKNOWN       2047       2047       1892

All done. Disk group DATA now has an extra 2G.