Sunday, September 1, 2013

Building a 5-node Oracle EBS using RAC and shared appl top - part 3.

Building the RAC infrastructure

This step is independent from most of the other steps. If you prefer, you could do it later: for example, you could first build a 1+3 node Oracle EBS system (1 database node and 3 apps tier nodes) with a configured load balancer and SSL.

But now I am starting with this.
The main steps will be:
  1. install the operating system on the 2 database nodes,
  2. create and configure all necessary storage partitions and mount points,
  3. check the installed operating system and configure it for Oracle RAC and EBS database,
  4. create unix users and groups,
  5. create stage area,
  6. create asm disk groups,
  7. install Oracle Clusterware 11g Release 2,
  8. install the latest PSU patches



Here are 2 sample tables which introduce the main configuration attributes of the 2 database nodes:

Attribute                        Value / description
Server name                      db01.company.local
IP addresses
  public                         10.10.1.10
  vip                            10.10.1.12
  interconnect                   192.168.1.1
  scan                           10.10.1.15, 10.10.1.16
Operating system                 Red Hat Enterprise Linux 5.9 (Tikanga) 64 bit
Memory                           12GB
Storage capacity
  boot                           -
  swap                           12GB
  /                              50GB
  /u01                           50GB
  /u02                           150GB
  /u04                           150GB
  partition for ASM disk groups  300GB


Attribute                        Value / description
Server name                      db02.company.local
IP addresses
  public                         10.10.1.11
  vip                            10.10.1.13
  interconnect                   192.168.1.2
  scan                           10.10.1.15, 10.10.1.16
Operating system                 Red Hat Enterprise Linux 5.9 (Tikanga) 64 bit
Memory                           12GB
Storage capacity
  boot                           -
  swap                           12GB
  /                              50GB
  /u01                           50GB
  /u02                           150GB
  partition for ASM disk groups  300GB

A short explanation of the above attributes.

I planned the following IP addresses for each node:

  • 1 IP for standard public access to the server
  • 1 IP for the node's virtual IP address (VIP)
  • 1 IP for interconnect communication - as you can see, this is in a totally independent subnet, separate from any other one
  • 2 IPs for the SCAN listener (a feature of the 11g clusterware) - you should have a DNS name for it too, for example with round-robin DNS resolution (see the quick check below). I will use "db.company.local" here.
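
Once the DNS administrator has created the SCAN name, a quick way to verify the round-robin resolution from any node (the name and the addresses are the example values from the tables above):
# nslookup db.company.local
The answer should contain both SCAN addresses (10.10.1.15 and 10.10.1.16), and repeated queries should rotate their order.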

I assigned 7 partitions for each node; let me explain them through the mount points:
  • boot - regular Linux partition for booting
  • swap - regular Linux swap partition - its size depends on the amount of memory
  • / - standard operating system files - Linux administrators like to create additional partitions; that's not a problem, let them create them if they want.
  • /u01 - the first partition that I recommend putting on the storage. This partition is not shared. It could be a standard Linux file system (for example ext4 or another Oracle-certified file system).
  • /u02 - shared file system, which will be used by all nodes. At least for APPLCSF, but it is useful for any other shared standard or non-standard Oracle EBS directories (for example DIRECTORY or LIBRARY objects).
  • /u04 - this is a temporary partition. I will use it when I create the intermediate single-node Oracle EBS database, which will be the source of the RAC database. Currently this intermediate database has to be created; only the long-awaited EBS 12.2 promises the possibility of installing Oracle EBS into a new database during the rapid install.
So let's begin the installation :)

Install the operating system on the 2 database nodes

I beg your pardon, but I won't describe the installation of the RHEL 5.9 operating system here. Please use the Oracle EBS installation guide, the 12.1.3 release notes (761566.1, 1080973.1, 1066312.1) and the 11g clusterware and database installation guides to collect the input data for the Linux administrator.

If you have to work together with a Linux administrator, give them this documentation for a correct initial installation (so they can install all the necessary rpm packages and set up the correct kernel configuration, network configuration and so on). Please help her/him with setting up the correct kernel attributes and so on.

Do not forget to install the latest oracleasm rpm packages, but don't configure them yet!

Create and configure all necessary storage partitions and mount points

Work together with the storage administrators to create all the necessary storage partitions. It could be helpful if you give them a diagram like the one in my previous post, together with the purpose and size of each partition.

After the partitions have been created, mount them on the database nodes and fix them in the fstab.
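
Just as an illustration, here is a minimal sketch of what the local /etc/fstab entries could look like (the device names are hypothetical; the entry for the shared /u02 file system depends on which shared or cluster file system you agree on with the storage administrators, so I do not show it here):
/dev/mapper/vg01-u01   /u01   ext3   defaults   1 2
/dev/mapper/vg01-u04   /u04   ext3   defaults   1 2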

Check the installed operating system and configure it for Oracle RAC and EBS database

Always perform this step. Never believe that the system is configured well - you should check it, even when you are the one who installed the operating system. It takes no more than 15 minutes, but it could save hours of unwanted debugging time.

Use the 761566.1 EBS release note, the rapid install guide and the clusterware installation prerequisites.
Don't forget: if you have another type of certified OS, you should of course use the corresponding EBS release note.

Perform the check steps on both servers!!

Check list

Log in with a unix user with root privileges.
1. Check LSB and distribution information
# lsb_release -id

2. Check memory parameters
# grep MemTotal /proc/meminfo 
# grep SwapTotal /proc/meminfo

3. Check all required rpm packages
Use these commands to check the availability of some critical commands:
# which ar
# which gcc
# which g++
# which ksh
# which ld
# which linux32
# which make
# which vncserver

Use this rpm query command for all packages in the release note:
# rpm -qa --qf "%{n}-%{v}-%{r}-%{arch}\n" | grep <rpm_short_name>
Be aware that you have to have all rpms in the required architecture version. If the documentation requires the i386 version, then it is not enough that the 64 bit version is already there - you should have the i386 version too!
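
For example, to check a package that is typically needed in both architectures (glibc-devel here is only an illustration, go through the full list from the release note):
# rpm -qa --qf "%{n}-%{v}-%{r}-%{arch}\n" | grep glibc-devel
Both an x86_64 and an i386 line should appear in the output; if only one of them does, install the missing architecture too.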

The release notes and installation guide documentation usually do not require an X server, but it is highly recommended to have one somewhere in the server park. During this post series I will use VNC on all nodes.

4. check the kernel parameters
Check the content of the /etc/sysctl.conf file, but you could use this command too:
# /sbin/sysctl -a | grep <pattern>

Or you could use commands like this one:
# cat /proc/sys/net/ipv4/ip_local_port_range

The values of the parameters strongly depend on your system requirements, but this sysctl.conf content is usually good enough even for a large system:
kernel.msgmnb = 65536
kernel.msgmni = 2878
kernel.msgmax = 65536
kernel.shmmax = 67576680448
kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.sem = 250 32000 100 142
fs.file-max = 6815744
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576 
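
After editing /etc/sysctl.conf you can activate the new values without a reboot:
# /sbin/sysctl -p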

5. check the content of limits.conf
This is a good sample of the content you should have in /etc/security/limits.conf:
* hard nofile 65536
* soft nofile 4096
* hard nproc 16384
* soft nproc 2047

oracle hard nofile 65536
oracle hard stack 32768
grid hard nofile 65536
grid hard stack 32768

Of course, if you create unix users other than the standard ones, you should put them into the limits.conf file too. I will use the oracle and grid users in this post series.

6. check the hosts file content
In my example these 2 file contents are good for the 2 nodes.
For db01 server:
10.10.1.10 db01.company.local db01
10.10.1.11 db02.company.local db02
10.10.1.12 db01vip.company.local db01vip
10.10.1.13 db02vip.company.local db02vip

192.168.1.1            dbic1
192.168.1.2            dbic2

#::1 localhost6.localdomain6 localhost6
127.0.0.1 localhost.localdomain localhost

For db02 server:
10.10.1.11 db02.company.local db02
10.10.1.10 db01.company.local db01
10.10.1.12 db01vip.company.local db01vip
10.10.1.13 db02vip.company.local db02vip

192.168.1.1            dbic1
192.168.1.2            dbic2

#::1 localhost6.localdomain6 localhost6
127.0.0.1 localhost.localdomain localhost

You can see that the db02 hosts file content differs only in the first 2 rows. Of course, the first row should always be the same as the server's own name (except if you use an alias).

Don't forget to comment out the localhost6 row. I have had many problems when IPv6 was installed and configured on Linux servers where we had to install Oracle RAC systems, so I will not use it now either.

7. check the host name with the hostname command.
The command result should be the host + domain name, so in my example on the first node: db01.company.local and on the second node: db02.company.local.

8. check the network file content.
The /etc/sysconfig/network file should be like this:
# less /etc/sysconfig/network

NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=db01.company.local
GATEWAY=10.10.1.1

9. check whether IPv6 is enabled or not. If yes, then disable it.
# chkconfig --list |grep ip6

If ip6tables is turned on at any runlevel, then turn it off:
# /etc/init.d/ip6tables stop
# chkconfig ip6tables off

Modify modprobe configuration
# vi /etc/modprobe.conf
alias net-pf-10 off
options ipv6 disable=1

10. check the Time Server / Network Time Protocol settings. The time server should be somewhere within the network, so I can only mention some example commands:
# ps -ef |grep ntpd
# less /etc/ntp.conf
# less /etc/sysconfig/ntpd
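
One general 11gR2 Grid Infrastructure expectation (not something specific to this post): if you use ntpd, it should run with the slewing option (-x), so /etc/sysconfig/ntpd usually contains a line like this:
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
If NTP is not configured at all, the clusterware will use its own CTSS service for time synchronization instead.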

11. check whether the 6078836 Oracle Linux patch is installed or not (on Oracle Enterprise Linux you don't have to do this)
NEVER begin an installation without this patch!
Check whether /usr/lib/libdb.so.2 is the same as in the 6078836 patch. If not, install it.

12. check the /usr/lib/libXtst.so.6 link; it should point to the /usr/X11R6/lib/libXtst.so.6.1 file.
If not, use these commands:
# unlink /usr/lib/libXtst.so.6
# ln -s /usr/X11R6/lib/libXtst.so.6.1 /usr/lib/libXtst.so.6

13. check the mount points. Check whether you have got all the mount points. Check the /etc/fstab file content too, just to be sure it contains all of them. If not, repair it!
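
A quick way to check both at once (adjust the mount point list to the given node):
# df -h /u01 /u02
# grep "/u0" /etc/fstab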

14. check the network adapters with the ifconfig command.

15. check that the oracleasm command is available under the /etc/init.d directory

16. turn off ssh timeout
# vi /etc/ssh/sshd_config

LoginGraceTime 0
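
After saving the change, reload the ssh daemon so the new value takes effect (your existing session is not affected):
# service sshd reload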

Don't forget to perform the above check steps on the second node too!!!

If you have made any changes, I recommend rebooting both servers now.

Create unix users and groups

I will create the oracle and grid users and all the Oracle-recommended unix groups.
First check whether they already exist:
# less /etc/group
# less /etc/passwd

If not, first create the groups, then the users:
# groupadd -g 1000 oinstall
# groupadd -g 1001 dba 
# groupadd -g 1002 oper
# groupadd -g 1020 asmadmin
# groupadd -g 1021 asmdba
# groupadd -g 1022 asmoper

# useradd -u 1100 -g oinstall -G asmadmin,asmdba grid
# useradd -u 1001 -g oinstall -G dba,asmdba,asmadmin,oper oracle

Use the passwd command to give them whatever passwords you want. Each user should have the same password on both nodes.
# passwd grid
# passwd oracle
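
You can quickly verify the group memberships with the id command; the output should contain the groups assigned above:
# id grid
# id oracle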

Of course, the group ids and user ids depend on local rules or your own practice.

Create stage area

Now create the stage area for the installation. I will use a common stage area, so I use the /u02 mount point.

Create the "body":
$ mkdir /u02/stage
$ mkdir /u02/stage/zipped
$ mkdir /u02/stage/unzipped
$ mkdir /u02/stage/iso


  • the zipped directory will contain all zip files
  • the unzipped directory will contain all unzipped files
  • the iso directory could contain any necessary iso files; for example, it is quite handy for storing the Linux installation iso file.
The above structure guarantees some order, so zipped and unzipped files will not end up in a bad mixture.

Create a db directory under both the zipped and the unzipped directory.
Copy the 11.2.0.3 zip files into the /u02/stage/zipped/db directory (p10404530_112030_Linux-x86-64_1of7.zip … 7of7.zip). Unzip the 1st, 2nd, 3rd and 6th zip files into /u02/stage/unzipped/db, as shown in the example below.
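
The unzip part could look like this (repeat the unzip command for the 2of7, 3of7 and 6of7 files as well):
$ cd /u02/stage/zipped/db
$ unzip -d /u02/stage/unzipped/db p10404530_112030_Linux-x86-64_1of7.zip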

Check the unzip output; there could be CRC errors. In that case recopy the bad zip file and unzip it again.

Now you can check the cvuqdisk rpm package:
# rpm -qi cvuqdisk

The cvuqdisk-1.0.9-1.rpm is a necessary package. Install it:
# rpm -ivh /u02/stage/unzipped/db/database/rpm/cvuqdisk-1.0.9-1.rpm
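
One note: the cvuqdisk rpm uses the oinstall group as the owner by default (at least in the versions I have seen); if you want a different owning group, set the CVUQDISK_GRP environment variable before running the rpm command above, for example:
# export CVUQDISK_GRP=oinstall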

Create ASM disk groups

Check the device configuration

If you don't use multipath, then this could be a much easier step. But here I assume that my Linux system is configured to use multipath. Consult the Linux administrator!

Perform the steps below on the first node (db01).

Check all devices:
# fdisk -l

Then make a small table for yourself like this one. It is quite useful later when you want to check the devices or when you want to add a new node to this RAC system.
DM device name   Size     Goal
/dev/dm-0        1 GB     RAC_DG ASM partition
/dev/dm-1        200 GB   DATA_DG ASM partition
/dev/dm-2        100 GB   ARCH_DG ASM partition

Why do I use 3 partitions? The first one will be used only for storing the cluster data, the second one for storing data files and the third one for storing archive log files.

Make a primary partition on each device and use all the cylinders on each (use fdisk or parted, see the example below).
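
One possible way to do it non-interactively with parted - only a sketch, the device name is an example from the table above, repeat it for all 3 devices, and be aware that mklabel destroys any existing partition table on the device (the ext2 keyword is just a partition type hint for parted, it does not create a file system):
# parted -s /dev/dm-1 mklabel msdos
# parted -s /dev/dm-1 mkpart primary ext2 0% 100%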

Reboot the first node (db01).

After the reboot check the multipath configuration; I assume the configuration below:
# ll /dev/mapper/mpath*

An example for the output of this command:
brw-rw---- 1 root disk 253, 0 Aug 10 11:10 /dev/mapper/mpath5
brw-rw---- 1 root disk 253, 5 Aug 10 11:10 /dev/mapper/mpath5p1
brw-rw---- 1 root disk 253, 1 Aug 10 11:10 /dev/mapper/mpath6
brw-rw---- 1 root disk 253, 3 Aug 10 11:10 /dev/mapper/mpath6p1
brw-rw---- 1 root disk 253, 2 Aug 10 11:10 /dev/mapper/mpath7
brw-rw---- 1 root disk 253, 4 Aug 10 11:10 /dev/mapper/mpath7p1

The mpath numbers depend on your storage and Linux configuration. Any number is fine for us.

Use the command below to see which mpath device belongs to which dm device:
# multipath -l

An example for the output of this command:
mpath7 (3600143801259c5750000400003130000) dm-2 HP,HSV360
[size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:4 sdc        8:32  [active][undef]
 \_ 2:0:0:4 sdi        8:128 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:1:4 sdf        8:80  [active][undef]
 \_ 2:0:1:4 sdl        8:176 [active][undef]
mpath6 (3600143801259c5750000400003210000) dm-1 HP,HSV360
[size=200G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:2 sdb        8:16  [active][undef]
 \_ 2:0:0:2 sdh        8:112 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:1:2 sde        8:64  [active][undef]
 \_ 2:0:1:2 sdk        8:160 [active][undef]
mpath5 (3600143801259c57500004000031b0000) dm-0 HP,HSV360
[size=10G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:1:1 sdd        8:48  [active][undef]
 \_ 2:0:1:1 sdj        8:144 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sda        8:0   [active][undef]
 \_ 2:0:0:1 sdg        8:96  [active][undef]

Now I can pair each dm device with its mpath device, so I can integrate this data into the table.
DM device name   Size     Goal                    Multipath device
/dev/dm-0        1 GB     RAC_DG ASM partition    /dev/mapper/mpath5p1
/dev/dm-1        200 GB   DATA_DG ASM partition   /dev/mapper/mpath6p1
/dev/dm-2        100 GB   ARCH_DG ASM partition   /dev/mapper/mpath7p1


Now reboot the second node (db02)

After rebooting check the multipath data on the second node with the same commands:
# ll /dev/mapper/mpath*
and
# multipath -l

Both commands should give back the same configuration data as on node one. If not, please consult the Linux and storage administrators. They should be the same!

Go back to the first node; now we can configure oracleasm and the ASM disk groups.

Configure ASM

On the first node configure ASM service.
# /etc/init.d/oracleasm configure
Default user: grid
Default group: asmadmin
Start Oracle ASM library driver on boot: Y
Scan for Oracle ASM disks on boot: Y

# /etc/init.d/oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes

Check whether there are already any labeled ASM disks:
# /etc/init.d/oracleasm listdisks

If the command shows any already existing ASM disk and you want to delete it, then use these commands:
# /etc/init.d/oracleasm stop
# /etc/init.d/oracleasm deletedisk /dev/mapper/<mpath partition name>
For example
# /etc/init.d/oracleasm deletedisk /dev/mapper/<mpath1p2>
Then
# /etc/init.d/oracleasm start
# /etc/init.d/oracleasm listdisks

Create disk groups

Create the 3 ASM disks with the createdisk command:
# /etc/init.d/oracleasm createdisk RAC01 /dev/mapper/mpath5p1
# /etc/init.d/oracleasm createdisk DATA01 /dev/mapper/mpath6p1
# /etc/init.d/oracleasm createdisk ARCH01 /dev/mapper/mpath7p1

Check again; you should see the 3 disk names now:
# /etc/init.d/oracleasm listdisks

Now go to the second node (db02) and configure the oracleasm service.
# /etc/init.d/oracleasm configure
Default user: grid
Default group: asmadmin
Start Oracle ASM library driver on boot: Y
Scan for Oracle ASM disks on boot: Y

# /etc/init.d/oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes

# /etc/init.d/oracleasm listdisks

You should see the 3 disk names on the second node too.

Scan order reconfiguration

If you don't use multipath, the default ASM scan order could be usable; in my case I had to change it.
Edit the oracleasm configuration file on both nodes:
# vi /etc/sysconfig/oracleasm
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"

Rescan the ASM disks:
# /etc/init.d/oracleasm scandisks

Install Oracle Clusterware 11g Release 2

Create the Oracle directory structure. On both nodes create these directories:
# mkdir -p /u01/app/product/11.2.0/grid
# chown -R grid:oinstall /u01/app/product
# chmod -R 775 /u01/app/product/11.2.0/grid

# mkdir -p  /u01/app/oracle
# chown -R oracle:oinstall /u01/app/oracle
# chmod -R 775 /u01/app/oracle

# chown root:oinstall /u01
# chmod 775 /u01
# chown root:oinstall /u01/app
# chmod 775 /u01/app
# chmod 775 /u01/app/product/11.2.0

Check on both nodes that you can log in through ssh with the oracle and grid users.
If not, check the ssh configuration (for example the /etc/ssh/sshd_config file) for any setting that prevents login with the oracle and grid users.

Do not continue until the ssh login works well!

Log in with the oracle and grid users and check with the touch command that both of them can write files in their new directories (see the example below).
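
For example, logged in as the oracle user on db01:
$ touch /u01/app/oracle/test.txt; rm /u01/app/oracle/test.txt
$ ssh oracle@db02 hostname
(and the same kind of test with the grid user in its own directory; at this point a password prompt during ssh is fine, the passwordless setup will be done later by the installer's SSH Connectivity step)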

Now start the clusterware installation. As I wrote earlier, I will use VNC for the installation, so I start a VNC server as the root user. (I will use it only until the end of the full installation.)

# vncserver :1
Give a strong password if it asks for one.

Connect with a vnc client to db01:1

In the default xterm window (it will be a root session!):
# xhost +
# su - grid

$ cd /u02/stage/unzipped/db/grid
$ ./runInstaller

Go through the wizard and use this example as a guide:
Skip software updates
Install and Configure Oracle Grid Infrastructure …
Advanced Installation
Languages
English
<any other language you need>
Grid Plug and Play
Cluster name: db
SCAN Name: db.company.local
SCAN Port: 1521
                Uncheck: Configure GNS
Cluster Node Configuration
Specify cluster configuration
Edit
Public Hostname: db01.company.local
Virtual Hostname: db01vip.company.local
Add
Public Hostname: db02.company.local
Virtual Hostname: db02vip.company.local
SSH Connectivity
Os Username: grid 
Os Password: <grid user password>
Setup
Test

After this wizard step, stop and run the Cluster Verification Utility in another xterm window. (Do not close the installation wizard window!!!)
# rpm -qa |grep cvuqdisk
# su - grid
$ /u02/stage/unzipped/db/grid/runcluvfy.sh stage -pre crsinst -n db01,db02 -verbose > cluvfy_crsinst.log

If everything looks good in the CVU log file then continue the installation in the wizard window.

Network Interface Usage
interface name, subnet, interface type
bond0 10.10.1.0 Public
bond1 192.168.1.0 Private
Storage Option
Oracle Automatic Storage Management (Oracle ASM)
Create ASM Disk Group
Disk Group Name: RAC_DG 
Redundancy: External
AU Size: 1 MB
Candidate Disk: ORCL:RAC01
ASM Password
Use same passwords for these accounts
password: <give what you want> (the warning message about the password can be ignored)
Failure Isolation
Do not use IPMI
Operating System Groups
Oracle ASM DBA: asmdba
Oracle ASM operator: asmoper
Oracle ASM Administrator: asmadmin
Specify Install Location
Oracle Base: /u01/app/grid
Software Location: /u01/app/product/11.2.0/grid
        Create Inventory
Inventory Directory: /u01/app/oraInventory
Perform Prerequisite Checks
Summary
Install

If you get an ASM-related PRVF-5150 warning message at the prerequisite check step, then use the 1210863.1 note and perform these manual steps on both nodes:
# /etc/init.d/oracleasm listdisks
# /usr/sbin/oracleasm configure
# id grid
# ls -l /dev/oracleasm/disks/
# dd if=/dev/oracleasm/disks/RAC01 of=/dev/null bs=1024k count=1
After that, ignore the warning and continue.

At the end run on both nodes as root user:
# /u01/app/oraInventory/orainstRoot.sh

Run root.sh on both nodes as the root user, first on db01 then on db02.
# /u01/app/product/11.2.0/grid/root.sh

Carefully check the root.sh outputs. If you have any problems, investigate them. Sometimes the problem disappears if you just rerun root.sh. :)

If you have run root.sh only on the first node, then before rerunning it run this deconfig command:
# /u01/app/product/11.2.0/grid/perl/bin/perl -I/u01/app/product/11.2.0/grid/crs/install -I/u01/app/product/11.2.0/grid/perl/lib /u01/app/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force -lastnode

If you have run root.sh on both nodes, run these commands.
First on the second node (db02):
# /u01/app/product/11.2.0/grid/perl/bin/perl -I/u01/app/product/11.2.0/grid/crs/install -I/u01/app/product/11.2.0/grid/perl/lib /u01/app/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force

Then on the first (and last) node (db01)
# /u01/app/product/11.2.0/grid/perl/bin/perl -I/u01/app/product/11.2.0/grid/crs/install -I/u01/app/product/11.2.0/grid/perl/lib /u01/app/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force -lastnode

Ignore any error or warning messages while running the deconfig command.

Add the remaining ASM disk groups

If root.sh finally ran successfully on both nodes and the installation wizard has finished, then add the remaining 2 ASM disk groups.

On db01
# su - grid
$ . oraenv 
+ASM1
$ asmca 

A Java window will appear. Use the functions and the example data below:
Disk Groups
Create
Disk Group Name: DATA_DG
Redundancy: External
choose ORCL:DATA01 
Disk Groups
Create
Disk Group Name: ARCH_DG
Redundancy: External
choose ORCL:ARCH01
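
When asmca has finished, you could verify the result from the same grid session:
$ asmcmd lsdg
All 3 disk groups (RAC_DG, DATA_DG and ARCH_DG) should be listed with MOUNTED state.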

Install the latest PSU patches

Now install the latest PSU patches. You can get them from the Oracle Support site. Never skip this step!!!
