What is RAC
Single-instance DB: one database served by one instance
RAC DB: one database served by many instances
One instance runs on one server
n instances run on n servers
The database gets the combined processing power and memory of n servers: throughput and scalability
High availability: no single point of failure among the given n servers
Instances can join and leave dynamically
Instances share common data files and control files
Each instance has its own rollback segments, undo tablespace, and thread of redo log files
Instances can simultaneously run transactions against the database
Instances communicate with each other through Cache Fusion technology
Background Processes
LMSn (Lock Manager Services) - Global Cache Service Processes
The number of LMSn processes depends on the messaging traffic among nodes for buffer block movement
Copies blocks (read-consistent or current) from the holding instance's buffer cache to the requesting instance's buffer cache
These processes consume a lot of CPU time
Minimum one LMS process per instance; in 10.2 and above up to 36 LMSn processes are possible (LMS0-LMS9, LMSa-LMSz)
The number of LMS processes can be specified by gcs_server_processes in 10.1 and above; the default value is 1 (single-CPU system)
Implements the local part of the GRD
Which blocks are handled by which LMSn background process is decided by a hash function, and a given block is always handled by one and the same LMSn process
The number of blocks served by LMSn is available in v$cr_block_server
The internal view is x$kjmsdp
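As a sketch, the LMS workload can be inspected from SQL*Plus on a running RAC instance (SYSDBA access is assumed; the column list below is a subset of v$cr_block_server):

```shell
# Inspect the configured LMS process count and the block-serving statistics.
# Requires a running RAC instance and SYSDBA credentials (assumed here).
sqlplus -s / as sysdba <<'EOF'
SHOW PARAMETER gcs_server_processes

SELECT cr_requests, current_requests, data_requests
FROM   v$cr_block_server;
EOF
```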
Background Processes
LCKn – lock process or enqueue process, used for managing locks; acts as a lock-requesting agent on behalf of an instance
Controlled by the parameter gc_lck_procs in 9.0.1 and below
Assists the LMS processes
Manages instance resource requests
Manages cross-instance call operations
By default only one lock process is started
A maximum of 10 lock processes is possible
All instances should have the same number of lock processes
These processes communicate with the LMD process of the remote instance for locks requested by user processes on another instance; the LCK processes belong to the requesting instance only
Based on the priority of the request, lock requests are either buffered at the source instance and sent together, or sent directly to the other instance
Background Processes
LMDn – Lock Manager Daemon – Global Enqueue Service (GES) Daemon
One per instance
Manages the requests for global enqueues, updating their status when granted or revoked from an instance
It performs global deadlock detection
Monitors lock-conversion timeouts
Manages global enqueues and global resource access
Handles remote enqueue requests (resource requests from another instance)
The internal view is x$kjmddp
Background Processes
LMON – Lock Monitor – Global Enqueue Service Monitor
One LMON process per instance
Monitors the cluster to maintain global enqueues and resources
Handles abnormal termination of both instances and processes
Reconfiguration of locks and resources when an instance joins or leaves the cluster is also handled
Manages the recovery process for cluster enqueues
Every 10 minutes this process executes dynamic lock remastering
Each LMON monitors its instance for deadlock detection on library cache locks and dictionary cache locks
Background Processes
DIAG: Diagnosability Process
Creates diagnostic data in the background_dump_dest location in the event of instance or process failure
Monitors the health of the instance
Also checks for instance hangs and deadlocks
Understanding RAC: VIP
Introduced in Oracle 10.1
All applications connect using the VIP; it is used by the listener.
Each node is defined with a VIP
VIP failover between nodes: if a node fails, its VIP is automatically relocated to another node (check with ifconfig on the new node for the VIP address); thus multiple VIPs may exist on one node during failover.
Applies only to newly connecting sessions
The VIP is configured through the VIPCA utility
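A quick way to see where the VIPs currently live (the node name "node1" and the interface "eth0" are placeholders for your environment):

```shell
# Status of the node applications (VIP, listener, GSD, ONS) for one node.
srvctl status nodeapps -n node1

# After a failover, the relocated VIP shows up as an extra alias on the
# public interface of the surviving node.
/sbin/ifconfig -a | grep -A 1 eth0
```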
Understanding RAC: Logs
Logs are created in the $ORA_CRS_HOME/log/<nodename> directory
Logs: alert<node>.log: the Oracle Clusterware alert log
Client log files of the OCR utilities such as OCRCHECK, OCRCONFIG, OCRDUMP, and of CSS, CLSCFG, OIFCFG
crsd/crsd.log, cssd/cssd.log, evmd/evmd.log, racg/ (log files of VIP and ONS)
Understanding RAC: heartbeats
CSS maintains Heartbeats
Heartbeats are of two types
Network
Runs across interconnect
Disk
Runs at voting disk
It has an internal I/O timeout, in seconds
Can be specified using the disktimeout parameter in 10.2.0.2 and above
It is the maximum time allowed for a voting-file I/O to complete
If exceeded, the voting file is marked offline
The default is 200 seconds
crsctl is used to get and set this timeout:
crsctl get css disktimeout
crsctl set css disktimeout <time in seconds>
Understanding RAC: Dynamic Performance Views
Each V$ view has a corresponding GV$ view
Each GV$ view has an additional column, INST_ID
PZnn processes (0-99) are used to return data from remote instances
catclust.sql is used to create the views
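For example, a cluster-wide instance overview can be taken from gv$instance; the INST_ID column identifies the originating instance (SYSDBA access is assumed):

```shell
sqlplus -s / as sysdba <<'EOF'
-- One row per instance; INST_ID distinguishes the instances.
SELECT inst_id, instance_name, host_name, status
FROM   gv$instance
ORDER  BY inst_id;
EOF
```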
Understanding RAC: Segmented Arrays
Arrays of objects (segments): used for objects too large to fit in a granule; arrays may grow dynamically
Objects using segmented arrays are enqueues (locks), enqueue resources (resources), transactions, and transaction branches
The external view is x$ksmdd
RAC segmented arrays: GCS resources, GES enqueues, GCS shadows
Components
Components: VD and OCR
VD: Voting disk: member-node information is stored in it
OCR: Oracle Cluster Registry: cluster configuration information is stored in it
Both are stored on shared storage
The interconnect is a network between the nodes which only the nodes will access
The interconnect can be redundant and can be on the same network
UDP over Gigabit Ethernet is recommended for the interconnect; crossover cables are not supported
Oracle Clusterware manages node membership and prevents split-brain syndrome, where each node tries to manage the database on its own due to a communication failure in the interconnect
Voting Disk
The voting disk needs to be on a shared disk
At least three voting disks are recommended by Oracle for high availability
If a single voting disk is configured, external mirroring is required
A maximum of 32 voting disks is allowed
The cluster uses the voting disk to determine which instances are members of the cluster
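The voting disk configuration can be listed and changed with crsctl (the device path below is a placeholder; in 10.2 the add/delete commands normally require the -force flag and are run with CRS stopped):

```shell
# List the configured voting disks.
crsctl query css votedisk

# Add or remove a voting disk (placeholder path).
crsctl add css votedisk /dev/raw/raw3 -force
crsctl delete css votedisk /dev/raw/raw3 -force
```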
OCR
The OCR is used to store and manage the components that Oracle Clusterware controls, such as databases, instances, services, listeners, VIPs, and any other applications. The OCR repository stores this information in the form of a key-value directory tree structure
It must reside on a shared disk accessible by all nodes
It is recommended to multiplex the OCR
Updatable through DBCA, SRVCTL, and OEM
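The OCR location, integrity, and automatic backups can be checked as follows:

```shell
# Verify OCR integrity, version, and configured locations.
ocrcheck

# List the automatic OCR backups taken by the clusterware.
ocrconfig -showbackup
```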
Services: Node membership Service
CSS: Cluster Synchronization Services: manages cluster node membership and notifies members of node joins and exits.
Maintains the heartbeat; the voting disk stores the current status of each node
CSS provides node membership, group access, and basic cluster locking
UNIX/Linux processes of CSS: ocssd, ocssd.bin, init.cssd
Windows process of CSS: ocssd.exe; the service is OracleCSService
Runs as the administrator user
Also works with non-RAC systems
Can integrate with third-party cluster products
Resource Mgt Service
CRS: Cluster Ready Services: the crsd process manages cluster resources such as databases, instances, services, listeners, VIPs, and application processes.
CRS manages (starts, stops, monitors, and fails over) resources based on the resource configuration information stored in the OCR
CRS generates events when a resource status changes
CRS restarts components when a failure occurs: the CRS process makes three attempts to start the Oracle Notification Service (ONS), one attempt to start an Oracle database, and five attempts to restart other database components. The CRS process does not attempt to restart the VIP.
Linux/UNIX process of CRS: crsd.bin
Windows process of CRS: crsd.exe; the service name is OracleCRSService
Runs as the administrator user
Maintains state information in the OCR (Oracle Cluster Registry)
It is automatically restarted on failure
Event Mgt Service
EVM: Event Manager: publishes events created by CRS
Linux/UNIX processes of EVM: evmd, evmd.bin, evmlogger
Windows processes of EVM: evmlogger.exe, evmd.exe; the service name is OracleEVMService
Runs as the administrator user
Generates messages when events occur, which can be accessed for programmatic purposes
Starts the racgevt thread
FAN Event Services
RACG: extends clusterware support for complex resources; runs server callout scripts when a FAN event occurs
Linux/UNIX processes for RACG: racgmain, racgimon
Windows processes for RACG: racgmain.exe, racgimon.exe
Runs as processes or threads, not as a service
Node application Services
These are a standard set of application services automatically launched by RAC:
VIP
Oracle Net listener
GSD
ONS
They run on each node and can be relocated to other nodes
Global Service Daemon
This daemon executes srvctl commands: it receives tasks such as startup and shutdown from srvctl, executes them locally on each node, and sends the results back to srvctl
This daemon is installed on the nodes by default and should not be killed
GSD launches internal services
GCS: Global Cache Service: manages Cache Fusion and in-memory data buffers
GES: Global Enqueue Service: manages inter-instance locking and RAC recovery
GSD is externally controlled, whereas GCS and GES are OS threads, and these threads manage the GRD (Global Resource Directory: a set of virtual tables in memory)
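For example (the database name "orcl", instance "orcl1", and node "node1" are placeholders), typical srvctl tasks that GSD carries out on each node look like this:

```shell
# Status of a RAC database and its instances.
srvctl status database -d orcl

# Start one instance; stop the node applications on one node.
srvctl start instance -d orcl -i orcl1
srvctl stop nodeapps -n node1
```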
Listener
Listens for incoming client requests; it is the server-side component of Oracle Net.
It manages traffic by receiving client requests for network sessions with the server and brokering those requests
Cache Fusion
A diskless cache coherency mechanism in Real Application Clusters that provides copies of blocks directly from a holding instance's memory cache to a requesting instance's memory cache.
Cache Fusion addresses several types of concurrency as described under the following headings:
- Concurrent Reads on Multiple Nodes
- Concurrent Reads and Writes on Different Nodes
- Concurrent Writes on Different Nodes
Concurrent Reads on Multiple Nodes
Concurrent reads on multiple nodes occur when two instances need to read the same data block. Real Application Clusters resolves this situation without synchronization because multiple instances can share data blocks for read access without cache coherency conflicts.
Concurrent Reads and Writes on Different Nodes
A read request from an instance for a block that was modified by another instance and not yet written to disk can be a request for either the current version of the block or for a read-consistent version. In either case, the Global Cache Service Processes (LMSn) transfer the block from the holding instance's cache to the requesting instance's cache over the interconnect.
Concurrent Writes on Different Nodes
Concurrent writes on different nodes occur when the same data block is modified frequently by different instances. In such cases, the holding instance completes its work on the data block after receiving a request for the block. The GCS then converts the resources on the block to be globally managed and the LMSn processes transfer a copy of the block to the cache of the requesting instance. The main features of this processing are:
- The Global Cache Service (GCS) tracks each version of a data block, and each version is referred to as a past image (PI). In the event of a failure, Oracle can reconstruct the current version of a block by using the information in a PI.
- The cache-to-cache data transfer is done through the high speed IPC interconnect, thus eliminating disk I/O.
- Cache Fusion limits the number of context switches because of the reduced sequence of round trip messages. Reducing the number of context switches enables greater cache coherency protocol efficiency. The database writer (DBWn) processes are not involved in Cache Fusion block transfers.
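Cache Fusion traffic can be observed per instance through the global-cache statistics (SYSDBA access is assumed; the statistic names below are the 10g ones):

```shell
sqlplus -s / as sysdba <<'EOF'
-- Blocks received from other instances over the interconnect.
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received', 'gc current blocks received')
ORDER  BY inst_id, name;
EOF
```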
Notification Services
ONS: Oracle Notification Service: Transmits FAN (Fast Application
Notification) Events. Introduced in 10.1 and is key component of FAN
JDBC depends on ONS to propagate DB events
Windows process for ONS: ons.exe
Linux/Unix process for ONS: ons
The onsctl command is used to start, stop, reconfigure, and monitor the ONS daemon
racgons is used to configure the server-side ONS configuration
The ONS configuration is controlled by the ons.config file in the CRS_HOME/opmn/conf folder
For every FAN event status change, all executables in the ORA_CRS_HOME/racg/usrco folder are launched
It extends the functionality of the event manager daemon to push event messages to multiple targets (other RAC nodes)
Utilized to extend RAC high availability and load balancing to mid-tier applications
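The ONS daemon can be checked and managed with onsctl, for example:

```shell
# Is the local ONS daemon up?
onsctl ping

# Start it, or re-read ons.config after editing it.
onsctl start
onsctl reconfig
```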
Global Cache Service
It is the controlling service that implements Cache Fusion
Manages the status of data blocks and the transfer of data blocks across the buffer caches of all instances
It is integrated with the buffer cache manager and does fast lookups of resource information in the GRD
Maintains the block mode for blocks in the global role
It employs various background processes such as
LMSn: Global Cache Service processes
LMD: Global Enqueue Service Daemon
LMON: Global Enqueue Service Monitor
LMON is the background process that monitors the entire cluster to manage global resources
Manages instance and process expiration
It performs recovery for the GCS
It also handles the part of recovery associated with global resources
The services provided by LMON are called Cluster Group Services
Its actual name is Lock Monitor, or Enqueue Service Monitor; since it manages both local and global enqueues, it is also called the Global Enqueue Service Monitor
Enqueues are shared memory structures that serialize access to database resources. Enqueues are local to the instance on a single-instance database and global to the database on RAC.
GRD: Global Resource Directory
These are data structures which reside in memory, are associated with global resources, and are distributed across all instances/nodes in the cluster
It is the GCS and GES which maintain the GRD; these services record information about resources and enqueues held globally
Each node manages its part of the global resources in the GRD
Process Monitor Daemon
OPROCD: Oracle Process Monitor Daemon: this process is locked in memory to monitor the cluster and provide I/O fencing. It performs its check, then sleeps and wakes up within an expected time. If it does not wake up within the expected time, OPROCD resets the processor and reboots the node. If OPROCD fails to reboot the node, the clusterware will do it.
On Linux releases before 10.2.0.4, the hangcheck-timer kernel module is used instead
OPROCD takes a -t timeout parameter (in milliseconds; the default is 1000 milliseconds) and an -m margin parameter (the acceptable margin before rebooting; the default is 500 milliseconds)
Parameters for changing the OPROCD values are in init.cssd in the /etc/init.d folder; the parameters are OPROCD_DEFAULT_TIMEOUT=<N milliseconds> and OPROCD_DEFAULT_MARGIN=<N milliseconds>
Linux (on 10.2.0.4 and above)/UNIX process of OPROCD: oprocd
Windows Service for OPROCD: OraFenceService