PowerCenter High Availability & Grid

Author: Mercy Stanley
Presented by Balaji Rajagopalan and Vivek Kulshreshtha, May 2016

Agenda

• High Availability
  • Node Process
  • Administrator Console
  • Repository Service
  • Integration Service
  • Data Integration Service
  • Model Repository Service
• Recovery
  • Repository Service
  • Integration Service
  • Sessions
  • Data Integration Service
• GRID
  • Integration Service
  • Data Integration Service

#INFA16

PowerCenter High Availability Overview

• The Informatica PowerCenter High Availability Option provides high availability for all PowerCenter components, seamless failover and recovery of stopped or interrupted work, and simplified setup and management through a web-based administration console.

• PowerCenter HA relies on the underlying IT infrastructure to achieve end-to-end HA, that is, a highly available database, file system, network, and hardware servers.

• High availability is not by itself a disaster recovery solution, but you can use high availability features to implement one.

PowerCenter High Availability Benefits

• Resilience. Highly available systems can tolerate temporary connection failures until a timeout period expires or the failure is resolved. The system tries to reconnect for a specified period of time. If the failure is resolved within that period, end-user activity is not interrupted.

• Restart and failover. In highly available systems, when a machine becomes unavailable, processes running on it can be restarted on the same machine or on a backup machine. By allowing processes to restart on the same machine or fail over to another machine, the system minimizes or eliminates the downtime caused by the failure and maximizes operational uptime.
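The resilience behavior described above (keep retrying until the failure clears or a timeout expires) can be sketched as a small retry loop. This is a minimal illustration, not PowerCenter code: `call_with_resilience`, `resilience_timeout`, and `retry_interval` are hypothetical names, `resilience_timeout` only loosely mirrors the resilience timeout that PowerCenter services expose, and the clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_resilience(
    operation: Callable[[], T],
    resilience_timeout: float,
    retry_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry `operation` on ConnectionError until it
    succeeds or waiting again would exceed the resilience timeout."""
    deadline = clock() + resilience_timeout
    while True:
        try:
            return operation()
        except ConnectionError:
            # Give up only when one more wait would pass the deadline.
            if clock() + retry_interval > deadline:
                raise
            sleep(retry_interval)


if __name__ == "__main__":
    failures = iter([True, True, False])  # fail twice, then succeed

    def flaky_connect() -> str:
        if next(failures):
            raise ConnectionError("transient network error")
        return "connected"

    print(call_with_resilience(flaky_connect, resilience_timeout=10, retry_interval=0.01))
```

If the failure outlasts the timeout, the last `ConnectionError` propagates to the caller, which matches the idea that end-user activity is interrupted only when the failure is not resolved in time.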

Benefits Continued…

• Recovery. In highly available systems, an interrupted service can complete its operations after it is restarted. A service may be stateful, that is, it periodically records its state of operation in a shared location. When a failure occurs, the system must retrieve the state of the affected service so that it can automatically restart or recover jobs that terminated abnormally.
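One common way to make a service stateful is to write a checkpoint to the shared location after each unit of work and read it back on restart. The sketch below assumes a JSON file per job in a shared directory; `save_checkpoint`, `load_checkpoint`, and `run_job` are hypothetical, and PowerCenter's own recovery files use their own internal format. The write-then-rename step keeps a crash from leaving a half-written checkpoint.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def save_checkpoint(state_dir: Path, job_id: str, state: dict) -> None:
    """Atomically persist a job's state in the shared state directory."""
    state_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it over
    # the old checkpoint, so readers never see a partially written file.
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
    os.replace(tmp_path, state_dir / f"{job_id}.json")


def load_checkpoint(state_dir: Path, job_id: str) -> Optional[dict]:
    path = state_dir / f"{job_id}.json"
    return json.loads(path.read_text()) if path.exists() else None


def run_job(state_dir: Path, job_id: str, steps: List[str],
            crash_before: Optional[str] = None) -> List[str]:
    """Run steps in order, skipping any recorded as completed by a previous run."""
    state = load_checkpoint(state_dir, job_id) or {"completed": []}
    for step in steps:
        if step in state["completed"]:
            continue  # finished before the failure; do not redo it
        if step == crash_before:
            raise RuntimeError(f"simulated crash before {step}")
        # ... the real work for `step` would happen here ...
        state["completed"].append(step)
        save_checkpoint(state_dir, job_id, state)
    return state["completed"]
```

After a crash, calling `run_job` again with the same `job_id` picks up at the first step that was not checkpointed, which only works if every node that might run the job sees the same directory.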

PowerCenter HA Capabilities

• Service failover from primary to backup services and nodes
  • Automatically ensures service availability on primary or backup servers if the primary server fails.

• Session and workflow recovery from checkpoints
  • Automatic or configurable recovery for sessions affected by service failures.

• Resiliency to network and external failures
  • Automatic reconnect within resilience timeout constraints handles transient network errors and connection failures.

Capabilities Continued…

• Centralized web-based configuration and administration
  • Ability to create, manage, and monitor a high availability configuration of all PowerCenter services through a web-based administration console.
  • Enables administrators to visually identify single points of failure within the Informatica environment.

Achieving PowerCenter HA

• Core Services Availability
  • At least 2 nodes configured with core services for failover
• Application Services Availability
  • At least 2 nodes configured as primary and backup for services
• Informatica Services (Tomcat / Service Manager)
  • Configure the service to restart automatically if it terminates unexpectedly
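The "at least 2 nodes" rules above lend themselves to a simple configuration check that flags single points of failure, similar in spirit to what the administration console shows visually. The sketch assumes a hypothetical, simplified model of the domain (a list of gateway node names plus a primary and backup list per service); it does not read real domain metadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceConfig:
    name: str
    primary_node: str
    backup_nodes: List[str] = field(default_factory=list)


def single_points_of_failure(gateway_nodes: List[str],
                             services: List[ServiceConfig]) -> List[str]:
    """Return human-readable findings for the hypothetical config model."""
    findings = []
    # Core services can only fail over if another gateway node exists.
    if len(set(gateway_nodes)) < 2:
        findings.append("domain: fewer than 2 gateway nodes for core services")
    for svc in services:
        # A "backup" that is the primary node itself adds no redundancy.
        if not [n for n in svc.backup_nodes if n != svc.primary_node]:
            findings.append(f"{svc.name}: no backup node configured")
    return findings


if __name__ == "__main__":
    services = [
        ServiceConfig("Repository Service", "node1", ["node2"]),
        ServiceConfig("Integration Service", "node2"),
    ]
    for finding in single_points_of_failure(["node1"], services):
        print(finding)
```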

Achieving HA Continued…

• External Systems Availability
  • For the Repository and Source/Target/Lookup databases to be highly available, use highly available versions of databases (e.g. Oracle RAC, IBM DB2).
  • Use highly available FTP servers and message queues.
  • Configure the network to be highly available.
  • A shared directory is needed for configuration, log files, and storage (which stores state for session and workflow recovery).
  • The shared directory can be on an HA file system (e.g. Veritas Cluster File System, IBM GPFS) to remove a single point of failure.

Achieving HA Continued…

• Before 9.6, there was a hard dependency on a cluster file system (CFS), and HA was certified against only a handful of them. The CFS was required for failover and recovery, where split-brain scenarios must be avoided.

• From 9.6, HA persistence is augmented to support database-based persistence in addition to the existing CFS-based persistence.

• By default, database-based persistence uses the repository database of the Repository Service associated with the Integration Service.

• HA on Windows is supported from 9.6. It was previously unsupported because no CFS was certified for Windows.
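To see why a shared database can stand in for a cluster file system here, consider a lease: a row that names the node currently allowed to act, with an expiry time. Only one node can hold an unexpired lease, which is the property needed to avoid split-brain. The sketch below uses SQLite from the Python standard library purely for illustration; the table layout and the `try_acquire` function are hypothetical and are not how PowerCenter stores its state in the repository database.

```python
import sqlite3


def init_lease_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lease ("
        " name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn: sqlite3.Connection, name: str, owner: str,
                now: float, ttl: float) -> bool:
    """Take or renew the lease if it is unclaimed, expired, or already ours."""
    with conn:  # one transaction: committed on success, rolled back on error
        # Create the lease row the first time anyone asks for it.
        conn.execute(
            "INSERT OR IGNORE INTO lease (name, owner, expires_at) VALUES (?, ?, ?)",
            (name, owner, now + ttl),
        )
        # Take it over only if we already own it or the holder let it expire.
        conn.execute(
            "UPDATE lease SET owner = ?, expires_at = ? "
            "WHERE name = ? AND (owner = ? OR expires_at < ?)",
            (owner, now + ttl, name, owner, now),
        )
        row = conn.execute(
            "SELECT owner FROM lease WHERE name = ?", (name,)
        ).fetchone()
    return row[0] == owner


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    init_lease_table(db)
    print(try_acquire(db, "master", "node1", now=0, ttl=30))   # node1 takes the lease
    print(try_acquire(db, "master", "node2", now=10, ttl=30))  # still held by node1
    print(try_acquire(db, "master", "node2", now=31, ttl=30))  # node1 expired; node2 takes over
```

A real design also needs the holder to stop working as soon as it fails to renew before expiry; otherwise a partitioned old holder and the new one could still act at the same time.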

Achieving HA Continued…

• Key underlying components of a PowerCenter High Availability solution:
  • Highly available database
  • Redundant network
  • Highly available clustered file system (optional starting from 9.6)

General Recommendations:
  • Network > 1 GB
  • HA CFS with heartbeat and failover
  • Redundant network
  • Actual configuration depends on environment and SLA requirements.

[Diagram: reference architecture. The application stack (presentation layer, domain, Integration #1 and Repository services) runs over a network of redundant physical elements; an HA database holds the Informatica repositories (domain, global, and project repositories Project1 to Project4); a cluster file system manager, I/O fencing solution, and cluster volume manager sit on a SAN of redundant physical elements with coordinator/data LUNs 1 to n.]

DR with PowerCenter

• Incorporating PowerCenter into disaster recovery solutions:
  • The primary data center should be configured with PowerCenter HA, including the underlying HA infrastructure.
  • Backup Informatica nodes and services are configured in passive (cold standby) mode.
  • External systems are actively replicated across data centers by their respective vendors.

[Diagram: a primary data center with Node 1 and Node 2 and a backup data center with Node 3 and Node 4. In each pair, one node runs the master gateway service, primary Repository Service, and backup Integration Service, and the other runs the backup gateway service, backup Repository Service, and primary Integration Service. Each data center has a Veritas Cluster File System at /usr/informatica/infa_shared; the primary site has the repository DB on SAN, and the backup site has a standby repository DB and a mirror SAN.]

DR with PowerCenter

• Backup Informatica nodes and services become active only when the primary data center goes down and replication of the required data has completed.
• Requires scripting or integration with third-party operations management tools.

[Diagram: the same primary/backup data center topology, with Node 1 and Node 2 in the primary data center and Node 3 and Node 4 in the backup data center, a Veritas Cluster File System at /usr/informatica/infa_shared on each side, the repository DB on SAN, a standby repository DB, and a mirror SAN.]
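Because activating the backup data center is left to scripting or third-party tools, the core of such a script is a guard that refuses to activate until both conditions on the slide hold. The sketch below is hypothetical: `SiteStatus` and its fields stand in for whatever monitoring and replication checks a site actually has, and requiring several consecutive failed health checks is an added assumption meant to avoid failing over on a brief outage.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    primary_reachable: bool          # latest health check of the primary data center
    consecutive_failed_checks: int   # how many health checks in a row have failed
    replication_complete: bool       # required data fully replicated to the backup site


def should_activate_backup(status: SiteStatus, required_failed_checks: int = 3) -> bool:
    """Activate the passive (cold standby) site only when the primary is
    confirmed down and replication of the required data has completed."""
    if status.primary_reachable:
        return False
    if status.consecutive_failed_checks < required_failed_checks:
        return False  # may be a transient outage; keep waiting
    return status.replication_complete
```

What the script does once this returns `True` (starting the services on Node 3 and Node 4, repointing clients, and so on) is site-specific and not shown.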

High Availability Scenarios Detailed Walkthrough

Failure & HA

• Multiple gateway nodes and services are configured for failover.

[Diagram: a domain with two gateway nodes, one of them the master, running the Administrator console, core services, Log Services, and application services, backed by the domain DB.]

Failure & HA

• Multiple gateway nodes and services are configured for failover. When the master gateway node fails, the other gateway node becomes the master.
• The Administrator console and core services fail over.
• Application services fail over if they are configured with a backup.

[Diagram: the domain after the master gateway node fails; the surviving gateway node is now the master, running the Administrator console, core services, and application services, backed by the domain DB.]
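The failover rules on this slide can be expressed as a small function over a simplified model of the domain: the master role moves to a surviving gateway node, and each application service running on the failed node moves to the first configured node that is still available, or stays down if it has no backup. This is a hypothetical sketch; in particular, picking the first surviving gateway is an assumed election rule, not how the Service Manager chooses the new master.

```python
from typing import Dict, List, Optional, Tuple


def fail_over(
    master: str,
    gateway_nodes: List[str],
    failed_node: str,
    service_nodes: Dict[str, List[str]],  # service -> [primary, backup, ...]
    placement: Dict[str, str],            # service -> node currently running it
) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    """Return the new master gateway and where each service now runs (None = down)."""
    new_master: Optional[str] = master
    if failed_node == master:
        survivors = [n for n in gateway_nodes if n != failed_node]
        new_master = survivors[0] if survivors else None  # assumed election rule
    new_placement: Dict[str, Optional[str]] = {}
    for service, node in placement.items():
        if node != failed_node:
            new_placement[service] = node  # unaffected by this failure
            continue
        # Only services with a backup node configured can fail over.
        candidates = [n for n in service_nodes[service] if n != failed_node]
        new_placement[service] = candidates[0] if candidates else None
    return new_master, new_placement
```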

Service Failover…

• Services are configured with a primary and a backup node. The service process runs on only one node; no process runs on the backup node.

[Diagram: a domain with a primary node (gateway) running application services and a backup node with no process running.]

Service Failover…

• Services are configured with a primary and a backup node. The service process runs on only one node; no process runs on the backup node.
• When the service fails on the primary node, the domain tries to restart the process on the same node, up to the Maximum Restart Attempts configured at the domain. If the service does not come up on the primary node, the process is started on the backup node.

[Diagram: a domain with a primary node and a backup node, each shown with application services.]
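The restart-then-fail-over sequence can be sketched as follows. `start_with_failover` and the `start_process` callback are hypothetical; `max_restart_attempts` plays the role of the domain's Maximum Restart Attempts setting, and the time window that the Restart Period adds in the product is deliberately left out to keep the sketch short.

```python
from typing import Callable, List, Optional


def start_with_failover(
    start_process: Callable[[str], bool],  # tries to start the service on a node
    primary: str,
    backups: List[str],
    max_restart_attempts: int,
) -> Optional[str]:
    """Return the node where the service is running, or None if every attempt failed."""
    # First, try to bring the process back on the primary node.
    for _ in range(max_restart_attempts):
        if start_process(primary):
            return primary
    # The primary did not come up: start the process on a backup node instead.
    for node in backups:
        if start_process(node):
            return node
    return None


if __name__ == "__main__":
    attempts: List[str] = []

    def start(node: str) -> bool:
        attempts.append(node)
        return node == "node2"  # pretend the primary, node1, keeps failing

    print(start_with_failover(start, "node1", ["node2"], max_restart_attempts=3))
    print(attempts)
```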

Grid & Failover…

• An Integration Service on a grid runs in active-active mode.
• Processes run on all the nodes configured on the grid.

[Diagram: a grid of Node 1 and Node 2 (gateway), each running an Integration Service process.]

Grid & Failover…

• When a non-master Integration Service process fails, the node process restarts it on that node, up to the Max Restart Attempts. If the process does not start, it remains failed and has to be restarted manually. Jobs are not dispatched to the failed node.

[Diagram: a grid of Node 1 (master) and Node 2 (gateway), each running an Integration Service process.]
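The rule that jobs are not dispatched to a failed node can be illustrated with a round-robin dispatcher that skips nodes marked as failed until they are restarted. Round-robin is assumed here only for simplicity; `GridDispatcher` is a hypothetical class and does not reproduce the Load Balancer's actual dispatch logic.

```python
import itertools
from typing import Iterable, Set


class GridDispatcher:
    """Hand out grid nodes in turn, skipping nodes whose process has failed."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes = list(nodes)
        self.failed: Set[str] = set()
        self._cycle = itertools.cycle(self.nodes)

    def mark_failed(self, node: str) -> None:
        self.failed.add(node)

    def mark_restarted(self, node: str) -> None:
        # For example, after an administrator restarts the process manually.
        self.failed.discard(node)

    def dispatch(self) -> str:
        # At most one full pass over the grid looking for a healthy node.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node not in self.failed:
                return node
        raise RuntimeError("no running Integration Service process in the grid")
```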

Grid & Failover…

• When the master fails, the node process restarts the process on that node, up to the Max Restart Attempts.
• If the process does not start, it remains failed and has to be restarted manually. The other node in the grid becomes the master Integration Service, and the new master handles any new workflow requests.

[Diagram: a grid of Node 1 and Node 2 (gateway), with the master Integration Service role moving from Node 1 to Node 2.]

HA for MRS/DIS (New in 9.6)

• High Availability: resiliency and failover platform capability
  • Failover (primary/backup) support for the Model Repository Service (MRS) and Data Integration Service (DIS)
  • Resiliency between services to temporary network glitches, in support of the failover use case

• What does this mean for IDS?
  • JDBC/ODBC clients automatically reconnect in case of a DIS/MRS failover; the connection is not dropped.
  • SQL queries in progress fail, and the user has to reissue them.
  • Not supported for web services: requests fail in case of a failover.
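Since the connection survives a DIS/MRS failover but an in-flight query does not, a client application can catch the failure and submit the query again. In the sketch below, `QueryInterruptedByFailover` and the `execute` callable are hypothetical stand-ins for whatever error and call your JDBC/ODBC client layer exposes; reissuing automatically is only safe for read-only or otherwise idempotent statements.

```python
from typing import Callable, TypeVar

R = TypeVar("R")


class QueryInterruptedByFailover(Exception):
    """Hypothetical error for a query that was running when the DIS failed over."""


def run_with_reissue(execute: Callable[[str], R], sql: str, max_reissues: int = 1) -> R:
    """Run `sql`, resubmitting it if a failover interrupts it mid-flight.

    The client connection itself is assumed to reconnect on its own,
    so only the statement needs to be sent again.
    """
    reissues = 0
    while True:
        try:
            return execute(sql)
        except QueryInterruptedByFailover:
            if reissues >= max_reissues:
                raise
            reissues += 1
```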

DIS Grid…

• The DIS can be configured to run on a grid from 9.5.
  • Similar to the Integration Service grid, with one master DIS.
  • The master DIS has write access to the MRS; non-master DIS processes have only read access.

• A request can reach any node in the grid, depending on the type of request.

• Connection-based and round-robin are the two load balancers available. (Web services and SQL data services use connection-based load balancing.)

• Failover works the same way as for the Integration Service.
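The two load-balancing modes can be contrasted with a short sketch. It assumes an interpretation that the slide does not spell out: round-robin sends each request to the next node, while connection-based balancing picks a node once per client connection and keeps every request on that connection on the same node. Both classes are hypothetical and are not the DIS implementation.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Each request goes to the next DIS node in turn."""

    def __init__(self, nodes: List[str]) -> None:
        self._next = itertools.cycle(nodes)

    def route(self, connection_id: str) -> str:
        return next(self._next)  # the connection is ignored


class ConnectionBasedBalancer:
    """A node is chosen when a connection first appears; later requests stick to it."""

    def __init__(self, nodes: List[str]) -> None:
        self._next = itertools.cycle(nodes)
        self._pinned: Dict[str, str] = {}

    def route(self, connection_id: str) -> str:
        if connection_id not in self._pinned:
            self._pinned[connection_id] = next(self._next)
        return self._pinned[connection_id]
```

Pinning suits connection-oriented clients such as SQL data service connections, because all statements on one connection are then served by the same DIS process.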

Recovery

Recovery…

• Repository Service
  • After failover, the Repository Service restores the state of each client connection it had at the time of the failover.
  • Recovery is enabled only with the High Availability option.
  • Without HA, service failover still happens as defined by the Max Restart Attempts or Restart Period, but old connections and requests are not recovered.

Recovery…

• Integration Service
  • After failover, the Integration Service reconnects to the Repository Service.
  • Enable HA Recovery must be set at the workflow level to recover workflows that were running during the failover.
  • If HA recovery is not enabled, the workflow remains terminated.
  • Scheduled workflows are rescheduled automatically.
  • To recover sessions automatically, set Enable HA Recovery for the workflow and choose the Recovery Strategy at the session level.
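The decision rules on this slide can be summarized as a small function. The three strategy names are modeled on the session recovery strategies PowerCenter offers (fail task and continue workflow, restart task, resume from last checkpoint); the function itself, its parameters, and the returned descriptions are hypothetical.

```python
from enum import Enum


class RecoveryStrategy(Enum):
    FAIL_TASK_AND_CONTINUE = "Fail task and continue workflow"
    RESTART_TASK = "Restart task"
    RESUME_FROM_LAST_CHECKPOINT = "Resume from last checkpoint"


def outcome_after_failover(enable_ha_recovery: bool, strategy: RecoveryStrategy) -> str:
    """Describe what happens to a workflow that was running during a failover."""
    if not enable_ha_recovery:
        # Without workflow-level HA recovery the interrupted run is not recovered.
        return "workflow remains terminated"
    if strategy is RecoveryStrategy.RESTART_TASK:
        return "session restarted"
    if strategy is RecoveryStrategy.RESUME_FROM_LAST_CHECKPOINT:
        return "session resumed from last checkpoint"
    return "session failed, workflow continues"
```

Rescheduling of scheduled workflows happens independently of these settings and is not modeled here.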

Recovery Continued…

• $PMStorageDir stores the intermediate state information of workflows and sessions, so this directory must be common across all nodes in the grid.
• A $PMStorageDir that is not set consistently across the grid results in unexpected behavior.

• Recovery on DIS
  • Workflows marked for automatic recovery are recovered automatically when the DIS restarts, if they were terminated because the DIS crashed.
  • The DIS restarts the workflow from the beginning of the task that was executing and continues with the rest of the workflow.
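The DIS recovery rule (rerun the interrupted task from its beginning, then continue) can be sketched for a simple linear workflow. The sketch assumes the names of completed tasks were recorded in order before the crash; `recover_workflow` and its parameters are hypothetical.

```python
from typing import Callable, List


def recover_workflow(
    tasks: List[str],                 # tasks in execution order (linear workflow assumed)
    completed: List[str],             # tasks recorded as finished before the crash
    run_task: Callable[[str], None],
    auto_recovery: bool,
) -> List[str]:
    """Rerun the interrupted task from its start, then the rest; return what ran."""
    if not auto_recovery:
        return []  # not marked for automatic recovery: nothing is rerun
    # The first task not recorded as completed was executing when the DIS crashed.
    # Any partial progress inside it is discarded; it starts over from the beginning.
    remaining = tasks[len(completed):]
    for task in remaining:
        run_task(task)
    return remaining
```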

Questions?

Informatica Global Customer Support

User Groups

Informatica User Groups are a great way for you to invest in your professional development and learn about new Informatica offerings.

• Local Chapter Leaders manage each IUG, online and via in-person meetings.
• Network and socialize
• Discover how colleagues and peers use Informatica
• Find and share content, best practices, and tips
• Learn about the latest technologies and solutions from Informatica

https://network.informatica.com/welcome/

LEARN MORE AT IW16: Go to the Solutions Expo Informatica Pavilion / Ecosystem & Innovation Area:
• Talk to regional user group leaders
• Learn about meeting plans
• Join your regional user group
• When: Monday 6:00pm – 8:30pm, Tuesday 10:45am – 2:15pm, Wednesday 10:30am – 1:45pm
• Where: Moscone West Hall Level One