PowerCenter High Availability & Grid
Presented by Balaji Rajagopalan and Vivek Kulshreshtha
May 2016
Agenda
• High Availability
  • Node Process • Administrator Console • Repository Service • Integration Service • Data Integration Service • Model Repository Service
• Recovery
  • Repository Service • Integration Service • Sessions • Data Integration Service
• GRID
  • Integration Service • Data Integration Service
#INFA16
PowerCenter High Availability Overview • The Informatica PowerCenter High Availability Option provides high availability of all PowerCenter components, seamless failover and recovery of stopped or interrupted work, and simplified setup and management through a web-based administration console.
• PowerCenter HA relies on the underlying IT infrastructure to achieve end-to-end HA, i.e. a highly available database, file system, network, and hardware servers.
• High availability is not by itself a solution for disaster recovery, but high availability features can be used to implement a disaster recovery solution.
PowerCenter High Availability Benefits • Resilience. Highly available systems can tolerate temporary connection failures until a timeout period expires or the failure is resolved. The system tries to reconnect for a specified period of time. If the failure is resolved, there is no interruption in end user activity.
• Restart and failover. In highly available systems, when a machine becomes unavailable, processes running on the machine can be restarted on the same machine or on a backup machine. By allowing processes to restart on the same machine or fail over to another machine, the system minimizes or eliminates the downtime due to the failure and maximizes the system operational time.
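The resilience behavior described above can be pictured as a retry loop bounded by a timeout. The following is a minimal sketch, not Informatica code: `connect_with_resilience`, `resilience_timeout`, and `retry_interval` are hypothetical names, and the clock and sleep functions are injectable only so the example can run without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_resilience(
    connect: Callable[[], T],
    resilience_timeout: float,
    retry_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry connect() until it succeeds or the timeout period expires.

    All names are hypothetical; this only mirrors the idea that a
    resilient client keeps trying to reconnect for a limited time.
    """
    deadline = clock() + resilience_timeout
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up once the resilience timeout has expired.
            if clock() >= deadline:
                raise
            sleep(retry_interval)
```

If the failure is resolved before the deadline, the caller simply receives the connection and sees no interruption; otherwise the original error is raised.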
Benefits Continued … • Recovery. In highly available systems, an interrupted service can complete its operations after it is restarted. A service may be stateful—that is, it records its state of operation in a shared location periodically. When a failure occurs, the system must retrieve the state of the affected service so that it can automatically restart or recover jobs that have terminated abnormally.
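To illustrate what "records its state of operation in a shared location periodically" can look like, here is a minimal sketch under stated assumptions. The JSON state file, the `rows_done` counter, and all function names are hypothetical and are not how PowerCenter stores its state; the point is only that a restarted process reads the last recorded state and continues from there.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write the state via a temporary file so a crash never leaves a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # rename within the same directory


def load_state(path: str) -> dict:
    """Return the last recorded state, or an empty state if none exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def process_rows(rows: list, path: str, fail_at=None) -> list:
    """Process rows, recording progress after each one.

    fail_at simulates an abnormal termination just before that row index.
    """
    done = load_state(path).get("rows_done", 0)
    processed = []
    for index in range(done, len(rows)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("simulated failure")
        processed.append(rows[index].upper())
        save_state(path, {"rows_done": index + 1})
    return processed
```

A second call to `process_rows` with the same state file, for example from a process started on another node that sees the same shared directory, picks up after the last recorded row instead of starting over.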
PowerCenter HA Capabilities • Service failover from primary to backup services and nodes. • Automatically ensures service availability on primary or backup servers should the primary server fail.
• Session and workflow recovery from checkpoints. • Automatic or configurable recovery for sessions affected by service failures.
• Resiliency to network and external failures. • Automatic re-connect within resilience timeout constraints handles transient network errors and connection failures.
Capabilities Continued …
• Centralized web-based configuration and administration • Ability to create, manage and monitor a high availability configuration of all PowerCenter services through a web-based administration console.
• Enables administrators to visually identify single points of failure within the Informatica environment.
Achieving PowerCenter HA
• Core Services Availability • At least 2 nodes configured with core services for failover
• Application Services Availability • At least 2 nodes configured as primary and backup for services
• Informatica Services (Tomcat – Service Manager) • Configure the service to restart automatically if it terminates unexpectedly
Achieving HA Continued…
• External Systems Availability • For the Repository and Source/Target/Lookup databases to be highly available, use highly available versions of databases (e.g. Oracle RAC, IBM DB2).
• Use highly available FTP servers and Message Queues.
• Configure the network to be highly available.
• A shared directory is needed for config files, log files, and storage (which stores state for session and workflow recovery).
• The shared directory can be on an HA file system (e.g. VERITAS Cluster File System, IBM GPFS) to remove a point of failure.
Achieving HA Continued… • Before 9.6, there was a hard dependency on a cluster file system, and HA was certified against only a handful of them. The dependency exists for the failover and recovery aspects, where split-brain scenarios must be avoided.
• From 9.6, this is augmented to support database-based persistence in addition to the current Cluster File System (CFS) based persistence.
• By default, it uses the Repository database from the Repository Service associated with the Integration Service.
• Windows HA is supported from 9.6. It was previously unsupported because there was no certified CFS for Windows.
Achieving HA Continued… • Key underlying components to achieve a PowerCenter High Availability solution are:
• Highly Available Database
• Redundant Network
• Highly Available Clustered file system (Optional starting from 9.6)
General Recommendations:
• Network 1 GB or greater
• HA CFS with Heartbeat and Failover
• Redundant Network
Actual configuration depends on environment and SLA requirements.
[Figure: layered reference architecture. Presentation Layer; Application Stack (Domain, Integration #1, Repository); Network (redundant physical elements); HA Database holding the Informatica repositories (Domain, Global, Project1 to Project4); Cluster File System Manager with I/O Fencing Solution; Cluster Volume Manager; SAN (redundant physical elements) with Coordinator and Data LUNs 1 to n.]
DR with PowerCenter • Incorporating PowerCenter into Disaster Recovery Solutions:
• Primary Data Center should be configured with PowerCenter HA, including the underlying HA infrastructure.
• Backup Informatica Nodes & Services are configured in passive (cold standby) mode.
• External Systems are actively replicated across Data Centers by the respective vendors.
[Figure: Primary Data Center with Node 1 (Master Gateway Service, Primary Repository Service, Backup Integration Service) and Node 2 (Backup Gateway Service, Backup Repository Service, Primary Integration Service), a Veritas Cluster File System at /usr/informatica/infa_shared, SAN, and Repository DB. Backup Data Center with Node 3 and Node 4 carrying the same service layout as Node 1 and Node 2, a Veritas Cluster File System at /usr/informatica/infa_shared, Mirror SAN, and Standby Repository DB.]
DR with PowerCenter • Backup Informatica Nodes & Services become active only when the Primary Data Center goes down and replication of required data has been completed.
• Requires scripting / integration with 3rd party operation management tools.
[Figure: Primary Data Center (Node 1, Node 2, SAN, Repository DB) and Backup Data Center (Node 3, Node 4, Mirror SAN, Standby Repository DB), each with a Veritas Cluster File System at /usr/informatica/infa_shared. Nodes 1 and 3 show Master Gateway Service, Primary Repository Service, Backup Integration Service; Nodes 2 and 4 show Backup Gateway Service, Backup Repository Service, Primary Integration Service.]
High Availability Scenarios Detailed Walkthrough
Failure & HA • Multiple Gateway nodes and services configured for failover.
[Figure: a domain with a Master Gateway node and a second Gateway node running the Administrator, Core Services, Log Services, and Application Services, backed by the Domain DB.]
Failure & HA • Multiple Gateway nodes and services configured for failover.
• The Master Gateway node fails and the other gateway node becomes the master.
• The Administrator Console and Core Services fail over.
• Application Services fail over if configured with a backup.
[Figure: the domain after the failure, with the surviving Gateway node shown as Master and running the Administrator, Core Services, and Application Services, backed by the Domain DB.]
Service Failover… • Services configured for Primary and Backup.
• Process running on only one node and no process running on the Backup Node.
[Figure: a domain with a Gateway, a Primary Node running Application Services, and a Backup Node where Application Services has no process.]
Service Failover… • Services configured for Primary and Backup.
• Process running on only one node and no process running on the Backup Node.
• The service fails on the Primary Node and the Domain tries to restart the process on the same node, based on the Maximum Restart Attempts configured at the domain.
• If the service does not come up on the Primary Node, the process is started on the Backup Node.
[Figure: a domain with a Gateway, a Primary Node, and a Backup Node, each showing Application Services.]
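The restart-then-failover sequence on this slide can be sketched as a small decision function. This is an illustration only: `start_with_failover` and the `start_process` callback are hypothetical, and the real domain logic (restart periods, multiple backup nodes) is not modeled.

```python
from typing import Callable, Optional


def start_with_failover(
    start_process: Callable[[str], bool],
    primary: str,
    backup: Optional[str],
    max_restart_attempts: int,
) -> Optional[str]:
    """Return the node the service ends up running on, or None.

    start_process(node) is a hypothetical callback that returns True when
    the service process starts successfully on that node.
    """
    # First try to restart on the primary node, up to the configured limit.
    for _ in range(max_restart_attempts):
        if start_process(primary):
            return primary
    # Restart attempts exhausted: start the process on the backup node.
    if backup is not None and start_process(backup):
        return backup
    return None
```

With a limit of 3 and a primary node that never comes back, the function makes three attempts on the primary node and then one on the backup node.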
Grid & Failover… • Integration Service on Grid runs in Active-Active mode.
• Processes run on all the nodes configured on the Grid.
[Figure: a GRID with a Gateway and an Integration Service process on Node 1 and on Node 2.]
Grid & Failover… • When a Non-Master fails, the node process restarts the process on the node, depending on the Max Restart Attempt.
• If the process does not start, it remains failed and has to be manually restarted. Jobs will not be dispatched to the failed node.
[Figure: a GRID with a Gateway, Node 1 running the Master Integration Service, and Node 2 running an Integration Service.]
Grid & Failover… • When the Master fails, the node process restarts the process on the node, depending on the Max Restart Attempt.
• If the process does not start, it remains failed and has to be manually restarted.
• The other node in the Grid becomes the master IS, and any new workflow request is handled by the new master.
[Figure: a GRID with a Gateway, with the Master Integration Service label shown on Node 1 and on Node 2.]
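The grid behavior on the last two slides (a failed non-master is skipped for dispatch, a failed master is replaced by another node) can be modeled with a toy class. This is a sketch under assumptions: the `Grid` class is hypothetical, and picking the first remaining node as the new master is an assumption made for the example, not the product's election rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Grid:
    """Toy model of an Integration Service grid (hypothetical names)."""

    nodes: List[str]
    failed: set = field(default_factory=set)
    master: Optional[str] = None

    def __post_init__(self) -> None:
        self.master = self.nodes[0]

    def running(self) -> List[str]:
        return [node for node in self.nodes if node not in self.failed]

    def mark_failed(self, node: str) -> None:
        """Record a process that did not come back after its restart attempts."""
        self.failed.add(node)
        if node == self.master:
            # Assumption: the first remaining node takes over as master.
            remaining = self.running()
            self.master = remaining[0] if remaining else None

    def dispatch_targets(self) -> List[str]:
        """Nodes that may receive jobs; failed nodes are excluded."""
        return self.running()
```

Losing a non-master node only shrinks the list of dispatch targets, while losing the master also changes which node handles new workflow requests.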
HA for MRS/DIS (New in 9.6) • High Availability: Resiliency & Failover Platform Capability
  • Failover (Primary/Backup) support for MRS and DIS
  • Resiliency between services to temporary network glitches, in support of the failover use case
• What does this mean for IDS?
  • JDBC/ODBC clients automatically reconnect in case of DIS/MRS failover; the connection is not dropped
  • SQL queries that are in progress will fail, and the user has to re-issue the query
  • Not supported for Web Services: requests will fail in case of a failover
DIS Grid…
• DIS can be configured to run on a grid from 9.5.
• Similar to Integration Service Grid. Has one Master DIS.
• The Master DIS has write access to MRS and a non-Master DIS has only read access.
• A request can reach any of the nodes in the GRID, depending on the type of request.
• Connection-based and Round-Robin are the two load balancers available. (Web service or SQL DS use connection-based load balancing.)
• Failover works similarly to the Integration Service.
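To make the difference between the two load balancers concrete, the sketch below contrasts them. It is an interpretation, not the DIS implementation: both classes are hypothetical, and the rule that a new connection goes to the node with the fewest connections is an assumption chosen for the example.

```python
import itertools
from typing import Dict, Iterator, List


class RoundRobinBalancer:
    """Send each request to the next node in turn."""

    def __init__(self, nodes: List[str]) -> None:
        self._cycle: Iterator[str] = itertools.cycle(nodes)

    def route(self, request_id: str) -> str:
        return next(self._cycle)


class ConnectionBasedBalancer:
    """Pin every request on a connection to the node chosen for it.

    Assumption: a connection is assigned to the node with the fewest
    connections when it is first seen; the real product may differ.
    """

    def __init__(self, nodes: List[str]) -> None:
        self._load: Dict[str, int] = {node: 0 for node in nodes}
        self._assigned: Dict[str, str] = {}

    def route(self, connection_id: str) -> str:
        if connection_id not in self._assigned:
            node = min(self._load, key=self._load.get)
            self._load[node] += 1
            self._assigned[connection_id] = node
        return self._assigned[connection_id]
```

Round-robin suits independent requests, whereas connection-based routing keeps all traffic for one client connection on one node, which fits session-oriented clients such as web service or SQL data service consumers.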
Recovery
Recovery…
• Repository Service.
• After failover, the Repository Service restores the connection state of each client connection it had at the time of the failover.
• Recovery is enabled only with the High Availability option.
• Without HA, service failover still occurs as defined by the Max Restart Attempts or Restart Period, but there is no recovery of old connections or requests.
Recovery… • Integration Service.
• After failover, the Integration Service reconnects to the Repository Service.
• Enable HA Recovery has to be set at the workflow level to recover the workflows that were running during the failover.
• If HA Recovery is not enabled, the workflow remains terminated.
• Scheduled workflows are rescheduled automatically.
• For sessions to be recovered automatically, set Enable HA Recovery for the workflow and choose the Recovery Strategy at the session level.
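The rules on this slide amount to a small decision table. The sketch below encodes them as a function; the `Workflow` class, its field names, and the returned labels are hypothetical and serve only to summarize the bullets, not to describe an Informatica API.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical stand-in for the workflow settings named on the slide."""

    name: str
    enable_ha_recovery: bool = False
    scheduled: bool = False


def state_after_failover(workflow: Workflow, was_running: bool) -> str:
    """Return what happens to a workflow after Integration Service failover."""
    if was_running:
        # A running workflow is recovered only if HA recovery is enabled.
        return "recovered" if workflow.enable_ha_recovery else "terminated"
    if workflow.scheduled:
        # Scheduled workflows are rescheduled automatically.
        return "rescheduled"
    return "unchanged"
```

Session-level recovery strategy is deliberately left out here; it would add a second setting that only matters once the workflow itself has recovery enabled.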
Recovery Continued… • $PMStorageDir is used to store the intermediate state information of workflows and sessions, and hence this directory should be common across the nodes in the grid.
• Incorrect $PMStorageDir across the grid would result in unexpected behavior.
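One way to catch a mismatched storage directory early is to compare the value each node resolves for $PMStorageDir. The sketch below assumes those values have already been collected into a dictionary; how they are gathered is out of scope, the function names are hypothetical, and the example paths are made up. Matching path strings do not prove that the nodes mount the same shared file system, so this is only a first check.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List


def group_nodes_by_storage_dir(storage_dirs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group node names by the normalized path each node uses for storage."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node, path in sorted(storage_dirs.items()):
        # normpath removes trailing slashes and redundant separators.
        groups[posixpath.normpath(path)].append(node)
    return dict(groups)


def storage_dir_is_consistent(storage_dirs: Dict[str, str]) -> bool:
    """True when every node points at the same storage directory."""
    return len(group_nodes_by_storage_dir(storage_dirs)) <= 1
```

When the check fails, the grouped result shows which nodes disagree, which is usually enough to locate the misconfigured node.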
• Recovery on DIS
• Automatic recovery, on restart of the DIS, for workflows marked for auto recovery, if they were terminated because the DIS had crashed
• Restarts the workflow from the beginning of the task that was previously executing and continues with the rest of the workflow
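The DIS recovery rule (restart from the beginning of the interrupted task, then continue) can be sketched as follows. The `run_workflow` function and the representation of a workflow as a list of named callables are hypothetical simplifications; a real workflow is a graph rather than a flat list.

```python
from typing import Callable, List, Optional, Tuple


def run_workflow(
    tasks: List[Tuple[str, Callable[[], None]]],
    interrupted_task: Optional[str] = None,
) -> List[str]:
    """Run tasks in order and return the names of the tasks executed.

    If interrupted_task is given, earlier tasks are skipped and that task
    is restarted from its beginning, followed by the rest of the workflow.
    """
    names = [name for name, _ in tasks]
    start = names.index(interrupted_task) if interrupted_task else 0
    executed = []
    for name, action in tasks[start:]:
        action()
        executed.append(name)
    return executed
```

Because the interrupted task is rerun from its start rather than from a point inside it, the task itself should tolerate being executed twice.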
Questions?? Informatica Global Customer Support
User Groups
Informatica User Groups are a great way for you to invest in your professional development and learn about new Informatica offerings.
• Local Chapter Leaders manage each IUG online and via in-person meetings
• Network and socialize
• Discover how colleagues and peers use Informatica
• Find and share content, best practices & tips
• Learn about the latest technologies and solutions from Informatica
https://network.informatica.com/welcome/
LEARN MORE AT IW16: Go to the Solutions Expo Informatica Pavilion / Ecosystem & Innovation Area:
• Talk to regional user group leaders
• Learn about meeting plans
• Join your regional user group
• When: Monday 6:00pm – 8:30pm, Tuesday 10:45am – 2:15pm, Wednesday 10:30am – 1:45pm
• Where: Moscone West Hall Level One