Leveraging Public Clouds for Financial Services Workloads

A Comprehensive Capital Analysis and Review (CCAR) Case Study


Overview

Financial institutions face ever-increasing, and uneven, demands for computationally based analysis. With the growth of available data, advanced algorithms, competitive pressures, government regulations, and shrinking deadlines, the analysts and IT organizations within these institutions are struggling to meet these demands. Monte Carlo simulation, back-testing of algorithms, and government-mandated analysis all share the challenge of acquiring computational resources for workloads that may be large, time-sensitive, and difficult to forecast.

This case study highlights one example of a financial workload and how it was moved to a public cloud. Specifically, we describe the CCAR regulatory analysis that motivated the project; we review the technical and organizational challenges associated with migration to the cloud, including security concerns, relationship to existing processes, costs, technical experience, and vendor choices; and we summarize the rewards of leveraging new approaches and using the cloud to resolve these problems. The rewards include delivering faster response to the business, improving overall operating efficiency, and driving improved business practices. Based on this initial success, several other time-sensitive workloads were migrated to a public cloud, thus enabling the organization to be more responsive to customers and stakeholders.

What is CCAR and the Regulatory Framework?

After the global financial crisis of 2007-2009, the Federal Reserve (Fed) developed a series of tests to assess the financial strength of key institutions. That is, the Federal Reserve requires selected banks and other financial institutions in the U.S. to perform annual stress tests known as the Comprehensive Capital Analysis and Review (CCAR). The CCAR analysis comprises several macroeconomic scenarios with a two-year time horizon. These include variations in employment rates, interest rate assumptions, stock market performance, and more. The institution must revalue its assets and liabilities under each scenario and prepare pro forma financial statements (on a quarterly basis over the two-year projection horizon) for each stress-test scenario. CCAR results inform capital management decisions, including dividend payments and share buybacks. Regulators have the authority to modify or veto capital actions if an institution’s CCAR results, or process, are unsatisfactory. Therefore, having confidence in the actual testing being performed, being able to reproduce results, and delivering results on time are critical business requirements.
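To make the shape of the computation concrete, the following Python sketch mirrors the scenario-by-quarter projection loop described above. The scenario names and numbers and the toy revalue() model are illustrative assumptions for exposition, not the Fed's published scenario definitions or any institution's actual model.

    # Minimal sketch of the CCAR projection loop: each macroeconomic
    # scenario is applied quarter by quarter over the two-year horizon,
    # and the balance sheet is revalued at each point. All numbers here
    # are illustrative assumptions.

    SCENARIOS = {
        "baseline":         {"equity_shock": 0.00,  "rate_shift": 0.000},
        "adverse":          {"equity_shock": -0.25, "rate_shift": -0.010},
        "severely_adverse": {"equity_shock": -0.50, "rate_shift": -0.020},
    }
    QUARTERS = 8  # quarterly statements over a two-year projection horizon

    def revalue(value, scenario, quarter):
        """Toy revaluation: phase the shocks in linearly over the horizon.
        A real model reprices every asset and liability individually."""
        t = (quarter + 1) / QUARTERS
        equity_effect = 1.0 + scenario["equity_shock"] * t
        rate_effect = 1.0 - 5.0 * scenario["rate_shift"] * t  # crude duration-style proxy
        return value * equity_effect * rate_effect

    def run_ccar(initial_value):
        """Pro forma value for each quarter of each stress-test scenario."""
        return {name: [revalue(initial_value, s, q) for q in range(QUARTERS)]
                for name, s in SCENARIOS.items()}

    for name, path in run_ccar(10e9).items():
        print(f"{name}: quarter-8 value = {path[-1]:,.0f}")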


Case Study Background

A major financial institution learned that it was in scope for the CCAR regulations. This had a potentially large impact on the organization because these regulatory requirements would govern capital management actions, including share repurchases, dividend payments to shareholders, and liquidity requirements. A key part of the regulatory requirement would be the running of simulations of various market conditions.

One of the institution’s business lines is variable annuities, which are designed to provide retirement income for life for retail policyholders. Variable annuities (VAs) are complex, path-dependent financial instruments that are valued via Monte Carlo simulation. These simulations are resource-intensive and are used as part of the regular business process. The addition of the CCAR stress testing would further strain internal resources.

While this case study focuses on a specific CCAR example, it is important to note that other financial analyses, such as those for mortgage-backed securities (MBS), are broadly similar. VAs and MBS are both classes of financial products issued by prospectus; the cash flows of both depend on purchasers’ behavior and/or the history of changes in interest rates and other market rates; and both are computationally demanding. Finally, VAs and MBS may each represent tens of billions of dollars of financial exposure. The methodologies discussed in this paper can be applied to these and other examples.
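As an illustration of why these valuations are resource-intensive, the sketch below prices a hypothetical path-dependent VA guarantee (a ratcheting minimum benefit) by Monte Carlo. The dynamics, parameters, and product terms are assumptions chosen for exposition, not the institution's production model.

    # Minimal Monte Carlo sketch of a path-dependent VA guarantee.
    import numpy as np

    def value_va_guarantee(n_paths=100_000, n_steps=96,  # monthly over 8 years
                           s0=100.0, r=0.03, sigma=0.20, horizon=8.0,
                           seed=42):
        rng = np.random.default_rng(seed)
        dt = horizon / n_steps
        # Simulate account-value paths under geometric Brownian motion.
        z = rng.standard_normal((n_paths, n_steps))
        log_paths = np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1)
        paths = s0 * np.exp(log_paths)
        # Path dependence: the guarantee ratchets up to the highest
        # observed account value, so the whole path matters, not just
        # the endpoint.
        guarantee = np.maximum(s0, paths.max(axis=1))
        payoff = np.maximum(guarantee - paths[:, -1], 0.0)
        return np.exp(-r * horizon) * payoff.mean()

    print(f"Guarantee value per 100 of account: {value_va_guarantee():.2f}")

Even this toy version simulates millions of path steps for a single policy; a production portfolio multiplies that by hundreds of thousands of policies and by every CCAR scenario.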

Business Need: Acquire Additional Compute Resources

Performing CCAR stress tests for variable annuities is a large-scale computational problem. The drivers of demand for compute power include the following:

• Portfolio size: VAs represent tens of billions of dollars of balance sheet exposure.
• Portfolio complexity: VAs are long-dated, path-dependent financial instruments.
• Model complexity: Valuation and risk models are based on large-scale Monte Carlo simulation. Increasing model resolution increases run time.
• Compressed timelines: Shorter reporting deadlines require higher throughput.

From a computational resource perspective, the CCAR stress test results are submitted to the Federal Reserve annually. Compute demand spikes for a few weeks per year while the CCAR computations are being performed and reverts to baseline levels afterwards. Two alternatives for providing the additional compute power were identified:

• Expand the internal resources; or
• Migrate the CCAR workload to a public cloud.

The advantages and disadvantages of each alternative are listed in Table 1 below.


Alternative: Expand internal resources
Advantages:
• Consistent with current practice (no changes to policies/procedures).
• Experience with previous data center expansions (low implementation risk).
Disadvantages / Issues:
• Low utilization rates outside of the CCAR window (peak demand about 5x average demand).
• Considerable initial capital expense.

Alternative: Migrate CCAR workload to public cloud
Advantages:
• Capacity readily available via public cloud.
• Compute expense based on actual usage (operating expense).
Disadvantages / Issues:
• Extend corporate policies to cover cloud computing, including data security.
• Need to port application and data feeds to external environment.
• Lack of skill sets to design, build and operate cloud environment.

Table 1. Alternatives for providing compute capacity for CCAR.
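A back-of-the-envelope calculation shows why the utilization issue dominates. Only the 5x peak-to-average ratio comes from Table 1; the core counts, the four-week CCAR window, and the per-core-hour price below are illustrative assumptions.

    # Rough comparison of the two alternatives in Table 1.
    avg_cores = 1_000                 # assumed steady-state internal demand
    peak_cores = 5 * avg_cores        # peak demand ~5x average (Table 1)
    ccar_weeks = 4                    # assumed length of the CCAR spike

    # Option 1: buy servers for the peak; the extra capacity idles the
    # rest of the year.
    extra_cores = peak_cores - avg_cores
    utilization_of_extra = ccar_weeks / 52
    print(f"Utilization of peak-only capacity: {utilization_of_extra:.0%}")

    # Option 2: rent the burst capacity only while CCAR runs.
    core_hour_rate = 0.05             # assumed cloud price per core-hour
    burst_core_hours = extra_cores * ccar_weeks * 7 * 24
    print(f"Cloud burst cost: ${burst_core_hours * core_hour_rate:,.0f}")

Under these assumptions the purchased burst capacity would sit at roughly 8% utilization, while the cloud converts the same capacity into a usage-based operating expense.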

Decision: Run on Public Cloud

After thorough review, the organization decided to run the CCAR workloads for variable annuities in a public cloud environment. The key trade-off driving the decision was the cost of underutilizing an expanded internal environment versus the challenges of transferring the application to the cloud. The organization concluded that the cloud challenges could be mitigated with planning, training, and partners, whereas the cost of acquiring additional servers and their subsequent underutilization would be a permanent impact. Table 2 below lists the key challenges of the cloud-based approach and highlights the resolution for each issue.


Issue: Extend corporate policies to cover cloud computing, including data security.
Resolution:
• Use secure transfer for communication between the internal data center and the cloud.
• Use a virtual private cloud for the cloud environment.
• Use the cloud provider’s identity management tools to manage access rights.
• Leverage the cloud provider’s capabilities and certifications (e.g., ISO 27001, ISO 27017, ISO 27018).
• Note: the institution’s “cloud policy” was not finalized when the CCAR computations were first performed on the cloud. However, the costs to build and operate the cloud environment were modest, enabling the institution to “learn by doing.”

Issue: Need to port application and data feeds to external environment.
Resolution:
• Re-create the internal environment in the cloud; perform parallel runs to demonstrate consistency and correctness.
• Use Cycle Computing’s CycleCloud to manage data transfer between sites.

Issue: Lack of skill sets to design, build and operate cloud environment.
Resolution:
• Configure the cloud environment to replicate the internal environment.
• Engage Cycle Computing to design and optimize the cloud workflow and environment.
• Outsource operations and maintenance to a third-party managed-services provider.

Table 2. Issues and resolutions for the cloud-based approach.
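To illustrate the secure-transfer pattern in Table 2, the sketch below uses an AWS-style object store. The case study does not name the provider, and in practice CycleCloud managed the transfers, so treat the service choice, bucket, and file names as assumptions rather than the project's actual mechanism.

    # Sketch of encrypted data transfer between the internal data
    # center and cloud storage, assuming an AWS-style environment.
    import boto3

    s3 = boto3.client("s3")  # boto3 uses TLS for all transfers by default

    def push_inputs(local_path: str, bucket: str, key: str) -> None:
        """Upload a scenario input file, encrypted at rest with KMS."""
        s3.upload_file(
            local_path, bucket, key,
            ExtraArgs={"ServerSideEncryption": "aws:kms"},
        )

    def pull_results(bucket: str, key: str, local_path: str) -> None:
        """Download a results file back to the internal data center."""
        s3.download_file(bucket, key, local_path)

    if __name__ == "__main__":
        # Placeholder bucket and key names for illustration only.
        push_inputs("scenarios.csv", "ccar-staging-bucket", "inputs/scenarios.csv")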


With a clear mandate on how to move forward, the team developed an overall architecture for leveraging a public cloud for this project. This included handling the data flow; the spin-up and spin-down of instances; controls on what could be done, by whom, and for how long; security; and more. The diagram below highlights the basic architecture as well as a number of key points, including:

• Secure connections are used to transfer all data between the internal data center and the cloud.
• All cloud components (storage, schedulers, execute nodes, etc.) are protected within a virtual private network.

[Figure: overall architecture for the CCAR cloud environment]
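As one way to picture the controls mentioned above (what could be done, by whom, and for how long), here is a hypothetical guardrail configuration. The field names and limits are invented for illustration and are not the project's actual configuration format.

    # Hypothetical guardrails for the cloud cluster; values are
    # illustrative assumptions only.
    CLUSTER_GUARDRAILS = {
        "allowed_submitters": ["va_quant_team", "risk_it"],  # who may run jobs
        "max_execute_nodes": 500,            # cap on instances spun up
        "max_run_hours": 72,                 # wall-clock limit per run
        "auto_terminate_idle_minutes": 30,   # spin down idle capacity
        "vpc_only": True,                    # all nodes stay inside the VPC
    }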


Cloud environment design and implementation

Cycle Computing was a key partner in moving CCAR and other production workloads to the cloud. Prior to the CCAR cloud initiative, the institution had used Cycle Computing’s software to manage distributed computing (for VA analytics) in its internal data centers. This prior experience increased the confidence of the overall team.

Design: Cycle Computing worked with the internal team to understand the challenges and concerns within the team and the broader organization. These included the logistics of how the runs would actually work, how the data would be transferred, and how all of this could be performed with full security and compliance. Cycle Computing provided a reference architecture for cloud computing, covering (i) data transfer to and from the cloud and (ii) computation and data management within the cloud. The reference architecture integrated the existing internal production environment with the new cloud-based requirements and was the basis for the production design.

Implementation: Cycle Computing implemented the architecture with the internal team, with special focus on the transfer of data between the cloud and internal environments. Data integrity and security were key concerns of the organization, and multiple tests and analyses were performed to ensure the integrity of the solution.

Test and validation: A key initial part of the validation process was verifying that results obtained with the cloud approach were consistent and correct. To verify this, the team performed parallel runs on both the internal and cloud environments and analyzed the results to ensure consistency. During this phase, Cycle Computing’s engineers also monitored performance and software configurations to ensure that the expected throughput was obtained.

Performance tuning: Once the environment was validated and the process approved by the various groups, Cycle Computing recommended configuration changes to accommodate differences between the internal and cloud environments, especially related to changes in CPU and I/O throughput. This delivered additional performance and also helped manage cloud costs through the use of targeted instance types and other techniques.
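The parallel-run verification described above can be pictured as a simple numerical comparison between the two environments' outputs. The file format and tolerances below are illustrative assumptions.

    # Sketch of a parallel-run consistency check: compare results from
    # the internal grid and the cloud within a numerical tolerance.
    import numpy as np

    def runs_consistent(internal_csv: str, cloud_csv: str,
                        rel_tol: float = 1e-9, abs_tol: float = 1e-6) -> bool:
        internal = np.loadtxt(internal_csv, delimiter=",")
        cloud = np.loadtxt(cloud_csv, delimiter=",")
        if internal.shape != cloud.shape:
            return False
        # Monte Carlo runs with fixed seeds and identical configurations
        # should agree to near machine precision, so tolerances can be tight.
        return bool(np.allclose(internal, cloud, rtol=rel_tol, atol=abs_tol))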


The Outcome and What Happened Next

Using the cloud fully satisfied all requirements for performing CCAR analytics for variable annuities. The project as planned (delivering the required yearly reports) exceeded expectations in terms of cost and performance. With the use of the CycleCloud tool suite and a focus on optimization, the application was able to scale larger than originally planned, shortening runtimes, and the choice of instance types lowered costs. Based on additional analysis and experience, the organization found three key drivers that it could leverage for success in the cloud. These are:

• Number of cores: The primary driver for increased throughput is the vast size of a cloud versus internal capacity. For example, increasing the number of cores by 7x would cut runtime from one week to one day.

• Configure hardware to match workload: A second driver for increased throughput is the flexibility of a cloud environment. In the cloud, hardware (amount of RAM, number of cores per processor, etc.) is specified by the user at the start of each run, so the hardware profile evolves in step with changes in the workload. In contrast, the hardware profile of an internal grid evolves slowly, typically with an annual budget cycle.

• Hardware refresh cycle: A third driver for increased cloud productivity is that major cloud providers refresh hardware (CPU, GPU, storage, etc.) in sync with OEMs. In contrast, internal hardware is generally upgraded on multi-year depreciation schedules.

These drivers, combined with the success of the initial project, led the team to transfer other key production workloads to a public cloud. This came about because many financial batch workloads, including variable annuity analytics, are well suited to cloud computing: each policy (or instrument, or security) is independent of the others in the portfolio, so adding cores cuts the run time without adding complexity to the computation. This is true for a broad spectrum of financial batch workloads.
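A minimal sketch of this embarrassingly parallel structure, with a stand-in valuation function: because no policy depends on any other, the portfolio run is a plain map over policies, and adding workers shortens wall-clock time near-linearly (e.g., 7x the cores turns a one-week run into roughly one day).

    # Independent per-policy valuations distributed across workers.
    from multiprocessing import Pool

    def value_policy(policy_id: int) -> float:
        # Stand-in for an independent Monte Carlo valuation of one policy.
        return float(policy_id) * 1.0e-3

    if __name__ == "__main__":
        policies = range(10_000)
        # Doubling the worker count roughly halves the wall-clock time,
        # because no policy depends on any other.
        with Pool(processes=8) as pool:
            values = pool.map(value_policy, policies)
        print(f"Portfolio value: {sum(values):,.2f}")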


Examples of additional workloads moved to public clouds

Month-end reports: As part of standard internal operating procedures, the organization runs month-end reports to analyze portfolio positions for both reporting and risk management. Historically, these reports took weeks to run and were therefore of limited use in managing the ongoing business. Moving this workload to the cloud reduced delivery time from weeks to days, making the information significantly more useful.

Daily batch reports: The daily batch cycle was also ported to a public cloud. Harnessing the additional capacity of the cloud enabled portfolio managers to use a wider range of more sophisticated models while meeting the stringent time requirements of the daily production process. Thus, the benefit of running this workload on the cloud is a deeper understanding of the risk profile of the book without an increase in total compute expense.

Summary

Financial institutions face ever-increasing, and uneven, demands for computationally based analysis. With the growth of available data, advanced algorithms, competitive pressures, government regulations, and shrinking deadlines, the analysts and IT organizations within these institutions are struggling to meet these demands. Monte Carlo simulation, back-testing of algorithms, and government-mandated analysis all share the challenge of acquiring computational resources for workloads that may be large, time-sensitive, and difficult to forecast.

This case study highlighted one example of a financial workload and how it was moved to a public cloud. Based on this initial success, several other time-sensitive workloads were migrated to a public cloud, enabling the organization to be more responsive to customers and stakeholders while preserving full compliance with the organization’s policies and procedures, including requirements for data security. The key takeaways from this effort were the value of moving to a public cloud with a clear understanding of the benefits to be leveraged, and the value of moving in stages so that the organization as a whole could understand the benefits and the new possibilities.

