Data Rationalization: Why, What and How?

Data Rationalization White Paper by

Priyanka Mandal 25 July 2016

Nomura Research Institute Financial Technologies India Pvt. Ltd.

© 2016 Nomura Research Institute Financial Technologies India Pvt. Ltd. All rights reserved.


Contents

Executive Summary
Need for Data Rationalization
Steps for Rationalizing Data
Understanding the Process of Data Rationalization
Ingraining Rationalization in Business reality
Global IDs: Data Rationalization Suite
Data Discovery
Data Profiling
Data Quality
Data Integration
Conclusion


Executive Summary

Data Management is imperative to Organizations

Data Rationalization is the radical step to good data governance

Organisations of all types and sizes have experienced significant growth in the amounts of data they use, generate, process and analyse. The technology architectures that house this data have grown along with it.

Companies incur huge expenses because of data management problems: they miss the market with their products, and mergers fail to deliver the intended results. Managing huge amounts of data in the best possible way, so as to derive meaningful information from them, remains a pain for CEOs, CDOs, CIOs and CROs alike and eats away at the health of the business.

The Butler Group, a division of Datamonitor, estimates that approximately 80 percent of vital business information is currently stored in unmanaged repositories, making its efficient and effective use nearly impossible. Most organizations employ Enterprise Resource Planning (ERP) systems to aggregate, store, manage and analyse data from many business activities. Because ERP provides an integrated view of core business processes, facilitates information flow between all business operations and handles connections to internal and external stakeholders, it depends on reliable, integrated and efficiently governed data. ERP systems do not work in silos: they generally interact with many other systems, such as CRM or SRM and other ERP systems, since their purpose is to integrate varied organizational systems and to facilitate error-free transactions and production. With such interactions, the complexity of the data landscape increases sharply, and the need for a Single Source Of Truth (SSOT), which ensures that all systems access the same data, becomes more and more pressing. While the need for an SSOT is understood, one of the biggest troubles that comes back at enterprises like a Frisbee is the integration of bad data and the use of such data in business operations, resulting in present losses and a misconstrued future.

“Although the problem is big, so is the potential payback. The best part of a Data Rationalization project is the demonstrable greater and quicker ROI.” - AMR Research (an independent US research and industry analyst)

Master data that is correctly classified, normalized and rationalized with a common taxonomy is the key to good governance and successful growth of organizations. Data rationalization is one of the most crucial and necessary steps that organizations should undertake to ensure data quality. Through automated discovery, data profiling, quality analysis and metadata documentation, Global IDs allows companies to create transparency, enhance accuracy and reduce the resources required to manage data assets.


Why are organizations failing in governance of data?

Need for Data Rationalization

All large data ecosystems generally suffer from storing redundant, outdated and trivial data. The systems may even contain multiple copies of the outdated and trivial data. It goes without saying that enterprises would like to cleanse their systems and take out the ROT (Redundant, Outdated, Trivial) data. Such ecosystems contain bad data for some very basic reasons, which are difficult to weed out.

Large organizations often possess thousands of databases, acquired through decades of organic growth or M&A activity. In addition, many of these databases contain redundant data. [1]

Hugeness & Complexity in data

In his book “Too Big To Know”, David Weinberger, an American technologist, professional speaker and commentator, explains a key property of the networking of enterprises: hugeness. In the era of e-commerce and too-big-to-fail banks, with technological advancements, the amount of data has become enormous, and that increases complexity. As data ecosystems get more and more complex, they become more difficult to understand, which in turn means that the governance and security of that data is increasingly an issue. Companies today have environments with thousands or even tens of thousands of databases. It is nearly impossible to know which data is sensitive and where it resides, let alone to react and report in stress conditions. However technical this complexity sounds, such an ecosystem is a major hindrance to business. If nobody understands the data environment, getting analytical information based on the data becomes difficult; worse still, even when that is achieved, businesses could end up getting wrong information about their customers, products or business and taking incorrect decisions, even leading to dire consequences such as the 2007 Financial Crisis. Such hugeness and complexity in data generally stems from mergers and acquisitions, and even from the plain organic growth of companies into large institutions.

Merger and Acquisition Activities

Mergers and acquisitions (M&As) are one of the prime reasons why businesses have disparate, siloed systems and unintegrated data, and they are among the biggest challenges for enterprises and their IT organizations to navigate. M&As succeed by consolidating operations and inventory, by sharing and integrating designs, and by leveraging common data, people, processes and operations. If the merged enterprise fails to inherit the data from both companies, synergies are put at risk and the integration process drags on. This in itself may threaten the success of the merger or acquisition.


Data Rationalization: What is it?

Steps for Rationalizing Data

Data Rationalization forms the backbone of effective Data Governance. Rationalized data is accurate, complete, relevant and trustworthy, and remains consistent across locations, channels and services in an enterprise. Data Rationalization helps provide a common dictionary to the enterprise. By identifying common data entities and how these relate to other pieces of data, master data management (MDM) solutions become better at accommodating the needs of all the systems which require the master/reference data.

To effectively locate, classify, reuse and manage enterprise data assets, it is necessary to form a comprehensive inventory. Big businesses accumulate enormous volumes of data because of their diverse lines of business, applications and technologies. This ungoverned data is not only difficult to work with but also drives huge costs for the business. Visibility into all forms of data is the first major step towards bringing them together. Whether the data is held in structured or unstructured formats, it has to be curated and brought onto a common platform. Once the infrastructure outlook of the business has changed to being data-centric, the company's overhead falls substantially. However, there should be a repository of semantic domains (e.g. business names, definitions and relationships) embedded within the database models that can be reused for centralized operations and efficient use. Data-centric infrastructure should be sustainable and profitable to the company: businesses shouldn’t be spending a huge share of their budget on keeping data in its governed form. Normalized, integrated and rationalized data is the key to sustainability. It empowers businesses with enhanced applications and newer insights into customer behaviour and product/service development, while reducing costs.


Understanding the Process of Data Rationalization

Master data rationalization is a multistep, iterative process that involves the discovery, profiling, cleaning, classification, attribute enrichment and integration of master data. Key to this process is the proper classification of the item master record based on the business’ data dictionary. Most systems use some sort of taxonomy to classify items. However, for use throughout the enterprise and with external partners, organizations should select a taxonomy that delivers depth and breadth, such as UNSPSC (the United Nations Standard Products and Services Code), and that allows granular visibility of the item. [2]

Steps of Data Rationalization 

Step 1: Discovery & Profiling

Master data rationalization begins with the recognition of all the structured and unstructured sources of the enterprise’s data and their metadata. This requires the involvement of data stewards, or the people responsible for maintaining data, in all lines of business. Data has to be extracted from all internal systems and from any third-party or external systems as well. This data is stored in a database, and its scope and range are established.

Step 2: Cleansing

Once profiled and aggregated, the data is subjected to an initial screening to identify duplicate records. Businesses should write rules to identify exact matches and probable matches for LEIs, country names and other attributes (e.g. client name). Rule-based processing alone, however, will generally be inadequate to manage the volume of data, so subject matter experts (SMEs) are needed to identify and eradicate the remaining redundancy.
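The exact-match and probable-match rules described in Step 2 can be sketched as follows. This is a minimal illustration using Python's standard `difflib` module, not the method of any particular tool; the function names, the normalisation rule and the 0.85 similarity threshold are assumptions made for the example. Probable matches are only candidates and would still go to SMEs for review.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case and strip punctuation/extra spaces so trivial variants compare equal."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def match_type(a: str, b: str, threshold: float = 0.85) -> str:
    """Classify a pair of names as 'exact', 'probable' or 'none' (threshold is an assumption)."""
    na, nb = normalise(a), normalise(b)
    if na == nb:
        return "exact"
    if SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "probable"
    return "none"


def candidate_duplicates(names):
    """Return (name_a, name_b, kind) for every pair that is an exact or probable match."""
    result = []
    for a, b in combinations(names, 2):
        kind = match_type(a, b)
        if kind != "none":
            result.append((a, b, kind))
    return result
```

Comparing every pair is quadratic in the number of records, which is one reason rules alone struggle at enterprise volumes; in practice records would first be grouped (for example by country) before pairwise comparison.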

Step 3: Classification

Classification is a step of paramount importance. With the data dictionary as the classification standard, all records have to be identified and classified correctly. The most critical element here is that the business lays down its taxonomy holistically, in a way that covers all of its activities exhaustively. The best practice is to use a widely adopted taxonomy such as UNSPSC, NATO or eClass, which generally performs better than legacy or proprietary taxonomies, and to append any customised taxonomy to it. This step gives the best results when it uses a tool with a built-in taxonomy manager that has matured over the years.
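As a rough illustration of classifying records against a taxonomy, the sketch below assigns an item description to the category whose keywords match best. The taxonomy, its category names and the keyword-length scoring rule are all hypothetical; a real project would load a standard taxonomy such as UNSPSC and would likely use a dedicated classification tool.

```python
# Hypothetical taxonomy: category path -> keywords that suggest the category.
# These categories are invented for illustration and are not UNSPSC codes.
TAXONOMY = {
    "Office Supplies/Paper": ["paper", "notebook"],
    "IT Equipment/Laptops": ["laptop", "notebook computer"],
    "IT Equipment/Monitors": ["monitor", "display"],
}


def classify(description, taxonomy=TAXONOMY):
    """Return the best-matching category, or 'UNCLASSIFIED' when no keyword matches.

    Each matching keyword contributes its length to the score, so a longer,
    more specific keyword outweighs a shorter, more generic one.
    """
    text = description.lower()
    scores = {
        category: sum(len(kw) for kw in keywords if kw in text)
        for category, keywords in taxonomy.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "UNCLASSIFIED"
```

Records that come back as `UNCLASSIFIED` are the ones that would be routed to a data steward, or that signal a gap in the taxonomy itself.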


Step 4: Mapping & Lineage

Classification takes all records and places them under predefined categories. As important as that step is, it is equally important to know how the records are related and how they have flowed across systems over time. Data mapping captures the relationships between all attributes as defined in the databases, and should also be able to auto-map implicit relationships. This ensures that the relations between records are as they should be and are coherent. The data life cycle, called data lineage, includes information about the data's origins and where it moves over time, and describes what happens to data as it goes through diverse processes. Lineage simplifies tracing errors back to their sources and reduces many of the risks associated with managing data, such as security and privacy breaches and the intentional or accidental exposure of sensitive data.
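Tracing an error back to its sources, as described in Step 4, amounts to walking a lineage graph upstream. The sketch below assumes lineage has already been recorded as a simple mapping from each dataset to the datasets it was derived from; the dataset names and the `LINEAGE` structure are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it was derived from.
LINEAGE = {
    "risk_report": ["positions_clean", "client_master"],
    "positions_clean": ["positions_raw"],
    "client_master": ["crm_export", "onboarding_forms"],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that feeds `dataset`, directly or indirectly (breadth-first)."""
    seen, queue = set(), deque([dataset])
    while queue:
        for parent in lineage.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def origins(dataset, lineage):
    """Upstream datasets with no recorded parents, i.e. the original sources."""
    return {d for d in upstream_sources(dataset, lineage) if not lineage.get(d)}
```

For example, `origins("risk_report", LINEAGE)` lists the raw sources to inspect when a figure in the report looks wrong.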

Step 5: Integration

Once the records have been cleansed, enriched and mapped, and their lineage established, they undergo a second round of duplicate identification. All redundancy is removed, this time through manual intervention by SMEs, with even greater precision. Any anomalies are remediated.

After these operations are conducted, businesses should be able to cut down on the number of databases, and searching and reporting should eventually become faster, more organised and more efficient.

Steps for Rationalizing Data


Ingraining Rationalization in Business reality - Real Life Examples

Some real-life examples clarify how master data management helps businesses serve customers better, improve revenue opportunities and increase goodwill.




Retail Bank

Customer Profiling & Predictive Analysis: To categorise customers based on their behaviours and target appropriate customers with different products and services.

360° view of Customer:
• Profile all databases to know where data about the customer resides.
• Eliminate redundancy by destroying duplicate data.
• Strengthen the customer profile by attribute enrichment for address, phone numbers, SSN (PAN) numbers etc.
• Create a golden source of truth, and generate and distribute data in standardised formats.
• Create profiles and apply automated business rules to achieve efficient targeting.

Enhanced Customer Experience:
• Introduction of new opportunities in the form of services, such as offering credit cards to customers based on their savings account.
• Enhanced customer satisfaction.

Healthcare Company

Outcome-based treatment: Patient data must be aggregated from unstructured sources, and the data must be kept private, secure and HIPAA compliant.

Aggregated Patient Data: Patient data was discovered and gathered from unstructured sources, linked through years of records and aggregated. All critical data and LEIs were identified, segregated and secured. This data was used to recognize patterns between patient demographics and geographies.

• Targeting patients with appropriate treatment.
• Recognition of geographies where healthcare and hygiene were neglected.
• Discovery of diseases which were common and rare.
• Predictive analysis on demographics and diseases.


How to achieve data ecosystem rationalization? [1]

Global IDs: Data Rationalization Suite

“Global IDs data rationalization is like pulling back the curtains and getting the giant panoramic view of a vista you’d only seen before in pieces. Only when you can see and appreciate the data landscape in its entirety can you begin to make thoughtful and intelligent decision based on what the data can do for your business” - Arka Mukherjee, Founder and CEO

The Remedy

Data Rationalization allows organizations to simplify their data landscape, systematically eliminating redundant databases and significantly reducing data management costs. It creates a path toward greater efficiencies and lower costs through:

• Decommissioning databases with obsolete, duplicate, non-critical or otherwise unused information
• Rationalizing databases that have similar information but are critical to the business
• Protecting and monitoring critical databases that contain core business information

The Data Rationalization Solution Suite (DRSS) is a comprehensive suite of applications that allows organizations to rationalize their core databases in a systematic way. In order to create a foundation for rationalization, DRSS performs four core activities:

• Data Discovery
• Data Profiling
• Data Quality
• Master Data Integration

Once these activities are complete, candidates for decommissioning and rationalization are identified. A program is initiated to systematically reduce cost by reducing the number of databases that need to be maintained.

Integrate disparate data to create “golden copy”

In information systems design and theory, single source of truth (SSOT), also known as single point of truth (SPOT) or golden copy, refers to the practice of discovering, linking, aggregating and storing information in such a way that every data element is stored exactly once (e.g., in no more than a single row of a single table). Any possible linkages to this data element (possibly in other areas of the relational schema) are by reference only. Because all other locations of the data just refer back to the primary "source of truth" location, updates to the data element in the primary location propagate to the entire system without the possibility of a duplicate value somewhere being forgotten. [3]
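The "stored exactly once, linked by reference only" principle can be shown with a small relational sketch. The example uses Python's built-in `sqlite3` module with an in-memory database; the `customer` and `account` tables and their columns are hypothetical and chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The address is stored once, in the customer table.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        address     TEXT NOT NULL
    );
    -- Accounts hold a reference to the customer, never a copy of the address.
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, '12 Old Street');
    INSERT INTO account  VALUES (100, 1), (101, 1);
""")


def account_addresses(conn):
    """Resolve each account's address through the reference to the single customer row."""
    rows = conn.execute("""
        SELECT a.account_id, c.address
        FROM account a JOIN customer c USING (customer_id)
        ORDER BY a.account_id
    """)
    return rows.fetchall()


# One update in the primary location is visible to every account that refers to it.
conn.execute("UPDATE customer SET address = '7 New Road' WHERE customer_id = 1")
```

After the single `UPDATE`, `account_addresses(conn)` returns the new address for both accounts, because neither account ever held its own copy.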


Deployment of an SSOT architecture is becoming increasingly important in enterprise settings where incorrectly linked duplicate or de-normalized data elements (a direct consequence of intentional or unintentional de-normalization of any explicit data model) pose a risk of retrieving outdated, and therefore incorrect, information. [3]

The Data Rationalization Solution Suite (DRSS) was specifically created to help financial services organizations govern and rationalize market data. The software can scan and monitor ~300 types of exchange and non-exchange data feeds to understand the level of redundancy across these data feeds. Duplicative market data feeds become potential candidates for rationalization.
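One simple way to think about redundancy across feeds is the overlap between the sets of instruments they carry. The sketch below is an illustration only and does not describe how DRSS measures redundancy; the feed names, the use of Jaccard similarity and the 0.75 threshold are all assumptions.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundant_pairs(feeds: dict, threshold: float = 0.75):
    """Return (feed_a, feed_b, overlap) for feed pairs whose overlap meets the threshold."""
    pairs = []
    for name_a, name_b in combinations(sorted(feeds), 2):
        score = jaccard(feeds[name_a], feeds[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs


# Hypothetical feeds, each described by the set of instruments it delivers.
feeds = {
    "feed_a": {"AAPL", "MSFT", "GOOG", "IBM", "ORCL"},
    "feed_b": {"AAPL", "MSFT", "GOOG", "IBM"},
    "feed_c": {"TSLA"},
}
```

Here `redundant_pairs(feeds)` flags `feed_a` and `feed_b` as a candidate pair for rationalization, since four of their five distinct instruments are shared.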

Product Suite

The Global IDs Product Suite contains 30 layers of product functionalities to address the diversity and complexity of corporate data landscapes.

Figure: Data Rationalization: Iterative Process (Data Discovery, Data Profiling, Data Recognition, Data Lineage, Data Comparison, Data Classification, Data Mapping)


Bring to light all that has been stashed in dark corners: Data Discovery

“Getting value out of big data is more than just slicing and dicing billions of records. It requires discovering what you have and getting the data ready for analysis to use without boundaries” - Peter Schlampp, Vice President of Products, Platfora

The foremost and one of the most complex steps of the process of data management remains answering these questions:

• What data is available?
• How are the data sources structured?
• What are the characteristics of these data sources?

Data discovery does just that. It helps uncover the architectures and the metadata of data sources and discover the semantics of a data element in data sets. The metadata objects in the data store help applications to make sense of the data. Metadata is a means to foster integration of diverse applications, a way to cull and relate information from data silos, a challenge currently faced by electronic records. Managing metadata is the direction of the near future, particularly as content management, records management, and e-discovery systems converge and consolidate. Deciding what metadata to keep depends on the needs of a diverse set of interested parties in legal, compliance, records management, information technology, and business functions. [4]

Once all metadata is available, the choice of what data is important to us and where it resides becomes clear. Having access to metadata means there is an understanding of the expanse of the data ecosystem, what data sources have been used, what data is in use and what lies redundant.
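For a single relational source, the first two discovery questions (what data is available, and how is it structured) can be answered by reading the database's own catalogue. The sketch below does this for SQLite using the standard `sqlite3` module; the function name is an assumption, and other database engines expose their metadata through different catalogue views.

```python
import sqlite3


def discover_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every user table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    schema = {}
    for table in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Hypothetical source database used only to demonstrate the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE trade (id INTEGER, client_id INTEGER, amount REAL)")
```

Running the same scan across every source and storing the results in one place gives the beginnings of the metadata inventory discussed above.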


What does the data say?: Data Profiling

Once all the data sources have been discovered, we can form ideas about the data landscape, and can then examine the data itself and analyze its quality. Data Profiling is a systematic analysis of the content of a data source (Ralph Kimball). Data profiling elucidates what sort of data is stored, what data is related to it and which specific sources it resides in. It thus helps build a relationship, a mapping. Analysis of this data gives answers to questions like:

• Is the data of sufficient quality to support the business purpose(s) for which it is being used?
• Are any specific issues within the data decreasing its suitability for these business purposes? [5]

Figure: Data Profiling. Phases shown: Plan & Design (Profile Data Sources, Analyze Findings, Design Systems); Create & Execute (ETL, Data Cleansing); Review & Manage (Define Audit Procedures, Implement Jobs, Report).

Data Profiling ensures:

• Trust in data
• Problems found in advance
• Shorter development time on projects
• Improved understanding of data & business knowledge
• Design of newer services & products

With profiled data whose semantics are clear, one can build a common data dictionary or taxonomy that contains definitions of all enterprise-wide entities. This forms the foundation, or base layer, of good data governance, which in turn supports compliance with regulations and standards and enables business growth.
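A basic column profile makes the idea of "systematic analysis of the content of a data source" concrete. The sketch below computes a few common statistics per column in plain Python; the choice of statistics and the function names are assumptions, and it expects the non-empty values within a column to be of a comparable type.

```python
def profile_column(values):
    """Summarise one column: row count, empty values, distinct values, min and max."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dictionaries."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample rows.
rows = [
    {"id": 1, "country": "IN"},
    {"id": 2, "country": "IN"},
    {"id": 3, "country": None},
]
```

A profile like `profile_table(rows)["country"]` immediately shows the missing value and the low number of distinct values, which are the kinds of findings that feed the data dictionary and later quality rules.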


Is your data good?: Data Quality

“In God we trust, all others must bring data.” W. Edwards Deming.

Now imagine if all we did was bring bad data. It would be an apocalypse. There is immense awareness about data quality these days. Even so, organizations have ended up with poor-quality, siloed data, where some business verticals have good data while others requiring the same data may end up with bad data. This scenario is still bad, because the data cannot be depended upon. The focus should be on having good, dependable data across business lines, across assets and across all verticals and horizontals of the enterprise. The dimensions of data quality can be summed up in the diagram below:

Figure: Data Quality dimensions (Completeness, Accuracy, Timeliness, Integrity). Questions posed in the figure:
• Is all the necessary data present?
• Is data consistent across the enterprise?
• Is the data available at needed times?
• Are there copies of data which say different stories?
• Does the data reflect the semantics used in your business?
• Is there Redundant, Outdated, Trivial (ROT) data?

Once the data has been profiled, rules of data quality can be reverse engineered out of the data landscape. The structure of the data is then tested against the dimensions of good-quality data. Ownership of the data is established, since the same data could be unimportant to some people while being extremely crucial to others. This is generally referred to as Data Stewardship in the data management world. Any duplication of data is removed, the data is cleansed, and it can eventually be presented to the data stewards to improve its quality further. Data Quality is not a one-time process that relieves the headache of the CDOs, but a recurrent one which measures and monitors the quality of data assets and continuously improves it.
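Two of the questions above ("Is all the necessary data present?" and "Are there copies of data which say different stories?") translate directly into measurable checks. The sketch below is a minimal illustration; the function names, the record layout and the choice of these two checks are assumptions, and real quality rules would be agreed with the data stewards.

```python
from collections import defaultdict


def completeness(records, required):
    """Share of required fields that are populated across all records (0.0 to 1.0)."""
    total = len(records) * len(required)
    filled = sum(
        1 for record in records for field in required
        if record.get(field) not in (None, "")
    )
    return filled / total if total else 1.0


def conflicting_keys(records, key, field):
    """Keys whose copies disagree on `field`, i.e. copies that tell different stories."""
    values_by_key = defaultdict(set)
    for record in records:
        values_by_key[record[key]].add(record.get(field))
    return sorted(k for k, values in values_by_key.items() if len(values) > 1)


# Hypothetical customer records drawn from two systems.
records = [
    {"id": "C1", "email": "a@x.com", "country": "IN"},
    {"id": "C1", "email": "a@x.com", "country": "US"},
    {"id": "C2", "email": "", "country": "JP"},
]
```

Because data quality is a recurrent process, checks like these would be scheduled and their scores tracked over time rather than run once.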


Fighting the gargantuan: Data Integration

Half the work is done once the data has been discovered, profiled and measured against data quality metrics. What remains is to create the single source of truth out of the data. A data glossary/data dictionary is created, which enables data stewards to build and manage a common business vocabulary and make it available across the organization. This vocabulary ensures that all data assets are recognised and classified under proper semantics, which provides the association between technical metadata and business context. While automation through business rules can help govern the classified data, manual intervention is still required, because not every business rule can be simulated. Local data stewards, who understand their businesses clearly and comprehensively, have to work along with the tool to eliminate duplicate data, do away with unused sources and data, eradicate redundancy and store relevant information in a way that serves all lines of business and operations which require it.

The benefits of this process are many. A 360° view of all business entities is one of the most notable. Once it is known which data is important, where that data is, how it is related to other important data and what its lineage is, complexity decreases substantially: tracking data becomes simpler, access time is reduced, and reporting is faster and brings out coherent, reliable results. Enterprise-wide knowledge of the data not only improves the business but also makes it ready for stress situations.
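Creating a single source of truth from duplicate copies needs some rule for deciding which value survives. The sketch below uses one simple, assumed rule: for each field, keep the most recently updated non-empty value. The `updated` field, its ISO date format and the rule itself are hypothetical; in practice data stewards would define and review the survivorship rules.

```python
def golden_record(copies):
    """Merge duplicate copies of one entity: per field, keep the newest non-empty value.

    Each copy is a dictionary with an 'updated' field holding an ISO date string,
    so that sorting the strings also sorts the copies chronologically.
    """
    merged = {}
    # Process oldest first so that newer non-empty values overwrite older ones.
    for copy in sorted(copies, key=lambda c: c["updated"]):
        for field, value in copy.items():
            if value not in (None, ""):
                merged[field] = value
    return merged


# Hypothetical copies of the same customer held in two systems.
copies = [
    {"id": "C1", "phone": "111", "email": "", "updated": "2016-03-01"},
    {"id": "C1", "phone": "", "email": "a@x.com", "updated": "2016-01-15"},
]
```

The merged record keeps the phone number from the newer copy and the email from the older one, since the newer copy has no email to offer.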


Conclusion

Rationalizing data on an enterprise scale is a herculean task and will always cause pain to businesses, data officers and risk officers. Hence, beginning at the root-cause level becomes extremely crucial. Unless the fundamental issues highlighted in this paper are dealt with, all the other tasks that face the business will be impossible to tackle or, at best, any results derived from them will be dysfunctional. In other words, discovering and understanding your data landscape, building a strong infrastructure, and cleaning and organizing data are necessary conditions to effectively manage, govern, understand and analyse information assets, to realize significant time savings, to minimize the amount of re-analysis so often performed when changes or new applications are required, and ultimately to increase the ROI or the value of the business.

Global IDs Uniqueness [1]

In contrast to traditional manual approaches that focus on reducing costs in silos, Global IDs software reverse-engineers the data ecosystem to identify candidates for database rationalization. Since data ecosystems are large and complex, reducing costs from these environments has a significant ROI. This perspective allows organizations to see their enterprise data in a holistic manner, allowing visibility into the way in which business is conducted across the enterprise. The Global IDs machine-centric approach to master data management creates a foundation for firms to manage their data assets. Through automated discovery, data profiling, quality analysis and metadata documentation, Global IDs allows companies to create transparency, enhance accuracy and reduce the resources required to manage data assets. Its machine-centric approach to data governance is much more cost-effective than traditional approaches. It is:

1. Automated (greater than 90%)
2. High speed
3. Continuously evolving (through increased awareness of the data landscape)

Some of the world's largest organizations have used this approach to bring transparency and visibility into complex data landscapes.


References & Further Readings

• Data Rationalization in the Capital Markets Sector
• Agile Data Rationalization for Operational Intelligence
• Achieving Successful Applications Rationalization Initiative
• Managing Company’s Data Portfolio Using Data Rationalization

Citations

[1] Data Rationalization:
[2] Item Master Data Rationalization:
[3] Single source of truth:
[4] Examining Metadata:
[5] Data Profiling:


Priyanka Mandal Associate Software Engineer - IT Consulting NRI FinTech India Pvt. Ltd. Office : +91-33-6604-1000 Email : [email protected]

Agomoni Sarkar Associate Software Engineer - IT Consulting NRI FinTech India Pvt. Ltd. Office : +91-33-6604-1000 Email : [email protected]


About Us

Nomura Research Institute Ltd.

Nomura Research Institute (NRI), founded in 1965, is a leading provider of consulting & system solutions. Headquartered in Tokyo, Japan, NRI has a presence in all the major financial centers around the world, providing various services in the areas of management & system consulting, system integration, IT management, and IT solutions for the financial, manufacturing, and service industries. With more than 5,000 employees worldwide, NRI is able to leverage its global consulting business to deliver innovative, cross-asset, front-end financial IT solutions for investment banks, asset managers, banks and insurance providers in the global market. For more information, visit

Nomura Research Institute Financial Technologies India Pvt. Ltd.

Founded in 2001 and acquired by NRI in 2012, Nomura Research Institute Financial Technologies India Pvt. Ltd (NRI FinTech) is a wholly owned subsidiary of NRI. For more information, visit

Global IDs

Global IDs was founded in 2001 by Dr. Arka Mukherjee, a data management industry expert with extensive experience in master data management and data warehousing. By predicting the data deluge facing Fortune 500 companies, the Global IDs team was able to address the specific challenges of complex data environments almost 10 years before the advent of Big Data. We are passionate about data design and information management and take great pride in building software that solves complex problems for the world's most demanding institutions. Based in Princeton, NJ, Global IDs provides software for enterprise information management (EIM). Over the last 10 years, Global IDs has provided Data Management Software products to the world’s largest companies. For more information, visit

The entire content of this report is subject to copyright with all rights reserved. The report is provided solely for information purposes and is not to be construed as providing advice, recommendations, endorsements, representations or warranties of any kind whatsoever.

Inquiries to:
Marketing Department
Nomura Research Institute Financial Technologies India Pvt. Ltd.
Globsyn Crystals, Tower I, 6th Floor, Block EP, Sector Salt Lake Electronics Complex
Kolkata 700 091 India
Tel: +91-33-6604-1000
E-mail: [email protected]
Website:
