
Big Data Overload: Call for Business Metadata Governance

Have you ever been steered to a conclusion by a convincing analysis, only to see significantly different findings a week later? Maybe the underlying data feeds changed, or someone revised a minor calculation. But there is no quick way to tell, and meanwhile the credibility of the analysis is gone.

This frustrating scenario is but one of the information governance issues confronting banks as the use of big data proliferates. Across the industry, institutions are working to tap the power of large-scale data analysis, but stumbling into all sorts of traps and inconsistencies as various analysts charge off in different directions:

  • Data streams are managed inconsistently across business silos, with scant central oversight or documentation on how they are cleansed, refreshed and combined.
  • Techniques for calculating performance metrics and model variables diverge among analytical teams, leading to differing interpretations and/or representations of the same fact set.
  • Definitions of foundational metrics (e.g., balances) are changed without regard to the cascade effect on dependent metrics, or the potential impact on statistical models based on other versions.

A proper end-to-end analytical governance structure is essential in controlling these risks. Strategically, data governance is a central tenet of a data-driven organization that values data as a corporate asset and knows it must ensure the delivery of trustworthy, secure information to support informed decision-making and efficient business processes. From an execution perspective, governance permeates all levels of data management within the enterprise, from databases to data models to applications.

Many organizations need a comprehensive review of their business metadata management framework. A coordinated effort is required to establish and maintain a governance framework that ensures: 1) consistency and documentation of business terms and definitions; 2) tracking of “data lineage” from raw inputs to analytic output; and 3) version control and audit trails that show who made changes, when, and most importantly, why.

Shaky Guidance System?
Business metadata provides an organizational map for the conversion of raw descriptive data into metrics and models for decision-making. A simple example is the documented technique for calculating and updating a customer’s average monthly product balance.
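To make this concrete, here is a minimal sketch of what a catalog entry for such a metric could look like. The structure and field names are illustrative assumptions, not a standard or any bank's actual implementation; the point is that the definition, lineage inputs, version, and the "who/when/why" of each change are captured together.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical business-metadata catalog entry; all field names are
# illustrative, not drawn from any standard or vendor schema.
@dataclass
class MetricDefinition:
    name: str             # canonical business term
    version: str          # pinned by dependent metrics and models
    definition: str       # plain-language calculation rule
    source_inputs: list   # upstream data elements (data lineage)
    refresh: str          # update cadence
    last_changed: date    # audit trail: when
    changed_by: str       # audit trail: who
    change_reason: str    # audit trail: why

avg_monthly_balance = MetricDefinition(
    name="avg_monthly_product_balance",
    version="2.1",
    definition="Sum of daily ledger balances for the product, "
               "divided by the number of days in the month",
    source_inputs=["deposit_ledger.daily_balance", "calendar.days_in_month"],
    refresh="monthly",
    last_changed=date(2018, 6, 30),
    changed_by="analyst_0042",
    change_reason="Switched from statement-cycle to calendar-month averaging",
)

assert avg_monthly_balance.version == "2.1"
```

With entries like this in a shared catalog, two analysts computing "average monthly balance" can verify they are using the same definition and version before comparing results.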

In projects and applications involving multiple teams, each with multiple analysts, flaws and inconsistencies in even one metric can wreak havoc. The stakes grow exponentially in a typical big data project that may rely on 500 to 1,000+ metrics. That is why the quality of data governance, especially for business metadata, has such a strong and growing impact on banking.

Currently, business metadata governance receives far too little management attention at many banks, increasing the risk of building shaky guidance systems. Via big data platforms, players are sorting through huge libraries of internal and external customer information — everything from banking transaction patterns to gleanings from social media — to drive marketing, sales and product development. Analytics also play a growing role in risk management and regulatory compliance.

But the more that advanced analytics proliferate, the harder they are to control. More data inputs are being harnessed to drive more sophisticated analytical techniques — and by larger teams, each of which can make changes. Unmanaged, this complexity is working against the organization in three major areas:

Reports and models. Especially in a multi-channel environment, there is a growing need for collaboration and coordination among business units. Executives bring their various perspectives for decision-making but still need to be reading off the same page, analytically speaking. Banks do have model validation committees, but they are having trouble keeping up. The quest for flexibility and continuous improvement is winning out over the need for documentation and consistency.

Data democratization. An enterprise-level initiative, data democratization seeks to enable non-technical business analysts to easily extract their own data. The goal is to free up more staff resources for analysis, as opposed to data trench work. But unless it is firmly grounded in sound data governance, data democratization can spawn analytical anarchy. In particular, well-formulated and -executed metadata management processes are needed for all metrics and models that analysts create and share among themselves.

Regulatory compliance. Regulators are pushing large banking organizations to be smarter and more anticipatory in their business practices, including capital and liquidity planning. Along with evaluating the robustness of predictive models, they are looking for precision in the compilation of source data and good documentation.

Sound data governance is essential in meeting these requirements without reactively having to document how data sets were created after they have been put into use (the backwards approach often seen today). Governance is also a swing factor in leveraging regulator-mandated analytics for performance improvement in other areas of the bank.

Real Consequences
Beyond the big picture issues, each major bank has its individual challenges with business metadata governance, slowing progress toward organizational goals.

Conflicts are festering beneath the corporate radar. At one of the 10 largest U.S. banks, for example, a project was launched to optimize the use of promotional pricing for core deposit growth. Predictive models were needed to guide customer-level decisions about the selective use of rate offers, with much depending on a proper foundation of business metrics.

These metrics were to be derived from a variety of granular data sources, both internal and external to the bank, and ingested into a big data environment. The expectation was that when the data scientists got to the core work of model construction and refinement, they could summon building block metrics from a robust and easily accessible library, which in this instance was slated to include more than 500 possible variables to analyze 10 years of data across millions of customers.
But three problems quickly surfaced:

  1. Duplicative effort. Data scientists in various business silos wound up building their own versions of core metrics. Though some definitions were identical, others varied, and all were separately named. These redundancies left project managers in the position of having to sort through tons of code and adjudicate metric definitions and naming conventions that all would use. Ultimately a lot of expensive effort went to waste.
  2. Version control. As metrics were refined over a series of revisions, it became increasingly difficult to track which changes were made at which stage. Were the latest revisions from one data scientist inadvertently made to an older version of the code issued by another, omitting interim work from someone else and embedding a flaw?
  3. Tangled interdependencies. To verify a suite of new reports, project team leaders needed to be able to drill down and review not only how underlying models were constructed, but also all of the linkages within the overall library of metrics. In an environment where linked generations of metrics and models were fast evolving, data lineage and dependency maps of the derived products became a major governance issue.
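The third problem — tangled interdependencies — is essentially a graph problem. The sketch below shows one simple way a team could answer "what is downstream of this metric?" given a lineage map. The dependency data and function name are hypothetical examples, not the bank's actual tooling.

```python
from collections import deque

# Hypothetical lineage map: each metric points to the metrics and models
# derived from it. In practice this would come from a metadata catalog.
downstream = {
    "daily_balance": ["avg_monthly_balance"],
    "avg_monthly_balance": ["balance_trend_90d", "rate_offer_score"],
    "balance_trend_90d": ["attrition_model_v3"],
}

def impacted_by(metric, graph):
    """Breadth-first walk of the lineage graph: everything that must be
    revalidated when `metric` changes."""
    seen, queue = set(), deque([metric])
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

print(impacted_by("daily_balance", downstream))
# -> ['attrition_model_v3', 'avg_monthly_balance', 'balance_trend_90d', 'rate_offer_score']
```

A query like this is what lets project leaders drill down from a suspect report to every metric and model in its lineage, rather than reverse-engineering linkages after the fact.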

Four Key Questions
As banks consume ever more data they are creating ever more sophisticated metrics, often developed by a wide cross-section of users with various — and sometimes conflicting — business needs. At a technical level, banks have three metadata governance issues to work on: metadata catalog; data lineage tracking; and version control and audit trails (Figure 1: Fundamentals of Sound Business Metadata Management).

Figure 1: Fundamentals of Sound Business Metadata Management

The bigger picture is about achieving better results in less time. As banks continue to build their analytical capabilities, executives should be asking four key questions:

  • What proportion of our analysts’ time is being spent on data wrangling, versus generating value-added analytics?
  • What proportion of our models can be automatically (and reliably) validated and refreshed with the latest data, with minimal manual effort?
  • How well are we leveraging analytics across projects?
  • How well are our analytics documented, including the timely tracking of successive changes made during the course of model development and testing, and not as an afterthought or under regulatory duress?

In many instances, these questions reveal the need for improved data governance, not just incremental improvements here and there, but a comprehensive review and overhaul. Given that the required investments, complexity and performance impact of big analytics will only grow, the time to strengthen the foundation is now.

Rich Solomon and Kaushik Deka are Managing Directors in the New York office of Novantas.

For more information, contact Novantas Marketing

+1 (212) 953-4444
