The existence of Agile Integration Software provides solid footing for a new model of Master Data Management that is now emerging.
In the context of Agile Master Data Management, a Data Master is not just a schema with the standard ancillary information about field definitions, units of measure, formats, and so on. It is actually a series of automatically generated services that make the master data accessible live from the multiple root sources, which are federated and transformed as required to fulfill the Master definition. So, once a Data Master is defined, you can use it in many ways without having to know anything about where the data comes from or whether the official sources behind it changed yesterday or will change tomorrow.
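To make the idea concrete, here is a minimal sketch of a Data Master defined as a live, federated read rather than a stored copy. Everything in it is an assumption for illustration: `fetch_crm` and `fetch_erp` are hypothetical stand-ins for two root sources, and `FieldMapping`, `CUSTOMER_MASTER`, and `read_master` are invented names, not part of any product's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


# Hypothetical stand-ins for two live root sources (say, a CRM and an ERP).
def fetch_crm(customer_id: str) -> Dict[str, Any]:
    return {"id": customer_id, "name": "acme corp ", "phone": "555-0100"}


def fetch_erp(customer_id: str) -> Dict[str, Any]:
    return {"cust_no": customer_id, "credit_limit_cents": 2500000}


def identity(value: Any) -> Any:
    return value


@dataclass(frozen=True)
class FieldMapping:
    source: Callable[[str], Dict[str, Any]]  # source of record for this field
    source_field: str                        # field name inside that source
    transform: Callable[[Any], Any]          # conversion into the master's format


# The master definition: which source owns each field and how to transform it.
CUSTOMER_MASTER = {
    "customer_id": FieldMapping(fetch_crm, "id", identity),
    "name": FieldMapping(fetch_crm, "name", lambda v: v.strip().title()),
    "credit_limit": FieldMapping(fetch_erp, "credit_limit_cents", lambda v: v / 100),
}


def read_master(master: Dict[str, FieldMapping], key: str) -> Dict[str, Any]:
    """Assemble one master record on demand from the live sources."""
    fetched = {}  # call each source at most once per request
    record = {}
    for field_name, mapping in master.items():
        if mapping.source not in fetched:
            fetched[mapping.source] = mapping.source(key)
        raw = fetched[mapping.source][mapping.source_field]
        record[field_name] = mapping.transform(raw)
    return record
```

A consumer calls `read_master(CUSTOMER_MASTER, "C-1")` and never sees which system supplied which field. If the source of record for `credit_limit` changes tomorrow, only the mapping changes.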
Since the bottom line is about accessing or moving the correct data, we turn the MDM exercise upside-down. The classic approach of defining data models for the whole organization is a daunting task, smiled upon, I am sure, by those Systems Integrators who have a history of making lots of money attacking tasks that are verging on the impossible. Can you really know what the data needs to be from the perspective of every potential use? As always... the Planning group sees the world differently from the Operations group who sees it differently from Finance.
Instead of focusing on an unwieldy data model, consider the approach of having a growing body of for-purpose schemas, most of which have been defined because the data is required for a project. So what you need, rather than a big data model, is a mechanism to manage this metadata in reasonable chunks, identifying, tagging, and "sanctioning" some as Master Data, and then making them available for re-use in multiple ways with appropriate security.
With Agile Integration Software such as Stone Bond Technologies' Enterprise Enabler, you already have defined reusable virtual schemas, packaged with all the sources of record, federation, transformation, validation, and sometimes the data workflow relevant to general use of the data. With a single click, you deploy one of these nuggets in any number of ways, such as a web service (SOAP, JSON, REST), an ADO.NET object, or a SharePoint external list. These particular modes of delivery are virtual and on-demand, and they are not just for reading data; they contain all the information for full CRUD (Create, Read, Update, Delete). Instructions for transaction rollback are included, as well as end-user awareness for securing data access down to individual permissions. ETL with federation, data workflow, and notifications are processes that work with or on specific data sets and can also be packaged for reuse.
Agile MDM leverages those nuggets by "sanctioning" the ones that have general usability as Master Data Services. Then, where there are gaps, new virtual schemas can be built.
Since the generation of these nuggets, rich with all the layers of information, is a matter of minutes, their proliferation could be a challenge at some point. The System Integrators need to reinvent their Best Practices for a more agile set of processes, to ensure success for these Agile MDM projects and avoid the proverbial herding of cats. In many ways, Agile MDM parallels the growth of social media. It's extremely easy to generate complex integrations, so there could easily be many for each project. The key for MDM is to start to manage the situation at the right point, late enough to leverage the body of for-purpose metadata, but early enough to guide re-usability of Master Data before there are too many to review.
Thursday, May 3, 2012
Friday, March 30, 2012
Data Virtualization: No Need to Look Beyond Your Own Back Yard
How is this possible? I did a few Google searches looking for "data virtualization" and "SharePoint" together. SharePoint 2007 and 2010 were designed with the concept of data virtualization built in. How is it possible that only one of the players in this new-hype of data virtualization and federation shows up on the radar as feeding SharePoint? After all, SharePoint is already found in most medium-to-large businesses, so there's no need to look for a separate dashboard or application development platform. You probably have access to it in your company right now!
Out of the box, SharePoint has the capabilities to interact with data virtually. I'm not talking about the old SharePoint lists, which you can write to, store data in and change; I'm talking about the BCS External Lists, which were designed with rich features for data virtualization. Why Microsoft doesn’t promote this is beyond me. Conceptually and technically, SharePoint could become the only interface end users in your company need; they can access data from any SaaS or on-premise application, or any data from electronic instruments, in one place. Federated data aligned from multiple systems can be presented in SharePoint out-of-the-box web parts with live refreshes from all of the sources. SharePoint even has the capability to enforce end-user-specific security at the data field level for all CRUD (Create, Read, Update, Delete) functions, using SSS (formerly SSO) or claims authorization approved live at the endpoint application.
Probably one reason that these features are not often leveraged has to do with the scary-looking requirements for the format of the metadata you have to provide and import into the Business Data Catalog. It is, actually, pretty complex, with the need to generate a big hairy XML file and provide a series of specialized web services to use it. But that's where the product companies that specialize in data federation and virtualization should all be stepping handily into the picture. Are those companies oblivious to the ubiquitous state of Microsoft's SharePoint? Stone Bond Technologies, a leader in enterprise data federation and virtualization, automatically generates the metadata for SharePoint and also auto-generates and deploys the web services that execute the bi-directional (if desired) interaction with the backend applications live, federating and transforming across multiple sources.
This approach does not require a team of ten architects, twenty programmers, and long development/implementation cycles. Maybe that's one of the disconnects: the architects and developers can't even imagine such projects taking less than a day. Just think - they could actually take a vacation once in a while.
Friday, March 23, 2012
IT and OT - Shall the Twain Never Meet?
Yesterday I learned a new term in a conversation with Gartner: “Operations Technology,” or “OT.” Operations Technology, they tell us, is separate from “IT,” which has the business technology focus. Having frustratingly observed this gap in the past, I found it particularly interesting to see that finally it is being recognized, although the recognition could instead work against closing the gap, forcing categorization one way or the other.
When I first graduated in computer science and began working, I worked at Shell Oil Company, along with the other dinosaurs. There were two totally separate groups, “business computing” and “technical computing.” The business side handled the mundane COBOL programs for accounting and such as well as the infamous midnight data uploads. We on the technical side focused on all the fun stuff that impacted the actual core business of an integrated oil company. I made sure I never admitted to having any knowledge of COBOL, lest I be drawn to the dark side. My focus and passion for computer graphics (in the days when you had to write to the pixel on/off level) led me to develop graphics for 3D reservoir simulation, engineering design, and manufacturing plant simulation.
As I moved on in my career, I observed from both sides the lack of mutual respect for the activities and technology requirements on the other side of the divide. I also discovered that operations couldn’t make the best decisions without a view of the market and the impact of those decisions on the overall well-being of the company. I think the hype wave of re-engineering in the late 80’s recognized this, but the IT department usually did not touch anything on the operations side, and did not want to deal with systems that include process control, complex mathematical optimizers and simulators. So the re-engineering efforts, moving on to BPM, pretended that there was nothing beyond the head office.
It was on this OT side that I began to contemplate how to put in the hands of the engineers a tool that would allow them to get the data they needed for their various models without programming. After all, they were not programmers, and it was impossible to get the programmers to leave what the engineers called the “Ivory Tower” to help out. (That turned out to be the origin of Enterprise Enabler®, an Agile Integration platform that naturally bridges the gap between IT and OT.)
Apart from some ad hoc custom programming to feed the likes of SAP, there has been little done to bridge the gap. In my humble opinion, this is largely due to two factors. First, the Big Consultants rarely had the depth of knowledge to delve into these types of industry-specific systems and requirements. Second, the Big Integration products were designed as either ETL or EAI, neither of which alone could handle the needs of operations. Of course, integration solutions that bridge IT and OT would likely be funded by both, and there aren’t many people in a position to buy who understand enough technology or care enough to make it happen.
It’s much safer to get excited about Big Data and figure out some way that BD can be important. Forget closing the gap between IT and OT. That sounds too hard. But if your competitors decide to bridge the gap, you will definitely lose the edge!
Monday, February 6, 2012
MDM's Unsustainable Tech Debt
One of the nice side-effects of Agile Integration Software is the ability to get useful master data easily and quickly. For most companies, an enterprise MDM project takes years to achieve value, and with the huge effort required to maintain each step, tech debt can skyrocket. Before the project is halfway done, changes and new data and sources have already impacted the viability of the outcome. For an MDM project to bring the promised inherent integrity that offers value, the tech debt simply cannot be ignored. Every single change anywhere in the MDM supply chain must be accommodated immediately and correctly throughout the interdependent process and data networks that constitute MDM.
Only stagnant companies don't change! And only a handful of data sets in any company will remain stable enough to last through the MDM implementation lifecycle. The bottom line is that with the current approach to MDM and the speed with which data is proliferating, MDM is a self-contradictory concept, and we are likely to see long-term initiatives slowly and expensively committing suicide.
MDM - long time-to-value:
Consider the components of the cost of implementing MDM.
- To start with, you can count on a costly (high six or seven figures) software purchase, likely requiring multiple products.
- A team of consultants with a range of expertise to implement the multi-year project.
- Internal resources to manage, guide, and work with the consultants
- Assurance that everyone is using the same data for decisions
- Quality and correctness of data
- Continuously changing sources and formats of data. A heavy solution will have difficulty responding immediately to these changes, leaving gaps in the validity of the information.
- Latency of data availability due to staging data in a data warehouse or other database. With the current speed of business, users and decision makers need data that is as fresh as possible.
Is MDM even sustainable in its current manifestations? With the complexity of an implementation, the accumulation of tech debt begins as soon as the first Master Data is defined. Every step along the implementation path is fraught with instability.
- Defining each master data schema that can be everything to everyone who needs the data
- Determining the most correct sources for each component of the master
- Determining the criteria for correctness of the data
- Determining the optimal refresh time
- Designing a database or data warehouse entry appropriate to the master
- Implementing the integration necessary to populate the master data store ...or,
- Defining and implementing ways users can access and aggregate the data components directly from the sources
This doesn't include all the discovery and such that the experts and company team must perform. Clearly this constitutes a large, time-consuming effort that generally is nowhere near agile or responsive to changes in the company, systems, and requirements.
Here is another opportunity for rescue by the paradigm of Agile Integration Software.
More on tech debt:
http://agileintegrationsoftware.blogspot.com/2011/04/hows-your-tech-debt.html
http://agileintegrationsoftware.blogspot.com/2011/05/lean-and-mean-beats-sloth.html
Monday, November 21, 2011
Data Quality and your Enabled Enterprise
As a good example of Agile Integration Software, Enterprise Enabler's data quality features and capabilities serve as a representative basis for discussion. (http://www.enterpriseenabler.com/) In the context of data integration, I tend to think of data cleansing and profiling in two separate categories, "batch" and "in transit," or "real time."
Batch - Often this is performed as a first-step-project to an integration implementation to ensure that any existing data that is being used is as correct as possible. The context of correctness is generally defined by the source for which it exists. When the source is an existing data warehouse, the correctness is usually considered with respect to a pre-defined master data definition.
In-Transit or Real-Time - Once the integration is in place, new data is being generated and flows through the organization and systems via the agile integration framework. This data must be validated as soon as it appears in play, as well as when it is passed to its destination, since the definition of "correctness" is ultimately determined by the target use.
With Agile Integration, the philosophy is to focus on the data required for the purpose of the project at hand. While cleansing/validating an entire database or data warehouse full of data may be important, the chances are that it is not important for any particular integration project. Addressing the subset needed means a more efficient project and faster time-to-value.
Pre-validating existing data
Using the inherent capabilities of Enterprise Enabler to discover data schemas and objects, one can simply "point" the appropriate AppComm (application communicator) to a database or application that is to become a source to the integration, and the schema or services available are presented. Select the tables, fields, objects, etc. of interest, and grab a sample or the full set of data. In a configured process, the data can be cleaned, validated and standardized using pre-built rules, external tools, or special logic for each unit of data, by field, by record, or by other cross-section. Rules for logging, notifications, and mediation are configured as part of the process. With this approach, you are focused specifically on the data that will be used for the subsequent integration, and a staging database is not required. Once this process is configured, it can be triggered to automatically run as desired to ensure ongoing monitoring and validation of new data. The results can be fed to a BI tool or spreadsheet for statistical analysis on the data quality ("profiling").
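As a rough illustration of field-level rules plus a simple profile of the results, here is a minimal sketch. It is not Enterprise Enabler's mechanism: the `RULES` table, the field names `well_id` and `pressure_psi`, the sample records, and the `validate` function are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical per-field rules; each returns True when the value is acceptable.
RULES = {
    "well_id": lambda v: isinstance(v, str) and v.startswith("W-"),
    "pressure_psi": lambda v: isinstance(v, (int, float)) and 0 <= v <= 10000,
}

# Hypothetical sample pulled from a source that is about to be integrated.
SAMPLE = [
    {"well_id": "W-1", "pressure_psi": 3200},
    {"well_id": "7", "pressure_psi": -5},
    {"well_id": "W-3"},  # pressure reading missing
]


def validate(records, rules):
    """Split records into clean and rejected, counting failures per field."""
    clean, rejected, profile = [], [], Counter()
    for record in records:
        failed = [name for name, ok in rules.items() if not ok(record.get(name))]
        if failed:
            rejected.append((record, failed))  # keep the reasons for logging
            profile.update(failed)
        else:
            clean.append(record)
    return clean, rejected, profile


clean, rejected, profile = validate(SAMPLE, RULES)
```

The `profile` counter is the kind of per-field failure tally that could be handed to a spreadsheet or BI tool, while `rejected` carries the reasons needed for logging and notification.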
With the AppComm approach, combined with the ability to easily create virtual relationships across disparate sources, cross validation ("matching") across systems, or merging data to enrich it, becomes a reasonable exercise without having to design and build a consolidated staging database. Of course, if the situation still requires a staging database, there's no more efficient way to populate it than Enterprise Enabler.
After you have completed this step, the chances are that the new data that will be captured from here forward needs to be cleansed, too. This can be done "real-time" as it is being acquired from the source and passed to the federation and transformation steps of an integration.
Validating data on-the-fly
As is the nature of Agile Integration, Enterprise Enabler offers multiple places where data cleansing, validation, and remediation can be managed within the flow of data through an integration. Some amount of detection of erroneous data is done as a natural part of the data acquisition by the intelligent AppComm technology. Driven by metadata definitions, AppComms check not only for valid data (type, format, etc.), but also for the expected schema. Additionally,
- Validation/cleansing rules, pre-built processes or 3rd party tools can be dropped in or invoked for detection and mediation at various points in execution:
  - As soon as the data has been acquired
  - As it is being transformed and merged with other sources
  - After it has been transformed
  - By the destination's AppComm before/as the data is being posted (plus transaction rollback and assurance in the case of multiple destinations)
  - Anywhere in the data workflow process surrounding the transformations
- Enterprise Master System ensures that the data comes from the correct source when an end user invokes a particular piece of information.
- Since Enterprise Enabler's user interface ("Designer Studio") is tied directly to a copy of the run-time engines, as you design an integration, you can do a trial run from the studio and see a sample of the data for inspection to get an idea of the quality of data you are dealing with.
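The point about validating at the destination, with rollback when there are multiple destinations, can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not how any product implements transactions: `Destination` and `post_all` are hypothetical names, and a real system would rely on the transaction support of each endpoint.

```python
class Destination:
    """Hypothetical in-memory target that validates a batch before accepting it."""

    def __init__(self, name, validator):
        self.name = name
        self.validator = validator
        self.rows = []

    def post(self, batch):
        bad = [row for row in batch if not self.validator(row)]
        if bad:
            raise ValueError(f"{self.name} rejected {len(bad)} row(s)")
        self.rows.extend(batch)
        return len(batch)  # number of rows to undo if a later destination fails

    def rollback(self, count):
        if count:  # guard: del rows[-0:] would wipe the whole list
            del self.rows[-count:]


def post_all(destinations, batch):
    """Post to every destination, or to none if any of them rejects the batch."""
    completed = []
    try:
        for destination in destinations:
            completed.append((destination, destination.post(batch)))
    except ValueError:
        for destination, count in reversed(completed):
            destination.rollback(count)
        return False
    return True
```

If the second destination rejects a batch that the first already accepted, `post_all` undoes the first post and returns `False`, so the destinations never disagree about what was delivered.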
Still don’t trust your data?
Sometimes there are situations where validation rules just won't cut it. Example: setting hard minimum and maximum values for something coming from a physical processing plant. You may be able to determine a reasonable range, but only with the knowledge of what happened yesterday will you be able to determine that a "way out of whack" set of numbers is actually due to a disruption at some part of the plant yesterday. Enterprise Enabler has a preview/analysis feature that holds the result data (post transformation and process) in a virtual store just before it is posted to the destination, to be released and posted only after review and approval by an authorized human being. That person can do quick tests on ranges, averages, etc. as a gut-feel reality check and then fix the data if necessary before releasing the data set.
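A hold-for-review step of this kind might look like the sketch below. It only illustrates the idea and does not represent the product's preview/analysis feature: `ReviewQueue`, its methods, the approver names, and the `flow` field are all hypothetical.

```python
from statistics import mean


class ReviewQueue:
    """Holds result sets in memory until an authorized reviewer releases them."""

    def __init__(self, post, approvers):
        self._post = post                # callable that delivers rows to the destination
        self._approvers = set(approvers)
        self._held = {}
        self._next_id = 1

    def hold(self, rows):
        batch_id = self._next_id
        self._next_id += 1
        self._held[batch_id] = [dict(row) for row in rows]  # copy so edits stay local
        return batch_id

    def summary(self, batch_id, field):
        """Quick reality check: range and average of one numeric field."""
        values = [row[field] for row in self._held[batch_id]]
        return {"min": min(values), "max": max(values), "mean": mean(values)}

    def correct(self, batch_id, index, field, value):
        self._held[batch_id][index][field] = value

    def release(self, batch_id, approver):
        if approver not in self._approvers:
            raise PermissionError(f"{approver} may not release data")
        self._post(self._held.pop(batch_id))
```

A reviewer would call `summary` to spot an outlier, `correct` to fix it, and `release` to let the batch through. Nothing reaches the destination before that last call, and an unauthorized release attempt leaves the batch held.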
And for those of you who care about data governance
Only an AIS is a single end-to-end integration solution. This means that security can be maintained throughout the integration infrastructure. Developers and Data Analysts log in with the permissions of their role and group, and anything they build or change is logged with a who-what-when stamp. Every object in Enterprise Enabler is locked down in this way, preventing intentional or accidental diversion or modification of data and its flows through the enterprise.
And what about bad data in your ERP?
My apologies, but I just can't help saying to the ERP vendors, "shame on you" for not taking the responsibility to ensure that the data captured and generated by your system is completely correct. How could you let that happen? People trusted you! Ok. Ok.. I'll stop short of calling for an "Occupy ERP" movement.
All together...
With all of the various angles on Data Quality, it’s clear that Agile Integration inherently brings a range of capabilities that are simply not possible with other DQ products. Whether you are looking to correct existing data or ensure the quality of new data as it is created, the fact that the data quality is handled as a natural aspect of integration means a more efficient overall solution.
Thursday, November 17, 2011
Big Data Quality
Big Data means big data quality issues, right? Well, of course, right. Big data means more data that can be bad or go bad one way or another. Big, bad data could have big, bad consequences. But just think about some of the ways Big Data may be in better shape than other data.
Big Data
- is usually captured automatically, without manual intervention
- often has been gathered over many years, so that the framework for capture and validation at the source has improved and been "debugged" over time. Various standards may also play a role in the data capture and ultimate quality. Examples might be weather related data and GIS data.
- is often used in ways where analytics and conclusions improve with data volume and errors in individual data become less important. Data quality is essential for Business Intelligence (BI), but from some perspectives, and some aspects of data quality, DQ may move into the background.
Big Data from Social Media has some additional considerations.
- Capture mechanisms are well known. Facebook, emails, Twitter, etc.
- We know that the quality of information from these is highly questionable - that's the nature, and the beauty of the beast.
- We also know that they are well structured. For example an email has a very easily determined structure: there is the header, the body, attachments, etc. The content of the unstructured data (body, attachments) can be searched for relevant information and key words. Bad data might be a corrupted attachment or garbled text in the body, but other than that, errors are, almost by definition, not really bad data.
- What do you/we want from social media’s Big Data? Mostly the trends of the masses. If you clean it up, that very exercise could corrupt the data.
Senile data forgets its source and loses relevance and accuracy
There is an altogether different situation with many of the nouveau trendy Corporate Big Data projects. In this case, big data is likely to be consolidated data coming from a number of sources, including those suffering from data senility. Senile data has been through the wringer, moved from residence to residence, been "cleansed" and perhaps never saw the light. A data warehouse usually is populated with data from a huge number of sources, and fallible humans have pored through it, run human-defined cleansing and validation algorithms, and then subjected it to manually-programmed integration code. It is incumbent upon the mining and analysis functions to accommodate assumptions about data quality.
So, as you can see, data quality and cleansing becomes an altogether different problem for Big Data.
Monday, October 31, 2011
Mainframe nearly to the cloud…
Most people know that Salesforce.com is one of the first and certainly most successful SaaS (Software-as-a-Service) applications on the market. One good thing is that Salesforce stores all the data in the cloud and manages it, eliminating the need for their customers to have the skills and the hardware, software, and maintenance costs to keep it on-premise. That good thing is also the biggest downside of SaaS: the concern that the data is stored in the cloud. Unfortunately, companies worry about having their data stored off-premise with very little control over its management, security, and perhaps even accessibility.
Nevertheless, Salesforce.com has a huge customer base and offers business functionality important to every business I can think of. While business sectors like financial institutions and healthcare could easily make valuable use of the functionality of Salesforce and other cloud apps, the risk and regulatory restrictions make storing their data in the cloud impossible.
These institutions simply cannot make copies of their data or move it to the cloud. The data that is inherent to the functionality of Salesforce may not necessarily be the concern, but often it must be presented to users side-by-side with ancillary data that must come from the company's backend, on-premise systems.
But all is not lost! Agile Integration Software (AIS) naturally solves this problem by creating federated views from multiple sources and making them available to any application, complete with end user access authentication. Here is the crux of the solution with Salesforce as the example:
1. Salesforce.com offers the capability of modifying the screens, so anyone who is conversant in doing that can modify a screen to populate the data from an external source. One option would be to configure it to call a web service when the screen is presented or refreshed.
2. Within a few minutes, an Agile Integration Software, such as Stone Bond's Enterprise Enabler Virtuoso, can be configured, generating metadata that virtualizes and aligns backend data with Salesforce data, and packages it as a web service compliant with Salesforce. Optionally, this would be a bi-directional (read/write) connection.
3. When an end user brings up the Salesforce page, Salesforce calls the web service, and Enterprise Enabler Virtuoso accesses the on-premise data live, aligns it with the relevant Salesforce data, and sends it to the Salesforce screen. With the bi-directional option, data can be entered or corrected on the screen to automatically update not only the Salesforce data, but also the on-premise data, assuming the end user has proper permissions to write back to those systems.
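The three steps above can be sketched in miniature. The code below is a simplified, in-memory illustration and makes several assumptions: it calls no Salesforce API, the dictionaries `ON_PREM_ACCOUNTS` and `WRITE_PERMISSIONS` stand in for backend systems and a permission store, and the function and field names are hypothetical.

```python
# Hypothetical on-premise data that must never be copied to the cloud.
ON_PREM_ACCOUNTS = {
    "A-100": {"balance": 1250.75, "risk_rating": "B"},
}

# Hypothetical per-user write permissions on backend fields.
WRITE_PERMISSIONS = {
    "alice": {"risk_rating"},
}


def federated_view(salesforce_record):
    """Align a cloud-side record with live backend data at request time."""
    backend = ON_PREM_ACCOUNTS.get(salesforce_record["AccountNumber"], {})
    return {**salesforce_record, **backend}  # merged view; nothing is stored


def write_back(user, account_number, field, value):
    """Update the on-premise system only if this user may write that field."""
    if field not in WRITE_PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user} may not update {field}")
    ON_PREM_ACCOUNTS[account_number][field] = value


# What a web service handler might return when the page is displayed.
page_data = federated_view({"AccountNumber": "A-100", "Name": "Acme Corp"})
```

The merged record exists only for the duration of the request, which is the point: the backend data is read live and is never persisted on the cloud side.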
Companies have spent millions of dollars over the last few years trying to do this, and with the Agile Integration Software as the basis, Enterprise Enabler Virtuoso was configured in three weeks to incorporate this Salesforce connectivity. Now it is available off-the-shelf so that anyone can implement it in a few minutes or at most a day.
The diagrams below depict the data residence and flow where on-premise data is required in a Salesforce.com implementation. The first is the common solution where a copy of the on-premise data is made and resides on the Salesforce cloud. I don’t need to tell you the overhead and pervasive concern with doing this. The second shows the on-premise data remaining on-premise, where it belongs, and AIS accessing, federating, and delivering a data view virtually to the Salesforce page.
http://tiny.cc/id5h0

