Tuesday, April 16, 2013

The Domain, It’s Plain, Is Mainly Under Strain

Data Virtualization is a technology for ETL, EAI, as well as Business Intelligence

I’m enjoying talking with enterprise architects about Data Virtualization (DV) in their companies, how they are fitting it in their overall architecture and what their expectations are. What I’m seeing is the strong tendency toward thinking of DV as a tool for Business Intelligence and less as the basis for “convergence” of all silos of integration. The latter is the topic of conversations I had not too long ago with several Gartner analysts.

Typically data virtualization is considered only for “on-demand” uses. This makes sense if your fundamental premise is that DV is essentially the same as having a database, which is never proactive about delivering data. That means that you must apply technology on top of it to issue queries and act on the results. Why not apply the same federation to other forms of data integration? Whether the data is served on-demand or delivered physically to an endpoint, the approach eliminates staging databases and other overhead.

Enterprise Data Virtualization Domains

A few of the enterprise architects I have spoken with recently are very clear that agile data federation is one of the main reasons data virtualization is compelling. However, it is difficult (arguably impossible) to define an enterprise data model that is “everything to everybody.” DV brings the opportunity to think “agile,” and I just don’t remember any “single universal” approach to anything in IT that has ever succeeded for more than a few cool success stories about a handful of companies.

Just because a data model is virtual, that definitely does not mean that it is not brittle. In the beginning you may have a clean model, but the natural progression of additional and changing requirements will eventually yield the traditional issues of staging databases and data warehouses and lead to instability because of the intense level of interdependency across the model. That “extra mile” of custom programming required to make the data exactly relevant to each end use also carries risk and overhead to maintain.




Smaller domains for Data Virtualization tend to increase agility

Consider instead defining reusable rapidly-built, for-purpose federations that include appropriate data validation and conversion to exactly the requirements of the end usages, and that deliver to the endpoint either physically or on-demand as a service. From that perspective, it is not necessary to always define the universe as a single model, and it is not necessary to use different products and development approaches whether you want to use the same data federation to move data (ETL), to handle operational transactions (EAI) with a surrounding orchestration, or on-demand as a web service or other calling mode.

Perhaps “reusable rapidly-built, for-purpose” seems a bit oxymoronic. Think instead of the concept of Master Data, where each definition is constructed in a way that is reusable and is sanctioned for a specific context. One would never have a Master Data definition that is a complete enterprise data model. Instead, there would be useful subsets of that, such as Customer, which, with Data Virtualization, is packaged with all the sources, transformation, alignment, and business rules needed to access it. That Customer can be called upon for any use: it can be incorporated in data workflows or ETL, served on-demand, uploaded to a database or application, shown in a dashboard, or queried for reporting.
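
To make the “packaged Customer” idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Source` and `Federation` classes, the field names, and the in-memory CRM and billing stand-ins are all hypothetical, and none of this is the API of Enterprise Enabler® or any other product. It shows one definition being used both on-demand and for physical delivery.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class Source:
    """One participating source: a fetch function plus a field mapping."""
    name: str
    fetch: Callable[[], Iterable[Record]]  # stands in for a live connector
    mapping: Dict[str, str]                # source field -> federated field
    key: str                               # source field holding the customer id


@dataclass
class Federation:
    """A for-purpose federated entity such as Customer (hypothetical class)."""
    name: str
    sources: List[Source]

    def query(self) -> List[Record]:
        """On-demand delivery: read every source live and merge rows by key."""
        merged: Dict[object, Record] = {}
        for src in self.sources:
            for row in src.fetch():
                target = merged.setdefault(row[src.key], {"customer_id": row[src.key]})
                for src_field, fed_field in src.mapping.items():
                    if src_field in row:
                        target[fed_field] = row[src_field]
        return list(merged.values())

    def deliver(self, sink: Callable[[Record], None]) -> int:
        """Physical delivery (ETL style): push the same records to an endpoint."""
        rows = self.query()
        for row in rows:
            sink(row)
        return len(rows)


# Hypothetical in-memory stand-ins for a CRM and a billing system.
crm = Source("crm", lambda: [{"id": 1, "full_name": "Ada Lovelace"}],
             {"full_name": "name"}, key="id")
billing = Source("billing", lambda: [{"cust": 1, "bal": 120.5}],
                 {"bal": "balance"}, key="cust")

customer = Federation("Customer", [crm, billing])
warehouse: List[Record] = []
delivered = customer.deliver(warehouse.append)
```

The point of the sketch is that `query` and `deliver` share the same sources and mappings, so the federation is defined once and the delivery mode is chosen by the caller.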

Bite-sized chunks are always more palatable. While it’s possible to define a virtual enterprise model, queries against that model will almost always yield information that needs further modification to make sense with the objectives and in the context of the calling program. What does that mean? It means that additional data manipulation, even complex transformation, will most often be required. So, from the perspectives of agility, ease of use and understanding, and execution risk, enterprise models should be undertaken with discretion.

For-purpose mapping and data conversion, as with Stone Bond's Enterprise Enabler®, ensures that the usage receives the data in exactly the form required, with the most appropriate performance and latency.



Friday, February 8, 2013

The Second Face of Data Virtualization

                                                                                                                                                 
The industry, the analysts, and the rest of us are trying to get a handle on the many faces of Data Virtualization (DV). As soon as we think we get it, another use case shows up. If we define DV as federating data “in flight” as opposed to putting it in a central database/warehouse/mart, then we are talking about streamlining EAI, ETL, ESB, SOA, and reporting (aka BI, BA, etc.), where data is either physically moved or served on-demand. The same multi-source federation is valuable for all these patterns; however, DV has become known nearly exclusively as a support technology for Business Intelligence.

How Unfortunate!

While DV is valuable across these other patterns, I want to highlight the new architectural pattern taking hold as a second key application of the technology that is being referred to as Transactional Data Virtualization (TDV). This pattern would simply not be possible without TDV.

Transactional Data Virtualization

Most people associate DV exclusively with its “first face,” Business Intelligence. The second face may surprise you as you absorb the concept of taking DV from reporting to operations. Data is federated live directly from the sources, served up on-demand to applications or to end users in portals, and updated by the application or user, effecting a heretofore unparalleled agility and efficiency to your business.

I wrote a blog post some time ago about turning dashboards into operational consoles, where TDV makes data actionable. KPI dashboards can be turned into interactive user interfaces where users can take action on the conclusions and decisions based on the KPIs. Our customers are streamlining their customer-facing applications and portals by using TDV to federate data from multiple backend systems and present it on web pages. Certain fields, for example an address or preferences, are editable. If the end user is authorized, and write-back is enabled, Enterprise Enabler® updates that information in the source systems, with transaction management. If the changed information needs to reside in more systems than the original source of record, it updates those systems at the same time. I’m sure you picked up that this means you don’t have to run a separate synchronization step, which means that latency in synchronizing can be reduced to an irrelevance.
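
As a rough illustration of that write-back flow, the sketch below routes one edited field to every system that holds it, after checking that the user’s role may edit it. Everything here is an assumption made for the example: the `LINEAGE` and `EDITABLE` tables, the role names, and the dictionary “systems” are hypothetical and do not describe how Enterprise Enabler® implements write-back.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical backend systems, modelled as dictionaries keyed by customer id.
systems: Dict[str, Dict[int, Dict[str, str]]] = {
    "crm":     {7: {"address": "1 Old Road"}},
    "billing": {7: {"address": "1 Old Road"}},
}

# Lineage metadata: federated field -> every (system, field) that must hold it.
# The first entry is treated as the source of record.
LINEAGE: Dict[str, List[Tuple[str, str]]] = {
    "address": [("crm", "address"), ("billing", "address")],
}

# Which roles may edit which federated fields (illustrative policy only).
EDITABLE: Dict[str, Set[str]] = {"address": {"customer", "agent"}}


def write_back(role: str, customer_id: int, field: str, value: str) -> List[str]:
    """Apply one edit to every system that holds the field, or to none."""
    if role not in EDITABLE.get(field, set()):
        raise PermissionError(f"role {role!r} may not edit {field!r}")
    targets = LINEAGE[field]
    # Snapshot old values so a failure part-way through can be undone.
    before = [(s, f, systems[s][customer_id][f]) for s, f in targets]
    try:
        for system, sys_field in targets:
            systems[system][customer_id][sys_field] = value
    except Exception:
        for system, sys_field, old in before:
            systems[system][customer_id][sys_field] = old
        raise
    return [system for system, _ in targets]


updated = write_back("customer", 7, "address", "2 New Street")
```

Because the same call updates the source of record and the dependent system together, there is no separate synchronization job to schedule afterward.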

Data Federation Must Have a Transformation Engine

Data Virtualization brings tremendous value any time data is required that must be compiled with or put in the context of data from one or more other sources. In my experience, this happens close enough to “always” to round up. Unless there is a transformation engine involved that federates multiple disparate sources, the federation will require custom coding to align the data properly. That’s a strike against agility, which is not good, but with such an engine, DV enables fluid complex integrations of many flavors.

On the Horizon

Enterprises with forward-looking IT organizations are incorporating Data Virtualization into their Enterprise Architecture and are quickly reaping the ROI as they reduce tech debt. The days of heavy, complex infrastructure are numbered as CIOs elect to eliminate the old, obstinate, and unyielding integration platforms and finally deliver true agility to the business.




Tuesday, January 15, 2013

No Transformation Engine? No Agility.



Transformation Engine
     It keeps coming back to data transformation. That was the very first challenge with respect to integration that intrigued me years ago, because if all data at its various sources were in the same units of measure, had the same names, didn't need to be run through a formula to be in the same numeric context, were spelled the same, etcetera, etcetera, etcetera, then all you would have to do is get it all together where you wanted it.  I believe that the need to transform and manipulate data remains the single most important impediment to speedy, streamlined data flow throughout an enterprise. 
    The emergence of transformation engines aligned with the early Extract/Transform/Load (ETL) integration architecture pattern years ago. Unfortunately, it generally is not considered an essential function or element of other patterns, which continues to astound me. Regardless of whether the incumbent architecture is EAI, ESB, ETL, B2B, or Data Virtualization, the same issues are present, but the transformation engine is often not part of the solution. That means that all that data transformation is done by one-off coding or scripting, sometimes augmented by limited-scope conversion utilities. It seems like an “unmentionable” topic: people turning their heads the other way and pretending that it’s not important. In fact, it’s every bit as big a deal for the other patterns as it is for ETL!
     Without a transformation engine, it is impossible to streamline the logic that makes the data work meaningfully. Without it, all data transformation yields brittle break points, which impede the ability to adjust quickly to changes in business or technical requirements, generally slow development and execution, and promote the attitude of “It works. Don’t touch it.”

What should you look for in a transformation engine?
·         Well, first of all, look for a transformation engine! If there is one, then consider…
·         A single transformation must handle
1.   Many sources to one (no staging required)
2.   Multiple source types (databases, Cloud apps, electronic instruments, web services, etc.)
3.   Lookups, alignment, complex data manipulation, filtering, etc.
4.   “En route” data cleansing
·         Transformations must be completely metadata driven
·         All metadata should be configured and modified in a single interface, without leaving to separate tools.
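
To show what those criteria can look like in practice, here is a small Python sketch of a metadata-driven, many-sources-to-one transformation. It is a toy under stated assumptions, not any vendor’s engine: the `WELL_MAP` specification, the `CONVERSIONS` table, and the “erp” and “scada” sources with their field names are all hypothetical. The transformation is described entirely as data, aligns two sources on a cleansed key, and converts units on the way through without staging anything.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Small library of reusable conversions referenced by name from the metadata.
CONVERSIONS: Dict[str, Callable[[object], object]] = {
    "identity": lambda v: v,
    "strip_upper": lambda v: str(v).strip().upper(),        # en-route cleansing
    "lb_to_kg": lambda v: round(float(v) * 0.45359237, 3),  # unit alignment
}

# The transformation itself is only metadata: no per-source code.
# Each target field names its source, source field, and conversion.
WELL_MAP = {
    "join_key": {"scada": "tag", "erp": "WELL_NO"},
    "fields": {
        "well_id":   ("erp",   "WELL_NO",   "strip_upper"),
        "operator":  ("erp",   "OPERATOR",  "identity"),
        "weight_kg": ("scada", "weight_lb", "lb_to_kg"),
    },
}


def transform(spec: dict, sources: Dict[str, List[Row]]) -> List[Row]:
    """Align several sources on a cleansed key and emit target rows directly."""
    clean = CONVERSIONS["strip_upper"]
    # Index every source by its cleansed join key; nothing is staged to disk.
    indexed = {
        name: {clean(row[spec["join_key"][name]]): row for row in rows}
        for name, rows in sources.items()
    }
    # Only keys present in every source are emitted (an inner join).
    keys = set.intersection(*(set(idx) for idx in indexed.values()))
    out = []
    for key in sorted(keys):
        out.append({
            target: CONVERSIONS[conv](indexed[src][key][field])
            for target, (src, field, conv) in spec["fields"].items()
        })
    return out


rows = transform(WELL_MAP, {
    "erp":   [{"WELL_NO": " w-12 ", "OPERATOR": "Acme"}],
    "scada": [{"tag": "W-12", "weight_lb": "100"}],
})
```

Changing a mapping or adding a target field means editing `WELL_MAP`, not the `transform` function, which is the sense in which the logic is metadata driven.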

What are some typical characteristics of transformation engines that do not meet all these criteria?
·      XSLT engines, for example, operate only on data structured as XML. That means you must perform a separate transformation for each source that does not inherently handle its data in XML. Any transformation engine that requires incoming and/or output data to be in a particular format violates point 2 above.
o   Result: custom coding, or utilities that must be executed to handle the conversion at both ends of the transformation. That means more hand coding and, in the end, a return to manually coded transformations.
·   Classic ETL engines perform only one-to-one transformations, violating point 1 above. The only way to use these engines in a pattern that requires alignment across multiple sources is to stage the data physically or virtually.
o   Result: Development time includes designing the data model to align the sources, building the model, and implementing full transformations from each source to the model, and then from the model to the destination(s).
·    Many integration products have some amount of data conversion utilities built in.
o   Caution: These are always limited in scope and require leaving the environment to use a scripting or programming language to implement complex data manipulation.

     Just remember that without a transformation engine, you are looking at plenty of overhead and most likely the inability to merge, transform, and move data in real time from multiple sources. Not only will run-time performance be impacted, but the use of coding inevitably means dramatic reduction of agility in development and ability to adjust to changing requirements.

     Now let’s take a brief look at data transformation in a data virtualization pattern. The typical picture of the concept of data virtualization is something like this:

     Each arrow represents the logic to transform the data from how it is in the particular source to how it needs to be represented in the virtual model. If there is no transformation engine, this logic will require a considerable amount of manual coding, although I concede that there may be a very small subset of the useful manifestations of this pattern that are simple enough to configure the data manipulation without any custom coding or transformation engine.  However, if you drop in the full transformation capability onto each arrow, you have a powerful, agile implementation that can be developed quickly and modified in seconds.

     A variation on this pattern brings the ability to write back to the sources: what we call bi-directional data virtualization. But that’s for another day. 
     You may want to check out Stone Bond Technologies’ Enterprise Enabler® if you want to see a product that has a transformation engine that meets all the mentioned criteria.




Friday, November 2, 2012

Tech Debt Out-of-the-Box ... "And all the Ills of Integration-kind were Unleashed"

We often talk about having cool capabilities “out of the box,” which is a good thing. That means that you don’t have to do anything but a quick install and you can start using the feature. That is, unless you are talking about Legacy Integration Software (LIS), in which case, when you first “opened the box,” it began spewing Tech Debt before anything else happened. All the ills of integration-kind were unleashed. Years later, you are still prisoner to your Pandora’s Box.

You launch a new project using Legacy Integration Software. First open Pandora’s Box:
        1. Install new instance and all related tools
        2. Apply 64 patches; you can implement the work-arounds in a few months when you start actually developing the integrations.
        3. Send team to a few weeks of Legacy Integration University
        4. Better hire a few consultants, too.
Tech Debt abounds already!

Below is an actual post from a recent discussion in the Integration Consortium’s LinkedIn Group. The topic has to do with updating customer communications to a standard XML format as opposed to legacy files sent over FTP. The Legacy Integration Software limits the options. Adding XML to the mix means that the LIS needs additional work.

**************
"We already have in house Informatica footprint. So we plan to use that for generating all outbound files. Here is the concern.
We have two options for generating these files.
1) DB -> Informatica -> Standard XML -> XSLT -> Custom File
2) DB -> Informatica -> Custom File
First option provides the benefit of standardization/canonical information model, long term migration path of custom files to standardized xml, and less development since only one Informatica process is required and all custom format are through XSLT.
Problem we see with this approach is the potential performance issue; outbound XML file size in some cases is more than 1GB due to XML tags while the corresponding custom file is a 50MB or so. Second issue is an additional hop that makes support/troubleshooting activities a little harder i.e. where/why a file generation process failed."
*******************

No one should have to think about these things. Agile Integration Software (AIS) like Stone Bond’s Enterprise Enabler would require only a single process for this solution. The differences in the mapping and destination format required would be handled by passing the customer ID, which either determines which map to run or supplies variables passed directly into the transformation engine at run-time to modify its actions. The same process can alternatively step through a standard XML, although the value of doing that escapes me. Performance would not be an issue, and troubleshooting through the streamlined solution is simplified. Stone Bond customers implement such B2B transactions using DBAs as opposed to specially-skilled programmers.
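
The “single process, map chosen by customer ID” idea can be sketched in a few lines of Python using only the standard library. This is a simplified illustration, not Enterprise Enabler’s or Informatica’s mechanism: the `CUSTOMER_MAPS` table, the customer IDs, and the invoice fields are hypothetical. It also makes the size concern in the quoted post easy to see, since the XML form wraps every value in tags.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Row = Dict[str, str]


def to_pipe_delimited(rows: List[Row]) -> str:
    """Compact custom file: one line per record, fields separated by '|'."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    for row in rows:
        writer.writerow([row["invoice"], row["amount"]])
    return buf.getvalue()


def to_standard_xml(rows: List[Row]) -> str:
    """Verbose canonical form: every value is wrapped in named tags."""
    root = ET.Element("Invoices")
    for row in rows:
        item = ET.SubElement(root, "Invoice")
        ET.SubElement(item, "Number").text = row["invoice"]
        ET.SubElement(item, "Amount").text = row["amount"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical per-customer metadata: which map (output format) each one gets.
CUSTOMER_MAPS: Dict[str, Callable[[List[Row]], str]] = {
    "CUST-A": to_pipe_delimited,
    "CUST-B": to_standard_xml,
}


def generate_outbound(customer_id: str, rows: List[Row]) -> str:
    """Single process for all customers; the ID selects the map at run time."""
    return CUSTOMER_MAPS[customer_id](rows)


data = [{"invoice": "1001", "amount": "25.00"}]
custom_file = generate_outbound("CUST-A", data)
xml_file = generate_outbound("CUST-B", data)
```

Onboarding another customer format means adding one entry to the lookup table rather than building a second process or an extra XSLT hop.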

Why, in the twenty-first century, do you have to jump through hoops to get data wherever you want it whenever you want it? Here we are, musing over the “leading edge” Big Data hype and allocating millions of dollars for pilot projects next year, when we can’t even get clean, quick, agile data exchange with our customers and business partners. Does that make sense? I don’t think so. Isn’t it time to embrace twenty-first century technology and start eliminating the Tech Debt you have accumulated instead of continuing on a path that parallels the national debt?

Agile Integration is easy to try out. You do owe it to your shareholders.



Sunday, October 7, 2012

2 Keys to Identifying Agile Integration Software

For years I’ve been talking about Agile Integration Software (AIS) as a class of products that enable very flexible and agile data flows across the enterprise, eliminating the clunky, expensive, time consuming infrastructure of the past. This paradigm is agnostic to integration patterns and shares metadata for all uses, such as Data Virtualization (DV), ETL, EAI, SOA, etc.

Over these years, I have come to understand that AIS is not a class of products. As it turns out, Enterprise Enabler® is the only product that has all of the characteristics necessary to fulfill the AIS vision in scope and flexibility. So much for a product-agnostic concept-oriented blog! Many of the features can be found in other products, but somehow there is always at least one critical feature missing that negates the possibility of agility for the solution.

As I think about what constitutes agility in this space, a few things come to mind that are imperatives in such a platform. The two most telling indicators lead the list:
  1. The product must have a transformation engine that aligns and transforms data a) from multiple disparate sources at once and b) in their native formats. Without this, complex integrations get little assistance from the platform itself, but are accomplished by extensive custom coding. A streamlined, high-performance data virtualization solution is impossible. Writing back to the sources becomes cumbersome at best, and live, real-time end user interaction with the endpoint simply cannot be effective. Older one-to-one transformation engines do not satisfy the needs; XSLT transformation engines also do not meet the two criteria, because all data must be converted to XML before it is transformed, and the XML output must then be converted to the destination format. Each of those conversions, to and from XML, is effectively an additional full transformation that generally must be accomplished with custom coding.
  2.  There must be a single Integrated Development Environment (IDE) that crosses the entire scope of functionality for all integration patterns. This IDE must incorporate the run-time engines in order to be able to design, develop, test, deploy, and monitor in the same environment. Leaving the environment for anything dramatically reduces the speed of implementation and of change, which is essential for agility, time to value, and minimizing tech debt.
Some things are only possible with AIS
The most recent feature that has been touted by all is in the Data Virtualization space, where data from multiple sources is brought together, cleaned, aligned and transformed without actually moving, staging, or copying the data anywhere, and then delivered virtually as a “view” to an application or dashboard on demand, again without ever moving it. The most powerful functionality that simply cannot be done successfully with other technologies is data virtualization with write-back to the sources. Enterprise Enabler automatically generates data virtualization, including the ability to write back securely to the sources. These integrations are published for consumption in multiple formats, such as web services, an ADO.NET driver, a SharePoint External List, and others.


Bottom line value of Enterprise Enabler (AKA Agile Integration Software)
  • Time to Value is reduced by up to 90%
  • Tech debt, the cost of maintenance and change over time is similarly reduced
  • No need for expensive, specialized skill sets particular to a specific tool
  • Streamlined architecture inherently enables high performance
  • Lower security risks: with Data Virtualization data remains securely in the sources of record, without copies being made
  • With write-back to sources in Data Virtualization patterns, dashboards become interactive consoles instead of simply reporting tools
  • Without copies being made, a multitude of application specific databases become unnecessary, and synchronization activity is reduced
  • Latency issues are removed since applications and end users always have the most current (“live”) information.
  • Maintenance and change over time is no longer an overwhelming problem.
 We have observed that while Enterprise Enabler automatically generates bidirectional services, competing product companies tend to proliferate bi-directional lip service.

Monday, September 17, 2012

The Virtual Cycle: Bi-Directional Data Virtualization


With all the buzz about data virtualization (DV), it surprises me how little bi-directional data virtualization is discussed. Without the ability to write back to sources, the use of DV is limited to Business Intelligence and other reporting. Of course, it’s hugely important for that, but when you add write-back to the sources, you are opening up a whole new world of possibilities for a new dimension of interaction with data.

Suddenly all those dashboards become consoles where business operations can be performed, with end users not just viewing data, but correcting it, updating it, and taking action on decisions. Any application can leverage bi-directional DV to access federated data and to write back to the sources without having to know where it came from, treating it as a single entity. This capability goes a long way to reducing the time to value of many IT initiatives beyond reporting and analytics.

For those skeptics who are not already familiar with bi-directional data virtualization, the first questions are typically, “How do you handle the security to make sure users only write back when they have permission to do so?” and “What happens if  there’s a failure writing back to one of multiple sources?” 

The short answers are that end user security is handled for full CRUD capabilities using SSS or other models, and transaction rollback is managed using two-phase commit or other modes.
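
For readers who have not met two-phase commit before, here is a deliberately simplified Python sketch of the idea. It is a teaching toy under stated assumptions: the `Participant` class and `two_phase_write` function are hypothetical, there is no durable log or timeout handling, and it is not a description of how any product manages its transactions. It shows only the core rule: every source must vote yes in the prepare phase before any source commits.

```python
from typing import Dict, List


class Participant:
    """A stand-in for one source system taking part in a write-back."""

    def __init__(self, name: str, data: Dict[str, str], healthy: bool = True):
        self.name = name
        self.data = data
        self.healthy = healthy
        self._pending: Dict[str, str] = {}

    def prepare(self, changes: Dict[str, str]) -> bool:
        """Phase 1: hold the change without applying it, then vote."""
        if not self.healthy:
            return False
        self._pending = dict(changes)
        return True

    def commit(self) -> None:
        """Phase 2 (success): make the held change visible."""
        self.data.update(self._pending)
        self._pending = {}

    def rollback(self) -> None:
        """Phase 2 (failure): discard the held change."""
        self._pending = {}


def two_phase_write(participants: List[Participant], changes: Dict[str, str]) -> bool:
    """Commit everywhere only if every participant votes yes in phase 1."""
    prepared: List[Participant] = []
    for p in participants:
        if p.prepare(changes):
            prepared.append(p)
        else:
            for done in prepared:
                done.rollback()
            return False
    for p in participants:
        p.commit()
    return True


crm = Participant("crm", {"phone": "555-0100"})
erp = Participant("erp", {"phone": "555-0100"}, healthy=False)

ok = two_phase_write([crm, erp], {"phone": "555-0199"})
```

In this run the ERP stand-in refuses to prepare, so the CRM’s held change is discarded and neither source is modified.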

Now we can move on to the cool things you can do with this. Suppose training is frustrating and time consuming because new employees must learn how to navigate and use the multiple systems that are necessary for them to handle their responsibilities. They need to log in to SAP, then the CRM, and then a scheduling system, plus a spreadsheet, all just to perform one task. You could build a browser based app that presents the relevant data from all these systems in one screen, aligned and meaningfully presented. This is the standard data virtualization, which is essentially a reporting tool. Now, turn on write capability for appropriate fields, and voila! That browser screen is a full-service, role-based application, interacting directly with backend systems and data stores just as if you were logged in to all of them. This, my friends, is the virtuous virtual cycle of bi-directional data virtualization.

Using Agile Integration Software like Enterprise Enabler®, you can leverage all the federated data services not just for BI, but also for Business Operations. These lightweight, rapidly deployed nuggets enable this third generation of data virtualization to make agile business a reality.


Tuesday, September 4, 2012

Agile Integration: Foundation for All That Hype



By definition, Agile Integration Software (AIS) is charged with accommodating all data sources, standards, etc. and moving data agilely throughout the enterprise, adjusting to changes over time. As it does, it captures information about all of the participating endpoints and modalities of data movement. The more an AIS like Enterprise Enabler® is used, the more it learns and the more metadata it maintains about the information that is involved in the company’s activities.  Perhaps it’s time to think about AIS as a central core of actionable metadata and a useful engine that can be leveraged for use in new initiatives as they are defined, as opposed to continuing to use AIS to adapt to whatever the new initiative independently demands.  

Consider Master Data Management (MDM) for example. Like all other hype waves, the need is there, and the initiative is fueled by the analysts in concert with a surge of new tech companies as well as the Big Players.  Your company decides it better get on the bandwagon; MDM is imperative to maintaining competitive advantage.  One of the top project architects is charged with establishing MDM, so he studies, attends symposia, and learns what it’s all about.

Next comes a technology selection phase, looking at the emerging companies, but mostly looking at the Big Guys, since that’s always safe, and besides the architect already knows the “Rep” really well. It seems the Big Guy has just bought one of the up-and-coming MDM companies, so they are off to the races.  After a couple of years, the architects begin contemplating questions like, “How does this MDM solution relate to the last decade’s SOA path?”   Of course, at this point it’s obvious that SOA is pretty closely related, or could be, had the two initiatives not each been addressed in its own blissful vacuum.  How does the MDM metadata relate to SOA, and how does it relate to the metadata that your multiple integration platforms require?  Some, with wishful thinking, fall for the Big Guys’ claims of interoperability across all the products of the companies they bought.  In the Quadrant or Wave, they cover all the requisite features, but Alas, poor Yorick! Those mighty features are compartmentalized and each discipline is a separate product with separate underpinnings unable to work cleanly together.  


Let’s look at this from a different perspective. With Agile Integration Software comes a complete flexible integrated metadata stack for use and reusability across all the historic and forward-looking integration schema and models.  Instead of integration adapting to the stand-alone fragmented hype solutions, leverage the power of an existing AIS platform  that brings together the disciplines of Data and Application Integration, Application Development, as well as all the special initiatives of SOA, MDM, B2B, Middleware, Virtualization, Federation, Cloud, Change Management and Big Data, all leveraging the AIS integrated metadata stack.

That means eliminating lots of steps to accomplish any task.  With visibility via the metadata stack, a Master Data definition can be combined with all the associated sanctioned sources and all the related business rules and security. Auto-generated bi-directional web services can handle security and rollback to federated sources. SOA is just another mechanism to make an integration or process available for consumption. Different Data Masters and integrations are chosen at run time based on the current state of any other activity throughout the system.

And finally, AIS monitors for change throughout the metadata stack, validating against the actual endpoints and determining the potential impact and remediating and reconciling as necessary across all those hype wave initiatives. This means stability with agility in your IT infrastructure.
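
Here is one way to picture that kind of change monitoring, as a minimal Python sketch. The assumptions are mine: `KNOWN_SCHEMA`, `USAGE`, the endpoint names, and the integration names are hypothetical, and the “live” schema is hard-coded rather than read from real endpoints. The sketch compares stored metadata with what the endpoints currently expose and lists the integrations that a change would break.

```python
from typing import Dict, List, Set

# Stored metadata: the fields each endpoint is believed to expose.
KNOWN_SCHEMA: Dict[str, Set[str]] = {
    "crm.customer": {"id", "name", "email"},
    "erp.orders": {"order_no", "cust_id", "total"},
}

# Which integrations read which endpoint fields (hypothetical names).
USAGE: Dict[str, Dict[str, Set[str]]] = {
    "CustomerMaster": {"crm.customer": {"id", "name", "email"}},
    "OrderFeed": {"erp.orders": {"order_no", "total"},
                  "crm.customer": {"id"}},
}


def detect_drift(actual: Dict[str, Set[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Compare stored metadata with what the endpoints expose right now."""
    drift = {}
    for endpoint, known in KNOWN_SCHEMA.items():
        current = actual.get(endpoint, set())
        removed, added = known - current, current - known
        if removed or added:
            drift[endpoint] = {"removed": removed, "added": added}
    return drift


def impacted(drift: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """List integrations that use a field which has disappeared."""
    hits = []
    for name, reads in USAGE.items():
        for endpoint, fields in reads.items():
            if fields & drift.get(endpoint, {}).get("removed", set()):
                hits.append(name)
                break
    return sorted(hits)


# Simulated live introspection: 'email' was renamed to 'email_address'.
live = {
    "crm.customer": {"id", "name", "email_address"},
    "erp.orders": {"order_no", "cust_id", "total"},
}
drift = detect_drift(live)
affected = impacted(drift)
```

Only the integration that actually reads the renamed field is flagged, which is the kind of targeted impact analysis a shared metadata stack makes possible.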