I just recently realized that the problem of federated query optimization that my colleagues and I think about is a completely different problem from the one that has been so well addressed by academics and big database vendors. Even the more contemporary players in the federation and virtualization world don’t extend this concept across disparate sources, and they focus only on run-time speed, but not agility.
• What good to us is a query optimizer that assumes all of the
data sources are relational databases?
• And adding XML to the mix just doesn't "cut the mustard!
• What if, in order to use these tools, I have to construct a
universal data model that includes all of the data that could
possibly be in play? (The clunky antithesis of agility!)
• Do I have to anticipate every data query I might want to
• What if there is a lot of transformation that needs to be
performed along the way to make the data meaningful
across the sources?
For "pull" integration, where a user's browser interaction or a calling program triggers and specifies the data to be accessed, a SQL query is a universally comfortable way to access information. For a live query in virtual federation, that needs to be interpreted by the federating software into whatever the endpoints understand. The data flowing in from multiple connections needs to be synchronized as the query is being fulfilled from the disparate systems. A "push" integration typically is usually better known, with at least the sources pinned down ahead of time, and often with the exact data being sent each time.