Data Discovery vs Service Discovery

With huge amounts of data being created and stored in different formats across multiple systems throughout the enterprise, Big Data has become a major issue for many organizations. The demand for real-time (or near real-time) information from multiple systems of record is forcing companies to rethink their data strategy and place mission-critical information at the center of the IT portfolio. One of the biggest challenges organizations face with this new strategy is the ability to find and discover the relevant data. Many organizations go down the service-oriented architecture (SOA) path, creating data services to address the integration issues that arise from using disparate tools and technologies for each system. This is a great start to solving the problem, but it is also where most organizations struggle: they fail to leverage their SOA investment effectively to facilitate discovery of the data behind those services.

Most organizations start with service discovery, hoping it will lead to data discovery. Organizations can publish information about their services in Universal Description, Discovery and Integration (UDDI) registries, and developers can query the registry at runtime to find services that might support what they are looking for. In most cases, the UDDI entry includes binding information that tells a developer how to access the service, either at design time (static binding) or at runtime (dynamic binding). The biggest problems here are the low adoption of UDDI in organizations and the inquiry API's lack of any capability to discover actual data. The UDDI inquiry API lets you find a service by name or search across tModels (technical models, which describe specifications such as interfaces or data formats). Assuming your organization has mapped its data model to a tModel, you might be able to find a service that can return the type of data you want, but that is of little use when you are trying to find the actual data. The inability to search by time, location and context is a major reason UDDI has not seen wider adoption across the enterprise, and it makes UDDI the wrong tool for data discovery. Organizations should find the data first, and only then locate the service that can access it.
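To make the limitation concrete, here is a minimal Python sketch of the two inquiry styles described above, using a toy in-memory registry. The classes `ServiceEntry` and `ToyServiceRegistry`, the tModel keys, and the example.com endpoints are all hypothetical; this is not the UDDI SOAP inquiry API, only an illustration of the point that a query by service name or by tModel returns services and their binding information, never the data records themselves.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """Hypothetical, simplified stand-in for a registry entry describing one service."""
    name: str
    tmodel_keys: set = field(default_factory=set)  # data models the service says it uses
    access_point: str = ""  # binding information: where to call the service


class ToyServiceRegistry:
    """In-memory imitation of service-level inquiry (NOT the UDDI SOAP API)."""

    def __init__(self):
        self._entries = []

    def publish(self, entry):
        self._entries.append(entry)

    def find_by_name(self, prefix):
        # Matches on the service name only; case-insensitive prefix match is an assumption.
        p = prefix.lower()
        return [e for e in self._entries if e.name.lower().startswith(p)]

    def find_by_tmodel(self, tmodel_key):
        # Matches on the data model a service references, if someone mapped one.
        return [e for e in self._entries if tmodel_key in e.tmodel_keys]


registry = ToyServiceRegistry()
registry.publish(ServiceEntry("CustomerOrderService", {"tmodel:order"},
                              "https://example.com/orders"))
registry.publish(ServiceEntry("CustomerProfileService", {"tmodel:customer"},
                              "https://example.com/customers"))

# Both queries return services (names and endpoints), never the data itself.
print([e.name for e in registry.find_by_name("Customer")])
print([e.access_point for e in registry.find_by_tmodel("tmodel:order")])
```

Nothing in this model can express a question such as "which order data covers last quarter in Europe?"; the registry only knows which services exist and where to call them.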

So how does an organization find the data first? The answer is fairly simple: use a metadata catalog (MDC). An MDC not only describes the enterprise's information and where to find it, but also defines how that information is secured and automatically managed. It holds the policies and parameters that allow the data services discussed above to be provisioned. The MDC aggregates multiple sources of varying information into a consistent, well-defined view through standard metadata definitions, and it supports discovering data by time, location and context. Organizations that adopt a standard metadata format will be able to describe the various data formats that exist within the enterprise and provide a ‘link’ to the service definition that can access each specific piece of data. Metadata catalogs are therefore a foundational component for any organization dealing with big data and data services. Organizations that recognize this will see a higher return on their data investments, as well as increased agility across the enterprise.
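The find-the-data-first pattern can also be sketched in Python. The `MetadataRecord` schema, the `MetadataCatalog.discover` method, the dataset names and the service URLs below are hypothetical, chosen only to illustrate how catalog entries carrying time, location and context can be searched directly, with each hit pointing to the service definition that serves the data. A real MDC would also hold the security and management policies mentioned above, which this sketch omits.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical catalog entry describing a data set, not a service."""
    dataset: str
    start: date            # time: first day the data covers
    end: date              # time: last day the data covers
    location: str          # location: region or site the data describes
    context: frozenset     # context: business terms / tags
    service_link: str      # 'link' to the service definition that can access the data


class MetadataCatalog:
    """Toy metadata catalog supporting discovery by time, location and context."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def discover(self, on=None, location=None, tags=()):
        """Return records matching every filter supplied; omitted filters are ignored."""
        results = []
        for r in self._records:
            if on is not None and not (r.start <= on <= r.end):
                continue  # data does not cover the requested date
            if location is not None and r.location != location:
                continue  # data describes a different location
            if not set(tags) <= r.context:
                continue  # data lacks one of the requested context tags
            results.append(r)
        return results


catalog = MetadataCatalog()
catalog.register(MetadataRecord("orders_q3", date(2013, 7, 1), date(2013, 9, 30), "EMEA",
                                frozenset({"orders", "sales"}),
                                "https://example.com/services/orders?wsdl"))
catalog.register(MetadataRecord("shipments_q3", date(2013, 7, 1), date(2013, 9, 30), "APAC",
                                frozenset({"shipments", "logistics"}),
                                "https://example.com/services/shipments?wsdl"))

# Step 1: find the data. Step 2: follow its link to the service that can access it.
for hit in catalog.discover(on=date(2013, 8, 15), location="EMEA", tags={"orders"}):
    print(hit.dataset, "->", hit.service_link)
```

The design keeps the service link as ordinary metadata on each record, so discovery never depends on a service registry; the service definition is consulted only after the right data has been found.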