Java Technology for Business Intelligence

by Chris Trayhorn, Publisher of mThink Blue Book, November 15, 2000

Figure 1: Typical infrastructure required for developing and deploying Business Intelligence applications

Business intelligence makes the enterprise “smart.” Although not new in and of itself, business intelligence can be seen as the process of transforming data into information and ultimately into knowledge that is valuable to the corporation. Applications such as data warehousing, data mining, enterprise information portals (EIPs), and knowledge management systems (which can all be part of a business intelligence solution) can provide insight into customer retention, purchasing patterns, and even future behavior. They can also consolidate the presentation of and access to data stored throughout the company. These applications can tell you not only what has happened but also why, and what may happen given certain business conditions, allowing for exploration of “what if” scenarios.

Business intelligence touches on every aspect of IT, such as enterprise resource planning, supply chain, and Customer Relationship Management. By improving their ability to collect, interpret, and act on their information assets, companies realize more-efficient operations and decision-making, fueling top-line growth and rewarding shareholders.

It is widely acknowledged that information is a valuable asset and competitive advantage, especially in e-business. However, extracting potentially valuable information from the massive volumes of data collected by operational systems is the biggest challenge most companies face in developing business intelligence systems. The priorities for business intelligence system designers and applications developers are interoperability, scalability, and adaptability, but traditional IT practices have focused on corporate or departmental solutions with their own internal standards for data interchange, system access, and security.
In order to bridge this gap and allow for the creation of trading exchanges and robust, semantically rich business intelligence and data warehousing applications that capitalize on lucrative new business models such as business-to-business e-commerce, participants need to utilize standards-based development and computing models. These models act as a blueprint for new applications and systems development as well as a common set of standards and interfaces through which applications can interact and explore and exchange information.

The Data Interoperability Challenge

Demands on corporate data warehouses have been steadily accelerating for the past several years, as businesses generate and collect more and more information over the Web. As a result of this information growth, people at all levels inside the enterprise – as well as suppliers, customers, and others in the value chain – are clamoring for subsets of the vast stores of information – such as billing, shipping, and inventory information – that can benefit them.

Collecting and storing vast amounts of data is one thing; utilizing and deploying that data throughout the organization is another. The technical challenges inherent in integrating disparate data formats, platforms, and applications are significant. However, emerging standards such as the application programming interfaces (APIs) that comprise the Java platform, as well as XML technologies, can facilitate the interchange of data and the development of next-generation data warehousing and business intelligence applications.

Java technology has been used extensively for client-side access and in the presentation layer, and it is emerging as a significant force for developing scalable, mission-critical server-side programs. The Java 2 Platform, Enterprise Edition (J2EE) provides the object, transaction, and security support for building robust, adaptable enterprise-class systems.
Incompatible Metadata

One of the key problems limiting data interoperability that business intelligence developers must solve is incompatible metadata formats. Metadata can be defined as information about data or simply “data about data.” In practice, metadata is what most tools, databases, applications, and other information processes use to define, relate, and manipulate data objects within their own environments. It defines the structure and meaning of data objects managed by an application, so that the application knows how to process requests or jobs involving those data objects. The problem is that most applications define metadata differently, using different programming structures, syntaxes, and semantics as well as storing metadata in different data management systems with different file formats.

An example of metadata is the schema or model that database programmers create that defines the tables, fields in a table, and table relationships in a database. The database management system uses this metadata to determine which tables and rows to access in response to an end user transaction or query. Developers can use this schema to create views for users. Also, users can browse the schema to better understand the structure and function of the database tables before launching a query.

Data warehousing and business intelligence developers in particular are familiar with the problems incompatible metadata formats cause. A typical data warehousing application requires the integration of many different types of tools for the extraction and transformation of data, often from different operating systems and software applications. Data must then be transported in stages to the data warehouse, where it is merged with data collected from other sources – each with its own set of metadata.
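To make the schema example concrete, the following is a minimal Java sketch of metadata as "data about data": a description of a table's columns that a tool consults before touching any rows. Everything here is an illustrative assumption rather than part of any product or API named in this paper, including the class names `TableMetadata` and `SchemaMetadataDemo`, the method `unknownColumns`, and the made-up `CUSTOMER` table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical metadata for one table: column names mapped to SQL type names. */
final class TableMetadata {
    private final String tableName;
    private final Map<String, String> columns = new LinkedHashMap<>();

    TableMetadata(String tableName) {
        this.tableName = tableName;
    }

    /** Registers a column and returns this object so calls can be chained. */
    TableMetadata column(String name, String sqlType) {
        columns.put(name, sqlType);
        return this;
    }

    String tableName() {
        return tableName;
    }

    /** Returns the requested columns that this table's metadata does not define. */
    List<String> unknownColumns(List<String> requested) {
        List<String> unknown = new ArrayList<>();
        for (String name : requested) {
            if (!columns.containsKey(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }
}

final class SchemaMetadataDemo {
    /** Builds illustrative metadata for a made-up CUSTOMER table. */
    static TableMetadata customerTable() {
        return new TableMetadata("CUSTOMER")
                .column("ID", "INTEGER")
                .column("NAME", "VARCHAR")
                .column("REGION", "VARCHAR");
    }

    public static void main(String[] args) {
        TableMetadata customer = customerTable();
        // A query tool can validate its column list against the metadata first.
        List<String> requested = Arrays.asList("NAME", "REGION", "REVENUE");
        System.out.println("Unknown columns in " + customer.tableName()
                + ": " + customer.unknownColumns(requested));
    }
}
```

A second tool that described the same table with different names or structures could not reuse this check without translation, which is the incompatibility this section describes.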
Query, reporting, and analysis tools likewise need to maintain common metadata to ensure that the data views maintained by the tools are synchronized with the associated database schemas. Without a common model for creating metadata, developers must hard-wire discrete interfaces between applications to allow for the exchange and synchronization of data. The high cost of developing such a system, in terms of development and maintenance, can be prohibitive. Companies have been limited in developing solutions that require the exchange of data from multiple, heterogeneous applications.

To address the metadata issue, a group of companies – including Hyperion, IBM, Inline Software, Oracle, SAS Institute, Sun, and Unisys – have joined forces to develop the Java Metadata Interface (JMI) API, which permits the access and manipulation of metadata in Java with standard metadata services. JMI is based on the Meta Object Facility (MOF) specification from the Object Management Group (OMG). The MOF provides a model and a set of interface definition language (IDL) interfaces for the creation, storage, access, and interchange of metadata and metamodels (higher-level abstractions of metadata). Metamodel and metadata interchange is done via XML and uses the XML Metadata Interchange (XMI) specification, also from the OMG. JMI defines a Java mapping of the MOF IDL interfaces as well as the contracts necessary to connect to a metadata repository (see “J2EE Connector Architecture,” later in this white paper).

The goal is to overcome the limitations caused by proprietary systems’ use of different and incompatible semantics, structures, and syntax for metadata. The lack of metadata interoperability prevents the sharing of data between applications and has limited the development of robust BI systems. JMI is part of a larger strategy of utilizing Java technology to create an end-to-end data warehousing and business intelligence solutions framework.
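The idea of exchanging metadata as XML can be illustrated with a small sketch in which one tool writes a table description as XML and another reads it back. This is not the XMI format and does not use the JMI API; it relies only on the XML classes that ship with the JDK, and the element names (`table`, `column`), the class `MetadataXmlExchange`, and its methods are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MetadataXmlExchange {
    /** Exporting tool: writes a table description as XML (made-up vocabulary, not XMI). */
    static String export(String table, String[][] columns) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element tableEl = doc.createElement("table");
            tableEl.setAttribute("name", table);
            doc.appendChild(tableEl);
            for (String[] column : columns) {
                Element columnEl = doc.createElement("column");
                columnEl.setAttribute("name", column[0]);
                columnEl.setAttribute("type", column[1]);
                tableEl.appendChild(columnEl);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not export metadata", e);
        }
    }

    /** Importing tool: parses the XML and returns the column names in document order. */
    static List<String> columnNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("column");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not import metadata", e);
        }
    }

    public static void main(String[] args) {
        String xml = export("CUSTOMER", new String[][] {{"ID", "INTEGER"}, {"NAME", "VARCHAR"}});
        System.out.println(xml);
        System.out.println(columnNames(xml));
    }
}
```

Because both sides agree on one XML vocabulary, neither needs a hard-wired interface to the other; a shared standard such as XMI plays that role across vendors.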
Through the Java Community Process, industry experts are extending the functionality of J2EE in new areas relevant to independent software vendors and users of data warehousing and business intelligence. Another specification in the works is the Java OLAP (JOLAP) API, which will provide Java-based access to OLAP servers and multidimensional databases.

Business Intelligence and J2EE

Metadata management is just one aspect of creating a successful business intelligence solution. As applications become more Web-centric and integrated with operational systems such as ERP and CRM, it is important that new applications be developed and deployed with a scalable, robust, and secure development and deployment framework. J2EE was specifically designed to meet the rigorous needs of enterprise computing as well as those for data interchange and interoperability. Although the benefits of J2EE for building enterprise applications are many, of specific interest to data warehousing and business intelligence developers are scalability, multitier support, platform independence, and security:

Scalability

Data warehousing and business intelligence applications typically involve ad hoc combinations and transformations of large amounts of data, making scalability of the underlying system critical. As the number of users increases, J2EE can reliably manage millions of transactions during major Web surges.

Multitier Support

Multitier architectures are composed of tiers of application logic separated from the data tier and the client user interface. Multitier architectures bring high levels of scalability and reliability to Web applications, in that unpredictable demand levels and changes in application code will not require rewriting of the entire application.

Platform Independence

Java Virtual Machines are available on a wide range of computing platforms, from handheld PDAs to servers to mainframes.
As users’ expectations for information to facilitate decision making increase, there is a greater need to distribute information on a variety of devices and platforms.

Security

Java has been designed from the ground up with security as a central feature. The security architecture of J2EE defines simple, flexible relationships between protected resources, the roles that have access to those resources, components, and users.

A key technology of J2EE is Enterprise JavaBeans (EJB), an architecture for the development of component-based distributed business applications. Applications written with the EJB architecture are scalable, transactional, secure, and multiuser-aware. These applications may be written once and then deployed on any server platform that supports J2EE. The EJB architecture makes writing components easy for developers, who do not need to understand or deal with complex, system-level details such as thread management, resource pooling, and transaction and security management. These issues are all taken care of by the EJB server, allowing developers to focus on writing business logic.

Applications are then composed by combining EJB components, sometimes supplied by different vendors, into modules that can be deployed, managed, and executed in any compliant J2EE implementation. This allows for role-based development, in which component assemblers, platform providers, and application assemblers can focus on their area of responsibility, further simplifying application development.

J2EE Connector Architecture

Although accessing data stored in relational databases is a relatively trivial matter with the JDBC API, most applications will require access to larger amounts of data stored in back-office applications as well as legacy computing environments.
J2EE defines the Connector Architecture, which allows access to data within the Java environment by defining a set of contracts that need to be fulfilled between the back-end system and the J2EE platform to support security, transactions, and resource management. The connector acts as an interface between the J2EE platform and the targeted data source, which allows for transparent connectivity between these two systems. This simplifies the integration of operational systems, data warehouses, and mainframe-based systems, because only one connector needs to be provided for any single back-end data source.

Connectors can be built that access metadata repositories, either locally or remotely on other platforms. Utilizing the connector architecture to access a metadata repository and using JMI to manipulate the metamodels and metadata stored in that repository enhance interoperability between applications, tools, services, and disparate data sources.

Java Technology for Business Intelligence

As we have seen, the J2EE platform provides key benefits for building data warehousing and business intelligence applications, tools, and services by providing a solid architectural framework that simplifies complex development and shortens product time to market. By leveraging the J2EE platform, organizations can take advantage of the scalability, multitier architecture support, and security of Java, which has become the de facto industry standard for building transactional Web-based applications. The support of industry-leading companies in developing extensions to J2EE for the data warehousing and business intelligence marketplace makes J2EE a compelling platform for deployment of such applications.

About the Author

Chris Trayhorn, Publisher of mThink Blue Book

Chris Trayhorn is the Chairman of the Performance Marketing Industry Blue Ribbon Panel and the CEO of mThink.com, a leading online and content marketing agency.
He has founded four successful marketing companies in London and San Francisco in the last 15 years, and is currently the founder and publisher of Revenue+Performance magazine, the magazine of the performance marketing industry since 2002.