

Copyright 2000
PORTABLE DATA / PORTABLE CODE:
XML & JAVA™ TECHNOLOGIES
Prepared for Sun Microsystems, Inc. by:
JP Morgenthal
Director of Research, NC.Focus
(516) 792-0997
FAX: (516) 792-0996
Table of Contents
Executive Summary
Origins of the XML Standard
Using XML
Synergy of XML & Java Technologies
Portable Data and Code For the Enterprise
Electronic Data Exchange and E-Commerce
Electronic Data Interchange (EDI)
Enterprise Application Integration (EAI)
Publishing
Software Development
Sun, XML Technology, and the Java Platform
Java Platform Standard Extension for XML Technology
XML Technology Makes Sense for the Java Platform
Conclusions
Appendix A: Resources
Appendix B: About NC.Focus
Executive Summary
Prior to 1998, the exchange of data and
documents was limited to proprietary or loosely defined document
formats. The advent of Hypertext Markup Language (HTML), the
presentation markup language for displaying interactive data in a Web
browser, offered the enterprise a standard format for exchange, but one
focused on interactive visual content. HTML is rigidly defined
and cannot support all enterprise data types, and those shortcomings
provided the impetus to create the Extensible Markup Language (XML). The
XML standard allows the enterprise to define its own markup languages
with emphasis on specific tasks, such as electronic commerce,
supply-chain integration, data management, and publishing.
For those reasons, XML is rapidly
becoming the strategic instrument for defining corporate data across a
number of application domains. The properties of XML markup make it
suitable for representing data, concepts, and contexts in an open,
platform-, vendor-, and language-neutral manner. It uses
tags (identifiers that signal the start and end of a related block of
data) to create a hierarchy of related data components called elements.
In turn, this hierarchy of elements provides context (implied meaning
based on location) and encapsulation. As a result, there is a greater
opportunity to reuse this data outside of the applications and data
sources from which it was derived.
XML technology has already been
successfully used to furnish solutions for mission-critical data
exchange, publishing, and software development. Additionally, XML has
become the incentive for groups of companies within a specific industry
to work together to define industry-specific markup languages (sometimes
referred to as vocabularies). These initiatives create a foundation for
information sharing and exchange across an entire domain rather than on
a one-to-one basis.
Sun Microsystems, along with other major
vendors such as IBM, Novell, Oracle, and even Microsoft, is a strong
supporter of the XML standard. Indeed, Sun Microsystems coordinated and
underwrote the World Wide Web Consortium (W3C) working group that
delivered the XML specification. Sun is also the creator of the Java™
platform, a family of specifications that form a ubiquitous application
development and runtime environment. It is now Sun Microsystems'
intention to ensure that XML technology and the Java platform join in a
way that is complementary to both.
XML and Java technologies have many
complementary features, and when used in combination they form a
powerful platform for sharing and processing data and documents.
While XML can clearly define data and documents in an open and neutral
manner, there is still a need to develop applications that can process
them. The Java platform offers a homogeneous computing environment with
portable code that can be downloaded over a network to any Java virtual
machine. Together, XML and Java technologies allow enterprises to apply
Write Once, Run Anywhere™ fundamentals to the processing of
data and documents generated by both Java technology and non-Java
technology sources. By extending the Java platform standards to include
XML technology, companies will obtain a stable, long-term solution for
supporting XML technology in their applications written in the Java
programming language.
Introduction
The purpose of this paper is twofold: to
introduce the Extensible Markup Language (XML) and explain how it
benefits the enterprise, and to explain the cooperative environment
formed by integrating XML and Java technologies into a solution. Readers
familiar with XML may want to concentrate on the sections that
specifically discuss using XML with Java technology.
The Extensible Markup Language (XML) is a
syntax for developing specialized markup languages. A markup language adds
identifiers, or tags, to certain characters, words, or phrases within a
document so that they may be recognized and acted upon during future
processing. "Marking up" a document or data results in the
formation of a hierarchical container that is platform-, language-, and
vendor-independent and separates the content from any environment that
may process it.
Because XML is a recommendation of the
W3C (World Wide Web Consortium), the group responsible for creating and
maintaining all core Web technical specifications, it reflects a true
industry accord that provides the first real opportunity to liberate the
business intelligence that is trapped within disparate data sources
found in the enterprise. XML does this by providing a format that can
represent structured and unstructured data, along with rich descriptive
delimiters, in a single atomic unit. In other words, XML can represent
not only data found in common data sources, such as databases and
applications, but also data in non-traditional data sources, such as word
processing documents and spreadsheets. Previously, non-traditional data sources
were constrained by proprietary data formats and hardware and operating
system platform differences.
The W3C has released and maintains the
Extensible Markup Language 1.0 as the official specification that
defines the rationale behind the development of XML and the rules for
processing XML-formatted data and documents (see Appendix A: Resources
for associated Web links).
Origins of the XML Standard
The process of making SGML simpler and
Internet-aware gave rise to the XML specification. This section explains
the process that led to the development and adoption of XML technology
as a W3C Recommendation.
The first standardized markup language,
SGML (Standard Generalized Markup Language), is still a heavily used
international standard maintained by the ISO (International Organization
for Standardization). SGML gave the publishing industry a machine- and
process-independent method of separating content from presentation. In
publishing, the presentation is usually some form of printed medium,
produced by machines that support those objectives. SGML simply lets
authors define the characteristics of the print version without
requiring them to include machine-specific codes.
To date, HTML (Hypertext Markup Language)
is the most popular application of SGML. It acts as the presentation
syntax that Web browsers use to render documents visually. Clearly, the
Web is one of today's most powerful communication vehicles, illustrating
the importance of authoring in a markup language. However, HTML is too
specific to represent information generically, and SGML is too
complex to use in tandem with the Web; therefore, XML
emerged as a simpler, generalized markup language for the Web.
The distinction between SGML and HTML
spurred the development of the XML specification. Jon Bosak, an engineer
at Sun Microsystems and generally regarded as the "father of XML",
realized the limitations of HTML early on. Bosak had used SGML
extensively for managing technical documentation on behalf of large
vendors, first Novell and then Sun Microsystems. This experience led
Bosak to raise expectations for publishing on the Web and to
demand nothing less powerful than SGML as the delivery tool.
Bosak's persistence prompted the W3C to
recognize SGML and its associated style sheet language, DSSSL. He was
also offered the opportunity to lead the Web SGML Activity (later
renamed XML). Part of Bosak's responsibility was to obtain funding for
the W3C Activity and build the team to design the specification. Bosak
did both; Sun Microsystems underwrote the effort and a number of SGML
experts participated in the development of the specification.
Using XML
XML technology enables companies to
develop application-specific languages that better describe their
business data. This section provides a brief overview of what it means
to use XML data and what an XML document looks like.
By applying XML technology, one is
essentially creating a new markup language. For example, an application
of the XML language would produce the likes of an Invoice Markup
Language or a Book Layout Markup Language. Each markup language should
be specific to its creator's individual needs and goals.
Part of creating a markup language
includes defining the elements, attributes, and rules for their use. In
XML, this information is stored in a document type
definition (DTD). A DTD may be included within an XML document or stored
externally; if the DTD is stored externally, the XML document must
provide a reference to it. If a document provides a DTD and adheres to
the rules specified in that DTD, it is considered valid.
The following is an example of a Document
Type Definition that defines an element named BILLING_PARTY along with
both its required and optional sub-elements.

The example states that the element
BILLING_PARTY must contain one sub-element named ACCOUNT_NUMBER, which
comes directly after its start tag, and that ACCOUNT_NUMBER may optionally
be followed by any of the contact information fields.
Note that a DTD is not required. Documents
without DTDs that follow the rules of the XML specification are
designated as well formed, but not valid. Any XML parser can determine
whether a document is well formed; a validating parser can also
determine whether it is valid.
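The JAXP parser bundled with modern JDKs shows the difference between well-formed and valid documents. The following minimal sketch assumes Java 8 or later. Its DTD is a hypothetical stand-in modeled on the BILLING_PARTY description above; the NAME, PHONE, and EMAIL contact fields are invented for illustration. The sketch turns on validation and classifies each input as not well formed, well formed but not valid, or valid.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdCheck {

    // Hypothetical internal DTD: ACCOUNT_NUMBER is required and comes first;
    // the (invented) contact fields after it are optional, in this fixed order.
    static final String DTD =
          "<!DOCTYPE BILLING_PARTY [\n"
        + "  <!ELEMENT BILLING_PARTY (ACCOUNT_NUMBER, NAME?, PHONE?, EMAIL?)>\n"
        + "  <!ELEMENT ACCOUNT_NUMBER (#PCDATA)>\n"
        + "  <!ELEMENT NAME (#PCDATA)>\n"
        + "  <!ELEMENT PHONE (#PCDATA)>\n"
        + "  <!ELEMENT EMAIL (#PCDATA)>\n"
        + "]>\n";

    // Classifies a document as "not well formed", "well formed but not valid", or "valid".
    static String check(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // also check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        boolean[] valid = {true};
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            // Validity errors are recoverable: record them and keep parsing.
            @Override public void error(SAXParseException e) { valid[0] = false; }
            // Well-formedness errors are fatal: stop parsing.
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return "not well formed";
        }
        return valid[0] ? "valid" : "well formed but not valid";
    }

    public static void main(String[] args) throws Exception {
        String good = "<BILLING_PARTY><ACCOUNT_NUMBER>1001</ACCOUNT_NUMBER>"
                + "<PHONE>555-0100</PHONE></BILLING_PARTY>";
        String missingAccount = "<BILLING_PARTY><PHONE>555-0100</PHONE></BILLING_PARTY>";
        String broken = "<BILLING_PARTY><ACCOUNT_NUMBER>1001</BILLING_PARTY>";
        System.out.println(check(DTD + good));           // valid
        System.out.println(check(DTD + missingAccount)); // well formed but not valid
        System.out.println(check(good));                 // well formed but not valid (no DTD)
        System.out.println(check(broken));               // not well formed
    }
}
```

Turning on validation changes what gets reported, not what counts as XML. Well-formedness errors still stop the parse. Validity errors are handed to the error handler, which is why the sketch handles the two kinds of error separately.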
The following is an example of a
well-formed XML document:

This example illustrates how XML can
provide developers with the ability to define application-specific tags,
such as <INVOICE> and <BILLING_PARTY>. But it is the
resulting markup language that gives the enterprise the power to
leverage and reuse this invoice description across many applications.
For example, the invoice document could
be rendered into HTML and displayed to the user in a Web shopping
scenario, it could be delivered to a Point-of-Sale (POS) terminal in a
store to be rendered into a receipt, and it could be sent to the
back-office where it would be used to update the accounting and
inventory systems. Additionally, this XML document could be generated by
an existing sales application, illustrating how the output of one system
can be used as input to another, thereby providing a simple means of
application integration.
This example also illustrates how XML
technology supports semi-structured data. First, it provides
encapsulation, which tells us where data starts and stops with regard to
a single element. Second, it provides context; <price> inside of
<item> tells us that the price relates to that single item.
Finally, the XML language provides meta-information, such as a currency
attribute on <price>. By using an attribute to represent currency, this same
format can be used across the globe with multiple currencies.
Best of all, this format is extensible, which means that any one company
could extend it to support data that is specific to its needs.
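The following Java program shows how an application can exploit encapsulation, context, and meta-information. It is a sketch that assumes Java 8 or later and the JDK's built-in DOM parser, and it reads a small hypothetical invoice. Only the INVOICE, BILLING_PARTY, item, and price names and the currency attribute come from the text above. ACCOUNT_NUMBER comes from the DTD discussion, and the name element and all values are invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InvoicePrices {

    // Hypothetical invoice document used only for illustration.
    static final String INVOICE =
          "<INVOICE>"
        + "<BILLING_PARTY><ACCOUNT_NUMBER>1001</ACCOUNT_NUMBER></BILLING_PARTY>"
        + "<item><name>Widget</name><price currency=\"USD\">19.95</price></item>"
        + "<item><name>Gadget</name><price currency=\"EUR\">7.50</price></item>"
        + "</INVOICE>";

    // Returns one "name: amount currency" line per item.
    static List<String> prices(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // Context: this name and price belong to this item because they are nested in it.
            Element name = (Element) item.getElementsByTagName("name").item(0);
            Element price = (Element) item.getElementsByTagName("price").item(0);
            // Meta-information: the currency travels with the amount as an attribute.
            result.add(name.getTextContent() + ": " + price.getTextContent()
                    + " " + price.getAttribute("currency"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        prices(INVOICE).forEach(System.out::println);
        // Widget: 19.95 USD
        // Gadget: 7.50 EUR
    }
}
```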
Synergy of XML & Java Technologies
The Java platform's portable code
capability has been invaluable in fostering a collaborative environment.
For example, the XML-Dev mailing list used Java technology as the basis
for a collaborative project called SAX (Simple API for XML). SAX is a
Java technology interface that allows applications to integrate with any
XML parser to receive notification of parsing events. Every major Java
technology-based parser now available supports this interface. It was
developed by a group of individuals participating in the mailing list
who leveraged the Java platform's portability to speed development and
share ideas.
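Current SAX parsers still use the event-driven model that SAX introduced. The following minimal sketch assumes Java 8 or later. It uses the SAX2 interfaces (org.xml.sax) and the JAXP SAXParserFactory shipped with current JDKs, which postdate the original SAX 1.0 interface described here. The handler counts start-tag events by element name, and the element names in the sample are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTally {

    // Receives parsing events from the parser; here it counts start tags by name.
    static class TagCounter extends DefaultHandler {
        final Map<String, Integer> counts = new TreeMap<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            counts.merge(qName, 1, Integer::sum);
        }
    }

    // The parser drives the handler; no document tree is built in memory.
    static Map<String, Integer> countTags(String xml) throws Exception {
        TagCounter handler = new TagCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<INVOICE><item><price>1</price></item>"
                + "<item><price>2</price></item></INVOICE>";
        System.out.println(countTags(xml)); // {INVOICE=1, item=2, price=2}
    }
}
```

The handler never sees which parser produced the events. Applications written against SAX can therefore swap parser implementations without code changes.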
Without Java technology, the SAX
developers would have had a much more difficult time building this
interface. First, they would have been required to share portable C or
C++ code, which is very difficult to write. Second, all of the SAX
creators would have needed a C or C++ compiler for their platforms,
and each would have had to build and debug their own version of the SAX
implementation, a time-consuming task at best. Instead, the participants
needed only to download a widely available version of the Java
Development Kit (JDK™) and a Java technology-based parser
that supported the SAX interface.
Here are some other key synergies that
the Java platform shares with the XML standard:
- The Java platform intrinsically
supports the Unicode standard, making child's play of processing an
international XML document. For platforms without native Unicode
support, the application must implement its own handling of Unicode
characters, which adds complexity to the overall solution.
- The Java technology binding to the W3C
Document Object Model (DOM) provides developers with a highly
productive environment for processing and querying XML documents.
The Java platform can become a ubiquitous runtime environment for
processing XML documents.
- The Java platform's intrinsic support
for object-oriented programming means that developers can build
applications by creating hierarchies of Java objects. Similarly, the
XML specification offers a hierarchical representation of data.
Because the Java platform and XML content share this common
underlying feature, they are well suited to representing
each other's structures.
- Applications written in the Java
programming language that process XML can be reused on any tier in a
multi-tiered client/server environment, offering an added level of
reuse for XML documents. The same cannot be said of scripting
environments or platform-specific binary executables.
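The following sketch illustrates the point about matching hierarchies. It assumes Java 8 or later and uses the DOM interfaces in the JDK to copy an XML element tree into a tree of Java objects. The Part class and the order/customer/item vocabulary are hypothetical. A real application would more likely map elements to domain classes than to a generic node type.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlToObjects {

    // Hypothetical generic Java node mirroring one XML element.
    static final class Part {
        final String name;
        final String text;
        final List<Part> children;

        Part(String name, String text, List<Part> children) {
            this.name = name;
            this.text = text;
            this.children = children;
        }
    }

    static Part fromXml(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return toPart(root);
    }

    // Each element becomes a Part; nested elements become child Parts, recursively.
    private static Part toPart(Element element) {
        StringBuilder text = new StringBuilder();
        List<Part> children = new ArrayList<>();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                children.add(toPart((Element) n));
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                text.append(n.getNodeValue()); // keep only this element's own text
            }
        }
        return new Part(element.getTagName(), text.toString().trim(), children);
    }

    public static void main(String[] args) throws Exception {
        Part order = fromXml("<order><customer>Acme</customer>"
                + "<item>Widget</item><item>Gadget</item></order>");
        System.out.println(order.name + " has " + order.children.size() + " children"); // order has 3 children
        for (Part child : order.children) {
            System.out.println("  " + child.name + " = " + child.text);
        }
    }
}
```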
Portable Data and Code For the Enterprise
When XML and Java technologies are used
together, they achieve greater interoperability with other
applications both inside and outside the enterprise. This section
provides some examples of business imperatives that can leverage XML and
Java technologies together.
For those just beginning to explore XML,
it is not uncommon to feel that the language is being pitched for every
IT ailment. The reason for this is clear: XML delivers interoperability
of data across applications and hardware. In today's mostly
heterogeneous computing environments, interoperability is still the
biggest problem. By virtue of the support it has garnered from the
largest vendors in the world, such as IBM, Oracle, Sun Microsystems,
Microsoft, and SAP, XML delivers like no other computing initiative
since ASCII. The result of this vendor support is immediate data
interoperability, with perhaps one small requirement: adherence to a
selected vocabulary.
The following is a brief categorical
breakdown of the types of tasks that have become less complex thanks to
the XML standard, and, where applicable, a description of how XML and
Java technologies simplify these tasks.
Electronic Data Exchange and E-Commerce
Processing data from other departments
and/or enterprises should be a simple task given the industry's vast
knowledge of communications, networking, and data processing, but
unfortunately, that's not the case. Validating data format and ensuring
content correctness are still major hurdles to achieving simple,
automated exchanges of data. Using XML technology as the format for data
exchange may quickly remedy most of this problem for the following
reasons:
- Electronic data exchange of
non-standard data formats requires developers to build proprietary
parsers for each data format. XML technology eliminates this
requirement by using a standard XML parser.
- An XML parser can immediately provide
some content validation by ensuring that all the required fields are
provided and are in the right order. This function, however,
requires the availability of a DTD and a validating parser. Additional
content validation is possible with applications that use the W3C
Document Object Model (DOM), an application programming interface
for exploring XML documents, to apply field validation rules to the
content of each element.
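Below is a minimal sketch of DOM-based field validation, assuming Java 9 or later for Map.of and List.of. Each hypothetical rule is a regular expression keyed by element name, and every element in the document is checked in document order. The rules and sample values are illustrative and are not part of any standard vocabulary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FieldRules {

    // Hypothetical field rules: element name -> test that the element's text must pass.
    static final Map<String, Predicate<String>> RULES = Map.of(
        "ACCOUNT_NUMBER", s -> s.matches("\\d{4,10}"),   // 4 to 10 digits
        "price", s -> s.matches("\\d+(\\.\\d{2})?"));    // e.g. 7 or 7.50

    // Returns one message per element whose content breaks its rule.
    static List<String> violations(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> problems = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            Predicate<String> rule = RULES.get(element.getTagName());
            String value = element.getTextContent().trim();
            if (rule != null && !rule.test(value)) {
                problems.add(element.getTagName() + " has invalid value '" + value + "'");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<INVOICE><BILLING_PARTY><ACCOUNT_NUMBER>10A1</ACCOUNT_NUMBER></BILLING_PARTY>"
                + "<item><price>19.95</price></item><item><price>abc</price></item></INVOICE>";
        System.out.println(violations(xml));
        // [ACCOUNT_NUMBER has invalid value '10A1', price has invalid value 'abc']
    }
}
```

A DTD can only say that a price element is present and where it belongs. Checks on content, such as "must be a decimal amount", need application code along these lines.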
Additionally, content and format
validation can be completed outside of the processing application and
perhaps even on a different machine. The benefit of this approach is
twofold. First, it reduces the resources used on the processing machine
and increases the processing application's overall throughput, since the
application does not need to validate the data first. Second, the approach
offers companies the opportunity to accept or reject the data at the time
of receipt instead of requiring them to handle exceptions during processing.
When XML markup is combined with Java
technology it becomes significantly easier to build electronic data
exchange applications for a couple of reasons. First, the Java platform
is Internet-enabled, which immediately facilitates connectivity over
TCP/IP between the exchanging parties. As a result, these parties can
use the Internet as an exchange transport. Moreover, technically
sophisticated enterprises can provide the tools and technologies to help
less sophisticated ones participate in electronic data exchange.
In addition, both XML and the Java
platform intrinsically support Unicode character sets so both
environments enable enterprises to support development of
internationalized applications. Using the Unicode standard, applications
can represent characters in multiple national languages. With XML markup
as the format for data exchange and an internationalized application
written in the Java language for processing, XML documents can be
exchanged globally.
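The following sketch shows text in several scripts surviving a round trip through an XML document serialized as UTF-8. It assumes Java 8 or later and the JDK's JAXP DOM and transformation APIs. The sample string is written with Unicode escapes so that the source file compiles under any platform encoding, and the note element name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class UnicodeRoundTrip {

    // Writes text into an XML document as UTF-8 bytes, then parses those bytes back.
    static String roundTrip(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element note = doc.createElement("note");
        note.setTextContent(text);
        doc.appendChild(note);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(bytes));

        // The XML declaration tells the receiving parser which encoding was used.
        Document received = builder.parse(new ByteArrayInputStream(bytes.toByteArray()));
        return received.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // German, Japanese, and Greek text, written with Unicode escapes.
        String text = "Gr\u00fc\u00dfe, \u6771\u4eac, \u0395\u03bb\u03bb\u03ac\u03b4\u03b1";
        System.out.println(roundTrip(text).equals(text)); // true
    }
}
```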
Electronic Data Interchange (EDI)
EDI is a special category of data
exchange that nearly always uses a VAN (Value-Added Network) as the
transmission medium. It relies on either the X12 or EDIFACT standards to
describe the documents that are being exchanged. Currently, EDI is a
very expensive environment to install and possibly requires
customization depending upon the terms established by the exchanging
parties. For this reason, there are a number of enterprises and
independent groups examining the XML language as a possible format for
X12 and EDIFACT documents, although no decisions have been reached.
However, one area where XML can provide
immediate value is in establishing a vocabulary and format for the
definition of EDI documents. This is especially useful when a trading
partner has extended the base X12 or EDIFACT documents for its own internal
use. Using XML, trading partners could communicate the schema
of their EDI documents. Longer term, this information may
become an automated part of the exchange process, thus simplifying
implementation and reducing its costs.
Enterprise Application Integration (EAI)
Enterprise Application Integration (EAI)
is best described as making one or more disparate applications act as
one single application. This is a complex task that requires data
to be replicated and distributed to the right systems at the right time.
For example, when integrating accounting and sales systems, it may be
necessary for the sales system to send sales orders to the accounting
system to generate invoices. Furthermore, the accounting system must
send invoice data into the sales system to update the sales
representatives. If done correctly, a single sales transaction will
generate the sales order and the invoice automatically, thus eliminating
the potentially erroneous manual re-entry of data.
An enterprise can accomplish EAI by using
many methods, some of which will be made easier by using XML markup. For
example, when integrating applications using messaging, the
communicating applications must agree on the message formats. Since
there is little chance that two disparate applications share
similar data structures, an interim format capable of handling
semi-structured data is needed. XML makes it easy to represent
such semi-structured data in EAI.
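The following sketch illustrates XML as the interim format, assuming Java 8 or later. A hypothetical sales system writes an order as an XML message, and a hypothetical accounting system totals it, without either side sharing the other's Java types. The SALES_ORDER and LINE element names, the sku attribute, and the amounts are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OrderMessage {

    // Sales side: turns an order (SKU -> line amount) into the agreed XML message.
    static String toMessage(String orderId, Map<String, BigDecimal> lines) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element order = doc.createElement("SALES_ORDER");
        order.setAttribute("id", orderId);
        doc.appendChild(order);
        for (Map.Entry<String, BigDecimal> line : lines.entrySet()) {
            Element item = doc.createElement("LINE");
            item.setAttribute("sku", line.getKey());
            item.setTextContent(line.getValue().toPlainString());
            order.appendChild(item);
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Accounting side: reads the same message using only the agreed element names.
    static BigDecimal invoiceTotal(String message) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(message)));
        BigDecimal total = BigDecimal.ZERO;
        NodeList lines = doc.getElementsByTagName("LINE");
        for (int i = 0; i < lines.getLength(); i++) {
            total = total.add(new BigDecimal(lines.item(i).getTextContent().trim()));
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Map<String, BigDecimal> lines = new LinkedHashMap<>();
        lines.put("W-100", new BigDecimal("19.95"));
        lines.put("G-200", new BigDecimal("7.50"));
        String message = toMessage("SO-42", lines);
        System.out.println(message);
        System.out.println(invoiceTotal(message)); // 27.45
    }
}
```

The only contract between the two sides is the message format. Either application could therefore be replaced by one written in another language, as long as it reads and writes the same elements.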
Another form of integration uses shared
data mediums, such as a database or memory. Business data is aggregated
from multiple data sources, such as legacy applications and databases,
and presented as a semi-structured document to other applications. In
the case of the accounting and sales systems integration, the aggregated
data set would contain all the data necessary to represent a complete
sales transaction to any other system. This document would then be
stored in the shared data medium and accessed as needed by the sales or
accounting system.
Because the Java platform supports
connectivity to a diverse set of middleware services, such as databases,
transaction processing monitors, asynchronous messaging systems, and
object request brokers, it makes an excellent tool for developing EAI
applications. The Java enterprise application programming interfaces
(APIs), which include the Enterprise JavaBeans™ architecture,
Java Interface Definition Language, Java Database Connectivity (JDBC™),
Java Message Service (JMS™), Java Naming and Directory
Interface™ (JNDI), and Java Transaction Service (JTS) APIs,
let developers access many of the tools used for integration with
non-Java technology environments. XML lets developers represent Java
object data as it travels in and out of the Java virtual machine and
across non-Java technology-based middleware.
Publishing
The XML language retains much of SGML's
capabilities and is as useful to publishing as is its parent. Indeed,
many of the initiatives surrounding the XML specification within the W3C
focus on publishing objectives. XML can be used to provide print
publishing with ways of organizing content for maximum reusability. For
example, it can be used to represent the shared and model-specific
sections of an automotive user manual. This provides maximum reusability
of content and streamlines production practices.
In addition to simplifying content
organization, XML technology also simplifies generation to multiple
output mediums. For example, the Extensible Stylesheet Language (XSL), an
application of XML that is used to describe how to transform a
document, can be used to generate output from a single XML document to a
myriad of devices, such as printers, plotters, and printing presses.
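Java programs can run XSL transformations through the javax.xml.transform API, which in current JDKs processes XSLT 1.0 stylesheets. The following sketch assumes Java 8 or later and that built-in processor. It uses a hypothetical stylesheet that turns the items of an invoice into an HTML-style list. It covers only the transformation part of XSL, not its formatting vocabulary.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InvoiceToHtml {

    // Hypothetical XSLT 1.0 stylesheet: one list entry per invoice item.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" indent=\"no\"/>"
        + "<xsl:template match=\"/INVOICE\">"
        + "<ul><xsl:for-each select=\"item\">"
        + "<li><xsl:value-of select=\"name\"/>: <xsl:value-of select=\"price\"/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an invoice document and returns the generated markup.
    static String render(String invoiceXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(invoiceXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String invoice = "<INVOICE>"
                + "<item><name>Widget</name><price currency=\"USD\">19.95</price></item>"
                + "<item><name>Gadget</name><price currency=\"EUR\">7.50</price></item>"
                + "</INVOICE>";
        System.out.println(render(invoice));
    }
}
```

With this approach, producing output for another device means supplying a different stylesheet while the source document stays unchanged.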
Besides supporting traditional printing
objectives, XML technology can also be used for newer electronic
publishing tasks. That is, XML can be used to "mark up"
images, video streams, audio streams, and other assorted binary data
objects. This provides a way to index, search, and manipulate streams
within applications.
However, there is more to publishing than
just organizing content and reproduction. Workflow and storage are two
key aspects of a robust publishing environment that require specific
business logic to implement. This is where Java technology comes
into play. The Java platform's connectivity to the network and storage
environments makes it a perfect platform on which to implement
publishing systems. With Java technology, it is possible to build a
robust shared publishing environment that can handle authoring,
processing, distribution, workflow, and storage.
Software Development
Three key areas of software development
that XML has impacted are the sharing of application architectures, the
building of declarative environments, and scripting facilities. Each of
these will be discussed further in this section.
In February 1999, the OMG (Object
Management Group) publicly stated its intention to adopt the XMI (XML
Metadata Interchange) specification. XMI is an XML-based vocabulary that
describes application architectures that are designed using the Unified
Modeling Language (UML). UML is a standard set of rules that describe
system elements and the relationships between them. With the adoption of
XMI, it becomes possible to share a single UML model across a
large-scale development team that is using a diverse set of application
development tools. This level of communication over a single design
makes large-scale development teams much more productive. Also, because
the model is represented in XML, it can easily be centralized in a
repository, which makes it easier to maintain and change the model as
well as provide overall version control.
XMI illustrates how XML simplifies the
software development process, but it also can simplify design of overall
systems. Since XML content is embodied in a document that must be parsed
to provide value, it follows that an XML technology-based
application will be a declarative application. A declarative application
decides for itself what a document means. In contrast, an imperative
application makes assumptions about the document it is processing
based on predefined logic. The Java compiler is imperative because it
expects any file it reads to be a Java source file. A declarative
environment would first parse the file, examine it, and make a decision
about the type of document it is. Then, based on this information the
declarative application would take a course of action.
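The parse, examine, and act sequence described above takes only a few lines of Java. In this sketch, which assumes Java 9 or later, the program parses the document first and then chooses a handler based on the name of the root element. The handler table, the element names, and the routing messages are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DeclarativeDispatch {

    // Hypothetical handlers, keyed by the root element each one understands.
    static final Map<String, Function<Element, String>> HANDLERS = Map.of(
        "INVOICE", root -> "route to accounting: "
                + root.getElementsByTagName("item").getLength() + " item(s)",
        "SALES_ORDER", root -> "route to fulfillment: order " + root.getAttribute("id"));

    // Parse first, then let the document itself determine which action applies.
    static String handle(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Function<Element, String> handler = HANDLERS.get(root.getTagName());
        return handler != null ? handler.apply(root) : "unrecognized document: " + root.getTagName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handle("<INVOICE><item/><item/></INVOICE>")); // route to accounting: 2 item(s)
        System.out.println(handle("<SALES_ORDER id=\"SO-42\"/>"));       // route to fulfillment: order SO-42
        System.out.println(handle("<memo/>"));                           // unrecognized document: memo
    }
}
```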
The concept of declarative environments
is extremely popular right now, especially when it comes to business
rules processing. These applications allow developers to declare a set
of rules that then get submitted to a rules engine, which will match
behavior (actions) to rules for each piece of data it examines. XML
technology can also provide developers the ability to develop and
process their own action (scripting) languages. The XML language is a
meta-language so it can be used to create any other language, including
a scripting language. This is a powerful use of XML technology that the
industry is just starting to explore.
Sun, XML Technology, and the Java Platform
Sun Microsystems has had a long
association with the XML specification. Sun is now extending its
support to ensure that the enterprise has access to XML technology from
within the Java platform. This section outlines Sun's vision of XML
technology as it relates to the Java platform. It also
illustrates how XML and Java technologies will work together to provide
a portable code/portable data environment.
Since early 1998, early adopters of the
XML specification have been using Java technology to parse XML and build
XML applications for a variety of reasons. Java technology's portability
provides developers with an open and accessible market for sharing their
work and XML data portability provides the means to build declarative,
reusable application components.
Development efforts within the XML
community clearly illustrate this benefit. In contrast to many other
technology communities, those building on XML technology always have
been driven by the need to remain open and facilitate sharing. Java
technology has enabled these communities to share markup languages as
well as code to process markup languages across most major hardware and
operating system platforms.
Sun Microsystems' vision for XML and Java
technologies is to provide a platform that embodies portable data and
portable, maintainable code to produce platform-independent,
standards-based applications. Sun Microsystems clearly recognizes that
the enterprise IT community has accepted XML technology because of its
ability to simply represent semi-structured data. And the availability
of Java virtual machines on every major hardware platform and operating
system means that users will have the ability to process that
semi-structured data anywhere in the enterprise.
Java Platform Standard Extension for XML Technology
To further help users leverage the power
of XML and the Java platform, Sun Microsystems is working through the
Java Community Process to develop a Java platform standard extension for
the XML specification. The Java Community Process is a formal process
for developing Java specifications, including standard extensions to
the Java platform. It uses an inclusive, consensus-building approach to
rapidly produce a high-quality specification, and it delivers not only
the specification but also a reference implementation and its associated
suite of compatibility tests.
The Java Platform Standard Extension for
XML technology proposes to provide basic XML functionality to read,
manipulate, and generate text. This functionality will conform to the
XML 1.0 specification and will leverage existing efforts around Java
technology APIs for XML technology, including the W3C Document Object
Model (DOM) Level 1 Core Recommendation and the SAX (Simple API for XML)
programming interface version 1.0.
The intent of supporting an XML technology
standard extension is to:
- Ensure that it is easy for Java developers to use XML and for XML
developers to use Java technologies
- Provide a base from which to add XML
features in the future
- Provide a standard for the Java
platform to ensure compatible and consistent implementations
- Ensure a high-quality integration with
the Java platform
The Java Community Process gives Java
technology users the opportunity to participate in the active growth of
the Java platform. These extensions eventually will become supported
standards within the Java platform, thus providing consistency for
applications written in the Java programming language going forward. The
Java Platform Standard Extension for XML technology will offer companies
a standard way to create and process XML documents within the Java
platform.
XML Technology Makes Sense For the Java Platform
To reiterate, data formatted using XML technology is interoperable
across heterogeneous systems. To meet the objectives of the enterprise,
the Java platform must also interoperate with existing applications and
systems. The XML language provides a data-centric method of moving data
between Java technology and non-Java technology platforms. While CORBA
provides interoperability in a process-centric manner, it is not always
possible to use CORBA connectivity. In these cases, the XML
language does an excellent job of representing the state of Java objects
as they leave and re-enter the Java virtual machine.
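With only the JDK, one way to represent a Java object's state as XML is the java.beans.XMLEncoder and XMLDecoder pair. Both are later additions to the platform and are shown here as one possible mechanism, not the one this paper describes. The sketch assumes Java 10 or later (for ByteArrayOutputStream.toString(Charset)), and the Customer bean is hypothetical.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BeanState {

    // A hypothetical JavaBean: public no-argument constructor plus getters and setters.
    public static class Customer {
        private String name;
        private int accountNumber;

        public Customer() { }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAccountNumber() { return accountNumber; }
        public void setAccountNumber(int accountNumber) { this.accountNumber = accountNumber; }
    }

    // Writes the bean's properties as XML text as the object leaves the JVM.
    static String toXml(Customer customer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(customer);
        }
        return out.toString(StandardCharsets.UTF_8);
    }

    // Rebuilds an equivalent bean from that XML as it re-enters the JVM.
    static Customer fromXml(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(bytes))) {
            return (Customer) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer();
        customer.setName("Acme");
        customer.setAccountNumber(1001);
        String xml = toXml(customer);
        System.out.println(xml);
        Customer copy = fromXml(xml);
        System.out.println(copy.getName() + " " + copy.getAccountNumber()); // Acme 1001
    }
}
```

XMLDecoder instantiates whatever classes the XML names, so this technique is only appropriate for input from trusted sources.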
XML will also be used to define
deployment descriptors for the Enterprise JavaBeans™ (EJB)
architecture. A deployment descriptor describes
the rules for packaging and deploying an Enterprise JavaBeans component.
According to Sun Microsystems, the next release of EJB will use XML
technology for deployment descriptors, providing, once again, data
interoperability for the Java platform. XML will also be used as a
standard for transmitting mission-critical business data inside the
Java 2 platform.
Conclusions
XML technology holds much promise for the
future. It is an industry-wide recognized language for building
representations of semi-structured data that can be shared within and
between enterprises. However, XML lets companies describe only the data
and its structure. Additional processing logic must be applied to
validate documents, transport them to interested parties, and transform
the data into a form more useful to everyday business systems.
The Java platform offers the enterprise a
method for building libraries once to handle the creation, maintenance,
distribution, and processing of XML documents and to leverage that
process across a multitude of hardware and operating system platforms.
Sun Microsystems' initiative to extend the Java platform to incorporate
standard processing of XML technology will give companies the security
they need to build these libraries without fear of rapidly changing
programming interfaces and functionality.
XML and Java technologies are clearly the
two most important developments for Internet computing since the advent
of the original ARPAnet project, the basis for today's Internet. Whether
used together or separately, these two standards empower the enterprise
to forge new electronic partnerships that leverage the availability and
ubiquity of the Internet to exchange and share data.
Appendix A: Resources
http://www.w3.org/TR/1998/REC-xml-19980210 - The Extensible Markup Language 1.0.
http://java.sun.com/xml - Provides news, information, and links to other resources.
http://java.sun.com/aboutJava/communityprocess - Provides a preview of technologies and is home to the Java Community Process℠ program.
http://www.jxml.com - JXML Inc.'s home page and home of XML and Java technologies initiatives. Also home of the Java-XML Mailing List.
Appendix B: About NC.Focus
NC.Focus was founded in 1996 and has
become a leading technology research and advisory firm specializing in
enterprise application integration (EAI) techniques, tools, and
technologies. We provide the unique service of identifying the impact
and benefits of enterprise application integration trends and
technologies on business, combined with an in-depth explanation of
how the tools and technologies operate. As a result, we empower our
clients to better identify and choose technologies based upon specific
organizational needs and requirements.
For additional information, or to
subscribe to the NC.Focus services, please contact JP Morgenthal,
president and Director of Research for NC.Focus, at (516) 792-0997 or jp@ncfocus.com.
