Unpublished.
Jeffrey C. Lockledge
Department of Industrial and Manufacturing Engineering, Wayne State University, USA
Filippo A. Salustri, P.Eng.
Industrial and Manufacturing Systems Engineering, The University of Windsor, Canada
Until recently, efforts in this area focused on using relatively low-level transfer protocols (e.g. FTP and NFS) to share data. Various initiatives such as CALS [1] and NIIIP [2] have sought to synthesize information transfer systems that specialize these more generic forms to meet the requirements of engineering.
Now, however, the focus of attention is moving away from these initiatives and towards the use of newer tools (particularly Java and CORBA). The latest entry into the list of standards for internet-based information transfer is the Extensible Markup Language (XML) [3], which allows the specification of specialized document structure. The authors believe that XML can form the foundation of a powerful new approach to specifying engineering design knowledge in a way that is inherently internet-enabled. However, due to the intentional generality of XML and the specific requirements of engineering design, significant work must be done to specify how XML can be used in this domain. This paper sets forth an overview of the kind of system the authors are constructing, explaining its motivation and justification, and introducing the major features expected to be present in the completed system. We call the evolving system the Engineering Ontology Markup Language (EOML).
The rest of this paper is organized as follows. A brief overview of XML and its components is presented, along with a basic rationale for its use to represent design knowledge. The next section presents the authors' ideas of how XML can be used in this domain, and lays out the foundations of the structures needed to support it within EOML. A discussion of future work is then presented, with emphasis on defining a development path to get from this project's current state to a useful, "industrial strength" application for use by engineering practitioners. Finally, some conclusions of the authors' current work and experiences in this area are given.
XML is somewhat misnamed: it is not a markup language itself, but rather a language for developing markup languages, that is, a meta-markup language. HTML is one markup language that can be defined in XML; other such languages tuned to specific application domains can also be developed, and any XML-compliant system should be able to read and parse a document written in one of them, no matter what the specific markup language is.
XML is not a single standard; rather it consists of three components (so far). XML itself defines the syntax and grammar of a document's structure. Tags are used to set off markup entities from the actual content of the document. A Document Type Definition (DTD) defines what tags can appear in a document, how those tags can be nested, and what attributes each tag can have.
XML makes no commitments about the presentation of documents; XSL, the Extensible Style Language [4], handles this. Because content specification in XML is separated from presentation specification in XSL, XML-compliant applications that use document content without necessarily presenting it (e.g. a case-based reasoning engine) can be built without the overhead associated with handling presentation matters.
Finally, linking documents together will eventually be handled by XLL, the Extensible Linking Language [5]. In the current specification of HTML, linking is done only through the <a> tag. XLL will allow far richer functionality, including multi-directional links, linking to kinds of entities other than the usual URLs (Uniform Resource Locators), different kinds of links (e.g. table of contents, section headings, indices, etc.), and alterable behavior (the action that occurs when a link is activated).
Clearly, XML is strategically placed to provide a mechanism for transferring structured design information over the internet. If this were possible, then various Web-based search/query systems could be developed to provide access to that information across a broad spectrum of user communities, thus helping to integrate otherwise disparate segments of industry. This kind of integration is one of the fundamental advantages that an XML-based approach to design knowledge specification would achieve.
The other major benefit is that this approach leverages all of the existing (and soon to exist) technology of Web-based information transfer. The infrastructure (the internet) is already in place; many fundamental tools, such as browsers, query systems, etc. either exist or are currently under development. There is little doubt in the internet community that XML will become the critical tool for Web-based communications. A design knowledge specification standard such as EOML will take advantage of all this expertise and technology, thus lowering the cost and risk associated with moving to this new technology.
However, the most fundamental question is: what is the nature of the structure that EOML would have to exhibit to be useful in engineering applications? Without a sufficient answer to this question, little else can happen.
Let us consider a very simple example of representing information about automobiles. It might be represented in EOML as follows:
    <full-size-car>
      <name>Taurus</name>
      <Color>
        <RGBColor> <Red>255</Red> <Green>0</Green> <Blue>0</Blue> </RGBColor>
      </Color>
      <InteriorColor>
        <RGBColor> <Red>0</Red> <Green>0</Green> <Blue>0</Blue> </RGBColor>
      </InteriorColor>
      <engine><number-of-cylinders>6</number-of-cylinders></engine>
      <transmission><speeds>4</speeds></transmission>
    </full-size-car>
The structure in the above example is provided by XML tags such as <full-size-car>. But these tags are specific to a particular class of engineered product and are not found in conventional XML documents (such as HTML). Given some description such as that above, how does an application know that a Color is a meaningful element of a FULL-SIZE-CAR? In order to support this kind of structure, there must be some way of specifying both what special tags can exist in a document to describe the structure and what those tags mean. In other words, a mechanism must exist for capturing ontologies of design knowledge.
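To make the question concrete, the following is a minimal sketch (not part of the EOML system) of what a generic XML parser recovers from the automobile description. It assumes Python's standard xml.etree.ElementTree module and an instance in which every closing tag matches its opening tag exactly; the helper names child_tags and rgb are hypothetical.

```python
import xml.etree.ElementTree as ET

# The automobile instance, with closing tags spelled exactly like their
# opening tags (XML names are case-sensitive).
CAR_XML = """
<full-size-car>
  <name>Taurus</name>
  <Color><RGBColor><Red>255</Red><Green>0</Green><Blue>0</Blue></RGBColor></Color>
  <InteriorColor><RGBColor><Red>0</Red><Green>0</Green><Blue>0</Blue></RGBColor></InteriorColor>
  <engine><number-of-cylinders>6</number-of-cylinders></engine>
  <transmission><speeds>4</speeds></transmission>
</full-size-car>
"""


def child_tags(xml_text):
    """Tag names of the root element's direct children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text.strip())]


def rgb(xml_text, color_tag):
    """Read the (Red, Green, Blue) triple nested under the given colour element."""
    node = ET.fromstring(xml_text.strip()).find(color_tag + "/RGBColor")
    return tuple(int(node.findtext(channel)) for channel in ("Red", "Green", "Blue"))
```

Calling child_tags(CAR_XML) yields only the nesting structure, and the parser would accept an invented element such as <wingspan> inside <full-size-car> just as readily. Nothing at this level says which elements are meaningful for a car, which is the gap an ontology is meant to fill.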
An ontology is a formal structure that provides a deep categorization of knowledge so that it can be reasoned with at various levels of abstraction; in essence, it provides the means to associate a semantics with a set of terms that denote categories of entities of interest. EOML will allow the construction of ontologies for knowledge about engineered products. But in order to define the system, we must have an understanding of the kinds of XML entities the system will manipulate in order to represent the ontologies.
Developing the logical framework for an ontology building system such as that proposed herein goes beyond the scope of this paper and, in any event, it is a matter currently under development by the authors. The emphasis here is on the question of whether XML is able to represent these ontologies; the exact form of the ontological constructs does not matter here. However, insofar as the XML-based system described in the next section takes advantage of some of the authors' work in ontologies for engineering [6], some discussion of that work is relevant here for expository purposes.
In the authors' work, an entity can be either an object or a class. Object and class are taken to mean roughly what they mean in a typical object-oriented framework, except that we consider them to be disjoint types. Entities are named, and those names are used to identify entities. However, the same name may refer to different entities under different circumstances. The authors therefore use the notion of context to group terms (name/entity pairs) into collections. A network of terms forms a unit of knowledge. Within a context, a given name represents only one entity, but the same name may be used to identify different entities in different contexts. Contexts may be (and usually are) nested; the "outermost" contexts contain terms that are commonly understood, whereas the "innermost" contexts contain terms specific to particular applications, agents, problems, tasks, etc. Substantial effort is currently being put into formalizing the notion of context [7]. The importance of context has also been recognized by the WWW Consortium, which is currently investigating the notion of namespaces (essentially the same as contexts) [8].
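The behavior of nested contexts can be illustrated with a small sketch. This is not the authors' formalization: the Context class, the innermost-first lookup rule, and the example names ("common", "paint-shop", "PaintCode") are all assumptions made for illustration.

```python
class Context:
    """A named collection of terms (name -> entity), optionally nested in a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # enclosing ("outer") context, or None
        self.terms = {}           # name -> entity, unique within this context

    def define(self, name, entity):
        # Within one context a given name represents only one entity.
        if name in self.terms and self.terms[name] != entity:
            raise ValueError(f"{name!r} already denotes another entity in {self.name!r}")
        self.terms[name] = entity

    def resolve(self, name):
        # Assumed lookup rule: search the innermost context first, then outward.
        ctx = self
        while ctx is not None:
            if name in ctx.terms:
                return ctx.terms[name]
            ctx = ctx.parent
        raise KeyError(name)


# Hypothetical contexts and entities, for illustration only.
common = Context("common")
common.define("color", "RGBColor")
common.define("car", "Car")
paint_shop = Context("paint-shop", parent=common)
paint_shop.define("color", "PaintCode")
```

Here paint_shop.resolve("color") finds the shop-specific entity, while paint_shop.resolve("car") falls through to the commonly understood term in the outer context.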
Entities can have attributes of three kinds: properties, components, and abstractions. A property is an inherent characteristic whose value cannot be derived from any one component or abstraction of the entity; for example, mass, shape, and size are properties. A component is an entity that may appear as an attribute value of another entity; this kind of relation is called a part/whole relation, and the study of these kinds of relations, called mereology, is an emerging field in the area of knowledge representation [9]. It seems obvious that mereology will play a very important role in the development of ontologies for engineering. There are many different kinds of part/whole relations, and there is currently no clear mechanism to categorize and formalize them; however, the matter is being vigorously pursued by researchers. Finally, abstractions are those attributes used to relate objects to classes, and subclasses to superclasses.
Given the ontological entities described very briefly above, a mapping to the kinds of constructs available in XML is now needed. The major constituents of an XML document are elements, which demarcate regions of a document that are to be treated, in some way, as a unit. Elements are delimited by tags that identify the kind of element and any special attributes that the element may have. Since XML deals only with syntactic constructs, it distinguishes between content that is parseable and content that is not. Content that is not parseable is delivered unmodified through the parser to an underlying application (such as a browser). This allows content that is present strictly for semantic purposes to be passed through the syntactic XML component.
XML-based knowledge representations must clearly distinguish between the syntactic and the semantic components of an ontology. XML can be used to establish if an ontology is syntactically well-formed, whereas the underlying application must be responsible for the semantic analysis of the ontology.
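The division of labor can be sketched as two separate checks. The example below is illustrative only: it assumes Python's standard xml.etree.ElementTree parser for the syntactic side, and the semantic rule (a colour channel must be an integer from 0 to 255) is a hypothetical stand-in for whatever the underlying application would actually enforce.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Syntactic check: does the text parse as XML at all?"""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def channel_is_meaningful(xml_text):
    """Semantic check, left to the application: is the content an integer in 0..255?

    The 0..255 range is an assumption made for this sketch.
    """
    text = (ET.fromstring(xml_text).text or "").strip()
    return text.isascii() and text.isdigit() and 0 <= int(text) <= 255
```

A fragment such as <Red>255</red> fails the first check because the tag names differ in case, whereas <Red>bright</Red> passes it and is rejected only by the application-level check.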
The authors expect that translators between contexts can be developed; however, this is outside the concern of this paper.
The second section of the DTD describes the structure of the ontology. The ontology has three main components: class definitions, an abstraction mechanism, and a collection mechanism. This allows the author to create an ontology with their own classes of objects and the knowledge structure which is appropriate to their application.
    <!ELEMENT context (ontology | context)+ >
    <!-- A single context may have multiple ontologies or sub-contexts but must have at least one. -->
    <!ATTLIST context name ID #REQUIRED>
    <!-- Contexts require a name. -->
    <!ELEMENT ontology (classdef | abstraction-of | contains)+ >
    <!ELEMENT abstraction-of (classref , classref+)>
    <!ATTLIST abstraction-of
    <!ELEMENT contains (classref , classref+)>
    <!ELEMENT classdef (attribute)*>
    <!ELEMENT attribute (name , type)>
    <!ELEMENT name (#PCDATA) >
    <!ELEMENT type (classref | primitive) >
    <!ELEMENT classref EMPTY >
    <!ELEMENT primitive (#PCDATA) >
A class definition is made up of zero or more attributes, which allows for the possibility of an object which has no attributes. This would be required in the situation mentioned earlier in which individuals (e.g. those in the engine repair shop) recognize that an object (e.g. the hood) exists but is of little intrinsic interest because it is outside the scope of their work.
The term used for abstraction, abstraction-of, is different from the term typically used in object-oriented programming, is-a. EOML's abstraction mechanism is more specialized than that of traditional object-oriented languages (OOLs). This is one of the things that separates an ontology from an OOL. From a purely programmatic standpoint, it is irrelevant why a set of concepts can be abstracted by another. From the standpoint of knowledge representation, and the reasoning that may be done from it, the differentiation is significant. For example, a specialization of one object from another means that there is a well-known taxonomic structure at work which is recognized among many contexts. A similarity, on the other hand, may be completely coincidental, and/or purely a function of the current context.
The collection mechanism, which is referenced by the contains clause, allows the author to indicate when a class has, as part of its identity, a set of other classes. This would permit, for example, a user to have a car that has four wheels and potentially a radio. In product engineering it is particularly important to have a mechanism to include such optional features.
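The content models discussed above can be checked mechanically. The sketch below is a hand-written structural check, not a DTD validator; Python's standard xml.etree.ElementTree does not validate against DTDs, so the rules are transcribed by hand from the element declarations. It ignores XML attributes such as name and kind, and the function names content_ok and structure_ok are hypothetical.

```python
import xml.etree.ElementTree as ET


def content_ok(elem):
    """Check one element's children against the content model declared for its tag."""
    kids = [child.tag for child in elem]
    if elem.tag == "context":          # (ontology | context)+
        return len(kids) >= 1 and all(k in ("ontology", "context") for k in kids)
    if elem.tag == "ontology":         # (classdef | abstraction-of | contains)+
        return len(kids) >= 1 and all(
            k in ("classdef", "abstraction-of", "contains") for k in kids)
    if elem.tag in ("abstraction-of", "contains"):   # (classref , classref+)
        return len(kids) >= 2 and all(k == "classref" for k in kids)
    if elem.tag == "classdef":         # (attribute)*
        return all(k == "attribute" for k in kids)
    if elem.tag == "attribute":        # (name , type)
        return kids == ["name", "type"]
    if elem.tag == "type":             # (classref | primitive)
        return kids in (["classref"], ["primitive"])
    if elem.tag in ("classref", "name", "primitive"):  # no child elements
        return kids == []
    return False                       # tag not declared in the DTD


def structure_ok(root):
    """True when every element in the tree satisfies its content model."""
    return all(content_ok(elem) for elem in root.iter())
```

Under these rules a contains element with a single classref is rejected, since the model requires a containing class followed by at least one contained class, while a classdef with no attributes is accepted.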
The following (Figure 3) is an example ontology definition. It is created using the DTD given above and describes the relationship between a car, specializations of that car, and a subset of its components. The reader should realize that a real ontology would be much larger and this is only given as an example. The ontology defined in Figure 3 could be used to generate the example given in Figure 1.
    <context name="The-Large-Car-Company">
      <ontology name="Car-Structure">
        <classdef name="Economy">
          <attribute><name>Color</name> <type><classref name="RGBColor" /></type></attribute>
        </classdef>
        <classdef name="Full-Size-Car">
          <attribute><name>Color</name> <type><classref name="RGBColor" /></type></attribute>
          <attribute><name>InteriorColor</name> <type><classref name="RGBColor" /></type></attribute>
        </classdef>
        <classdef name="Car">
          <attribute><name>Name</name> <type><primitive>string</primitive></type></attribute>
        </classdef>
        <abstraction-of kind="Specialization">
          <classref name="Car" />
          <classref name="Full-Size-Car" />
          <classref name="Economy" />
        </abstraction-of>
        <contains>
          <classref name="Car" />
          <classref name="Engine" />
          <classref name="Transmission" />
        </contains>
        <classdef name="Engine">
          <attribute><name>Number-of-Cylinders</name> <type><primitive>integer</primitive></type></attribute>
        </classdef>
        <classdef name="Transmission">
          <attribute><name>Speeds</name> <type><primitive>integer</primitive></type></attribute>
        </classdef>
        <classdef name="RGBColor">
          <attribute><name>Red</name> <type><primitive>integer</primitive></type></attribute>
          <attribute><name>Green</name> <type><primitive>integer</primitive></type></attribute>
          <attribute><name>Blue</name> <type><primitive>integer</primitive></type></attribute>
        </classdef>
      </ontology>
    </context>
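As a rough illustration of how an application might use such an ontology, the sketch below loads an abbreviated version of it and lists what a class is expected to carry. Several points are assumptions rather than part of EOML as described: that the first classref in abstraction-of names the general class and the rest its specializations, that the first classref in contains names the whole and the rest its components, that specializations inherit the attributes and components of the general class, and that instance tags are matched to ontology names without regard to case. The function names load, members, and unexpected are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the example ontology (Economy and the leaf classes omitted).
ONTOLOGY = """
<context name="The-Large-Car-Company">
  <ontology name="Car-Structure">
    <classdef name="Car">
      <attribute><name>Name</name><type><primitive>string</primitive></type></attribute>
    </classdef>
    <classdef name="Full-Size-Car">
      <attribute><name>Color</name><type><classref name="RGBColor"/></type></attribute>
      <attribute><name>InteriorColor</name><type><classref name="RGBColor"/></type></attribute>
    </classdef>
    <abstraction-of kind="Specialization">
      <classref name="Car"/><classref name="Full-Size-Car"/>
    </abstraction-of>
    <contains>
      <classref name="Car"/><classref name="Engine"/><classref name="Transmission"/>
    </contains>
  </ontology>
</context>
"""


def load(xml_text):
    """Return (attrs, parent, parts) dictionaries extracted from an ontology."""
    root = ET.fromstring(xml_text.strip())
    attrs, parent, parts = {}, {}, {}
    for classdef in root.iter("classdef"):
        attrs[classdef.get("name")] = [
            a.findtext("name") for a in classdef.findall("attribute")]
    for abstraction in root.iter("abstraction-of"):
        # Assumption: first classref is the general class, the rest specialize it.
        general, *specials = [r.get("name") for r in abstraction.findall("classref")]
        for special in specials:
            parent[special] = general
    for contains in root.iter("contains"):
        # Assumption: first classref is the whole, the rest are its components.
        whole, *components = [r.get("name") for r in contains.findall("classref")]
        parts[whole] = components
    return attrs, parent, parts


def members(cls, attrs, parent, parts):
    """Attribute and component names of cls, including those of its abstractions."""
    names = []
    while cls is not None:
        names = attrs.get(cls, []) + parts.get(cls, []) + names
        cls = parent.get(cls)
    return names


def unexpected(instance_xml, cls, attrs, parent, parts):
    """Child tags of an instance that the ontology does not account for."""
    allowed = {name.lower() for name in members(cls, attrs, parent, parts)}
    return [c.tag for c in ET.fromstring(instance_xml) if c.tag.lower() not in allowed]
```

With these assumptions, members("Full-Size-Car", ...) combines the Name attribute and the Engine and Transmission components of Car with the two colour attributes declared on Full-Size-Car itself, and unexpected(...) reports any instance element, such as an invented <wings>, that the ontology does not mention.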
The XML-Data effort [10] provides the means to define the characteristics of classes of objects, including those that define concepts and relations between concepts. As such, there is a potential for this initiative to be useful in developing ontologies. However, it is unclear to the current authors at this time whether XML-Data will support a rich enough environment to capture the specific kinds of knowledge necessary in engineering applications. In any event, should XML-Data be sufficient, a significant amount of work is still needed to provide mechanisms for developing, validating, and using its schemas (mechanically similar to ontologies of the current authors).
The Chemical Markup Language (CML) is one of many specific applications (others include Bioinformatic Sequence Markup Language, Wireless Markup Language, and Mathematical Markup Language) that must define specialized XML DTDs to support their application domains. But these applications work within fairly restricted domains well grounded in physical reality. In contrast, a virtually unlimited number of ontologies can be constructed to describe engineered products. Thus these efforts, though tremendously important, are insufficient for the authors' purpose.
Ultimately, the authors envision a collection of tools that will allow: