Next: Formalising Domain Ontologies with Up: Knowledge Mediation in the Previous: Mediation by Viewpoints

Graphs for Structured Web Knowledge Bases

The representation of the various domain-specific Web knowledge resources as XML documents, DTDs, local and shared ontologies in a form suitable for knowledge-based reasoning requires transformations on different levels. In [Eus00], it has been pointed out that graphs are an adequate means to transform data and information from heterogeneous sources on different abstraction levels into re-usable knowledge. SEAMLESS exploits a convenient formal correspondence between mathematical semantic structures (DAGs) or feature graphs, syntactic term encoding and logical specifications. It uses graphterms for the encoding of feature graphs as well as object graphs, and term rewriting for the implementation of operations. Having derived canonical term representations for classes of isomorphic DAGs allows to perform matching operations in an effective way [Eus97].

Graphs capture the structure of a wide range of data and knowledge modelling means as for example entity relationship diagrams. They are perhaps the most common natural means for capturing abstract structure of composed systems and to facilitate construction tasks. A complex site in the World-Wide Web has a natural abstraction as a graph where node parameters may be bound to page locations (URL's) and links describe structural and semantic relationships between parts of the hyper-document. For instance, a software or hardware system may be modularised as a graph or configuration of ``components'' or "parts" and ``connectors'' [Gar95]. Generally, nodes represent data values, object identifiers or the domain of values and edges the relationships between them.

Most important, labelled directed graphs offer a formal set-theoretic data model for subsets of the standard XML for exchanging Web data and meta-data, and for the design of query languages. Query languages like XML-QL (cf. [DFF98]) and data management systems like Lore (cf. [GMW99]) use labelled directed graphs to model (semi)-structured data and regular path-expressions to navigate through graphs. In the remainder of this paper, a slight variant of the graph data models [DFF98], [GMW99]) shall be used for the representation of XML documents.

Using the data model in definition , concrete XML documents can be mapped rather straightforward into syntactic graphs where certain nodes of the graph are annotated with semantic values of the constituent the node represents. For example, a modem document which conforms to the DTD in Figure could be modelled by the XML-graph below.

Figure: Graph modelling of a modem

An important abstraction mechanism for building knowledge bases is building a hierarchy of classes or concepts and attributes that characterise them, based on the content of their knowledge objects. The classes are ordered, based on the subsumption of their description. Hierarchical orderings can be exploited to built more efficient retrieval and matching procedures. In this case, feature graphs will be employed to structure sets of XML documents in an uniform graph-based framework.

Linear (serial) term representations, named graphterms, were devised as data structure for the encoding of labelled DAGs and feature graphs [Eus97]. From features, constants and variables, terms are constructed, if features are seen equivalently as binary predicates that must be interpreted as functional relations. Let f be a feature symbol, X a variable. A feature graphterm is an expression f(X,Graphtermlist), where
Graphtermlist is either the empty list [], or denotes a list of feature graphterms. To ensure that graphterms represent directed, acyclic graphs, further axioms constrain the valid terms. A graphterm algebra and canonical forms for classes of isomorphic objects were constructed (cf. [Eus97]).

The first step in setting-up a Web knowledge-base consists in extracting the graph views from Web resources stored in different persistent data sources to map it onto structures on which logical and algebraic operations apply. Graph views define specific syntactic graphs from existing Web resources. XML DTDs are wrapped into feature graphterms where tags are mapped into feature symbols. Each node variable could be associated with a set of properties by an attribute list. Parsing rules from XML DTDs into feature graphs have to specify the syntax - syntax relation. XML elements and DTDs are represented using the linear recursively constructed graphterms of the format
element(AttributeName(Value),ListofSubelements) or
element(Variable,ListofSubelements), where ListofSubelements could be the empty list. This results into the encodings in Figure .

Figure: Graphterm Representation PC Modem Card and its DTD

Unifying information requests and information offers is the central operation when designing mediators. To define that two structured graph objects ``match'' it is recalled that structured representations are merely an alternative kind of predicate calculus formalism. Two objects match if and only if the predicate calculus formula associated with one of them unifies with the predicate calculus formula associated with the other. Canonical forms enable effective graph storing and matching within a logic-based framework. Various operations on graphs that enrich the built-in facilities of a logic programming language were devised and implemented. These functions can be used to query graph-based knowledge bases, to construct or transform graphs and reason with them. Especially:

Conjunctions of sequences of features (path expressions), which may include wildcards, for example,
, can be used to retrieve matching concepts.
Let O be an XML graph and be feature graphs and var(C) denote the set of variables of any expression. The graph O is an instance of the feature graph C, C subsumes O, if their exists a term substitution such that , supposing canonical term representations.

Next: Formalising Domain Ontologies with Up: Knowledge Mediation in the Previous: Mediation by Viewpoints

Andre Everts
Mon Sep 4 20:44:27 MET DST 2000