|
|
Index Data > Zebra > Zebra - User's Guide and Reference > Chapter 9. GRS-1 Record Model and Filter Modules Table of Contents NoteThe functionality of this record model has been improved and replaced by the DOM XML record model. See Chapter 7, DOM XML Record Model and Filter Module .
The record model described in this chapter applies to the fundamental,
structured
record type Many basic subtypes of the grs type are currently available:
Although input data can take any form, it is sometimes useful to describe the record processing capabilities of the system in terms of a single, canonical input format that gives access to the full spectrum of structure and flexibility in the system. In Zebra, this canonical format is an "SGML-like" syntax.
To use the canonical format specify Consider a record describing an information resource (such a record is sometimes known as a locator record ). It might contain a field describing the distributor of the information resource, which might in turn be partitioned into various fields providing details about the distributor, like this:
<Distributor>
<Name> USGS/WRD </Name>
<Organization> USGS/WRD </Organization>
<Street-Address>
U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW
</Street-Address>
<City> ALBUQUERQUE </City>
<State> NM </State>
<Zip-Code> 87102 </Zip-Code>
<Country> USA </Country>
<Telephone> (505) 766-5560 </Telephone>
</Distributor>
The keywords surrounded by <...> are
tags
, while the sections of text
in between are the
data elements
.
A data element is characterized by its location in the tree
that is made up by the nested elements.
Each element is terminated by a closing tag - beginning
with The first tag in a record describes the root node of the tree that makes up the total record. In the canonical input format, the root tag should contain the name of the schema that lends context to the elements of the record (see the section called “GRS-1 Internal Record Representation”). The following is a GILS record that contains only a single element (strictly speaking, that makes it an illegal GILS record, since the GILS profile includes several mandatory elements - Zebra does not validate the contents of a record against the Z39.50 profile, however - it merely attempts to match up elements of a local representation with the given schema):
<gils>
<title>Zen and the Art of Motorcycle Maintenance</title>
</gils>
Zebra allows you to provide individual data elements in a number of variant forms . Examples of variant forms are textual data elements which might appear in different languages, and images which may appear in different formats or layouts. The variant system in Zebra is essentially a representation of the variant mechanism of Z39.50-1995. The following is an example of a title element which occurs in two different languages.
<title>
<var lang lang "eng">
Zen and the Art of Motorcycle Maintenance</>
<var lang lang "dan">
Zen og Kunsten at Vedligeholde en Motorcykel</>
</title>
The syntax of the
variant element
is
Variant elements are terminated by the general end-tag </>, by the variant end-tag </var>, by the appearance of another variant tag with the same class and value settings, or by the appearance of another, normal tag. In other words, the end-tags for the variants used in the example above could have been omitted. Variant elements can be nested. The element
<title>
<var lang lang "eng"><var body iana "text/plain">
Zen and the Art of Motorcycle Maintenance
</title>
Associates two variant components to the variant list for the title element. Given the nesting rules described above, we could write
<title>
<var body iana "text/plain>
<var lang lang "eng">
Zen and the Art of Motorcycle Maintenance
<var lang lang "dan">
Zen og Kunsten at Vedligeholde en Motorcykel
</title>
The title element above comes in two variants. Both have the IANA body type "text/plain", but one is in English, and the other in Danish. The client, using the element selection mechanism of Z39.50, can retrieve information about the available variant forms of data elements, or it can select specific variants based on the requirements of the end-user. In order to handle general input formats, Zebra allows the operator to define filters which read individual records in their native format and produce an internal representation that the system can work with.
Input filters are ASCII files, generally with the suffix
Generally, an input filter consists of a sequence of rules, where each rule consists of a sequence of expressions, followed by an action. The expressions are evaluated against the contents of the input record, and the actions normally contribute to the generation of an internal representation of the record. An expression can be either of the following:
An action is surrounded by curly braces ({...}), and consists of a sequence of statements. Statements may be separated by newlines or semicolons (;). Within actions, the strings that matched the expressions immediately preceding the action can be referred to as $0, $1, $2, etc. The available statements are:
The following input filter reads a Usenet news file, producing a record in the WAIS schema. Note that the body of a news posting is separated from the list of headers by a blank line (or rather a sequence of two newline characters.
BEGIN { begin record wais }
/^From:/ BODY /$/ { data -element name $1 }
/^Subject:/ BODY /$/ { data -element title $1 }
/^Date:/ BODY /$/ { data -element lastModified $1 }
/\n\n/ BODY END {
begin element bodyOfDisplay
begin variant body iana "text/plain"
data -text $1
end record
}
If Zebra is compiled with support for Tcl enabled, the statements described above are supplemented with a complete scripting environment, including control structures (conditional expressions and loop constructs), and powerful string manipulation mechanisms for modifying the elements of a record. |
|||
|
|
||||
| Copyright Index Data ApS 2008 | ||||