Main Components

The Zebra system is designed to support a wide range of data management applications. The system can be configured to handle virtually any kind of structured data. Each record in the system is associated with a record schema, which lends context to the data elements of the record. Any number of record schemas can coexist in the system. Although it may be wise to use only a single schema within one database, the system imposes no such restriction. The Zebra indexer and information retrieval server consists of the following main applications: the zebraidx indexing maintenance utility, and the zebrasrv information query and retrieval server. Both use some of the same main components, which are presented here.
The virtual Debian package The core Zebra module is the heart of the zebraidx indexing maintenance utility and the zebrasrv information query and retrieval server binaries. In short, the core libraries are responsible for
The Debian package The zebraidx indexing maintenance utility loads external filter modules used for indexing data records of different types, and creates, updates and drops databases and indexes according to the rules defined in the filter modules.
The Debian package This is the executable which runs the Z39.50/SRU/SRW server and glues together the core libraries and the filter modules into one great Information Retrieval server application.
The Debian package The YAZ server frontend is a full-fledged stateful Z39.50 server that takes client connections and forwards search and scan requests to the Zebra core indexer. In addition to Z39.50 requests, the YAZ server frontend acts as an HTTP server, honoring SRU SOAP requests and SRU REST requests. Moreover, it can translate incoming CQL queries to PQF queries, if correctly configured.
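To make the SRU REST interface more concrete, the following minimal sketch builds a searchRetrieve URL carrying a CQL query. The parameter names (`version`, `operation`, `query`, `maximumRecords`) come from the SRU standard; the host, port, and database name are assumptions chosen for illustration, and the helper `build_sru_url` is hypothetical, not part of Zebra or YAZ.

```python
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Return an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,  # CQL; the server may translate it to PQF
        "maximumRecords": str(max_records),
    }
    # urlencode percent-encodes the query so it is safe inside a URL.
    return f"{base_url}?{urlencode(params)}"


# Assumed endpoint: a server on localhost:9999 with a database named "Default".
url = build_sru_url("http://localhost:9999/Default", 'title = "information retrieval"')
print(url)
```

Whether the server accepts the query depends on its CQL-to-PQF configuration, which this sketch does not cover.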
YAZ is an Open Source toolkit that allows you to develop software using the ANSI Z39.50/ISO23950 standard for information retrieval. It is packaged in the Debian packages
The hard work of knowing what to index, how to do it, and which part of the records to send in a search/retrieve response is implemented in various filter modules. It is their responsibility to define the exact indexing and record display filtering rules.
The virtual Debian package
The DOM XML filter uses a standard DOM XML structure as its internal data model, and can thus parse, index, and display any XML document. A parser for binary MARC records based on the ISO2709 library standard is provided; it transforms these records to the internal MARCXML DOM representation. The internal DOM XML representation can be fed into four different pipelines, consisting of arbitrarily many successive XSLT transformations; these are for
The DOM XML filter pipelines use XSLT (and, if supported on your platform, even EXSLT), which brings full XPATH support to the indexing, storage and display rules of not only XML documents, but also binary MARC records. Finally, the DOM XML filter allows for static ranking at index time, and for sorting hit lists according to predefined static ranks. Details on the experimental DOM XML filter are found in Chapter 7, DOM XML Record Model and Filter Module.
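As an illustration of what XPATH addressing buys for indexing rules, the sketch below selects title and author values from a MARCXML record using the limited XPath subset in Python's standard library. It is not the DOM XML filter's configuration syntax: the record content, the `INDEX_RULES` mapping, and the function `extract_index_terms` are made up for this example. Only the MARCXML namespace and the `datafield`/`subfield` element names are standard.

```python
import xml.etree.ElementTree as ET

# Made-up sample record in the standard MARCXML namespace.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">demo-0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane</subfield>
  </datafield>
</record>"""

NS = {"m": "http://www.loc.gov/MARC21/slim"}

# Hypothetical rules: index name -> XPath selecting the text to index.
INDEX_RULES = {
    "title": ".//m:datafield[@tag='245']/m:subfield[@code='a']",
    "author": ".//m:datafield[@tag='100']/m:subfield[@code='a']",
}


def extract_index_terms(xml_text: str) -> dict:
    """Apply each XPath rule to the record and collect the matched text."""
    root = ET.fromstring(xml_text)
    return {
        index_name: [element.text for element in root.findall(xpath, NS)]
        for index_name, xpath in INDEX_RULES.items()
    }


terms = extract_index_terms(MARCXML)
print(terms)
```

Because a binary MARC record is first converted to the same MARCXML DOM representation, path expressions of this kind can address its fields just as they address those of a native XML document.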
The Debian package Note: The functionality of this record model has been improved and replaced by the DOM XML record model. See the section called “DOM XML Record Model and Filter Module”. The Alvis filter for XML files is an XSLT based input filter. It indexes element and attribute content of any conceivable XML format using full XPATH support, a feature which the standard Zebra GRS-1 SGML and XML filters lacked. The indexed documents are parsed into a standard XML DOM tree, which restricts record size according to the amount of available memory. The Alvis filter uses XSLT display stylesheets, which let the Zebra DB administrator associate multiple, different views with the same XML document type. These views are chosen on the fly at search time. In addition, the Alvis filter configuration is not bound to the arcane BIB-1 Z39.50 library catalogue indexing traditions and folklore, and is therefore easier to understand. Finally, the Alvis filter allows for static ranking at index time, and for sorting hit lists according to predefined static ranks. This imposes no overhead at all: both search and indexing still perform in O(1), irrespective of document collection size. This feature resembles Google's pre-ranking using their PageRank algorithm. Details on the experimental Alvis XSLT filter are found in Chapter 8, ALVIS XML Record Model and Filter Module.
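The idea behind static ranking at index time can be sketched as follows: each document carries a rank that is fixed when it is indexed, posting lists are ordered by that rank once, and a search simply returns the front of the list without computing any scores. This is a toy model under stated assumptions, not Zebra's implementation. The functions `build_index` and `search`, the sample documents, and the convention that a lower rank value sorts first are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Build term -> doc ids, ordered by static rank at index time.

    docs is an iterable of (doc_id, static_rank, text) tuples.
    Assumption: a lower static_rank value means the document sorts first.
    """
    postings = defaultdict(list)
    for doc_id, rank, text in docs:
        for term in set(text.lower().split()):
            postings[term].append((rank, doc_id))
    # Sort every posting list once, when the index is built.
    return {
        term: [doc_id for _, doc_id in sorted(entries)]
        for term, entries in postings.items()
    }


def search(index, term, limit=10):
    """Return the first hits; no ranking work happens at search time."""
    return index.get(term.lower(), [])[:limit]


# Made-up documents with predefined static ranks.
docs = [
    ("doc-a", 3, "Zebra indexing"),
    ("doc-b", 1, "Zebra server"),
    ("doc-c", 2, "YAZ toolkit"),
]
index = build_index(docs)
print(search(index, "zebra"))
```

The design choice is to pay the ordering cost when the index is built, so that a query only reads an already ordered list.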
The Debian package Note: The functionality of this record model has been improved and replaced by the DOM XML record model. See the section called “DOM XML Record Model and Filter Module”.
The GRS-1 filter modules described in Chapter 9, GRS-1 Record Model and Filter Modules are all based on the Z39.50 specifications, and it is absolutely mandatory to have the reference pages on BIB-1 attribute sets at hand when configuring GRS-1 filters. The GRS filters come in different flavors, and a short introduction is needed here. GRS-1 filters of various kinds have also been called ABS filters due to the
The grs.marc and grs.marcxml filters are suited to parse and index binary and XML versions of traditional library MARC records based on the ISO2709 standard. The Debian package for both filters is
GRS-1 TCL scriptable filters for extensive user configuration come in two flavors: a regular expression filter grs.regx using TCL regular expressions, and a general scriptable TCL filter called grs.tcl. Both are included in the
A general purpose SGML filter is called grs.sgml. This filter is not yet packaged, but planned to be in the
The Debian package
Copyright Index Data ApS 2008