RPN queries and semantics
The PQF grammar is documented in the YAZ manual and is not repeated here. The textual PQF representation is not transmitted to Zebra during search; instead, the client maps it to the equivalent Z39.50 binary query parse tree. The RPN parse tree, or the equivalent textual representation in PQF, may start with one specification of the attribute set used. What follows is a query tree, which consists of atomic query parts (APT) or named result sets, optionally paired by boolean binary operators, and finally recursively combined into complex query trees. Attribute sets define the exact meaning and semantics of the queries issued. Zebra comes with some predefined attribute set definitions; others can easily be defined and added to the configuration. Table 5.1. Attribute sets predefined in Zebra
The use attribute (type 1) mappings for the predefined attribute sets are found in the attribute set configuration files
Note: The Zebra internal query processing is modeled after the BIB-1 attribute set, and the non-use attributes of types 2-6 are hard-wired in. It is therefore essential to be familiar with the section called “Zebra general Bib1 Non-Use Attributes (type 2-6)”.
A pair of sub-query trees, or of atomic queries, is combined using the standard boolean operators into new query trees. Thus, boolean operators are always internal nodes in the query tree. Table 5.2. Boolean operators
For example, we can combine the terms information and retrieval into different searches in the default index of the default attribute set as follows. Querying for the union of all documents containing the terms information OR retrieval:
Z> find @or information retrieval
Querying for the intersection of all documents containing the terms information AND retrieval. The hit set is a subset of the corresponding OR query:
Z> find @and information retrieval
Querying for the intersection of all documents containing the terms information AND retrieval, taking proximity into account. The hit set is a subset of the corresponding AND query (see the PQF grammar for details on the proximity operator):
Z> find @prox 0 3 0 2 k 2 information retrieval
Querying for the intersection of all documents containing the terms information AND retrieval, in the same order and near each other as described in the term list. The hit set is a subset of the corresponding PROXIMITY query:
Z> find "information retrieval"
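The subset relations between the four searches above can be illustrated with a small Python sketch. It is a toy in-memory model, not Zebra's query engine: the documents, the helper names, and the reading of the proximity example as "unordered, at most 3 words apart" are all assumptions made for illustration.

```python
# Toy model: each document is a list of words. This is NOT how Zebra
# stores or searches data; it only illustrates the subset relations.
docs = {
    1: "information retrieval systems".split(),
    2: "retrieval of stored information".split(),
    3: "information theory".split(),
    4: "information about fast and reliable document retrieval".split(),
}

def positions(words, term):
    """Return all word positions at which term occurs."""
    return [i for i, w in enumerate(words) if w == term]

def hits_or(a, b):
    return {d for d, w in docs.items() if a in w or b in w}

def hits_and(a, b):
    return {d for d, w in docs.items() if a in w and b in w}

def hits_near(a, b, distance):
    # Unordered proximity: some pair of occurrences is at most
    # `distance` word positions apart (assumed reading of the @prox example).
    return {d for d, w in docs.items()
            if any(abs(i - j) <= distance
                   for i in positions(w, a) for j in positions(w, b))}

def hits_phrase(a, b):
    # Phrase: b directly follows a.
    return {d for d, w in docs.items()
            if any(j - i == 1
                   for i in positions(w, a) for j in positions(w, b))}

a, b = "information", "retrieval"
# Each hit set is contained in the previous, broader one.
assert hits_phrase(a, b) <= hits_near(a, b, 3) <= hits_and(a, b) <= hits_or(a, b)
```

On this toy data the phrase search matches one document, the proximity search two, the AND search three, and the OR search all four.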
Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called Attributes-Plus-Terms (APT) queries. Atomic (APT) queries are always leaf nodes in the PQF query tree. Non-use attributes of types 2-12 that are not supplied are either inherited from higher nodes in the query tree, or are set to Zebra's default values. See the section called “BIB-1 Attribute Set” for details. Table 5.3. Atomic queries (APT)
Querying for the term information in the default index using the default attribute set, the server choice of access point/index, and the default non-use attributes.
Z> find information
Equivalent query fully specified including all default values:
Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
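How unsupplied attributes fall back to defaults can be sketched with a small Python helper. The default values are copied from the fully specified query above; the function name `fully_specified` and its parameters are hypothetical, and the sketch ignores attributes inherited from higher nodes in the query tree as well as quoting of multi-word terms.

```python
# Default attribute values, copied from the fully specified example query.
DEFAULTS = {1: "1017", 2: "3", 3: "3", 4: "1", 5: "100", 6: "1"}

def fully_specified(term, attrs=None, attrset="bib-1"):
    """Build a PQF string in which every attribute type 1-6 is explicit.

    `attrs` maps attribute type to value and overrides the defaults.
    Hypothetical helper for illustration; not part of YAZ or Zebra.
    """
    merged = dict(DEFAULTS)
    for attr_type, value in (attrs or {}).items():
        merged[attr_type] = str(value)
    parts = [f"@attrset {attrset}"]
    parts += [f"@attr {t}={merged[t]}" for t in sorted(merged)]
    parts.append(term)
    return " ".join(parts)

print(fully_specified("information"))
print(fully_specified("debussy", {1: 4}))
```

The first call reproduces the fully specified query shown above; the second overrides only the use attribute and leaves the remaining defaults in place.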
Finding all documents which have the term debussy in the title field.
Z> find @attr 1=4 debussy
The scan operation is only supported with atomic APT queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan. For example, we might want to scan the title index, starting with the term debussy, and displaying this and the following terms in lexicographic order:
Z> scan @attr 1=4 debussy
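The idea behind scan, listing the terms of one index in lexicographic order from a start term, can be sketched in Python as follows. The term list, the function name `scan`, and the `count` parameter are invented for illustration; a real scan response also carries hit counts and may position the start term differently.

```python
import bisect

def scan(terms, start, count=5):
    """Return up to `count` index terms, beginning at `start`
    (or at the next term in lexicographic order if `start` is absent)."""
    ordered = sorted(set(terms))
    first = bisect.bisect_left(ordered, start)
    return ordered[first:first + count]

# Hypothetical contents of a title index.
title_terms = ["bach", "debussy", "delius", "mozart", "chopin", "dvorak"]
print(scan(title_terms, "debussy", count=3))
```

Because scan walks a single ordered term list, it has no natural meaning for a boolean combination of several access points, which matches the restriction stated above.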
Named result sets are supported in Zebra, and result sets can be used as operands without limitations. It follows that named result sets are leaf nodes in the PQF query tree, exactly as atomic APT queries are. After the execution of a search, the result set is available at the server, so that the client can use it for subsequent searches or retrieval requests. The Z39.50 standard stresses that result sets are volatile: a result set may cease to exist at any time after the search, in which case the server sends a diagnostic to the effect that the requested result set no longer exists. The following example defines a named result set and re-uses it in the next query, using yaz-client. Notice that the client, not the server, assigns the string '1' to the named result set.
Z> f @attr 1=4 mozart
...
Number of hits: 43, setno 1
...
Z> f @and @set 1 @attr 1=4 amadeus
...
Number of hits: 14, setno 2
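The session above can be mimicked with a toy Python model in which earlier result sets are kept by name and re-used as operands. The class `ToySession`, its methods, and the sample data are hypothetical; the sketch does not speak Z39.50 and only mirrors the numbering that yaz-client applies to successive searches.

```python
class ToySession:
    """Stands in for one search session; not the Z39.50 protocol."""

    def __init__(self, index):
        self.index = index        # term -> set of document ids
        self.result_sets = {}     # result set name -> set of document ids
        self.next_setno = 1

    def _store(self, hits):
        name = str(self.next_setno)
        self.result_sets[name] = set(hits)
        self.next_setno += 1
        return name, len(hits)

    def find_term(self, term):
        """Like 'f term': search one term and keep the result set."""
        return self._store(self.index.get(term, set()))

    def find_and_set(self, setname, term):
        """Like 'f @and @set NAME term': intersect with an earlier set."""
        if setname not in self.result_sets:
            # Result sets are volatile; a missing set is reported as an error.
            raise KeyError(f"result set {setname!r} does not exist")
        hits = self.result_sets[setname] & self.index.get(term, set())
        return self._store(hits)

session = ToySession({"mozart": {1, 2, 3}, "amadeus": {2, 3, 4}})
print(session.find_term("mozart"))
print(session.find_and_set("1", "amadeus"))
```

The `KeyError` branch corresponds to the diagnostic a server returns once a volatile result set has been discarded.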
Note: Named result sets are only supported by the Z39.50 protocol. The SRU web service is stateless, and therefore the notion of named result sets does not exist when accessing a Zebra server via the SRU protocol.
The numeric use (type 1) attribute is usually referred to from a given attribute set. In addition, Zebra lets you use any internal index name defined in your configuration as a use attribute value. This is a great feature for debugging, and for cases where you do not need the complexity of defined use attribute values. It is the preferred way of accessing Zebra indexes directly. Finding all documents which have the term list "information retrieval" in a Zebra index, using its internal full string name, and scanning the same index:
Z> find @attr 1=sometext "information retrieval"
Z> scan @attr 1=sometext aterm
Searching or scanning the bib-1 use attribute 54 using its string name:
Z> find @attr 1=Code-language eng
Z> scan @attr 1=Code-language ""
It is possible to search in any silly string index, provided that it is defined in your indexing rules and can be parsed by the PQF parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results.
Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
See also the section called “Mapping from PQF atomic APT queries to Zebra internal register indexes” for details, and the section called “The SRU Server” for the SRU PQF query extension using string names as a fast debugging facility.
As we have seen above, it is possible (albeit seldom a great idea) to emulate XPath 1.0 based search by defining use (type 1) string attributes which in appearance resemble XPath queries. There are two problems with this approach. First, the XPath look-alike has to be defined at indexing time; no new, undefined XPath queries can be entered at search time. Second, it might confuse users considerably that an XPath-like index name may in fact be populated from an entirely different XML element than the one it pretends to access.
When using the GRS-1 Record Model (see Chapter 9, GRS-1 Record Model and Filter Modules), we have the possibility to embed live XPath expressions in the PQF queries, which are here called use (type 1) xpath attributes. You must enable the
Note: Only a very restricted subset of the XPath 1.0 standard is supported, as the GRS-1 record model is simpler than a full XML DOM structure. See the following examples for possibilities.
Finding all documents which have the term "content" inside a text node found in a specific XML DOM subtree, whose starting element is addressed by XPath:
Z> find @attr 1=/root content
Z> find @attr 1=/root/first content
Notice that the
XPath must be absolute, i.e., must start with '/', and that the
XPath
Z> find @attr 1=/root//text() content
Z> find @attr 1=/root/first//text() content
Searching inside attribute strings is possible:
Z> find @attr 1=/link/@creator morten
Filtering the addressing XPath by a predicate working on exact string values in attributes (in the XML sense) can be done: for example, return all documents which have the term "english" contained in one of the text sub-nodes of the subtree defined by the XPath
Z> find @attr 1=/record/title[@lang='en'] english
Z> find @attr 1=/link[@creator='sisse'] sibelius
Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius
Combining numeric indexes, boolean expressions, and xpath based searches is possible:
Z> find @attr 1=/record/title @and foo bar
Z> find @and @attr 1=/record/title foo @attr 1=4 bar
Escaping PQF keywords and other non-parseable XPath constructs
with
Z> find @attr {1=/root/first[@attr='danish']} content
Z> find @attr {1=/record/@set} oai
Warning: These dynamically performed XPath queries are a performance bottleneck, as no optimized specialized indexes can be used. Therefore, avoid this facility when speed is essential and the database content size is medium to large.
The Z39.50 standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities. Zebra exposes a "classic" Explain database by base name
The attribute-set In addition, the non-Use BIB-1 attributes, that is, the types Relation , Position , Structure , Truncation , and Completeness are imported from the BIB-1 attribute set, and may be used within any explain query.
The following Explain search attributes are supported:
A search in the use attribute
See
Classic Explain only defines retrieval of Explain information via ASN.1. Practically no Z39.50 clients support this. Fortunately they don't have to: Zebra allows retrieval of this information in other formats:
List supported categories to find out which explain commands are supported:
Z> base IR-Explain-1
Z> find @attr exp1 1=1 categorylist
Z> form sutrs
Z> show 1+2
Get target info, that is, investigate which databases exist at this server endpoint:
Z> base IR-Explain-1
Z> find @attr exp1 1=1 targetinfo
Z> form xml
Z> show 1+1
Z> form grs-1
Z> show 1+1
Z> form sutrs
Z> show 1+1
List all supported databases. The number of hits is the number of databases found, which most commonly are the following two: the
Z> base IR-Explain-1
Z> find @attr exp1 1=1 databaseinfo
Z> form sutrs
Z> show 1+2
Get database info record for database
Z> base IR-Explain-1
Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
Identical query with explicitly specified attribute set:
Z> base IR-Explain-1
Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
Get attribute details record for database
Z> base IR-Explain-1
Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
Identical query with explicitly specified attribute set:
Z> base IR-Explain-1
Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
Most of the information contained in this section is an excerpt of the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS found at . The BIB-1 Attribute Set Semantics from 1995, also in an updated BIB-1 Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of Zebra's capabilities, and the example queries.
A use attribute specifies an access point for any atomic query.
These access points are highly dependent on the attribute set used
in the query, and are user configurable using the following
default configuration files:
For example, a few BIB-1 use attributes from the
att 1 Personal-name
att 2 Corporate-name
att 3 Conference-name
att 4 Title
...
att 1009 Subject-name-personal
att 1010 Body-of-text
att 1011 Date/time-added-to-db
...
att 1016 Any
att 1017 Server-choice
att 1018 Publisher
...
att 1035 Anywhere
att 1036 Author-Title-Subject
New attribute sets can be added by adding new
In addition, Zebra allows the access of internal index names and dynamic XPath as use attributes; see the section called “Zebra's special access point of type 'string'” and the section called “Zebra's special access point of type 'XPath' for GRS-1 filters”. Phrase search for information retrieval in the title-register, scanning the same register afterwards:
Z> find @attr 1=4 "information retrieval"
Z> scan @attr 1=4 information
Relation attributes describe the relationship of the access point (left side of the relation) to the search term as qualified by the attributes (right side of the relation), e.g., Date-publication <= 1975. Table 5.4. Relation Attributes (type 2)
Note: AlwaysMatches searches are only supported if alwaysmatches indexing has been enabled. See the section called “The default.idx file”.
The relation attributes 1-5 are supported and work exactly as expected. All ordering operations are based on a lexicographical ordering, except when the structure attribute numeric (109) is used. In this case, ordering is numerical. See the section called “Structure Attributes (type 4)”.
Z> find @attr 1=Title @attr 2=1 music
...
Number of hits: 11745, setno 1
...
Z> find @attr 1=Title @attr 2=2 music
...
Number of hits: 11771, setno 2
...
Z> find @attr 1=Title @attr 2=3 music
...
Number of hits: 532, setno 3
...
Z> find @attr 1=Title @attr 2=4 music
...
Number of hits: 11463, setno 4
...
Z> find @attr 1=Title @attr 2=5 music
...
Number of hits: 11419, setno 5
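The difference between lexicographical and numerical ordering matters as soon as terms look like numbers. The following Python sketch applies the BIB-1 relation values 1-5 (less than, less than or equal, equal, greater or equal, greater than) to a small term list under both orderings. The function name, the `numeric` flag, and the sample terms are assumptions for illustration; Zebra evaluates relations on its own index structures.

```python
import operator

# BIB-1 relation attribute values 1-5 and their comparison operators.
RELATIONS = {
    1: operator.lt,   # less than
    2: operator.le,   # less than or equal
    3: operator.eq,   # equal
    4: operator.ge,   # greater or equal
    5: operator.gt,   # greater than
}

def matching_terms(terms, relation, term, numeric=False):
    """Return the index terms that stand in `relation` to `term`.

    With numeric=False terms are compared as strings (lexicographically);
    with numeric=True they are compared as integers, loosely mimicking
    the effect of the structure attribute numeric (109).
    """
    key = int if numeric else str
    compare = RELATIONS[relation]
    return [t for t in terms if compare(key(t), key(term))]

years = ["1975", "980", "2001", "10"]
print(matching_terms(years, 2, "1975"))                 # string comparison
print(matching_terms(years, 2, "1975", numeric=True))   # integer comparison
```

As strings, "980" sorts after "1975" and is therefore excluded from the "less than or equal" result, whereas the numeric comparison includes it.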
The relation attribute Relevance (102) is supported, see the section called “Relevance Ranking and Sorting of Result Sets” for full information. Ranked search for information retrieval in the title-register:
Z> find @attr 1=4 @attr 2=102 "information retrieval"
The relation attribute AlwaysMatches (103) is, in the default configuration, supported in conjunction with the structure attribute Phrase (1) (which may be omitted by default). It can be configured to work with other structure attributes, see the configuration file
AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct PQF syntax. An empty search term may be supplied.
Z> find @attr 1=Title @attr 2=103 ""
Z> find @attr 1=Title @attr 2=103 @attr 4=1 ""
The position attribute specifies the location of the search term within the field or subfield in which it appears. Table 5.5. Position Attributes (type 3)
Note: Zebra only supports first-in-field searches if the
The structure attribute specifies the type of search term. This causes the search to be mapped on different Zebra internal indexes, which must have been defined at index time.
The possible values of the
Table 5.6. Structure Attributes (type 4)
The structure attribute values
Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
Z> find @attr 1=Title @and mozart amadeus
The structure attribute value
Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
Z> find @attr 1=Body-of-text @or bach @or salieri teleman
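Judging from the query pairs shown above, a multi-word term with structure attribute 6 corresponds to an @and tree over its words, and one with structure attribute 105 or 106 corresponds to an @or tree. A Python sketch of that expansion follows; the helper name `combine_words` and its parameters are hypothetical, and the helper only builds PQF strings without contacting a server.

```python
def combine_words(operator_name, words, attrs=""):
    """Fold a word list into a right-nested PQF boolean tree.

    operator_name is "and" or "or"; attrs is an optional attribute
    prefix that applies to the whole tree.
    """
    if not words:
        raise ValueError("at least one word is required")
    *leading, last = words
    tree = last
    # Build from the right so the result reads '@or a @or b c'.
    for word in reversed(leading):
        tree = f"@{operator_name} {word} {tree}"
    return f"{attrs} {tree}".strip()

print(combine_words("and", "mozart amadeus".split(), "@attr 1=Title"))
print(combine_words("or", "bach salieri teleman".split(), "@attr 1=Body-of-text"))
```

The two calls reproduce the explicit boolean queries from the examples above.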
This
Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"
The structure attribute value
Z> find @attr 4=107 10
Z> find @attr 1=4 @attr 4=107 10
Z> find @attr 1=1010 @attr 4=107 10
In
the GILS schema (
Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
Note: The exact mapping between PQF queries and Zebra internal indexes and index types is explained in the section called “Mapping from PQF atomic APT queries to Zebra internal register indexes”.
The truncation attribute specifies whether variations of one or more characters are allowed between search term and hit terms, or not. Using non-default truncation attributes will broaden the document hit set of a search query. Table 5.7. Truncation Attributes (type 5)
The truncation attribute values 1-3 behave in the obvious way:
Z> scan @attr 1=Body-of-text schnittke
...
* schnittke (81)
schnittkes (31)
schnittstelle (1)
...
Z> find @attr 1=Body-of-text @attr 5=1 schnittke
...
Number of hits: 95, setno 7
...
Z> find @attr 1=Body-of-text @attr 5=2 schnittke
...
Number of hits: 81, setno 6
...
Z> find @attr 1=Body-of-text @attr 5=3 schnittke
...
Number of hits: 95, setno 8
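Which index terms each truncation value selects can be sketched in Python by translating the BIB-1 values 1 (right truncation), 2 (left truncation), 3 (left and right truncation) and 100 (do not truncate) into regular expressions and applying them to the scanned term list above. The function names are hypothetical and the sketch matches terms, not documents, so it does not reproduce the hit counts shown.

```python
import re

def truncation_regex(term, truncation):
    """Translate a term plus truncation attribute into a regex source."""
    core = re.escape(term)
    if truncation == 1:       # right truncation: term is a prefix
        return core + ".*"
    if truncation == 2:       # left truncation: term is a suffix
        return ".*" + core
    if truncation == 3:       # left and right: term occurs anywhere
        return ".*" + core + ".*"
    if truncation == 100:     # do not truncate: exact term
        return core
    raise ValueError(f"unsupported truncation value in this sketch: {truncation}")

def matching(terms, term, truncation):
    pattern = re.compile(truncation_regex(term, truncation))
    return [t for t in terms if pattern.fullmatch(t)]

# Terms taken from the scan output above.
scanned = ["schnittke", "schnittkes", "schnittstelle"]
for value in (1, 2, 3, 100):
    print(value, matching(scanned, "schnittke", value))
```

Right truncation and left-and-right truncation both pick up schnittke and schnittkes, while left truncation matches schnittke alone, which is consistent with the pattern of hit counts in the session above.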
The truncation attribute value
Z> find @attr 1=Body-of-text @attr 5=101 schnit#ke
Z> find @attr 1=Body-of-text @attr 5=102 schnit.*ke
...
Number of hits: 89, setno 10
The truncation attribute value
Z> find @attr 1=Body-of-text @attr 5=102 schnit+ke
Z> find @attr 1=Body-of-text @attr 5=102 schni[a-t]+ke
The truncation attribute value
Z> find @attr 1=Body-of-text @attr 5=100 schnittke
...
Number of hits: 81, setno 14
...
Z> find @attr 1=Body-of-text @attr 5=103 schnittke
...
Number of hits: 103, setno 15
...
The Table 5.8. Completeness Attributes (type = 6)
The
The
Note: The exact mapping between PQF queries and Zebra internal indexes and index types is explained in the section called “Mapping from PQF atomic APT queries to Zebra internal register indexes”.
Copyright Index Data ApS 2008