Extended Zebra RPN Features
The Zebra internal query engine has been extended to meet specific needs
not covered by the BIB-1 attribute set query model.
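Zebra defines a hardwired string index called _ALLRECORDS. Used together with the relation attribute AlwaysMatches (103), it matches every record in the database: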
Z> find @attr 1=_ALLRECORDS @attr 2=103 ""
Combinations with other index types can be made. For example, to
find all records which are not indexed in the Title register, issue one of these two equivalent queries:
Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
Warning
The special string index
Zebra extends the BIB-1 attribute types, and these extensions are
recognized regardless of the attribute set used in a search operation query.
Table 5.9. Zebra Search Attribute Extensions
The embedded sort is a way to specify sort within a query - thus removing the need to send a Sort Request separately. It is both faster and does not require clients to deal with the Sort Facility.
All ordering operations are based on a lexicographical ordering,
except
when the
The possible values after attribute type 7 are 1 (ascending) and 2 (descending).
For example, searching for water, sort by title (ascending):
Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
Or, searching for water, sort by title ascending, then date descending:
Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
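The two sort examples above follow a regular pattern, which the sketch below generates from Python. The helper name `with_embedded_sort` and its argument layout are hypothetical and not part of Zebra or YAZ; the only things taken from the text are attribute type 7, its values 1 and 2, and the shape of the example queries. Reading the term of each sort APT (0, 1) as the sort level is an interpretation of those examples.

```python
ASCENDING, DESCENDING = 1, 2  # values of attribute type 7 used in the examples


def with_embedded_sort(query, sort_keys):
    """Append embedded sort keys to a PQF query string (hypothetical helper).

    sort_keys is a list of (use_attribute, direction) pairs, primary key first.
    """
    parts = []
    for level, (use, direction) in enumerate(sort_keys):
        # Assumption read off the examples: the term is the sort level
        # (0 for the first key, 1 for the second, and so on).
        parts.append(f"@attr 7={direction} @attr 1={use} {level}")
    # One @or per sort key, all placed in front of the original query.
    return " ".join(["@or"] * len(sort_keys) + [query] + parts)


print(with_embedded_sort("@attr 1=1016 water", [(4, ASCENDING)]))
print(with_embedded_sort("@attr 1=1016 water", [(4, ASCENDING), (30, DESCENDING)]))
```

The two printed strings reproduce the queries shown after `Z> find` in the examples above.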
Rank weight is a way to pass a value to a ranking algorithm, so that one APT has one value while another has a different one. See also the section called “Relevance Ranking and Sorting of Result Sets”.
For example, searching for utah in title with weight 30 as well as any with weight 20:
Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
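As a minimal sketch of how such weighted queries could be assembled programmatically, the hypothetical helper below (not a Zebra or YAZ API) joins any number of weighted APTs with prefix `@or` operators. It assumes, as in the example above, that the relevance relation `@attr 2=102` is placed once in front of the whole query.

```python
def weighted_or(weighted_apts):
    """Combine (weight, apt) pairs into one ranked @or query (hypothetical helper).

    Each apt is a PQF fragment such as "@attr 1=4 utah" or just "utah".
    """
    if not weighted_apts:
        raise ValueError("at least one APT is required")
    # Attribute type 9 carries the rank weight of each leaf.
    leaves = [f"@attr 9={weight} {apt}" for weight, apt in weighted_apts]
    # n leaves need n-1 binary @or operators in prefix notation.
    ors = ["@or"] * (len(leaves) - 1)
    return " ".join(["@attr 2=102"] + ors + leaves)


print(weighted_or([(30, "@attr 1=4 utah"), (20, "utah")]))
```

With the two pairs shown, the output is the same query string as in the example above.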
Zebra supports the searchResult-1 facility. If the Term Reference Attribute (type 10) is given, that specifies a subqueryId value returned as part of the search result. It is a way for a client to name an APT part of a query.
Warning
Experimental. Do not use in production code.
Zebra computes, unless otherwise configured, the exact hit count for every APT (leaf) in the query tree. These hit counts are returned as part of the searchResult-1 facility in the binary encoded Z39.50 search response packages.
By setting an estimation limit on the size of the result set of the APT leaves, Zebra stops processing the result set when the limit length is reached. Hit counts under this limit are still precise, but hit counts over it are estimated using the statistics gathered from the chopped result set.
Specifying a limit of
For example, we might be interested in the exact hit count for a, but for b we allow hit count estimates for 1000 and higher:
Z> find @and a @attr 11=1000 b
Note
The estimated hit count facility makes searches faster, as one only needs to process large hit lists partially. It is mostly used in huge databases, where you want to trade exactness of hit counts against speed of execution.
Warning
Do not use approximative hit count limits in conjunction with relevance ranking, as re-sorting of the result set only works when the entire result set has been processed.
By default Zebra computes precise hit counts for a query as
a whole. Setting attribute 12 makes it perform approximative
hit counts instead. It has the same semantics as
The attribute (12) can occur anywhere in the query tree. Unlike regular attributes, it does not relate to the leaf (APT) but to the whole query.
Warning
Do not use approximative hit count limits in conjunction with relevance ranking, as re-sorting of the result set only works when the entire result set has been processed.
Zebra extends the BIB-1 attribute types, and these extensions are recognized regardless of the attribute set used in a scan operation query.
Table 5.10. Zebra Scan Attribute Extensions
If attribute Result Set Narrow (type 8)
is given for scan, the value is the name of a
result set. Each hit count in scan is
Consider, for example, scanning all title fields around the scan term mozart, then refining the scan by issuing a filtering query for amadeus to restrict the scan to the result set of that query:
Z> scan @attr 1=4 mozart
...
* mozart (43)
mozartforskningen (1)
mozartiana (1)
mozarts (16)
...
Z> f @attr 1=4 amadeus
...
Number of hits: 15, setno 2
...
Z> scan @attr 1=4 @attr 8=2 mozart
...
* mozart (14)
mozartforskningen (0)
mozartiana (0)
mozarts (1)
...
Zebra 2.0.2 and later is able to skip terms with 0 hit counts. This, however, is known not to scale if the number of terms to skip is high, which is most likely to happen if the result set is small (and thus produces many 0 hits).
The attribute-set
This feature is enabled when defining the
Warning
The
This attribute set allows one to search GRS-1 filter indexed records by XPath-like structured index names.
Warning
The
Table 5.11. Zebra-specific IDXPATH Use Attributes (type 1)
See
Search for all documents starting with root element
Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
Z> find @attr idxpath 1=1 @attr 4=3 root/
Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
Search for all documents where specific nested XPATH
Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
Search for CDATA string text in any element
Z> find @attrset idxpath @attr 1=1016 text
Z> find @attr 1=_XPATH_CDATA text
Search for CDATA string anothertext in any attribute:
Z> find @attrset idxpath @attr 1=1015 anothertext
Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
Search for all documents which have an XML element node including an XML attribute named creator:
Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
Combining usual
Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
Scanning is supported on all
Z> scan @attrset idxpath @attr 1=1016 text
Z> scan @attr 1=_XPATH_ATTR_CDATA anothertext
Z> scan @attrset idxpath @attr 1=3 @attr 4=3 ''
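The paired queries above show each numeric idxpath use attribute next to an internal index name. The sketch below collects only the pairs that appear in those examples and builds either form of the query. The function `idxpath_query` and the dictionary name are hypothetical, and the mapping is read off the examples rather than taken from Table 5.11, so it may be incomplete.

```python
# Numeric idxpath use attributes and the internal index names they are
# paired with in the examples. Only pairs shown there are listed.
IDXPATH_USE_TO_INDEX = {
    1: "_XPATH_BEGIN",
    3: "_XPATH_ATTR_NAME",
    1015: "_XPATH_ATTR_CDATA",
    1016: "_XPATH_CDATA",
}


def idxpath_query(use, term, structure=None, by_name=False):
    """Build a PQF fragment for an idxpath search (hypothetical helper)."""
    if use not in IDXPATH_USE_TO_INDEX:
        raise KeyError(f"no known internal index for idxpath use attribute {use}")
    if by_name:
        # Address the Zebra internal string index directly by name.
        attrs = [f"@attr 1={IDXPATH_USE_TO_INDEX[use]}"]
    else:
        # Address it through the idxpath attribute set and a numeric use attribute.
        attrs = [f"@attrset idxpath @attr 1={use}"]
    if structure is not None:
        attrs.append(f"@attr 4={structure}")
    return " ".join(attrs + [term])


print(idxpath_query(1, "root/", structure=3))
print(idxpath_query(1, "root/", structure=3, by_name=True))
print(idxpath_query(1016, "text"))
```

The printed fragments correspond to the `find` arguments used in the root element and CDATA examples above.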
The rules for PQF APT mapping are rather tricky to grasp at first. We deal first with the rules for deciding which internal register or string index to use, according to the use attribute or access point specified in the query. Thereafter we deal with the rules for determining the correct structure type of the named register.
Zebra understands four fundamentally different types of access points, of which only the numeric use attribute type access points are defined by the Z39.50 standard. All other access point types are Zebra-specific and non-portable.
Table 5.12. Access point name mapping
Numeric use attributes are mapped to the Zebra internal
string index according to the attribute set definition in use.
The default attribute set is BIB-1.
According to normalization and numeric use attribute mapping, the following PQF queries are considered equivalent (assuming the default configuration has not been altered):
Z> find @attr 1=Body-of-text serenade
Z> find @attr 1=bodyoftext serenade
Z> find @attr 1=BodyOfText serenade
Z> find @attr 1=bO-d-Y-of-tE-x-t serenade
Z> find @attr 1=1010 serenade
Z> find @attrset BIB-1 @attr 1=1010 serenade
Z> find @attrset bib1 @attr 1=1010 serenade
Z> find @attrset Bib1 @attr 1=1010 serenade
Z> find @attrset b-I-b-1 @attr 1=1010 serenade
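The equivalences above suggest that names are compared after folding case and dropping hyphens. The following sketch models that reading; `normalize_name` is a hypothetical function, and the rule it implements is inferred from the example queries rather than quoted from Zebra's source or from Table 5.12.

```python
def normalize_name(name):
    """Fold case and drop hyphens (rule inferred from the equivalent queries)."""
    return name.replace("-", "").lower()


index_names = ["Body-of-text", "bodyoftext", "BodyOfText", "bO-d-Y-of-tE-x-t"]
attrset_names = ["BIB-1", "bib1", "Bib1", "b-I-b-1"]

# Every spelling collapses to a single normalized name.
print({normalize_name(n) for n in index_names})    # {'bodyoftext'}
print({normalize_name(n) for n in attrset_names})  # {'bib1'}
```

Under this reading, the numeric form `@attr 1=1010` is the only one of the listed queries that relies on the attribute set definition rather than on name normalization.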
The
numerical
String indexes can be accessed directly,
independently of which attribute set is in use; any attribute set
specification is simply ignored. The above-mentioned name normalization applies.
String index names are defined in the
used indexing filter configuration files, for example in the
Zebra internal indexes can be accessed directly,
according to the same rules as the user-defined
string indexes. The only difference is that
Zebra internal index names are hardwired,
all uppercase, and
must start with the character _ (underscore).
Finally,
Internally, Zebra has in its default configuration several different types of registers or indexes, whose tokenization and character normalization rules differ. This reflects the fact that searching fundamentally different tokens like dates, numbers, bitfields, and string-based text needs different rule sets.
Table 5.13. Structure and completeness mapping to register types
If a Structure attribute of Phrase is used in conjunction with a
Completeness attribute of Complete (Sub)field, the term is matched
against the contents of the phrase (long word) register, if one
exists for the given Use attribute.
A phrase register is created for those fields in the
GRS-1
Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
...
bayreuther festspiele (1)
* beethoven bibliography database (1)
benny carter (1)
...
Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
...
Number of hits: 0, setno 5
...
Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
...
Number of hits: 1, setno 6
If Structure=Phrase is used in conjunction with Incomplete Field
(the default value for Completeness), the
search is directed against the normal word registers, but if the term
contains multiple words, the term will only match if all of the words
are found immediately adjacent, and in the given order.
The word search is performed on those fields that are indexed as
type
Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
...
beefheart (1)
* beethoven (18)
beethovens (7)
...
Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
...
Number of hits: 18, setno 1
...
Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography"
...
Number of hits: 2, setno 2
...
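The difference between the two Phrase searches shown above can be illustrated with a toy model. This is only a sketch of the matching semantics as described in the text, using a naive lowercase and whitespace tokenizer; it is not how Zebra implements its registers, and both function names are hypothetical.

```python
def tokenize(text):
    """Naive stand-in for Zebra's tokenization: lowercase, split on whitespace."""
    return text.lower().split()


def matches_complete_field(field, term):
    """Phrase register (@attr 4=1 @attr 6=3): the term must equal the whole field."""
    return tokenize(field) == tokenize(term)


def matches_adjacent_words(field, term):
    """Word register (@attr 4=1 @attr 6=1): all words adjacent and in the given order."""
    words, wanted = tokenize(field), tokenize(term)
    n = len(wanted)
    return n > 0 and any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


title = "Beethoven Bibliography Database"
print(matches_complete_field(title, "beethoven bibliography"))           # False
print(matches_complete_field(title, "beethoven bibliography database"))  # True
print(matches_adjacent_words(title, "beethoven bibliography"))           # True
print(matches_adjacent_words(title, "bibliography beethoven"))           # False
```

The first two results mirror the 0 and 1 hit counts in the phrase register session, where a partial title does not match a complete field.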
If the Structure attribute is Word List, Free-form Text, or
Document Text, the term is treated as a
natural-language, relevance-ranked query.
This search type uses the word register, i.e. those fields
that are indexed as type
If the Structure attribute is Numeric String, the term is treated as an integer.
The search is performed on those fields that are indexed
as type
If the Structure attribute is URX, the term is treated as a URX (URL) entity.
The search is performed on those fields that are indexed as type
If the Structure attribute is Local Number, the term is treated as a native Zebra Record Identifier.
If the Relation attribute is Equals (default), the term is matched in a normal fashion (modulo truncation and processing of individual words, if required). If Relation is Less Than, Less Than or Equal, Greater Than, or Greater Than or Equal, the term is assumed to be numerical, and a standard regular expression is constructed to match the given expression. If Relation is Relevance, the standard natural-language query processor is invoked.
For the Truncation attribute, No Truncation is the default. Left Truncation is not supported. Process # in search term is supported, as is Regxp-1. Regxp-2 enables the fault-tolerant (fuzzy) search. By default, a single error (deletion, insertion, replacement) is accepted when terms are matched against the register contents.
Each term in a query is interpreted as a regular expression if the truncation value is either Regxp-1 (@attr 5=102) or Regxp-2 (@attr 5=103). Both query types follow the same syntax with the operands:
Table 5.14. Regular Expression Operands
The above operands can be combined with the following operators:
Table 5.15. Regular Expression Operators
If the first character of the
Since the plus operator is normally a suffix operator, the addition to the query syntax doesn't violate the syntax for standard regular expressions.
For example, a phrase search with regular expressions in the title-register is performed like this:
Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
Combinations with other attributes are possible. For example, a ranked search with a regular expression:
Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
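A small sketch of how a client might assemble such regular expression queries is given below. The helper `regex_query` is hypothetical and only combines the attribute values named in this section: truncation 102 (Regxp-1), truncation 103 (Regxp-2), and relation 102 for ranking. It refuses patterns that contain a double quote instead of guessing at PQF escaping rules.

```python
REGXP_1, REGXP_2 = 102, 103  # truncation attribute (type 5) values named above


def regex_query(use, pattern, fuzzy=False, ranked=False):
    """Build a PQF regular expression query fragment (hypothetical helper)."""
    if '"' in pattern:
        # Escaping rules are not covered here, so such patterns are rejected.
        raise ValueError("patterns containing a double quote are not supported")
    attrs = [f"@attr 1={use}", f"@attr 5={REGXP_2 if fuzzy else REGXP_1}"]
    if ranked:
        attrs.append("@attr 2=102")  # relevance relation, as in the ranked example
    # Quote the term so that a multi-word pattern stays a single PQF term.
    return " ".join(attrs) + f' "{pattern}"'


print(regex_query(4, "informat.* retrieval"))
print(regex_query(4, "informat.* retrieval", ranked=True))
```

The two printed fragments match the arguments of the `find` commands in the two examples above; passing `fuzzy=True` switches the truncation value to 103.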
Copyright Index Data ApS 2008