Chapter 10. Field Structure and Character Sets

In order to provide a flexible approach to national character set handling, Zebra allows the administrator to configure the system to handle any 8-bit character set, including sets that require multi-octet diacritics or other multi-octet characters. The definition of a character set includes a specification of the permissible values, their sort order (this affects the display in the SCAN function), and the relationships between upper- and lowercase characters. Finally, the definition includes the specification of space characters for the set. The operator can define different character sets for different fields, typical examples being standard text fields, numerical fields, and special-purpose fields such as WWW-style linkages (URx).
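The four ingredients of a character set definition listed above (permissible values, sort order, case relationships, and space characters) can be pictured with a small Python model. This is only an illustrative sketch, not Zebra's implementation or file format: the `CharacterSet` class, the `EXAMPLE_SET` instance, the ordering that places "ü" before "z", and the choice to silently drop characters outside the set are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterSet:
    """Toy model of a character set definition (hypothetical, not Zebra code)."""
    lowercase: str  # permissible values, listed in the desired sort order
    uppercase: str  # uppercase counterparts, matched position by position
    space: str      # characters that separate tokens

    def normalize(self, text: str) -> str:
        """Fold case, turn space characters into blanks, drop everything else."""
        fold = dict(zip(self.uppercase, self.lowercase))
        out = []
        for ch in text:
            ch = fold.get(ch, ch)
            if ch in self.lowercase:
                out.append(ch)
            elif ch in self.space:
                out.append(" ")
            # Assumption: characters outside the set are silently dropped.
        return "".join(out)

    def tokenize(self, text: str) -> list[str]:
        """Split normalized text into tokens at the space characters."""
        return self.normalize(text).split()

    def sort_key(self, token: str) -> tuple[int, ...]:
        """Order tokens by each character's position in `lowercase`."""
        return tuple(
            self.lowercase.index(ch)
            for ch in self.normalize(token)
            if ch != " "
        )


# Illustrative value set: digits, a-y, then "ü" ahead of "z".
EXAMPLE_SET = CharacterSet(
    lowercase="0123456789abcdefghijklmnopqrstuvwxyüzæäøöå",
    uppercase="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYÜZÆÄØÖÅ",
    space=" \t\n.,;:-",
)

if __name__ == "__main__":
    print(EXAMPLE_SET.tokenize("Über-Zebra, YAK"))
    print(sorted(["zebra", "über", "yak"], key=EXAMPLE_SET.sort_key))
```

Because the sort key follows the order in which the permissible values are listed, changing that order changes how terms would be presented in a sorted listing, which is the effect the paragraph above attributes to the SCAN display.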
Zebra 1.3 and Zebra versions 2.0.18 and earlier required that the field type be a single character. Version 2.1 of Zebra can also be configured, per field, to use the ICU library to perform tokenization and normalization of strings. This is an alternative to the "charmap" files, which have been part of Zebra since its first release.
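As a rough picture of what "tokenization and normalization" mean in this context, the following Python sketch splits text into word tokens and reduces each token to a case-folded, accent-free form. It uses only the Python standard library as a stand-in: it does not call ICU, it does not reflect how Zebra configures ICU, and the function names are hypothetical.

```python
import re
import unicodedata


def tokenize(text: str) -> list[str]:
    """Split text into runs of word characters."""
    # Compose first so that a letter and its accent stay in one token.
    composed = unicodedata.normalize("NFC", text)
    return re.findall(r"\w+", composed)


def normalize(token: str) -> str:
    """Strip combining marks and fold case (one possible normalization)."""
    decomposed = unicodedata.normalize("NFKD", token)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def index_terms(text: str) -> list[str]:
    """Tokenize, then normalize each token."""
    return [normalize(token) for token in tokenize(text)]


if __name__ == "__main__":
    print(index_terms("Caf\u00e9 No\u00ebl, ZEBRA"))
```

Which normalization steps are appropriate depends on the language and the field; the accent stripping shown here is just one choice among many.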
The field types, and hence character sets, are associated with data
elements by the indexing rules.
Example 10.1. Field types
Following are three excerpts of the standard
# Traditional word index
# Used if completeness is 'incomplete field' (@attr 6=1) and
# structure is word/phrase/word-list/free-form-text/document-text
index w
completeness 0
position 1
alwaysmatches 1
firstinfield 1
charmap string.chr
...
# Null map index (no mapping at all)
# Used if structure=key (@attr 4=3)
index 0
completeness 0
position 1
charmap @
...
# Sort register
sort s
completeness 1
charmap string.chr
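One way to read the excerpts above is that each `index` or `sort` line opens the definition of a field type, and the directives that follow set its properties until the next definition begins. The Python sketch below groups the lines on that basis. The grouping rule is inferred from the layout of the excerpts rather than taken from a format specification, and `parse_field_types` and `EXCERPT` are hypothetical names used only here.

```python
EXCERPT = """\
# Traditional word index
index w
completeness 0
position 1
alwaysmatches 1
firstinfield 1
charmap string.chr
...
# Null map index (no mapping at all)
index 0
completeness 0
position 1
charmap @
...
# Sort register
sort s
completeness 1
charmap string.chr
"""


def parse_field_types(text: str) -> list[dict[str, str]]:
    """Group directives into one record per `index` or `sort` line."""
    records: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and the "..." elision markers.
        if not line or line.startswith("#") or line == "...":
            continue
        parts = line.split(None, 1)
        directive = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if directive in ("index", "sort"):
            # A new field type definition starts here.
            records.append({"register": directive, "type": value})
        elif records:
            # Any other directive belongs to the most recent definition.
            records[-1][directive] = value
    return records


if __name__ == "__main__":
    for record in parse_field_types(EXCERPT):
        print(record)
```

Read this way, field types `w` and `s` both point at `string.chr`, while field type `0` uses the `@` null map.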
Copyright Index Data ApS 2008