.TH UPMENDEX 1 \"===================================================================== .if t .ds TX \fRT\\h'-0.1667m'\\v'0.20v'E\\v'-0.20v'\\h'-0.125m'X\fP .if n .ds TX TeX .\" LX definition must follow TX so LX can use TX .if t .ds LX \fRL\\h'-0.36m'\\v'-0.15v'\s-2A\s0\\h'-0.15m'\\v'0.15v'\fP\*(TX .if n .ds LX LaTeX \"===================================================================== .SH NAME upmendex \- Multilingual index processor .SH SYNOPSIS \fBupmendex\fR [-ilqrcgf] [\fB-s\fI sty\fR] [\fB-d\fI dic\fR] [\fB-o\fI ind\fR] [\fB-t\fI log\fR] [\fB-p\fI no\fR] [\fB--\fR] [\fI idx0 idx1 idx2 ...\fR] .br \fBupmendex\fR \fB--help\fR .SH DESCRIPTION .PP The program \fIupmendex\fR is a general purpose multilingual hierarchical index generator working with up\*(LX, Xe\*(LX and Lua\*(LX; it accepts one or more input files (\fI.idx\fR; often produced by a text formatter such as \*(LX families), sorts the entries, and produces an output file which can be formatted. It supports Latin (including non-English), Greek, Cyrillic, Korean Hangul and Han (Hanzi ideographs) scripts, as well as Japanese Kana. It is almost compatible with \fImakeindex\fR and \fImendex\fR, and additional feature for handling readings of kanji words is also available. .RE The formats of the input and output files are specified in a style file. The readings of kanji words can be specified in a dictionary file. .RE The index can have up to three levels (0, 1, and 2) of subitem nesting. .SH OPTIONS .PP .TP 10 \fB-i\fR Take input from stdin, even when index files are specified. .TP 10 \fB-l\fR Set \'sort by character order\'. By default, \'sort by word order\' is used. Details are described below. .TP 10 \fB-q\fR Quiet mode; send no message to \fIstderr\fR, except error messages and warnings. .TP 10 \fB-r\fR Disable implicit page range formation. By default, three or more successive pages are automatically abbreviated as a range (e.g. 1\(en5). .TP 10 \fB-c\fR Compress sequence of intermediate blanks (space(s) and/or tab(s)) into a space and ignore leading and trailing blank(s). By default, blanks in the index key are retained. .TP 10 \fB-g\fR Make Japanese index head \fIA\fR-line (\fIA, Ka, Sa, ...\fR; 10 characters) of the \fIgojuon\fR table (Japanese syllabary). By default, all 48 characters in the \fIgojuon\fR table are used. .TP 10 \fB-f\fR Force to output characters even if the scripts are not supported by \fIupmendex\fR. .TP 10 \fB-s\fI sty\fR Employ \fIsty\fR as the style file. .TP 10 \fB-d\fI dic\fR Employ \fIdic\fR as the dictionary file. The dictionary file is composed of lists of <\fIindex_word\fR\ \fIreading\fR>. .TP 10 \fB-o\fI ind\fR Employ \fIind\fR as the output index file. By default, the file name is created by appending the extension \fIind\fR to the base name of the first input file. .TP 10 \fB-t\fI log\fR Employ \fIlog\fR as the transcript file. By default, the file name is created by appending the extension \fIilg\fR to the base name of the first input file. .TP 10 \fB-p\fI no\fR Set the starting page number of the output index list to be \fIno\fR. The argument \fIno\fR may be numerical or one of the following: \fIany\fR (the next page to the end of contents), \fIodd\fR (the next odd page to the end of contents), \fIeven\fR (the next even page to the end of contents). .TP 10 \fB--help\fR Show summary of options. .TP 10 \fB--\fR Arguments after \fB--\fR are not taken as options. This is useful when the input file name starts with '-'. .SH "STYLE FILE" The style file informs \fIupmendex\fR about the format of the \fIidx\fR input files and the intended format of the final output file. The format is upper compatible with the one for \fImakeindex\fR and \fImendex\fR. The style file contains a list of <\fIspecifier\fR\ \fIattribute\fR> pairs. There are two types of specifiers: input and output. Pairs do not have to appear in any particular order. A line begun by \'%\' is a comment. .PP \fBInput file style parameter\fR .TP 30 \fBkeyword\fR "\\\\indexentry" .RS Command with an argument of index entry which is going to be processed. .RE .TP 30 \fBarg_open\fR \'{\' .RS Opening delimiter which shows the beginning of index entry. .RE .TP 30 \fBarg_close\fR \'}\' .RS Closing delimiter which shows the end of index entry. .RE .TP 30 \fBrange_open\fR \'(\' .RS Opening delimiter which shows the beginning of page range. .RE .TP 30 \fBrange_close\fR \')\' .RS Closing delimiter which shows the end of page range. .RE .TP 30 \fBlevel\fR \'!\' .RS Delimiter which shows lower level. .RE .TP 30 \fBactual\fR \'@\' .RS Symbol which shows the next sequence is to appear as index strings in the output file. .RE .TP 30 \fBencap\fR \'|\' .RS Symbol which shows the next sequence is to be used as command name attached to the page number. .RE .TP 30 \fBpage_compositor\fR "-" .RS Separator between page levels for a style with multi-levels of page numbers. .RE .TP 30 \fBpage_precedence\fR "rnaRA" .RS Priority of expression for page number. \'R\' and \'r\' correspond to Roman. \'n\' corresponds to arabic numeral. \'A\' and \'a\' correspond to Latin alphabet. .RE .TP 30 \fBquote\fR \'"\' .RS Escape character for \fIupmendex\fR parameters. .RE .TP 30 \fBescape\fR \'\\\\\' .RS Escape character for general scripts. .RE \fBOutput file style parameter\fR .TP 30 \fBpreamble\fR "\\\\begin{theindex}\\n" .RS Preamble of output file. .RE .TP 30 \fBpostamble\fR "\\n\\n\\\\end{theindex}\\n" .RS Postamble of output file. .RE .TP 30 \fBsetpage_prefix\fR "\\n \\\\setcounter{page}{" .RS Prefix of page number if start page is designated. .RE .TP 30 \fBsetpage_suffix\fR "}\\n" .RS Suffix of page number if start page is designated. .RE .TP 30 \fBgroup_skip\fR "\\n\\n \\\\indexspace\\n" .RS Strings to insert vertical space before new section of index. .RE .TP 30 \fBlethead_prefix\fR "" .RS Prefix of heading for newly appeared heading letter. .RE .TP 30 \fBheading_prefix\fR "" .RS Same as \fBlethead_prefix\fR. (compatible with makeindex) .RE .TP 30 \fBlethead_suffix\fR "" .RS Suffix of heading for newly appeared heading letter. .RE .TP 30 \fBheading_suffix\fR "" .RS Same as \fBlethead_suffix\fR. (compatible with makeindex) .RE .TP 30 \fBlethead_flag\fR 0 .RS Flag to control output of heading letters in Latin, Greek and Cyrillic scripts. \'0\', \'1\', \'-1\' and \'2\' respectively denotes no output, uppercase, lowercase and titlecase. .RE .TP 30 \fBheading_flag\fR 0 .RS Same as \fBlethead_flag\fR. (Note: makeindex uses a different name \fBheadings_flag\fR) .RE .TP 30 \fBheadings_flag\fR 0 .RS Same as \fBlethead_flag\fR. (compatible with makeindex) .RE .TP 30 \fBkana_head\fR "" .RS Heading characters of Kana specified by a string. By default, it is controlled by \fBletter_head\fR and command line option \fB-g\fR. (Extended by upmendex) .RE .TP 30 \fBhangul_head\fR "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ" .RS Heading characters of Hangul specified by a string. (Extended by upmendex) .RE .TP 30 \fBtumunja\fR "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ" .RS Heading characters of Hangul specified by a string. (Deprecated, Extended by upmendex) .RE .TP 30 \fBhanzi_head\fR "" .RS Heading strings of hanzi (Kanji, Hanja) specified by a string, which is concatenated of items with a separator \';\'. (Extended by upmendex) .RE .TP 30 \fBdevanagari_head\fR "ऄअआइईउऊऋऌऍऎएऐऑऒओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलळवशषसह" .RS Heading characters of Devanagari specified by a string. (Experimental, Extended by upmendex) .RE .TP 30 \fBthai_head\fR "กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮ" .RS Heading characters of Thai script specified by a string. (Experimental, Extended by upmendex) .RE .TP 30 \fBitem_0\fR "\\n \\\\item " .RS Command sequence inserted between primary level entries. .RE .TP 30 \fBitem_1\fR "\\n \\\\subitem " .RS Command sequence inserted between sub level entries. .RE .TP 30 \fBitem_2\fR "\\n \\\\subsubitem " .RS Command sequence inserted between subsub level entries. .RE .TP 30 \fBitem_01\fR "\\n \\\\subitem " .RS Command sequence inserted between primaly and sub level entries. .RE .TP 30 \fBitem_x1\fR "\\n \\\\subitem " .RS Command sequence inserted between primary and sub level entries when main entry does not have page number. .RE .TP 30 \fBitem_12\fR "\\n \\\\subsubitem " .RS Command sequence inserted between sub and subsub level entries. .RE .TP 30 \fBitem_x2\fR "\\n \\\\subsubitem " .RS Command sequence inserted between sub and subsub level entries when sub level entry does not have page number. .RE .TP 30 \fBdelim_0\fR ", " .RS Delimiter string between primary level entry and first page number. .RE .TP 30 \fBdelim_1\fR ", " .RS Delimiter string between sub level entry and first page number. .RE .TP 30 \fBdelim_2\fR ", " .RS Delimiter string between subsub level entry and first page number. .RE .TP 30 \fBdelim_n\fR ", " .RS Delimiter string between page numbers commonly used for any entry level. .RE .TP 30 \fBdelim_r\fR "--" .RS Delimiter string between pages to show page range. .RE .TP 30 \fBdelim_t\fR "" .RS Delimiter string output at the end of page number list. .RE .TP 30 \fBsuffix_2p\fR "" .RS String to be inserted in place of \fBdelim_n\fR and the next page number when the two pages are contiguous. .RE It works only when the parameter is defined. .RE .TP 30 \fBsuffix_3p\fR "" .RS String to be inserted in place of \fBdelim_r\fR and the third page number when the three pages are contiguous. The parameter is prior to \fBsuffix_mp\fR. .RE It works only when the parameter is defined. .RE .TP 30 \fBsuffix_mp\fR "" .RS String to be inserted in place of \fBdelim_r\fR and the last page number when the three or more pages are contiguous. .RE It works only when the parameter is defined. .RE .TP 30 \fBencap_prefix\fR "\\\\" .RS Prefix for an encapsulating command when the encapsulating command is added to the page number. .RE .TP 30 \fBencap_infix\fR "{" .RS Prefix just before the page number when the encapsulating command is added to the page number. .RE .TP 30 \fBencap_suffix\fR "}". .RS Suffix after the page number when the encapsulating command is added to the page number. .RE .TP 30 \fBline_max\fR 72 .RS Maximum number of one line. If exceed the number, lines are folded. .RE .TP 30 \fBindent_space\fR "\t\t" .RS Space for indent which inserted to top of folded line. .RE .TP 30 \fBindent_length\fR 16 .RS Length of space for indent which inserted to top of folded line. .RE .TP 30 \fBsymhead_positive\fR "Symbols" .RS Strings to output as heading letter for symbols when lethead_flag or heading_flag or headings_flag is positive number. .RE .TP 30 \fBsymhead_negative\fR "symbols" .RS Strings to output as heading letter for symbols when lethead_flag or heading_flag or headings_flag is negative number. .RE .TP 30 \fBsymbol\fR "" .RS Strings to output as heading letter for symbols when symbol_flag is non zero. .RE If specified, the option is prior to symhead_positive and symhead_negative. (Extended by (up)mendex) .RE .TP 30 \fBnumhead_positive\fR "Numbers" .RS Strings to output as heading letter for numbers when lethead_flag or heading_flag or headings_flag is positive number and symbol_flag is 2. .RE .TP 30 \fBnumhead_negative\fR "numbers" .RS Strings to output as heading letter for numbers when lethead_flag or heading_flag or headings_flag is negative number and symbol_flag is 2. .RE .TP 30 \fBsymbol_flag\fR 1 .RS Flag to output of symbol. If \'0\', do not output headings for symbols and numbers. If \'1\', output symbols and numbers as a group of symbols. If \'2\', output symbols and numbers separately. (Extended by (up)mendex) .RE .TP 30 \fBletter_head\fR 1 .RS Flag of heading letter for Japanese Kana. If \'1\' and \'2\', Katakana and Hiragana is used, respectively. (Extended by (up)mendex) .RE .TP 30 \fBpriority\fR 0 .RS Flag of sorting method for index words composed of Japanese and non-Japanese (ex. Latin scripts). If non zero, one space (U+0020) is inserted between Japanese sequence and non-Japanese sequence in sorting procedure. (Extended by (up)mendex) .RE .TP 30 \fBcharacter_order\fR "SNLGCJKHDTah" .RS Order of scripts and symbols. \'S\', \'N\', \'L\', \'G\', \'C\', \'J\', \'K\', \'H\', \'D\', \'T\', \'a\' and \'h\' respectively denotes symbol, number, Latin, Greek, Cyrillic, Japanese Kana, Korean Hangul, Hanzi, Devanagari, Thai, Arabic and Hebrew script. \'@\' denotes scripts which are not explicitly designated and the order are configured by icu_rules or icu_locale. Please make sure that \'S\' and \'N\' are next to each other if symbol_flag=1, since numbers are classified as a part of symbol. (Extended by upmendex) .RE .TP 30 \fBscript_preamble\fR "" .RS Preamble of script block in output file, specified by string 2. One of script names must be specified in the string 1: \'latin\', \'cyrillic\', \'greek\', \'kana\', \'hangul\', \'hanzi\', \'devanagari\', \'thai\', \'arabic\', or \'hebrew\'. (Extended by upmendex) .RE .TP 30 \fBscript_postamble\fR "" .RS Postamble of script block in output file, specified by string 2. One of script names must be specified in the string 1: \'latin\', \'cyrillic\', \'greek\', \'kana\', \'hangul\', \'hanzi\', \'devanagari\', \'thai\', \'arabic\', or \'hebrew\'. (Extended by upmendex) .RE .TP 30 \fBicu_locale\fR "" .RS Locale in ICU collator. By default, "root sort order" is set. (Extended by upmendex) .RE .TP 30 \fBicu_rules\fR "" .RS Customized collation rules in ICU collator. Unicode characters in UTF-8 encoding and following escape sequences are accepted: \fB\\Uhhhhhhhh\fR (8-digit hexadecimal [0-9A-Fa-f]), \fB\\uhhhh\fR (4-digit hexadecimal), \fB\\xhh\fR (2-digit hexadecimal), \fB\\x{h...}\fR (1..8-digit hexadecimal), and \fB\\ooo\fR (3-digit octal [0-7]). If icu_rules and icu_locale are simultaneously specified, collation rules specified by icu_rules are added on collation rules specified by icu_locale. By default, locale is used. (Extended by upmendex) .RE Ref. , .RE .TP 30 \fBicu_attributes\fR "" .RS Attributes in ICU collator. Followings are available: "alternate:shifted", "alternate:non-ignorable", "strength:primary", "strength:secondary", "strength:tertiary", "strength:quaternary", "strength:identical", "french-collation:on", "french-collation:off", "case-first:off", "case-first:upper-first", "case-first:lower-first", "case-level:on", "case-level:off", "normalization-mode:on", "normalization-mode:off", "numeric-ordering:on", "numeric-ordering:off" (Extended by upmendex) .RE Ref. , .RE .PP .SH ABOUT JAPANESE PROCESSING .PP \fIupmendex\fR has an additional feature to simplify the procedure of handling Japanese indexes, compared to \fImakeindex\fR. Users can save the effort of manually specifying a reading for every kanji word. .RE Japanese kanji words are usually sorted by the syllables of their readings (\fI\'Yomi\'\fR), which can be represented by kana (Hiragana, Katakana) scripts. \fIupmendex\fR accepts index words specified in kana expression directly on an input file, and also accepts conversion from index words in Kanji or symbols to phonogram scripts by referring to Japanese dictionaries. .RE .LP Examples of internal simplification of syllables are shown below. .PP .RS .br かぶしきがいしゃ かふしきかいしや .br マッキントッシュ まつきんとつしゆ .br ワープロ わあふろ .RE .LP The dictionary file consists of list with <\'index_word\' \'reading\'>. The index word can be written in any scripts (kanji, kana, etc), and the reading can be in any phonograms such as Hiragana or Katakana scripts. The delimiter between the index word and its reading is one or more tab(s) or space(s). .RE An example of a Japanese dictionary is shown below. .PP .RS .br 漢字 かんじ .br 読み よみ .br 環境 かんきょう .br $ ドル .RE .LP Here, each index word is allowed to have only one Yomi. Though some kanji words (ex. 「表」) may have more than one Yomi\'s (ex. 「ひょう」 and 「おもて」), only one of them can be registered in the dictionary. When some different Yomi\'s are needed, they should be specified explicitly in kana expression (ex. \\index{ひょう@表} or \\index{おもて@表}) on the input file. .RE Moreover, a dictionary file is automatically referred by setting the file name at an environment variable \fIINDEXDEFAULTDICTIONARY\fR. The dictionary set by the environment variable can be used together with file(s) specified by \fI-d\fR option. .PP .SH ABOUT SORTING PROCEDURE .PP \fIupmendex\fR sorts indexes as is (\'sort by word order\') by default. Setting \fI-l\fR option, spaces between words in an index are truncated prior to sorting procedure (\'sort by character order\'). .RE Even when sort by character order, the index at output remains the original sequence without the truncation. .RE Follows show an example. .PP .RS \fIsort by word order sort by character order\fR .br X Window Xlib .br Xlib XView .br XView X Window .RE .LP In addition, two sorting methods can be applied for indexes which contains both Japanese kana and other scripts (e.g. Latin script). By setting \fIpriority\fR 0 (default) and 1 at a style file, a space between Japanese Kana and other scripts is inserted and not inserted respectively, prior to the sorting procedure. .RE Follows show an example. .PP .RS \fIpriority=0 priority=1\fR .br index sort indファイル .br indファイル index sort .RE .PP .SH ENVIRONMENT VARIABLES \fIupmendex\fR refers environment variables as follows. .PP .TP 10 \fIINDEXSTYLE\fR Directory where index style files exist. .TP 10 \fIINDEXDEFAULTSTYLE\fR Index style file to be referred to as default. .TP 10 \fIINDEXDICTIONARY\fR Directory where dictionary files exist. .TP 10 \fIINDEXDEFAULTDICTIONARY\fR Dictionary file which is automatically read. .PP .SH DETAIL Detailed specification is compatible with \fImakeindex\fR. .PP .SH KNOWN ISSUES When plural page number expression is used, \fI.idx\fR files should be specified along with the order of page numbers. Otherwise, wrong page numbers might be output. .PP .SH "SEE ALSO" .BR tex(1), .BR latex(1), .BR makeindex(1), .BR mendex(1). .br International Components for Unicode (ICU): , .SH AUTHOR This manual page was written by Takuji Tanaka based on the mendex manual page written by Japanese \*(TX Development Community.