<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of GENDICT</TITLE>
</HEAD><BODY>
<H1>GENDICT</H1>
Section: ICU 66.1 Manual (1)<BR>Updated: 1 June 2012<BR><A HREF="#index">Index</A>
<HR>
<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>
<B>gendict</B>
- compiles a word list into an ICU string trie dictionary
<A NAME="lbAC">&nbsp;</A>
<H2>SYNOPSIS</H2>
<B>gendict</B>
[
<B>--uchars</B>
|
<B>--bytes</B>
<B>--transform</B><I> transform</I>
]
[
<B>-h</B>, <B>-?</B>, <B>--help</B>
]
[
<B>-V</B>, <B>--version</B>
]
[
<B>-c</B>, <B>--copyright</B>
]
[
<B>-v</B>, <B>--verbose</B>
]
[
<B>-i</B>, <B>--icudatadir</B><I> directory</I>
]
<I> input-file</I>
<I> output-file</I>
<A NAME="lbAD">&nbsp;</A>
<H2>DESCRIPTION</H2>
<B>gendict</B>
reads the word list from
<I>input-file</I>
and writes a string trie dictionary to
<I>output-file</I>.
By convention, this data file has the
<B>.dict</B>
extension.
<P>
Each word starts at the beginning of a line and ends at the first whitespace character.
Lines that begin with whitespace are ignored.
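<P>
The line rules above can be sketched in Python. This is only an illustration, not the parser that
<B>gendict</B>
itself uses. It assumes that an optional integer value follows the word after whitespace, and that values are written in decimal or in hex with a 0x prefix, as described under CAVEATS. The names
<B>parse_value</B>
and
<B>parse_word_list</B>
are hypothetical.
<PRE>
```python
def parse_value(token):
    """Parse a value written in decimal or in hex with a 0x prefix."""
    if token.lower().startswith("0x"):
        return int(token[2:], 16)
    return int(token, 10)


def parse_word_list(text):
    """Return a list of (word, value) pairs; value is None when absent.

    Assumption: the optional value is the second whitespace-separated field.
    """
    entries = []
    for line in text.splitlines():
        # Empty lines and lines that begin with whitespace are skipped.
        if not line or line[0].isspace():
            continue
        fields = line.split()
        word = fields[0]  # the word ends at the first whitespace
        value = parse_value(fields[1]) if fields[1:] else None
        entries.append((word, value))
    return entries


sample = "alpha\t10\n  this line is ignored\nbeta\t0x1F\ngamma\n"
print(parse_word_list(sample))
```
</PRE>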
<A NAME="lbAE">&nbsp;</A>
<H2>OPTIONS</H2>
<DL COMPACT>
<DT id="1"><B>-h</B>, <B>-?</B>, <B>--help</B>
<DD>
Print help about usage and exit.
<DT id="2"><B>-V</B>, <B>--version</B>
<DD>
Print the version of
<B>gendict</B>
and exit.
<DT id="3"><B>-c</B>, <B>--copyright</B>
<DD>
Embed the standard ICU copyright into the
<I>output-file</I>.
<DT id="4"><B>-v</B>, <B>--verbose</B>
<DD>
Display extra informative messages during execution.
<DT id="5"><B>-i</B>, <B>--icudatadir</B><I> directory</I>
<DD>
Look for any necessary ICU data files in
<I>directory</I>.
For example, the file
<B>pnames.icu</B>
must be located when ICU's data is not built as a shared library.
The default ICU data directory is specified by the environment variable
<B>ICU_DATA</B>.
Most configurations of ICU do not require this argument.
<DT id="6"><B>--uchars</B>
<DD>
Set the output trie type to UChar. Mutually exclusive with
<B>--bytes</B>.
<DT id="7"><B>--bytes</B>
<DD>
Set the output trie type to Bytes. Mutually exclusive with
<B>--uchars</B>.
<DT id="8"><B>--transform</B><I> transform</I>
<DD>
Set the transform type. Should only be specified with
<B>--bytes</B>.
The currently supported transform is
<B>offset-&lt;hex-number&gt;</B>,
which subtracts the given offset from every input character.
The offset transform also maps U+200D to 0xFF and U+200C to 0xFE,
for compatibility with languages that require these characters.
A transform must be specified for a bytes trie and, when applied
to the non-value characters in the
<I>input-file</I>,
must produce output between 0x00 and 0xFF.
<DT id="9"><B> input-file</B>
<DD>
The source file to read.
<DT id="10"><B> output-file</B>
<DD>
The file to write the output dictionary to.
</DL>
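<P>
As a rough illustration of the offset transform described under
<B>--transform</B>,
the following Python sketch maps each character of a word to one byte. It is an assumption-laden model of the documented behavior, not the code used by
<B>gendict</B>.
The function name
<B>offset_transform</B>
is hypothetical, the offset is passed as an integer rather than parsed from the option string, and the sketch does not check whether an ordinary character happens to land on 0xFE or 0xFF.
<PRE>
```python
ZWJ = 0x200D   # ZERO WIDTH JOINER, mapped to 0xFF
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER, mapped to 0xFE


def offset_transform(word, offset):
    """Map each character of word to a single byte by subtracting offset."""
    out = bytearray()
    for ch in word:
        cp = ord(ch)
        if cp == ZWJ:
            out.append(0xFF)
        elif cp == ZWNJ:
            out.append(0xFE)
        else:
            value = cp - offset
            # The transformed value must fit between 0x00 and 0xFF.
            if value not in range(0x100):
                raise ValueError(
                    f"U+{cp:04X} is out of range for offset 0x{offset:X}"
                )
            out.append(value)
    return bytes(out)


# Two Thai letters (U+0E01, U+0E32) with an illustrative offset of 0x0E00.
print(offset_transform("\u0e01\u0e32", 0x0E00).hex())
```
</PRE>
With this model, a character below the offset or more than 0xFF above it is rejected, which mirrors the requirement that the transform produce output between 0x00 and 0xFF.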
<A NAME="lbAF">&nbsp;</A>
<H2>CAVEATS</H2>
The
<I>input-file</I>
is assumed to be encoded in UTF-8.
The integers in the
<I>input-file</I>
that are used as values must be made up of ASCII digits. They
may be specified either in hex, by using a 0x prefix, or in
decimal.
Exactly one of
<B>--bytes</B>
or
<B>--uchars</B>
must be specified.
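<P>
The option constraints documented on this page can be checked before invoking the tool. The sketch below builds an argument list for
<B>gendict</B>
and rejects combinations that this page rules out. The helper name
<B>build_gendict_args</B>
and the file names are hypothetical, and the transform string is passed through without validation.
<PRE>
```python
def build_gendict_args(input_file, output_file, trie_type, transform=None):
    """Build a gendict argument list, enforcing the documented constraints."""
    # Exactly one trie type must be chosen.
    if trie_type not in ("uchars", "bytes"):
        raise ValueError("trie_type must be 'uchars' or 'bytes'")
    # A bytes trie requires a transform.
    if trie_type == "bytes" and transform is None:
        raise ValueError("a bytes trie requires a transform")
    # A transform should only be given together with --bytes.
    if trie_type == "uchars" and transform is not None:
        raise ValueError("--transform should only be given with --bytes")

    args = ["gendict", "--" + trie_type]
    if transform is not None:
        args += ["--transform", transform]
    args += [input_file, output_file]
    return args


# Hypothetical file names, used only for illustration.
print(build_gendict_args("words.txt", "words.dict", "uchars"))
```
</PRE>
The resulting list could be handed to a process launcher such as Python's
<B>subprocess.run</B>.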
<A NAME="lbAG">&nbsp;</A>
<H2>ENVIRONMENT</H2>
<DL COMPACT>
<DT id="11"><B>ICU_DATA</B>
<DD>
Specifies the directory containing ICU data. Defaults to
<B>${prefix}/share/icu/66.1/</B>.
Some tools in ICU depend on the presence of the trailing slash. It is thus
important to make sure that it is present if
<B>ICU_DATA</B>
is set.
</DL>
<A NAME="lbAH">&nbsp;</A>
<H2>AUTHORS</H2>
Maxime Serrano
<A NAME="lbAI">&nbsp;</A>
<H2>VERSION</H2>
1.0
<A NAME="lbAJ">&nbsp;</A>
<H2>COPYRIGHT</H2>
Copyright (C) 2012 International Business Machines Corporation and others
<A NAME="lbAK">&nbsp;</A>
<H2>SEE ALSO</H2>
<B><A HREF="http://www.icu-project.org/userguide/boundaryAnalysis.html">http://www.icu-project.org/userguide/boundaryAnalysis.html</A></B>
<P>
<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="12"><A HREF="#lbAB">NAME</A><DD>
<DT id="13"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="14"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="15"><A HREF="#lbAE">OPTIONS</A><DD>
<DT id="16"><A HREF="#lbAF">CAVEATS</A><DD>
<DT id="17"><A HREF="#lbAG">ENVIRONMENT</A><DD>
<DT id="18"><A HREF="#lbAH">AUTHORS</A><DD>
<DT id="19"><A HREF="#lbAI">VERSION</A><DD>
<DT id="20"><A HREF="#lbAJ">COPYRIGHT</A><DD>
<DT id="21"><A HREF="#lbAK">SEE ALSO</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:13 GMT, March 31, 2021
</BODY>
</HTML>