<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of GENDICT</TITLE>
</HEAD><BODY>
<H1>GENDICT</H1>
Section: ICU 66.1 Manual (1)<BR>Updated: 1 June 2012<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>

<A NAME="lbAB"> </A>
<H2>NAME</H2>

<B>gendict</B>

- Compiles a word list into an ICU string trie dictionary
<A NAME="lbAC"> </A>
<H2>SYNOPSIS</H2>

<B>gendict</B>

[
<B>--uchars</B>

|
<B>--bytes</B>

<B>--transform</B><I> transform</I>

]
[
<B>-h</B>, <B>-?</B>, <B>--help</B>

]
[
<B>-V</B>, <B>--version</B>

]
[
<B>-c</B>, <B>--copyright</B>

]
[
<B>-v</B>, <B>--verbose</B>

]
[
<B>-i</B>, <B>--icudatadir</B><I> directory</I>

]
<I> input-file</I>

<I> output-file</I>

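<P>
As a rough illustration of how the synopsis combines with the constraints stated under OPTIONS and CAVEATS, the following Python sketch assembles an argument list for
<B>gendict</B>.
The function name, its parameters, and the file names in the usage note are hypothetical; only the flags themselves come from this page.

```python
from typing import List, Optional


def build_gendict_argv(
    input_file: str,
    output_file: str,
    trie_type: str,
    transform: Optional[str] = None,
    icu_data_dir: Optional[str] = None,
    verbose: bool = False,
    copyright_notice: bool = False,
) -> List[str]:
    """Return an argument list for gendict (hypothetical helper).

    Enforces the constraints documented on this page: exactly one of
    --uchars / --bytes, and --transform only together with --bytes.
    """
    if trie_type not in ("uchars", "bytes"):
        raise ValueError("trie_type must be 'uchars' or 'bytes'")
    if trie_type == "bytes" and transform is None:
        raise ValueError("a bytes trie requires a transform")
    if trie_type == "uchars" and transform is not None:
        raise ValueError("--transform should only be used with --bytes")

    argv = ["gendict", "--" + trie_type]
    if transform is not None:
        argv += ["--transform", transform]
    if copyright_notice:
        argv.append("-c")
    if verbose:
        argv.append("-v")
    if icu_data_dir is not None:
        argv += ["-i", icu_data_dir]
    # The two positional arguments come last, as in the synopsis.
    argv += [input_file, output_file]
    return argv
```

<P>
For instance, calling the helper with the illustrative arguments
<B>("words.txt", "words.dict", "uchars")</B>
yields a list that could be passed to Python's
<B>subprocess.run</B>.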
<A NAME="lbAD"> </A>
<H2>DESCRIPTION</H2>

<B>gendict</B>

reads the word list from
<I>input-file</I>

and creates a string trie dictionary file. Normally this data file has the
<B>.dict</B>

extension.
<P>

Each word starts at the beginning of a line and ends at the first whitespace character.
Lines that begin with whitespace are ignored.
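<P>
The line rules above can be sketched in Python as follows. This is only an illustration of the stated rules, not the tool's actual parser: the function name is hypothetical, and treating Python's notion of whitespace as equivalent to the tool's is an assumption.

```python
from typing import Iterable, List


def extract_words(lines: Iterable[str]) -> List[str]:
    """Collect the leading word of each line (hypothetical helper).

    A word starts at the beginning of the line and ends at the first
    whitespace character. Lines that begin with whitespace, which
    includes empty lines consisting only of a newline, are skipped.
    """
    words = []
    for line in lines:
        if not line or line[0].isspace():
            continue
        # split(None, 1) cuts at the first run of whitespace.
        words.append(line.split(None, 1)[0])
    return words
```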
<A NAME="lbAE"> </A>
<H2>OPTIONS</H2>

<DL COMPACT>
<DT id="1"><B>-h</B>, <B>-?</B>, <B>--help</B>

<DD>
Print help about usage and exit.
<DT id="2"><B>-V</B>, <B>--version</B>

<DD>
Print the version of
<B>gendict</B>

and exit.
<DT id="3"><B>-c</B>, <B>--copyright</B>

<DD>
Embeds the standard ICU copyright into the
<I>output-file</I>.

<DT id="4"><B>-v</B>, <B>--verbose</B>

<DD>
Display extra informative messages during execution.
<DT id="5"><B>-i</B>, <B>--icudatadir</B><I> directory</I>

<DD>
Look for any necessary ICU data files in
<I>directory</I>.

For example, the file
<B>pnames.icu</B>

must be located when ICU's data is not built as a shared library.
The default ICU data directory is specified by the environment variable
<B>ICU_DATA</B>.

Most configurations of ICU do not require this argument.
<DT id="6"><B>--uchars</B>

<DD>
Set the output trie type to UChar. Mutually exclusive with
<B>--bytes</B>.

<DT id="7"><B>--bytes</B>

<DD>
Set the output trie type to Bytes. Mutually exclusive with
<B>--uchars</B>.

<DT id="8"><B>--transform</B><I> transform</I>

<DD>
Set the transform type. Should only be specified with
<B>--bytes</B>.

Currently supported transforms are:
<B>offset-&lt;hex-number&gt;</B>,

which specifies an offset to subtract from all input characters.
The offset transform also maps U+200D (ZERO WIDTH JOINER)
to 0xFF and U+200C (ZERO WIDTH NON-JOINER) to 0xFE, for compatibility with
languages that require these characters.
A transform must be specified for a bytes trie, and when applied
to the non-value characters in the
<I>input-file</I>

it must produce output between 0x00 and 0xFF.
<DT id="9"><B> input-file</B>

<DD>
The source file to read.
<DT id="10"><B> output-file</B>

<DD>
The file to write the output dictionary to.
</DL>
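<P>
To make the offset transform described under
<B>--transform</B>
more concrete, here is a minimal Python sketch of the mapping as this page states it. It is not taken from the tool's source. The function names are hypothetical, the offset 0x0E00 in the usage note is only an illustrative value, and whether the real tool additionally reserves 0xFE and 0xFF for the two joiner characters is not stated here, so the sketch enforces only the documented 0x00 to 0xFF range.

```python
ZWJ = 0x200D   # ZERO WIDTH JOINER, mapped to 0xFF
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER, mapped to 0xFE


def parse_offset_transform(spec: str) -> int:
    """Extract the offset from a spec of the form offset-<hex-number>."""
    prefix = "offset-"
    if not spec.startswith(prefix):
        raise ValueError("unsupported transform: %r" % spec)
    # int(..., 16) accepts the hex digits with or without a 0x prefix.
    return int(spec[len(prefix):], 16)


def transform_char(ch: str, offset: int) -> int:
    """Map one input character to a byte value (sketch of the documented rule)."""
    cp = ord(ch)
    if cp == ZWJ:
        return 0xFF
    if cp == ZWNJ:
        return 0xFE
    value = cp - offset
    if not 0x00 <= value <= 0xFF:
        raise ValueError("U+%04X does not fit in a byte after the offset" % cp)
    return value


def transform_word(word: str, offset: int) -> bytes:
    """Apply the offset transform to every character of a word."""
    return bytes(transform_char(ch, offset) for ch in word)
```

<P>
With an offset of 0x0E00, a character such as U+0E01 becomes the byte 0x01, while a character below the offset, such as an ASCII letter, is rejected.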
<A NAME="lbAF"> </A>
<H2>CAVEATS</H2>

The
<I>input-file</I>

is assumed to be encoded in UTF-8.
The integers in the
<I>input-file</I>

that are used as values must consist of ASCII digits. They
may be written either in hexadecimal, with a 0x prefix, or in
decimal.
Exactly one of
<B>--bytes</B>

or
<B>--uchars</B>

must be specified.
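<P>
The rule for integer values can be illustrated with a small Python sketch. The function name is hypothetical, and accepting only a lowercase 0x prefix is an assumption made for the sketch. The explicit digit check matters because Python's
<B>int</B>
would otherwise also accept non-ASCII digits, signs, and underscores.

```python
import string


def parse_value(text: str) -> int:
    """Parse a dictionary value written in ASCII decimal or 0x-prefixed hex."""
    if text.startswith("0x"):
        digits, base, allowed = text[2:], 16, string.hexdigits
    else:
        digits, base, allowed = text, 10, string.digits
    # Reject empty input and anything outside the ASCII digit set.
    if not digits or any(c not in allowed for c in digits):
        raise ValueError("not an ASCII integer: %r" % text)
    return int(digits, base)
```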
<A NAME="lbAG"> </A>
<H2>ENVIRONMENT</H2>

<DL COMPACT>
<DT id="11"><B>ICU_DATA</B>

<DD>
Specifies the directory containing ICU data. Defaults to
<B>${prefix}/share/icu/66.1/</B>.

Some tools in ICU depend on the presence of the trailing slash, so
make sure that it is present whenever
<B>ICU_DATA</B>

is set.
</DL>
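<P>
One way to guard against a missing trailing slash when launching an ICU tool from a script is sketched below in Python. The helper name and the directory in the usage note are hypothetical; the sketch only normalizes the value and does not check that the directory exists.

```python
import os
from typing import Dict, Mapping, Optional


def with_icu_data(directory: str,
                  base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with ICU_DATA ending in a slash."""
    env = dict(os.environ if base_env is None else base_env)
    # Append the trailing slash only if it is not already there.
    env["ICU_DATA"] = directory if directory.endswith("/") else directory + "/"
    return env
```

<P>
The returned mapping can be passed as the
<B>env</B>
argument of Python's
<B>subprocess.run</B>,
for example with an illustrative directory such as
<B>/opt/icu/data</B>.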
<A NAME="lbAH"> </A>
<H2>AUTHORS</H2>

Maxime Serrano
<A NAME="lbAI"> </A>
<H2>VERSION</H2>

1.0
<A NAME="lbAJ"> </A>
<H2>COPYRIGHT</H2>

Copyright (C) 2012 International Business Machines Corporation and others
<A NAME="lbAK"> </A>
<H2>SEE ALSO</H2>

<B><A HREF="http://www.icu-project.org/userguide/boundaryAnalysis.html">http://www.icu-project.org/userguide/boundaryAnalysis.html</A></B>

<P>
<P>

<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT id="12"><A HREF="#lbAB">NAME</A><DD>
<DT id="13"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="14"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="15"><A HREF="#lbAE">OPTIONS</A><DD>
<DT id="16"><A HREF="#lbAF">CAVEATS</A><DD>
<DT id="17"><A HREF="#lbAG">ENVIRONMENT</A><DD>
<DT id="18"><A HREF="#lbAH">AUTHORS</A><DD>
<DT id="19"><A HREF="#lbAI">VERSION</A><DD>
<DT id="20"><A HREF="#lbAJ">COPYRIGHT</A><DD>
<DT id="21"><A HREF="#lbAK">SEE ALSO</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:13 GMT, March 31, 2021
</BODY>
</HTML>