man-pages/man3/Genlex.3o.html


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of Genlex</TITLE>
</HEAD><BODY>
<H1>Genlex</H1>
Section: OCaml library (3o)<BR>Updated: 2020-01-30<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>

<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>

Genlex - A generic lexical analyzer.
<A NAME="lbAC">&nbsp;</A>
<H2>Module</H2>

Module   Genlex
<A NAME="lbAD">&nbsp;</A>
<H2>Documentation</H2>

<P>
Module
<B>Genlex</B>

<BR>&nbsp;:&nbsp;
<B>sig  end</B>

<P>
<P>
A generic lexical analyzer.
<P>
This module implements a simple 'standard' lexical analyzer, presented
as a function from character streams to token streams. It implements
roughly the lexical conventions of OCaml, but is parameterized by the
set of keywords of your language.
<P>
Example: a lexer suitable for a desk calculator is obtained by
<B>let lexer = make_lexer [+;-;*;/;let;=; (; )]  </B>

<P>
The associated parser would be a function from
<B>token stream</B>

to, for instance,
<B>int</B>

, and would have rules such as:
<P>
<P>
<B>let rec parse_expr = parser</B>


<B>| [&lt; n1 = parse_atom; n2 = parse_remainder n1 &gt;] -&gt; n2</B>

<B>and parse_atom = parser</B>

<B>| [&lt; 'Int n &gt;] -&gt; n</B>

<B>| [&lt; 'Kwd (; n = parse_expr; 'Kwd ) &gt;] -&gt; n</B>

<B>and parse_remainder n1 = parser</B>

<B>| [&lt; 'Kwd +; n2 = parse_expr &gt;] -&gt; n1+n2</B>

<B>| [&lt; &gt;] -&gt; n1</B>

<B><P>
</B>

One should notice that the use of the
<B>parser</B>

keyword and associated
notation for streams are only available through camlp4 extensions. This
means that one has to preprocess its sources e. g. by using the
<B>-pp</B>

command-line switch of the compilers.
<P>
<P>
<P>
<P>
<P>
<I>type token </I>

=
<BR>&nbsp;|&nbsp;Kwd
<B>of </B>

<B>string</B>

<BR>&nbsp;|&nbsp;Ident
<B>of </B>

<B>string</B>

<BR>&nbsp;|&nbsp;Int
<B>of </B>

<B>int</B>

<BR>&nbsp;|&nbsp;Float
<B>of </B>

<B>float</B>

<BR>&nbsp;|&nbsp;String
<B>of </B>

<B>string</B>

<BR>&nbsp;|&nbsp;Char
<B>of </B>

<B>char</B>

<BR>&nbsp;
<P>
The type of tokens. The lexical classes are:
<B>Int</B>

and
<B>Float</B>

for integer and floating-point numbers;
<B>String</B>

for
string literals, enclosed in double quotes;
<B>Char</B>

for
character literals, enclosed in single quotes;
<B>Ident</B>

for
identifiers (either sequences of letters, digits, underscores
and quotes, or sequences of 'operator characters' such as
<B>+</B>

,
<B>*</B>

, etc); and
<B>Kwd</B>

for keywords (either identifiers or
single 'special characters' such as
<B>(</B>

,
<B>}</B>

, etc).
<P>
<P>
<P>
<I>val make_lexer </I>

:
<B>string list -&gt; char Stream.t -&gt; token Stream.t</B>

<P>
Construct the lexer function. The first argument is the list of
keywords. An identifier
<B>s</B>

is returned as
<B>Kwd s</B>

if
<B>s</B>

belongs to this list, and as
<B>Ident s</B>

otherwise.
A special character
<B>s</B>

is returned as
<B>Kwd s</B>

if
<B>s</B>

belongs to this list, and cause a lexical error (exception
<B>Stream.Error</B>

with the offending lexeme as its parameter) otherwise.
Blanks and newlines are skipped. Comments delimited by
<B>(*</B>

and
<B>*)</B>

are skipped as well, and can be nested. A
<B>Stream.Failure</B>

exception
is raised if end of stream is unexpectedly reached.
<P>
<P>
<P>

<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="1"><A HREF="#lbAB">NAME</A><DD>
<DT id="2"><A HREF="#lbAC">Module</A><DD>
<DT id="3"><A HREF="#lbAD">Documentation</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:44 GMT, March 31, 2021
</BODY>
</HTML>