<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of pdftotext</TITLE>
</HEAD><BODY>
<H1>pdftotext</H1>
Section: User Commands (1)<BR>Updated: 15 August 2011<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>

<A NAME="lbAB"> </A>
<H2>NAME</H2>

pdftotext - Portable Document Format (PDF) to text converter
(version 3.03)
<A NAME="lbAC"> </A>
<H2>SYNOPSIS</H2>

<B>pdftotext</B>
[options]
[<I>PDF-file</I>
[<I>text-file</I>]]
<A NAME="lbAD"> </A>
<H2>DESCRIPTION</H2>

<B>Pdftotext</B>
converts Portable Document Format (PDF) files to plain text.
<P>

Pdftotext reads the PDF file,
<I>PDF-file</I>,
and writes a text file,
<I>text-file</I>.
If
<I>text-file</I>
is not specified, pdftotext converts
<I>file.pdf</I>
to
<I>file.txt</I>.
If
<I>text-file</I>
is '-', the text is sent to stdout.
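<P>
As a rough illustration of the calling convention described above, the following Python sketch assembles the argument list in the order given in the SYNOPSIS and passes '-' as <I>text-file</I> so that the text can be read from stdout. The helper names <B>build_command</B> and <B>extract_text</B> are hypothetical and not part of pdftotext; the sketch assumes that pdftotext is on the PATH and that the output uses the default UTF-8 encoding.
<PRE>
import subprocess


def build_command(pdf_file, text_file=None, options=()):
    """Return the argv list for one pdftotext run (hypothetical helper)."""
    # Options come first, then PDF-file, then the optional text-file.
    argv = ["pdftotext", *options, pdf_file]
    if text_file is not None:
        argv.append(text_file)
    return argv


def extract_text(pdf_file, options=()):
    """Run pdftotext with '-' as text-file and return the text from stdout."""
    argv = build_command(pdf_file, "-", options)
    # check=True raises CalledProcessError on a non-zero exit code.
    result = subprocess.run(argv, capture_output=True, check=True)
    # Assumes the default output encoding (UTF-8) was not changed with -enc.
    return result.stdout.decode("utf-8")
</PRE>
For example, build_command("file.pdf") yields ["pdftotext", "file.pdf"], which leaves the choice of output name to pdftotext.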
<A NAME="lbAE"> </A>
<H2>OPTIONS</H2>

<DL COMPACT>
<DT id="1"><B>-f</B><I> number</I>
<DD>
Specifies the first page to convert.
<DT id="2"><B>-l</B><I> number</I>
<DD>
Specifies the last page to convert.
<DT id="3"><B>-r</B><I> number</I>
<DD>
Specifies the resolution, in DPI. The default is 72 DPI.
<DT id="4"><B>-x</B><I> number</I>
<DD>
Specifies the x-coordinate of the top left corner of the crop area.
<DT id="5"><B>-y</B><I> number</I>
<DD>
Specifies the y-coordinate of the top left corner of the crop area.
<DT id="6"><B>-W</B><I> number</I>
<DD>
Specifies the width of the crop area in pixels (default is 0).
<DT id="7"><B>-H</B><I> number</I>
<DD>
Specifies the height of the crop area in pixels (default is 0).
<DT id="8"><B>-layout</B>
<DD>
Maintain (as best as possible) the original physical layout of the
text. The default is to 'undo' physical layout (columns,
hyphenation, etc.) and output the text in reading order.
<DT id="9"><B>-fixed</B><I> number</I>
<DD>
Assume fixed-pitch (or tabular) text, with the specified character
width (in points). This forces physical layout mode.
<DT id="10"><B>-raw</B>
<DD>
Keep the text in content stream order. This is a hack which often
"undoes" column formatting, etc. Use of raw mode is no longer
recommended.
<DT id="11"><B>-nodiag</B>
<DD>
Discard diagonal text (i.e., text that is not close to one of the
0, 90, 180, or 270 degree axes). This is useful for skipping
watermarks drawn on body text.
<DT id="12"><B>-htmlmeta</B>
<DD>
Generate a simple HTML file, including the meta information. This
simply wraps the text in &lt;pre&gt; and &lt;/pre&gt; and prepends the meta
headers.
<DT id="13"><B>-bbox</B>
<DD>
Generate an XHTML file containing bounding box information for each
word in the file.
<DT id="14"><B>-bbox-layout</B>
<DD>
Generate an XHTML file containing bounding box information for each
block, line, and word in the file.
<DT id="15"><B>-enc</B><I> encoding-name</I>
<DD>
Sets the encoding to use for text output. This defaults to "UTF-8".
<DT id="16"><B>-listenc</B>
<DD>
Lists the available encodings.
<DT id="17"><B>-eol</B><I> unix | dos | mac</I>
<DD>
Sets the end-of-line convention to use for text output.
<DT id="18"><B>-nopgbrk</B>
<DD>
Don't insert page breaks (form feed characters) between pages.
<DT id="19"><B>-opw</B><I> password</I>
<DD>
Specify the owner password for the PDF file. Providing this will
bypass all security restrictions.
<DT id="20"><B>-upw</B><I> password</I>
<DD>
Specify the user password for the PDF file.
<DT id="21"><B>-q</B>
<DD>
Don't print any messages or errors.
<DT id="22"><B>-v</B>
<DD>
Print copyright and version information.
<DT id="23"><B>-h</B>
<DD>
Print usage information.
(<B>-help</B>
and
<B>--help</B>
are equivalent.)
</DL>
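<P>
The page-range, crop, and page-break options above can be combined from a script. The Python sketch below is one possible way to build the corresponding option lists and to split converted text back into pages. The helper names are hypothetical, and <B>split_pages</B> assumes that <B>-nopgbrk</B> was not given, so that pages are separated by form feed characters; whether a form feed also follows the last page is treated as unknown, and a trailing empty chunk is dropped.
<PRE>
def page_range_options(first=None, last=None):
    """Build the -f / -l options for a page range (hypothetical helper)."""
    options = []
    if first is not None:
        options += ["-f", str(first)]
    if last is not None:
        options += ["-l", str(last)]
    return options


def crop_options(x, y, width, height):
    """Build the -x / -y / -W / -H options for a crop area."""
    return ["-x", str(x), "-y", str(y), "-W", str(width), "-H", str(height)]


def split_pages(text):
    """Split converted text into pages on form feed characters."""
    pages = text.split("\f")
    # Drop the empty chunk left behind by a trailing form feed, if any.
    if pages and pages[-1] == "":
        pages.pop()
    return pages
</PRE>
For instance, page_range_options(3, 5) produces ["-f", "3", "-l", "5"], which can be placed before <I>PDF-file</I> on the command line.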
<A NAME="lbAF"> </A>
<H2>BUGS</H2>

Some PDF files contain fonts whose encodings have been mangled beyond
recognition. There is no way (short of OCR) to extract text from
these files.
<A NAME="lbAG"> </A>
<H2>EXIT CODES</H2>

The Xpdf tools use the following exit codes:
<DL COMPACT>
<DT id="24">0<DD>
No error.
<DT id="25">1<DD>
Error opening a PDF file.
<DT id="26">2<DD>
Error opening an output file.
<DT id="27">3<DD>
Error related to PDF permissions.
<DT id="28">99<DD>
Other error.
</DL>
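<P>
A calling script may want to turn these exit codes into messages. The following Python sketch copies the table above into a dictionary; the names <B>EXIT_CODES</B> and <B>describe_exit_code</B> are hypothetical, and the fallback text for codes not listed here is an assumption of this sketch.
<PRE>
# Exit codes and descriptions as listed in the table above.
EXIT_CODES = {
    0: "No error.",
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def describe_exit_code(code):
    """Return the documented description for an exit code."""
    # Codes outside the table get a generic fallback message.
    return EXIT_CODES.get(code, "Undocumented exit code %d." % code)
</PRE>
The value passed in would typically be the return code reported by whatever mechanism launched pdftotext, for example the returncode attribute of a completed subprocess in Python.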
<A NAME="lbAH"> </A>
<H2>AUTHOR</H2>

The pdftotext software and documentation are copyright 1996-2011 Glyph
&amp; Cog, LLC.
<A NAME="lbAI"> </A>
<H2>SEE ALSO</H2>

<B><A HREF="/cgi-bin/man/man2html?1+pdfdetach">pdfdetach</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdffonts">pdffonts</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfimages">pdfimages</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfinfo">pdfinfo</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdftocairo">pdftocairo</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdftohtml">pdftohtml</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdftoppm">pdftoppm</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdftops">pdftops</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfseparate">pdfseparate</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfsig">pdfsig</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfunite">pdfunite</A></B>(1)
<P>

<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT id="29"><A HREF="#lbAB">NAME</A><DD>
<DT id="30"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="31"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="32"><A HREF="#lbAE">OPTIONS</A><DD>
<DT id="33"><A HREF="#lbAF">BUGS</A><DD>
<DT id="34"><A HREF="#lbAG">EXIT CODES</A><DD>
<DT id="35"><A HREF="#lbAH">AUTHOR</A><DD>
<DT id="36"><A HREF="#lbAI">SEE ALSO</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:22 GMT, March 31, 2021
</BODY>
</HTML>