<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of pdftotext</TITLE>
</HEAD><BODY>
<H1>pdftotext</H1>
Section: User Commands (1)<BR>Updated: 15 August 2011<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>
pdftotext - Portable Document Format (PDF) to text converter
(version 3.03)
<A NAME="lbAC">&nbsp;</A>
<H2>SYNOPSIS</H2>
<B>pdftotext</B>
[options]
[<I>PDF-file</I>
[<I>text-file</I>]]
<A NAME="lbAD">&nbsp;</A>
<H2>DESCRIPTION</H2>
<B>Pdftotext</B>
converts Portable Document Format (PDF) files to plain text.
<P>
Pdftotext reads the PDF file,
<I>PDF-file</I>,
and writes a text file,
<I>text-file</I>.
If
<I>text-file</I>
is not specified, pdftotext converts
<I>file.pdf</I>
to
<I>file.txt</I>.
If
<I>text-file</I>
is '-', the text is sent to stdout.
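<P>
The naming rule above can be illustrated with a minimal Python sketch.
The helper names <I>default_text_path</I> and <I>build_command</I> are
hypothetical and not part of pdftotext, and the sketch assumes the input
name ends in <I>.pdf</I>.
<PRE>
```python
from pathlib import Path


def default_text_path(pdf_path):
    """Return the output name used when no text-file is given.

    Assumption: the input name ends in ".pdf", so file.pdf becomes file.txt.
    """
    return str(Path(pdf_path).with_suffix(".txt"))


def build_command(pdf_path, text_path=None):
    """Build an argument list for pdftotext.

    Leave text_path as None to let pdftotext choose the output name,
    or pass "-" to send the text to stdout.
    """
    command = ["pdftotext", pdf_path]
    if text_path is not None:
        command.append(text_path)
    return command
```
</PRE>
The list returned by <I>build_command</I> can be passed to Python's
<I>subprocess.run</I> on a system where pdftotext is installed.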
<A NAME="lbAE">&nbsp;</A>
<H2>OPTIONS</H2>
<DL COMPACT>
<DT id="1"><B>-f</B><I> number</I>
<DD>
Specifies the first page to convert.
<DT id="2"><B>-l</B><I> number</I>
<DD>
Specifies the last page to convert.
<DT id="3"><B>-r</B><I> number</I>
<DD>
Specifies the resolution, in DPI. The default is 72 DPI.
<DT id="4"><B>-x</B><I> number</I>
<DD>
Specifies the x-coordinate of the top-left corner of the crop area.
<DT id="5"><B>-y</B><I> number</I>
<DD>
Specifies the y-coordinate of the top-left corner of the crop area.
<DT id="6"><B>-W</B><I> number</I>
<DD>
Specifies the width of the crop area in pixels. The default is 0.
<DT id="7"><B>-H</B><I> number</I>
<DD>
Specifies the height of the crop area in pixels. The default is 0.
<DT id="8"><B>-layout</B>
<DD>
Maintain (as well as possible) the original physical layout of the
text. The default is to 'undo' physical layout (columns,
hyphenation, etc.) and output the text in reading order.
<DT id="9"><B>-fixed</B><I> number</I>
<DD>
Assume fixed-pitch (or tabular) text, with the specified character
width (in points). This forces physical layout mode.
<DT id="10"><B>-raw</B>
<DD>
Keep the text in content stream order. This is a hack which often
&quot;undoes&quot; column formatting, etc. Use of raw mode is no longer
recommended.
<DT id="11"><B>-nodiag</B>
<DD>
Discard diagonal text (i.e., text that is not close to one of the
0, 90, 180, or 270 degree axes). This is useful for skipping
watermarks drawn on body text.
<DT id="12"><B>-htmlmeta</B>
<DD>
Generate a simple HTML file, including the meta information. This
simply wraps the text in &lt;pre&gt; and &lt;/pre&gt; and prepends the meta
headers.
<DT id="13"><B>-bbox</B>
<DD>
Generate an XHTML file containing bounding box information for each
word in the file.
<DT id="14"><B>-bbox-layout</B>
<DD>
Generate an XHTML file containing bounding box information for each
block, line, and word in the file.
<DT id="15"><B>-enc</B><I> encoding-name</I>
<DD>
Sets the encoding to use for text output. This defaults to &quot;UTF-8&quot;.
<DT id="16"><B>-listenc</B>
<DD>
Lists the available encodings.
<DT id="17"><B>-eol</B><I> unix | dos | mac</I>
<DD>
Sets the end-of-line convention to use for text output.
<DT id="18"><B>-nopgbrk</B>
<DD>
Don't insert page breaks (form feed characters) between pages.
<DT id="19"><B>-opw</B><I> password</I>
<DD>
Specify the owner password for the PDF file. Providing this will
bypass all security restrictions.
<DT id="20"><B>-upw</B><I> password</I>
<DD>
Specify the user password for the PDF file.
<DT id="21"><B>-q</B>
<DD>
Don't print any messages or errors.
<DT id="22"><B>-v</B>
<DD>
Print copyright and version information.
<DT id="23"><B>-h</B>
<DD>
Print usage information.
(<B>-help</B>
and
<B>--help</B>
are equivalent.)
</DL>
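<P>
As a rough illustration of how several of the options above combine on one
command line, the following Python sketch assembles an argument list. The
function and parameter names are hypothetical and cover only a subset of
the options; the flags themselves are the ones documented above.
<PRE>
```python
def build_options(first_page=None, last_page=None, layout=False,
                  encoding=None, eol=None, no_page_breaks=False):
    """Translate keyword arguments into documented pdftotext options."""
    options = []
    if first_page is not None:
        options += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        options += ["-l", str(last_page)]    # last page to convert
    if layout:
        options.append("-layout")            # keep the physical layout
    if encoding is not None:
        options += ["-enc", encoding]        # output text encoding
    if eol is not None:
        if eol not in ("unix", "dos", "mac"):
            raise ValueError("eol must be unix, dos, or mac")
        options += ["-eol", eol]             # end-of-line convention
    if no_page_breaks:
        options.append("-nopgbrk")           # no form feeds between pages
    return options


def build_command_with_options(pdf_path, text_path="-", **kwargs):
    """Place the options before the file names, as in the SYNOPSIS."""
    return ["pdftotext", *build_options(**kwargs), pdf_path, text_path]
```
</PRE>
For example, <I>build_command_with_options("in.pdf", first_page=2,
last_page=5, layout=True)</I> yields the arguments for converting pages 2
through 5 in layout mode and writing the text to stdout.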
<A NAME="lbAF">&nbsp;</A>
<H2>BUGS</H2>
Some PDF files contain fonts whose encodings have been mangled beyond
recognition. There is no way (short of OCR) to extract text from
these files.
<A NAME="lbAG">&nbsp;</A>
<H2>EXIT CODES</H2>
The Xpdf tools use the following exit codes:
<DL COMPACT>
<DT id="24">0<DD>
No error.
<DT id="25">1<DD>
Error opening a PDF file.
<DT id="26">2<DD>
Error opening an output file.
<DT id="27">3<DD>
Error related to PDF permissions.
<DT id="28">99<DD>
Other error.
</DL>
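<P>
A calling script might translate these exit codes into messages. The
following Python sketch shows one way to do so; the names
<I>EXIT_CODES</I> and <I>describe_exit_code</I> are hypothetical, and the
fallback text for undocumented values is an assumption of the sketch.
<PRE>
```python
# Exit codes and meanings as listed in the table above.
EXIT_CODES = {
    0: "No error.",
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def describe_exit_code(code):
    """Return the documented meaning of an exit code.

    Values not listed in the table get a generic fallback message.
    """
    return EXIT_CODES.get(code, "Undocumented exit code {}.".format(code))
```
</PRE>
The <I>returncode</I> attribute of the object returned by Python's
<I>subprocess.run</I> can be passed directly to <I>describe_exit_code</I>.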
<A NAME="lbAH">&nbsp;</A>
<H2>AUTHOR</H2>
The pdftotext software and documentation are copyright 1996-2011 Glyph
&amp; Cog, LLC.
<A NAME="lbAI">&nbsp;</A>
<H2>SEE ALSO</H2>
<B><A HREF="/cgi-bin/man/man2html?1+pdfdetach">pdfdetach</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdffonts">pdffonts</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfimages">pdfimages</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfinfo">pdfinfo</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdftocairo">pdftocairo</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdftohtml">pdftohtml</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdftoppm">pdftoppm</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdftops">pdftops</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfseparate">pdfseparate</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfsig">pdfsig</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?1+pdfunite">pdfunite</A></B>(1)
<P>
<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="29"><A HREF="#lbAB">NAME</A><DD>
<DT id="30"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="31"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="32"><A HREF="#lbAE">OPTIONS</A><DD>
<DT id="33"><A HREF="#lbAF">BUGS</A><DD>
<DT id="34"><A HREF="#lbAG">EXIT CODES</A><DD>
<DT id="35"><A HREF="#lbAH">AUTHOR</A><DD>
<DT id="36"><A HREF="#lbAI">SEE ALSO</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:22 GMT, March 31, 2021
</BODY>
</HTML>