512 lines
13 KiB
HTML
512 lines
13 KiB
HTML
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<HTML><HEAD><TITLE>Man page of lwpcook</TITLE>
|
|
</HEAD><BODY>
|
|
<H1>lwpcook</H1>
|
|
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2019-11-29<BR><A HREF="#index">Index</A>
|
|
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<A NAME="lbAB"> </A>
|
|
<H2>NAME</H2>
|
|
|
|
lwpcook - The libwww-perl cookbook
|
|
<A NAME="lbAC"> </A>
|
|
<H2>DESCRIPTION</H2>
|
|
|
|
|
|
|
|
This document contain some examples that show typical usage of the
|
|
libwww-perl library. You should consult the documentation for the
|
|
individual modules for more detail.
|
|
<P>
|
|
|
|
All examples should be runnable programs. You can, in most cases, test
|
|
the code sections by piping the program text directly to perl.
|
|
<A NAME="lbAD"> </A>
|
|
<H2>GET</H2>
|
|
|
|
|
|
|
|
It is very easy to use this library to just fetch documents from the
|
|
net. The LWP::Simple module provides the <B>get()</B> function that return
|
|
the document specified by its <FONT SIZE="-1">URL</FONT> argument:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::Simple;
|
|
$doc = get '<A HREF="http://search.cpan.org/dist/libwww-perl/';">http://search.cpan.org/dist/libwww-perl/';</A>
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
or, as a perl one-liner using the <B>getprint()</B> function:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
perl -MLWP::Simple -e 'getprint "<A HREF="http://search.cpan.org/dist/libwww-perl/">http://search.cpan.org/dist/libwww-perl/</A>"'
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
or, how about fetching the latest perl by running this command:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
perl -MLWP::Simple -e '
|
|
getstore "<A HREF="ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz">ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz</A>",
|
|
"perl.tar.gz"'
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
You will probably first want to find a <FONT SIZE="-1">CPAN</FONT> site closer to you by
|
|
running something like the following command:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
perl -MLWP::Simple -e 'getprint "<A HREF="http://www.cpan.org/SITES.html">http://www.cpan.org/SITES.html</A>"'
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
Enough of this simple stuff! The <FONT SIZE="-1">LWP</FONT> object oriented interface gives
|
|
you more control over the request sent to the server. Using this
|
|
interface you have full control over headers sent and how you want to
|
|
handle the response returned.
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
$ua = LWP::UserAgent->new;
|
|
$ua->agent("$0/0.1 " . $ua->agent);
|
|
# $ua->agent("Mozilla/8.0") # pretend we are very capable browser
|
|
|
|
$req = HTTP::Request->new(
|
|
GET => '<A HREF="http://search.cpan.org/dist/libwww-perl/');">http://search.cpan.org/dist/libwww-perl/');</A>
|
|
$req->header('Accept' => 'text/html');
|
|
|
|
# send request
|
|
$res = $ua->request($req);
|
|
|
|
# check the outcome
|
|
if ($res->is_success) {
|
|
print $res->decoded_content;
|
|
}
|
|
else {
|
|
print "Error: " . $res->status_line . "\n";
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
The lwp-request program (alias <FONT SIZE="-1">GET</FONT>) that is distributed with the
|
|
library can also be used to fetch documents from <FONT SIZE="-1">WWW</FONT> servers.
|
|
<A NAME="lbAE"> </A>
|
|
<H2>HEAD</H2>
|
|
|
|
|
|
|
|
If you just want to check if a document is present (i.e. the <FONT SIZE="-1">URL</FONT> is
|
|
valid) try to run code that looks like this:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::Simple;
|
|
|
|
if (head($url)) {
|
|
# ok document exists
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
The <B>head()</B> function really returns a list of meta-information about
|
|
the document. The first three values of the list returned are the
|
|
document type, the size of the document, and the age of the document.
|
|
<P>
|
|
|
|
More control over the request or access to all header values returned
|
|
require that you use the object oriented interface described for <FONT SIZE="-1">GET</FONT>
|
|
above. Just s/GET/HEAD/g.
|
|
<A NAME="lbAF"> </A>
|
|
<H2>POST</H2>
|
|
|
|
|
|
|
|
There is no simple procedural interface for posting data to a <FONT SIZE="-1">WWW</FONT> server. You
|
|
must use the object oriented interface for this. The most common <FONT SIZE="-1">POST</FONT>
|
|
operation is to access a <FONT SIZE="-1">WWW</FONT> form application:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
$ua = LWP::UserAgent->new;
|
|
|
|
my $req = HTTP::Request->new(
|
|
POST => '<A HREF="https://rt.cpan.org/Public/Dist/Display.html');">https://rt.cpan.org/Public/Dist/Display.html');</A>
|
|
$req->content_type('application/x-www-form-urlencoded');
|
|
$req->content('Status=Active&Name=libwww-perl');
|
|
|
|
my $res = $ua->request($req);
|
|
print $res->as_string;
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
Lazy people use the HTTP::Request::Common module to set up a suitable
|
|
<FONT SIZE="-1">POST</FONT> request message (it handles all the escaping issues) and has a
|
|
suitable default for the content_type:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use HTTP::Request::Common qw(POST);
|
|
use LWP::UserAgent;
|
|
$ua = LWP::UserAgent->new;
|
|
|
|
my $req = POST '<A HREF="https://rt.cpan.org/Public/Dist/Display.html',">https://rt.cpan.org/Public/Dist/Display.html',</A>
|
|
[ Status => 'Active', Name => 'libwww-perl' ];
|
|
|
|
print $ua->request($req)->as_string;
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
The lwp-request program (alias <FONT SIZE="-1">POST</FONT>) that is distributed with the
|
|
library can also be used for posting data.
|
|
<A NAME="lbAG"> </A>
|
|
<H2>PROXIES</H2>
|
|
|
|
|
|
|
|
Some sites use proxies to go through fire wall machines, or just as
|
|
cache in order to improve performance. Proxies can also be used for
|
|
accessing resources through protocols not supported directly (or
|
|
supported badly :-) by the libwww-perl library.
|
|
<P>
|
|
|
|
You should initialize your proxy setting before you start sending
|
|
requests:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
$ua = LWP::UserAgent->new;
|
|
$ua->env_proxy; # initialize from environment variables
|
|
# or
|
|
$ua->proxy(ftp => '<A HREF="http://proxy.myorg.com');">http://proxy.myorg.com');</A>
|
|
$ua->proxy(wais => '<A HREF="http://proxy.myorg.com');">http://proxy.myorg.com');</A>
|
|
$ua->no_proxy(qw(no se fi));
|
|
|
|
my $req = HTTP::Request->new(GET => '<A HREF="wais://xxx.com/');">wais://xxx.com/');</A>
|
|
print $ua->request($req)->as_string;
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
The LWP::Simple interface will call <B>env_proxy()</B> for you automatically.
|
|
Applications that use the <TT>$ua</TT>-><B>env_proxy()</B> method will normally not
|
|
use the <TT>$ua</TT>-><B>proxy()</B> and <TT>$ua</TT>-><B>no_proxy()</B> methods.
|
|
<P>
|
|
|
|
Some proxies also require that you send it a username/password in
|
|
order to let requests through. You should be able to add the
|
|
required header, with something like this:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
|
|
$ua = LWP::UserAgent->new;
|
|
$ua->proxy(['http', 'ftp'] => '<A HREF="http://username:password@proxy.myorg.com');">http://username:password@proxy.myorg.com');</A>
|
|
|
|
$req = HTTP::Request->new('GET',"<A HREF="http://www.perl.com">http://www.perl.com</A>");
|
|
|
|
$res = $ua->request($req);
|
|
print $res->decoded_content if $res->is_success;
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
Replace <TT>"proxy.myorg.com"</TT>, <TT>"username"</TT> and
|
|
<TT>"password"</TT> with something suitable for your site.
|
|
<A NAME="lbAH"> </A>
|
|
<H2>ACCESS TO PROTECTED DOCUMENTS</H2>
|
|
|
|
|
|
|
|
Documents protected by basic authorization can easily be accessed
|
|
like this:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
$ua = LWP::UserAgent->new;
|
|
$req = HTTP::Request->new(GET => '<A HREF="http://www.linpro.no/secret/');">http://www.linpro.no/secret/');</A>
|
|
$req->authorization_basic('aas', 'mypassword');
|
|
print $ua->request($req)->as_string;
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
The other alternative is to provide a subclass of <I>LWP::UserAgent</I> that
|
|
overrides the <B>get_basic_credentials()</B> method. Study the <I>lwp-request</I>
|
|
program for an example of this.
|
|
<A NAME="lbAI"> </A>
|
|
<H2>COOKIES</H2>
|
|
|
|
|
|
|
|
Some sites like to play games with cookies. By default <FONT SIZE="-1">LWP</FONT> ignores
|
|
cookies provided by the servers it visits. <FONT SIZE="-1">LWP</FONT> will collect cookies
|
|
and respond to cookie requests if you set up a cookie jar. <FONT SIZE="-1">LWP</FONT> doesn't
|
|
provide a cookie jar itself, but if you install HTTP::CookieJar::LWP,
|
|
it can be used like this:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
use HTTP::CookieJar::LWP;
|
|
|
|
$ua = LWP::UserAgent->new(
|
|
cookie_jar => HTTP::CookieJar::LWP->new,
|
|
);
|
|
|
|
# and then send requests just as you used to do
|
|
$res = $ua->request(HTTP::Request->new(GET => "<A HREF="http://no.yahoo.com/">http://no.yahoo.com/</A>"));
|
|
print $res->status_line, "\n";
|
|
|
|
</PRE>
|
|
|
|
|
|
<A NAME="lbAJ"> </A>
|
|
<H2>HTTPS</H2>
|
|
|
|
|
|
|
|
URLs with https scheme are accessed in exactly the same way as with
|
|
http scheme, provided that an <FONT SIZE="-1">SSL</FONT> interface module for <FONT SIZE="-1">LWP</FONT> has been
|
|
properly installed (see the <I></I><FONT SIZE="-1"><I>README.SSL</I></FONT><I></I> file found in the
|
|
libwww-perl distribution for more details). If no <FONT SIZE="-1">SSL</FONT> interface is
|
|
installed for <FONT SIZE="-1">LWP</FONT> to use, then you will get ``501 Protocol scheme
|
|
'https' is not supported'' errors when accessing such URLs.
|
|
<P>
|
|
|
|
Here's an example of fetching and printing a <FONT SIZE="-1">WWW</FONT> page using <FONT SIZE="-1">SSL:</FONT>
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
|
|
my $ua = LWP::UserAgent->new;
|
|
my $req = HTTP::Request->new(GET => '<A HREF="https://www.helsinki.fi/');">https://www.helsinki.fi/');</A>
|
|
my $res = $ua->request($req);
|
|
if ($res->is_success) {
|
|
print $res->as_string;
|
|
}
|
|
else {
|
|
print "Failed: ", $res->status_line, "\n";
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
<A NAME="lbAK"> </A>
|
|
<H2>MIRRORING</H2>
|
|
|
|
|
|
|
|
If you want to mirror documents from a <FONT SIZE="-1">WWW</FONT> server, then try to run
|
|
code similar to this at regular intervals:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::Simple;
|
|
|
|
%mirrors = (
|
|
'<A HREF="http://www.sn.no/'">http://www.sn.no/'</A> => 'sn.html',
|
|
'<A HREF="http://www.perl.com/'">http://www.perl.com/'</A> => 'perl.html',
|
|
'<A HREF="http://search.cpan.org/distlibwww-perl/'">http://search.cpan.org/distlibwww-perl/'</A> => 'lwp.html',
|
|
'<A HREF="gopher://gopher.sn.no/'">gopher://gopher.sn.no/'</A> => 'gopher.html',
|
|
);
|
|
|
|
while (($url, $localfile) = each(%mirrors)) {
|
|
mirror($url, $localfile);
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
Or, as a perl one-liner:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
perl -MLWP::Simple -e 'mirror("<A HREF="http://www.perl.com/">http://www.perl.com/</A>", "perl.html")';
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
The document will not be transferred unless it has been updated.
|
|
<A NAME="lbAL"> </A>
|
|
<H2>LARGE DOCUMENTS</H2>
|
|
|
|
|
|
|
|
If the document you want to fetch is too large to be kept in memory,
|
|
then you have two alternatives. You can instruct the library to write
|
|
the document content to a file (second <TT>$ua</TT>-><B>request()</B> argument is a file
|
|
name):
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
$ua = LWP::UserAgent->new;
|
|
|
|
my $req = HTTP::Request->new(GET =>
|
|
'<A HREF="http://www.cpan.org/CPAN/authors/id/O/OA/OALDERS/libwww-perl-6.26.tar.gz');">http://www.cpan.org/CPAN/authors/id/O/OA/OALDERS/libwww-perl-6.26.tar.gz');</A>
|
|
$res = $ua->request($req, "libwww-perl.tar.gz");
|
|
if ($res->is_success) {
|
|
print "ok\n";
|
|
}
|
|
else {
|
|
print $res->status_line, "\n";
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
Or you can process the document as it arrives (second <TT>$ua</TT>-><B>request()</B>
|
|
argument is a code reference):
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use LWP::UserAgent;
|
|
$ua = LWP::UserAgent->new;
|
|
$URL = '<A HREF="ftp://ftp.isc.org/pub/rfc/rfc-index.txt';">ftp://ftp.isc.org/pub/rfc/rfc-index.txt';</A>
|
|
|
|
my $expected_length;
|
|
my $bytes_received = 0;
|
|
my $res =
|
|
$ua->request(HTTP::Request->new(GET => $URL),
|
|
sub {
|
|
my($chunk, $res) = @_;
|
|
$bytes_received += length($chunk);
|
|
unless (defined $expected_length) {
|
|
$expected_length = $res->content_length || 0;
|
|
}
|
|
if ($expected_length) {
|
|
printf STDERR "%d%% - ",
|
|
100 * $bytes_received / $expected_length;
|
|
}
|
|
print STDERR "$bytes_received bytes received\n";
|
|
|
|
# XXX Should really do something with the chunk itself
|
|
# print $chunk;
|
|
});
|
|
print $res->status_line, "\n";
|
|
|
|
</PRE>
|
|
|
|
|
|
<A NAME="lbAM"> </A>
|
|
<H2>COPYRIGHT</H2>
|
|
|
|
|
|
|
|
Copyright 1996-2001, Gisle Aas
|
|
<P>
|
|
|
|
This library is free software; you can redistribute it and/or
|
|
modify it under the same terms as Perl itself.
|
|
<P>
|
|
|
|
<HR>
|
|
<A NAME="index"> </A><H2>Index</H2>
|
|
<DL>
|
|
<DT id="1"><A HREF="#lbAB">NAME</A><DD>
|
|
<DT id="2"><A HREF="#lbAC">DESCRIPTION</A><DD>
|
|
<DT id="3"><A HREF="#lbAD">GET</A><DD>
|
|
<DT id="4"><A HREF="#lbAE">HEAD</A><DD>
|
|
<DT id="5"><A HREF="#lbAF">POST</A><DD>
|
|
<DT id="6"><A HREF="#lbAG">PROXIES</A><DD>
|
|
<DT id="7"><A HREF="#lbAH">ACCESS TO PROTECTED DOCUMENTS</A><DD>
|
|
<DT id="8"><A HREF="#lbAI">COOKIES</A><DD>
|
|
<DT id="9"><A HREF="#lbAJ">HTTPS</A><DD>
|
|
<DT id="10"><A HREF="#lbAK">MIRRORING</A><DD>
|
|
<DT id="11"><A HREF="#lbAL">LARGE DOCUMENTS</A><DD>
|
|
<DT id="12"><A HREF="#lbAM">COPYRIGHT</A><DD>
|
|
</DL>
|
|
<HR>
|
|
This document was created by
|
|
<A HREF="/cgi-bin/man/man2html">man2html</A>,
|
|
using the manual pages.<BR>
|
|
Time: 00:05:47 GMT, March 31, 2021
|
|
</BODY>
|
|
</HTML>
|