<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<HTML><HEAD><TITLE>Man page of USERFAULTFD</TITLE>
|
|
</HEAD><BODY>
|
|
<H1>USERFAULTFD</H1>
|
|
Section: Linux Programmer's Manual (2)<BR>Updated: 2020-02-09<BR><A HREF="#index">Index</A>
|
|
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
|
|
|
|
<A NAME="lbAB"> </A>
|
|
<H2>NAME</H2>
|
|
|
|
userfaultfd - create a file descriptor for handling page faults in user space
|
|
<A NAME="lbAC"> </A>
|
|
<H2>SYNOPSIS</H2>
|
|
|
|
<PRE>
|
|
<B>#include <<A HREF="file:///usr/include/sys/types.h">sys/types.h</A>></B>
|
|
<B>#include <<A HREF="file:///usr/include/linux/userfaultfd.h">linux/userfaultfd.h</A>></B>
|
|
|
|
<B>int userfaultfd(int </B><I>flags</I><B>);</B>
|
|
</PRE>
|
|
|
|
<P>
|
|
|
|
<I>Note</I>:
|
|
|
|
There is no glibc wrapper for this system call; see NOTES.
|
|
<A NAME="lbAD"> </A>
|
|
<H2>DESCRIPTION</H2>
|
|
|
|
<B>userfaultfd</B>()
|
|
|
|
creates a new userfaultfd object that can be used for delegation of page-fault
|
|
handling to a user-space application,
|
|
and returns a file descriptor that refers to the new object.
|
|
The new userfaultfd object is configured using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2).
|
|
|
|
<P>
|
|
|
|
Once the userfaultfd object is configured, the application can use
|
|
<B><A HREF="/cgi-bin/man/man2html?2+read">read</A></B>(2)
|
|
|
|
to receive userfaultfd notifications.
|
|
The reads from userfaultfd may be blocking or non-blocking,
|
|
depending on the value of
|
|
<I>flags</I>
|
|
|
|
used for the creation of the userfaultfd or subsequent calls to
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fcntl">fcntl</A></B>(2).
|
|
|
|
<P>
|
|
|
|
The following values may be bitwise ORed in
|
|
<I>flags</I>
|
|
|
|
to change the behavior of
|
|
<B>userfaultfd</B>():
|
|
|
|
<DL COMPACT>
|
|
<DT id="1"><B>O_CLOEXEC</B>
|
|
|
|
<DD>
|
|
Enable the close-on-exec flag for the new userfaultfd file descriptor.
|
|
See the description of the
|
|
<B>O_CLOEXEC</B>
|
|
|
|
flag in
|
|
<B><A HREF="/cgi-bin/man/man2html?2+open">open</A></B>(2).
|
|
|
|
<DT id="2"><B>O_NONBLOCK</B>
|
|
|
|
<DD>
|
|
Enables non-blocking operation for the userfaultfd object.
|
|
See the description of the
|
|
<B>O_NONBLOCK</B>
|
|
|
|
flag in
|
|
<B><A HREF="/cgi-bin/man/man2html?2+open">open</A></B>(2).
|
|
|
|
</DL>
|
|
<P>
|
|
|
|
When the last file descriptor referring to a userfaultfd object is closed,
|
|
all memory ranges that were registered with the object are unregistered
|
|
and unread events are flushed.
|
|
|
|
<A NAME="lbAE"> </A>
|
|
<H3>Usage</H3>
|
|
|
|
The userfaultfd mechanism is designed to allow a thread in a multithreaded
|
|
program to perform user-space paging for the other threads in the process.
|
|
When a page fault occurs for one of the regions registered
|
|
to the userfaultfd object,
|
|
the faulting thread is put to sleep and
|
|
an event is generated that can be read via the userfaultfd file descriptor.
|
|
The fault-handling thread reads events from this file descriptor and services
|
|
them using the operations described in
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl_userfaultfd">ioctl_userfaultfd</A></B>(2).
|
|
|
|
When servicing the page fault events,
|
|
the fault-handling thread can trigger a wake-up for the sleeping thread.
|
|
<P>
|
|
|
|
It is possible for the faulting threads and the fault-handling threads
|
|
to run in the context of different processes.
|
|
In this case, these threads may belong to different programs,
|
|
and the program that executes the faulting threads
|
|
will not necessarily cooperate with the program that handles the page faults.
|
|
In such non-cooperative mode,
|
|
the process that monitors userfaultfd and handles page faults
|
|
needs to be aware of the changes in the virtual memory layout
|
|
of the faulting process to avoid memory corruption.
|
|
<P>
|
|
|
|
Starting from Linux 4.11,
|
|
userfaultfd can also notify the fault-handling threads about changes
|
|
in the virtual memory layout of the faulting process.
|
|
In addition, if the faulting process invokes
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fork">fork</A></B>(2),
|
|
|
|
the userfaultfd objects associated with the parent may be duplicated
|
|
into the child process and the userfaultfd monitor will be notified
|
|
(via the
|
|
<B>UFFD_EVENT_FORK</B>
|
|
|
|
described below)
|
|
about the file descriptor associated with the userfault objects
|
|
created for the child process,
|
|
which allows the userfaultfd monitor to perform user-space paging
|
|
for the child process.
|
|
Unlike page faults which have to be synchronous and require an
|
|
explicit or implicit wakeup,
|
|
all other events are delivered asynchronously and
|
|
the non-cooperative process resumes execution as
|
|
soon as the userfaultfd manager executes
|
|
<B><A HREF="/cgi-bin/man/man2html?2+read">read</A></B>(2).
|
|
|
|
The userfaultfd manager should carefully synchronize calls to
|
|
<B>UFFDIO_COPY</B>
|
|
|
|
with the processing of events.
|
|
<P>
|
|
|
|
The current asynchronous model of the event delivery is optimal for
|
|
single-threaded non-cooperative userfaultfd manager implementations.
|
|
<A NAME="lbAF"> </A>
|
|
<H3>Userfaultfd operation</H3>
|
|
|
|
After the userfaultfd object is created with
|
|
<B>userfaultfd</B>(),
|
|
|
|
the application must enable it using the
|
|
<B>UFFDIO_API</B>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2)
|
|
|
|
operation.
|
|
This operation allows a handshake between the kernel and user space
|
|
to determine the API version and supported features.
|
|
This operation must be performed before any of the other
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2)
|
|
|
|
operations described below (or those operations fail with the
|
|
<B>EINVAL</B>
|
|
|
|
error).
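<P>
The handshake described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name <I>uffd_handshake</I> is hypothetical, and the sketch assumes that on success the kernel reports the supported feature bits back in the <I>features</I> field, as described in ioctl_userfaultfd(2).

```c
#include <linux/userfaultfd.h>  /* struct uffdio_api, UFFD_API, UFFDIO_API */
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: perform the UFFDIO_API handshake on 'uffd',
   requesting the feature bits in 'wanted' (0 for none). On success,
   stores the feature bits reported by the kernel in '*supported'
   (if non-NULL) and returns 0; on failure returns -1 with errno
   set by ioctl(). */
static int
uffd_handshake(int uffd, unsigned long long wanted,
               unsigned long long *supported)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = wanted;

    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        return -1;

    if (supported != NULL)
        *supported = api.features;
    return 0;
}
```

A caller would typically invoke this once, immediately after creating the userfaultfd object and before any other ioctl(2) operation on it.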
|
|
<P>
|
|
|
|
After a successful
|
|
<B>UFFDIO_API</B>
|
|
|
|
operation,
|
|
the application then registers memory address ranges using the
|
|
<B>UFFDIO_REGISTER</B>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2)
|
|
|
|
operation.
|
|
After successful completion of a
|
|
<B>UFFDIO_REGISTER</B>
|
|
|
|
operation,
|
|
a page fault occurring in the requested memory range, and satisfying
|
|
the mode defined at the registration time, will be forwarded by the kernel to
|
|
the user-space application.
|
|
The application can then use the
|
|
<B>UFFDIO_COPY</B>
|
|
|
|
or
|
|
<B>UFFDIO_ZEROPAGE</B>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2)
|
|
|
|
operations to resolve the page fault.
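<P>
As a rough sketch of the registration and resolution steps just described, the two helpers below register a range for missing-page faults and resolve a fault with a zero-filled page. The helper names are hypothetical, and both assume that the addresses and lengths passed in are already page-aligned.

```c
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: ask the kernel to forward missing-page faults
   in [start, start + len) to 'uffd'. 'start' and 'len' are assumed
   to be page-aligned. Returns 0 on success, -1 on error. */
static int
uffd_register_missing(int uffd, void *start, unsigned long len)
{
    struct uffdio_register reg;

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (unsigned long) start;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;

    return ioctl(uffd, UFFDIO_REGISTER, &reg);
}

/* Hypothetical helper: resolve a fault by mapping zero-filled pages
   over [start, start + len), also assumed page-aligned.
   Returns 0 on success, -1 on error. */
static int
uffd_zero_fill(int uffd, void *start, unsigned long len)
{
    struct uffdio_zeropage zp;

    memset(&zp, 0, sizeof(zp));
    zp.range.start = (unsigned long) start;
    zp.range.len = len;
    zp.mode = 0;

    return ioctl(uffd, UFFDIO_ZEROPAGE, &zp);
}
```

Both helpers simply pass through the ioctl(2) result, so the caller can inspect <I>errno</I> on failure.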
|
|
<P>
|
|
|
|
Starting from Linux 4.14, if the application sets the
|
|
<B>UFFD_FEATURE_SIGBUS</B>
|
|
|
|
feature bit using the
|
|
<B>UFFDIO_API</B>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2),
|
|
|
|
no page-fault notification will be forwarded to user space.
|
|
Instead a
|
|
<B>SIGBUS</B>
|
|
|
|
signal is delivered to the faulting process.
|
|
With this feature,
|
|
userfaultfd can be used for robustness purposes to simply catch
|
|
any access to areas within the registered address range that do not
|
|
have pages allocated, without having to listen to userfaultfd events.
|
|
No userfaultfd monitor will be required for dealing with such memory
|
|
accesses.
|
|
For example, this feature can be useful for applications that
|
|
want to prevent the kernel from automatically allocating pages and filling
|
|
holes in sparse files when the hole is accessed through a memory mapping.
|
|
<P>
|
|
|
|
The
|
|
<B>UFFD_FEATURE_SIGBUS</B>
|
|
|
|
feature is implicitly inherited through
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fork">fork</A></B>(2)
|
|
|
|
if used in combination with
|
|
<B>UFFD_FEATURE_EVENT_FORK</B>.
|
|
|
|
<P>
|
|
|
|
Details of the various
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2)
|
|
|
|
operations can be found in
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl_userfaultfd">ioctl_userfaultfd</A></B>(2).
|
|
|
|
<P>
|
|
|
|
Since Linux 4.11, events other than page faults may be enabled during the
|
|
<B>UFFDIO_API</B>
|
|
|
|
operation.
|
|
<P>
|
|
|
|
Before Linux 4.11,
userfaultfd could be used only with anonymous private memory mappings.
Since Linux 4.11,
userfaultfd can also be used with hugetlbfs and shared memory mappings.
|
|
<P>
|
|
|
|
|
|
<A NAME="lbAG"> </A>
|
|
<H3>Reading from the userfaultfd structure</H3>
|
|
|
|
Each
|
|
<B><A HREF="/cgi-bin/man/man2html?2+read">read</A></B>(2)
|
|
|
|
from the userfaultfd file descriptor returns one or more
|
|
<I>uffd_msg</I>
|
|
|
|
structures, each of which describes a page-fault event
|
|
or an event required for the non-cooperative userfaultfd usage:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
struct uffd_msg {
    __u8  event;            /* Type of event */
    ...
    union {
        struct {
            __u64 flags;    /* Flags describing fault */
            __u64 address;  /* Faulting address */
        } pagefault;

        struct {            /* Since Linux 4.11 */
            __u32 ufd;      /* Userfault file descriptor
                               of the child process */
        } fork;

        struct {            /* Since Linux 4.11 */
            __u64 from;     /* Old address of remapped area */
            __u64 to;       /* New address of remapped area */
            __u64 len;      /* Original mapping length */
        } remap;

        struct {            /* Since Linux 4.11 */
            __u64 start;    /* Start address of removed area */
            __u64 end;      /* End address of removed area */
        } remove;
        ...
    } arg;

    /* Padding fields omitted */
} __packed;
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
If multiple events are available and the supplied buffer is large enough,
|
|
<B><A HREF="/cgi-bin/man/man2html?2+read">read</A></B>(2)
|
|
|
|
returns as many events as will fit in the supplied buffer.
|
|
If the buffer supplied to
|
|
<B><A HREF="/cgi-bin/man/man2html?2+read">read</A></B>(2)
|
|
|
|
is smaller than the size of the
|
|
<I>uffd_msg</I>
|
|
|
|
structure, the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+read">read</A></B>(2)
|
|
|
|
fails with the error
|
|
<B>EINVAL</B>.
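<P>
To illustrate reading several events at once, here is a small sketch. The helper name <I>uffd_read_events</I> is hypothetical; it assumes the caller supplies an array large enough for <I>max</I> messages and treats the byte count returned by read(2) as a whole number of <I>uffd_msg</I> structures.

```c
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to 'max' events from 'uffd' into
   'msgs'. Returns the number of whole uffd_msg structures read,
   or -1 on error (errno set by read()). */
static ssize_t
uffd_read_events(int uffd, struct uffd_msg *msgs, size_t max)
{
    ssize_t nread = read(uffd, msgs, max * sizeof(msgs[0]));

    if (nread == -1)
        return -1;
    return nread / (ssize_t) sizeof(msgs[0]);
}
```

With a nonblocking descriptor, a return value of -1 with <I>errno</I> set to <B>EAGAIN</B> would simply mean that no events are pending.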
|
|
|
|
<P>
|
|
|
|
The fields set in the
|
|
<I>uffd_msg</I>
|
|
|
|
structure are as follows:
|
|
<DL COMPACT>
|
|
<DT id="3"><I>event</I>
|
|
|
|
<DD>
|
|
The type of event.
|
|
Depending on the event type,
|
|
different fields of the
|
|
<I>arg</I>
|
|
|
|
union represent details required for the event processing.
|
|
The non-page-fault events are generated only when the appropriate feature
is enabled during the API handshake with the
|
|
<B>UFFDIO_API</B>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2).
|
|
|
|
<DT id="4"><DD>
|
|
The following values can appear in the
|
|
<I>event</I>
|
|
|
|
field:
|
|
<DL COMPACT><DT id="5"><DD>
|
|
<DL COMPACT>
|
|
<DT id="6"><B>UFFD_EVENT_PAGEFAULT</B> (since Linux 4.3)
|
|
|
|
<DD>
|
|
A page-fault event.
|
|
The page-fault details are available in the
|
|
<I>pagefault</I>
|
|
|
|
field.
|
|
<DT id="7"><B>UFFD_EVENT_FORK</B> (since Linux 4.11)
|
|
|
|
<DD>
|
|
Generated when the faulting process invokes
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fork">fork</A></B>(2)
|
|
|
|
(or
|
|
<B><A HREF="/cgi-bin/man/man2html?2+clone">clone</A></B>(2)
|
|
|
|
without the
|
|
<B>CLONE_VM</B>
|
|
|
|
flag).
|
|
The event details are available in the
|
|
<I>fork</I>
|
|
|
|
field.
|
|
|
|
<DT id="8"><B>UFFD_EVENT_REMAP</B> (since Linux 4.11)
|
|
|
|
<DD>
|
|
Generated when the faulting process invokes
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mremap">mremap</A></B>(2).
|
|
|
|
The event details are available in the
|
|
<I>remap</I>
|
|
|
|
field.
|
|
<DT id="9"><B>UFFD_EVENT_REMOVE</B> (since Linux 4.11)
|
|
|
|
<DD>
|
|
Generated when the faulting process invokes
|
|
<B><A HREF="/cgi-bin/man/man2html?2+madvise">madvise</A></B>(2)
|
|
|
|
with
|
|
<B>MADV_DONTNEED</B>
|
|
|
|
or
|
|
<B>MADV_REMOVE</B>
|
|
|
|
advice.
|
|
The event details are available in the
|
|
<I>remove</I>
|
|
|
|
field.
|
|
<DT id="10"><B>UFFD_EVENT_UNMAP</B> (since Linux 4.11)
|
|
|
|
<DD>
|
|
Generated when the faulting process unmaps a memory range,
|
|
either explicitly using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+munmap">munmap</A></B>(2)
|
|
|
|
or implicitly during
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mmap">mmap</A></B>(2)
|
|
|
|
or
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mremap">mremap</A></B>(2).
|
|
|
|
The event details are available in the
|
|
<I>remove</I>
|
|
|
|
field.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="11"><I>pagefault.address</I>
|
|
|
|
<DD>
|
|
The address that triggered the page fault.
|
|
<DT id="12"><I>pagefault.flags</I>
|
|
|
|
<DD>
|
|
A bit mask of flags that describe the event.
|
|
For
|
|
<B>UFFD_EVENT_PAGEFAULT</B>,
|
|
|
|
the following flag may appear:
|
|
<DL COMPACT><DT id="13"><DD>
|
|
<DL COMPACT>
|
|
<DT id="14"><B>UFFD_PAGEFAULT_FLAG_WRITE</B>
|
|
|
|
<DD>
|
|
If the address is in a range that was registered with the
|
|
<B>UFFDIO_REGISTER_MODE_MISSING</B>
|
|
|
|
flag (see
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl_userfaultfd">ioctl_userfaultfd</A></B>(2))
|
|
|
|
and this flag is set, this is a write fault;
|
|
otherwise it is a read fault.
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="15"><I>fork.ufd</I>
|
|
|
|
<DD>
|
|
The file descriptor associated with the userfault object
|
|
created for the child created by
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fork">fork</A></B>(2).
|
|
|
|
<DT id="16"><I>remap.from</I>
|
|
|
|
<DD>
|
|
The original address of the memory range that was remapped using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mremap">mremap</A></B>(2).
|
|
|
|
<DT id="17"><I>remap.to</I>
|
|
|
|
<DD>
|
|
The new address of the memory range that was remapped using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mremap">mremap</A></B>(2).
|
|
|
|
<DT id="18"><I>remap.len</I>
|
|
|
|
<DD>
|
|
The original length of the memory range that was remapped using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mremap">mremap</A></B>(2).
|
|
|
|
<DT id="19"><I>remove.start</I>
|
|
|
|
<DD>
|
|
The start address of the memory range that was freed using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+madvise">madvise</A></B>(2)
|
|
|
|
or unmapped.
|
|
<DT id="20"><I>remove.end</I>
|
|
|
|
<DD>
|
|
The end address of the memory range that was freed using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+madvise">madvise</A></B>(2)
|
|
|
|
or unmapped.
|
|
</DL>
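<P>
The field descriptions above suggest a simple dispatch on the <I>event</I> field. The following sketch shows one way to do that; the helper names are hypothetical, and it assumes kernel headers from Linux 4.11 or later so that all five event constants are defined.

```c
#include <linux/userfaultfd.h>

/* Hypothetical helper: map the 'event' field of a message to a
   printable name. Assumes headers that define all five event
   constants (Linux 4.11 or later). */
static const char *
uffd_event_name(const struct uffd_msg *msg)
{
    switch (msg->event) {
    case UFFD_EVENT_PAGEFAULT: return "pagefault";
    case UFFD_EVENT_FORK:      return "fork";
    case UFFD_EVENT_REMAP:     return "remap";
    case UFFD_EVENT_REMOVE:    return "remove";
    case UFFD_EVENT_UNMAP:     return "unmap";
    default:                   return "unknown";
    }
}

/* Hypothetical helper: returns 1 if 'msg' is a page-fault event
   caused by a write access, 0 otherwise. */
static int
uffd_is_write_fault(const struct uffd_msg *msg)
{
    return msg->event == UFFD_EVENT_PAGEFAULT &&
           (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) != 0;
}
```

A monitor would normally check <I>event</I> first and only then look at the matching member of the <I>arg</I> union, since the other members are not meaningful for that event.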
|
|
<P>
|
|
|
|
A
|
|
<B><A HREF="/cgi-bin/man/man2html?2+read">read</A></B>(2)
|
|
|
|
on a userfaultfd file descriptor can fail with the following errors:
|
|
<DL COMPACT>
|
|
<DT id="21"><B>EINVAL</B>
|
|
|
|
<DD>
|
|
The userfaultfd object has not yet been enabled using the
|
|
<B>UFFDIO_API</B>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2)
|
|
|
|
operation.
|
|
</DL>
|
|
<P>
|
|
|
|
If the
|
|
<B>O_NONBLOCK</B>
|
|
|
|
flag is enabled in the associated open file description,
|
|
the userfaultfd file descriptor can be monitored with
|
|
<B><A HREF="/cgi-bin/man/man2html?2+poll">poll</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+select">select</A></B>(2),
|
|
|
|
and
|
|
<B><A HREF="/cgi-bin/man/man2html?7+epoll">epoll</A></B>(7).
|
|
|
|
When events are available, the file descriptor is reported as readable.
|
|
If the
|
|
<B>O_NONBLOCK</B>
|
|
|
|
flag is not enabled, then
|
|
<B><A HREF="/cgi-bin/man/man2html?2+poll">poll</A></B>(2)
|
|
|
|
(always) indicates the file as having a
|
|
<B>POLLERR</B>
|
|
|
|
condition, and
|
|
<B><A HREF="/cgi-bin/man/man2html?2+select">select</A></B>(2)
|
|
|
|
indicates the file descriptor as both readable and writable.
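<P>
A monitoring loop built on poll(2) might wrap the check in a helper like the one below. This is only a sketch: the name <I>uffd_wait_readable</I> is hypothetical, and treating <B>POLLERR</B> as a failure is a design choice based on the behavior described above for descriptors without <B>O_NONBLOCK</B>.

```c
#include <poll.h>

/* Hypothetical helper: wait up to 'timeout_ms' milliseconds (-1 means
   wait indefinitely) for 'fd' to become readable.
   Returns 1 if POLLIN is set, 0 if no input is available (for
   example, on timeout), and -1 if poll() fails or reports POLLERR
   or POLLNVAL. */
static int
uffd_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int nready;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    nready = poll(&pfd, 1, timeout_ms);
    if (nready == -1 || (pfd.revents & (POLLERR | POLLNVAL)) != 0)
        return -1;
    if (nready == 0)
        return 0;
    return (pfd.revents & POLLIN) != 0;
}
```

The helper works on any pollable descriptor, which makes it easy to exercise with a pipe before using it on a userfaultfd file descriptor.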
|
|
|
|
|
|
|
|
<A NAME="lbAH"> </A>
|
|
<H2>RETURN VALUE</H2>
|
|
|
|
On success,
|
|
<B>userfaultfd</B>()
|
|
|
|
returns a new file descriptor that refers to the userfaultfd object.
|
|
On error, -1 is returned, and
|
|
<I>errno</I>
|
|
|
|
is set appropriately.
|
|
<A NAME="lbAI"> </A>
|
|
<H2>ERRORS</H2>
|
|
|
|
<DL COMPACT>
|
|
<DT id="22"><B>EINVAL</B>
|
|
|
|
<DD>
|
|
An unsupported value was specified in
|
|
<I>flags</I>.
|
|
|
|
<DT id="23"><B>EMFILE</B>
|
|
|
|
<DD>
|
|
The per-process limit on the number of open file descriptors has been
|
|
reached.
|
|
<DT id="24"><B>ENFILE</B>
|
|
|
|
<DD>
|
|
The system-wide limit on the total number of open files has been
|
|
reached.
|
|
<DT id="25"><B>ENOMEM</B>
|
|
|
|
<DD>
|
|
Insufficient kernel memory was available.
|
|
<DT id="26"><B>EPERM</B> (since Linux 5.2)
|
|
|
|
<DD>
|
|
|
|
The caller is not privileged (does not have the
|
|
<B>CAP_SYS_PTRACE</B>
|
|
|
|
capability in the initial user namespace), and
|
|
<I>/proc/sys/vm/unprivileged_userfaultfd</I>
|
|
|
|
has the value 0.
|
|
</DL>
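<P>
When diagnosing an <B>EPERM</B> failure, it can help to inspect the sysctl mentioned above. The sketch below reads it directly; the helper name is hypothetical, and the return convention (-1 when the file is missing or unreadable) is an assumption of this example.

```c
#include <stdio.h>

/* Hypothetical helper: read the current value of
   /proc/sys/vm/unprivileged_userfaultfd. Returns the value read,
   or -1 if the file is missing or unreadable (for example, on
   kernels older than Linux 5.2). */
static int
unprivileged_userfaultfd_setting(void)
{
    FILE *fp = fopen("/proc/sys/vm/unprivileged_userfaultfd", "r");
    int value = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

A value of 0 combined with a missing <B>CAP_SYS_PTRACE</B> capability would explain the <B>EPERM</B> error described above.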
|
|
<A NAME="lbAJ"> </A>
|
|
<H2>VERSIONS</H2>
|
|
|
|
The
|
|
<B>userfaultfd</B>()
|
|
|
|
system call first appeared in Linux 4.3.
|
|
<P>
|
|
|
|
The support for hugetlbfs and shared memory areas and
|
|
non-page-fault events was added in Linux 4.11.
|
|
<A NAME="lbAK"> </A>
|
|
<H2>CONFORMING TO</H2>
|
|
|
|
<B>userfaultfd</B>()
|
|
|
|
is Linux-specific and should not be used in programs intended to be
|
|
portable.
|
|
<A NAME="lbAL"> </A>
|
|
<H2>NOTES</H2>
|
|
|
|
Glibc does not provide a wrapper for this system call; call it using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+syscall">syscall</A></B>(2).
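<P>
Because there is no glibc wrapper, programs usually define a small one of their own. The following is a minimal sketch; the name <I>uffd_open</I> is hypothetical and not part of any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>        /* O_CLOEXEC, O_NONBLOCK */
#include <sys/syscall.h>  /* __NR_userfaultfd */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper (not part of any library): invoke the raw
   userfaultfd system call. Returns a new file descriptor, or -1
   with errno set. */
static int
uffd_open(int flags)
{
    return (int) syscall(__NR_userfaultfd, flags);
}
```

A typical call would be <I>uffd_open(O_CLOEXEC | O_NONBLOCK)</I>, with the result checked against -1 before any ioctl(2) operation is attempted.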
|
|
|
|
<P>
|
|
|
|
The userfaultfd mechanism can be used as an alternative to
|
|
traditional user-space paging techniques based on the use of the
|
|
<B>SIGSEGV</B>
|
|
|
|
signal and
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mmap">mmap</A></B>(2).
|
|
|
|
It can also be used to implement lazy restore
|
|
for checkpoint/restore mechanisms,
|
|
as well as post-copy migration to allow (nearly) uninterrupted execution
|
|
when transferring virtual machines and Linux containers
|
|
from one host to another.
|
|
<A NAME="lbAM"> </A>
|
|
<H2>BUGS</H2>
|
|
|
|
If the
|
|
<B>UFFD_FEATURE_EVENT_FORK</B>
|
|
|
|
is enabled and a system call from the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fork">fork</A></B>(2)
|
|
|
|
family is interrupted by a signal or failed, a stale userfaultfd descriptor
|
|
might be created.
|
|
In this case, a spurious
|
|
<B>UFFD_EVENT_FORK</B>
|
|
|
|
will be delivered to the userfaultfd monitor.
|
|
<A NAME="lbAN"> </A>
|
|
<H2>EXAMPLE</H2>
|
|
|
|
The program below demonstrates the use of the userfaultfd mechanism.
|
|
The program creates two threads, one of which acts as the
|
|
page-fault handler for the process, servicing the pages of a demand-zero paged
|
|
region created using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mmap">mmap</A></B>(2).
|
|
|
|
<P>
|
|
|
|
The program takes one command-line argument,
|
|
which is the number of pages that will be created in a mapping
|
|
whose page faults will be handled via userfaultfd.
|
|
After creating a userfaultfd object,
|
|
the program then creates an anonymous private mapping of the specified size
|
|
and registers the address range of that mapping using the
|
|
<B>UFFDIO_REGISTER</B>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2)
|
|
|
|
operation.
|
|
The program then creates a second thread that will perform the
|
|
task of handling page faults.
|
|
<P>
|
|
|
|
The main thread then walks through the pages of the mapping fetching
|
|
bytes from successive pages.
|
|
Because the pages have not yet been accessed,
|
|
the first access of a byte in each page will trigger a page-fault event
|
|
on the userfaultfd file descriptor.
|
|
<P>
|
|
|
|
Each of the page-fault events is handled by the second thread,
|
|
which sits in a loop processing input from the userfaultfd file descriptor.
|
|
In each loop iteration, the second thread first calls
|
|
<B><A HREF="/cgi-bin/man/man2html?2+poll">poll</A></B>(2)
|
|
|
|
to check the state of the file descriptor,
|
|
and then reads an event from the file descriptor.
|
|
All such events should be
|
|
<B>UFFD_EVENT_PAGEFAULT</B>
|
|
|
|
events,
|
|
which the thread handles by copying a page of data into
|
|
the faulting region using the
|
|
<B>UFFDIO_COPY</B>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2)
|
|
|
|
operation.
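<P>
The key detail in the handler is that the faulting address must be rounded down to a page boundary before it is used as the destination of <B>UFFDIO_COPY</B>. The sketch below isolates that step; the helper names are hypothetical, and <I>page_size</I> is assumed to be a power of two such as the value returned by sysconf(_SC_PAGE_SIZE).

```c
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Round 'addr' down to the start of its page. 'page_size' is
   assumed to be a power of two. */
static unsigned long long
page_base(unsigned long long addr, unsigned long long page_size)
{
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: resolve the fault at 'fault_addr' by copying
   one page from 'src' into the faulting page. Returns 0 on success,
   -1 on error (errno set by ioctl()). */
static int
uffd_copy_page(int uffd, unsigned long long fault_addr,
               const void *src, unsigned long long page_size)
{
    struct uffdio_copy copy;

    memset(&copy, 0, sizeof(copy));
    copy.src = (unsigned long) src;
    copy.dst = page_base(fault_addr, page_size);
    copy.len = page_size;
    copy.mode = 0;

    return ioctl(uffd, UFFDIO_COPY, &copy);
}
```

For instance, with a 4096-byte page size, a fault at address 0x7fd30106c00f is resolved by copying to the page that starts at 0x7fd30106c000.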
|
|
<P>
|
|
|
|
The following is an example of what we see when running the program:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
$ <B>./userfaultfd_demo 3</B>
Address returned by mmap() = 0x7fd30106c000

fault_handler_thread():
    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106c00f
        (uffdio_copy.copy returned 4096)
Read address 0x7fd30106c00f in main(): A
Read address 0x7fd30106c40f in main(): A
Read address 0x7fd30106c80f in main(): A
Read address 0x7fd30106cc0f in main(): A

fault_handler_thread():
    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106d00f
        (uffdio_copy.copy returned 4096)
Read address 0x7fd30106d00f in main(): B
Read address 0x7fd30106d40f in main(): B
Read address 0x7fd30106d80f in main(): B
Read address 0x7fd30106dc0f in main(): B

fault_handler_thread():
    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106e00f
        (uffdio_copy.copy returned 4096)
Read address 0x7fd30106e00f in main(): C
Read address 0x7fd30106e40f in main(): C
Read address 0x7fd30106e80f in main(): C
Read address 0x7fd30106ec0f in main(): C
</PRE>
|
|
|
|
|
|
<A NAME="lbAO"> </A>
|
|
<H3>Program source</H3>
|
|
|
|
|
|
|
|
<PRE>
/* userfaultfd_demo.c

   Licensed under the GNU General Public License version 2 or later.
*/
#define _GNU_SOURCE
#include <<A HREF="file:///usr/include/sys/types.h">sys/types.h</A>>
#include <<A HREF="file:///usr/include/stdio.h">stdio.h</A>>
#include <<A HREF="file:///usr/include/linux/userfaultfd.h">linux/userfaultfd.h</A>>
#include <<A HREF="file:///usr/include/pthread.h">pthread.h</A>>
#include <<A HREF="file:///usr/include/errno.h">errno.h</A>>
#include <<A HREF="file:///usr/include/unistd.h">unistd.h</A>>
#include <<A HREF="file:///usr/include/stdlib.h">stdlib.h</A>>
#include <<A HREF="file:///usr/include/fcntl.h">fcntl.h</A>>
#include <<A HREF="file:///usr/include/signal.h">signal.h</A>>
#include <<A HREF="file:///usr/include/poll.h">poll.h</A>>
#include <<A HREF="file:///usr/include/string.h">string.h</A>>
#include <<A HREF="file:///usr/include/sys/mman.h">sys/mman.h</A>>
#include <<A HREF="file:///usr/include/sys/syscall.h">sys/syscall.h</A>>
#include <<A HREF="file:///usr/include/sys/ioctl.h">sys/ioctl.h</A>>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

static int page_size;

static void *
fault_handler_thread(void *arg)
{
    static struct uffd_msg msg;   /* Data read from userfaultfd */
    static int fault_cnt = 0;     /* Number of faults so far handled */
    long uffd;                    /* userfaultfd file descriptor */
    static char *page = NULL;
    struct uffdio_copy uffdio_copy;
    ssize_t nread;

    uffd = (long) arg;

    /* Create a page that will be copied into the faulting region */

    if (page == NULL) {
        page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            errExit("mmap");
    }

    /* Loop, handling incoming events on the userfaultfd
       file descriptor */

    for (;;) {

        /* See what poll() tells us about the userfaultfd */

        struct pollfd pollfd;
        int nready;
        pollfd.fd = uffd;
        pollfd.events = POLLIN;
        nready = poll(&pollfd, 1, -1);
        if (nready == -1)
            errExit("poll");

        printf("\nfault_handler_thread():\n");
        printf("    poll() returns: nready = %d; "
                "POLLIN = %d; POLLERR = %d\n", nready,
                (pollfd.revents & POLLIN) != 0,
                (pollfd.revents & POLLERR) != 0);

        /* Read an event from the userfaultfd */

        nread = read(uffd, &msg, sizeof(msg));
        if (nread == 0) {
            printf("EOF on userfaultfd!\n");
            exit(EXIT_FAILURE);
        }

        if (nread == -1)
            errExit("read");

        /* We expect only one kind of event; verify that assumption */

        if (msg.event != UFFD_EVENT_PAGEFAULT) {
            fprintf(stderr, "Unexpected event on userfaultfd\n");
            exit(EXIT_FAILURE);
        }

        /* Display info about the page-fault event */

        printf("    UFFD_EVENT_PAGEFAULT event: ");
        printf("flags = %llx; ", msg.arg.pagefault.flags);
        printf("address = %llx\n", msg.arg.pagefault.address);

        /* Copy the page pointed to by 'page' into the faulting
           region. Vary the contents that are copied in, so that it
           is more obvious that each fault is handled separately. */

        memset(page, 'A' + fault_cnt % 20, page_size);
        fault_cnt++;

        uffdio_copy.src = (unsigned long) page;

        /* We need to handle page faults in units of pages(!).
           So, round faulting address down to page boundary */

        uffdio_copy.dst = (unsigned long) msg.arg.pagefault.address &
                                           ~(page_size - 1);
        uffdio_copy.len = page_size;
        uffdio_copy.mode = 0;
        uffdio_copy.copy = 0;
        if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
            errExit("ioctl-UFFDIO_COPY");

        printf("        (uffdio_copy.copy returned %lld)\n",
                uffdio_copy.copy);
    }
}

int
main(int argc, char *argv[])
{
    long uffd;          /* userfaultfd file descriptor */
    char *addr;         /* Start of region handled by userfaultfd */
    unsigned long len;  /* Length of region handled by userfaultfd */
    pthread_t thr;      /* ID of thread that handles page faults */
    struct uffdio_api uffdio_api;
    struct uffdio_register uffdio_register;
    int s;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s num-pages\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    page_size = sysconf(_SC_PAGE_SIZE);
    len = strtoul(argv[1], NULL, 0) * page_size;

    /* Create and enable userfaultfd object */

    uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd == -1)
        errExit("userfaultfd");

    uffdio_api.api = UFFD_API;
    uffdio_api.features = 0;
    if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
        errExit("ioctl-UFFDIO_API");

    /* Create a private anonymous mapping. The memory will be
       demand-zero paged--that is, not yet allocated. When we
       actually touch the memory, it will be allocated via
       the userfaultfd. */

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        errExit("mmap");

    printf("Address returned by mmap() = %p\n", addr);

    /* Register the memory range of the mapping we just created for
       handling by the userfaultfd object. In mode, we request to track
       missing pages (i.e., pages that have not yet been faulted in). */

    uffdio_register.range.start = (unsigned long) addr;
    uffdio_register.range.len = len;
    uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
    if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1)
        errExit("ioctl-UFFDIO_REGISTER");

    /* Create a thread that will process the userfaultfd events */

    s = pthread_create(&thr, NULL, fault_handler_thread, (void *) uffd);
    if (s != 0) {
        errno = s;
        errExit("pthread_create");
    }

    /* Main thread now touches memory in the mapping, touching
       locations 1024 bytes apart. This will trigger userfaultfd
       events for all pages in the region. */

    int l;
    l = 0xf;    /* Ensure that faulting address is not on a page
                   boundary, in order to test that we correctly
                   handle that case in fault_handler_thread() */
    while (l < len) {
        char c = addr[l];
        printf("Read address %p in main(): ", addr + l);
        printf("%c\n", c);
        l += 1024;
        usleep(100000);         /* Slow things down a little */
    }

    exit(EXIT_SUCCESS);
}
</PRE>
|
|
|
|
<A NAME="lbAP"> </A>
|
|
<H2>SEE ALSO</H2>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fcntl">fcntl</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl">ioctl</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ioctl_userfaultfd">ioctl_userfaultfd</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+madvise">madvise</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mmap">mmap</A></B>(2)
|
|
|
|
<P>
|
|
|
|
<I>Documentation/admin-guide/mm/userfaultfd.rst</I>
|
|
|
|
in the Linux kernel source tree
|
|
<P>
|
|
|
|
<A NAME="lbAQ"> </A>
|
|
<H2>COLOPHON</H2>
|
|
|
|
This page is part of release 5.05 of the Linux
|
|
<I>man-pages</I>
|
|
|
|
project.
|
|
A description of the project,
|
|
information about reporting bugs,
|
|
and the latest version of this page,
|
|
can be found at
|
|
<A HREF="https://www.kernel.org/doc/man-pages/">https://www.kernel.org/doc/man-pages/</A>.
|
|
<P>
|
|
|
|
<HR>
|
|
<A NAME="index"> </A><H2>Index</H2>
|
|
<DL>
|
|
<DT id="27"><A HREF="#lbAB">NAME</A><DD>
|
|
<DT id="28"><A HREF="#lbAC">SYNOPSIS</A><DD>
|
|
<DT id="29"><A HREF="#lbAD">DESCRIPTION</A><DD>
|
|
<DL>
|
|
<DT id="30"><A HREF="#lbAE">Usage</A><DD>
|
|
<DT id="31"><A HREF="#lbAF">Userfaultfd operation</A><DD>
|
|
<DT id="32"><A HREF="#lbAG">Reading from the userfaultfd structure</A><DD>
|
|
</DL>
|
|
<DT id="33"><A HREF="#lbAH">RETURN VALUE</A><DD>
|
|
<DT id="34"><A HREF="#lbAI">ERRORS</A><DD>
|
|
<DT id="35"><A HREF="#lbAJ">VERSIONS</A><DD>
|
|
<DT id="36"><A HREF="#lbAK">CONFORMING TO</A><DD>
|
|
<DT id="37"><A HREF="#lbAL">NOTES</A><DD>
|
|
<DT id="38"><A HREF="#lbAM">BUGS</A><DD>
|
|
<DT id="39"><A HREF="#lbAN">EXAMPLE</A><DD>
|
|
<DL>
|
|
<DT id="40"><A HREF="#lbAO">Program source</A><DD>
|
|
</DL>
|
|
<DT id="41"><A HREF="#lbAP">SEE ALSO</A><DD>
|
|
<DT id="42"><A HREF="#lbAQ">COLOPHON</A><DD>
|
|
</DL>
|
|
<HR>
|
|
This document was created by
|
|
<A HREF="/cgi-bin/man/man2html">man2html</A>,
|
|
using the manual pages.<BR>
|
|
Time: 00:05:35 GMT, March 31, 2021
|
|
</BODY>
|
|
</HTML>
|