<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<HTML><HEAD><TITLE>Man page of SECCOMP</TITLE>
|
|
</HEAD><BODY>
|
|
<H1>SECCOMP</H1>
|
|
Section: Linux Programmer's Manual (2)<BR>Updated: 2019-11-19<BR><A HREF="#index">Index</A>
|
|
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
|
|
|
|
<A NAME="lbAB"> </A>
|
|
<H2>NAME</H2>
|
|
|
|
seccomp - operate on Secure Computing state of the process
|
|
<A NAME="lbAC"> </A>
|
|
<H2>SYNOPSIS</H2>
|
|
|
|
<PRE>
|
|
<B>#include &lt;<A HREF="file:///usr/include/linux/seccomp.h">linux/seccomp.h</A>&gt;</B>
<B>#include &lt;<A HREF="file:///usr/include/linux/filter.h">linux/filter.h</A>&gt;</B>
<B>#include &lt;<A HREF="file:///usr/include/linux/audit.h">linux/audit.h</A>&gt;</B>
<B>#include &lt;<A HREF="file:///usr/include/linux/signal.h">linux/signal.h</A>&gt;</B>
<B>#include &lt;<A HREF="file:///usr/include/sys/ptrace.h">sys/ptrace.h</A>&gt;</B>
|
|
|
|
<B>int seccomp(unsigned int </B><I>operation</I><B>, unsigned int </B><I>flags</I><B>, void *</B><I>args</I><B>);</B>
|
|
</PRE>
|
|
|
|
<A NAME="lbAD"> </A>
|
|
<H2>DESCRIPTION</H2>
|
|
|
|
The
|
|
<B>seccomp</B>()
|
|
|
|
system call operates on the Secure Computing (seccomp) state of the
|
|
calling process.
|
|
<P>
|
|
|
|
Currently, Linux supports the following
|
|
<I>operation</I>
|
|
|
|
values:
|
|
<DL COMPACT>
|
|
<DT id="1"><B>SECCOMP_SET_MODE_STRICT</B>
|
|
|
|
<DD>
|
|
The only system calls that the calling thread is permitted to make are
|
|
<B><A HREF="/cgi-bin/man/man2html?2+read">read</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+write">write</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+_exit">_exit</A></B>(2)
|
|
|
|
(but not
|
|
<B><A HREF="/cgi-bin/man/man2html?2+exit_group">exit_group</A></B>(2)),
|
|
|
|
and
|
|
<B><A HREF="/cgi-bin/man/man2html?2+sigreturn">sigreturn</A></B>(2).
|
|
|
|
Other system calls result in the delivery of a
|
|
<B>SIGKILL</B>
|
|
|
|
signal.
|
|
Strict secure computing mode is useful for number-crunching
|
|
applications that may need to execute untrusted byte code, perhaps
|
|
obtained by reading from a pipe or socket.
|
|
<DT id="2"><DD>
|
|
Note that although the calling thread can no longer call
|
|
<B><A HREF="/cgi-bin/man/man2html?2+sigprocmask">sigprocmask</A></B>(2),
|
|
|
|
it can use
|
|
<B><A HREF="/cgi-bin/man/man2html?2+sigreturn">sigreturn</A></B>(2)
|
|
|
|
to block all signals apart from
|
|
<B>SIGKILL</B>
|
|
|
|
and
|
|
<B>SIGSTOP</B>.
|
|
|
|
This means that
|
|
<B><A HREF="/cgi-bin/man/man2html?2+alarm">alarm</A></B>(2)
|
|
|
|
(for example) is not sufficient for restricting the process's execution time.
|
|
Instead, to reliably terminate the process,
|
|
<B>SIGKILL</B>
|
|
|
|
must be used.
|
|
This can be done by using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+timer_create">timer_create</A></B>(2)
|
|
|
|
with
|
|
<B>SIGEV_SIGNAL</B>
|
|
|
|
and
|
|
<I>sigev_signo</I>
|
|
|
|
set to
|
|
<B>SIGKILL</B>,
|
|
|
|
or by using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+setrlimit">setrlimit</A></B>(2)
|
|
|
|
to set the hard limit for
|
|
<B>RLIMIT_CPU</B>.
|
|
|
|
<DT id="3"><DD>
|
|
This operation is available only if the kernel is configured with
|
|
<B>CONFIG_SECCOMP</B>
|
|
|
|
enabled.
|
|
<DT id="4"><DD>
|
|
The value of
|
|
<I>flags</I>
|
|
|
|
must be 0, and
|
|
<I>args</I>
|
|
|
|
must be NULL.
|
|
<DT id="5"><DD>
|
|
This operation is functionally identical to the call:
|
|
<DT id="6"><DD>
|
|
|
|
|
|
prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
|
|
|
|
|
|
<DT id="7"><B>SECCOMP_SET_MODE_FILTER</B>
|
|
|
|
<DD>
|
|
The system calls allowed are defined by a pointer to a Berkeley Packet
|
|
Filter (BPF) passed via
|
|
<I>args</I>.
|
|
|
|
This argument is a pointer to a
|
|
<I>struct sock_fprog</I>;
|
|
|
|
it can be designed to filter arbitrary system calls and system call
|
|
arguments.
|
|
If the filter is invalid,
|
|
<B>seccomp</B>()
|
|
|
|
fails, returning
|
|
<B>EINVAL</B>
|
|
|
|
in
|
|
<I>errno</I>.
|
|
|
|
<DT id="8"><DD>
|
|
If
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fork">fork</A></B>(2)
|
|
|
|
or
|
|
<B><A HREF="/cgi-bin/man/man2html?2+clone">clone</A></B>(2)
|
|
|
|
is allowed by the filter, any child processes will be constrained to
|
|
the same system call filters as the parent.
|
|
If
|
|
<B><A HREF="/cgi-bin/man/man2html?2+execve">execve</A></B>(2)
|
|
|
|
is allowed,
|
|
the existing filters will be preserved across a call to
|
|
<B><A HREF="/cgi-bin/man/man2html?2+execve">execve</A></B>(2).
|
|
|
|
<DT id="9"><DD>
|
|
In order to use the
|
|
<B>SECCOMP_SET_MODE_FILTER</B>
|
|
|
|
operation, either the calling thread must have the
|
|
<B>CAP_SYS_ADMIN</B>
|
|
|
|
capability in its user namespace, or the thread must already have the
|
|
<I>no_new_privs</I>
|
|
|
|
bit set.
|
|
If that bit was not already set by an ancestor of this thread,
|
|
the thread must make the following call:
|
|
<DT id="10"><DD>
|
|
|
|
|
|
prctl(PR_SET_NO_NEW_PRIVS, 1);
|
|
|
|
|
|
<DT id="11"><DD>
|
|
Otherwise, the
|
|
<B>SECCOMP_SET_MODE_FILTER</B>
|
|
|
|
operation fails and returns
|
|
<B>EACCES</B>
|
|
|
|
in
|
|
<I>errno</I>.
|
|
|
|
This requirement ensures that an unprivileged process cannot apply
|
|
a malicious filter and then invoke a set-user-ID or
|
|
other privileged program using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+execve">execve</A></B>(2),
|
|
|
|
thus potentially compromising that program.
|
|
(Such a malicious filter might, for example, cause an attempt to use
|
|
<B><A HREF="/cgi-bin/man/man2html?2+setuid">setuid</A></B>(2)
|
|
|
|
to set the caller's user IDs to nonzero values to instead
|
|
return 0 without actually making the system call.
|
|
Thus, the program might be tricked into retaining superuser privileges
|
|
in circumstances where it is possible to influence it to do
|
|
dangerous things because it did not actually drop privileges.)
|
|
<DT id="12"><DD>
|
|
If
|
|
<B><A HREF="/cgi-bin/man/man2html?2+prctl">prctl</A></B>(2)
|
|
|
|
or
|
|
<B>seccomp</B>()
|
|
|
|
is allowed by the attached filter, further filters may be added.
|
|
This will increase evaluation time, but allows for further reduction of
|
|
the attack surface during execution of a thread.
|
|
<DT id="13"><DD>
|
|
The
|
|
<B>SECCOMP_SET_MODE_FILTER</B>
|
|
|
|
operation is available only if the kernel is configured with
|
|
<B>CONFIG_SECCOMP_FILTER</B>
|
|
|
|
enabled.
|
|
<DT id="14"><DD>
|
|
When
|
|
<I>flags</I>
|
|
|
|
is 0, this operation is functionally identical to the call:
|
|
<DT id="15"><DD>
|
|
|
|
|
|
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, args);
|
|
|
|
|
|
<DT id="16"><DD>
|
|
The recognized
|
|
<I>flags</I>
|
|
|
|
are:
|
|
<DL COMPACT><DT id="17"><DD>
|
|
<DL COMPACT>
|
|
<DT id="18"><B>SECCOMP_FILTER_FLAG_TSYNC</B>
|
|
|
|
<DD>
|
|
When adding a new filter, synchronize all other threads of the calling
|
|
process to the same seccomp filter tree.
|
|
A "filter tree" is the ordered list of filters attached to a thread.
|
|
(Attaching identical filters in separate
|
|
<B>seccomp</B>()
|
|
|
|
calls results in different filters from this perspective.)
|
|
<DT id="19"><DD>
|
|
If any thread cannot synchronize to the same filter tree,
|
|
the call will not attach the new seccomp filter,
|
|
and will fail, returning the first thread ID found that cannot synchronize.
|
|
Synchronization will fail if another thread in the same process is in
|
|
<B>SECCOMP_MODE_STRICT</B>
|
|
|
|
or if it has attached new seccomp filters to itself,
|
|
diverging from the calling thread's filter tree.
|
|
<DT id="20"><B>SECCOMP_FILTER_FLAG_LOG</B> (since Linux 4.14)
|
|
|
|
<DD>
|
|
|
|
All filter return actions except
|
|
<B>SECCOMP_RET_ALLOW</B>
|
|
|
|
should be logged.
|
|
An administrator may override this filter flag by preventing specific
|
|
actions from being logged via the
|
|
<I>/proc/sys/kernel/seccomp/actions_logged</I>
|
|
|
|
file.
|
|
<DT id="21"><B>SECCOMP_FILTER_FLAG_SPEC_ALLOW</B> (since Linux 4.17)
|
|
|
|
<DD>
|
|
|
|
Disable Speculative Store Bypass mitigation.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="22"><B>SECCOMP_GET_ACTION_AVAIL</B> (since Linux 4.14)
|
|
|
|
<DD>
|
|
|
|
Test to see if an action is supported by the kernel.
|
|
This operation is helpful to confirm that the kernel knows
|
|
of a more recently added filter return action
|
|
since the kernel treats all unknown actions as
|
|
<B>SECCOMP_RET_KILL_PROCESS</B>.
|
|
|
|
<DT id="23"><DD>
|
|
The value of
|
|
<I>flags</I>
|
|
|
|
must be 0, and
|
|
<I>args</I>
|
|
|
|
must be a pointer to an unsigned 32-bit filter return action.
|
|
</DL>
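<P>
As a rough illustration of
<B>SECCOMP_GET_ACTION_AVAIL</B>,
the sketch below probes the kernel for a single return action.
The helper name
<I>seccomp_action_avail</I>
is hypothetical, and the call goes through
<B>syscall</B>(2)
on the assumption that the C library provides no
<B>seccomp</B>()
wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_GET_ACTION_AVAIL
#define SECCOMP_GET_ACTION_AVAIL 2      /* fallback for pre-4.14 headers */
#endif

/*
 * Hypothetical helper (not part of any library).
 * Returns 1 if the kernel recognizes 'action', 0 if the kernel reports
 * EOPNOTSUPP, and -1 on any other error (errno is left set).
 */
int seccomp_action_avail(uint32_t action)
{
    /* flags must be 0; args points to a 32-bit filter return action. */
    if (syscall(SYS_seccomp, SECCOMP_GET_ACTION_AVAIL, 0, &action) == 0)
        return 1;
    return (errno == EOPNOTSUPP) ? 0 : -1;
}
```

<P>
A caller might check
<I>seccomp_action_avail(SECCOMP_RET_LOG)</I>
before building a filter that relies on that action, and fall back to
a different action when the result is not 1.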
|
|
<A NAME="lbAE"> </A>
|
|
<H3>Filters</H3>
|
|
|
|
When adding filters via
|
|
<B>SECCOMP_SET_MODE_FILTER</B>,
|
|
|
|
<I>args</I>
|
|
|
|
points to a filter program:
|
|
<P>
|
|
|
|
|
|
|
|
struct sock_fprog {
|
|
<BR> unsigned short len; /* Number of BPF instructions */
|
|
<BR> struct sock_filter *filter; /* Pointer to array of
|
|
<BR> BPF instructions */
|
|
};
|
|
|
|
|
|
<P>
|
|
|
|
Each program must contain one or more BPF instructions:
|
|
<P>
|
|
|
|
|
|
|
|
struct sock_filter { /* Filter block */
|
|
<BR> __u16 code; /* Actual filter code */
|
|
<BR> __u8 jt; /* Jump true */
|
|
<BR> __u8 jf; /* Jump false */
|
|
<BR> __u32 k; /* Generic multiuse field */
|
|
};
|
|
|
|
|
|
<P>
|
|
|
|
When executing the instructions, the BPF program operates on the
system call information, which is made available (via the
<B>BPF_ABS</B>
addressing mode) as a read-only
buffer of the following form:
|
|
<P>
|
|
|
|
|
|
|
|
struct seccomp_data {
|
|
<BR> int nr; /* System call number */
|
|
<BR> __u32 arch; /* AUDIT_ARCH_* value
|
|
<BR> (see &lt;<A HREF="file:///usr/include/linux/audit.h">linux/audit.h</A>&gt;) */
|
|
<BR> __u64 instruction_pointer; /* CPU instruction pointer */
|
|
<BR> __u64 args[6]; /* Up to 6 system call arguments */
|
|
};
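<P>
To make the three structures above concrete, here is a minimal sketch
of a filter that loads fields from
<I>struct seccomp_data</I>
and returns an action.
The names
<I>build_filter</I>,
<I>install_filter</I>,
and
<I>FILTER_LEN</I>
are illustrative assumptions, not part of any API;
the filter makes one system call fail with a chosen error and allows
everything else on the expected architecture.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS 0x80000000U   /* fallback for old headers */
#endif

#define FILTER_LEN 7    /* number of instructions built below */

/* Fill 'prog' with a filter that fails system call 'nr' with 'err'. */
void build_filter(struct sock_filter prog[FILTER_LEN],
                  unsigned int arch, int nr, int err)
{
    const struct sock_filter insns[FILTER_LEN] = {
        /* [0] A = seccomp_data.arch */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        /* [1] arch matches: fall through to [2]; otherwise jump to [6] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, arch, 0, 4),
        /* [2] A = seccomp_data.nr */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* [3] nr matches: go to [4]; otherwise jump to [5] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int) nr, 0, 1),
        /* [4] fail the call; the error number travels in the data bits */
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | ((unsigned int) err & SECCOMP_RET_DATA)),
        /* [5] allow every other system call */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* [6] unexpected architecture: kill the process */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    };

    memcpy(prog, insns, sizeof(insns));
}

/* Set no_new_privs, then attach the filter; returns 0 on success. */
int install_filter(struct sock_filter *insns, unsigned short len)
{
    struct sock_fprog prog = { .len = len, .filter = insns };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return (int) syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog);
}
```

<P>
A caller would typically fill an array with
<I>build_filter()</I>
and then pass it, together with
<I>FILTER_LEN</I>,
to
<I>install_filter()</I>.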
|
|
|
|
|
|
<P>
|
|
|
|
Because numbering of system calls varies between architectures and
|
|
some architectures (e.g., x86-64) allow user-space code to use
|
|
the calling conventions of multiple architectures
|
|
(and the convention being used may vary over the life of a process that uses
|
|
<B><A HREF="/cgi-bin/man/man2html?2+execve">execve</A></B>(2)
|
|
|
|
to execute binaries that employ the different conventions),
|
|
it is usually necessary to verify the value of the
|
|
<I>arch</I>
|
|
|
|
field.
|
|
<P>
|
|
|
|
It is strongly recommended to use an allow-list approach whenever
possible because such an approach is simpler and more robust.
|
|
A deny-list will have to be updated whenever a potentially
|
|
dangerous system call is added (or a dangerous flag or option if those
|
|
are deny-listed), and it is often possible to alter the
|
|
representation of a value without altering its meaning, leading to
|
|
a deny-list bypass.
|
|
See also
|
|
<I>Caveats</I>
|
|
|
|
below.
|
|
<P>
|
|
|
|
The
|
|
<I>arch</I>
|
|
|
|
field is not unique for all calling conventions.
|
|
The x86-64 ABI and the x32 ABI both use
|
|
<B>AUDIT_ARCH_X86_64</B>
|
|
|
|
as
|
|
<I>arch</I>,
|
|
|
|
and they run on the same processors.
|
|
Instead, the mask
|
|
<B>__X32_SYSCALL_BIT</B>
|
|
|
|
is used on the system call number to tell the two ABIs apart.
|
|
<P>
|
|
|
|
This means that in order to create a seccomp-based
|
|
deny-list for system calls performed through the x86-64 ABI,
|
|
it is necessary to not only check that
|
|
<I>arch</I>
|
|
|
|
equals
|
|
<B>AUDIT_ARCH_X86_64</B>,
|
|
|
|
but also to explicitly reject all system calls that contain
|
|
<B>__X32_SYSCALL_BIT</B>
|
|
|
|
in
|
|
<I>nr</I>.
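<P>
The two checks described above might be sketched as the following
filter prologue.
The array name
<I>x86_64_prologue</I>
is an assumption made for this example, and the final
instruction is only a placeholder where the actual deny-list
comparisons would go.

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef __X32_SYSCALL_BIT
#define __X32_SYSCALL_BIT 0x40000000           /* fallback definition */
#endif
#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS 0x80000000U   /* fallback for old headers */
#endif

/* Illustrative prologue for an x86-64 deny-list filter. */
const struct sock_filter x86_64_prologue[] = {
    /* [0] A = seccomp_data.arch */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] arch is x86-64: continue at [2]; otherwise jump to [4] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 0, 2),
    /* [2] A = seccomp_data.nr */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [3] x32 bit set in nr: go to [4]; otherwise jump to [5] */
    BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, __X32_SYSCALL_BIT, 0, 1),
    /* [4] wrong architecture or x32 ABI: kill the process */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* [5] placeholder: deny-list comparisons would start here */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```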
|
|
|
|
<P>
|
|
|
|
The
|
|
<I>instruction_pointer</I>
|
|
|
|
field provides the address of the machine-language instruction that
|
|
performed the system call.
|
|
This might be useful in conjunction with the use of
|
|
<I>/proc/[pid]/maps</I>
|
|
|
|
to perform checks based on which region (mapping) of the program
|
|
made the system call.
|
|
(Probably, it is wise to lock down the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mmap">mmap</A></B>(2)
|
|
|
|
and
|
|
<B><A HREF="/cgi-bin/man/man2html?2+mprotect">mprotect</A></B>(2)
|
|
|
|
system calls to prevent the program from subverting such checks.)
|
|
<P>
|
|
|
|
When checking values from
|
|
<I>args</I>
|
|
|
|
against a deny-list, keep in mind that arguments are often
|
|
silently truncated before being processed, but after the seccomp check.
|
|
For example, this happens if the i386 ABI is used on an
|
|
x86-64 kernel: although the kernel will normally not look beyond
|
|
the 32 lowest bits of the arguments, the values of the full
|
|
64-bit registers will be present in the seccomp data.
|
|
A less surprising example is that if the x86-64 ABI is used to perform
|
|
a system call that takes an argument of type
|
|
<I>int</I>,
|
|
|
|
the more-significant half of the argument register is ignored by
|
|
the system call, but visible in the seccomp data.
|
|
<P>
|
|
|
|
A seccomp filter returns a 32-bit value consisting of two parts:
|
|
the most significant 16 bits
|
|
(corresponding to the mask defined by the constant
|
|
<B>SECCOMP_RET_ACTION_FULL</B>)
|
|
|
|
contain one of the "action" values listed below;
|
|
the least significant 16 bits (defined by the constant
|
|
<B>SECCOMP_RET_DATA</B>)
|
|
|
|
are "data" to be associated with this return value.
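<P>
The split between the two parts can be illustrated with a few small
helpers.
The function names are hypothetical; only the mask constants come from
the kernel headers.

```c
#include <stdint.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_RET_ACTION_FULL
#define SECCOMP_RET_ACTION_FULL 0xffff0000U    /* fallback for old headers */
#endif

/* Upper 16 bits: the action. */
uint32_t ret_action(uint32_t ret)
{
    return ret & SECCOMP_RET_ACTION_FULL;
}

/* Lower 16 bits: the data associated with the action. */
uint32_t ret_data(uint32_t ret)
{
    return ret & SECCOMP_RET_DATA;
}

/* Compose a return value that fails the system call with 'err'. */
uint32_t make_errno_ret(int err)
{
    return SECCOMP_RET_ERRNO | ((uint32_t) err & SECCOMP_RET_DATA);
}
```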
|
|
<P>
|
|
|
|
If multiple filters exist, they are <I>all</I> executed,
|
|
in reverse order of their addition to the filter tree; that is,
|
|
the most recently installed filter is executed first.
|
|
(Note that all filters will be called
|
|
even if one of the earlier filters returns
|
|
<B>SECCOMP_RET_KILL</B>.
|
|
|
|
This is done to simplify the kernel code and to provide a
|
|
tiny speed-up in the execution of sets of filters by
|
|
avoiding a check for this uncommon case.)
|
|
The return value for the evaluation of a given system call is the first-seen
|
|
action value of highest precedence (along with its accompanying data)
|
|
returned by execution of all of the filters.
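<P>
The selection rule can be modeled in user space as follows.
This is only a sketch of the documented behavior, not kernel code;
<I>combine_results</I>
and
<I>action_rank</I>
are hypothetical names.
It relies on the numeric values of the action constants: read as
signed 32-bit integers, a smaller value corresponds to a higher
precedence.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_RET_ACTION_FULL
#define SECCOMP_RET_ACTION_FULL 0xffff0000U    /* fallback for old headers */
#endif

/* Signed view of the action bits: smaller means higher precedence. */
static int32_t action_rank(uint32_t ret)
{
    return (int32_t) (ret & SECCOMP_RET_ACTION_FULL);
}

/*
 * 'rets' holds the values returned by each filter, in execution order
 * (most recently installed filter first).  The result is the first-seen
 * value whose action has the highest precedence.
 */
uint32_t combine_results(const uint32_t *rets, size_t n)
{
    uint32_t best = SECCOMP_RET_ALLOW;

    for (size_t i = 0; i < n; i++)
        if (action_rank(rets[i]) < action_rank(best))
            best = rets[i];     /* strict '<' keeps the first-seen value */
    return best;
}
```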
|
|
<P>
|
|
|
|
In decreasing order of precedence,
|
|
the action values that may be returned by a seccomp filter are:
|
|
<DL COMPACT>
|
|
<DT id="24"><B>SECCOMP_RET_KILL_PROCESS</B> (since Linux 4.14)
|
|
|
|
<DD>
|
|
|
|
|
|
This value results in immediate termination of the process,
|
|
with a core dump.
|
|
The system call is not executed.
|
|
By contrast with
|
|
<B>SECCOMP_RET_KILL_THREAD</B>
|
|
|
|
below, all threads in the thread group are terminated.
|
|
(For a discussion of thread groups, see the description of the
|
|
<B>CLONE_THREAD</B>
|
|
|
|
flag in
|
|
<B><A HREF="/cgi-bin/man/man2html?2+clone">clone</A></B>(2).)
|
|
|
|
<DT id="25"><DD>
|
|
The process terminates
|
|
<I>as though</I>
|
|
|
|
killed by a
|
|
<B>SIGSYS</B>
|
|
|
|
signal.
|
|
Even if a signal handler has been registered for
|
|
<B>SIGSYS</B>,
|
|
|
|
the handler will be ignored in this case and the process always terminates.
|
|
To a parent process that is waiting on this process (using
|
|
<B><A HREF="/cgi-bin/man/man2html?2+waitpid">waitpid</A></B>(2)
|
|
|
|
or similar), the returned
|
|
<I>wstatus</I>
|
|
|
|
will indicate that its child was terminated as though by a
|
|
<B>SIGSYS</B>
|
|
|
|
signal.
|
|
<DT id="26"><B>SECCOMP_RET_KILL_THREAD</B> (or <B>SECCOMP_RET_KILL</B>)
|
|
|
|
<DD>
|
|
This value results in immediate termination of the thread
|
|
that made the system call.
|
|
The system call is not executed.
|
|
Other threads in the same thread group will continue to execute.
|
|
<DT id="27"><DD>
|
|
The thread terminates
|
|
<I>as though</I>
|
|
|
|
killed by a
|
|
<B>SIGSYS</B>
|
|
|
|
signal.
|
|
See
|
|
<B>SECCOMP_RET_KILL_PROCESS</B>
|
|
|
|
above.
|
|
<DT id="28"><DD>
|
|
Before Linux 4.11,
|
|
any process terminated in this way would not trigger a core dump
|
|
(even though
|
|
<B>SIGSYS</B>
|
|
|
|
is documented in
|
|
<B><A HREF="/cgi-bin/man/man2html?7+signal">signal</A></B>(7)
|
|
|
|
as having a default action of termination with a core dump).
|
|
Since Linux 4.11,
|
|
a single-threaded process will dump core if terminated in this way.
|
|
<DT id="29"><DD>
|
|
With the addition of
|
|
<B>SECCOMP_RET_KILL_PROCESS</B>
|
|
|
|
in Linux 4.14,
|
|
<B>SECCOMP_RET_KILL_THREAD</B>
|
|
|
|
was added as a synonym for
|
|
<B>SECCOMP_RET_KILL</B>,
|
|
|
|
in order to more clearly distinguish the two actions.
|
|
<DT id="30"><B>SECCOMP_RET_TRAP</B>
|
|
|
|
<DD>
|
|
This value results in the kernel sending a thread-directed
|
|
<B>SIGSYS</B>
|
|
|
|
signal to the triggering thread.
|
|
(The system call is not executed.)
|
|
Various fields will be set in the
|
|
<I>siginfo_t</I>
|
|
|
|
structure (see
|
|
<B><A HREF="/cgi-bin/man/man2html?2+sigaction">sigaction</A></B>(2))
|
|
|
|
associated with the signal:
|
|
<DL COMPACT><DT id="31"><DD>
|
|
<DL COMPACT>
|
|
<DT id="32">*<DD>
|
|
<I>si_signo</I>
|
|
|
|
will contain
|
|
<B>SIGSYS</B>.
|
|
|
|
<DT id="33">*<DD>
|
|
<I>si_call_addr</I>
|
|
|
|
will show the address of the system call instruction.
|
|
<DT id="34">*<DD>
|
|
<I>si_syscall</I>
|
|
|
|
and
|
|
<I>si_arch</I>
|
|
|
|
will indicate which system call was attempted.
|
|
<DT id="35">*<DD>
|
|
<I>si_code</I>
|
|
|
|
will contain
|
|
<B>SYS_SECCOMP</B>.
|
|
|
|
<DT id="36">*<DD>
|
|
<I>si_errno</I>
|
|
|
|
will contain the
|
|
<B>SECCOMP_RET_DATA</B>
|
|
|
|
portion of the filter return value.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="37"><DD>
|
|
The program counter will be as though the system call happened
|
|
(i.e., the program counter will not point to the system call instruction).
|
|
The return value register will contain an architecture-dependent value;
|
|
if resuming execution, set it to something appropriate for the system call.
|
|
(The architecture dependency is because replacing it with
|
|
<B>ENOSYS</B>
|
|
|
|
could overwrite some useful information.)
|
|
<DT id="38"><B>SECCOMP_RET_ERRNO</B>
|
|
|
|
<DD>
|
|
This value results in the
|
|
<B>SECCOMP_RET_DATA</B>
|
|
|
|
portion of the filter's return value being passed to user space as the
|
|
<I>errno</I>
|
|
|
|
value without executing the system call.
|
|
<DT id="39"><B>SECCOMP_RET_TRACE</B>
|
|
|
|
<DD>
|
|
When returned, this value will cause the kernel to attempt to notify a
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ptrace">ptrace</A></B>(2)-based
|
|
|
|
tracer prior to executing the system call.
|
|
If there is no tracer present,
|
|
the system call is not executed and returns a failure status with
|
|
<I>errno</I>
|
|
|
|
set to
|
|
<B>ENOSYS</B>.
|
|
|
|
<DT id="40"><DD>
|
|
A tracer will be notified if it requests
|
|
<B>PTRACE_O_TRACESECCOMP</B>
|
|
|
|
using
|
|
<I>ptrace(PTRACE_SETOPTIONS)</I>.
|
|
|
|
The tracer will be notified of a
|
|
<B>PTRACE_EVENT_SECCOMP</B>
|
|
|
|
and the
|
|
<B>SECCOMP_RET_DATA</B>
|
|
|
|
portion of the filter's return value will be available to the tracer via
|
|
<B>PTRACE_GETEVENTMSG</B>.
|
|
|
|
<DT id="41"><DD>
|
|
The tracer can skip the system call by changing the system call number
|
|
to -1.
|
|
Alternatively, the tracer can change the system call
|
|
requested by changing the system call to a valid system call number.
|
|
If the tracer asks to skip the system call, then the system call will
|
|
appear to return the value that the tracer puts in the return value register.
|
|
<DT id="42"><DD>
|
|
Before Linux 4.8, the seccomp check is not run again after the tracer is
notified.
(This means that, on older kernels, seccomp-based sandboxes
<B>must not</B>
allow use of
<B><A HREF="/cgi-bin/man/man2html?2+ptrace">ptrace</A></B>(2), even
of other
sandboxed processes, without extreme care;
ptracers can use this mechanism to escape from the seccomp sandbox.)
|
|
<DT id="43"><B>SECCOMP_RET_LOG</B> (since Linux 4.14)
|
|
|
|
<DD>
|
|
|
|
This value results in the system call being executed after
|
|
the filter return action is logged.
|
|
An administrator may override the logging of this action via
|
|
the
|
|
<I>/proc/sys/kernel/seccomp/actions_logged</I>
|
|
|
|
file.
|
|
<DT id="44"><B>SECCOMP_RET_ALLOW</B>
|
|
|
|
<DD>
|
|
This value results in the system call being executed.
|
|
</DL>
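<P>
For
<B>SECCOMP_RET_TRAP</B>,
a handler along the following lines could pick up the
<I>siginfo_t</I>
fields listed above.
This is a sketch only: the handler and the two global variables are
hypothetical names, and the fallback value for
<B>SYS_SECCOMP</B>
is an assumption for C libraries that do not define it.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1   /* assumed si_code value for seccomp-generated SIGSYS */
#endif

/* Last trapped system call and filter data, recorded by the handler. */
volatile sig_atomic_t trapped_syscall = -1;
volatile sig_atomic_t trapped_data = -1;

void sigsys_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void) ucontext;

    /* Ignore SIGSYS signals that did not originate from a seccomp filter. */
    if (sig != SIGSYS || info->si_code != SYS_SECCOMP)
        return;

    trapped_syscall = info->si_syscall;   /* which system call was attempted */
    trapped_data = info->si_errno;        /* SECCOMP_RET_DATA from the filter */
}

/* Register the handler with SA_SIGINFO; returns 0 on success. */
int install_sigsys_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigsys_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSYS, &sa, NULL);
}
```

<P>
The handler only stores values in
<I>sig_atomic_t</I>
variables, which keeps it async-signal-safe; any reporting would be
done later by the main program.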
|
|
<P>
|
|
|
|
If an action value other than one of the above is specified,
|
|
then the filter action is treated as either
|
|
<B>SECCOMP_RET_KILL_PROCESS</B>
|
|
|
|
(since Linux 4.14)
|
|
|
|
or
|
|
<B>SECCOMP_RET_KILL_THREAD</B>
|
|
|
|
(in Linux 4.13 and earlier).
|
|
|
|
<A NAME="lbAF"> </A>
|
|
<H3>/proc interfaces</H3>
|
|
|
|
The files in the directory
|
|
<I>/proc/sys/kernel/seccomp</I>
|
|
|
|
provide additional seccomp information and configuration:
|
|
<DL COMPACT>
|
|
<DT id="45"><I>actions_avail</I> (since Linux 4.14)
|
|
|
|
<DD>
|
|
|
|
A read-only ordered list of seccomp filter return actions in string form.
|
|
The ordering, from left-to-right, is in decreasing order of precedence.
|
|
The list represents the set of seccomp filter return actions
|
|
supported by the kernel.
|
|
<DT id="46"><I>actions_logged</I> (since Linux 4.14)
|
|
|
|
<DD>
|
|
|
|
A read-write ordered list of seccomp filter return actions that
|
|
are allowed to be logged.
|
|
Writes to the file do not need to be in ordered form but reads from
|
|
the file will be ordered in the same way as the
|
|
<I>actions_avail</I>
|
|
|
|
file.
|
|
<DT id="47"><DD>
|
|
Note that the value of
|
|
<I>actions_logged</I>
|
|
|
|
does not prevent certain filter return actions from being logged when
|
|
the audit subsystem is configured to audit a task.
|
|
If the action is not found in the
|
|
<I>actions_logged</I>
|
|
|
|
file, the audit subsystem makes the final decision on whether to audit
the action for that task, for all filter return actions other than
|
|
<B>SECCOMP_RET_ALLOW</B>.
|
|
|
|
<DT id="48"><DD>
|
|
The "allow" string is not accepted in the
|
|
<I>actions_logged</I>
|
|
|
|
file as it is not possible to log
|
|
<B>SECCOMP_RET_ALLOW</B>
|
|
|
|
actions.
|
|
Attempting to write "allow" to the file will fail with the error
|
|
<B>EINVAL</B>.
|
|
|
|
|
|
</DL>
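<P>
As an alternative to
<B>SECCOMP_GET_ACTION_AVAIL</B>,
a program could consult
<I>actions_avail</I>
directly.
The sketch below assumes the file holds a single line of
whitespace-separated action names (such as "trap" or "errno");
both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if 'name' appears as a whole whitespace-separated word in 'list'. */
int list_has_action(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        p += strspn(p, " \t\n");            /* skip separators */
        size_t w = strcspn(p, " \t\n");     /* length of the current word */
        if (n > 0 && w == n && strncmp(p, name, n) == 0)
            return 1;
        p += w;
    }
    return 0;
}

/* Return 1 or 0 as above, or -1 if the file cannot be read. */
int kernel_lists_action(const char *name)
{
    char buf[256];
    FILE *fp = fopen("/proc/sys/kernel/seccomp/actions_avail", "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return list_has_action(buf, name);
}
```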
|
|
<A NAME="lbAG"> </A>
|
|
<H3>Audit logging of seccomp actions</H3>
|
|
|
|
|
|
Since Linux 4.14, the kernel provides the facility to log the
|
|
actions returned by seccomp filters in the audit log.
|
|
The kernel makes the decision to log an action based on
|
|
the action type, whether or not the action is present in the
|
|
<I>actions_logged</I>
|
|
|
|
file, and whether kernel auditing is enabled
|
|
(e.g., via the kernel boot option
|
|
<I>audit=1</I>).
|
|
|
|
|
|
The rules are as follows:
|
|
<DL COMPACT>
|
|
<DT id="49">*<DD>
|
|
If the action is
|
|
<B>SECCOMP_RET_ALLOW</B>,
|
|
|
|
the action is not logged.
|
|
<DT id="50">*<DD>
|
|
Otherwise, if the action is either
|
|
<B>SECCOMP_RET_KILL_PROCESS</B>
|
|
|
|
or
|
|
<B>SECCOMP_RET_KILL_THREAD</B>,
|
|
|
|
and that action appears in the
|
|
<I>actions_logged</I>
|
|
|
|
file, the action is logged.
|
|
<DT id="51">*<DD>
|
|
Otherwise, if the filter has requested logging (the
|
|
<B>SECCOMP_FILTER_FLAG_LOG</B>
|
|
|
|
flag)
|
|
and the action appears in the
|
|
<I>actions_logged</I>
|
|
|
|
file, the action is logged.
|
|
<DT id="52">*<DD>
|
|
Otherwise, if kernel auditing is enabled and the process is being audited
|
|
(<B><A HREF="/cgi-bin/man/man2html?8+autrace">autrace</A></B>(8)),
|
|
|
|
the action is logged.
|
|
<DT id="53">*<DD>
|
|
Otherwise, the action is not logged.
|
|
</DL>
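<P>
The rules above can be restated as a small decision function.
This is a user-space model of the list as written here, not the
kernel's implementation; the function and parameter names are
hypothetical, and
<I>action</I>
is expected to be an action value without data bits.

```c
#include <stdbool.h>
#include <stdint.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS 0x80000000U   /* fallback for old headers */
#endif
#ifndef SECCOMP_RET_KILL_THREAD
#define SECCOMP_RET_KILL_THREAD 0x00000000U    /* fallback for old headers */
#endif

bool seccomp_action_logged(uint32_t action,
                           bool in_actions_logged,  /* listed in actions_logged */
                           bool filter_flag_log,    /* SECCOMP_FILTER_FLAG_LOG set */
                           bool task_audited)       /* auditing on, task audited */
{
    bool is_kill = (action == SECCOMP_RET_KILL_PROCESS ||
                    action == SECCOMP_RET_KILL_THREAD);

    if (action == SECCOMP_RET_ALLOW)
        return false;                   /* rule 1: never logged */
    if (is_kill && in_actions_logged)
        return true;                    /* rule 2: kill actions */
    if (filter_flag_log && in_actions_logged)
        return true;                    /* rule 3: filter requested logging */
    if (task_audited)
        return true;                    /* rule 4: audit subsystem decides */
    return false;                       /* rule 5: otherwise not logged */
}
```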
|
|
<A NAME="lbAH"> </A>
|
|
<H2>RETURN VALUE</H2>
|
|
|
|
On success,
|
|
<B>seccomp</B>()
|
|
|
|
returns 0.
|
|
On error, if
|
|
<B>SECCOMP_FILTER_FLAG_TSYNC</B>
|
|
|
|
was used,
|
|
the return value is the ID of the thread
|
|
that caused the synchronization failure.
|
|
(This ID is a kernel thread ID of the type returned by
|
|
<B><A HREF="/cgi-bin/man/man2html?2+clone">clone</A></B>(2)
|
|
|
|
and
|
|
<B><A HREF="/cgi-bin/man/man2html?2+gettid">gettid</A></B>(2).)
|
|
|
|
On other errors, -1 is returned, and
|
|
<I>errno</I>
|
|
|
|
is set to indicate the cause of the error.
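<P>
Because a positive return value is possible when
<B>SECCOMP_FILTER_FLAG_TSYNC</B>
is used, callers need to distinguish three outcomes rather than two.
A minimal sketch, with hypothetical enumeration and function names:

```c
/* Possible outcomes of a seccomp() call made with SECCOMP_FILTER_FLAG_TSYNC. */
enum tsync_outcome {
    OUTCOME_OK,             /* 0: filter attached and threads synchronized */
    OUTCOME_THREAD_BLOCKED, /* > 0: ID of a thread that could not synchronize */
    OUTCOME_ERROR           /* -1: consult errno */
};

enum tsync_outcome classify_seccomp_return(long ret)
{
    if (ret == 0)
        return OUTCOME_OK;
    if (ret > 0)
        return OUTCOME_THREAD_BLOCKED;  /* 'ret' is a kernel thread ID */
    return OUTCOME_ERROR;
}
```

<P>
Testing only for -1 would treat a synchronization failure as success,
which is why the positive case is handled separately here.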
|
|
<A NAME="lbAI"> </A>
|
|
<H2>ERRORS</H2>
|
|
|
|
<B>seccomp</B>()
|
|
|
|
can fail for the following reasons:
|
|
<DL COMPACT>
|
|
<DT id="54"><B>EACCES</B>
|
|
|
|
<DD>
|
|
The caller did not have the
|
|
<B>CAP_SYS_ADMIN</B>
|
|
|
|
capability in its user namespace, or had not set
|
|
<I>no_new_privs</I>
|
|
|
|
before using
|
|
<B>SECCOMP_SET_MODE_FILTER</B>.
|
|
|
|
<DT id="55"><B>EFAULT</B>
|
|
|
|
<DD>
|
|
<I>args</I>
|
|
|
|
was not a valid address.
|
|
<DT id="56"><B>EINVAL</B>
|
|
|
|
<DD>
|
|
<I>operation</I>
|
|
|
|
is unknown or is not supported by this kernel version or configuration.
|
|
<DT id="57"><B>EINVAL</B>
|
|
|
|
<DD>
|
|
The specified
|
|
<I>flags</I>
|
|
|
|
are invalid for the given
|
|
<I>operation</I>.
|
|
|
|
<DT id="58"><B>EINVAL</B>
|
|
|
|
<DD>
|
|
The filter program pointed to by
<I>args</I>
included a
<B>BPF_ABS</B>
load,
|
|
|
|
but the specified offset was not aligned to a 32-bit boundary or exceeded
|
|
<I>sizeof(struct seccomp_data)</I>.
|
|
|
|
<DT id="59"><B>EINVAL</B>
|
|
|
|
<DD>
|
|
|
|
A secure computing mode has already been set, and
|
|
<I>operation</I>
|
|
|
|
differs from the existing setting.
|
|
<DT id="60"><B>EINVAL</B>
|
|
|
|
<DD>
|
|
<I>operation</I>
|
|
|
|
specified
|
|
<B>SECCOMP_SET_MODE_FILTER</B>,
|
|
|
|
but the filter program pointed to by
|
|
<I>args</I>
|
|
|
|
was not valid or the length of the filter program was zero or exceeded
|
|
<B>BPF_MAXINSNS</B>
|
|
|
|
(4096) instructions.
|
|
<DT id="61"><B>ENOMEM</B>
|
|
|
|
<DD>
|
|
Out of memory.
|
|
<DT id="62"><B>ENOMEM</B>
|
|
|
|
<DD>
|
|
|
|
The total length of all filter programs attached
|
|
to the calling thread would exceed
|
|
<B>MAX_INSNS_PER_PATH</B>
|
|
|
|
(32768) instructions.
|
|
Note that for the purposes of calculating this limit,
|
|
each already existing filter program incurs an
|
|
overhead penalty of 4 instructions.
|
|
<DT id="63"><B>EOPNOTSUPP</B>
|
|
|
|
<DD>
|
|
<I>operation</I>
|
|
|
|
specified
|
|
<B>SECCOMP_GET_ACTION_AVAIL</B>,
|
|
|
|
but the kernel does not support the filter return action specified by
|
|
<I>args</I>.
|
|
|
|
<DT id="64"><B>ESRCH</B>
|
|
|
|
<DD>
|
|
Another thread caused a failure during thread sync, but its ID could not
|
|
be determined.
|
|
</DL>
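<P>
The second
<B>ENOMEM</B>
case can be estimated ahead of time with a calculation like the one
below.
It is a sketch based on the figures quoted above; the constants are
defined locally for the example on the assumption that the user-space
headers do not provide them, and the function name is hypothetical.

```c
#include <stddef.h>

#define MAX_INSNS_PER_PATH  32768   /* limit quoted above (local definition) */
#define PER_FILTER_PENALTY  4       /* overhead per already attached filter */

/*
 * Return 1 if attaching a filter of 'new_len' instructions would push
 * the total over the limit, given the lengths of the filters that are
 * already attached; return 0 otherwise.
 */
int would_exceed_insn_limit(const unsigned short *existing_lens, size_t n,
                            unsigned short new_len)
{
    unsigned long total = new_len;

    for (size_t i = 0; i < n; i++)
        total += (unsigned long) existing_lens[i] + PER_FILTER_PENALTY;
    return total > MAX_INSNS_PER_PATH;
}
```

<P>
For example, with seven attached filters of 4096 instructions each
(7 * 4100 = 28700 counted instructions), a new filter of up to 4068
instructions still fits, while one of 4069 does not.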
|
|
<A NAME="lbAJ"> </A>
|
|
<H2>VERSIONS</H2>
|
|
|
|
The
|
|
<B>seccomp</B>()
|
|
|
|
system call first appeared in Linux 3.17.
|
|
|
|
<A NAME="lbAK"> </A>
|
|
<H2>CONFORMING TO</H2>
|
|
|
|
The
|
|
<B>seccomp</B>()
|
|
|
|
system call is a nonstandard Linux extension.
|
|
<A NAME="lbAL"> </A>
|
|
<H2>NOTES</H2>
|
|
|
|
Rather than hand-coding seccomp filters as shown in the example below,
|
|
you may prefer to employ the
|
|
<I>libseccomp</I>
|
|
|
|
library, which provides a front-end for generating seccomp filters.
|
|
<P>
|
|
|
|
The
|
|
<I>Seccomp</I>
|
|
|
|
field of the
|
|
<I>/proc/[pid]/status</I>
|
|
|
|
file provides a method of viewing the seccomp mode of a process; see
|
|
<B><A HREF="/cgi-bin/man/man2html?5+proc">proc</A></B>(5).
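<P>
A program can read this field for itself, for instance as sketched
below.
The function names are hypothetical; the parser assumes a line of the
form "Seccomp:" followed by white space and a number
(0 for disabled, 1 for strict mode, 2 for filter mode).

```c
#include <stdio.h>
#include <string.h>

/* Scan status-file text for the Seccomp field; return its value or -1. */
int parse_seccomp_mode(const char *status_text)
{
    const char *p = status_text;

    while (p != NULL && *p != '\0') {
        int mode;

        /* The literal ':' keeps longer field names from matching. */
        if (sscanf(p, "Seccomp: %d", &mode) == 1)
            return mode;
        p = strchr(p, '\n');            /* advance to the next line */
        if (p != NULL)
            p++;
    }
    return -1;
}

/* Return the seccomp mode of the calling process, or -1 on failure. */
int current_seccomp_mode(void)
{
    char line[256];
    int mode = -1;
    FILE *fp = fopen("/proc/self/status", "r");

    if (fp == NULL)
        return -1;
    while (mode == -1 && fgets(line, sizeof(line), fp) != NULL)
        mode = parse_seccomp_mode(line);
    fclose(fp);
    return mode;
}
```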
|
|
|
|
<P>
|
|
|
|
<B>seccomp</B>()
|
|
|
|
provides a superset of the functionality provided by the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+prctl">prctl</A></B>(2)
|
|
|
|
<B>PR_SET_SECCOMP</B>
|
|
|
|
operation (which does not support
|
|
<I>flags</I>).
|
|
|
|
<P>
|
|
|
|
Since Linux 4.4, the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ptrace">ptrace</A></B>(2)
|
|
|
|
<B>PTRACE_SECCOMP_GET_FILTER</B>
|
|
|
|
operation can be used to dump a process's seccomp filters.
|
|
|
|
<A NAME="lbAM"> </A>
|
|
<H3>Architecture support for seccomp BPF</H3>
|
|
|
|
Seccomp BPF filtering
is available on the following architectures:
|
|
<DL COMPACT>
|
|
<DT id="65">*<DD>
|
|
x86-64, i386, x32 (since Linux 3.5)
|
|
|
|
<DT id="66">*<DD>
|
|
ARM (since Linux 3.8)
|
|
<DT id="67">*<DD>
|
|
s390 (since Linux 3.8)
|
|
<DT id="68">*<DD>
|
|
MIPS (since Linux 3.16)
|
|
<DT id="69">*<DD>
|
|
ARM-64 (since Linux 3.19)
|
|
<DT id="70">*<DD>
|
|
PowerPC (since Linux 4.3)
|
|
<DT id="71">*<DD>
|
|
Tile (since Linux 4.3)
|
|
<DT id="72">*<DD>
|
|
PA-RISC (since Linux 4.6)
|
|
|
|
|
|
|
|
</DL>
|
|
<A NAME="lbAN"> </A>
|
|
<H3>Caveats</H3>
|
|
|
|
There are various subtleties to consider when applying seccomp filters
|
|
to a program, including the following:
|
|
<DL COMPACT>
|
|
<DT id="73">*<DD>
|
|
Some traditional system calls have user-space implementations in the
|
|
<B><A HREF="/cgi-bin/man/man2html?7+vdso">vdso</A></B>(7)
|
|
|
|
on many architectures.
|
|
Notable examples include
|
|
<B><A HREF="/cgi-bin/man/man2html?2+clock_gettime">clock_gettime</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+gettimeofday">gettimeofday</A></B>(2),
|
|
|
|
and
|
|
<B><A HREF="/cgi-bin/man/man2html?2+time">time</A></B>(2).
|
|
|
|
On such architectures,
|
|
seccomp filtering for these system calls will have no effect.
|
|
(However, there are cases where the
|
|
<B><A HREF="/cgi-bin/man/man2html?7+vdso">vdso</A></B>(7)
|
|
|
|
implementations may fall back to invoking the true system call,
|
|
in which case seccomp filters would see the system call.)
|
|
<DT id="74">*<DD>
|
|
Seccomp filtering is based on system call numbers.
|
|
However, applications typically do not directly invoke system calls,
|
|
but instead call wrapper functions in the C library which
|
|
in turn invoke the system calls.
|
|
Consequently, one must be aware of the following:
|
|
<DL COMPACT><DT id="75"><DD>
|
|
<DL COMPACT>
|
|
<DT id="76">•<DD>
|
|
The glibc wrappers for some traditional system calls may actually
|
|
employ system calls with different names in the kernel.
|
|
For example, the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+exit">exit</A></B>(2)
|
|
|
|
wrapper function actually employs the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+exit_group">exit_group</A></B>(2)
|
|
|
|
system call, and the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+fork">fork</A></B>(2)
|
|
|
|
wrapper function actually calls
|
|
<B><A HREF="/cgi-bin/man/man2html?2+clone">clone</A></B>(2).
|
|
|
|
<DT id="77">•<DD>
|
|
The behavior of wrapper functions may vary across architectures,
|
|
according to the range of system calls provided on those architectures.
|
|
In other words, the same wrapper function may invoke
|
|
different system calls on different architectures.
|
|
<DT id="78">•<DD>
|
|
Finally, the behavior of wrapper functions can change across glibc versions.
|
|
For example, in older versions, the glibc wrapper function for
|
|
<B><A HREF="/cgi-bin/man/man2html?2+open">open</A></B>(2)
|
|
|
|
invoked the system call of the same name,
|
|
but starting in glibc 2.26, the implementation switched to calling
|
|
<B><A HREF="/cgi-bin/man/man2html?2+openat">openat</A></B>(2)
|
|
|
|
on all architectures.
|
|
</DL>
|
|
</DL>
|
|
|
|
</DL>
|
|
<P>
|
|
|
|
The consequence of the above points is that it may be necessary
|
|
to filter for a system call other than the one that might be expected.
|
|
Various manual pages in Section 2 provide helpful details
|
|
about the differences between wrapper functions and
|
|
the underlying system calls in subsections entitled
|
|
<I>C library/kernel differences</I>.
|
|
|
|
<P>
|
|
|
|
Furthermore, note that applying seccomp filters
can itself introduce bugs in an application
when the filters cause unexpected failures for legitimate operations
that the application might need to perform.
|
|
Such bugs may not easily be discovered when testing the seccomp
|
|
filters if the bugs occur in rarely used application code paths.
|
|
|
|
<A NAME="lbAO"> </A>
|
|
<H3>Seccomp-specific BPF details</H3>
|
|
|
|
Note the following BPF details specific to seccomp filters:
|
|
<DL COMPACT>
|
|
<DT id="79">*<DD>
|
|
The
|
|
<B>BPF_H</B>
|
|
|
|
and
|
|
<B>BPF_B</B>
|
|
|
|
size modifiers are not supported: all operations must load and store
|
|
(4-byte) words
|
|
(<B>BPF_W</B>).
|
|
|
|
<DT id="80">*<DD>
|
|
To access the contents of the
|
|
<I>seccomp_data</I>
|
|
|
|
buffer, use the
|
|
<B>BPF_ABS</B>
|
|
|
|
addressing mode modifier.
|
|
<DT id="81">*<DD>
|
|
The
|
|
<B>BPF_LEN</B>
|
|
|
|
addressing mode modifier yields an immediate mode operand
|
|
whose value is the size of the
|
|
<I>seccomp_data</I>
|
|
|
|
buffer.
|
|
</DL>
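<P>
The details above can be sketched as a handful of load instructions.
The array name
<I>load_examples</I>
is illustrative only, and the fragment is not a complete filter
(it contains no return instruction).
Because only 4-byte loads are available,
each 64-bit system call argument has to be read as two separate words.

```c
#include <stddef.h>
#include <linux/filter.h>    /* struct sock_filter, BPF_STMT() */
#include <linux/seccomp.h>   /* struct seccomp_data */

/* Illustrative instruction fragments only; not a complete filter. */
static const struct sock_filter load_examples[] = {
    /* Load the 4-byte system call number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
             offsetof(struct seccomp_data, nr)),

    /* Load the 4-byte architecture value */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
             offsetof(struct seccomp_data, arch)),

    /* Load the first 4 bytes of the 64-bit args[0]; on a
       little-endian machine these are the low-order 32 bits */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
             offsetof(struct seccomp_data, args[0])),

    /* Load the size of the seccomp_data buffer */
    BPF_STMT(BPF_LD | BPF_W | BPF_LEN, 0),
};
```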
|
|
<A NAME="lbAP"> </A>
|
|
<H2>EXAMPLE</H2>
|
|
|
|
The program below accepts four or more arguments.
|
|
The first three arguments are a system call number,
|
|
a numeric architecture identifier, and an error number.
|
|
The program uses these values to construct a BPF filter
|
|
that is used at run time to perform the following checks:
|
|
<DL COMPACT>
|
|
<DT id="82">[1]<DD>
|
|
If the program is not running on the specified architecture,
the BPF filter kills the thread that makes the system call
(the filter returns
<B>SECCOMP_RET_KILL</B>).
|
|
|
|
<DT id="83">[2]<DD>
|
|
If the program attempts to execute the system call with the specified number,
|
|
the BPF filter causes the system call to fail, with
|
|
<I>errno</I>
|
|
|
|
being set to the specified error number.
|
|
</DL>
|
|
<P>
|
|
|
|
The remaining command-line arguments specify
|
|
the pathname and additional arguments of a program
|
|
that the example program should attempt to execute using
|
|
<B><A HREF="/cgi-bin/man/man2html?3+execv">execv</A></B>(3)
|
|
|
|
(a library function that employs the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+execve">execve</A></B>(2)
|
|
|
|
system call).
|
|
Some example runs of the program are shown below.
|
|
<P>
|
|
|
|
First, we display the architecture that we are running on (x86-64)
|
|
and then construct a shell function that looks up system call
|
|
numbers on this architecture:
|
|
<P>
|
|
|
|
|
|
|
|
$ <B>uname -m</B>
x86_64
$ <B>syscall_nr() {
<BR>    cat /usr/src/linux/arch/x86/entry/syscalls/syscall_64.tbl | \
<BR>    awk '$2 != "x32" && $3 == "'$1'" { print $1 }'
}</B>
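<P>
Where the kernel source tree is not installed,
the same numbers can be obtained from the
<B>SYS_*</B>
constants in
<I>&lt;sys/syscall.h&gt;</I>.
The following is a minimal sketch;
<I>syscall_nr_by_name</I>
is a hypothetical helper that covers only the three system calls
used in the example runs.

```c
#include <string.h>
#include <sys/syscall.h>   /* SYS_* system call numbers */

/* Return the number of the named system call for the ABI this code is
   compiled for, or -1 if the name is not one of the three handled here.
   syscall_nr_by_name() is a hypothetical helper for illustration. */
static long
syscall_nr_by_name(const char *name)
{
    if (strcmp(name, "execve") == 0)
        return SYS_execve;
    if (strcmp(name, "write") == 0)
        return SYS_write;
    if (strcmp(name, "preadv") == 0)
        return SYS_preadv;
    return -1;
}
```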
|
|
|
|
|
|
<P>
|
|
|
|
When the BPF filter rejects a system call (case [2] above),
|
|
it causes the system call to fail with the error number
|
|
specified on the command line.
|
|
In the experiments shown here, we'll use error number 99:
|
|
<P>
|
|
|
|
|
|
|
|
$ <B>errno 99</B>
EADDRNOTAVAIL 99 Cannot assign requested address
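<P>
The filter passes the error number to the kernel in the low 16 bits of
its return value.
A minimal sketch of that encoding follows;
<I>errno_action</I>
is a hypothetical helper name.

```c
#include <linux/seccomp.h>   /* SECCOMP_RET_ERRNO, SECCOMP_RET_DATA */

/* Combine the SECCOMP_RET_ERRNO action with an error number; only the
   bits covered by SECCOMP_RET_DATA are kept.
   errno_action() is a hypothetical helper for illustration. */
static unsigned int
errno_action(unsigned int err)
{
    return SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA);
}
```

For error number 99, the resulting filter return value is 0x00050063.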
|
|
|
|
|
|
<P>
|
|
|
|
In the following example, we attempt to run the command
|
|
<B><A HREF="/cgi-bin/man/man2html?1+whoami">whoami</A></B>(1),
|
|
|
|
but the BPF filter rejects the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+execve">execve</A></B>(2)
|
|
|
|
system call, so that the command is not even executed:
|
|
<P>
|
|
|
|
|
|
|
|
$ <B>syscall_nr execve</B>
59
$ <B>./a.out</B>
Usage: ./a.out <syscall_nr> <arch> <errno> <prog> [<args>]
Hint for <arch>: AUDIT_ARCH_I386: 0x40000003
<BR>                 AUDIT_ARCH_X86_64: 0xC000003E
$ <B>./a.out 59 0xC000003E 99 /bin/whoami</B>
execv: Cannot assign requested address
|
|
|
|
|
|
<P>
|
|
|
|
In the next example, the BPF filter rejects the
|
|
<B><A HREF="/cgi-bin/man/man2html?2+write">write</A></B>(2)
|
|
|
|
system call, so that, although it is successfully started, the
|
|
<B><A HREF="/cgi-bin/man/man2html?1+whoami">whoami</A></B>(1)
|
|
|
|
command is not able to write output:
|
|
<P>
|
|
|
|
|
|
|
|
$ <B>syscall_nr write</B>
1
$ <B>./a.out 1 0xC000003E 99 /bin/whoami</B>
|
|
|
|
|
|
<P>
|
|
|
|
In the final example,
|
|
the BPF filter rejects a system call that is not used by the
|
|
<B><A HREF="/cgi-bin/man/man2html?1+whoami">whoami</A></B>(1)
|
|
|
|
command, so it is able to successfully execute and produce output:
|
|
<P>
|
|
|
|
|
|
|
|
$ <B>syscall_nr preadv</B>
295
$ <B>./a.out 295 0xC000003E 99 /bin/whoami</B>
cecilia
|
|
|
|
|
|
<A NAME="lbAQ"> </A>
|
|
<H3>Program source</H3>
|
|
|
|
|
|
#include <<A HREF="file:///usr/include/errno.h">errno.h</A>>
#include <<A HREF="file:///usr/include/stddef.h">stddef.h</A>>
#include <<A HREF="file:///usr/include/stdio.h">stdio.h</A>>
#include <<A HREF="file:///usr/include/stdlib.h">stdlib.h</A>>
#include <<A HREF="file:///usr/include/unistd.h">unistd.h</A>>
#include <<A HREF="file:///usr/include/linux/audit.h">linux/audit.h</A>>
#include <<A HREF="file:///usr/include/linux/filter.h">linux/filter.h</A>>
#include <<A HREF="file:///usr/include/linux/seccomp.h">linux/seccomp.h</A>>
#include <<A HREF="file:///usr/include/sys/prctl.h">sys/prctl.h</A>>
#include <<A HREF="file:///usr/include/sys/syscall.h">sys/syscall.h</A>>
<P>
#define X32_SYSCALL_BIT 0x40000000
<P>
/* glibc provides no wrapper for seccomp(); invoke it via syscall(2) */
static int
seccomp(unsigned int operation, unsigned int flags, void *args)
{
<BR>    return syscall(SYS_seccomp, operation, flags, args);
}
<P>
static int
install_filter(int syscall_nr, int t_arch, int f_errno)
{
<BR>    unsigned int upper_nr_limit = 0xffffffff;
<P>
<BR>    /* Assume that AUDIT_ARCH_X86_64 means the normal x86-64 ABI
<BR>       (in the x32 ABI, all system calls have bit 30 set in the
<BR>       'nr' field, meaning the numbers are >= X32_SYSCALL_BIT) */
<BR>    if (t_arch == AUDIT_ARCH_X86_64)
<BR>        upper_nr_limit = X32_SYSCALL_BIT - 1;
<P>
<BR>    struct sock_filter filter[] = {
<BR>        /* [0] Load architecture from 'seccomp_data' buffer into
<BR>               accumulator */
<BR>        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
<BR>                 (offsetof(struct seccomp_data, arch))),
<P>
<BR>        /* [1] Jump forward 5 instructions if architecture does not
<BR>               match 't_arch' */
<BR>        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, t_arch, 0, 5),
<P>
<BR>        /* [2] Load system call number from 'seccomp_data' buffer into
<BR>               accumulator */
<BR>        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
<BR>                 (offsetof(struct seccomp_data, nr))),
<P>
<BR>        /* [3] Check ABI - only needed for x86-64 in deny-list use
<BR>               cases.  Use BPF_JGT instead of checking against the bit
<BR>               mask to avoid having to reload the syscall number. */
<BR>        BPF_JUMP(BPF_JMP | BPF_JGT | BPF_K, upper_nr_limit, 3, 0),
<P>
<BR>        /* [4] Jump forward 1 instruction if system call number
<BR>               does not match 'syscall_nr' */
<BR>        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, syscall_nr, 0, 1),
<P>
<BR>        /* [5] Matching architecture and system call: don't execute
<BR>           the system call, and return 'f_errno' in 'errno' */
<BR>        BPF_STMT(BPF_RET | BPF_K,
<BR>                 SECCOMP_RET_ERRNO | (f_errno & SECCOMP_RET_DATA)),
<P>
<BR>        /* [6] Destination of system call number mismatch: allow other
<BR>               system calls */
<BR>        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
<P>
<BR>        /* [7] Destination of architecture mismatch: kill task */
<BR>        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
<BR>    };
<P>
<BR>    struct sock_fprog prog = {
<BR>        .len = (unsigned short) (sizeof(filter) / sizeof(filter[0])),
<BR>        .filter = filter,
<BR>    };
<P>
<BR>    if (seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog)) {
<BR>        perror("seccomp");
<BR>        return 1;
<BR>    }
<P>
<BR>    return 0;
}
<P>
int
main(int argc, char **argv)
{
<BR>    if (argc < 5) {
<BR>        fprintf(stderr, "Usage: "
<BR>                "%s <syscall_nr> <arch> <errno> <prog> [<args>]\n"
<BR>                "Hint for <arch>: AUDIT_ARCH_I386: 0x%X\n"
<BR>                "                 AUDIT_ARCH_X86_64: 0x%X\n"
<BR>                "\n", argv[0], AUDIT_ARCH_I386, AUDIT_ARCH_X86_64);
<BR>        exit(EXIT_FAILURE);
<BR>    }
<P>
<BR>    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
<BR>        perror("prctl");
<BR>        exit(EXIT_FAILURE);
<BR>    }
<P>
<BR>    if (install_filter(strtol(argv[1], NULL, 0),
<BR>                       strtol(argv[2], NULL, 0),
<BR>                       strtol(argv[3], NULL, 0)))
<BR>        exit(EXIT_FAILURE);
<P>
<BR>    execv(argv[4], &argv[4]);
<BR>    perror("execv");
<BR>    exit(EXIT_FAILURE);
}
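<P>
To make the control flow of the BPF program easier to follow,
the sketch below restates it as an ordinary C function.
This is only a user-space model, under the assumption that it mirrors
instructions [0] to [7] of the filter above;
<I>model_filter</I>
is a hypothetical name, and the kernel evaluates the real filter.

```c
#include <linux/audit.h>     /* AUDIT_ARCH_X86_64 */
#include <linux/seccomp.h>   /* SECCOMP_RET_* constants */

#define X32_SYSCALL_BIT 0x40000000

/* Return the action that the example filter would select for a system
   call with number 'nr' made under architecture 'arch'.
   model_filter() is a hypothetical helper for illustration. */
static unsigned int
model_filter(unsigned int arch, unsigned int nr,
             unsigned int t_arch, unsigned int syscall_nr,
             unsigned int f_errno)
{
    unsigned int upper_nr_limit = 0xffffffff;

    if (t_arch == AUDIT_ARCH_X86_64)
        upper_nr_limit = X32_SYSCALL_BIT - 1;

    if (arch != t_arch)                 /* instructions [0]-[1] */
        return SECCOMP_RET_KILL;        /* instruction [7] */
    if (nr > upper_nr_limit)            /* instructions [2]-[3] */
        return SECCOMP_RET_KILL;        /* instruction [7] */
    if (nr == syscall_nr)               /* instruction [4] */
        return SECCOMP_RET_ERRNO | (f_errno & SECCOMP_RET_DATA);  /* [5] */
    return SECCOMP_RET_ALLOW;           /* instruction [6] */
}
```

With the arguments of the first example run (59, 0xC000003E, 99),
the model selects
<B>SECCOMP_RET_ERRNO</B>
with error number 99 for
<B>execve</B>(2)
and
<B>SECCOMP_RET_ALLOW</B>
for any other native x86-64 system call.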
|
|
|
|
<A NAME="lbAR"> </A>
|
|
<H2>SEE ALSO</H2>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?1+bpfc">bpfc</A></B>(1),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?1+strace">strace</A></B>(1),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+bpf">bpf</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+prctl">prctl</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+ptrace">ptrace</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?2+sigaction">sigaction</A></B>(2),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?5+proc">proc</A></B>(5),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?7+signal">signal</A></B>(7),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?7+socket">socket</A></B>(7)
|
|
|
|
<P>
|
|
|
|
Various pages from the
|
|
<I>libseccomp</I>
|
|
|
|
library, including:
|
|
<B><A HREF="/cgi-bin/man/man2html?1+scmp_sys_resolver">scmp_sys_resolver</A></B>(1),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?3+seccomp_init">seccomp_init</A></B>(3),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?3+seccomp_load">seccomp_load</A></B>(3),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?3+seccomp_rule_add">seccomp_rule_add</A></B>(3),
|
|
|
|
and
|
|
<B><A HREF="/cgi-bin/man/man2html?3+seccomp_export_bpf">seccomp_export_bpf</A></B>(3).
|
|
|
|
<P>
|
|
|
|
The kernel source files
|
|
<I>Documentation/networking/filter.txt</I>
|
|
|
|
and
|
|
<I>Documentation/userspace-api/seccomp_filter.rst</I>
|
|
|
|
|
|
(or
|
|
<I>Documentation/prctl/seccomp_filter.txt</I>
|
|
|
|
before Linux 4.13).
|
|
<P>
|
|
|
|
McCanne, S. and Jacobson, V. (1992)
|
|
<I>The BSD Packet Filter: A New Architecture for User-level Packet Capture</I>,
|
|
|
|
Proceedings of the USENIX Winter 1993 Conference
|
|
|
|
|
|
<A NAME="lbAS"> </A>
|
|
<H2>COLOPHON</H2>
|
|
|
|
This page is part of release 5.05 of the Linux
|
|
<I>man-pages</I>
|
|
|
|
project.
|
|
A description of the project,
|
|
information about reporting bugs,
|
|
and the latest version of this page,
|
|
can be found at
|
|
<A HREF="https://www.kernel.org/doc/man-pages/">https://www.kernel.org/doc/man-pages/</A>.
|
|
<P>
|
|
|
|
<HR>
|
|
<A NAME="index"> </A><H2>Index</H2>
|
|
<DL>
|
|
<DT id="84"><A HREF="#lbAB">NAME</A><DD>
|
|
<DT id="85"><A HREF="#lbAC">SYNOPSIS</A><DD>
|
|
<DT id="86"><A HREF="#lbAD">DESCRIPTION</A><DD>
|
|
<DL>
|
|
<DT id="87"><A HREF="#lbAE">Filters</A><DD>
|
|
<DT id="88"><A HREF="#lbAF">/proc interfaces</A><DD>
|
|
<DT id="89"><A HREF="#lbAG">Audit logging of seccomp actions</A><DD>
|
|
</DL>
|
|
<DT id="90"><A HREF="#lbAH">RETURN VALUE</A><DD>
|
|
<DT id="91"><A HREF="#lbAI">ERRORS</A><DD>
|
|
<DT id="92"><A HREF="#lbAJ">VERSIONS</A><DD>
|
|
<DT id="93"><A HREF="#lbAK">CONFORMING TO</A><DD>
|
|
<DT id="94"><A HREF="#lbAL">NOTES</A><DD>
|
|
<DL>
|
|
<DT id="95"><A HREF="#lbAM">Architecture support for seccomp BPF</A><DD>
|
|
<DT id="96"><A HREF="#lbAN">Caveats</A><DD>
|
|
<DT id="97"><A HREF="#lbAO">Seccomp-specific BPF details</A><DD>
|
|
</DL>
|
|
<DT id="98"><A HREF="#lbAP">EXAMPLE</A><DD>
|
|
<DL>
|
|
<DT id="99"><A HREF="#lbAQ">Program source</A><DD>
|
|
</DL>
|
|
<DT id="100"><A HREF="#lbAR">SEE ALSO</A><DD>
|
|
<DT id="101"><A HREF="#lbAS">COLOPHON</A><DD>
|
|
</DL>
|
|
<HR>
|
|
This document was created by
|
|
<A HREF="/cgi-bin/man/man2html">man2html</A>,
|
|
using the manual pages.<BR>
|
|
Time: 00:05:34 GMT, March 31, 2021
|
|
</BODY>
|
|
</HTML>
|