bpf(4) - NetBSD Manual Pages




BPF(4)                                                     BPF(4)



NAME

       bpf - Berkeley Packet Filter


SYNOPSIS

       pseudo-device bpfilter 16


DESCRIPTION

       The  Berkeley  Packet  Filter  provides a raw interface to
       data link layers in a protocol independent  fashion.   All
       packets  on  the  network,  even  those destined for other
       hosts, are accessible through this mechanism.

       The packet filter appears as a character special device,
       /dev/bpf0, /dev/bpf1, etc.  After opening the device, the
       file descriptor must be bound to a specific network
       interface with the BIOCSETIF ioctl.  A given interface can
       be shared by multiple listeners, and the filter underlying
       each descriptor will see an identical packet stream.  The
       total number of open files is limited to the value given
       in the kernel configuration; the example given in the
       SYNOPSIS above sets the limit to 16.

       A separate device file is required for each minor  device.
       If  a file is in use, the open will fail and errno will be
       set to EBUSY.

       Associated with each open instance of  a  bpf  file  is  a
       user-settable   packet   filter.   Whenever  a  packet  is
       received by an interface, all file  descriptors  listening
       on  that  interface  apply  their filter.  Each descriptor
       that accepts the packet receives its own copy.

       Reads from these files return the next  group  of  packets
       that have matched the filter.  To improve performance, the
       buffer passed to read must be the same size as the buffers
       used  internally  by  bpf.   This  size is returned by the
       BIOCGBLEN ioctl (see below), and under  BSD,  can  be  set
       with  BIOCSBLEN.   Note  that  an individual packet larger
       than this size is necessarily truncated.

       The packet filter will support  any  link  level  protocol
       that  has fixed length headers.  Currently, only Ethernet,
       SLIP and PPP drivers have been modified to  interact  with
       bpf.

       Since  packet  data is in network byte order, applications
       should use the byteorder(3n) macros to extract  multi-byte
       values.
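
       As a minimal sketch of the advice above, the helper below
       extracts the Ethernet type field from a captured frame.
       The name frame_ethertype and the assumption of a 14-byte
       Ethernet header with the type at offset 12 are illustrative
       and are not part of the bpf interface.

```c
#include <arpa/inet.h>  /* ntohs() */
#include <stdint.h>
#include <string.h>     /* memcpy() */

/*
 * Return the 16-bit Ethernet type field in host byte order.
 * `frame' must point to at least 14 bytes of link-level header.
 * frame_ethertype is a hypothetical helper, not part of bpf.
 */
static uint16_t
frame_ethertype(const unsigned char *frame)
{
    uint16_t type;

    /* memcpy avoids an unaligned 16-bit load on strict machines. */
    memcpy(&type, frame + 12, sizeof(type));
    return ntohs(type);
}
```

       Copying the bytes out before converting them keeps the
       helper safe on alignment restricted machines.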

       A  packet  can  be sent out on the network by writing to a
       bpf file descriptor.  The writes are  unbuffered,  meaning
       only  one  packet  can be processed per write.  Currently,
       only writes to Ethernets and SLIP links are supported.




                           28 June 1994                         1





IOCTLS

       The ioctl command codes below are defined in  <net/bpf.h>.
       All commands require these includes:

            #include <sys/types.h>
            #include <sys/time.h>
            #include <sys/ioctl.h>
            #include <net/bpf.h>

       Additionally,  BIOCGETIF and BIOCSETIF require <net/if.h>.

       The (third) argument to the ioctl should be a  pointer  to
       the type indicated.

       BIOCGBLEN (u_int)
                 Returns  the required buffer length for reads on
                 bpf files.

       BIOCSBLEN (u_int)
                 Sets the buffer length for reads on  bpf  files.
                 The  buffer  must  be  set  before  the  file is
                 attached to an interface with BIOCSETIF.  If the
                 requested  buffer  size  cannot be accommodated,
                 the closest  allowable  size  will  be  set  and
                 returned  in  the  argument.   A  read call will
                 result in EIO if it is passed a buffer  that  is
                 not this size.

       BIOCGDLT (u_int)
                 Returns the type of the data link layer underly-
                 ing the attached interface.  EINVAL is  returned
                 if  no interface has been specified.  The device
                 types, prefixed with ``DLT_'',  are  defined  in
                 <net/bpf.h>.

       BIOCPROMISC
                 Forces the interface into promiscuous mode.  All
                 packets, not just those destined for  the  local
                 host,  are  processed.  Since more than one file
                 can be listening on a given  interface,  a  lis-
                 tener  that  opened  its interface non-promiscu-
                 ously may receive packets  promiscuously.   This
                 problem can be remedied with an appropriate fil-
                 ter.

                 The interface remains in promiscuous mode  until
                 all files listening promiscuously are closed.

       BIOCFLUSH Flushes  the  buffer  of  incoming  packets, and
                 resets  the  statistics  that  are  returned  by
                 BIOCGSTATS.

       BIOCGETIF (struct ifreq)
                 Returns the name of the hardware interface that
                 the file is listening on.  The name is returned
                 in the ifr_name field of ifr.  All other fields
                 are undefined.

       BIOCSETIF (struct ifreq)
                 Sets the hardware interface associated with the
                 file.  This command must be performed before any
                 packets can be read.  The device is indicated by
                 name using the ifr_name field of the ifreq.
                 Additionally, performs the actions of BIOCFLUSH.

       BIOCSRTIMEOUT, BIOCGRTIMEOUT (struct timeval)
                 Set  or  get  the  read  timeout parameter.  The
                 timeval specifies the length  of  time  to  wait
                 before  timing  out  on  a  read  request.  This
                 parameter is initialized  to  zero  by  open(2),
                 indicating no timeout.

       BIOCGSTATS (struct bpf_stat)
                 Returns   the   following  structure  of  packet
                 statistics:

                 struct bpf_stat {
                      u_int bs_recv;
                      u_int bs_drop;
                 };

                 The fields are:

                 bs_recv        the number of packets received by
                                the  descriptor  since  opened or
                                reset  (including  any   buffered
                                since the last read call); and

                 bs_drop        the  number of packets which were
                                accepted  by   the   filter   but
                                dropped  by the kernel because of
                                buffer   overflows   (i.e.,   the
                                application's  reads aren't keep-
                                ing up with the packet  traffic).

       BIOCIMMEDIATE (u_int)
                 Enable  or  disable ``immediate mode'', based on
                 the truth value of the argument.  When immediate
                 mode  is  enabled, reads return immediately upon
                 packet reception.  Otherwise, a read will  block
                 until either the kernel buffer becomes full or a
                 timeout occurs.  This  is  useful  for  programs
                 like  rarpd(8c),  which must respond to messages
                 in real time.  The default for  a  new  file  is
                 off.

       BIOCSETF (struct bpf_program)
                 Sets the filter program used by the kernel to
                 discard uninteresting packets.  An array of
                 instructions and its length are passed in using
                 the following structure:

                 struct bpf_program {
                      int bf_len;
                      struct bpf_insn *bf_insns;
                 };

                 The filter program is pointed to by the bf_insns
                 field  while  its  length  in  units  of `struct
                 bpf_insn' is given by the bf_len  field.   Also,
                 the actions of BIOCFLUSH are performed.

                 See section FILTER MACHINE for an explanation of
                 the filter language.

       BIOCVERSION (struct bpf_version)
                 Returns the major and minor version  numbers  of
                 the  filter language currently recognized by the
                 kernel.  Before installing  a  filter,  applica-
                 tions  must  check  that  the current version is
                 compatible with  the  running  kernel.   Version
                 numbers  are  compatible  if  the  major numbers
                 match and the application minor is less than  or
                 equal  to  the kernel minor.  The kernel version
                 number is returned in the following structure:

                 struct bpf_version {
                      u_short bv_major;
                      u_short bv_minor;
                 };

                 The  current  version  numbers  are   given   by
                 BPF_MAJOR_VERSION   and  BPF_MINOR_VERSION  from
                 <net/bpf.h>.  An incompatible filter may  result
                 in  undefined  behavior  (most  likely, an error
                 returned by ioctl() or haphazard  packet  match-
                 ing).

       BIOCSRSIG BIOCGRSIG (u_int signal)
                 Set or get the receive signal.  This signal will
                 be sent to the process or process  group  speci-
                 fied by FIOSETOWN.  It defaults to SIGIO.
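
       The compatibility rule given under BIOCVERSION can be
       written as a small predicate.  This is a sketch only: the
       structure is a local stand-in for struct bpf_version, and
       the name demo_version_ok is hypothetical.

```c
/* Local stand-in for struct bpf_version from <net/bpf.h>. */
struct demo_bpf_version {
    unsigned short bv_major;
    unsigned short bv_minor;
};

/*
 * Return 1 if a filter built against version `app' may be installed
 * on a kernel reporting version `kern', 0 otherwise: the major
 * numbers must match and the application minor must not exceed the
 * kernel minor.
 */
static int
demo_version_ok(struct demo_bpf_version app, struct demo_bpf_version kern)
{
    return app.bv_major == kern.bv_major &&
        app.bv_minor <= kern.bv_minor;
}
```

       An application would fill `app' from BPF_MAJOR_VERSION and
       BPF_MINOR_VERSION and `kern' from the result of BIOCVERSION.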


STANDARD IOCTLS

       bpf  now  supports several standard ioctls which allow the
       user to do async and/or non-blocking I/O to  an  open  bpf
       file descriptor.

       FIONREAD (int)
                 Returns the number of bytes that are immediately
                 available for reading.




       SIOCGIFADDR (struct ifreq)
                 Returns the address associated with  the  inter-
                 face.

       FIONBIO (int)
                 Set  or  clear non-blocking I/O.  If arg is non-
                 zero, then doing a read when no data  is  avail-
                 able  will  return  -1  and errno will be set to
                 EWOULDBLOCK.  If arg is zero,  non-blocking  I/O
                 is  disabled.  Note:  setting this overrides the
                 timeout set by BIOCSRTIMEOUT.

       FIOASYNC (int)
                 Enable or disable async I/O.  When enabled (arg
                 is non-zero), the process or process group
                 specified by FIOSETOWN will start receiving
                 SIGIO signals when packets arrive.  Note that
                 you must do an FIOSETOWN in order for this to
                 take effect, as the system will not default
                 this for you.  The signal may be changed via
                 BIOCSRSIG.

       FIOSETOWN FIOGETOWN (int)
                 Set or get the process or process group (if neg-
                 ative)  that  should  receive SIGIO when packets
                 are available.  The signal may be changed  using
                 BIOCSRSIG (see above).


BPF HEADER

       The  following  structure  is  prepended  to  each  packet
       returned by read(2):

               struct bpf_hdr {
                    struct timeval bh_tstamp;
                    u_long bh_caplen;
                    u_long bh_datalen;
                    u_short bh_hdrlen;
               };

       The fields, whose values are stored in host order, are:

       bh_tstamp      The  time at which the packet was processed
                      by the packet filter.

       bh_caplen      The length of the captured portion  of  the
                      packet.  This is the minimum of the trunca-
                      tion amount specified by the filter and the
                      length of the packet.

       bh_datalen     The  length  of  the  packet  off the wire.
                      This value is independent of the truncation
                      amount specified by the filter.

       bh_hdrlen      The length of the BPF header, which may not
                      be equal to sizeof(struct bpf_hdr).
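
       The relation between bh_caplen, the truncation amount and
       the packet length can be sketched as follows.  The function
       name demo_caplen is hypothetical; it only restates the
       minimum rule given for bh_caplen above.

```c
/*
 * Captured length for a packet of `len' bytes when the filter
 * returns the truncation amount `ret'.  A filter result of zero
 * rejects the packet; a result larger than the packet is clipped
 * to the packet length.
 */
static unsigned int
demo_caplen(unsigned int ret, unsigned int len)
{
    return ret < len ? ret : len;
}
```

       With this rule a filter that returns (u_int)-1 captures
       every accepted packet in full, while bh_datalen still
       reports the length seen on the wire.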

       The bh_hdrlen field exists to account for padding between
       the header and the link level protocol.  The purpose here
       is to guarantee proper alignment of the packet data
       structures, which is required on alignment sensitive
       architectures and improves performance on many other
       architectures.  The packet filter ensures that the bpf_hdr
       and the network layer header will be word aligned.
       Suitable precautions must be taken when accessing the link
       layer protocol fields on alignment restricted machines.
       (This isn't a problem on an Ethernet, since the type field
       is a short falling on an even offset, and the addresses
       are probably accessed in a bytewise fashion.)

       Additionally,  individual  packets are padded so that each
       starts on a word boundary.  This requires that an applica-
       tion  has  some  knowledge  of  how  to get from packet to
       packet.  The macro BPF_WORDALIGN is defined in <net/bpf.h>
       to  facilitate this process.  It rounds up its argument to
       the nearest word aligned value (where a word is BPF_ALIGN-
       MENT bytes wide).

       For  example, if `p' points to the start of a packet, this
       expression will advance it to the next packet:

              p = (struct bpf_hdr *)((char *)p +
                  BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));

       For the alignment mechanisms to work properly, the  buffer
       passed  to read(2) must itself be word aligned.  malloc(3)
       will always return an aligned buffer.
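
       The traversal described above might look like the sketch
       below.  To stay self-contained it uses local stand-ins
       (DEMO_ALIGNMENT, DEMO_WORDALIGN, struct demo_bpf_hdr)
       instead of the definitions in <net/bpf.h>; their layout is
       an assumption made for illustration, and demo_count_packets
       is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins; real programs take these from <net/bpf.h>. */
#define DEMO_ALIGNMENT    sizeof(long)
#define DEMO_WORDALIGN(x) \
    (((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

struct demo_bpf_hdr {
    long           tv_sec, tv_usec;  /* stands in for bh_tstamp */
    unsigned long  bh_caplen;
    unsigned long  bh_datalen;
    unsigned short bh_hdrlen;
};

/*
 * Count the packets in `buf', which holds `nread' bytes as returned
 * by one read.  Each record is a header of bh_hdrlen bytes followed
 * by bh_caplen bytes of packet data, padded to a word boundary.
 */
static int
demo_count_packets(const char *buf, size_t nread)
{
    const char *p = buf;
    int n = 0;

    while ((size_t)(p - buf) + sizeof(struct demo_bpf_hdr) <= nread) {
        struct demo_bpf_hdr h;

        memcpy(&h, p, sizeof(h));
        if (h.bh_hdrlen == 0)
            break;          /* malformed record; avoid looping */
        /* The packet data would start at p + h.bh_hdrlen. */
        n++;
        p += DEMO_WORDALIGN(h.bh_hdrlen + h.bh_caplen);
    }
    return n;
}
```

       The step from one record to the next uses bh_hdrlen rather
       than sizeof, since the two may differ.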


FILTER MACHINE

       A filter program is an array  of  instructions,  with  all
       branches   forwardly  directed,  terminated  by  a  return
       instruction.  Each instruction performs some action on the
       pseudo-machine  state,  which  consists of an accumulator,
       index register, scratch memory store, and implicit program
       counter.

       The following structure defines the instruction format:

              struct bpf_insn {
                   u_short   code;
                   u_char    jt;
                   u_char    jf;
                   long k;
              };

       The k field is used in different ways by different
       instructions, and the jt and jf fields are used as offsets
       by the branch instructions.  The opcodes are encoded in a
       semi-hierarchical fashion.  There are eight classes of
       instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX, BPF_ALU,
       BPF_JMP, BPF_RET, and BPF_MISC.  Various other mode and
       operator bits are or'd into the class to give the actual
       instructions.  The classes and modes are defined in
       <net/bpf.h>.

       Below  are the semantics for each defined BPF instruction.
       We use the convention that A is the accumulator, X is  the
       index  register,  P[]  packet data, and M[] scratch memory
       store.  P[i:n] gives the data at byte offset ``i'' in  the
       packet,  interpreted  as  a  word (n=4), unsigned halfword
       (n=2), or unsigned byte (n=1).  M[i] gives the  i'th  word
       in  the  scratch  memory store, which is only addressed in
       word units.   The  memory  store  is  indexed  from  0  to
       BPF_MEMWORDS-1.   k,  jt,  and  jf  are  the corresponding
       fields in the instruction definition.  ``len''  refers  to
       the length of the packet.
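
       The P[i:n] notation can be illustrated with a small
       user-space helper.  This is a sketch, not the kernel's
       code: the name demo_pkt_load and its error convention are
       assumptions, and it reads the bytes most significant first
       because packet data is in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of P[i:n]: read n bytes (1, 2 or 4) at byte offset i of a
 * packet of `len' bytes, most significant byte first.  Returns 0 and
 * stores the value in *out on success, or -1 if n is not a supported
 * size or the access would run past the end of the packet.
 */
static int
demo_pkt_load(const unsigned char *p, size_t len, size_t i, size_t n,
    uint32_t *out)
{
    uint32_t v = 0;
    size_t j;

    if ((n != 1 && n != 2 && n != 4) || i > len || n > len - i)
        return -1;
    for (j = 0; j < n; j++)
        v = (v << 8) | p[i + j];
    *out = v;
    return 0;
}
```

       For a packet beginning 0x12 0x34 0x56 0x78, P[0:4] is
       0x12345678 and P[1:2] is 0x3456.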


       BPF_LD    These instructions copy a value into the accumu-
                 lator.  The type of the source operand is speci-
                 fied by an ``addressing mode'' and can be a con-
                 stant (BPF_IMM), packet data at a  fixed  offset
                 (BPF_ABS),  packet  data  at  a  variable offset
                 (BPF_IND), the packet  length  (BPF_LEN),  or  a
                 word in the scratch memory store (BPF_MEM).  For
                 BPF_IND and BPF_ABS, the data size must be spec-
                 ified  as  a  word (BPF_W), halfword (BPF_H), or
                 byte (BPF_B).  The semantics of all  the  recog-
                 nized BPF_LD instructions follow.


                 BPF_LD+BPF_W+BPF_ABS          A <- P[k:4]

                 BPF_LD+BPF_H+BPF_ABS          A <- P[k:2]

                 BPF_LD+BPF_B+BPF_ABS          A <- P[k:1]

                 BPF_LD+BPF_W+BPF_IND          A <- P[X+k:4]

                 BPF_LD+BPF_H+BPF_IND          A <- P[X+k:2]

                 BPF_LD+BPF_B+BPF_IND          A <- P[X+k:1]

                 BPF_LD+BPF_W+BPF_LEN          A <- len

                 BPF_LD+BPF_IMM                A <- k

                 BPF_LD+BPF_MEM                A <- M[k]


       BPF_LDX   These instructions load a value into the index
                 register.  Note that the addressing modes are
                 more restricted than those of the accumulator
                 loads, but they include BPF_MSH, a hack for
                 efficiently loading the IP header length.

                 BPF_LDX+BPF_W+BPF_IMM         X <- k

                 BPF_LDX+BPF_W+BPF_MEM         X <- M[k]

                 BPF_LDX+BPF_W+BPF_LEN         X <- len

                 BPF_LDX+BPF_B+BPF_MSH         X <- 4*(P[k:1]&0xf)


       BPF_ST    This instruction stores the accumulator into the
                 scratch  memory.   We  do not need an addressing
                 mode since there is only one possibility for the
                 destination.

                 BPF_ST                        M[k] <- A


       BPF_STX   This  instruction  stores  the index register in
                 the scratch memory store.

                 BPF_STX                       M[k] <- X


       BPF_ALU   The alu instructions perform operations  between
                 the  accumulator and index register or constant,
                 and store the result back  in  the  accumulator.
                 For binary operations, a source mode is required
                 (BPF_K or BPF_X).

                 BPF_ALU+BPF_ADD+BPF_K         A <- A + k

                 BPF_ALU+BPF_SUB+BPF_K         A <- A - k

                 BPF_ALU+BPF_MUL+BPF_K         A <- A * k

                 BPF_ALU+BPF_DIV+BPF_K         A <- A / k

                 BPF_ALU+BPF_AND+BPF_K         A <- A & k

                 BPF_ALU+BPF_OR+BPF_K          A <- A | k

                 BPF_ALU+BPF_LSH+BPF_K         A <- A << k

                 BPF_ALU+BPF_RSH+BPF_K         A <- A >> k

                 BPF_ALU+BPF_ADD+BPF_X         A <- A + X

                 BPF_ALU+BPF_SUB+BPF_X         A <- A - X

                 BPF_ALU+BPF_MUL+BPF_X         A <- A * X

                 BPF_ALU+BPF_DIV+BPF_X         A <- A / X

                 BPF_ALU+BPF_AND+BPF_X         A <- A & X

                 BPF_ALU+BPF_OR+BPF_X          A <- A | X

                 BPF_ALU+BPF_LSH+BPF_X         A <- A << X

                 BPF_ALU+BPF_RSH+BPF_X         A <- A >> X

                 BPF_ALU+BPF_NEG               A <- -A


       BPF_JMP   The jump instructions  alter  flow  of  control.
                 Conditional   jumps   compare   the  accumulator
                 against a constant (BPF_K) or the index register
                 (BPF_X).   If  the result is true (or non-zero),
                 the true branch is taken,  otherwise  the  false
                 branch  is taken.  Jump offsets are encoded in 8
                 bits so the longest jump  is  256  instructions.
                 However,  the  jump  always (BPF_JA) opcode uses
                 the 32 bit k field as the offset, allowing arbi-
                 trarily  distant destinations.  All conditionals
                 use unsigned comparison conventions.

                 BPF_JMP+BPF_JA                pc += k

                 BPF_JMP+BPF_JGT+BPF_K         pc += (A > k) ? jt : jf

                 BPF_JMP+BPF_JGE+BPF_K         pc += (A >= k) ? jt : jf

                 BPF_JMP+BPF_JEQ+BPF_K         pc += (A == k) ? jt : jf

                 BPF_JMP+BPF_JSET+BPF_K        pc += (A & k) ? jt : jf

                 BPF_JMP+BPF_JGT+BPF_X         pc += (A > X) ? jt : jf

                 BPF_JMP+BPF_JGE+BPF_X         pc += (A >= X) ? jt : jf

                 BPF_JMP+BPF_JEQ+BPF_X         pc += (A == X) ? jt : jf

                 BPF_JMP+BPF_JSET+BPF_X        pc += (A & X) ? jt : jf

       BPF_RET   The return instructions terminate the filter
                 program and specify the amount of packet to
                 accept (i.e., they return the truncation
                 amount).  A return value of zero indicates that
                 the packet should be ignored.  The return value
                 is either a constant (BPF_K) or the accumulator
                 (BPF_A).

                 BPF_RET+BPF_A                 accept A bytes

                 BPF_RET+BPF_K                 accept k bytes

       BPF_MISC  The miscellaneous category was created for  any-
                 thing  that  doesn't fit into the above classes,
                 and for any new instructions that might need  to
                 be  added.   Currently,  these  are the register
                 transfer instructions that copy the index regis-
                 ter to the accumulator or vice versa.

                 BPF_MISC+BPF_TAX              X <- A

                 BPF_MISC+BPF_TXA              A <- X
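
       The BPF_MSH load listed under BPF_LDX can be checked by
       hand with the sketch below.  The helper name demo_msh is
       hypothetical; it computes the same expression,
       4*(P[k:1]&0xf), on a byte array.

```c
/*
 * X <- 4*(P[k:1]&0xf).  The low four bits of the first IP header
 * byte give the header length in 32-bit words, so multiplying by
 * four yields the length in bytes.
 */
static unsigned int
demo_msh(const unsigned char *pkt, unsigned int k)
{
    return 4 * (pkt[k] & 0xf);
}
```

       On an Ethernet the IP header starts at offset 14, so a
       first byte of 0x45 there gives a header length of 20
       bytes.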

       The BPF interface provides the following macros to
       facilitate array initializers:

              BPF_STMT(opcode, operand)

       and

              BPF_JUMP(opcode, operand, true_offset, false_offset)
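
       To show how such initializer macros fit the instruction
       format, here is a sketch using local stand-ins.  DEMO_STMT,
       DEMO_JUMP, struct demo_insn and the opcode numbers are all
       assumptions made for illustration; real programs should use
       BPF_STMT and BPF_JUMP from <net/bpf.h>.

```c
/* Local stand-in for struct bpf_insn. */
struct demo_insn {
    unsigned short code;
    unsigned char  jt;
    unsigned char  jf;
    long           k;
};

/* A statement has no branch offsets, so jt and jf are zero. */
#define DEMO_STMT(code, k) \
    { (unsigned short)(code), 0, 0, (k) }
/* A jump carries the operand plus the true and false offsets. */
#define DEMO_JUMP(code, k, jt, jf) \
    { (unsigned short)(code), (jt), (jf), (k) }

/* Arbitrary opcode numbers, for illustration only. */
static const struct demo_insn demo_prog[] = {
    DEMO_STMT(1, 12),
    DEMO_JUMP(2, 0x0800, 0, 1),
    DEMO_STMT(3, 100),
    DEMO_STMT(3, 0),
};
```

       Each macro expands to one brace-enclosed element, which is
       why the filters below can be written as plain array
       initializers.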



EXAMPLES

       The following filter is taken from the Reverse ARP Daemon.
       It accepts only Reverse ARP requests.

              struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
                         sizeof(struct ether_header)),
                   BPF_STMT(BPF_RET+BPF_K, 0),
              };
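
       The control flow of the filter above can be traced in user
       space with a very small interpreter.  This is a sketch
       under stated assumptions: it implements only the three
       operations the filter uses, with local opcode names
       (D_LD_H_ABS, D_JEQ_K, D_RET_K) rather than those in
       <net/bpf.h>, and it assumes the values 0x8035 for
       ETHERTYPE_REVARP, 3 for REVARP_REQUEST and 42 for the sum
       of the two structure sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; real programs use the names from <net/bpf.h>. */
enum { D_LD_H_ABS, D_JEQ_K, D_RET_K };

struct demo_vm_insn {
    unsigned short code;
    unsigned char  jt, jf;
    uint32_t       k;
};

/*
 * Run `prog' over a packet of `len' bytes and return the number of
 * bytes to accept (0 rejects).  Jump offsets are relative to the
 * instruction after the jump, which the loop increment supplies.
 */
static uint32_t
demo_run(const struct demo_vm_insn *prog, size_t ninsns,
    const unsigned char *p, size_t len)
{
    uint32_t A = 0;
    size_t pc;

    for (pc = 0; pc < ninsns; pc++) {
        const struct demo_vm_insn *in = &prog[pc];

        switch (in->code) {
        case D_LD_H_ABS:            /* A <- P[k:2] */
            if (len < 2 || in->k > len - 2)
                return 0;           /* sketch: reject short packets */
            A = ((uint32_t)p[in->k] << 8) | p[in->k + 1];
            break;
        case D_JEQ_K:               /* pc += (A == k) ? jt : jf */
            pc += (A == in->k) ? in->jt : in->jf;
            break;
        case D_RET_K:               /* accept k bytes */
            return in->k;
        default:
            return 0;
        }
    }
    return 0;
}

/* The Reverse ARP filter, rewritten with the assumed constants. */
static const struct demo_vm_insn demo_rarp[] = {
    { D_LD_H_ABS, 0, 0, 12 },
    { D_JEQ_K,    0, 3, 0x8035 },   /* assumed ETHERTYPE_REVARP */
    { D_LD_H_ABS, 0, 0, 20 },
    { D_JEQ_K,    0, 1, 3 },        /* assumed REVARP_REQUEST */
    { D_RET_K,    0, 0, 42 },       /* assumed header + ARP size */
    { D_RET_K,    0, 0, 0 },
};
```

       Both false branches land on the final instruction, which
       returns zero and so rejects the packet.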

       This  filter  accepts  only  IP   packets   between   host
       128.3.112.15 and 128.3.112.35.

              struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
                   BPF_STMT(BPF_RET+BPF_K, 0),
              };
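
       The two hexadecimal constants in this filter are the host
       addresses written as 32-bit words.  The sketch below shows
       the arithmetic; the helper name demo_ip4 is hypothetical.

```c
#include <stdint.h>

/*
 * Pack the dotted quad a.b.c.d into one 32-bit word with `a' in the
 * most significant byte, matching the value a word load sees.
 */
static uint32_t
demo_ip4(unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
        ((uint32_t)c << 8) | (uint32_t)d;
}
```

       Here 128.3.112.15 becomes 0x8003700f and 128.3.112.35
       becomes 0x80037023, since 128 is 0x80, 3 is 0x03, 112 is
       0x70, 15 is 0x0f and 35 is 0x23.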

       Finally,  this filter returns only TCP finger packets.  We
       must parse the IP header to reach  the  TCP  header.   The
       BPF_JSET instruction checks that the IP fragment offset is
       0 so we are sure that we have a TCP header.

              struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
                   BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
                   BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
                   BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
                   BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
                   BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
                   BPF_STMT(BPF_RET+BPF_K, 0),
              };


SEE ALSO

       tcpdump(8), signal(3), ioctl(2), read(2), select(2)

       McCanne, S. and Jacobson, V.  The BSD Packet Filter: A New
       Architecture for User-level Packet Capture.  Proceedings
       of the 1993 Winter USENIX Technical Conference, San Diego,
       CA.


FILES

       /dev/bpf0, /dev/bpf1, ...


BUGS

       The  read  buffer must be of a fixed size (returned by the
       BIOCGBLEN ioctl).

       A file that does not request promiscuous mode may  receive
       promiscuously received packets as a side effect of another
       file requesting this mode on the same hardware  interface.
       This could be fixed in the kernel with additional process-
       ing overhead.  However, we favor the model where all files
       must  assume  that the interface is promiscuous, and if so
       desired, must utilize a filter to reject foreign  packets.

       Data  link  protocols with variable length headers are not
       currently supported.

       Under SunOS, if a BPF application reads more than 2^31
       bytes of data, read will fail with EINVAL.  You can either
       fix the bug in SunOS, or lseek to 0 when read fails for
       this reason.

       "Immediate mode" and the "read timeout" are misguided fea-
       tures.  This functionality can be emulated with non-block-
       ing mode and select(2).


HISTORY

       The Enet packet filter was created in 1980 by Mike Accetta
       and Rick Rashid at  Carnegie-Mellon  University.   Jeffrey
       Mogul,  at  Stanford, ported the code to BSD and continued
       its development from 1983 on.  Since then, it has  evolved
       into the Ultrix Packet Filter at DEC, a STREAMS NIT module
       under SunOS 4.1, and BPF.


AUTHORS

       Steven McCanne, of Lawrence  Berkeley  Laboratory,  imple-
       mented  BPF  in Summer 1990.  The design was in collabora-
       tion with Van Jacobson, also of Lawrence Berkeley  Labora-
       tory.




































