Commodity hardware now offers an ever-increasing degree of parallelism: CPUs ship with more and more cores, and a new generation of NICs can dispatch packets across multiple hardware queues.
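To make multi-queue dispatch concrete, here is a minimal sketch of Receive Side Scaling (RSS) style steering. It assumes the common scheme of a Toeplitz hash over the IPv4/TCP 4-tuple followed by a lookup in an indirection table. The 40-byte key is the widely published Microsoft RSS verification key. The table size (`RETA_SIZE`) and the helper names (`toeplitz_hash`, `pack_tuple`, `reta_init`, `rss_queue`) are hypothetical choices for illustration, not PFQ or driver APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RETA_SIZE 128 /* hypothetical indirection-table size for this sketch */

/* Widely published Microsoft RSS verification key (40 bytes). */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
    0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
    0xae, 0x7b, 0x30, 0xb4, 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
    0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
};

/* Toeplitz hash: for every set input bit k, XOR in key bits k..k+31. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint32_t result = 0;
    /* window holds the 32 key bits aligned with the current input bit */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                result ^= window;
            /* slide the window one bit, pulling in key bit k + 32 */
            size_t next = i * 8 + (size_t)(7 - b) + 32;
            uint32_t bit = 0;
            if (next / 8 < key_len)
                bit = (key[next / 8] >> (7 - next % 8)) & 1u;
            window = (window << 1) | bit;
        }
    }
    return result;
}

/* Pack src addr, dst addr, src port, dst port in network byte order
 * (arguments are given in host order). */
static void pack_tuple(uint8_t out[12], uint32_t saddr, uint32_t daddr,
                       uint16_t sport, uint16_t dport)
{
    const uint32_t addrs[2] = { saddr, daddr };
    for (int a = 0; a < 2; a++)
        for (int j = 0; j < 4; j++)
            out[a * 4 + j] = (uint8_t)(addrs[a] >> (24 - 8 * j));
    out[8]  = (uint8_t)(sport >> 8);
    out[9]  = (uint8_t)sport;
    out[10] = (uint8_t)(dport >> 8);
    out[11] = (uint8_t)dport;
}

/* Spread the available queues round-robin over the indirection table. */
static void reta_init(uint8_t reta[RETA_SIZE], unsigned n_queues)
{
    assert(n_queues > 0 && n_queues <= 255);
    for (unsigned i = 0; i < RETA_SIZE; i++)
        reta[i] = (uint8_t)(i % n_queues);
}

/* The low-order hash bits select a table entry, which names the queue. */
static unsigned rss_queue(const uint8_t reta[RETA_SIZE], uint32_t hash)
{
    return reta[hash & (RETA_SIZE - 1)];
}
```

The hash depends only on the flow tuple, so every packet of a flow lands on the same queue. This is what allows each queue to be served by its own thread without the threads sharing per-flow state.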
However, open-source packet-capture software was not designed with this level of parallelism in mind. To achieve high performance, the most prominent solutions, such as PF_RING and Netmap, rely on the NIC's DMA to push as many packets as possible to user space.
Nevertheless, memory-mapped solutions present some major drawbacks:
PFQ is a novel capture engine, orthogonal to custom device drivers and explicitly designed to leverage the potential of modern parallel hardware. Its lightweight, lock-free architecture is based on the following features:
Following these principles, and building on the standard NAPI machinery, PFQ running on a 2.66 GHz 6-core Xeon with an Intel 82599 10G controller can steer about 13 million packets per second to user space using 12 balanced threads, with the load on each CPU well under 5%, or 42 Mpps with full copies.
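As a back-of-the-envelope reference point, the sketch below computes the theoretical 10 Gbit/s Ethernet packet rate for minimum-size 64-byte frames. It counts the 20 bytes of preamble, start-of-frame delimiter and inter-frame gap that every frame also occupies on the wire. It also computes the average share of an evenly balanced load per thread. The frame size is an assumption, since the figures above do not state it, and the function names are illustrative.

```c
#include <assert.h>

/* Per-frame wire overhead: 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20.0

/* Maximum frames per second on a link_bps link carrying frame_bytes frames (FCS included). */
static double line_rate_pps(double link_bps, double frame_bytes)
{
    assert(link_bps > 0 && frame_bytes > 0);
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8.0);
}

/* Average rate each of n_threads handles when the load is evenly balanced. */
static double per_thread_pps(double total_pps, unsigned n_threads)
{
    assert(n_threads > 0);
    return total_pps / n_threads;
}
```

With these definitions, `line_rate_pps(10e9, 64)` gives 10^10 / (84 × 8) ≈ 14.88 Mpps, so 13 Mpps is roughly 87% of the minimum-frame line rate. `per_thread_pps(13e6, 12)` gives about 1.08 Mpps per thread.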
These numbers demonstrate that zero-copy techniques, which are often tied to atomic reference counting, have now been surpassed by in-cache copies of packets.
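The trade-off can be sketched in C11 with hypothetical packet buffer types. In the zero-copy variant, several consumers share one buffer whose lifetime is tracked by an atomic reference count, so every hand-off and release is an atomic read-modify-write on a shared cache line. In the copy variant, each consumer receives a private copy made while the packet is still in cache, and no shared counter is involved. The names (`shared_pkt`, `private_pkt`, `PKT_MAX`) are illustrative and do not describe PFQ's internals.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define PKT_MAX 2048 /* hypothetical maximum packet size for this sketch */

/* Zero-copy style: one shared buffer whose lifetime is an atomic reference count. */
struct shared_pkt {
    atomic_uint refcnt;
    size_t len;
    unsigned char data[PKT_MAX];
};

static struct shared_pkt *shared_pkt_new(const void *src, size_t len)
{
    assert(len <= PKT_MAX);
    struct shared_pkt *p = malloc(sizeof *p);
    if (!p)
        return NULL;
    atomic_init(&p->refcnt, 1);
    p->len = len;
    memcpy(p->data, src, len); /* stands in for the initial fill of the buffer */
    return p;
}

/* Each extra consumer takes a reference: an atomic RMW on a shared cache line. */
static void shared_pkt_get(struct shared_pkt *p)
{
    atomic_fetch_add_explicit(&p->refcnt, 1, memory_order_relaxed);
}

/* Drops a reference; returns 1 if this call released the buffer. */
static int shared_pkt_put(struct shared_pkt *p)
{
    if (atomic_fetch_sub_explicit(&p->refcnt, 1, memory_order_acq_rel) == 1) {
        free(p);
        return 1;
    }
    return 0;
}

/* Copy style: each consumer owns a private copy; no shared counter to update. */
struct private_pkt {
    size_t len;
    unsigned char data[PKT_MAX];
};

static void private_pkt_copy(struct private_pkt *dst, const void *src, size_t len)
{
    assert(len <= PKT_MAX);
    dst->len = len;
    memcpy(dst->data, src, len);
}
```

The sketch only shows where the synchronization cost lives. Whether copying wins in practice depends on packet size and cache behavior, which is what the measurements above address.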
Download PFQ version 1.0 from GitHub