by 0xbadcafebee on 7/5/2022, 12:52:18 AM
For those like me going "......what is dpdk"
The Data Plane Development Kit (DPDK) is an open source software project managed
by the Linux Foundation. It provides a set of data plane libraries and network
interface controller polling-mode drivers for offloading TCP packet processing
from the operating system kernel to processes running in user space. This offloading
achieves higher computing efficiency and higher packet throughput than is possible
using the interrupt-driven processing provided in the kernel.
https://en.wikipedia.org/wiki/Data_Plane_Development_Kit
https://www.packetcoders.io/what-is-dpdk/
by evgpbfhnr on 7/4/2022, 11:59:52 PM
At the point you've gotten to, syscall overhead is definitely going to be a big factor (even without Spectre mitigations enabled) -- I'd be very curious to see how far a similar io_uring benchmark would get you.
It supports IOPOLL (polling of the socket) and SQPOLL (kernel-side polling of the request queue), so hopefully the fact that the application driving it is in another thread wouldn't slow it down too much. With multi-shot accept/recv you'd only need to tell it once to keep accepting connections on the listener fd, but I'm not sure you can yet chain recvs onto the child fd automatically from the kernel... We live in interesting times!
by tomohawk on 7/4/2022, 10:35:46 PM
From: https://talawah.io/blog/extreme-http-performance-tuning-one-...
> I am genuinely interested in hearing the opinions of more security experts on this (turning off speculative execution mitigations). If this is your area of expertise, feel free to leave a comment
Are these mitigations generally safe to disable on a machine that does not have multi-user access and sits inside a security boundary?
by pclmulqdq on 7/4/2022, 11:00:14 PM
This was a fascinating read and the kernel does quite nicely in comparison - 66% of DPDK performance is amazing. That said, the article completely nails the performance advantage: DPDK doesn't do a lot of stuff that the kernel does. That stuff takes time. If I recall correctly, DPDK abstractions themselves cost a bit of NIC performance, so it might be interesting to see a comparison including a raw NIC-specific kernel bypass framework (like the SolarFlare one).
by Thaxll on 7/5/2022, 1:58:40 AM
Do Google and the like actually use TCP in user space, or do they just use the Linux kernel?
Edit: Looks like they do use user-space networking, but not for TCP, from what I can find: https://static.googleusercontent.com/media/research.google.c...
by limoce on 7/5/2022, 2:03:50 AM
I am not 100% sure that all of the mitigation overhead comes from syscalls, but it stands to reason that a lot of it arises from security hardening in user-to-kernel and kernel-to-user transitions.
Will io_uring also be affected by Spectre mitigations, given that it has eliminated most kernel/user switches? And has anyone done a head-to-head comparison between io_uring and DPDK?
by thekozmo on 7/5/2022, 6:16:00 AM
What's amazing is that the seastar tcp stack hasn't been changed over the past 7 years, while the kernel received plenty of improvements (in order to close the gap vs kernel bypass mechanisms). Still, for >> 99% of users, there is no need to bypass the kernel.
by fefe23 on 7/5/2022, 7:31:47 AM
Why is this interesting to anyone? Haven't we all moved to https by now?
Optimizing raw http seems to me like a huge waste of time by now. I say that as someone who has spent years optimizing raw http performance. None of that matters these days.
by touisteur on 7/5/2022, 6:50:51 AM
I feel this would be a good place to use a SPARK-based TCP stack. If you're bypassing the kernel and have to run stuff as root or with risky CAP_ rights, your stack should be as solid as possible.
https://www.adacore.com/papers/layered-formal-verification-o...
Might also give people here some ideas on how to combine symbolic execution, proof, C and SPARK code and how to gain confidence in each part of a network stack.
I think there's even some ongoing work climbing the stack up to HTTP, but I'm not sure of the plan (not involved).
by maxgio92 on 7/5/2022, 1:44:13 PM
Thank you, very thorough and interesting. A note: the link to the bpftrace script is broken.
by gonzo on 7/5/2022, 3:55:47 AM
My bet is that the stack in VPP is even faster.
by Matthias247 on 7/5/2022, 12:07:53 AM
Hi Marc (talawahtech)! Thanks for the exhaustive article.
I took a short look at the benchmark setup (https://github.com/talawahtech/seastar/blob/http-performance...), and wonder if some simplifications there lead to overinflated performance numbers. The server here executes a single read() on the connection - and as soon as it receives any data it sends back headers. A real world HTTP server needs to read data until all header and body data is consumed before responding.
Now, given the benchmark probably sends tiny requests, the server might get everything in a single buffer. However, every time it does not, the server will send back two responses to the client - and at that point the client will already have a response for the follow-up request before actually sending it - which overinflates the numbers. Might be interesting to re-test with a proper HTTP implementation (at least read until the last 4 bytes received are \r\n\r\n, and assume the benchmark client will never send a body).
Such a bug might also lead to a lot more write() calls than what would be actually necessary to serve the workload, or to stalling due to full send or receive buffers - all of those might also have an impact on performance.