Benchmark Flatbuffer / Protobuffer / C++ Struct performance

I’m building a project that requires maximal performance. But it also uses many data structure that need some flexibility. Naturally Flatbuffer and Protobuffer are potential candidates. So I did some benchmark. Here’s the result

Serialization / de-serialization

=================================
Raw structs bench start...
total = 15860948312882282624
Raw structs bench: 312 wire size, 210 compressed wire size
* 0.049046 encode time, 0.003316 decode time
* 0.067384 use time, 0.003062 dealloc time
* 0.073762 decode/use/dealloc
=================================
FlatBuffers bench start...
total = 15860948312882282624
FlatBuffers bench: 344 wire size, 220 compressed wire size
* 2.920470 encode time, 0.003350 decode time
* 0.740325 use time, 0.003186 dealloc time
* 0.746861 decode/use/dealloc
=================================
Protocol Buffers LITE bench start...
total = 15860948312882282624
Protocol Buffers LITE bench: 228 wire size, 174 compressed wire size
* 3.268726 encode time, 3.188535 decode time
* 0.200258 use time, 0.399030 dealloc time
* 3.787823 decode/use/dealloc

Include network communication

=================================
FLATBUF bench start...
total bytes = 15898507595776707224
* 0.003065 create time
* 0.238328 receive time
* 0.002950 use
* 0.000782 free
* 0.245125 total time
=================================
PROTOBUF bench start...
total bytes = 0
* 0.000766 create time
* 0.244944 receive time
* 0.001007 use
* 0.000785 free
* 0.247503 total time
=================================
RAW bench start...
total bytes = 54377074000
* 0.001709 create time
* 0.002417 receive time
* 0.000813 use
* 0.000759 free
* 0.005699 total time

The conclusion

  • While flatbuffer / protobuffer provides a convenient API to define data structures, have them dynamically expanded and support a variety of languages, they are slower than just using raw structures
  • While flatbuffer is faster than protobuffer at pure serialization / deserialization, the difference is minimal when accounting for remote RPC costs
  • We need to test more recent libraries for serialization, and potentially combine them with the custom RPC model we are having with EVPP: YAS, cap’n’proto

The journey

It grinds my gears when code doesn’t work
  • Contrary to popular belief, Google’s code does not always work
  • The gRPC example in flatbuffer is outdated and is not working
  • The benchmark that proves flatbuffer is faster than protobuf is from 2016 and no longer compiles with the latest libraries

I fixed the above problems in https://github.com/thanhphu/flatbuffers

  • The latest flatbuffer no longer work recent versions of gRPC due to some abstractions in data structure
  • I need to modify gRPC to expose legacy data structures that flatbuffer needs access to. Thus this repository is born https://github.com/thanhphu/grpc
  • I need to merge some recent contributions that solve the problem but did not confirm to Google’s code standard in order to make flatbuffer work

Finally, I need to write benchmark code for all three (Flatbuffer + gRPC, Protobuf + gRPC, raw struct + EVPP). The complete code is available here

https://github.com/thanhphu/buffer-bench

Tags you can use

  • v1.1: Serialization / deserialization only, runs on one machine
  • v2.0: Serialization + deserialization + network transmission, can be run on two machines

Note that if you have already installed another version of gRPC and/or protobuf, you need to remove them with

$ sudo rm -f /usr/local/bin/*grpc*
$ sudo rm -f /usr/local/bin/protoc
$ sudo rm -f /usr/local/lib/*gpr*
$ sudo rm -f /usr/local/lib/*grpc*
$ sudo rm -f /usr/local/lib/*protobuf*
$ sudo rm -f /usr/local/lib/*protoc*
$ sudo rm -f /usr/local/lib/pkgconfig/*gpr*
$ sudo rm -f /usr/local/lib/pkgconfig/*grpc*
$ sudo rm -f /usr/local/lib/pkgconfig/*protobuf*
$ sudo rm -rf /usr/local/include/google
$ sudo rm -rf /usr/local/include/grpc
$ sudo rm -rf /usr/local/include/grpc++
$ sudo rm -rf /usr/local/include/grpcpp

4 Comments

  1. come cross the here, very good article. I have same issue to have more recent FB working with most recent gRPC. Do you have any new updates to make them work together? Thanks.

  2. > “We need to test more recent libraries for serialization”
    Really!?!?

    I think we need less recent libraries – use simple structs and be careful with things like adding new members, endianess, etc.
    Why do developers try to use what’s new and shiny without looking under the hood?!?!?

    If you *think* for yourself instead of using somebody else’s products you’d find simple and cheap solutions. And they would, in the long run, cost you a lot less in complexity and development time. And they would most likely be a lot more efficient.

    In all my msgs I use a 4 byte header with type and size. Very simple. If I change a struct, and need backward compatibility, I make a copy and assign it a new type. It’s really that simple. You only have to be disciplined, not missuse things and test properly against all that you have in the field. This burden would keep you from having 10 different versions of the same struct.

    1. I agree that simplicity is best, sometimes just trying it out doesn’t hurt. Like in this case, it’s just a benchmark to see if things work out for us. In the end actually we stayed with the original library considering development time. But that decision were made after we have all the data we need to evaluate the situation. You can’t really tell stakeholders that “it’s possible that library B is faster but we decided to stay with A because we think it’s best”, you need data 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *