Benchmark Flatbuffer / Protobuffer / C++ Struct performance
I’m building a project that requires maximal performance. But it also uses many data structure that need some flexibility. Naturally Flatbuffer and Protobuffer are potential candidates. So I did some benchmark. Here’s the result
Serialization / de-serialization
=================================
Raw structs bench start...
total = 15860948312882282624
Raw structs bench: 312 wire size, 210 compressed wire size
* 0.049046 encode time, 0.003316 decode time
* 0.067384 use time, 0.003062 dealloc time
* 0.073762 decode/use/dealloc
=================================
FlatBuffers bench start...
total = 15860948312882282624
FlatBuffers bench: 344 wire size, 220 compressed wire size
* 2.920470 encode time, 0.003350 decode time
* 0.740325 use time, 0.003186 dealloc time
* 0.746861 decode/use/dealloc
=================================
Protocol Buffers LITE bench start...
total = 15860948312882282624
Protocol Buffers LITE bench: 228 wire size, 174 compressed wire size
* 3.268726 encode time, 3.188535 decode time
* 0.200258 use time, 0.399030 dealloc time
* 3.787823 decode/use/dealloc
Include network communication
=================================
FLATBUF bench start...
total bytes = 15898507595776707224
* 0.003065 create time
* 0.238328 receive time
* 0.002950 use
* 0.000782 free
* 0.245125 total time
=================================
PROTOBUF bench start...
total bytes = 0
* 0.000766 create time
* 0.244944 receive time
* 0.001007 use
* 0.000785 free
* 0.247503 total time
=================================
RAW bench start...
total bytes = 54377074000
* 0.001709 create time
* 0.002417 receive time
* 0.000813 use
* 0.000759 free
* 0.005699 total time
The conclusion
- While flatbuffer / protobuffer provides a convenient API to define data structures, have them dynamically expanded and support a variety of languages, they are slower than just using raw structures
- While flatbuffer is faster than protobuffer at pure serialization / deserialization, the difference is minimal when accounting for remote RPC costs
- We need to test more recent libraries for serialization, and potentially combine them with the custom RPC model we are having with EVPP: YAS, cap’n’proto
The journey

- Contrary to popular belief, Google’s code does not always work
- The gRPC example in flatbuffer is outdated and is not working
- The benchmark that proves flatbuffer is faster than protobuf is from 2016 and no longer compiles with the latest libraries
I fixed the above problems in https://github.com/thanhphu/flatbuffers
- The latest flatbuffer no longer work recent versions of gRPC due to some abstractions in data structure
- I need to modify gRPC to expose legacy data structures that flatbuffer needs access to. Thus this repository is born https://github.com/thanhphu/grpc
- I need to merge some recent contributions that solve the problem but did not confirm to Google’s code standard in order to make flatbuffer work
Finally, I need to write benchmark code for all three (Flatbuffer + gRPC, Protobuf + gRPC, raw struct + EVPP). The complete code is available here
https://github.com/thanhphu/buffer-bench
Tags you can use
- v1.1: Serialization / deserialization only, runs on one machine
- v2.0: Serialization + deserialization + network transmission, can be run on two machines
Note that if you have already installed another version of gRPC and/or protobuf, you need to remove them with
$ sudo rm -f /usr/local/bin/*grpc*
$ sudo rm -f /usr/local/bin/protoc
$ sudo rm -f /usr/local/lib/*gpr*
$ sudo rm -f /usr/local/lib/*grpc*
$ sudo rm -f /usr/local/lib/*protobuf*
$ sudo rm -f /usr/local/lib/*protoc*
$ sudo rm -f /usr/local/lib/pkgconfig/*gpr*
$ sudo rm -f /usr/local/lib/pkgconfig/*grpc*
$ sudo rm -f /usr/local/lib/pkgconfig/*protobuf*
$ sudo rm -rf /usr/local/include/google
$ sudo rm -rf /usr/local/include/grpc
$ sudo rm -rf /usr/local/include/grpc++
$ sudo rm -rf /usr/local/include/grpcpp