Benchmark Flatbuffer / Protobuffer / C++ Struct performance

I’m building a project that requires maximal performance. But it also uses many data structure that need some flexibility. Naturally Flatbuffer and Protobuffer are potential candidates. So I did some benchmark. Here’s the result

Serialization / de-serialization

=================================
Raw structs bench start...
total = 15860948312882282624
Raw structs bench: 312 wire size, 210 compressed wire size
* 0.049046 encode time, 0.003316 decode time
* 0.067384 use time, 0.003062 dealloc time
* 0.073762 decode/use/dealloc
=================================
FlatBuffers bench start...
total = 15860948312882282624
FlatBuffers bench: 344 wire size, 220 compressed wire size
* 2.920470 encode time, 0.003350 decode time
* 0.740325 use time, 0.003186 dealloc time
* 0.746861 decode/use/dealloc
=================================
Protocol Buffers LITE bench start...
total = 15860948312882282624
Protocol Buffers LITE bench: 228 wire size, 174 compressed wire size
* 3.268726 encode time, 3.188535 decode time
* 0.200258 use time, 0.399030 dealloc time
* 3.787823 decode/use/dealloc

Include network communication

=================================
FLATBUF bench start...
total bytes = 15898507595776707224
* 0.003065 create time
* 0.238328 receive time
* 0.002950 use
* 0.000782 free
* 0.245125 total time
=================================
PROTOBUF bench start...
total bytes = 0
* 0.000766 create time
* 0.244944 receive time
* 0.001007 use
* 0.000785 free
* 0.247503 total time
=================================
RAW bench start...
total bytes = 54377074000
* 0.001709 create time
* 0.002417 receive time
* 0.000813 use
* 0.000759 free
* 0.005699 total time

The conclusion

  • While flatbuffer / protobuffer provides a convenient API to define data structures, have them dynamically expanded and support a variety of languages, they are slower than just using raw structures
  • While flatbuffer is faster than protobuffer at pure serialization / deserialization, the difference is minimal when accounting for remote RPC costs
  • We need to test more recent libraries for serialization, and potentially combine them with the custom RPC model we are having with EVPP: YAS, cap’n’proto

The journey

It grinds my gears when code doesn’t work
  • Contrary to popular belief, Google’s code does not always work
  • The gRPC example in flatbuffer is outdated and is not working
  • The benchmark that proves flatbuffer is faster than protobuf is from 2016 and no longer compiles with the latest libraries

I fixed the above problems in https://github.com/thanhphu/flatbuffers

  • The latest flatbuffer no longer work recent versions of gRPC due to some abstractions in data structure
  • I need to modify gRPC to expose legacy data structures that flatbuffer needs access to. Thus this repository is born https://github.com/thanhphu/grpc
  • I need to merge some recent contributions that solve the problem but did not confirm to Google’s code standard in order to make flatbuffer work

Finally, I need to write benchmark code for all three (Flatbuffer + gRPC, Protobuf + gRPC, raw struct + EVPP). The complete code is available here

https://github.com/thanhphu/buffer-bench

Tags you can use

  • v1.1: Serialization / deserialization only, runs on one machine
  • v2.0: Serialization + deserialization + network transmission, can be run on two machines

Note that if you have already installed another version of gRPC and/or protobuf, you need to remove them with

$ sudo rm -f /usr/local/bin/*grpc*
$ sudo rm -f /usr/local/bin/protoc
$ sudo rm -f /usr/local/lib/*gpr*
$ sudo rm -f /usr/local/lib/*grpc*
$ sudo rm -f /usr/local/lib/*protobuf*
$ sudo rm -f /usr/local/lib/*protoc*
$ sudo rm -f /usr/local/lib/pkgconfig/*gpr*
$ sudo rm -f /usr/local/lib/pkgconfig/*grpc*
$ sudo rm -f /usr/local/lib/pkgconfig/*protobuf*
$ sudo rm -rf /usr/local/include/google
$ sudo rm -rf /usr/local/include/grpc
$ sudo rm -rf /usr/local/include/grpc++
$ sudo rm -rf /usr/local/include/grpcpp

Calling C# from C++

Most of the post on the internet about interoperability is the other way around (Calling C++ from C#) since C# is apparently the superior language in terms of developer friendliness. But a part of research work is to make weird combinations of system work regardless of the reason.

I succeeded in doing just that thank to this Visual C++ example (documentation is available in part 1 of the tutorial).

One thing to add is, I needed a tool to export .NET 4.0 to .tlb files so I can use the same assemblies in C++ (pure C++ and not C++.NET). The tool is called tlbgen, and the syntax is quite simple:

tlbgen System.Net.dll

Building WS4D-gSOAP on Linux

WS4D-gSOAP is a framework that assists development of web services on multiple platforms. These includes mobile phones, server, computers or embedded system.

WS4D Features

WS4D can run as a standalone application or based upon a server and has been used in many commercial products. Despite being the most prominent framework for embedded system SOAP services, WS4D seems to be lacking behind in version support and documentation. Namely, you cannot build the latest version of WS4D-gSOAP with the latest version of gSOAP, even though both frameworks’ last version were one or two years ago. Development doesn’t seem to be active, so does the community. Finding help is a big hassle!

So, this post outlines some problems i encountered while working with the framework and how to solve them, hoping that the next follower won’t have as much trouble starting in a new environment.

Terminology

It seems that the WS4D documentation assumes that you have a certain level of understanding on Unix-based build system and working across platform. You cannot find definition for several new terminologies used in the documentation, so the docs may be quite confusing and intimidating for new users. I find these words particularly weird:

  • In source build: The binaries will be generated inside the same directory as the source. This has the advantage of simplifying the make files as you don’t have to link and copy libraries over, but the directory may look extremely messy.
  • Out-of-source build: Separate source and binary directories. This is the configuration used by WS4D tutorial and code. The sample make file in the tutorial already to the hard work and creates the most common directories used in almost all WS4D projects.
  • Cross-compile (and compilation): This means instead of compiling only one binary to use on your development platform, it will compile another binary for another platform. What do you do with this second binary is up to you: you can push it to the device or run it in a simulator, but cmake’s job in this case is only generating the binary file for you.
  • Tool-chain: A set of compiler and linker for a specific device or platform. Cross-compiling uses two or more tool-chain to generate two or more executable.

Proper version

The documentation for WS4D is a bit disorganized, so it may not relevant for first-time readers that you must compile WS4D with specific versions of gSOAP, doing otherwise will lead to make-errors and build headaches. It is mentioned in Features page (which is not visible from the first page of documentation). I was reading only the installation instruction and tutorial when I started to build the system, so I missed this important fact! How should I know that Getting Started is not a good place to get started?

ws4d-gsoap 0.7.x ws4d-gsoap 0.8.x ws4d-gsoap trunk
 gSOAP 2.8.0 Not yet supported Not yet supported Not yet supported
 gSOAP 2.7.17 Not yet supported supported with patch supported with patch
 gSOAP 2.7.16 Not yet supported supported with patch supported with patch
 gSOAP 2.7.15 Not yet supported Not yet supported Not yet supported
 gSOAP 2.7.14 Not yet supported Not yet supported Not yet supported
 gSOAP 2.7.13 supported supported supported
 gSOAP 2.7.12 supported supported supported
 gSOAP 2.7.11 supported supported supported
 gSOAP 2.7.10 supported supported supported

So the latest version of the stack you can use is gSOAP 2.7.17 with WS4D-gSOAP 0.8. I don’t know why it is so complicated, maybe due to the fact the gSOAP 2.8 also implemented WS-Discovery (a feature overlap with WS4D), and the auther didn’t have enough time to adjust WS4D-gSOAP to the new changes.

Build steps and permissions

Supposed you have decent Linux knowledge, you can use this inside the directory where WS4D-gSOAP was extracted to

  1. ./configure
  2. ccmake .
  3. Edit path to gSOAP directory (the source you downloaded, do not ‘make install’ this)
  4. Press g to generate, there may by some warning, ignore them
  5. Press c to configure
  6. make (I don’t know why this step is needed, but if you misses this step, the next step will result in various errors. This step will result in errors)
  7. su -c ‘make’ – (You can’t use sudo, you have to use su with a trialing dash to load root user’s environment. Root permission is required to install binaries to their proper locations)

Web service for device (WS4D) and using make, cmake with Integrated Development Environments (IDEs) on Linux

WS4D-gSOAP is a framework to deploy web services on multiple environments (computers, embedded devices, phones…) without having to rewrite code. More introduction information can be found in this post. The downside for WS4D is that the build system is based on cmake, and it’s a bit complicated to use. I have since managed to successfully build the stack from source (there’s no binary distribution). But I ran into numerous problems during my time getting familiar with the stack. Namely following the tutorial.

Even for such a simple task of copying and pasting code from the tutorial (The Air Conditioner tutorial), I did not succeed. The client and device doesn’t seem to be able to communicate with each other. Even though the tutorial have been kind enough to include logging code (send, received and memory allocations), I’m still unable to figure out where the problem lies. And that’s when I think I’m forced to perform debugging on the project. That leads to a big problem: what am I supposed to use to debug this?

So I tried to install Code Blocks, but it doesn’t support cmake projects, it couldn’t import the CMakeList.txt. Then reading through several articles revealed Eclipse to be a pretty good gdb front-end, I installed it, and have successfully imported the project, but I was hit with several problems:

  • I couldn’t build the project, Eclipse’s project structure was make-based, not cmake-based, trying to ‘make all’ or ‘make clean’ with Eclipse obviously would generate errors
  • Editing  the code is an eyesore because Eclipse doesn’t seem to be able to recognize include and libraries directories even after I added them manually in project properties.
After several more hours of analyzing WS4D-gSOAP documentation (not really helpful, they are only related to build and run with command line), cmake (wiki pages after wiki pages with no structure between pages and in articles), Eclipse and Code Blocks’ documentation on make projects; I finally realized the right way to do this: cmake supports generating make-based projects for both Code Blocks and Eclipse, and it’s as simple as this:
  1. Go to the directory where you would normally run cmake to build your project from the command line
  2. Perform

    cmake -G"Eclipse CDT4 - Unix Makefiles" -D CMAKE_BUILD_TYPE=Debug ../certi_src
    cmake . -G "CodeBlocks - Unix Makefiles"
  3. Import

    The project directory
    ProjectName.cbp file
You can now work with the project normally within your preferred IDE

Why specifying the call type when performing P/Invoke is important

I was working on a C# project that required interaction between unmanaged and managed code. It’s nothing complex, just calling various functions in a DLL through a wrapper class, then I hit this exception:

A call to PInvoke function 'TestLPRLib!EvAPI.EvLicensePlateReg::evLPROpen' has unbalanced the stack. This is likely because the managed PInvoke signature does not match the unmanaged target signature. Check that the calling convention and parameters of the PInvoke signature match the target unmanaged signature.

Strange thing is, this “stack unbalanced” exception only pop up when running in debug mode, the application still work just fine without debugging, but leave bugs in applications is not my habit, so I decided to reach out to Google. The unmanaged code used __declspec(dllexport) to specify exported functions, internet forums suggested this exception is caused by the difference in calling convention.

So, I changed the unmanaged code from

extern "C" __declspec(dllexport) int evLPROpen(int width, int height, int widthStep, int depth = 8, int channel = 3);

to

extern "C" __declspec(dllexport) int __cdecl evLPROpen(int width, int height, int widthStep, int depth = 8, int channel = 3);

And the wrapper from

[DllImport("LicensePlateRecognitionLib.dll")]
public static extern int evLPROpen(int width, int height, int widthStep, int depth, int channel);

to

[DllImport("LicensePlateRecognitionLib.dll", CallingConvention=CallingConvention.Cdecl)]
public static extern int evLPROpen(int width, int height, int widthStep, int depth, int channel);

(using the clear stack by caller convention), and the exception goes away. By default, .NET assumes you use __stdcall, but I didn’t have much success trying this 🙂