A Perfect example of pointless benchmarks

The developers of a web framework written in Apple's Swift language have a set of benchmarks that (surprise, surprise) show that their framework, modified to use a new (now newish) async library, is faster than everything else. Apart from the usual dangers of micro-benchmarks, there is a lot wrong with the tests.

Just to explain for those unfamiliar with the dangers of micro-benchmarks (or of web framework benchmarks in particular): performance in an artificial, simple test does not always translate into real-world performance. This is even worse in the case of web frameworks, where the web app code is not usually the bottleneck: the database is usually a bigger problem for both responsiveness and scalability, and front ends loading lots of resources are almost always the cause of poor responsiveness. On top of that, the test is highly artificial because people who actually need to handle tens of thousands of requests per minute on a server with 36 physical cores (running 72 hyper-threads) will set things up very differently.

The Register reported these benchmarks as having been carried out by "Canada's Centre of Excellence in Next Generation Networks". In fact, that organisation seems to have provided resources to the developers of the Perfect framework, who actually carried out the tests. You can find their code and results here.

To be fair to the developers, they are being honest and upfront about this, and the purpose of the testing is to improve their performance, not to persuade people to use Perfect because it's fast. However, I would have liked a clear disclaimer, and nothing seems to prevent other people from reporting the results as a fair comparison: the article in The Register said it was a demonstration of Swift's "viability".

The first problem I see is that comparing async frameworks with multi-process and multi-threaded frameworks is very much an apples to oranges comparison. They are very different things that you would use and configure differently. Why not compare against async frameworks in other languages? They do exist.
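To show what I mean by the distinction (this is my own illustrative sketch, not code from the benchmarks): an async server handles many connections on a single event loop rather than one process or thread per request. Even Python's standard library is enough for a minimal version of the idea that async frameworks build on. The hard-coded response and port are assumptions for illustration.

```python
import asyncio

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Read the request head; a real framework would actually parse it.
    await reader.readuntil(b"\r\n\r\n")
    body = b"hello"
    writer.write(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"Connection: close\r\n\r\n" + body
    )
    await writer.drain()
    writer.close()

async def main() -> None:
    # One event loop multiplexes every connection; no thread or process per request.
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

The point is not that this is fast, but that it is a structurally different beast from a pool of Apache worker processes, and benchmarking one against the other tells you little.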

A more fundamental problem is that they tweaked their framework to do better in these tests. From their own comments on the test results:

After consultation with the SwiftNIO team we were able to devise a method for configuring and launching the NIO based server in a manner which was better able to take advantage of this hardware.


Seeing that Perfect on NIO could perform very well compared to our existing offering, we then continued to test, analyze, and optimize Perfect-NIO. The results shown below are taken from our final complete run of all framework tests.

Another issue is that they are comparing very different systems. The Go code does not even use a framework, just a wrapper around net/http from Go's standard library, whereas the "Django" example does not use only Django (a heavy full-stack framework in itself) but layers Django REST Framework on top of it. Calling it a Django test is misleading.

The configurations used are not comparable either. The PHP configuration is just Apache with mod_php configured to run one process per core. I am not a PHP developer these days, but I believe that, while mod_php is convenient and easy to set up, you will get better performance from Nginx with FastCGI, and even with mod_php there is probably a lot of room to improve performance.
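For the curious, the usual Nginx-plus-FastCGI setup hands PHP requests to a PHP-FPM worker pool over a socket. A typical fragment looks something like the following sketch; the socket path is an assumption that varies by distribution, and this is not a tested configuration from the benchmarks.

```nginx
# Hand .php requests to a PHP-FPM pool over FastCGI.
# The socket path below is illustrative and distribution-dependent.
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;
}
```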

The Python setup for Django is obviously bad. It uses Apache with FastCGI, which has been discouraged by the Django developers for years. Popular configurations like Nginx and Gunicorn, Apache and mod_wsgi, or either Apache or Nginx with uWSGI would have made a lot more sense: ideally, whichever of those performed best on these tests, probably running PyPy (which is faster) rather than CPython (which is more widely used). If you really want higher performance you would at least try PyPy.
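None of these alternatives is exotic. For example, serving a Django project with Gunicorn is a one-liner (sketched below; the project name and worker count are placeholders, not values from the benchmarks):

```shell
# Serve a Django project's WSGI application with Gunicorn.
# "myproject" is a placeholder; a common starting point for the
# worker count is roughly one or two workers per core.
gunicorn myproject.wsgi:application --workers 36 --bind 0.0.0.0:8000
```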

On the other hand, the Perfect and Go implementations used built-in web servers, which gave them a huge advantage: the Django and Rails tests were configured so that a front-end web server had to pass data to a backend application server. Serving requests directly could easily have been replicated for Django, Rails, etc.: for example, by running Django under Gunicorn alone, or Rails under standalone Passenger.

So, yes, it's fast at unrealistic micro-benchmarks that it has been optimised for, in comparison with not-really-comparable frameworks set up without any real consideration for performance. Fairly typical benchmarks!