When REST and Websockets are a bad idea

HTTP was designed for a specific use case: communication between a browser and a server. The same is true of JSON. It is also often a good idea to use HTTP ports for things like mobile apps, because a significant number of people need to use them from networks that block other ports (e.g. corporate wi-fi). The problem is that HTTP and websockets are often used where neither of these reasons applies.

I am assuming that whatever technology is used, it is implemented properly. This is about choosing the right protocols and formats, not about poor implementations of them.

As an example of the inefficiency of JSON, look at these few lines from a 500-line response from a cryptocurrency exchange API:

[[1555642860000,"0.03270000","0.03272500","0.03269900","0.03270700","153.29400000",1555642919999,"5.01456045",89,"78.14100000","2.55624430","0"],
[1555642920000,"0.03271300","0.03272600","0.03269800","0.03270900","278.01000000",1555642979999,"9.09636765",106,"63.92100000","2.09131523","0"],
[1555642980000,"0.03271800","0.03273900","0.03268900","0.03272900","114.33900000",1555643039999,"3.74173448",72,"83.97100000","2.74831968","0"],
[1555643040000,"0.03272800","0.03273700","0.03272000","0.03272700","97.96900000",1555643099999,"3.20613676",59,"35.26100000","1.15404998","0"]]

Consider how much more compactly this could have been represented if the numbers were not strings. In the string representation, the decimal numbers less than 10 take 13 bytes each (including quotes and comma). Given the fixed number of decimal places, they could be represented in Message Pack as a 32 bit integer (not a float, because we need exact decimals in a financial application) in five bytes. The saving is not as great with bigger numbers, but even in the case where Message Pack compares worst (numbers below 100 but above 2³² ÷ 10⁸, approximately 42.9) the string representation takes 14 bytes whereas Message Pack takes 9 (a 64 bit integer).

There is similar wastage (14 bytes vs 9, or 55% extra) with the integer at the beginning, but only a byte (about a third) is wasted on the value at the end. Nonetheless the bulk of the data is numbers that would take just 5 bytes as Message Pack, so JSON roughly doubles the bandwidth used.
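
These numbers are easy to check. Below is a minimal sketch in Python, assuming the third-party msgpack package is installed; it scales one of the quoted prices to an exact integer and compares the encoded sizes, then does the same for the millisecond timestamp at the start of each row (the comma separators in the JSON array add a further byte per value).

# Rough size check: the exchange's text encoding versus Message Pack.
# Assumes the third-party msgpack package (pip install msgpack).
import json
import msgpack

price = "0.03270000"                  # one of the quoted prices above
scaled = int(price.replace(".", ""))  # 3270000: exact, scaled by 10^8

print(len(json.dumps(price)))         # 12 bytes as a quoted JSON string
print(len(msgpack.packb(scaled)))     # 5 bytes: one type byte + uint32

timestamp = 1555642860000             # the integer at the start of each row
print(len(json.dumps(timestamp)))     # 13 bytes of digits
print(len(msgpack.packb(timestamp)))  # 9 bytes: one type byte + uint64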

One reason to choose JSON may be the availability of libraries, but Message Pack and BSON have libraries available for most languages. Developers may be more familiar with JSON, but once a library is installed it makes little difference.

The example above comes from data sent over a websocket, but similar data is also sent through an HTTP REST API. A lot of the time the data is going to another server, so we do not need to worry about browser or firewall limitations. So why not just use plain TCP? In some cases people may care more about latency than about lost packets (e.g. when displaying the latest prices), so UDP might be a better choice. One reason for using HTTP and websockets may be so that the same API can be used to provide data to mobile apps (which may be limited by firewalls). In that case, why not write HTTP and websocket wrappers around an underlying API? There are even existing websocket to plain TCP tunnels for just this purpose.
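
To illustrate how thin such a wrapper can be, here is a minimal sketch of a websocket-to-TCP relay in Python. It assumes the third-party websockets package (a recent version, where the connection handler takes a single argument) and a purely hypothetical internal service on TCP port 9000; it simply relays bytes in both directions.

# A minimal websocket-to-TCP relay: external clients that can only speak
# websockets (e.g. mobile apps behind restrictive firewalls) connect here,
# and each connection is bridged to the internal plain-TCP API.
# The internal host and port are placeholders.
import asyncio
import websockets

TCP_HOST, TCP_PORT = "127.0.0.1", 9000   # hypothetical internal TCP service

async def bridge(ws):
    reader, writer = await asyncio.open_connection(TCP_HOST, TCP_PORT)

    async def ws_to_tcp():
        async for message in ws:
            writer.write(message if isinstance(message, bytes) else message.encode())
            await writer.drain()

    async def tcp_to_ws():
        while data := await reader.read(4096):
            await ws.send(data)

    try:
        await asyncio.gather(ws_to_tcp(), tcp_to_ws())
    finally:
        writer.close()

async def main():
    async with websockets.serve(bridge, "0.0.0.0", 8080):
        await asyncio.Future()   # serve forever

asyncio.run(main())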

One particularly bad choice is to use JSON and REST to access something that is only accessed by other servers, such as a database server.

The custom protocols used by typical RDBMSes have a lot of advantages. I mostly use Postgres, which has excellent support for access over a Unix socket (so you can avoid having network access at all, a big security win), but it works fine over TCP as well if you need remote access. When using a Unix socket you can grant access to OS users, which simplifies configuration. The client library handles authentication in either case, and encryption when required. It is also efficient, with a binary representation of floats. Using REST over HTTP means giving up all of this.
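
As a concrete sketch (the database, table and socket directory names are illustrative, and peer authentication is assumed to be configured so the OS user maps to a database role), connecting over the Unix socket with psycopg2 looks like this:

# Connect to Postgres over its Unix socket: pointing host at the socket
# directory means nothing listens on the network, and peer authentication
# uses the OS user, so there is no password to manage.
import psycopg2

conn = psycopg2.connect(host="/var/run/postgresql", dbname="markets")
with conn, conn.cursor() as cur:
    cur.execute("SELECT open_time, close FROM candles ORDER BY open_time DESC LIMIT 1")
    print(cur.fetchone())
conn.close()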

On top of using HTTP, many newer "NoSQL" databases use JSON. In some cases, such as Elasticsearch, where the expected use case is mostly text, JSON is not a problem. In others it is. Consider time series databases that use JSON (like Kairos) or another text-based protocol (like Influx DB): in many cases a lot of numerical data will be sent as text.

Another thing I have come across is people implementing REST API wrappers around a database. Again, there are cases where this might be useful. The problem is that I have seen it used when people are accessing the data from other servers of their own. Just use a client for the database, use its own authentication, and use a firewall to limit access to your other servers' IPs. If you want a database abstraction layer, write one. You are probably going to end up writing an abstraction layer around your REST API anyway.
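
The database's own host-based access control gets you most of the way even before the firewall. For Postgres, a single line like the following in pg_hba.conf (the database name, role and address are illustrative) restricts remote access to one application server and requires the database's own authentication over TLS:

hostssl    markets    app_role    10.0.0.12/32    scram-sha-256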

Again, I suspect a lot of the thinking is "bottom-end web developers understand REST and JSON", but that only really matters to those implementing clients. All you need is one developer per language to implement an open source client library, and everyone else has something easier to use.

What really worries me is that these APIs are being developed this way not because those using them do not understand the alternatives, but because those implementing them do not. This is probably not true of the databases (which are hard to implement, so are likely to be written by very good developers) but it shows a lot of signs of being true in other cases. Low quality developers seem to have developed cryptocurrency exchanges, and bad design has led to the shutdown of one. In that case, part of the problem was bad platform choices: using Mongo DB for a financial system, when anyone who understood things like race conditions would have made choices that eliminated them (transactions are essential; really, ACID as a whole is essential).