Just Use WebSockets
When developing certain web-based applications we might have requirements where we need to send data to clients in real-time.
Imagine that we are developing one of the following applications:
- An application that provides foreign exchange rates in real-time
- An application that displays scores in a sports (e-sports) event
- An auctioning application that displays the latest bids
- A flight application that displays arrivals and departures
- A social media application where one of the requirements is to notify users whenever someone reacts to their post, or to display new posts in the categories that users are subscribed to as soon as they are published
In each case, it would be nice to make clients aware of new data almost immediately.
-So you need real-time data communication? Just use WebSockets.
-Hm...
Whenever real-time communication is mentioned in the context of web applications, for some people the immediate and default conclusion is to use WebSockets.
So why not simply use WebSockets everywhere we need real-time data communication?
An Ancient Software Philosophy
The answer to the question above is that we want to have an engineering mindset and use the right tool for the job.
We do not want to use a certain technology because it is either the only one that we know about or because it is the one that we have fallen in love with.
In software engineering there is no silver bullet or one size fits all solution.
Each approach, each paradigm, and each technology solves a certain set of problems and introduces another one - which is one of the reasons why we have so many different technologies around in the first place.
Software engineering is about solving problems. It is about intensive thinking and reasoning, and it is about making compromises - we get something, we lose something.
The nature of our business dictates how tolerant we are of the difference between what we get and what we lose.
This is why I think we should strive to achieve an unbiased and open-minded attitude.
It is not always the case that our code needs to satisfy all SOLID principles, that it needs to have O(n) time complexity instead of O(n^2), that it needs to be super-clean, or that every single bit of it needs to be tested.
It is also not always the case that we need to use MongoDB, that we need to use TypeScript, that we need to use type hints in Python, that we need to use Angular, React or Vue, and so on.
Let's use REST on every web API? - No.
Let's use GraphQL on every web API? - No.
Let's use the same technology on each microservice? - No.
Let's try to fit each design pattern everywhere we can? - No.
Windows is bad, let's use Linux (or the opposite)? - No.
The answer should not literally be "no", but I hope you get the point.
However, this is my personal opinion and the way I look at things.
Back To The Subject
Notice the nature of the applications at the beginning of the article.
It would be enough for the server to somehow push foreign exchange rates, scores, bids, arrival and departure times, or any kind of notifications, to the clients.
The flow would be unidirectional: there would be no need for the client to send data to the server in real-time, only to receive it.
WebSockets is a really powerful technology but it is mostly meant for bi-directional, full-duplex communication.
Why would you pay for the implementation that you are never going to use?
Say that you are flying from London to Singapore. You have had enough of London, and you have decided to never come back. Does it make sense to pay for a two-way ticket?
Additionally, WebSockets is a more complex technology than something like Server-Sent Events which we will discuss in a moment.
Why introduce complexity in cases where we can go with simplicity?
As a saying often attributed to Leonardo da Vinci goes:
Simplicity is the ultimate sophistication.
Traditions First
The traditional HTTP model of communication is client-driven, and this has proven to be a limitation in scenarios where clients need to be aware of data that is either being generated or that is changing frequently.
What do I mean by client-driven?
This means that the client is always the one that initiates the communication - it sends an HTTP request, the server processes it and sends back an HTTP response.
The limitation lies in the fact that servers cannot proactively push data to clients without being explicitly asked to.
If clients needed to be aware of any new data, the only way to achieve that was to request it from the server at certain intervals.
The above is a technique called server polling, which was a way to emulate server-push before a true server-push technology like Server-Sent Events became a standard.
Before we talk about Server-Sent Events, let us first have an overview of server polling techniques - polling and long polling.
Polling
As we already said, server polling is a technique where the client polls the server at regular intervals in order to retrieve new data.
This basically means that the client keeps sending an AJAX request at some interval T.
The client sends an AJAX request, and the server immediately responds either with new data or with an empty response.
One issue with this approach is that the connection needs to be opened and closed on each cycle.
If the client is polling very frequently, if T = 1s for example, you can imagine that it might not be an ideal thing resource-wise.
If the server frequently sends an empty response, there is also the issue of headers overhead.
This is especially worth mentioning if the headers of the response are bigger than the response body - the actual data that we are expecting.
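To get a feeling for the numbers, here is a rough back-of-the-envelope sketch. The helper name pollingCost and the figure of 700 bytes of headers per cycle are assumptions made purely for illustration, not measurements.

```javascript
// Rough cost of polling: how many requests are made, and how many bytes of
// headers are transferred, over a given period.
// headerBytesPerCycle is an assumed figure, not a measurement.
function pollingCost({ intervalSeconds, durationSeconds, headerBytesPerCycle }) {
  const requests = Math.floor(durationSeconds / intervalSeconds);
  return {
    requests,
    headerBytes: requests * headerBytesPerCycle,
  };
}

// One client polling every second (T = 1s) for an hour, assuming roughly
// 700 bytes of request and response headers per cycle.
const cost = pollingCost({
  intervalSeconds: 1,
  durationSeconds: 60 * 60,
  headerBytesPerCycle: 700,
});
console.log(cost); // { requests: 3600, headerBytes: 2520000 }
```

Under these assumptions a single client generates 3600 requests and about 2.5 MB of headers per hour, even if there was never any new data to send.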
Long Polling
A slightly improved version of polling is long polling, which enables us to emulate server-push.
The client sends an AJAX request. The server, however, does not immediately send an empty response if there is no new data. Instead, it holds the connection open until new data is available.
When the data becomes available, the server sends back the response, the client immediately sends a new AJAX request, and by doing that repeats the flow.
With this approach we solved the issue of headers overhead - the amount of data that needs to be transferred is reduced because the server does not send back an empty response if there is no new data.
If new data is not being generated very frequently we also reduce the number of times that the connection needs to be opened and closed.
However, we are still opening the connection prior to each request and closing it after each response and we still do not have a true server push technology.
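The server side of long polling boils down to "park the request until there is something to say". Here is a minimal sketch of that idea, detached from any particular web framework. LongPollHub, wait and publish are hypothetical names, and each respond callback stands in for whatever would actually send the HTTP response.

```javascript
// Server-side core of long polling: requests that arrive while there is no
// new data are parked, and all of them are answered when data is published.
// LongPollHub, wait and publish are hypothetical names for illustration.
class LongPollHub {
  constructor() {
    this.waiting = []; // callbacks that complete the held responses
  }

  // Called once per incoming request; `respond` sends the HTTP response.
  wait(respond) {
    this.waiting.push(respond);
  }

  // Called when new data appears; answers every held request exactly once
  // and returns how many requests were answered.
  publish(data) {
    const held = this.waiting;
    this.waiting = [];
    for (const respond of held) {
      respond(data);
    }
    return held.length;
  }
}

const hub = new LongPollHub();
const received = [];
hub.wait((data) => received.push(`client A got ${data}`));
hub.wait((data) => received.push(`client B got ${data}`));
// Nothing has been sent yet: both requests are still being held.
hub.publish("bid: 120");
// Both held requests are now answered; each client would send a new request.
console.log(received);
```

A real implementation would also need a timeout for held requests and some cleanup for clients that disconnect while waiting, which this sketch leaves out.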
What if we could do it only once, and have the server truly push the data instead of having clients pull it?
What if we could have the scenario below?
Server-Sent Events
Server-Sent Events is a simple, elegant, and efficient technology that allows servers to push data to clients in real-time, via a single long-lived HTTP connection.
Imagine that an event gets raised on the server and then it gets handled in the browser as a regular DOM event - which is pretty neat, you have to admit.
This is what Server-Sent Events conceptually is about.
The Server-Sent Events specification defines two things:
- EventSource API
- Event Stream format
EventSource API
The EventSource API is a simple and intuitive browser API that is part of the HTML5 standard.
It allows us to subscribe to an event stream from the server and to register event handlers for different types of events - where the parsing of the events is handled by the browser itself.
It also makes our lives easier by abstracting away low-level connection management.
This is how we can open (subscribe to) and close an event stream:
// The URL must be absolute (with a scheme) or relative to the current page.
const eventSource = new EventSource("https://www.example.com/event-stream");
eventSource.close();
This API has predefined events open, error and message, as well as their respective handlers:
eventSource.onopen = (event) => {
  // Connection opened...
};
eventSource.onerror = (event) => {
  // The state lives on the EventSource object, not on the error event.
  if (eventSource.readyState === EventSource.CLOSED) {
    // Connection closed...
  }
  if (eventSource.readyState === EventSource.CONNECTING) {
    // Attempting to reconnect...
  }
};
eventSource.onmessage = (event) => {
  // Do something...
};
We can also define event handlers for our custom events:
eventSource.addEventListener("myCustomEvent", (event) => {
  // Do something...
});
You can take a look at the details here:
Event Source API
Event Stream Format
The Event Stream format is a simple format of UTF-8 encoded text data.
This is what an Event Stream message, that is, an event, looks like:
event: someEventName
data: someData
id: someId
retry: someRetryTime
These are the only fields that are allowed as per the specification.
Anything else will be ignored.
Each field is represented by the field name, followed by a colon, followed by the text data for that field's value.
Event
The event field is a string representing the type of the event.
If this field is included in the event, then we have to register an event handler for that particular event type in the browser.
If it is omitted, the default onmessage handler will be called.
Data
The data field is also a string representing the actual data.
We can specify multiple data fields inside a single event, and the browser will concatenate their values, separated by newline characters, and hand them to us as a single data string.
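To make the field handling a bit more tangible, here is a simplified sketch of what the browser does with the lines of a single event. The function parseEvent is a hypothetical helper written for illustration, not part of the EventSource API, and it skips some details that a real parser handles (different line endings, for example).

```javascript
// Parses the lines of one event: an event without a name is a "message",
// multiple data lines are joined with a newline, unknown fields are ignored.
// parseEvent is a hypothetical helper, not part of the EventSource API.
function parseEvent(block) {
  const parsed = { type: "message", data: [], id: undefined, retry: undefined };
  for (const line of block.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // blank or comment line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
    if (field === "event") parsed.type = value || "message";
    else if (field === "data") parsed.data.push(value);
    else if (field === "id") parsed.id = value;
    else if (field === "retry" && /^\d+$/.test(value)) parsed.retry = Number(value);
    // Any other field name is ignored.
  }
  return { ...parsed, data: parsed.data.join("\n") };
}

const parsedEvent = parseEvent(
  "event: cryptocurrencyChanged\ndata: first line\ndata: second line\nid: 7"
);
console.log(parsedEvent.type); // cryptocurrencyChanged
console.log(JSON.stringify(parsedEvent.data)); // "first line\nsecond line"
```

In practice we never write this ourselves, since the browser does it for us. That is exactly the point of the EventSource API.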
Id
The id field is another thing where EventSource API shines.
The browser will remember the id of the last seen event.
If the connection is lost the browser will try to reconnect to the server and will send the ID of the last event that it has received. The way it does this is via the Last-Event-ID header.
On the server, we could keep track of how many messages the browser has seen, and in case of a connection interruption, we could gracefully resume the communication by sending all missed events starting from the last event that the browser has seen.
By doing this we would not lose any data.
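Here is a minimal sketch of that resume logic. It assumes that the server keeps recent events in an in-memory array and that the ids are increasing numbers. Both are assumptions of this sketch, and missedEvents is a hypothetical helper name. In an Express handler, the header value could be read with request.get("Last-Event-ID").

```javascript
// Returns the events a reconnecting client has missed.
// `history` is an in-memory array of { id, event, data } objects with
// increasing numeric ids; this storage scheme is an assumption of the sketch.
function missedEvents(history, lastEventId) {
  const lastSeen = Number(lastEventId);
  if (lastEventId === undefined || Number.isNaN(lastSeen)) {
    return []; // first connection (no usable header): nothing to replay
  }
  return history.filter((entry) => entry.id > lastSeen);
}

const history = [
  { id: 1, event: "cryptocurrencyChanged", data: '{"name":"FakeCoin","value":1.1}' },
  { id: 2, event: "cryptocurrencyChanged", data: '{"name":"FakeCoin","value":1.2}' },
  { id: 3, event: "cryptocurrencyChanged", data: '{"name":"FakeCoin","value":1.3}' },
];

// The browser reconnects and reports that the last event it saw had id 1.
console.log(missedEvents(history, "1").map((entry) => entry.id)); // [ 2, 3 ]
```

The server would write the returned events to the new response before continuing with the live ones. How much history to keep is a trade-off that depends on the application.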
Retry
The retry field is an integer representing the time in milliseconds after which the browser will attempt to reconnect if the connection is lost. If left unspecified, the browser uses its own default, typically a few seconds (~3 seconds in some browsers).
So, the basic flow looks like this:
- The browser initially sends a regular HTTP request to the event stream endpoint on the server
- The server then sends a response that contains the Content-Type header set to text/event-stream
- After that, the server can successfully stream UTF-8 encoded event data
Simple enough?
Let us take a look at a practical example.
I created a mini-application that shows how prices of some super-fake cryptocurrencies change in real-time.
I used the Pug templating engine to initially populate the markup with existing cryptocurrencies, so in case you are not familiar with it just focus on the script section.
The client code:
doctype html
head
  meta(charset='UTF-8')
  meta(name='viewport' content='width=device-width, initial-scale=1.0')
  title Document
div Cryptocurrencies:
ul
  for cryptocurrency in cryptocurrencies
    li
      span=cryptocurrency.name
      span=" - "
      span(id=cryptocurrency.name + "Value")=cryptocurrency.value
script.
  const eventSource = new EventSource("/event-stream");
  eventSource.onopen = (event) => {
    console.log("Connection established.");
  };
  eventSource.onerror = (event) => {
    // The state lives on the EventSource object, not on the error event.
    if (eventSource.readyState === EventSource.CLOSED) {
      console.log("Connection closed.");
    }
    if (eventSource.readyState === EventSource.CONNECTING) {
      console.log("Attempting to reconnect...");
    }
  };
  eventSource.addEventListener("cryptocurrencyChanged", (event) => {
    const cryptocurrency = JSON.parse(event.data);
    const valueDomElement = document.getElementById(`${cryptocurrency.name}Value`);
    valueDomElement.innerText = cryptocurrency.value;
  });
The server code:
const express = require("express");
const application = express();
application.set("view engine", "pug");
application.set("views", __dirname);
const cryptocurrencies = require("./cryptocurrencies.json");
application.get("/", (request, response) => {
  response.render("index", { cryptocurrencies });
});
application.get("/event-stream", (request, response) => {
  response.set({
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
  });
  // Forward every price change to this client for as long as it stays connected.
  const sendCryptocurrency = (cryptocurrency) => {
    response.write(`event: cryptocurrencyChanged\n`);
    response.write(`data: ${JSON.stringify(cryptocurrency)}\n\n`);
  };
  application.on("cryptocurrencyChanged", sendCryptocurrency);
  // Remove the listener when the client disconnects, so we never write to a closed response.
  request.on("close", () => {
    application.removeListener("cryptocurrencyChanged", sendCryptocurrency);
  });
});
// A single timer, shared by all clients, simulates the price changes.
// (Starting it inside the route handler would add one more timer per connection.)
setInterval(() => {
  const cryptocurrency =
    cryptocurrencies[Math.floor(Math.random() * cryptocurrencies.length)];
  const randomValue = Math.round(Math.random() * 100) / 1000;
  const shouldValueDrop = Math.random() <= 0.5;
  const currentValue = cryptocurrency.value;
  cryptocurrency.value = shouldValueDrop
    ? currentValue - randomValue
    : currentValue + randomValue;
  application.emit("cryptocurrencyChanged", cryptocurrency);
}, 1200);
application.listen(80, () => {
  console.log(`Server running...`);
});
Since the code is relatively simple I will leave it up to you to analyze it a bit.
Can you perhaps notice which bit actually flushes, that is, sends, the event?
I know, I know, what were they thinking?
The code is available here:
Code
Strengths And Weaknesses
- Server-Sent Events is a simple and efficient technology that allows us to push data from the server in real-time
- It provides low latency
- There is minimum message overhead
- The browser pretty much handles everything - automatic reconnection, parsing of the events, and so on
But, as I already said, there is no silver bullet.
Here are the two weaknesses that I believe are the most important:
- The Event Stream protocol is designed to transfer UTF-8 text data. Binary streaming is possible, but it would not be as efficient as with some other technologies.
- Unless we are using HTTP/2, the maximum number of open connections to a single domain, per browser, is relatively low: 6. If you have clients like myself that tend to open 10 tabs of the same page, well, that might be an issue.
And just to put this out there, you do not need to use Server-Sent Events whenever you need a server-push technology.
Long polling is pretty decent in cases where there are no frequent data changes, and where you can tolerate some latency.
Think, discuss, implement.