Wire’s AVS team lead Julian Spittka discusses the challenges that led to a major upgrade to the calling protocol.
When we designed our old calling protocol, we wanted to build the greatest calling experience for anyone, at any time, in any location. We assumed that people use multiple devices, move between networks, and like to switch from phone to laptop or from mobile data to Wi-Fi. These assumptions have not changed, but the technical solution in the new calling protocol is fundamentally different.
The previous calling protocol tried to maintain the call state on the server and on the clients (Wire app on your phone or laptop). The clients would synchronize with the server to update their local state. In theory the different devices of a user would all be in sync and everybody’s happy.
The reality was more complex: clients would constantly go offline and come back online for a variety of reasons. Every time this happened, the call state had to be re-synchronized, which led to slow response times and long call setup times. It also raised an awkward question: what happens to the media stream when the signalling connection to the server is interrupted?
Each change to the protocol had to be implemented both on the client and on the server, making it harder to experiment and innovate quickly. We discovered all kinds of edge cases that quickly made the calling protocol bloated and challenging to maintain.
The other factor that motivated us to take another approach to call signalling was the introduction of end-to-end encryption (E2EE) in Wire in March 2016. The new messaging infrastructure of Wire allows users to send messages in a completely secure way while keeping conversations in sync between the multiple devices of the sender and receiver.
It was time to use the same infrastructure for call setup and authentication. This solved multiple problems in a very efficient way.
Call signalling events are automatically distributed to all devices by the E2EE messaging infrastructure. The call state resides on the clients. Moving the call state info from Wire servers to a local device meant synchronization was no longer needed.
Clients know best what state they are in. In case of something unexpected, a simple reset of the clients will fix the problem. Different clients may have different call states. This is expected and does not cause any issues.
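To make the idea of client-held call state concrete, here is a minimal sketch of a per-device state machine. The state names, event names, and the CallClient class are hypothetical and chosen only for illustration; they are not taken from Wire's implementation. The sketch shows the two properties described above: each device tracks its own state, and an unexpected event is handled by resetting locally.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    OUTGOING = auto()
    INCOMING = auto()
    CONNECTED = auto()


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (CallState.IDLE, "local_start"): CallState.OUTGOING,
    (CallState.IDLE, "remote_setup"): CallState.INCOMING,
    (CallState.OUTGOING, "remote_accept"): CallState.CONNECTED,
    (CallState.INCOMING, "local_accept"): CallState.CONNECTED,
    (CallState.OUTGOING, "hangup"): CallState.IDLE,
    (CallState.INCOMING, "hangup"): CallState.IDLE,
    (CallState.CONNECTED, "hangup"): CallState.IDLE,
}


class CallClient:
    """One device's local view of a call; no server-side copy is kept."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.state = CallState.IDLE

    def handle(self, event):
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            # Unexpected event for the current state: reset locally
            # instead of trying to re-synchronize with anyone.
            self.reset()
        else:
            self.state = next_state
        return self.state

    def reset(self):
        self.state = CallState.IDLE


# Two devices of the same user receive the same incoming call event.
phone = CallClient("phone")
laptop = CallClient("laptop")
phone.handle("remote_setup")
laptop.handle("remote_setup")
phone.handle("local_accept")  # only the phone answers
```

After this run the phone is CONNECTED while the laptop is still INCOMING. The two devices disagree, and in this model that is acceptable, because neither depends on a shared copy of the state.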
If the connections between call participants are verified, the call signalling is automatically verified as well: all signalling is transmitted end-to-end encrypted over those verified connections.
The fingerprint for a DTLS negotiation can be securely sent during call setup, and therefore potential man-in-the-middle attacks are avoided.
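The check this enables can be sketched as follows. The function names are hypothetical, and the sketch assumes a SHA-256 fingerprint in the colon-separated hex form commonly used in SDP; the article does not state which hash or format Wire uses. The idea is that the fingerprint received over the encrypted signalling channel is compared with the fingerprint of the certificate the peer actually presents during the DTLS handshake.

```python
import hashlib
import hmac


def dtls_fingerprint(cert_der: bytes) -> str:
    """SHA-256 of the DER-encoded certificate as colon-separated hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_matches(signalled: str, presented_cert_der: bytes) -> bool:
    """Compare the signalled fingerprint with the presented certificate.

    `signalled` is assumed to have arrived inside an end-to-end
    encrypted call setup message, so an attacker on the media path
    could not have altered it.
    """
    actual = dtls_fingerprint(presented_cert_der)
    return hmac.compare_digest(
        signalled.strip().upper().encode("utf-8"),
        actual.encode("utf-8"),
    )


# Placeholder bytes standing in for real DER-encoded certificates.
peer_cert = b"placeholder for the peer's DER certificate"
signalled_fp = dtls_fingerprint(peer_cert)
```

If a man in the middle substitutes a different certificate in the handshake, fingerprint_matches returns False and the client can refuse to send media.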
Call setup messages are reduced to the bare minimum. Calls are set up with a single round trip, and there are only a few call control messages. Therefore, robustness against packet loss is high while retransmissions are kept at a minimum.
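A single round trip means one message in each direction. The sketch below illustrates that shape with an invented message format: the SetupMessage fields, the JSON encoding, and the helper names are assumptions made for this example, not Wire's wire format. The caller sends one setup message, the callee answers with one response, and both sides then hold each other's media description.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SetupMessage:
    # Hypothetical fields, for illustration only.
    session_id: str
    is_response: bool
    media_description: str


def encode(msg: SetupMessage) -> bytes:
    return json.dumps(asdict(msg)).encode("utf-8")


def decode(raw: bytes) -> SetupMessage:
    return SetupMessage(**json.loads(raw.decode("utf-8")))


def answer(offer_raw: bytes, local_media: str) -> bytes:
    """Callee side: turn an incoming setup message into its response."""
    offer = decode(offer_raw)
    if offer.is_response:
        raise ValueError("expected a setup request, got a response")
    return encode(SetupMessage(offer.session_id, True, local_media))


# Caller -> callee, then callee -> caller: two messages in total.
offer_raw = encode(SetupMessage("session-1", False, "caller media parameters"))
answer_raw = answer(offer_raw, "callee media parameters")
reply = decode(answer_raw)
```

In a real client each encoded message would be handed to the end-to-end encrypted messaging layer for delivery; that step is left out here.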
Call setup messages are sent as end-to-end encrypted messages, and Wire servers have no knowledge of the contents or type of these messages. That means the servers cannot see who called whom, when, or for how long. To further conceal any potential calling message patterns, messages to tear down a call are sent peer-to-peer as part of the media stream.
The new call protocol relies on the E2EE messaging infrastructure only as a transport service. All call logic lives in a lean implementation on the client. As a result, maintenance is easy, and rolling out new features is straightforward and fast.
The calling protocol described here has now been rolled out for all Wire 1:1 calls. The same principles will be applied to group calls, and our team is working on the details and edge cases. The same reliable, high-quality experience will be available for group calls soon.
— Julian Spittka, AVS team lead