
How Applidium reverse engineered Siri’s protocol

A mobile application development firm has reverse engineered the underlying protocol that the iPhone 4S uses to communicate with Siri's servers.


French mobile development company Applidium has reverse engineered the protocol that the iPhone 4S uses to communicate with Siri’s servers. Applidium’s developers have published a brief technical explanation of the protocol and some sample code that demonstrates how to use the service to tap into Siri's hosted speech-to-text conversion capabilities.

Their research shows that much of the heavy lifting for Siri's speech recognition is done on the server side, which suggests that it's theoretically possible to build Siri speech recognition clients for other devices. When a user speaks to Siri on an iPhone 4S, the phone records the audio and compresses it with Speex, an open audio codec designed for voice. The recording is transmitted to Apple's servers as part of a specialized HTTP request, and the servers send back a zlib-compressed binary plist that contains the response data.
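To make the response side concrete, here is a minimal Ruby sketch of how such a payload might be decoded. It is not Applidium's code: it assumes the compressed body has already been read off the wire, glosses over Siri's own binary framing, and uses the third-party CFPropertyList gem (not mentioned in the article) to parse the binary plist.

```ruby
require 'zlib'
require 'cfpropertylist' # third-party gem; an assumption, not named in the article

# Hypothetical helper: inflate a zlib-compressed Siri response body and
# parse the binary plist inside it into plain Ruby hashes and arrays.
def decode_siri_response(compressed_body)
  raw_plist = Zlib::Inflate.inflate(compressed_body)
  plist = CFPropertyList::List.new(data: raw_plist)
  CFPropertyList.native_types(plist.value)
end

# Example usage (variable name is made up):
# response = decode_siri_response(body_bytes)
# puts response.inspect
```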

The Applidium developers came up with a simple way to intercept Siri requests from an iPhone on a local network so that they could examine the outgoing data. They set up a fake DNS server that caused the iPhone to send its Siri requests to a server under their control. The requests are SSL-encrypted, however, so they also had to install a custom SSL root certificate on the iPhone so that it would trust their server.
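The article doesn't say what DNS software Applidium ran, but one simple way to achieve the same redirection is a single dnsmasq rule that answers queries for the Siri host (reportedly guzzoni.apple.com; both the hostname and the address below are assumptions) with the address of the interception machine:

```
# dnsmasq.conf on the local network's DNS server (illustrative setup only;
# the hostname and IP address are assumptions, not taken from the article)
address=/guzzoni.apple.com/192.168.1.2
```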

On their server, they ran a simple HTTP proxy written in Ruby that relayed the requests on to Apple's servers while echoing the input and output to stdout, so they could see what was being said in both directions. In order to reproduce a custom Siri request, they first had to unravel the format. The actual request is unusual and has characteristics that don't conform to the HTTP standard: it uses a custom HTTP method called ACE, carries an arbitrarily high Content-Length value, and sends a custom user agent string that identifies the client as Assistant.
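A stripped-down relay in the same spirit might look like the sketch below. This is not Applidium's proxy: it doesn't parse HTTP at all, it assumes a self-signed certificate pair on disk (the one the phone was told to trust), it assumes the Siri endpoint is guzzoni.apple.com, and it simply dumps raw bytes to stdout.

```ruby
require 'socket'
require 'openssl'

SIRI_HOST = 'guzzoni.apple.com'   # assumed Siri endpoint
CERT      = 'fake_siri.crt'       # self-signed cert installed on the phone
KEY       = 'fake_siri.key'

ctx = OpenSSL::SSL::SSLContext.new
ctx.cert = OpenSSL::X509::Certificate.new(File.read(CERT))
ctx.key  = OpenSSL::PKey::RSA.new(File.read(KEY))

listener = OpenSSL::SSL::SSLServer.new(TCPServer.new(443), ctx)

loop do
  phone = listener.accept

  # Open an onward TLS connection to the real Siri servers.
  apple = OpenSSL::SSL::SSLSocket.new(TCPSocket.new(SIRI_HOST, 443))
  apple.connect

  # Shuttle bytes in both directions, echoing everything to stdout.
  [[phone, apple, '>>'], [apple, phone, '<<']].each do |from, to, tag|
    Thread.new do
      begin
        loop do
          chunk = from.readpartial(4096)
          puts "#{tag} #{chunk.inspect}"
          to.write(chunk)
        end
      rescue EOFError, IOError
        to.close rescue nil
      end
    end
  end
end
```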

The request also includes a special header that contains a unique identifier for the device. The Applidium researchers found that the requests wouldn’t be processed unless that header provided a valid key from an actual iPhone 4S. That poses challenges for widespread adoption of third-party Siri client implementations.
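Putting the pieces from the last two paragraphs together, the opening of a hand-built request might look roughly like the sketch below. The request line, path, user agent string, and the X-Ace-Host identifier header are drawn from Applidium's published findings rather than from this article, so treat them as approximations; the binary framing of the audio that follows is omitted entirely.

```ruby
require 'socket'
require 'openssl'

# Assumed values; a working client needs an identifier taken from a real iPhone 4S.
SIRI_HOST = 'guzzoni.apple.com'
ACE_HOST  = 'REPLACE-WITH-IPHONE-4S-IDENTIFIER'

request_head = [
  'ACE /ace HTTP/1.0',                                # custom HTTP method
  "Host: #{SIRI_HOST}",
  'User-Agent: Assistant(iPhone/iPhone4,1) Ace/1.0',  # identifies the client as Assistant
  "X-Ace-Host: #{ACE_HOST}",                          # device identifier header
  'Content-Length: 2000000000',                       # arbitrarily large; body is streamed
  '',
  ''
].join("\r\n")

socket = OpenSSL::SSL::SSLSocket.new(TCPSocket.new(SIRI_HOST, 443))
socket.connect
socket.write(request_head)
# Speex-encoded audio, wrapped in Siri's binary framing, would be streamed next.
```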

Any Siri client will have to include a key from an actual iPhone 4S in order to operate. If a developer embeds a key in a Siri client application that is widely distributed to users, Apple could simply disable the application by locking out that key.

It’s not really clear how Apple would respond to unauthorized third-party Siri clients, but we suspect that the company would shut them down quickly. It could become difficult for Apple to manage and predict the load on its Siri servers if a large number of new users were to begin hammering the service from additional devices.

Because the heavy lifting happens on Apple's servers, it's clear that Apple's decision to limit Siri to the iPhone 4S, rather than rolling it out to previous iterations of the device, was not dictated by the hardware constraints of older iPhones. That means it's still theoretically possible for Apple to make the feature available to older iPhones in a future update.

Despite the device identifier restriction, individual developers who own an iPhone 4S will still be able to freely build on Applidium's research and experiment with Siri client programming, as long as Apple leaves the current protocol intact. Additional Siri functionality could potentially be unlocked with further effort. The code samples that Applidium has graciously published offer a good starting point, and although the code is largely undocumented, it is reasonably easy to follow.

Their sample Siri client is implemented in Ruby. They also provide code for a really simple command line tool that uses the Speex library to generate properly encoded audio data to transmit to Siri. You can see all of their sample code in their repository on GitHub.
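As a rough illustration of that encoding step, the snippet below shells out to the speexenc command-line encoder (from the speex-tools package, an assumption) instead of calling the Speex library directly as Applidium's tool does. Note that speexenc wraps its output in an Ogg container, so this only approximates the raw frames a Siri request carries.

```ruby
# Hypothetical approximation of the encoding step: convert a mono WAV
# recording to Speex using the speexenc command-line encoder.
# (Applidium's tool links against libspeex directly and produces raw frames;
# speexenc adds an Ogg container, so treat this as an illustration only.)
system('speexenc', '--quality', '7', 'recording.wav', 'recording.spx') \
  or abort 'speexenc failed'

audio = File.binread('recording.spx')
puts "encoded #{audio.bytesize} bytes of Speex audio"
```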
