author | Gabrielle Beck, Julia Len, Ian Miers, Matthew Green |
title | Fuzzy Message Detection |
howpublished | Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security |
year | 2021 |
where | Johns Hopkins University; Cornell University; University of Maryland |
url | https://eprint.iacr.org/2021/089.pdf |
Some background is needed to motivate this (excellent!) paper. End-to-end encryption is pretty normal now for anyone who wants it. (Use Signal! It’s a great message app regardless if you need the security promises.) However, there are many things a centralized messaging system could do if corrupted, without breaking E2E encryption:
Contrast the above with a fully decentralized messaging system like Tox: Doing any of the above bad things would require monitoring/interfering with the users’ internet connections via deep-packet-analysis, and, insofar as it’s possible to mask one’s IP address, any such adversary can be completely thwarted.
There are, however, a couple things a fully decentralized messenger can’t* do, which can be frustrating.
* “can’t” is a strong word; this depends a lot on one’s threat model. Briar actually does solve these problems under it’s particular threat model (which is decent!).
In short, it would solve a lot of UX and engineering problems if there was a way for a centralized message server to work without the ability to know who was talking to who. In fact, it’s not very difficult to occlude who a single message is from; Signal already does this (kinda, they tried). The hard part is occluding who the messages are for while still letting the server do its job.
Early versions of Cwtch solved this by saying “The server should send all the messages it gets to everyone.” This more or less worked in their system because the servers themselves weren’t fully centralized; the idea is there should be lots of small Cwtch servers around. Individual servers will only “subscribe” to the servers where they have conversations. I don’t want to get side-tracked discussing the particular threat models of Cwtch; it’s sufficient to say that this system doesn’t scale, or that it’s janky, whichever your prefer.
Fuzzy message detection is a system of asymmetric-key flagging with a controllable rate of false positives. That is to say,
is_recipient(key,message)
]]=p when the message isn’t actually for me.
When the message is actually for me, the probability is 1, but the server has no way of distinguishing that situation.Beck et al show a couple different ways of accomplishing these keys; the mathematics are sound and may be useful in other contexts too. More importantly (for me) is the question of what kinds of privacy guarantees the system actually gives.
The bad news is that, to provide the kind of scaling we’d hope for, p should be pretty small, on the order of 1/n (where n is the number of users, or messages, or whatever). But the smaller p gets (and it’s almost certainly less than 0.01), the worse the privacy it gives is, i.e. the more certain the server can be that the messages it’s sending are actually for the user in question. One might hope that this weren’t a problem; say for example p=10-5 but there are a million users so only a tenth of the messages I’m downloading are actually for me. Work showing that this idea is “wrong” has been limited, but attempting to apply differential privacy to FMD have mostly failed (That is to say, you can do it, but only by weakening your DP guarantees to the point where they don’t really matter much anymore).
I should go back to this paper now, in 2023. It’s a loose thread that may still have promise.