*36C3 preroll music*

Herald: Please put your hands together and give a warm round of applause to Will Scott.

*Applause*

Will Scott: Thank you.

*Applause*

Will: All right. Welcome. So, the basic structure of this talk is twofold. The first part is to provide an overview of the different mechanisms that exist in this space of secure communication, and to tease apart the individual choices and tradeoffs that have to be made and their implications. A lot of the time we talk about security or privacy as very broad terms that cover a bunch of individual things, and breaking that down gives us a better way to understand what we're giving up, and whether and why these decisions actually get made for the systems we end up using. The arc I'll cover is to first provide a taxonomy or classification of the different systems we see around us, from there identify the threats we're often trying to protect against and the mechanisms we have to mitigate them, and then go into some of those mechanisms and look at what's happening right now in different systems.
And by the end, we'll be closer to the research frontier: places where we have new ideas, but there's still quite a high tradeoff in usability, or other reasons why they haven't gained mass adoption.

So I'll introduce our actors: Alice and Bob. The basic structure for pretty much all of this is one-to-one messaging. These are primarily systems that enable us to have a conversation that looks a lot like what we would have in person. What we're modelling is: I want somewhat synchronous, real-time communication over a span of weeks, months, or years, the ability to resume it, and, in the same way that in real life I know someone and recognize them when I come back and talk to them again, I expect the system to give me similar properties.

The way we're going to think about systems is that initially we have systems that look very much like real-life communication, where I can, on a local network, use AirDrop or any number of things that work directly between my device and a friend's device to communicate. On a computer, this might look like using netcat or another command-line tool to push data directly to the other person. And this actually results in a form of communication that looks very similar.
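A netcat-style direct push can be sketched with plain sockets. This is only an illustration of the pattern (a connected socket pair standing in for two devices on the same network), not a hardened tool:

```python
import socket

# Simulate a direct device-to-device link with a connected socket pair.
# In the netcat scenario these would be two machines on one network.
sender, receiver = socket.socketpair()

# Alice pushes a message straight to Bob's device: no server in between.
sender.sendall(b"hi Bob, meet me at 18:00")
sender.shutdown(socket.SHUT_WR)

# Bob reads until the sender closes the connection.
chunks = []
while True:
    data = receiver.recv(4096)
    if not data:
        break
    chunks.append(data)
message = b"".join(chunks)
print(message.decode())  # prints: hi Bob, meet me at 18:00
```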
Right, it's ephemeral: it goes away afterwards unless the other person saves it. But there is already a set of adversaries or threats to think about when we ask how to secure this sort of communication.

One of those is the network. Can someone else see this communication, and how do we hide from that? We have a mechanism against that, namely encryption: I can encrypt my communication so that someone who is not my intended recipient cannot see what's happening.

The other is the end devices themselves. There are a couple of things to consider when we think about what we're protecting against on an end device. One is that there might be other bad software that, either at the same time or later, gets installed and tries to steal or learn what was said. We have mechanisms there too. One is message expiry: we can make messages go away and make sure we delete them from disk at some point. The other is making sure we've isolated our chats, so that they don't overlap and other applications can't see what's happening there.

So we have these direct communication patterns, but that's a small minority of what we think of when we chat.
Instead, most of the systems we're using online rely on a centralized server. There's some logically centralized thing in the cloud; I send my messages there, and it forwards them to my intended recipient. Whether it's Facebook, WhatsApp, Slack, IRC, Signal, Wire, Threema, or whatever cloud chat app we're using today, this same model applies.

So we can identify additional threats here, and then we can think about why we do this. One threat is the network, and I'll tear that apart a little bit. You've got the local network we had before: someone near the person sending or receiving messages, someone else in the coffee shop, your local organization, your school, your work. You've got the Internet as a whole that messages pass over, so the ISPs or the countries you're in may want to look at your messages or prevent you from sending them. You've also got an adversary in the network near the server, who can see most of the messages going in and out of it, because these services have to exist somewhere, be that a data center they physically have computers in, or AWS, Google, or one of the other clouds.
And now you've got a set of actors near the server to think about, who can see most of the traffic going in and out of that server.

We also have to think about the server itself as a potential adversary. There are a few different threats here. The server could get hacked or otherwise compromised, so bugs in the software can expose parts of the communication. There's typically a legal entity running the server, and the jurisdiction it's in can send requests for user data or compel it to provide information; so there's this whole question of what the server is required to turn over. And then there's how the server, or the company behind it, is actually making money and sustaining itself. Is it going to get acquired by someone you don't trust, even if you trust it now? So there's this future view: how do we ensure that the messages I have now don't get misused later?

We have a set of techniques that mitigate these problems as well. One is traffic obfuscation or circumvention techniques, which make our traffic look less obvious to the network, and that prevents a large share of these attacks.
And then there's what I'm calling server hardening, which is really a broad set of techniques for trusting the server less: making potential compromises of the server, whether in its code or in what it can be forced to reveal, less damaging.

It's worth saying that there are a bunch of reasons why we have primarily used centralized messaging. You've got availability: it's very easy to go to a single place. It also simplifies problems like handling multiple devices, and mobile push in particular, because both Google and Apple allocate a single authorized provider who can send notifications to an app's users' mobile devices. That effectively requires a centralized place that knows when to send those messages, if you want to provide real-time alerts to your application's users.

The cons are cost (there's some entity now responsible for all of this, which has to have a business model) and the fact that there is a single entity people can come to, which now faces the legal and regulatory issues.

So this is not the only type of system we have, right? The next most common is probably federated. E-mail is a great example of this.
E-mail is nice in that, as a user, I can choose an e-mail provider I trust out of many, or, if I don't trust any of the ones I see, I can even spin up my own with a small group. So we can decentralize cost and make this more approachable. And while I can gain more confidence in my individual provider, I don't have as much insight into the recipient's side: I don't know how secure Bob's connection is to his provider, because we've separated and decentralized that.

There are also a bunch of problems, both in figuring out identity and discovery securely, and in mobile push. But we have a number of successful examples of this: beyond e-mail, the Fediverse and Mastodon, Riot chat, and even SMS are federated systems where there's a bunch of providers rather than a single central place.

As you continue this metaphor of splitting apart, decentralizing, and reducing the trust in any single party, you end up with a set of decentralized messaging systems as well. It's worth mentioning these as we get onto this fringe. There are roughly two types. One uses gossip protocols, like Secure Scuttlebutt: you connect to the people around you, or the people you know, and when you get messages, you gossip, you send them on to all of the people around you.
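The dedupe-and-forward core of that gossip step can be sketched in a few lines. This is an illustrative model only (no networking, no signatures, not Scuttlebutt's actual protocol):

```python
# Toy gossip: each node remembers message IDs it has seen and
# forwards new messages to all of its peers.
class Node:
    def __init__(self, name):
        self.name = name
        self.peers = []
        self.seen = set()
        self.log = []

    def receive(self, msg_id, text):
        if msg_id in self.seen:     # already gossiped; stop the flood here
            return
        self.seen.add(msg_id)
        self.log.append(text)
        for peer in self.peers:     # pass it on to everyone we know
            peer.receive(msg_id, text)

# A small line topology: Alice - Carol - Bob (no direct Alice-Bob link).
alice, carol, bob = Node("alice"), Node("carol"), Node("bob")
alice.peers = [carol]
carol.peers = [alice, bob]
bob.peers = [carol]

# Alice originates a message; it floods outward from her.
alice.receive("msg-1", "hello from Alice")
print(bob.log)  # the message reached Bob via Carol
```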
And so messages spread through the network. That's still an area where we're learning the tradeoffs of how much metadata gets leaked, but it's nice in its level of decentralization. The other type basically tries to have all of the users take on some relatively low-trust participation in the serving infrastructure. You can think of this as evolving out of things like the distributed hash tables used in BitTorrent. You see something very similar in things like Ricochet or tox.chat, which use either Tor-like relays for sending messages, or an explicit DHT for routing, where all of the members provide some amount of lookup to help with discovery and finding other participants.

OK, so let's now turn to some of these mechanisms we've uncovered, and we can start with encryption. When you're sending messages to a server, by default there's no encryption. This is things like IRC; e-mail used to be primarily unencrypted. You can think of that like a postcard: you've got a postcard you're sending, and it shows where the message is coming from, where it's going to, and the contents. In contrast, when you use transport encryption, which is now standard for most of the centralized systems, you're taking that postcard and putting it in an envelope that the network can't open.
And that's what TLS and other forms of transport encryption give you: the network link just sees the source and destination. It sees there's a message going between Alice and Facebook, or whatever cloud provider, but it can't look inside and see that it's really a message for Bob, or what's being said. It just sees individuals communicating with that cloud provider. So SMTPS, the secure versions of IRC and e-mail, and most other protocols are using transport security at this point.

The thing we have now is called end-to-end encryption, or E2E. The difference here is that the message Alice sends is addressed to Bob and encrypted so that the provider, Facebook, can't open it either and can't look at the contents. The network still just sees a message going between Alice and Facebook, but Facebook can't open it and actually see the contents of the message.

And so end-to-end encryption has gained pretty widespread adoption. We have this in Signal, for the most part in iMessage, and we have tools like PGP and GPG that implement forms of it. For messaging there are a few protocols worth covering in this space. The Signal protocol, which was initially called Axolotl, is adopted in WhatsApp and in Facebook private messaging, and
it has since generalized into something called the Noise framework, which is gaining a lot of adoption. OMEMO looks a lot like it, specifically for XMPP, so it is a specific implementation. The other one is called Off-The-Record, or OTR; Off-The-Record developed somewhat independently and thinks a lot about deniability. I'm not going to go too deep into the specifics of what these protocols are doing, but the intuition is that the hard part here is not encrypting a message. The hard part is how you send that first message and establish a session, especially if the other person is offline. I want to start a communication, I type in the first message I'm sending to someone, and I need to somehow get a key, send a message that only that person can read, and also establish a shared secret. Doing all of that in one message, with the other device not online, ends up being tricky. Additionally, figuring out the mapping between a user and their devices, especially as that changes, and making sure you've appropriately revoked devices and added new ones, without keys falling over or showing too many warnings to the user, ends up being a lot of the trick in these systems.

There are two problems that come into play when we start using end-to-end encryption.
One is connection establishment. This is the problem of saying: who is Bob? I find a contact and I know them in some way, by an e-mail address or by a phone number. Signal uses phone numbers; a lot of systems use an e-mail address; there are things like Threema that use a unique identifier they generate for you. But somehow I have to go from that identifier to an actual key, some knowledge of a cryptographic secret that identifies the other person. And I have to figure out who I trust to do that mapping, to hand me this thing I'm now using for encryption.

Then there's the question of how we match. A lot of systems do this by uploading your address book and trying to match against existing contacts, to solve the user-interface problem of discovery: if the server already knows the identifiers and has this mapping, then when someone new comes in, it can suggest and have "prefound" these keys, and you just trust the server to hold this address book and do the mapping between the identifiers and the keys you're getting out. Signal is nice here: it says it's not uploading your contacts, which is true. They're uploading hashes of your phone numbers rather than the actual phone numbers. But it's a similar thing.
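A hash-based directory lookup of that shape can be sketched as follows. This is a minimal illustration of the mechanism, not Signal's actual implementation (which layers much more on top, as discussed next):

```python
import hashlib

def phone_hash(number: str) -> str:
    # Hash the identifier so the raw phone number never leaves the device.
    # Note: phone numbers have little entropy, so hashing alone is a weak
    # protection; real deployments add further defenses on top.
    return hashlib.sha256(number.encode()).hexdigest()

# Server-side directory: hash of a registered phone number -> public key.
directory = {
    phone_hash("+15551234567"): "bob-public-key",
}

def discover(number: str):
    # The client sends only the hash; the server returns the key on file.
    return directory.get(phone_hash(number))

print(discover("+15551234567"))  # key for a registered contact
print(discover("+15550000000"))  # None for an unknown number
```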
They've got a directory of known phone numbers, and as people search, you search for a hash of a phone number and get back the key that you hope Signal has correctly given you.

So there are a couple of ways you reduce your trust here. Signal has been going down a path of using SGX, oblivious RAM, and a bunch of systems mechanisms to increase the cost of attacks against their discovery mechanism. The other way you do this is to allow people to use pseudonyms or anonymous identifiers. With Wire, you can just register with an anonymous e-mail address, and now the cost to you is potentially lower if that gets compromised. It's worth noting that Moxie will be talking tomorrow at 4:00 p.m. about the evolution of the space around Signal, so you can expect a bunch more depth there.

So what if we don't want to trust the server to do matchmaking? One of the early mechanisms that has been around is the web of trust around GPG. This is the notion that if I have, in real life or otherwise, associated an identifier with a key, I can publicly provide a signed statement saying that I trust that mapping. Then people who don't know someone, but have a social link to them, can find these proofs and use them to trust the mapping.
So I know an identifier, and I know that I trust someone who has said "this is the key associated with that identifier", and I can use that network to eventually find a key I'm willing to encrypt to. There's a user-interface tradeoff here: this is in general a manual process. And this year we've had a set of denial-of-service attacks on the web-of-trust infrastructure. The specific attack is that anyone can upload these attestations of trust, so if a bunch of random users or sybils start uploading trust statements, when you go to download them you end up overwhelmed by the amount of information. The system does not scale, because it's very hard to filter down to the people you care about without telling the system who you care about, and thereby revealing your network, which is exactly what you're trying to avoid.

Keybase takes another approach. They made the observation that when I go to talk to someone, what I actually care about is the person I believe owns a specific GitHub or Twitter or other social profile. So I can provide an attestation that says: "Well, this is a key associated with the account that controls this Twitter account, or this Reddit account, or this Facebook account."
And so by having that chain of proofs, I can connect an individual and a cryptographic identity with the person who holds the passwords to a set of other systems. Keybase also began this year to provide a monetary incentive for users, and then struggled with the number of sign-ups. So there's a lot of work in figuring out whether these identities actually correspond to real people, and how to prevent a denial-of-service-style attack like the one the web of trust faced.

On our devices, we generally end up resorting to a concept called TOFU, or Trust-On-First-Use. What that means is that when I first see a key that identifies someone, I save it. If I ever need to communicate with that person again, I've already got a key, I can keep using that same key, and I can expect the key to stay the same. That continuity, and the ability to pin keys once you've seen them, means that if the first time you establish a connection with someone it's the real person, then someone who compromises them later can't take over or change that.

Finally, one of the exciting things that came out (this is circa 2015 and is largely defunct now) was a system by Adam Langley called Pond, which looked at hardening a modern version of e-mail. One of the things Pond did was something called a password-authenticated key exchange.
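The key-stretching step at the heart of such an exchange can be sketched as below. This shows only the derivation of strong key material from a weak shared answer; a real PAKE (or Pond's Panda, described next) does considerably more to resist offline guessing:

```python
import hashlib

def derive_key(weak_secret: str, salt: bytes) -> bytes:
    # Stretch a low-entropy shared answer into key material with an
    # expensive KDF, so brute-forcing the answer costs real work.
    return hashlib.pbkdf2_hmac(
        "sha256", weak_secret.encode(), salt, 200_000
    )

salt = b"agreed-upon-context"   # could be derived from the public challenge

# Both parties answer the challenge "where did we first meet?" locally.
alice_key = derive_key("cafe adalbertstrasse", salt)
bob_key = derive_key("cafe adalbertstrasse", salt)

assert alice_key == bob_key     # same answer -> same bootstrap key

# A typo on either side yields an unrelated key, not a near-miss.
assert derive_key("cafe adalbertstr", salt) != alice_key
```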
And so this is an evolving cryptographic area where you say: if two people can start with some weak shared secret (I can perhaps publicly, in plain text, challenge the other person: "Where were we on a specific day?"), then we both know something that has at least a few bits of entropy. If we can write the same textual answer, we can run a key-derivation function on it to end up with a larger amount of shared entropy, and use that as a bootstrapping method to do a key exchange and find a strong cryptographic identity for the other person. Pond has a system called Panda for linking to individuals based on such a challenge-response, and this is also something you'll find in Off-The-Record systems around Jabber.

The other thing we need to be careful about in end-to-end-encrypted systems is deniability. When I'm chatting one-on-one with someone, that conversation is eventually fairly deniable: each person has their recollection of what happened, and there's no proof the other person said something, unless you've recorded it or otherwise brought some other technology into play. But with an encrypted conversation where I've authenticated the other person, I potentially end up with a transcript that I can turn over later and say: look, this person said this.
And we've seen recently that things like leaked e-mails are authenticated in this way. The DKIM system that authenticates e-mail senders showed up in the WikiLeaks releases of Hillary Clinton's e-mails, and made it possible to say: the text in these hasn't been changed, and they were signed by the real server we would expect.

So the thing we get from Off-The-Record and the Signal protocol is called deniability, or repudiability. This plays into the concept of forward secrecy, which is that we throw away material afterwards, in a way that takes our chat back to being more ephemeral. We can think about this as two properties that interlink. We have keys we're using to form the shared session for our secret messages, and each time I send a message, I also provide some new key material and begin rotating the secret key we're using. So I provide a next key, and when Bob replies, he uses my next key as part of that and gives me his next key. The other thing I can then do when I send a message is reveal the secret part of my previous key. I can say: "The private key I used to send you that previous message was this."
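The key-rotation part of this can be sketched as a simple symmetric hash ratchet. This is a deliberately simplified model (protocols like Signal's double ratchet also mix in fresh Diffie-Hellman material), just to show how each message advances the key and old keys can be discarded:

```python
import hashlib, hmac

def ratchet(chain_key: bytes) -> tuple[bytes, bytes]:
    # Derive a one-off message key, then advance the chain.
    # Once an old chain key is deleted, past message keys can no
    # longer be recomputed: that's the forward-secrecy property.
    message_key = hmac.new(chain_key, b"msg", hashlib.sha256).digest()
    next_chain = hmac.new(chain_key, b"chain", hashlib.sha256).digest()
    return message_key, next_chain

chain = hashlib.sha256(b"initial shared secret").digest()

keys = []
for _ in range(3):              # three messages, three distinct keys
    mk, chain = ratchet(chain)
    keys.append(mk)
    # the previous chain key is overwritten (and would be wiped) here

assert len(set(keys)) == 3      # every message used a different key
```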
And now, at the end of our conversation, we both know all of the private keys, such that either of us could have created that whole conversation on our own computer. At any given time, only the most recent message is one that could only have been sent by the other person; the rest of the transcript is something you could have generated yourself. There's a talk on day three about OTRv4, the fourth version of Off-The-Record, that will go deeper into this; that's at 9:00 p.m. in the about:freedom assembly, so I encourage you to go if you're interested.

OK. The next mechanism to talk about is expiry. This is a follow-on to the concept of forward secrecy, but there are two attacks here to consider. One is something we should maybe give credit to Snapchat for popularizing, which is the concept of "the message goes away after some amount of time". Really, this is protecting against not fully trusting the other person, against them sharing it later or in a way you didn't intend. This is also the screenshot adversary: a bunch of apps will alert the other participant if you take a screenshot. It's why some apps blank the screen when they go to the task switcher; if you're swapping between apps, you'll see that some of your applications will just show a blank screen or hide their contents.
And that's 300 00:26:12,210 --> 00:26:16,030 because the mobile operating systems' APIs don't tell them when you take a 301 00:26:16,030 --> 00:26:19,090 screenshot while you're in that mode, and so they want to just be able to notify you if the 302 00:26:19,090 --> 00:26:23,500 other person does. It's worth noting that this is all just raising the cost of these 303 00:26:23,500 --> 00:26:27,720 attacks and providing sort of a social incentive not to, right. I can still use 304 00:26:27,720 --> 00:26:31,740 another camera to take a picture of my phone and get evidence of something that 305 00:26:31,740 --> 00:26:39,200 has been said. But it's discouraging it and setting social norms. The other reason 306 00:26:39,200 --> 00:26:44,289 for expiry is: After the fact, a compromise of a device, so whether that's 307 00:26:44,289 --> 00:26:49,190 - you know, someone gets hold of the device and tries to do forensic analysis 308 00:26:49,190 --> 00:26:54,539 to pull off previous messages or the chat database, or whether someone tries to 309 00:26:54,539 --> 00:26:59,770 install an application that then scans through your phone... So Fengcai, for example, is 310 00:26:59,770 --> 00:27:05,549 an application that's been installed as a surveillance app in China. And this also 311 00:27:05,549 --> 00:27:10,006 boils down to a user interface and user experience question, which is: how long are 312 00:27:10,006 --> 00:27:13,480 you going to save logs, how much history are you going to save and what 313 00:27:13,480 --> 00:27:19,493 norms are you going to have? And there's a tradeoff here. It's useful 314 00:27:19,493 --> 00:27:24,549 sometimes to scroll back. And especially companies that believe that they have 315 00:27:24,549 --> 00:27:31,560 value-added services around being able to do data analytics on your chat history 316 00:27:31,560 --> 00:27:40,140 are wary of getting rid of that.
The next thing that we have is isolation and 317 00:27:40,140 --> 00:27:47,650 OS sandboxing. Right. So a lot of this is up one layer, which is: what is the 318 00:27:47,650 --> 00:27:53,049 operating system doing to secure your application, your chat system, from the 319 00:27:53,049 --> 00:27:58,650 other things, the malware or the compromises of the broader device that 320 00:27:58,650 --> 00:28:06,750 it's running on. We have a bunch of projects around us at Congress that are 321 00:28:06,750 --> 00:28:11,440 innovating on this. There are chat systems that also attempt to do this sort of on 322 00:28:11,440 --> 00:28:16,270 their own. One sort of extreme example is called Tinfoil Chat, which makes use of 323 00:28:16,270 --> 00:28:21,240 three devices and a physical diode, which is designed to have one device that is 324 00:28:21,240 --> 00:28:25,600 sending messages and another device that is receiving messages. And the thought is: 325 00:28:25,600 --> 00:28:30,559 if you receive a message that somehow compromises the device, the malware or the 326 00:28:30,559 --> 00:28:36,580 malicious file can never get any communication back out and so becomes much 327 00:28:36,580 --> 00:28:41,850 less valuable to have compromised. And they implement this with a physical 328 00:28:41,850 --> 00:28:53,690 hardware diode. The other side of this is recovery and backups, which is: you've got 329 00:28:53,690 --> 00:29:01,169 a user experience tradeoff between a lot of people losing their devices and wanting 330 00:29:01,169 --> 00:29:04,960 to get back their contact list or their chat history, and the fact that now you're 331 00:29:04,960 --> 00:29:08,120 keeping this extra copy and have this additional place for things to get 332 00:29:08,120 --> 00:29:15,290 compromised. Apple has done a lot of work here that we don't look at so much.
They 333 00:29:15,290 --> 00:29:19,640 gave a Black Hat talk a few years ago where they discuss how they use custom hardware 334 00:29:19,640 --> 00:29:25,380 security modules in their data centers, much like the T2 chip in end devices, 335 00:29:25,380 --> 00:29:30,990 that will hold the backup keys that get used for their iCloud backups and do 336 00:29:30,990 --> 00:29:36,870 similar amounts of rate limiting. And they consider a set of - a pretty wide set of 337 00:29:36,870 --> 00:29:40,530 adversaries - more than we might expect. So including things like: what happens when 338 00:29:40,530 --> 00:29:46,520 the government comes and asks us to write new software to compromise this? And so 339 00:29:46,520 --> 00:29:51,840 they set up their HSMs such that they cannot provide software updates to them, 340 00:29:51,840 --> 00:29:56,900 which is, you know, sort of a step of how you do this cloud security side 341 00:29:56,900 --> 00:30:03,650 that we don't think about as much. So there's a set of slides that you can find 342 00:30:03,650 --> 00:30:09,110 from this. And these slides will be online, too, as a pointer to look at 343 00:30:09,110 --> 00:30:14,380 their solution, which considers a large number of adversaries that you might not 344 00:30:14,380 --> 00:30:28,220 have thought about. So traffic obfuscation is primarily about a network-side adversary. The 345 00:30:28,220 --> 00:30:31,799 technique that is getting used, sort of what people reach for if they feel they 346 00:30:31,799 --> 00:30:37,350 need to do this, is something called domain fronting. Domain fronting had 347 00:30:37,350 --> 00:30:42,510 its heyday maybe around 2014 and has become somewhat less effective, but it's 348 00:30:42,510 --> 00:30:50,110 still effective enough for most of the chat things.
The basic idea behind domain 349 00:30:50,110 --> 00:30:55,440 fronting is that there's a separation of layers between that envelope and the 350 00:30:55,440 --> 00:31:02,240 message inside of it that we get with HTTP on the Web. So when I create a secure 351 00:31:02,240 --> 00:31:09,059 connection to a CDN, to a content provider like Amazon or Google or Microsoft, I can 352 00:31:09,059 --> 00:31:14,030 make that connection and perform the security layer while naming only a fairly 353 00:31:14,030 --> 00:31:19,100 generic service that I'm connecting to. I just want to establish a secure connection 354 00:31:19,100 --> 00:31:23,580 to Cloudflare. And then once I've done that, the message that I can send inside 355 00:31:23,580 --> 00:31:27,399 can be a chat message to a specific customer of that CDN or that cloud 356 00:31:27,399 --> 00:31:35,440 provider. And so this is an effective way to prevent the network from knowing what 357 00:31:35,440 --> 00:31:41,659 specific service you're accessing. It got used for a bunch of circumvention things. 358 00:31:41,659 --> 00:31:45,620 It then got used for a bunch of malware things, and this caused a bunch of the 359 00:31:45,620 --> 00:31:52,480 cloud providers to stop allowing you to do this. But it's still getting used. This is 360 00:31:52,480 --> 00:31:56,330 still what's sort of happening when you turn on certain censorship circumvention in 361 00:31:56,330 --> 00:32:01,770 Signal; it's what Telegram is using for the most part. And the same basic 362 00:32:01,770 --> 00:32:08,290 technique is getting another revival with DNS over HTTPS and encrypted SNI 363 00:32:08,290 --> 00:32:15,300 extensions to TLS, which allow for a standardized approach to establish a 364 00:32:15,300 --> 00:32:19,760 connection to a service without providing any specific identifiers to the network 365 00:32:19,760 --> 00:32:26,159 for which service you want to connect to.
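[A minimal sketch of the layer separation just described. The domain names here are hypothetical placeholders: an on-path observer sees only a TLS session to the generic front domain (via DNS and the SNI field), while the Host header that actually routes the request at the CDN travels inside the encrypted tunnel.]

```python
# What the network observer sees: the TCP/TLS connection target and SNI.
FRONT_DOMAIN = "cdn.example.com"      # hypothetical generic CDN front
# What only the CDN sees, inside the encrypted tunnel: the real customer.
HIDDEN_SERVICE = "chat.example.com"   # hypothetical fronted chat backend

def build_fronted_request(path: str) -> bytes:
    """Build the inner HTTP/1.1 request that rides inside a TLS session
    opened to FRONT_DOMAIN. The CDN routes on the Host header, so the
    hidden service name never appears on the wire in cleartext."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {HIDDEN_SERVICE}\r\n"
        f"Connection: close\r\n"
        f"\r\n"
    ).encode()

req = build_fronted_request("/v1/messages")
# The Host header below is encrypted in transit; the observer only
# learns "someone connected to cdn.example.com".
```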
It's worth sort of mentioning that 366 00:32:26,159 --> 00:32:33,640 probably the most active chat service for this sort of obfuscation or circumvention 367 00:32:33,640 --> 00:32:39,380 is Telegram, which has a bunch of users in countries that are not fans of having lots 368 00:32:39,380 --> 00:32:44,929 of users of Telegram. And so they have both systems where they can bounce between 369 00:32:44,929 --> 00:32:49,343 IPs very quickly and change where their servers appear to be. And they've also 370 00:32:49,343 --> 00:32:55,299 used techniques like sending messages over DNS tunnels to mitigate some of these 371 00:32:55,299 --> 00:33:01,510 censorship things. From the provider's perspective, this is really about accessing 372 00:33:01,510 --> 00:33:05,700 their user population. They're not really thinking about your local network or 373 00:33:05,700 --> 00:33:09,220 caring about that as much as they are like, oh, there's millions of 374 00:33:09,220 --> 00:33:16,570 users that should probably still have access to us. So we can maybe hide the 375 00:33:16,570 --> 00:33:21,809 characteristics of traffic in terms of what specific service we're connecting to. 376 00:33:21,809 --> 00:33:25,840 There are some other things about traffic, though, that also are revealing to the 377 00:33:25,840 --> 00:33:28,990 network. And this is sort of this additional metadata that we need to think 378 00:33:28,990 --> 00:33:36,039 about. So one of these is padding: the size of messages can be revealing. So one 379 00:33:36,039 --> 00:33:39,350 sort of immediate thing is the size of a chat or a text message is going to be very 380 00:33:39,350 --> 00:33:45,700 different from the size of an image or voice or movies. And you see this on 381 00:33:45,700 --> 00:33:49,059 airplanes or in other bandwidth-limited settings: they might allow text messages 382 00:33:49,059 --> 00:33:56,270 to go through, but images won't.
There's been research that shows, for instance, on 383 00:33:56,270 --> 00:34:02,840 voice, even if I encrypt my voice, we've actually gotten really good at compressing 384 00:34:02,840 --> 00:34:07,580 audio of human speech. So much so that different phonemes, different sounds that 385 00:34:07,580 --> 00:34:13,799 we make, compress to different sizes. And so I can say something, compress it, encrypt it, 386 00:34:13,799 --> 00:34:20,169 and then recover what was said based on the relative sizes of different sounds. So 387 00:34:20,169 --> 00:34:25,240 there was a paper in 2011 at Oakland S&P that demonstrated this 388 00:34:25,240 --> 00:34:33,159 potential for attacks. And so what this is telling us perhaps is that there's a 389 00:34:33,159 --> 00:34:39,639 tradeoff between how efficiently I want to send things and how much metadata or 390 00:34:39,639 --> 00:34:44,760 revealing information for distinguishing them I'm giving up. So I can use a less 391 00:34:44,760 --> 00:34:49,579 efficient compression that's constant bit rate or that otherwise is not revealing 392 00:34:49,579 --> 00:34:52,469 this information, but it has higher overhead and won't work as well in 393 00:34:52,469 --> 00:34:58,539 constrained network environments. The other place this shows up is just when 394 00:34:58,539 --> 00:35:04,839 people are active. So if I can look at when someone is tweeting or when messages 395 00:35:04,839 --> 00:35:10,785 are sent, I can probably figure out pretty quickly what timezone they're in. Right. 396 00:35:10,785 --> 00:35:17,009 And so this leads to a whole set of these metadata-based attacks. And in particular, 397 00:35:17,009 --> 00:35:21,509 there are confirmation attacks and intersection attacks.
And so intersection 398 00:35:21,509 --> 00:35:26,780 attacks are looking at the relative activity of multiple people and trying to 399 00:35:26,780 --> 00:35:32,519 figure out: OK, when Alice sent a message, who else was online or active at the same 400 00:35:32,519 --> 00:35:37,190 time? And over time, can I narrow down or filter to specific people that were likely 401 00:35:37,190 --> 00:35:45,269 who Alice was talking to? Pond is also a system to look at 402 00:35:45,269 --> 00:35:51,969 in this regard. Their approach was that a client would hopefully always be online 403 00:35:51,969 --> 00:35:57,609 and would check in with the server in a regular pattern with the same amount of 404 00:35:57,609 --> 00:36:01,980 data, regardless of whether there was a real message to send or not. So that from 405 00:36:01,980 --> 00:36:07,089 the network's perspective, every user looked the same. The downside being that 406 00:36:07,089 --> 00:36:12,579 you've now got this message being sent by every client every minute or so, and that 407 00:36:12,579 --> 00:36:19,300 creates a huge amount of overhead of, you know, just padded data that doesn't have 408 00:36:19,300 --> 00:36:27,559 any meaning. So finally, I'll take a look at server hardening and the things that 409 00:36:27,559 --> 00:36:33,259 we're doing to reduce trust in the server. There are a few examples of why we would 410 00:36:33,259 --> 00:36:37,759 want to do this. So one is that you've had messaging servers, plenty of times, that 411 00:36:37,759 --> 00:36:46,690 have not been as secure as they claim. One example being that there was a period 412 00:36:46,690 --> 00:36:52,739 where the Skype subsidiary in China was using a blacklist of keywords on the 413 00:36:52,739 --> 00:36:57,779 server to either prevent or intercept some subset of their users' messages without 414 00:36:57,779 --> 00:37:03,650 telling anyone that they were doing that.
And then also just sort of this uncertain 415 00:37:03,650 --> 00:37:07,999 future of: OK, I trust them with my data now, but what can we do so that I don't worry about 416 00:37:07,999 --> 00:37:14,890 what the corporate future of this service entails for my data? One of the sort of 417 00:37:14,890 --> 00:37:20,670 elephants in the room is: the software development is probably pretty 418 00:37:20,670 --> 00:37:25,339 centralized. So even if I don't trust the server, there's some pretty small number 419 00:37:25,339 --> 00:37:29,180 of developers who are writing the code. And how do I trust that the updates that 420 00:37:29,180 --> 00:37:33,229 they are making, either to the server or to the client that they push to my device, 421 00:37:33,229 --> 00:37:39,339 aren't reducing my security? Open source is a great start to mitigating that, but it's 422 00:37:39,339 --> 00:37:45,581 certainly not solving all of this. So one way we can think about how we 423 00:37:45,581 --> 00:37:49,749 reduce trust in the server is by looking at what the server knows after end-to-end 424 00:37:49,775 --> 00:37:53,969 encryption. It knows things about the size. It knows where the message is coming 425 00:37:53,969 --> 00:37:58,385 from. It knows where the message is going to. Size: we've talked about some of these 426 00:37:58,385 --> 00:38:03,300 padding things that we can use to mitigate. So how do we reduce the amount 427 00:38:03,300 --> 00:38:06,720 of information about sources and destinations in this network graph that 428 00:38:06,720 --> 00:38:13,240 the server knows? So this is a concept called linkability, which is being able to 429 00:38:13,240 --> 00:38:21,690 link the source and destination of a message. We start to see some mitigations 430 00:38:21,690 --> 00:38:27,640 or approaches to reducing linkability entering mainstream systems.
So Signal has 431 00:38:27,640 --> 00:38:32,260 a system called "Sealed Sender" that you can enable, where the source of the 432 00:38:32,260 --> 00:38:37,489 message goes within the encrypted envelope, so that Signal doesn't see that. 433 00:38:37,489 --> 00:38:42,089 The downside being that Signal is still seeing your IP address, but the thought is 434 00:38:42,089 --> 00:38:46,559 that they will throw those out relatively quickly and so they will have fewer logs 435 00:38:46,559 --> 00:38:53,099 about this source-to-destination link. More theoretically, though, there is a bunch of 436 00:38:53,099 --> 00:38:59,160 work in this space. The first thing I'll point to is a set of systems that we classify as 437 00:38:59,160 --> 00:39:07,819 mixnets. A mixnet works by having a set of providers rather than a single entity 438 00:39:07,819 --> 00:39:12,800 that's running the servers. A bunch of users will send messages to the first 439 00:39:12,800 --> 00:39:16,670 provider, which will shuffle all of them and send them to the next provider, which 440 00:39:16,670 --> 00:39:20,640 will shuffle them again and send them to a final provider that will shuffle them and 441 00:39:20,640 --> 00:39:25,599 then be able to send them to destinations. And this de-links things: none of the 442 00:39:25,599 --> 00:39:31,519 individual providers knows both the source and destination of these messages. So this 443 00:39:31,519 --> 00:39:39,750 looks maybe a bit like Tor's onion routing, but differs in sort of a couple of 444 00:39:39,750 --> 00:39:44,799 technicalities. One is that typically you will wait for some number of messages rather 445 00:39:44,799 --> 00:39:49,719 than just passing them through at full bandwidth and low latency.
And so by doing that, you can 446 00:39:49,719 --> 00:39:53,920 get a theoretical guarantee that this batch had at least n messages that got 447 00:39:53,920 --> 00:39:58,400 shuffled, and therefore you can prevent there being some time where only one user 448 00:39:58,400 --> 00:40:05,400 was using the system. And so you get a stronger theoretical guarantee. There's an 449 00:40:05,400 --> 00:40:09,779 active project making a messaging system using mixnets called Katzenpost. They gave 450 00:40:09,779 --> 00:40:14,150 a talk at Camp this summer, and I'd encourage you to look at their website or 451 00:40:14,150 --> 00:40:22,679 go back to that talk to learn more about mixnets. The project that I was, I guess, 452 00:40:22,679 --> 00:40:26,319 tangentially helping with is in a space called private information retrieval, 453 00:40:26,319 --> 00:40:33,410 which is another technique for doing this de-linking. Private information retrieval 454 00:40:33,410 --> 00:40:37,559 frames the question a little bit differently. And what it asks is: if I 455 00:40:37,559 --> 00:40:41,669 have a server that has a database of messages, I want a client to be able to 456 00:40:41,669 --> 00:40:45,539 retrieve one of those messages without the server knowing which message the client 457 00:40:45,539 --> 00:40:55,199 got or asked for. So this sounds maybe hard. I can give you a straw man to 458 00:40:55,199 --> 00:40:59,150 convince yourself that this is doable, and the straw man is: I can ask the server for 459 00:40:59,150 --> 00:41:04,069 its entire database and then take the message that I want, and the server hasn't 460 00:41:04,069 --> 00:41:08,349 learned anything about which message I cared about. But I spent a lot of network 461 00:41:08,349 --> 00:41:13,899 bandwidth probably doing that. So there's a couple of constructions for this. I'm 462 00:41:13,899 --> 00:41:20,029 going to focus on information-theoretic private information retrieval.
463 00:41:20,029 --> 00:41:24,920 And so we're going to use a similar setup to what we had in our threat model for a 464 00:41:24,920 --> 00:41:29,680 mixnet, which is: we've got a set of providers now that have the same database. 465 00:41:29,680 --> 00:41:34,869 And I'm going to assume that they're not all talking to each other or colluding. So 466 00:41:34,869 --> 00:41:40,200 I just need at least one of them to be honest. And one of the things that we'll 467 00:41:40,200 --> 00:41:44,749 use here is something called the exclusive-or (XOR) operation. To refresh your memory: 468 00:41:44,749 --> 00:41:50,711 XOR is a binary bitwise operation. And the nice property that we 469 00:41:50,711 --> 00:41:55,949 get is if I XOR something with itself, it cancels out. So if I have some piece of 470 00:41:55,949 --> 00:42:02,970 data and I XOR it against itself, it just goes away. So if I have my systems that 471 00:42:02,970 --> 00:42:11,430 have the database, I can ask each one to give me a superposition of some random 472 00:42:11,430 --> 00:42:17,249 subset of its database. So I can ask the first server: give me items 11, 14 and 473 00:42:17,249 --> 00:42:23,549 20 XORed together. I'm assuming all of the items are the same size so that you can do 474 00:42:23,549 --> 00:42:31,069 these XORs. And if I structure that right, it can appear to each server independently, 475 00:42:31,069 --> 00:42:35,379 in the request that it sees, that I just asked for some random subset. But I can 476 00:42:35,379 --> 00:42:39,019 do that so that when I XOR the things I get back, everything just cancels out 477 00:42:39,019 --> 00:42:44,009 except the item that I care about. Unless you saw all of the requests that I made, 478 00:42:44,009 --> 00:42:49,140 you wouldn't be able to tell which item I cared about. So by doing this, I've 479 00:42:49,140 --> 00:42:53,949 reduced the network bandwidth.
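[The XOR trick just described can be sketched end to end. This is a toy two-server information-theoretic PIR over a tiny in-memory database, not the Talek construction itself: each server answers a query for a random-looking subset, and XORing the two answers cancels everything except the wanted item.]

```python
import secrets

# Toy database of 8 equal-sized items, replicated at both (non-colluding) servers.
ITEM_SIZE = 16
DB = [secrets.token_bytes(ITEM_SIZE) for _ in range(8)]

def server_answer(db, subset):
    """What one server computes: XOR together the requested items.
    On its own, the subset it sees is uniformly random."""
    acc = bytes(ITEM_SIZE)
    for idx in subset:
        acc = bytes(a ^ b for a, b in zip(acc, db[idx]))
    return acc

def pir_fetch(want: int) -> bytes:
    """Client side: pick a random subset for server 1; server 2 gets the
    same subset with the wanted index toggled. Each query alone reveals
    nothing, but XORing the answers cancels everything except DB[want]."""
    s1 = {i for i in range(len(DB)) if secrets.randbelow(2)}
    s2 = s1 ^ {want}  # symmetric difference: toggle the wanted index
    a1 = server_answer(DB, s1)
    a2 = server_answer(DB, s2)
    return bytes(a ^ b for a, b in zip(a1, a2))

assert pir_fetch(5) == DB[5]  # recovered without either server learning "5"
```

Each server does work linear in the database size, which is the cost the GPU discussion below is about.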
I'm only getting one item's worth of data back from every 480 00:42:53,949 --> 00:43:00,050 server. Now, you might have a concern that I'm asking the server to do a 481 00:43:00,050 --> 00:43:03,720 whole lot of work here. It has to look through its entire database and compute 482 00:43:03,720 --> 00:43:09,519 this superposition thing. And that seems potentially like a lot of work, right. The 483 00:43:09,519 --> 00:43:14,660 thing that I think is exciting about this space is it turns out this sort of 484 00:43:14,660 --> 00:43:19,499 operation of going out to a large database and searching for all of the things 485 00:43:19,499 --> 00:43:23,759 and then coming back with a small amount of data maps well onto the hardware that 486 00:43:23,759 --> 00:43:29,510 we're building for A.I. and for a bunch of these sorts of search-like things. And so 487 00:43:29,510 --> 00:43:34,479 this runs really quite well on a GPU, where I can have all of those thousands of cores 488 00:43:34,479 --> 00:43:38,719 compute small parts of the XOR and then pull back this relatively small 489 00:43:38,719 --> 00:43:43,160 amount of information. And so with GPUs, you can actually have databases of 490 00:43:43,160 --> 00:43:50,920 gigabytes, tens of gigabytes of data and compute these XORs across all of it on the 491 00:43:50,920 --> 00:43:59,441 order of a millisecond or less. So, a couple of things in this space. "Talek" is 492 00:43:59,441 --> 00:44:03,960 the system that I helped with that demonstrates this working. The converse 493 00:44:03,960 --> 00:44:08,900 problem is called private information storage. And that one is: how do I write an 494 00:44:08,900 --> 00:44:14,150 item into a database without the database knowing which item I wrote? The 495 00:44:14,150 --> 00:44:20,209 mathematical construction there is not quite as simple to explain.
But there's 496 00:44:20,209 --> 00:44:26,039 some pretty cool new work from the last month or two out of Dan Boneh and Henry 497 00:44:26,039 --> 00:44:34,680 Corrigan-Gibbs at Stanford, called Express, with Saba as first author, that shows how to 498 00:44:34,680 --> 00:44:44,380 fairly practically perform that operation. I'll finish just with a couple minutes on 499 00:44:44,380 --> 00:44:53,299 multiparty chat or group chat, so small groups. You've sort of got a choice here 500 00:44:53,299 --> 00:44:58,029 in terms of how existing chat systems are implementing group chat. One is you don't 501 00:44:58,029 --> 00:45:01,759 tell the server about the group. And as someone who is part of the group, I 502 00:45:01,759 --> 00:45:05,729 just send the same message to everyone in the group. And maybe I can tag it for them 503 00:45:05,729 --> 00:45:10,009 so that they know it's part of the group. Or you do something more efficient, where 504 00:45:10,009 --> 00:45:13,984 you tell the server about group membership and I send the message once to the server 505 00:45:13,984 --> 00:45:22,829 and it sends it to everyone in the group. Even if you don't tell the server about 506 00:45:22,829 --> 00:45:26,680 it, though, you've got a bunch of things to worry about, like correlation, 507 00:45:26,680 --> 00:45:31,979 which is: if at a single time someone sends the same-sized message to five other 508 00:45:31,979 --> 00:45:35,640 people and then later someone else sends the same-sized message to five other 509 00:45:35,640 --> 00:45:39,360 people, and those basically overlap, someone in the network basically knows who 510 00:45:39,360 --> 00:45:42,839 the group membership is. So it's actually quite difficult to conceal group 511 00:45:42,839 --> 00:45:48,609 membership. The other thing that breaks down is our concept of deniability once 512 00:45:48,609 --> 00:45:52,929 again, which is: now multiple people have this log.
Even if both of them 513 00:45:52,929 --> 00:45:56,799 individually could have written it, the fact that they have the same cryptographic 514 00:45:56,799 --> 00:46:04,329 keys from this other third party probably means that third party made that message. 515 00:46:04,329 --> 00:46:13,119 So there continues to be work here. Signal is working on providing, again, an SGX-based 516 00:46:13,119 --> 00:46:16,510 centralized construction for group management to be able to scale better, 517 00:46:16,510 --> 00:46:21,969 given, I think, the pretty realistic fact that the server in these cases is probably 518 00:46:21,969 --> 00:46:25,689 going to be able to figure out group membership in some cases; you might as well 519 00:46:25,689 --> 00:46:32,019 make it scale. On the other side, one of the cool systems that's being prototyped 520 00:46:32,019 --> 00:46:39,969 is called "Cwtch" out of Open Privacy. And this is an extension to Ricochet that 521 00:46:39,969 --> 00:46:45,849 allows for offline messages and small group chats. It works for on the order of 5 to 20 522 00:46:45,849 --> 00:46:50,700 people, and it works by having a server that obliviously forwards on messages to 523 00:46:50,700 --> 00:46:55,599 everyone connected to it. So when I send a message to a group, the server sends the 524 00:46:55,599 --> 00:46:59,430 message to everyone it knows about, not just the people in the group, and 525 00:46:59,430 --> 00:47:03,609 therefore the server doesn't actually know the subgroups that exist. It just knows 526 00:47:03,609 --> 00:47:10,549 who's connected to it. And that's a neat approach. It doesn't necessarily scale to large 527 00:47:10,549 --> 00:47:16,140 groups, but it allows for some concealing of group membership. They've got an 528 00:47:16,140 --> 00:47:22,299 Android prototype as well that's sort of a nice extension to make this usable. 529 00:47:22,299 --> 00:47:33,509 Wonderful.
I guess the final thought here is: there are a lot of systems; I'm sure I 530 00:47:33,509 --> 00:47:40,339 haven't mentioned all of them. But this community is really closely tied to the 531 00:47:40,339 --> 00:47:46,059 innovations that are happening in the space of private chat. And this is the 532 00:47:46,059 --> 00:47:49,910 infrastructure that supports communities and is some of the most meaningful stuff 533 00:47:49,910 --> 00:47:55,959 you can possibly work on. And I encourage you to find new ones and look at a bunch 534 00:47:55,959 --> 00:48:00,029 of them and think about the tradeoffs and encourage friends to play with new 535 00:48:00,029 --> 00:48:03,650 systems, because that's how they gain adoption and how people figure out what 536 00:48:03,650 --> 00:48:09,710 mechanisms do and don't work. So with that, I will take questions. 537 00:48:09,710 --> 00:48:17,698 *Applause* 538 00:48:17,698 --> 00:48:21,379 Herald: It wasn't necessary to encourage the applause. There are 539 00:48:21,379 --> 00:48:25,130 microphones that are numbered in the room, so if you start lining up behind the 540 00:48:25,130 --> 00:48:29,709 microphones, then we can take your questions. We already have a question from 541 00:48:29,709 --> 00:48:36,500 the Internet. Question: Popularity and independence are 542 00:48:36,500 --> 00:48:42,630 a contradiction. How can I be sure that an increasingly popular messenger like Signal 543 00:48:42,630 --> 00:48:50,959 stays independent? Answer: I guess I would question whether 544 00:48:50,959 --> 00:48:57,720 independence is a goal in and of itself. It's true that the value is increasing. 545 00:48:57,720 --> 00:49:03,449 And so one of the things I think about is using systems that have open protocols 546 00:49:03,449 --> 00:49:07,289 or that are federated or otherwise not centralized.
And again, this is reducing 547 00:49:07,289 --> 00:49:13,400 that need to have confidence in the future business model of a single legal entity. 548 00:49:13,400 --> 00:49:20,539 But I don't know if independence of the company is the thing that you're 549 00:49:20,539 --> 00:49:25,279 trying to trade off with popularity. Herald: Well, and we have questions at the 550 00:49:25,279 --> 00:49:27,630 microphones. We'll start at microphone number one. 551 00:49:27,630 --> 00:49:33,839 Question: Thanks for the talk. First of all, you talked a lot about 552 00:49:33,839 --> 00:49:40,739 content and encryption. What about the initial problem? History shows that if I'm 553 00:49:40,739 --> 00:49:47,229 an individual already observed in a sensitive area, there might be no need to 554 00:49:47,229 --> 00:49:52,750 encrypt or decrypt the message on sending. It's already identified that I'm sending at a 555 00:49:52,750 --> 00:49:58,880 specific location at a specific time. Is there any chance to hide that or do 556 00:49:58,880 --> 00:50:02,769 something against it? Answer: So make things hidden again after 557 00:50:02,769 --> 00:50:13,069 the fact? That seems very hard. I mean, so there's a couple of thoughts there, 558 00:50:13,069 --> 00:50:20,769 maybe. There's sort of this real-world intersection attack, which is: if 559 00:50:20,769 --> 00:50:25,230 there's a real-world observable action of who actually shows up at the protest, 560 00:50:25,230 --> 00:50:29,239 that's a pretty good way to figure out who was chatting about the protests beforehand, 561 00:50:29,239 --> 00:50:37,299 potentially. And so, I mean, I think what we've seen in real-world organizing is 562 00:50:37,299 --> 00:50:42,170 things like either really decentralizing that, where it happens across a lot of 563 00:50:42,170 --> 00:50:46,119 platforms, and happens very spontaneously close to the event.
So there's not enough 564 00:50:46,119 --> 00:50:55,740 time to respond in advance, or hiding your presence, or otherwise trying 565 00:50:55,740 --> 00:51:01,039 to stagger your actual actions so that they are harder to correlate to a specific 566 00:51:01,039 --> 00:51:06,849 group. But it's not something the chat systems are talking about, I don't think. 567 00:51:06,849 --> 00:51:10,890 Herald: We have time for more questions. So please line up at the microphones, and 568 00:51:10,890 --> 00:51:15,510 if you're leaving, then leave quietly. We have a question from microphone number 4. 569 00:51:15,510 --> 00:51:18,690 Question: So if network address 570 00:51:18,690 --> 00:51:23,509 translation is the original sin against the end-to-end principle, and due to that, we now 571 00:51:23,509 --> 00:51:31,309 have to run servers, someone has to pay for it. Do you know any solution to that 572 00:51:31,309 --> 00:51:38,130 economic problem? Answer: I mean, we had to pay for things 573 00:51:38,130 --> 00:51:42,609 even without network address translation, but we could move more of that cost to end 574 00:51:42,609 --> 00:51:49,829 users. And so we have another opportunity with IPv6 to potentially keep more of 575 00:51:49,829 --> 00:51:53,539 the cost with end users or develop protocols that are more decentralized, 576 00:51:53,539 --> 00:52:00,440 where that cost stays more fairly distributed. You know, our phones have a 577 00:52:00,440 --> 00:52:05,279 huge amount of computation power, and figuring out how we make our protocols so 578 00:52:05,279 --> 00:52:13,339 that work happens there is, I think, an ongoing balance. I think some of the 579 00:52:13,339 --> 00:52:18,349 reasons why network address translation or centralization is so common is because 580 00:52:18,349 --> 00:52:22,849 distributed systems are pretty hard to build and pretty hard to gain confidence 581 00:52:22,849 --> 00:52:29,739 in.
So more tools around how we can test and feel like we understand that the 582 00:52:29,739 --> 00:52:35,130 system actually is, you know, going to work 99.9% of the time for distributed 583 00:52:35,130 --> 00:52:38,709 systems is going to make people less wary of working with them. 584 00:52:38,709 --> 00:52:42,779 So better tools for distributed systems is maybe the best answer. 585 00:52:42,779 --> 00:52:48,180 Herald: We also have another question from the internet, which we'll take now. 586 00:52:48,180 --> 00:52:53,299 Question: What do you think of technical novices' acceptance of and dealing with OTR 587 00:52:53,299 --> 00:52:58,930 keys, for example in Matrix/Riot? Most people I know just click "I verified this 588 00:52:58,930 --> 00:53:03,419 key" even if they didn't. Answer: Absolutely. So this, I think, 589 00:53:03,419 --> 00:53:07,550 goes back to a lot of these problems being sort of a user experience tradeoff, which 590 00:53:07,550 --> 00:53:14,160 is, you know, we saw initial versions of Signal where you would actually try and 591 00:53:14,160 --> 00:53:19,470 regularly verify some QR code between each other, and then that sort of has gotten pushed 592 00:53:19,470 --> 00:53:24,499 back to a harder-to-access part of the user interface, because not many people 593 00:53:24,499 --> 00:53:29,120 wanted to deal with that. And in early Matrix/Riot you would get a lot of 594 00:53:29,120 --> 00:53:33,059 warnings about: There's a new device. Do you want to verify this new device? Do you 595 00:53:33,059 --> 00:53:37,209 only want to send to the previous devices that you trusted?
And now you're getting 596 00:53:37,209 --> 00:53:41,739 the ability to sort of more automatically just accept these changes, and 597 00:53:41,739 --> 00:53:45,429 you're weakening some amount of the encryption security, but you're getting a 598 00:53:45,429 --> 00:53:49,299 better, smoother user interface, because most users are just going to click 599 00:53:49,299 --> 00:53:52,669 "yes" because they want to send the message. Right. And so there's this 600 00:53:52,669 --> 00:53:56,129 tradeoff: when you have built the protocols such that you are standing in 601 00:53:56,129 --> 00:54:00,140 the way of the person doing what they want to do, that's not really where you want to 602 00:54:00,140 --> 00:54:06,369 put that friction. So figuring out other ways where you can have this on the side, 603 00:54:06,369 --> 00:54:12,959 or supporting the communication rather than hindering it, is probably the type of 604 00:54:12,959 --> 00:54:16,889 user interface or system that we should be thinking about that can be successful. 605 00:54:16,889 --> 00:54:20,169 Herald: We have a couple more questions. We'll start at microphone 606 00:54:20,169 --> 00:54:23,820 number 3. Question: Thank you for your talk. You 607 00:54:23,820 --> 00:54:28,970 talked about deniability by sending the private key with the last message. 608 00:54:28,970 --> 00:54:34,339 And how do you get the private key for the last message in the whole conversation? 609 00:54:34,339 --> 00:54:45,119 Answer: In the OTR, XMPP, Jabber systems there would be an explicit action to end 610 00:54:45,119 --> 00:54:50,410 the conversation that would then make it repudiable, and that would send 611 00:54:50,410 --> 00:54:55,970 that final message to close it. What you have in things like Signal is that it's 612 00:54:55,970 --> 00:54:59,549 actually happening with every message, as part of the confirmation of the message. 613 00:54:59,549 --> 00:55:03,329 Question: OK. Thank you.
Herald: We still probably have 614 00:55:03,329 --> 00:55:07,439 time for more questions. So please line up if you have any. Don't hold back. 615 00:55:07,439 --> 00:55:09,549 We have a question from microphone number 7. 616 00:55:09,549 --> 00:55:14,269 Question: So, first of all, a brief comment. The Riot thing still doesn't even 617 00:55:14,269 --> 00:55:19,880 do TOFU. They haven't figured this out. But I think there's a 618 00:55:19,880 --> 00:55:24,760 much more subtle conversation that needs to happen around deniability, because most 619 00:55:24,760 --> 00:55:31,489 of the time, if you have people with a power imbalance, the non-repudiable 620 00:55:31,489 --> 00:55:36,660 conversation actually benefits the weaker person. So we actually don't want 621 00:55:36,660 --> 00:55:42,729 deniability in most of our chat applications or whatever, except it's 622 00:55:42,729 --> 00:55:47,390 still more subtle than that, because when you have people with equal power, maybe 623 00:55:47,390 --> 00:55:54,609 you do. It's kind of weird. Answer: Absolutely. And I guess the other 624 00:55:54,609 --> 00:55:58,759 part of that is: is that something that should be shown to users, and is that a 625 00:55:58,759 --> 00:56:03,259 concept? Is there a way that you express that notion in a way that users can 626 00:56:03,259 --> 00:56:07,910 understand it and make good choices? Or is it just something that your system makes a 627 00:56:07,910 --> 00:56:13,270 choice on for all of your users? Herald: We have one more question. 628 00:56:13,270 --> 00:56:17,229 Microphone number seven. Please line up if you have any more; we still have a couple 629 00:56:17,229 --> 00:56:19,559 more minutes. Microphone number seven, please. 630 00:56:19,559 --> 00:56:23,309 Question: Hi, thanks for the talk.
You talked about private information 631 00:56:23,309 --> 00:56:30,979 retrieval and how that would stop the server from knowing who retrieved the 632 00:56:30,979 --> 00:56:36,469 message. But for me, the question is, how do I find out in the first place which 633 00:56:36,469 --> 00:56:44,140 message is for me? Because if we, for example, always use message slot 14, then 634 00:56:44,140 --> 00:56:53,589 obviously over a conversation, it would again be possible to deanonymize the users, 635 00:56:53,589 --> 00:56:58,819 like: OK, they're always accessing this one in all those queries. 636 00:56:58,819 --> 00:57:06,749 Answer: Absolutely. So I didn't explain that part. The trick is that between the 637 00:57:06,749 --> 00:57:13,069 two people, we will share some secret, which is our conversation secret. And what 638 00:57:13,069 --> 00:57:16,569 we will use that conversation secret for is to seed a pseudo-random number 639 00:57:16,569 --> 00:57:20,900 generator. And so we will be able to generate the same stream of random 640 00:57:20,900 --> 00:57:27,519 numbers. And so each next message will go at the place determined by the next item 641 00:57:27,519 --> 00:57:32,550 from that random number generator. And so now the person writing can just write to 642 00:57:32,550 --> 00:57:36,119 random places, as far as the server can tell, and when it wants to write the next 643 00:57:36,119 --> 00:57:40,869 message in this conversation, it'll make sure to write at that next place 644 00:57:40,869 --> 00:57:46,600 from its random number generator for that conversation. There is a paper that 645 00:57:46,600 --> 00:57:50,130 describes a bunch more of that system. But that's the basic sketch. 646 00:57:50,130 --> 00:57:53,819 Question: Thank you. Herald: We have a question from the Internet. 647 00:57:53,819 --> 00:57:58,699 Question: It seems like identity is the weak point of the new breed of messaging 648 00:57:58,699 --> 00:58:02,979 apps.
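The slot-derivation trick described in that answer can be sketched in a few lines. This is only an illustrative sketch, not the protocol from the paper the speaker mentions: the use of HMAC-SHA256 as the pseudo-random function, and all names and parameters, are assumptions for the example.

```python
import hmac
import hashlib


def slot_sequence(conversation_secret: bytes, num_slots: int):
    """Yield a deterministic, pseudo-random sequence of mailbox slots
    derived from a shared conversation secret. Both parties run the
    same generator, so the sender knows where to write message i and
    the receiver knows which slot to fetch (via PIR), while the server
    only ever sees accesses to apparently random slots.

    Sketch only: HMAC-SHA256 as the PRF is an assumption, not the
    construction from the system described in the talk."""
    counter = 0
    while True:
        # PRF(secret, counter) -> next slot index
        digest = hmac.new(conversation_secret,
                          counter.to_bytes(8, "big"),
                          hashlib.sha256).digest()
        yield int.from_bytes(digest[:8], "big") % num_slots
        counter += 1


# Alice and Bob, sharing the same secret, derive identical slot streams,
# so they agree on where message 1, 2, 3, ... live without telling the server.
alice = slot_sequence(b"example shared conversation secret", 1 << 20)
bob = slot_sequence(b"example shared conversation secret", 1 << 20)
assert [next(alice) for _ in range(5)] == [next(bob) for _ in range(5)]
```

Because the slot for each message comes from a keyed PRF over a message counter, an observer without the conversation secret cannot link slot 14 in one round to any slot in the next, which is exactly the linkage the questioner was worried about.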
How do we solve this part of Zooko's triangle, the need for 649 00:58:02,979 --> 00:58:07,680 identifiers and to find people? Answer: Identity is hard, and I think 650 00:58:07,680 --> 00:58:18,279 identity has always been hard and will continue to be hard. Having a variety of 651 00:58:18,279 --> 00:58:23,420 ways to be identified, I think, remains important, and is why there isn't a single 652 00:58:23,420 --> 00:58:26,950 winner-takes-all system that we use for chat. Rather, you have a lot of 653 00:58:26,950 --> 00:58:30,720 different chat protocols that you use for the different social 654 00:58:30,720 --> 00:58:34,779 circles that you find yourself in. And part of that is our desire to not be 655 00:58:34,779 --> 00:58:38,920 confined to a single identity, but to be able to have different facets to our 656 00:58:38,920 --> 00:58:44,539 personalities. There are systems where you can identify yourself with a unique 657 00:58:44,539 --> 00:58:48,449 identifier to each person you talk to, rather than having a single identity 658 00:58:48,449 --> 00:58:53,890 within the system. That's something else that Pond would do: the 659 00:58:53,890 --> 00:58:57,989 identifier that you gave out to each separate friend was different. And so 660 00:58:57,989 --> 00:59:03,710 you would appear as a totally separate user to each of them. It turns out that's 661 00:59:03,710 --> 00:59:10,239 at the same time very difficult, because if I post an identifier publicly, suddenly 662 00:59:10,239 --> 00:59:14,780 that identifier is now linked to me for everyone who uses that identifier. So you 663 00:59:14,780 --> 00:59:18,309 have to give these out privately in a one-on-one setting, which limits your 664 00:59:18,309 --> 00:59:22,909 discoverability.
So that concept of how we deal with identities, I think, is 665 00:59:22,909 --> 00:59:26,679 inherently messy, and inherently something that there's not going to be 666 00:59:26,679 --> 00:59:31,859 a satisfying solution to. Herald: And that was the final question, 667 00:59:31,859 --> 00:59:35,339 concluding this talk. Please give a big round of applause for Will Scott. 668 00:59:35,339 --> 00:59:36,089 Will: Thank you. 669 00:59:36,090 --> 00:59:40,862 *Postroll music* 670 00:59:40,862 --> 01:00:04,000 subtitles created by c3subtitles.de in the year 2019. Join, and help us!