Herald: Vincenzo Iozzo is an entrepreneur and investor with a focus on cybersecurity. He has started up, gotten bought, and repeated this a few times, and now he is an advisor who advises people on starting up companies, getting bought, and repeating that. He is also a director at CrowdStrike and an associate at the MIT Media Lab. Just checking the time to make sure that we start on time, and this is it, we can start now. On the scale of cybersecurity: please give a warm welcome to Vincenzo.

*Applause*

Vincenzo Iozzo: So hi, everyone, thanks for being here. As Karen said, I have made a few changes to my career, but my background is originally technical, and what I wanted to do today is to talk about a trend that I think we sort of take for granted, one that is to some extent obvious but also underappreciated. And that is cloud scale in security. Specifically, when I say cloud scale, what I mean is the ability to process very large amounts of data as well as to spawn computing power with ease, and how that has played a role in our industry in the past decade or so.

But before I talk about that, I think some context is important. I joined the industry about 15 years ago, and back in the day even a place like the Congress was a much smaller place. It was to some extent cozier, and the community was tiny. The industry was fairly niche. And then something happened around 2010. People realized that there were more and more state-sponsored attacks being carried out, from Operation Aurora against Google to the Mandiant APT1 report, which was the first public report documenting how the Chinese PLA was hacking western, let's call it western-world, infrastructure for IP theft. And that changed a lot for the industry.

There have been two significant changes because of all of this attention. The first one is notoriety. We went from being, as I said, a relatively unknown industry to something that everyone talks about. If you open any newspaper, there's almost always an article on cybersecurity, boardrooms talk about cybersecurity... and in a sense, again, back when I joined, cybersecurity wasn't a thing. It used to be called infosec. And now very few people know what infosec even means. So notoriety is one thing, but notoriety is not the only thing that changed. The other thing that changed is the amount of money deployed in the sector.
So, back in 2004, depending on the estimate you trust, the total spending on cybersecurity was between three and a half and ten billion dollars. Today it's over 120 billion dollars. And so it kind of looks exponential. But the spending came with a very significant change in the type of players there are in the industry today. A lot of the traditional vendors that used to sell security software have kind of disappeared, and what you have today are largely two kinds of players. You have the big tech vendors, companies like Google, Amazon, Apple and so on and so forth, that have decided to take security more seriously. Some of them are trying to monetize security; others are trying to use it as a sort of slogan to sell more phones. The other group of entities are the large cloud-based security vendors. And what both groups have in common is that they're using more and more cloud scale and cloud resources to try to tackle security problems.

And so what I want to discuss today, from a somewhat technical perspective, is how scale has made a significant impact on the way we approach problems, but also on the kind of people that we have in the industry today. What I'm going to do is give you a few examples of the change that we've gone through. One of the important things to keep in mind is that what scale has done, at least in the past decade, is give defense a significant edge over offense. It's not necessarily here to stay, but I think it's an important trend that is somewhat overlooked.

So let me start with endpoint security. Back in the 80s, a few people started to toy with this idea of IDS systems. The idea behind an IDS system is pretty straightforward: you want to create a baseline of benign behavior for a machine, and then if that machine starts to exhibit anomalous behavior, you flag that as potentially malicious. This was the first paper published on host-based IDS systems. Now, the problem with host-based IDS systems is that they never actually quite made it as a commercial product. There were largely two reasons for this. The first one is that it was really hard to interpret results. It was really hard to figure out: "Hey, here's an anomaly, and this is why this anomaly might actually be a security incident."
The second problem was that you had a lot of false positives, and it was kind of hard to establish a benign baseline on a single machine, because you had a lot of variance in how an individual machine would behave. So what happened is that commercially we kind of got stuck with antivirus, antivirus vendors, and signatures for a very long time.

Now, fast forward to 2013. As I mentioned, the APT1 report came out, and AV companies actually admitted that they weren't that useful at detecting stuff like Stuxnet or Flame. And so there was kind of a new kid on the block, and the buzzword name for it was EDR: endpoint detection and response. But when you strip EDR of the marketing fluff, what EDR really is, is effectively a host-based intrusion detection system at scale. In other words, scale, the ability to operate at cloud scale, has made IDS systems possible in two ways. The first one is that because you now have this sort of data lake covering a large number of machines, you have much larger datasets to train and test detections on. What that means is that it's much easier to establish the benign baseline, and it's much easier to create proper detections, so that they detect not just malware but also malware-less attacks.
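To make the baselining point above concrete, here is a toy sketch in C of why fleet-wide data helps: an event that looks unique on a single machine can instead be judged by how rarely the entire fleet sees it. The event names, counts, fleet size and threshold are invented for illustration and are not any vendor's actual detection logic.

#include <stdio.h>

/* One row of fleet-wide telemetry: an observed event and how many of the
 * monitored machines have ever reported it. */
struct event_stat {
    const char *event;
    long machines_reporting;
};

int main(void)
{
    const long fleet_size = 100000;        /* hypothetical number of endpoints */
    const double rare_threshold = 0.0001;  /* "seen on fewer than 0.01% of machines" */

    /* Invented numbers, for illustration only. */
    struct event_stat stats[] = {
        { "explorer.exe -> chrome.exe",     87211 },
        { "winword.exe -> powershell.exe",      3 },  /* classic malware-less pattern */
        { "svchost.exe -> svchost.exe",     64012 },
    };
    int n = (int)(sizeof(stats) / sizeof(stats[0]));

    for (int i = 0; i < n; i++) {
        double prevalence = (double)stats[i].machines_reporting / (double)fleet_size;
        if (prevalence < rare_threshold)
            printf("ANOMALY:  %s (prevalence %.6f)\n", stats[i].event, prevalence);
        else
            printf("baseline: %s (prevalence %.6f)\n", stats[i].event, prevalence);
    }
    return 0;
}

On a single machine the rare pair above might be seen exactly once and be indistinguishable from normal noise; against a fleet-wide baseline its rarity is what makes it stand out.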
The other thing is that EDR vendors, and also companies that run internal EDR systems, have to a large extent economies of scale. What that means is you can actually have a team of analysts that can create explanations, a sort of ontology, to explain why a given detection may actually represent a security incident. On top of that, because you have those data lakes, you are now able to mine them to figure out new attack patterns that you weren't aware of in the past. So this in itself is a pretty significant achievement, because we finally managed to move away from signatures to something that works much better and is able to detect a broader range of attacks.

But the other thing that EDR systems solved, sort of as a side effect, is the data sharing problem. If you've been around the industry for a long time, there have been many attempts at sharing threat data across different entities, and they all kind of failed because it was really hard to establish a protocol to share that data. But implicitly, what EDR has done is to force people to share and collect threat intelligence data, and just in general data from endpoints. And so now you have the vendors acting as the implicitly trusted third party that can use that data to write detections that can be applied to all the systems, not just an individual company or an individual machine. And the implication of that is that the meme that the attacker only needs to get it right once while the defender needs to get it right all the time is actually not that true anymore. In the past you were in a situation where, if you had offensive infrastructure, whether it was servers or exploit chains, you could more often than not reuse it over and over again. Even if you had malware, all you had to do was slightly mutate the sample and you would pass any kind of detection. But today that is not true anymore in most cases. If you get detected on one machine, all of a sudden all of your offensive infrastructure has to be scrapped and you need to start from scratch. So this is the first example, and I think it in itself is quite significant.

The second example that I want to talk about is fuzzing. And fuzzing is interesting also for another reason, which is that it gives us a glimpse into what I think the future might look like. As you probably know if you've done any appsec work in the past, fuzzing has been a staple of the appsec arsenal for a very long time. But in the past five years or so, fuzzing has gone through a kind of renaissance, in the sense that two things have improved massively. The first one is that we finally managed to find a better way to assess the fitness function that we use to guide fuzzing. A few years ago, somebody called Michal Zalewski released a fuzzer called AFL, and one of the primary intuitions behind AFL was that instead of using plain code coverage to drive the fuzzer, you could use path coverage to drive the fuzzer, and that turned fuzzing into a much more effective instrument for finding bugs. But the second intuition, which I think is even more important and which changed fuzzing significantly, is the fact that as far as fuzzing is concerned, speed is more important than smarts, in a way. And what I mean by this is that AFL, as an example, is an extremely dumb fuzzer. It does stuff like byte flipping and bit flipping; it has very, very simple strategies for mutation.
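As a rough illustration of the loop just described, here is a deliberately tiny, self-contained C sketch of coverage-guided fuzzing in the AFL spirit: the mutations are dumb (random bit flips and byte overwrites), and the only intelligence is keeping any input that lights up new coverage. The target function, its hand-set coverage bitmap and all constants are toy stand-ins, not AFL's real instrumentation.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAP_SIZE   8     /* number of toy "edges" in the target */
#define INPUT_LEN  8
#define MAX_CORPUS 64

static unsigned char coverage[MAP_SIZE];  /* edges hit by the last run */
static unsigned char seen[MAP_SIZE];      /* edges ever hit */

/* Toy target: each nested comparison that passes "hits" a new edge. */
static void target(const unsigned char *in)
{
    coverage[0] = 1;
    if (in[0] == 'F') { coverage[1] = 1;
        if (in[1] == 'U') { coverage[2] = 1;
            if (in[2] == 'Z') { coverage[3] = 1;
                if (in[3] == 'Z') { coverage[4] = 1; /* pretend this is the bug */ }
            }
        }
    }
}

/* Record edges from the last run; return 1 if any of them were new. */
static int has_new_coverage(void)
{
    int new_bits = 0;
    for (int i = 0; i < MAP_SIZE; i++)
        if (coverage[i] && !seen[i]) { seen[i] = 1; new_bits = 1; }
    return new_bits;
}

int main(void)
{
    unsigned char corpus[MAX_CORPUS][INPUT_LEN];
    int corpus_len = 1;
    memset(corpus[0], 'A', INPUT_LEN);    /* one dumb seed, no format knowledge */
    srand((unsigned)time(NULL));

    for (long iter = 0; iter < 2000000; iter++) {
        unsigned char input[INPUT_LEN];

        /* dumb mutation: copy a saved input, then flip a bit or smash a byte */
        memcpy(input, corpus[rand() % corpus_len], INPUT_LEN);
        if (rand() % 2)
            input[rand() % INPUT_LEN] ^= (unsigned char)(1u << (rand() % 8));
        else
            input[rand() % INPUT_LEN] = (unsigned char)(rand() % 256);

        memset(coverage, 0, sizeof(coverage));
        target(input);

        /* the only "smarts": keep inputs that exercised something new */
        if (has_new_coverage() && corpus_len < MAX_CORPUS) {
            memcpy(corpus[corpus_len++], input, INPUT_LEN);
            printf("iter %ld: new coverage, corpus size now %d\n", iter, corpus_len);
            if (seen[4]) { printf("reached the deepest state\n"); break; }
        }
    }
    return 0;
}

Everything here is trivially parallelizable, which is exactly why throwing more cores at this loop pays off so directly.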
But what AFL does very well is that it's an extremely optimized piece of C code, and it scales very well. So you are in a situation where, if you have a reasonably good server where you can run AFL, you can synthesize very complex file formats in very few iterations. And what I find amazing is that this intuition doesn't apply just to file formats; it applies to much more complicated state machines.

The other example that I want to talk about as far as fuzzing goes is ClusterFuzz. ClusterFuzz is a fuzzing harness used by the Chrome team to find bugs in Chrome, and it has been around for about six years. In the span of six years, ClusterFuzz found sixteen thousand bugs in Chrome alone, plus another eleven thousand bugs in a bunch of open source projects. If you compare ClusterFuzz with the second most successful fuzzer out there for JavaScript engines, you'll find that that second fuzzer, called jsfunfuzz, found about six thousand bugs in the span of eight to nine years. And if you look at the code, the main difference between the two is not the mutation engine; the mutation engines are actually pretty similar, and ClusterFuzz doesn't do anything particularly fancy. What ClusterFuzz does very well is that it scales massively: ClusterFuzz today runs on about twenty-five thousand cores. And so with fuzzing we're now at a stage where the bug churn is so high that defense again has an advantage over offense, because it becomes much quicker to fix bugs than it is to fix exploit chains, which would have been unthinkable just a few years ago.

The last example that I want to bring up is a slightly different one. A few months ago, the TAG team at Google found in the wild a server that was used for a watering hole attack, and it was thought that the server was used against Chinese Muslim dissidents. What's interesting is that the way you would detect this kind of attack in the past was that you would have a compromised device and you would sort of work backwards from there; you would try to figure out how the device got compromised. But the way they found the server was effectively to mine their local copy of the Internet. So, again, this is another example of scale that gives defense a significant advantage over offense.
So, in all of these examples that I brought up, I think when you look deeper into them, you realise that it's not that the state of security has improved because we've necessarily gotten better at security. It has improved because we got better at handling large amounts of data, storing large amounts of data, and spawning computing power and resources quickly when needed. And if that is true, the other thing to realise is that in many of these cases, when you look back at the examples that I brought up, the problem at scale looks very different from the problem at a much smaller scale, and the solution as a result is very different.

So I'm going to use a silly example to try to drive the point home. Let's say that your job is to audit this function, so you need to find bugs in this function. In case you're not familiar with C code, the problem here is that you can overflow or underflow that buffer at your pleasure just by passing an arbitrary value for "pos".
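The slide itself is not reproduced in this transcript; a minimal, hypothetical C function of the kind being described might look like the following, where "pos" is used as an index without any bounds check (the name set_byte, the buffer size and the signature are illustrative assumptions):

#include <stdio.h>

#define BUF_SIZE 64

static char buf[BUF_SIZE];

/* The flaw: "pos" is used as an index without any bounds check, so a
 * negative value writes before the buffer and a large value writes past it. */
void set_byte(int pos, char value)
{
    buf[pos] = value;
}

int main(void)
{
    set_byte(3, 'A');          /* fine */
    /* set_byte(100000, 'A');     out of bounds: memory corruption */
    /* set_byte(-5, 'A');         out of bounds in the other direction */
    printf("%c\n", buf[3]);
    return 0;
}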
Now, if your job was to audit this function, you would have many tools you could use. You could do manual code auditing. You could use a symbolic execution engine. You could use a fuzzer. You could use static analysis. And a lot of the solutions that are optimal for this case end up being completely useless if your task now becomes to audit this other function, because the state machine that this function implements is so complex that a lot of those tools simply don't scale to it. For a lot of the problems I've talked about, we face the same situation: the solution at scale to a problem of scale looks very different. And so one realization is that engineering skills today are actually more important than security skills in many ways. When you think back to fuzzers like ClusterFuzz or AFL, or again to EDR tools, what matters there is not really any kind of security expertise. What matters is the ability to design backend systems that scale arbitrarily well and to write code that is very performant, and none of this has much to do with traditional security skills. The other thing you realize, when you combine these two observations, is that a lot of what we consider research is happening in a different world, to some extent.

So, about six years ago, I gave a talk at an academic conference called CCS, and my message there was that if academia wanted to do research that was relevant to the industry, they had to talk to the industry more. And I think we have now reached the point where this is true for the industry itself, in the sense that if we want to keep producing significant research at places like CCC, we are kind of in a bad spot, because a lot of the innovation that is practical in the real world is happening in very large environments that few of us have access to. I'm going to talk a bit more about this in a second. But before I do, there is a question that I think is important to digress on a bit, and this is the question of: have we changed significantly as an industry, are we in a sort of new age of the industry? I think that if you were to split the industry into phases, we have left the artisanal phase, the phase where what mattered the most was security knowledge. We're now in a phase where we have these large-scale expert systems that require significantly more engineering skill than they require security skill, but that still take input from security practitioners. And there is a question of: is this it? Is this where the industry is going to stay, or is there more to come?

I know better than to make predictions in security, because most of the time they tend to be wrong, but I want to draw a parallel, and that parallel is with another industry: machine learning. Somebody called Rich Sutton, who is one of the godfathers of machine learning, wrote an essay called "The Bitter Lesson". In that essay, he reflects on many decades of machine learning work, and what he says is that people tried for a very long time to embed knowledge in machine learning systems. The rationale was that if you could embed knowledge, you could build smarter systems. But it turns out that what actually worked were things that scale arbitrarily well with more computational power and more storage capabilities. And so, what he realized was that what actually worked for machine learning was search and learning. When you look at something like AlphaGo today, AlphaGo works not really because it has a lot of Go knowledge. It works because it has a lot of computing power; it has the ability to train itself faster and faster.
And so there is a question of how much of this can potentially port to security. Obviously, security is a bit different, it's more adversarial in nature, so it's not quite the same thing. But I think we have only scratched the surface of what can be done as far as reaching a new level of automation where security knowledge will matter less and less. I want to go back to the AFL example that I brought up earlier, because one way to think about AFL is as a reinforcement learning fuzzer. What I mean by this is that, as shown on this slide, what AFL was capable of doing was to take one single JPEG file and, in the span of about twelve hundred iterations of completely random, dumb mutations, get to another well-formed JPEG file. When you think about it, this is an amazing achievement, because there was no knowledge of the file format in AFL. So we are now more and more building systems that do not require any kind of expert knowledge as far as security is concerned.

The other example that I want to talk about is the Cyber Grand Challenge. A few years ago DARPA started this competition called the Cyber Grand Challenge, and the idea behind it was to try to answer the question: can you automatically do exploit generation, and can you automatically do patch generation? Obviously they did it in somewhat toy environments. But if you talk today to anybody who does automatic exploit generation research, they'll tell you that we are probably five years away from being able to automatically synthesize non-trivial exploits, which is an amazing achievement, because if you had asked anybody five years ago, most people, myself included, would have told you that that time would not come anytime soon.

The third example that I want to bring up is something called Amazon Macie, which is a new sort of service released by Amazon. What it does is basically use machine learning to try to automatically identify personally identifiable information (PII) and intellectual property in the data you store in AWS, and then try to give you a better sense of what happens to that data.
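For illustration only, here is a deliberately crude, non-ML stand-in in C for the idea behind such a service: scan stored text for things that look like PII, in this case 16-digit runs that pass the Luhn check commonly used as a credit-card heuristic. This is not related to Macie's actual implementation, and the sample data is made up.

#include <ctype.h>
#include <stdio.h>

/* Luhn checksum: returns 1 if the digit string passes. */
static int luhn_ok(const char *digits, int len)
{
    int sum = 0, alt = 0;
    for (int i = len - 1; i >= 0; i--) {
        int d = digits[i] - '0';
        if (alt) { d *= 2; if (d > 9) d -= 9; }
        sum += d;
        alt = !alt;
    }
    return sum % 10 == 0;
}

int main(void)
{
    /* Stand-in for the contents of a stored object. */
    const char *object = "invoice for card 4111111111111111 due 2020-01-31";
    char run[32];
    int run_len = 0;

    /* Collect runs of digits and flag 16-digit runs that pass the Luhn check. */
    for (const char *p = object; ; p++) {
        if (*p && isdigit((unsigned char)*p)) {
            if (run_len < (int)sizeof(run) - 1) run[run_len++] = *p;
        } else {
            if (run_len == 16 && luhn_ok(run, run_len)) {
                run[run_len] = '\0';
                printf("possible PII (card-like number): %s\n", run);
            }
            run_len = 0;
            if (!*p) break;
        }
    }
    return 0;
}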
So in all of these cases, when you think about them, again it's a scenario where there is very little security expertise needed; what matters more is engineering skill. Everything I've said so far makes a reasonably positive case for scale. But I think there is another side of scale that is worth touching on, and one that especially this audience should think about. The other side of scale is that scale breeds centralization. To the point I was making earlier about where research is happening, where real-world-applicable research is happening: it happens increasingly in places like Amazon or Google, or at large security vendors, or at some intelligence agencies. And what that means is that the barriers to entry to the field are significantly higher.

I said earlier that I joined the industry about 15 years ago. Back then, I was still in high school. And one of the things that was cool about the industry for me was that as long as you had a reasonably decent internet connection and a laptop, you could contribute at the top of the industry. You could see what everyone was up to. You could do research that was relevant to what the industry was working on. But today, the same sort of 15- or 16-year-old kid in high school would have a much harder time contributing to the industry. Because scale breeds centralization, we are in a situation where we will likely raise the barrier to entry to a point where, if you want to contribute meaningfully to security, you will have to go through a very standardized path where you probably do computer science and then go work for a big tech company. And that's not necessarily a positive.

So I think the same Kranzberg principle applies to scale, in the sense that it has done a lot of positive things for the sector, but it also comes with some consequences. And if there is one takeaway from this talk that I would like you to have, it is to think about how much something pretty mundane, something we take for granted in our day to day, has changed the industry, and how much it will probably contribute to the next phase of the industry. And not just from a technical standpoint, in the sense that the solutions we use today are very different from what we used to use, but also in terms of the kind of people that are part of the industry and the community. And that's all I had. Thank you for listening.

*Applause*

Herald: Thank you very much. We have time for questions. So if you have any questions for Vincenzo, please line up behind the microphones that are marked with numbers, and I will give you a signal when you can ask a question.
We also have our wonderful signal angels, who have been keeping an eye on the Internet to see if there are any questions from either Twitter, Mastodon or IRC. Are there any questions from the Internet? We'll just have to wait for microphone number nine to be turned on, and then we'll have a question from the Internet for Vincenzo. And please don't be shy: line up behind the microphones, ask any questions.

Signal Angel: Now it's on. But actually there are no questions from the Internet right now.

Herald: There must be people in the room that have some questions. I cannot see anybody lining up. Do you have any advice for people that want to work on security at scale?

Vincenzo: I mean, as I just said, I think a lot of the interesting research is happening more and more at big tech companies and similar places. And so, as much as it pains me, the advice is probably to think about whether you can find other ways to get access to large amounts of data and computational power, or maybe to consider getting into one of those places.

Herald: And we now actually have questions at microphone number one.

Microphone 1: Can you hear me? Yeah. Thank you for the great talk. You're making a very strong case that information at scale has benefited security, but is there also statistical evidence for that?

Vincenzo: So, it's a bit hard to answer that question, because a lot of the people that have an incentive to answer it are also kind of biased. But I think when you look at metrics like dwell time, in terms of how much time attackers spend on a compromised machine, that has decreased significantly; it has statistically decreased significantly. As far as the other examples I brought up, like fuzzing and similar, as far as I'm aware there hasn't been any sort of rigorous study showing that we have now reached a place where defense has an edge over offense. But if I talk to anybody who has some offensive security knowledge, or who did work in offense, the overall feedback that I hear is that it's becoming much harder to keep bug chains alive for a long time. And this is in large part not really because of countermeasures; it's in large part because bugs keep churning. So there isn't a lot of statistical evidence, but from what I can gather, it seems to be the case.

Herald: We have one more question from microphone number one.
Microphone 1: So thank you for the interesting talk. My question goes in the direction of the centralization that you mentioned, that the large players, the hyperscalers, are converging to be the hotspots for security research. Is there any guidance you can give for us as a community on how to retain access to the field and contribute?

Vincenzo: Yes. So I think it's an interesting situation, because more and more there are open source tools that allow you to gather the data. But the problem with these data gathering exercises is not so much how to gather the data; the problem is what to gather and how to keep it. Because when you look at the cloud bill, for most players it's extraordinarily high. And unfortunately, I don't have an easy solution to that. I mean, you can use pretty cheap cloud providers, but the expenditure is still an order of magnitude higher than it used to be. And I don't know, maybe academia can step up. I'm not sure.

Herald: We have one last question from the Internet. And you can stay at the microphone if you have another question for Vincenzo.

Signal Angel: Yes, the Internet asks: you talked a lot about fuzzing at scale; besides OSS-Fuzz, are you aware of any other large-scale fuzzing infrastructure?

Vincenzo: That is publicly available? No. But when you look, for instance, at the participants of the Cyber Grand Challenge, a lot of them were effectively using a significant amount of CPU power for fuzzing. So I'm not aware of any kind of plug-and-play fuzzing infrastructure you can use aside from OSS-Fuzz. But as far as I'm aware, everyone out there that does fuzzing for a living now has access to significant resources and tries to scale their fuzzing infrastructure.

Herald: If we don't have any more questions, this is your last chance to run to a microphone or write a question on the Internet. Then I think we should give a big round of applause to Vincenzo.

Vincenzo: Thank you.

*Applause*

subtitles created by c3subtitles.de in the year 2019. Join, and help us!