0 00:00:00,000 --> 00:00:30,000 Dear viewer, these subtitles were generated by a machine via the service Trint and therefore are (very) buggy. If you are capable, please help us to create good quality subtitles: https://c3subtitles.de/talk/942 Thanks! 1 00:00:16,050 --> 00:00:18,179 So C and C++ 2 00:00:18,180 --> 00:00:20,189 programmers kind of use typecasting like 3 00:00:20,190 --> 00:00:22,139 we just go for a beer at a bar, 4 00:00:23,670 --> 00:00:26,069 and they don't really seem to enforce, 5 00:00:26,070 --> 00:00:28,379 you know, the types very, very, 6 00:00:28,380 --> 00:00:30,209 you know, carefully; at least the compilers 7 00:00:30,210 --> 00:00:32,279 don't. And memory access is also a 8 00:00:32,280 --> 00:00:33,280 bit of a problem. 9 00:00:34,110 --> 00:00:36,269 This leads to all sorts of issues 10 00:00:36,270 --> 00:00:38,879 like this, specifically 11 00:00:38,880 --> 00:00:41,969 type confusion and other vulnerabilities. 12 00:00:41,970 --> 00:00:44,099 Mathias Payer has come 13 00:00:44,100 --> 00:00:46,229 to us all the way from Purdue University, 14 00:00:46,230 --> 00:00:48,389 and other places, but just recently from 15 00:00:48,390 --> 00:00:50,579 Purdue University, to 16 00:00:50,580 --> 00:00:52,619 tell us all about a compiler-based 17 00:00:52,620 --> 00:00:55,049 extension that he's been working on that 18 00:00:55,050 --> 00:00:56,669 should be able to detect and avert some 19 00:00:56,670 --> 00:00:58,859 of these bugs. And with that, it's 20 00:00:58,860 --> 00:00:59,909 over to Mathias. 21 00:01:01,000 --> 00:01:02,020 Thanks for the introduction. 
22 00:01:10,020 --> 00:01:11,939 And different forms of type safety 23 00:01:11,940 --> 00:01:14,279 vulnerabilities that can be exploited 24 00:01:14,280 --> 00:01:16,679 by attackers 25 00:01:16,680 --> 00:01:18,779 to gain some form of code execution. 26 00:01:18,780 --> 00:01:21,479 And we've worked for quite a bit on 27 00:01:21,480 --> 00:01:23,909 looking at the type hierarchy 28 00:01:23,910 --> 00:01:25,469 of C++ and different forms of 29 00:01:25,470 --> 00:01:26,939 vulnerabilities that can arise and can 30 00:01:26,940 --> 00:01:29,129 then be exploited in different forms. 31 00:01:29,130 --> 00:01:31,589 And 32 00:01:31,590 --> 00:01:32,969 just a quick show of hands: 33 00:01:32,970 --> 00:01:34,920 Who here is a C++ programmer? 34 00:01:36,100 --> 00:01:38,169 Wow, pretty much every one of you. Who has 35 00:01:38,170 --> 00:01:39,999 written more than, let's say, 10,000 lines 36 00:01:40,000 --> 00:01:41,000 of C++ code? 37 00:01:42,510 --> 00:01:44,579 A lot of you. OK, so that's a good 38 00:01:44,580 --> 00:01:46,649 setting. And as C++ programmers, 39 00:01:46,650 --> 00:01:48,029 you are likely used to the different 40 00:01:48,030 --> 00:01:49,589 forms of typecasting that you have. 41 00:01:49,590 --> 00:01:51,209 You have static casts, you have dynamic casts, 42 00:01:51,210 --> 00:01:52,949 and a bunch of other different casts. 43 00:01:52,950 --> 00:01:55,169 And amazingly, C++ has no 44 00:01:55,170 --> 00:01:56,969 form of type safety whatsoever. 45 00:01:56,970 --> 00:01:59,369 So this talk could also be 46 00:01:59,370 --> 00:02:01,559 described as cloudy with a chance 47 00:02:01,560 --> 00:02:03,719 of calculators, as we'll see 48 00:02:03,720 --> 00:02:04,679 in a bit. 
49 00:02:04,680 --> 00:02:06,149 There's a lot of opportunity for an 50 00:02:06,150 --> 00:02:08,309 attacker to exploit these 51 00:02:08,310 --> 00:02:10,379 different forms 52 00:02:10,380 --> 00:02:12,509 of type confusion, and 53 00:02:12,510 --> 00:02:14,699 type confusion leads to remote code 54 00:02:14,700 --> 00:02:17,249 execution, where an adversary 55 00:02:17,250 --> 00:02:18,719 can abuse the different forms of type 56 00:02:18,720 --> 00:02:20,759 confusion and the different settings in a 57 00:02:20,760 --> 00:02:22,709 type hierarchy to pretty much execute 58 00:02:22,710 --> 00:02:24,829 arbitrary code on your system. 59 00:02:24,830 --> 00:02:27,029 And especially 60 00:02:27,030 --> 00:02:29,699 browsers and hypervisors 61 00:02:29,700 --> 00:02:32,459 and kernels are great targets to 62 00:02:32,460 --> 00:02:34,119 find different forms of type confusion 63 00:02:34,120 --> 00:02:35,999 and then exploit them, as was recently 64 00:02:36,000 --> 00:02:38,729 shown at the 65 00:02:38,730 --> 00:02:40,169 Pwn2Own competition. 66 00:02:40,170 --> 00:02:43,229 And we've seen a couple of these 67 00:02:43,230 --> 00:02:45,719 type confusions that are used to 68 00:02:45,720 --> 00:02:48,209 spawn calculators all over the place, 69 00:02:48,210 --> 00:02:49,109 which are fun. 70 00:02:49,110 --> 00:02:50,579 So a calculator pretty much shows you that 71 00:02:50,580 --> 00:02:52,649 you get arbitrary code execution if you 72 00:02:52,650 --> 00:02:53,879 can spawn that. 73 00:02:53,880 --> 00:02:56,039 And if you think about 74 00:02:56,040 --> 00:02:58,499 it, the attack surface 75 00:02:58,500 --> 00:03:00,629 that we face on our systems is pretty 76 00:03:00,630 --> 00:03:01,529 much huge. 
77 00:03:01,530 --> 00:03:03,899 We are no longer working with 78 00:03:03,900 --> 00:03:05,759 systems of a couple of thousand lines 79 00:03:05,760 --> 00:03:07,859 of code, but with millions and millions 80 00:03:07,860 --> 00:03:09,899 of lines of code. The abstraction is 81 00:03:09,900 --> 00:03:12,059 immense. And if you look at Google 82 00:03:12,060 --> 00:03:14,219 Chrome, for example, we have 83 00:03:14,220 --> 00:03:15,689 more than one hundred million lines of 84 00:03:15,690 --> 00:03:17,999 code, which is an immense 85 00:03:18,000 --> 00:03:20,189 code base that we are facing. 86 00:03:20,190 --> 00:03:22,349 And it's very hard to protect against 87 00:03:22,350 --> 00:03:24,119 vulnerabilities in this large code base. 88 00:03:24,120 --> 00:03:26,279 And even though the 89 00:03:26,280 --> 00:03:28,529 folks at Google are doing an awesome job 90 00:03:28,530 --> 00:03:31,469 at code reviews, figuring out 91 00:03:31,470 --> 00:03:33,209 test cases, checking all the 92 00:03:33,210 --> 00:03:34,769 different conditions, there is still a 93 00:03:34,770 --> 00:03:36,629 large opportunity for different forms of 94 00:03:36,630 --> 00:03:38,759 type confusion in this large 95 00:03:38,760 --> 00:03:40,469 source base. And it's not just Google 96 00:03:40,470 --> 00:03:42,809 Chrome, the 76 million lines of code, 97 00:03:42,810 --> 00:03:45,179 but also a bunch of other systems. 98 00:03:45,180 --> 00:03:47,939 On top of that, there's your 99 00:03:47,940 --> 00:03:50,249 window managers, there's 100 00:03:50,250 --> 00:03:52,679 your standard library, 101 00:03:52,680 --> 00:03:54,029 there's the Linux kernel, there's a 102 00:03:54,030 --> 00:03:55,259 hypervisor and so on. 103 00:03:55,260 --> 00:03:57,359 And this easily clocks in at more than 104 00:03:57,360 --> 00:03:59,099 one hundred million lines of code. 
105 00:03:59,100 --> 00:04:01,259 And 106 00:04:01,260 --> 00:04:03,359 there are a lot of opportunities for 107 00:04:03,360 --> 00:04:05,639 different forms of type confusion. 108 00:04:05,640 --> 00:04:07,739 We will explore 109 00:04:07,740 --> 00:04:09,149 these different opportunities for type 110 00:04:09,150 --> 00:04:11,429 confusion. We will see how we can find 111 00:04:11,430 --> 00:04:13,529 type confusion vulnerabilities, how 112 00:04:13,530 --> 00:04:15,149 we can automate the search for type 113 00:04:15,150 --> 00:04:17,039 confusion vulnerabilities, and we will also 114 00:04:17,040 --> 00:04:19,559 discuss what kind of 115 00:04:19,560 --> 00:04:21,749 capabilities an adversary can get through 116 00:04:21,750 --> 00:04:23,409 different forms of type confusion. 117 00:04:23,410 --> 00:04:24,569 So how can you exploit it? 118 00:04:24,570 --> 00:04:26,559 What is the underlying attack vector? 119 00:04:26,560 --> 00:04:28,379 What are the attack primitives? 120 00:04:28,380 --> 00:04:30,299 How can you build attack primitives? 121 00:04:30,300 --> 00:04:32,459 And then in the end, how you can 122 00:04:32,460 --> 00:04:34,829 automate it to figure out 123 00:04:34,830 --> 00:04:35,830 exploitability. 124 00:04:36,900 --> 00:04:38,879 Now the attacker model is as follows. 125 00:04:38,880 --> 00:04:41,009 We start off with an 126 00:04:41,010 --> 00:04:42,779 external user, for example, without any 127 00:04:42,780 --> 00:04:45,299 form of code execution capabilities. 128 00:04:45,300 --> 00:04:47,489 You have to imagine it like this: on 129 00:04:47,490 --> 00:04:50,369 one end, you have a program that is 130 00:04:50,370 --> 00:04:52,109 answering different forms of requests. 131 00:04:52,110 --> 00:04:53,069 You're sending in a request. 132 00:04:53,070 --> 00:04:54,329 You're getting a reply. 
133 00:04:54,330 --> 00:04:56,639 This defines some form of 134 00:04:56,640 --> 00:04:58,739 computational capabilities that you 135 00:04:58,740 --> 00:05:00,569 get on the other end. 136 00:05:00,570 --> 00:05:02,699 Right. So you can send in a request, you 137 00:05:02,700 --> 00:05:04,919 get a reply, and it 138 00:05:04,920 --> 00:05:07,109 conforms to some form of 139 00:05:07,110 --> 00:05:08,789 computation that you are allowed to 140 00:05:08,790 --> 00:05:10,889 execute. You are 141 00:05:10,890 --> 00:05:12,599 severely limited in what kind of 142 00:05:12,600 --> 00:05:13,979 computation you can execute. 143 00:05:13,980 --> 00:05:15,599 So, for example, if you're interacting 144 00:05:15,600 --> 00:05:16,829 with a web server, you're sending in a 145 00:05:16,830 --> 00:05:19,079 request and you're supposed to get an 146 00:05:19,080 --> 00:05:21,149 HTML file as a response that you can 147 00:05:21,150 --> 00:05:23,369 then render and look 148 00:05:23,370 --> 00:05:25,859 at. And an adversary tries to craft 149 00:05:25,860 --> 00:05:27,719 a request that is being sent in so that 150 00:05:27,720 --> 00:05:29,909 instead of an HTML 151 00:05:29,910 --> 00:05:32,159 document, you get a shell in 152 00:05:32,160 --> 00:05:34,709 return. And over several steps, 153 00:05:34,710 --> 00:05:36,779 you are extending your capabilities 154 00:05:36,780 --> 00:05:38,459 from an external user that is issuing 155 00:05:38,460 --> 00:05:40,829 requests to retrieve 156 00:05:40,830 --> 00:05:42,929 documents to a local user and 157 00:05:42,930 --> 00:05:45,599 then to an administrator account. 158 00:05:45,600 --> 00:05:46,600 And these 
159 00:05:47,810 --> 00:05:50,269 steps are being followed to extend 160 00:05:50,270 --> 00:05:52,429 the capabilities of an adversary step by 161 00:05:52,430 --> 00:05:54,729 step, and 162 00:05:54,730 --> 00:05:57,019 the nice thing is that an 163 00:05:57,020 --> 00:05:59,599 external adversary can easily trigger 164 00:05:59,600 --> 00:06:01,699 these attacks 165 00:06:01,700 --> 00:06:03,859 through very simple means, by 166 00:06:03,860 --> 00:06:06,239 a simple request 167 00:06:06,240 --> 00:06:08,329 that is being sent to current 168 00:06:08,330 --> 00:06:09,589 software. 169 00:06:09,590 --> 00:06:11,749 We're mostly focusing on control-flow 170 00:06:11,750 --> 00:06:14,509 hijack attacks. And as the 171 00:06:14,510 --> 00:06:16,609 software security community, over 172 00:06:16,610 --> 00:06:18,679 the last 20-plus years, we 173 00:06:18,680 --> 00:06:20,059 have worked on a large number of 174 00:06:20,060 --> 00:06:21,709 mitigations, different forms of 175 00:06:21,710 --> 00:06:23,779 mitigations that try to detect 176 00:06:23,780 --> 00:06:25,909 an exploit condition or some form of 177 00:06:25,910 --> 00:06:27,889 vulnerability that is being used. 178 00:06:27,890 --> 00:06:30,079 So given the attack 179 00:06:30,080 --> 00:06:32,059 surface that I've shown before, it is 180 00:06:32,060 --> 00:06:34,429 highly likely that the large 181 00:06:34,430 --> 00:06:36,469 amount of code that we have will contain 182 00:06:36,470 --> 00:06:37,769 vulnerabilities. 183 00:06:37,770 --> 00:06:40,249 So we are working on mechanisms 184 00:06:40,250 --> 00:06:41,959 that protect the integrity and 185 00:06:41,960 --> 00:06:43,669 availability of our systems, even in the 186 00:06:43,670 --> 00:06:45,319 presence of vulnerabilities. 187 00:06:45,320 --> 00:06:46,399 So as a very first step, 
188 00:06:46,400 --> 00:06:48,379 we have to accept the fact that there 189 00:06:48,380 --> 00:06:50,149 will be bugs in our code. 190 00:06:50,150 --> 00:06:51,769 So all the defenses that we have, all the 191 00:06:51,770 --> 00:06:53,959 mitigations that we have, focus on 192 00:06:53,960 --> 00:06:55,819 detecting this exploit condition and 193 00:06:55,820 --> 00:06:57,919 stopping an adversary from actually 194 00:06:57,920 --> 00:06:59,599 running a full exploit. 195 00:06:59,600 --> 00:07:01,849 And over the last 20-plus years, 196 00:07:01,850 --> 00:07:03,319 we've developed a set of different 197 00:07:03,320 --> 00:07:05,149 mitigations and defenses that make it 198 00:07:05,150 --> 00:07:07,699 harder and harder for adversaries 199 00:07:07,700 --> 00:07:10,219 to gain full code execution capabilities. 200 00:07:10,220 --> 00:07:12,349 So if you want to hijack the 201 00:07:12,350 --> 00:07:14,329 control flow on a current system, you need 202 00:07:14,330 --> 00:07:15,829 to jump through a set of hoops to 203 00:07:15,830 --> 00:07:17,959 actually get code execution, which 204 00:07:17,960 --> 00:07:19,579 allows you to spawn a calculator or 205 00:07:19,580 --> 00:07:22,519 execute arbitrary other commands. 206 00:07:22,520 --> 00:07:24,589 So for a control-flow hijack attack, 207 00:07:24,590 --> 00:07:26,779 what an adversary does is 208 00:07:26,780 --> 00:07:28,999 influence the address 209 00:07:29,000 --> 00:07:31,429 space of an application, 210 00:07:31,430 --> 00:07:33,739 of a process, to readjust 211 00:07:33,740 --> 00:07:36,639 the different forms 212 00:07:36,640 --> 00:07:38,569 of code pointers, pointers and the data 213 00:07:38,570 --> 00:07:39,739 of the application, 214 00:07:39,740 --> 00:07:42,379 so that it executes something different. 
215 00:07:42,380 --> 00:07:44,509 So imagine, originally the code is 216 00:07:44,510 --> 00:07:46,639 just the web server, but instead 217 00:07:46,640 --> 00:07:49,099 of serving a web document, you want it 218 00:07:49,100 --> 00:07:50,479 to open a shell for you. 219 00:07:50,480 --> 00:07:51,979 That gives you full computational 220 00:07:51,980 --> 00:07:53,719 capabilities on that system so that you 221 00:07:53,720 --> 00:07:55,849 can interact with the system and then 222 00:07:55,850 --> 00:07:57,709 further escalate your privileges to an 223 00:07:57,710 --> 00:07:59,209 administrator. 224 00:07:59,210 --> 00:08:01,039 Now, with all the different mitigations 225 00:08:01,040 --> 00:08:02,659 that we have in our systems, we've 226 00:08:02,660 --> 00:08:04,429 severely restricted the set of 227 00:08:04,430 --> 00:08:06,619 capabilities that an adversary can 228 00:08:06,620 --> 00:08:08,659 have, even if there are vulnerabilities, 229 00:08:08,660 --> 00:08:10,099 memory safety vulnerabilities or type 230 00:08:10,100 --> 00:08:12,259 safety vulnerabilities, in the code. 231 00:08:12,260 --> 00:08:14,719 And the slide here shows 232 00:08:14,720 --> 00:08:16,879 the address space of the program in an 233 00:08:16,880 --> 00:08:17,779 abstract form. 234 00:08:17,780 --> 00:08:19,969 So we see that the code section 235 00:08:19,970 --> 00:08:22,129 is read-only and 236 00:08:22,130 --> 00:08:23,329 executable. 237 00:08:23,330 --> 00:08:25,459 So an 238 00:08:25,460 --> 00:08:27,589 adversary... and this is the only 239 00:08:27,590 --> 00:08:29,569 section that is readable and executable, 240 00:08:29,570 --> 00:08:31,459 so an adversary can no longer inject new 241 00:08:31,460 --> 00:08:33,319 code. And this is one of the defenses we 242 00:08:33,320 --> 00:08:34,459 came up with. 
243 00:08:34,460 --> 00:08:36,649 In addition to that, there's 244 00:08:36,650 --> 00:08:38,869 a heap and a stack, which are readable 245 00:08:38,870 --> 00:08:40,729 and writable, but not executable. 246 00:08:40,730 --> 00:08:42,288 So the only way an adversary can 247 00:08:42,289 --> 00:08:44,749 influence the program is by modifying 248 00:08:44,750 --> 00:08:46,879 the data and then reusing the existing 249 00:08:46,880 --> 00:08:47,880 code. 250 00:08:48,260 --> 00:08:50,359 So what we have here is we 251 00:08:50,360 --> 00:08:53,119 have a large number of code pointers 252 00:08:53,120 --> 00:08:54,859 on the heap and on the stack, which then 253 00:08:54,860 --> 00:08:57,619 point to the code. To get 254 00:08:57,620 --> 00:08:59,839 code execution or to hijack 255 00:08:59,840 --> 00:09:01,909 the control flow, an adversary can simply 256 00:09:01,910 --> 00:09:04,369 overwrite these code pointers and redirect 257 00:09:04,370 --> 00:09:05,989 them to some alternate location. 258 00:09:05,990 --> 00:09:08,119 And this is precisely what is being done 259 00:09:08,120 --> 00:09:09,949 for a control-flow hijack attack. 260 00:09:09,950 --> 00:09:11,809 And this can either be for direct code 261 00:09:11,810 --> 00:09:13,879 pointers that point from the heap or the 262 00:09:13,880 --> 00:09:15,979 stack into the code itself, or 263 00:09:15,980 --> 00:09:18,259 we can also go through the vtables, which 264 00:09:18,260 --> 00:09:20,539 is a way for C++ 265 00:09:20,540 --> 00:09:22,699 to handle 266 00:09:22,700 --> 00:09:25,129 inheritance and virtual functions. 267 00:09:25,130 --> 00:09:26,809 So if you have a virtual function, you 268 00:09:26,810 --> 00:09:28,969 are allowing it to 269 00:09:28,970 --> 00:09:30,739 be overridden depending on the class 270 00:09:30,740 --> 00:09:32,509 hierarchy. 
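As a minimal sketch of the virtual-function mechanism just described: the class names `Shape` and `Circle` and the helper `describe` are hypothetical, chosen only for illustration and not taken from the talk's slides. The call through the base reference is resolved at runtime by following the object's code pointer to the class-specific implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes for illustration only; they do not appear in the talk.
struct Shape {
    virtual ~Shape() = default;
    // Virtual: subclasses may override this implementation.
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// The call below is dispatched through the object's vtable, so the
// implementation is selected by the runtime type of the object, not by
// the static type of the reference.
std::string describe(const Shape& s) { return s.name(); }
```

Calling `describe` with a `Circle` yields the overriding implementation even though the parameter is declared as `Shape`; that indirection through a stored code pointer is what makes such pointers interesting to an attacker.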
And if you are dispatching 271 00:09:32,510 --> 00:09:33,739 based on a virtual function, you're 272 00:09:33,740 --> 00:09:35,959 following a code 273 00:09:35,960 --> 00:09:38,179 pointer to a specific class-based 274 00:09:38,180 --> 00:09:39,409 implementation. 275 00:09:41,280 --> 00:09:43,169 And we can influence all these different 276 00:09:43,170 --> 00:09:45,779 pointers as an attacker and then redirect 277 00:09:45,780 --> 00:09:47,339 and stitch together the existing code 278 00:09:47,340 --> 00:09:48,450 parts in alternate ways. 279 00:09:52,050 --> 00:09:54,479 As the show of hands showed before, 280 00:09:54,480 --> 00:09:56,879 all of you are already C++ 281 00:09:56,880 --> 00:09:59,099 programmers, but I'll still quickly go 282 00:09:59,100 --> 00:10:00,509 through the different forms of casting 283 00:10:00,510 --> 00:10:02,759 behavior. And even 284 00:10:02,760 --> 00:10:04,709 if you've been using C++ for a 285 00:10:04,710 --> 00:10:06,809 while, you may not be aware of how 286 00:10:06,810 --> 00:10:08,939 the different casting operators 287 00:10:08,940 --> 00:10:11,189 actually boil down to the underlying 288 00:10:11,190 --> 00:10:13,019 code, how they are being compiled down 289 00:10:13,020 --> 00:10:15,269 into an actual application. 290 00:10:15,270 --> 00:10:17,039 So there are two main casting operations 291 00:10:17,040 --> 00:10:19,049 that we have: static casts and 292 00:10:19,050 --> 00:10:20,579 dynamic casts. 293 00:10:20,580 --> 00:10:22,649 A static cast allows you to 294 00:10:22,650 --> 00:10:24,869 cast an object 295 00:10:24,870 --> 00:10:26,999 or a pointer to an object into a 296 00:10:27,000 --> 00:10:28,649 different class. 297 00:10:28,650 --> 00:10:30,749 The advantage: a static 298 00:10:30,750 --> 00:10:32,999 cast is very, very fast. 
299 00:10:33,000 --> 00:10:34,859 The disadvantage is it doesn't do 300 00:10:34,860 --> 00:10:37,499 anything except for doing a feasibility 301 00:10:37,500 --> 00:10:39,329 check at compile time. 302 00:10:39,330 --> 00:10:41,139 So what a static cast actually does: 303 00:10:41,140 --> 00:10:43,319 it tells the compiler, please 304 00:10:43,320 --> 00:10:45,569 check if there is a path from 305 00:10:45,570 --> 00:10:47,789 the current type to the other 306 00:10:47,790 --> 00:10:49,529 type, to the target type. 307 00:10:49,530 --> 00:10:51,179 And if there is any path in the class 308 00:10:51,180 --> 00:10:53,249 hierarchy 309 00:10:53,250 --> 00:10:55,379 that goes from the source type to the 310 00:10:55,380 --> 00:10:57,539 target type, then the cast is actually 311 00:10:57,540 --> 00:10:59,639 allowed. And 312 00:10:59,640 --> 00:11:01,319 it doesn't need any runtime information. 313 00:11:01,320 --> 00:11:03,989 It doesn't introduce any overhead, 314 00:11:03,990 --> 00:11:06,209 which is great 315 00:11:06,210 --> 00:11:08,759 for performance, 316 00:11:08,760 --> 00:11:10,229 but it doesn't give you any security 317 00:11:10,230 --> 00:11:11,399 guarantees. 318 00:11:11,400 --> 00:11:13,709 A dynamic cast, on the other hand, 319 00:11:13,710 --> 00:11:16,289 executes an actual runtime check. 320 00:11:16,290 --> 00:11:18,389 So the cast is somewhat comparable to 321 00:11:18,390 --> 00:11:20,579 a cast in another programming 322 00:11:20,580 --> 00:11:22,229 language, like, for example, in Java and 323 00:11:22,230 --> 00:11:24,329 so on. When you're doing a type cast, 324 00:11:24,330 --> 00:11:26,729 it is actually enforced that the object 325 00:11:26,730 --> 00:11:29,039 that you're casting, 326 00:11:29,040 --> 00:11:30,629 that you're casting to a different type, 327 00:11:30,630 --> 00:11:32,849 actually is of that other type, so 328 00:11:32,850 --> 00:11:34,199 that the cast is allowed. 
329 00:11:34,200 --> 00:11:36,269 And in C++, the dynamic cast 330 00:11:36,270 --> 00:11:38,519 leads to a runtime check. 331 00:11:38,520 --> 00:11:40,649 Now, to actually execute a runtime 332 00:11:40,650 --> 00:11:42,929 check, you need the runtime type 333 00:11:42,930 --> 00:11:45,149 information to be able to decide, hey, 334 00:11:45,150 --> 00:11:47,249 what is the actual type of the 335 00:11:47,250 --> 00:11:49,109 object? What is the type of the memory 336 00:11:49,110 --> 00:11:50,550 area that we are looking at? 337 00:11:52,220 --> 00:11:54,709 You need to identify the underlying 338 00:11:54,710 --> 00:11:56,869 type of the memory object in some 339 00:11:56,870 --> 00:11:58,969 form, and this 340 00:11:58,970 --> 00:12:01,189 is where we see some of the drawbacks 341 00:12:01,190 --> 00:12:02,329 of C++. 342 00:12:02,330 --> 00:12:04,609 C++ is an extension 343 00:12:04,610 --> 00:12:06,919 of C, pretty much, and in C, 344 00:12:06,920 --> 00:12:09,679 everything boils down to untyped memory. 345 00:12:09,680 --> 00:12:12,379 Everything boils down to bytes in memory. 346 00:12:12,380 --> 00:12:14,479 You have some memory area that 347 00:12:14,480 --> 00:12:16,609 can be interpreted in different ways, 348 00:12:16,610 --> 00:12:18,709 and without an actual identification 349 00:12:18,710 --> 00:12:20,779 of the underlying area, you don't know 350 00:12:20,780 --> 00:12:22,849 what the type is. And with the dynamic 351 00:12:22,850 --> 00:12:25,369 cast, you are using a unique identifier 352 00:12:25,370 --> 00:12:27,469 for an object to actually decide 353 00:12:27,470 --> 00:12:28,759 what type it is. 354 00:12:28,760 --> 00:12:30,919 And this is where the vtable 355 00:12:30,920 --> 00:12:32,899 pointer is actually being used. 
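A small sketch of why runtime type identification depends on the object carrying its own identifier. The class names below are assumptions for illustration (`Base` and `Greeter` echo the example later in the talk, `Plain` and `PlainChild` are made up). For a class with virtual functions, `typeid` reports the runtime type of the object; for a class without any, only the static type is available, because the bytes in memory carry no type tag.

```cpp
#include <cassert>
#include <typeinfo>

// Polymorphic hierarchy: each object carries a hidden pointer to per-class
// metadata, which is what makes a runtime type query possible.
struct Base { virtual ~Base() = default; };
struct Greeter : Base {};

// Non-polymorphic hierarchy (hypothetical): plain bytes, no type tag.
struct Plain { int x; };
struct PlainChild : Plain {};

// typeid on a polymorphic reference inspects the object at runtime.
bool is_greeter(const Base& b) { return typeid(b) == typeid(Greeter); }

// typeid on a non-polymorphic reference is resolved at compile time
// from the static type alone.
bool reports_static_type(const Plain& p) { return typeid(p) == typeid(Plain); }
```

Passing a `PlainChild` to `reports_static_type` still returns true: without a per-object identifier there is nothing in memory that distinguishes it from a `Plain`.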
356 00:12:32,900 --> 00:12:34,939 This allows you to have a unique 357 00:12:34,940 --> 00:12:37,369 identifier for the actual object 358 00:12:37,370 --> 00:12:39,259 that allows you to decide, hey, this is 359 00:12:39,260 --> 00:12:41,449 the runtime type of this object. 360 00:12:41,450 --> 00:12:43,759 So it's a unique way to identify 361 00:12:43,760 --> 00:12:46,039 or to distinguish between 362 00:12:46,040 --> 00:12:47,899 different object types. 363 00:12:47,900 --> 00:12:49,849 So let's look at the different casting 364 00:12:49,850 --> 00:12:51,259 behavior in a little bit more 365 00:12:51,260 --> 00:12:52,309 detail. 366 00:12:52,310 --> 00:12:54,499 So if you have a static cast, 367 00:12:54,500 --> 00:12:56,719 we cast an object b 368 00:12:56,720 --> 00:12:59,119 into a pointer to 369 00:12:59,120 --> 00:13:00,109 Greeter. 370 00:13:00,110 --> 00:13:02,449 And this is being compiled down into 371 00:13:02,450 --> 00:13:04,579 a load of the pointer 372 00:13:04,580 --> 00:13:06,739 b into 373 00:13:06,740 --> 00:13:08,869 the RAX register and 374 00:13:08,870 --> 00:13:10,939 then a store into this other 375 00:13:10,940 --> 00:13:12,469 target area. 376 00:13:12,470 --> 00:13:14,689 So there's no real type 377 00:13:14,690 --> 00:13:16,609 check happening there. 378 00:13:16,610 --> 00:13:18,859 And the compiler only does 379 00:13:18,860 --> 00:13:20,719 a feasibility check at compile time. 380 00:13:20,720 --> 00:13:23,059 And it goes from B to A 381 00:13:23,060 --> 00:13:25,669 to make sure that it is of the right type. 382 00:13:25,670 --> 00:13:27,469 Now, if you have a dynamic cast, if you 383 00:13:27,470 --> 00:13:30,049 compile it with -O0, without optimization, 384 00:13:30,050 --> 00:13:31,309 we see that there's actually a lot of 385 00:13:31,310 --> 00:13:32,659 code being generated. 386 00:13:32,660 --> 00:13:34,789 Again, we load the pointer, we do a null 387 00:13:34,790 --> 00:13:35,929 check. 
388 00:13:35,930 --> 00:13:38,479 And in addition to that, we load the 389 00:13:38,480 --> 00:13:40,909 pointer to the Greeter 390 00:13:40,910 --> 00:13:42,559 class and we load the pointer to the base 391 00:13:42,560 --> 00:13:45,059 class and then we execute a full dynamic 392 00:13:45,060 --> 00:13:47,069 cast. This allows us to do this 393 00:13:47,070 --> 00:13:49,249 actual check and make sure that the 394 00:13:49,250 --> 00:13:51,229 type of the runtime object that we have 395 00:13:51,230 --> 00:13:53,209 conforms to the actual type that we 396 00:13:53,210 --> 00:13:54,079 expect. 397 00:13:54,080 --> 00:13:56,239 So we do a full runtime 398 00:13:56,240 --> 00:13:58,339 enforcement check. Now, if we optimize 399 00:13:58,340 --> 00:14:00,499 this, we 400 00:14:00,500 --> 00:14:02,239 do a dynamic cast. 401 00:14:02,240 --> 00:14:03,859 We load the two pointers, the two base 402 00:14:03,860 --> 00:14:05,809 pointers. We check what the current 403 00:14:05,810 --> 00:14:08,389 base pointer is of the current object, 404 00:14:08,390 --> 00:14:10,459 and depending on the 405 00:14:10,460 --> 00:14:12,199 result of the cast, we either allow it or we 406 00:14:12,200 --> 00:14:13,849 terminate the program at runtime as a 407 00:14:13,850 --> 00:14:15,049 type safety violation. 408 00:14:16,370 --> 00:14:18,319 Let's look at what a static cast is 409 00:14:18,320 --> 00:14:19,320 optimized to. 410 00:14:21,910 --> 00:14:23,979 It ends up as zero instructions 411 00:14:23,980 --> 00:14:26,289 because we reuse the register. 412 00:14:26,290 --> 00:14:28,839 So a static cast does not incur 413 00:14:28,840 --> 00:14:31,089 any runtime overhead and does not do 414 00:14:31,090 --> 00:14:33,309 any runtime check. So 415 00:14:33,310 --> 00:14:35,259 as a take-home message: 416 00:14:35,260 --> 00:14:37,419 static casts do not 417 00:14:37,420 --> 00:14:39,549 result in any instructions being 418 00:14:39,550 --> 00:14:41,439 executed at runtime. 
419 00:14:41,440 --> 00:14:43,569 So no performance 420 00:14:43,570 --> 00:14:44,499 overhead. 421 00:14:44,500 --> 00:14:46,690 And 422 00:14:48,540 --> 00:14:50,939 no security guarantees. 423 00:14:50,940 --> 00:14:53,219 So now, with this knowledge, 424 00:14:53,220 --> 00:14:55,829 what actually is type confusion? 425 00:14:55,830 --> 00:14:57,659 Type confusion arises through illegal 426 00:14:57,660 --> 00:14:58,709 downcasts. 427 00:14:58,710 --> 00:15:00,179 Assume you have the following type 428 00:15:00,180 --> 00:15:02,279 hierarchy. We have a Parent 429 00:15:02,280 --> 00:15:04,739 class and two descendant classes, 430 00:15:04,740 --> 00:15:06,659 Child1 and Child2. 431 00:15:06,660 --> 00:15:09,599 Now, if we allocate an object 432 00:15:09,600 --> 00:15:11,669 c of type Child1 433 00:15:11,670 --> 00:15:13,859 and we store it in the c 434 00:15:13,860 --> 00:15:16,259 pointer, we can cast it to 435 00:15:16,260 --> 00:15:17,399 a Parent type. 436 00:15:17,400 --> 00:15:19,319 Right, so we can cast from the 437 00:15:19,320 --> 00:15:21,839 Child1 object to a Parent object. 438 00:15:21,840 --> 00:15:24,209 And as these two classes are related, 439 00:15:24,210 --> 00:15:26,399 this is a valid cast and we 440 00:15:26,400 --> 00:15:29,579 can store the pointer 441 00:15:29,580 --> 00:15:30,580 in 442 00:15:31,790 --> 00:15:33,859 the p pointer, and we can 443 00:15:33,860 --> 00:15:35,449 use the Parent object or the fields of 444 00:15:35,450 --> 00:15:37,969 the Parent object. Now, 445 00:15:37,970 --> 00:15:40,069 as a second step, we can cast 446 00:15:40,070 --> 00:15:42,139 the Parent object into a Child2 object. 447 00:15:42,140 --> 00:15:44,479 And 448 00:15:44,480 --> 00:15:46,129 if the underlying object had been 449 00:15:46,130 --> 00:15:48,229 allocated as a Child2 object, then this 450 00:15:48,230 --> 00:15:50,119 cast would be allowed. 
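The Parent, Child1, Child2 scenario can be sketched as follows. This is a hedged illustration rather than the code from the slides: the fields `x`, `a`, `b` and the helper `checked_downcast` are hypothetical, and `Parent` is given a virtual destructor here (an assumption) so that `dynamic_cast` can be used at all. The checked cast returns a null pointer for the illegal downcast, which is exactly the check a `static_cast` would skip.

```cpp
#include <cassert>

// Assumption: Parent is polymorphic so that runtime type information exists.
struct Parent { virtual ~Parent() = default; int x = 0; };
struct Child1 : Parent { int a = 1; };
struct Child2 : Parent { int b = 2; };

// Hypothetical helper: returns nullptr instead of a confused pointer when
// the object behind p was not allocated as a Child2.
Child2* checked_downcast(Parent* p) { return dynamic_cast<Child2*>(p); }
```

With a `Child1` object behind the `Parent` pointer, `checked_downcast` yields `nullptr`; with a real `Child2` it yields the original address. A `static_cast<Child2*>` in the same place would compile and hand back a pointer that treats `Child1` memory as `Child2`.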
451 00:15:50,120 --> 00:15:52,309 But the static cast does not 452 00:15:52,310 --> 00:15:53,839 do any checks, right? 453 00:15:53,840 --> 00:15:55,899 So at runtime, this would lead to 454 00:15:55,900 --> 00:15:56,839 type confusion. 455 00:15:56,840 --> 00:15:59,569 And this is exactly where the exploitable 456 00:15:59,570 --> 00:16:01,189 behavior comes in. 457 00:16:01,190 --> 00:16:03,709 So with this static cast, 458 00:16:03,710 --> 00:16:05,779 which is not being checked, the 459 00:16:05,780 --> 00:16:08,659 static cast could be abused 460 00:16:08,660 --> 00:16:11,419 to reinterpret the underlying 461 00:16:11,420 --> 00:16:13,969 memory as a different type. 462 00:16:13,970 --> 00:16:16,129 Let me give you a little bit 463 00:16:16,130 --> 00:16:17,779 more detail and background on that. 464 00:16:17,780 --> 00:16:19,999 For the type confusion, we have the Parent 465 00:16:20,000 --> 00:16:21,589 class and we have the Child, and I'll 466 00:16:21,590 --> 00:16:23,809 break it down to just Parent and Child 467 00:16:23,810 --> 00:16:26,119 to make it a little bit easier. 468 00:16:26,120 --> 00:16:28,189 Now the Parent object only has a single 469 00:16:28,190 --> 00:16:31,009 variable inside it, 470 00:16:31,010 --> 00:16:33,409 called x, of int 471 00:16:33,410 --> 00:16:36,319 type. And the Child class has a second 472 00:16:36,320 --> 00:16:38,479 integer, y, and a virtual function called 473 00:16:38,480 --> 00:16:39,619 print. 474 00:16:39,620 --> 00:16:42,709 Now if we allocate a P object, 475 00:16:42,710 --> 00:16:44,929 we only allocate four bytes; the four 476 00:16:44,930 --> 00:16:47,099 bytes are being used for the integer. 477 00:16:47,100 --> 00:16:49,879 If we allocate a C type object, 478 00:16:49,880 --> 00:16:51,829 we have the vtable pointer that points 479 00:16:51,830 --> 00:16:54,349 to the actual 
480 00:16:54,350 --> 00:16:56,509 location that contains all the 481 00:16:56,510 --> 00:16:59,119 code pointers, we have the x integer 482 00:16:59,120 --> 00:17:00,769 and we have the y integer that can all be 483 00:17:00,770 --> 00:17:01,770 used. 484 00:17:03,130 --> 00:17:06,098 Now, let's assume we allocate 485 00:17:06,099 --> 00:17:08,078 a P object, the parent object, and we 486 00:17:08,079 --> 00:17:10,149 have a pointer to it. If you do 487 00:17:10,150 --> 00:17:12,489 a static cast into a 488 00:17:12,490 --> 00:17:14,618 C pointer, 489 00:17:14,619 --> 00:17:16,959 the C pointer ends up pointing above 490 00:17:16,960 --> 00:17:18,969 the actual object. 491 00:17:18,970 --> 00:17:21,399 And the underlying 492 00:17:21,400 --> 00:17:23,499 object or the underlying data that is at 493 00:17:23,500 --> 00:17:26,078 that location would be reinterpreted 494 00:17:26,079 --> 00:17:28,419 as a vtable pointer, along 495 00:17:28,420 --> 00:17:30,489 with the y 496 00:17:30,490 --> 00:17:32,619 integer that could then be read 497 00:17:32,620 --> 00:17:34,269 and written, which would expose the 498 00:17:34,270 --> 00:17:35,289 underlying memory. 499 00:17:35,290 --> 00:17:36,579 So this leads to a memory safety 500 00:17:36,580 --> 00:17:38,799 violation and control-flow hijacking 501 00:17:38,800 --> 00:17:40,299 after type confusion. 502 00:17:40,300 --> 00:17:42,969 And if you look at the chain 503 00:17:42,970 --> 00:17:45,099 of violations, the type 504 00:17:45,100 --> 00:17:47,109 confusion is the first thing that 505 00:17:47,110 --> 00:17:49,389 happens that violates the integrity 506 00:17:49,390 --> 00:17:51,699 of the underlying application. 507 00:17:51,700 --> 00:17:54,249 And this is the initial 508 00:17:54,250 --> 00:17:56,439 entry vector for an attacker to 509 00:17:56,440 --> 00:17:58,569 abuse this underlying bug, 510 00:17:58,570 --> 00:17:59,949 this type confusion bug. 
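The layout difference between the two classes can be inspected with a short sketch. The struct names `P` and `C` and the members `x`, `y` and `print` follow the description above; the helper `parent_offset` is hypothetical, and the exact sizes and offsets depend on the compiler and ABI, so this only shows how to measure them rather than what they must be. Note that the sketch only ever casts a real `C` object, so it stays within defined behavior.

```cpp
#include <cassert>
#include <cstddef>

struct P { int x; };                              // just the integer
struct C : P { int y; virtual void print() {} };  // adds y and a vtable pointer

// Hypothetical helper: byte distance from the start of a real C object to
// its embedded P part. On common ABIs the vtable pointer is placed first,
// so this is non-zero. That is why casting a P* to a C* moves the pointer
// in front of the P object.
std::ptrdiff_t parent_offset(C& c) {
    auto* whole = reinterpret_cast<unsigned char*>(&c);
    auto* base  = reinterpret_cast<unsigned char*>(static_cast<P*>(&c));
    return base - whole;
}
```

On a typical 64-bit Linux toolchain one would expect `sizeof(P)` to be 4, `sizeof(C)` to be 16 and the offset to be 8, but those figures are an assumption about the platform, not something the language guarantees.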
511 00:17:59,950 --> 00:18:01,869 And this can then be used as a memory 512 00:18:01,870 --> 00:18:03,519 safety confusion or memory safety 513 00:18:03,520 --> 00:18:05,679 violation or then for control-flow 514 00:18:05,680 --> 00:18:07,089 hijacking. 515 00:18:07,090 --> 00:18:09,819 Now, how do we use this 516 00:18:09,820 --> 00:18:11,979 vulnerability type to build 517 00:18:11,980 --> 00:18:13,210 an exploit primitive? 518 00:18:15,350 --> 00:18:17,659 So imagine that, 519 00:18:17,660 --> 00:18:19,699 when you're using type confusion, 520 00:18:19,700 --> 00:18:21,769 when you're exploiting type confusion 521 00:18:21,770 --> 00:18:23,959 in your programs, you're 522 00:18:23,960 --> 00:18:26,959 trying to control two pointers 523 00:18:26,960 --> 00:18:29,059 of different type that both 524 00:18:29,060 --> 00:18:31,129 point to the same memory area. 525 00:18:31,130 --> 00:18:32,779 But the two pointers of different type 526 00:18:32,780 --> 00:18:34,969 allow you to reinterpret 527 00:18:34,970 --> 00:18:37,189 the different fields of the object in two 528 00:18:37,190 --> 00:18:38,299 different ways. 529 00:18:38,300 --> 00:18:40,189 So you have a certain memory 530 00:18:40,190 --> 00:18:42,539 area that is of one 531 00:18:42,540 --> 00:18:44,269 original type or has been allocated as 532 00:18:44,270 --> 00:18:45,169 one original type. 533 00:18:45,170 --> 00:18:46,879 But you have two pointers of different 534 00:18:46,880 --> 00:18:48,649 types to that memory area. 535 00:18:48,650 --> 00:18:51,259 And for example, in the first type, 536 00:18:51,260 --> 00:18:53,359 the parameter, 537 00:18:53,360 --> 00:18:55,429 the first entry is interpreted as a 538 00:18:55,430 --> 00:18:57,619 vtable pointer, while in the second type 539 00:18:57,620 --> 00:18:59,959 it is interpreted as a long, right?
540 00:18:59,960 --> 00:19:02,119 And if you use a setter for 541 00:19:02,120 --> 00:19:04,369 this long value, you can use it 542 00:19:04,370 --> 00:19:06,529 to overwrite the vtable pointer 543 00:19:06,530 --> 00:19:07,639 in the other view. 544 00:19:07,640 --> 00:19:08,899 So imagine that you are using the 545 00:19:08,900 --> 00:19:11,029 first view to set the 546 00:19:11,030 --> 00:19:12,859 vtable pointer and then you're using the 547 00:19:12,860 --> 00:19:14,809 second pointer that you control as well, 548 00:19:14,810 --> 00:19:17,029 of different type, to dispatch on 549 00:19:17,030 --> 00:19:18,030 that pointer. 550 00:19:19,200 --> 00:19:21,479 As a simple example, 551 00:19:21,480 --> 00:19:23,609 just to show you the power 552 00:19:23,610 --> 00:19:25,689 of this exploit primitive, 553 00:19:25,690 --> 00:19:28,169 imagine that we have a Base class 554 00:19:28,170 --> 00:19:30,209 that just implements some basic 555 00:19:30,210 --> 00:19:32,609 functionality and we have two subclasses 556 00:19:32,610 --> 00:19:33,939 of it, two descendant classes. 557 00:19:33,940 --> 00:19:35,999 We have a Greeter class that just says 558 00:19:36,000 --> 00:19:38,249 hello and we have a 559 00:19:38,250 --> 00:19:40,889 great executor as a service. 560 00:19:40,890 --> 00:19:43,379 So both of those are implemented 561 00:19:43,380 --> 00:19:45,389 as virtual functions because we may want 562 00:19:45,390 --> 00:19:47,489 to build our fancy framework on top 563 00:19:47,490 --> 00:19:49,109 of that with additional functionality. 564 00:19:49,110 --> 00:19:50,999 So we want to be able to override these 565 00:19:51,000 --> 00:19:52,439 functionalities.
566 00:19:52,440 --> 00:19:54,899 So the executor service implements 567 00:19:54,900 --> 00:19:57,129 one virtual function called exec that 568 00:19:57,130 --> 00:19:59,259 takes a string that is then being passed 569 00:19:59,260 --> 00:20:01,559 to system to execute it 570 00:20:01,560 --> 00:20:03,539 as an additional service. 571 00:20:03,540 --> 00:20:05,909 And the Greeter function just prints 572 00:20:05,910 --> 00:20:07,619 the string to standard out. 573 00:20:07,620 --> 00:20:10,199 So that sounds pretty reasonable, right? 574 00:20:10,200 --> 00:20:12,689 There's no way that a programmer 575 00:20:12,690 --> 00:20:14,999 would confuse exec and sayHi 576 00:20:15,000 --> 00:20:16,529 because the functions have different 577 00:20:16,530 --> 00:20:18,149 names and there's no way for us to 578 00:20:18,150 --> 00:20:19,309 confuse it. Right? 579 00:20:22,830 --> 00:20:23,830 Now. 580 00:20:24,880 --> 00:20:27,369 If we allocate 581 00:20:27,370 --> 00:20:29,559 two Base objects, b1 582 00:20:29,560 --> 00:20:31,809 and b2, 583 00:20:31,810 --> 00:20:33,369 the first one of type Greeter and the 584 00:20:33,370 --> 00:20:35,379 second one, the second object, of 585 00:20:35,380 --> 00:20:37,479 type Exec, we can actually 586 00:20:37,480 --> 00:20:39,519 dispatch, 587 00:20:43,020 --> 00:20:45,299 dispatch on those objects. 588 00:20:45,300 --> 00:20:47,549 So we allocate these two objects, 589 00:20:47,550 --> 00:20:49,319 one object of type Greeter, the second 590 00:20:49,320 --> 00:20:51,479 object of type Exec, and then 591 00:20:51,480 --> 00:20:53,789 we cast the first object b1 592 00:20:53,790 --> 00:20:55,829 into Greeter and we call Greeter's 593 00:20:55,830 --> 00:20:58,649 sayHi and then Greeter says hi. 594 00:20:58,650 --> 00:21:00,779 And then with the second object, 595 00:21:00,780 --> 00:21:02,969 we again cast it into Greeter.
596 00:21:02,970 --> 00:21:05,189 So from the Base class to the Greeter class, 597 00:21:05,190 --> 00:21:07,469 and the compiler does 598 00:21:07,470 --> 00:21:09,929 a compile-time check and says, oh yes, 599 00:21:09,930 --> 00:21:11,939 the Greeter 600 00:21:13,980 --> 00:21:16,619 type is dependent on, or a descendant 601 00:21:16,620 --> 00:21:18,689 of, the Base type, so the static_cast is 602 00:21:18,690 --> 00:21:20,909 actually allowed, and then we can call 603 00:21:20,910 --> 00:21:23,309 sayHi with this weird string 604 00:21:23,310 --> 00:21:25,439 /usr/bin/xcalc and it works perfectly 605 00:21:25,440 --> 00:21:27,509 fine. The compiler doesn't complain. 606 00:21:32,390 --> 00:21:33,799 This is actually really fun. 607 00:21:35,060 --> 00:21:36,319 If you look into this. 608 00:21:39,980 --> 00:21:42,259 We see this is exactly the code that 609 00:21:42,260 --> 00:21:43,260 I've just shown. 610 00:21:44,200 --> 00:21:46,479 We've got the static_cast 611 00:21:46,480 --> 00:21:48,579 into Greeter here and we've got a 612 00:21:48,580 --> 00:21:50,649 static_cast into Greeter here and we call 613 00:21:50,650 --> 00:21:51,730 sayHi twice, 614 00:21:53,440 --> 00:21:54,440 so we make, 615 00:21:57,440 --> 00:21:59,689 we allocate 616 00:21:59,690 --> 00:22:01,759 two objects of type Base, 617 00:22:01,760 --> 00:22:04,159 of type Greeter and type Exec, 618 00:22:04,160 --> 00:22:06,439 but we call the same 619 00:22:06,440 --> 00:22:09,019 sayHi method both times. 620 00:22:09,020 --> 00:22:10,160 And if you execute it, 621 00:22:11,710 --> 00:22:13,779 with the first call to 622 00:22:13,780 --> 00:22:15,999 sayHi, the Greeter says hi, and the second 623 00:22:16,000 --> 00:22:18,369 call to sayHi opens 624 00:22:18,370 --> 00:22:20,649 a calculator, which 625 00:22:20,650 --> 00:22:22,809 is not what we want.
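As a rough reconstruction of the class shapes in this demo, the sketch below uses the names Base, Greeter, Exec, sayHi and exec as heard in the talk. Everything else is an assumption made for illustration: Exec only records its argument instead of calling system(), and the helper `checked_say_hi` is a hypothetical name. The sketch shows the checked counterpart of the cast from the demo: dynamic_cast looks at the runtime type and refuses the Exec object, where the static_cast on the slide compiles and checks nothing.

```cpp
#include <cassert>
#include <string>

// Base is made polymorphic here so that dynamic_cast can be applied.
struct Base {
    virtual ~Base() = default;
};

struct Greeter : Base {
    std::string last;
    virtual void sayHi(const char* str) { last = std::string("hi ") + str; }
};

// Harmless stand-in: records the string instead of passing it to system().
struct Exec : Base {
    std::string last;
    virtual void exec(const char* str) { last = std::string("exec ") + str; }
};

// Checked variant of the cast: dynamic_cast yields nullptr if b is not
// really a Greeter. static_cast<Greeter*>(b) would also compile, but it
// performs no runtime check at all.
bool checked_say_hi(Base* b, const char* str) {
    Greeter* g = dynamic_cast<Greeter*>(b);
    if (g == nullptr) {
        return false;  // type confusion caught at the cast site
    }
    g->sayHi(str);
    return true;
}
```

Called with a Greeter, `checked_say_hi` dispatches as expected; called with an Exec, it returns false and never reaches a virtual call.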
626 00:22:23,840 --> 00:22:26,119 If you look at how this is actually 627 00:22:26,120 --> 00:22:28,309 implemented, so why does this 628 00:22:28,310 --> 00:22:30,529 happen? First off, the initial 629 00:22:30,530 --> 00:22:32,719 bug, or the underlying bug, is that the 630 00:22:32,720 --> 00:22:34,819 type hierarchy or the compiler 631 00:22:34,820 --> 00:22:36,919 cannot 632 00:22:36,920 --> 00:22:39,259 stop us from casting a Base class 633 00:22:39,260 --> 00:22:41,029 into a Greeter class, even though it is 634 00:22:41,030 --> 00:22:42,109 an Exec class. 635 00:22:43,520 --> 00:22:45,829 We've got these two vtable pointers from 636 00:22:45,830 --> 00:22:48,079 b1 and b2, and the first vtable pointer 637 00:22:48,080 --> 00:22:50,029 points to the vtable of the 638 00:22:50,030 --> 00:22:52,609 Greeter type and the second, 639 00:22:52,610 --> 00:22:54,679 the b2 vtable 640 00:22:54,680 --> 00:22:57,209 pointer points to the Exec type, 641 00:22:57,210 --> 00:22:59,329 and we 642 00:22:59,330 --> 00:23:01,189 can easily cast between the two views 643 00:23:01,190 --> 00:23:03,149 without the type system in C++ 644 00:23:03,150 --> 00:23:05,359 actually complaining about it. 645 00:23:05,360 --> 00:23:07,129 And if you look at the actual 646 00:23:07,130 --> 00:23:08,569 implementation, if you drill down in the 647 00:23:08,570 --> 00:23:10,439 source code, what it actually ends up 648 00:23:10,440 --> 00:23:12,919 with is: we dereference the first field 649 00:23:12,920 --> 00:23:15,019 of the Greeter 650 00:23:15,020 --> 00:23:17,239 class, which is the vtable pointer, 651 00:23:17,240 --> 00:23:19,429 and then we dereference the 652 00:23:19,430 --> 00:23:20,839 first vtable entry.
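The two dereferences described here can be modelled with a hand-rolled dispatch table. This is an illustration only, not what any particular compiler emits, and all names in it (Object, VTable, slot0, call_slot0, greeter_say_hi, exec_exec) are hypothetical. It shows that the call site only knows "first field, then slot 0", so whichever table the object carries decides which function runs.

```cpp
#include <cassert>
#include <string>

struct Object;
using Method = std::string (*)(Object*, const char*);

// A table of code pointers, playing the role of a vtable.
struct VTable {
    Method slot0;
};

// Every object starts with a pointer to its table, like a vtable pointer.
struct Object {
    const VTable* vptr;
};

std::string greeter_say_hi(Object*, const char* s) { return std::string("hi ") + s; }
std::string exec_exec(Object*, const char* s) { return std::string("exec ") + s; }

const VTable greeter_vtable{greeter_say_hi};
const VTable exec_vtable{exec_exec};

// What a call written as "g->sayHi(s)" boils down to in this model:
// load the table pointer, load slot 0, call it. The name sayHi is gone.
std::string call_slot0(Object* obj, const char* s) {
    return obj->vptr->slot0(obj, s);
}
```

In this model, an object carrying `exec_vtable` runs `exec_exec` even if the source code at the call site believed it was calling a greeter.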
653 00:23:20,840 --> 00:23:23,239 So even though we are executing sayHi, 654 00:23:23,240 --> 00:23:25,129 or we have written sayHi in the source 655 00:23:25,130 --> 00:23:27,379 code, it boils down to executing 656 00:23:27,380 --> 00:23:29,479 the exec 657 00:23:29,480 --> 00:23:32,239 function in the, 658 00:23:32,240 --> 00:23:34,549 in the Exec 659 00:23:34,550 --> 00:23:36,109 class instead of the Greeter class, 660 00:23:36,110 --> 00:23:38,519 leading to the actual type confusion. 661 00:23:38,520 --> 00:23:40,709 Now, this is a fun way 662 00:23:40,710 --> 00:23:43,049 to exploit software. Now the hard 663 00:23:43,050 --> 00:23:45,209 question is, how do we find these types 664 00:23:45,210 --> 00:23:46,139 of vulnerabilities? 665 00:23:46,140 --> 00:23:48,209 How can we find such issues 666 00:23:48,210 --> 00:23:49,349 in our software? 667 00:23:49,350 --> 00:23:51,059 And the classic approach that people have 668 00:23:51,060 --> 00:23:53,249 been using for 669 00:23:53,250 --> 00:23:54,809 the last couple of years, especially to 670 00:23:54,810 --> 00:23:56,879 find vulnerabilities in 671 00:23:56,880 --> 00:23:59,369 large browsers, has been fuzzing, 672 00:23:59,370 --> 00:24:00,539 and fuzzing is great. 673 00:24:00,540 --> 00:24:01,559 Right? 674 00:24:01,560 --> 00:24:03,929 But what it ended up 675 00:24:03,930 --> 00:24:05,999 being is: you're fuzzing 676 00:24:06,000 --> 00:24:07,979 and you're trying to find these type 677 00:24:07,980 --> 00:24:09,869 confusion vulnerabilities. 678 00:24:09,870 --> 00:24:12,149 But as I've just shown, it's really hard 679 00:24:12,150 --> 00:24:14,249 to find or to actually trigger 680 00:24:14,250 --> 00:24:15,719 the type confusion vulnerabilities 681 00:24:15,720 --> 00:24:18,899 because there's no way for you to 682 00:24:18,900 --> 00:24:21,449 enforce the 683 00:24:21,450 --> 00:24:22,499 actual check. 684 00:24:22,500 --> 00:24:24,269 Right.
So the only way that you will 685 00:24:24,270 --> 00:24:26,429 discover that something is amiss 686 00:24:26,430 --> 00:24:28,529 is if you run into a memory 687 00:24:28,530 --> 00:24:30,359 corruption, if you run into a segmentation 688 00:24:30,360 --> 00:24:32,399 fault. If you don't run into a segmentation 689 00:24:32,400 --> 00:24:33,899 fault, there is no way for you to detect 690 00:24:33,900 --> 00:24:35,159 the actual type confusion. 691 00:24:35,160 --> 00:24:37,409 And you may be missing a large amount 692 00:24:37,410 --> 00:24:39,659 of type confusions, right? 693 00:24:39,660 --> 00:24:41,519 So if you're running a fuzzer, 694 00:24:41,520 --> 00:24:43,289 you're only detecting the subset of type 695 00:24:43,290 --> 00:24:45,209 confusion that results in a direct memory 696 00:24:45,210 --> 00:24:46,919 corruption, the segmentation fault. 697 00:24:46,920 --> 00:24:48,209 There may be a large amount of type 698 00:24:48,210 --> 00:24:49,769 confusion that could be abused that 699 00:24:49,770 --> 00:24:50,759 you're missing. 700 00:24:50,760 --> 00:24:52,739 And what we wanted to look at is: can we 701 00:24:52,740 --> 00:24:54,959 discover the missing set of type 702 00:24:54,960 --> 00:24:56,309 confusion? 703 00:24:56,310 --> 00:24:59,309 So can we bring type safety 704 00:24:59,310 --> 00:25:01,469 to C++, or at least some 705 00:25:01,470 --> 00:25:03,719 form of type system and typing, so 706 00:25:03,720 --> 00:25:05,729 that we can be aware of when an illegal 707 00:25:05,730 --> 00:25:07,629 cast is happening? 708 00:25:07,630 --> 00:25:09,779 Right. So the underlying 709 00:25:09,780 --> 00:25:12,329 problem that we have here is: in C++, 710 00:25:12,330 --> 00:25:14,459 a static_cast is checked only 711 00:25:14,460 --> 00:25:16,769 at compile time, which is fast, 712 00:25:16,770 --> 00:25:18,929 but does not give us any form of runtime 713 00:25:18,930 --> 00:25:20,099 guarantees.
714 00:25:20,100 --> 00:25:22,739 On the other hand, we have dynamic_casts 715 00:25:22,740 --> 00:25:24,839 that are checked at runtime, which result 716 00:25:24,840 --> 00:25:26,819 in high overhead and are limited to 717 00:25:26,820 --> 00:25:28,559 polymorphic classes. 718 00:25:28,560 --> 00:25:30,659 Polymorphic classes are classes that have 719 00:25:30,660 --> 00:25:32,869 virtual functions in them. 720 00:25:32,870 --> 00:25:34,909 Why are dynamic_casts limited to 721 00:25:34,910 --> 00:25:36,619 polymorphic classes? 722 00:25:36,620 --> 00:25:38,809 Well, we need to have some 723 00:25:38,810 --> 00:25:41,209 way to identify individual 724 00:25:41,210 --> 00:25:43,639 objects or the type of an individual 725 00:25:43,640 --> 00:25:45,619 object. And the vtable pointer is such 726 00:25:45,620 --> 00:25:47,599 an identifying field. 727 00:25:47,600 --> 00:25:49,999 And this goes back to the design of 728 00:25:50,000 --> 00:25:51,079 C++. 729 00:25:51,080 --> 00:25:53,359 And in C++, a struct 730 00:25:53,360 --> 00:25:55,669 is a class and a class is a struct. 731 00:25:55,670 --> 00:25:57,619 And if you allocate a struct in C, 732 00:25:57,620 --> 00:26:00,049 you have no idea what the underlying 733 00:26:00,050 --> 00:26:01,429 type is, right? 734 00:26:01,430 --> 00:26:02,989 There's no way that C remembers that 735 00:26:02,990 --> 00:26:04,999 you allocated a foo struct. 736 00:26:05,000 --> 00:26:07,159 It could be any arbitrary type. Only if 737 00:26:07,160 --> 00:26:08,959 you have an identifying field, a type 738 00:26:08,960 --> 00:26:11,059 ID, only then can you actually 739 00:26:11,060 --> 00:26:12,809 identify the underlying type. 740 00:26:12,810 --> 00:26:15,169 Safe
741 00:26:15,170 --> 00:26:17,719 object-oriented languages like Java, 742 00:26:17,720 --> 00:26:19,789 C# and so on: whenever 743 00:26:19,790 --> 00:26:21,439 you allocate an object, they have an 744 00:26:21,440 --> 00:26:23,669 object ID, an object type ID 745 00:26:23,670 --> 00:26:25,159 that clearly identifies the underlying 746 00:26:25,160 --> 00:26:28,069 type. This is missing in C++. 747 00:26:28,070 --> 00:26:30,379 This is why we cannot explicitly check 748 00:26:30,380 --> 00:26:32,059 all the casts between any objects, but 749 00:26:32,060 --> 00:26:34,429 only for polymorphic objects, that is, virtual 750 00:26:34,430 --> 00:26:35,749 classes. 751 00:26:35,750 --> 00:26:37,849 So what we figured is: 752 00:26:37,850 --> 00:26:38,959 there is something missing here. 753 00:26:38,960 --> 00:26:41,269 We need to be able to do 754 00:26:41,270 --> 00:26:43,399 an actual type check for 755 00:26:43,400 --> 00:26:45,289 any of these objects. 756 00:26:45,290 --> 00:26:47,599 So according to the motto 757 00:26:47,600 --> 00:26:49,819 of the 33C3, 758 00:26:49,820 --> 00:26:52,129 we figured we would go to bat and 759 00:26:52,130 --> 00:26:54,319 bring type safety to C++. 760 00:26:55,370 --> 00:26:57,529 And our underlying idea 761 00:26:57,530 --> 00:26:58,530 is that 762 00:27:00,170 --> 00:27:03,859 we would check every single type cast, 763 00:27:03,860 --> 00:27:05,779 so we do a dynamic check for every single 764 00:27:05,780 --> 00:27:08,089 type cast and then aggressively 765 00:27:08,090 --> 00:27:10,339 remove as many of those checks as 766 00:27:10,340 --> 00:27:12,679 we can as part of our design 767 00:27:12,680 --> 00:27:14,989 and as part of our implementation. 768 00:27:14,990 --> 00:27:17,269 So we are making type checks explicit.
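The restriction of dynamic_cast to polymorphic classes can be seen directly in standard C++. The sketch below uses made-up types (Plain, Poly, PolyChild) and a hypothetical helper `is_poly_child`; only `std::is_polymorphic` and `dynamic_cast` come from the language and standard library. A class without virtual functions carries no identifying field, so the language gives it no runtime check.

```cpp
#include <cassert>
#include <type_traits>

struct Plain { int x; };  // no virtual functions, so no vtable pointer

struct Poly {
    virtual ~Poly() = default;
    int x;
};

struct PolyChild : Poly { int y; };

static_assert(!std::is_polymorphic<Plain>::value, "Plain carries no runtime type tag");
static_assert(std::is_polymorphic<Poly>::value, "Poly carries a vtable pointer");

// A dynamic_cast downcast is only accepted when the source type is
// polymorphic. It works for Poly*; the same downcast written for a type
// like Plain would be rejected by the compiler.
bool is_poly_child(Poly* p) {
    return dynamic_cast<PolyChild*>(p) != nullptr;
}
```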
769 00:27:18,510 --> 00:27:21,029 So we enforce an explicit runtime 770 00:27:21,030 --> 00:27:23,349 check at all cast 771 00:27:23,350 --> 00:27:25,619 sites, for dynamic_cast, 772 00:27:25,620 --> 00:27:28,109 static_cast, reinterpret_cast 773 00:27:28,110 --> 00:27:29,729 and also C-style casting. 774 00:27:32,180 --> 00:27:33,889 This sounds like a contradiction, right? 775 00:27:33,890 --> 00:27:35,419 I've just told you that this is not 776 00:27:35,420 --> 00:27:37,669 possible in the existing 777 00:27:37,670 --> 00:27:40,249 framework that C++ has 778 00:27:40,250 --> 00:27:42,319 because we have no way to identify 779 00:27:42,320 --> 00:27:44,689 the underlying type of an object. 780 00:27:44,690 --> 00:27:46,909 How do we solve this problem? 781 00:27:46,910 --> 00:27:49,189 Whenever you allocate an object, 782 00:27:49,190 --> 00:27:51,589 whenever you execute its constructors, 783 00:27:51,590 --> 00:27:52,759 or if you simply go through the 784 00:27:52,760 --> 00:27:53,839 allocator, 785 00:27:53,840 --> 00:27:56,269 we remember that this memory 786 00:27:56,270 --> 00:27:58,549 area over here is of that particular 787 00:27:58,550 --> 00:28:01,099 type and we keep some form of metadata 788 00:28:01,100 --> 00:28:02,869 table somewhere in the background that 789 00:28:02,870 --> 00:28:05,119 allows us to query and look up, for 790 00:28:05,120 --> 00:28:06,559 any byte in memory: 791 00:28:06,560 --> 00:28:08,839 what type does this 792 00:28:09,890 --> 00:28:10,789 piece of memory have?
793 00:28:10,790 --> 00:28:12,889 And we can then use this information 794 00:28:12,890 --> 00:28:15,379 in any of the casts, so we can replace 795 00:28:15,380 --> 00:28:17,869 a static_cast with an actual runtime 796 00:28:17,870 --> 00:28:20,239 check and make sure that we detect 797 00:28:20,240 --> 00:28:22,309 a type confusion problem right when it 798 00:28:22,310 --> 00:28:24,409 happens, right at the site, and 799 00:28:24,410 --> 00:28:26,689 not much later, when an actual 800 00:28:26,690 --> 00:28:27,999 memory corruption happens. 801 00:28:29,240 --> 00:28:30,979 So we build a global type hierarchy 802 00:28:30,980 --> 00:28:32,899 during the compilation of the software 803 00:28:32,900 --> 00:28:34,999 and we keep track of the allocation 804 00:28:35,000 --> 00:28:37,429 type of each object, so we instrument 805 00:28:37,430 --> 00:28:39,559 all forms of allocation, and 806 00:28:39,560 --> 00:28:41,539 we keep this in our disjoint metadata 807 00:28:41,540 --> 00:28:43,609 table. And then in a second 808 00:28:43,610 --> 00:28:45,889 step, for every single 809 00:28:45,890 --> 00:28:47,599 type cast that happens at runtime, 810 00:28:47,600 --> 00:28:49,669 we can execute this check and make 811 00:28:49,670 --> 00:28:51,589 sure that it actually matches. 812 00:28:51,590 --> 00:28:54,139 So we've built this large system 813 00:28:54,140 --> 00:28:56,269 based on LLVM, 814 00:28:56,270 --> 00:28:58,549 where we instrument 815 00:28:58,550 --> 00:29:00,919 source code on top of Clang with 816 00:29:00,920 --> 00:29:03,199 additional explicit type checks 817 00:29:03,200 --> 00:29:06,469 during the compilation. 818 00:29:06,470 --> 00:29:08,599 We do object tracing as 819 00:29:08,600 --> 00:29:10,669 part of an additional LLVM pass and 820 00:29:10,670 --> 00:29:11,989 track the type hierarchy. 821 00:29:11,990 --> 00:29:13,699 And then at runtime you can check if 822 00:29:13,700 --> 00:29:15,059 something fails or not.
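The scheme just described (a type hierarchy built at compile time, a disjoint table recording what each allocation really is, and a lookup at every cast) can be sketched as a toy model. This is not the tool's actual implementation: `TypeTracker` and all of its members are hypothetical names, types are identified by plain strings, the tables are filled by hand rather than by compiler instrumentation, and lookups only match the exact start address of an allocation rather than any byte inside it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

using TypeName = std::string;

struct TypeTracker {
    // (derived, ancestor) pairs. A real tool would derive these from the
    // class hierarchy during compilation, including indirect ancestors.
    std::set<std::pair<TypeName, TypeName>> derives_from;
    // Disjoint metadata: allocation address -> type it was allocated as.
    std::map<const void*, TypeName> allocated_as;

    void add_subclass(const TypeName& derived, const TypeName& base) {
        derives_from.emplace(derived, base);
    }
    void on_alloc(const void* p, const TypeName& t) { allocated_as[p] = t; }
    void on_free(const void* p) { allocated_as.erase(p); }

    // Check placed at a cast site: is the object at p really a `target`,
    // or something derived from it?
    bool cast_ok(const void* p, const TypeName& target) const {
        auto it = allocated_as.find(p);
        if (it == allocated_as.end()) {
            return true;  // untracked memory: this model cannot judge it
        }
        const TypeName& actual = it->second;
        return actual == target ||
               derives_from.count(std::make_pair(actual, target)) > 0;
    }
};
```

With Greeter and Exec both registered as subclasses of Base, an address recorded as Exec passes a cast to Base and fails a cast to Greeter, which is exactly the point at which the talk's demo would be stopped.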
823 00:29:15,060 --> 00:29:17,389 And then we have a hardened binary that 824 00:29:17,390 --> 00:29:20,459 does all the explicit type checks. 825 00:29:20,460 --> 00:29:22,409 Compared to some prior work, you may know 826 00:29:22,410 --> 00:29:24,659 UBSan, which does the checking for 827 00:29:24,660 --> 00:29:26,879 polymorphic types only, this allows 828 00:29:26,880 --> 00:29:29,159 us to check every single 829 00:29:29,160 --> 00:29:31,709 type cast that is out there, 830 00:29:31,710 --> 00:29:33,989 for static_cast and for 831 00:29:33,990 --> 00:29:35,699 dynamic_cast, to do this fine-grained 832 00:29:35,700 --> 00:29:36,989 checking. 833 00:29:36,990 --> 00:29:39,119 We cover new object allocations, we 834 00:29:39,120 --> 00:29:41,279 cover placement new, we cover 835 00:29:41,280 --> 00:29:43,169 reinterpret_cast and a bunch of other 836 00:29:43,170 --> 00:29:45,419 things. Right. So we've worked 837 00:29:45,420 --> 00:29:47,549 very hard to compile real software, 838 00:29:47,550 --> 00:29:49,619 including Chrome, Firefox and 839 00:29:49,620 --> 00:29:51,179 other systems. 840 00:29:51,180 --> 00:29:53,339 Now, the problem is, as soon 841 00:29:53,340 --> 00:29:55,589 as you enforce full type 842 00:29:55,590 --> 00:29:57,659 checks for every single cast, you 843 00:29:57,660 --> 00:30:00,269 run into impressive overheads. 844 00:30:00,270 --> 00:30:02,489 And then our main task was to 845 00:30:02,490 --> 00:30:04,589 reduce the overhead to make it more 846 00:30:04,590 --> 00:30:05,699 useful. 847 00:30:05,700 --> 00:30:07,769 So on one hand, we limit tracing to 848 00:30:07,770 --> 00:30:09,959 unsafe objects. If an object is 849 00:30:09,960 --> 00:30:13,019 only used in a safe context, 850 00:30:13,020 --> 00:30:14,609 so, for example, if it is 851 00:30:14,610 --> 00:30:16,619 never being used for casting, we don't 852 00:30:16,620 --> 00:30:18,299 need to instrument.
We don't need to 853 00:30:18,300 --> 00:30:19,979 remember the type of the underlying 854 00:30:19,980 --> 00:30:22,079 object. If an object is never 855 00:30:22,080 --> 00:30:23,429 used in a cast, then we don't need to worry 856 00:30:23,430 --> 00:30:24,359 about it. Right. 857 00:30:24,360 --> 00:30:26,399 So we can remove tracing for types that 858 00:30:26,400 --> 00:30:28,020 are never cast in the program. 859 00:30:29,070 --> 00:30:31,229 We limit checking to unsafe casts. 860 00:30:31,230 --> 00:30:33,359 So we do some static 861 00:30:33,360 --> 00:30:35,429 verification inside the 862 00:30:35,430 --> 00:30:37,739 scope of a function to figure out what 863 00:30:37,740 --> 00:30:40,169 parts of the code are actually 864 00:30:40,170 --> 00:30:41,519 used in a safe way. 865 00:30:41,520 --> 00:30:43,769 And this also allows us to remove 866 00:30:43,770 --> 00:30:46,739 some of the cast checks. 867 00:30:46,740 --> 00:30:49,259 We also replace all the 868 00:30:49,260 --> 00:30:51,629 dynamic_casts with our special form of casting. 869 00:30:51,630 --> 00:30:54,089 As it turns out, our cast 870 00:30:54,090 --> 00:30:56,189 that we have developed using our metadata 871 00:30:56,190 --> 00:30:58,379 information is much faster than 872 00:30:58,380 --> 00:31:00,449 any cast done through the 873 00:31:00,450 --> 00:31:02,579 RTTI information that 874 00:31:02,580 --> 00:31:04,829 the original C++ dynamic_cast 875 00:31:04,830 --> 00:31:07,019 uses. As it turns out, 876 00:31:07,020 --> 00:31:08,969 dynamic_cast has never been optimized. 877 00:31:08,970 --> 00:31:11,009 People don't really use dynamic_cast due 878 00:31:11,010 --> 00:31:12,269 to the performance overhead. 879 00:31:12,270 --> 00:31:13,859 Therefore it's not been optimized, 880 00:31:13,860 --> 00:31:14,969 therefore it's not being used. 881 00:31:14,970 --> 00:31:17,519 It's this endless circle.
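The first optimization, tracing only objects whose types can ever reach a cast, might look roughly like the following. This is a guess at the shape of the idea and not the tool's actual analysis: `TracingPlan` and its members are hypothetical names, only single inheritance is modelled, and the hierarchy is assumed to be free of cycles. The sketch treats a type as needing tracking if it, or any of its ancestors, appears in some cast, because an object of a derived type can be passed around through a base pointer and then hit that cast.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

struct TracingPlan {
    std::map<std::string, std::string> parent_of;  // child -> direct base
    std::set<std::string> cast_related;            // types named in some cast

    void add_subclass(const std::string& child, const std::string& parent) {
        parent_of[child] = parent;
    }

    // Called once for every cast the compile-time pass encounters.
    void saw_cast(const std::string& from, const std::string& to) {
        cast_related.insert(from);
        cast_related.insert(to);
    }

    // Walk from the allocated type up through its ancestors. If any of
    // them takes part in a cast, allocations of this type are tracked.
    bool needs_tracking(const std::string& allocated_type) const {
        std::string t = allocated_type;
        while (true) {
            if (cast_related.count(t) > 0) {
                return true;
            }
            auto it = parent_of.find(t);
            if (it == parent_of.end()) {
                return false;
            }
            t = it->second;
        }
    }
};
```

Types that never appear in this walk get no tracking call at allocation time, which is where the overhead saving would come from.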
882 00:31:17,520 --> 00:31:19,229 If you replace all the dynamic_casts with 883 00:31:19,230 --> 00:31:21,329 our type casts, we can 884 00:31:21,330 --> 00:31:23,309 actually improve the performance a little 885 00:31:23,310 --> 00:31:24,310 bit. 886 00:31:25,110 --> 00:31:27,149 Interestingly, by just doing this base 887 00:31:27,150 --> 00:31:29,339 system, we already found four new 888 00:31:29,340 --> 00:31:31,679 vulnerabilities in Apache Xerces, 889 00:31:31,680 --> 00:31:33,899 which is a large XML processing 890 00:31:33,900 --> 00:31:35,129 library. 891 00:31:35,130 --> 00:31:36,839 In 892 00:31:38,010 --> 00:31:40,559 there, there were casts 893 00:31:40,560 --> 00:31:42,959 from a DOM text implementation node 894 00:31:42,960 --> 00:31:45,119 to a DOM element implementation node, 895 00:31:45,120 --> 00:31:47,789 which allowed us to reinterpret 896 00:31:47,790 --> 00:31:49,289 these types in different ways. 897 00:31:49,290 --> 00:31:51,539 And we've also found type confusion 898 00:31:51,540 --> 00:31:54,119 in the Qt base library, 899 00:31:54,120 --> 00:31:56,279 going from the node base to the 900 00:31:56,280 --> 00:31:58,959 map node itself. 901 00:31:58,960 --> 00:32:01,169 And those were easy, low-hanging 902 00:32:01,170 --> 00:32:03,449 fruit that we found by simply compiling 903 00:32:03,450 --> 00:32:05,309 software and running it in day-to-day 904 00:32:05,310 --> 00:32:07,499 use. So by simply 905 00:32:07,500 --> 00:32:09,629 compiling your C++ software with 906 00:32:09,630 --> 00:32:11,969 our type checker, 907 00:32:11,970 --> 00:32:14,849 you can already find vulnerabilities 908 00:32:14,850 --> 00:32:16,919 and bugs in the software by just 909 00:32:16,920 --> 00:32:19,589 running it in your day-to-day settings.
910 00:32:19,590 --> 00:32:22,229 This was step one and we found 911 00:32:22,230 --> 00:32:24,839 a bunch of different 912 00:32:24,840 --> 00:32:26,909 vulnerabilities, but we wanted to go 913 00:32:26,910 --> 00:32:28,289 further. 914 00:32:28,290 --> 00:32:30,569 So a couple of weeks before the Congress, 915 00:32:30,570 --> 00:32:33,209 we started to fuzz all the things. 916 00:32:33,210 --> 00:32:35,879 As it turns out, 917 00:32:35,880 --> 00:32:37,349 you can combine our 918 00:32:38,860 --> 00:32:41,139 type safety mechanism 919 00:32:41,140 --> 00:32:43,929 with AFL, so you can compile 920 00:32:43,930 --> 00:32:46,599 any arbitrary C++ software 921 00:32:46,600 --> 00:32:48,999 with our HexType LLVM- 922 00:32:49,000 --> 00:32:51,519 based instrumentation and you then run 923 00:32:51,520 --> 00:32:53,769 the software on top of AFL and you 924 00:32:53,770 --> 00:32:55,869 fuzz to find different forms 925 00:32:55,870 --> 00:32:57,609 of type confusion. 926 00:32:57,610 --> 00:33:00,069 You simply let AFL do its magic 927 00:33:00,070 --> 00:33:02,169 and you have to invest some time into 928 00:33:02,170 --> 00:33:04,929 triaging all the type confusion reports 929 00:33:04,930 --> 00:33:06,699 and you'll figure out different forms of 930 00:33:06,700 --> 00:33:08,149 vulnerabilities. 931 00:33:08,150 --> 00:33:10,179 And at this point in time, I would 932 00:33:10,180 --> 00:33:12,249 like to give 933 00:33:12,250 --> 00:33:14,019 a huge shout-out to the students that 934 00:33:14,020 --> 00:33:16,179 actually did all the work and invested 935 00:33:16,180 --> 00:33:18,069 a lot of time into developing these 936 00:33:18,070 --> 00:33:20,469 systems and 937 00:33:20,470 --> 00:33:22,749 triaging the vulnerabilities, building 938 00:33:22,750 --> 00:33:25,239 the system 939 00:33:25,240 --> 00:33:27,369 and playing with it for such 940 00:33:27,370 --> 00:33:28,389 a long time.
941 00:33:28,390 --> 00:33:30,609 So we spent some time fuzzing on 942 00:33:30,610 --> 00:33:32,769 our ghetto fuzzing cluster 943 00:33:32,770 --> 00:33:34,869 that we have under the 944 00:33:34,870 --> 00:33:36,059 desk of one of the students. 945 00:33:36,060 --> 00:33:38,019 So you see, this is a very low-power 946 00:33:38,020 --> 00:33:39,849 setting. We only have five machines 947 00:33:39,850 --> 00:33:42,279 that were running 948 00:33:42,280 --> 00:33:44,349 different pieces of software. 949 00:33:44,350 --> 00:33:46,629 But nevertheless, we found quite 950 00:33:46,630 --> 00:33:49,029 a couple of interesting cases. 951 00:33:49,030 --> 00:33:51,399 After two weeks of fuzzing, 952 00:33:51,400 --> 00:33:53,559 we found two new type confusion 953 00:33:53,560 --> 00:33:56,079 bugs in Qt core, unfortunately 954 00:33:56,080 --> 00:33:58,149 not exploitable, but 955 00:33:58,150 --> 00:33:59,829 they've already been fixed and 956 00:33:59,830 --> 00:34:01,569 acknowledged by the developers. 957 00:34:01,570 --> 00:34:04,389 We found one more bug in Xerces 958 00:34:04,390 --> 00:34:06,729 and we found seven 959 00:34:06,730 --> 00:34:09,249 issues or reports in libsass, 960 00:34:09,250 --> 00:34:11,529 where we're still looking at whether 961 00:34:11,530 --> 00:34:14,079 they are exploitable or not. 962 00:34:14,080 --> 00:34:15,080 Um. 963 00:34:15,800 --> 00:34:16,880 As it turns out, 964 00:34:17,929 --> 00:34:19,579 pretty much every software you throw at 965 00:34:19,580 --> 00:34:21,919 it will generate a couple of reports 966 00:34:21,920 --> 00:34:23,029 and there's, 967 00:34:24,090 --> 00:34:26,218 some of these reports 968 00:34:26,219 --> 00:34:27,359 are due to the 969 00:34:28,440 --> 00:34:30,839 underlying problems with C++. 970 00:34:30,840 --> 00:34:32,968 As there's no explicit type 971 00:34:32,969 --> 00:34:35,279 information,
972 00:34:35,280 --> 00:34:37,499 developers are abusing the type system 973 00:34:37,500 --> 00:34:39,718 in many odd ways, which 974 00:34:39,719 --> 00:34:41,669 leads to some spurious reports. 975 00:34:41,670 --> 00:34:43,948 So actually triaging 976 00:34:43,949 --> 00:34:45,988 and figuring out if it is an actual bug 977 00:34:45,989 --> 00:34:49,049 or not adds additional overhead. 978 00:34:49,050 --> 00:34:51,238 So you have to spend 979 00:34:51,239 --> 00:34:52,948 some time to look into it, 980 00:34:52,949 --> 00:34:55,320 as, for example, with libsass. 981 00:34:57,060 --> 00:34:59,309 Now, we focused most of our time 982 00:34:59,310 --> 00:35:01,499 on small software to test the scalability 983 00:35:01,500 --> 00:35:03,239 of our approach and to find some 984 00:35:03,240 --> 00:35:05,759 reasonable bugs, but we also looked at 985 00:35:05,760 --> 00:35:08,159 Firefox for a while, also to test 986 00:35:08,160 --> 00:35:10,649 the performance overhead, 987 00:35:10,650 --> 00:35:12,149 for example. 988 00:35:12,150 --> 00:35:14,309 So these are the 989 00:35:14,310 --> 00:35:16,769 results for Firefox that we currently 990 00:35:16,770 --> 00:35:18,929 have. And 991 00:35:18,930 --> 00:35:20,579 they are fairly impressive. 992 00:35:20,580 --> 00:35:22,949 Right. So based on a specific 993 00:35:22,950 --> 00:35:25,169 set of benchmarks 994 00:35:25,170 --> 00:35:26,170 we found, 995 00:35:27,530 --> 00:35:29,359 let's just say, some type confusion 996 00:35:29,360 --> 00:35:31,489 reports, and we are 997 00:35:31,490 --> 00:35:33,679 still figuring out how we can 998 00:35:33,680 --> 00:35:35,899 handle this large amount of 999 00:35:35,900 --> 00:35:37,369 type confusion reports. Many of them will 1000 00:35:37,370 --> 00:35:39,559 be duplicates and even more 1001 00:35:39,560 --> 00:35:41,809 of them will be false positives.
1002 00:35:41,810 --> 00:35:43,639 And we are working hard on triaging and 1003 00:35:43,640 --> 00:35:46,159 trying to reduce them to a smaller 1004 00:35:46,160 --> 00:35:48,619 set of actual 1005 00:35:48,620 --> 00:35:50,569 bugs that we can then report to the 1006 00:35:50,570 --> 00:35:53,059 Firefox people. 1007 00:35:53,060 --> 00:35:54,709 The big problem that we are facing for 1008 00:35:54,710 --> 00:35:56,929 Firefox and also for Chrome, 1009 00:35:56,930 --> 00:35:59,149 but much more so for Firefox, is that 1010 00:35:59,150 --> 00:36:01,579 the code is really, really messy. 1011 00:36:01,580 --> 00:36:03,529 A hard problem that we have here is 1012 00:36:03,530 --> 00:36:05,989 that Firefox has several 1013 00:36:05,990 --> 00:36:07,079 allocators. 1014 00:36:07,080 --> 00:36:09,139 So there are different forms, 1015 00:36:09,140 --> 00:36:10,999 different locations in the code that 1016 00:36:11,000 --> 00:36:12,679 handle different parts of the heaps. 1017 00:36:12,680 --> 00:36:13,639 There are different types 1018 00:36:13,640 --> 00:36:15,769 that move data back and 1019 00:36:15,770 --> 00:36:17,359 forth, that share data. 1020 00:36:17,360 --> 00:36:20,029 And there are very odd allocators 1021 00:36:20,030 --> 00:36:22,169 that are messing with different parts. 1022 00:36:22,170 --> 00:36:24,799 So there are not 1023 00:36:24,800 --> 00:36:27,079 seven billion type confusion 1024 00:36:27,080 --> 00:36:29,269 bugs in Firefox, or at least we hope so. 1025 00:36:29,270 --> 00:36:30,799 We would guess that the number will be 1026 00:36:30,800 --> 00:36:32,149 much lower. 1027 00:36:32,150 --> 00:36:33,799 But it's a first step. 1028 00:36:33,800 --> 00:36:36,019 And we are working on reducing 1029 00:36:36,020 --> 00:36:38,359 them.
So Firefox is ongoing 1030 00:36:38,360 --> 00:36:40,549 work and we'll see how we can, 1031 00:36:40,550 --> 00:36:42,889 how we can get there and make it 1032 00:36:42,890 --> 00:36:44,809 more useful. 1033 00:36:44,810 --> 00:36:47,119 If you end 1034 00:36:47,120 --> 00:36:49,399 up after five 1035 00:36:49,400 --> 00:36:51,469 or six days of fuzzing with seven 1036 00:36:51,470 --> 00:36:53,569 billion reports, that's clearly too many. 1037 00:36:53,570 --> 00:36:55,729 So we have to figure out 1038 00:36:55,730 --> 00:36:57,889 how to reduce them, to see which 1039 00:36:57,890 --> 00:36:58,890 ones can be 1040 00:37:00,620 --> 00:37:01,620 interesting. 1041 00:37:02,770 --> 00:37:04,239 So, 1042 00:37:06,190 --> 00:37:08,409 as concluding remarks, 1043 00:37:08,410 --> 00:37:10,389 what did we do? What are the steps 1044 00:37:10,390 --> 00:37:12,969 forward? How can we improve from here? 1045 00:37:12,970 --> 00:37:15,279 On one hand, we want to fuzz 1046 00:37:15,280 --> 00:37:16,479 all the things. 1047 00:37:16,480 --> 00:37:18,189 So we want to go deeper. 1048 00:37:18,190 --> 00:37:19,329 We want to find more software. 1049 00:37:19,330 --> 00:37:20,769 We want to find better test cases, 1050 00:37:20,770 --> 00:37:22,719 better fuzzing inputs and get 1051 00:37:22,720 --> 00:37:25,449 cheaper coverage for the overall systems. 1052 00:37:25,450 --> 00:37:28,269 And especially looking at 1053 00:37:28,270 --> 00:37:29,589 Firefox, 1054 00:37:29,590 --> 00:37:31,569 one thing we want to do is 1055 00:37:31,570 --> 00:37:33,969 selective fuzzing instead of just blindly 1056 00:37:33,970 --> 00:37:34,970 fuzzing
1057 00:37:35,520 --> 00:37:37,829 a large software system, 1058 00:37:37,830 --> 00:37:40,049 which may result in a large amount 1059 00:37:40,050 --> 00:37:42,209 of false positives due to 1060 00:37:42,210 --> 00:37:44,429 the way that the software is architected. 1061 00:37:44,430 --> 00:37:46,949 Think about the Firefox example again. 1062 00:37:46,950 --> 00:37:49,079 You allocate an object, it may 1063 00:37:49,080 --> 00:37:51,059 be reused at different times without 1064 00:37:51,060 --> 00:37:52,379 being freed. 1065 00:37:52,380 --> 00:37:54,779 This would... or, 1066 00:37:54,780 --> 00:37:55,679 let me take a step back. 1067 00:37:55,680 --> 00:37:57,749 Right. So one of the problems we've seen 1068 00:37:57,750 --> 00:37:59,309 with Firefox that led to a large amount 1069 00:37:59,310 --> 00:38:01,439 of reports is that 1070 00:38:01,440 --> 00:38:04,279 you often allocate an object. 1071 00:38:04,280 --> 00:38:06,979 You return this object to a pool. 1072 00:38:06,980 --> 00:38:08,719 The developer knows that there 1073 00:38:08,720 --> 00:38:10,819 is no more live reference to that object, 1074 00:38:10,820 --> 00:38:13,519 but it is then being reinterpreted 1075 00:38:13,520 --> 00:38:15,619 and reused as a different type of object, 1076 00:38:15,620 --> 00:38:18,109 which leads to a type confusion report. 1077 00:38:18,110 --> 00:38:20,809 But this is not an actual exploitable 1078 00:38:20,810 --> 00:38:23,629 bug. It's just a quirk of the 1079 00:38:23,630 --> 00:38:25,819 lack of a type system that C++ 1080 00:38:25,820 --> 00:38:26,969 has.
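The pool-reuse pattern described here can be illustrated with a minimal sketch. It is not taken from Firefox; `SlotPool`, `Request`, `Response` and `reuse_pool_slot` are hypothetical names chosen for illustration. The same storage legitimately holds objects of two unrelated types one after the other, so a checker that only remembers the type recorded at the first allocation would see a mismatch on the second use, even though the program is well defined.

```cpp
#include <cstddef>
#include <new>

// Two unrelated types that take turns living in the same pool slot.
struct Request  { int id; };
struct Response { int code; };

// Hypothetical one-slot pool: raw storage handed out again and again.
class SlotPool {
 public:
  void* raw() { return storage_; }

 private:
  alignas(std::max_align_t) unsigned char storage_[64];
};

static_assert(sizeof(Request) <= 64 && sizeof(Response) <= 64,
              "both types must fit into the slot");

int reuse_pool_slot() {
  SlotPool pool;

  // First use of the slot: construct a Request in place.
  Request* req = new (pool.raw()) Request{7};
  int seen_id = req->id;
  req->~Request();  // the developer knows no live reference remains

  // Second use of the very same bytes, now as a Response.
  // A tool that still believes "this slot is a Request" would report
  // a type confusion here, although nothing is wrong.
  Response* resp = new (pool.raw()) Response{42};
  int seen_code = resp->code;
  resp->~Response();

  return seen_id + seen_code;
}
```

Calling `reuse_pool_slot()` simply returns the sum of the two stored values; the point is only that the slot changes its type without ever being returned to the system allocator.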
1081 00:38:26,970 --> 00:38:29,099 So we want to move towards 1082 00:38:29,100 --> 00:38:31,259 a more selective form of fuzzing 1083 00:38:31,260 --> 00:38:32,819 where we can say, hey, we're only 1084 00:38:32,820 --> 00:38:35,099 interested in this subset of the type 1085 00:38:35,100 --> 00:38:37,289 hierarchy, so we want to do explicit type 1086 00:38:37,290 --> 00:38:38,859 checks for this subset of the type 1087 00:38:38,860 --> 00:38:40,589 hierarchy, but we are not interested in 1088 00:38:40,590 --> 00:38:41,519 anything else. 1089 00:38:41,520 --> 00:38:43,689 So focusing on, for example, just 1090 00:38:43,690 --> 00:38:46,409 the DOM or just a JavaScript object 1091 00:38:46,410 --> 00:38:47,520 or something like that. 1092 00:38:49,220 --> 00:38:51,049 In addition to that, we are also looking 1093 00:38:51,050 --> 00:38:53,119 into an always-on check for 1094 00:38:53,120 --> 00:38:55,309 polymorphic objects. 1095 00:38:55,310 --> 00:38:57,199 Think back to the control-flow hijacking 1096 00:38:57,200 --> 00:38:58,609 defense that I talked about in the 1097 00:38:58,610 --> 00:39:00,289 beginning of the talk. 1098 00:39:00,290 --> 00:39:02,779 One option is that you check the 1099 00:39:02,780 --> 00:39:04,609 type of the object whenever you do a 1100 00:39:04,610 --> 00:39:05,839 virtual dispatch. 1101 00:39:05,840 --> 00:39:07,979 So this would protect against 1102 00:39:07,980 --> 00:39:11,179 the type confusion from before. 1103 00:39:11,180 --> 00:39:12,109 Right. 1104 00:39:12,110 --> 00:39:13,110 Um. 1105 00:39:17,090 --> 00:39:19,189 As an example, if you are 1106 00:39:19,190 --> 00:39:20,989 looking at the 1107 00:39:25,490 --> 00:39:27,709 code here before 1108 00:39:27,710 --> 00:39:28,710 I, uh.
1109 00:39:30,250 --> 00:39:32,589 When I compiled it, it made two versions, 1110 00:39:32,590 --> 00:39:34,659 as you may have observed. The second 1111 00:39:34,660 --> 00:39:36,729 version is with a 1112 00:39:36,730 --> 00:39:37,730 type safety 1113 00:39:38,840 --> 00:39:39,840 mechanism. 1114 00:39:42,380 --> 00:39:44,989 And if I run it with this type safety 1115 00:39:44,990 --> 00:39:47,629 protection, instead of 1116 00:39:47,630 --> 00:39:49,759 opening up a calculator, 1117 00:39:49,760 --> 00:39:52,189 it actually reports a 1118 00:39:52,190 --> 00:39:54,260 type safety or type confusion error. 1119 00:39:55,920 --> 00:39:58,949 So we want to extend this into 1120 00:39:58,950 --> 00:39:59,950 a 1121 00:40:02,090 --> 00:40:04,489 bigger and larger system so that we can 1122 00:40:04,490 --> 00:40:06,379 run it partially, we can build it 1123 00:40:06,380 --> 00:40:09,289 on top of Firefox in a 1124 00:40:09,290 --> 00:40:11,509 selective part, but you can also use it 1125 00:40:11,510 --> 00:40:13,759 for your software to specifically 1126 00:40:13,760 --> 00:40:16,099 protect against 1127 00:40:16,100 --> 00:40:18,289 these dispatch vulnerabilities, 1128 00:40:18,290 --> 00:40:20,469 as we just saw here: as soon as 1129 00:40:20,470 --> 00:40:22,609 I wanted to dispatch, we stopped 1130 00:40:22,610 --> 00:40:25,759 the execution and terminated the program. 1131 00:40:25,760 --> 00:40:27,919 So, to actually conclude: type 1132 00:40:27,920 --> 00:40:29,449 confusion is fundamental in today's 1133 00:40:29,450 --> 00:40:30,559 exploits. 1134 00:40:30,560 --> 00:40:32,629 There's a set of existing solutions that 1135 00:40:32,630 --> 00:40:34,939 are incomplete, partial and slow and 1136 00:40:34,940 --> 00:40:37,099 make it very hard for us to 1137 00:40:37,100 --> 00:40:38,449 protect these systems.
1138 00:40:38,450 --> 00:40:40,759 And especially in large software systems 1139 00:40:40,760 --> 00:40:43,039 like Chrome, Firefox and other 1140 00:40:43,040 --> 00:40:45,049 large mechanisms, 1141 00:40:45,050 --> 00:40:47,089 we need to develop new ways to protect 1142 00:40:47,090 --> 00:40:49,429 and enforce type safety at runtime. 1143 00:40:49,430 --> 00:40:51,619 I presented 1144 00:40:51,620 --> 00:40:53,839 HexType, which is an LLVM-based 1145 00:40:53,840 --> 00:40:56,239 extension that allows you to trap upon 1146 00:40:56,240 --> 00:40:58,339 type confusion, so you can compile your 1147 00:40:58,340 --> 00:41:00,889 software with 1148 00:41:00,890 --> 00:41:03,019 this type confusion protection, which 1149 00:41:03,020 --> 00:41:05,089 allows you to track the true type of 1150 00:41:05,090 --> 00:41:07,489 every object you allocate and then, upon 1151 00:41:07,490 --> 00:41:09,829 typecasts or dispatches, allows 1152 00:41:09,830 --> 00:41:11,899 you to do an actual type check 1153 00:41:11,900 --> 00:41:13,999 so we can trap at the type 1154 00:41:14,000 --> 00:41:16,309 confusion and not at the later 1155 00:41:16,310 --> 00:41:17,929 memory safety violation. 1156 00:41:17,930 --> 00:41:20,209 I showed you one application of this 1157 00:41:20,210 --> 00:41:22,309 approach where we've combined our 1158 00:41:22,310 --> 00:41:24,499 HexType mechanism that does the type 1159 00:41:24,500 --> 00:41:27,199 checking with a fuzzing approach. 1160 00:41:27,200 --> 00:41:29,689 And we found a nice set 1161 00:41:29,690 --> 00:41:32,389 of bugs that 1162 00:41:32,390 --> 00:41:35,299 are now being fixed or were fixed. 1163 00:41:35,300 --> 00:41:37,069 Overall, this has a reasonable overhead. 1164 00:41:37,070 --> 00:41:39,139 So for Firefox, depending on 1165 00:41:39,140 --> 00:41:41,239 the benchmark, we have between zero 1166 00:41:41,240 --> 00:41:42,499 and 50 percent 1167 00:41:42,500 --> 00:41:43,500 Uh.
1168 00:41:44,830 --> 00:41:47,019 overhead, and 1169 00:41:47,020 --> 00:41:49,389 you can integrate it with AFL for broad 1170 00:41:49,390 --> 00:41:51,939 bug discovery. And as always 1171 00:41:51,940 --> 00:41:54,279 with our research, it's all open source. 1172 00:41:54,280 --> 00:41:56,409 So you can download the system, you 1173 00:41:56,410 --> 00:41:57,429 can play with it. 1174 00:41:57,430 --> 00:42:00,249 It takes about 15 minutes to build it 1175 00:42:00,250 --> 00:42:01,749 on your machine. 1176 00:42:01,750 --> 00:42:04,119 And you can then compile your software with 1177 00:42:04,120 --> 00:42:06,789 LLVM and full type checking. 1178 00:42:06,790 --> 00:42:08,409 And with that, I would like to thank you for 1179 00:42:08,410 --> 00:42:10,239 your attention and I'm happy to take any 1180 00:42:10,240 --> 00:42:11,389 questions. 1181 00:42:11,390 --> 00:42:12,390 Thanks. 1182 00:42:16,620 --> 00:42:17,800 Awesome, awesome. 1183 00:42:18,840 --> 00:42:21,239 So we have four microphones 1184 00:42:21,240 --> 00:42:23,309 here, one, two, three, four, where 1185 00:42:23,310 --> 00:42:25,499 you can ask questions, and 1186 00:42:25,500 --> 00:42:27,689 just to be clear, a question is like one 1187 00:42:27,690 --> 00:42:30,029 or two sentences with a question mark 1188 00:42:30,030 --> 00:42:31,559 behind it. 1189 00:42:31,560 --> 00:42:32,879 And with that, I'm going to go to 1190 00:42:32,880 --> 00:42:35,249 microphone two. Thank 1191 00:42:35,250 --> 00:42:36,639 you for the presentation. 1192 00:42:36,640 --> 00:42:37,789 Oops. This is not... 1193 00:42:39,330 --> 00:42:41,099 Would it also be possible to have a 1194 00:42:41,100 --> 00:42:43,199 compiler plugin that 1195 00:42:43,200 --> 00:42:45,449 prevents you from misusing static_cast 1196 00:42:45,450 --> 00:42:46,799 at compile time?
1197 00:42:46,800 --> 00:42:49,049 Could you build something 1198 00:42:49,050 --> 00:42:51,119 that only allows you 1199 00:42:51,120 --> 00:42:53,219 to use... does not allow 1200 00:42:53,220 --> 00:42:55,979 you to use static_cast combined with 1201 00:42:55,980 --> 00:42:57,000 dynamic dispatch? 1202 00:42:58,560 --> 00:42:59,560 Um, 1203 00:43:00,780 --> 00:43:02,999 let me think about your question. Would 1204 00:43:03,000 --> 00:43:05,249 you want to detect the type confusion 1205 00:43:05,250 --> 00:43:07,979 statically, or would you just forbid 1206 00:43:07,980 --> 00:43:10,529 the programmer from using 1207 00:43:10,530 --> 00:43:12,689 static_cast for any object 1208 00:43:12,690 --> 00:43:14,999 that has a virtual function? 1209 00:43:15,000 --> 00:43:16,379 The second one. I just want to prevent 1210 00:43:16,380 --> 00:43:19,049 bugs. Usually in C++, we try to 1211 00:43:19,050 --> 00:43:20,879 offload as much checking as possible to 1212 00:43:20,880 --> 00:43:22,799 compile time so we do not have the 1213 00:43:22,800 --> 00:43:24,989 runtime overhead. So it would be nice to 1214 00:43:24,990 --> 00:43:27,449 disallow the cast for 1215 00:43:27,450 --> 00:43:29,789 anything that is 1216 00:43:29,790 --> 00:43:31,409 virtual something. 1217 00:43:31,410 --> 00:43:32,729 Right. Disallow static_cast. 1218 00:43:32,730 --> 00:43:33,839 Yeah. Yeah. 1219 00:43:33,840 --> 00:43:36,149 So UBSan followed a similar 1220 00:43:36,150 --> 00:43:38,549 approach. They convert 1221 00:43:38,550 --> 00:43:40,769 all the static casts for 1222 00:43:40,770 --> 00:43:43,289 polymorphic objects into dynamic casts 1223 00:43:43,290 --> 00:43:45,149 and simply replace them. 1224 00:43:45,150 --> 00:43:47,579 Unfortunately, as 1225 00:43:47,580 --> 00:43:50,219 the Greeter example showed, the base class 1226 00:43:50,220 --> 00:43:52,319 is not necessarily polymorphic.
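The limitation mentioned here, that a static cast cannot simply be swapped for a dynamic one when the base class is not polymorphic, can be seen in a small sketch. The class names `Base` and `Greeter` and the helper functions are illustrative assumptions, not code from the talk or from any sanitizer.

```cpp
#include <type_traits>

// A base class without any virtual function is not polymorphic.
struct Base {
  int x = 0;
};

// The derived class adds virtual functions, so only it is polymorphic.
struct Greeter : Base {
  virtual void say_hi() {}
  virtual ~Greeter() = default;
};

static_assert(!std::is_polymorphic<Base>::value,
              "Base carries no runtime type information");
static_assert(std::is_polymorphic<Greeter>::value,
              "Greeter has a vtable pointer");

// static_cast compiles and performs no runtime check at all.
Greeter* downcast_unchecked(Base* b) {
  return static_cast<Greeter*>(b);
  // dynamic_cast<Greeter*>(b) would be rejected by the compiler here,
  // because the source type Base is not polymorphic.
}

bool downcast_roundtrip_ok() {
  Greeter g;
  Base* b = &g;  // upcast is always fine
  return downcast_unchecked(b) == &g;
}
```

The round trip works because the object really is a `Greeter`; the compiler would accept the same `static_cast` for a `Base*` that points to something else, which is the unchecked case a runtime type check has to cover.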
1227 00:43:52,320 --> 00:43:54,449 So you run into weird 1228 00:43:54,450 --> 00:43:56,729 runtime behavior with 1229 00:43:56,730 --> 00:43:58,829 it: like, the base class is 1230 00:43:58,830 --> 00:44:00,149 not polymorphic. And if you turn a 1231 00:44:00,150 --> 00:44:02,919 static_cast into a dynamic_cast, you fail, 1232 00:44:02,920 --> 00:44:03,959 right. 1233 00:44:03,960 --> 00:44:06,269 C++ code is really, 1234 00:44:06,270 --> 00:44:08,489 really messy and it's very hard for 1235 00:44:08,490 --> 00:44:10,379 you to actually simply replace them. 1236 00:44:10,380 --> 00:44:12,549 You can report it as a warning 1237 00:44:12,550 --> 00:44:14,669 as part of the compiler 1238 00:44:14,670 --> 00:44:15,569 process. 1239 00:44:15,570 --> 00:44:17,519 But in the end, you need to 1240 00:44:17,520 --> 00:44:19,769 support non-polymorphic 1241 00:44:19,770 --> 00:44:22,169 base classes, which are 1242 00:44:22,170 --> 00:44:24,269 surprisingly frequent, especially 1243 00:44:24,270 --> 00:44:26,699 for browsers, as we found 1244 00:44:26,700 --> 00:44:28,289 that there are several base classes that 1245 00:44:28,290 --> 00:44:29,819 are non-polymorphic. 1246 00:44:29,820 --> 00:44:30,820 Thank you. 1247 00:44:31,410 --> 00:44:33,539 Microphone three. Thank 1248 00:44:33,540 --> 00:44:35,639 you for your great talk. 1249 00:44:35,640 --> 00:44:37,769 You mentioned that in Firefox you had 1250 00:44:37,770 --> 00:44:40,139 the problem that some objects were freed 1251 00:44:40,140 --> 00:44:41,189 and then reused. 1252 00:44:41,190 --> 00:44:42,719 So I was wondering, could you build on 1253 00:44:42,720 --> 00:44:45,189 top of temporal memory safety 1254 00:44:45,190 --> 00:44:47,339 analysis and take 1255 00:44:47,340 --> 00:44:49,259 that information into account to make 1256 00:44:49,260 --> 00:44:51,389 your analysis more precise? 1257 00:44:51,390 --> 00:44:53,519 Sure.
Temporal memory 1258 00:44:53,520 --> 00:44:54,659 safety usually clocks in 1259 00:44:54,660 --> 00:44:57,239 at like 2-3x overhead. 1260 00:44:57,240 --> 00:44:59,129 So this is actually more expensive than 1261 00:44:59,130 --> 00:45:01,259 what we are doing. But it wouldn't be an 1262 00:45:01,260 --> 00:45:02,350 obstacle to fuzzing, right? 1263 00:45:03,560 --> 00:45:05,209 Like, if you only use it for fuzzing, not 1264 00:45:05,210 --> 00:45:07,479 in production, right? Um. 1265 00:45:08,720 --> 00:45:10,879 Sure. Well, ideally, you 1266 00:45:10,880 --> 00:45:12,019 would combine it with additional 1267 00:45:12,020 --> 00:45:13,099 sanitizers. 1268 00:45:13,100 --> 00:45:14,869 You would use our type 1269 00:45:14,870 --> 00:45:17,089 sanitizer combined with a, 1270 00:45:17,090 --> 00:45:19,519 like, 1271 00:45:19,520 --> 00:45:21,379 spatial and temporal memory safety 1272 00:45:21,380 --> 00:45:23,239 sanitizer as well. So you can use 1273 00:45:23,240 --> 00:45:24,289 it as well, 1274 00:45:24,290 --> 00:45:25,849 in addition to that. Um. 1275 00:45:27,690 --> 00:45:29,819 You asked about 1276 00:45:29,820 --> 00:45:32,009 whether the additional 1277 00:45:32,010 --> 00:45:35,549 data that you have from the type safety 1278 00:45:35,550 --> 00:45:37,559 system, sorry, from the memory safety 1279 00:45:37,560 --> 00:45:39,969 system, would be useful in our analysis, 1280 00:45:39,970 --> 00:45:42,089 and I would answer you that the 1281 00:45:42,090 --> 00:45:44,579 temporal memory safety sanitizer 1282 00:45:44,580 --> 00:45:46,639 will run into the same problem.
1283 00:45:48,110 --> 00:45:50,419 As I said, this is C, 1284 00:45:50,420 --> 00:45:52,549 or C++. We have a 1285 00:45:52,550 --> 00:45:54,739 lot of untyped memory and 1286 00:45:54,740 --> 00:45:56,899 Firefox simply reuses the memory, 1287 00:45:56,900 --> 00:45:59,119 even though there are still references to 1288 00:45:59,120 --> 00:46:01,079 it, and then just changes the type. 1289 00:46:01,080 --> 00:46:03,619 And this is allowed, according to C++ 1290 00:46:03,620 --> 00:46:05,329 semantics. So they are not doing something 1291 00:46:05,330 --> 00:46:06,979 illegal. It's just that it's really, 1292 00:46:06,980 --> 00:46:09,049 really messy and we'll have to work 1293 00:46:09,050 --> 00:46:10,729 around these quirks that they have 1294 00:46:10,730 --> 00:46:12,169 there. Thanks. 1295 00:46:12,170 --> 00:46:13,999 Thanks. Microphone four. 1296 00:46:16,310 --> 00:46:18,679 To be frank, I'm 1297 00:46:18,680 --> 00:46:20,809 a bit puzzled 1298 00:46:20,810 --> 00:46:23,059 by your terminology because, 1299 00:46:23,060 --> 00:46:24,060 um, 1300 00:46:24,640 --> 00:46:26,829 just because the type system is not 1301 00:46:26,830 --> 00:46:28,869 checked, it doesn't mean that it doesn't 1302 00:46:28,870 --> 00:46:31,359 exist. So wouldn't it be 1303 00:46:31,360 --> 00:46:33,550 better to have some 1304 00:46:34,870 --> 00:46:37,449 static solution, like 1305 00:46:37,450 --> 00:46:38,450 preventing, 1306 00:46:39,850 --> 00:46:40,850 forbidding 1307 00:46:41,670 --> 00:46:44,669 downcasts that are static 1308 00:46:44,670 --> 00:46:46,799 and forcing developers 1309 00:46:46,800 --> 00:46:48,389 to use dynamic downcasts only?
1310 00:46:50,770 --> 00:46:53,139 It would be much faster 1311 00:46:53,140 --> 00:46:55,329 than having a fuzzer fuzzing 1312 00:46:55,330 --> 00:46:57,459 the entire application, because 1313 00:46:57,460 --> 00:47:00,189 you resolve the problem 1314 00:47:00,190 --> 00:47:02,319 at compile time and 1315 00:47:02,320 --> 00:47:03,320 not 1316 00:47:04,030 --> 00:47:06,589 through some fuzzing step. 1317 00:47:06,590 --> 00:47:08,569 Sure, this would be a great solution. 1318 00:47:08,570 --> 00:47:09,999 Unfortunately, it doesn't scale. 1319 00:47:11,990 --> 00:47:14,119 Try to do this for 75 million lines 1320 00:47:14,120 --> 00:47:15,739 of code where you have two hundred 1321 00:47:15,740 --> 00:47:16,820 thousand violations. 1322 00:47:18,490 --> 00:47:20,559 Right. So rewriting the whole 1323 00:47:20,560 --> 00:47:22,869 software stack is always a solution, but 1324 00:47:22,870 --> 00:47:24,579 you may run into time constraints. 1325 00:47:25,680 --> 00:47:27,829 By the way, this is just a source 1326 00:47:27,830 --> 00:47:29,659 base that we have to live with. 1327 00:47:29,660 --> 00:47:30,779 It's a great solution. 1328 00:47:30,780 --> 00:47:32,779 So your approach would actually work 1329 00:47:32,780 --> 00:47:34,909 really well to 1330 00:47:34,910 --> 00:47:37,069 protect against these downcasts, or 1331 00:47:37,070 --> 00:47:38,719 illegal downcasts, or confused 1332 00:47:38,720 --> 00:47:40,819 downcasts. Um, 1333 00:47:40,820 --> 00:47:42,799 you may have a hard time rewriting all 1334 00:47:42,800 --> 00:47:44,959 the software, and there 1335 00:47:44,960 --> 00:47:48,619 are non-polymorphic
1336 00:47:48,620 --> 00:47:51,469 objects where you cannot enforce 1337 00:47:51,470 --> 00:47:53,629 dynamic casts, so you can only do 1338 00:47:53,630 --> 00:47:56,179 dynamic casts for polymorphic objects 1339 00:47:56,180 --> 00:47:58,009 and you may have an illegal downcast for 1340 00:47:58,010 --> 00:47:59,630 non-polymorphic objects as well. 1341 00:48:02,450 --> 00:48:03,859 You can take it offline, it 1342 00:48:03,860 --> 00:48:05,449 sounds a little complicated. 1343 00:48:05,450 --> 00:48:07,549 Microphone two. Thank you 1344 00:48:07,550 --> 00:48:09,769 for the talk. While you gave an example 1345 00:48:09,770 --> 00:48:12,169 counter to it, at the very beginning you 1346 00:48:12,170 --> 00:48:14,269 made a claim that static casts don't 1347 00:48:14,270 --> 00:48:15,999 introduce any code. 1348 00:48:16,000 --> 00:48:18,499 This isn't exactly correct, as everything 1349 00:48:18,500 --> 00:48:20,329 is implementation defined at this point. 1350 00:48:20,330 --> 00:48:22,999 But it can shift, 1351 00:48:23,000 --> 00:48:25,099 especially when you're going from a type 1352 00:48:25,100 --> 00:48:26,419 that has multiple inheritance and it 1353 00:48:26,420 --> 00:48:27,709 needs to shift around to get to the 1354 00:48:27,710 --> 00:48:28,710 correct thunks. 1355 00:48:30,260 --> 00:48:32,479 This introduces a very specific type of 1356 00:48:32,480 --> 00:48:34,639 type confusion bug where, if one 1357 00:48:34,640 --> 00:48:36,949 casts to a void pointer and 1358 00:48:36,950 --> 00:48:39,169 then from the void pointer to 1359 00:48:39,170 --> 00:48:41,509 a different type in the chain, 1360 00:48:41,510 --> 00:48:44,419 it won't do the shifting properly. 1361 00:48:44,420 --> 00:48:46,549 It also is a very specific one 1362 00:48:46,550 --> 00:48:48,349 that might be hard for you to catch. 1363 00:48:48,350 --> 00:48:49,819 How do you attempt to catch 1364 00:48:49,820 --> 00:48:50,820 those ones?
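The pointer adjustment the question refers to can be observed with a small sketch. The types `A`, `B`, `C` and the function below are made up for illustration, and the nonzero offset of the second base is what common ABIs produce rather than something the C++ standard guarantees.

```cpp
// Two polymorphic bases and a class that inherits from both.
struct A { int a = 1; virtual ~A() = default; };
struct B { int b = 2; virtual ~B() = default; };
struct C : A, B { int c = 3; };

bool static_cast_adjusts_pointer() {
  C obj;

  // Upcast to the second base: the compiler emits an address adjustment
  // so that the result points at the B subobject inside obj.
  B* direct = static_cast<B*>(&obj);

  // Erasing the type keeps the address of the complete object.
  void* erased = static_cast<void*>(&obj);

  // static_cast<B*>(erased) would skip the adjustment and yield a pointer
  // that does not refer to a B at all; it is deliberately not used here.
  bool shifted = static_cast<void*>(direct) != erased;

  // Going back through the original type restores the adjustment.
  bool roundtrip = static_cast<B*>(static_cast<C*>(erased)) == direct;

  return shifted && roundtrip;
}
```

The sketch only compares addresses and never dereferences a wrongly cast pointer, so it stays within defined behavior while still showing that a `static_cast` can introduce code.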
1365 00:48:52,880 --> 00:48:54,559 You mean for our system? 1366 00:48:54,560 --> 00:48:55,639 Yes. 1367 00:48:55,640 --> 00:48:57,769 At one point in time you allocate 1368 00:48:57,770 --> 00:48:59,779 a slab of memory, you allocate a piece of 1369 00:48:59,780 --> 00:49:01,829 memory according to a specific type. 1370 00:49:01,830 --> 00:49:04,429 And if you do a new of type Foo, 1371 00:49:04,430 --> 00:49:06,619 we record: this is now of type... this 1372 00:49:06,620 --> 00:49:08,299 memory is of type Foo. Then you can do 1373 00:49:08,300 --> 00:49:10,369 anything you want with your pointer. 1374 00:49:10,370 --> 00:49:12,739 The memory slab will still be tagged 1375 00:49:12,740 --> 00:49:14,119 as type Foo. 1376 00:49:14,120 --> 00:49:15,979 Whenever you cast back into some other 1377 00:49:15,980 --> 00:49:18,259 type, we look up: what is the base 1378 00:49:18,260 --> 00:49:19,519 type of this object? 1379 00:49:19,520 --> 00:49:20,989 If it is Foo and you're casting into 1380 00:49:20,990 --> 00:49:22,010 Foo, everything is fine. 1381 00:49:23,240 --> 00:49:25,129 Otherwise we report an error. 1382 00:49:25,130 --> 00:49:27,199 And you do this when you're casting from 1383 00:49:27,200 --> 00:49:28,369 a void pointer? 1384 00:49:28,370 --> 00:49:30,049 We do this when you're casting from 1385 00:49:30,050 --> 00:49:31,279 anything. 1386 00:49:31,280 --> 00:49:32,409 Well. 1387 00:49:32,410 --> 00:49:33,519 Thank you so much. 1388 00:49:33,520 --> 00:49:35,859 Which actually brings me to a nice 1389 00:49:35,860 --> 00:49:37,659 topic that I didn't really talk about. 1390 00:49:37,660 --> 00:49:39,139 So this is a nice observation that you 1391 00:49:39,140 --> 00:49:41,139 had here. 1392 00:49:41,140 --> 00:49:43,239 C-style casts are one 1393 00:49:43,240 --> 00:49:45,939 of the ugliest features that C++ 1394 00:49:45,940 --> 00:49:48,219 has.
And 1395 00:49:48,220 --> 00:49:50,319 I just want to call out this 1396 00:49:50,320 --> 00:49:52,209 ugly feature, which pretty much 1397 00:49:52,210 --> 00:49:54,789 does... so in a C-style 1398 00:49:54,790 --> 00:49:56,919 cast, you do just the parentheses 1399 00:49:56,920 --> 00:49:58,749 and the target type. 1400 00:49:58,750 --> 00:50:00,639 This is pretty much a hammer that 1401 00:50:00,640 --> 00:50:02,869 hammers this object into 1402 00:50:02,870 --> 00:50:05,109 this desired type, that says: 1403 00:50:05,110 --> 00:50:07,419 make this underlying memory 1404 00:50:07,420 --> 00:50:09,489 area now of this other type. 1405 00:50:09,490 --> 00:50:11,589 So this is pretty much the ugliest thing 1406 00:50:11,590 --> 00:50:14,019 you can do if you're programming in C++. 1407 00:50:14,020 --> 00:50:16,269 Never, ever, under any circumstances, 1408 00:50:16,270 --> 00:50:18,369 use C-style casts, because this 1409 00:50:18,370 --> 00:50:20,109 really messes up the underlying type 1410 00:50:20,110 --> 00:50:21,249 system. 1411 00:50:21,250 --> 00:50:22,269 Thanks. 1412 00:50:22,270 --> 00:50:24,099 Microphone three. 1413 00:50:24,100 --> 00:50:26,259 I was wondering if you tried your 1414 00:50:26,260 --> 00:50:28,509 tool with Shachar or Safari 1415 00:50:28,510 --> 00:50:30,159 and what happened? 1416 00:50:30,160 --> 00:50:31,160 We did not. 1417 00:50:32,700 --> 00:50:34,979 We ran it with Firefox mostly, 1418 00:50:34,980 --> 00:50:38,249 we tried a little bit with Chrome. 1419 00:50:38,250 --> 00:50:40,559 Again, one of our future works 1420 00:50:40,560 --> 00:50:43,349 is to port it to more software 1421 00:50:43,350 --> 00:50:45,419 and larger software. So far, 1422 00:50:45,420 --> 00:50:47,159 for this presentation here, we focused 1423 00:50:47,160 --> 00:50:49,499 mostly on smaller libraries and, 1424 00:50:49,500 --> 00:50:51,719 yeah, if anybody wants 1425 00:50:51,720 --> 00:50:54,299 to offer more resources, feel free.
1426 00:50:54,300 --> 00:50:56,159 If you want to run it on Safari, it's 1427 00:50:56,160 --> 00:50:58,139 open source: download it, build it on your 1428 00:50:58,140 --> 00:51:00,509 system, run it on Safari and report 1429 00:51:00,510 --> 00:51:01,499 the results. 1430 00:51:01,500 --> 00:51:03,119 Yeah, just my thought here is I think you 1431 00:51:03,120 --> 00:51:04,949 might have even more of those Firefox 1432 00:51:04,950 --> 00:51:06,779 type issues with the way that they do 1433 00:51:06,780 --> 00:51:08,849 testing. Um, though, I 1434 00:51:08,850 --> 00:51:10,949 think, like, the difference we found 1435 00:51:10,950 --> 00:51:12,539 between Firefox and Chrome... 1436 00:51:12,540 --> 00:51:14,549 So I don't know about Safari, but I can 1437 00:51:14,550 --> 00:51:15,989 tell you an anecdote 1438 00:51:15,990 --> 00:51:17,969 about the differences between 1439 00:51:17,970 --> 00:51:19,319 Firefox and Chrome. 1440 00:51:19,320 --> 00:51:21,899 Firefox has a very old code base, 1441 00:51:21,900 --> 00:51:23,999 so there's a lot of ugliness hidden in 1442 00:51:24,000 --> 00:51:26,309 there. So we found 1443 00:51:26,310 --> 00:51:28,799 things that do direct 1444 00:51:28,800 --> 00:51:31,229 dispatches or indirect dispatches 1445 00:51:31,230 --> 00:51:32,849 in assembly code. 1446 00:51:32,850 --> 00:51:33,850 And they are 1447 00:51:34,560 --> 00:51:37,209 doing weird stuff to vtable pointers 1448 00:51:37,210 --> 00:51:39,359 in inline assembly, just due to the 1449 00:51:39,360 --> 00:51:42,179 legacy nature of Firefox, 1450 00:51:42,180 --> 00:51:43,829 while Chrome is a much more 1451 00:51:43,830 --> 00:51:45,629 recent code base and it uses a much more 1452 00:51:45,630 --> 00:51:47,169 recent C++ standard. 1453 00:51:47,170 --> 00:51:49,259 So it's much nicer and much less 1454 00:51:49,260 --> 00:51:51,089 likely to find bugs in there.
1455 00:51:51,090 --> 00:51:53,219 So just the age of the 1456 00:51:53,220 --> 00:51:55,619 code base 1457 00:51:55,620 --> 00:51:57,989 is bad for Firefox, or 1458 00:51:57,990 --> 00:52:00,179 may lead to a large amount of 1459 00:52:00,180 --> 00:52:01,679 potential vulnerabilities. 1460 00:52:01,680 --> 00:52:03,150 So some refactoring is needed. 1461 00:52:04,440 --> 00:52:06,419 Microphone four. 1462 00:52:06,420 --> 00:52:08,519 Yeah, well, it is like super cool and so 1463 00:52:08,520 --> 00:52:10,589 on. But my question is, 1464 00:52:10,590 --> 00:52:12,879 like, the concept of 1465 00:52:12,880 --> 00:52:14,159 downcasting is a code smell, 1466 00:52:14,160 --> 00:52:16,319 right? And isn't your tool 1467 00:52:16,320 --> 00:52:18,509 kind of allowing people to 1468 00:52:18,510 --> 00:52:20,579 keep the code smell and keep 1469 00:52:20,580 --> 00:52:22,050 writing smelly code, basically? 1470 00:52:23,470 --> 00:52:25,359 And even with the reasoning, 1471 00:52:25,360 --> 00:52:26,939 well, we already have a code base, 1472 00:52:26,940 --> 00:52:29,109 wouldn't it be better to have something 1473 00:52:29,110 --> 00:52:31,029 that would help people to write 1474 00:52:31,030 --> 00:52:32,019 non-smelly code? 1475 00:52:32,020 --> 00:52:34,539 Sure. Let's rewrite everything in Rust. 1476 00:52:34,540 --> 00:52:35,739 I'm all up for it. 1477 00:52:35,740 --> 00:52:36,740 Right. 1478 00:52:37,570 --> 00:52:38,769 Sure. 1479 00:52:38,770 --> 00:52:40,779 There's just this 100-plus million lines of 1480 00:52:40,780 --> 00:52:42,339 code that are lying around and we cannot 1481 00:52:42,340 --> 00:52:43,299 easily port it.
1482 00:52:43,300 --> 00:52:45,429 And the job that we 1483 00:52:45,430 --> 00:52:47,379 try to do is we try to make it as secure 1484 00:52:47,380 --> 00:52:49,389 as possible and we try to find potential 1485 00:52:49,390 --> 00:52:51,039 vulnerabilities in the existing code 1486 00:52:51,040 --> 00:52:53,169 base. If you have unlimited resources, 1487 00:52:53,170 --> 00:52:55,449 let's just stop right now and rewrite 1488 00:52:55,450 --> 00:52:57,309 everything we have in a safe 1489 00:52:57,310 --> 00:52:59,409 language. Sure, I'm all up for 1490 00:52:59,410 --> 00:53:00,549 it. 1491 00:53:00,550 --> 00:53:02,199 It's just the fact that we have this 1492 00:53:02,200 --> 00:53:03,969 large amount of code that is out there 1493 00:53:03,970 --> 00:53:05,109 and we are using it, right? 1494 00:53:05,110 --> 00:53:07,179 So we have to do the best that we can 1495 00:53:07,180 --> 00:53:09,249 to bump up the protection 1496 00:53:09,250 --> 00:53:10,840 for this code as much as possible. 1497 00:53:12,130 --> 00:53:13,130 Microphone two. 1498 00:53:14,410 --> 00:53:16,539 Thanks for the great work. 1499 00:53:16,540 --> 00:53:19,239 Do you have any similar idea for C code? 1500 00:53:24,120 --> 00:53:26,269 Is this work 1501 00:53:26,270 --> 00:53:28,519 already done? Can I read something about 1502 00:53:28,520 --> 00:53:30,440 it? Some of it is in progress. 1503 00:53:31,640 --> 00:53:33,079 We can talk offline about it. 1504 00:53:33,080 --> 00:53:34,080 Yeah, cool. 1505 00:53:34,870 --> 00:53:36,979 OK, microphone three. 1506 00:53:36,980 --> 00:53:39,079 Thank you for the talk. 1507 00:53:39,080 --> 00:53:40,219 I would like to know why. 1508 00:53:40,220 --> 00:53:42,409 Why did you tag the 1509 00:53:42,410 --> 00:53:44,839 type during allocation and not 1510 00:53:44,840 --> 00:53:47,509 inside the constructor, which 1511 00:53:47,510 --> 00:53:49,129 would feel more natural to me?
1512 00:53:49,130 --> 00:53:51,499 And it would also be able to 1513 00:53:51,500 --> 00:53:53,280 solve the problem of false positives. 1514 00:53:54,500 --> 00:53:56,300 Oh, not every type has a constructor. 1515 00:54:00,140 --> 00:54:02,329 So what we do pretty much is, 1516 00:54:02,330 --> 00:54:04,489 when you allocate a new type, we 1517 00:54:04,490 --> 00:54:06,469 don't run it in the allocator, but it's 1518 00:54:06,470 --> 00:54:07,519 part of Clang. 1519 00:54:07,520 --> 00:54:09,649 We know where the allocators are 1520 00:54:09,650 --> 00:54:11,689 and we can then tag it and add the 1521 00:54:11,690 --> 00:54:13,399 metadata as additional code. 1522 00:54:13,400 --> 00:54:16,099 So as part of our Clang pass, we detect 1523 00:54:16,100 --> 00:54:18,769 wherever data is being allocated 1524 00:54:18,770 --> 00:54:21,679 or where the individual custom 1525 00:54:21,680 --> 00:54:23,599 allocators are, and we tag it and then 1526 00:54:23,600 --> 00:54:25,369 instrument it in a later step. 1527 00:54:25,370 --> 00:54:27,739 This allows us to tag all the allocations 1528 00:54:27,740 --> 00:54:29,449 and not just the ones that have 1529 00:54:29,450 --> 00:54:31,009 constructors. 1530 00:54:31,010 --> 00:54:32,929 So it allows us to extend the coverage 1531 00:54:32,930 --> 00:54:35,059 further, to not just 1532 00:54:35,060 --> 00:54:37,099 classes with constructors, but all 1533 00:54:37,100 --> 00:54:38,899 the object allocations. 1534 00:54:38,900 --> 00:54:41,149 Even so, imagine, 1535 00:54:41,150 --> 00:54:43,469 and this is something we found in 1536 00:54:44,660 --> 00:54:46,969 some older software as well: 1537 00:54:46,970 --> 00:54:49,069 you allocate... you 1538 00:54:49,070 --> 00:54:50,989 call malloc on a struct. 1539 00:54:52,230 --> 00:54:54,209 And then you use it as a class, 1540 00:54:55,410 --> 00:54:57,479 right?
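The idea of recording the type at the allocation site and looking it up again at cast time can be sketched with a tiny side table. This is only an illustration under stated assumptions: `type_table`, `tagged_malloc`, `checked_cast`, `tagged_free`, `Point` and `Other` are hypothetical names, the table only accepts exact type matches, and none of it reflects the actual data structures of the tool presented in the talk.

```cpp
#include <cstdlib>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical side table: allocation address -> type recorded at allocation.
std::unordered_map<const void*, std::type_index>& type_table() {
  static std::unordered_map<const void*, std::type_index> table;
  return table;
}

struct Point { int x; int y; };  // plain struct, no user-written constructor
struct Other { double d; };

// Stands in for instrumentation placed at the allocation site itself,
// so it also covers memory obtained with malloc instead of new.
template <typename T>
T* tagged_malloc() {
  void* mem = std::malloc(sizeof(T));
  if (mem == nullptr) return nullptr;
  type_table().erase(mem);  // drop any stale tag for a recycled address
  type_table().emplace(mem, std::type_index(typeid(T)));
  return static_cast<T*>(mem);
}

// Stands in for the check inserted at a cast: consult the recorded type.
template <typename T>
T* checked_cast(void* p) {
  auto it = type_table().find(p);
  if (it == type_table().end() || it->second != std::type_index(typeid(T))) {
    return nullptr;  // a real tool would report a type confusion here
  }
  return static_cast<T*>(p);
}

void tagged_free(void* p) {
  type_table().erase(p);
  std::free(p);
}

bool demo_allocation_tagging() {
  Point* p = tagged_malloc<Point>();
  if (p == nullptr) return false;
  void* erased = p;  // the pointer may travel through void* freely
  bool same_type_ok = checked_cast<Point>(erased) == p;
  bool other_rejected = checked_cast<Other>(erased) == nullptr;
  tagged_free(p);
  return same_type_ok && other_rejected;
}
```

Because the tag is attached where the memory is obtained, a struct without any constructor is covered as well, which a constructor-based scheme would miss.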
You would never be able to 1541 00:54:57,480 --> 00:54:59,639 detect that as part of 1542 00:54:59,640 --> 00:55:01,709 instrumenting a constructor, and it 1543 00:55:01,710 --> 00:55:04,139 actually happens in software like Firefox 1544 00:55:04,140 --> 00:55:06,839 and other older code bases: you 1545 00:55:06,840 --> 00:55:08,969 call malloc instead of new and 1546 00:55:08,970 --> 00:55:11,279 use a struct instead of a class. 1547 00:55:11,280 --> 00:55:13,859 So as I said, the code is really, really ugly. 1548 00:55:13,860 --> 00:55:15,839 And here you see the similarity 1549 00:55:15,840 --> 00:55:18,059 between C and C++: in the end, a class 1550 00:55:18,060 --> 00:55:19,289 is just a struct. 1551 00:55:19,290 --> 00:55:21,599 And if you allocate objects 1552 00:55:21,600 --> 00:55:23,759 as structs, you may end up 1553 00:55:23,760 --> 00:55:26,009 missing a large number of objects. 1554 00:55:26,010 --> 00:55:26,939 Shouldn't custo. 1555 00:55:26,940 --> 00:55:29,009 Is that because then at 1556 00:55:29,010 --> 00:55:31,109 compile time if you cast a 1557 00:55:31,110 --> 00:55:32,110 struct to a class? 1558 00:55:33,730 --> 00:55:34,960 It's like 1559 00:55:36,100 --> 00:55:38,199 a class, a class can 1560 00:55:38,200 --> 00:55:39,429 be a struct, right? 1561 00:55:39,430 --> 00:55:40,599 So they are equivalent. 1562 00:55:41,640 --> 00:55:42,659 If it's the same type. 1563 00:55:43,730 --> 00:55:46,099 You can use this struct as a base type 1564 00:55:46,100 --> 00:55:48,109 name and then you can have a class that 1565 00:55:48,110 --> 00:55:50,119 is a descendant of that struct. 1566 00:55:50,120 --> 00:55:52,029 Oh, right, I understand. 1567 00:55:52,030 --> 00:55:54,289 I think you're right, C++ 1568 00:55:54,290 --> 00:55:55,290 is ugly. 1569 00:55:56,120 --> 00:55:57,559 Tell me about it. 1570 00:55:57,560 --> 00:55:58,879 With that, we're at the end of our 1571 00:55:58,880 --> 00:55:59,959 questions.
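The answer above contrasts two places to record an object's type: inside the constructor, or at the allocation site. The following is only a minimal sketch of why constructor-based tagging misses objects obtained via malloc, not the speaker's implementation. The names `g_type_table`, `tag_object`, `lookup_type`, and `tagged_malloc` are hypothetical and exist only for this illustration; the tool described in the talk instruments allocations in a compiler pass, which this hand-written wrapper merely imitates.

```cpp
#include <cstdlib>
#include <string>
#include <unordered_map>

// Hypothetical side table: object address -> recorded type name.
static std::unordered_map<const void*, std::string> g_type_table;

// Hypothetical hook that records the type of the object at address p.
void tag_object(const void* p, const char* type_name) {
    g_type_table[p] = type_name;
}

// Returns the recorded type name, or an empty string if p was never tagged.
std::string lookup_type(const void* p) {
    auto it = g_type_table.find(p);
    return it == g_type_table.end() ? std::string() : it->second;
}

// Strategy A: tagging inside the constructor.
// This only fires when a constructor actually runs (e.g. via new).
struct WithCtor {
    int x;
    WithCtor() : x(0) { tag_object(this, "WithCtor"); }
};

// A plain struct with no user-written constructor, as in older C-style code.
struct Plain {
    int x;
};

// Strategy B: tagging at the allocation site (assumption: written by hand
// here; a compiler pass would insert the equivalent call automatically).
template <typename T>
T* tagged_malloc(const char* type_name) {
    void* p = std::malloc(sizeof(T));
    if (p != nullptr) {
        tag_object(p, type_name);  // recorded even though no constructor runs
    }
    return static_cast<T*>(p);
}
```

With this sketch, `new WithCtor` is tagged by its constructor, memory from a bare `std::malloc(sizeof(WithCtor))` is not tagged at all because no constructor runs, and `tagged_malloc<Plain>("Plain")` is tagged even though `Plain` has no constructor. A real system would also have to remove entries when objects are freed, which is omitted here.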
1572 00:55:59,960 --> 00:56:02,059 I'd like to thank our speaker Mathias 1573 00:56:02,060 --> 00:56:03,529 again for this wonderful talk and 1574 00:56:03,530 --> 00:56:05,900 contribution to the C++ mess. 1575 00:56:07,280 --> 00:56:09,089 We are trying to fix it. 1576 00:56:09,090 --> 00:56:10,789 OK, thank you very much. 1577 00:56:10,790 --> 00:56:11,790 Thanks.