*34c3 intro*

Hanno Böck: Yeah, so many of you probably know me from doing things around IT security, but I'm going to surprise you by almost not talking about IT security today. Instead I'm going to ask the question "Can we trust the scientific method?". I want to start with quite a simple example. When we do science, we start with a theory and then we try to test whether it's true, right? Now, I said I'm not going to talk about IT security, but I chose an example from IT security, or kind of from IT security. There was a post on Reddit a while ago, a picture from some book which claimed that a malachite crystal can protect you from computer viruses. Which, to me, doesn't sound very plausible: these are crystals, and if you put them on your computer, this book claims, they protect you from malware. But if we really want to know, we could do a study on this. And if you say people don't do studies on crazy things: that's wrong. People do studies on homeopathy and all kinds of crazy things that are completely implausible. So we can do a study on this, and what we will do is a randomized controlled trial, which is the gold standard for testing these kinds of things. So this is our question: "Do malachite crystals prevent malware infections?" And our study design is: we take a group of maybe 20 computer users and split them randomly into two groups. One group gets one of these crystals and is told: "Put it on your desk or on your computer." The other group is our control group. That's very important, because if we want to know whether the crystals help, we need another group to compare against. And to rule out any kind of placebo effect, we give the control group a fake malachite crystal, so we can compare the two groups against each other. Then we wait for maybe six months and check how many malware infections they had. Now, I didn't do that study, but I simulated it with a Python script, and given that I don't believe this theory is true, I simulated it with random data. I'm not going to go through the whole script, but I'm assuming there can be between 0 and 3 malware infections per person, totally at random, and then I compare the two groups. And then I calculate something called a p-value, which is a very common thing in science whenever you do statistics.
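A minimal sketch of what such a simulated study could look like (an illustration, not the actual script from the talk; assumptions: two groups of ten, infections drawn uniformly from 0 to 3, significance via SciPy's two-sample t-test):

```python
# Illustrative sketch of the simulation described above (not the actual
# script from the talk). Assumptions: two groups of 10, each person gets
# a uniformly random 0-3 infections, significance via SciPy's t-test.
import random
from scipy.stats import ttest_ind

def simulated_study(group_size=10):
    """One trial in which the crystal, by construction, does nothing."""
    crystal = [random.randint(0, 3) for _ in range(group_size)]
    fake = [random.randint(0, 3) for _ in range(group_size)]
    p = ttest_ind(crystal, fake).pvalue
    return crystal, fake, p

# Run the "study" 20 times: at p < 0.05 you expect about one false positive.
for i in range(1, 21):
    crystal, fake, p = simulated_study()
    note = "  <-- 'significant'!" if p < 0.05 else ""
    print(f"study {i:2}: crystal {sum(crystal)/len(crystal):.1f} infections, "
          f"fake {sum(fake)/len(fake):.1f}, p = {p:.2f}{note}")
```

Every run here draws pure noise, so any run flagged "significant" is a false positive by construction.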
A p-value is, it's a bit technical, but it's the probability that you would get this result if there were no effect. Put another way: in an idealized world, if you have 20 results, one of them is a false positive, meaning one of them says something happens although it doesn't. And in many fields of science a p-value of 0.05 is considered significant, which corresponds exactly to those twenty studies: one error in twenty studies, but, as I said, under idealized conditions. And since it's a script and I can run it in less than a second, I just ran it twenty times instead of once. So here are my 20 simulated studies, and most of them don't look very interesting: we have a few random variations, but nothing significant. Except for this one study. It says the people with the malachite crystal had on average 1.8 malware infections and the people with the fake crystal had 0.8. So the crystal actually made it worse. But this result is significant, because it has a p-value of 0.03. So of course we can publish that, assuming I had really done these studies.

*applause*

Hanno: And the other studies we just forget about. They were not interesting, right, and who cares about non-significant results? Okay, so you have just seen that I created a significant result out of random data. And that's concerning, because in science you can really do that. This phenomenon is called publication bias. What's happening here is: you do studies, and if they get a positive result, meaning you see an effect, you publish them, and if there's no effect, you just forget about them. We learned earlier that a p-value of 0.05 means 1 in 20 studies is a false positive, but you usually don't see the studies that are not significant, because they don't get published. And you may wonder: "Okay, what's stopping a scientist from doing exactly this? What's stopping a scientist from just doing experiments until one of them looks like a real result, although it's just a random fluke?" And the disconcerting answer is: usually, nothing.

And this is not just a theoretical example. I want to give you an example that had quite some impact and was researched very well, and that is research on antidepressants, so-called SSRIs.
In 2008 there was a study, and the interesting situation was that the US Food and Drug Administration, the authority that decides whether a medical drug can be put on the market, had knowledge of all the studies that had been done to register these medications. Some researchers looked at that and compared it with what had been published. They found that there were 38 studies showing these medications had a real effect, real improvements for patients, and of those 38 studies, 37 got published. But there were also 36 studies saying these medications don't really have any effect, that they are not really better than a placebo, and of those only 14 got published. And even of those 14, there were 11 where the researchers said the results had been spun so that it sounds like the medications do something. So of 74 studies, a reader of the published literature sees 48 that look positive (the 37 published positive plus the 11 spun) and only 3 that look negative, even though positive and negative studies were split almost evenly. And a bunch of studies were simply never published because they had a negative result. It's clear that if you look only at the published studies and ignore the unpublished ones with negative results, these medications look much better than they really are. Unlike the earlier example, there is a real effect from antidepressants, but they are not as good as people believed in the past.

So we've learned that, in theory, with publication bias you can create a result out of nothing. But if you're a researcher with a theory that's not true and you really want to publish something about it, that's not very efficient, because on average you have to do 20 studies to get one of these random results that looks real. There are more efficient ways to get a result from nothing. When you're doing a study, there are a lot of micro-decisions you have to make. For example, you may have dropouts, people who move somewhere else or whom you can no longer reach, so they're no longer part of your study, and there are different ways you can handle that. You may have corner-case results where you're not entirely sure: is this an effect or not, and how exactly do you measure it? You may be looking for different things, maybe there are different tests you can run on people. And you may control for certain variables: do you analyze men and women separately, or do you separate subjects by age?
So there are many decisions you can make while doing a study, and each of these decisions has a small effect on the result. And it may very often be that just by trying all the combinations you get a p-value that looks statistically significant, although there's no real effect. There's a term for this, p-hacking, which means you keep adjusting your methods until you get a significant result. And I'd like to point out that this is usually not a scientist saying: "Okay, today I'm going to p-hack my result, because I know my theory is wrong but I want to show it's true." It's a subconscious process, because scientists usually believe in their theories. Honestly. They honestly think their theory is true and that their research will show that. So they may subconsciously say: "Okay, if I analyze my data like this it looks a bit better, so I will do this." Subconsciously, they may p-hack themselves into a result that's not really there. And again we can ask: "What is stopping scientists from p-hacking?" And the concerning answer is the same: usually nothing. And so I came to the conclusion that the scientific method is a way to create evidence for whatever theory you like, no matter whether it's true or not. You may say that's a pretty bold thing to say, and I'm saying it even though I'm not even a scientist, I'm just some hacker. But I'm not alone in this. There's a paper by a famous researcher, John Ioannidis, titled "Why most published research findings are false", published in 2005. If you look at the title, he doesn't even question that most research findings are false; he only wants to give the reasons why this is the case. He makes some very plausible assumptions, taking into account that many negative results don't get published and that there is some bias, and he comes to the very plausible conclusion that this is indeed the case. And this is not even very controversial. If you ask the people doing what you could call science on science, or meta-science, people who look at scientific methodology, they will tell you: "Yeah, of course that's the case." Some will even say: "That's how science works, that's what we expect." But I find it concerning. And if you take this seriously, it means: when you read about a study, say in a newspaper, the default assumption should be "that's not true", while we usually assume the opposite.
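As an illustration of the p-hacking just described, here is a sketch of how trying several analysis variants on the same data plays out numerically; the four "variants" (everyone, men only, women only, extremes dropped as "outliers") are arbitrary assumptions, and the data is pure noise:

```python
# Sketch: flexible analysis choices on pure-noise data inflate the
# false-positive rate well above the nominal 5%.
import random
from scipy.stats import ttest_ind

def null_group(n=20):
    """n subjects: a noise outcome plus an irrelevant covariate (sex)."""
    return [(random.gauss(0, 1), random.choice("mf")) for _ in range(n)]

def best_p(treat, control):
    """Analyze the same data several ways and keep the nicest p-value."""
    def outcomes(group, sex=None):
        return [x for x, s in group if sex is None or s == sex]
    variants = [
        (outcomes(treat), outcomes(control)),            # everyone
        (outcomes(treat, "m"), outcomes(control, "m")),  # men only
        (outcomes(treat, "f"), outcomes(control, "f")),  # women only
        (sorted(outcomes(treat))[1:-1],                  # drop extremes as
         sorted(outcomes(control))[1:-1]),               # "outliers"
    ]
    return min(ttest_ind(a, b).pvalue
               for a, b in variants if len(a) > 1 and len(b) > 1)

runs = 2000
hits = sum(best_p(null_group(), null_group()) < 0.05 for _ in range(runs))
print(f"'significant' in {hits/runs:.0%} of pure-noise studies")  # > 5%
```

No single variant is dishonest on its own; it's picking the best of several after the fact that quietly multiplies the chances of a spurious result.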
And if science is a method to create evidence for whatever you like, you can think about something really crazy, like: "Can people see into the future?", "Does our mind have some extrasensory perception that lets us sense things that will happen in an hour?" There was a psychologist called Daryl Bem who thought this is the case, and he published a study on it, titled "Feeling the future". He did a lot of experiments where he did something, and then something happened later, and he thought he had statistical evidence that what happened later influenced what happened earlier. So, I don't think that's very plausible, based on what we know about the universe, but it was published in a real psychology journal. And a lot of things were wrong with this study; it's basically a very nice example of p-hacking. There's even a book by Daryl Bem where he describes something that basically looks like p-hacking and presents it as how you do psychology. But the study was absolutely in line with the existing standards in experimental psychology, and a lot of people found that concerning. If you can show that precognition is real, that people can see into the future, then what else can you show, and how can we trust any results? Psychology has debated this a lot in the past couple of years; there's a lot of talk about the replication crisis in psychology. Many effects that psychologists simply thought were true turned out not to hold up: when they tried to repeat the experiments, they couldn't get the same results, even though entire subfields were built on them.

And I want to show you an example, one of those that are not discussed so much. There's a theory called moral licensing, and the idea is that if you do something good, or something you think is good, you later basically behave like an asshole, because you think: I already did something good, I don't have to be so nice anymore. And there were some famous studies claiming that when people consume organic food, they later become more judgmental, less social, less nice to their peers. But just last week, someone tried to replicate these original experiments. They tried three times, with more subjects and better research methodology, and they totally couldn't find the effect. What you see here is lots of media articles about the original result; I have not found a single article reporting that it could not be replicated.
Maybe those articles will still come, but that's just a very recent example. Now I want to give you a small warning, because you may be thinking: "Yeah, these psychologists, that all sounds very fishy, they even believe in precognition and whatever." But maybe your field is not much better; maybe you just don't know it yet, because nobody has started replicating studies in your field. And there are other fields with replication problems, some much worse. For example, the pharma company Amgen published something in 2012 where they said: we have tried to replicate cancer research, preclinical research, that is, work in a petri dish or animal experiments, not drugs tested on humans, but what happens before you develop a drug. And they were only able to replicate 6 out of 53 studies. And these were, they said, landmark studies, studies that had been published in the best journals. Now, there are a few problems with this publication, because they did not publish their replications, and they did not tell us which studies it was that they could not replicate. In the meantime, I think they have published three of these replications, but most of it remains in the dark. Which points to another problem: they said they did it this way because they collaborated with the original researchers, and the original researchers only agreed on the condition that the results would not be published. But it still sounds very concerning. And some fields don't have a replication problem only because nobody is trying to replicate previous results, and then you will never know whether your results hold up.

So what can be done about all this? Fundamentally, I think the core issue is that the scientific process is tied together with its results: we do a study, and only afterwards do we decide whether it gets published; or we do a study, and only after we have the data do we decide how to analyze it. So essentially we need to decouple the scientific process from its results. One way of doing that is pre-registration: before you start a study, you register it in a public registry and say, "I'm going to do a study on this medication, or on this psychological effect, and this is how I'm going to do it", and later people can check whether you really did that. This is more or less standard practice in medical drug trials, and the summary is: it does not work very well, but it's better than nothing.
So, and the problem is mostly enforcement: people register a study and then don't publish it, and nothing happens to them, even though they are legally required to publish. There are two campaigns I'd like to point out. There's the AllTrials campaign, started by Ben Goldacre, a doctor from the UK, which demands that every trial done on a medication should be published. And there's a project by the same person, the COMPare project, which checks: if a medical trial was registered and later published, did they do what they registered? Or did they change something in their protocol, and was there a reason for it, or did they just change it to get a result they otherwise wouldn't get? But then again, these issues in medicine often get a lot of attention, and for good reasons: if we have bad science in medicine, people die; that's pretty immediate and pretty massive. But whenever you read about this, you have to remember that drug trials at least have pre-registration; most scientific fields don't bother doing anything like that. So whenever you hear something about publication bias in medicine, you should always think: the same thing happens in many fields of science, and usually nobody is doing anything about it. And particularly to this audience I'd like to say: there's currently a big trend of people from computer science wanting to revolutionize medicine, with big data and machine learning and these things. In principle that's okay, but I know a lot of people in medicine are very worried about this, and the reason is that these computer science people don't have the scientific standards that people in medicine expect. They might say: "We don't really need to do a study on this, it's obvious that it helps." And that is worrying. I come from computer science, and I understand very well why people from medicine are worried about this. Now, there's an idea that goes even further than pre-registration, and it's called registered reports. A couple of years ago, some scientists wrote an open letter
that was published in the Guardian, and the idea is to turn the scientific publication process upside down. If you want to do a study, the first thing you do with a registered report is submit your study design, your protocol, to the journal, and the journal decides whether to publish it before seeing any results. That prevents publication bias: journals can no longer publish only the nice findings and ignore the negative ones. Then you do the study, and it gets published, but it gets published regardless of what the result was. There are of course other things you can do to improve science. There's a lot of talk about sharing data, sharing code, sharing methods, because if you want to replicate a study, it's of course easier if you have access to all the details of how the original study was done. Then you could do large collaborations, because many studies are just too small: with a study of twenty people you just don't get a very reliable outcome. So in many situations it would be better to get ten teams of scientists together and let them do one big study, and then you can answer a question reliably. And some people propose higher statistical thresholds, because that p-value of 0.05 means practically nothing. There was recently a paper arguing to just move the dot one place to the left, to 0.005, and that would already solve a lot of problems. And in physics, for example, they have something called five sigma, which corresponds to a p-value of, I think, zero point, then six zeroes, then a three, or something like that, so physics has much higher statistical thresholds (a comparison is sketched below). Now, whatever scientific field you're working in, you might ask yourself: if we have statistical results, are they pre-registered in any way? Do we publish negative results, where we tested an effect and got nothing? And are there replications of all relevant results? And I would say: if you answer all these questions with "no", which I think many people will, then you're not really doing science; what you're doing is the alchemy of our time.

*applause*

Hanno: Thanks.

Herald: Thank you very much...

Hanno: No, I have more, sorry, I have three more slides, that was not the finishing line.
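To put numbers on the thresholds mentioned above, a small sketch (assuming SciPy, and taking five sigma one-sided, as particle physics does):

```python
# How many pure-noise studies does it take, on average, to produce one
# false positive at each threshold? Simply 1/alpha.
from scipy.stats import norm

thresholds = [
    ("p < 0.05 (common)", 0.05),
    ("p < 0.005 (proposed)", 0.005),
    ("five sigma (physics)", norm.sf(5)),  # one-sided: about 2.9e-7
]
for label, alpha in thresholds:
    print(f"{label:22} alpha = {alpha:.2g}, "
          f"~{1/alpha:,.0f} null studies per false positive")
```

At 0.05 one spurious "discovery" per twenty null studies is expected; at five sigma it takes millions.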
A big issue is also that there are bad incentives in science. A very standard way to evaluate the impact of science is citation counts: if your study is cited a lot, that's considered a good thing, and if your journal is cited a lot, that's a good thing; that's, for example, the impact factor, but there are other measurements too. Universities also like publicity: if your study gets a lot of media reports, your press department likes you. And these incentives favor interesting results, but they don't favor correct results. This is bad, because realistically most results are not that interesting; most results will be: "We had this interesting and counterintuitive theory, and it's totally wrong." And then there's this idea that science is self-correcting. So if you confront scientists with these issues, with publication bias and p-hacking, surely they will immediately change, because that's what scientists do, right? I want to cite something here, sorry, it's a bit long: "There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published." That sounds like publication bias. And it also says: "Significant results published in these fields are seldom verified by independent replication." So it seems there's a replication problem. These wise words were written in 1959 by a statistician called Theodore Sterling. And because science is so self-correcting, in 1995 he complained that his article had presented evidence that published results of scientific investigations are not a representative sample of all scientific studies, and that "these results also indicate that practice leading to publication bias has not changed over a period of 30 years". And here we are in 2018, and publication bias is still a problem. So if science is self-correcting, then it's pretty damn slow at correcting itself, right? And finally, I would like to ask you whether you are prepared for boring science, because ultimately, I think, we have a choice between what I would like to call TED-talk science and boring science.

*applause*

With TED-talk science we get mostly positive, surprising, interesting results, large effects, many citations, lots of media attention, and you may get a TED talk out of it.
Unfortunately, it's usually not true. And I would like to propose boring science as the alternative: mostly negative results, pretty boring, small effects, but it may be closer to the truth. I would like to have boring science, but I know it's a pretty tough sell. Sorry, I didn't hear that. Yeah, thanks for listening.

*applause*

Herald: Thank you.

Hanno: Two questions, or?

Herald: We don't have that much time for questions. Three minutes, three minutes, guys. Question one, shoot.

Mic: This isn't a question, but I just wanted to comment: Hanno, you missed out a very critical topic here, which is the use of Bayesian probability. You conflated p-values with the scientific method, which gave the rest of your talk a slightly unnecessary anti-science slant. P-values aren't the be-all and end-all of the scientific method. A p-value is sort of calculating the probability that your data would occur given that the null hypothesis is true, whereas Bayesian probability would be calculating the probability that your hypothesis is true given the data. And more and more scientists are slowly starting to realize that this is probably a better way of doing science than p-values. So this is probably a third alternative to your proposal of boring science: doing the other thing, Bayesian probability.

Hanno: Sorry, yeah, I agree with you; unfortunately I only had half an hour here.

Herald: Where are you going after this? Like, where are we going after this lecture, can they find you somewhere in the bar?

Hanno: I know him...

Herald: You know, science is broken, but then, scientists, it's a little bit like the next lecture actually that's waiting there, it's like: "you scratch my back and I scratch yours for publication".

Hanno: Maybe two more minutes?

Herald: One minute. Please go ahead.

Mic: Yeah, hi, thank you for your talk. I'm curious: you've raised ways we can address this assuming good actors, assuming people who want to do better science, where this happens out of ignorance or willful ignorance. What do we do about bad actors? For example, in the medical community, drug companies: maybe they really like the idea of being profitably incentivized by these randomized controlled trials to make essentially a placebo do something. How do we begin to address people trying to maliciously p-hack, or maliciously abuse the pre-registration system, or something like that?
Hanno: I mean, it's a big question, right? But I think if the standards confine you so much that there's not much room to cheat, that's a way out. And also, I don't think deliberate cheating is that much of a problem; I actually think the bigger problem is people honestly believing that what they do is true.

Herald: Okay, one last one. You, sir, please?

Mic: So the value in science is often a count of publications, right? A count of citations, and so on. So is it true that, to improve the situation you've described, the journals whose publications are available should impose higher standards? So it's the journals who must raise the bar, they should enforce publication of protocols before accepting papers, et cetera. Is it the journals who should do the work on that, or can we regular scientists do something as well?

Hanno: I mean, you can publish in the journals that have better standards, right? There are journals that have these registered reports. But of course, as a single scientist it's always difficult, because you're playing in a system that has all these wrong incentives.

Herald: Okay guys, that's it, we have to shut down. Please. There is a reference, better science dot-org, go there. And one last request: give really warm applause!

*applause*

*34c3 outro*

subtitles created by c3subtitles.de in the year 2018. Join, and help us!