0 00:00:00,000 --> 00:00:30,000 Dear viewer, these subtitles were generated by a machine via the service Trint and therefore are (very) buggy. If you are capable, please help us to create good quality subtitles: https://c3subtitles.de/talk/439 Thanks! 1 00:00:09,680 --> 00:00:12,199 So we welcome Jonas 2 00:00:12,200 --> 00:00:14,719 Öberg with his talk, The Attribution 3 00:00:14,720 --> 00:00:17,179 Revolution. Give him a big hand of applause. 4 00:00:19,400 --> 00:00:21,589 Yes, 5 00:00:21,590 --> 00:00:22,590 thank you very much. 6 00:00:23,510 --> 00:00:25,339 Thank you all for braving this early 7 00:00:25,340 --> 00:00:27,109 morning and coming to attend this talk as 8 00:00:27,110 --> 00:00:28,110 well. 9 00:00:28,400 --> 00:00:29,809 I'm going to talk about the attribution 10 00:00:29,810 --> 00:00:31,939 revolution and why I think that we have 11 00:00:31,940 --> 00:00:34,189 a possibility here of turning copyright 12 00:00:34,190 --> 00:00:35,539 upside down, or inside out 13 00:00:35,540 --> 00:00:37,729 if you want. Just a quick 14 00:00:37,730 --> 00:00:39,769 show of hands, just to give me an idea: 15 00:00:39,770 --> 00:00:41,359 how many of you have heard me talk 16 00:00:41,360 --> 00:00:42,360 before? 17 00:00:42,930 --> 00:00:44,219 Not so many. Excellent. 18 00:00:44,220 --> 00:00:45,389 Then you're going to be surprised by this 19 00:00:45,390 --> 00:00:46,390 one. 20 00:00:46,980 --> 00:00:49,079 I'm going to show you an image 21 00:00:49,080 --> 00:00:50,849 and I would like to have a quick show of 22 00:00:50,850 --> 00:00:52,979 hands as well to see how many 23 00:00:52,980 --> 00:00:56,069 of you in this room can recognize 24 00:00:56,070 --> 00:00:57,329 where that image is from, 25 00:00:57,330 --> 00:00:59,519 if you can identify the author of it 26 00:00:59,520 --> 00:01:02,009 or perhaps identify the series which it's 27 00:01:02,010 --> 00:01:04,079 from. Show of hands: where 28 00:01:04,080 --> 00:01:05,369 is this from?
29 00:01:05,370 --> 00:01:08,039 OK, fairly 30 00:01:08,040 --> 00:01:09,040 good. 31 00:01:09,890 --> 00:01:12,149 I had a talk last week 32 00:01:12,150 --> 00:01:14,489 in London at the Open Document 33 00:01:14,490 --> 00:01:16,319 Foundation's meeting. 34 00:01:16,320 --> 00:01:18,029 And when you're talking to people who 35 00:01:18,030 --> 00:01:20,669 are used to writing word processors, 36 00:01:20,670 --> 00:01:22,649 you can imagine that this joke went off 37 00:01:22,650 --> 00:01:23,650 quite well with them. 38 00:01:24,780 --> 00:01:27,179 OK, most of you recognize this. 39 00:01:27,180 --> 00:01:29,249 This is xkcd. It's drawn by 40 00:01:29,250 --> 00:01:30,509 Randall Munroe. 41 00:01:30,510 --> 00:01:32,459 Randall has a peculiar style of 42 00:01:32,460 --> 00:01:34,529 drawing, so it's quite easy to 43 00:01:34,530 --> 00:01:36,899 recognize it whenever you see it. 44 00:01:36,900 --> 00:01:38,579 He also has a sense of humor that 45 00:01:38,580 --> 00:01:40,099 attracts many of us. 46 00:01:40,100 --> 00:01:42,299 Now, let me show you another 47 00:01:42,300 --> 00:01:43,739 one, though, and I'll ask you the same 48 00:01:43,740 --> 00:01:46,319 question. Quick show of hands afterwards 49 00:01:46,320 --> 00:01:48,539 to see if you recognize where this image 50 00:01:48,540 --> 00:01:49,540 is from. 51 00:01:51,920 --> 00:01:52,920 OK, one. 52 00:01:54,100 --> 00:01:55,100 Oh, OK. 53 00:01:56,950 --> 00:01:59,769 Two of you have seen this before. 54 00:01:59,770 --> 00:02:01,989 OK, two people 55 00:02:01,990 --> 00:02:03,279 recognize this one. 56 00:02:03,280 --> 00:02:05,739 So the rest of you might be surprised 57 00:02:05,740 --> 00:02:07,929 when you learn that this image is, 58 00:02:07,930 --> 00:02:10,719 in fact, also by Randall Munroe. 59 00:02:10,720 --> 00:02:13,299 It is also part of the xkcd 60 00:02:13,300 --> 00:02:15,499 universe, comic number seven in 61 00:02:15,500 --> 00:02:16,500 the series.
62 00:02:17,410 --> 00:02:19,599 Now, why I'm showing you this 63 00:02:19,600 --> 00:02:21,819 is because knowing 64 00:02:21,820 --> 00:02:23,909 that this image is part of 65 00:02:23,910 --> 00:02:26,079 xkcd probably 66 00:02:26,080 --> 00:02:28,179 changes the way that we relate 67 00:02:28,180 --> 00:02:30,219 to it. It changes the way we feel about 68 00:02:30,220 --> 00:02:30,969 this. 69 00:02:30,970 --> 00:02:33,099 Before we knew that this was drawn by 70 00:02:33,100 --> 00:02:35,109 Randall Munroe, this was just an 71 00:02:35,110 --> 00:02:36,849 anonymous sketch that I could have taken 72 00:02:36,850 --> 00:02:38,679 from my own sketchbook or found anywhere 73 00:02:38,680 --> 00:02:39,909 on the Internet. 74 00:02:39,910 --> 00:02:42,039 But once we learn that this is by 75 00:02:42,040 --> 00:02:44,889 Randall Munroe, that it's part of the xkcd 76 00:02:44,890 --> 00:02:47,289 universe, then suddenly 77 00:02:47,290 --> 00:02:49,389 we have the context that 78 00:02:49,390 --> 00:02:51,729 makes this image valuable 79 00:02:51,730 --> 00:02:54,369 to us. It gives us some meaning. 80 00:02:54,370 --> 00:02:57,099 And I can almost guarantee you that 81 00:02:57,100 --> 00:02:59,469 if we tried to sell this one, 82 00:02:59,470 --> 00:03:01,539 if Randall would sell the original 83 00:03:01,540 --> 00:03:03,759 of this, he would obviously get a lot 84 00:03:03,760 --> 00:03:05,919 more money as well if 85 00:03:05,920 --> 00:03:07,389 that knowledge was conveyed to the 86 00:03:07,390 --> 00:03:08,390 potential buyer. 87 00:03:09,760 --> 00:03:11,859 So knowing 88 00:03:11,860 --> 00:03:12,879 where things come from, 89 00:03:14,360 --> 00:03:16,459 knowing who created something, who's the 90 00:03:16,460 --> 00:03:18,649 author, when it was 91 00:03:18,650 --> 00:03:20,779 created, where it was created: 92 00:03:20,780 --> 00:03:22,309 all of those things are quite 93 00:03:22,310 --> 00:03:24,619 relevant and we see them all around us.
94 00:03:24,620 --> 00:03:25,999 We see them in Wikipedia: 95 00:03:26,000 --> 00:03:27,729 "citation needed". 96 00:03:27,730 --> 00:03:28,879 We see them in science. 97 00:03:28,880 --> 00:03:29,129 Right? 98 00:03:29,130 --> 00:03:31,489 Obviously, everything that we've done 99 00:03:31,490 --> 00:03:33,349 for as long as we can remember in 100 00:03:33,350 --> 00:03:35,539 terms of scientific advances builds 101 00:03:35,540 --> 00:03:37,099 upon what people have done before. 102 00:03:37,100 --> 00:03:38,749 And we are used to crediting those 103 00:03:38,750 --> 00:03:40,879 people by attributing them 104 00:03:40,880 --> 00:03:43,219 when we write our papers or journals. 105 00:03:44,360 --> 00:03:45,949 In politics? 106 00:03:45,950 --> 00:03:48,319 Well, you know, you can claim that 107 00:03:48,320 --> 00:03:50,479 in politics it's maybe not too common 108 00:03:50,480 --> 00:03:52,549 to attribute to some known 109 00:03:52,550 --> 00:03:54,749 source that people can actually check. 110 00:03:54,750 --> 00:03:56,989 But as a politician, 111 00:03:56,990 --> 00:03:58,159 you do this all the time anyway. 112 00:03:58,160 --> 00:04:00,439 You attribute your statement to somewhere, 113 00:04:00,440 --> 00:04:02,479 you attribute your facts to somewhere. 114 00:04:03,500 --> 00:04:05,779 In culture, in art, 115 00:04:05,780 --> 00:04:07,669 we have this. We attribute; we build upon 116 00:04:07,670 --> 00:04:09,109 something from before. In food? 117 00:04:09,110 --> 00:04:11,179 Well, OK, I admit this is a bit 118 00:04:11,180 --> 00:04:13,249 of a stretch, but if you pick 119 00:04:13,250 --> 00:04:15,529 up a food carton, one of the first 120 00:04:15,530 --> 00:04:17,629 things I do is I usually flip 121 00:04:17,630 --> 00:04:18,648 it around. I look at the list of 122 00:04:18,649 --> 00:04:19,699 ingredients.
123 00:04:19,700 --> 00:04:21,708 Now that's, you know, that's attribution 124 00:04:21,709 --> 00:04:23,689 and it tells you what does this actually 125 00:04:23,690 --> 00:04:25,219 contain? Where does it come from? 126 00:04:25,220 --> 00:04:27,290 What went into making this product? 127 00:04:28,460 --> 00:04:31,069 And this is the provenance 128 00:04:31,070 --> 00:04:32,209 of a work. 129 00:04:32,210 --> 00:04:34,879 This is the history of something. 130 00:04:34,880 --> 00:04:37,069 This is where something has been before, 131 00:04:37,070 --> 00:04:39,229 where it was created, who 132 00:04:39,230 --> 00:04:41,299 created it, when it was created, 133 00:04:41,300 --> 00:04:43,489 for what purpose it was created and 134 00:04:43,490 --> 00:04:45,229 then what has happened with it 135 00:04:45,230 --> 00:04:47,299 until we see it today. 136 00:04:47,300 --> 00:04:49,519 If you walk into a gallery and you look 137 00:04:49,520 --> 00:04:50,959 at paintings on the walls, 138 00:04:52,870 --> 00:04:55,209 you're most likely going to be interested 139 00:04:55,210 --> 00:04:57,279 in the information about 140 00:04:57,280 --> 00:04:58,239 those paintings as well. 141 00:04:58,240 --> 00:04:59,149 You're not just going to look at 142 00:04:59,150 --> 00:05:00,339 paintings, you're going to look at the 143 00:05:00,340 --> 00:05:02,199 provenance of those paintings. 144 00:05:02,200 --> 00:05:03,879 You're going to look at who actually 145 00:05:03,880 --> 00:05:05,799 painted them, when they were painted, 146 00:05:05,800 --> 00:05:07,179 perhaps why they were painted. 147 00:05:07,180 --> 00:05:08,349 The title of it can give you some 148 00:05:08,350 --> 00:05:10,119 information, can give you some knowledge. 149 00:05:10,120 --> 00:05:12,100 It gives the paintings some meaning. 150 00:05:13,480 --> 00:05:14,480 Now, 151 00:05:15,610 --> 00:05:16,610 provenance
152 00:05:17,740 --> 00:05:19,929 is also connected to the aspect 153 00:05:19,930 --> 00:05:20,949 of reputation. 154 00:05:22,180 --> 00:05:24,789 Reputation is obviously, you know, 155 00:05:24,790 --> 00:05:26,859 something that we all have around 156 00:05:26,860 --> 00:05:27,489 us today as well. 157 00:05:27,490 --> 00:05:29,169 If you look at LinkedIn, Facebook, 158 00:05:29,170 --> 00:05:31,269 Twitter, everything is about building our 159 00:05:31,270 --> 00:05:32,289 reputation. 160 00:05:32,290 --> 00:05:33,639 Go on GitHub. 161 00:05:33,640 --> 00:05:34,879 You're talking about reputation. 162 00:05:34,880 --> 00:05:37,869 Every contribution that you make 163 00:05:37,870 --> 00:05:40,179 contributes to your reputation, your 164 00:05:40,180 --> 00:05:41,949 standing in society. 165 00:05:41,950 --> 00:05:44,139 And that's facilitated by 166 00:05:44,140 --> 00:05:46,329 attribution. It's facilitated by people 167 00:05:46,330 --> 00:05:48,879 knowing what you have actually created, 168 00:05:48,880 --> 00:05:51,099 knowing what 169 00:05:51,100 --> 00:05:52,690 you have contributed to society. 170 00:05:53,910 --> 00:05:55,979 Now, let me ask you 171 00:05:55,980 --> 00:05:58,049 one more thing. Quick show 172 00:05:58,050 --> 00:06:00,089 of hands as well, to see who is the avid reader 173 00:06:00,090 --> 00:06:01,090 in this room. 174 00:06:02,610 --> 00:06:03,660 If I say Whuffie, 175 00:06:04,740 --> 00:06:06,479 show of hands, how many know what Whuffie 176 00:06:06,480 --> 00:06:07,480 is? 177 00:06:08,520 --> 00:06:11,009 OK, I'm going to send you to the library 178 00:06:11,010 --> 00:06:13,109 straight after this. Whuffie 179 00:06:13,110 --> 00:06:15,389 is a reputation-based 180 00:06:15,390 --> 00:06:17,669 currency that was first 181 00:06:17,670 --> 00:06:19,649 envisioned in Down and Out in the Magic 182 00:06:19,650 --> 00:06:21,509 Kingdom by Cory Doctorow.
183 00:06:22,890 --> 00:06:25,139 Now, in this story, Down 184 00:06:25,140 --> 00:06:26,789 and Out in the Magic Kingdom, Cory 185 00:06:26,790 --> 00:06:28,889 Doctorow hypothesizes about 186 00:06:28,890 --> 00:06:31,079 a potential future in which 187 00:06:31,080 --> 00:06:33,209 the currency that we have today is 188 00:06:33,210 --> 00:06:35,489 replaced by reputation. 189 00:06:35,490 --> 00:06:37,709 What you do and what you create 190 00:06:37,710 --> 00:06:39,809 contributes to your Whuffie, which you 191 00:06:39,810 --> 00:06:41,579 can then exchange for other things in 192 00:06:41,580 --> 00:06:42,580 turn. 193 00:06:43,620 --> 00:06:45,689 Now, when Cory Doctorow wrote 194 00:06:45,690 --> 00:06:46,690 this, 195 00:06:47,790 --> 00:06:49,619 this all obviously seemed quite a lot 196 00:06:49,620 --> 00:06:51,279 like science fiction, and it's written as 197 00:06:51,280 --> 00:06:52,739 science fiction; it's Cory Doctorow, after 198 00:06:52,740 --> 00:06:53,740 all. 199 00:06:54,060 --> 00:06:55,679 I would argue, however, that it's 200 00:06:55,680 --> 00:06:57,389 actually not science fiction. 201 00:06:57,390 --> 00:06:59,549 We actually have a reputation-based 202 00:06:59,550 --> 00:07:01,019 currency today as well, 203 00:07:02,160 --> 00:07:04,229 maybe not exactly at the larger 204 00:07:04,230 --> 00:07:06,479 scale that Cory Doctorow envisioned, 205 00:07:06,480 --> 00:07:09,149 but we do have it nonetheless. 206 00:07:09,150 --> 00:07:11,039 And this morning I was reminded of one 207 00:07:11,040 --> 00:07:13,169 example of it, and I took the liberty of 208 00:07:13,170 --> 00:07:14,220 slotting that in here. 209 00:07:15,630 --> 00:07:16,959 How many know of Advogato? 210 00:07:18,720 --> 00:07:20,189 Not so many as well. So I'm going to 211 00:07:20,190 --> 00:07:21,190 introduce that to you as well.
212 00:07:22,850 --> 00:07:24,959 OK, this is Advogato, 213 00:07:24,960 --> 00:07:27,059 advogato.org. Advogato is one of 214 00:07:27,060 --> 00:07:29,489 the earliest, very earliest 215 00:07:29,490 --> 00:07:32,159 attempts at creating 216 00:07:32,160 --> 00:07:34,599 essentially a social network. 217 00:07:34,600 --> 00:07:37,049 It pioneered the concept 218 00:07:37,050 --> 00:07:39,179 of blogging on the Internet, 219 00:07:39,180 --> 00:07:40,709 sharing your experience with other 220 00:07:40,710 --> 00:07:42,959 people, and it developed a system 221 00:07:42,960 --> 00:07:44,159 of trust. 222 00:07:44,160 --> 00:07:46,709 It took the web of trust and implemented 223 00:07:46,710 --> 00:07:49,619 it in its own system so you could certify 224 00:07:49,620 --> 00:07:51,179 other people according to their 225 00:07:51,180 --> 00:07:53,309 experience, in this case within the free 226 00:07:53,310 --> 00:07:54,310 software community. 227 00:07:55,410 --> 00:07:58,049 Now, it was founded in 1999. 228 00:07:58,050 --> 00:07:59,319 So that's quite a while ago. 229 00:07:59,320 --> 00:08:00,320 Right? 230 00:08:00,660 --> 00:08:02,459 And it's been quite frequently cited 231 00:08:02,460 --> 00:08:04,589 since because it really was one 232 00:08:04,590 --> 00:08:06,029 of the first to try to do anything 233 00:08:06,030 --> 00:08:07,030 similar to this. 234 00:08:08,130 --> 00:08:10,239 Now, as you can see, I was 235 00:08:10,240 --> 00:08:12,989 on Advogato already in 1999. 236 00:08:12,990 --> 00:08:14,489 Now, that gives you a hint about how old 237 00:08:14,490 --> 00:08:16,049 I am. But it also gives you an 238 00:08:16,050 --> 00:08:17,759 understanding of how long I've been 239 00:08:17,760 --> 00:08:20,339 working with this. Now, 240 00:08:20,340 --> 00:08:22,259 is that important? 241 00:08:22,260 --> 00:08:24,299 Well, to some extent, you know, it's not 242 00:08:24,300 --> 00:08:26,459 really.
Um, but if I look 243 00:08:26,460 --> 00:08:29,029 at other people that are on Advogato, which I 244 00:08:29,030 --> 00:08:31,110 did this morning because I was curious, 245 00:08:32,780 --> 00:08:35,189 you've got Bruce Perens, who 246 00:08:35,190 --> 00:08:36,899 joined in early 2000. 247 00:08:36,900 --> 00:08:39,779 Richard Stallman joined in mid 2000. 248 00:08:39,780 --> 00:08:42,779 Bradley Kuhn joined in early 2001. 249 00:08:42,780 --> 00:08:45,019 And then you've got me joining in 1999. 250 00:08:47,020 --> 00:08:49,349 And of course, that's then part 251 00:08:49,350 --> 00:08:51,419 of the story. I was before everyone else. 252 00:08:51,420 --> 00:08:52,259 Right? 253 00:08:52,260 --> 00:08:53,459 Do I feel proud about that? 254 00:08:53,460 --> 00:08:55,649 Well, you know, I'm human, so of course, 255 00:08:55,650 --> 00:08:56,549 I feel proud about that. 256 00:08:56,550 --> 00:08:58,679 I was on Advogato before all these 257 00:08:58,680 --> 00:09:00,359 big shots. 258 00:09:00,360 --> 00:09:02,069 Does it mean anything? 259 00:09:02,070 --> 00:09:03,539 Not really. 260 00:09:03,540 --> 00:09:05,909 But it's part of the reputation 261 00:09:05,910 --> 00:09:06,910 mechanism. 262 00:09:07,980 --> 00:09:10,289 And I'll introduce to you another 263 00:09:10,290 --> 00:09:12,479 project as well, which came 264 00:09:12,480 --> 00:09:14,820 to my attention fairly recently. 265 00:09:18,480 --> 00:09:20,579 So, again, let's see how 266 00:09:20,580 --> 00:09:22,679 many of you know about this project. 267 00:09:22,680 --> 00:09:23,899 It's P2Pvalue. 268 00:09:25,380 --> 00:09:27,749 How many recognize P2Pvalue? 269 00:09:27,750 --> 00:09:29,849 OK, one person. Chris 270 00:09:29,850 --> 00:09:31,019 is not here. Is he here? 271 00:09:31,020 --> 00:09:32,020 OK.
272 00:09:33,960 --> 00:09:35,909 P2Pvalue is a European Union 273 00:09:35,910 --> 00:09:38,009 funded project, so it means that there 274 00:09:38,010 --> 00:09:40,079 is a huge research portion within 275 00:09:40,080 --> 00:09:41,429 this project. 276 00:09:41,430 --> 00:09:43,529 But what I find interesting when I look 277 00:09:43,530 --> 00:09:44,909 at this project and I look at what 278 00:09:44,910 --> 00:09:46,409 they're promising to deliver, or at least 279 00:09:46,410 --> 00:09:48,569 what their objectives are: 280 00:09:48,570 --> 00:09:50,699 I highlighted two things here that are 281 00:09:50,700 --> 00:09:51,700 coming up. 282 00:09:52,320 --> 00:09:54,779 They want to deploy a federated 283 00:09:54,780 --> 00:09:56,849 platform in which real 284 00:09:56,850 --> 00:09:59,159 world communities will interact, 285 00:09:59,160 --> 00:10:01,529 participate and collaboratively 286 00:10:01,530 --> 00:10:03,929 create content in the commons-based 287 00:10:03,930 --> 00:10:06,029 peer production school. 288 00:10:06,030 --> 00:10:07,949 And they want to develop a set of value 289 00:10:07,950 --> 00:10:10,589 metrics and reward mechanisms 290 00:10:10,590 --> 00:10:12,749 that incentivize the participation 291 00:10:12,750 --> 00:10:15,029 of citizens in so-called commons-based 292 00:10:15,030 --> 00:10:16,320 peer production. Now, to me, 293 00:10:18,000 --> 00:10:19,979 that's a reputation-based economy.
294 00:10:19,980 --> 00:10:22,769 That's the budding stages of 295 00:10:22,770 --> 00:10:24,959 taking what we have on 296 00:10:24,960 --> 00:10:27,089 LinkedIn, on Twitter, everywhere 297 00:10:27,090 --> 00:10:29,039 else where we're talking about reputation 298 00:10:29,040 --> 00:10:31,409 and trying to put it in a larger 299 00:10:31,410 --> 00:10:33,509 context, trying to create some platform 300 00:10:33,510 --> 00:10:35,669 that can actually facilitate this, 301 00:10:35,670 --> 00:10:38,099 not only in terms of reputation 302 00:10:38,100 --> 00:10:40,199 because I'm publishing something, but 303 00:10:40,200 --> 00:10:41,669 in terms of reputation because I'm 304 00:10:41,670 --> 00:10:42,670 creating something. 305 00:10:47,660 --> 00:10:49,249 When I started thinking about 306 00:10:49,250 --> 00:10:51,019 attribution, when I started thinking about the 307 00:10:51,020 --> 00:10:52,939 attribution revolution, 308 00:10:52,940 --> 00:10:54,469 I started talking to people and I started 309 00:10:54,470 --> 00:10:56,269 talking to photographers primarily, 310 00:10:56,270 --> 00:10:58,039 because I saw them obviously caring 311 00:10:58,040 --> 00:11:00,019 quite a bit about being attributed for 312 00:11:00,020 --> 00:11:01,189 the photographs that they take. 313 00:11:01,190 --> 00:11:04,069 And if you look around at newspapers, 314 00:11:04,070 --> 00:11:06,019 you see pretty much all the photographs 315 00:11:06,020 --> 00:11:08,569 are attributed to Getty Images, AFP 316 00:11:08,570 --> 00:11:10,459 or some other agency. 317 00:11:10,460 --> 00:11:12,499 And you might even have the name of the 318 00:11:12,500 --> 00:11:13,500 photographer there.
319 00:11:14,660 --> 00:11:17,329 Now, something that I realized 320 00:11:17,330 --> 00:11:19,489 when talking to people was that everyone 321 00:11:19,490 --> 00:11:21,559 seems to agree that attribution is 322 00:11:21,560 --> 00:11:23,689 important. And when talking to some 323 00:11:23,690 --> 00:11:25,019 friends of mine who are photographers, 324 00:11:25,020 --> 00:11:27,109 they keep telling me that, you 325 00:11:27,110 --> 00:11:29,569 know: I know the direction 326 00:11:29,570 --> 00:11:31,339 in which the world is turning. 327 00:11:31,340 --> 00:11:33,259 I see the way that people are taking my 328 00:11:33,260 --> 00:11:35,179 photographs. They're sharing them online. 329 00:11:35,180 --> 00:11:36,709 They're publishing them on Twitter, on 330 00:11:36,710 --> 00:11:38,359 Facebook. 331 00:11:38,360 --> 00:11:40,639 And I'm OK with that because I know 332 00:11:40,640 --> 00:11:42,829 that I can't actually change the course 333 00:11:42,830 --> 00:11:44,729 of history and I can't change the way that 334 00:11:44,730 --> 00:11:46,009 people behave. 335 00:11:46,010 --> 00:11:48,949 But if we can make sure 336 00:11:48,950 --> 00:11:50,749 that whenever my photographs get 337 00:11:50,750 --> 00:11:53,269 published, I at least get attributed, 338 00:11:54,380 --> 00:11:57,079 that would solve a lot of concerns 339 00:11:57,080 --> 00:12:00,529 that people have. Unfortunately, 340 00:12:00,530 --> 00:12:02,719 we're rather bad when it comes 341 00:12:02,720 --> 00:12:04,879 to actually giving credit where credit 342 00:12:04,880 --> 00:12:07,489 is due, to actually attributing photographs 343 00:12:07,490 --> 00:12:09,439 when we do use them.
344 00:12:09,440 --> 00:12:11,599 Now, Creative Commons licenses, as one 345 00:12:11,600 --> 00:12:13,669 example, stipulate that 346 00:12:13,670 --> 00:12:15,409 whenever you reuse a work, you must 347 00:12:15,410 --> 00:12:17,509 attribute it in a manner reasonable 348 00:12:17,510 --> 00:12:18,890 for the medium of publishing it. 349 00:12:20,130 --> 00:12:22,889 Still, we see a large 350 00:12:22,890 --> 00:12:25,079 part of the commons which is not 351 00:12:25,080 --> 00:12:26,549 attributed when it gets shared. 352 00:12:28,230 --> 00:12:30,479 So two years ago, I 353 00:12:30,480 --> 00:12:32,579 started working on a project called 354 00:12:32,580 --> 00:12:34,889 Commons Machinery, which is 355 00:12:34,890 --> 00:12:37,259 an organization that aims 356 00:12:37,260 --> 00:12:40,139 to make attribution information, 357 00:12:40,140 --> 00:12:42,869 metadata about creative works, visible 358 00:12:42,870 --> 00:12:45,119 and actionable. Now, 359 00:12:45,120 --> 00:12:46,120 visible 360 00:12:46,890 --> 00:12:48,749 means that we should actually be able to 361 00:12:48,750 --> 00:12:51,299 see the metadata 362 00:12:51,300 --> 00:12:52,829 that is connected to the works that 363 00:12:52,830 --> 00:12:53,830 we're sharing. 364 00:12:54,960 --> 00:12:57,039 Unfortunately, that's not always true. 365 00:12:57,040 --> 00:12:58,979 And it's caused a bunch of issues along 366 00:12:58,980 --> 00:13:01,619 the way. In the early 2000s, 367 00:13:01,620 --> 00:13:03,269 when at least the Swedish government, and 368 00:13:03,270 --> 00:13:04,799 I'm sure other governments as well, 369 00:13:04,800 --> 00:13:07,439 started publishing court proceedings 370 00:13:07,440 --> 00:13:09,689 and similar documents online, they 371 00:13:09,690 --> 00:13:11,009 usually did it as Word documents.
372 00:13:11,010 --> 00:13:13,169 And when they tried to hide someone's name, 373 00:13:13,170 --> 00:13:15,179 they would just take the marker in 374 00:13:15,180 --> 00:13:17,399 Microsoft Word and strike something out 375 00:13:17,400 --> 00:13:19,079 in black. And obviously, people figured 376 00:13:19,080 --> 00:13:20,879 out that you can just open this, press 377 00:13:20,880 --> 00:13:23,339 Ctrl+Z a few times and undo that. 378 00:13:23,340 --> 00:13:25,559 And then you've got the name, right? 379 00:13:25,560 --> 00:13:27,179 If you put it as PDF, 380 00:13:28,320 --> 00:13:29,369 it might look OK. 381 00:13:29,370 --> 00:13:31,319 But if you look underneath everything, 382 00:13:31,320 --> 00:13:32,699 you know, you still have the text and you 383 00:13:32,700 --> 00:13:34,769 still have a block of black above it 384 00:13:34,770 --> 00:13:36,509 and you just need to separate the two. 385 00:13:36,510 --> 00:13:38,579 And then you have the name, which means 386 00:13:38,580 --> 00:13:40,709 that today people are so afraid of 387 00:13:40,710 --> 00:13:42,359 publishing anything detailed that most of 388 00:13:42,360 --> 00:13:44,759 the time they print something, 389 00:13:44,760 --> 00:13:47,639 physically mark it and then scan it again, 390 00:13:47,640 --> 00:13:48,989 which is obviously ridiculous because you 391 00:13:48,990 --> 00:13:51,659 lose a lot of information in doing that. 392 00:13:51,660 --> 00:13:53,819 But they do that for one particular 393 00:13:53,820 --> 00:13:56,189 reason: because they don't know. 394 00:13:56,190 --> 00:13:58,859 It's not obvious to them what information 395 00:13:58,860 --> 00:14:00,329 is conveyed when they're publishing 396 00:14:00,330 --> 00:14:01,439 something, when they're sending their 397 00:14:01,440 --> 00:14:02,969 files around.
398 00:14:02,970 --> 00:14:04,919 They cannot trust that what they see on 399 00:14:04,920 --> 00:14:06,539 the screen is the only thing that is 400 00:14:06,540 --> 00:14:09,239 available there. Even if they take 401 00:14:09,240 --> 00:14:11,429 painstaking effort to remove 402 00:14:11,430 --> 00:14:13,529 all the names completely, you know, 403 00:14:13,530 --> 00:14:15,630 clear away the history of the document, 404 00:14:16,800 --> 00:14:19,709 it's very easy to just overlook 405 00:14:19,710 --> 00:14:21,719 the fact that you can just go to File, 406 00:14:21,720 --> 00:14:23,759 Properties and you maybe have some names 407 00:14:23,760 --> 00:14:25,829 in the title or the description of that 408 00:14:25,830 --> 00:14:26,999 document. 409 00:14:27,000 --> 00:14:29,309 So with Commons Machinery, we wanted to make 410 00:14:29,310 --> 00:14:31,409 the metadata visible so 411 00:14:31,410 --> 00:14:33,359 that people actually are aware of the 412 00:14:33,360 --> 00:14:35,339 metadata that gets passed around. 413 00:14:35,340 --> 00:14:37,109 They are aware of the information that 414 00:14:37,110 --> 00:14:39,839 their documents and files contain. 415 00:14:39,840 --> 00:14:41,339 Obviously, there's a privacy issue in 416 00:14:41,340 --> 00:14:43,139 that as well, to make that visible. 417 00:14:44,340 --> 00:14:46,199 But then the other part is to make it 418 00:14:46,200 --> 00:14:48,599 actionable, and by actionable, 419 00:14:48,600 --> 00:14:50,729 it means that we need to have a 420 00:14:50,730 --> 00:14:53,189 way to actually develop 421 00:14:53,190 --> 00:14:55,589 our software so that it can 422 00:14:55,590 --> 00:14:58,049 act upon that metadata 423 00:14:58,050 --> 00:15:00,359 to give us helpful advice and to give us 424 00:15:00,360 --> 00:15:02,429 helpful information about the 425 00:15:02,430 --> 00:15:04,679 works that we're using, 426 00:15:04,680 --> 00:15:06,929 to allow, as an example, a word 427 00:15:06,930 --> 00:15:08,999 processor,
when you're inserting an image, 428 00:15:09,000 --> 00:15:11,309 to automatically tell you that this image 429 00:15:11,310 --> 00:15:12,869 is from this particular author: 430 00:15:12,870 --> 00:15:14,339 would you like me to put an automatic 431 00:15:14,340 --> 00:15:17,189 attribution to that author in there? 432 00:15:17,190 --> 00:15:19,319 That would be helpful, but we can only do 433 00:15:19,320 --> 00:15:21,019 that if we have actionable metadata. 434 00:15:22,590 --> 00:15:24,749 So fortunately, we are 435 00:15:24,750 --> 00:15:26,669 funded by the Shuttleworth Foundation 436 00:15:28,110 --> 00:15:29,909 for a period of two years, which is now 437 00:15:29,910 --> 00:15:30,990 just coming to a close, 438 00:15:33,510 --> 00:15:35,879 because they were interested, as I was, 439 00:15:35,880 --> 00:15:38,069 to see what would happen 440 00:15:38,070 --> 00:15:40,469 if we practically started putting our ideas 441 00:15:40,470 --> 00:15:42,719 into practice, what would happen 442 00:15:42,720 --> 00:15:44,849 if we started implementing systems 443 00:15:44,850 --> 00:15:46,949 that supported retention 444 00:15:46,950 --> 00:15:49,529 of metadata for digital works. 445 00:15:49,530 --> 00:15:51,509 Where would that lead us and where would 446 00:15:51,510 --> 00:15:53,729 the problems occur along the way? 447 00:15:53,730 --> 00:15:55,259 And we've learned a lot since we started 448 00:15:55,260 --> 00:15:56,129 working on this. 449 00:15:56,130 --> 00:15:58,259 So for the remainder 450 00:15:58,260 --> 00:16:00,419 of this presentation, I'm going to 451 00:16:00,420 --> 00:16:02,609 take one small step back and 452 00:16:02,610 --> 00:16:04,199 I'm going to talk a little bit about a 453 00:16:04,200 --> 00:16:05,969 retrospective, to talk about where we came 454 00:16:05,970 --> 00:16:07,739 from and what we did in the process
455 00:16:07,740 --> 00:16:09,989 up to now. I'm going to mention where 456 00:16:09,990 --> 00:16:12,269 we are now, what we need to do next. 457 00:16:12,270 --> 00:16:13,409 And then I'm going to come back at the 458 00:16:13,410 --> 00:16:15,599 end to talk about what 459 00:16:15,600 --> 00:16:17,399 that actually means for copyright, 460 00:16:17,400 --> 00:16:18,899 because you remember that's part of the 461 00:16:18,900 --> 00:16:20,999 title: turning copyright upside down. 462 00:16:21,000 --> 00:16:22,079 And I hope that I will live up to that 463 00:16:22,080 --> 00:16:23,080 promise. 464 00:16:24,450 --> 00:16:26,699 This is an image 465 00:16:26,700 --> 00:16:28,589 from one of the first white papers we 466 00:16:28,590 --> 00:16:29,909 produced. 467 00:16:29,910 --> 00:16:32,129 Now, this shows you 468 00:16:32,130 --> 00:16:33,749 the different standards that are 469 00:16:33,750 --> 00:16:36,509 available to convey 470 00:16:36,510 --> 00:16:38,459 information about works. 471 00:16:38,460 --> 00:16:40,229 So these are all metadata standards, at 472 00:16:40,230 --> 00:16:41,419 different levels. 473 00:16:41,420 --> 00:16:43,139 It can be difficult to read from the 474 00:16:43,140 --> 00:16:45,239 back. But we've got the EXIF standard, 475 00:16:45,240 --> 00:16:47,309 which is a metadata representation, but 476 00:16:47,310 --> 00:16:49,769 it also has information about 477 00:16:49,770 --> 00:16:51,269 the work itself, like the author and the 478 00:16:51,270 --> 00:16:54,029 license, that fits in there. 479 00:16:54,030 --> 00:16:56,639 IPTC is a similar standard to EXIF, 480 00:16:56,640 --> 00:16:57,869 but it's created by the International 481 00:16:57,870 --> 00:16:59,489 Press Telecommunications Council 482 00:16:59,490 --> 00:17:01,509 specifically for images. We've got
483 00:17:01,510 --> 00:17:03,569 XMP, which came out of Adobe, 484 00:17:03,570 --> 00:17:05,368 as well as Dublin Core, 485 00:17:06,540 --> 00:17:08,819 ODRL, which specifies the licenses, 486 00:17:10,859 --> 00:17:13,649 PROV, which is a provenance 487 00:17:13,650 --> 00:17:16,439 standard, and all these other standards. 488 00:17:16,440 --> 00:17:18,269 There's actually a bunch more. 489 00:17:18,270 --> 00:17:20,068 You'd be surprised what you find when you 490 00:17:20,069 --> 00:17:21,959 start looking at this. 491 00:17:21,960 --> 00:17:23,549 It seems that everyone who has been thinking 492 00:17:23,550 --> 00:17:25,348 about this at any point in the past has 493 00:17:25,349 --> 00:17:27,479 decided that whatever standards 494 00:17:27,480 --> 00:17:29,190 are available are not suitable for them. 495 00:17:30,510 --> 00:17:32,789 So we figured out quite 496 00:17:32,790 --> 00:17:34,919 quickly that there are simply too many 497 00:17:34,920 --> 00:17:35,909 standards. 498 00:17:35,910 --> 00:17:38,399 There's no way that we can make this work 499 00:17:38,400 --> 00:17:40,979 if we have, you know, even 500 00:17:40,980 --> 00:17:42,539 five percent of the works using the EXIF 501 00:17:42,540 --> 00:17:44,279 standard to describe themselves, five 502 00:17:44,280 --> 00:17:46,349 percent using IPTC, and another 503 00:17:46,350 --> 00:17:48,419 five percent using something else. 504 00:17:48,420 --> 00:17:49,559 It's going to be a nightmare to 505 00:17:49,560 --> 00:17:51,599 actually try to implement that. 506 00:17:51,600 --> 00:17:53,909 And each of them doesn't really see enough 507 00:17:53,910 --> 00:17:54,910 use either. 508 00:17:55,590 --> 00:17:57,899 Even EXIF, which is probably the most used 509 00:17:57,900 --> 00:17:59,369 standard for images to convey 510 00:17:59,370 --> 00:18:01,529 information, doesn't 511 00:18:01,530 --> 00:18:02,759 really have enough use.
512 00:18:02,760 --> 00:18:05,009 It has no tool support to a very large 513 00:18:05,010 --> 00:18:06,119 extent. 514 00:18:06,120 --> 00:18:09,149 Load something into a photo editor, 515 00:18:09,150 --> 00:18:10,589 change it around and save it. 516 00:18:10,590 --> 00:18:12,359 And that information is very often just 517 00:18:12,360 --> 00:18:14,279 lost because the tools don't actually 518 00:18:14,280 --> 00:18:16,349 support retention of metadata 519 00:18:16,350 --> 00:18:17,350 or passing it along. 520 00:18:18,570 --> 00:18:21,119 There's an Embedded Metadata Manifesto 521 00:18:21,120 --> 00:18:22,739 that came out of the International Press 522 00:18:22,740 --> 00:18:23,969 Telecommunications Council. 523 00:18:25,050 --> 00:18:27,599 They did a study of social 524 00:18:27,600 --> 00:18:30,179 media platforms and 525 00:18:30,180 --> 00:18:31,679 it was fairly easy. I mean, they just took 526 00:18:31,680 --> 00:18:34,289 an image with EXIF and IPTC metadata 527 00:18:34,290 --> 00:18:35,909 embedded within that image. 528 00:18:35,910 --> 00:18:38,309 They uploaded it to a social media platform 529 00:18:38,310 --> 00:18:40,079 and then downloaded it again and they saw 530 00:18:40,080 --> 00:18:41,609 what happened to the metadata. 531 00:18:41,610 --> 00:18:43,679 And lo and behold, in almost 532 00:18:43,680 --> 00:18:45,359 80 percent of the cases, the metadata was 533 00:18:45,360 --> 00:18:46,360 just lost. 534 00:18:47,910 --> 00:18:48,910 Flickr, 535 00:18:50,190 --> 00:18:52,349 500px, Twitter, 536 00:18:52,350 --> 00:18:54,329 Facebook, probably one of the worst of 537 00:18:54,330 --> 00:18:55,259 them. 538 00:18:55,260 --> 00:18:56,539 They just ignored metadata. 539 00:18:56,540 --> 00:18:57,540 They took it away. 540 00:18:58,680 --> 00:19:00,029 Google was one of the better ones.
541 00:19:00,030 --> 00:19:02,549 They actually took some effort to retain 542 00:19:02,550 --> 00:19:04,709 at least EXIF and IPTC information, 543 00:19:04,710 --> 00:19:06,069 but some of the other information was 544 00:19:06,070 --> 00:19:07,070 still lost as well. 545 00:19:07,980 --> 00:19:08,980 So. 546 00:19:10,440 --> 00:19:12,509 Retaining metadata by hoping 547 00:19:12,510 --> 00:19:14,399 that whatever you embed within the file 548 00:19:14,400 --> 00:19:16,289 will get retained: 549 00:19:16,290 --> 00:19:17,669 it's not going to happen. 550 00:19:17,670 --> 00:19:20,699 It's a panacea at best. 551 00:19:20,700 --> 00:19:22,169 So we were sort of thinking about what 552 00:19:22,170 --> 00:19:24,299 other ways people are using 553 00:19:24,300 --> 00:19:25,979 creative works. 554 00:19:25,980 --> 00:19:28,859 And we came up with the case of copy paste, 555 00:19:28,860 --> 00:19:30,689 which is the very simple procedure of 556 00:19:30,690 --> 00:19:32,249 someone finding an image online that they 557 00:19:32,250 --> 00:19:34,319 like, clicking copy on it, and then 558 00:19:34,320 --> 00:19:36,449 going to, you know, like a presentation 559 00:19:36,450 --> 00:19:38,499 editor and clicking paste. 560 00:19:38,500 --> 00:19:40,469 And we were sort of thinking about how can 561 00:19:40,470 --> 00:19:42,659 we make the metadata 562 00:19:42,660 --> 00:19:44,759 of that image, the information that 563 00:19:44,760 --> 00:19:47,639 we need in order to attribute accurately, 564 00:19:47,640 --> 00:19:49,670 be carried over in that operation. 565 00:19:51,070 --> 00:19:53,619 And now I'm going to slide slightly into 566 00:19:53,620 --> 00:19:55,689 the technicalities of how we did that, 567 00:19:55,690 --> 00:19:57,579 but I hope you'll follow along anyway. 568 00:19:57,580 --> 00:19:58,839 It's not too technical indeed.
569 00:20:00,130 --> 00:20:02,529 So the first thing we did was 570 00:20:02,530 --> 00:20:04,719 we simply split 571 00:20:04,720 --> 00:20:06,279 the clipboard in two, essentially. 572 00:20:07,300 --> 00:20:08,889 On the clipboard by default, 573 00:20:08,890 --> 00:20:10,689 if you copy an image, you might place an 574 00:20:10,690 --> 00:20:12,759 image/jpeg resource 575 00:20:12,760 --> 00:20:15,099 available for the recipient application. 576 00:20:15,100 --> 00:20:17,409 And it just looks and says, oh, here's an 577 00:20:17,410 --> 00:20:19,539 image, grab that one. 578 00:20:19,540 --> 00:20:22,239 What we did when someone clicked copy 579 00:20:22,240 --> 00:20:24,339 was that we put not only the 580 00:20:24,340 --> 00:20:26,589 JPEG image on the clipboard, but we also 581 00:20:26,590 --> 00:20:28,689 put an RDF fragment 582 00:20:28,690 --> 00:20:30,519 containing the metadata as 583 00:20:30,520 --> 00:20:32,139 machine-readable metadata. 584 00:20:32,140 --> 00:20:33,999 And then it would be up to the 585 00:20:34,000 --> 00:20:35,349 receiving application, 586 00:20:35,350 --> 00:20:37,419 once you click paste, to say, 587 00:20:37,420 --> 00:20:39,549 you know, I can get either the image 588 00:20:39,550 --> 00:20:41,559 itself, I can get the metadata, or I'll 589 00:20:41,560 --> 00:20:42,560 get both. 590 00:20:43,910 --> 00:20:45,200 So that was a first attempt. 591 00:20:47,320 --> 00:20:49,419 Later, we changed that completely 592 00:20:49,420 --> 00:20:51,729 again. We 593 00:20:51,730 --> 00:20:53,979 realized there are a bunch of issues with this; 594 00:20:53,980 --> 00:20:55,629 I'll get back to them. 595 00:20:55,630 --> 00:20:57,460 And in our more recent
596 00:20:58,530 --> 00:21:00,869 prototypes that we built, instead 597 00:21:00,870 --> 00:21:02,939 of putting the image on the clipboard, we 598 00:21:02,940 --> 00:21:05,009 actually put an HTML fragment on 599 00:21:05,010 --> 00:21:07,439 the clipboard, which has RDF 600 00:21:07,440 --> 00:21:09,779 metadata embedded within it. 601 00:21:09,780 --> 00:21:11,279 So you can see, for instance, the title 602 00:21:11,280 --> 00:21:13,349 here and the license together 603 00:21:13,350 --> 00:21:15,059 with the image and the source of it. 604 00:21:16,330 --> 00:21:18,449 Now, we implemented 605 00:21:18,450 --> 00:21:20,579 this, or variations of this, in 606 00:21:20,580 --> 00:21:21,509 quite a few tools: 607 00:21:21,510 --> 00:21:23,909 in GTK, GIMP, Inkscape, 608 00:21:23,910 --> 00:21:25,409 LibreOffice, Aloha Editor, 609 00:21:25,410 --> 00:21:27,209 MediaGoblin. And I'm quite proud of it, and I'm 610 00:21:27,210 --> 00:21:29,459 quite happy that we were able to 611 00:21:29,460 --> 00:21:31,529 bring this sort of copy paste 612 00:21:31,530 --> 00:21:33,689 scenario to a close, to the point 613 00:21:33,690 --> 00:21:35,639 where we could find an image online, 614 00:21:35,640 --> 00:21:37,769 we could find it on Flickr, we'd click 615 00:21:37,770 --> 00:21:40,589 copy on it, we'd go into 616 00:21:40,590 --> 00:21:42,749 LibreOffice, we'd click paste, and it would 617 00:21:42,750 --> 00:21:44,039 insert the image together with the 618 00:21:44,040 --> 00:21:46,139 attribution. And then if you copy again 619 00:21:46,140 --> 00:21:47,939 from there and go into a web-based 620 00:21:47,940 --> 00:21:49,919 editor and click paste, the attribution 621 00:21:49,920 --> 00:21:50,940 carries with it as well.
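A minimal sketch of the kind of HTML fragment described above, with the title, license and source carried as RDFa attributes next to the image. The function name, the choice of the Dublin Core and Creative Commons vocabularies, and the example values are assumptions for illustration; the talk does not show the exact markup the prototypes used.

```python
from html import escape


def attribution_fragment(src, title, author, license_url):
    """Build an HTML snippet holding an image plus RDFa attribution metadata.

    Hypothetical helper: the vocabularies (dc:, cc:) are an assumption,
    not necessarily what the prototypes in the talk emitted.
    """
    src_attr = escape(src, quote=True)
    return (
        f'<div about="{src_attr}" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/" '
        'xmlns:cc="http://creativecommons.org/ns#">'
        f'<img src="{src_attr}" alt="{escape(title, quote=True)}"/>'
        f'<span property="dc:title">{escape(title)}</span> by '
        f'<span property="cc:attributionName">{escape(author)}</span> '
        f'(<a rel="license" href="{escape(license_url, quote=True)}">license</a>)'
        "</div>"
    )


if __name__ == "__main__":
    # Example values are made up for the sketch.
    print(attribution_fragment(
        "http://example.org/alexanderplatz.jpg",
        "Alexanderplatz",
        "Unknown author",
        "http://creativecommons.org/publicdomain/mark/1.0/",
    ))
```

A receiving application that understands RDFa can read the title, author and license back out of the pasted markup, while one that does not still gets a visible image with a caption.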
622 00:21:52,020 --> 00:21:54,449 Now the problem we had when implementing 623 00:21:54,450 --> 00:21:56,189 all of this is that in most cases, 624 00:21:56,190 --> 00:21:57,789 whenever you're talking about copy paste, 625 00:21:57,790 --> 00:21:58,799 that's an operation 626 00:21:58,800 --> 00:22:00,779 that involves changing the core of the 627 00:22:00,780 --> 00:22:02,389 applications. 628 00:22:02,390 --> 00:22:03,989 It's not possible to do this in the 629 00:22:03,990 --> 00:22:06,059 general case with just an add-on or an 630 00:22:06,060 --> 00:22:07,679 extension to a program. 631 00:22:07,680 --> 00:22:09,059 You need to actually change the core. 632 00:22:09,060 --> 00:22:11,459 Or alternatively, you need to implement 633 00:22:11,460 --> 00:22:13,749 your own copy and paste functions, 634 00:22:13,750 --> 00:22:16,859 which obviously very quickly gets messy. 635 00:22:16,860 --> 00:22:18,689 You also have the UI visibility issue, 636 00:22:18,690 --> 00:22:20,249 in that most applications don't really 637 00:22:20,250 --> 00:22:21,779 show the metadata. 638 00:22:21,780 --> 00:22:23,609 They don't care about it, so they hide it 639 00:22:23,610 --> 00:22:25,919 away. And as well, we realized 640 00:22:25,920 --> 00:22:27,419 that there are significant clipboard 641 00:22:27,420 --> 00:22:29,579 differences. What worked for us on an 642 00:22:29,580 --> 00:22:30,930 X-based Linux system 643 00:22:32,490 --> 00:22:34,229 did not work on Windows and did not work 644 00:22:34,230 --> 00:22:36,299 on Mac OS X, so we were kind 645 00:22:36,300 --> 00:22:38,069 of stuck on that path. 646 00:22:38,070 --> 00:22:40,139 So that's why we went back and we did the 647 00:22:40,140 --> 00:22:41,999 HTML approach as well, because that 648 00:22:42,000 --> 00:22:43,410 works on all platforms.
649 00:22:44,840 --> 00:22:46,939 So we did copy paste. 650 00:22:46,940 --> 00:22:49,249 We even got to the point where we 651 00:22:49,250 --> 00:22:51,619 got LibreOffice Impress 652 00:22:51,620 --> 00:22:53,749 to accept images that you paste 653 00:22:53,750 --> 00:22:55,759 into it, and you could paste as many as 654 00:22:55,760 --> 00:22:57,439 you want and you could move them around. 655 00:22:57,440 --> 00:22:59,989 You could remove them or add new images. 656 00:22:59,990 --> 00:23:01,429 And then at the end of the presentation, 657 00:23:01,430 --> 00:23:03,149 you would ask it to insert a new slide and 658 00:23:03,150 --> 00:23:04,429 then insert credits. 659 00:23:04,430 --> 00:23:05,839 And it would give you a list of all the 660 00:23:05,840 --> 00:23:07,039 images that you used in your 661 00:23:07,040 --> 00:23:08,040 presentation. 662 00:23:09,770 --> 00:23:11,719 All of that code is up on our GitHub, so 663 00:23:11,720 --> 00:23:13,849 please feel free to check that out 664 00:23:13,850 --> 00:23:14,850 if you want. 665 00:23:15,650 --> 00:23:18,649 Now, doing this, however, 666 00:23:18,650 --> 00:23:20,240 is a very massive effort 667 00:23:21,830 --> 00:23:23,869 because it involves changing every single 668 00:23:23,870 --> 00:23:25,099 application that we use. 669 00:23:26,550 --> 00:23:28,649 And that's quite a few. And 670 00:23:28,650 --> 00:23:31,169 it becomes very application-specific: 671 00:23:31,170 --> 00:23:32,519 whenever you want to do something for 672 00:23:32,520 --> 00:23:34,079 LibreOffice, it was different than doing 673 00:23:34,080 --> 00:23:36,089 it for Aloha Editor. Even if you can 674 00:23:36,090 --> 00:23:38,249 abstract parts of it away and make use 675 00:23:38,250 --> 00:23:40,619 of some common libraries, 676 00:23:40,620 --> 00:23:42,839 it was still quite a substantial 677 00:23:42,840 --> 00:23:44,459 effort to actually get this working at 678 00:23:44,460 --> 00:23:45,460 all.
679 00:23:45,900 --> 00:23:47,669 So we were sort of thinking, what would be 680 00:23:47,670 --> 00:23:49,289 the Unix way of doing this? 681 00:23:53,110 --> 00:23:55,719 So if the problem is to retain 682 00:23:55,720 --> 00:23:57,939 and manage metadata, 683 00:23:57,940 --> 00:24:00,369 why don't we solve that particular 684 00:24:00,370 --> 00:24:02,439 problem? Let us not solve the 685 00:24:02,440 --> 00:24:04,509 issue of making this work 686 00:24:04,510 --> 00:24:06,519 in an application, but let's solve the 687 00:24:06,520 --> 00:24:09,009 simple problem of retaining and managing 688 00:24:09,010 --> 00:24:10,010 metadata. 689 00:24:12,340 --> 00:24:13,959 So we started working on what became 690 00:24:13,960 --> 00:24:14,960 known as Elog.io. 691 00:24:16,370 --> 00:24:18,589 Elog.io is a distributed 692 00:24:18,590 --> 00:24:20,549 catalog of creative works. That's a 693 00:24:20,550 --> 00:24:22,699 glorified term; what we're 694 00:24:22,700 --> 00:24:25,069 really talking about, honestly, is 695 00:24:25,070 --> 00:24:26,929 a metadata database. 696 00:24:26,930 --> 00:24:29,179 It's a database that is specifically 697 00:24:29,180 --> 00:24:31,549 crafted to hold information 698 00:24:31,550 --> 00:24:32,900 about creative works. 699 00:24:34,100 --> 00:24:36,469 And it can look like this. 700 00:24:36,470 --> 00:24:37,470 You'll get the 701 00:24:38,920 --> 00:24:40,839 identifier of a work, which in this 702 00:24:40,840 --> 00:24:42,459 case points to our catalog and the 703 00:24:42,460 --> 00:24:44,799 identifier in it, and you will get a 704 00:24:44,800 --> 00:24:47,019 JSON structure in this case back, 705 00:24:47,020 --> 00:24:49,269 which gives you a locator, for instance, 706 00:24:49,270 --> 00:24:51,219 saying that this image is, well, in this 707 00:24:51,220 --> 00:24:53,529 case Alexanderplatz in Berlin.
708 00:24:53,530 --> 00:24:55,209 It has a block hash, which I'll show in 709 00:24:55,210 --> 00:24:56,829 a little bit, and it has a particular 710 00:24:56,830 --> 00:24:58,389 license. In this case, it's not a 711 00:24:58,390 --> 00:25:00,189 license in itself, it's just a public 712 00:25:00,190 --> 00:25:03,279 domain mark. Now, 713 00:25:03,280 --> 00:25:05,589 Elog.io uses W3C Media 714 00:25:05,590 --> 00:25:07,689 Annotations as its way of recording 715 00:25:07,690 --> 00:25:10,089 information about works, which, 716 00:25:10,090 --> 00:25:11,769 you know, is a fair enough metadata 717 00:25:11,770 --> 00:25:13,509 standard that most other metadata 718 00:25:13,510 --> 00:25:15,669 standards can be mapped into, like 719 00:25:15,670 --> 00:25:18,069 EXIF and IPTC, and it provides 720 00:25:18,070 --> 00:25:19,119 an API. 721 00:25:19,120 --> 00:25:21,549 So for any image that is part 722 00:25:21,550 --> 00:25:23,619 of this catalog, you can easily look it 723 00:25:23,620 --> 00:25:25,689 up using the URL of that image. 724 00:25:25,690 --> 00:25:27,789 Or if it's an image indeed, then 725 00:25:27,790 --> 00:25:28,959 the block hash of that image. 726 00:25:28,960 --> 00:25:31,179 And I'll explain block hash in a while. 727 00:25:31,180 --> 00:25:33,519 The way it works is that you have a work 728 00:25:33,520 --> 00:25:35,709 record which explains the 729 00:25:35,710 --> 00:25:37,299 image itself, gives you the author, gives 730 00:25:37,300 --> 00:25:39,189 its license, and then you have multiple 731 00:25:39,190 --> 00:25:41,409 media records, because we realized quite 732 00:25:41,410 --> 00:25:43,659 quickly as well that if someone 733 00:25:43,660 --> 00:25:46,029 takes an image and posts it to their own 734 00:25:46,030 --> 00:25:47,979 website, it will most likely get a 735 00:25:47,980 --> 00:25:49,299 different URL.
736 00:25:49,300 --> 00:25:51,519 So it will be the same work but will 737 00:25:51,520 --> 00:25:52,659 have a different media. 738 00:25:52,660 --> 00:25:54,189 It might have a different resolution, 739 00:25:54,190 --> 00:25:56,049 might have been changed in some way. 740 00:25:56,050 --> 00:25:57,339 So you can have multiple media 741 00:25:57,340 --> 00:25:59,140 connected to each creative work. 742 00:26:00,220 --> 00:26:02,559 And we've seeded the database 743 00:26:02,560 --> 00:26:04,899 with 22 million images from Wikimedia 744 00:26:04,900 --> 00:26:05,679 Commons. 745 00:26:05,680 --> 00:26:07,269 So essentially for any image that is part 746 00:26:07,270 --> 00:26:08,859 of the Wikimedia Commons, you can look 747 00:26:08,860 --> 00:26:10,809 that up in our database and we'll get you 748 00:26:10,810 --> 00:26:11,810 back the metadata. 749 00:26:13,350 --> 00:26:15,449 We also developed two browser plugins, 750 00:26:15,450 --> 00:26:17,669 one for Chrome, one for Firefox, that 751 00:26:17,670 --> 00:26:19,469 can interact with this API. 752 00:26:20,580 --> 00:26:22,439 And you're asking yourself, so what does 753 00:26:22,440 --> 00:26:23,519 it really do? 754 00:26:23,520 --> 00:26:25,799 Well, this is one of the things it does. 755 00:26:25,800 --> 00:26:27,939 If you're out browsing the web, you've 756 00:26:27,940 --> 00:26:29,729 got the Elog.io plugin installed in Chrome or 757 00:26:29,730 --> 00:26:30,869 Firefox, 758 00:26:30,870 --> 00:26:32,069 if you see an image you find 759 00:26:32,070 --> 00:26:33,869 interesting, you can open the Elog.io 760 00:26:33,870 --> 00:26:35,969 sidebar and you identify 761 00:26:35,970 --> 00:26:37,289 the image and query.
762 00:26:37,290 --> 00:26:39,509 And if that image is part, in this case, 763 00:26:39,510 --> 00:26:41,579 of Elog.io, meaning by extension that it is 764 00:26:41,580 --> 00:26:43,889 part of the Commons at the moment, then 765 00:26:43,890 --> 00:26:45,389 it will get you the information about 766 00:26:45,390 --> 00:26:47,279 that image. It will show you the title of 767 00:26:47,280 --> 00:26:49,529 it, who authored it, 768 00:26:49,530 --> 00:26:51,599 and give you the 769 00:26:51,600 --> 00:26:53,199 appropriate license for it, 770 00:26:53,200 --> 00:26:54,200 if that is known. 771 00:26:56,880 --> 00:26:59,339 It will even green-mark 772 00:26:59,340 --> 00:27:01,169 licenses that are free cultural licenses 773 00:27:01,170 --> 00:27:02,170 because we love them. 774 00:27:04,630 --> 00:27:07,269 Now, it also offers you the opportunity 775 00:27:07,270 --> 00:27:09,759 to copy this image as an HTML fragment, 776 00:27:09,760 --> 00:27:11,859 and you can take that image and paste 777 00:27:11,860 --> 00:27:14,049 it into LibreOffice as an example, 778 00:27:14,050 --> 00:27:16,209 and it will copy over 779 00:27:16,210 --> 00:27:17,439 not only the image, but also the 780 00:27:17,440 --> 00:27:19,719 attribution, and that works straight 781 00:27:19,720 --> 00:27:21,459 off without anything except a browser 782 00:27:21,460 --> 00:27:23,059 plugin, which is nice. 783 00:27:23,060 --> 00:27:25,269 Now, what's the catch 784 00:27:25,270 --> 00:27:26,919 with this? Well, there's obviously a catch 785 00:27:26,920 --> 00:27:28,930 to this, which is 786 00:27:30,800 --> 00:27:33,199 identifying an image 787 00:27:33,200 --> 00:27:35,680 that has been resized, as an example.
788 00:27:37,390 --> 00:27:39,579 How well we can do that depends heavily 789 00:27:39,580 --> 00:27:41,739 on the algorithm that we use to do 790 00:27:41,740 --> 00:27:44,109 that matching, and 791 00:27:44,110 --> 00:27:46,719 for Elog.io, we wanted to have an algorithm 792 00:27:46,720 --> 00:27:48,909 that was very lightweight, that didn't 793 00:27:48,910 --> 00:27:51,190 take a lot of resources, that 794 00:27:52,260 --> 00:27:54,269 could be calculated quickly within the 795 00:27:54,270 --> 00:27:56,969 browser, and that would generate 796 00:27:56,970 --> 00:27:59,189 some kind of value for an image 797 00:27:59,190 --> 00:28:01,409 that would not change even if you 798 00:28:01,410 --> 00:28:02,549 resize the image. 799 00:28:03,690 --> 00:28:05,879 And ideally it should generate as 800 00:28:05,880 --> 00:28:08,279 few false positives as possible, 801 00:28:08,280 --> 00:28:09,599 or false negatives. 802 00:28:11,040 --> 00:28:13,439 So the way that our algorithm works: I'll 803 00:28:13,440 --> 00:28:16,019 run you through the algorithm quickly, 804 00:28:16,020 --> 00:28:18,239 so that you see it before 805 00:28:18,240 --> 00:28:20,009 I talk about where it does not work. 806 00:28:21,570 --> 00:28:23,429 So this is Alexanderplatz, this is Alexanderplatz 807 00:28:23,430 --> 00:28:25,799 in the seventeen hundreds in Berlin, 808 00:28:25,800 --> 00:28:27,569 and you'll see that I've taken this image 809 00:28:27,570 --> 00:28:29,729 and we've split it into 16 by 810 00:28:29,730 --> 00:28:30,989 16 cells. 811 00:28:30,990 --> 00:28:33,389 So it's a matrix here. 16 812 00:28:33,390 --> 00:28:35,369 by 16 happens to be 256.
813 00:28:35,370 --> 00:28:36,809 So that's the number of bits that our 814 00:28:36,810 --> 00:28:38,879 hashes actually generate. 815 00:28:38,880 --> 00:28:40,739 What we do with this image after we've 816 00:28:40,740 --> 00:28:42,749 segmented it in this way, into this 817 00:28:42,750 --> 00:28:44,939 matrix, is that for each 818 00:28:44,940 --> 00:28:47,009 cell we calculate the 819 00:28:47,010 --> 00:28:49,199 sum of all the pixels within the 820 00:28:49,200 --> 00:28:51,179 cell, and we do that for all of the 821 00:28:51,180 --> 00:28:53,069 cells. So we'll get something looking 822 00:28:53,070 --> 00:28:54,809 like this, a bunch of numbers across the 823 00:28:54,810 --> 00:28:55,739 board. 824 00:28:55,740 --> 00:28:57,839 We calculate the median 825 00:28:57,840 --> 00:28:59,909 of all those numbers and then 826 00:28:59,910 --> 00:29:02,129 we go through each cell in turn and 827 00:29:02,130 --> 00:29:04,619 we see: is the value 828 00:29:04,620 --> 00:29:06,959 within that cell above or below 829 00:29:06,960 --> 00:29:08,249 the median? 830 00:29:08,250 --> 00:29:10,829 And then we assign either a zero or a one 831 00:29:10,830 --> 00:29:11,759 to that one. 832 00:29:11,760 --> 00:29:13,589 So then we get to a hash looking like 833 00:29:13,590 --> 00:29:15,239 that, and then we just wrap that up and 834 00:29:15,240 --> 00:29:16,979 pack it as a hexadecimal number. 835 00:29:16,980 --> 00:29:18,269 And that's our hash. 836 00:29:18,270 --> 00:29:20,129 OK. So it's very simple. 837 00:29:20,130 --> 00:29:21,629 It's very efficient. 838 00:29:21,630 --> 00:29:23,130 Takes almost no time to compute. 839 00:29:24,270 --> 00:29:25,649 And you end up with hashes looking like 840 00:29:25,650 --> 00:29:28,169 this. So the first one is a hash 841 00:29:28,170 --> 00:29:30,299 that I made of Alexanderplatz in a six 842 00:29:30,300 --> 00:29:31,649 hundred forty eight times three hundred 843 00:29:31,650 --> 00:29:35,099 twenty six pixel resolution.
844 00:29:35,100 --> 00:29:37,499 And the second one is the same image 845 00:29:37,500 --> 00:29:39,749 but rescaled to two hundred by one hundred 846 00:29:39,750 --> 00:29:40,859 and two pixels. 847 00:29:40,860 --> 00:29:43,229 So about one third the size. 848 00:29:43,230 --> 00:29:45,419 And you'll see that they do indeed look 849 00:29:45,420 --> 00:29:46,319 similar. 850 00:29:46,320 --> 00:29:48,419 They're not identical 851 00:29:48,420 --> 00:29:49,649 because obviously some things might 852 00:29:49,650 --> 00:29:51,479 change when you're rescaling the JPEG. 853 00:29:51,480 --> 00:29:53,399 They're not identical, but they don't 854 00:29:53,400 --> 00:29:54,689 differ that much either. 855 00:29:54,690 --> 00:29:56,849 They differ, if you expand this into the 856 00:29:56,850 --> 00:29:59,189 bit field that we have, they differ 857 00:29:59,190 --> 00:30:00,299 in six positions. 858 00:30:00,300 --> 00:30:02,669 So six bits are the difference 859 00:30:02,670 --> 00:30:04,829 between this larger 860 00:30:04,830 --> 00:30:06,029 size and the smaller size 861 00:30:06,030 --> 00:30:07,679 when we apply the block hash algorithm. 862 00:30:09,240 --> 00:30:11,399 And we can from experience 863 00:30:11,400 --> 00:30:13,649 say that if something is 864 00:30:13,650 --> 00:30:15,779 six bits or lower, or ten bits 865 00:30:15,780 --> 00:30:17,849 or lower, then we can be fairly 866 00:30:17,850 --> 00:30:19,319 confident that we're talking about the 867 00:30:19,320 --> 00:30:22,109 same image, even if it has been resized. 868 00:30:22,110 --> 00:30:24,659 Unfortunately, however, reality 869 00:30:24,660 --> 00:30:26,369 comes and bites you in the ass. 870 00:30:26,370 --> 00:30:28,919 So this is 871 00:30:28,920 --> 00:30:32,129 my son; this is in Greece a few days ago, 872 00:30:32,130 --> 00:30:34,559 and it represents something that 873 00:30:34,560 --> 00:30:35,499 people love doing.
874 00:30:35,500 --> 00:30:37,829 They take pictures of, in this case, their 875 00:30:37,830 --> 00:30:39,989 kids, they take pictures of 876 00:30:39,990 --> 00:30:41,099 skylines. 877 00:30:41,100 --> 00:30:43,469 And all those pictures have a common 878 00:30:43,470 --> 00:30:46,019 denominator, that they have a very 879 00:30:46,020 --> 00:30:48,299 bright upper half, usually 880 00:30:48,300 --> 00:30:50,489 white or blue sky, and then they have 881 00:30:50,490 --> 00:30:52,679 a very dark contrasting 882 00:30:52,680 --> 00:30:54,269 lower part. 883 00:30:54,270 --> 00:30:56,039 And what happens if you have a very 884 00:30:56,040 --> 00:30:58,379 bright part on top and a very dark 885 00:30:58,380 --> 00:30:59,919 part at the bottom? 886 00:30:59,920 --> 00:31:00,989 Well, what happens when you do the 887 00:31:00,990 --> 00:31:03,479 numbers? You end up with an upper half 888 00:31:03,480 --> 00:31:05,549 with very low numbers and the 889 00:31:05,550 --> 00:31:08,099 lower half with very high numbers. 890 00:31:08,100 --> 00:31:10,019 And if you take the median, it will be 891 00:31:10,020 --> 00:31:11,669 somewhere in the middle of this. 892 00:31:11,670 --> 00:31:13,799 But when you then check if something is 893 00:31:13,800 --> 00:31:16,019 higher or lower than the median, you end 894 00:31:16,020 --> 00:31:18,479 up with a hash that is essentially 895 00:31:18,480 --> 00:31:20,639 a bunch of zeros followed by a bunch 896 00:31:20,640 --> 00:31:22,829 of ones, because the contrast is so 897 00:31:22,830 --> 00:31:24,359 great between the upper half and the lower 898 00:31:24,360 --> 00:31:26,969 half that all the differences 899 00:31:26,970 --> 00:31:29,099 within those regions are simply lost. 900 00:31:29,100 --> 00:31:30,300 They're overpowered by this.
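The steps described so far (a 16 by 16 grid, per-cell sums, a single median, one bit per cell, hexadecimal packing, and comparison by counting differing bits) can be sketched as follows. This is a minimal illustration and not the speaker's implementation: it assumes a grayscale image given as a list of rows whose width and height divide evenly by the grid size, it assumes "above the median" maps to 1 and everything else to 0, and the names `blockhash` and `hamming` are chosen here for the sketch.

```python
from statistics import median


def blockhash(pixels, grid=16):
    """Hash a grayscale image (list of rows of ints) into grid*grid bits.

    Assumption: width and height are multiples of `grid`.
    """
    height, width = len(pixels), len(pixels[0])
    cell_h, cell_w = height // grid, width // grid

    # Sum of all pixel values inside each cell, row by row.
    sums = []
    for row in range(grid):
        for col in range(grid):
            total = 0
            for y in range(row * cell_h, (row + 1) * cell_h):
                for x in range(col * cell_w, (col + 1) * cell_w):
                    total += pixels[y][x]
            sums.append(total)

    # One median for the whole image; bit polarity is an assumption.
    med = median(sums)
    bits = "".join("1" if s > med else "0" for s in sums)

    # Pack the bit string as a fixed-width hexadecimal number.
    return f"{int(bits, 2):0{grid * grid // 4}x}"


def hamming(hash_a, hash_b):
    """Number of bit positions in which two hex hashes differ."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


if __name__ == "__main__":
    # A synthetic "skyline": bright top half, dark bottom half.
    sky = [[200] * 32 for _ in range(16)] + [[20] * 32 for _ in range(16)]
    textured = [row[:] for row in sky]
    textured[0][0] += 5  # a small detail in the sky

    print(blockhash(sky))
    print(hamming(blockhash(sky), blockhash(textured)))  # 0: the detail is lost
```

With this synthetic image every cell in one half lands on the same side of the median, so the hash is one run of identical bits followed by another, and a small change inside the bright half does not flip any bit. Two hashes would be treated as the same image when `hamming` is at or below a chosen threshold, such as the six to ten bits mentioned in the talk.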
901 00:31:31,680 --> 00:31:33,869 So this was the original block 902 00:31:33,870 --> 00:31:35,549 hash algorithm, the way it worked when it 903 00:31:35,550 --> 00:31:37,139 was implemented straight off from the 904 00:31:37,140 --> 00:31:38,699 research literature. 905 00:31:38,700 --> 00:31:40,859 We changed this algorithm and we changed 906 00:31:40,860 --> 00:31:43,089 it in a very easy way. 907 00:31:43,090 --> 00:31:45,479 We simply split it up, 908 00:31:45,480 --> 00:31:47,969 split this field up into four 909 00:31:47,970 --> 00:31:50,129 distinct horizontal blocks. 910 00:31:50,130 --> 00:31:52,919 And we do the median calculation 911 00:31:52,920 --> 00:31:55,079 not for the entire image, but for each 912 00:31:55,080 --> 00:31:57,179 block itself, which means that 913 00:31:57,180 --> 00:31:59,969 even if the first block is only blue sky, 914 00:31:59,970 --> 00:32:01,619 even a blue sky will have slight 915 00:32:01,620 --> 00:32:03,249 variations in it. 916 00:32:03,250 --> 00:32:06,329 And if we calculate the median on that 917 00:32:06,330 --> 00:32:09,149 and then do the calculation, 918 00:32:09,150 --> 00:32:12,029 then we'll get a lot more contrast, 919 00:32:12,030 --> 00:32:14,249 or get a lot more detail out of it. 920 00:32:14,250 --> 00:32:16,349 So that's the way the algorithm works 921 00:32:16,350 --> 00:32:18,539 right now, and it gives us hashes like that. 922 00:32:18,540 --> 00:32:21,269 So it gives us much more detail 923 00:32:21,270 --> 00:32:23,010 for essentially the same images. 924 00:32:24,390 --> 00:32:25,619 Now we're still 925 00:32:26,900 --> 00:32:29,869 getting collisions; that's unavoidable. 926 00:32:29,870 --> 00:32:31,549 We're getting collisions in about one 927 00:32:31,550 --> 00:32:34,099 percent of cases. We 928 00:32:34,100 --> 00:32:35,809 got about 100,000 images from the 929 00:32:35,810 --> 00:32:38,299 Internet.
We ran our 930 00:32:38,300 --> 00:32:40,489 algorithm on them and we compared 931 00:32:40,490 --> 00:32:42,649 them to each other in a crosswise manner. 932 00:32:42,650 --> 00:32:44,239 So there's one percent collisions. 933 00:32:44,240 --> 00:32:46,549 Collision here means that two images 934 00:32:46,550 --> 00:32:48,649 or even more images generate the same 935 00:32:48,650 --> 00:32:50,569 hash, an identical hash. 936 00:32:50,570 --> 00:32:52,939 Now, however, in 84 percent 937 00:32:52,940 --> 00:32:55,129 of those collisions, we are talking 938 00:32:55,130 --> 00:32:57,559 about two to three images generating 939 00:32:57,560 --> 00:32:59,779 the same hash. So we figured that this 940 00:32:59,780 --> 00:33:01,429 is fairly OK. 941 00:33:01,430 --> 00:33:03,679 This is also one hundred thousand random 942 00:33:03,680 --> 00:33:06,139 images, which means that 943 00:33:06,140 --> 00:33:08,209 clip art, maps, and 944 00:33:08,210 --> 00:33:10,189 various other things, which maybe differ 945 00:33:10,190 --> 00:33:11,419 in very small details, 946 00:33:11,420 --> 00:33:12,980 they're also counted as a collision here. 947 00:33:14,660 --> 00:33:16,189 We also get a number of false positives, 948 00:33:16,190 --> 00:33:17,390 however. These are 949 00:33:18,600 --> 00:33:20,819 images that are recognized as being 950 00:33:20,820 --> 00:33:24,029 similar without actually being similar, 951 00:33:24,030 --> 00:33:26,429 because the algorithm generates close 952 00:33:26,430 --> 00:33:28,529 matches for them. If we set the 953 00:33:28,530 --> 00:33:30,749 maximum distance and allow up to 954 00:33:30,750 --> 00:33:33,029 10 bits variation between two images 955 00:33:33,030 --> 00:33:34,859 to classify them as the same, 956 00:33:34,860 --> 00:33:37,409 then we get about one point eight percent 957 00:33:37,410 --> 00:33:38,699 false positives.
958 00:33:38,700 --> 00:33:40,349 We can get that down quite substantially 959 00:33:40,350 --> 00:33:41,789 by lowering the distance that we 960 00:33:41,790 --> 00:33:43,889 allow. If you say three or 961 00:33:43,890 --> 00:33:46,139 five, we're down to less than zero point 962 00:33:46,140 --> 00:33:48,239 two percent. So 963 00:33:48,240 --> 00:33:50,879 somewhere there, we feel that we're 964 00:33:50,880 --> 00:33:51,880 doing quite well. 965 00:33:52,810 --> 00:33:55,349 Now, what about derivative works? 966 00:33:55,350 --> 00:33:56,759 What about clip art? 967 00:33:56,760 --> 00:33:57,960 Well, in one word, forget it. 968 00:33:59,400 --> 00:34:01,109 Derivative work, meaning when you take an 969 00:34:01,110 --> 00:34:02,789 image, you add a border to it as an 970 00:34:02,790 --> 00:34:04,849 example. You take an image, you 971 00:34:04,850 --> 00:34:05,880 crop it in some way. 972 00:34:07,080 --> 00:34:09,689 Now, if you can imagine the algorithm, 973 00:34:09,690 --> 00:34:11,908 you'll quickly recognize that if you crop 974 00:34:11,909 --> 00:34:14,009 an image, it will generate a very 975 00:34:14,010 --> 00:34:17,009 different hash from it. 976 00:34:17,010 --> 00:34:19,928 So we set the bar and we set 977 00:34:19,929 --> 00:34:21,988 the limit for ourselves, saying 978 00:34:21,989 --> 00:34:24,119 that we will do 979 00:34:24,120 --> 00:34:26,789 our best to match 980 00:34:26,790 --> 00:34:28,888 verbatim copies of a work. You 981 00:34:28,889 --> 00:34:31,468 can resize it as much as you want. 982 00:34:31,469 --> 00:34:33,539 You can change the format from PNG to 983 00:34:33,540 --> 00:34:35,759 JPEG to GIF and back again if you want, 984 00:34:35,760 --> 00:34:38,009 and we'll do our best to match that.
985 00:34:38,010 --> 00:34:40,049 But if you make a derivative work, if you 986 00:34:40,050 --> 00:34:41,759 add a border or you try to change around 987 00:34:41,760 --> 00:34:43,979 an image in some way, then 988 00:34:43,980 --> 00:34:44,908 all bets are off. 989 00:34:44,909 --> 00:34:46,259 We're not going to guarantee you a match 990 00:34:46,260 --> 00:34:48,959 on that one. As well, clip art 991 00:34:48,960 --> 00:34:51,299 or any other diagrams or 992 00:34:51,300 --> 00:34:53,729 graphs where you have large areas 993 00:34:53,730 --> 00:34:56,399 of white or black or some other color and 994 00:34:56,400 --> 00:34:57,839 a few lines: 995 00:34:57,840 --> 00:34:59,759 it will generally do a rather bad job 996 00:34:59,760 --> 00:35:01,289 at that as well, because, again, you have 997 00:35:01,290 --> 00:35:02,640 these high-contrast areas. 998 00:35:03,720 --> 00:35:05,669 But instead we do get something that's 999 00:35:05,670 --> 00:35:07,709 blindingly fast, with very small 1000 00:35:07,710 --> 00:35:09,599 hashes with few false positives. 1001 00:35:09,600 --> 00:35:11,699 And this is all up on blockhash.io, if 1002 00:35:11,700 --> 00:35:12,999 you feel like implementing this yourself 1003 00:35:13,000 --> 00:35:14,879 or having a look at what it does in 1004 00:35:14,880 --> 00:35:16,189 practice. 1005 00:35:16,190 --> 00:35:18,479 Now, unfortunately 1006 00:35:18,480 --> 00:35:20,819 for our case, 22 million 1007 00:35:20,820 --> 00:35:22,979 images is not nearly 1008 00:35:22,980 --> 00:35:24,929 as much as we need to actually make this 1009 00:35:24,930 --> 00:35:26,159 useful. 1010 00:35:26,160 --> 00:35:28,349 Now, 22 million images may sound like 1011 00:35:28,350 --> 00:35:30,509 a lot, but it's a very small 1012 00:35:30,510 --> 00:35:32,729 fraction of what we actually need. Now, 1013 00:35:32,730 --> 00:35:33,689 Creative Commons,
1014 00:35:33,690 --> 00:35:35,759 about a month ago released their 1015 00:35:35,760 --> 00:35:37,979 State of the Commons 1016 00:35:37,980 --> 00:35:39,989 report, saying that there are about 800 1017 00:35:39,990 --> 00:35:42,119 million images, 800 1018 00:35:42,120 --> 00:35:44,189 million works out there, which are 1019 00:35:44,190 --> 00:35:45,359 openly licensed. 1020 00:35:45,360 --> 00:35:47,159 That's the size of the Commons at the 1021 00:35:47,160 --> 00:35:48,089 moment. 1022 00:35:48,090 --> 00:35:50,219 Now, not all of those are images, but 1023 00:35:50,220 --> 00:35:51,899 a fair portion of them are. 1024 00:35:51,900 --> 00:35:53,399 I'm estimating that there's probably 1025 00:35:53,400 --> 00:35:55,229 about a half a billion images out there 1026 00:35:55,230 --> 00:35:56,429 that are openly licensed, 1027 00:35:56,430 --> 00:35:58,199 that should be part of Elog.io but which 1028 00:35:58,200 --> 00:35:59,200 are not there today. 1029 00:36:00,540 --> 00:36:02,759 Now, scaling up to half 1030 00:36:02,760 --> 00:36:05,279 a billion images, it's doable 1031 00:36:05,280 --> 00:36:06,599 in terms of databases. 1032 00:36:06,600 --> 00:36:09,299 It doesn't add as much as we would fear. 1033 00:36:09,300 --> 00:36:11,079 So we can easily do that. 1034 00:36:11,080 --> 00:36:13,439 However, we're talking about 1035 00:36:13,440 --> 00:36:15,269 searching by a perceptual hash. 1036 00:36:15,270 --> 00:36:17,849 We're talking about searching for 1037 00:36:17,850 --> 00:36:20,669 a hash value of an image where we allow 1038 00:36:20,670 --> 00:36:23,369 up to 10 bits of difference. 1039 00:36:23,370 --> 00:36:25,529 Now, if we say that we're not 1040 00:36:25,530 --> 00:36:27,000 going to allow any difference, 1041 00:36:28,580 --> 00:36:30,079 that would be a very easy search. 1042 00:36:30,080 --> 00:36:32,329 Any kind of database 1043 00:36:32,330 --> 00:36:35,269 can search for unique values.
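
[Editor's sketch, not part of the talk] The "bits of difference" between two perceptual hashes is their Hamming distance. A minimal illustration in Python follows, assuming the hashes are hex strings of equal length. The function names and the default threshold of 10 are illustrative assumptions, not the project's actual code.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the bits that differ between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 in every bit position where the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_match(hash_a: str, hash_b: str, max_distance: int = 10) -> bool:
    """Treat two hashes as the same image if at most max_distance bits differ.

    The threshold of 10 is the upper bound mentioned in the talk; lowering it
    trades fewer false positives for more missed matches.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance
```

An exact lookup corresponds to `max_distance=0`, which any database index can serve directly. Any larger threshold turns the lookup into a similarity search.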
1044 00:36:35,270 --> 00:36:36,859 That's not a problem. 1045 00:36:36,860 --> 00:36:38,659 But if you're searching for something 1046 00:36:38,660 --> 00:36:41,179 that is similar to something else, 1047 00:36:41,180 --> 00:36:42,769 that becomes a very different problem. 1048 00:36:44,570 --> 00:36:47,059 So we found, 1049 00:36:47,060 --> 00:36:48,889 again, some research to help us along our 1050 00:36:48,890 --> 00:36:49,669 way. 1051 00:36:49,670 --> 00:36:52,519 This algorithm, not surprisingly perhaps, 1052 00:36:52,520 --> 00:36:53,959 comes from Google. 1053 00:36:53,960 --> 00:36:56,569 It's called HmSearch, 1054 00:36:56,570 --> 00:36:59,209 and it partitions 1055 00:36:59,210 --> 00:37:01,579 the hashes in a way 1056 00:37:01,580 --> 00:37:03,679 that you avoid doing a 1057 00:37:03,680 --> 00:37:06,109 search of all 22 million. 1058 00:37:06,110 --> 00:37:08,549 For any hash you throw at this algorithm, 1059 00:37:08,550 --> 00:37:10,549 it will give you back maybe a few 1060 00:37:10,550 --> 00:37:12,539 thousand possible matches. 1061 00:37:12,540 --> 00:37:14,419 And then you need to sift through those 1062 00:37:14,420 --> 00:37:16,219 to figure out which are real matches and 1063 00:37:16,220 --> 00:37:17,220 which are not. 1064 00:37:18,020 --> 00:37:20,149 Again, also available on GitHub 1065 00:37:20,150 --> 00:37:21,979 in search of IoE. 1066 00:37:21,980 --> 00:37:23,749 Now, where are we going from here? 1067 00:37:23,750 --> 00:37:25,969 Well, the first thing we want to do 1068 00:37:25,970 --> 00:37:28,129 beyond scaling to half a billion 1069 00:37:28,130 --> 00:37:30,439 works, which we should do, is 1070 00:37:30,440 --> 00:37:32,089 to flip the read-write bit.
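
[Editor's sketch, not part of the talk] The partition-then-verify idea described above can be illustrated with a simplified pigeonhole index. This is not the HmSearch algorithm itself, and the class name, the 64-bit hash size, and the distance limit of 3 are assumptions chosen to keep the example small. If two hashes differ in at most k bits and each is cut into k + 1 pieces, at least one piece must be identical, so only hashes sharing a piece with the query need to be checked in full.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


class PartitionIndex:
    """Simplified pigeonhole index for Hamming-distance search (illustrative only)."""

    def __init__(self, bits: int = 64, max_distance: int = 3):
        self.max_distance = max_distance
        self.parts = max_distance + 1              # k + 1 pieces per hash
        self.part_bits = -(-bits // self.parts)    # ceiling division
        self.buckets = defaultdict(set)            # (piece index, piece value) -> hashes

    def _pieces(self, h: int):
        mask = (1 << self.part_bits) - 1
        for i in range(self.parts):
            yield i, (h >> (i * self.part_bits)) & mask

    def add(self, h: int) -> None:
        for key in self._pieces(h):
            self.buckets[key].add(h)

    def search(self, query: int) -> list:
        # Step 1: collect candidates that share at least one piece with the query.
        candidates = set()
        for key in self._pieces(query):
            candidates |= self.buckets.get(key, set())
        # Step 2: sift the candidates by computing the full distance.
        return sorted(h for h in candidates if hamming(h, query) <= self.max_distance)
```

The two steps in `search` mirror the talk: the index hands back a set of possible matches, and a full comparison decides which of them are real.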
1071 00:37:32,090 --> 00:37:34,429 Because at the moment, while Elog.io 1072 00:37:34,430 --> 00:37:36,589 and while the API have provisions 1073 00:37:36,590 --> 00:37:38,659 that make it possible or would make 1074 00:37:38,660 --> 00:37:40,039 it possible for someone to edit 1075 00:37:40,040 --> 00:37:42,259 information that's within Elog.io, 1076 00:37:42,260 --> 00:37:44,239 we haven't actually enabled that yet. 1077 00:37:44,240 --> 00:37:46,189 So far we have just taken information 1078 00:37:46,190 --> 00:37:48,439 from Wikimedia Commons and put 1079 00:37:48,440 --> 00:37:50,269 that into the database as a sort of 1080 00:37:50,270 --> 00:37:51,799 read-only repository. 1081 00:37:51,800 --> 00:37:53,819 And we rely on people updating 1082 00:37:53,820 --> 00:37:55,819 information on Wikimedia Commons and then 1083 00:37:55,820 --> 00:37:57,860 we get that information into Elog.io. 1084 00:38:01,260 --> 00:38:03,509 But flipping that bit and making it 1085 00:38:03,510 --> 00:38:05,249 read-write, that's what's going to change 1086 00:38:05,250 --> 00:38:06,250 things. 1087 00:38:06,930 --> 00:38:08,909 We also need to extend Elog.io to 1088 00:38:08,910 --> 00:38:10,829 support non-images, so any other kind of 1089 00:38:10,830 --> 00:38:12,899 creative works. Again, that scales quite 1090 00:38:12,900 --> 00:38:15,359 massively beyond a half billion even. 1091 00:38:15,360 --> 00:38:17,399 And we want to implement support for the 1092 00:38:17,400 --> 00:38:19,499 API directly in applications. 1093 00:38:19,500 --> 00:38:21,509 So, again, going back to the application 1094 00:38:21,510 --> 00:38:23,579 side and figuring out, OK, now 1095 00:38:23,580 --> 00:38:25,919 that we have solved retention and 1096 00:38:25,920 --> 00:38:28,109 editing of metadata separately, how can 1097 00:38:28,110 --> 00:38:29,729 we then make a link to the application? 1098 00:38:30,840 --> 00:38:31,840 Now.
1099 00:38:33,310 --> 00:38:35,349 How the heck does this relate to 1100 00:38:35,350 --> 00:38:36,729 copyright, as I promised from the 1101 00:38:36,730 --> 00:38:37,869 beginning? 1102 00:38:37,870 --> 00:38:40,389 Well, it's easy to think 1103 00:38:40,390 --> 00:38:43,059 of Elog.io as a copyright registry, 1104 00:38:43,060 --> 00:38:44,650 and I promise you that it is not. 1105 00:38:45,970 --> 00:38:48,459 A copyright registry is 1106 00:38:48,460 --> 00:38:50,650 something that I personally detest. 1107 00:38:51,730 --> 00:38:53,529 A copyright registry is an attempt by 1108 00:38:53,530 --> 00:38:55,749 someone to 1109 00:38:55,750 --> 00:38:57,879 provide an authoritative database 1110 00:38:57,880 --> 00:39:00,429 and authoritative information about 1111 00:39:00,430 --> 00:39:02,769 who owns certain creative 1112 00:39:02,770 --> 00:39:03,770 works. 1113 00:39:05,170 --> 00:39:07,389 Elog.io is not an authoritative source 1114 00:39:07,390 --> 00:39:08,979 of information, and Elog.io is not a 1115 00:39:08,980 --> 00:39:11,439 copyright registry. Elog.io 1116 00:39:11,440 --> 00:39:13,899 is built as a community-curated 1117 00:39:13,900 --> 00:39:14,739 repository. 1118 00:39:14,740 --> 00:39:16,569 In this case, the Wikipedia community to 1119 00:39:16,570 --> 00:39:18,669 start with, with an 1120 00:39:18,670 --> 00:39:20,439 implicit agreement of respect. 1121 00:39:21,820 --> 00:39:23,319 So this is something that we learn from 1122 00:39:23,320 --> 00:39:26,199 Wikimedia as well, that there's a reason 1123 00:39:26,200 --> 00:39:27,999 that people keep contributing information 1124 00:39:28,000 --> 00:39:30,759 to Wikipedia. There's a reason why people 1125 00:39:30,760 --> 00:39:32,859 take painstaking efforts to 1126 00:39:32,860 --> 00:39:34,929 actually keep the metadata on Wikimedia 1127 00:39:34,930 --> 00:39:37,149 Commons up to date 1128 00:39:37,150 --> 00:39:38,949 and reliable.
1129 00:39:38,950 --> 00:39:40,809 And that's because there's an implicit 1130 00:39:40,810 --> 00:39:42,849 agreement that we actually want to 1131 00:39:42,850 --> 00:39:44,079 respect the author. 1132 00:39:44,080 --> 00:39:46,209 We want to respect 1133 00:39:46,210 --> 00:39:48,249 the author enough to give accurate credit 1134 00:39:48,250 --> 00:39:49,569 where credit is due. 1135 00:39:49,570 --> 00:39:50,949 We don't want to lose that. 1136 00:39:53,270 --> 00:39:54,270 Now, 1137 00:39:58,710 --> 00:39:59,710 Elog.io 1138 00:40:00,890 --> 00:40:02,989 in this way takes a slight 1139 00:40:02,990 --> 00:40:06,049 sidestep away from 1140 00:40:06,050 --> 00:40:07,399 an initiative like Creative Commons. 1141 00:40:07,400 --> 00:40:08,989 Now, obviously, Creative Commons is the 1142 00:40:08,990 --> 00:40:11,119 licenses themselves, and Creative 1143 00:40:11,120 --> 00:40:13,879 Commons was an attempt to work within 1144 00:40:13,880 --> 00:40:16,100 the existing copyright regime to show that, 1145 00:40:17,120 --> 00:40:20,269 given the situation at the time in 2001, 1146 00:40:20,270 --> 00:40:22,789 we have copyright, you want to share: 1147 00:40:22,790 --> 00:40:24,859 we can work within this system 1148 00:40:24,860 --> 00:40:26,929 to give you the tools, the legal 1149 00:40:26,930 --> 00:40:28,790 tools that enable you to do that. 1150 00:40:31,190 --> 00:40:33,529 I believe that we are coming 1151 00:40:33,530 --> 00:40:35,719 to the end of copyright 1152 00:40:35,720 --> 00:40:37,280 as we know it right now. 1153 00:40:39,170 --> 00:40:41,539 A quite recent phrase within the software 1154 00:40:41,540 --> 00:40:43,819 community has been POSS, 1155 00:40:43,820 --> 00:40:46,159 post open source software, 1156 00:40:46,160 --> 00:40:48,769 where the guiding 1157 00:40:48,770 --> 00:40:50,780 light is essentially the phrase 1158 00:40:52,610 --> 00:40:54,590 fuck licenses, put it on GitHub 1159 00:40:56,120 --> 00:40:57,120 and.
1160 00:40:58,340 --> 00:40:59,340 Yeah. 1161 00:41:03,030 --> 00:41:04,749 And I think we're seeing the same in the 1162 00:41:04,750 --> 00:41:06,719 criticism as well. 1163 00:41:06,720 --> 00:41:09,419 Copyright is losing its importance 1164 00:41:09,420 --> 00:41:10,550 day to day. 1165 00:41:12,320 --> 00:41:14,719 And we're coming to a place 1166 00:41:14,720 --> 00:41:16,939 in time where, 1167 00:41:16,940 --> 00:41:19,310 you know, within five, ten years, 1168 00:41:20,390 --> 00:41:22,339 I'm very sure that the European 1169 00:41:22,340 --> 00:41:24,589 Parliament and other parliaments around 1170 00:41:24,590 --> 00:41:26,959 the world will take steps to make 1171 00:41:26,960 --> 00:41:29,239 additional exceptions to copyright 1172 00:41:29,240 --> 00:41:31,219 as it is today, to allow even more 1173 00:41:31,220 --> 00:41:33,649 private use as an example 1174 00:41:33,650 --> 00:41:36,079 without hindering people in 1175 00:41:36,080 --> 00:41:37,310 their day-to-day activities. 1176 00:41:39,020 --> 00:41:41,659 So copyright is changing 1177 00:41:41,660 --> 00:41:43,609 and Elog.io is one of the tools that we 1178 00:41:43,610 --> 00:41:46,099 need along the way because Elog.io 1179 00:41:46,100 --> 00:41:48,019 is post-copyright-licensing. It doesn't 1180 00:41:48,020 --> 00:41:50,329 really care about the license itself. 1181 00:41:50,330 --> 00:41:53,209 It obviously implements support for the 1182 00:41:53,210 --> 00:41:55,009 W3C media annotations standard. 1183 00:41:55,010 --> 00:41:57,799 It gives you the tools that you need if 1184 00:41:57,800 --> 00:41:59,959 you want to record information 1185 00:41:59,960 --> 00:42:00,960 about a license. 1186 00:42:01,820 --> 00:42:03,769 But the license is not terribly 1187 00:42:03,770 --> 00:42:04,729 important.
1188 00:42:04,730 --> 00:42:06,949 The important part here is who 1189 00:42:06,950 --> 00:42:08,719 actually created it, what's the provenance of 1190 00:42:08,720 --> 00:42:10,999 a particular work, what are the details about 1191 00:42:11,000 --> 00:42:12,000 that work. 1192 00:42:12,800 --> 00:42:15,409 Meaning that from our side, 1193 00:42:15,410 --> 00:42:17,509 we take very great care to respect the 1194 00:42:17,510 --> 00:42:19,459 author, but not the institutional 1195 00:42:19,460 --> 00:42:22,069 copyright, because just as 1196 00:42:22,070 --> 00:42:23,749 the friends photographers that I'm 1197 00:42:23,750 --> 00:42:26,659 talking about before yesterday saying, 1198 00:42:26,660 --> 00:42:28,969 as long as we make sure 1199 00:42:28,970 --> 00:42:30,139 to attribute accurately, 1200 00:42:31,310 --> 00:42:33,409 we're good. So we can 1201 00:42:33,410 --> 00:42:36,049 take control of the provenance 1202 00:42:36,050 --> 00:42:38,239 of creative works by using tools 1203 00:42:38,240 --> 00:42:39,829 like Elog.io. It doesn't need to be Elog.io, 1204 00:42:39,830 --> 00:42:42,139 it can be tools like it. 1205 00:42:42,140 --> 00:42:44,419 And we can show the world that we 1206 00:42:44,420 --> 00:42:47,359 care about authors. 1207 00:42:47,360 --> 00:42:48,979 We just don't care about copyright. 1208 00:42:48,980 --> 00:42:51,079 But we care enough about authors to 1209 00:42:51,080 --> 00:42:53,599 take, as a collective effort, 1210 00:42:53,600 --> 00:42:56,179 control over that information, 1211 00:42:56,180 --> 00:42:58,489 to control the provenance, to keep 1212 00:42:58,490 --> 00:43:00,829 a record of where 1213 00:43:00,830 --> 00:43:03,139 creative works come from, what happens 1214 00:43:03,140 --> 00:43:04,999 to them, and make sure that we attribute 1215 00:43:05,000 --> 00:43:06,000 the author's value.
1216 00:43:07,430 --> 00:43:09,559 It's my firm belief that 1217 00:43:09,560 --> 00:43:11,719 if we respect 1218 00:43:11,720 --> 00:43:13,849 authors, if we attribute 1219 00:43:13,850 --> 00:43:16,129 the authors, if we record 1220 00:43:16,130 --> 00:43:18,109 their contributions, and if we're honest 1221 00:43:18,110 --> 00:43:20,659 about all this, it will make it easier 1222 00:43:20,660 --> 00:43:22,459 to contribute to the Commons. 1223 00:43:22,460 --> 00:43:24,589 It'll be much easier for someone to say, 1224 00:43:24,590 --> 00:43:26,089 here is my image. 1225 00:43:26,090 --> 00:43:27,380 I'm going to upload it here. 1226 00:43:28,400 --> 00:43:30,469 Do what you want with it, just make sure 1227 00:43:30,470 --> 00:43:31,660 that I get credit for it. 1228 00:43:34,290 --> 00:43:36,509 If we respect, attribute and 1229 00:43:36,510 --> 00:43:38,760 record information about images, 1230 00:43:39,780 --> 00:43:41,819 that will help raise the value and 1231 00:43:41,820 --> 00:43:43,709 meaning of digital works, just as I 1232 00:43:43,710 --> 00:43:45,899 showed you in the beginning. Just 1233 00:43:45,900 --> 00:43:47,999 as knowing that 1234 00:43:48,000 --> 00:43:51,149 the sketch I showed was by Randall Munroe 1235 00:43:51,150 --> 00:43:53,609 changed the value and changed the meaning 1236 00:43:53,610 --> 00:43:55,679 of that work for you, it will 1237 00:43:55,680 --> 00:43:57,299 change the value and meaning of other 1238 00:43:57,300 --> 00:43:58,769 works as well. 1239 00:43:58,770 --> 00:44:02,039 And if we do this as a community, 1240 00:44:02,040 --> 00:44:04,739 then copyright holders 1241 00:44:04,740 --> 00:44:06,869 will eventually be devoid of 1242 00:44:06,870 --> 00:44:09,029 their currently exclusive 1243 00:44:09,030 --> 00:44:11,159 right to dictate, because that's 1244 00:44:11,160 --> 00:44:12,209 what they're doing with copyright 1245 00:44:12,210 --> 00:44:13,319 registries.
1246 00:44:13,320 --> 00:44:15,569 They're telling us that we are 1247 00:44:15,570 --> 00:44:18,449 the owners or they are the owners of 1248 00:44:18,450 --> 00:44:20,639 the culture that we have around 1249 00:44:20,640 --> 00:44:22,829 us. With tools like Elog.io, 1250 00:44:24,100 --> 00:44:25,749 we're coming together as a community and 1251 00:44:25,750 --> 00:44:27,559 saying that 1252 00:44:27,560 --> 00:44:29,779 we know who authored this, 1253 00:44:29,780 --> 00:44:32,299 and we will take care to recognize that. 1254 00:44:32,300 --> 00:44:34,339 You don't need to tell us, we'll keep 1255 00:44:34,340 --> 00:44:35,479 track of that ourselves. 1256 00:44:35,480 --> 00:44:36,379 Thank you very much. 1257 00:44:36,380 --> 00:44:37,380 And thank you for listening. 1258 00:44:44,360 --> 00:44:45,360 Thank you, Jonas. 1259 00:44:46,520 --> 00:44:48,619 So we have 15 more minutes 1260 00:44:48,620 --> 00:44:50,479 for questions. 1261 00:44:50,480 --> 00:44:52,759 Do we have any? So, 1262 00:44:52,760 --> 00:44:54,769 microphone four and then we have one 1263 00:44:54,770 --> 00:44:55,770 online as well. 1264 00:44:57,320 --> 00:44:59,479 Hello, thank you very much for your 1265 00:44:59,480 --> 00:45:01,609 talk. I'm very interested in this 1266 00:45:01,610 --> 00:45:03,979 function, but I don't see how this 1267 00:45:03,980 --> 00:45:06,619 function in desktop applications 1268 00:45:06,620 --> 00:45:10,189 is still useful. 1269 00:45:10,190 --> 00:45:12,259 I cannot use a LibreOffice 1270 00:45:12,260 --> 00:45:14,509 plug-in. I need something in 1271 00:45:14,510 --> 00:45:16,639 WordPress.com. I need it and I need 1272 00:45:16,640 --> 00:45:17,989 it in Facebook. 1273 00:45:17,990 --> 00:45:20,539 Have you talked to these platforms? 1274 00:45:20,540 --> 00:45:22,189 Yeah, OK, you're right.
1275 00:45:22,190 --> 00:45:23,419 And that's what I hinted at the beginning 1276 00:45:23,420 --> 00:45:25,909 as well, that in order to make this truly 1277 00:45:25,910 --> 00:45:28,039 useful, we really need to 1278 00:45:28,040 --> 00:45:29,959 support this as well: we need support for this 1279 00:45:29,960 --> 00:45:31,849 in the applications that people use day 1280 00:45:31,850 --> 00:45:33,529 to day. And that was one of the reasons 1281 00:45:33,530 --> 00:45:36,289 why we decided to change our approach 1282 00:45:36,290 --> 00:45:38,929 to passing information on the clipboard. 1283 00:45:38,930 --> 00:45:40,879 By passing information on the clipboard 1284 00:45:40,880 --> 00:45:42,589 as an HTML fragment, 1285 00:45:42,590 --> 00:45:45,049 we've actually shown that this works 1286 00:45:45,050 --> 00:45:47,209 in LibreOffice, it works in WordPress, 1287 00:45:47,210 --> 00:45:49,279 it works in Microsoft Office 1288 00:45:49,280 --> 00:45:50,419 as well. It works on a 1289 00:45:51,770 --> 00:45:53,839 whole range of tools by 1290 00:45:53,840 --> 00:45:55,429 default because most of the tools today 1291 00:45:55,430 --> 00:45:58,129 can handle HTML right now. 1292 00:45:58,130 --> 00:45:59,449 That's not the whole story, though, 1293 00:45:59,450 --> 00:46:01,429 because in order to actually make use of 1294 00:46:01,430 --> 00:46:03,649 the metadata that gets passed along 1295 00:46:03,650 --> 00:46:05,509 and to do something intelligible with it, 1296 00:46:05,510 --> 00:46:07,489 then again, you do need application 1297 00:46:07,490 --> 00:46:09,409 support. So we've started to have those 1298 00:46:09,410 --> 00:46:10,849 discussions. We started talking with the 1299 00:46:10,850 --> 00:46:13,729 LibreOffice community as one example, 1300 00:46:13,730 --> 00:46:14,929 and they are catching up.
1301 00:46:14,930 --> 00:46:17,239 But unfortunately, 1302 00:46:17,240 --> 00:46:18,919 the awareness of metadata standards, 1303 00:46:18,920 --> 00:46:21,229 awareness of what could be possible 1304 00:46:21,230 --> 00:46:22,729 is a little bit lacking today. 1305 00:46:22,730 --> 00:46:24,799 So it will take quite a long time until 1306 00:46:24,800 --> 00:46:26,269 we actually make something sensible out 1307 00:46:26,270 --> 00:46:27,270 of it. 1308 00:46:28,860 --> 00:46:31,259 OK, so the next question from the online 1309 00:46:31,260 --> 00:46:33,089 world, thank you. 1310 00:46:33,090 --> 00:46:34,379 There's one question. 1311 00:46:34,380 --> 00:46:36,209 Are there any plans to make images 1312 00:46:36,210 --> 00:46:39,629 trackable after they have been cropped? 1313 00:46:39,630 --> 00:46:41,759 And have you looked into how YouTube 1314 00:46:41,760 --> 00:46:43,889 does it? Because they seem to be very 1315 00:46:43,890 --> 00:46:45,089 good at it. 1316 00:46:45,090 --> 00:46:46,260 Yes. And Google as well. 1317 00:46:47,970 --> 00:46:48,469 Yes. 1318 00:46:48,470 --> 00:46:50,939 So we looked at a number 1319 00:46:50,940 --> 00:46:53,249 of different ways of doing 1320 00:46:53,250 --> 00:46:54,989 the calculation so that we could 1321 00:46:54,990 --> 00:46:57,089 potentially detect images which have 1322 00:46:57,090 --> 00:46:59,249 been cropped or changed in other ways 1323 00:46:59,250 --> 00:47:00,250 as well. 1324 00:47:00,660 --> 00:47:03,179 Unfortunately, from our perspective, 1325 00:47:03,180 --> 00:47:04,649 most of those algorithms that are 1326 00:47:04,650 --> 00:47:06,809 available are either kept secret 1327 00:47:06,810 --> 00:47:09,399 or they're patented, which means that 1328 00:47:09,400 --> 00:47:10,859 implementing this in free and open 1329 00:47:10,860 --> 00:47:12,380 source software is a no-go zone. 1330 00:47:13,710 --> 00:47:14,849 It will get better.
1331 00:47:14,850 --> 00:47:17,069 There is research underway to make 1332 00:47:17,070 --> 00:47:19,469 this possible and we're continuously 1333 00:47:19,470 --> 00:47:22,289 looking at changing the algorithms, 1334 00:47:22,290 --> 00:47:25,019 updating them according to what we learn. 1335 00:47:25,020 --> 00:47:27,479 But it's quite far from 1336 00:47:27,480 --> 00:47:30,119 having something that would detect a 1337 00:47:30,120 --> 00:47:31,120 derivative work as well. 1338 00:47:33,180 --> 00:47:34,109 OK, thank you. 1339 00:47:34,110 --> 00:47:35,110 Microphone number two. 1340 00:47:36,670 --> 00:47:38,849 Hello, thank you for your work on your 1341 00:47:38,850 --> 00:47:41,129 underfunded site and the workflow side 1342 00:47:41,130 --> 00:47:43,049 of things. I have a question about: you 1343 00:47:43,050 --> 00:47:45,389 talked about distributed database and 1344 00:47:45,390 --> 00:47:48,179 community-curated 1345 00:47:48,180 --> 00:47:50,309 direction, but now the focus seems to be 1346 00:47:50,310 --> 00:47:52,559 on very specific projects and very 1347 00:47:52,560 --> 00:47:53,560 things. 1348 00:47:53,970 --> 00:47:56,279 So what could be scalable things 1349 00:47:56,280 --> 00:47:58,679 to work on, to get more sources 1350 00:47:58,680 --> 00:48:00,749 involved or to create those kinds of 1351 00:48:00,750 --> 00:48:01,889 things? 1352 00:48:01,890 --> 00:48:03,299 And would you be open to other 1353 00:48:03,300 --> 00:48:04,619 contributors like, I don't know, 1354 00:48:04,620 --> 00:48:06,869 libraries, archives, 1355 00:48:06,870 --> 00:48:08,339 European projects, 1356 00:48:09,720 --> 00:48:10,829 whatever you can think of? 1357 00:48:10,830 --> 00:48:13,019 So what would be the long game in 1358 00:48:13,020 --> 00:48:15,089 kind of distributing and community 1359 00:48:15,090 --> 00:48:16,090 curating?
1360 00:48:17,370 --> 00:48:19,169 So there are two communities or two groups 1361 00:48:19,170 --> 00:48:21,569 of repositories that we're talking to 1362 00:48:21,570 --> 00:48:22,919 to get their information as well 1363 00:48:22,920 --> 00:48:25,289 within Elog.io. One is Europeana, 1364 00:48:25,290 --> 00:48:27,389 which obviously captures a lot of the 1365 00:48:27,390 --> 00:48:28,499 galleries, libraries, archives and 1366 00:48:28,500 --> 00:48:30,209 museums around Europe. 1367 00:48:30,210 --> 00:48:31,109 That would be one. 1368 00:48:31,110 --> 00:48:32,999 The other is Safe Creative, which is in 1369 00:48:33,000 --> 00:48:35,249 fact, you know, partly a 1370 00:48:35,250 --> 00:48:37,379 copyright registry, to get their 1371 00:48:37,380 --> 00:48:38,849 information within Elog.io as well. 1372 00:48:40,500 --> 00:48:42,269 But still, you know, at that point, we're 1373 00:48:42,270 --> 00:48:43,979 only talking about specific collections. 1374 00:48:43,980 --> 00:48:44,939 We're talking about read-only 1375 00:48:44,940 --> 00:48:45,869 information. 1376 00:48:45,870 --> 00:48:47,939 So the logical next step is 1377 00:48:47,940 --> 00:48:50,249 indeed to flip the read-write bit, to make 1378 00:48:50,250 --> 00:48:52,559 it editable, but we need to think it through 1379 00:48:52,560 --> 00:48:54,809 because, to be honest, we're not quite sure 1380 00:48:54,810 --> 00:48:55,769 what that would look like. 1381 00:48:55,770 --> 00:48:58,319 Because how do you deal with potential 1382 00:48:58,320 --> 00:48:59,999 conflicts when people keep editing the 1383 00:49:00,000 --> 00:49:00,989 same information? 1384 00:49:00,990 --> 00:49:03,179 So we'll need to go again to see 1385 00:49:03,180 --> 00:49:05,429 what Wikipedia is doing in this, you 1386 00:49:05,430 --> 00:49:06,659 know, see what policies they have in 1387 00:49:06,660 --> 00:49:08,729 place and how that works and see if 1388 00:49:08,730 --> 00:49:10,769 we can replicate that on our side.
1389 00:49:10,770 --> 00:49:12,689 And in terms of scaling beyond this, in 1390 00:49:12,690 --> 00:49:14,759 terms of distributing this, we 1391 00:49:14,760 --> 00:49:17,399 made very sure from the beginning that 1392 00:49:17,400 --> 00:49:20,039 the identifier that we have 1393 00:49:20,040 --> 00:49:22,199 for individual works within 1394 00:49:22,200 --> 00:49:25,199 the Elog.io catalog is a URL, 1395 00:49:25,200 --> 00:49:26,759 which means that anyone can essentially 1396 00:49:26,760 --> 00:49:29,009 set up a catalog and have their own URL 1397 00:49:29,010 --> 00:49:31,289 scheme for that catalog as 1398 00:49:31,290 --> 00:49:34,139 long as they don't change the API. 1399 00:49:34,140 --> 00:49:35,879 If you have the URL, it doesn't matter in 1400 00:49:35,880 --> 00:49:37,259 which catalog you actually look up the 1401 00:49:37,260 --> 00:49:38,849 information, you'll get it anyway. 1402 00:49:41,200 --> 00:49:44,199 Microphone number four. Yeah, 1403 00:49:44,200 --> 00:49:46,879 there's one comment on the 1404 00:49:46,880 --> 00:49:48,999 URL, that 1405 00:49:49,000 --> 00:49:51,249 we probably want something which 1406 00:49:51,250 --> 00:49:53,529 can survive like 100, 1407 00:49:53,530 --> 00:49:54,530 200 years, 1408 00:49:55,810 --> 00:49:58,059 and whether that's going 1409 00:49:58,060 --> 00:50:00,459 to be solved 1410 00:50:00,460 --> 00:50:03,099 by using URLs as we 1411 00:50:03,100 --> 00:50:04,100 have today 1412 00:50:05,260 --> 00:50:06,730 might create some problems. 1413 00:50:08,710 --> 00:50:11,229 I was also interested in 1414 00:50:11,230 --> 00:50:13,539 whether all this can be applied to 1415 00:50:13,540 --> 00:50:15,819 not just pictures, but books, 1416 00:50:15,820 --> 00:50:17,109 music, whatever. 1417 00:50:18,260 --> 00:50:20,529 And then there was 1418 00:50:20,530 --> 00:50:21,919 one technical comment.
1419 00:50:21,920 --> 00:50:24,339 The all 1420 00:50:24,340 --> 00:50:26,499 functions should be one pass 1421 00:50:26,500 --> 00:50:28,629 because if you go otherwise, you will 1422 00:50:28,630 --> 00:50:31,119 go to the RAM 1423 00:50:31,120 --> 00:50:32,199 twice. 1424 00:50:32,200 --> 00:50:34,419 So that was the technical part 1425 00:50:34,420 --> 00:50:35,420 of the comment. 1426 00:50:36,370 --> 00:50:38,469 OK, I'll see if I 1427 00:50:38,470 --> 00:50:39,550 can remember those three things. 1428 00:50:41,860 --> 00:50:43,479 OK, so on that bit, yes. 1429 00:50:45,970 --> 00:50:47,889 We worked quite a lot with the specific 1430 00:50:47,890 --> 00:50:49,599 algorithm during this. 1431 00:50:49,600 --> 00:50:51,819 And now that we feel that we settled on, 1432 00:50:51,820 --> 00:50:53,769 you know, the way it works best in our 1433 00:50:53,770 --> 00:50:56,139 environment, we've 1434 00:50:56,140 --> 00:50:58,299 documented this as well as an RFC, which 1435 00:50:58,300 --> 00:50:59,709 was submitted to reserve the namespace 1436 00:50:59,710 --> 00:51:01,359 for it. And then it has a very specific 1437 00:51:01,360 --> 00:51:03,339 definition which makes sure that, you 1438 00:51:03,340 --> 00:51:04,689 know, if you want to call something a 1439 00:51:04,690 --> 00:51:06,669 blockhash, then you need to follow this 1440 00:51:06,670 --> 00:51:07,670 particular specification. 1441 00:51:09,700 --> 00:51:10,700 Now. 1442 00:51:13,460 --> 00:51:14,499 OK, I'm sorry. 1443 00:51:14,500 --> 00:51:16,369 Could you go back to the first question, 1444 00:51:16,370 --> 00:51:18,519 one from the other point? 1445 00:51:18,520 --> 00:51:19,839 Yes. The URL. 1446 00:51:19,840 --> 00:51:21,789 Thank you. Um, yes.
1447 00:51:21,790 --> 00:51:23,499 I skip over a very important piece when I 1448 00:51:23,500 --> 00:51:24,849 say that everything will be solved by 1449 00:51:24,850 --> 00:51:25,850 a URL, 1450 00:51:27,010 --> 00:51:28,599 because as I said in the talk as well, 1451 00:51:28,600 --> 00:51:31,509 from the beginning, we know that 1452 00:51:31,510 --> 00:51:33,309 any kind of metadata gets stripped very 1453 00:51:33,310 --> 00:51:34,509 easily from a work. 1454 00:51:34,510 --> 00:51:36,609 And even if we say that, you know, all 1455 00:51:36,610 --> 00:51:38,409 we need is a URL, all we need is an 1456 00:51:38,410 --> 00:51:40,509 identifier, which is a URL, 1457 00:51:40,510 --> 00:51:42,249 that's going to be stripped as well. 1458 00:51:42,250 --> 00:51:44,319 So that's not the 1459 00:51:44,320 --> 00:51:46,509 final solution to anything. 1460 00:51:46,510 --> 00:51:48,639 We need to work on different 1461 00:51:48,640 --> 00:51:50,279 approaches of identifying works. 1462 00:51:50,280 --> 00:51:52,329 Um, the only thing I was saying there is 1463 00:51:52,330 --> 00:51:54,999 that at least with the URL, 1464 00:51:55,000 --> 00:51:56,559 we can make sure that this could 1465 00:51:56,560 --> 00:51:58,389 potentially be distributed across 1466 00:51:58,390 --> 00:52:00,129 different catalogs and not just be one 1467 00:52:00,130 --> 00:52:01,179 single monolith. 1468 00:52:01,180 --> 00:52:03,249 OK, and the second point 1469 00:52:03,250 --> 00:52:04,250 was. 1470 00:52:05,020 --> 00:52:06,020 Oh. 1471 00:52:07,210 --> 00:52:09,379 I think, I think 1472 00:52:09,380 --> 00:52:11,569 probably the survivability, 1473 00:52:13,500 --> 00:52:14,519 if I remember correctly. 1474 00:52:16,990 --> 00:52:18,369 Oh, yes, other works. 1475 00:52:18,370 --> 00:52:20,709 Yes, thank you. Applying it to other 1476 00:52:20,710 --> 00:52:22,449 non-images, right? 1477 00:52:22,450 --> 00:52:23,450 OK.
1478 00:52:24,070 --> 00:52:26,199 One of the reasons why there are so many 1479 00:52:26,200 --> 00:52:28,359 metadata standards is that there 1480 00:52:28,360 --> 00:52:30,609 are so many different kinds of works. 1481 00:52:30,610 --> 00:52:33,279 What is relevant for images 1482 00:52:33,280 --> 00:52:35,619 in terms of metadata is not relevant 1483 00:52:35,620 --> 00:52:36,609 for classical music, 1484 00:52:36,610 --> 00:52:38,349 as an example. What is relevant for 1485 00:52:38,350 --> 00:52:40,839 classical music in terms of 1486 00:52:40,840 --> 00:52:43,029 authors and who plays what instruments 1487 00:52:43,030 --> 00:52:44,679 and what instruments are available is not 1488 00:52:44,680 --> 00:52:46,820 relevant for pop music. 1489 00:52:48,430 --> 00:52:50,109 So that's the reason why all these standards 1490 00:52:50,110 --> 00:52:51,669 have come up already, or one of the 1491 00:52:51,670 --> 00:52:52,659 reasons. 1492 00:52:52,660 --> 00:52:55,059 And we believe that 1493 00:52:55,060 --> 00:52:57,369 using media annotations allows 1494 00:52:57,370 --> 00:52:59,739 us to cover different sorts of works, 1495 00:52:59,740 --> 00:53:01,929 but it really needs to be thought 1496 00:53:01,930 --> 00:53:03,579 through a bit more before we start 1497 00:53:03,580 --> 00:53:05,439 working with it actively, to figure out 1498 00:53:05,440 --> 00:53:07,779 what information is actually important 1499 00:53:07,780 --> 00:53:09,909 to convey about different kinds of works 1500 00:53:09,910 --> 00:53:12,039 and what 1501 00:53:12,040 --> 00:53:13,599 the metadata standards would look like for 1502 00:53:13,600 --> 00:53:14,889 those kinds of works. 1503 00:53:14,890 --> 00:53:16,630 So that's a larger piece of work. 1504 00:53:18,970 --> 00:53:20,109 So there's one question from the 1505 00:53:20,110 --> 00:53:21,639 Internet. Yes, thank you.
1506 00:53:21,640 --> 00:53:23,409 Why don't you work with data from other 1507 00:53:23,410 --> 00:53:25,689 sources like Flickr, which also provides 1508 00:53:25,690 --> 00:53:27,609 good metadata and contains a lot of free 1509 00:53:27,610 --> 00:53:29,649 works and info about the author? 1510 00:53:29,650 --> 00:53:31,659 Yeah, we are, 1511 00:53:31,660 --> 00:53:33,849 in fact. We don't 1512 00:53:33,850 --> 00:53:35,229 have it in place yet. We have spoken to 1513 00:53:35,230 --> 00:53:36,129 Flickr. 1514 00:53:36,130 --> 00:53:37,809 We have a communication going to figure 1515 00:53:37,810 --> 00:53:39,879 out how 1516 00:53:39,880 --> 00:53:41,319 we can get that information, how we can 1517 00:53:41,320 --> 00:53:42,429 integrate it in our system, 1518 00:53:43,720 --> 00:53:45,829 depending on how things go. 1519 00:53:45,830 --> 00:53:47,469 I'm quite confident, though, that we'll 1520 00:53:47,470 --> 00:53:49,539 be able to integrate that and to make it 1521 00:53:49,540 --> 00:53:51,010 available through the same API. 1522 00:53:52,930 --> 00:53:55,210 Unfortunately, or fortunately, 1523 00:53:56,410 --> 00:53:58,749 for us, Flickr is a huge 1524 00:53:58,750 --> 00:54:00,969 resource, with more than 300 1525 00:54:00,970 --> 00:54:03,579 million images, which means that 1526 00:54:03,580 --> 00:54:05,769 even if we took, you know, 1527 00:54:05,770 --> 00:54:07,869 took a year to do that, we're still 1528 00:54:07,870 --> 00:54:09,699 talking about incorporating about a 1529 00:54:09,700 --> 00:54:11,859 million works per day. 1530 00:54:11,860 --> 00:54:14,079 At the moment, we can do about six 1531 00:54:14,080 --> 00:54:15,849 million works per day, adding to our 1532 00:54:15,850 --> 00:54:18,999 database, which is a fairly large number. 1533 00:54:19,000 --> 00:54:20,439 We can probably scale a little bit beyond 1534 00:54:20,440 --> 00:54:22,539 that. 
But still, we're talking about a 1535 00:54:22,540 --> 00:54:24,129 number of months' work when we actually 1536 00:54:24,130 --> 00:54:25,130 start working on that. 1537 00:54:28,350 --> 00:54:29,699 So, one more question? 1538 00:54:31,430 --> 00:54:32,639 Is there an additional one from the 1539 00:54:32,640 --> 00:54:33,640 Internet? 1540 00:54:34,790 --> 00:54:36,739 If you have questions, I'll be available 1541 00:54:36,740 --> 00:54:38,299 up here for a little bit more after this 1542 00:54:38,300 --> 00:54:40,369 talk. We also have some information about 1543 00:54:40,370 --> 00:54:41,719 Commons Machinery and the work that we've 1544 00:54:41,720 --> 00:54:43,369 done, which will be available down here 1545 00:54:43,370 --> 00:54:44,749 from Lainer. 1546 00:54:44,750 --> 00:54:46,099 And I will have it up here as well. 1547 00:54:46,100 --> 00:54:47,540 So feel free to grab it. 1548 00:54:48,890 --> 00:54:50,359 OK, thank you, your honor. 1549 00:54:50,360 --> 00:54:51,919 Thank you very much for the kind 1550 00:54:51,920 --> 00:54:52,920 applause.