*Music*

Herald: Hi! Welcome to the WikiPaka WG, in this extremely crowded Esszimmer. I'm Jakob, I'm your Herald for tonight until 10:00, and I'm here to welcome you and to welcome these three wonderful guys on the stage. They're going to talk about the infrastructure of Wikipedia. They are Lucas, Amir, and Daniel, and I hope you'll have fun!

*Applause*

Amir Sarabadani: Hello, my name is Amir. I'm a software engineer at Wikimedia Deutschland, which is the German chapter of the Wikimedia Foundation; the Wikimedia Foundation runs Wikipedia. Here is Lucas, who is also a software engineer at Wikimedia Deutschland, and Daniel here is a software architect at the Wikimedia Foundation. We are all based in Germany, Daniel in Leipzig, the rest of us in Berlin. Today we want to talk about how we run Wikipedia with donors' money, without lots of advertisement and without collecting data. In this talk we first go inside-out, starting with the application layer and then the outer layers, and then we go outside-in and talk about what happens when you hit Wikipedia from the outside.

First, let me give you some background. All of the Wikipedia infrastructure is run by the Wikimedia Foundation, an American non-profit charitable organization. We don't run any ads, and we are only 370 people; if you count Wikimedia Deutschland and all the other chapters, it's around 500 people in total. That's nothing compared to comparable companies. All of the content is managed by volunteers: even our staff doesn't add content to Wikipedia. We support 300 languages, which is a very large number, and Wikipedia is eighteen years old, so it can vote now. Wikipedia also has some really, really weird articles. I want to ask you: have you encountered any really weird article on Wikipedia? My favorite is the list of people who died on the toilet. If you know one, raise your hand. Do you know any weird articles on Wikipedia?

Daniel Kinzler: Oh, the classic one…

Amir: You need to unmute yourself.

Daniel: This is technology, I don't know anything about technology. OK, no. My favorite example is the list of people killed by their own invention. That's a lot of fun, look it up, it's amazing.
Lucas Werkmeister: There is also a list of prison escapes using helicopters. I almost said helicopter escapes using prisons, which doesn't make any sense. But that is also a very interesting list.

Daniel: I think we also have a category of lists of lists of lists.

Amir: That's a page.

Lucas: And every few months someone thinks it's funny to redirect it to Russell's paradox or so.

Daniel: Yeah.

Amir: Besides that, people cannot read Wikipedia in Turkey or China. Three days ago, though, the block in Turkey was ruled unconstitutional; it's not lifted yet, but hopefully they will lift it soon. And the Wikimedia projects are not just Wikipedia: there are lots and lots of projects. Some of them are not as successful as Wikipedia, like Wikinews. Wikipedia is the most successful one, and there's another one, Wikidata, which is being developed by Wikimedia Deutschland, by the Wikidata team with Lucas. It holds the data that Wikipedia's infoboxes, the Google Knowledge Graph, Siri, and Alexa use; it's basically a backbone of structured data across the whole Internet.

So, our infrastructure. First of all, our infrastructure is all open source: by principle, we never use any commercial software. We could use lots of things, and sometimes they were even offered to us for free, but we refused to use them. Second, we have two primary data centers, for failover: when, for example, a whole data center goes offline, we can fail over to the other one. We have three caching points of presence, or CDN sites, spread over the world. And it is our own CDN; we don't use Cloudflare, because we care about the privacy of our users. That matters because, for example, people edit from countries where it might be dangerous for them to edit Wikipedia, so we really want to keep that data as protected as possible.

*Applause*

Amir: We have 17 billion page views per month, which goes up and down with the season and everything, and around 100 to 200 thousand requests per second. That's different from pageviews, because requests can be for objects, API calls, lots of things. We have 300,000 new editors per month, and we run all of this on 1300 bare-metal servers.
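A quick back-of-the-envelope check of those figures, as a sketch in Python (the 30-day month is an assumption): 17 billion pageviews per month averages out to only a few thousand page views per second, so the 100-200 thousand requests per second must be dominated by everything else, such as assets and API calls.

```python
# Sanity-checking the quoted traffic figures.
pageviews_per_month = 17_000_000_000
seconds_per_month = 30 * 24 * 3600           # assuming a 30-day month

pageviews_per_second = pageviews_per_month / seconds_per_month
print(f"average pageviews/s: {pageviews_per_second:,.0f}")   # ≈ 6,600
# Compare with the quoted 100,000-200,000 requests/s: most requests
# are not pageviews but static assets, API calls, thumbnails, etc.
```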
Right now, Daniel is going to talk about the application layer and the inside of that infrastructure.

Daniel: Thanks, Amir. Oh, the clicky thing. Thank you. The application layer is basically the software that actually does what a wiki does: it lets you create or update pages, and it serves the page views. *interference noise* The challenge for Wikipedia, of course, is serving all the many page views that Amir just described. The core of the application is a classic LAMP application. *interference noise* I have to stop moving. Yes? Is that it? It's a classic LAMP-stack application: it's written in PHP, it runs on an Apache server, and it uses MySQL as the database in the backend. We used to use HHVM instead of the… Yeah, we…

Herald: Here. Sorry. Take this one.

Daniel: Hello. We used to use HHVM as the PHP engine, but we just switched back to mainstream PHP, using PHP 7.2 now, because Facebook decided that HHVM was going to be incompatible with the PHP standard and they were basically developing it just for themselves.

Right. We have separate clusters of servers for serving different requests: page views on the one hand, and handling edits; then a cluster for handling API calls; and a bunch of servers set up to handle asynchronous jobs, things that happen in the background, the job runners. Video scaling is a very obvious example: it just takes too long to do on the fly. But we use the job queue for many other things as well.

MediaWiki is kind of an amazing thing, because you can install it on your own ten-bucks-a-month shared-hosting webspace and it will run, but you can also use it to, you know, serve half the world. So it's a very powerful and versatile system, and this wide span of different applications also creates problems; that's something I will talk about tomorrow. But for now, let's look at the fun things. If you want to serve a lot of page views, you have to do a lot of caching, so we have a whole set of different caching systems. The most important one is probably the parser cache. As you probably know, wiki pages are written in a markup language, wikitext, which needs to be parsed and turned into HTML. The result of that parsing is, of course, cached, and that cache is semi-persistent: nothing really ever drops out of it. It's a huge thing.
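The lookup pattern Daniel describes is classic cache-aside: try the cache, fall back to the expensive parse, store the result. A minimal sketch in Python; the class and function names are illustrative, not MediaWiki's actual API.

```python
# Cache-aside flow of a parser cache, roughly as described above.

class ParserCache:
    """Semi-persistent cache of rendered HTML, keyed by page revision."""

    def __init__(self):
        self._store = {}  # stand-in for the dedicated cache backend

    def get_html(self, title, rev_id, render):
        key = f"{title}:{rev_id}"
        html = self._store.get(key)
        if html is None:              # miss: do the expensive wikitext parse
            html = render(title, rev_id)
            self._store[key] = html   # keep it around for the next reader
        return html

def parse_wikitext(title, rev_id):
    # Placeholder for the real parser, which also resolves templates etc.
    return f"<html>rendered {title}, revision {rev_id}</html>"

cache = ParserCache()
cache.get_html("Lists_of_lists_of_lists", 42, parse_wikitext)  # slow: parses
cache.get_html("Lists_of_lists_of_lists", 42, parse_wikitext)  # fast: cached
```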
Daniel: The parser cache lives in a dedicated MySQL database system. We also use memcached a lot, for all kinds of miscellaneous things, anything that we need to keep around and share between server instances. And we have been using Redis for a while, for anything that we want to have available not just between different servers but also between different data centers, because Redis is a bit better at synchronizing things between different systems. We still use it for session storage especially, though we are about to move away from that and will be using Cassandra for session storage.

We have a bunch of additional services running for specialized purposes, like scaling images and rendering math formulas. ORES is pretty interesting: it's a system for automatically detecting vandalism and rating edits, a machine-learning-based system for detecting problems and highlighting edits that may not be great and need more attention. We have some additional services that process our content for consumption on mobile devices, chopping pages up into bits and pieces that can then be consumed individually, and many, many more.

In the background, we also have to manage events: we use Kafka for message queuing, and we use it to notify different parts of the system about changes. On the one hand, we use that to feed the job runners that I just mentioned, but we also use it, for instance, to purge the entries in the CDN when pages get updated, and things like that. OK, the next section is going to be about the databases. Very quickly: we will have quite a bit of time for discussion afterwards, but are there any questions right now about what we said so far? Everything extremely crystal clear. OK, no clarity is left? I see. Oh, one question, in the back.

Q: Can you maybe turn the volume up a little bit? Thank you.

Daniel: Yeah. I think this is your section, right? Oh, it's Amir again. Sorry.

Amir: So I want to talk about my favorite topic, the dungeon of every production system: databases. The database setup of Wikipedia is really interesting and complicated in its own right. We use MariaDB; we switched from MySQL in 2013, for lots of complicated reasons. As I said, because we are really open source, you can go and check our database tree, which shows how it looks and which hosts are the replicas and masters.
Actually, you can even query Wikipedia's database live: you can just go to that address, log in with your Wikipedia account, and do whatever you want. It was a funny thing: a couple of months ago, someone sent me a message saying, oh, I found a security issue, you can just query Wikipedia's database. I was like, no, no, we deliberately let this happen. It's sanitized: we removed the password hashes and everything. But still, you can use it.

As for how the database clusters work: because the data got too big, what started as sharding became what we call sections, which are basically different clusters. Really large wikis have their own section: for example, English Wikipedia is s1, German Wikipedia together with two or three other small wikis is in s5, Wikidata is on s8, and so on. Each section has a master and several replicas, but one of the replicas is actually the master in the other data center, because of the failover that I told you about, so basically two layers of replication exist. What I'm describing here is the metadata. For wikitext we have a completely different set of databases, but there we can use consistent hashing to scale horizontally, so we can just add more databases. I don't know if you knew this, but Wikipedia stores every edit, so the wikitext of every edit in the whole history is in the database. We also have the parser cache that Daniel explained, and the parser cache is also consistently hashed, so we can scale it horizontally too. But for metadata it's slightly more complicated, because the metadata is used to render the page. This slide, for example, is a very short version of the database tree that I mentioned; you can go and look at the other sections too. This is s1 in eqiad, the main data center: the master is this number, and it replicates to some of these hosts. The second one, numbered in the 2000s because it's in the second data center, is itself the master of the other data center and has its own replicas; the replication between them is cross-data-center, because the master data center is in Ashburn, Virginia, and the second data center is in Dallas, Texas.
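A toy model of the section layout and the two layers of replication Amir describes, as a sketch in Python. The host names and the exact wiki-to-section mapping here are illustrative, not the real inventory.

```python
import random

# Wikis are grouped into sections (database clusters), per the talk.
SECTIONS = {"enwiki": "s1", "dewiki": "s5", "wikidatawiki": "s8"}

# Two layers of replication: the secondary DC's master replicates from the
# primary DC's master and has replicas of its own (host names invented).
TOPOLOGY = {
    "s1": {
        "eqiad": {"master": "db1001", "replicas": ["db1002", "db1003"]},
        "codfw": {"master": "db2001", "replicas": ["db2002", "db2003"]},
    },
}

def pick_db(wiki, dc="eqiad", write=False):
    """Writes go to the section master; reads go to a replica in the DC."""
    cluster = TOPOLOGY[SECTIONS[wiki]][dc]
    return cluster["master"] if write else random.choice(cluster["replicas"])

print(pick_db("enwiki", write=True))   # db1001, the primary master
print(pick_db("enwiki", dc="codfw"))   # a replica in the failover DC
```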
Amir: So they need cross-data-center replication, and that happens over TLS, to make sure no one can listen in between the two. We also have snapshots, and even dumps of the whole history of Wikipedia: you can go to dumps.wikimedia.org and download the whole history of every wiki you want, except the parts we had to remove for privacy reasons. And we have lots and lots of backups; I recently realized just how many backups we have. In total it is 570 TB of data on 150 database servers, they receive around 350,000 queries per second, and in total they have 70 terabytes of RAM.

We also have another storage system, called Elasticsearch, which, as you can guess, is used for search: the search box on the top right, if you're using desktop — it's different on mobile, I think, and it also depends on whether you're using a right-to-left language. It is run by a team called Search Platform; since none of us is on that team, we can't explain it in much depth.

Also, we have media storage, for all of the free pictures that are uploaded to Wikimedia. Commons is our wiki that holds all of the free media. For example, if there is a category in Commons called "cats looking at left", there is also a category "cats looking at right": we have lots and lots of images. It's 390 terabytes of media, 1 billion objects, and it uses Swift, the object storage component of OpenStack, with several layers of caching, frontend and backend.

Yeah, that's mostly it, and we want to talk about traffic now. This picture is from 1967, when Sweden switched from driving on the left to driving on the right; this is basically what change looks like in the Wikipedia infrastructure as well. We have five caching data centers; the most recent one is eqsin, in Singapore. ulsfo, esams, and eqsin are pure CDN sites. We also have two more points of presence, one in Chicago and the other one in Amsterdam, but I won't get into that. So, as I said, we have our own content delivery network. Traffic allocation is done by GeoDNS, which is actually written and maintained by one of the traffic people, and with it we can pool and depool data centers; it serves records with a time to live of 10 minutes.
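Roughly what that geo-aware DNS layer does, as a toy sketch in Python: map the user's region to the nearest pooled caching site and answer with a 10-minute TTL, so a depool propagates within about that window. The site names are from the talk; the IP addresses and the region table are made up.

```python
# Toy geo-DNS: nearest pooled site wins; the TTL bounds depool propagation.
SITES = {
    "esams": {"ip": "198.51.100.1", "pooled": True},   # Amsterdam
    "eqiad": {"ip": "198.51.100.2", "pooled": True},   # Ashburn
    "eqsin": {"ip": "198.51.100.3", "pooled": True},   # Singapore
}
PREFERENCE = {                       # nearest-first, per rough client region
    "EU": ["esams", "eqiad", "eqsin"],
    "NA": ["eqiad", "esams", "eqsin"],
    "AS": ["eqsin", "eqiad", "esams"],
}
TTL_SECONDS = 600                    # 10 minutes, as quoted in the talk

def resolve(region):
    for site in PREFERENCE[region]:
        if SITES[site]["pooled"]:
            return SITES[site]["ip"], TTL_SECONDS
    raise RuntimeError("all sites depooled")

print(resolve("EU"))                 # Amsterdam, while it is pooled
SITES["esams"]["pooled"] = False     # depool esams for maintenance...
print(resolve("EU"))                 # ...and EU traffic falls back to eqiad
```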
Amir: So if a data center goes down, it takes 10 minutes for the depooling to actually propagate, and the same to be repooled again. As the transport layer we use LVS, the Linux load balancer, which works at layers 3 and 4 and supports consistent hashing. We grew so big that we needed something that manages the load balancers, so we wrote our own system, called PyBal. Also, lots of companies actually peer with us; for example, we connect directly to AMS-IX in Amsterdam.

So this is how the caching works; there are lots of reasons for it being this way, so let's just get started. We use TLS, supporting TLS 1.2, and in the first layer we have nginx-. Does anyone know what "nginx minus" means? That's related, but not correct. There is nginx, which is the free version, and there is nginx plus, which is the commercial version. But we don't use nginx for load balancing or anything like that: we stripped everything out of it and only use it for TLS termination, so we call it nginx minus. It's an internal joke.

Then we have the Varnish frontend. Varnish is also a caching layer: the frontend cache is in memory, which is very, very fast, and the backend cache is on storage, on hard disks, which is slower. The fun thing is that the CDN caching layer alone answers 90% of our requests: 90% just reach Varnish and get a response there, and only when that doesn't work does the request go through to the application layer. The Varnish cache has a TTL of 24 hours, and entries also get invalidated by the application: if someone edits an article, the CDN purges the affected URLs. The frontend is sharded by request: the load balancer just randomly sends your request to a frontend, but if the frontend can't find the object, it passes it to a backend, and the backends are hashed by request, so, for example, the article on Barack Obama is only ever served by one backend node in each caching data center. If none of this works, the request actually hits the primary data center. So, yeah, I actually explained all of this already.
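Here is a minimal consistent-hash ring in Python, the idea behind "the Barack Obama article is only served from one node" and behind the horizontally scaled wikitext and parser-cache stores mentioned earlier: each key maps to one node, and adding a node only moves a small slice of the keys. This is a generic sketch, not the actual Varnish director or MediaWiki code, and the host names are invented.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: hash nodes onto a ring, walk clockwise per key."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)        # virtual nodes smooth out the load
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cp3050", "cp3052", "cp3054"])   # invented backend hosts
print(ring.node_for("/wiki/Barack_Obama"))        # same node on every request
print(ring.node_for("/wiki/Ada_Lovelace"))        # possibly a different node
```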
Amir: So we have two caching clusters: one is called text and the other one is called upload, which is not confusing at all. If you want to see this, you can just run mtr en.wikipedia.org, and the end node is text-lb.wikimedia.org, which is the text cluster; if you trace upload.wikimedia.org instead, you hit the upload cluster.

So far, so good, but this setup has lots of problems, because Varnish is open core: the version we use is open source, we don't use the commercial one, but the open-core one doesn't support TLS. (What? What happened? Okay. No, no, no! You're not supposed to see this. Okay, sorry. Huh? Okay, okay, sorry.) So, Varnish has lots of problems. Varnish is open core and doesn't support TLS termination, which forces us to run this nginx- system just to do TLS termination and makes our setup complicated. It also doesn't behave well under our load, which forces us to have a cron job that restarts every Varnish node twice a week; we have a cron job that restarts every Varnish node, which is embarrassing. On the other end, when the Varnish backend wants to talk to the application layer, it doesn't support TLS termination there either, so we use IPsec, which is even more embarrassing. But we are changing it: we are moving to Apache Traffic Server, ATS, which is very, very nice and fully open source, an Apache Foundation project. ATS does the TLS termination; for now, a Varnish frontend still exists in the middle, but the backend is also going to change to ATS, so we call this the "ATS sandwich": two ATS layers with a Varnish in between. The good thing is that when TLS termination moves to ATS, we can actually use TLS 1.3, which is more modern, more secure, and even faster: it basically drops 100 milliseconds from every request that goes to Wikipedia, and that translates to centuries of our users' time every month. The ATS migration is ongoing and hopefully will go live soon. So this is the new version, and, as I said, once this is done we can use proper TLS instead of IPsec between the data centers.
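That "centuries every month" figure checks out against the numbers quoted earlier; a quick sketch in Python, taking the mid-range of the stated request rate as an assumption.

```python
# How much waiting does shaving ~100 ms off every request save per month?
saved_per_request = 0.1           # seconds, the quoted TLS 1.3 improvement
requests_per_second = 150_000     # assumed mid-range of "100-200 thousand"
seconds_per_month = 30 * 24 * 3600

saved_seconds = saved_per_request * requests_per_second * seconds_per_month
print(f"{saved_seconds / (365 * 24 * 3600):,.0f} years saved per month")
# ≈ 1,200 years per month, i.e. "centuries of our users' time"
```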
Amir: And now it's time for Lucas to talk about what happens when you type in en.wikipedia.org.

Lucas: Yes, this makes sense, thank you. So, first of all, the image you see on the slide doesn't really have anything to do with what happens when you type in wikipedia.org, because it's an offline Wikipedia reader, but it's a nice image. This is basically a summary of everything they already said. If, which is the most common case, you are lucky and request a URL which is cached, then first your computer asks for the IP address of en.wikipedia.org. That request reaches this GeoDNS daemon, and because we're at Congress here, it tells you the closest data center is the one in Amsterdam, esams. So your request hits the edge there, what we call the load balancers/routers, then goes through TLS termination in nginx-, and then hits the Varnish caching servers, either frontend or backend. Then you get a response, and that's already it: nothing else is ever bothered again. It doesn't even reach any other data center, which is very nice, and that's the roughly 90% of requests we mentioned.

If you're unlucky and the URL you requested is not in the Varnish cache in the Amsterdam data center, the request gets forwarded to the eqiad data center, which is the primary one, and there it still has a chance to hit the cache. Perhaps this time it's there; then the response gets cached in the Amsterdam Varnish, you also get your answer, and we still don't have to run any application code. If we do have to hit the application layer, Varnish forwards the request: if it's upload.wikimedia.org, it goes to the media storage, Swift; for any other domain, it goes to MediaWiki. MediaWiki then does a ton of work: it connects to the database, in this case s1, the shard for English Wikipedia, gets the wikitext from there, and gets the wikitext of all the related pages and templates. No, wait, I forgot something.
First it checks whether the HTML for this page is available in the parser cache, so that's another caching layer; this parser cache might be either memcached or the database cache behind it. If it's not there, then MediaWiki has to get the wikitext, get all the related things, and render them into HTML, which takes a long time and goes through some pretty ancient code.

If you are doing an edit or an upload, it's even worse, because then the request always has to go to MediaWiki, and it not only has to store the new edit, either in the media backend or in the database, it also has to update a bunch of stuff. First of all, it has to purge the cache: it has to tell all the Varnish servers that there's a new version of this URL available, so that it doesn't take a full day until the time to live expires. It also has to update a bunch of other things. For example, if you edited a template, it might be used in a million pages, and the next time anyone requests one of those million pages, they should actually be rendered again using the new version of the template, so it has to invalidate the cache for all of them, and all of that is deferred through the job queue. It might also have to calculate thumbnails, if you uploaded a file, or re-transcode media files: maybe you uploaded WebM and the browser only supports some other media codec, so we transcode that, and we also encode it down to the different resolutions. So then it goes through that whole dance. And, yeah, that was already those slides. Is Amir going to talk again, about how we manage…
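The edit path Lucas walks through — store the edit, purge the CDN immediately, defer the expensive fan-out to the job queue — looks roughly like this sketch in Python. The queue here is a plain deque standing in for Kafka plus the job runners, and all names are illustrative, not MediaWiki's real job classes.

```python
from collections import deque

job_queue = deque()        # stand-in for Kafka feeding the job runners

class FakeCDN:
    def purge(self, url):
        print("PURGE", url)   # tell every Varnish the cached copy is stale

def save_edit(page, new_text, storage, cdn, pages_transcluding):
    storage[page] = new_text              # store the new revision
    cdn.purge(f"/wiki/{page}")            # synchronous: purge this URL now
    # A template edit may affect a million pages; re-rendering them inline
    # would be far too slow, so defer the invalidation to the job queue.
    for affected in pages_transcluding(page):
        job_queue.append(("invalidate_parser_cache", affected))

save_edit(
    "Template:Infobox_person",
    "{{new version}}",
    storage={},
    cdn=FakeCDN(),
    pages_transcluding=lambda t: ["Barack_Obama", "Ada_Lovelace"],
)
print(job_queue)   # deferred work, drained asynchronously by job runners
```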
Amir: Yeah, I'm quickly coming back, just for a short break, to talk about managing all of this, because managing 1300 bare-metal servers plus a Kubernetes cluster is not easy. What we do is use Puppet for configuration management on our bare-metal systems. It's fun: 50,000 lines of Puppet code. Lines of code is not a great indicator, but it gives you a rough estimate of how things work; we also have 100,000 lines of Ruby. And we have our own CI and CD cluster: we don't store anything on GitHub or GitLab, we have our own system, which is based on Gerrit, and around it a system of Jenkins jobs that does all of these kinds of things. Also, because we have a Kubernetes cluster for some of our services, if you merge a change in Gerrit, it also builds the Dockerfiles and containers and pushes them to production. And in order to run remote SSH commands, we have Cumin, our in-house automation built for our systems: you go there and say, depool this node, or run this command on all of the Varnish nodes, for example when you want to restart them.
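A rough sketch, in Python, of what such a fan-out tool does: select hosts by a query, then run one command on all of them in parallel over SSH. The real Cumin is far more capable; the host names, role labels, and the use of plain ssh here are all illustrative.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

HOSTS = {                                     # invented inventory
    "cp3050.esams.wmnet": "cache::varnish",
    "cp3052.esams.wmnet": "cache::varnish",
    "db1001.eqiad.wmnet": "mariadb::core",
}

def select(role):
    return [host for host, r in HOSTS.items() if r == role]

def run_on(host, command):
    result = subprocess.run(["ssh", host, command],
                            capture_output=True, text=True)
    return host, result.returncode

def run_everywhere(role, command, parallelism=10):
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return dict(pool.map(lambda h: run_on(h, command), select(role)))

# e.g. the twice-weekly Varnish restart mentioned above:
# run_everywhere("cache::varnish", "systemctl restart varnish")
```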
And with this I hand back to Lucas.

Lucas: So, I am going to talk a bit more about Wikimedia Cloud Services, which is a bit different in that it's not really our production stuff: it's where you people, the volunteers of the Wikimedia movement, can run your own code. You can request a project, which is kind of a group of users, and then you get assigned a pool of resources, this much CPU and this much RAM, and you can create virtual machines with those resources and do stuff there, run basically whatever you want. To create, boot, and shut down the VMs and so on, we use OpenStack, and there's a Horizon frontend for it, which you use through the browser; it logs you out all the time, but otherwise it works pretty well. Internally, ideally, you manage the VMs using Puppet, but a lot of people just SSH in and do whatever they need to set up the VM manually, and that happens as well. There are a few big projects, like Toolforge, where you can run your own web-based tools, or the Beta cluster, which is basically a copy of some of the biggest wikis: there's a beta English Wikipedia, a beta Wikidata, a beta Wikimedia Commons, using mostly the same configuration as production but running the current master version of the software instead of whatever we deploy once a week. So if there's a bug, we hopefully see it earlier, even if we didn't catch it locally, because the Beta cluster is more similar to the production environment. The continuous integration services run in Wikimedia Cloud Services as well.

And you have to have Kubernetes somewhere on these slides, right? You can use it to distribute work between the tools in Toolforge, or you can use the grid engine, which does a similar thing but is like three decades old and has gone through five forks now; I think the current fork we use is Son of Grid Engine, and I don't know what it was called before. But that's Cloud Services.

Amir: So, in a nutshell, this is our system: 1300 bare-metal servers, with lots and lots of layers of caching, because we mostly serve reads and can answer them from a cached version. All of this is open source, and you can contribute to it if you want; a lot of the configuration is also open. This is actually the way I got hired: I started contributing to the system, and they were like, yeah, come and work for us.

Daniel: That's actually how all of us got hired.

Amir: So, yeah, this is the whole thing that happens in Wikimedia. If you want to help us, we are hiring: you can go to jobs.wikimedia.org if you want to work for the Wikimedia Foundation. If you want to work with Wikimedia Deutschland, you can go to wikimedia.de; at the bottom there's a link for jobs, because the full link got too long. If you want to contribute, there are so many ways to do it: there are so many open bugs, we have our own graphing and monitoring systems you can look at, and Phabricator is our bug tracker, so you can just go there, find a bug, and fix things. Actually, we have one repository that is private, but it only holds the TLS certificates and things that are really, really private, which we cannot publish. There is also documentation: the documentation for the infrastructure is at wikitech.wikimedia.org, the documentation for the configuration is at noc.wikimedia.org, plus the documentation of our codebase; the documentation for MediaWiki itself is at mediawiki.org. Also, we have our own URL shortener: you can go to w.wiki and shorten any URL in the Wikimedia infrastructure; we reserved the dollar sign for the donate site. And if you have any questions, please.

*Applause*

Daniel: We have quite a bit of time for questions, so if anything wasn't clear, or if you're curious about anything, please ask.

AM: So, one question about something that is not in the presentation:
do you have to deal with hacking attacks?

Amir: The first rule of security issues is that we don't talk about security issues, but let's say this baby gets all sorts of attacks. We usually see DDoS attacks; one happened a couple of months ago that was quite successful, I don't know if you read the news about it. But we have infrastructure to handle this, and we have a security team that handles these cases, yes.

AM: Hello. How do you manage access to your infrastructure for your employees?

Amir: We have LDAP groups, and LDAP for the web-based systems, but for SSH we have strict protocols. You get a private key, and some people protect their private key using hardware tokens such as YubiKeys, and then you can SSH into the system, basically.

Lucas: Yeah, and there's some firewalling set up: there's only one server per data center that you can actually reach through SSH, and then you have to tunnel through that to get to any other server.

Amir: Also, we have an internal firewall, and from inside production you basically cannot talk to the outside. If you, for example, run git clone against github.com, it doesn't work; you can only access tools that are inside the Wikimedia Foundation infrastructure.

AM: Okay, hi. You said you do TLS termination through nginx; do you still allow non-HTTPS, that is, insecure access?

Amir: No, we dropped that a really long time ago.

Lucas: 2013 or so.

Amir: Yeah, 2015.

Lucas: 2015.

Amir: In 2013 we started serving most of the traffic over HTTPS, but in 2015 we dropped all of the non-HTTPS protocols. Recently we even stopped serving any SSL requests, and TLS 1.1 is also being phased out, so we are sending warnings to those users: you're using TLS 1.1, please migrate to these new things that came out around 10 years ago.

Lucas: Yeah, I think the deadline for that is February 2020 or something; then we'll only have TLS 1.2.

Amir: And soon we are going to support TLS 1.3.

Lucas: Yeah. Are there any other questions?

Q: Does read-only traffic from logged-in users hit all the way through to the parser cache, or is there another layer of caching for that?

Amir: Yes, as a logged-in user you bypass all of that.

Daniel: We need one more microphone. Yes,
it actually does, and this is a pretty big problem and something we want to look into, *clears throat* but it requires quite a bit of rearchitecting. If you are interested in this kind of thing, maybe come to my talk tomorrow at noon.

Amir: Yeah, one thing we are planning to do is active-active, so that we have two primaries and read requests from users can hit their nearest data center instead of only the main one.

Lucas: I think there was a question way in the back there, for some time already.

AM: Hi, I got a question. I read on Wikitech that you are using Ganeti as a virtualization platform for some parts; can you tell us something about that, or which parts of Wikipedia or Wikimedia are hosted on that platform?

Amir: I'm not very, very sure, so take it with a grain of salt, but as far as I know, it is used to host a few very small VMs in production that we need for very, very small microsites that we serve to users. So we build just one or two VMs; we don't use it very often, as far as I know.

AM: Do you also think about open hardware?

Amir: I don't know; you can…

Daniel: Not for servers. I think for the offline reader project — which is not actually run by the Foundation; it's supported, but it's not something the Foundation does — they were sort of thinking about open hardware. But really, open hardware in practice usually means… if you really want to go down to the chip design, it's pretty tough. So it's usually not practical, sadly.

Amir: One thing I can say is that we have some machines that are really powerful, which we give to researchers to run analysis on the data, and we needed GPUs for those. The problem was that there wasn't any open-source driver for them, so we migrated to AMD, I think, but the AMD card didn't fit in the rack; it was quite an endeavor to get it to work for our researchers.

AM: I'm still impressed that you answer 90% of requests out of the cache. Do all people access the same pages, or is the cache that huge? What percentage of the whole database is in the cache, then?

Daniel: I don't have the exact numbers, to be honest, but a large percentage of the whole database is in the cache.
I mean, it expires after 24 hours, so really obscure stuff isn't in there. But it's a power-law distribution, right? You have a few pages that are accessed a lot, and many, many pages that are not actually accessed at all for a week or so, except maybe by a crawler. So I don't know the number; my guess would be that less than 50% of pages are actually cached, but that still covers 90%: the top 10% of pages probably cover 90% of the pageviews. I should look this up; those would be interesting numbers to have.

Lucas: Do you know if this is 90% of the pageviews or 90% of the GET requests? Because requests for the JavaScript, for example, would be cached even more often, I assume.

Daniel: I would expect that for non-pageviews it's even higher.

Lucas: Yeah.

Daniel: Because, you know, all the icons and JavaScript bundles and CSS and stuff don't ever change.
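Daniel's power-law intuition is easy to check numerically. Under a Zipf distribution — an assumption for illustration, as are the page count and the exponent — the top 10% of pages indeed capture most of the views; a sketch in Python:

```python
# Share of views going to the top 10% of pages under Zipf's law.
N = 1_000_000          # number of pages (assumed)
s = 1.0                # Zipf exponent (classic value, assumed)

weights = [1 / rank**s for rank in range(1, N + 1)]
top_share = sum(weights[: N // 10]) / sum(weights)
print(f"top 10% of pages get ≈ {top_share:.0%} of views")   # ≈ 84%
```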
Lucas: But there's a question back there.

AM: Hey. Do your data centers run on green energy?

Amir: Very valid question. The Amsterdam one is fully green, but the other ones are partially green, partially coal and gas. As far as I know, there are some plans to move away from that. On the other hand, we realized that we don't produce much carbon emission, because we don't have that many servers and we don't process that much data. There was an estimate, and it came out that our carbon emission — the data centers plus all the travel to all of the events — equals about 250 households. That's very, very small; I think it's one thousandth of comparable traffic at Facebook, even if you only count the same amount of traffic, because Facebook collects data and runs very sophisticated machine-learning algorithms on it, which is really complicated, while Wikimedia doesn't do this, so we don't need much energy. Does that answer your question?

Herald: Do we have any other questions left? Yeah, sorry.

AM: Hi. How many developers do you need to maintain the whole infrastructure, and how many developer hours did you need to build it? I'm asking because, what I find very interesting about this talk, it's a non-profit, so as an example for other non-profits: how much money are we talking about to build something like this as a digital common?

Daniel: If this is just about actually running all this, so just operations, it's less than 20 people, I think, which, if you divide the requests per second by people, comes out at something like 8,000 requests per second per operations engineer. I think that's a pretty impressive number, and it is probably a lot higher than elsewhere; I would really like to know if there's any organization that tops it. I don't know the actual operations budget; I know it's two-digit millions annually. Total hours for building this over the last 18 years: I have no idea. For the first five or so years, the people doing it were actually volunteers; we still had volunteer database administrators and such until maybe ten or eight years ago. Nobody ever did any accounting of this, so I can only guess.

AM: Hello, a tools question. A few years back I saw some interesting examples of SaltStack use at Wikimedia, but right now you only mention Puppet, so what happened with that?

Amir: I think we ditched SaltStack. I can't say for sure, because none of us is on the Cloud Services team, but if you look at wikitech.wikimedia.org, last time I checked it said it's deprecated and obsolete; we don't use it anymore.

AM: Do you use the job runners to fill spare capacity on the web-serving servers, or do you have dedicated servers for those roles?

Lucas: I think they're dedicated.

Amir: The job runners are dedicated, yes; I think it's 5 per primary data center.

Daniel: Yeah. Do we actually have any spare capacity on anything? We don't have that much hardware; everything is pretty much at a hundred percent.

Lucas: I think we still have some server that is just called misc1111 or something, which runs five different things at once; you can look for those on Wikitech.
Amir: Oh, sorry, it's not five, it's 20 per primary data center that are job runners, and they run 700 jobs per second.

Lucas: And I think that does not include the video scalers, so those are separate again.

Amir: No, they merged them, like a month ago.

Lucas: Okay, cool.

AM: Maybe a little bit off topic, but can you tell us a little bit about the decision-making process for technical decisions, architecture decisions? How does that work in an organization like this?

Daniel: Yeah. Wikimedia has a committee for making high-level technical decisions, called the Wikimedia Technical Committee, TechCom, and we run an RFC process: any decision that is cross-cutting, strategic, or especially hard to undo should go through it. It's pretty informal: basically, you file a ticket and start the process, it gets announced on the mailing list, hopefully you get input and feedback, and at some point it is approved for implementation. We're currently looking into improving this process; sometimes it works pretty well, sometimes things don't get that much feedback, but it still makes sure that people are aware of these high-level decisions.

Amir: Daniel is the chair of that committee.

Daniel: Yeah, if you want to complain about the process, please do.

AM: Yes, regarding CI and CD along the pipeline: of course, with that much traffic you want to keep everything consistent, right? So are there any testing strategies that you have set internally? Of course unit tests and integration tests, but do you do something like continuous end-to-end testing on beta instances?

Amir: We have the Beta cluster, but we also do a deploy we call the "train": we deploy once a week. All of the changes get merged into one branch, the branch gets cut every Tuesday, and it first goes to the test wikis, and then it goes to all of the wikis that are not Wikipedias, plus Catalan and Hebrew Wikipedia: the Hebrew and Catalan Wikipedias basically volunteered to be the guinea pigs for the next step. If everything works fine there — if there's a fatal error, we have logging, so it's like, okay, we need to fix this, and we fix it immediately — then it goes live to all wikis.
That is one way of looking at it.

Daniel: So, our test coverage is not as great as it should be, and so we kind of, you know, abuse our users for this. We are, of course, working to improve it, and one thing we started recently is a program for creating end-to-end tests for all the API modules we have, in the hope that we can thereby cover pretty much all of the application logic while bypassing the user interface. I mean, full end-to-end testing should, of course, include the user interface, but user-interface tests are pretty brittle and often test, you know, where things are on the screen, and it just seems to make a lot of sense to have tests that exercise the application logic, what the system actually should be doing, rather than what it should look like. So far this has been a proof of concept, and we're currently working to actually integrate it into CI. That should perhaps land once everyone is back from the vacations, and then we have to write about a thousand or so tests, I guess.

Lucas: I think there's also a plan to move to a system where we deploy basically after every commit and can immediately roll back if something goes wrong, but that's more midterm stuff, and I'm not sure what the current status of that proposal is.

Amir: And it will be on Kubernetes, so it will be completely different.

Daniel: That would be amazing.

Lucas: But right now we are on this weekly basis, and if something goes wrong, we roll back to last week's version of the code.

Herald: Are there any questions left? Sorry. Yeah. Okay, I don't think so. So, yeah, thank you for this wonderful talk, and thank you for all your questions. I hope you liked it. See you around!

*Applause*

*Music*

Subtitles created by c3subtitles.de in the year 2021. Join, and help us!