*36c3 preroll music*

Herald: Our next talk will be "The ultimate Acorn Archimedes talk", in which we'll hear about everything to do with the Archimedes computer. There's a promise in advance that there will be no Eureka jokes in there. Give a warm welcome to Matt Evans.

*applause*

Matt Evans: Thank you. Okay. A little bit of retro computing first thing in the morning, sort of. Welcome. My name is Matt Evans. The Acorn Archimedes was my favorite computer when I was a small hacker, and I'm privileged to be able to talk a little bit about it with you today. Let's start with: what is an Acorn Archimedes? I'd like an interactive session, I'm afraid, so please indulge me with a show of hands. Who's heard of the Acorn Archimedes before? Ah, OK, maybe 50, 60%. Who has used one? Maybe 10%, maybe. Okay. Who has programmed... who has coded on an Archimedes? Maybe half? Two, three people. Great. Okay. Three. *laughs* Okay, so a small percentage. I don't see these machines as being as famous as, say, the Apple Macintosh or the IBM PC, and certainly outside of Europe they were not that common. So it's kind of interesting just how many people here have seen one. It was the first ARM-based computer. This is astonishingly 1980s. I think one of them is drawing, actually.
But they're not just the first ARM-based machine; they're the machine that the ARM was originally designed to drive. It's a... Is that a comment for me? Mic? I'm being heckled already. It's only slide two. Let's see how this goes. So it's a two-box computer. It looks a bit like a Mega ST to me: there's a main unit with the processor and disks and expansion cards and so on. Now, this is an A3000. This is mine, in fact, and I didn't bother to clean it before taking the photo. And now it's on this huge screen. That was a really bad idea. You can see all the disgusting muck in the keyboard. It has a bit of ink on it; I don't know why. But this machine is 30 years old. And luckily this was my machine, as I said, when I was a small hacker, and that's why I'm doing the talk today. It had a big influence on me. I'd like to say as a person, but more as an engineer, in terms of my programming experience when I was learning to program and so on. I live and work in Cambridge in the UK, where this machine was designed. Through a funny sort of turn of events, I ended up there, and I actually work in the building next to the one where this was designed. And a bunch of the people from the original team that designed this system are still around and relatively contactable.
And I thought this was a good opportunity to get on the phone and call them up, or go for a beer with a couple of them, and ask them: why are things the way they are? There are all sorts of weird quirks to this machine, and I'd been wondering about them for 20 years. Can you please tell me why you did it this way? And they were a really good bunch of people. So I talked to Steve Furber, who led the hardware design; Sophie Wilson, who did the same for software; Tudor Brown, who did the video system; Mike Muller, the I/O system; and John Biggs and Jamie Urquhart, who did the silicon design. I've spoiled one of the surprises here: there's been some silicon design going on in building this Acorn machine. They were all wonderful people who gave me their time and told me a bunch of anecdotes that I will pass on to you. I'm going to talk about the classic Arc. Acorn built a bunch of different machines into the 1990s, but the ones I'm talking about started in 1987. There were two models, effectively a low end and a high end. One had an option for a 20 MB hard disk, cost 2300 pounds, and took up to 4 MB of RAM. They all share the same basic architecture; they're all basically the same. The A3000 that I just showed you came out in 1989. That was the machine I had. Again, it's the same: it had a slightly updated memory controller and was slightly faster. They all had an ARM2.
This was the released version of the ARM processor designed for this machine, at 8 MHz. And then finally, in 1990, came what I call the last of the classic Arcs, the Archimedes A540. This was the top-end machine: it could have up to 16 MB of memory, which is a fair bit even in 1990. It had a 30 MHz ARM3. The ARM3 was the evolution of the ARM2, but with a cache and a lot faster. So this talk will be centered around how these machines work, not the more modern machines. So around 1987, what else was available? This is a random selection of machines. Apologies if your favorite machine is not on this list; it wouldn't fit on the slide otherwise. At the start of the 80s, we had exotic things like the Apple Lisa and the Apple Mac. Very expensive machines. The Amiga I had to put in here. It started off relatively expensive, but the Amiga 500 was, you know, very good value for money, a very capable machine. But I'm comparing the Archimedes more to PCs and Macs, because that was the sort of market it was going for. And although it was an expensive machine, compared to a Macintosh it was pretty cheap. I even put the NeXT Cube on there. I'd heard that they were incredibly expensive, and actually, compared to the Macintosh, they're not that expensive at all. Well, I don't know which one I would have preferred.
So the first question I asked them, and the first thing they told me about: why was it built? I'd used them in school and, as I said, had one at home, but I was never really quite sure what it was for. And I think a lot of Acorn marketing wasn't quite sure what it was for either. They told me it was the successor to the BBC Micro, the 8-bit machine. Lovely 6502 machine, incredibly popular, especially in the UK. The goal was to make a machine with 10 times the performance: the successor would be 10 times faster at the same price. And the thing I didn't know is that they had been inspired. The Acorn team had seen the Apple Lisa and the Xerox Star, which comes from the famous Xerox Alto at Xerox PARC, the first GUI workstation, in the 70s, a monumental machine. They'd been inspired by these machines and wanted to make something very similar. So this is the same story as the Macintosh: they wanted to make a desktop machine for business, for office automation, desktop publishing and that kind of thing. I never really understood this before: the inspiration came from the Xerox machines. It was supposed to be, obviously, a lot more affordable and a lot faster. So this is what happens when Acorn marketing gets hold of this vision. The Xerox Star on the left is this nice, sensible business machine.
Someone's wearing a nice, crisp suit... *bumps microphone* ...banging their microphone... and it gets turned into the very Cambridge tweed version on the right. It's apparently illegal to program one of these if you're not wearing a top hat. But no one told me that when I was a kid, and my court case comes up next week. Cambridge is a bit of a funny place, and for those that have been there, this picture on the right sums it all up. So they began Project A, which was to build this new machine. They looked at the processors that were available at that time: the 286, the 68K, and the National Semiconductor 32016, which was an early 32-bit processor, a bit of a weird one. And they all had something in common: they were ridiculously expensive and, in Tudor's words, a bit crap. They weren't a lot faster than the BBC Micro, but they were a lot more expensive. They were much more complicated in terms of the processor itself, and the system around them was also very complicated; they needed lots of weird support chips. This just drove up the price of the system, and it wasn't going to hit that 10 times performance, let alone at the same price point. They'd visited a couple of other companies designing their own custom silicon. They got this idea in about 1983.
They were looking at some of the RISC papers coming out of Berkeley and were quite impressed by what a bunch of grad students were doing: they'd managed to get a working RISC processor. And they went to the Western Design Center and looked at the 6502 successors being designed there. They had a positive experience. They saw a bunch of high school kids with Apple IIs doing silicon layout, and they thought, "OK, well...". They'd never designed a CPU before at Acorn; Acorn hadn't done any custom silicon to this degree. But they were buoyed by this and thought, okay, well, maybe RISC is the secret and we can do this. This was not really the done thing in this timeframe, and not for a company the size of Acorn, but they designed their computer from scratch. They designed all of the major pieces of silicon in this machine. It wasn't a case of designing the ARM chip and then saying, "Hey, we've got a processor core, what should we do with it?" It was about designing the machine, something ARM and the history of that company have kind of benefited from. This was all about designing the machine as a whole. They were a tiny team, a handful of people: about a dozen-ish did the hardware design, and a similar number did the software and operating systems on top. That's orders of magnitude different from IBM and Motorola and so forth, who were designing computers at this time. RISC was the key.
It needed to be incredibly simple. One of the other experiences they had was a visit to a CISC processor design center. They had a team of a couple of hundred people, they were on revision H, it still had bugs, and it was just this unwieldy, complex machine. So RISC was the secret. Steve Furber has an interview somewhere where he jokes about Acorn management giving him the special sauce, two things that no one else had: no people and no money. So it had to be incredibly simple. It had to be built on a shoestring, as Jamie said to me. So there were lots of corners cut, but in the right way. Saying "corners cut" sounds ungenerous: there were some very shrewd design decisions, always weighing up cost versus benefit, and I think they erred on the correct side for all of them. Steve sent me this picture. He's got a cameo here: that's the outline of him in the reflection on the glass. He's got this up in his office. He led the hardware design of all of these chips at Acorn. Across the top, we've got the original ARM, the ARM1, then the ARM2 and the ARM3 (guess the naming scheme), and then the video controller, memory controller and I/O controller. You can sort of see their relative sizes, and it's kind of pretty.
These were also processors where you could really point at the die and say, "Oh, that's the register file, and you can see the cache over there." You can't really do that nowadays with modern processors. So, a bit about the specification, what it could do, the end product. As I mentioned, they all had this 8 MHz ARM2, up to 4 MB of RAM, and 26-bit addresses. Remember that; that's weird. A lot of 32-bit machines had 32-bit addresses, or at least the ones we know today do. That wasn't the case here, and I'll explain why in a minute. The A540 had an updated CPU. The memory controller had an MMU, which was unusual for machines of the mid-80s: the hardware could support virtual memory, page faults and so on. It had decent sound: 8-channel, hardware-mixed stereo. It was 8-bit, but logarithmic instead of linear PCM, a bit like µ-law if anyone knows that, so you got more precision at the low end, and it sounded to me a little bit like 12-bit PCM. So this was quite good. Storage-wise, it had the same floppy controller as the Atari ST. Fairly boring. The hard disk controller was for a horrible standard called ST506, MFM drives, which were very, very crude compared to the disks we have today. Keyboard and mouse, nothing to write home about. It was a normal keyboard; there was nothing special going on there.
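To show why a logarithmic 8-bit code keeps more precision for quiet sounds than 8-bit linear PCM, here is a minimal Python sketch of the standard ITU-T G.711 µ-law encoder and decoder. This is the telephony µ-law, used only as an illustration of the principle; the VIDC's exact sample format is not given in the talk and is not claimed to match this bit layout.

```python
# Standard ITU-T G.711 mu-law, used here only to illustrate logarithmic 8-bit
# audio; it is NOT claimed to be the VIDC's exact sample format.
BIAS = 0x84   # 132: added so every biased magnitude has a leading 1 in bits 7..14
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode a signed 16-bit linear sample as an 8-bit mu-law code."""
    sign = 0
    if sample < 0:
        sample = -sample
        sign = 0x80
    sample = min(sample, CLIP) + BIAS
    exponent = sample.bit_length() - 8             # segment 0..7: the "logarithm"
    mantissa = (sample >> (exponent + 3)) & 0x0F   # 4 bits of detail inside the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # G.711 stores codes inverted


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law code to the midpoint of its linear bucket."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


if __name__ == "__main__":
    # The step between neighbouring codes grows with loudness, so quiet
    # samples are reproduced far more precisely than loud ones.
    for x in (100, 1000, 30000):
        y = ulaw_to_linear(linear_to_ulaw(x))
        print(f"{x:6d} -> {y:6d} (error {abs(x - y)})")
```

Each segment doubles the quantisation step, so the worst-case round-trip error doubles from one segment to the next: small amplitudes get fine steps, large amplitudes coarse ones.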
And there's a printer port, a serial port and some expansion slots, which I'll outline later on. The thing I really liked about the Arc was the graphics capabilities. It's fairly capable, especially for a machine of that era and at that price. It just had a flat frame buffer, so it didn't have sprites, which is unfortunate. It didn't have a blitter and bitplanes and so forth. But the upshot of that is that it's dead simple to program. It had a 256-color mode, 8 bits per pixel, so one byte per pixel, and it's all just laid out as a linear string of bytes. So it was dead easy to write some really nice optimized code to blit stuff to the screen. Part of the reason there isn't a blitter is that the CPU was actually so good at doing this. Color-wise, it's got paletted modes out of a 4096-color palette, same as the Amiga. It has this 256-color mode, which is different. The top-end machines, the A540 and the A400 series, could also do this very high-res 1152 by 900 mode, which was more of a workstation resolution. If you bought a Sun workstation, a Sun 3, in those days, it could do this and some higher resolutions. But this was really not seen on computers you might have in the office or at school, the education end of the market. And it's quite clever the way they did that; I'll come back to that in a sec.
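Because each pixel in the 256-colour mode is one byte and rows simply follow each other in memory, plotting and copying pixels is plain address arithmetic. A minimal Python sketch, with a `bytearray` standing in for screen memory; the 320×256 geometry, the helper names and the palette bit order are assumptions for illustration, not details from the talk.

```python
# A bytearray stands in for screen memory; the 320x256 geometry is an
# illustrative assumption, not a figure from the talk.
WIDTH, HEIGHT = 320, 256


def pixel_offset(x: int, y: int, width: int = WIDTH) -> int:
    """Byte offset of pixel (x, y): rows are stored one after another."""
    return y * width + x


def plot(fb: bytearray, x: int, y: int, colour: int) -> None:
    fb[pixel_offset(x, y)] = colour & 0xFF  # one pixel is exactly one byte


def blit(fb: bytearray, sprite: bytes, w: int, h: int, x: int, y: int) -> None:
    """Copy a w x h row-major sprite to (x, y); each row is one contiguous copy.
    No clipping: the caller must keep the sprite inside the screen."""
    for row in range(h):
        dst = pixel_offset(x, y + row)
        fb[dst:dst + w] = sprite[row * w:(row + 1) * w]


def rgb12(r: int, g: int, b: int) -> int:
    """Pack 4-bit red, green and blue into one 12-bit palette value
    (16 * 16 * 16 = 4096 colours). The bit order is an assumption."""
    return (b & 0xF) << 8 | (g & 0xF) << 4 | (r & 0xF)


screen = bytearray(WIDTH * HEIGHT)
```

With no bitplanes to interleave, a sprite copy is just one contiguous byte copy per row, which is the kind of loop a fast CPU can do without a dedicated blitter.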
But for me, the thing about the Arc: for the money, it was the fastest machine around. It was definitely faster than 386s and all the stuff Motorola was doing at the time, by quite a long way. It is almost eight times faster than a 68K at about the same clock speed. That's to do with its pipelining, with it having a 32-bit word, and with a couple of other tricks; I'll show you later on what the secret to that performance was. It was about minicomputer speed. Compared to some of the other RISC machines at the time: it wasn't the first RISC in the world, but it was the first cheap RISC, and the first RISC machine that people could feasibly buy and have on their desks at work or in education. If you compare it to something like the MIPS or the SPARC, it was not as fast as a MIPS or SPARC chip, but it was also a lot smaller and a lot cheaper. Both of those other processors had very big dies. They needed other support chips. They had huge packages, lots of pins, lots of cooling requirements. All of this really added up. I priced up a Sun 4 workstation at the time, and it was well over four times the price of one of these machines, and that was before you added extras such as disks and network interfaces and things like that. So it's very good, very competitive for the money.
And if you think about building a cluster, you could get a lot more throughput by networking them together. So this is about as far as I got when I was a youngster; I wasn't brave enough to really take the machine apart and poke around. Fortunately, now it's 30 years old and I'm qualified to do this, so I'm going to take it apart. Here's the motherboard. Quite a nice clean design. This was built in Wales, for anyone that's been to the UK; it's very unusual these days for anything to be built in the UK. It's got several main sections around these four chips. Remember Steve's photo from earlier on? This is the chipset: the ARM, MEMC, VIDC and IOC. The I/O side of things happens over on the left, video and sound in the top right, and the memory and the processor in the middle. It's got a megabyte on board, and you can plug in an expansion for 4 MB. So, the memory map from the software's view. I mentioned this 26-bit addressing, and I think this is one of the key characteristics of these machines. You have a 64 MB address space, and it's quite packed: there's a lot of stuff shoehorned into here. So there's the memory. The bottom half of the address space, 32 MB of it, is the logically mapped memory the processor runs programs in. The processor has user mode and privileged modes; it's got a concept of privilege within the processor execution.
When you're in user mode, you only get to see the bottom half, and that's the virtually mapped part: the MMU maps pages into that space. When you're in supervisor mode, you get to see the whole of the rest of the memory as well, including the physical memory and various registers up the top. The thing to notice here is that there's stuff hidden behind the ROM; this address space is packed very tightly together. There's a requirement for control registers for the memory controller, the video controller and so on, and they're basically write-only registers in the ROM's address range: you write to the ROM and you hit these registers. Kind of weird when you first see it, but it was quite a clever way to fit this stuff into the address space. So let's start with the ARM1. Sophie Wilson designed the instruction set in late 1983. Steve took the instruction set and designed the top level, the block-level microarchitecture of this processor: the data path and how the control logic works. The VLSI team then implemented this with their own custom cells; there's a custom data path and custom logic throughout. It took them about a year, all in. Project A really kicked off in early 1984, and this taped out early in 1985.
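The packed 26-bit memory map described above can be summarised as a small address decoder. This is an assumption-laden Python sketch: the talk only says that user mode sees the bottom 32 MB through the MMU and that write-only control registers sit behind the ROM. The exact region boundaries below follow the commonly published MEMC memory map and should be checked against Acorn's documentation before being relied on; the function name is hypothetical.

```python
MB = 1 << 20


def decode(addr: int, write: bool, supervisor: bool) -> str:
    """Classify an access in the 26-bit (64 MB) Archimedes address space.

    Region boundaries follow the commonly published MEMC memory map and are
    an assumption here; the talk itself only gives the broad layout."""
    if not 0 <= addr < 64 * MB:
        raise ValueError(f"{addr:#x} is outside the 26-bit address space")
    if addr < 32 * MB:
        return "logical RAM (pages mapped by the MEMC MMU)"
    if not supervisor:
        # User mode only ever sees the bottom 32 MB.
        raise PermissionError(f"user mode cannot access {addr:#x}")
    if addr < 48 * MB:
        return "physical RAM"
    if addr < 52 * MB:
        return "I/O (IOC and expansion cards)"
    if not write:
        return "ROM"  # reads from the top 12 MB all come from the ROM
    # Writes to the same addresses land in write-only control registers.
    if addr < 54 * MB:
        return "VIDC registers (write-only)"
    if addr < 56 * MB:
        return "MEMC control registers (write-only)"
    return "MEMC page table entries (write-only)"
```

The read/write split is what lets the ROM and the control registers share addresses: nothing ever needs to write to ROM or read those registers back, so each direction of the bus can be given a different meaning.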
Jamie Urquhart and John Biggs gave me a bit of an insight into how they worked on the VLSI side of things. They had an Apollo workstation, just one Apollo workstation, the DN600. This is a 68K-based washing machine, as Jamie described it. It's this huge thing. It cost about £50,000; it was incredibly expensive. And they designed all of this with just one of these workstations. Jamie got in at 5:00 a.m., worked until the afternoon and then let someone else on the machine. So they shared the workstation and worked in shifts so that they could design this whole thing on one workstation. This comes back to it being designed on a bit of a shoestring budget. When they got a couple of other workstations later on in the project, there was an allegation that the CAD software might not initially have been licensed on the other workstations. I can neither confirm nor deny whether that's true. Steve wrote a simulator for this in BBC BASIC while he was designing the block-level microarchitecture, running on his BBC Micro. So this could then run real software: there could be a certain amount of software development, and they could also validate that the design was correct. This is quite a large chip; 50 square millimeters was the economic limit in those days for this part of the market. There's no cache.
That would also have been far too complicated. So this was also, I think, quite a big risk, no pun intended: doing this with such a small team. They're all very clever people, but they hadn't all got experience in building chips before. I think they knew what they were up against, and so not having a cache or other complicated things like that was the right choice to make. I'll show you later that it didn't actually affect things. So this was a RISC machine. If anyone in this room has not programmed ARM, then get out at once. But if you have programmed ARM, this is quite familiar, with some differences. It's a classical three-operand RISC, and it's got a free shift on one of the operands for most of the instructions, so you can do things like multiplies by a constant quite easily. It's not purist RISC, though. It does have load and store multiple instructions. As the name implies, these will load or store multiple registers in one go. It's one register per cycle, but it's all done through one instruction. That's not very RISC. Again, there's a good reason for doing that. So when ARM1 comes back from the fab, it gets plugged into a board that looks a bit like this.
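Two of the instruction-set points above, modelled in Python rather than ARM assembly. First, the free shift on the second operand means a multiply by a small constant can be a single instruction: `ADD R0, R0, R0, LSL #2` computes R0 × 5, and `RSB R0, R0, R0, LSL #3` computes R0 × 7. Second, a load/store multiple instruction carries a 16-bit register list, one bit per register, and the lowest-numbered register always goes to the lowest address. The helper names are hypothetical; this models the arithmetic and the register-list encoding, not ARM timing.

```python
from typing import Dict, List

MASK32 = 0xFFFFFFFF


def add_lsl(rn: int, rm: int, shift: int) -> int:
    """ADD Rd, Rn, Rm, LSL #shift: the shift on the second operand is free."""
    return (rn + (rm << shift)) & MASK32


def rsb_lsl(rn: int, rm: int, shift: int) -> int:
    """RSB Rd, Rn, Rm, LSL #shift: reverse subtract, (Rm << shift) - Rn."""
    return ((rm << shift) - rn) & MASK32


def times5(x: int) -> int:
    return add_lsl(x, x, 2)   # ADD R0, R0, R0, LSL #2  -> x + 4x


def times7(x: int) -> int:
    return rsb_lsl(x, x, 3)   # RSB R0, R0, R0, LSL #3  -> 8x - x


def register_list(mask: int) -> List[int]:
    """Registers named by the 16-bit register-list field of LDM/STM, lowest first."""
    return [r for r in range(16) if (mask >> r) & 1]


def stmia(memory: Dict[int, int], base: int, regs: List[int], mask: int) -> int:
    """STMIA base!, {list}: one word per listed register, with the lowest-numbered
    register at the lowest address. Returns the written-back base."""
    addr = base
    for r in register_list(mask):
        memory[addr] = regs[r] & MASK32
        addr += 4
    return addr
```

For example, a mask of `0x400F` names R0 to R3 plus R14 (the link register), the kind of list a function prologue saves in one instruction.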
This is called the A2P, the ARM second processor. It plugs into a BBC Micro. There's a thing called the Tube, which is a sort of FIFO-like arrangement: the BBC Micro can send messages one way, and this can send messages back. The BBC Micro has the discs, the I/O, the keyboard and so on, and it's used as the host to download code into the one megabyte of RAM up here, and then you run the code on the ARM. So this was the initial system, at 6 MHz. The thing I found quite interesting about this: I mentioned that Steve had built this BBC BASIC simulation. One of the early bits of software that could run on this board was BBC BASIC itself, ported to ARM. The BASIC interpreter was very fast, very lean, and it was running on this board early on. They then built a simulator called ASIM, an event-based simulator for doing logic design, and all of the other chips in the chipset were simulated using ASIM on ARM1, which is quite nice. This was the fastest machine they had around. They didn't have, you know, the thousands of machines in a cluster like you'd have in a modern company doing EDA. They had a very small number of machines, and these were the fastest ones they had. So ARM2 and all the rest of the chipset were simulated on ARM1. Then ARM2 comes along, a year later. It's a shrink of the design, based on the same basic microarchitecture, but it now has a multiplier. It's a Booth multiplier, so a multiply takes 16 cycles in the worst case, handling two bits per clock. Again, no cache.
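Two multiplier bits per clock over a 32-bit operand gives the 16-cycle worst case. A textbook radix-4 Booth multiplier shows where that number comes from: each step recodes two multiplier bits (plus one bit of overlap) into a digit in {-2, -1, 0, +1, +2}. This is a generic Python sketch of the technique, not a model of the ARM2's actual circuit; it always runs all 16 steps and does not attempt to reproduce the real chip's cycle-level behaviour.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF

# Radix-4 Booth recoding: the 3-bit window (b[2i+1], b[2i], b[2i-1])
# selects a multiple of the multiplicand in {-2, -1, 0, +1, +2}.
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_multiply(a: int, b: int) -> Tuple[int, int]:
    """Return (a * b mod 2**32, number of steps) using radix-4 Booth recoding."""
    a &= MASK32
    extended = (b & MASK32) << 1   # bit 0 is the implicit b[-1] = 0
    acc = 0
    steps = 0
    for i in range(16):            # 32 multiplier bits, 2 bits per step
        window = (extended >> (2 * i)) & 0b111
        # Add or subtract 0, 1 or 2 times the multiplicand, shifted into place.
        acc = (acc + BOOTH_DIGIT[window] * (a << (2 * i))) & MASK32
        steps += 1
    return acc, steps
```

Recoding into signed digits means every step needs at most one add or subtract of a shifted multiplicand, which keeps the per-clock hardware small.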
But one thing they did add is banked registers. I mentioned the processor modes; there's an interrupt mode, for example. Some of the processor modes basically give you a different view of the registers, which is very useful. These were all validated at 8 MHz: the product was designed for 8 MHz, so the company that built them put a stamp on the outside saying 8 MHz. There are two versions of this chip, and I've got a suspicion that they're actually the same silicon; they just tested a batch and said that one works at 10 or 12. So on my project list is overclocking my A3000 to see how fast it'll go, and whether I can get it to 12 MHz. Okay, so, the banking of the registers. ARM, even modern 32-bit ARM, has two types of interrupt: IRQ, pronounced "erk" in English, and FIQ, pronounced "fic" in English. I appreciate that doesn't mean quite the same thing in German, so I'll call it F-I-Q from here on in. FIQ mode has this property where the top half of the registers are effectively different registers when you get into this mode. First of all, this means you don't have to back up those registers in your FIQ handler. And secondly, if you can write a FIQ handler using just those registers, and there are enough for most basic tasks, you don't have to save and restore anything when you get an interrupt.
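Register banking can be modelled as a lookup from (mode, register number) to a physical register. The Python sketch below follows the usual description of the 26-bit ARM register set: FIQ mode has its own R8 to R14, IRQ and supervisor modes have their own R13 and R14, and everything else (including R15) is shared. It shows a FIQ handler scribbling over R8 to R12 while the interrupted code's registers survive without any save or restore. The class and method names are hypothetical.

```python
from typing import Dict, Tuple

# Which registers each mode replaces with a private copy (26-bit ARM register set).
BANKED = {
    "usr": range(0),
    "fiq": range(8, 15),   # R8-R14
    "irq": range(13, 15),  # R13-R14
    "svc": range(13, 15),  # R13-R14
}


class RegisterFile:
    """Toy model of ARM register banking; names are hypothetical."""

    def __init__(self) -> None:
        self.phys: Dict[Tuple[str, int], int] = {}
        self.mode = "usr"

    def _slot(self, n: int) -> Tuple[str, int]:
        # A banked register belongs to the current mode; all others are shared.
        owner = self.mode if n in BANKED[self.mode] else "usr"
        return owner, n

    def read(self, n: int) -> int:
        return self.phys.get(self._slot(n), 0)

    def write(self, n: int, value: int) -> None:
        self.phys[self._slot(n)] = value & 0xFFFFFFFF


rf = RegisterFile()
for n in range(15):
    rf.write(n, 100 + n)      # user-mode state before the interrupt
rf.mode = "fiq"               # FIQ arrives
for n in range(8, 13):
    rf.write(n, 0)            # handler scribbles freely on R8-R12...
rf.mode = "usr"               # ...and returns without saving anything
```

In hardware the "lookup" is just a mode-dependent register select, so switching banks costs nothing at interrupt time.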
So this is 290 00:22:55,940 --> 00:23:02,510 designed specifically to be a very, very low overhead interrupt mode. So I'm coming to 291 00:23:02,510 --> 00:23:07,890 why there's a 26-bit address space, and I found this link very unintuitive. So, 292 00:23:07,890 --> 00:23:13,520 unlike 32-bit ARM, the more modern 1990s-onwards ARMs, the program counter, 293 00:23:13,520 --> 00:23:17,020 register 15, doesn't just contain the program counter, but also contains the 294 00:23:17,020 --> 00:23:20,420 status flags and processor mode; effectively all of the machine state is 295 00:23:20,420 --> 00:23:24,200 packed in there as well. So I asked the question, well, why 64 megabytes of 296 00:23:24,200 --> 00:23:27,700 address space? What's special about 64? And Mike told me, well, you're asking the 297 00:23:27,700 --> 00:23:31,980 wrong question. It's the other way round. What we wanted was this property that all 298 00:23:31,980 --> 00:23:35,990 of the machine state is in one register. So this means you just have to save one 299 00:23:35,990 --> 00:23:40,000 register. I said, well, you know, what's the harm in saving two registers? And he reminded 300 00:23:40,000 --> 00:23:43,490 me of this FIQ mode. Well, if you're already in a state where you've really 301 00:23:43,490 --> 00:23:47,890 optimized your interrupt handler so that you don't need any other registers to deal 302 00:23:47,890 --> 00:23:51,390 with, and you're not saving or restoring anything apart from your PC, then saving another 303 00:23:51,390 --> 00:23:56,000 register is a 50 percent overhead on that operation. So the prime motivator 304 00:23:56,000 --> 00:24:00,500 was to keep all of the state in one word. And then once you take all of the flags 305 00:24:00,500 --> 00:24:04,600 away, you're left with 24 bits for a word-aligned program counter, which leads to 306 00:24:04,600 --> 00:24:09,799 26-bit addressing. And then that was seen as, well, 64 MB is enough.
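A sketch, in Python, of how the 26-bit-era R15 packs all the machine state into one word, assuming the commonly documented layout: NZCV flags in bits 31 to 28, the IRQ and FIQ disable bits in 27 and 26, the word-aligned PC in bits 25 to 2, and the mode in bits 1 and 0. The 24 PC bits address 2^24 words of 4 bytes, which is where 64 MB comes from. The function names are hypothetical.

```python
PC_MASK = 0x03FFFFFC  # bits 25..2: the word-aligned program counter
MODE_NAMES = {0b00: "USR", 0b01: "FIQ", 0b10: "IRQ", 0b11: "SVC"}


def pack_r15(pc: int, *, n=0, z=0, c=0, v=0, irq_off=0, fiq_off=0, mode=0b00) -> int:
    """Pack PC, NZCV flags, interrupt disables and mode into one 32-bit word."""
    if pc & ~PC_MASK:
        raise ValueError("PC must be word-aligned and below 64 MB")
    flags = (n << 31) | (z << 30) | (c << 29) | (v << 28)
    disables = (irq_off << 27) | (fiq_off << 26)
    return flags | disables | pc | (mode & 0b11)


def unpack_r15(r15: int) -> dict:
    """Split a packed R15 back into its fields."""
    return {
        "pc": r15 & PC_MASK,
        "nzcv": (r15 >> 28) & 0xF,
        "irq_off": (r15 >> 27) & 1,
        "fiq_off": (r15 >> 26) & 1,
        "mode": MODE_NAMES[r15 & 0b11],
    }


# 24 bits of word address, 4 bytes per word: 2**26 bytes = 64 MB.
ADDRESS_SPACE = ((PC_MASK >> 2) + 1) * 4
```

Because everything lives in this one word, saving or restoring R15 saves or restores the whole machine state, which is the single-register property Mike describes.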
There were 307 00:24:09,799 --> 00:24:14,690 machines in 1985 that, you know, could conceivably have more memory than that. 308 00:24:14,690 --> 00:24:18,260 But for a desktop that was still seen as a very large, very expensive amount of 309 00:24:18,260 --> 00:24:24,450 memory. The other thing: you don't need to invent another instruction to do 310 00:24:24,450 --> 00:24:28,170 return from exception, so you can return using one of your existing instructions. 311 00:24:28,170 --> 00:24:32,740 In this case, it's the subtract into PC, which looks a bit strange, but trust me, 312 00:24:32,740 --> 00:24:39,030 that does the right thing. So, the memory controller. I mentioned the 313 00:24:39,030 --> 00:24:43,040 address translation, so this has an MMU in it. In fact, it's the thing directly on the 314 00:24:43,040 --> 00:24:46,080 left-hand side. I was worried that these slides actually might 315 00:24:46,080 --> 00:24:49,520 not be the right resolution and they might be sort of too small for people to see 316 00:24:49,520 --> 00:24:53,570 this. And in fact, the screen being the size of a house is really useful here. So the 317 00:24:53,570 --> 00:24:58,500 left-hand side of this chip is the MMU. This chip is the same size as ARM2. Yeah, 318 00:24:58,500 --> 00:25:02,380 pretty much. So that's part of the reason why the MMU is on another chip: ARM2 was 319 00:25:02,380 --> 00:25:06,610 as big as they could make it to fit the price. I don't know if anyone here has done 320 00:25:06,610 --> 00:25:10,810 silicon design, but as the area goes up, your yield effectively goes down, and 321 00:25:10,810 --> 00:25:14,690 it's a non-linear effect on price. So the MMU had to be on a separate 322 00:25:14,690 --> 00:25:19,910 chip, and it takes up half of that chip as well.
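The yield point can be illustrated with the textbook Poisson yield model, Y = exp(-A·D) for die area A and defect density D. This is a generic model, not Acorn's or VLSI's data, and every number below is hypothetical; it only shows why the cost per good die grows faster than the area does.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with no killer defect under a Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def cost_per_good_die(area_cm2: float, defects_per_cm2: float,
                      wafer_cost: float, wafer_area_cm2: float) -> float:
    dies = wafer_area_cm2 / area_cm2  # ignores edge loss and scribe lines
    good_dies = dies * poisson_yield(area_cm2, defects_per_cm2)
    return wafer_cost / good_dies


# Hypothetical numbers, only to show the shape of the curve.
small = cost_per_good_die(0.3, 2.0, wafer_cost=1000.0, wafer_area_cm2=80.0)
double = cost_per_good_die(0.6, 2.0, wafer_cost=1000.0, wafer_area_cm2=80.0)
print(f"doubling the area multiplies cost per good die by {double / small:.2f}")
```

With these made-up inputs, doubling the die area multiplies the cost per good die by about 3.6 rather than 2, which is the non-linearity that pushed the MMU onto its own chip.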
MEMC does the more mundane things 323 00:25:19,910 --> 00:25:23,920 like driving DRAM: it does refresh for DRAM, and it converts from linear addresses 324 00:25:23,920 --> 00:25:33,799 into the row and column addresses that DRAM takes. So the key thing about this 325 00:25:33,799 --> 00:25:39,090 ARM and MEMC pairing is that the key factor in performance is making use of memory 326 00:25:39,090 --> 00:25:43,740 bandwidth. When the team had looked at all the other processors in Project A before 327 00:25:43,740 --> 00:25:49,380 designing their own, one of the things they looked at was how well they utilized 328 00:25:49,380 --> 00:25:56,320 DRAM, and the 68K and the semi chips made very, very poor use of DRAM bandwidth. 329 00:25:56,320 --> 00:25:59,940 Steve said, well, okay, the DRAM is the most expensive component of any of these 330 00:25:59,940 --> 00:26:04,280 machines, and they're making poor use of it. And I think a key insight here is that if 331 00:26:04,280 --> 00:26:07,740 you maximize that use of the DRAM, then you're going to be able to get much higher 332 00:26:07,740 --> 00:26:13,490 performance in those machines. And so it's 32 bits wide. The ARM is pipelined, so it can 333 00:26:13,490 --> 00:26:18,730 do a 32-bit word every cycle. And it also indicates whether it's sequential or non-sequential 334 00:26:18,730 --> 00:26:25,250 addressing. This then lets your MEMC 335 00:26:25,250 --> 00:26:31,200 decide whether to do an N cycle or an S cycle. So there's a fast one and a slow 336 00:26:31,200 --> 00:26:35,220 one, basically. So when you access a new random address in DRAM, you have to open 337 00:26:35,220 --> 00:26:40,710 that row, and that takes twice the time: it's a 4 MHz cycle. But then once 338 00:26:40,710 --> 00:26:45,150 you've accessed that address, and once you're accessing linearly ahead of that 339 00:26:45,150 --> 00:26:49,599 address, you can do fast page mode accesses, which are 8 MHz cycles.
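A rough Python sketch of the N/S cycle idea, using the timings quoted in the talk (a 4 MHz cycle to open a new row, 8 MHz cycles within it). The DRAM geometry (512 words per row), the rule that a sequential access crossing into a new row still costs an N cycle, and all names are assumptions for illustration.

```python
N_CYCLE_NS = 250  # non-sequential: open a new DRAM row (4 MHz cycle)
S_CYCLE_NS = 125  # sequential: fast page mode in the open row (8 MHz cycle)
COL_BITS = 9      # hypothetical DRAM geometry: 512 words per row


def row_col(byte_addr: int) -> tuple[int, int]:
    """Split a byte address into (row, column) of 32-bit words."""
    word = byte_addr >> 2
    return word >> COL_BITS, word & ((1 << COL_BITS) - 1)


def access_time_ns(byte_addrs: list[int]) -> int:
    """Total time for a trace of word accesses, charging N or S per access."""
    total = 0
    prev = None
    for addr in byte_addrs:
        sequential = (
            prev is not None
            and addr == prev + 4                      # the ARM's "sequential" hint
            and row_col(addr)[0] == row_col(prev)[0]  # still inside the open row
        )
        total += S_CYCLE_NS if sequential else N_CYCLE_NS
        prev = addr
    return total
```

Four consecutive words cost one N cycle plus three S cycles (625 ns), whereas three scattered words cost three N cycles (750 ns), which is why streaming accesses matter so much on this machine.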
340 00:26:49,599 --> 00:26:54,030 So ultimately, that's the reason why these load/store multiples exist. The 341 00:26:54,030 --> 00:26:57,820 non-RISC instructions are there so that you can stream registers out and back 342 00:26:57,820 --> 00:27:03,100 in and make use of this DRAM bandwidth. So, store multiple. This is just a simple 343 00:27:03,100 --> 00:27:07,860 calculation: for 14 registers, you're hitting about 25 megabytes a second out of 344 00:27:07,860 --> 00:27:13,083 30. So it's not 100%, but it's way more than the tenth or eighth 345 00:27:13,083 --> 00:27:16,880 that a lot of the other processors were using. So this was really good. This 346 00:27:16,880 --> 00:27:21,170 is the prime factor in why this machine was so fast: it's effectively the load/store 347 00:27:21,170 --> 00:27:28,069 multiple instructions and being able to access the stuff linearly. So the MMU is 348 00:27:28,069 --> 00:27:36,980 weird. It's not a TLB in the traditional sense. With TLBs today, if you take your 349 00:27:36,980 --> 00:27:43,040 MIPS chip or something where the TLB is visible to software, it will map a virtual 350 00:27:43,040 --> 00:27:47,760 address into a chosen physical address, and you'll have some number of entries, and you 351 00:27:47,760 --> 00:27:53,880 more or less arbitrarily, you know, poke an entry in with the mapping set in it. 352 00:27:53,880 --> 00:27:57,789 The MEMC does it upside down. It's got a fixed number of entries, one for every 353 00:27:57,789 --> 00:28:02,380 page in DRAM. And then for each of those entries, it checks an incoming address to 354 00:28:02,380 --> 00:28:08,600 see whether it matches. So it has all of those entries that we saw on the 355 00:28:08,600 --> 00:28:13,500 chip diagram a couple of slides ago: the big left-hand side had that big array.
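The store-multiple figure of "about 25 megabytes a second out of 30" can be roughly reproduced with a little arithmetic. This sketch assumes an STM costs two N cycles plus one S cycle per remaining register (the timing later ARM datasheets give for STM), ignores the instruction fetch, and reads the talk's megabytes as MiB; the names are hypothetical.

```python
TICK_NS = 125   # one 8 MHz memory clock
N_TICKS = 2     # non-sequential access: 4 MHz, i.e. two ticks
S_TICKS = 1     # sequential access: one tick


def stm_bandwidth_mib(n_regs: int) -> float:
    """MiB/s written by one STM of n_regs registers (assumed 2N + (n-1)S)."""
    ticks = 2 * N_TICKS + (n_regs - 1) * S_TICKS
    seconds = ticks * TICK_NS * 1e-9
    return 4 * n_regs / seconds / 2**20


PEAK_MIB = 4 / (TICK_NS * 1e-9) / 2**20  # one 32-bit word every tick

print(f"STM of 14 registers: {stm_bandwidth_mib(14):.1f} of {PEAK_MIB:.1f} MiB/s")
```

Under these assumptions it prints 25.1 of 30.5 MiB/s, consistent with the slide, while a single-register store manages only about a quarter of the peak.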
All 356 00:28:13,500 --> 00:28:16,831 of those are effectively just storing a virtual address and then matching it, and each 357 00:28:16,831 --> 00:28:20,030 has a comparator. And then one of them lights up and says yes, it's mine. So 358 00:28:20,030 --> 00:28:24,551 effectively, a physical page says that virtual address is mine, instead of the 359 00:28:24,551 --> 00:28:30,030 other way round. So this also limits your memory, if you're saying I have to have 360 00:28:30,030 --> 00:28:34,480 one of these entries on chip per page of physical memory and you don't want pages 361 00:28:34,480 --> 00:28:40,720 to be enormous. The 32 K: if you do the maths, 4 MB over 128 pages is a 362 00:28:40,720 --> 00:28:44,460 32K page. If you don't want the page to get much bigger than that, and trust me you 363 00:28:44,460 --> 00:28:47,890 don't, then you need to add more of these entries, and they're already half the size of 364 00:28:47,890 --> 00:28:52,540 the chip. So effectively, this is one of the reasons why you can only have 4 MB 365 00:28:52,540 --> 00:28:58,360 on one of these memory controller chips. OK. So VIDC is the core 366 00:28:58,360 --> 00:29:05,230 of the video and sound system. It's a set of FIFOs and a set of shift registers and digital-to-analog 367 00:29:05,230 --> 00:29:09,970 converters for doing video and sound. You stream stuff into the FIFOs and it does 368 00:29:09,970 --> 00:29:14,850 the display timing and palette lookup and so forth. It has an 8-bit mode, as I 369 00:29:14,850 --> 00:29:21,210 mentioned. It's slightly strange. It also has an output for a transparency bit. So in 370 00:29:21,210 --> 00:29:23,830 your palette you can set 12 bits of color, but you can set a bit of 371 00:29:23,830 --> 00:29:31,580 transparency as well, so you can do video genlocking quite easily with this.
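The "upside down" MEMC translation described before the VIDC introduction can be sketched as an inverted page table: one entry per physical page, each holding the logical page it claims. This Python model uses the 32 KB pages and 128 entries from the talk; in hardware all entries are compared in parallel, while the loop here scans them. The class and method names are hypothetical.

```python
PAGE_SIZE = 32 * 1024
PHYS_PAGES = (4 * 1024 * 1024) // PAGE_SIZE  # 128 entries cover 4 MB


class InvertedPageTable:
    """One entry per *physical* page, each holding the logical page it answers to."""

    def __init__(self) -> None:
        self.owner = [None] * PHYS_PAGES  # logical page number, or None if unmapped

    def map(self, logical_page: int, physical_page: int) -> None:
        self.owner[physical_page] = logical_page

    def translate(self, logical_addr: int) -> int:
        lpage, offset = divmod(logical_addr, PAGE_SIZE)
        # Hardware compares all 128 entries at once; software has to scan.
        for ppage, owner in enumerate(self.owner):
            if owner == lpage:
                return ppage * PAGE_SIZE + offset
        raise LookupError("page fault: no physical page claims this address")
```

The size of the table is fixed by physical memory, not by the virtual address space, which is exactly why more RAM would have meant more on-chip entries.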
So 372 00:29:31,580 --> 00:29:36,701 there was a revision later on. Tudor explains that the very first one had a bit 373 00:29:36,701 --> 00:29:41,230 of crosstalk between the video and the sound, so you'd get sound with noise on 374 00:29:41,230 --> 00:29:45,480 it. That was basically video noise, and it's quite hard to get rid of. And so they 375 00:29:45,480 --> 00:29:50,000 did this revision, and the way he fixed it was quite cool. They shuffled the power 376 00:29:50,000 --> 00:29:53,690 supply around and did all the sensible engineering things. But he also filtered 377 00:29:53,690 --> 00:29:58,050 out a bit of the noise that was being output on the sound. He 378 00:29:58,050 --> 00:30:02,630 inverted it and then fed that back in as the reference current for the DACs. So that 379 00:30:02,630 --> 00:30:06,090 was sort of self-compensating and took out the noise, a bit like noise-cancelling 380 00:30:06,090 --> 00:30:13,239 headphones. It was kind of a nice hack. And that was VIDC1. OK, the final 381 00:30:13,239 --> 00:30:17,700 one. I'm going to stop showing you chip plots after this, unfortunately, so just 382 00:30:17,700 --> 00:30:20,980 get your fill while we're here. And again, I'm really glad this is enormous for the 383 00:30:20,980 --> 00:30:25,590 people in the room and maybe those zooming in online. There's a cool little 384 00:30:25,590 --> 00:30:29,510 Illuminati eye logo in the bottom left corner. I feared that you weren't gonna 385 00:30:29,510 --> 00:30:34,010 be able to see it, and I didn't have time to do a zoomed-in version, but okay. So IOC 386 00:30:34,010 --> 00:30:37,720 is the center of the IO system: as much of the IO system as possible, all the random 387 00:30:37,720 --> 00:30:41,030 bits of glue logic for things like timing, since some peripherals are slower than 388 00:30:41,030 --> 00:30:47,309 others, lives in IOC.
It contains a UART for the keyboard, so the keyboard is 389 00:30:47,309 --> 00:30:52,020 looked after by an 8051 microcontroller. Just nice and easy: you don't have to do scanning 390 00:30:52,020 --> 00:30:57,429 in software. This microcontroller just sends stuff up a serial port to this chip. So, 391 00:30:57,429 --> 00:31:02,039 UART: universal asynchronous receiver and transmitter. It was at one point called 392 00:31:02,039 --> 00:31:06,080 the fast asynchronous receiver and transmitter. Mike got forced to change the 393 00:31:06,080 --> 00:31:11,900 name. Not everyone has a 12-year-old's sense of humor, but I admire his spirit. So the 394 00:31:11,900 --> 00:31:15,630 other thing it does is interrupts: all the interrupts go into IOC, and it's got masks 395 00:31:15,630 --> 00:31:20,341 and effectively consolidates them for sending an interrupt up to the ARM. 396 00:31:20,341 --> 00:31:24,690 The ARM can then check the status and respond to it quickly. So the Eye of Providence 397 00:31:24,690 --> 00:31:27,540 there, the little logo I pointed out: Mike said he put that in for future 398 00:31:27,540 --> 00:31:35,799 archaeologists to wonder about. Okay. That was it. I was hoping there'd be 399 00:31:35,799 --> 00:31:39,440 this big back story about, you know, him being in the Illuminati or something. Maybe 400 00:31:39,440 --> 00:31:44,690 he is, but he's not allowed to say anyway. So, just like the other dev board I showed you, 401 00:31:44,690 --> 00:31:49,930 this one's the A500 2P: it's still a second processor that plugs into a BBC Micro. 402 00:31:49,930 --> 00:31:54,460 It's still got this host with disk drives and so forth attached to it, 403 00:31:54,460 --> 00:32:00,289 pushing stuff down the Tube into the memory here. But now, finally, 404 00:32:00,289 --> 00:32:04,730 all of this, the whole chipset, is assembled in one place. So this is 405 00:32:04,730 --> 00:32:08,100 starting to look like an Archimedes. It's got video out.
It's got a keyboard 406 00:32:08,100 --> 00:32:11,620 interface. It's got some expansion stuff. So this is for bring-up and an early software 407 00:32:11,620 --> 00:32:17,720 head start. But very shortly afterwards, we got the A500, internal to Acorn. And 408 00:32:17,720 --> 00:32:21,460 this is really the first Archimedes. This is the prototype Archimedes. It's actually got 409 00:32:21,460 --> 00:32:27,300 a gorgeous gray brick sort of look to it, kind of concrete. It weighs like concrete, 410 00:32:27,300 --> 00:32:31,480 too, but it has all the hallmarks. It's got the IO interfaces, it's got the 411 00:32:31,480 --> 00:32:36,810 expansion slots; you can see them at the back. It's got it all: it runs the same operating 412 00:32:36,810 --> 00:32:39,550 system. Now, this was used for the OS development. Only a couple of 413 00:32:39,550 --> 00:32:44,540 hundred of these were made. This is serial number 222, so this is one of the last, 414 00:32:44,540 --> 00:32:50,730 I think. But yeah, only internal to Acorn. There are lots of nice tweaks to this 415 00:32:50,730 --> 00:32:55,700 machine. So the hardware team had designed this; Tudor designed this as well as the 416 00:32:55,700 --> 00:33:01,390 video system. And he said, well, his A500 was the special one: it had a special video 417 00:33:01,390 --> 00:33:05,409 controller. He'd hand-picked one of the VIDCs so that instead of running 418 00:33:05,409 --> 00:33:10,855 at 24 MHz, it would run at 56; there's some silicon variation in manufacture. So he found a 419 00:33:10,855 --> 00:33:16,169 56 MHz part so he could do, I think it was, 1024 x 768, which is way out 420 00:33:16,169 --> 00:33:22,400 of reach for the rest of the Archimedes range. So he had the really, really cool machine. 421 00:33:22,400 --> 00:33:26,050 They also ran some of them at 12 MHz instead of 8. That's a massive 422 00:33:26,050 --> 00:33:30,500 performance improvement.
I think it used expensive memory, which was kind of out of 423 00:33:30,500 --> 00:33:37,180 reach for the product. Right. So believe me, this is the simplified 424 00:33:37,180 --> 00:33:41,240 circuit diagram. The technical reference manuals are available online if anyone wants 425 00:33:41,240 --> 00:33:47,969 the complicated one. The main parts on display are ARM, MEMC, VIDC and some RAM, 426 00:33:47,969 --> 00:33:52,049 and we'll have a little walk through them. So the clocks are actually generated by the 427 00:33:52,049 --> 00:33:56,815 memory controller. The memory controller gives the clocks to the ARM. The main reason for 428 00:33:56,815 --> 00:34:00,327 this is that the memory controller has to do some slow things now and then. It has 429 00:34:00,327 --> 00:34:05,860 to open pages of DRAM, do refresh cycles and things. So it generates 430 00:34:05,860 --> 00:34:11,559 the clock, and it pauses the CPU by stopping that clock from time to time. 431 00:34:11,559 --> 00:34:15,929 When you do a DRAM access, with your address bus along the top, the ARM outputs an 432 00:34:15,929 --> 00:34:19,720 address that goes into the MEMC. The MEMC then does an address 433 00:34:19,720 --> 00:34:23,339 translation and converts that into row and column addresses suitable for 434 00:34:23,339 --> 00:34:27,139 DRAM. And then, if you're doing a read, the DRAM outputs the data 435 00:34:27,139 --> 00:34:33,419 onto the data bus, which the ARM then sees. MEMC is on the critical path for 436 00:34:33,419 --> 00:34:37,109 this: the address effectively flows through MEMC. Notice that MEMC is not on 437 00:34:37,109 --> 00:34:41,329 the data bus. It just gets addresses flowing through it; this is important later 438 00:34:41,329 --> 00:34:44,892 on. ROM is another slow thing.
439 00:34:44,892 --> 00:34:49,204 It's another reason why MEMC might slow down the access from the CPU, and it works in a 440 00:34:49,204 --> 00:34:54,099 similar sort of way. There is also a permission check done when you're doing 441 00:34:54,099 --> 00:35:00,259 the address translation: user permission versus OS, or supervisor. 442 00:35:00,259 --> 00:35:05,356 And so this information is output as part of the cycle when the ARM does that access. 443 00:35:05,356 --> 00:35:09,730 If you miss in that translation, you get a page fault, or a permission fault; an 444 00:35:09,730 --> 00:35:13,391 abort signal comes back and you take an exception, 445 00:35:13,391 --> 00:35:17,410 and the ARM deals with that in software. 446 00:35:17,410 --> 00:35:22,289 The data bus is a critical path, and so the IO stuff is buffered; it's kept away 447 00:35:22,289 --> 00:35:27,599 from that. The IO bus is 16 bits; not a lot of 32-bit peripherals were around 448 00:35:27,599 --> 00:35:32,599 in those days. All the peripherals were 8 or 16 bits, so that's the right thing to do. 449 00:35:32,599 --> 00:35:36,150 The IOC decodes that, and there's a handshake with MEMC if it needs more 450 00:35:36,150 --> 00:35:39,809 time: if it's accessing one of the expansion cards and the expansion card 451 00:35:39,809 --> 00:35:47,691 has something slow on it, then that's dealt with in the IOC. So I mentioned the 452 00:35:47,691 --> 00:35:53,680 interrupt status that gets funneled into IOC and then back out again. There's a 453 00:35:53,680 --> 00:35:57,599 VSync interrupt, but not an HSync interrupt. You have to use timers for that, 454 00:35:57,599 --> 00:36:01,500 really annoyingly. There's a 2 MHz timer available. I 455 00:36:01,500 --> 00:36:05,199 think I had that in a previous slide and forgot to mention it.
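A minimal sketch of the status/mask consolidation IOC performs: many sources latch requests, a mask selects which ones matter, and any enabled pending request drives a single line to the ARM, whose handler then reads the masked status to see which source fired. The bit assignments and names below are hypothetical, not IOC's real register layout.

```python
# Hypothetical bit assignments, for illustration only (not IOC's real layout).
VSYNC = 1 << 0
TIMER0 = 1 << 1
KEYBOARD_RX = 1 << 2


class InterruptController:
    """Many interrupt sources in, one consolidated IRQ line out."""

    def __init__(self) -> None:
        self.status = 0  # raw requests latched from the sources
        self.mask = 0    # 1 = source enabled

    def raise_source(self, bit: int) -> None:
        self.status |= bit

    def clear(self, bit: int) -> None:
        self.status &= ~bit

    def request(self) -> int:
        """Masked status: enabled sources that are pending."""
        return self.status & self.mask

    def irq_line(self) -> bool:
        return self.request() != 0
```

A pending source with its mask bit clear is latched but ignored, so software can, for instance, enable only the VSync and timer sources while it is doing palette tricks.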
So if you want to 456 00:36:05,199 --> 00:36:09,730 do funny palette switching stuff or copper bars or something, that's possible with the 457 00:36:09,730 --> 00:36:13,400 timers. It's also a simple hardware mod to make a real HSync interrupt as well: 458 00:36:13,400 --> 00:36:18,529 there are some spare interrupt inputs on the IOC, as an exercise for you. So the bit I 459 00:36:18,529 --> 00:36:23,440 really like about this system: I mentioned that MEMC is not on the data bus. The VIDC 460 00:36:23,440 --> 00:36:28,079 is only on the data bus; it doesn't have an address bus either. The VIDC is the 461 00:36:28,079 --> 00:36:31,200 thing responsible for turning the frame buffer into video, reading that frame 462 00:36:31,200 --> 00:36:35,509 buffer out of RAM and so on. So how does it actually do that RAM read without the 463 00:36:35,509 --> 00:36:40,780 address? Well, the MEMC contains all of the registers for doing this DMA: the 464 00:36:40,780 --> 00:36:44,970 start of the frame buffer, the current position and size, and so on. They all 465 00:36:44,970 --> 00:36:51,410 live in the MEMC. So there's a handshake where VIDC sends a request up to the MEMC 466 00:36:51,410 --> 00:36:55,239 when its FIFO gets low. The MEMC then actually generates the address into the 467 00:36:55,239 --> 00:37:01,102 DRAM, the DRAM outputs that data, and then the MEMC gives an acknowledge 468 00:37:01,102 --> 00:37:05,509 to the ARM... Excuse me, too many chips. The MEMC gives an acknowledge to 469 00:37:05,509 --> 00:37:11,210 VIDC, which then latches that data into the FIFO. So this partitioning is 470 00:37:11,210 --> 00:37:16,710 quite neat. The video DMA stuff all lives in MEMC, and 471 00:37:16,710 --> 00:37:20,799 there's this kind of split across the two chips. For sound, I've just 472 00:37:20,799 --> 00:37:24,839 highlighted the one interrupt that comes from MEMC.
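The address-less VIDC handshake can be sketched in Python: MEMC owns the frame-buffer pointers, while VIDC only asks for data when its FIFO runs low and latches whatever arrives on the data bus. The FIFO depth, the low-water mark, refilling to full in one go, and all class and function names are assumptions for illustration.

```python
from collections import deque


class Memc:
    """Owns the video DMA pointers; VIDC never presents an address."""

    def __init__(self, ram: list[int], start: int, end: int) -> None:
        self.ram, self.start, self.end = ram, start, end
        self.current = start

    def service_video_request(self) -> int:
        """Read the next frame-buffer word onto the data bus and acknowledge."""
        word = self.ram[self.current]
        self.current += 1
        if self.current == self.end:  # wrap back to the start of the frame
            self.current = self.start
        return word


class Vidc:
    def __init__(self, low_water: int = 2, depth: int = 4) -> None:
        self.fifo = deque()
        self.low_water, self.depth = low_water, depth

    def wants_data(self) -> bool:
        return len(self.fifo) <= self.low_water

    def latch(self, word: int) -> None:
        self.fifo.append(word)


def refill(memc: Memc, vidc: Vidc) -> None:
    """One request/acknowledge exchange, if VIDC's FIFO has run low."""
    if vidc.wants_data():
        while len(vidc.fifo) < vidc.depth:
            vidc.latch(memc.service_video_request())
```

Note that `Vidc` holds no address state at all; everything about where the frame buffer lives is inside `Memc`, mirroring the chip partitioning described above.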
Sound works exactly the same way, 473 00:37:24,839 --> 00:37:27,730 except there's a double buffering scheme that goes on. When one half of it 474 00:37:27,730 --> 00:37:32,359 becomes empty, you get an interrupt so you can refill it, so you don't glitch your 475 00:37:32,359 --> 00:37:39,700 sound. So this all works very smoothly. So finally, the high-res mono 476 00:37:39,700 --> 00:37:44,509 mode that I mentioned before: the way they did that is quite novel. Tudor had realized 477 00:37:44,509 --> 00:37:49,931 that with one external component, a shift register running very fast, he 478 00:37:49,931 --> 00:37:53,400 could implement this very high resolution mode without really affecting the rest of 479 00:37:53,400 --> 00:37:59,276 the chip. So VIDC still runs at 24 MHz, at sort of VGA resolution. It 480 00:37:59,276 --> 00:38:05,290 outputs on a digital bus that was originally a test port. It outputs 4 bits, so 4 481 00:38:05,290 --> 00:38:09,420 pixels in one chunk, at 24 MHz, and this external component then shifts 482 00:38:09,420 --> 00:38:13,880 through them at 4 times the speed. There's one component. I mean, this is a 483 00:38:13,880 --> 00:38:17,569 very cheap way of doing this. And as I said, this high-res mode is very 484 00:38:17,569 --> 00:38:23,009 unusual for machines of this era. I've got a feeling an A500, the top-end 485 00:38:23,009 --> 00:38:26,979 machine... if anyone's got one of these and wants to try this trick, please get in 486 00:38:26,979 --> 00:38:31,080 touch. I've got a feeling an A500 will do 1280 x 1024 by 487 00:38:31,080 --> 00:38:35,750 overclocking this. I think all of the parts survive it. But for some reason, 488 00:38:35,750 --> 00:38:40,369 Acorn didn't support that on the board. And finally, there's clock selection for VIDC on 489 00:38:40,369 --> 00:38:44,839 some of the machines: quite a flexible set of clocks for different resolutions, 490 00:38:44,839 --> 00:38:51,170 basically. So MEMC is not on the data bus.
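The hi-res trick is essentially a serializer: VIDC hands out 4 pixels per 24 MHz clock, and an external shift register clocks them out one at a time at four times that rate, 96 MHz. A small Python sketch, assuming least-significant-bit-first ordering, which is a guess rather than documented behavior; the names are hypothetical.

```python
VIDC_CLOCK_HZ = 24_000_000
PIXELS_PER_CHUNK = 4
PIXEL_CLOCK_HZ = VIDC_CLOCK_HZ * PIXELS_PER_CHUNK  # 96 MHz out of the shifter


def shift_out(chunks):
    """Serialize 4-bit chunks into 1-bit pixels (LSB first is an assumption)."""
    for chunk in chunks:
        for bit in range(PIXELS_PER_CHUNK):
            yield (chunk >> bit) & 1
```

The point of the design is that VIDC itself never runs faster than 24 MHz; only the one cheap external part sees the 96 MHz pixel rate.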
How do we program it? It's got registers 491 00:38:51,170 --> 00:38:55,259 for DMA, and it's got all this address translation. So the memory map I showed 492 00:38:55,259 --> 00:39:00,909 before has an 8 MB space reserved for the address translation registers. It 493 00:39:00,909 --> 00:39:04,690 doesn't have 8 MB of registers. I mean, it doesn't have two million 32-bit registers 494 00:39:04,690 --> 00:39:09,819 behind there, which is a hint of what's going on here. So what you do is write 495 00:39:09,819 --> 00:39:14,410 any value to this space, and you encode the information that you want to put into one 496 00:39:14,410 --> 00:39:19,539 of these registers in the address. In this address, the top three bits are 1, so it's 497 00:39:19,539 --> 00:39:25,230 in the top 8 MB of the 64 MB address space, and you format your 498 00:39:25,230 --> 00:39:28,999 logical-to-physical page information in this address and then write any byte, 499 00:39:28,999 --> 00:39:35,479 effectively. This sort of feels really dirty, but it's also a very nice 500 00:39:35,479 --> 00:39:39,779 way of doing it, because there's no other space in the address map. And this comes back 501 00:39:39,779 --> 00:39:45,069 to the price balance. It's not worth having a data bus going into 502 00:39:45,069 --> 00:39:49,809 MEMC, costing 32 more pins, just to write these registers, as opposed to playing this 503 00:39:49,809 --> 00:39:55,849 sort of trick. If you have that data bus just for 504 00:39:55,849 --> 00:39:59,990 that, then you have to go to a more expensive package. And this was 505 00:39:59,990 --> 00:40:05,140 really in their minds: a 68-pin chip versus an 84-pin chip. It was a big deal. 506 00:40:05,140 --> 00:40:08,719 So for everything, they really strived to make sure it was in the smallest 507 00:40:08,719 --> 00:40:13,250 package possible. And this system partitioning effort led to these sorts of 508 00:40:13,250 --> 00:40:22,890 tricks to then program it.
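A sketch of the "encode the register write in the address" trick. The base address follows from the talk (the top three bits of a 26-bit address set, so 0x3800000 to 0x3FFFFFF), but the field positions below are hypothetical and chosen only for illustration; Acorn's documentation has the real MEMC layout, which depends on the page size.

```python
MEMC_CAM_BASE = 0x3800000     # top 8 MB of the 26-bit space: bits 25..23 all set
CAM_WINDOW = 8 * 1024 * 1024

# Hypothetical field layout, for illustration only (not the real MEMC format).
PPN_BITS = 7      # physical page number: 128 pages
PROT_SHIFT = 8    # 2-bit protection level
LPN_SHIFT = 10    # logical page number in the remaining high bits


def encode_cam_write(logical_page: int, physical_page: int, protection: int) -> int:
    """Build the address to store to; the data value written is ignored."""
    if not 0 <= physical_page < (1 << PPN_BITS):
        raise ValueError("physical page out of range")
    if not 0 <= protection < 4:
        raise ValueError("protection level out of range")
    if not 0 <= logical_page < (CAM_WINDOW >> LPN_SHIFT):
        raise ValueError("logical page out of range")
    return (MEMC_CAM_BASE | (logical_page << LPN_SHIFT)
            | (protection << PROT_SHIFT) | physical_page)


def decode_cam_write(addr: int) -> tuple[int, int, int]:
    """What the memory controller pulls off the address bus alone."""
    offset = addr - MEMC_CAM_BASE
    if not 0 <= offset < CAM_WINDOW:
        raise ValueError("not a MEMC translation-register address")
    return (offset >> LPN_SHIFT,
            (offset >> PROT_SHIFT) & 0b11,
            offset & ((1 << PPN_BITS) - 1))
```

Everything MEMC needs arrives on pins it already has, the address bus, so the store's data value can be anything, which is what lets MEMC stay off the data bus.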
So on the A540, we get multiple MEMCs. Each one is 509 00:40:22,890 --> 00:40:27,329 assigned a colored stripe here of the physical address space. So you have a 510 00:40:27,329 --> 00:40:31,049 16 MB space, and each one looks after 4 MB of it. But then when you do a 511 00:40:31,049 --> 00:40:36,039 virtual access in the bottom half of the user space, a regular program access, all of 512 00:40:36,039 --> 00:40:39,362 them light up and all of them try to translate that address in parallel. And 513 00:40:39,362 --> 00:40:43,663 one of them, hopefully, will translate it and then energize the RAM to do the read, for 514 00:40:43,663 --> 00:40:49,930 example. When you put an ARM 3 in this system, the ARM 3 has its cache, and then 515 00:40:49,930 --> 00:40:54,420 the address goes into the MEMC. So that means that the address is being 516 00:40:54,420 --> 00:40:58,240 translated outside of the cache, after the cache. So you're caching virtual 517 00:40:58,240 --> 00:41:02,900 addresses, and as we all know, this is kind of bad for performance, because whenever 518 00:41:02,900 --> 00:41:07,459 you change that virtual address space, you have to invalidate your cache. Or tag it, 519 00:41:07,459 --> 00:41:11,459 but they didn't do that. There are other ways of solving this problem. Basically, on this 520 00:41:11,459 --> 00:41:14,950 machine, what you need to do is invalidate the whole cache. It's quite a quick 521 00:41:14,950 --> 00:41:23,540 operation, but it's still not good for performance to have an empty cache. The 522 00:41:23,540 --> 00:41:28,393 only DMA present in the system is for video and sound. I/O 523 00:41:28,393 --> 00:41:32,569 doesn't have any DMA at all. And this is another area where, as a younger engineer, I thought, 524 00:41:32,569 --> 00:41:35,969 "crap, why didn't they have DMA? That would be way better." DMA is the solution 525 00:41:35,969 --> 00:41:40,989 to everyone's problems, as we all know.
And I think the quote on the right 526 00:41:40,989 --> 00:41:47,390 ties in with the Acorn team's discovery that all of these other processors needed 527 00:41:47,390 --> 00:41:51,969 quite complex chipsets, quite expensive support chips. So the quote on the right 528 00:41:51,969 --> 00:41:56,539 says that some chip vendors will be charging more for their 529 00:41:56,539 --> 00:42:03,259 DMA devices than even for the CPU. So not having a dedicated DMA engine on board is a 530 00:42:03,259 --> 00:42:08,930 massive cost saving. Recall the comment I made on the previous 2 slides about system 531 00:42:08,930 --> 00:42:14,440 partitioning: putting a lot of attention into how many pins were on one chip versus 532 00:42:14,440 --> 00:42:19,380 another, how many buses were going around the place. Not having IOC access 533 00:42:19,380 --> 00:42:25,019 memory was a massive saving in cost, for the number of pins and for the system as a 534 00:42:25,019 --> 00:42:33,539 whole. The other thing is that FIQ mode was effectively the means for doing IO. 535 00:42:33,539 --> 00:42:37,999 Therefore, FIQ mode was designed to be an incredibly low overhead way of doing 536 00:42:37,999 --> 00:42:44,010 programmed IO, having the CPU do the IO. So this was saying that the CPU is 537 00:42:44,010 --> 00:42:48,850 going to be doing all of the IO stuff, but let's just optimize it, let's make 538 00:42:48,850 --> 00:42:53,930 it as good as it could be, and that's what led to the programmed IO. Also 539 00:42:53,930 --> 00:42:57,849 remember, ARM 2 didn't have a cache. If you don't have a cache on your CPU, then 540 00:42:57,849 --> 00:43:03,099 DMA is going to hold up the CPU anyway, so no cycles are saved; DMA is not any 541 00:43:03,099 --> 00:43:06,960 performance gain. You may as well get the CPU to do it, and then get the CPU to 542 00:43:06,960 --> 00:43:13,029 do it in the lowest overhead way possible.
I think this can be summarized as bringing 543 00:43:13,029 --> 00:43:17,410 the "RISC principles" to the system. The RISC principle says that for your CPU, you 544 00:43:17,410 --> 00:43:21,420 don't put anything in the CPU that you can do in software, and this is saying, okay, 545 00:43:21,420 --> 00:43:26,789 well, actually, without a cache, software can do the IO just as well as a DMA 546 00:43:26,789 --> 00:43:29,799 system, so let's get software to do that. And I think this is a kind of nice way 547 00:43:29,799 --> 00:43:34,339 of seeing it. This is part of the cost optimization, for really very little 548 00:43:34,339 --> 00:43:39,910 degradation in performance compared to doing it in hardware. So this is an IO card. 549 00:43:39,910 --> 00:43:43,380 They're Eurocards, nice and easy. The only thing I wanted to say here is that this 550 00:43:43,380 --> 00:43:48,839 is my SCSI card, and it has a ROM on the left hand side. This is the 551 00:43:48,839 --> 00:43:53,731 expansion ROM, basically, many, many years before PCI made this popular. Your drivers 552 00:43:53,731 --> 00:43:58,950 are on this ROM. This is a SCSI disc plugging into this, and you can plug this 553 00:43:58,950 --> 00:44:02,990 card in and then boot off the disk. You don't need any other software to make it 554 00:44:02,990 --> 00:44:07,670 work. So this is just a very nice user experience. There's no messing around 555 00:44:07,670 --> 00:44:11,690 with configuring IO windows or interrupts or any of the ISA sort of stuff that was 556 00:44:11,690 --> 00:44:17,869 going on at the time. So to summarize some of the hardware stuff that we've seen: 557 00:44:17,869 --> 00:44:21,950 the ARM is pipelined and it has the load/store multiple instructions, which make 558 00:44:21,950 --> 00:44:27,950 for very high bandwidth utilization. That's what gives it its high performance. 559 00:44:27,950 --> 00:44:32,670 The machine was really simple.
So: attention to detail about separating, 560 00:44:32,670 --> 00:44:37,239 partitioning the work between the chips, and reducing the chip cost as much as 561 00:44:37,239 --> 00:44:44,569 possible. Keeping that balanced was a really good idea. The machine was designed when 562 00:44:44,569 --> 00:44:49,400 memory and CPUs were about the same speed, so this is before that kind of flipped 563 00:44:49,400 --> 00:44:52,910 over. An 8 MHz ARM 2 was designed to use 8 MHz memory. 564 00:44:52,910 --> 00:44:56,509 There's no need to have a cache at all on there. These days it sounds really crazy 565 00:44:56,509 --> 00:45:01,410 not to have a cache on the CPU, but if your memory is not that much slower, then this 566 00:45:01,410 --> 00:45:07,809 is a huge cost saving, and it's also a risk saving. This was the first real, proper CPU, 567 00:45:07,809 --> 00:45:11,670 if we don't count ARM 1. Say ARM 1 was a test; ARM 2 is, you know, the 568 00:45:11,670 --> 00:45:16,490 first product CPU. And having a cache on that would have been a huge risk for a 569 00:45:16,490 --> 00:45:20,640 design team that hadn't dealt with structures that complicated at that 570 00:45:20,640 --> 00:45:22,599 point. So that was the right thing to do, I think. 571 00:45:22,599 --> 00:45:25,569 And I talked about DMA. I'm actually 572 00:45:25,569 --> 00:45:28,636 a convert on this. I thought this was crap, and actually, I think this was a really 573 00:45:28,636 --> 00:45:33,319 good example of balanced design. What's the right tool for the job? Software is 574 00:45:33,319 --> 00:45:37,757 going to do the IO, so let's make sure, with FIQ mode, that there's 575 00:45:37,757 --> 00:45:44,640 as low an overhead as possible. We talked about system partitioning. The MMU: 576 00:45:44,640 --> 00:45:49,299 I still think it's weird and backward.
I think there is a 577 00:46:49,299 --> 00:46:56,029 strong argument, though, that a more familiar TLB is massively complicated 578 00:46:56,029 --> 00:46:59,339 compared to what they did here. And I think the main driver here was not just 579 00:46:59,339 --> 00:47:06,120 area on the chip, but also making it much simpler to implement. And it worked. I 580 00:47:06,120 --> 00:47:09,450 think they really didn't have that many shots at doing this. This wasn't 581 00:47:09,450 --> 00:47:14,779 a company or a team that could afford to have many goes at this product. And I 582 00:47:14,779 --> 00:47:20,660 think that says it all. I think they did a great job. Okay. So the OS story is a 583 00:47:20,660 --> 00:47:24,599 little bit more complicated. Remember, it was going to be this office automation 584 00:47:24,599 --> 00:47:28,920 machine, a bit like a Xerox Star. It was going to have this wonderful high-res mono mode, 585 00:47:28,920 --> 00:47:33,729 and people were going to be laser printing from it. So, just like Xerox PARC, Acorn started a 586 00:47:33,729 --> 00:47:37,911 Palo Alto-based research center. Californians and beanbags, writing an 587 00:47:37,911 --> 00:47:43,319 operating system using a microkernel in Modula-2: all of the trendy boxes ticked 588 00:47:43,319 --> 00:47:49,400 for the mid-80s. By the sound of it, it was a very advanced operating system, and it 589 00:47:49,400 --> 00:47:54,029 did virtual memory and so on. It was very resource-hungry, though, and it was never 590 00:47:54,029 --> 00:48:00,130 really very performant. Ultimately, the hardware got done quicker than the 591 00:48:00,130 --> 00:48:05,403 software. After a year or two, management got the jitters. The hardware was 592 00:48:05,403 --> 00:48:09,320 looming, and they said, well, next year we're going to have the computer ready; where's 593 00:48:09,320 --> 00:48:13,170 the operating system? And the project got canned. And this is a real shame.
I'd love 594 00:48:13,170 --> 00:48:16,599 to know more about this operating system. Virtually nothing is documented outside of 595 00:48:16,599 --> 00:48:21,569 Acorn. Even the people I spoke to didn't work on this. It was a bunch of people in 596 00:48:21,569 --> 00:48:25,250 California who kind of disappeared with it. So if anyone has this software 597 00:48:25,250 --> 00:48:29,259 archived anywhere, then get in touch. The computer museum around the corner from me 598 00:48:29,259 --> 00:48:35,369 is raring to go on that. That would be a really cool thing to archive. So anyway, they 599 00:48:35,369 --> 00:48:39,979 now had a desperate situation. They had to go to Plan B, which was to write, in under a year, 600 00:48:39,979 --> 00:48:43,239 an operating system for a machine that was on its way to being delivered. 601 00:48:43,239 --> 00:48:48,260 And it kind of shows. Arthur was... I mean, I think the team did a really good job of 602 00:48:48,260 --> 00:48:53,160 getting something out of the door in half a year, but it was a little bit flaky. 603 00:48:53,160 --> 00:48:57,160 RISC OS came a year later, developed from Arthur. I don't know if anyone's 604 00:48:57,160 --> 00:49:01,609 heard of RISC OS, but Arthur is very, very niche and basically got 605 00:49:01,609 --> 00:49:07,170 completely replaced by RISC OS, because it was a bit less usable than RISC OS. 606 00:49:07,170 --> 00:49:12,059 Another really strong point was that it had quite a big ROM: 2 MB going 607 00:49:12,059 --> 00:49:17,400 up... sorry, 0.5 MB in the 80s, going up to 2 MB in the early 90s. 608 00:49:17,400 --> 00:49:21,739 There's a lot of stuff in ROM. One of those things is BBC Basic 5. I know 609 00:49:21,739 --> 00:49:29,289 it's 2019, and I know Basic is basic, but BBC Basic is actually quite good. It has 610 00:49:29,289 --> 00:49:32,859 procedures, and it's got support for all the graphics and sound.
You could write GUI 611 00:49:32,859 --> 00:49:36,660 applications in Basic, and a lot of people did. It's also very fast: Sophie Wilson 612 00:49:36,660 --> 00:49:42,920 wrote this very, very optimized Basic interpreter. I talked about the modules 613 00:49:42,920 --> 00:49:45,589 and podules, the expansion ROM things, and a really great user 614 00:49:45,589 --> 00:49:50,589 experience there. But speaking of user experience, this was Arthur. I never used 615 00:49:50,589 --> 00:49:57,969 Arthur; I just dug out a ROM and had a play with it. It's bloody horrible. So that 616 00:49:57,969 --> 00:50:03,819 went away quickly. Also at the time, part of this emergency Plan B was to take 617 00:50:03,819 --> 00:50:08,210 the Acornsoft team, who were supposed to be writing applications for this, and get 618 00:50:08,210 --> 00:50:12,079 them to quickly knock out an operating system. So at launch, basically, this was 619 00:50:12,079 --> 00:50:15,750 one of the only things that you could do with the machine. It had a great demo called 620 00:50:15,750 --> 00:50:20,569 Lander, of a great game called Zarch, which is 3D space you could fly around. 621 00:50:20,569 --> 00:50:27,029 It didn't have serious business applications. And, you know, 622 00:50:27,029 --> 00:50:31,079 there was not much you could do with this really expensive machine at launch, and 623 00:50:31,079 --> 00:50:35,450 that really hurt it, I think. Then we get RISC OS 2 in 1988, and this is now 624 00:50:35,450 --> 00:50:42,219 looking less like a vomit sort of thing; a much nicer machine. And then eventually 625 00:50:42,219 --> 00:50:46,749 RISC OS 3. It has drag and drop between applications, it's all multitasking, 626 00:50:46,749 --> 00:50:52,849 it does outline font anti-aliasing, and so on. So just lastly, I want to 627 00:50:52,849 --> 00:50:55,769 quickly touch on a really interesting operating system: Acorn had a Unix 628 00:50:55,769 --> 00:50:59,079 operating system.
As well as being a geek, I'm also a Unix geek, and I've always 629 00:50:59,079 --> 00:51:04,609 been fascinated by RISC iX. These machines were astonishingly expensive. They were 630 00:51:04,609 --> 00:51:08,191 the existing Archimedes machines with a different sticker on. That's an A540 with 631 00:51:08,191 --> 00:51:14,850 a sticker on the front. And this OS was developed after the Archimedes had 632 00:51:14,850 --> 00:51:18,359 already been designed. So 633 00:51:18,359 --> 00:51:20,950 there's a lot of stuff about the hardware that wasn't quite right for a Unix 634 00:51:20,950 --> 00:51:26,230 operating system. A 32K page size on a 4 MB machine really, really killed you 635 00:51:26,230 --> 00:51:29,900 in terms of your page cache and that kind of thing. They turned this into a bit 636 00:51:29,900 --> 00:51:35,089 of an opportunity; at least they made good on some of this. There was quite a novel 637 00:51:35,089 --> 00:51:42,380 online decompression scheme: you would demand-page text from a binary, 638 00:51:42,380 --> 00:51:46,170 and it would decompress into your 32K page, but it was stored in a 639 00:51:46,170 --> 00:51:53,659 sparse way on disk. So actually the on-disk usage was a lot less than you'd expect. That was the 640 00:51:53,659 --> 00:51:56,638 only way it fit on some of the smaller machines. 641 00:51:56,638 --> 00:52:02,160 Also, Acorn TechL: the department that designed the Cybertruck, it turns out. 642 00:52:02,160 --> 00:52:06,228 This was their view of the A680, which is an unreleased workstation. 643 00:52:06,228 --> 00:52:08,940 I love this picture. I like the piece of cheese, or 644 00:52:08,940 --> 00:52:13,379 cake, as the mouse. That's my favorite part. But this is the real machine. 645 00:52:13,379 --> 00:52:18,730 It's an unreleased prototype I found at the computer museum. It's notable: 646 00:52:18,730 --> 00:52:22,130 it's got 2 MEMCs and 8 MB of RAM.
It's designed only to run RISC iX, 647 00:52:22,130 --> 00:52:26,099 the Unix operating system, and it has a high-res monitor only; it doesn't have color. It was 648 00:52:26,099 --> 00:52:30,279 designed to run FrameMaker, drive laser printers and be a kind of desktop 649 00:52:30,279 --> 00:52:35,249 publishing workstation. I've always been fascinated by RISC iX, as I said. A while 650 00:52:35,249 --> 00:52:41,450 ago I hacked around on ArcEm for a while, and I got it booting in ArcEm. I'd never seen 651 00:52:41,450 --> 00:52:46,640 this before; I'd never used a RISC iX machine. So there we go: it boots, it is 652 00:52:46,640 --> 00:52:51,130 multi-user. But wait, there's more. It has a really cool X server, a very fast one. I 653 00:52:51,130 --> 00:52:54,730 think Sophie Wilson again worked on the X server here, so it's very well 654 00:52:54,730 --> 00:52:58,019 optimized and very fast for a machine of its era. It makes quite a nice little 655 00:52:58,019 --> 00:53:02,900 Unix workstation. It's quite a cool little system. By the way, Tudor, the guy who 656 00:53:02,900 --> 00:53:07,099 designed the VIDC and the IO system, called me a saddo for getting this working in 657 00:53:07,099 --> 00:53:14,150 there. That's my claim to fame. Finally, and I want to leave some time for 658 00:53:14,150 --> 00:53:19,510 questions: there's a lot of useful stuff in ROM. One of them is BBC Basic. BBC Basic 659 00:53:19,510 --> 00:53:23,009 has an assembler, so you can walk up to this machine with a floppy disk and write 660 00:53:23,009 --> 00:53:29,529 assembler. It has a special bit of syntax there, and then you can just call it. 661 00:53:29,529 --> 00:53:32,460 This is really powerful: at school or somewhere, with a floppy disk, you can 662 00:53:32,460 --> 00:53:37,199 do something that's a bit more than basic programming. Bizarrely, I wrote that 663 00:53:37,199 --> 00:53:41,420 with only two or three tiny syntax errors after about 20 years away from this.
It's 664 00:53:41,420 --> 00:53:46,059 in there somewhere. Legacy-wise, the machine didn't sell very many; easily under a 665 00:53:46,059 --> 00:53:50,930 hundred thousand. I don't think it really made a massive impact; PCs had 666 00:53:50,930 --> 00:53:54,640 already taken off by then. The ARM processor: I'm not going to go on about the 667 00:53:54,640 --> 00:53:58,920 company. It has obviously changed the world in many 668 00:53:58,920 --> 00:54:04,140 ways. The thing I really took away from this exercise was that a handful of smart 669 00:54:04,140 --> 00:54:10,089 people, not that many, on the order of a dozen, designed multiple chips, designed a custom 670 00:54:10,089 --> 00:54:14,599 computer from scratch and got it working. And it was quite good. I think that this 671 00:54:14,599 --> 00:54:17,380 really turned people's heads. It made people think differently: people 672 00:54:17,380 --> 00:54:21,160 who were not Motorola or IBM, really, really big companies with enormous 673 00:54:21,160 --> 00:54:27,479 resources, could do this and could make it work. I think that actually led to the 674 00:54:27,479 --> 00:54:30,809 thinking that people could design their own systems-on-chip in the 90s, and to that 675 00:54:30,809 --> 00:54:35,309 market taking off. So I think this was really key in getting people thinking that 676 00:54:35,309 --> 00:54:40,420 way: it was possible to design your own silicon. And finally, I just want to thank 677 00:54:40,420 --> 00:54:45,279 the people I spoke to, and Adrian and Jason at the Centre for Computing History in 678 00:54:45,279 --> 00:54:49,049 Cambridge. If you're in Cambridge, then please visit. It's a really cool 679 00:54:49,049 --> 00:54:56,270 museum. And with that, I'll wrap up. If there's any time for questions, then... I'm 680 00:54:56,270 --> 00:54:58,356 getting a blank look. No time for questions? 681 00:54:58,356 --> 00:55:01,890 Herald: There's about 5 minutes left for questions.
682 00:55:01,890 --> 00:55:07,880 Matt: Fantastic! Or come up to me afterwards; I'm happy to chat more about this. 683 00:55:07,880 --> 00:55:18,940 *applause* Herald: The first question is for the 684 00:55:18,940 --> 00:55:29,799 Internet. Signal Angel, will you? Well, grab your microphones and we'll get the 685 00:55:29,799 --> 00:55:36,700 first from the audience in the room here. That microphone there, please ask a question. 686 00:55:36,700 --> 00:55:44,130 Mic1: You mentioned that the system is making good use of the memory, but how is 687 00:55:44,130 --> 00:55:50,459 it actually not completely stalled on memory, having no cache and the 688 00:55:50,459 --> 00:55:55,450 same cycle time for the memory as for the CPU? 689 00:55:55,450 --> 00:56:01,000 M: Good question. So how is it not always stalled on memory? Well, it is 690 00:56:01,000 --> 00:56:04,390 sometimes stalled on memory when you do something that's non-sequential. You have 691 00:56:04,390 --> 00:56:08,869 to take one of the slow cycles; this was the N cycle. The key is that you try to 692 00:56:08,869 --> 00:56:11,469 maximize the amount of time that you're doing sequential stuff. 693 00:56:11,469 --> 00:56:16,220 So on the ARM 2 you wanted to unroll loops as much as possible, so you're fetching 694 00:56:16,220 --> 00:56:19,799 your instructions sequentially, right? You wanted to make as much use of load/store 695 00:56:19,799 --> 00:56:24,290 multiples as possible. You could load single registers with an individual register load, but it 696 00:56:24,290 --> 00:56:28,710 was much more efficient to pay that cost just once, at the start of the instruction, and 697 00:56:28,710 --> 00:56:33,619 then stream stuff sequentially. So you're right that it is still stalled sometimes, 698 00:56:33,619 --> 00:56:37,141 but that was still a good tradeoff, I think, for a system that 699 00:56:37,141 --> 00:56:40,549 didn't have a cache for other reasons. M1: Thanks.
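The answer's point about unrolling loops can be illustrated with another simplified, hypothetical cost model in Python. This one counts only instruction fetches. It assumes a non-sequential fetch costs two clocks and a sequential one costs one. It also ignores the extra pipeline refill that a real ARM branch causes, so it probably understates the benefit of unrolling. The function names and costs are assumptions for illustration.

```python
# Hypothetical fetch-cost model: N_COST for a non-sequential fetch, S_COST otherwise.
N_COST = 2
S_COST = 1


def fetch_cost(addresses):
    """Assumed clock cost of a sequence of instruction-fetch addresses."""
    total, prev = 0, None
    for addr in addresses:
        total += S_COST if (prev is not None and addr == prev + 4) else N_COST
        prev = addr
    return total


def rolled_loop(base, work, iterations):
    """Each iteration fetches `work` instructions plus a branch back to `base`."""
    trace = []
    for _ in range(iterations):
        trace.extend(base + 4 * i for i in range(work + 1))
    return trace


def unrolled(base, work, iterations):
    """The same work laid out as one straight run of instructions, no branches."""
    return [base + 4 * i for i in range(work * iterations)]


if __name__ == "__main__":
    print(fetch_cost(rolled_loop(0x8000, 1, 8)))  # 24
    print(fetch_cost(unrolled(0x8000, 1, 8)))     # 9
```

The rolled loop pays a non-sequential fetch on every iteration because of the branch back to the top. The unrolled version pays it only once and then fetches sequentially.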
700 00:56:40,549 --> 00:56:45,140 Herald: Next question is for the Internet. Signal Angel: Are there any Acorns on 701 00:56:45,140 --> 00:56:49,839 sale right now? If you want to get into this kind of hardware, where do you get it? 702 00:56:49,839 --> 00:56:52,810 Herald: Can you repeat the first sentence, please? Sorry, the first part. 703 00:56:52,810 --> 00:56:56,259 S: If you want to get into this kind of hardware right now, if you want to buy it 704 00:56:56,259 --> 00:56:58,839 right now. M: Yeah, good question. How do you 705 00:56:58,839 --> 00:57:06,359 get hold of one? Drive prices up on eBay, I guess; I hate to say it. It might be fun to 706 00:57:06,359 --> 00:57:09,170 play around in emulators, but I always prefer to hack around on the 707 00:57:09,170 --> 00:57:12,309 real thing; emulators always feel a bit strange. There are a bunch of really good 708 00:57:12,309 --> 00:57:19,180 emulators out there, quite complete. Yeah, I think I would just go on 709 00:57:19,180 --> 00:57:23,260 auction sites and try and find one. Fortunately, they're not completely 710 00:57:23,260 --> 00:57:27,829 rare. I mean, that's the thing: they did sell. I'm not quite sure of the exact figure, 711 00:57:27,829 --> 00:57:31,500 but you know, there were tens and tens of thousands of these things made. So I would 712 00:57:31,500 --> 00:57:35,130 look in Britain more than elsewhere, although I do understand that Germany had 713 00:57:35,130 --> 00:57:40,170 quite a few. If you can get hold of one, though, I do suggest doing so. I think 714 00:57:40,170 --> 00:57:46,259 they're really fun to play with. Herald: OK, next question. 715 00:57:46,259 --> 00:57:51,860 M2: So I found myself looking at the documentation for the LDM/STM instructions 716 00:57:51,860 --> 00:57:58,049 while debugging something on ARM just last week, and I was just wondering, what's your 717 00:57:58,049 --> 00:58:04,029 thought?
Are there any quirks of the Archimedes that have crept into the modern 718 00:58:04,029 --> 00:58:06,900 ARM design and instruction set that you are aware of? 719 00:58:06,900 --> 00:58:13,449 M: Most of them got purged. So there was the 26-bit addressing. There were a 720 00:58:13,449 --> 00:58:19,409 couple of strange uses, like an XOR instruction into the PC for changing flags. So 721 00:58:19,409 --> 00:58:25,160 there was a great purge when the ARM 6 was designed. The ARM 6 is... I should know 722 00:58:25,160 --> 00:58:31,559 this... ARMv3. That's got 32-bit addressing and lost this. These weirdnesses 723 00:58:31,559 --> 00:58:35,690 got moved out. I can't think of any, aside from just the 724 00:58:35,690 --> 00:58:40,619 resulting 32-bit ARM instruction set being quite quirky and having a lot of good 725 00:58:40,619 --> 00:58:46,789 quirks. The shifted register is sort of a free thing you can do. For example, you 726 00:58:46,789 --> 00:58:52,059 can add one register to a shifted register in one cycle. I think that's a good quirk. 727 00:58:52,059 --> 00:58:55,119 So in terms of inheriting that instruction set and not changing those 728 00:58:55,119 --> 00:59:05,959 things, maybe that counts? Herald: Any further questions? Internet, 729 00:59:05,959 --> 00:59:11,439 any new questions? No? Okay, so in that case, one round of applause for Matt Evans. 730 00:59:11,439 --> 00:59:13,579 M: Thank you. 731 00:59:13,579 --> 00:59:21,142 *applause* 732 00:59:21,142 --> 00:59:27,679 *postroll music* 733 00:59:27,679 --> 00:59:43,658 Subtitles created by c3subtitles.de in the year 2021. Join, and help us!