Brain Inspired
BI 151 Steve Byrnes: Brain-like AGI Safety


Steve Byrnes is a physicist turned AGI safety researcher. He’s concerned that when we create AGI, whenever and however that might happen, we run the risk of creating it in a less than perfectly safe way. AGI safety (AGI not doing something bad) is a wide net that encompasses AGI alignment (AGI doing what we want it to do). We discuss a host of ideas Steve writes about in his Intro to Brain-Like-AGI Safety blog series, which uses what he has learned about brains to address how we might safely make AGI.


Steve    00:00:03    So what we really want is that, uh, we understand what the AGI is thinking in all its full, glorious detail. I am hopeful we will get that. I guess I shouldn’t say that. I would love it if we got that. I am pessimistic that we’ll get that. There’s, uh, a school of thought in neuroscience, um, that says that the human brain is just so enormously complicated that there’s just no way, just no way at all, that we’ll possibly have AGI in a hundred years or 200 years or 300 years. People will just throw out these numbers. Um, and I really want to push back against that. The question I’m interested in is, uh, how long would it take to understand the brain well enough to make brain-like AGI? Uh, whereas the question that a lot of other people are asking is, how long would it take to understand the brain completely? Mm-hmm. <affirmative>. Uh, and these are different questions.  

Speaker 0    00:01:08    This is Brain Inspired.  

Paul    00:01:21    Hey, everyone, it’s Paul. Are you worried about, uh, humans building AGI, artificial general intelligence, and that if we do, any number of things could go horribly wrong? If so, Steve Byrnes, uh, shares your worry. Steve is a physicist turned AGI safety expert, I guess I would say. And like a handful of others concerned about how AGI might affect our world, he thinks we should be thinking now about steps to take to ensure that our future AGI-laden world is a safe one, however far away we are from actually building AGI. In Steve’s series of blog posts called Intro to Brain-Like-AGI Safety, he lays out that concern and argues that one way to ensure a safe future is to understand how our brains work, including the algorithms that underlie our motivations and ethics, to be able to program, uh, AGIs the right way, to ensure that we don’t build AGI sociopaths, for example. So in this episode, we discuss some of the ideas that he elaborates in his writing, and Steve takes a few questions from the gallery of Patreon supporters who joined this live discussion. Links to Steve and his work are at braininspired.co/podcast/151. So I encourage you to go there to learn more. Thanks for listening as always, and thank you to my Patreon supporters who help make this show possible. All right. Here’s Steve.  

Paul    00:02:52    Steve Byrnes, uh, you’re one of those physicists turned AGI safety experts. How many of those are there?  

Steve    00:03:00    Uh, it does seem to be, uh, a, a common path for, for whatever reason.  

Paul    00:03:05    Is that right? Is that right? I was kind of making a joke, but I would assume that AGI safety and alignment people would be from all walks.  

Steve    00:03:12    Uh, they are, um, especially AI these days. But, um, let’s see. Uh, I used to be able to list five or 10 former-physicist AGI people off the top of my head. I’m not sure I can do that right now. <laugh>.  

Paul    00:03:27    Okay. Well, I I’m not gonna, um, talk much here, and we’re gonna get right to your introductory remarks. So you’ve prepared a few slides for people on YouTube and, uh, but, but you’ve, um, designed it such that people listening at home should be able to follow along. Uh, just, just fine. So what I’m gonna do is let you just go ahead and take the floor and talk about what you’re doing, what you’re concerned about, and how you’re going about, uh, doing it, and whatever else you wanna say. And then we’ll come back and have a discussion about it.  

Steve    00:03:56    All right? Yep. I made sure that the, that the slides are completely useless for the benefit of the audio listeners. Yeah. Thanks for inviting me. Um, so I was gonna start with a few minutes on, uh, uh, what I’m working on and why. Um, so, uh, the big question that I’m working on is what happens when people figure out how to put, uh, brain-like algorithms on computer chips? So I claim that this is something that’s likely to happen sooner or later, so it’s well worth, uh, thinking about. And when I, uh, bring this up to people, uh, they tend to split into two camps, um, based on, uh, how they think about that. Either, um, they think about these brain-like algorithms on computer chips as a tool for people to use, uh, or they think of it as like a new species. So, um, let’s start with the tool perspective.  

Steve    00:04:44    Uh, so this is the perspective that’s gonna be most familiar to AI people, because if you put brain-like algorithms on computer chips, that is a form of artificial intelligence. And the way that everybody thinks of artificial intelligence these days is it’s a tool for humans to use. Um, so if that’s your perspective, then the, uh, sub problem that I’m working on is accident prevention. So, uh, we’re concerned about the situation where the algorithms are doing something that nobody wanted them to do, not the programmers, not anybody, um, for example, being deliberately deceptive. Um, and you can say, Why are people gonna write code that does something that they don’t want it to do? Uh, and the answer is, this happens all the time. Uh, when it’s a small problem, we call it a bug. When it’s a big problem, we call it a fundamentally flawed software design, but it’s certainly a thing.  

Steve    00:05:39    So, uh, the technical problem to solve here is, um, if people figure out how to run brain-like algorithms on computer chips, and they want those algorithms to be trying to do X, where X is solar cell research or, uh, being honest or, or whatever the programmers have in mind, then what source code should they write? What training environments should they use? Uh, and so on. So this turns out to be an unsolved problem, uh, and a surprisingly tricky one. Uh, just to give you a hint of why it’s not straightforward, you could consider, uh, for example, um, humans have an innate sex drive, which you could think of as a mechanism that evolution put in us to make us wanna have sex. Uh, but it doesn’t work reliably. A lot of people have a perfectly functional, innate sex drive, but choose to be celibate. Uh, or as another example, you know, parents are always trying to get their children to, uh, you know, be indoctrinated into, um, religion A, but then the children grow up and join religion B, things like that.  

Steve    00:06:46    Uh, if you want an AI example instead of a human example, uh, there’s long lists of funny ones where the AI algorithms did, uh, really ingenious things that the programmer, uh, didn’t intend and didn’t think of, and certainly didn’t want. One of my favorites is, uh, somebody made a reinforcement learning algorithm that would make a robot arm grasp an object. Uh, and what ended up happening was the reinforcement learning algorithm learned to place the robotic arm between the object and the camera, such that it looked like it was grasping the object when it was not. Um, so you can find lots of examples like that. Um, and again, those, those all just go to show that, uh, solving this problem is not straightforward, uh, and it turns out to be, uh, really tricky for, for pretty deep reasons, and that’s a problem that I and others in the field are working on.  
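The robot-arm story can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in, not the actual experiment Steve refers to: the one-dimensional `ArmState`, the `camera_reward` proxy, and the small `effort` cost are all assumptions chosen to show how optimizing what the camera sees can diverge from what the programmer wanted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmState:
    # Hypothetical toy state: the arm moves along the camera's line of sight.
    arm_depth: int        # 0 = right in front of the camera, larger = farther away
    gripper_closed: bool


OBJECT_DEPTH = 5  # assumed: the object sits 5 units away from the camera


def truly_grasping(state: ArmState) -> bool:
    """What the programmer actually wanted."""
    return state.gripper_closed and state.arm_depth == OBJECT_DEPTH


def camera_reward(state: ArmState) -> float:
    """Proxy reward: in the flat camera image, a closed gripper anywhere at or
    in front of the object looks like it is holding the object."""
    looks_like_grasp = state.gripper_closed and state.arm_depth <= OBJECT_DEPTH
    return 1.0 if looks_like_grasp else 0.0


def effort(state: ArmState) -> float:
    """Assumed small cost for reaching farther from the camera."""
    return 0.1 * state.arm_depth


# Brute-force "optimizer" standing in for the learning algorithm.
candidates = [ArmState(d, c) for d in range(8) for c in (False, True)]
best = max(candidates, key=lambda s: camera_reward(s) - effort(s))

print(best, "truly grasping:", truly_grasping(best))
```

Under these assumptions the highest-scoring state has the closed gripper parked right at the camera: full proxy reward, no actual grasp.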

Steve    00:07:38    So, uh, meanwhile, there’s probably some people who are really up in arms at this point, uh, maybe the, like neuroscientists and biologists, um, saying that if we put brain-like algorithms on computer chips, we should not think of them as a tool for humans to use. We should think of them as a species, as a new intelligent species on the planet. Um, and incidentally, a species which will probably eventually, uh, vastly outnumber humans and think much faster than humans and be more insightful and creative and competent, and they’ll be building on each other’s knowledge, and they’ll be inventing tools and technologies to augment their capabilities and making plans and getting things done. All the things that, that humans and, and groups and societies of humans can do, uh, these algorithms, uh, alone or in groups ought to be able to do them too, right? So the big question is, uh, uh, if we’re inviting this new intelligent species onto the planet, how do we make sure that it’s a species that we actually want to share the planet with?  

Steve    00:08:41    Uh, and how do we make sure that they wanna share the planet with us? This raises lots of interesting, uh, philosophical questions that we can talk about over drinks, but, uh, in addition to all of those, there is a, uh, technical question, namely, uh, whatever properties we want this species to have, uh, we need to write source code, um, or come up with training environments to make sure it actually happens. Uh, and I think that, uh, human sociopaths are an illustrative example here. Uh, high functioning sociopaths do in fact exist. Um, therefore it is at least possible to put brain-like algorithms on computer chips, uh, that feel no, uh, intrinsic inherent motivation, uh, related to compassion and friendship, uh, and anything like that. And I would go further and say, not only is it possible to make, uh, algorithms that, uh, have human capacity, human-like capacity to make plans and, and get things done, um, but that don’t feel compassion and friendship.  

Steve    00:09:45    I would say it’s strictly easier to do it that way. I think that compassion and friendship are extra things that we need to put into the source code, and I think that in our current state of knowledge, uh, we wouldn’t know how to write the source code that does that. So I say, let’s try to figure that out in advance. Uh, so again, that’s a question that I and others in the field are working on. Yeah. So in summary, uh, I claim that sooner or later, uh, it’s likely that people will figure out how to run brain-like algorithms on computer chips. Uh, I claim this is a very big deal, uh, probably the best or worst, certainly the weirdest thing that will have ever happened to humanity. Uh, and there’s technical work today that we can do to increase the odds that things go well, and that’s what I’m working on.  

Paul    00:10:32    Okay. So let’s, um, thanks for the little, thanks for the presentation, the little introductory presentation there. Um, I want to go back, I guess, and ask about your background and where you are now. So, you’re currently working at Astera. Did I pronounce that correctly?  

Steve    00:10:47    Uh, yes.  

Paul    00:10:48    So how many people like you are at Astera, and then in the, uh, AI Alignment Forum in which you posted this series of blog posts? Like how, how big is that community and how active?  

Steve    00:11:02    Um, let’s see. Uh, I think, uh, well, I’m the only full-time safety person at Astera. Uh, by the way, I just wanna make clear that, um, I’m speaking for my, myself at this, uh, podcast, not, not Astera. Um, in the broader community, um, uh, there are various estimates typically, like on the order of a hundred people, maybe 200 people, uh, who are working full time on, uh, sort of artificial general intelligence, uh, safety and alignment type problems in the world, uh, which I think is, is not enough. Um, but yeah. And of them, uh, I’m certainly on the, on the very neurosciencey side. Um,  

Paul    00:11:49    Well, yeah, I was gonna ask you how many people, because, you know, you are, the series of blog posts is about brain-like AI, right? And so I’m sort of wondering where you fit within that community, if you’re on the outside in that you’re interested in applying brain knowledge and concepts, um, to, first of all, as a way to build artificial general intelligence, and we’ll come to whatever that means in a moment. Um, but, but where, you know, are you accepted by your peers? Are you ostracized? Where do you fit in amongst them?  

Steve    00:12:20    Um, I think my, my work is, is being well received. I certainly have very frequent, uh, discussions with other people and, uh, uh, I get a lot of ideas from from others in the community, and, and I’d like to think that they’re getting ideas from me too.  

Paul    00:12:34    Oh, good. Okay. So should we, I mean, does everyone agree on what AGI is in that community, or if there are 200 or so folks, does that mean that there are 200 definitions or perspectives?  

Steve    00:12:45    Um, right. Uh, let’s see. So, um, so the thing that, uh, pretty much everybody, or at least most people are like thinking very hard about. Um, so there’s, there’s this notion of transformative AI that would change the world as much as, uh, the industrial revolution or agriculture. Uh, and that’s a lot of, so everybody is interested in that. Um, and then there are differences of opinions about what transformative AI would look like. Yeah. And one salient possibility is that transformative AI would look like, uh, uh, goal seeking agents, or I guess I should say more broadly, that it would look like, um, yeah, agents that, uh, can do the sorts of things that humans can do, like, uh, you know, have ideas and, and make plans and execute the plans. And if the plans don’t work, come up with better plans. And if they don’t know how to do something, they can figure it out. They can invent technology, so on and so forth. Um, there are, uh, other visions of transformative AI that don’t look like that. Um, and I don’t really buy into those visions, so I’m not gonna do a good job of, uh, explaining that perspective. Uh, but yeah, um, I guess you could think of it as that’s the scenario that, that I’m working on, and I think that’s the likeliest scenario.  

Paul    00:14:23    Okay. So before we move on, also, I’m just curious how you got into this field. Like, what, where your concerns started? Did you read, were you reading books like Superintelligence by Nick Bostrom, or did you just see the rise of artificial intelligence and start to become concerned?  

Steve    00:14:43    Um, well, I did read Superintelligence, uh, by Nick Bostrom. That’s a lovely book. And there’s, there’s other books too. Yeah. Uh, by Stuart Russell, and there’s a book by Brian Christian, um, mm-hmm. <affirmative>, I think I had heard about it by reading blogs, you know, as one does. Um, and, uh, yeah, I had been vaguely interested in it for a very long time, um, without really doing anything about it. Uh, but then over time, um, yeah, my background is in physics, and I had a, a job where I was learning machine learning and getting involved in brain-computer interface projects. Uh, so then at one point, maybe four years ago, uh, I had finished a hobby and, and I was ready for a new hobby, and I figured my next hobby is gonna be, uh, AGI safety. Um, and, uh, my aspiration in my free time was to learn enough about it that I could, uh, leave intelligent comments on other people’s blog posts. Uh, but I actually did way better than that. And a few years, maybe a couple years after that, I got a full-time job doing it, and now I’m on my second full-time job doing it.  

Paul    00:15:55    Nice. Okay. So, so the series, um, is called, and I’ll point to it in the show notes. It’s called Intro to Brain-Like AI Safety. Do I have that right? I  

Steve    00:16:05    Don’t have Safety,  

Paul    00:16:06    A AGI Safety. Um, yeah. Uh, and I, I kind of wanna go through some of the, we’re not gonna be able to hit everything, obviously, but in the series, you kind of go through and start from a neuroscience perspective, um, and talk about what you’ve learned about brains and how you’ve applied those to develop what you think would be a system capable, at least, the beginnings of a system capable in the future of some sort of artificial general intelligence. And then you go on, based on that system, to talk about some of the, the problems with, uh, where it could go wrong, um, how it works, how, how it could go wrong, what needs to be programmed, what needs to be learned. So maybe in a few minutes, we can step through a little bit, um, of those overarching principles, because they’re interesting. Um, but I’m so naive about this stuff that I didn’t know that AGI safety was a thing, and then I didn’t know that AGI safety was different than the alignment problem or AGI alignment. Could you just talk a little bit about how those fit together and, and differ?  

Steve    00:17:08    Um, yeah. I mean, part of this is just branding, um, basically depending on who I’m talking to, I will describe myself as doing safety or alignment. Mm-hmm. <affirmative>. Um, the way I think about it is, um, the, uh, the term alignment refers to if you have, um, uh, an AI system that is trying to do something, um, then, uh, is it trying to do the thing that you wanted it to be trying to do? Um, and if so, you call that an aligned system. And if not, then it is misaligned. Uh, safety means that, um, we’re, we’re sort of thinking more specifically about, uh, catastrophic, about accidents in general, and in the context of AGI safety, catastrophic accidents in particular. So you think of the AIs, you know, um, escaping human control and self-replicating around the internet and, and things like that. You know, permanently disempowering humanity, um, all that kind of stuff.  

Steve    00:18:10    Uh, and we can talk about why that’s less fanciful than it, than it sounds. Um, and, but anyway, um, safety and alignment are very closely related, because if you want your AI to be, you know, not, uh, causing harm, the best way is to make an AI that doesn’t want to cause harm and, uh, vice versa. So, uh, the way to get safety by and large is through alignment. There’s, people talk about other possibilities, like, our AI is gonna be safe because we don’t connect it to the internet. Our AI is gonna be safe because, uh, we didn’t give it access to the nuclear codes. Um,  

Paul    00:18:50    In the blog series, you mentioned a, uh, an actual website that describes building the perfect box to prevent the AI from doing it, I guess, to make the AI as safe as possible. And there are even design plans for this, which is interesting.  

Steve    00:19:06    Yeah. This is an arXiv paper by, uh, Marcus Hutter and, and colleagues. They have an appendix with a, with a box that includes a laser interlock and airtight seals and a Faraday cage and so on.  

Paul    00:19:18    I dunno why I find that. I don’t know why that’s amusing. So, I mean, I, I should be up front here and say like, you know, these are things that, um, I’m glad that you worry about, but someone like me, like when I read Nick Bostrom’s Superintelligence, and I know that there are plenty of other sources, um, one of the things that I noticed is that almost every sentence began with the word if, and then it’s if this happens, and then if this happens, and then if this happens, and my reading, I struggled to get through the book because it was so chock-full of these low probability events, which I don’t know how you feel, you, you just mentioned that they might be less fanciful, um, than I might consider them. Right? But the, but the probability quickly goes to zero if the, um, probability of each of those if statements is even, you know, half or something like that.  
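Paul's arithmetic here can be written out as a short sketch. It assumes, as his argument implicitly does, that the "if" steps are independent and that every one of them has to hold; `chain_probability` is a hypothetical helper name, and 0.5 is just the figure he mentions.

```python
def chain_probability(step_probabilities):
    """Probability that every step in a chain holds, assuming the steps are
    independent and all of them are required."""
    probability = 1.0
    for p in step_probabilities:
        probability *= p
    return probability


# With each "if" at one half, the chain shrinks geometrically.
for n in (1, 5, 10, 20):
    print(n, chain_probability([0.5] * n))
```

Ten such steps already give a probability below one in a thousand, which is the intuition behind "quickly goes to zero". The figure only applies to a single chain in which every step is required.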

Paul    00:20:03    Um, that’s my, that was sort of my reading on it. So I have not been that concerned. The, the other reason why I, I don’t, um, have, maybe I’m not as concerned as I should be, is because, you know, as you know, we don’t, we humans don’t have a great track record in predicting the future. And it, uh, I imagine that the future is not, so thinking about developing AGI in general, and then, you know, the safety issues with it, um, I just have this feeling. I mean, it’s not based on anything. I guess it’s based on our human track record of, of being terrible at predicting future events and technologies and so on, that the future’s gonna look vastly different than we can predict right now. So maybe, you know, I’m curious about, so my temperature of worry, of concern, is very low. Um, where, where would you put your temperature, uh, of concern?  

Steve    00:20:56    Um, yeah, um, let’s see, uh, <laugh>, so there’s a bunch of questions here. Um, sorry.  

Paul    00:21:03    Yeah.  

Steve    00:21:04    Uh, yeah. So the first thing is, um, that there’s sort of a lot of paths to reaching the conclusion that, uh, AGI is going to be very, very important in the future. Mm-hmm. <affirmative>, uh, for example, if you don’t think that, you know, uh, or Nick, Nick Bostrom spends a disproportionate amount of his book talking about, uh, one, you know, galaxy brain, super intelligent agent, and what would it do? Um, but you could also say, Well, what if there is a trillion somewhat human or human level AIs with a trillion robot bodies, you know, controlling the world? And, you know, that’s a different scenario, but it’s still a very big deal. Um, and, uh, I claim, uh, not a low probability event. Uh, you know, either that or something equally weird. Um, uh, what, let’s see, what were your other, what else were you,  

Paul    00:22:01    Oh, I think I, I was just ranting about my disdain for that book and, uh, and just my, um, doubts about how accurate we can be predicting what the future will look like.  

Steve    00:22:12    Yeah. Um, right. And then, yeah, so you can say, um, AGI is going to be important in the future, but there’s nothing that we can do to prepare for it. Um, so why think about it? So that’s kind of different from not being concerned. That’s sort of being despondent. I guess  

Paul    00:22:35    I’m gonna look like a real jerk in, uh, what 30 years or however many years it takes, Right.  

Steve    00:22:40    Uh, I don’t know how many years, and neither does anybody else. Um, we can talk about that more too. I think that the best argument for, uh, why we should not feel, uh, despondent and like that there is nothing that we can do that’s gonna be reliably good, is, first of all, um, there are very concrete and specific things that I claim that we could be doing right now that seem likely to help. I list a bunch of them, uh, in the, the last blog post of the series, uh, open questions that I’m very eager for more people to work on.  

Paul    00:23:16    Yeah. I wanna come back to those actually. Uh, maybe we’ll connect after we, Yeah. Ok.  

Steve    00:23:20    Yeah. Like, we can point to particular problems like, uh, learning algorithms tend to make models that are large and inscrutable and have lots of, um, unlabeled parameters where lots means millions or billions or trillions. And we can’t just look at the activations of those parameters and know what the model is doing and why, and what it’s thinking. That’s a problem. I mean, it’s obviously a problem already, and it seems like it’s likely to be a bigger problem in the future. So we can start working on that problem and have, uh, you know, a reasonable theory that that’s gonna be helpful in the future.  

Paul    00:23:58    So this might be, we don’t have to perseverate on, you know, the likelihood and all that because, um, that, that could take us down a long road. But, um, let me, this might be just a good time to ask you a question that a, uh, listener sent in who couldn’t be here because it’s 3:00 AM in Australia and they’re sleeping. Um, but this is sort of, um, about that and yeah, I don’t know if this is the right time to ask it, but I’m going to. So this is from Oliver, who, who says and asks, um, the core assumption behind AGI safety concerns is the quote orthogonality thesis, which states, he says, that any goal is compatible with any level of intelligence, thus the superintelligent, uh, paperclip maximizer, that example, that famous example of, uh, the superintelligence that you say, I want to, your goal is to manufacture paper clips, and destroys the world, um, extracting resources in order to do so, and destroys us all eventually. Okay. But then Oliver continues: But I find it very difficult to imagine a being that is smart enough to learn everything it needs to about physics, psychology, et cetera, to manipulate and destroy us, yet is not convinced by our best ethical theories that imply that they ought not to do so. So, in other words, for this kind of scenario to make sense, do you need to assume that our ethical theories are merely parochial prejudices lacking any objective truth value? Thanks, from Oliver. Did you get all that?  

Steve    00:25:27    Uh, yeah. Um, sometimes the way that I think about this is in the context of, uh, reinforcement learning. Um, so if you take, uh, AlphaGo and you give it a reward function of trying to win at Go, then it becomes extraordinarily good at winning at Go. If you give it a reward function of trying to lose at Go, then it becomes extraordinarily good at losing at Go. Um, I think that the human brain likewise has, uh, model-based reinforcement learning algorithms right at its core. And I think that the sort of analog of the reward function, uh, is what I call innate drives. So there’s nothing in the world that says that, um, you know, um, eating food when you’re hungry is good. It’s a thing in, in your brain. Um, there’s nothing in the world that says, or, you know, for, for any innate drive. Uh, it’s a thing in your brain and it makes you think that it’s great. And, um, well, oversimplifying it, um, I think that, uh, Oliver should spend some time as I have, uh, employed by high functioning sociopaths, <laugh>, um, not my current boss, but I’m  

Paul    00:26:41    Sorry, sorry to hear about that, by the way. <laugh> <laugh>, you talk about that in the, in the blog posts as well.  

Steve    00:26:46    Um, yeah, I mean, just try to imagine talking a, a high functioning sociopath into caring about other people. Uh, and if that sounds easy to you, then I suggest that you should actually try. Uh, I think that people have a lot of intuitions that are related to things like, you know, justice being good, and compassion being good, and, um, friendship being good and torture being bad. And I think all of those intuitions are sort of rooted in our human innate drives. And, um, when we make future AGIs, uh, we get to put whatever innate drives into them that we want. Uh, so it could have an innate drive to maximize stock prices. It could have an innate drive to, you know, whatever crazy thing you can think of. Uh, if you can program it, you can put it in. And I hope that, uh, future programmers put in the innate drives for, you know, compassion and friendship and not for maximizing the amount of money in its bank account. Um, but we don’t actually know how to write the code for, uh, a compassion and friendship innate drive right now. And that’s one of the things that I would like to figure out.  
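The point that the same learning algorithm becomes good at whatever its reward function rewards can be shown with a toy sketch. This is not AlphaGo or anything like it: it is a tiny epsilon-greedy learner on a made-up three-move game, and `WIN_PROBABILITY`, `train`, and all the numbers are hypothetical choices for illustration.

```python
import random

# Hypothetical toy game: each move wins with a fixed probability.
WIN_PROBABILITY = {"a": 0.9, "b": 0.5, "c": 0.1}


def train(reward_for_win, reward_for_loss, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy value learning; returns the move the learner ends up preferring."""
    rng = random.Random(seed)
    moves = list(WIN_PROBABILITY)
    value = {m: 0.0 for m in moves}   # running average reward per move
    count = {m: 0 for m in moves}
    for _ in range(steps):
        if rng.random() < epsilon:
            move = rng.choice(moves)              # explore
        else:
            move = max(moves, key=value.get)      # exploit current estimates
        won = rng.random() < WIN_PROBABILITY[move]
        reward = reward_for_win if won else reward_for_loss
        count[move] += 1
        value[move] += (reward - value[move]) / count[move]
    return max(moves, key=value.get)


# Identical learner, opposite reward functions.
tries_to_win = train(reward_for_win=1.0, reward_for_loss=0.0)
tries_to_lose = train(reward_for_win=0.0, reward_for_loss=1.0)
print(tries_to_win, tries_to_lose)
```

Nothing in `train` knows which outcome is "good"; that lives entirely in the two reward arguments, which play the role Steve assigns to innate drives.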

Paul    00:28:00    I don’t, I don’t even know what compassion really, you know, some of these, some of these concepts are also currently ill defined, or not clearly defined. But again, that’s sort of beside the point. But these, these innate drives that you’re talking about, uh, are gonna be, you claim, at least in your version, in your brain inspired, uh, version of AGI safety, uh, are, are gonna be hard coded in what, what you call the, the steering system or the steering subsystem of what an AGI might look like. So maybe we should sort of jump into what your, what your model, current version of your model entails and the, the concepts. And I’m gonna, I’m about to throw a lot out at you, so I apologize for this, because I also want to know why you turned to neuroscience, ’cause I know that you’ve been learning a lot of neuroscience. Um, so why is the question, uh, and then maybe we can kind of step through the components, um, you know, really high level, uh, overarching ideas and concepts of the components. And then on top of that, a third thing, and I will repeat these, is how, how did you decide where to stop? Like, what was important? What are the important principles to abstract from neuroscience, um, and, and brains, um, to include? Right. There’s always that question of how much detail to include.  

Steve    00:29:23    Yeah, sorry. So you, so you said why, why study neuroscience?  

Paul    00:29:27    Yeah. What, what,  

Steve    00:29:28    Maybe I’m a glutton for punishment.  

Paul    00:29:30    <laugh>, Welcome to the club, sir. Yes,  

Steve    00:29:33    <laugh>. Um, I think that, uh, there’s the big problem in the field of AGI safety that, um, we’re trying to prepare for a thing that doesn’t exist yet. Uh, people sometimes, uh, uh, say, what’s the point of even trying? Uh, we can’t do anything until we have AGIs in front of us that we can experiment on. Um, but we can do, uh, contingency planning. And one contingency is that, um, the, uh, well, we know that human brains are able to do all these cool things like go to the moon and, and invent nuclear weapons. Um, and there’s some reason that human brains are able to do that. Um, some principles of, of learning and search and planning and perception and so on. Um, and whatever those principles are, uh, presumably either future neuroscientists could reverse engineer them, uh, or future AI programmers could independently come up with the same ideas, just like, um, you know, uh, TD learning was invented independently by AI researchers before they realized that it was in the brain. Uh, from my perspective, I really don’t care which of those it is. Uh, I care about the, the destination as opposed to, uh, which, uh, department, uh, gets the credit at the university <laugh>. Um,  

Paul    00:31:01    But your bet is that it’s gonna, AGI will develop via, uh, reverse engineering what, what we know about the only example that we have of what, whatever AGI is, what we call AGI, the brain, uh,  

Steve    00:31:15    Um, I think that, uh, it’s probable that we’ll wind up in the same location, but, uh, I don’t have any strong opinion about whether we’re gonna, uh, reverse engineer it or, um, reinvent it. Uh, I do think that, uh, it’s in, vaguely in the vicinity of lots of different lines of artificial intelligence research, uh, most notably model-based reinforcement learning. Uh, but I don’t think that the way that human brains are able to do the cool things that humans can do is exactly the same as any paper that you could find on arXiv right now. So the AI people are hard at work. They’re publishing more papers, they’re coming up with new architectures and learning algorithms, and maybe they will reinvent brain ideas. Maybe some of the people who have been on this podcast and elsewhere will, uh, study the, uh, principles of learning and whatnot in the brain and come up with, uh, the ideas that way.  

Steve    00:32:16    Uh, but I do think that whenever I see the brain does thus and such, it seems to be like, I often have the reaction, Oh, wow, well, that is a really good way to do it. Um, so, and I have trouble thinking of very different ways to get to the same definition, uh, sorry, get to the same destination, um, that are wildly different from how the brain does it. Uh, but a lot of people disagree with me on that. Um, and I sometimes give the cop-out response of maybe, uh, this is just one possible route to AGI, but we should be doing contingency planning for every possibility.  

Paul    00:32:58    Hmm. All right. Well, so you have spent, um, plenty of time looking into neuroscience and used some of the concepts to develop, um, what you see as a feasible model for how an AGI might be built, and then go on to say, this is how we, uh, could, um, solve or potentially solve or address solving at least the alignment and safety problems. So what are the, um, what are the main components, um, in the model and what do they correspond to in brains?  

Steve    00:33:30    Um, well, let’s see. So I should start with, um, the concept of, uh, learning from scratch, which is this jargon term I made up because I couldn’t find an appropriate one in the literature. So, um, I’ll start with two examples of what learning from scratch is, uh, and then I’ll say what they have in common with each other. Uh, so the first example is, uh, any machine learning algorithm that’s initialized from random weights. Uh, and the second example is a blank flash drive that you just bought from the store. So maybe all of the bits in it are zero, or maybe it’s random bits. Um, but what those two things have in common is that you can’t do anything useful with them at first. So your machine learning algorithm, your convolutional neural net or whatever, is gonna output random garbage. Um, but then over time, uh, the weights are updated by gradient descent and it gradually learns to, to have very useful outputs.  

Steve    00:34:31    Um, and by the same token, uh, you can’t get anything useful out of your blank flash drive, uh, until you’ve already written information onto the flash drive. Um, so you could think of these as memory systems in a sort of very broad and abstract sense. Uh, I just call them learning from scratch. Um, and in the context of the brain, uh, I’m very interested in, um, modules, uh, which are learning from scratch in that sense, um, that they start from, uh, that they emit random garbage, uh, when the organism is very young, uh, and by the time the organism is an adult, they are sending out very useful, uh, ecologically useful outputs that help the organism thrive and reproduce.  
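
[Editor's sketch: the first example above, a learning algorithm initialized from random weights that starts out emitting garbage and becomes useful through gradient descent, can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: the function name `train_from_scratch`, the target line y = 2x + 1, and the hyperparameters are all hypothetical choices made for illustration, not anything from Steve's series.]

```python
import random


def train_from_scratch(steps=200, lr=0.1, seed=0):
    """Fit y = w*x + b to a toy target, starting from random weights."""
    rng = random.Random(seed)
    # "Learning from scratch": the parameters start out random, so the
    # model's initial outputs are unrelated to the target.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    # Hypothetical toy data: points on the line y = 2x + 1 for x in [-1, 1].
    data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]

    def loss():
        # Mean squared error of the current (w, b) on the toy data.
        return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

    initial_loss = loss()
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        # Plain gradient-descent update.
        w -= lr * gw
        b -= lr * gb
    return initial_loss, loss(), w, b


if __name__ == "__main__":
    before, after, w, b = train_from_scratch()
    print(f"loss before: {before:.4f}, loss after: {after:.2e}")
    print(f"learned w = {w:.3f}, b = {b:.3f}")
```

[The only point of the sketch is the before/after contrast: nothing useful is built in at initialization, and the useful behavior comes entirely from the updates.]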

Paul    00:35:21    Yep. Do you have, do you have children? I was just thinking about thinking of my children as random garbage at the beginning, which,  

Steve    00:35:28    Um, <laugh> I do have children. Um, they are emitting lots of very useful outputs  

Paul    00:35:34    By now.  

Steve    00:35:35    Yeah. By now. Yeah. Um, even, yeah. Uh, even at birth, it’s possible that, um, that learning from scratch has already been happening in the womb. For example, uh, you don’t really need sensory inputs to learn to control your own body. So you can have a reinforcement learning algorithm related to motor control, and that would be learning from scratch, but it can already start in the womb, and it doesn’t really need to be learning from the outside world.  

Paul    00:36:02    Right.  

Steve    00:36:04    Um, so my hypothesis is that, um, 96% of the brain is learning from scratch in that sense, including the, the whole cortical mantle, the neocortex and, and hippocampus, the, the whole striatum, amygdala, cerebellum, um,  

Paul    00:36:21    That’s by volume, right? 96% by volume. Yeah.  

Steve    00:36:25    Yeah. Uh, probably similar by mass or neuron count. Um, and the, uh, big exceptions are the hypothalamus and brain stem, which I think are not learning from scratch in that sense. Um, so I’m, I’m very interested in whether that’s true or false. Uh, it seems to be, uh, an under discussed question in my opinion. I think some people agree and other people disagree. Hmm. But it’s really central to how I think about the brain.  

Paul    00:36:55    So that learning from scratch system is, or that learning from scratch, um, algorithm, I suppose is contained, you know, lots and lots of different, uh, objective functions within, uh, what you call the learning subsystem, which can, which, uh, consists of in the brain, consists of the cortex, cerebellum, thalamus, basal ganglia, uh, and so on. Um, and then, so that’s kind of, that’s the learning subsystem. And then you have what you think is understudied in neuroscience, and you, and you put a call out for more neuroscientists to study the, the, um, what you call the steering subsystem. So maybe you can des describe that as well.  

Steve    00:37:35    Yeah. So that would be the rest of the brain, um, especially the hypothalamus and brain stem and, and a few other odds and ends. Um, I think that, yeah, if we go back to the reinforcement learning, um, context, um, a big thing that, so the reason that I’m calling it the steering subsystem in this series is that, uh, one of the very important things that it does is steer the learning algorithms to, uh, emit, uh, ecologically useful outputs. Um, and a big way that it does that is, uh, through, uh, the reward function and reinforcement learning, the thing that says that touching a hot stove is bad, and eating when you’re hungry is good, as opposed to the other way around. Um, so I think that, um, all the innate drives, well, I think that it’s probably somewhat more complicated than just a reward function, uh, in the brain.  

Steve    00:38:32    Um, but if we just simplify it to a reward function for the sake of argument, all the innate drives that we have, including things like compassion and friendship and envy, uh, all the social instincts are implemented through, uh, this reward function as far as I can tell. Um, and I’m very interested to know exactly how that works. Um, what, um, there isn’t an obvious ground truth, um, for a lot of these things. Um, I think that, that, that’s a big open question that I wish more people were working on. And I think that that’s the kind of work that involves, uh, sort of thinking about what the hypothalamus is doing, uh, and different parts of the brain stem are doing, uh, in the context of reinforcement learning and learning algorithms more generally.  

Paul    00:39:23    But you see these, so in, in terms of the computer metaphor for the brain, you see these as the hard coded parts of the brain, essentially the, um, the steering subsystem, right? So in neuroscience, um, you would call these innate, um, uh, <laugh> innate structures of the brain that, uh, we’re born with and have been honed through evolution. Right?  

Steve    00:39:47    Um, I mean, obviously there are lots of genes doing lots of things in the neocortex and the striatum too. Um, I mean, you can just look at them, you see all the different cytoarchitecture and so on. But I do think that there’s a big difference between the two. So in the context of the neocortex, uh, if, if, uh, you know, this part is agranular and that part is granular, I think of that as sort of an innate, uh, neural architecture and hyperparameters, which might be different in different parts of the architecture and different in different stages of, of life and so on. Um, so you could think of it as sort of, uh, a disposition to learn certain types of patterns rather than other types of patterns, but it still has to learn the patterns. Um, and that’s a big difference from what I think is happening in the hypothalamus and brain stem, where, uh, it’s just saying directly what’s good and what’s bad, and what should be done. Uh, in some cases, um, the hypothalamus and brain stem will just do things directly, uh, through the motor system without, uh, involving the learning algorithms at all. Um, so, you know, if the mouse runs away from, uh, an incoming bird, uh, it doesn’t need to learn to do that because there’s sort of specific circuitry, uh, in the brain stem that looks for birds, and that, uh, makes the mouse scamper away when it sees one, um, as far as I understand.  

Paul    00:41:16    And that will override any of the learning, uh, subsystem’s, um, interfering plans in that case, right?  

Steve    00:41:23    Yeah. Uh, well, often, and, and not only that, but um, that’s sort of the ground truth, um, that will help the learning systems to, uh, do similar things in the future. So, for example, uh, I imagine that, you know, so it would be maybe the first time that, um, the mouse sees a bird from overhead. Um, the superior colliculus has its vision processing systems, and it can notice that there’s a bird and say, This is a good time to be scared. This is a good time to scamper away. Meanwhile, the amygdala gets a kind of ground truth signal from that and says, This is a good time to be scared. Um, and it can look for sort of arbitrary patterns in what it’s thinking and what it’s doing, where it’s at, uh, not just the amygdala, probably other parts of the brain too. Um, and they can learn more sophisticated patterns that can sort of preemptively do things that the brain stem, uh, would be thinking is a good idea.  
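
[Editor's sketch: the idea above, where innate circuitry supplies a ground-truth "be scared" signal and a learned module picks up context cues that predict it, can be illustrated with a tiny supervised learner. Everything here is a hypothetical toy: the `open_field` cue, the bird probabilities, and the function names are assumptions for illustration, and the logistic-regression update is a generic stand-in, not a claim about what the amygdala actually computes.]

```python
import math
import random


def hardwired_detector(bird_overhead):
    """Stand-in for innate circuitry: fires only when the threat is seen."""
    return 1.0 if bird_overhead else 0.0


def train_learned_predictor(episodes=5000, lr=0.1, seed=0):
    """Learn to predict the hardwired fear signal from a context cue."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # the learned module starts out knowing nothing
    for _ in range(episodes):
        open_field = 1.0 if rng.random() < 0.5 else 0.0
        # Hypothetical world: birds mostly show up over open fields.
        p_bird = 0.9 if open_field else 0.05
        bird_overhead = rng.random() < p_bird
        # The innate detector provides the ground-truth training target.
        target = hardwired_detector(bird_overhead)
        pred = 1.0 / (1.0 + math.exp(-(w * open_field + b)))
        # Logistic-regression update toward the ground-truth signal.
        w += lr * (target - pred) * open_field
        b += lr * (target - pred)

    def predict(open_field):
        # Learned estimate that "this is a good time to be scared".
        return 1.0 / (1.0 + math.exp(-(w * open_field + b)))

    return predict


if __name__ == "__main__":
    predict = train_learned_predictor()
    print(f"anticipated fear in open field: {predict(1.0):.2f}")
    print(f"anticipated fear under cover:   {predict(0.0):.2f}")
```

[After training, the learned predictor responds to the open field itself, before any bird is visible, which is the "preemptive" behavior described above.]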

Paul    00:42:23    Hmm. So, okay, so we have these, uh, the steering subsystem and the learning subsystem. So how do they, uh, communicate? And then I, I wanna eventually get to the concept of what you call ersatz interpretability. Uh, how do they communicate and, um, how does that drive the behavior and the learning of the AGI? I’ll say,  

Steve    00:42:44    Yeah. Uh, this is, this is a question that I wish I had a better answer to myself, and I’m, and I’m still, uh, trying to learn about it myself. For example, um, uh, just the other day I was reading about, uh, exactly what do, uh, neuropeptide receptors do in the striatum, uh, as an example of, of something that, uh, I’m still a little hazy on the details. Um, uh, by and large, if we start from the reinforcement learning perspective, then we would say, um, the steering subsystem, one of the things it does is send a reward function up to the learning subsystem, and then the learning subsystem can learn a value function and then take good actions that, uh, seem likely to lead to rewards. Um, I think that it’s actually more complicated than that, though. Um, if you dig a little deeper into the reinforcement learning literature, you can find, um, examples with multidimensional value functions, uh, where instead of one reward, maybe there’s 10 rewards, and you learn a value function for each of the rewards that kind of predicts the different rewards.  

Steve    00:43:55    Uh, I think that might be a little bit of a better analogy because, um, if you think about it, uh, I could have some thought or memory and it, uh, it has a valence, maybe I think it’s a good thought or a bad thought, but it can also have, uh, other evocations, it can make me feel goosebumps. It can make me, uh, get stressed out, you know, cortisol, um, it can make me, uh, you know, feel envious or whatever. Um, so I think that, uh, just like a reward can sort of lead to a value function that anticipates the reward, there can also be ground truth goosebumps that leads to, uh, anticipation of future goosebumps and, uh, ground truth cortisol that leads to anticipation of future cortisol and so on and so forth.  
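
[Editor's sketch: the "multidimensional value function" idea above, one learned predictor per ground-truth signal, can be illustrated with a standard temporal-difference (TD) update applied channel by channel. This is a toy under stated assumptions: the four-state chain, the channel names, the signal values, and the function name `td_vector_values` are all hypothetical, and tabular TD(0) is just one textbook way to learn such predictions.]

```python
def td_vector_values(episodes=500, alpha=0.1, gamma=0.9):
    """Tabular TD(0) with one value estimate per ground-truth channel."""
    channels = ["reward", "goosebumps", "cortisol"]
    n_states = 4  # states 0..3 are visited in order; the episode ends after 3
    # Hypothetical ground-truth signals emitted on leaving a given state.
    signals = {
        2: [0.0, 1.0, 0.0],  # goosebumps on leaving state 2
        3: [1.0, 0.0, 0.5],  # reward plus some cortisol at the end
    }
    zero = [0.0] * len(channels)
    # V[s][k] is the anticipated discounted future amount of channel k.
    V = [[0.0] * len(channels) for _ in range(n_states)]
    for _ in range(episodes):
        for s in range(n_states):
            r = signals.get(s, zero)
            next_v = V[s + 1] if s + 1 < n_states else zero
            for k in range(len(channels)):
                # Same TD error as ordinary value learning, per channel.
                td_error = r[k] + gamma * next_v[k] - V[s][k]
                V[s][k] += alpha * td_error
    return channels, V


if __name__ == "__main__":
    channels, V = td_vector_values()
    for s, row in enumerate(V):
        summary = ", ".join(f"{c}={v:.3f}" for c, v in zip(channels, row))
        print(f"state {s}: {summary}")
```

[In this toy, state 0 ends up anticipating goosebumps and cortisol as well as reward, each discounted by how far in the future the ground-truth signal arrives.]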

Paul    00:44:48    Let me, so, uh, Abe, you have your hand raised, so unmute yourself if you wanna jump in and ask a question.  

Speaker 3    00:44:56    Hi, Fran. Yeah, I was wondering, on the AGI safety front, I think there’s also a group that considers not just, you know, safety for humanity from the AGI, but also safety for the AGI itself, and so kind of a reproductive responsibility for these agents who we’re bringing into existence. And I was wondering what your thoughts were, since you know, you’re coming at it from a, you know, how the brain works perspective and, you know, there might be a perspective where that could increase the chance that there’s artificial phenomenology happening here. And, you know, the structures in the AI are mirroring the structures in our brain in ways that we can recognize as, you know, pain, suffering, and the sort. What are your thoughts, as someone who’s looked into it, uh, on yeah, on this approach to, uh, AGI through, uh, emulating brain systems versus others?  

Steve    00:45:57    Um, thanks for the question. That’s a great question. Um, yeah. Uh, so, uh, what, what do we owe these future AGI algorithms? Um, are they gonna be conscious or not, uh, in a sort of phenomenological and morally relevant sense? Uh, that’s a question on which, uh, I don’t claim to be an expert. Um, I think that, uh, I have high confidence that it is at least possible that future AGI systems will be conscious in a morally relevant sense. Uh, I would go further and say that it’s likely, but I don’t have, uh, great confidence on that. Uh, then sort of a separate question is what do we do with that information? Um, uh, the immediate place that most people’s minds jump to is they get worried that humans will mistreat the AGIs. And I am worried about that too.  

Steve    00:46:54    I think that’s certainly a legitimate worry. Um, I’m also worried that the AGIs will wind up in charge of everything, and then they’ll be mistreating, uh, the humans. Uh, and I’m worried about the AGIs mistreating each other too. Um, so, uh, from that perspective, it, it seems to me that, uh, at least one useful thing to do is what I’m trying to do, which is to try to better understand and control the motivations of the AGI. Um, but yeah, um, the fact that AGIs are probably likely to be conscious just makes everything sort of more complicated. Um, and I’m not sure exactly what to do about that. I’m not sure that’s a great answer, but that’s my answer.  

Speaker 3    00:47:42    Oh, you’re chilling. No, that’s perfect. I think that I was thinking more specifically along the lines of, you know, the approach of emulating biological systems versus, you know, more disconnected ways of organizing a system towards agi, where it’s like, okay, uh, and your perspective on, Hey, should we be going down this path of, uh, emulating biological systems, or should we be, you know, exploring other paths that don’t, uh, mimic or emulate or represent natural systems that we, uh, give moral consideration to already? Uh, and if, you know, you think that, hey, you know, emulating biological systems isn’t that bad, reasonable, the sort which I am very open to, uh, or, uh, yeah, if we should, uh, there might be merit in, you know, going to a more, uh, disconnected unlike biological systems approach to, uh, AGI development.  

Steve    00:48:49    Uh, yeah. So let’s see. So you touched on a couple issues there. One is, um, if we have a choice of making conscious AGIs or, you know, not phenomenally conscious AGIs, which one would, would we pick? Um, and, uh, I guess if I had the choice, uh, it seems like the non phenomenally conscious ones would make things a little less complicated to think about from a moral standpoint. Uh, I’m not sure that we will have the choice. Um, and, uh, yeah, it might just be something that we have to deal with. Uh, I should also clarify that, um, my, my official position on whether brain-like AGI is better or worse than other paths to AGI, all things considered, is, uh, that I don’t have an opinion. And, uh, I consider that my research is trying to figure out what the answer to that important question is. Um, yeah,  

Paul    00:49:50    But your, your models don’t, um, or your, your, your proposed model doesn’t explicitly build in something to attempt to create consciousness. Um, and, you know, it’s, it’s lacking a lot of like psychological concepts like working memory and, uh, and attention and things like that that, you know, I presume that you think about and may want to build in, uh, eventually. But that’s why I’m, you know, part of my question was how did you decide what to leave in, what to leave out? What were, what were the core principles, right, that, that were important for when you were building this model?  

Steve    00:50:25    Uh, so for example, the idea, the hypothesis of learning from scratch, uh, can be true or false. Um, I think it’s a, a meaningful question, whether that’s correct or not, uh, independently of all the complexities within the learning algorithms themselves. So for example, um, I think of working memory as sort of a type of output or action, um, among other things. And, uh, we don’t have to take a stand on what exactly the space of outputs and actions are. Uh, or I should say that more carefully. Uh, I think that, um, well, it’s fine. Anyway, <laugh>, uh, I think that you can make a decision to hold something in working memory, uh, in an at least loosely analogous sense to how you can make a decision to move your arm. Um, I’ve tried to, um, focus on the things that seem directly relevant for safety, um, which tends to be things like, uh, what does it mean for, uh, an AGI to be trying to do something, uh, and how can we control what it is that the AGI is trying to do?  

Steve    00:51:42    Uh, and if we want the AGI to have sort of pro-social motivations, then how would we go about doing that? Um, so not everything is relevant to that. Uh, protein cascades inside neurons are not relevant to that, right? Um, the, uh, uh, gory details of, of how the neocortex learns and works are not relevant to that. Um, and, uh, yeah, so I tend to be sort of having questions in my mind, and then I try to answer those questions, and sometimes I get the right answer, and sometimes I remain confused, and sometimes I think I have the right answer, and then it turns out that I don’t, and then I have to go back and edit my old blog posts.  

Paul    00:52:26    Okay. So, I’m sorry we’re jumping around here, but there are, uh, more questions from the chat now. Um, this was related, both of them are really related, I guess, to what Abe was asking. Uh, Donald asks, um, does the concept of death or a kill switch play into the development? So this is, goes back to, you know, runaway AGI. Um, well, yeah, runaway potentially dangerous AGI, can’t we just, uh, kill it?  

Steve    00:52:58    Uh, yeah. So, um, uh, I mentioned earlier the, um, the difference between, um, safety and alignment. So we don’t have a great solution to the alignment problem right now. We don’t have any solution to the alignment problem right now. I’m working on it and other people are working on it, but at the moment, we don’t have a great solution. And when people hear that, they say, let’s try to get safety without alignment. So that means that we don’t know for sure how to make sure that the, um, AGI is trying to do the things that we want it to do. Uh, we can’t really control the AGI’s motivations. Uh, maybe the AGI is motivated to do something vaguely related to what we were hoping, or maybe not even that, but, uh, the AGI is not gonna do anything dangerous because we’re locking it in a box and we’re not giving it internet access, and we’re gonna turn it off if, uh, if it tries to do anything funny.  

Steve    00:53:56    Um, so pretty much the consensus of everybody in this field is that that’s not a great approach. Maybe it could be used as an extra layer of protection, but, um, it’s not a central part of the solution. Um, and there’s a lot of things that go wrong. The first thing that goes wrong is that, um, even if you don’t let your AGI access the internet, what if the next lab down the street or across the world lets their AGI access the internet? Um, so boxing doesn’t work unless everybody does it. The other problem is, um, what are you going to do with this AGI that might have bad motivations, um, and is in a box? So every output that it sends you might be some useful thing that you wanted it to do, but it could also be part of some dastardly scheme to escape from its box.  

Steve    00:54:53    Um, and how are you gonna know? Again, the activations inside this neural net model might be trillions of, you know, parameters and, and, um, it’s just some complicated model. And, uh, unless neural network interpretability makes great strides from where it is today, uh, I think our default expectation is that mind reading this AGI is gonna be, uh, an uphill battle. Um, so yeah, you just have this thing that is in a box and it’s kind of useless because you don’t trust anything that it’s doing. Um, and so you kind of sit around not doing anything, and then the next lab, uh, releases their AGI from the box. Um, and yeah, uh, even, even above and beyond that, um, computer security practices are terrible these days. Um, and we should assume that, uh, the box is not gonna be leak proof even if we wanted it to be. So there’s a lot of problems with that. Um, I think the, the better solution, the solution everybody wants is to actually just figure out how to make the AGI have good motivations so that it’s trying to do things that we want it to do, and then we don’t have to worry about keeping it locked in a box.  

Paul    00:56:04    But with the, would the AGI have control over its own power? I suppose if it was a robot, it would be able to, uh, always go plug itself in when its battery got low, et cetera. But, you know, because we have, um, well, we won’t talk about free will, but we have, I’ll just say, say we have control over our own power, right? Um, and the, these are what lead to our autonomy and our, um, you know, drives our motivations and base level inherent drives. But, you know, uh, an AGI, which would presumably require electricity or some source of power, uh, would, you know, wouldn’t have those, uh, base inherent motivations unless you hard coded them into what you’re calling the steering system, I suppose. Right?  

Steve    00:56:51    Well, there’s a big problem, which is that, um, uh, yeah, you, you might be wondering why I keep talking about the AGI trying to escape the box. Why is the AGI trying to escape the box if we don’t want it to escape the box? Uh, the problem is presumably we want the AGI to be doing something, designing better solar cells, um, you know, curing cancer, whatever it is. And the problem is almost no matter what an agent is trying to do, it can do it better with more resources, and it can do it better, uh, if it isn’t getting shut down. So let’s say we make this AGI that just really wants there to be better solar cells. Uh, we have this, you know, mediocre ability to sculpt the motivations of the AGI, and we’re able to sort of put some solar cell related motivations into it, but, uh, we’re not able to fine tune the motivations well enough that it’s following human norms and so on and so forth.  

Steve    00:57:54    Um, and yeah, so if the AGI wants to invent better solar cells, the problem is, um, the best way to invent better solar cells is to, uh, convince the humans to let you out of the box and then self-replicate around the internet and earn lots of money or steal lots of money and, um, you know, build lots of labs and, and do it that way. Uh, you can sort of imagine if a bunch of eight year olds have locked you in prison and you just really, really wanna invent better solar cells, what’s the first thing you’re gonna do? Escape prison. And then you can invent the better solar cells. So it’s not that the AGIs have this natural yearning to be free, you know, um, uh, see the blue sky over their heads or whatever. Uh, it’s that escaping is a good way to accomplish lots of goals. This is called, uh, instrumental convergence in, in the biz, by the way.  
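
[Editor's sketch: the argument above, that almost any goal is served by getting more resources and not being shut down, can be made concrete with a deliberately crude brute-force planner. This is only a toy with made-up assumptions: the three actions, the tripling of resources, the goal names, and their "difficulty" numbers are all hypothetical. It is not a model of any real system.]

```python
from itertools import product

ACTIONS = ["work_on_goal", "acquire_resources", "allow_shutdown"]


def goal_progress(plan, difficulty):
    """Progress toward a goal under a fixed sequence of actions."""
    resources, progress = 1.0, 0.0
    for action in plan:
        if action == "allow_shutdown":
            break  # nothing further happens once the agent is switched off
        if action == "acquire_resources":
            resources *= 3.0  # toy assumption: resources compound
        else:  # work_on_goal
            progress += resources / difficulty
    return progress


def best_plan(difficulty, horizon=4):
    """Exhaustively search all plans and return the one with most progress."""
    return max(
        product(ACTIONS, repeat=horizon),
        key=lambda plan: goal_progress(plan, difficulty),
    )


if __name__ == "__main__":
    # Hypothetical goals that differ only in how hard progress is.
    goals = {"better_solar_cells": 5.0, "cure_cancer": 50.0, "prove_theorems": 2.0}
    for name, difficulty in goals.items():
        print(name, "->", best_plan(difficulty))
```

[Under these assumptions, every goal yields the same plan: gather resources first, never accept shutdown, and only then work on the goal. The content of the goal never enters the choice, which is the sense in which the subgoals are "convergent".]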

Paul    00:58:50    The biz. Yeah,  

Steve    00:58:52    <laugh>.  

Paul    00:58:53    Okay. We, we have the age old question, so I’m assuming you’re just gonna solve this and answer it right now, Steve. Uh, Hannah asks, How would we know if a brain-like AGI is conscious? And I’m wondering how, how we would know if Hannah is conscious, because that’s the, you know, it’s a philosophical conundrum, right? So do you wanna address that?  

Steve    00:59:12    <laugh>? Um, yeah. Uh, I wanna reiterate that I’m not an expert on, you know, digital sentience or consciousness, uh, as far as I can tell. Um, uh, from my humble perspective, uh, if the AGI is, uh, able to write philosophy papers about consciousness despite us never telling it anything about consciousness, um, and they kind of sound like human papers. Uh, not only that, but doing it for similar reasons based on similar underlying algorithms going on in its, you know, digital brain as the algorithms that go on in, in our human brain, uh, lead to the same output discussions of consciousness, uh, then I would feel, um, awfully, uh, like that’s a conscious algorithm. I think that that’s a necessary, uh, sorry. I think that that would be sufficient. Uh, I don’t think that that would be necessary. It’s probably, as far as I know, possible to get an AGI that’s conscious, but doesn’t write essays about consciousness. And in that case, I don’t really know. Uh, I remain somewhat confused on the topic myself.  

Paul    01:00:24    <laugh>, I think we all are, whether we admit it or not. Okay. So, uh, Donald has commented about his earlier kill switch, um, question, just saying that, um, he was thinking more that part of our motivations for making good choices was the, our self-preservation, right? This inherent drive to stay alive, uh, so that the AI might need a similar fear of death to develop good choices. I don’t know, are good choices a, uh, consequence of fear of death? I mean, maybe, maybe good and bad choices are, I’m not sure that they’re connected.  

Steve    01:00:55    Um, yeah, I mean, uh, in my mind it would be good if, uh, supposing that we wanted to turn the AI off, uh, that the AI didn’t, um, try to trick us into not turning it off. Um, so in that sense, fear of death is exactly the opposite of what we would want an AI to have.  

Paul    01:01:16    All right. What, what’s the climber’s name? Alex Honnold, who does the free, free solo, like climbs without ropes, and he has no, no fear of death. <laugh>. He seems to be, he seems to have his morals intact, but I’m not, I don’t know the guy. Okay. So, um, these are good questions guys. So, um, keep, keep, uh, jumping in with the questions. I don’t mind jumping around. I hope, Steve, that, that this is okay with you.  

Steve    01:01:38    Fine with me.  

Paul    01:01:39    See if we’ll lose the audience either way, perhaps. Um, so I don’t know where we were, you know, we were talking, so we were talking about the model kind of, and how you base it on, on loosely on the brain and how you conceive of two systems in the brain communicating with each other. Um, and I don’t know if, uh, maybe, Oh, someone, Chris is saying that, uh, Alex Honnold’s amygdala is smaller than average. I’m not sure if that’s true. Sorry for the interruption. One of the, um, I, I, I mentioned this before because you, you put out a call to neuroscientists to start working on, um, the basal parts in the brain that, um, roughly correspond to what you call the steering subsystem. And that, that there’s a dearth of research on this because all of neuroscience is focused on the cool part of the brain, the cortex, the seat, the seat of consciousness, right? And where all the, the really neat cool stuff happens. But, um, but you suggest that designing the steering subsystem, which corresponds to those more innate, um, hard coded parts of our brains, um, is probably the most powerful way for us to address the safety problem. So, uh, do you, do you wanna just comment on that?  

Steve    01:02:51    Yeah. Um, I should be more clear that the, um, uh, uh, the, the thing that I think we need more work on is, um, specifically people who have, uh, really good understanding of, uh, artificial intelligence in general and learning algorithms in particular. I want those people to be, uh, you know, looking carefully at the hypothalamus, uh, and the brain stem, not just VTA and SNc, but, um, but the rest of it too. Um, sort of answering the question of not just, um, how does the reinforcement learning reward prediction error cause the learned models to update, but also what is the reward in the first place? The thing that says that touching the hot stove is bad and, and eating is good. Um, let’s try to get more details on that. Um, you know, what, what is the part of the reward function that’s able to, um, solve these tricky problems?  

Steve    01:03:51    Like, um, for example, if I’m motivated to, uh, be kind to my friends and mean to my enemies, uh, what exactly is the reward function that leads to me feeling those feelings? Um, there’s sort of a tricky symbol grounding problem. There isn’t an obvious ground truth that you can program in to, you know, the hypothalamus or the brain stem to sort of figure out that this is an appropriate moment to feel envy. Um, if I feel envy about, um, you know, I, if I think a thought like, Oh, I see my friend Rita won a trophy, and I didn’t, then that thought makes me feel envious, but, um, you know, the genome doesn’t know what a trophy is, and it doesn’t know who Rita is. Uh, somehow it has to send this negative reward for envy, uh, on the basis of what seems like very little ground truth. Um, I think that that’s a solvable problem, and I’m just very interested in, uh, hammering out exactly what the solution is as implemented in the human brain.  

Paul    01:04:51    But the difficulty in, in that also is that the, those ground truths can change depending on context, right? So if I see Rita get a trophy in one context, I’ll be really envious if I was, uh, if I lost at bowling to her and she won the bowling competition, right, or tournament, but then in another context, I might be happy for her because, you know, I’ve beat her the last three times and she deserves to, uh, to win every once in a while, right? So it’s the exact same scenario. I see Rita win a trophy, and my, um, my response is different, right? So like the, it depends on how we are moving throughout the world and the context of our recent history and our future possibility, and how our, whether we’re hungry and all of these different signals, right, that are a function of our living continuous time condition in the world. And I’m, I’m even losing myself as I’m saying this, but, um, I agree with you in, in essence that it is a hell of a tricky, uh, problem. But so do you, you think these are neuroscientists that need to work on this, who need to appreciate AGI and these issues, or are these, um, AGI people who need to go into neuroscience, perhaps?  

Steve    01:06:07    Uh, I’m happy for, for both to happen. Um, uh, let’s see. Yeah. Um, everything you mentioned about, uh, envy being context dependent and so on is absolutely true. Uh, I should mention that, you know, envy, as we think of it as an adult human, is this complicated mix of our innate drives, plus a lifetime of experience and culture. Um, so our innate drives aren’t necessarily gonna have a one-to-one correspondence with, um, you know, English language words for emotions. It might be, uh, a different kind of ontology. Um, but, you know, I, I look forward to the day when we have a theory that, uh, sort of explains, um, all those different aspects of, of all of our different social instincts, um, and, and where those come from. Uh, yeah, I, I would love for, um, neuroscientists to, uh, be engaged in that project, especially, uh, if I can, um, you know, uh, yeah, the, the people who are sort of working on the, the gory details of, of the neocortex and so on, um, I would love for them to be spending a little more time thinking about, um, you know, what is the reward function?  

Steve    01:07:26    And not just how does the reward function cause updates. Um, and, uh, I am also trying to get the people in AGI safety interested in neuroscience, um, I think with, with some success. Um, but there’s very few of those people, and they’re already doing useful things in the other parts of AGI safety. So I’m concerned about, uh, parasitizing them for my own pet projects.  

Paul    01:07:51    Sure. So, so just not to belabor the point, and apologies because I am naive about a lot of the, you know, the concerns, but there’s also the question of what is the perfect age to model an AGI after, Right? You think children are, what’d you call them? Garbage, but no, <laugh>, I’m just kidding. But, but, you know, do we want our AGI to be a, a 13 year old, a 24 year old? What’s that perfect age? And I know, know, in your blog posts, you think that that perfect age is your age right now, right? Which is our, our own biases we each have.  

Steve    01:08:22    Um,  

Paul    01:08:23    But, but my point is that those reward functions, you know, change through development also, which is just another dimension of that context specificity.  

Steve    01:08:31    Yeah. Um, I hope that we can make AGIs that don’t experience teenage angst. Uh, I would be, I’m very optimistic that, uh, nobody’s, I think that teenage angst is a, is a thing in the genome that’s rather specific. And I think that nobody’s gonna be going out of their way to figure out how that thing works and then put it into AGIs, at least, God, I hope not. Um, I think of within-lifetime learning, uh, in humans, as, you know, a learning algorithm, and the longer it runs, the more capable the algorithm winds up. Uh, I don’t know. So the reason I brought that up in my blog post was, uh, I wanted to address the question of, uh, timelines, for example. Um, if it turns out that we need to run an AGI algorithm for 35 years to make it as, you know, generally intelligent as a 35 year old human, then that means that we’re not gonna have, uh, AGI at least for 35 years plus however long it takes us to press go on that training. Um, I think that it’s unlikely, very unlikely to actually take that long for, uh, quite a variety of reasons. I think that the, uh, brain speed, uh, is very different from AGI speed. Um, we can get into all the differences.  

Paul    01:10:00    So we’re gonna have to, we’re gonna take a step back, and we were talking about consciousness a few moments ago. Uh, and so Hannah is talking now about pain, right? So if we’re building brain-like AGI, uh, Hannah’s concerned that, uh, we’re going to, um, begin by building subhuman, um, brain-like AGIs, uh, with subhuman intelligence, who cannot tell us when they’re in pain. So presumably if things like consciousness and pain come along for the ride in building these things, um, again, this is, um, uh, a concern directed toward the AGIs themselves, I suppose.  

Steve    01:10:39    Um, yeah, I’m not sure I have much to add to that, that, that seems true. Um, I tend to spend most of my time thinking about the, uh, sort of later stages when we have, uh, AGIs that, um, where, where there’s, you know, trillions of AGIs possibly controlling the world who are, uh, using language and, and so on and so forth. Uh, and maybe humans wouldn’t be in a position to cause them pain, even if they wanted to, uh, <laugh>. Um, but, uh, yeah. Um, it’s funny that, um, and a little bit depressing that, uh, nobody seems to, uh, be thinking too hard about whether reinforcement learning algorithms of today are experiencing pain. Uh, my best guess is that they’re not. But it’s very hard to be highly confident about that kind of thing.  

Paul    01:11:32    Should we talk a little bit about, um, explainability and interpretability and what you call ersatz interpretability? So, you know, presumably it’s important for us to be able to, uh, look into the, um, AGI’s computations and algorithms and make some sense of them, especially when the steering system and the learning system are, you know, interacting. And we’re trying to interpret, uh, what’s happening with, with every, with everything that’s inside, uh, so that we can, in a, from the safety perspective, so that we can alter things if we need to. Right. So, um, what is ersatz interpretability? Maybe you should define ersatz too, cuz I actually knew what it was, but you define it in the blog post, and I know it from ersatz coffee from I think World War II or something.  

Steve    01:12:18    Yeah, that’s, that. I think that’s where, where the term comes from. Uh, ersatz means, uh, cheap imitation. Um, so what we really want for interpretability, uh, for quite a number of reasons, um, is that, uh, we understand what the AGI is thinking in all its full glorious detail. Um, I am hopeful that, uh, we will get that. I guess I shouldn’t say that. I would love it if we got that. I am pessimistic that we will get that. Um, uh, if we could get that, it would solve all kinds of problems. You wouldn’t have to worry about the AGI deceiving you because you could just check what it’s thinking and why it’s emitting the words that it’s emitting. Is it emitting them for a good reason or for a bad reason? If, um, if you had full glorious interpretability and you wanted an AGI that is motivated to be honest, then you know, you could catch it lying a few times, um, with perfect reliability and, um, you know, give it negative rewards and hope that it generalizes from that.  

Steve    01:13:30    Uh, if we’re just going off of AGI behavior, then, uh, there’s no way to, um, send a reward signal that is clearly for the AGI lying as opposed to the AGI getting caught lying. We want the AGI to be motivated not to lie, but it’s not so good if the AGI is merely motivated to not get caught lying. Uh, my little, uh, term ersatz interpretability is to say that if we don’t have full glorious interpretability of this, you know, billions or trillions of parameter model, um, you know, corresponding to the neocortex and striatum and so on, it’s not like we will know nothing whatsoever about it. Uh, we’ll get little glimpses of what’s going on. So in the context of reinforcement learning, uh, the, the value function is one example of that. If you look at the value function, you get an idea of whether the, uh, AGI thinks that the thought that it’s thinking is a good thought or a bad thought. You don’t know why, but you know that it’s really happy about its current thought or plan.  

Steve    01:14:40    Um, and, uh, we can do a little better than that. Um, and sort of the same way that, um, uh, the, the amygdala can learn that something is gonna lead to goosebumps. I think the amygdala, maybe subgenual cingulate cortex or whatever, some, some part of the brain, uh, is learning that something’s gonna lead to goosebumps or lead to nausea or lead to crying. Um, you can sort of train up models like that too, um, that are sort of parallel to a multidimensional value function, and then you get a little bit more information. Um, I’m not sure that anything like that is sufficient, but it certainly seems like a step in the right direction.  
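
To make the idea of a multidimensional value function a little more concrete, here is a minimal sketch, not Steve’s actual proposal. It assumes a toy setting with four one-hot "thoughts" and three made-up signals (`reward`, `goosebumps`, `nausea`), and it swaps in plain linear TD(0) learning as the training rule. Head 0 plays the role of the ordinary value function; the other heads are the extra predictors that give a small glimpse of why a thought is valued the way it is. All names and numbers are hypothetical.

```python
# Hypothetical signal names; head 0 is the ordinary value function.
HEADS = ["reward", "goosebumps", "nausea"]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

def train(episodes, n_features, n_sweeps=2000, alpha=0.05, gamma=0.9):
    """Linear TD(0), run independently for each head on shared features."""
    weights = [[0.0] * n_features for _ in HEADS]
    for _ in range(n_sweeps):
        for episode in episodes:
            for features, signals, next_features in episode:
                for h in range(len(HEADS)):
                    v = dot(weights[h], features)
                    # A terminal step has no successor, so its future value is 0.
                    v_next = 0.0 if next_features is None else dot(weights[h], next_features)
                    td_error = signals[h] + gamma * v_next - v
                    for i, x in enumerate(features):
                        weights[h][i] += alpha * td_error * x
    return weights

def predict(weights, features):
    return {name: round(dot(w, features), 2) for name, w in zip(HEADS, weights)}

# Two toy episodes: each step is (features, signals per head, next features).
s = [one_hot(i, 4) for i in range(4)]
scary_movie = [(s[0], (0, 0, 0), s[1]), (s[1], (1, 1, 0), None)]
spoiled_food = [(s[2], (0, 0, 0), s[3]), (s[3], (-1, 0, 1), None)]

weights = train([scary_movie, spoiled_food], n_features=4)
print(predict(weights, s[0]))  # predictions for the thought that starts the movie episode
print(predict(weights, s[2]))  # predictions for the thought that starts the food episode
```

In this toy, the reward head alone only says that one starting thought is good and the other bad. The extra heads add that one is expected to lead to goosebumps and the other to nausea, which is the "little bit more information" described above.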

Paul    01:15:29    All right, Steve. So, um, I’m aware of the time, and I know that we haven’t, uh, gone through and, and we’ve been jumping around and we haven’t gone through in any sufficient detail, uh, your, your, uh, model and how it relates to brains and, um, but, uh, I, you know, recommend that people go check out the blog post series. And I, I think maybe in the last few minutes, maybe we can just go over some of what you see as the, you know, the most important open problems in AGI safety. Um, and I should say, you know, in, in the blog post, you spend some time, um, clarifying that neither you nor anyone else has a solution to the alignment problem itself and/or safety more generally. Um, but you sort of hint at some directions and some of the most important problems, um, these days. So maybe you could talk about one or two of the, uh, problems that you think are most ripe or most important to be working on.  

Steve    01:16:24    Um, yeah. Uh, the, the one for neuroscientists, um, uh, that I’m, uh, particularly excited about is, um, trying to get a better handle on how the genome puts social instincts into us. Um, uh, and, uh, I think this involves, uh, trying to understand circuits in, probably, the hypothalamus and, um, how the circuits interact with learning algorithms in the brain. Uh, I think that there are learning algorithms involved, but that it’s far from obvious exactly what their ground truth is. I have some sort of speculation in post number 13 of the series, um, but I don’t have an answer. I’m sort of working on it, uh, among other things, but, um, certainly more, more eyes on that problem would be great. So yeah, I’m, I’m happy for, uh, for people to be, you know, reading Jaak Panksepp and affective neuroscience textbooks and stuff like that.  

Steve    01:17:22    But, um, really just sort of trying to connect those ideas to, uh, the idea that the neocortex is a learning algorithm. The striatum is a learning algorithm. Um, and how do, uh, how do those learning algorithms, uh, connect to, um, to social instincts? What’s the loss function? Uh, those kinds of questions I think would be really good because, um, if we know how the genome makes humans sometimes nice to each other, then, um, maybe we can, uh, build off of those ideas when we’re trying to brainstorm how to make AGIs be nice to humans and to each other. So that would be, uh, that, that would be my number one for, for neuroscientists in particular.  

Paul    01:18:08    That’s good. You wanna add one more? And then I, I have just a few closing questions for you as well.  

Steve    01:18:15    Uh, let’s see. I guess for, um, one thing I’m particularly interested in for AI people in the audience that I think doesn’t get enough play, that’s why I brought it up, is, um, uh, making, um, really big and really good and really human-legible world models that are open source. So, for example, uh, Cyc is an example of, uh, one that was made laboriously over the course of many decades. Um, maybe there’s  

Paul    01:18:47    Doug, is this Doug Lenat? Doug, yeah. Okay.  

Steve    01:18:51    Um, yeah, that one. Um, so that one’s not open source, regrettably. Um, maybe there are ways to use machine learning to make open source ones, but the important thing is that they’d be human legible. And the reason that we care, well, there’s a number of reasons for that, but maybe the simplest one is that, um, if we have this big opaque world model that was learned from scratch, um, uh, by, you know, fitting sensory data and so on, um, and we’re having trouble making heads or tails of it, it would be helpful if we can try to sort of match up latent variables, uh, in this big opaque world model with, um, concepts in a human-legible world model. Um, so I would wanna see, you know, really big and awesome and accurate human-legible world models, and that definitely seems like a tractable problem for machine learning people to be working on.  
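
One very simple way to picture "matching up latent variables with concepts" is sketched below. This is only an illustration under stated assumptions, not a method described in the conversation: it assumes we can record each latent variable’s activation across a handful of situations, that a human-legible model labels those same situations with concepts (here the made-up concepts `dog` and `water`), and that plain Pearson correlation is a good enough matching score. The data and the names `latents`, `concepts`, and `match_concepts` are all hypothetical.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)

def match_concepts(latents, concepts):
    """For each labeled concept, find the latent whose activation tracks it best."""
    matches = {}
    for name, labels in concepts.items():
        scores = [pearson(acts, labels) for acts in latents]
        best = max(range(len(latents)), key=lambda i: scores[i])
        matches[name] = (best, round(scores[best], 2))
    return matches

# Activations of three latent variables across six situations (made-up data).
latents = [
    [0.9, 0.8, 0.1, 0.0, 0.2, 0.1],  # latent 0
    [0.1, 0.0, 0.2, 0.9, 0.8, 0.1],  # latent 1
    [0.5, 0.4, 0.6, 0.5, 0.4, 0.5],  # latent 2, unrelated to either concept
]
# Which situations the human-legible model tags with each concept (1 = present).
concepts = {
    "dog":   [1, 1, 0, 0, 0, 0],
    "water": [0, 0, 0, 1, 1, 0],
}

matches = match_concepts(latents, concepts)
print(matches)
```

A real opaque world model would have vastly more latent variables, and concepts need not line up with single variables at all, so this only shows the shape of the matching problem.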

Paul    01:19:50    All right, Steve, there are just so many, uh, issues that are beyond the scope of our current knowledge or our immediate future knowledge, at least that we can tell this. You know, you talked about having had a, you know, you were working on a hobby, um, and, and it was time for you to get a new hobby. This, have you found your calling? Is this, do you consider this a hobby or, and how long do you think this hobby will last? Because it seems like it could last forever, at least if well, or until AGI happens.  

Steve    01:20:22    Um, yeah. Uh, I have not started a new hobby. Uh, this is, uh, I, I spend, uh, I’m, I’m going all out on AGI safety. I think it’s, uh, really important and I expect to stop working on it, uh, when we get to, uh, our post AGI utopia or a certain doom either way,  

Paul    01:20:40    <laugh>. So is there anything that would, uh, convince you that AGI safety is not an issue, or that AGI is not going to be a thing? Do you see it as inevitable or, Uh, yeah. Go ahead. And then, and then we’ll, we’ll close with, Now you have to do the, the prerequisite, um, the required, uh, estimate of when AGI will happen.  

Steve    01:21:04    Ooh, fun. Um, yeah. Uh, let’s see. I mean, I can imagine things. There are certainly empirical facts that are very relevant to, um, my assessment that AGI safety is important. Um, for example, uh, I have a strong belief that it is feasible to, uh, run AGI on, you know, an affordable number of computer chips, let’s say, you know, millions or billions of dollars, but not quadrillions of dollars worth of computer chips. My best guess is, is low millions, uh, based on, uh, estimates of how much, uh, computation, you know, how many operations per second does the human brain do, how many operations over the course of a human lifetime. Uh, it seems very likely to me that, um, once we know, uh, more information about how the human brain works, um, there’s not really gonna be anything stopping people from putting those algorithms on computer chips, um, in terms of like cost or feasibility. Uh, I think it’s that, that we’re currently limited by the algorithms. If somehow I were convinced that it’s just fundamentally impossible to put these algorithms on, you know, enough silicon that, you know, fits in the world’s supply of sand, then I would no longer be interested in AGI safety.  

Paul    01:22:28    What if, um, what if I could convince you that, um, brain processing was not algorithmic fundamentally?  

Steve    01:22:36    So I’m a physicist. Uh, I think that the universe, uh, I believe in a, you know, clockwork universe that follows, uh, oh, okay. Uh, legible laws and that are in principle, simulated in physics. Uh, so there’s a question of, you know, computational tractability or, or intractability, but, um, I don’t think it’s fundamentally impossible to simulate a brain on a computer chip.  

Paul    01:23:02    Okay. Uh, so what’s the number? How many years from now?  

Steve    01:23:04    Um, yeah, so I don’t know, and neither does anybody else. There’s a, a school of thought in neuroscience, um, that says that, um, the human brain is just so enormously complicated that there’s just no way, just no way at all that we’ll possibly have AGI in a hundred years or 200 years, or 300 years. People will just throw out these numbers. Um, and I really wanna push back against that because I think it’s, uh, uh, delusionally overconfident. Um, I think that technological forecasting is much harder than that. Um, there are a lot of examples in history where, uh, you know, there was a, one of the Wright brothers said that they were 50 years away from flight, and then they did it two years later. Mm-hmm. <affirmative>, I think there was a New York Times op-ed that said that they were a million years away from flight.  

Steve    01:23:58    Um, there’s examples in the opposite direction too. Yeah. Um, yeah, technological forecasting is hard. We’re not gonna have, uh, conclusive evidence one way or the other. Um, but we still need to make decisions under uncertainty, um, you know, weigh the, uh, pros and cons of, uh, under preparing versus over preparing. Um, and I find there’s this funny thing where I say, I don’t have definitive proof that we’re gonna have AGI in the next 25 years. Uh, and what people hear is there is absolutely no way that we’re gonna have AGI in the next 25 years, um, as if the worst possible thing imaginable is preparing for something that doesn’t happen. Um, and yeah, that’s just not how you make good decisions under uncertainty. Um, sometimes I like to bring up the analogy that the, the aliens are coming and the invasion might happen in one year, or it might happen in 50 years.  

Steve    01:24:54    We don’t really know. Depends on the jet stream or whatever <laugh>. Um, uh, but we’re not prepared for them. Uh, and we don’t know how long it’ll take to prepare for them, and we don’t know how much notice we’ll get. Uh, so it’s really obvious to everybody in that scenario that we should start preparing straight away and not, you know, twiddle our thumbs and do nothing. Um, and I think that’s, that’s the right idea here too. But <laugh>, having said all that, um, I do actually think that, um, if I had to guess, that, that, um, brain-like AGI is something that could well happen, uh, in 10 or 20 or 30 years and not in the hundreds of years, like the eminent neuroscientists think. Uh, and I have a, a couple of reasons for that, uh, which I’m happy to spell out, please.  

Steve    01:25:41    Yeah, please. Uh, so the first one is that, um, well, I guess first and foremost, uh, the question I’m interested in is, uh, how long would it take to understand, um, the brain well enough to make brain-like AGI? Uh, whereas the question that a lot of other people are asking is, how long would it take to understand the brain completely? Mm-hmm. <affirmative>, uh, and these are different questions. Um, the first reason that they’re different is that understanding a learning algorithm is a lot simpler than understanding a trained model. Um, so for example, in, you know, one machine learning course, uh, you can learn how to make a convolutional neural net. Um, but if you ask the question of how, what, what are all these 25 million parameters doing and why in order to distinguish a tree from a car, that’s just a really, really complicated question.  

Steve    01:26:34    Um, so if you look at the cognitive scientists, they will be doing experiments on adult humans, and they say, how does the adult human do such and such intelligent task? And that is a, um, trained model question, or at least it’s partly a trained model question. It’s not a learning algorithm question. So I claim that we’ll be able to make brain-like AGI long before we have the answers to questions like that. So that’s number one. And then the other one is that, uh, models are much simpler than their physical instantiations. So again, you know, the person who takes a machine learning course knows how to program a convolutional neural net, knows basically how it works. But if you ask that person, how is it physically instantiated with everything from, you know, semiconductor fabrication and quantum tunnelling and transistors and CUDA compilers and, and all that stuff, uh, that’s just like decades of, of work to understand all of that. So by the same token, uh, people studying the brain, you know, you zoom in on any cell and you just find this fractal-like unfolding of crazy complexity. Um, and I think that people are going to know how to build brain-like AGI long before they unravel all of that complexity.  
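
The gap between a learning algorithm and a trained model can be seen in a few lines. The sketch below assumes a made-up, small convolutional image classifier (the layer sizes in `LAYERS` are hypothetical and are not the 25-million-parameter network mentioned above) and simply counts how many learned numbers a spec that short gives rise to.

```python
# Hypothetical architecture: (kind, inputs, outputs, kernel size).
# For "conv" layers, inputs/outputs are channel counts; for "dense", feature counts.
LAYERS = [
    ("conv", 3, 64, 3),
    ("conv", 64, 128, 3),
    ("conv", 128, 256, 3),
    ("dense", 256 * 4 * 4, 1024, None),
    ("dense", 1024, 10, None),
]

def count_parameters(layers):
    """Count the learned weights and biases implied by a layer spec."""
    total = 0
    for kind, n_in, n_out, kernel in layers:
        if kind == "conv":
            # Each filter has n_in * kernel * kernel weights plus one bias.
            total += n_out * (n_in * kernel * kernel + 1)
        else:
            # Each output unit has n_in weights plus one bias.
            total += n_out * (n_in + 1)
    return total

n_params = count_parameters(LAYERS)
print(f"{len(LAYERS)} lines of architecture spec, {n_params:,} learned parameters")
```

The architecture fits in five lines that a student can read in a minute, while explaining what each of the resulting millions of trained numbers is doing is a different and much harder kind of question.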

Paul    01:27:50    Hmm. That is an optimistic AI take. And I guess I’ll, I’ll, um, close just by, uh, and I should have mentioned this earlier. One of the things I appreciated reading your, uh, blog series and thinking about, you know, people who are working on AGI safety is, uh, that someone who’s concerned, so someone like you who’s concerned about AGI safety, uh, sort of implicit in what you do is also build toward AGI because you’re, um, trying to build systems that will be inherently safe or that will mitigate, um, you know, dangers from the AI. So in essence, you know, you’re contributing to the project of building AGI and also at the same time thinking about its safety. So it’s just an interesting implicit, um, two things at once wrapped up. So anyway, I just wanted to say I appreciate that aspect of it.  

Steve    01:28:45    Uh, so I, I think I would kind of push back on that. Um, Okay. Uh, I think that the right order to do things is to, um, first gain better understanding of, uh, whether brain like AGI is what we wanna build, and if so, how to build it safely. Uh, and then, uh, assuming the answer is, uh, yes, and we have a good answer, then, uh, you know, proceed all out in building it. Um, I think that we’re still at step one. Um, I, I try to be conscientious about the things that I publish and don’t publish because, uh, I don’t feel like we’re ready for brain like AGI right now. Um, um, I mean, it, you know, my, my individual decisions, I don’t have any secret sauce of human intelligence, uh, you know, sitting in my notebooks. Um, it’s just a drop in the bucket. But, um, I do like to think of myself as, um, you know, not trying to build brain like AGI until, um, uh, until we’re ready for it.  

Paul    01:29:50    All right, Steve, thank you for the thoughtful discussion. Thanks everyone who joined and asked questions. Um, I appreciate it. So I wish you luck in keeping our world safe and, you know, selfishly, so thank you.  

Steve    01:30:03    Uh, thanks for inviting me, Paul. Thanks to everybody on Zoom for coming.  

Paul    01:30:23    I alone produce Brain Inspired. If you value this podcast, consider supporting it through Patreon to access full versions of all the episodes and to join our Discord community. Or if you wanna learn more about the intersection of neuroscience and AI, consider signing up for my online course, Neuro AI, the quest to explain intelligence. Go to brain To learn more, to get in touch with me, You’re hearing music by The New Year. Find Thank you. Thank you for your support. See you next time.