BI 173 Justin Wood: Origins of Visual Intelligence

Brain Inspired


Support the show to get full episodes and join the Discord community.

In the intro, I mention the Bernstein conference workshop I’ll participate in, called How can machine learning be used to generate insights and theories in neuroscience?. Follow that link to learn more, and register for the conference here. Hope to see you there in late September in Berlin!

Justin Wood runs the Wood Lab at Indiana University, and his lab’s tagline is “building newborn minds in virtual worlds.” In this episode, we discuss his work comparing the visual cognition of newborn chicks and AI models. He uses a controlled-rearing technique with natural chicks, whereby the chicks are raised from birth in completely controlled visual environments. That way, Justin can present designed visual stimuli to test what kinds of visual abilities chicks have or can immediately learn. Then he can build models and AI agents that are trained on the same data as the newborn chicks. The goal is to use the models to better understand natural visual intelligence, and use what we know about natural visual intelligence to help build systems that better emulate biological organisms. We discuss some of the visual abilities of the chicks and what he’s found using convolutional neural networks. Beyond vision, we discuss his work studying the development of collective behavior, which compares chicks to a model that uses CNNs, reinforcement learning, and an intrinsic curiosity reward function. All of this informs the age-old nature (nativist) vs. nurture (empiricist) debates, which Justin believes should give way to embrace both nature and nurture.

Wood lab.
Related papers:
Justin mentions these papers:
- Untangling invariant object recognition (DiCarlo & Cox 2007)

0:00 – Intro
5:39 – Origins of Justin’s current research
11:17 – Controlled rearing approach
21:52 – Comparing newborns and AI models
24:11 – Nativism vs. empiricism
28:15 – CNNs and early visual cognition
29:35 – Smoothness and slowness
50:05 – Early biological development
53:27 – Naturalistic vs. highly controlled
56:30 – Collective behavior in animals and machines
1:02:34 – Curiosity and critical periods
1:09:05 – Controlled rearing vs. other developmental studies
1:13:25 – Breaking natural rules
1:16:33 – Deep RL collective behavior
1:23:16 – Bottom-up and top-down

Transcript

Justin 00:00:03 But we realized that nobody had actually tested it directly. Whether, you know, a machine learning system given the same input as a newborn animal would actually develop the same kinds of capacity. So that was really our opening goal going in. And I think we’re gonna have to start playing around with different artificial bodies the same way we’ve been playing around with different artificial brains in order to have a chance of closing this gap between animals and machines. And my hope is this tells maybe AI researchers something important about critical periods, right? So this is almost non-existent in the AI or ML world, but maybe there’s something really important about learning something and then having a critical or sensitive period and shutting it down and having that serve as the foundation for the next stage of learning.

Paul 00:00:52 This is Brain Inspired. I’m Paul. My guest today is Justin Wood. Justin runs the Wood Lab at Indiana University, and his lab’s tagline is building newborn minds in virtual worlds. So what does that mean? Basically what it means is Justin is comparing the very, very early development of natural organisms to artificial agents. So he does this by using controlled rearing experiments in chicks where he controls all of their experiences in, in this case, visual experiences. Because a lot of what Justin is interested in is understanding the origins of our visual intelligence and cognitive abilities from birth and, and through development. He controls their visual experiences by placing them as soon as they’re born into these rearing areas, uh, in which all of the visual stimuli that are presented to the newborn chicks are controlled by the researchers. So everything they’re seeing is controlled <laugh> by the lab.

Paul 00:01:56 And in tandem, uh, Justin’s lab builds models and AI agents that are trained on the same visual data as the newborn chicks. Um, as you might imagine, this enables Justin to probe the nature versus nurture divide or, uh, otherwise known as nativism versus empiricism. So we discussed that debate, uh, which Justin thinks of more as a, a both slash and, uh, kind of issue rather than an either or, uh, issue. And using this setup, Justin has studied, uh, a number of the visual capabilities of newborn chicks. Things like view-invariant recognition, which is more or less the ability to recognize an object regardless of what orientation, what, what perspective you’re taking on that object. And for each of these visual capabilities, Justin can basically test, uh, build an AI model and test, uh, whether given that same training data, the model is able to perform the tasks, uh, in the same way or in different ways, or as well as, or not as well as the newborn chicks.

Paul 00:03:05 And a lot of the early experiments that, um, we talk about more in the episode, he was using convolutional neural networks, which is the classical object recognition network. And we talk about what he found using those. We go on to talk about, uh, so he, he’s taken this approach and expanded it and is beginning to study other cognitive abilities. And in the future, he’ll go on to study even more cognitive abilities. But one of the lines of research that we talk about in this episode, uh, he studied collective behavior in artificial agents and in newborn chicks, um, using convolutional neural networks and deep reinforcement learning, and an intrinsic curiosity reward function. Okay? So the general theme here is what newborn chicks can and can’t do, whether and how artificial systems can and can’t do those things. And what that tells us both about the natural chicks and how to build systems that better emulate biological organisms.

Paul 00:04:00 So it’s a fun conversation. We hash all of this out and more during the discussion, uh, and Justin’s enthusiasm, uh, you’ll see really comes through. Of course, I link to the papers that we discuss in the show notes and to Justin’s lab. Uh, the show notes are at braininspired.co/podcast/173, where you can also learn how to support this show through Patreon. If you value, uh, what I do here, uh, you can send me enough to buy a cup of coffee every month, or you can send me a little bit more and get the full episodes and or even join the Discord community, uh, which is filled with like-minded listeners like you. One last announcement here, I’ll be in Berlin, Germany in late September, um, as part of a workshop that’s associated with the Bernstein or Bernstein, um, conference in Berlin, Germany. That’s actually September 26th and 27th.

Paul 00:04:50 The title of the workshop is How Can Machine Learning Be Used to Generate Insights and Theories in Neuroscience? And actually Justin would be a great person to participate in that workshop. Uh, he’s not gonna be there, but there’ll be a lot of, uh, great researchers that are there who are gonna give talks. And then at the end of this two day workshop, I’ll be moderating a panel with a handful of, uh, those researchers and discussing how can machine learning be used to generate insights and theories in neuroscience. So if you wanna learn more about that, um, I’ll link to that in these show notes. Again, that’s braininspired.co/podcast/173. You should register for that workshop. Um, it’s gonna be a lot of fun. It’s gonna be very interactive, and you should come say hi to me if you do. That’d be fun. Okay, thanks for listening and or watching. Here’s Justin.

Paul 00:05:39 Justin, you, uh, are studying bird brains these days, bird and artificial bird brains these days. Um, and we’re gonna talk all about your controlled rearing studies and what you’ve found out, um, with regard to natural and artificial, uh, intelligences. Um, but I, I kind of want to go back to the beginning and ask how you came to, so I will have introduced a little bit the ideas of, you know, what you study in the introduction, but what led you to this point of, um, raising these chicks in highly restricted virtual environments to compare them with artificial networks?

Justin 00:06:15 Yeah, absolutely. So it’s, it’s been a winding path. So I actually started in graduate school working with young human babies. So I, I always loved the origins questions. I always wanted to know what are the core drivers of cognition and perception and action, right? What are the learning algorithms in newborn brains that make it possible to learn object perception and navigation and make decisions and do all of the amazing, wonderful things that humans and animals do? And so I’ve kind of moved from population to population in an attempt to try to get at those questions. So I started with five- and six-month-old babies, uh, when I was in graduate school. And, you know, that of course got me introduced to this idea of looking at what might be present in a young human brain. Um, I started to realize that, and this will come up later when we talk about chicks, I believe.

Justin 00:07:05 Um, but I started to realize that even though we were using young human infants, that’s not the way to get at the nativism and empiricism question because of course, anything you find in a five month old or a one month old or even a one week old could in principle be learned, right? Right. So it becomes incredibly difficult to distinguish between nativism and empiricism perspectives, right? Do we, do we start out with something like a domain-general transformer, for example, when we’re trying to model the brain? Or do we really need to build in a collection of domain-specific core knowledge systems, for example? And that, that’s how I was trained in graduate school. Um, and so that’s basically the, I I see that as, you know, one of the major debates in the field. It of course goes back thousands of years to Aristotle and Plato, but we haven’t really been able to get at it directly, right?

Justin 00:07:53 And so I’ll, I’ll go into why chicks are, I believe, the only model system we can make a run at those questions with. Um, um, so, so anyway, yeah. So I, I went to babies. I then went to wild monkeys for several years. So this was partly to escape cold Boston winters, but it was an absolutely wonderful experience to go and work with wild monkeys. And I basically just compared what are the perceptual and action perception abilities of monkeys, and how do they compare with those of young babies. Um, and so I spent several years doing that, kind of trying to find commonalities between the capacities of non-human animals and humans. And given that monkeys lack culture and language and all these uniquely human trappings, um, it was an interesting way to look for dissociations between particular cognitive capacities and culture, language and so on. So I did that for several years. And then when I started up my own lab, I dove straight into newborn chicks. Um, so during grad, oh yeah, I’ll, I’ll pause for a second. Yeah.

Paul 00:08:49 Well, yeah. But, so did you just have an epiphany that you needed to study the chicks, or was this always eating at you, this, this nativism and empiricism debate, and how maybe the wild monkeys that you were studying and the human infants just, you couldn’t really get to what you wanted to study. Is that, did you have an epiphany over a coffee one day? Or had this been the idea for a long time? The chicks? Yeah, I

Justin 00:09:12 Was, yeah, great question. It actually started, I remember exactly when it started. It was in my first meeting with one of my graduate advisors, Elizabeth Spelke. And I remember sitting across from her and mentioning that these were the questions I absolutely cared about. And we just started talking about chick research. And she then inundated researchers in the lab and grad students in the lab with this is, you know, perhaps the one model system we can use. ’cause we can control everything they do from the very moment of hatching, um, control their experience, and then see what are the causal roles of experience in terms of shaping perception, uh, cognition and action. So I had wanted to do it from the very beginning of graduate school. Um, and then when I finally had the freedom with my own lab to do so, I basically just started figuring out how do we study newborn chicks?

Justin 00:09:56 Precisely. That was really kind of the big gap in the field is that there had been studies with newborn chicks for, you know, really for, for decades, but the data had a low signal to noise ratio. Um, and you know, this is kind of one thing coming up in the reverse engineering enterprise is you really need high quality data to ask, you know, does my machine that I just grew in a virtual environment, does that actually map onto one of the target systems that I care about, namely an individual chick? And if your individual chicks are across the board in terms of the measurement space, then you might have a model that performs at chance levels and it’ll still match up to one of those chicks, right? So you can’t have a huge amount of variation in your target population, um, if you really wanna try to do this direct mapping between how does an animal learn and how does a machine learn?

Justin 00:10:43 So the first five years of my career were really spent just trying to figure out how to automate the whole process. So <laugh>, you know, making it so we could record 24/7 and we could give newborn animals thousands of trials that lasted, you know, 20 minutes at a time or 40 minutes at a time, and really collect these detailed measurements of how newborn animals are learning as a function of particular experiences. And then once we did that part, we then had the data that we needed in order to start reverse engineering. Um, and so that started to happen around 2015, 2016, when we really jumped into the AI world.

Paul 00:11:16 Okay. That was gonna be my next question also, but maybe even before that. Um, so I, I’d love for you to just describe the, the process on the natural side, you know, from when you, from the chicken, I don’t know when it’s, if it’s from hatching and like, just what happens from that point, you know, to them living in these, you know, incubators, in these systems of these virtual worlds. But before that, is it, do we like chicks because, um, as they’re developing in the egg, they’re, so if you, you know, a human in the womb can be said to be learning in the womb because it hears mom’s voice, uh, singing songs and, uh, you know, yelling at dad and stuff. Yeah. But chicks, is it just that, you know, they have less, less opportunity to quote unquote learn in the, in the egg relative to mammals in the womb, et cetera?

Justin 00:12:06 That’s, that’s, I would say that’s part of it, right? We have more access to their prenatal development. So if it does turn out, and, and this is one of our working hypotheses in the lab, if it does turn out that prenatal training data really matters for building a brain so that it then reveals, so to speak, innate knowledge at the birth state or the hatching state, then we really need to be able to study that prenatal developmental process. Um, and ideally, of course, put AI through the same kind of training data, right? So this is one thing we’re doing in the lab, is we’re training artificial neural networks with retinal waves and other kinds of prenatal experiences that we think occur prenatally. Oh, and then we can ask, you know, if you have been trained on retinal waves and muscle twitches and all the things happening to an animal prenatally, do you end up with something that seems a little bit closer to how biological intelligence works in the newborn state than what we currently have with machines?

Justin 00:12:57 And, and I really think we’re gonna need to dig into that prenatal state in order to be able to tackle it. But that’s not actually the reason we switched to chicks. The main reason we switched to chicks is they’re the only animal in the world, at least to my knowledge, that you can raise in virtual reality. Um, oh, and, and you might ask, of course, you know, what, fully 24/7 in virtual reality, right? Mm-hmm. So you can, we hatch chicks in darkness. We then, uh, put on night vision goggles when they’re hatching, and then we move the animals in complete darkness from their incubator over to their controlled rearing chambers, these virtual reality chambers. And then we flip on the chambers, thereby giving us full control over everything they see, right? So every object they see. So,

Paul 00:13:39 Super, sorry, super naive question though. Does, I don’t even know with chicks. Do, do moms lay on the eggs for a while? Do the hens lay on the, like how, you know, when do you start handling it? And I’m, this doesn’t matter. Probably. Yeah,

Justin 00:13:51 Yeah. No, it, it might matter. We don’t know what matters. <laugh>, you know, that’s one big, like, that’s kind of one of the main mottos of the lab is we don’t know what matters, right? In terms of the models, in terms of the environment, in terms of the experience. So it’s all open. Um, we just need to be empirical about it. Um, but yeah, in the, in the wild, uh, a mother hen will roll the egg once in a while, so they’ll turn the egg so that it ends up developing correctly. Uh, we use an automated incubator to do that. So it basically just rotates the eggs every 45 minutes. And so it basically replicates, roughly, the kind of rearing conditions that animals are getting in the wild. Um, and then, uh, of course, once the virtual reality part starts, then it’s a very different kind of experience than what they get in the wild. Yeah.

Paul 00:14:28 Yeah. Okay. Alright. So you bring them over in darkness, uh, with the night goggles on, you place them in the virtual environment. And then, and then what happens then, then what’s the setup? Yeah.

Justin 00:14:38 And then, so we just, we basically just let them do what they will do. So we program the virtual world, so we rely on imprinting, at least in our early experiments. So imprinting, I imagine many of your, you know, much of your audience might be aware of this, but imprinting is basically just an early social preference that you develop, uh, in life. So filial imprinting is the most well known of this, just because it was studied, you know, noticed back in the 19, you know, or in the 1500s, I think, was the first observation of imprinting. People would notice that baby birds would very quickly develop a preference for, uh, for a caregiver. And of course, Konrad Lorenz really popularized this as we see those wonderful photos of, of little geese following after Konrad as he treks down a river <laugh>. Um, and so, you know, this has been a, a well-known phenomenon for a while.

Justin 00:15:22 And so, uh, it, it ends up being quite convenient for us. ’cause we can essentially just put a virtual object in the chamber. So our chambers are essentially just white walls, except for two of the walls, which right now are LCD monitors. So we can import whatever kind of virtual stimuli we want into those virtual chambers. You know, maybe you wanna, uh, raise an animal with a single object, for example. And, and maybe that single object is only seen from, you know, a single viewpoint range. Um, so we can essentially vary the experiences that the animal gets, and they’ll imprint to that, whatever that might happen to be. And then we can change around the characteristics of that object and see if they think that it’s still mom, right? So I, you know, I know that this is mostly auditory, but, you know, so I, I always like to look for props, but, you know, imagine like holding a coffee cup and you only get to see that coffee cup from one perspective, you know?

Justin 00:16:13 And, and the chick imprints to that coffee cup from one perspective. What if you then show ’em the coffee cup from a different perspective that produces a different image on their eyeball? Will those, will they still recognize that object? So this is known as view-invariant object recognition. Um, you can also do this with things like background-invariant recognition, right? You can put an object on a background, and you can see if the chicks can recognize that object across a, a different background. So we essentially went through, you know, dozens, hundreds of experiments asking how rapidly and quickly does high level vision emerge in newborn chicks. So what do they need? What kinds of experiences do they need to develop view-invariant recognition and background-invariant recognition, and the ability to bind colors and shapes into integrated representations in memory, and, you know, things like object permanence and action recognition.

Justin 00:17:00 Those are kind of the main capacities we’ve been looking at so far. Um, and, and I think, you know, what’s really surprising me at least, is how fast, how little chicks need to learn these high level abilities. Mm-hmm. So even a single snapshot of an object, right? Just a single object just being flashed over and over again from one viewpoint is, is sufficient for a newborn animal to be able to recognize that object across a wide variety of other, of other viewpoints. Um, which, which at the beginning almost seemed like magic to me, right? Like, how on earth would an animal be able to recognize a three-dimensional object having seen no other objects in their lifetime, uh, from other viewpoints? How might that capacity emerge? Um, and so that of course, is where the AI comes in, is ’cause it can actually give you an answer to that, that question. If it turns out that the AI system’s doing the same thing that, that the chick is doing, it takes us beyond nativism magic, so to speak, right? This has always been a problem, is you can say something’s innate, but the explanation was always a little bit like magic, right? Like, well, it exists, right? We have no idea how or no idea why, but that ability exists in the newborn state. So we’re really trying to figure out mechanistically why that’s the case.

Paul 00:18:11 So you said you were surprised, but what would, what, what did you think would’ve happened? Yeah. I mean, you, ’cause you had already studied the, the innate or, you know, uh, the cognitive skills of newborns for a long time. Yeah. What, um, did it really surprise you? Or was that your hypothesis going in?

Justin 00:18:26 I, it actually really surprised me and I, I think I had been influenced by machine learning experiments, right? So these were these experiments where you, you know, dump in tons and tons of data, of course, into CNNs or Transformers, and you find that typically they, they needed, or this, this was at least the wisdom back then is you needed a huge amount of data, right? A huge visual diet in order to start breaking into this problem of recognizing objects from different viewpoints. Given that that was the one learning system that existed that was actually replicating this, this capacity of view-invariant and background-invariant recognition. That was my starting point as a hypothesis, is maybe chicks are something like these, these machine learning systems, maybe in the first couple days of life, you know, you still could see in principle hundreds of objects wandering around the natural world as you see mom, you know, mother hen, and you see mother hen moving across backgrounds and you see her from a whole bunch of different viewpoints. And then you see your grazing field and the barn, and, right, within one day, you could acquire a massive amount of experience about the world. Yeah. And so I thought that maybe that massive amount of experience might, might be necessary or, or might be required. And as it turns out, it’s, it’s not, right. It’s, it just requires this little bit of data. Um, and from that little bit of data, you end up getting these high level abilities emerging.

Paul 00:19:40 See, I, I kind of had it in my mind because you had done a, a bunch of these studies in the early 2010s, right? Mm-hmm. <affirmative>, um, and, and I, I suppose a little bit before that as well, maybe, um, I had it in mind that you had the idea of how the newborns would acquire these skills, and then, and then kind of the, with the rise of convolutional neural networks and these artificial intelligent networks, then you thought, oh, well, now I have something to compare it to, but maybe I have it backwards. Did, so did you have the models in mind when you were developing the experiments? Yeah,

Justin 00:20:13 They really, they really kind of happened side by side with one another. Okay. I mean, so, you know, back when we started this work with chicks, it was 2010, which is of course before AlexNet. Um, and so, yeah, yeah, the, the computer vision systems sort of on the market weren’t quite, you know, weren’t nearly as sophisticated as they are now. But I still remember some early, uh, some early James DiCarlo papers, especially one that he wrote with, with David Cox. Um, it was a, a Trends in Cognitive Sciences paper called Untangling Object Recognition. And I remember reading that paper like 10 times, just trying to transform my brain from more of a nativist way of thinking, <laugh>, that just something is there, but no understanding of the mechanisms. And I just remember reading that paper over and over and just trying to understand what it means to think about the brain mechanistically as a series of transformations across a series of layers.

Justin 00:20:59 And that paper just completely transformed the direction or the arc of my research. Um, and then I got very interested in AI. I was of course quite excited when AlexNet came, came out, and then just kind of waiting with bated breath until an unsupervised model would come out, because of course, chicks don’t learn through supervised learning. So we needed to wait till things like SimCLR and then these other unsupervised models came out till we could really make that detailed comparison. Um, but yeah, a lot of my career has just been waiting around for AI to, to come up with an algorithm that would somehow learn, like, you know, in an unsupervised manner. And, you know, that happened about 2020 with SimCLR, and now Transformers I think are the next stage of that, right? They’re these kind of full-blown embodied models that can handle high dimensional action spaces and, and have very few inductive biases, and nevertheless can learn these amazing capacities. So, um, so yeah, it’s quite fun right now, especially with AI providing the models that we then plug into the artificial animals and see, you know, does the transformer, does the CNN actually end up learning like the actual animal itself?

Paul 00:21:55 Yeah. Oh, okay. Well, so the, the first AI model comparisons that you made, as far as I know, were with convolutional neural networks, and you tested different types out, but then you’ve gone on to look at, and we’ll talk about all this, to look at collective behavior. And, and in that you implemented, um, the same kind of convolutional neural network, but also to, as a model, but also a reinforcement learning agent with, with curiosity mm-hmm. <affirmative> intrinsic motivation. Um, and so are, are you transitioning to transformers now? We, yeah. It’s like the latest and greatest. You just plug it in and

Justin 00:22:26 <laugh>. Absolutely. Yeah. The transformers I think are, are fascinating, um, especially from, you know, well, for many reasons, but especially from a developmental psychology point of view, because they have such minimal inductive biases. And, and this has always been the debate, right? Is, is do you need strong inductive biases to make learning possible? And nativists have kind of come into the game with this, I think, strong assumption that training data is, right, like sparse, noisy, and impoverished. You’ll see this all over the place, right? Going all the way back to William James is this basic assumption that the training data, the proximal images we receive, just aren’t good enough for learning. Hmm. Um, but I think AI is giving us a completely different story of this, right? So for example, you know, transformers, if you do things like masking procedures, you can mask out 90% of the, of the content.

Justin 00:23:13 And yet that’s when these models learn the best. And so what that tells us is that, that the training data is highly redundant, it’s highly structured. And so if you have a powerful enough domain-general learning mechanism, at least in principle, it’s possible to be able to leverage that structure and redundancy in the data to be able to learn a good visual model of the world. You know, maybe, like, what objects are and how to, you know, what the world is like, what space is like, so you can navigate, and all these, of course, critical capacities we need to learn. So, you know, transformers, you know, kind of said differently, are I think a good scientific model because they’re so relaxed in terms of their inductive biases, and they let us test this very strong extreme hypothesis, which is that maybe most or all of intelligence emerges from a domain-general system, right? This is what empiricists have been arguing for a while. Um, and it lets us test that very, very strong hypothesis. Um, oh, you know, but to be fair to nativists, you know, I, I wanna say I’m agnostic about nativism versus empiricism. Oh,

Paul 00:24:11 I was about to ask, be my next question, but how, but how did you used to be more nativist, and now you’re in the where, like how did that evolve?

Justin 00:24:18 I’ve switched back and forth like five times in my career. So, and, and I’ve realized the reason why is because, uh, nativists and empiricists often look at different evidence in order to support their theories. So, um, I kind of got to be trained by nativists at one point during graduate school, yeah, by Spelke and Carey. And now I’m at Indiana University with Linda Smith and a whole bunch of wonderful people here who are kind of more on the empiricist side of things. And it just seems really clear. They’re just picking at, you know, they just pick different kinds of phenomena to support their claims. And this is fine, of course, but I actually think both, both ideas are gonna be right. And, and so let me just kind of specify that, just so nobody comes away thinking I’m an empiricist or a nativist.

Justin 00:24:59 I really think both ideas are right. Right. So you could have a single domain-general mechanism, and that might be something like the starting point at prenatal development, but you might imagine that prenatal training data is so rich with these retinal waves that are very object like, and have these important second order correlations, just like natural data. You might imagine that at the birth state, you really do have something like core knowledge, right? I mean, and, and then that would mean the nativists on some level were right to suggest that at the birth state or the hatching state, you would see something like object knowledge, like object permanence, or view-invariant recognition, the capacity to navigate and take novel paths between places. So I, I really think the idea of core knowledge phenomena might be right. Uh, if we take training data seriously during prenatal development, so right.

Justin 00:25:48 In a way, we’ve got the nativists being right, ’cause maybe core knowledge does exist. It is present, it is a phenomenon in the first maybe days or weeks of life. Um, but that might all emerge from a domain-general mechanism, which would make the, the, the empiricists right. Right? So this is, you know, this is kind of my hope of getting past this debate is we wanna build a test bed where people can just test it one way or the other. So this is what we’re doing with all of our chick work, is essentially trying to build a public test bed where researchers around the world can just plug their brains into our artificial chicks, whether they’re a nativist, whether it’s a nativist theory or an empiricist theory, and just run the model across the range of experiments that we’ve done and see if it’s a good match.

Justin 00:26:26 Right? And then we can just test a bunch of nativist theories. We can test a bunch of empiricist theories and we can see, just see which one ends up being the best match for what chicks are actually doing. Um, so I, I love this idea in computer science of making public benchmarks so everybody can plug in their models. So, you know, a a grad student has exactly the same chance of coming up with the right model as a full professor. Right? That’s how it should be, I think, in a <laugh>, in a, in a, in a good science. So,

Paul 00:26:50 Right, right. Yeah. But so, so the, um, we haven’t really talked about how your experimental setup is like fully automated. Mm-hmm. And why that’s an important thing. And I know that you worked really hard on the technical side to get all this right. And now you have how many rigs? 30,000? How many you have like what, 20 something? Like 20? Or,

Justin 00:27:11 We actually had, so we had 60 rigs back at USC. Yeah. So, so this was, we could run about eight experiments at a time, you know, each with multiple subjects in it. Um, and, and we were able to collect a massive amount of data very quickly. Um, and, and critically, it wasn’t just a lot of data, it was also very precise data. So we can make precise predictions for each individual subject rather than relying on inferential statistics. Um, and, and we can maybe come back to this. I think this is, there’s, right, right now, I think there’s a lot of interest in bringing psychology into the AI world. But, you know, psychologists have largely relied on inferential uncertainty, whereas, you know, researchers in AI care about predictivity, how well you can predict things and, and those are completely different from one another.

Justin 00:27:52 And you can show that you might have the smallest p value you could ever want to get something published, but it doesn’t produce the kind of benchmarks you need for machine learning, right? That the data might be too variable to actually serve as a good benchmark. So that was really like, again, the first step of automation is how can we get a good benchmark so that, you know, the cloud of chick, you know, the cloud of dots representing the individual, individual chicks is actually separate across conditions. So it’s really easy to see if your machine is, is more like one or more like the other.
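[Editor’s sketch, not from the episode: one way to picture the difference between a small p-value and a usable benchmark is to check how cleanly individual subjects separate across conditions, and then see which condition a model lands nearer to. The Python below is a minimal illustration under assumed names. `separation`, `closer_condition`, and every score are hypothetical, and this is not the Wood Lab’s actual analysis.]

```python
from statistics import mean


def separation(cond_a, cond_b):
    """Fraction of (a, b) subject pairs in which the condition-A subject
    scores higher than the condition-B subject (ties count half).
    1.0 means the two clouds of dots do not overlap at all."""
    wins = 0.0
    for a in cond_a:
        for b in cond_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cond_a) * len(cond_b))


def closer_condition(model_score, cond_a, cond_b):
    """Label the condition whose mean score is nearer to the model's score."""
    dist_a = abs(model_score - mean(cond_a))
    dist_b = abs(model_score - mean(cond_b))
    return "A" if dist_a <= dist_b else "B"


# Hypothetical per-chick test scores (proportion of time near the imprinted object).
slow_condition = [0.82, 0.79, 0.85, 0.80]
fast_condition = [0.52, 0.49, 0.55, 0.51]

print(separation(slow_condition, fast_condition))              # 1.0
print(closer_condition(0.78, slow_condition, fast_condition))  # A
```

With well-separated conditions, even a single model score can be read as "more like one or more like the other"; with overlapping clouds, the same comparison would be ambiguous no matter how small the group-level p-value.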

Paul 00:28:20 Okay. Should we talk a little bit about the early convolutional neural network experiments? And this kind of serves as background for the, for the rest of your experiments. And it sounds also like you’ll have experiments for the next hundred or so years lined up, because there’s so much that you can do <laugh>.

Justin 00:28:35 That’s the hope. That’s the hope. And, and the hope is, you know, we’re building these new VR chambers too. So this might come at the end, but we’re building these new VR chambers that are full-on VR. So we have a camera that basically monitors the chick as they’re moving, and they’re surrounded by all four screens, almost like the, the CAVE, um, the human CAVE VR system. Yeah. Where you move around and the walls update. Um, so we built this for chicks. Wonderful graduate student in my lab named Joshua McGraw is working on this. And it essentially allows us to raise chicks in any kind of world you can imagine, right? So a world where object permanence is true or false, or a world where objects don’t obey their boundedness or their shape as they move through time and space. So you can really give animals these kinds of strange, unnatural experiences. And really, kind of make a run at these questions. You know, is it built-in knowledge of object permanence, for example? Or is it learned based on seeing objects disappear and, and reappear from view?

Paul 00:29:26 Hmm. Yeah. Okay. Well, um, yeah, let’s start with those convolutional neural network experiments. And maybe, I don’t know if you want to end up, I’ll, I’ll, uh, I won’t. Well, before that actually, uh, you had shown that there are two properties that are really important for chicks to learn things like, um, object, uh, invariance. And that is, um, that the objects that you’re showing them need to vary smoothly and slowly, which is quite interesting. And, and I think we’ll keep coming back to this nativist versus empiricist, um, um, theme because you know all these, because all, you know, I have a bunch of questions like, well, what does that mean, uh, early on? And, uh, what, you know, so anyway, but if, if the objects are moving fast, um, and if they’re like, kind of jutting around so that you don’t get a, a slow, a smooth transition, then the chicks don’t learn the objects, um, as you’re showing them.

Justin 00:30:21 That’s exactly right. Yeah. So, you know, you know, 10 minutes ago I mentioned all of the amazing capacities of chicks, right? So from a very little bit of training data, they can do a whole bunch, right? They can recognize objects across novel views and backgrounds and so on. But as it turns out, the way that they do that is through heavy constraints on learning. And these two heavy constraints appear to be slowness and smoothness. So if an object is not moving, you know, slowly over time, then essentially what the chick builds is this kind of, it’s, it’s like an abnormal object representation in which action information is part of the, part of the shape representation. So they really care about how that object moves. They don’t particularly care about what that object is. And, and what I really loved about these experiments is you can basically causally manipulate the first representation an animal makes.

Justin 00:31:10 So if you make the object move slowly, they build these beautiful view-invariant representations. If you speed up the object a little bit, all of a sudden you start pushing it into an action representation. And so, you know, this, this, to my knowledge, was kind of the first way where we had like strict causal control: manipulate this experience, and it will directly impact the nature of the visual perception that emerges in the animal. So this is really exciting because it, you know, these are the kinds of constraints I think we need to really understand what’s under the hood, right? So we should expect whatever the learning algorithm is to show these same constraints. Um, a machine should be subject to the slowness and smoothness constraints the same way that a newborn chick should if we’ve really figured it out, right? And if we haven’t figured it out, which we haven’t yet, then we know that we have the wrong models, right? We know that there’s some other model out there that leverages time to a greater extent than, uh, than, than than current ML systems do. And, and perhaps those should be the next systems we test. Yeah. So

Paul 00:32:06 You want those constraints in a system that you’re building in order to learn about how we do it or how brains do it. But if you’re building a, so thinking about those constraints, um, it is potentially limiting, and maybe that’s the price we pay for whatever, quote unquote general intelligence that we have. It’s like we have these very narrow, uh, constraints, right? Like we only see a very narrow band of UV or of, um, ultra, um, electromagnetic radiation and, uh, of light, in other words. So how does that make you think about our own intelligence? I know this is kind of a big question before we, uh, get into some of the details, but do you think that we’re, does it make you think we’re narrow, more narrowly intelligent, or is that that’s something necessary for our wonderful general intelligence, you know, this constraint of smoothness and slowness and we can’t keep track of things that are moving really fast? That sort of thing?

Justin 00:33:00 I I, it’s a, it’s a, it’s a, it’s a great question. I I, I mean the, the, the natural world is slow and smooth, so right. For every animal not born in our lab, they’re good to go, right? They’ll have all the right kind of training data they need, they’ll get, you know, you know, if you look around the room, most things aren’t moving at all, right? So that’s what I mean by slowness. And when something does move, it tends to attract your attention. You fixate on it, and then it’s slow in terms of the proximal images hitting your eyeball. ’cause you’re actually following that object as it moves. So, you know, I think the natural world essentially provides all, you know, meets those constraints we need for high level learning. Um, so, you know, I I I don’t really think about the brain as being highly constrained because of those unnatural constraints, so to speak.

Justin 00:33:48 Mm-hmm. Um, why we like them, those constraints, is because they give us ways of distinguishing between candidate models. Um, right. So the fact that it’s learned, the fact that experience seems to play a causal role in developing view-invariant shape representations, in other words, suggests that that’s not hardcoded into the brain, right? Chicks probably don’t come into the world with some sort of notion about what three-dimensional objects look like. Rather our speculation is, you know, chicks come into the world as something like a domain-general learner, and the world is filled with objects. And as long as those objects move in the right way, which is slow and smooth, perhaps it needs to be that way because of, you know, temporal windows and spike-timing-dependent plasticity. We don’t know at this point. Um, but that’s our kind of working hypothesis. Um, as long as you have those basic requirements, um, then you can see learning happening, uh, pretty, pretty rapidly. Um,

Paul 00:34:39 I, I don’t know anything about chick saccades, rapid eye movements, like, uh, ballistic eye movements, because the slowness and smoothness is not, uh, doesn’t jibe well with the way that humans and other, you know, primates move their eyes. And like, every time you move your eyes, you have a kind of a completely different scene, which is not smooth, but the object itself might be moving smoothly. But your, your perception from your eyes, from your vision, um, is not, and, and so we have to like compensate for that. I don’t know if you’ve, I just, uh, thought Oh yeah, I’m, I have a background in saccades, so this <laugh>, I have to think about this.

Justin 00:35:14 Oh yeah, no, it’s a great, it’s a great point. And, um, uh, actually, uh, James DiCarlo has done some wonderful work on this. So he, he did work with, uh, I think this is 2008, uh, the first author was Li. Um, but basically he, they put monkeys in, in an unnatural world where, when the monkeys saccaded, you would change something from like a tiger to a teacup. And what they found mm-hmm. Is that in IT, you start seeing those tiger and teacup representations coming closer and closer together. Oh. Um, as long as they’re part of the same kind of temporal, they have that temporal continuity. Oh, okay. So I think this, you know, still works even with animals that make extensive eye movements. Uh, chickens, they have a little bit of eye movements, but mostly they just move their head instead of their eyes, uh, in order to refresh their retinal image.

Justin 00:35:52 Um, but I think the same general principles, right? That’s ultimately what we’re chasing is these general principles of learning. I think those seem to be true from what we can tell across, uh, across, uh, chicks and across monkeys. Um, there’s also, um, some really nice work by Linda Smith where she looked at young children. You put a, you know, a camera on a child’s head, and you ask, you know, how does that, what kind of data does that child acquire? And one of my, one of my favorite, uh, one of my favorite, uh, findings that she has is that babies or toddlers make object data really slow and smooth. So they’ll pick up an object and they’ll hold it completely still, and they’ll still their head too, essentially making it so the retinal images going into their brain are very slow and smooth. And when, when they rotate it, they’ll rotate it again very consistently and very slowly, especially focusing on planar views. So even if we just look at the natural behavior of young children as they’re learning, they’re actually spontaneously making data slow and smooth, suggesting that maybe this really is a deep principle about visual learning and maybe learning more generally is that you need slowness and smoothness, you know, maybe allowing the temporal machinery of our brain to link up these, these time slices in the world into coherent representations.
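[Editor’s sketch, not from the episode: "slow and smooth" can be made concrete as a low frame-to-frame change in the input stream. The Python below measures the mean squared change between consecutive frames for a toy object that rotates gradually versus one that jumps around. The function name `temporal_change` and the two-number "frames" are illustrative assumptions; this is a generic slowness measure, not the learning rule used by chicks or by any model discussed here.]

```python
import math


def temporal_change(frames):
    """Mean squared change between consecutive frames.
    Lower values mean the stream is slower and smoother."""
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(frames) - 1)


# Toy stand-in for views of one object: each frame is (cos, sin) of its rotation angle.
angles = list(range(0, 180, 5))  # 36 views, 5 degrees apart
smooth = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles]

# Same 36 views, reordered so consecutive frames are roughly 90 degrees apart.
half = len(smooth) // 2
jumpy = []
for i in range(half):
    jumpy.append(smooth[i])
    jumpy.append(smooth[half + i])

print(temporal_change(smooth) < temporal_change(jumpy))  # True
```

Both streams contain exactly the same views of the object; only their temporal order differs, which is the kind of manipulation the slowness and smoothness experiments rely on.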

Paul 00:37:00 Hmm. Okay. So you, um, so you raise chicks in this virtual world and you’re showing them one object. You’re imprinting them with one object. Uh, and, and then you test whether they’re imprinted by showing that same object on one side and a different object on the other side. And how much time they spend among these different objects is an indicator of whether the imprinting worked and whether they have this type of whatever you’re testing, uh, uh, object, uh, view invariance, et cetera. Um, and then you train a, a convolutional neural network, but you take different types of convolutional neural networks that have either been super massively trained on ImageNet, on thousands and thousands of images to learn how to rec, uh, categorize those objects and a few different kinds of, uh, convolutional neural networks. Could you just kind of talk us through that and then, uh, tell, tell us what you found. And these are, these are kind of the early AI experiments, right? Yeah,

Justin 00:37:54 Absolutely. And we actually have a paper that I think the paper I sent you is an arXiv 2021 paper. So we have a new paper we’re working on that’s basically fleshing all of this out, looking at different architecture sizes. Um, but basically, kind of long story short is that the goal of it was to ask can we build an image-computable model of a newborn visual system? Um, and if we can, we can then test, is it the case that machine learning systems really are more data hungry than newborn brains, right? You, you see this claim everywhere. So it used to be everywhere with CNNs is that, well, CNNs might be good models of the brain, but we know they’re trained on 14 million images from ImageNet. Yeah. That’s radically different than the visual diet of a young child. Therefore, these aren’t the same learning systems.

Justin 00:38:35 But we realized that nobody had actually tested it directly, whether, you know, a machine learning system given the same input as a newborn animal would actually develop the same kinds of capacities. So that was really our opening goal going in. So what we did is we essentially, you, you can’t put a camera on a chick’s head. They’re too heavy. We actually tried this and it didn’t work. Um, but what you can do is you can simulate the visual experiences of a chick in a video game engine. So you can essentially just yoke the camera in a video game engine to the head of the chick. Um, and then you can just move the agent, uh, you can, you can just basically move the camera with the chick’s movement and then collect all the same visual images that the chick gets. And so, you know, we’ve done kind of versions of this with kind of more simple base agents.

Justin 00:39:17 And, and what we find is that that’s enough. Like the, the kinds of massive data augmentation that you get when you get to look around the world and move around and select your own viewpoints, that ends up being kind of the, the key missing piece necessary to build view-invariant recognition. So kind of said differently, you know, even if you have a single object, I can move farther away from that object. I can get really close to that object. I can look at it from the sides, right? I can, I can, I can, I can augment, I can do natural data augmentation the same way that engineers do, you know, artificial data augmentation, like color jitters and cropping, but you, you know, an animal can actually augment their, their data however they want. Um, and so if you give CNNs that same kind of augmented data from the environments of the chicks, they end up solving these view-invariant problems.
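[Editor’s sketch, not from the episode: the idea that moving around a single object acts as natural data augmentation can be illustrated with simple geometry. Given a head position and heading, the Python below computes how far away the object is, where it sits relative to the line of sight, and how large it appears. The function `view_of_object`, the coordinates, and the object width are all hypothetical; the lab’s actual pipeline renders images in a game engine, which this does not attempt to reproduce.]

```python
import math


def view_of_object(head_xy, heading_deg, obj_xy, obj_width):
    """Describe one view of an object from a given head pose (2D, top-down)."""
    dx = obj_xy[0] - head_xy[0]
    dy = obj_xy[1] - head_xy[1]
    distance = math.hypot(dx, dy)
    # Direction of the object relative to where the head is pointing,
    # wrapped into the range [-180, 180) degrees.
    bearing = math.degrees(math.atan2(dy, dx)) - heading_deg
    bearing = (bearing + 180.0) % 360.0 - 180.0
    # Visual angle subtended by the object's width at this distance.
    angular_size = math.degrees(2.0 * math.atan2(obj_width / 2.0, distance))
    return {
        "distance": distance,
        "bearing_deg": bearing,
        "angular_size_deg": angular_size,
    }


# One object, three hypothetical head poses: far, near, and off to the side.
obj = (1.0, 0.0)
poses = [((-1.0, 0.0), 0.0), ((0.5, 0.0), 0.0), ((1.0, -1.0), 45.0)]
for head, heading in poses:
    print(view_of_object(head, heading, obj, obj_width=0.2))
```

Each pose yields a different apparent size and position for the same object, so a self-moving viewer collects many distinct training views without the experimenter changing the stimulus.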

Justin 00:40:04 Um, so, so that was really the goal is, is, is to try to match the training data as closely as possible between the animals and the machines. Because if you don’t match the training data, you have no idea what the problem is if there’s a gap, right? So it might be like, let’s say a a a, you know, a machine and a and a human are doing something different from one another. Well, one possibility is that maybe it’s the learning algorithm that’s different. But another possibility is maybe it’s the training data, right? You have no way of knowing if it’s the learning algorithm or the training data that’s leading to differences. So if you clamp down on training data, you can now make an inference that if two things are different, it’s likely, you know, either the body of the system or the learning algorithm of the system.

Justin 00:40:43 And of course, those should probably be treated as the same thing, right? A body’s in a, a brain’s in a body. Um, and that’s a single learning system. So, so that was really our goal there. And then we mo we, we recently did, uh, a grad student in my lab, Lalit Pandey, recently did this with transformers where we thought, you know, maybe CNNs with their kind of beautiful hierarchical structure would be able to learn from sparse data, but no way with transformers, right? They just don’t have the inductive biases to be able to do this. And as it turns out, you know, they perform a little worse than chicks, or, than CNNs, maybe five to 6% worse than CNNs, but they still solve the task. So, so you can start with the most minimal inductive bias we have in ML, in machine learning.

Justin 00:41:23 And you can just give it limited training data of a single object from a, you know, from a limited viewpoint range. And transformers will learn view-invariant representations. So I think this is really important ’cause even transformers don’t appear to be more data hungry than new than newborn visual systems, right? This, this argument that you see in every popular media story and almost every scientific paper about the data hungriness of these systems. I mean, you know, we’ve just run one study so far, but it looks like, from what we can tell, when you really equate their training data, it’s not actually clear that that one’s more data hungry than the other.

Paul 00:41:55 Well, the, one of the interesting things about the CNN studies, and you can tell me if this is true of the transformers as well. Uh, I had mentioned that you tested it against also a CNN that was fully trained on ImageNet mm-hmm. <affirmative>. And it turned out that that CNN performed better than chicks and better, higher, was more accurate for object recognition, uh, of view invariance than the chicks, and also more accurate than the CNNs that were trained just on the virtual world chick data. But, uh, the issue is that the, the, uh, CNNs that were trained just on the chick data, actually matched the chick’s behavior better. It was a better indicator of their actual cognitive ability.

Justin 00:42:38 Exactly. Exactly. Exactly. And so I think kind of what you’re suggesting is maybe the, you know, we, we see this really as a closed loop system, right? So you’re never done just when you show that the CNN matches the chick, the next experiment, of course, is then to, you know, put chicks in a more natural environment, something a little bit, well, you can’t do anything like ImageNet, but, you know, a much more natural environment. Maybe you have dozens of objects virtually around the animal, maybe they’re 3D. So they change perspective as the animal moves. And then you can ask, how does that more fully trained chick compare to a CNN that’s been trained on that data, right? So you can continuously cycle, or you can continuously cycle around asking, you know, given that the animal required this amount of training data, the same should be true of the machine.

Justin 00:43:20 Um, and then you can, for example, generate new predictions with the machine and ask does the animal actually match those predictions that you just generated from your model, from your machine? And you continuously go around the cycle until you close the gap between the animals and the machines. So, you know, I, I really don’t think this is sort of the end point. I really see it as the starting point of, of now we have a closed loop system where animals and, and, and machines are connected with one another. There’s no human in the loop deciding what counts and what doesn’t, what phenomenon we should care about and what we shouldn’t. It’s just a completely automated system. Um, and so now we can just basically go around this closed loop system over the next, you know, hopefully several years, um, and gradually close the gap in vision and navigation and other areas that we look at. Hmm.

Paul 00:44:01 So, so do you have versions of transformers that were fully trained that also outperformed the transformers that you trained in a more realistic way? Or?

Justin 00:44:11 So we, we’ve, we’ve mostly stepped away from ImageNet-trained systems in our newer papers, just ’cause it’s not mm-hmm. <affirmative>, I mean, it’s, it’s, it might, it’s, it’s interesting for an AI audience. Yeah, right? I mean, it’s yeah, you’ve, yeah, right. I mean, it’s like, you know, I mean, ImageNet is such a weird visual diet anyway, right? So like snapshots from the <laugh>, I, you know, many of your, many of your, of your prior guests have made this claim. So, but right. It’s, it’s such a weird visual diet to get mm-hmm. <affirmative>. ’cause you don’t actually get to do what a child does, which is, you know, pick up that object and look at it closely and move it around or interact with that thing. And so, um, so we typically don’t use ImageNet-trained systems, we just train ’em from scratch. So we’ll just take an untrained system, give it the same training data as a chick that’s been simulated, you know, our new project is, actually, first we’ll train it on retinal waves, and then we’ll train it on, uh, on the visual diet of the chicks. So to ask if retinal waves give you an extra bump in performance, so to speak. Um, but yeah, we’re really just focusing on matching the training data at this point.

Paul 00:45:05 So if, if you can get the same performance from a CNN and from a transformer, is it just that, uh, cognition is just so multiply realizable that we’re gonna be able to, uh, simulate visual development in 12 different ways? What does it tell us?

Justin 00:45:22 Yeah. Yeah. So I think we’re discover, I think the field is discovering the answer to that question right now. I, I, I certainly don’t have an answer. I can, I can speculate, which is that of course, like anyone, but I, I think that there’s something about this direct fit idea. This, this, so this is something that came up from, you know, Uri Hasson at Princeton that I think you interviewed a couple years ago, this idea that, you know, maybe what brains are doing, and we know that ML systems are doing this, but maybe what brains are doing is they’re just fitting to the space-time features of embodied data streams, right? So maybe they’re just very, very flexible and they fit to those data distributions. Um, this might happen in an, you know, purely unsupervised way, like predict what’s gonna happen next, like NLP systems or, you know, masked autoencoders.

Justin 00:46:09 Um, but maybe, maybe just that fitting process is what we’re, what we see during development. And if that’s the case, if it’s just kind of a large scale fitting process, then that, you know, shows us that it’s not just that it’s kind of multi, I mean, it suggests that as long as a system is fitting, whether it’s brain or whether it’s silicon, whether, but as long as it’s fitting with a lot of parameters, then we should perhaps see the same kinds of capacities emerging, right? So another way of saying this is like, we wanna push the animals and the machines to the same part of this parameter space and mm-hmm. <affirmative>, maybe that part of the parameter space that, that, that, that, that you, you push towards, is that is fitting to the data, fitting to the underlying data distribution. In which case, the prediction I think we would see is that as, that’s a general principle of learning, and as long as you make a system fit to the underlying data distribution through something as simple as prediction or masking or, or auto, you know, masking, uh, you know, we should, we should see these capacities emerging.

Justin 00:47:09 So that, that would be my speculation, is that there’s something really deeply right about fitting to underlying data distributions. And given that we have 86 billion neurons connected to another thousand to 10,000 neurons, that’s a lot of dials you can turn and tweak in order to fit to a high dimensional data distribution. So that’s kind of my working hypothesis about what’s likely underlying vision. Um, but of course, you know, it’s, it’s just a guess, but it’s empirically testable, right? That’s what transformers do. Yeah. Especially these ones that fit as we can see if that’s actually enough. Right?

Paul 00:47:41 H how many neurons does a chick have?

Justin 00:47:44 How many neurons does a chick have? Several million. So it’s a smaller system than humans. Nicely. I think humans are quite complex Yeah. And take a while to develop. Um, but they have all the same kinds of cortical circuits and large scale brain architectures as mammals. Um, yeah. So this, there’s this old kind of myth that bird brains are radically different than mammal brains. This, this comes from early observations in the 1950s, you know, where researchers with, you know, very low precision techniques, you know, went into the brains of mammals and birds, and noticed Wow. Nuclear versus layered organization. They look totally different. <laugh>, therefore, birds and mammals must be thinking in really different ways. But the last 70, 80 years has shown us that that’s wrong. Right? I mean, the, the, the cortical circuits are almost identical in terms of their input output, uh, uh, connections.

Justin 00:48:30 They’re produced by homologous genes. Um, our common ancestor was only 300 million years ago. And you can see that canonical cortical circuit in both birds and mammals today. Um, and you even see the same like large scale hub organization with, you know, prefrontal and hippocampal and visual areas. And Hmm. So, you know, I think, I think that, you know, especially as we’re comparing animals to machines, I think the distance between birds and mammals is actually quite small. Um, and given that you can’t raise mammals particularly well in VR 24 7, if you can do it at all, I think birds might be the one chance we’ve got as scientists, at least to make a run at these questions of, of, of, of nativism versus empiricism.

Paul 00:49:09 You mean birds and mammals is really small relative to birds and AI models?

Justin 00:49:14 Exactly. Is that what exactly? Yeah. Okay.

Paul 00:49:16 Yeah.

Justin 00:49:17 Yeah. That’s what I’m, so

Paul 00:49:18 You had mentioned, you know, yeah, okay, go ahead. So, sorry, go ahead.

Justin 00:49:23 Oh, no, I just was clarifying. That’s exactly right. Yeah. Is that, you know, we often make these comparisons between primates and CNNs, or primates and transformers, and yet a lot of researchers are a little bit reluctant to go to birds because primates have often been the model system for object recognition. And so, you know, this is, this is often the field kind of getting into an attractor basin that might not necessarily be entirely healthy or might be pushing us to become too specific in terms of what we’re studying. My suggestion would be we need to relax that attractor basin a little bit so that we can study other animals that might give us other advantages that primates don’t give us. So you can’t raise a monkey in VR from birth, um, and yet you can do this with a chick and, you know, really push around and see what role experience plays in, in, in developing knowledge.

Paul 00:50:06 Well, so I was gonna ask about, and we don’t have to perseverate on this, but, uh, I mean, you had mentioned that mammals develop their, their visual system and visual intelligence develops more slowly, takes more time to mature. And I, I don’t know if, do birds have a critical visual period like mammals do early on and, or there’s this whole pruning effect. There’s, you know, the brain, the, the brain is developing, making thousands, you know, millions of connections per day or whatever in the first few weeks, and then there’s this pruning back, and that’s part of the, uh, process as well. And I, I don’t know if that has, do you think about that or is that an, an issue with chicks? Or do they not have such a pruning and do they come out a little bit more hardwired?

Justin 00:50:51 They come out a little bit more hardwired. Um, I, you know, I i it could still be coming from a domain-general mechanism, that hardwiring, right? Because you could imagine domain general through prenatal training data. Yeah. So, so I just wanna clarify, you know, there’s a difference between kind of learning something and then having it being there at the birth state versus genes actually hardwiring the thing into the system, per se. And, and those are kind of two different Yep. Versions of nativism. So I, I just thought it’d be worth distinguishing between those. Um, yeah. So chicks are basically ready to go right away. So about 48 hours after birth is when they reach their peak, uh, visual acuity. Um, it’s, it’s, it’s much lower than humans. Um, so human visual acuity is about 35 cycles per degree. Um, and, and, and chicks is about 1.5, which is better than rats and better than mice.

Justin 00:51:38 Um, but certainly not what you get at with, uh, what you get to with humans. So, um, so, you know, there’s a lot of, you know, there’s a lot of differences or there are some differences. It just, you know, we don’t know how much these matter, right? This is, this is the problem with reverse engineering, or the fun of reverse engineering, depending on your perspective, is it’s, it’s, it’s kind of more of an art than a science, right? So you start simple <laugh>, and you, you take a guess of what matters, right? Maybe CNNs matter, maybe transformers, maybe RL, and you push that system as far as you can get, and then at some point it’s gonna break, and you’re gonna notice differences between the animals and the machines. And then you should just start plugging other stuff in, you know, but critically you start simple, but then you plug stuff in when you need it.

Justin 00:52:17 Um, and then you can ask, what is that extra thing that I just plugged in? What does it gain me in terms of function? And, you know, I think the ventral visual stream is a wonderful example of how this research program has, has, has, has emerged, right? I mean, you know, you know, Dan Yamins and DiCarlo started with just feedforward systems, even though they knew perfectly well that there’s huge recurrent connections, but there was a lot of value to knowing that a feedforward system could actually solve a large, you know, many of these hard problems of vision. Um, I, I remember back, uh, you know, 10 years ago, 15 years ago, when I would teach vision classes, I had this slide that said, we have no idea how vision works, right? It was like, it was a little bit mysterious. We didn’t know, like, generally what kind of system you needed to solve it.

Justin 00:52:59 And now, of course, you know, I have a whole lecture on how vision works and how we can replicate it with feedforward and also recurrent neural networks, and what you gain by adding in each one of these extra little pieces. And I think that’s what’s really beautiful about reverse engineering: it’s not that you just throw the most complicated system you can at it at the beginning. You really just start small. See what you can get with, you know, for example, reinforcement learning and CNNs, in the case of collective behavior, see how far you can push that, and then, uh, start adding in more complex models as you need them.

Paul 00:53:28 Well, speaking of starting simple, what do you say to people who, you know, so the original experiments that you did, um, comparing neural nets and brains involved, you know, just a blank screen, a single object for, I don’t know, days and days, maybe, I’m not sure how long the training lasted, but that’s a far cry from a naturalistic, ecologically valid, uh, environment. And right now, there’s this huge push that we need to get beyond these sort of task-defined and minimalistic, um, toy models, et cetera, toy experimental setups, because it doesn’t necessarily apply to the naturalistic, ecologically valid world. Um, and I know that you’re moving more and more toward that with the virtual reality systems and what we’re about to talk about with the reinforcement learning, et cetera. Uh, but what’s your response to that kind of, uh, feedback?

Justin 00:54:19 Yeah, I would say, I would say, I would say that’s absolutely right. And, and, and, and, but it’s important to distinguish between two types of naturalism, right? So on one hand, we can talk about the naturalism of the images, right? So in this case, something like ImageNet, right? You’re showing natural images. These are images that were taken from the natural world. So on one hand, you could talk about the naturalism of the stimuli per se, right? But on the other hand, you could talk about the naturalism of the data that can be acquired by the model, right? And so in, in, in the vast majority of computational neuroscience, we’re using these disembodied models, right? These are models that just sit there and, and, and the researcher picks the training data and kind of spoon feeds this data to these models. But the models have no decisions about, you know, given their current knowledge state, I’m gonna pick this information versus that information to learn from, which of course, animals are able to do.

Justin 00:55:07 And so I think we need to distinguish between naturalism on the stimuli side of things and naturalism on the embodied side of things. So even though we have really simple environments, right, just a rectangular room with a single virtual object, the animals can move around that entire space, um, and produce, you know, thousands and thousands of unique images of just that single virtual object. And so I think that kind of naturalism in terms of embodiment is actually quite important. And it might, you know, we don’t know, but it might be more important than the naturalism of the actual stimuli that you’re seeing. So this, of course, is an empirical question, like they all are, and this is what we hope to address when we go to the more advanced four-screen VR chambers, where we can, you know, really raise animals in a farm or in a forest and give them all the same kinds of visual stimuli you’d actually get in the real world, and see how much that matters. Versus, you know, you could put a chick in a world and maybe the world doesn’t update as it should, right?

Justin 00:56:03 They move, and the world updates a second later, or they move, and the viewpoints that you get are radically different than what you should actually get if you’re moving in that way, right? So these are ways where we can actually see whether the naturalism of how you move through the world is also causally related to this capacity of learning. Um, so, so again, I, I think these, there’s, there’s really two, two types of naturalism that come into play, um, especially when you start to embrace embodiment, uh, uh, in, in the way that we are.

Paul 00:56:31 Okay. So, um, so we’ve talked a little bit about the convolutional neural network experiments, and it seems like you’re just gradually building up and up and making it more complex and asking bigger questions. So pixels to actions, and collective behavior and imprinting, uh, with the reinforcement learning agents. That was a mouthful that I just said, but, um, so these are experiments that you’ve done, um, with, uh, I guess embodied, um, AI agents, trying to, um, figure out what are the criteria, uh, that lead to collective chick behavior, um, and mm-hmm <affirmative>, mixed in with that is the imprinting story as well. So can you describe a little bit about, uh, what led you to do those experiments and what you found there?

Justin 00:57:22 Yes, absolutely. So, um, a lot of that work was an attempt to take the beautiful closed-loop systems that had been worked out with the ventral visual system, right? I just fell in love with that idea that you could build an image-computable model that could be directly compared with a primate or with a human. So we tried to take that idea of building a closed-loop system between biological and artificial systems, but extend it to embodiment. And collective behavior was one of the first areas we really wanted to target. I remember in graduate school just falling in love with collective behavior. You know, it’s quite amazing to see, whether it’s schooling fish or flocking birds, how they can fly in these amazingly intricate patterns and never crash into one another, and yet end up building these just kind of beautiful dynamic displays.

Justin 00:58:08 And, you know, I certainly wasn’t the first one to be amazed by this, right? So, you know, you can find quotes from biologists a hundred years ago where their leading theories were like, maybe animals have some sort of telepathy that we don’t have, and that’s what allows them to engage in this collective behavior, right? That was our leading theory: telepathy. And then, you know, around the 1980s, um, uh, work by, uh, Craig Reynolds, he actually created these systems called Boids, um, which he used for computer animation. He was an animator, and he didn’t wanna animate every single bird in a flock. Oh. So he essentially, uh, created these hard-coded systems with just three simple interaction rules, right? Like, stay close to your neighbor, don’t crash into them, and align your body with them.

Justin 00:58:52 What he found, um, and this is also work by Iain Couzin, who’s a biologist, um, basically they found that these boids were actually really good models. You know, they could actually reproduce the kinds of behaviors that, uh, you end up seeing animal groups doing. Um, and so this was really neat, right? ’Cause now we actually had a model that didn’t have telepathy in it, right? It actually had understandable <laugh> hard-coded rules, and that would actually produce the same kind of behavior as animals. Um, but it does leave open this question, right, of where do those rules come from? So maybe evolution hard-codes in rules, however it might do so, into an animal, because of course being in a group is safer than not being in a group. There’d be a lot of survival benefits to being in a group.
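
To make the three interaction rules concrete, here is a minimal Python sketch of a boids-style update (cohesion, alignment, separation). It is only an illustration under stated assumptions: the `Boid` class, the `step` function, the neighborhood radius, and all weights are hypothetical choices, not values from Reynolds's system or from the Wood lab's work.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, radius=5.0, min_dist=1.0,
         w_cohesion=0.01, w_alignment=0.05, w_separation=0.05, dt=1.0):
    """Advance every boid one time step using three hand-coded rules.

    All parameter values are illustrative assumptions.
    """
    updated = []
    for b in boids:
        neighbors = [o for o in boids
                     if o is not b
                     and math.hypot(o.x - b.x, o.y - b.y) < radius]
        ax = ay = 0.0
        if neighbors:
            n = len(neighbors)
            # Rule 1 (cohesion): stay close to your neighbors.
            cx = sum(o.x for o in neighbors) / n
            cy = sum(o.y for o in neighbors) / n
            ax += w_cohesion * (cx - b.x)
            ay += w_cohesion * (cy - b.y)
            # Rule 2 (alignment): match the neighbors' average velocity.
            mean_vx = sum(o.vx for o in neighbors) / n
            mean_vy = sum(o.vy for o in neighbors) / n
            ax += w_alignment * (mean_vx - b.vx)
            ay += w_alignment * (mean_vy - b.vy)
            # Rule 3 (separation): don't crash into anyone too close.
            for o in neighbors:
                d = math.hypot(o.x - b.x, o.y - b.y)
                if 0.0 < d < min_dist:
                    ax += w_separation * (b.x - o.x) / d
                    ay += w_separation * (b.y - o.y) / d
        vx = b.vx + ax * dt
        vy = b.vy + ay * dt
        updated.append(Boid(b.x + vx * dt, b.y + vy * dt, vx, vy))
    return updated
```

Calling `step` repeatedly on a list of boids produces flock-like motion from rules that are entirely hard-coded, which is exactly the property the conversation goes on to question.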

Justin 00:59:33 And so one might imagine a more nativist story, which is that collective behavior emerges from these hard-coded interaction rules. Um, we thought it might be something different, which is maybe these hard-coded interaction rules come from something more basic, right? Something more core. Um, and so we just tried CNNs with reinforcement learning, kind of on a whim, to see if, uh, that would be enough. If that’s all you need, so to speak, to use computer vision speak. Are those the only ingredients you need to make collective behavior tick? And what we found is that it is. So, um, for people who haven’t read this paper, essentially you can just take a group of artificial animals and you can give them, uh, reinforcement learning.

Justin 01:00:18 And, uh, a CNN as an encoder. It doesn’t matter how big the encoder is. It can be two layers or three layers or 15 layers. You basically get the same performance, right? That’s right. But you give them a sense of curiosity. So it’s basically just, given two different observations, predict what action is gonna happen next, predict what the next action is, and then you use the error as the reward in order to motivate the policy network. And as it turns out, that’s all you need. You end up seeing groups of animals spontaneously grouping together and imprinting to one another, so to speak, just from those simple ingredients of reinforcement learning and curiosity and, critically, being raised together. So this is one thing I really love about that paper: you can do controlled-rearing experiments on machines the same way you can do them on animals.
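
The idea of using prediction error as the reward can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the model from the paper: the tabular `ForwardModel`, its learning rate, and the scalar observations are all assumptions made for illustration, whereas the agents discussed here use a learned CNN encoder.

```python
class ForwardModel:
    """Hypothetical tabular forward model.

    It keeps a running estimate of the next observation for each
    (observation, action) pair. A real agent would use a neural network
    operating on learned features instead.
    """

    def __init__(self, lr=0.5):
        self.lr = lr
        self.pred = {}

    def predict(self, obs, action):
        return self.pred.get((obs, action), 0.0)

    def update(self, obs, action, next_obs):
        p = self.predict(obs, action)
        self.pred[(obs, action)] = p + self.lr * (next_obs - p)


def intrinsic_reward(model, obs, action, next_obs):
    """Return a curiosity reward: larger prediction error, larger reward."""
    error = next_obs - model.predict(obs, action)
    reward = 0.5 * error ** 2
    # Learning from the transition makes it less surprising next time.
    model.update(obs, action, next_obs)
    return reward
```

With this setup, a transition that keeps happening the same way pays out less and less, so a policy trained on this reward is pushed toward whatever remains unpredictable, such as other agents moving around.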

Justin 01:01:01 And then you can causally explore the role of experience in the development of machine intelligence. And so in one experiment, we raise the machines together. So, you know, just like newborn animals, you know, they’re, they, they typically are, there’s almost like a litter of animals. And so the first thing an animal sees when they open their eyes are their brothers and sisters sitting around them, right? So, and their brothers and sisters are moving around in unpredictable ways. So you would expect a curious agent to start paying attention to those unpredictable things happening in their environment. You don’t need hard-coded knowledge of agents or social groups or anything like that. You just need a curious learning agent to be born with other agents. And, and, and they’ll learn from one another. If you take exactly those same agents and you raise them separately, no collective behavior whatsoever.

Justin 01:01:46 So it’s not just that RL and, and curiosity automatically leads to collective behavior. It’s really an emergent property of those learning algorithms interacting with these social experiences that rapidly produces collective behavior. And, and, and this might be one of your next questions, like, what does this tell us? And I think what this tells us is it gives us an image computable model of collective behavior, right? It it tells us the sufficient conditions, the sufficient algorithms and experiences you need to explain why animals group, and there might be lots of models that do it. It, so this is, this is, you know, but this is at least, you know, one set of models that actually ends up just producing this without any hard-coded knowledge about, again, agents or social groups or all these other things that researchers have proposed might be innate to explain social grouping.

Paul 01:02:34 Can we just talk about the curiosity for a moment, algorithmically? So I think that you mentioned this, and I know you tested, like, a few different versions of a curiosity algorithm, I think. But, um, basically from what I understand, uh, at least from one line of curiosity-driven reinforcement learning, and I think you just said this, is that you make a prediction, and the bigger the error in your prediction, the bigger the reward, and then you update the policy based on how big that error was, and then, um, you take actions to explore that space more and make more predictions. Is that right? Exactly. And is that, oh, okay. Um, okay. It’s an odd, uh, I mean, I think that if you only did that, that would lead you astray in real life. Uh, but there must be some sort of reward function. I mean, the idea is that it’s like an intrinsic reward without an extrinsic reward.

Justin 01:03:31 That’s correct. That’s correct. And I should mention, this algorithm came from Pathak back in 2017. So I just, you know, full credit to him and the Berkeley group that came up with that. Um, we were just excited to have an unsupervised system that could actually learn in an environment.

Paul 01:03:45 Yeah, you’re just waiting, you’re just waiting for all these tools to hit your lab, right, <laugh>. Exactly.

Justin 01:03:49 Exactly. That was one of the ones we were waiting on. So yeah, that’s absolutely right. Um, one critical thing that might help explain or make sense of how pursuing novelty could lead to something like familiarity, right? There seems to be a tension there. The thing is, you need a critical period. Yeah. So you have to subject the machine to a critical period, so that you ramp down learning until eventually learning is at zero. It’s the same way that in an actual animal we see critical periods all over the place, especially in imprinting: initially chicks have their imprinting period open for about three days, and then it shuts down. And so if you don’t put the machine through the critical period, it’ll just continue to explore novelty. It’s really about shutting down that learning with a critical period that allows us to match, uh, animals and machines.
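
One simple way to picture "ramping down learning until it is at zero" is a learning-rate schedule that stays open for a while and then closes. The Python sketch below is an assumption-laden illustration: the function name `critical_period_lr`, the linear ramp, and the step counts are hypothetical and are not taken from the lab's code.

```python
def critical_period_lr(step, base_lr=1e-3, close_start=1000, close_end=2000):
    """Learning rate with a critical period (illustrative values only).

    - Before `close_start`: learning is fully open.
    - Between `close_start` and `close_end`: learning ramps down linearly.
    - From `close_end` onward: learning is shut off (rate is zero).
    """
    if step < close_start:
        return base_lr
    if step >= close_end:
        return 0.0
    remaining = (close_end - step) / (close_end - close_start)
    return base_lr * remaining
```

Once the rate reaches zero, both the policy and the curiosity model stop changing, so whatever the agent learned to approach while the period was open is what it keeps approaching afterward.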

Justin 01:04:36 And my hope is this tells maybe AI researchers something important about critical periods, right? So this is almost non-existent in the AI or ML world. Um, it’s basically you just take a huge amount of data and you train these large-scale systems. But maybe there’s something really important about learning something and then having a critical or sensitive period and shutting it down and having that serve as the foundation for the next stage of learning. We know that critical and sensitive periods are widespread in animal brains. And so, you know, what our lab is doing is to essentially start using critical periods in machine learning studies and see what this might buy us in terms of being able to match the behavior of animals and machines. And at least in collective behavior, uh, it seems to be essential.

Paul 01:05:18 So, thinking about humans, uh, at least these days, a lot of people spend a good deal of effort trying to practice beginner’s mindset, and for that critical period of curiosity and novelty seeking to never end. And we do it in various, you know, artificial ways, you know, in meta, you know, for a healthy mind. But, uh, I suppose this is how, you know, our own critical window is how we become, uh, prejudiced and <laugh>, how we stop learning new things. And I mean, should we fight against that? Or should we embrace it as humans? You know, <laugh>,

Justin 01:05:50 Right? <laugh>, I think that’s really great. Yeah. You know, there’s this phenomenon called the baby duck syndrome in human-computer interaction, which is that people tend to like the first computer systems that they ever used, and then they have a hard time adjusting to other computer systems, right? So in a way that’s like we’re imprinting to our computer system, ’cause we’re learning habits and policies that allow us to use that computer system really well. Then when you switch to another computer system, now you need to kinda relearn a bunch of stuff. So I think we see this kind of widespread across human behavior if you pay attention, right? Children like watching the same movies over and over and over again. And we really like familiarity. And, you know, there’s even a, I can’t remember the reference, but I remember this paper came out, uh, it was in Science, asking what are the top-cited papers?

Justin 01:06:37 And the top-cited papers are often papers that are 90% familiar and 10% novel. So, you know, we really don’t like pure novelty. We want a lot of familiarity to kind of ground our foundations of knowledge. And just a little bit of novelty added on top ends up producing kind of the most impactful scientific paper. So, you know, I think you can often think that humans are just curiosity machines, right? We’re constantly going out and seeking new experiences, new ways of acquiring data, but that’s not actually how it works, right? I mean, we do that sometimes when all our other needs are met. Um, but I think if you look at the 24/7 behavior of an animal or of a child or of a human, you see something really different.

Paul 01:07:15 Yeah. Yeah. I walked in on my mom this morning, I saw, I’m visiting family and she’s just on her iPad playing solitaire like she does every morning. It’s like, ah, come on, mom, do something new. Explore some, some new space <laugh>. Um, I wanna ask you, maybe I’ll come back to imprinting later. ’cause I wanna ask you about like what the current science of, of that is. And, and I know this is another aside, but, um, so thinking about these reinforcement learning agents and, and just raising the chicks in these, uh, environments and maybe especially, uh, the first studies that we talked about where, you know, there’s this funny looking object that you’re testing them on in this controlled rearing experiments and, and thinking about development and critical periods. Do, do we have any idea what, what this does to their visual cognition as adults? You know, uh, you know, if you’re training them in these environments and how you might guess, you know, with, with artificial agents, if you could study that as well. Yeah.

Justin 01:08:08 We, we, we, we don’t know yet. It’s a, it’s a really good question. Um, right now we don’t, th this might be another waiting period because we don’t have the same kind of continual learning systems that, that we know that biological systems can do. So right now, we’ve just been focusing on the very first, or just the very first few objects that newborn chicks create. But our hope is that as continual learning kinda ramps up in the machine learning world, right? Then you could imagine raising chicks into adulthood, and we actually have these chambers ready to go where you could actually, you know, give an animal one set of experiences for the first couple weeks of life, and then a totally different set of experiences the second few weeks of life and ask, how does that, how does that animal then end up navigating those different learning landscapes in order to adjust?

Justin 01:08:53 And then of course, you could put the machine through exactly the same ropes, right? You could ask, does that machine end up relearning in the same way that the actual animal ends up relearning? And I think this will be helpful because there might be a lot of ways to build a continual learning system, but we’ll have no way of knowing which is right or which is wrong as it compares to animals, unless we actually put ’em in a closed-loop system with one another. Yeah. Right? We have to raise ’em in the same worlds. We have to test ’em with the same tasks. If we’re not doing both of those things, we’ll never be able to say whether they’re the same or different in terms of how they developed. So, um, you know, again, coming back to the general theme, I think we need a test bed that is publicly accessible, where we have rigorous data of animals and machines being raised in the same worlds and tested with the same tasks. Once we have that, we can come up with whatever tasks we want or whatever we care about, whatever the next stage of AI evolution is, and then we just test the animals in it, and then we just plug those same sort of algorithms into the machine and, you know, see what works and see what doesn’t, what closes the gap, and then we move on to the next problem.

Paul 01:09:50 So this is really, um, uh, exposing my naivete, but, uh, there must be other people who are using AI models to study development that are not doing it in a controlled-rearing-from-birth setting. Um, I don’t wanna ask you, like, what you think of that, you know, if that’s all for naught, but I mean, <laugh> maybe what does that world think of, you know, the controlled rearing aspects, or, you know, is there value still in doing something that is not as perfect as your research is <laugh>

Justin 01:10:22 Oh, oh, oh, ab absolutely, absolutely. So, you know, the, the first thing I would say is that I could really care less about chickens <laugh>, right? So the, the reason I use them is because they’re a model system that allows us to discover general principles of how experience and what core learning algorithms, whatever those are, interact with one another. But nobody, I don’t think anybody in the field will be convinced if we just discover something with chickens. Oh, yeah. And it doesn’t actually generalize to any other species. So, you know, ideally, you know, chicks are a way we can actually make a deep, deep run at these questions. You know, I, I, I think I mentioned the study where you can, you know, raise an animal where object permanence is false. Yeah. This is something that we recently did in case you wanna chat about it.

Justin 01:10:59 Yeah. Um, but, you know, that’s something you could never do with a baby, you know, ethically or practically. Um, and so the hope would be that some of these general principles we discover from chicks could then be applied to, uh, humans, for example. And, you know, Brenden Lake is doing some really wonderful stuff, um, in this regard. So he takes, for example, SAYCam data that, uh, Mike Frank put together, um, along with a bunch of his collaborators, um, and essentially, uh, trains transformers and CNNs through the eyes of this baby cam data. And, you know, that’s great because it’s the same approach. Um, you can’t vary the visual diet of a baby the same way that you can with a chick. But if what we discover with chicks is right, we should find that that same approach should work with babies too, right?

Justin 01:11:46 It’s just gonna be a much more complex experience. You won’t be able to figure out how particular experiences really impact the development of vision. But it’s a critical sanity check to make sure that whatever we’re discovering with chicks actually does apply to animals more generally. So, yeah, I think we need to use lots of model systems. Um, I think that just sticking with chicks would be a mistake. So we actually even do stuff with fish. Um, so for collective behavior, we have a fish collective behavior paper where, uh, you can basically look at the emergence of collective behavior in fish, and it takes about 21 days for them to start to group together. Wow. Um, so we built a bunch of virtual fish and put them in virtual fish tanks. And what we found is that the development of collective behavior is identical across those virtual fish and the real fish. So they start not caring about one another, and then they gradually start grouping together more and more during development. And you can see this kind of parallel developmental trajectory across the biological and artificial systems. Um, and so, you know, again, I think it’s really important to use different model systems and really try to discover general principles of learning rather than just specific things that might happen in an individual animal.

Paul 01:12:53 I know that we’re all over the place here, but I’m just curious: so there’s a difference between collective behavior and swarm behavior. Is swarm behavior a type of collective behavior, or, yeah. Subtype. Okay.

Justin 01:13:05 That, that’s right. Okay. Exactly. So, collective behavior just generally refers to animals interacting with one another mm-hmm. <affirmative>, and they can form a lot of different kinds of dynamic systems, so to speak. Um, swarming is a particular kind of, uh, uh, of, of, of, of collective behavior that emerges when certain characteristics are met. So

Paul 01:13:21 You’re not studying chick <laugh>, which would be something else. Um, yeah. But I also, like, you’d mentioned the, um, object invariance, uh, breaking those rules, um, raising them in conditions where object invariance isn’t a thing. Tell me about that. And then we’ll come back to the reinforcement learning again. I know it’s terrible that we’re jumping around. Sorry.

Justin 01:13:42 Oh, no, that’s okay. No, this is one of my favorite experiments, so I’m excited to tell you about it. So this has no AI or ML component yet. Okay. We really hope it will. But the study is basically: imagine being raised in a world in which object permanence is never true. Right? So here’s how the experiment works. You again, uh, take a chick, you hatch it in darkness, you move it over to basically a video game world, uh, where the chick controls the actions happening in the video game world. And then, uh, for the first several days of life, the chick sees, like, two screens. So there’s an object, two screens move up, and then the object goes behind one of those screens, and then the screens move down. Uh-huh <affirmative>. Now, in a natural world, the object’s always where it was when it disappeared, or when it moved out of view.

Justin 01:14:27 Whereas in the unnatural world, the object’s always behind the other screen, right? As if the object teleported from behind one screen to the other screen. So we raised chicks either in the natural world for several days, or in the unnatural world for several days. You know, I’ll just point out, of course, that only the chicks in the natural world had experience that object permanence was true. Right? The chicks in the other world had the opposite experience. If it’s pure learning, they should have learned something radically different than the chicks raised in the natural world. As it turns out, their behavior is almost identical, with both the natural-raised chicks and the unnatural-raised chicks showing object permanence. Hmm.

Justin 01:15:10 So I love this study because it shows why we need VR, right? So one could have guessed, and many empiricists think this, that object permanence is learned, right? You have ample experience seeing objects, you know, balls rolling under couches and then reemerging, or dogs going away and then coming back, right? We see this all the time. And so maybe object permanence is learned. The only way we could ever figure that out is if we actually put animals in a world where object permanence wasn’t true, where you didn’t have the experiences you needed to learn that. Um, and then you can ask, does the animal still show that capacity? Um, and for the case of object permanence, we still do see that. So our speculation is that prenatal training data is so rich that maybe it builds a visual system that has object permanence, in a sense, baked into it mm-hmm.

Justin 01:15:55 <affirmative>, so that even when you get contrary experience, that initial foundation is so strong that you just can’t unlearn it. And given that a retinal wave is a lot like an object that obeys object permanence as it moves across the eyeball, basically from the moment the brain is growing, or from the moment the retina develops, at least, um, you’re getting experience that object permanence is true during prenatal development. And so, you know, again, I love this way of being able to look at what’s learned and what’s not learned through VR, because it lets us make a clear distinction. We know in that case that it’s gonna be some sort of nativist story, um, in terms of where object permanence comes from. Mm-hmm.

Paul 01:16:33 Okay. Okay. I’ll finally bring us back to the, uh, deep reinforcement learning, uh, collective behavior stuff. And, uh, one of the interesting things is, so with the convolutional neural networks, the experiments we opened up with, uh, it seems to be more that the inductive biases are inbuilt, essentially. And you talked about this domain-general learning knowledge, and it takes very little for them to, um, learn the object permanence, object, uh, invariance. But, uh, you make the point that in terms of the collective behavior, you actually really need that curiosity-driven learning algorithm. So it’s really more of a learning story in terms of collective behavior, uh, than it is in terms of visual cognition, uh, basic visual cognition, I guess.

Justin 01:17:22 Hmm. Yeah. Yeah. That’s right. So if I don’t answer your question directly, you know, feel free to ask it again. So I think part of what you’re suggesting is maybe there’s a little bit of a tension between the CNN results and, like, the collective behavior results.

Paul 01:17:36 Exactly. Well, it was kind of an open-ended question, because that’s just, uh, from my, uh, perspective, it seems like there might be a tension, but there could be an easy story to be made that, um, I mean, they’re both at play, because we’re nativists and empiricists and they’re both right, I suppose <laugh>.

Justin 01:17:54 Yeah. Yeah. At least that’s my view. So, yeah. No, I’m glad you asked. So, right. So with the CNN results, what we find is that if you use a supervised linear classifier, you can read out from a frozen CNN view-invariant object features, and you can discover those features in the embedding space using even just a single viewpoint. So even though it’s supervised learning, which the chicks don’t get, you can still use a very small amount of data in order to find the right set of features in the embedding space that allows for view-invariant recognition. Now, that works. So that suggests that the CNN is at least developing the features it needs, or learning the features that it needs, in order to be able to succeed in the task. But that’s quite different from an embodied agent discovering those features on their own.
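
The "supervised linear classifier on a frozen CNN" readout can be illustrated with a small linear probe in Python. Everything here is a hedged stand-in: `frozen_features` is a hypothetical fixed function playing the role of the frozen encoder, the inputs are toy 2-D points rather than images, and the logistic-regression training loop is one common way to fit a linear readout, not necessarily the one used in the lab's experiments.

```python
import math


def frozen_features(x):
    """Hypothetical stand-in for a frozen encoder: fixed, never trained."""
    return [x[0] + x[1], x[0] - x[1]]


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit only a linear readout (weights + bias) with logistic regression."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(w, b, f):
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


# Toy data: two "objects", a handful of labelled examples each.
train_x = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 0.8)]
train_y = [0, 0, 1, 1]
w, b = train_linear_probe([frozen_features(x) for x in train_x], train_y)
```

The key design point is that the encoder is never updated: if the probe succeeds, the features needed for the task were already present in the frozen embedding, which is a weaker claim than an agent finding and using those features by itself.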

Justin 01:18:44 Right? Right. So in one of the papers I sent you, we call it a newborn embodied Turing test. So this is an experiment that’s really similar to collective behavior, but essentially we take exactly the same agent, except it’s just one little artificial chicken in a chamber instead of a whole group of them. This matches directly to, uh, those experiments with the chicks. And what we find is that even though our CNN experiments show that view-invariant recognition can be learned from this sparse data, we find that the embodied agents don’t discover those features mm-hmm <affirmative>. So in fact, the machines end up looking radically different from the actual chicks. The machines develop view-dependent representations rather than view-invariant representations. Right. They’re really kind of, you know, it’s almost like they’re relying too much on statistical learning, uh, or they’re fitting to the data a little bit too much.

Justin 01:19:33 Um, we also found, this is work by, uh, Manju Garimella in our lab, she finds that machines are very background dependent. So this was an experiment where we raised, you know, one object on one background. So that was the entirety of their visual experience for the first week of life. So if you’re relying purely on statistical learning, you should yoke the features of the object with the features of the background, which is what you don’t want, ’cause then you can’t recognize that object across novel views. Well, chicks are wonderful at this. They’re not at all thrown off when you change the background, but machines are terrible at this. So you change the background on a machine, and now they go more by the background, yeah, than they do by the object itself. So this suggests that even though we can get these systems to imprint, uh, in the same kinds of chambers where we get real chicks to imprint, what we’re not finding is the same kind of recognition capacities developing.

Justin 01:20:22 So this leads us to think that there’s something missing. We have no idea what that is. We have some guesses of course, but there must be something missing in the machines that the animals have that allows the animals to develop these view-invariant, background-invariant representations, um, that really support high-level vision. So that’s what we currently see as kind of one of the big things in the lab: how do we get an embodied agent to discover that particular set of features, the high-level features that are needed in order to solve the task. You, you said, and, and you know, that could be, oh, sorry.

Paul 01:20:55 I was just gonna say, you had, you said you had some guesses, I think you were about to say some of your guesses, perhaps.

Justin 01:21:00 I, yeah. So, you know, one of our guesses is, um, the, uh, the retina is highly dynamic. So, right, in the machine learning world, we often feed in RGB images, right? But if you look at what visual data looks like after it goes past the eyeball, right, after it goes through LGN, it just looks completely different. You only essentially maintain moving features. Um, and so there’s these wonderful demos called the dynamic visual system, uh, demos that people can pull up. I’d highly encourage you to do this, where you can convert a real video into a DVS video, and you can see that basically, you know, you get rid of 99% of the data. Mm-hmm. But the only thing left is what changed, right? Okay. Right. Because you’re only keeping those changing signals, which is what neurons do.

Justin 01:21:49 And so that might solve the problem, right? So, um, it might be the case that if you’re only paying attention to moving things, and we have some data suggesting that this is what chicks actually do, if you’re only paying attention to moving things, you’ve made your computational problem a lot easier. Rather than having to reason over the whole 224 by 224 image space, now you just get to worry about those moving features. And those moving features will then be automatically segmented from the background. Because if you still your head, the background is not moving, but the object might still be. And so this is one way that a biological system just based on having a different eyeball, right? Having a different body, not having a different brain, but having a different body, um, could lead to radically different kinds of changes.
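
The idea Justin describes here, keeping only what changed between frames, can be sketched in a few lines. This is a minimal illustration, not the lab's pipeline and not a model of any real sensor: the function names (`make_frame`, `change_events`), the brightness threshold, the 224 by 224 frame, and the toy scene of a bright square sliding across a static gray background are all assumptions made for the example. Real event cameras also report changes asynchronously per pixel, whereas this sketch simply compares two whole frames.

```python
import math

def make_frame(object_col, size=224, object_size=20):
    """Toy scene (assumed): static gray background with one bright square object."""
    frame = [[0.5] * size for _ in range(size)]
    for row in range(100, 100 + object_size):
        for col in range(object_col, object_col + object_size):
            frame[row][col] = 1.0
    return frame

def change_events(prev_frame, next_frame, threshold=0.15, eps=1e-3):
    """Return +1/-1 where log-brightness rose/fell by more than threshold, else 0."""
    events = []
    for prev_row, next_row in zip(prev_frame, next_frame):
        row_events = []
        for p, n in zip(prev_row, next_row):
            delta = math.log(n + eps) - math.log(p + eps)
            if delta > threshold:
                row_events.append(1)
            elif delta < -threshold:
                row_events.append(-1)
            else:
                row_events.append(0)
        events.append(row_events)
    return events

prev_frame = make_frame(object_col=50)
next_frame = make_frame(object_col=60)  # object moved 10 pixels; background unchanged
events = change_events(prev_frame, next_frame)

# Count how much of the image survives the change filter.
active = sum(1 for row in events for e in row if e != 0)
total = len(events) * len(events[0])
print(f"{active} of {total} pixels changed ({100 * active / total:.2f}%)")
```

In this toy scene fewer than 1% of the pixels produce an event, and all of them sit on the leading and trailing edges of the moving object. The static background produces none, which is the sense in which moving features come "automatically segmented" from the background.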

Justin 01:22:31 So, you know, kind of more generally, I hope what happens is that rather than just focusing on the learning algorithms per se, the field embraces morphology as well, because it might very well turn out that the body of an animal, right, like the nature of their eyeball, the nature of their ears, the way they move through space, is gonna influence their training data. It’s gonna influence what they perceive. And I think we’re gonna have to start playing around with different artificial bodies the same way we’ve been playing around with different artificial brains in order to have a chance of closing this gap between animals and machines. Um, so, you know, that’s my speculation. I think it’s really about the eyeball per se, and the nature of the input you actually have going into the visual system, rather than just the RGB frames that we typically feed in.

Paul 01:23:17 Now that’s sort of a bottom-up attention story, right? That if you’re, you know, attending to, uh, relations or actions or differences in movements, um, is there a top-down story in chicks <laugh> also?

Justin 01:23:32 I, you know, it depends on what you mean by top-down. So I tend to, yeah. My world was rocked by Henry Yin a few years ago, and he came on your podcast and he started talking about negative feedback control systems, which really kind of questioned what it even means to be bottom-up versus top-down. So, uh, you know, ultimately, I think what you need to do to test that question is just kind of build in whatever you mean by top-down, instantiated into the brain of the artificial animal itself, and see if that ends up making a difference, right? So you might imagine that if an animal’s trying to control some internal homeostatic variables in a negative feedback control system, then, you know, that might influence the kinds of features that it’s gonna end up paying attention to depending on its current needs, right?

Justin 01:24:17 So if you’re hungry, you might pay attention to this set of features, and if you’re thirsty, you might pay attention to this set of features. And if you’re scared and being chased by a predator, you’ll of course look for a third set of features. So, you know, I really think that embracing the control of a system rather than the kind of input-output approach that we’ve been taking in ML and AI might turn out to be quite important, especially as we scale from these disembodied systems to the embodied systems that actually act in the world and, you know, have goals of their own in some ways.
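
The point about internal needs steering what an animal attends to can be illustrated with a toy negative feedback loop. This is a hedged sketch of the general idea, not anything from the Wood lab or from Henry Yin's work: the function name `feature_priorities`, the three needs, the set points, and the rule that priority is proportional to the shortfall below set point are all assumptions chosen for the example.

```python
def feature_priorities(state, set_points):
    """Weight each need by how far its internal variable sits below its set point.

    Assumed rule: priority is the shortfall (set point minus current level,
    floored at zero), normalized so the priorities sum to 1.
    """
    errors = {need: max(0.0, set_points[need] - level) for need, level in state.items()}
    total_error = sum(errors.values())
    if total_error == 0:
        # Every variable is at or above its set point: nothing needs correcting.
        return {need: 0.0 for need in errors}
    return {need: err / total_error for need, err in errors.items()}

# Hypothetical internal variables on a 0-to-1 scale.
set_points = {"food": 1.0, "water": 1.0, "safety": 1.0}
hungry_state = {"food": 0.2, "water": 0.9, "safety": 1.0}

priorities = feature_priorities(hungry_state, set_points)
most_urgent = max(priorities, key=priorities.get)
print(most_urgent, priorities)
```

With these made-up numbers, the food-related feature set gets most of the weight and safety gets none. Change the state so that `safety` drops and the same rule shifts attention to the third set of features, with no separate "top-down" module doing the switching.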

Paul 01:24:47 Yeah. Yeah. Those internal set points. Um, you mentioned, you know, Henry Yin, like what are those internal set points? Is it all homeostasis? Like where does it all derive from? It is really hard and fascinating to think about. For me, there’s this paper, um, by Michael Anderson, whom I’ve had on the podcast. Um, and it’s all about, and Vicente Raja is the co-author, what they call, uh, behavior as an enabling constraint. And, you know, as you were thinking right after I said, well, that’s kind of a bottom-up attention mechanism, the other thing with morphology and the way that our eyes work and the way that we interact with the world is you could kind of think of that as a top-down, uh, constraint as well. Did I say constraint? Yeah. Mechanism, constraint, whatever. Because it is constraining the signals that get in and how they get in. So that’s almost a top-down thing. And you were saying, well, it depends on what you consider bottom-up and top-down. And yeah, it all gets mixed up, I suppose. But do you think of, you know, the environment and our behaviors, do you think of that as, you know, shaping our neural activity in a sort of top-down constraining kind of way also?

Justin 01:25:59 Oh, oh, absolutely. Absolutely. I think that the way of building scientific models by allowing those models to do tasks in their own right, and then putting constraints on those models to see if we can push them to the same part of the parameter space as biological systems, I think that’s absolutely the right idea, right? So this of course comes from, you know, Yamins and DiCarlo and, you know, Kriegeskorte and other wonderful people. Um, but, you know, I think we’ll need to do the same thing with embodiment, right? So your body places constraints on what you can see and what you can do and what kind of training data you can acquire. Uh, your physiological needs are gonna place additional constraints on the kinds of things that you look for in the world.

Justin 01:26:38 Even the nature of, you know, as we were talking about, the nature of your eyeball is gonna place these radical constraints on the nature of the data actually going into your brain. Um, and so I think all of these con, you know, we don’t know which constraints matter and which don’t, right? That’s why we’re doing the science in the first place, is to figure out what matters and what doesn’t. Right? We have, you know, thousands and thousands of details of phenomena and lists of facts in neuroscience, and we don’t know what matters and what doesn’t. And so I really love this reverse engineering approach, ’cause it gives us a way of figuring out, at that engineering level of abstraction, what do we need to keep and what can we throw away in order to be able to build functioning intelligence? And so I think that, you know, we should just add constraints and see if they matter.

Justin 01:27:18 Some might matter a huge amount, some might not matter at all, but we can do this. We could take an empirical approach on this and just, you know, plug them into the artificial animals and see which of those constraints really matter and which don’t. Mm-hmm. But yeah, I think that’s how we get a domain-general system being pushed to the same part of the parameter space as an animal: through heavy, heavy constraints on their morphology, uh, on the learning rules, on the architecture, on all the various, uh, things that have been pointed out by some of your prior guests.

Paul 01:27:46 How far can you push your system? With your pixels to actions, like, what’s your vision for how far you can take this and where the limit is? Do you see a limit?

Justin 01:27:56 Wow, that’s a great question. I <laugh> I hope this just, doesn’t come off here, but I don’t see a limit. So the reason I tried to do this 10 to 15 years ago was because I wanted to fully embrace an entire whole animal, right? So, yeah, like, a chick does everything, you know: it recognizes objects. It navigates the world. It develops social connections. It solves all of the problems that animals solve. And so, you know, I don’t wanna say anything negative about the kind of piecemeal attempt that we’ve been attempting in the field. Well, we’ll try to reverse

Paul 01:28:34 You need them, need them all.

Justin 01:28:35 Right, right. You need ’em all. Yeah. Right. Yeah. I mean, the beautiful work on, you know, the visual system and the auditory system and the olfactory system, that’s absolutely wonderful work. But at some point, we’re gonna have to put those together into a single unified system and then see if that single unified system develops a visual system and an auditory system, and all the various kinds of systems we actually see in a real developing organism. So I think we have to take development seriously because the only evidence we have of an intelligent system is through development. Um, typically we’ll take an adult state, and that will be our target, and then we’ll try to build an ANN through massive training. We won’t care if it’s evolution or development. We’ll just say, oh, it doesn’t really matter, but we’ll just take that ANN model and compare it to an adult animal.

Justin 01:29:17 Right? Right. And, and, and I think that’s fine depending on what your goals are, but if you wanna build a unified model of intelligence, you have to take development seriously. You have to take embodiment seriously, and you have to take agency seriously, right? You have to let the, like, just like real animals who are embodied and can decide what they’re gonna do next, you need to give machines that same kind of capacity. Because ultimately, I think the name of the game is to ask how much of the phenomenon can you explain through your model? And if we get it right, like I’m, you know, I, I imagine it’ll be more complicated than this, but let’s just say that it’s a transformer, right? So if we stick a transformer into a developing chicken embryo, artificial chicken embryo, we give it prenatal training data and all that, and then we give that artificial chicken the same kinds of experience as a real chicken, and it starts to develop object recognition.

Justin 01:30:05 And it navigates, path integration and snapshot-based navigation and reorientation and all the core capacities that nativists have pointed out over the years. If we find that one model doing all of those things, we’ve now then got a unified model, a unified closed-loop system model that can be used to then do additional tests. And so that to me is kind of at least what our lab is chasing, right? Like, the same way that Darwin gave us DNA as a way to connect tigers and worms and insects to one another through a common medium, we need something like that for psychology, right? We need some sort of medium, maybe this will be like a transformer or something else, that allows us to explain why brains develop object recognition and smell and navigation and decision making and face selection and all the various things that we develop, right? But I think that we need to embrace or chase a unified model. And this is the only way I can see us doing it: taking development seriously and controlling data from the very beginning of life.

Paul 01:31:06 Surely the transformer is not it. Surely the transformer is just, you know, another five-year thing that may be replaced. But you keep mentioning transformer ’cause that’s like the latest and greatest, right? And

Justin 01:31:18 So it’s the latest and greatest and, you know, yeah, definitely push back on me if you don’t agree. But, you know, in many ways I consider it to be one of the best models we have in cognitive science. I know it’s not treated that way, of course, but it’s the one model we have that can do a bunch of different things. Hmm. So there’s this beautiful, you know, Gato model from DeepMind, for example, that can, you know, play video games and it can caption images and it can write essays for you, and it can, you know, control a robot arm, right? This is one of the first examples we’ve seen of a single unified system that can actually do a wide range of different things. And given that that’s the only system that we have, and given that that’s one of the hallmarks of human and animal intelligence, I really think that we should treat transformers seriously.

Justin 01:32:03 We don’t understand how they work yet, but that’s okay. We didn’t understand how steam engines worked until we built them and then reverse engineered them. So I think we wanna just first find a system that can do all of these different things, and then we can make sure it actually matches the biological system, which is what we’re trying to do. And then of course, all the, you know, brilliant mathematicians of the world can try to understand how these things are working, um, in a way that gives us a feeling of understanding, whatever understanding means. You know, I think it’s gonna mean something different to everybody.

Paul 01:32:31 I mean, it seems like your lab’s going gangbusters and, like, I think I mentioned this earlier, you probably have 15 different experiments, ideas, things to test lined up. What’s, uh, gimme just a quick flavor of what to expect over the next year or something from your lab. What’s in the very beginning stages now? And then we’ll wrap up, I promise.

Justin 01:32:50 Oh no, this is, yeah, this is really fun. Um, so yeah, next stage will be, hopefully we’ll have every chick experiment we’ve ever done publicly available for people to try. So this will be hundreds of experiments. So instead of just the one or two experiments we talked about here, it’ll be hundreds of experiments with detailed data from chicks. People can basically plug in their model and test it across the whole test bed. And then that way we have this integrative benchmarking approach, um, focusing on this, I think, fascinating question of the origins of knowledge. Hmm. So we hope to have that up and running in the next year. Um, and then the other thing will be, we’re gonna start exploring other capacities. So I’ve always loved navigation, um, especially because a lot of core navigation abilities emerge quite early on, right?

Justin 01:33:31 So you see newborn animals doing things like path integration and snapshot-based memory and reorientation. This is, you know, part of kind of the core knowledge, uh, tradition. And so I’d love to see what does it take, what kind of learning algorithms do you need in order to reproduce core navigation in a machine? Um, and we can play the same game of raise the machine in the same world as the animal, make sure that they get the same training data. And then we can start asking, you know, again, does that same core learning algorithm that gives us object perception, does that very same algorithm also give us navigation? And then in the future, does it also give us social cognition? Right? So can we slowly expand out the range of tasks until we have a single unified model explaining all the various things we care about <laugh>.

Paul 01:34:14 That is the dream. Okay, folks, I have nativist Justin Wood. No, I’m just kidding. I’m just kidding <laugh>. I have non-nativist, nor empiricist, but, uh, accepting of both, Justin Wood. Hey, Justin, I’m glad that we finally got to connect. We had to keep kicking it down the road, but I appreciate you coming on and, uh, of course, best of luck even though you don’t need it. And thanks for being on.

Justin 01:34:33 Oh, thanks so much for having me. It’s been a pleasure.