Brain Inspired
BI 171 Mike Frank: Early Language and Cognition

Support the show to get full episodes and join the Discord community.

Check out my free video series about what’s missing in AI and Neuroscience

My guest is Michael C. Frank, better known as Mike Frank, who runs the Language and Cognition lab at Stanford. Mike’s main interests center on how children learn language – in particular he focuses a lot on early word learning, and what that tells us about our other cognitive functions, like concept formation and social cognition.

We discuss that, plus his love for developing open data sets that anyone can use;

the dance he dances between bottom-up, data-driven approaches in this big-data era, traditional experimental approaches, and top-down, theory-driven approaches;

how early language learning in children differs from LLM learning;

and Mike’s Rational Speech Act model of language use, which considers the intentions, or pragmatics, of speakers and listeners in dialogue.

Transcript

Mike    00:00:03    I’m fascinated by language in general, and I, I always have been, and I’ve focused in on early word learning because it’s, you might call it the smallest big thing. … The listener’s thinking, what was the speaker’s goal, given that they said this utterance? And then the speaker’s thinking, how would I get the listener to figure out my goal? Which utterance should I choose? Mm-hmm. <affirmative>. And of course, that’s a recursive formulation. Listeners thinking about speakers, speakers thinking about listeners. Yeah. … This kind of question is really controversial, in part because there’s a lot of ambiguity when somebody asks the question, what is the function of language? So philosophers have a lot to say about distinguishing different kinds of functions, but minimally, we can say …

Paul    00:00:51    This is Brain Inspired. I’m Paul. Welcome everyone. My guest today is Michael C. Frank, better known as Mike Frank, uh, who runs the Language and Cognition lab at Stanford. Mike’s main interests center on how children learn language. And in particular, he focuses a lot on early word learning. Um, and what that tells us about our other cognitive functions, like concept formation and social cognition. So, uh, we discuss that. Uh, we discuss his love for developing open data sets that anyone can use. Um, Mike dances, uh, the dance between bottom-up, uh, data-driven approaches in this big data era and traditional experimental approaches and top-down, theory-driven approaches. So we talk about those different approaches and their advantages and disadvantages. Uh, we talk about how early language learning in children differs, uh, from large language models and, uh, large language models in general. We also discuss one of Mike’s normative language models, called the Rational Speech Act model, or RSA model, of language use, uh, which considers the intentions or pragmatics of speakers and listeners in dialogue.

Paul    00:02:05    And of course, we, uh, talk about how his model compares to large language models and, uh, a lot of other topics as well. You can support this podcast on Patreon if you value it. Uh, that, or signing up for my Neuro-AI course, uh, online, are the very best ways to support what I do. So you can go to braininspired.co to learn more about those. I’d like to give a special thanks to Yolanda and Kendra, whom I got to meet at the recent Gordon Research Seminar and Conference for eye movements. Thank you both for being patrons for over two years now. So, awesome, Yolanda. Uh, nice to meet you finally. And thank you for the invitation. And Kendra, uh, I enjoyed being squeezed into your car for a few minutes with those other clowns, uh, and great to see lots of old friends and meet new ones as well at the conference. And now here’s Mike. Oh, wait, I forgot to mention the show notes for this episode are at braininspired.co/podcast/171, where you can learn more about, uh, Mike and the excellent science he does. Okay, here’s Mike. So bluegrass music, uh, do you play bluegrass?

Mike    00:03:11    I do, yeah. I play the mandolin.  

Paul    00:03:14    So I’ve said this and thought this about jazz, and I think, and have said, the same about bluegrass: that it’s more, uh, for the player than the audience. Is that, uh, too scathing, or would you disagree?

Mike    00:03:26    No, I think that’s totally true. Oh, I actually didn’t…

Paul    00:03:29    Thank you.

Mike    00:03:29    I didn’t like bluegrass before I started playing it, but I loved the ability to play with other people and to have a language to engage with them in, ah, so in bluegrass there are all these rules that mean that you can roll up to a jam and somebody can call a tune you’ve never heard before. And by the end of three minutes, you know, you all are playing together, you’re singing harmony on the chorus. It’s a very, very structured way of interacting, really, very much like jazz. And the amazing thing about bluegrass is, you know, you can master those skills with a couple of years of hard practice, whereas jazz really takes a lifetime.

Paul    00:04:04    Oh, and rock and roll takes two minutes.  

Mike    00:04:06    Well, I was in a rock band all through high school and college, and I mean, that’s definitely my first love in terms of listening to music. But, um, yeah, I’ve grown to love bluegrass because I can just go anywhere in the world and immediately connect with people.

Paul    00:04:19    Ah, see, I appreciate that, and I appreciate, like, the skills aspect, but from an aesthetics-of-listening standpoint, have you come to enjoy bluegrass? Can you just sit back? My old neighbor, who is a painter, an artist, he would always have bluegrass on, and he played bluegrass, but he would always have it in the background while he was painting. And so, does it bring you joy now to, to just consume it as well?

Mike    00:04:42    Not always. So I, I remember coming back in a car, very kind of vivid memory of driving through the beautiful Marin hills, coming back from my first bluegrass camp. This was actually before I played the mandolin. And I remember I was just in love. Like, I, I wanted to immerse myself and understand the genre so that I could get this incredible high of playing with people and producing music. And so I, you know, I put on some really banjo-heavy bluegrass, uh, Flatt and Scruggs or something, and I thought, ooh, I don’t know that I wanna listen to this. But look, I, I love, uh, new acoustic music, you know, Tony Rice, David Grisman, this kinda stuff that I’ll listen to all day. But there is a certain kinda real banjo-forward thing that is not my bag, per se.

Paul    00:05:21    Okay. Well, um, it was a great segue that you mentioned, uh, language and enjoying that, you know, that there is a language to communicate with your other, uh, bluegrass, um, players. Because we’re gonna talk a lot about language today. Uh, and that is, you know, the, the main focus of your research, and you can correct me, is, uh, how we acquire language and language learning, especially from a very young age. Uh, I guess early word learning would be your specialization. Is that right?

Mike    00:05:48    Yeah, I, I’m fascinated by language in general, and I, I always have been, and I’ve focused in on early word learning because it’s, we might call it the smallest big thing, right? It’s a, uh, a very manageable problem in some sense. Like we have all these wonderful word learning experiments where we can teach a kid a word in the lab, and we know they learned that word, but it’s also connected to social cognition, it’s connected to language structure, it’s connected to sound structure and so forth. So it’s both a very tiny, well operationalized construct that you can study in a controlled way and also just this incredible universe of possibilities of connections to different aspects of cognition and language.  

Paul    00:06:26    Okay. So I was gonna ask how your, you know, your interests and your research focus has changed over the course of your academic career, but you just said that you’ve always been interested in language  

Mike    00:06:37    Very,  

Paul    00:06:38    Yeah. But you come from a Bayesian, you come from a Bayesian background as, as well. And we’ll get into, like, your, your modeling aspect of it as well. But that’s why I was curious, like, if it started off with, like, oh, is he more of a Bayesian or is he more language? Like, how you began and, and that trajectory?

Mike    00:06:55    You know, it’s funny, you know, you have these intuitions about the topic you’re studying sometimes, especially when the topic is psychology or cognitive science. And for me, that intuition was something like, uh, that language was really critical to thinking and structuring the world. I remember having these conversations, you know, back in high school, you know, late at night, about thinking and language, and just thinking about, you know, your thoughts. Like one of those kinda late-night introspective ideas. And then in college, when somebody assigned me a little bit of Wittgenstein, I read that and I thought, yes, this is how it works. We’re just deciding on the ways that we communicate and coordinate with one another. And so I became a, uh, you know, roughly a philosophy major. I mean, I was, I was a symbolic systems undergraduate, which is actually our cognitive science major, but I was a philosophical foundations concentrator, uh, which meant I, I was reading Wittgenstein and doing a lot of philosophy of language.

Mike    00:07:46    And, you know, then I tried to get involved in research and they were like, well, okay, you’re doing the research right now, you know, you’re thinking about it. And that didn’t seem so satisfying. So I ended up going in kind of in a more empirical direction. Uh, I had some great mentors in psychology who encouraged me to study language with empirical tools from cognitive science, uh, and especially developmental tools. But, so I, I was always interested in language and thought and this connection. It was just that the approaches have changed pretty radically over the years.  

Paul    00:08:15    Okay, but you’re not like Chomsky, holding that language is for thought.

Mike    00:08:20    No, no. I, I think that language is functionally for communication and much of my work has been about cashing out what that means and how it might affect interpretation in the moment, as well as a little bit how it might affect the structure of language downstream.  

Paul    00:08:37    Okay. Yeah. We’ll come back to the function of language in a minute, but I was gonna open with a sort of morose, uh, question, which would be, you know, what would your scientific tombstone say if you just, uh, vanished from the science community right now? Uh, and, and the reason why I brought this up now is because you said, well, you know, your mentors and your advisors said, well, you’re doing the, uh, the empirical work right now by using words. And if I understand it correctly, you kind of, uh, pushed back against that, because you wanted to get into, like, numbers and, uh, computational, um, theoretically driven, uh, aspects of, of understanding how we acquire, uh, language. So would your tombstone say something like that? Um, maybe a little less long-winded?

Mike    00:09:24    Well, there’s what I’d want it to say and what it would say, I, I think I would want it to say something about bringing bigger data sets to bear on fundamental questions about the origins of language and the variation between kids and between languages.  

Paul    00:09:36    But that’s your more recent interest, correct?  

Mike    00:09:42    Yeah. So there’s real continuity in, in my interests, in the sense that I’ve always wanted to create unified computational theories. It’s just that when I started to try to do that, when the rubber met the road, I realized there just wasn’t enough data about the questions that I was interested in theorizing about. And so, in some sense, the, the questions, the theories, all of those are, you know, relatively older. I’m trying to answer questions that people have been asking for a long time. And the project has been to get the resources, mostly data, but then increasingly the formal models to run over those data, and then, you know, actually be able to evaluate hypotheses which many people have had.

Paul    00:10:21    Well, maybe I’ll just, maybe we’ll go ahead and talk about this, cuz I was gonna ask about, you know, building these repositories. Um, I know that you ran out of, you know, good data for your models, and so you needed more data. So, you know, now you’re, like, building essentially, like, tools and open source tools, publicly available data sets, for people to, um, come use when they want. How’s that going? Like, um, this is a huge focus of your work right now, I know. Uh, and I think I saw, in one of your talks, you put up the, um, Field of Dreams poster, the “if you build it, they will come” idea. Uh, and so is that true? If you build great tools, will they come and use them?

Mike    00:10:55    Well, the, the contrast that I sometimes make in the talks is between the Field of Dreams model, where you just build something and people show up and use it, versus something a bit more, uh, product- and user-focused, which is, to drive adoption of a tool you need to really think about what people want, and then you need to promote it and build a user base and gradually kinda gain steam in the community. So I, I don’t think we’re really at the Field of Dreams level right now. You know, the, the other alternative, which is maybe another, another bad way to go, is the Joshua Bell in the subway model, where you’re fiddling away, you’re doing something amazing, you know, it’s world class, and nobody stops to look and listen, right? So trying, trying to navigate these things in, uh, repository building, tool building, uh, data sharing is really, uh, it’s a critical challenge right now.

Paul    00:11:41    But so you have to promote it. Um, is that high on your list of desires, promotional activities?

Mike    00:11:49    Well, you know, it’s not like we’re advertising in the subway. Instead, what, what we’re doing, and, and what I think about when I’m constructing these repositories, is not, oh, there should be such a repository. ’Cause I think that’s the way some people approach it: there should be a thing that does X, and we’re gonna build the tool. I actually go into it with a much more, I could call it selfish, but it’s a scientific motivation, which is, I wanna do this kind of analysis. I think this kind of analysis will answer my question. And so I wanna get together the kind of data set that will allow that analysis. And I’ve learned from experience that if you structure the data in an appropriate way, that analysis will be easier, and a hundred other analyses will be easier too. So I’m going into it thinking, um, hey, maybe I can get enough data together to get a really clear picture of something that I couldn’t see before.

Paul    00:12:34    So we don’t need to spend too much time, you know, perseverating on this, but, you know, have you, like, had people come and say, well, actually, I need it structured a slightly different way? Have you had to modify the structures of the data over time?

Mike    00:12:46    Uh, I’ve had a couple of really, uh, wonderful collaborations across the years with folks who are very technically sophisticated and talented. Um, Mika Braginsky is one, Stephan Meylan is another. Uh, these are folks that actually came into my lab quite early on, when I was getting started, and they’ve taught me about, you know, the open source community and tool design and the set of practices that make it relatively easy to create kind of an API that is centered around what you might wanna do. And so I don’t, I don’t think I’m particularly good at this, but I’ve at least absorbed some of those lessons, and they’ve helped with a lot of these projects. And so we have tools that are, you know, relatively flexible and straightforward for doing a bunch of stuff. And, you know, again, the benefit is that you then can do some fun things that make it easy to scientifically promote these, these repositories.

Mike    00:13:35    So you can do a bunch of, uh, visualizations that are fun to play with, these interactives that we have on our websites. And then you can do kind of big mega-analyses, glomming all the data together and looking at what comes out across countries or across languages or across, uh, sociodemographic groups. And those are very intuitive and appealing. Like, the analysis that I show people to demonstrate one of our repositories is just about the, uh, female advantage in early word learning. And yeah, I don’t know how theoretically significant it is from the perspective of, like, understanding the universals of language acquisition, but it totally resonates with anybody who’s had a kid or been in a preschool classroom and they see the girls talking up a storm. I mean, it turns out, the world around, there’s a substantial female advantage for toddlers learning language, and I think that’s pretty cool and fun.

Paul    00:14:20    Well, that’s kind of the other direction, uh, I was gonna go, instead of the tombstone: to ask about how your approach has changed. Because, I mean, you, you’ve developed the RSA model, which I’m sure we’ll talk about, uh, which is a, a theory, but now we have this big data world, and you can do things like visualize just based on different slices of, uh, demographics, et cetera. And are you, am I right that you’re, like, kind of playing with that data now and, and taking a more bottom-up or data-driven approach to, um, develop theories? Or are you, you know, using a theoretical approach and, and playing with the data and meeting in the middle? What, what’s your overview there, your worldview?

Mike    00:14:59    Yeah, there’s, there’s a fundamental tension here, which I’m well aware of. And, you know, sometimes you look at an individual piece of work and it’s like, hey, you’re just looking at the data. And then there’s another time you’re looking at a different individual piece of work, and people are like, this is a terribly simplified, boiled-down, tiny little model of, like, people talking about squares and circles. Why don’t you look at the data? I’m like, well, you can’t do both at once. So, or maybe you can, but it, it takes a lot of work and a lot of development. So there’s this beautiful paper by Brown and Hanlon from 1970. This is, uh, Roger Brown, kinda one of the founders of the field of language development, who was at Harvard for many years. He had lots of wonderful students, and he was always known as just this amazing writer.

Mike    00:15:43    And it’s a paper about whether parents correct kids. It’s kinda a classic paper in a small niche of child language development. But at the end, he just starts, you know, opining, he starts just talking about stuff, and he says, I think it’s the last paragraph, he’s like: at this point, I wanna express the distaste that experimentalists will have for this fundamentally observational work that I just did here. They’re gonna think it’s totally uncontrolled and doesn’t show anything about causality. But I challenge them to model also the distaste that the observationalist is gonna have at their little tightly controlled experiment. And really only by putting these two together is the truth gonna emerge. And I feel that very strongly. I mean, this tension is, is just completely evident in my research life, because my main teaching contribution at Stanford is to teach an experimental methods course, right?

Mike    00:16:34    Where I hammer home these issues about causal inference and tight experimental control and construct validity. And my current book project is this textbook on experimental methods, and yet I continually find myself developing these large data projects, which are completely observational, yeah, have no control over causality and so forth. So I really think you have to play both sides of this methodological, uh, continuum and try to connect between them. And, and the, of course, the connective tissue here is theory, is computational theory. And so if you have big data sets and good experiments, you can try to bridge between those using a well-articulated computational theory. And that’s, of course, the end goal for me, but, you know, it takes time to get there.

Paul    00:17:20    Well, a lot of the data that you work with is, um, like, survey data from parents making judgments, or, or maybe judgments is too harsh, um, observing their own children and estimating, you know, how many words they learn in a week, things like that. But then you also have some, like, recorded data, right? Which is, like, transcribed, uh, which is people in their homes, and how much parents are talking to their children. So it’s a lot of, like, speech data. So the, the quality of that data, I guess, is always in question somewhat as well. Perhaps that’s not my real question, but it was just a quick question.

Mike    00:17:57    <laugh>. Yeah. The quality of any data, well, yeah, about a, an internal construct should be in question, right? These are all measures of internal constructs. And so we always have to ask about the reliability and validity of those. And, you know, uh, my whole stick for these repositories is that they’re domain specific data repositories. They’re about individual constructs that people care about. And the kind of basic theory for language development, you know, the one before kind of pre theoretic theory is like, Hey, kids have to hear some stuff in their house, they have to process it, right? And then they have to, uh, have some outcomes. They have to show learning in some way. And so what we do is try to create repositories and data sets around those constructs, and, and each of them is gonna have their weaknesses. So for example, if we’re interested in kids’ outcomes, we can and do archive, like their eye tracking data from in lab experiments.  

Mike    00:18:49    Turns out those eye tracking data, which are direct assessment, are extremely noisy, because kids are very noisy. They’re, you know, well, they’re literally noisy, and their data are noisy as well. They come into the lab and they do a couple trials and they fuss out, and they look at the thing that’s more interesting rather than the language-matching thing and so forth. So you don’t get that much data from direct assessment. So, okay, let’s go to parent report. Now you get a lot more data by asking parents about the kid, but you’ve got a bunch of biases creeping in. You’ve got a bunch of worries about parents not remembering right, or being biased, uh, because of certain things they think about their kid or about particular kinds of words. So, okay, you go to transcript data, and you’ve got yet another set of issues. Um, so you really have to triangulate across these measures, because no individual measure is really revealing the construct.

Paul    00:19:35    Okay. I mean, the, the reason why I asked is because it’s so studying an animal model of something, whether you’re doing it observationally or experimentally in tightly controlled experiments or something, um, you, well, lemme back up. So it’s, so my thought was like, this is a very hard thing to study language acquisition because A, you’re dealing with humans and not, um, non-human animals, you know, uh, b because of all those factors that you just listed, however, it seems like you get, you could get and do, I guess, get massive, massive data from it. Um, or, or have the potential to, I mean, you know, you could, um, I guess you’d have to, well, you, you of you of course know this much better than I like how you would do this, but you know, you could surveil on any given family, uh, as long as their privacy was protected. Right? And if, uh, 10% of all families did that for a year, you’d have way more data than you wanted probably.  

Paul    00:20:34    Okay. <laugh> Okay. Just, just tacit agreement there. Okay. So, okay. All this leads me, uh, up to the question of, it’s an unfair question having, you know, read a bunch of your work and just how many open questions there are. Uh, but the, the big broad question was, is like, roughly where we are in understanding how humans acquire language. And if that’s too big of a question, we could, uh, could reframe it and say like, what maybe has been surprising in the past handful of years that has come out of, you know, recent research based on these new data?  

Mike    00:21:11    Well, let’s, let me see if I can take a shot at the bigger question. Yeah, do  

Paul    00:21:15    It. Yeah.  

Mike    00:21:15    All right. How far are we? I mean, I’m like, let’s start with something that I don’t know as much about, which is the ventral visual stream. Mm-hmm. <affirmative>, I’m an outsider to this research, but I just love seeing it. It seems like we’ve just learned so much in the past 25 years since, you know, I first started taking classes as an undergrad. Uh, our understanding of the pathways that get you from, you know, basic visual input to a variety of different categories and their recognition, uh, that understanding has just been enhanced so much by the science of the past 25 years, right? So we’ve got the human neuro imaging literature, we’ve got the single unit literature, we’ve even got some controlled rearing experiments. Then we’ve got this whole host of new deep learning models that provide pretty good models of, um, you know, general responses across the cortex.  

Mike    00:22:11    Um, there’s just, there’s just this wealth of exciting stuff happening where it feels like from a computational theory perspective, we kind of understand a bit about why it might be organized this way. And then from a actual neuroscience perspective, we, we see the, the organization being measured. And, and so that’s just, I think that’s just tremendously exciting. And then the question is, what do you do when you’ve got a case where you can’t use the monkey models, right? Um, your neuroscience tools don’t work very well for the population. That’s critical. I mean, it’s, new methods are starting to come online and people are starting to figure out how to get awake behaving, uh, fmri recordings with kids, but it’s, it’s really rough. So how do you get the level of precision? How do you get the level of convergence that we would want? So it seems to me that you can’t just do the same experiments and like come up with another clever experimental design.  

Mike    00:23:02    Mm-hmm. I mean, no question that people make progress and they come up with really clever designs, and they do get incremental answers there. But it seems to me that what we need is a bit more in the way of this multi method synthesis where we come up with framework theories, you know, uh, and people have been doing that since really the very earliest days of some of the neural network models. And even before trying to think about what is a feed forward model of the ventral stream look like? How can we, you know, uh, reason about the relation between that and a bunch of different modalities worth of data. So you need model, and then you need to be able to connect between different data sets that have different strengths and weaknesses, including, you know, uh, some more naturalistic freeform data, um, right. That’s kind of inspirational work on the statistics of natural images and so forth.  

Mike    00:23:56    Um, and including, uh, controlled experiments that test individual bits of the, um, the proposed theory. So, so anyway, uh, that, that’s, you know, in terms of scientific strategy, so how far along are we with language development? Well, you know, we’ve got some kind of basic ideas about key principles, <laugh>, but, um, you know, putting those things together into models that actually learn from data and tell us what things are mattering for the acquisition of, of, um, kind of a fragment of language. I, I think we’re, we’re really a little bit low on synthetic theory there. Um, you know, there, there’s some, I think, quite promising proposals. So I, I like a lot of the learning accounts that are, you know, based in kind of, uh, learning individual words and then generalizing constructions from those tho those feel sort of fundamentally, right? And maybe more so as we look at the progress of language models, and we can, we can get to that later mm-hmm. <affirmative>, but getting unifying models that connect between the kind of acquisition and, uh, range of empirical phenomena that people have documented has, has been very hard. And we’re not there yet.  

Paul    00:25:04    So are there, um,  

Paul    00:25:07    Have there been, you know, thinking back to the old days of just theorizing without actual quantify actually quantifying anything, having empirical work, I mean, I guess there are observational studies, but it, it used to be philosophers would just think about it and make a claim, right? Um, and then I, I I, I’m not discounting any of the, you know, computational work that’s gone on since then, but, but have there been surprising, um, new ideas and or results about, um, how we acquire language? I just, because there are so many different facets, right? So like the, there’s the developmental facet of when we start acquiring language and how we start acquiring it. How much of it is social? How much are we just absorbing? How much of it is, is active? So has that story changed? Has the broad, I don’t know. I don’t even know what the story was, and so I don’t, I don’t know how it would’ve changed over time.  

Mike    00:26:00    Well, I, you know, I think I should differentiate here between the acquisition of vocabulary, which people have, you know, said certain things about. And then, then the, and you know, when you say vocabulary, sometimes you’re saying in the kind of narrow sense of like, Hey, did you learn the word forum dog? And sometimes people talk about it in the broad sense, like, when you learn the word justice, are you actually learning something about the concepts in the right way? The world is organized. So, so there’s, you know, there’s both narrow and broad there. And then the place where a lot of the theoretical action has been is actually in discussing theories of grammar and the mechanisms of word combination and, uh, syntactic structure and so forth. So in both of those places, I think there’s been real progress.  

Mike    00:26:44    If you go back to kind of first principles, we’re talking kind of nativism and empiricism, and, you know, those are the kinds of philosophical positions that people articulated without looking at kids or looking at data or doing experiments. So from the empiricist perspective, I think there are some real challenges to the kind of pure associative, um, story about, about early word learning and, uh, about early grammar learning. And, and the most effective of those challenges, from my perspective, come from the social world, right? So as we’ve understood more how deeply grounded kids are in social interaction with others and how much that motivates them, it’s become clear that kids aren’t just picking up word world associations. They’re learning how to use language to talk to other people and how to make their way through the world. So, um, from that perspective that, you know, that, that, I think that’s a challenge to empiricism, uh, on the nativist perspective, you know, there, there’s, there’ve been nativist theories where, okay, you know, much of the kind of conceptual and syntactic structure of language is innate, and really what we just need to do is, um, plug in a bunch of kind individual word forms or something and maybe, you know, learn a couple things from the environment, and then all of that structure is gonna come online and tell us what obviously caricature.  

Mike    00:28:11    But those kinds of views, I think, have been really challenged both by, um, learnability demonstrations that kids can and do, learn from the statistical structure of the environment, and also by proof of concept from a variety of computational models, clearly the most recent language models, but also many before that, showing that actually it is quite possible and sometimes even easy and sometimes even more effective to learn particular task from the environment than to rely on a fragile set of rules. Mm-hmm. So, so anyway, yeah. I, I, I think that there are challenges, this is a long, very long answer to your question, but I think’s great. There are challenges to both positions, uh, that you might have had as a, a priority theorist, and you know, that, that occurred in the philosophical literature. Um, so really, you know, beyond, okay, nativist versus empiricist, I think we need to talk about like, what are the tools the child brings to language learning? How are those, uh, grounded in different aspects of cognition, memory, generalization, but also social cognition, also conceptual and perceptual structures. How do those come together? Uh, and, and that’s a much richer and for more fun story,  

Paul    00:29:19    But all in this messy developmental stage as well, because everything is developing at the same time. It’s like how it just seems impossible to parse all of these things out.  

Mike    00:29:31    Well, yeah, that, that, that’s why, uh, word learning is such a fun case study is because, or  

Paul    00:29:37    Fun, you know,  

Mike    00:29:37    Yeah. You can have a, well, because it’s so central, right? Because you can have cases where what’s critical is the child’s own active exploration. What’s, or what’s critical is the perceptual structure of the environment, or what’s critical is the pragmatics of the social interaction or the syntactic structure. It’s like you’ve got all these different information sources or clues to meaning, and the child can put them all together, really, I think, very flexibly to, um, to, to reason about language use in the moment, and then word meaning going forward. So th this kind of set of interactions is really what makes it fun, because word learning isn’t modular. It’s the most fundamentally central part of what a kid is doing.  

Paul    00:30:18    Well, you use the term flexible, is that the most amazing thing is how flexible our learning is, like in a stage, in a very young stage where we’re cognitively, you know, developing, um, also our language is developing, our social context is developing, we’re actually physically growing, you know, we’re being embodied in the world in different ways, and that’s constantly changing. And, uh, is that, is nothing critical, or should we just be impressed with the flexibility? Or are there critical things?  

Mike    00:30:51    Well, language learning is very robust against developmental variation. Now, there are certainly cases of, uh, developmental disorders that lead to impairments in language. That’s, there’s no question of that, right? But across many, many different caregiving environments across many different, um, for example, uh, sensory disorders, uh, and across many different aspects of kinda cognitive diversity, you still see language emerge and it’s emergence, you know, can, can lead to some, you know, there’s some variation timeline, there’s potentially variation in content and focus, but you see the ability to communicate using language and, and just a dizzying number of circumstances. And that’s, I think, fundamentally amazing. So there has to be some flexibility. You know, if somebody says, oh, social learning is literally the only way you can learn language. Say, okay, well, let’s talk about cases where, you know, the social input is impaired with social motivation is impaired.  

Mike    00:31:52    And you still see language emerge. Somebody says, oh, it’s about, you know, perceptual mappings where you talk about kinda cases where the sensory modality in question isn’t present, and the kids are still picking up something about the structure of language. So, so there’s a lot of robustness and flexibility in the system. Uh, and I, you know, if you think about, um, I, I’m gonna use neural network terminology, but like a loss function that’s about communication in some sense, uh, so ex expressing something to another person, um, then there’s a lot of different ways that you can kind of optimize that loss. There’s a lot of different information sources that you can learn about and use. Uh, and there’s a lot, you know, a lot of different sensory, uh, um, well, yes, there, there’s just a, let me say that a different way. Um, it’s likely that if this loss function is really important, that the organism will use almost anything at its disposal, information wise, cognitive resource wise to optimize it.  

Paul    00:32:54    Okay. Yeah. Okay. So, so does that mean, what does that say about the relative importance of la Not importance, but, um, should we, how impressed should we be with language? You know, we’re always impressed with language because it’s uniquely human, and we’re the best thing in the universe, uh, is language, all that, uh, all that in more,  

Mike    00:33:14    I have to say, I’m pretty impressed. I think that language is pretty incredible for its ability to allow us to coordinate complex plans with other people for our ability to, its ability to, to allow us to externalize our thoughts and augment our cognition, uh, and its ability, you know, to allow us to structure our cognition around complex concepts that otherwise might not be comprehensible in a, you know, in our thinking, we can tag things with, uh, new words and use those as abstractions, uh, and kind of reuse them in, in our mental simulation. And in our, our, our reasoning, uh, we can structure complex propositions and reason about them internally or in external forms. So, so I really do think it, it’s amazing. And one of the fundamental exciting things is that I, I think, right, if you’re optimizing this communicative loss on both an evolutionary and a developmental timescale, you actually get something that’s not just good for coordinating in the moment, but also actually has all these cognitive benefits in terms of structure recognition, allowing externalization, um, allowing, you know, persistent plans over time and space and so forth.  

Paul    00:34:26    Okay. So I was gonna ask you, you know, what the function of language is. I, I know that you have said it’s, it’s definitely for communication, and I don’t know if you’re saying that with the background of the debate between whether it’s for communication or forethought, but I recently had Nick Enfield on, and he’s written a book called, um, um, nature versus Reality. And his argument is that language is mostly for, or the primary function of language is for socially coordinating around, you know, social problems, which you just mentioned, uh, uh, a few minutes ago. Um, but I, I guess the, I don’t have to ask you what you think the function of language is, but it just occurs to me, does there need to be a function of language or thinking, you know, thinking about my children, right? Uh, they just chitter chatter all day and, um, I, I don’t know what the hell their language is for <laugh>.  

Paul    00:35:14    I mean, it’s, it’s like, it’s for them to practice language and, you know, before that it was like, to get things from me. And I think, you know, a large part of language is getting things, getting what you want, um, which would, would fit into that social coordination aspect. But, but now they’re at a later, uh, stage where they’re just yammering on all day, you know? And, um, so c can language be four different things at different stages of your development? I mean, is there, how many functions of there of language are there? Well, what, what can we say about the function or why not have lots of functions?  

Mike    00:35:47    Yeah, this kind of question is really controversial, in part because there’s a lot of ambiguity when somebody asks the question, what is the function of language? So philosophers have a lot to say about distinguishing different kinds of functions, but minimally we can say, is there a evolutionary advantage that’s granted by a particular behavior set of behaviors that are enabled by language that would lead to language ability being selected for, you know, over the past million years, say, uh, speculating about these kinds of things is difficult. It’s hard to find really great evidence, but, um, I think that’s probably the root of enfield’s claim that social coordination might confer selective advantage. And, you know, on the flip side of that, somebody like Chomsky has asserted that the role in, uh, representational thought might be the key grant or selective advantage personally, when you work out the evolutionary story around that, especially, you know, Chomsky at times has endorsed a saltatory evolutionary story where there’s sort of a small number of key changes, evolutionarily, that happened relatively recently, you know, 50 to a hundred thousand years.  

Mike    00:37:07    That that story for me doesn’t seem like it is as likely or as plausible based on, you know, some of the evidence out there. But I’m not a specialist in this. So anyway, that’s, that’s kind of evolutionary for that, is, uh, what is what selective advantage does language confer. But you could also think developmentally, uh, what are the, um, you know, uh, what are the reasons that a child might want to talk? And those are gonna vary from child to child, often social connectedness, uh, you know, getting what they want, as you mentioned and so forth, seem like motivations for some kids. Um, but for other kids, there seem like there are, you know, other, uh, expressive motivations. They like to talk to themselves and so forth. So, you know, th these might be fairly, uh, diverse and variable.  

Paul    00:37:56    I know they don’t wanna just understand what I have to say. Yeah, that’s, I’ll, I’ll leave it at that. I don’t know how your parenting experience has been <laugh>.  

Mike    00:38:04    Well, there, there’s this insight from Michael Thomas, um, which is that the fundamental thing that emerges in very early childhood and even infancy, is this desire to share. And you see that in many, many children. For my kids, I like, I have these moments where the four language they started to wanna share using that declarative pointing with your index finger. And I, I just have these pictures of both kids being like this, like, share an experience. They wanna share their experiences. Yeah,  

Paul    00:38:31    Yeah, yeah. Okay.  

Mike    00:38:33    They wanna know that somebody else is seeing the same thing as them, that tdic attention that Thomas El has written about. And I just think that’s totally right for a lot of kids that they’re doing that kind of sharing. And it appears to be a huge motivator of early language, because, you know, if you look at kids early vocabulary, we’ve done this across many languages in the Word bank book, what you see is that the earliest words are not typically these kind of basic needs words like, change my diaper, um, you know, uh, I need, uh, food of this type and so forth. They feature a ton of people and people’s names. They fe like mom and dad and grandma and grandpa and sister, brother and so forth. They feature interesting stuff in their environment, and they feature social routines like, um, high bye peekaboo and so forth.  

Mike    00:39:23    So kids try to talk in this, you know, by and large, on average in this very affiliated way that allows them to relate to the people around them and to share interesting stuff with the people around them as opposed to just meeting their basic needs mm-hmm. <affirmative>. So that, that, that feels kind of largely fundamental. Although, as I said, there are cases of resilience where you don’t see as much social motivation, and you do see language emerge sometimes on a different timescale. So it’s not, you know, that’s not absolutely necessary, but it’s at least in most cases of typical development, a really strong functional driver.  

Paul    00:39:57    So, you know, I, I mentioned that acquiring language is couched in this crazy complicated development as well. Uh, what is the relationship of understanding development itself, development of the organism to development or acquiring language? I mean, is there, do we need to understand development writ large, uh, to understand language acquisition? Or is, can we just understand language acquisition from a purely psychological, um, uh, perspective?  

Mike    00:40:30    What, I don’t know that there’s one thing that I’d call development writ large. There are lots of different processes of growth and change, right? Right. And, uh, one way of defining this is that language, because it’s so central in cognition, um, because it has all these connections to, you know, other aspects of conception and perception to social interaction, um, to learning and memory. We do need to get a sense of, of that connectedness and how it’s changing over time. Uh, you know, especially when you’re looking at very early word learning, there are these pretty dramatic processes of growth and change that are happening, you know, physical and psychological. And so language does appear to be affected by them. Just to take one example, this comes from work with head mounted cameras, and, and we’ve done a little bit of this work, but they’re, you know, many pioneers of it, Linda Smith and, um, uh, Chen Yu and Karen Adolph and, and then others.  

Mike    00:41:28    Um, and when you look at kids kind of around when they’re learning their first word, they’re also learning to walk. And so their perspective on the world is changing radically. And so, you know, what they have access to in terms of what they can see, and then also what they can do. Both of those things are changing. So they’re starting to pick stuff up and take it over to people and ask about it. Um, they’re seeing a lot more of the social world around them just because they’re a higher up and can, you know, uh, look up at their parents as opposed to looking down at the carpet when they’re crawling. So yeah, there, there’s a, there’s a lot of changes there, and that’s a place where you, I think you see pretty strong coupling between the physical changes and the, um, psychological changes later on.  

Mike    00:42:11    I think you see those things decouple a little bit. And we’ve, we’ve done some kind of large scale research with milestone data showing that early on language and motor milestones around babbling at around kind of gross motor stuff, and even first words seem more correlated, and then they kind of, by the end of the second year, certainly they’re kind of decoupled and a kid will make motor progress, but not language progress, you know, in a particular month or two of of development. So, um, so that’s all to say that that’s just one example of that developmental connectedness and the need to understand a particular piece of development, um, at the critical time. Uh, at other times, maybe the, the critical pieces are around, um, social cognition or around conceptual understanding, you know, so, so yeah, there’s absolutely these, these, um, other correlated developmental trends that you really have to think about as you’re thinking about what’s going on in the context of language learning.  

Paul    00:43:06    Refresh my memory on what, what the, um, end story was about regarding the dimensionality of, uh, development, um, developmental behavior with respect to language. That is the, the idea is that either language, uh, uh, processes in stages with your cognitive functions or everything’s going in parallel. Um, and I think that you concluded that, uh, everything’s going in parallel, that we have all these, um, ability cognitive functions, um, integrating with our language, and we are acquiring language using these other faculties all in parallel. Is that right?  

Mike    00:43:41    Yeah, this is, I think, a, actually a cool case where there has been a convergence across different ways of understanding language and language learning. So if you look at a standard linguistics department, you’ve got your phonology and phonetics, you’ve got your morphology, got lexicon, syntax, semantics, pragmatics. There’s this hierarchical ordering of abstractions, and those are the classes you take. And for a long time, people studied individual aspects of acquisition and individual aspects of language representation kind of around that same stratified hierarchy. And I think now that it’s pretty clear from a couple of different sources of evidence that that is not how human beings represent language. We don’t represent these individual modular components. So on the acquisition, and if you look at variation across individuals, all of these different constructs hang together. Variationally. So a kid who’s good at gesturing and gestures early tends to have a larger vocabulary.  

Mike    00:44:46    They tend to have a larger vocabulary. Uh, they tend to be combining words more. They tend to have more word meanings, so forth. So the language system in, in development is what we’d call tightly woven. Um, the correlations between these things are very, very high relative to almost any within participant correlation you might see with kids. Then if you look at the neuroscience evidence, and this is worked by folks like Ev Fedco, you see that there aren’t differentiations between say, syntax and semantics in the brain. There’s no place in the brain that does syntax but not semantics. And so that really breaks down some of the modularity that been supposed between those two different ways of looking at language meaning versus, uh, the, the, uh, rules of composition mm-hmm. <affirmative>. And then finally, from large language models, we see that the models that do best are the ones that learn more or less everything at once.  

Mike    00:45:42    So you feed in a lot of language and they get better at combining words grammatically. They also appear to get better at, uh, reasoning tasks. Whether that reflects true reasoning, they, they certainly are gaining some abstractions intermediate to some kinds of predictive reasoning tasks. So earlier efforts that aimed to do something like syntactic parsing without meaning really didn’t work that well. And the parses that came out were okay for downstream tasks, but certainly nowhere near the level that a holistically trained model now achieves. So, okay, that, again, long answer, but you’ve got data from acquisition, data from the cognitive neuroscience and data from the computational modeling, natural language processing. That all for me converge on saying, Hey, this stuff is happening together. It works better when it’s together. It’s represented in the brain together. Uh, and it varies together in acquisition. It does not look like these things come apart and are learned sequentially, uh, in series. They, they’re happening in parallel in a, uh, combined way  

Paul    00:46:41    Since you mentioned large language models. Let’s, let’s bring them into the conversation, I suppose. Um, yeah, they, they’re learning everything at once. And, and I should preface this by saying like everything that we’ve been talking about, it’s not like, um, you know, six half year olds, one and a half year olds, they’re not learning written language, they’re not learning getting vector inputs. They’re learning, uh, auditory speech. Um, and I don’t know how much of a difference that would make in a large language model, but, um, the children learn language in a vastly, vastly different way than a large language model. Correct, <laugh>?  

Mike    00:47:16    Sure, sure. Yeah, I’m, I’m on board with that.  

Paul    00:47:18    Does it matter?  

Mike    00:47:20    Of course. Yeah, it absolutely matters. So what we’re seeing with the most massive language models is really something like a proof of concept around certain aspects of learnability and that that can really inform the conversation, but they’re not learning from the same data or even the same sensory information sources. And they’re, uh, you know, the, the kinds of data that they’re being exposed to are different. Obviously the architectures are vastly different, and the scale of the data is vastly different. Yeah. So it’s worth noting that there’s a really big language acquisition literature that made very strong learnability claims under a range of different learning paradigms. They didn’t say, Hey, it would take a couple billion words to learn syntax. They said, syntax is not learnable. And people were saying this in all sorts of, um, major and legitimate high profile ways within the last 20 years.  

Mike    00:48:15    Uh, there’s a, you know, kind of influential series of papers from, um, Partha Yogi and Martin Nak and so forth that, that, that, uh, promoted a bunch of these, uh, learnability results, some of them dating back to the sixties that said, uh, syntactic structures are not learnable under certain conditions. Um, and there was a lot of debate about what those conditions were, but now I think everybody agrees that the syntax coming out of large language models is pretty much perfect. And so from the perspective of like a proof of concept that bear string acquisition of language, uh, from, you know, syntax is possible. Yeah, yeah. We’re, we’re there. Okay. Now everybody can backtrack and be like, well, we meant, we meant within a couple, you know, a couple hundred million words or something under these learning. Yeah, certainly on Twitter. Well, you know, nobody was, no, that’s not fair.  

Mike    00:49:05    Some people were very precise about the conditions of acquisition. Um, so in, you know, the original gold theorem that is cited very widely here, uh, there’s a particular adversarial learning context for learning, uh, context free grammar. And the adversarial learning context is, I can guarantee that you cannot learn a context pre grammar. If I hold back key exemplars for an infinite amount of time, I can always hold back key rules and then surprise you with them adversarial, and then that’s totally reasonable. So those are precise results that have very limited scope in some sense, because that adversarial learning context is obviously wrong. Yeah. And then there was a lot of general kind of opining on the subject of learnability, which was not as precise and didn’t give clear conditions or amounts of data or, uh, constraints on the learning regime. So, great. Now we’re getting a little more precise. Okay. Does anybody wanna make a learning claim on a hundred million words? Okay, well, that’s starting to be about more of the right scope of data. I’m not sure that anybody’s about to draw that line in the sand given how fast things are moving right now, but yeah, uh, what we can conclude from, from learning from a hundred million words is probably very different than, um, you know, uh, 3.6 trillion in the latest or 3.6 trillion tokens. So more than 2 trillion words in the latest Google model. Yeah. That is a proof of concept.  

Paul    00:50:31    Yeah. Well, have you learned anything, um, from the way that large language models function and acquire language? So I’m not gonna say so efficiently because that’s a huge difference, uh, between, you know, humans and large and models is efficiency. But have you, have you, has it changed your mind any, uh, in any way about, uh, how children learn and acquire language?  

Mike    00:50:53    Oh yeah. I, I I think it’s an incredibly exciting time. I, you know, are we at the full understanding we need of what is happening in transformer models that are learning from Yeah, right. Input prediction. No, we don’t, we don’t know. But that’s why it’s exciting. So in my kind of formative cognitive science years, I was reading the kind of rules versus statistics debate. I was deeply inspired by stuff like Jen Saffron’s statistical learning papers with Aslin and Alyssa Newport, where they were showing, again, proof of concept that kids could learn from data and then was really affected by this debate between them and Gary Marcus and others on whether particular phenomena could only be captured with symbolic rules and so forth. So the emergence of reasoning related behaviors, just using the kind of most vanilla way of describing it in these large language models, I think is very exciting and suggests that networks without inbuilt symbolic structures can learn to do very highly abstract tasks.  

Mike    00:52:03    And that's really cool. And some of the work that I've done, uh, in collaboration with folks like Atticus Geiger and Chris Potts has probed that more deeply and convinced me that many people were wrong about the necessity of inbuilt symbolic representations. And so I learned something fundamental; my mind was changed. You know, the Mike of, let's say, 2010, when I published a paper on rule learning, thought that there was probably an inbuilt operator for equality or sameness. At least I couldn't figure out a way to make the model work without one. And, you know, the Mike of 2022 was at least convinced by the collaborators on the project that you could learn such an operator, and in fact it would even be causally identical, to some tolerance, to the symbolic operator, and that that operator could be learned from data in both supervised and unsupervised regimes in a kind of flexible way, um, with the potential for connection to human data. So that really changed my mind. And I think folks who are still kind of saying, oh, but the symbols, you know, haven't engaged deeply with the new literature on these models and how they can learn abstractions from data. And that's just so exciting. What a great moment to be in. Hmm.

Paul    00:53:23    Would you describe the learning of symbolic operators as, like, an emergent property of the large language models?

Mike    00:53:34    Yeah. Yeah. It's clear that there are some abstractions being learned, and we don't know, I certainly don't know, but I think nobody knows, exactly what the granularity of the abstractions is. Yeah. And in the largest models especially, because for one thing, they're commercial artifacts that we can't probe, right, right, and mess with in the ways that we want to. But even in smaller artifacts, we're just gradually getting the tool set necessary to understand what kinds of things they're representing. But it's clear they're representing something

Paul    00:54:03    We'll get to your child-with-a-hat-and-glasses task in a moment. Well, no, I'll just ask you now. So, uh, one of your favorite tasks, um, has to do with pragmatic inference. So what I was gonna ask you is: do large language models pragmatically infer? And maybe, if you could describe what pragmatic inference is beforehand, then I'll ask you your thoughts. So, you know, the larger question is: what are large language models missing, and are they missing pragmatic inference?

Mike    00:54:34    Yeah, it's a great question. So pragmatics is, broadly speaking, all the stuff about language that's not the literal meaning. It's about the contextual inferences about meaning. So anytime you're going beyond the literal meaning of an utterance to infer some kind of, uh, extra communicated meaning, you're doing a pragmatic inference. A classic example, and I should say also a controversial and, you know, highly debated example, is something like a scalar implicature. So if I say, some of the students passed the test, you might infer that some of them didn't. But, you know, it's logically consistent; I could say, some of the students passed the test, in fact, all of them did. Yeah. Or the other classic, which is not as highly debated, it's kind of clearly pragmatic, is when I write the letter of recommendation and I say, you know, dear sir, please accept my student; he has fantastic penmanship and often shows up on time. You infer there's nothing else good that I could have said about the poor guy. These are from Grice. Yeah.

Paul    00:55:30    Yeah. Oh, Grice, yeah. You cite Grice a lot. Uh, with the Mitch Hedberg joke: I used to do drugs. I still do, but I used to, too. So the reason that's funny is cuz we pragmatically infer that he no longer does. Would that count

Mike    00:55:47    <laugh>? Oh, absolutely. I love, you know, Mitch Hedberg is like the perfect comedian for this kind of stuff, right? Cause he's got these one-liners that are all about assuming you have an alternative in mind that you might not have had in mind. So, you know, I'm gonna mangle it, but my favorite Mitch Hedberg was like, um, they should have a sign that says, escalator now stairs. Thanks for the convenience,

Paul    00:56:12    Escalators cannot break. Yeah,  

Mike    00:56:14    Yeah. Right. So I wasn't thinking about that alternative, but now you've pointed it out to me, I'm reasoning about it. It's like, oh yeah, okay. Anyway, so, uh, yeah. So pragmatics is all of that going beyond the literal meaning to the communicated meaning. And there's been a lot of discussion about the extent to which various different model classes start to do that. Now, I remember, you know, just a funny story: I was at a conference at Google, I don't know, in like 2011 or 2012, something like this. Um, and I was really deep in this pragmatics stuff at the time, and I gave this talk about pragmatic inference, and I was talking to Peter Norvig afterwards, and I was saying, oh, doesn't Google need to do pragmatics? And he said, I think you're just considering other alternatives, and I think we do that. I can't really tell you about the details, but I'm pretty sure we do that <laugh>. Okay. And I was like, oh yeah, that makes sense. You are, you know, a search company considering, like, which searches go with which targets. So the kind of key principle of pragmatics is, like, what could somebody have said if they had wanted a particular thing? What are other things they could have said, and what goals would've been consistent with that? Mm-hmm.

Paul    00:57:23    <affirmative>.  

Mike    00:57:24    And so in some sense, a lot of different technologies, something as simple as TF-IDF information retrieval and, you know, many different search algorithms and so forth, will follow some of those basic principles. And then the question is, like, are they effectively doing that in communicative contexts in the way you would want? And, you know, large language models do get some pragmatic tasks, especially the bigger ones. So there are folks who have been doing the probing game and trying to figure out to what extent they can, you know, pass some of these tasks. And they do actually do a few of them fairly well. I don't think they're, um, you know, great, perfect conversational partners. But some of the dialogue training and stuff that folks have done on top of the chat agents actually does encourage a lot of what seems like inference about the motivations or goals or intentions of the human dialogue partner. And that's actually fundamental to pragmatics. So anyway, I used to think, again, this is a place where I was wrong, I used to think you had to build in the pragmatics thing to these models. And then when, you know, context-conditional agents like GPT-3 came along, I was like, oh, actually, they're kind of doing pragmatics in some sense. Maybe it's not the perfect sense, but it's kind of baked in and it's kind of emerging. And that's pretty exciting.
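
As a concrete anchor for the TF-IDF mention, here is a minimal hand-rolled sketch; the toy sentences are invented for illustration, and nothing here comes from Google's actual systems:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / length of d
    idf(t)   = log(N / number of documents containing t)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus: terms shared by every document get weight 0,
# while distinctive terms are up-weighted, the same "what else
# could have been said" flavor as pragmatic alternatives.
docs = [
    "my friend has glasses".split(),
    "my friend has a hat and glasses".split(),
    "my friend has a hat".split(),
]
for w in tf_idf(docs):
    print(w)
```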

Paul    00:58:40    I mean, does that tell us more about the impressiveness of the models or that pragmatics is a less difficult problem to solve?  

Mike    00:58:53    I think it maybe tells us that pragmatics might emerge, actually, as Grice hypothesized, from more general processes of, um, well, he would've said kind of collaborative action, maybe. You know, he was thinking of talking as a form of rational behavior, where somebody has a goal and is trying to accomplish it with language. And so when models are trained to complete that goal, or to, you know, help you achieve that goal in dialogue, maybe that's a fundamentally Gricean kind of training: to train on dialogue and kind of infer the goals of rational action. And so then, when pragmatics falls out, maybe what that tells you is, it's not that pragmatics is not impressive, but that you didn't need specific pragmatic machinery. What you needed is general abstractions related to goals and action and communication, or maybe just a huge pile of data about those things.

Paul    00:59:55    Yeah. Well, let's take a side street to the Rational Speech Act model, because you've mentioned Grice a few times. Is that what inspired the Rational Speech Act model, the Gricean approach? Because it can specifically address pragmatic inference, right? Because it's all about the probability of someone's intended meaning based on your own understanding and your understanding of their understanding, and then, recursively, what they are not meaning to say, you know, the prior over possible meanings of what they're saying. So that recursive process, is that kind of out of Grice and Bayesian approaches, or

Mike    01:00:35    Yeah, exactly. You know, we were trying, and this was with my collaborator Noah Goodman, we were trying to figure out what it might mean to infer somebody's intention in context. Like, what are they trying to talk about in context? This actually came out of earlier work, uh, where we were collaborating on a model of word learning, and we stuck in this node called the speaker's intended reference. Okay, like, what is the thing they're talking about? And that actually happened to play a big role in the success of our model, because, you know, kind of classic associative models of word learning thought that kind of everything was being talked about a little bit all the time. That's how the associations are formed: everything's kind of connected a little bit to everything. But that's not true in a social perspective. Actually, people are only talking about one thing at a time, or maybe two, or maybe none.

Mike    01:01:18    Yeah. But there is a particular thing they're typically talking about. So we put in this node called intention, and we instantiated it in a pretty <laugh> stripped-down, not very social way in that model. And then we started talking about it: could we flesh this thing out? Could we make it a little bit more interesting in the next version of the model? And of course, that then spawned this whole research program on Rational Speech Act models. Oh. Where we lost track of the connection to word learning for maybe six or eight or ten years, and we were just trying to figure out what it might mean to infer the intention in context based on the language that was used. And so what we fundamentally did was just try to write down Grice's idea about intentional inference in a Bayesian framework. So, uh, the listener's thinking, what was the speaker's goal, given that they said this utterance? And then the speaker's thinking, how would I get the listener to figure out my goal, which utterance should I choose? Mm-hmm. <affirmative>. And of course that's a recursive formulation: the listener thinking about the speaker, the speaker thinking about the listener. And for us, we then tried to ground that out in what we call the literal listener, which is a kind of notional agent that just reasons about the meanings of words and doesn't think pragmatically. So the listener thinks about the speaker, who's thinking about a kind of dumber listener. And so the regress isn't infinite; it kind of grounds out after a couple levels,
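
For reference, the standard written-down form of this recursion (as in the Rational Speech Act literature, e.g., Frank and Goodman, 2012; the rationality parameter \alpha and the cost term are conventional, and utterance costs are often set to zero in simple reference games):

```latex
% RSA recursion: literal listener L_0, pragmatic speaker S_1, pragmatic listener L_1.
% [[u]](m) is 1 if utterance u is literally true of meaning m, else 0.
L_0(m \mid u) \propto [\![u]\!](m) \cdot P(m) \\
S_1(u \mid m) \propto \exp\!\big(\alpha \,(\log L_0(m \mid u) - \mathrm{cost}(u))\big) \\
L_1(m \mid u) \propto S_1(u \mid m) \cdot P(m)
```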

Paul    01:02:42    Meaning that, that the dumber listener, um, knows that there, that there are only two meanings of the word bat and considers them with some probability each or something.  

Mike    01:02:51    Exactly. Yeah. So, um, if you take this hat-and-glasses example, this is one of our fun ones that we used in a lot of different studies with kids. There's a face with a hat and a face with a hat and glasses, and you say, my friend has glasses. That means the one with glasses and not a hat. And so the listener,

Paul    01:03:11    That's the most, uh, chosen answer, right? That people choose. It's not the person with glasses and a hat; it's the person just with glasses, if you say, my friend has glasses, even though they both have glasses. Sorry, I just wanted to spell it out exactly.

Mike    01:03:25    Yeah, yeah. Thanks, thanks for, uh, making the example explicit. And what we find is that adults will choose the one with just glasses and not a hat maybe, I dunno, 75% or 80% of the time. And actually, so will three-year-olds, and in some studies even older two-year-olds will be able to do this kind of very simple pragmatics, suggesting that this is really something that is not too hard developmentally; even though this is kind of an unfamiliar new inference, at least, you know, in a new context, kids are still able to get this. Um, so in that kind of example, what RSA says is: the listener hears glasses and thinks about glasses with respect to the two potential speaker goals, which of the two faces are we talking about, and thinks, oh, well, if the speaker had meant the guy with the hat and glasses, they could have said hat, and that would've been perfect. But they didn't have access to that if it was just the guy with only glasses, and so they could say glasses. Now, in the real world they could also say only glasses, or the guy on the left, and so forth and so on; there's lots of different possible utterances. And so, yeah, in those early experiments, what we did is really constrain the world and constrain the set of things that a speaker could say, um, and constrain the listener's interpretations and so forth, to try to get a very simple reference game that we could make quantitative predictions about.
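
Here is a minimal sketch of that constrained reference game in code, using the recursion above. The uniform prior and the rationality setting alpha = 1 are our simplifying assumptions for illustration, not values fit to the experiments:

```python
import numpy as np

# Two referents and two utterances from the hat-and-glasses game.
referents = ["glasses-only", "hat+glasses"]
utterances = ["glasses", "hat"]

# Literal semantics: truth[u][r] = 1 if utterance u is true of referent r.
truth = np.array([
    [1.0, 1.0],  # "glasses" is true of both faces
    [0.0, 1.0],  # "hat" is true only of the hat-and-glasses face
])

prior = np.array([0.5, 0.5])   # assumed uniform prior over referents
alpha = 1.0                    # assumed speaker rationality

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

# L0(r | u): literal listener just renormalizes the truth conditions.
L0 = normalize(truth * prior, axis=1)

# S1(u | r): speaker soft-maximizes the chance of being understood.
S1 = normalize(np.exp(alpha * np.log(L0 + 1e-12)).T, axis=1)

# L1(r | u): pragmatic listener inverts the speaker via Bayes' rule.
L1 = normalize(S1.T * prior, axis=1)

print("L1('glasses'):", dict(zip(referents, L1[0].round(3))))
# "glasses" now favors the glasses-only face (0.75 under these settings,
# in the same range as the adult choice rates quoted above), because a
# speaker who meant the other face could have said the more informative "hat".
```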

Paul    01:04:52    Do LLMs answer that correctly? Have you tested it? You'd have to input an image, right? Uh, assuming, I, I can't keep up with, like, all the different models and what you can input and stuff. So apologies that I don't already know <laugh>.

Mike    01:05:08    No, no worries. Yeah, I don't know that anybody's done a multimodal version of this. Okay. There are some evaluations on some simple implicature tasks, and I do think that the more sophisticated models do okay at them. I don't know if anybody's done that exact one; I'd have to look. Um, but yeah, in general, I'm not totally up on this, but there have been some notable successes lately.

Paul    01:05:33    What, what, what’s your guess on, you know, so would it, would it easily pass?  

Mike    01:05:38    Yeah, I mean, I think this is not a hard task, and that, okay, you know, um, yeah, I would be surprised if GPT-4 were not able to do this task.

Paul    01:05:50    What would you have said four years ago, three years ago? Was that a different Mike?

Mike    01:05:56    GPT-3 already made me think that there were a bunch of possibilities for pragmatic inference baked in. Okay. So as soon as we started to see, like, kind of really task-conditional behaviors in prompting. I remember, you know, walking the Dish at Stanford with Noah, and him saying, you know, it's just all baked in there. Like, try these things. And we played with them. There was, you know, um, an early demo copy, and we're like, oh yeah, okay, these models really are sophisticated. Yeah. Yeah. Because task inference, like the thing that you're doing when you're trying to complete a prompt, that's really very pragmatic. You're like, what does this person want? What are they trying to communicate to me about the underlying task and the kinds of completions they want? So, um, the same abstractions that a model has to learn to do that task well, about goals, you know, from the prompt that we see to a kind of abstract set of goals that the user has, that's very much a pragmatic inference, that reverse inference about goals. And so I think, you know, I would've said no in, like, um, yeah, 2018, 2019 probably. And then after that my uncertainty started to go up, until it's pretty much a yes.

Paul    01:07:12    Have you, you know, I think a lot of, you know, people, what's the term? Like, everything is AI until it becomes solved, and then it's not AI anymore. There's some term for it. Like, it's a moving goalpost; anyway, it's a paradox. But do you think there's a lot of, like, people, you know, saying, well, it can't do this, so, uh, you know, it's no good. And then they build a bigger model and it can do it, and then you have to, like, search for the next thing it can't do. Is that a wise game to play?

Mike    01:07:43    Well, we have to have some kind of benchmarks and understanding of what capacities we care about. And I'm not so much into the line drawing. Mm-hmm. <affirmative>. You know, there's this old idea of the Rubicon, right? Like, um, there's like 19th-century philologists saying, you know, language is the Rubicon between man and beast, or something <laugh>. Uh, and I think that's not a great game to play. Like, we've seen that play out in the comparative literature, where, you know, somebody says, okay, language is that boundary, and then you get a dog that knows a hundred words, and, like, well, okay, but a hundred words doesn't count. So, like, somebody trains their dog to do a thousand words <laugh>. And then does that count? Like, well, okay, there are a lot of self-respecting three-year-olds that maybe are only around that level. So I think it's very tricky to draw the lines, but it's very useful to have some kinds of benchmarks, tasks and tests that you think really do reliably provide evidence of a particular capacity. And so triangulating, you know, what sorts of representations a model appears to have access to is really very interesting.

Paul    01:08:57    That's why I always draw my lines in sand at low tide near the water, you know; it's just really easy to start over. But I brought us away from the Rational Speech Act model, and I want to come back to it, because I want to try to think about how to think about that model with respect to large language models. Because RSA is like a normative model of how we might implement these things. Um, and it's a theory-driven model, whereas a large language model is this kind of associative, connectionist, somewhat theory-free, although you could say that there's theory in the architecture, et cetera, but almost brute-force-driven thing. How do you think about something like the RSA model that you continue to work on versus these large language models?

Mike    01:09:47    Well, so maybe one way to start is to think about, you know, what the goal of a model is in a science, and there's lots of really interesting philosophy of science about this. But you might think about a cognitive model as an artifact that you put in some relation to a target system, and then you reason about the relationship between the artifact and the target system, in part by kind of encoding assumptions about the target system in the design of the artifact, and in part by then having access to a set of strategies where you could probe the behavior of the artifact and look at, you know, how that might reflect on certain aspects of its performance. So then you could think, okay, is a large language model a useful scientific model? Well, um, GPT-4 isn't, because we can't poke it, um, in a lot of the ways we want to. We don't know what went into it.

Mike    01:10:46    We don't know its architecture. So there are a lot of basic guarantees that are not being met by some commercially available large language models. So, okay, how about Alpaca or, you know, LLaMA or whatever; this is getting closer. We could start to think about using that as a cognitive model for some tasks. And that's, I think, pretty cool and interesting. And we gain access to certain kinds of evaluations that we wouldn't have otherwise, by just the sheer capacities of the models. Mm-hmm. <affirmative>. Which is cool, right? So they're useful for some things, but we also really don't understand, fundamentally, a lot of things about the training data, about the representations and so forth. So a model like RSA is probably the extreme end of the simple-cognitive-model side, right? This is written down in a couple of equations. Um, early versions we played with really mostly on paper and pencil.

Mike    01:11:36    We coded them up in different ways. Eventually it became important to code them up in, you know, larger and more kind of data-connectable ways. But, um, that was a lot of the fun of that development process for me: you could kind of math out what should happen in particular situations. And so there's a huge amount of transparency about the assumptions, and lots of ability to extend and modify and compose with other kinds of reasoning models or other sources of information or other reasoning tasks. So there's a lot of flexibility in the kinds of relations you can learn between that teeny tiny artifact and the cognitive system of interest. So lots of benefits there, in other words. But there are also real, major costs, because the models aren't that capable. You have to constrict the language and the world and encode them in very specific ways in order to get anything out.

Paul    01:12:34    Yeah.  

Mike    01:12:35    So I guess I see them both as different cognitive modeling strategies, with scale being kind of a key variable that controls what you can do. Last thought on this is that one reason RSA has been useful is in part because of this small scale, which has allowed it to be adopted by many groups that are working on these phenomena. And some of those groups really do the thing that I was talking about, paper-and-pencil-wise, and others connect it to larger reasoning issues or even to NLP systems and incorporate those insights. So, you know, that kind of simple modularity has been, to the extent that RSA has been successful, part of the success. Um, we'll see. You know, with LLMs, it's more the power of the general framework that makes it intuitively appealing to try to use them as cognitive models. But as we're seeing, there are a lot of real scientific challenges that come with it, around reproducibility and replicability, around just the scale of resources necessary for training from scratch, which is critical. Yeah. Yeah. Um, and even around what the right probes are to better understand what's going on with the models, because that's not totally clear yet.

Paul    01:13:49    Well, so you were on, um, a recent review. You had mentioned the ventral visual stream and how far we've come in understanding it, and, you know, convolutional neural networks have been a large part of that. And you're on a recent paper, um, from Dan Yamins' group, I believe, in PNAS; you know about that. Um, I believe you're on it. Am I right? Yeah. Um, and so, you mentioned Ev Fedorenko; she's also been on the podcast, and we talked about her results and others that have correlated the predictive aspect of large language models to the predictive, uh, next-word prediction of our brains. So first of all, just before, cuz I want to ask you about prediction writ large, prediction as an explanatory goal of science. But what do you think of that work? Um, you know, comparing, as we were comparing RSA to these large language models, and we don't have access, and everything that you said about the large language models: what do you think of that avenue of research that kind of parallels the ventral-visual-stream-and-convolutional-neural-networks work, but relating our language brain regions and EEG data and so on to large language models?

Mike    01:15:05    Yeah. So first I should just say, about this piece of work that you mentioned: this is Chengxu Zhuang, and he was really interested in unsupervised models of, right, right, visual learning. Yeah. I was involved there because I was helping them work with and think about the developmental data they used in that paper. Cool. The SAYCam corpus: this is a corpus of head-mounted videos that we, um, uh, collected from young children. So the upshot of that paper is that you can get some pretty good neural predictivity from unsupervised visual learning models. And, uh, for me, a real strength of the paper, you know, irrespective of anything that I might have contributed there, and I certainly didn't contribute to this, was the systematicity with which Chengxu had gone through this literature. So it's not just, oh, my model wins at neural prediction, but rather, um, unsupervised models that learn in a variety of different related but distinct ways, through prediction and, you know, recoloring and infilling and so forth, um, all seem to relate to brain data in really interesting ways.

Mike    01:16:14    So there's something about this general task. So as a scientific strategy, then, it was to understand commonalities and differences between models and use that as a basis for inference. And I think that's very exciting. So brain prediction is one way to evaluate models. And if you can then look at differences or similarities in prediction accuracy on brain data, or for that matter on behavioral data, which he did a little bit of in that paper, mm-hmm <affirmative>, then you have a lever with which to understand how differences between models, differences between artifacts and their assumptions and their architectures, might relate to, uh, you know, human cognition and behavior. So I guess I see, you know, the most stripped-down version of the prediction task is just: hey, here's a really high-dimensional data structure that I can compare to and use as a benchmark or outcome, one that goes beyond performance benchmarks, like achieving an NLP task, and goes beyond the low-dimensional behavioral outcome predictions, which often, from cognitive science, don't have enough data to really differentiate models. There are some counterexamples to that. But so, that's the kind of stripped-down version: Brain-Score is amazing as a framework, Brain-Score being this comparison of models to brain data and scoring them on their fit to that brain data. It's amazing in part just cuz you're bringing really high-dimensional data to bear on understanding which representational assumptions of the models matter.
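
To make the stripped-down version concrete, here is a sketch of the usual neural-predictivity recipe with random stand-in data. Ridge regression is one common choice of mapping from model features to recorded responses, not necessarily what any particular paper used:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins: model activations (n_stimuli x n_features) and neural
# recordings (n_stimuli x n_neurons) for the same stimuli.
n_stimuli, n_features, n_neurons = 200, 512, 50
features = rng.standard_normal((n_stimuli, n_features))
# Fake "brain" data that is partly a noisy linear readout of the features.
readout = rng.standard_normal((n_features, n_neurons))
responses = features @ readout + rng.standard_normal((n_stimuli, n_neurons))

X_train, X_test, y_train, y_test = train_test_split(
    features, responses, test_size=0.25, random_state=0
)

# Fit a regularized linear map from model features to each recorded unit.
mapping = Ridge(alpha=1.0).fit(X_train, y_train)
pred = mapping.predict(X_test)

# Predictivity: per-unit Pearson correlation on held-out stimuli,
# summarized by the median across units. Comparing this number across
# model architectures is the lever described above.
r = [np.corrcoef(pred[:, i], y_test[:, i])[0, 1] for i in range(n_neurons)]
print(f"median held-out correlation: {np.median(r):.2f}")
```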

Paul    01:17:44    One of the reasons: you've been on my list for a long time. I have a really long list of people that I wanna invite on the podcast, and you've been there a long time. But I happened to see, um, I had Jeff Bowers on recently, Jeffrey Bowers, and, you know, he argued that we need to start testing large language models with psychological tests, as we do in psychology, as you do in psychology. And you had a similar sort of, I guess, tweet thread along the same lines. And, um, this was related to the usefulness of using prediction as a proxy for comparing things, and for explanation. Um, and one of the things that you warned against, I suppose, is that there's a problem with saying that this particular large language model has this particular cognitive function, that it can relationally reason or something like that. Um, so what's the problem with saying that a large language model has a particular cognitive function?

Mike    01:18:53    Well, so from a measurement perspective, I think all of that tweeting that I was doing was really subtweeting people who are screenshotting language models and saying, look, the model does X, or doesn't do X. Okay. Which is, you know, the equivalent of the cognitive scientist taking a video of their kid at the dinner table and saying, oh yeah, my kid does relational reasoning, right? Look at that. And an anecdote is not a dataset. So I was suggesting that the minimal thing that you might wanna do in evaluation is follow some psychology best practices, like having a control condition that is matched in all but the key way to the experimental condition, where you actually evaluate a particular test. You also, of course, wanna look for data contamination, to make sure that your test hasn't been used before in a very similar or even identical form in the literature, so that the language model hasn't been trained on it.

Mike    01:19:48    Uh, you might want to use, um, multiple different items, so the surface form of the item that you're using doesn't contaminate the task, cuz you might see success in a reasoning task that's instantiated in a particular kind of common item and then a failure in a more general item or a decontextualized version of the reasoning task. So I was really kind of ranting on the topic of experimental methods, because that's what I teach and am passionate about. And so when you get these screenshots of, like, model can't do causal reasoning, or can do theory of mind, or whatever, the truth is often more complicated, because these models may vary widely in their behavior based on how you're asking the question, what the materials are, and so forth. Just like kids, as it turns out.
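
As a sketch of that methodological point, an evaluation harness might look something like the following, where query_model is a hypothetical stand-in for whatever model API is under test, and the items are invented examples:

```python
def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in a real API client here."""
    raise NotImplementedError

# Each item pairs experimental prompts with a control prompt matched in all
# but the key manipulation, plus multiple surface forms so that one familiar
# phrasing can't carry the result. Items are invented for illustration.
items = [
    {
        "surface_forms": [
            "Some of the students passed the test. Did all of them pass?",
            "Some of the runners finished the race. Did all of them finish?",
        ],
        "experimental_answer": "no",   # the scalar-implicature reading
        "control": "All of the students passed the test. Did all of them pass?",
        "control_answer": "yes",       # literal reading, matched wording
    },
]

def evaluate(items, n_repeats=5):
    exp_hits = exp_n = ctrl_hits = ctrl_n = 0
    for item in items:
        for form in item["surface_forms"]:
            for _ in range(n_repeats):  # sample repeatedly; don't screenshot once
                exp_hits += query_model(form).lower().startswith(
                    item["experimental_answer"])
                exp_n += 1
        for _ in range(n_repeats):
            ctrl_hits += query_model(item["control"]).lower().startswith(
                item["control_answer"])
            ctrl_n += 1
    # Report the contrast between conditions; experimental accuracy alone
    # does not license a capacity claim.
    return exp_hits / exp_n, ctrl_hits / ctrl_n
```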

Paul    01:20:33    Meanwhile, you're drawing cartoons of all of the psychological approaches along the tweet thread. So I'll link to that tweet thread in the show notes so people can read the full thing and visualize what you were just saying. Um, I know that we only have a few more minutes. What's more fun: building data repositories and doing open science, ensuring that your data is structured well so that it can be shared and so that we can all be good open science practitioners, or not worrying about that, asking the questions that you want to ask, and doing perhaps non-reproducible science?

Mike    01:21:15    What's more fun? Well, there's a <laugh>, what's more fun? I mean, there's a version of open science that I worry is, like, the open science that everybody is pushing for, which is fundamentally boring. It's compliance-based open science. It's, yeah, okay, at the end of your research project, you gotta fill out all these forms, document your thing, you know, put the link there, upload the thing here. And that is kind of boring. I mean, it's really important, it's critical, don't get me wrong, but it's not fun. Compliance is not the way I wanna spend my time, whether I'm monitoring compliance or doing compliance work on my own science. Yeah. But the alternative version is really, like, I take my research program to be an argument for the idea that if we follow open science principles, not in this compliance-based sense, but in the broader spirit of them, which is to collaborate, to make open, to make accessible our research, we actually enable new discoveries that are more interesting.

Mike    01:22:13    And that's what's fun: when we get data from labs around the world, all working together, sharing their materials, their code, their data, openly, in service of a particular scientific goal, like the ManyBabies Consortium, which is, uh, this collaborative network. When we do that, we have a pre-registered hypothesis. We've thought through what we wanna do. You know, hundreds of people around the world have looked at our stimuli and often found them wanting in different ways, right? Uh, then we see something fundamentally new and exciting. Mm-hmm. So it's that power of openness to fuel collaboration, to fuel the pooling of effort, and to allow us to go beyond what we were doing before, not just to make what we were doing before a little more solid and a little bit more compliant. So that, I think, is fundamentally fun. The fun I have is, it's like the kid in the candy store, or, you know, Christmas morning: it's opening up the data set that wouldn't have been possible without those principles of openness and sharing, and the analysis that wouldn't be possible if we weren't reproducing what others had done and building on it, code-wise and data-wise.

Paul    01:23:16    Okay. Okay. I promised I would let you go right at the hour, and I've taken us to the very last minute and a few seconds. So, Mike, we could have talked a lot more about a lot more things, but I really appreciate your time, and I know you're a busy fellow, so thanks for your time.

Mike    01:23:29    Oh, thank you so much. This is a pleasure.  

Paul    01:23:46    I alone produce Brain Inspired. If you value this podcast, consider supporting it through Patreon to access full versions of all the episodes and to join our Discord community. Or if you wanna learn more about the intersection of neuroscience and AI, consider signing up for my online course, Neuro-AI: The Quest to Explain Intelligence. Go to braininspired.co to learn more. To get in touch with me, email paul@braininspired.co. You're hearing music by The New Year. Find them at thenewyear.net. Thank you. Thank you for your support. See you next time.