Brain Inspired
BI 139 Marc Howard: Compressed Time and Memory

Check out my short video series about what’s missing in AI and Neuroscience.

Support the show to get full episodes and join the Discord community.

Marc Howard runs his Theoretical Cognitive Neuroscience Lab at Boston University, where he develops mathematical models of cognition, constrained by psychological and neural data. In this episode, we discuss the idea that a Laplace transform and its inverse may serve as a unified framework for memory. In short, our memories are compressed on a continuous log-scale: as memories get older, their representations “spread out” in time. It turns out this kind of representation seems ubiquitous in the brain and across cognitive functions, suggesting it is likely a canonical computation our brains use to represent a wide variety of cognitive functions. We also discuss some of the ways Marc is incorporating this mathematical operation in deep learning nets to improve their ability to handle information at different time scales.

0:00 – Intro
4:57 – Main idea: Laplace transforms
12:00 – Time cells
20:08 – Laplace, compression, and time cells
25:34 – Everywhere in the brain
29:28 – Episodic memory
35:11 – Randy Gallistel’s memory idea
40:37 – Adding Laplace to deep nets
48:04 – Reinforcement learning
1:00:52 – Brad Wyble Q: What gets filtered out?
1:05:38 – Replay and complementary learning systems
1:11:52 – Howard Goldowsky Q: Gyorgy Buzsaki
1:15:10 – Obstacles

Transcript

Marc    00:00:03    If something interesting happens, uh, like if I clap, um, it doesn’t leave our experience immediately. It lingers for a little bit. If things were able to change more quickly, I mean, there’s, there’s look, if this is really a theory of the brain, right? There’s thousands of experiments that need to be done. Listeners might reach their own conclusions as to why the brain and the mind choose to do this. But there’s no question that it does in many, many, many circumstances. So my answer would be,  

Speaker 0    00:00:43    This is brain inspired.  

Paul    00:00:56    I’m Paul. Hello? Can you sense that word, hello, echoing into your past? If you could listen in to a certain set of your neurons called time cells, it would sound something like: hello? Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. As the word recedes into your memory. That’s my poor attempt to mimic an exponential function of hellos. And that kind of representation is the focus of Marc Howard’s research. Marc runs his Theoretical Cognitive Neuroscience Lab at Boston University, and over the past decade or so he and his colleagues have produced a little explosion of work based on this idea. The idea is that we have lots of overlapping populations of neurons that represent time and events and memory on a compressed logarithmic scale, which is a handy way to represent things over a wide range of time scales. Marc has been mapping psychological and neural data to a mathematical function called a Laplace transform, which Marc argues is a fantastic way to represent not just memory, but all sorts of cognitive functions, like our spatial cognitive maps and more, in part because the Laplace transform is pretty straightforward to implement in populations of neurons and because it’s flexible and general purpose.  

Paul    00:02:16    And as you might expect, populations of neurons in our hippocampus, an area classically important for memory and spatial navigation, cognitive maps, those neurons map onto this mathematical function quite well. But its signature has been found now in many parts of the brain, which would make sense if it’s a fundamental computation our brains use. And if it’s a fundamental computation our brains use, why not implement it in deep learning networks, which is what Marc is also doing. So we discuss many topics surrounding these ideas. One quick note as well: throughout the episode, you’ll hear me use the term Laplacian, which is a mistake due to my naivete. Uh, Marc later told me that the correct term is always Laplace when talking about the transform, and Laplacian refers to heat equations, and Marc was too polite to correct me all those times that I used the term.  

Paul    00:03:05    So don’t make the same mistake I did over and over. And Marc takes a couple guest questions, one from Brad Wyble, who’s been on the podcast, and one from a Patreon supporter, Howard. Oh yeah, Patreon. If you value what you’re hearing, that’s a great way to support Brain Inspired and get your questions asked on the show, among other things. My deep gratitude to you, Patreon supporters. Or you might consider taking my online Neuro-AI course, which dives deeper into the conceptual foundations of many of the topics that we discuss here on the podcast. You can learn more at braininspired.co, where you can also find the show notes at braininspired.co/podcast/139. All right, enjoy Marc Howard. Marc, I wanna thank you. Um, I’m gonna read, uh, the first sentence from one of your papers because it gave me a delightful, uh, moment and it has nothing to do with what we’re going to talk about today. <laugh> But this is, it’s actually, <laugh> the, the intro sentence to one of your chapters starts: “Human babies, while adorable, are remarkably incompetent.” And then you go on to do, do sciencey things in the paper, but uh <laugh> yeah, I just, it made me laugh at the time. It’s making me laugh now. So I appreciate that. I don’t, I don’t know. Was that an intentional, um, put-a-smile-on-their-face sort of intro?  

Marc    00:04:24    Yeah, I, I think, um, yeah, no, I, I try to put in little subtle jokes. That one was a little less subtle than my other jokes, which usually, uh, acquire no experience, but yeah, it’s um, and it really struck me having kids actually how they know nothing. Right. Um, everything, everything they have to learn, they have to learn how to stick their hand into their mouth. Right. They can’t do that initially. So, uh, memory and learning are really important, I guess one might say,  

Paul    00:04:52    Well anyway, thank, thank you for that. It really made me smile a week ago or so, so, uh, there’s a lot to talk about here. I guess I can start with, it’s not the same. It doesn’t have the same ring as turtles, but I suppose it’s Laplace transforms and Laplace inverse transforms all the way down in cognition. Is that right?  

Marc    00:05:13    Yeah. I, I, here’s how I would, uh, yeah, that’s the, that’s the hypothesis I’m trying to argue for. So I, I think, um, if I were to summarize, uh, my large scale position on cognition right now, I would say that I think the, the functional unit of the brain is not the neuron, uh, nor the synapse, um, it’s, um, populations of many, many neurons that cooperate to represent quantities, and then operations, data-independent operations, on those quantities to enable us to think. And, um, and you know, we’re, we’re, we’ve now observed, um, and you know, we should probably go into the weeds a little bit about this at some point, uh, we’ve now observed, uh, Laplace transform and inverse pairs, and we’ll talk about that more, uh, for time and space, and, uh, I don’t see any reason why one couldn’t reuse that same, uh, sort of computational language for many, many, uh, different, uh, quantities. So yeah, I, I, I, uh, I wouldn’t say it’s turtles all the way down, but there’s a lot of turtles. Uh <laugh>  

Paul    00:06:22    So there are a lot of turtles. That’s true. That’s true. Maybe we can back up then. And when I said Laplace, you know, Laplace and inverse Laplace, uh, that immediately loses people of course, but, uh, may, maybe you can kind of introduce in words what the main idea is. And I also would love to hear how it originated, how, how, you know, the Laplace transform and inverse came to be the thing that, that you used the mathematical equations to then search for et cetera, in, in neural activity.  

Marc    00:06:50    Yeah, sure. So, um, so I, I was, um, uh, I’m, I think one of the reasons that I have a relatively unique perspective on AI and neuroscience is that my background was really in cognitive psychology. My PhD, we learned a list of words and remembered things. Um, and I had been working for a very long time on, um, uh, a set of equations to describe behavior in, in recall tasks, uh, called the temporal context model. Um, and I got to a point where I just was convinced the equations were just wrong, right? That like it wasn’t working. We were trying to extend it to, it was working really well for what we designed it for, but I was trying to extend it as, you know, a theory of a whole bunch of stuff. And it was just failing in a bunch of different places. Um, and I came into work one day and, um, uh, my, my, uh, postdoc at the time, Karthik Shankar, uh, was there and I was like, we gotta throw this whole thing out.  

Marc    00:07:40    Um, and I realized that all of the problems we were having had a common solution, um, and the solution was as follows. If, um, if the brain, and this turns out not to be a unique, uh, it was, it seemed really new to me, but, uh, uh, it was not actually a unique insight, that if the brain had access to a representation of the past that was ordered like a timeline, uh, like a musical score, uh, you know, with the present, uh, you know, anchoring one end, uh, and then the past trailing behind. Uh, and if you could record what happened when, um, as a function over that, uh, recent past, and if you could arrange for that function to be, uh, compressed in a particular way, if it could be log compressed, like, uh, the Weber-Fechner law, and perhaps we can talk about that later, um, that you could then make sense of a whole bunch of problems that we were stuck with.  

Paul    00:08:34    Right. But you knew, you knew like that, you just said that it, you know, it wasn’t new, it was new to you, but you knew about the Weber-Fechner law and you knew about logarithmic compression, right? Ideas in psychology. And I think even neuroscience at the time probably.  

Marc    00:08:48    Oh yeah. Yeah. I was just, I just hadn’t been thinking about any of those in my day to day work, right? Yeah. So there’s a, there’s a thing like, you know, uh, you know, being a working scientist, you have a set of problems you’re working on and you know, that, that, like, I didn’t need any of those tools to solve the problems I had been working on up to that point. So I hadn’t been thinking about them. Um, and so anyway, so I, I was, uh, I came into work and I was like, um, uh, I, you know, I was like, this is, this thing we have is bad. We need this new thing. And I drew a bunch of pictures and I spent a while trying to explain to, um, Karthik what I was trying to get at. Um, you know, and I drew pictures of, you know, what I understood the timeline to be.  

Marc    00:09:24    And I said something like, um, you know, we need to remember the past as a function. And he said, oh, I know how to do that. Um, Laplace transform. And so, and, uh, and so I went, what? And, you know, he explained it nice and slow to me over and over again for like, you know, about an hour. And I went, oh yeah, that’s a great idea. Um, and then, you know, we, we, um, worked things out pretty good, mostly Karthik. And, um, and then we spent the next 10 years, uh, me and many other people spent the next 10 years sort of evaluating the hypothesis in the brain, and we’ve extended from time to other variables. And it’s just been incredibly fruitful.  

Paul    00:10:02    Well, but you started with time and, and we’ll go into time more, but, but was it immediately clear that it, that it would extend to other cognitive functions and variables? And  

Marc    00:10:12    I was, I was pretty sure about that actually. The very first talk, one of the very first talks I went to, um, as a graduate student, was Matt Wilson talking about place cells. Um, and at the time I was working on, um, you know, temporal context model had, um, was basically, the temporal context model basically is using, uh, inputs with a leaky integration to add up, to make some, uh, slowly changing temporal context to remember the past mm-hmm <affirmative>. Um, and, you know, I, I wasn’t a great physics student, but I knew that the first integral of velocity is position and I knew I was working on an integrator. And so that was one of the first things that occurred to me, is that, um, you know, these place cells in the hippocampus, which are supposed to be important for episodic memory, uh, time and space ought to have some sort of similar basis. So, yeah, I, I’ve been all in on that since, um, like first week of graduate school.  

Paul    00:11:02    Okay. But yeah, but at the time you were, so when, when Laplace was said, uh, you guys were working on memory,  

Marc    00:11:09    We had already actually tried to. So even in the context of the temporal context model, um, you know, which ends up having problems, we had taken really seriously that there ought to be some analogy between space and time. We did like a 2005 paper with Mike Hasselmo, um, where we described everything that was known kind of about, uh, firing of cells in the entorhinal cortex up to 2005. This was also like one year before the Mosers found grid cells. So <laugh>, it became a lot less, uh, you know, comprehensive in like six months, but at the time it was, uh, it was pretty nice. Um, so yeah, we’ve, I’ve been thinking about time and space as really closely interrelated the whole time. Yeah.  

Paul    00:11:49    Um, you and Einstein.  

Marc    00:11:50    The, the data’s borne us out. Uh, yeah. Yeah. Well, and Randy Gallistel and many other people, sure. And Howard Eichenbaum and, uh, Endel Tulving, and, uh, you know, I’m sure I’m leaving out a bunch of people.  

Paul    00:12:01    Yeah. Well, let, let’s go. I don’t know if we’re at the right place to do it, but do you want to go over the, the kind of big idea of what a Laplace transform is and, you know, get into the weeds as much as you need, but, um, and, and why, you know, this is a good model for representing functions in the brain and specifically for memory at this point.  

Marc    00:12:20    Yeah. Well said. For time here, let’s, let’s talk for a minute about what the Laplace transform is not. And let’s talk about time. Uh, let’s talk about time cells. I assume your viewer, your, your listeners don’t really know that story. Good.  

Paul    00:12:33    I don’t think we I’ve, I don’t think we’ve, I’ve ever talked about, uh, time cells on the podcast, so this would be a great place to start.  

Marc    00:12:38    Okay. Yeah. So, yeah, let’s start. So I, I suggest that we start with that and then, uh, we’ll get to, uh, the Laplace bit. It turns out that Laplace and time cells are really, really closely mathematically related to one another and we’ll get to that. So if something interesting happens, uh, like if I clap, um, it doesn’t leave our experience immediately. It lingers for a little bit. Um, and in, uh, rodent experiments and monkey experiments and, uh, recently human experiments, uh, we find also that events that happen in time, uh, don’t immediately disappear. Uh, what they do is they create a sort of a sequence of neural states, uh, and in particular there are these cells that are now referred to as time cells, uh, that fire sequentially in the time after some triggering event. Right. And, oh, different stimuli, if I chirp instead of clap, um, there’s different, there’s distinguishable sequences that are triggered, right?  

Marc    00:13:37    So if I’m five seconds after something happened, I have some set of cells that lets me decode that that thing was five seconds in the past and which thing it was and how far in the past it was, because there’s a sequence, there’ll be different cells before, there’ll be different cells after, and because different, uh, stimuli, different events, trigger different sequences, I can tell what happened when in the past. And so those are time cells. Um, and we’ve now seen those in lots and lots and lots of, uh, brain regions. “We” here meaning, uh, lots of people, um, uh, uh, in many different labs in many different species. Uh, for sure hippocampus, for sure, uh, medial prefrontal cortex, uh, for sure, uh, lateral prefrontal cortex, for sure many different regions in the striatum. It seems like this is a major thing the brain is doing as we go through our lives, is remember, uh, the recent past.  

Marc    00:14:23    Okay. So the question is, how does that come about? The first thing you might think of, and the first thing we thought of, uh, was that the time cells, because they fire in sequence, perhaps they should chain one to the other, right, in order to, uh, build out a sequence. So the time cell for one second should connect to the time cell for two seconds and three seconds and so on. Um, and if you take the equations seriously, it turns out to be really hard to get, um, uh, to get sequences of the right mathematical form, right. Uh, and there’s this, uh, there’s this requirement that, uh, in order to get the equations to work, in order to get the behavioral models to work, uh, you need to obey the Weber-Fechner law. You need to have a log scale. And for technical reasons, it turns out to be extremely difficult to get a reasonable neural network to, to do that, right, just by connecting the neurons one to the other.  

Marc    00:15:14    Um, and so Karthik, uh, Karthik’s, uh, argument was that, well, let’s not do that. Let’s just, uh, compute the Laplace transform of the function of what happened when, and then we could invert it. And so the Laplace transform, uh, as a neural network is trivially easy. So, um, I’ve, I’ve heard you talk about RNNs on this, uh, uh, podcast a couple times. So I, presumably I can talk about an RNN. Yep. As an RNN, the Laplace transform is just a diagonal, uh, matrix. Right. Um, it is the simplest possible RNN, and, uh, basically the, the numbers on the diagonal control, um, the rate at which things change, right. Um, you can think of like a, a little recurrent network. Um, and the idea is that because you have a bunch of different rates, uh, corresponding to a bunch of different times, you’re sort of tracing out a line that maps onto the what-cross-when thing, uh, that you want, the, the timeline that you want, uh, to come out and that time cells appear to be representing.  

Marc    00:16:13    Um, and in that diagonal form, um, you have, just choosing the spacing of the time cells, choosing the rate at which the sequence unfolds, just amounts to choosing the numbers along the diagonal. Right. Okay. There turns out to be a right way to do that. And so it’s an extraordinarily simple RNN, uh, that is really, really easy to compute. It’s just a diagonal matrix, right, with, uh, some time constants along the diagonal. And then after that, you can, um, attempt to, uh, invert the Laplace transform and you get out sequences of, uh, time cells or, or whatever you want. So to generalize from time to space, uh, you just need to have that diagonal matrix, uh, and have it modified by, uh, velocity or something. Right. And that’s like, uh, you know, um, uh, that’s, you know, the insight from, uh, you know, freshman, uh, freshman, uh, classical mechanics, right? <laugh>, uh, you know, you multiply by velocity on the right hand side and you integrate it, you get out position, right. Um, so you end up with, um, uh, the Laplace transform as a function of position or time, or any other variable that you can use, any, any other variable you can get access to the time derivative of.  
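A minimal sketch of the diagonal network Marc describes, under stated assumptions. Each unit obeys dF/dt = -s * speed * F + f(t), so the recurrent matrix is diagonal and the units never interact. The function names (`geometric_rates`, `run_diagonal_rnn`), the rate values, the step size, and the use of a single constant `speed` to stand in for velocity are all illustrative choices, not details taken from Marc's papers.

```python
import math

def geometric_rates(s_min, ratio, n):
    """Rate constants s_min, s_min*ratio, s_min*ratio**2, ... (a geometric series)."""
    return [s_min * ratio ** i for i in range(n)]

def run_diagonal_rnn(rates, impulses, dt, speed=1.0):
    """Run the diagonal linear network dF/dt = -s * speed * F + f(t).

    rates    : one decay rate s per unit (the numbers on the diagonal)
    impulses : input amplitude delivered at each time step
    dt       : duration of one time step
    speed    : 1.0 tracks elapsed time; another constant value stands in
               for velocity, so the state tracks distance instead
    """
    state = [0.0] * len(rates)
    for x in impulses:
        # Diagonal matrix: each unit decays on its own, with no cross-talk.
        state = [F * math.exp(-s * speed * dt) + x for F, s in zip(state, rates)]
    return state

rates = geometric_rates(s_min=0.1, ratio=2.0, n=5)   # 0.1, 0.2, 0.4, 0.8, 1.6
impulses = [1.0] + [0.0] * 500                        # a "clap", then 5 s of silence
state = run_diagonal_rnn(rates, impulses, dt=0.01)

for s, F in zip(rates, state):
    print(f"s = {s:4.1f}   F = {F:.4f}   exp(-s*5) = {math.exp(-s * 5.0):.4f}")
```

After the impulse, each unit holds exp(-s * t), which is the Laplace transform of a single event t seconds in the past, evaluated at that unit's s. Running with `speed=2.0` for half as many steps gives the same state, which is the sense in which scaling the decay by velocity makes the units track distance rather than time.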

Paul    00:17:18    All, all on this same logarithmic scale.  

Marc    00:17:21    Yeah. To the extent you can choose the time constants to be, uh, like, um, literally the, the, the time constants and the rate constants, uh, have to go on a, in a geometric series. They have to go like 1, 2, 4, 8, 16, uh, not, not usually with a factor of two, but, you know, it’s, it’s hard to do the arithmetic in your head. Um, but if you do that, then you’re guaranteed that whatever quantity you’re representing is, first of all, a function, right. Laplace is a basis set that describes any function over, over the line. Right. Uh, and then, uh, uh, by choosing the rate constants in that way, you’re guaranteed that you have sort of a, a logarithmic compression of that line. And so, yeah, I think this is extraordinarily general. The other thing I wanna say about Laplace is, and, um, we need not go in further into the weeds, um, but I’d encourage people to, uh, like actually, this is how I learned a lot about Laplace.  

Marc    00:18:15    Go look at the, like, Wikipedia page for the Laplace transform. You know, there’s a reason people have studied this for years and years. There’s just so many computations that are really trivial to, they’re much more straightforward to do in this integral transform, uh, domain. So like, you know, you can convolve functions, uh, you can, which is analogous to, you know, uh, adding probability distributions. Uh, you can subtract functions, you can translate functions, um, uh, uh, you can, uh, uh, uh, take derivatives of functions by multiplying, uh, by s, uh, you can, uh, integrate functions by dividing by s, and so you can build up, uh, velocity and acceleration and all kinds of cool stuff.  
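For reference, the operations Marc lists correspond to standard textbook identities of the one-sided Laplace transform. The notation here (F and G for the transforms of f and g, H for the unit step, a for a non-negative shift) is chosen for this note and does not come from the conversation. "Multiplying by s" gives the derivative exactly when f(0) = 0.

```latex
\begin{align*}
F(s) &= \mathcal{L}\{f\}(s) = \int_0^\infty f(t)\, e^{-st}\, dt \\
\mathcal{L}\{f'\}(s) &= s\,F(s) - f(0) && \text{(differentiation)} \\
\mathcal{L}\Big\{\int_0^t f(u)\, du\Big\}(s) &= \frac{F(s)}{s} && \text{(integration)} \\
\mathcal{L}\{f * g\}(s) &= F(s)\, G(s) && \text{(convolution)} \\
\mathcal{L}\{f(t-a)\, H(t-a)\}(s) &= e^{-as}\, F(s), \quad a \ge 0 && \text{(translation)}
\end{align*}
```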

Paul    00:18:54    So it’s a very like general purpose, uh, transform, which can be seemingly implemented fairly simply, neurally plausibly.  

Marc    00:19:03    And, and we’re pretty sure is in the noggin at this point, right?  

Paul    00:19:06    Yeah. You guys have, uh, we’ll, we’ll get to, um, I, I get, well, maybe we’ll talk about it now. I just wanted to put it into context, like, in terms of your clap to, uh, audioly visualize, visualize this, right. So like, uh, in terms of time cells, right. When you clap, um, the sequence of time cells would be something like, clap, clap, clap, clap, clap, clap, clap, clap, clap on into the past.  

Marc    00:19:34    Yeah, nicely done. Yeah. The sequence slows down. And, um, we can see that really clearly if you, if you make a plot of time cells sorted by the time at which they peak, uh, you don’t see a straight line, you see this J, uh, that’s quite characteristic because the sequence slows down because it’s on like a log scale. Right. Mm-hmm <affirmative> So the difference between one second and two seconds in the past is not the same as the difference between 10 seconds and 11 seconds. It’s like the difference between 10 seconds and 20 seconds. Um, and that seems to be, uh, a pretty general thing in the brain and, and seems to be true of, uh, time as well.  

Paul    00:20:09    So, so like, the way that this would map onto neurons is that you’d have a, a Laplacian transform and each neuron would have a different, uh, time constant corresponding to those delays and spreads of the, uh, memory, I suppose, back in time.  

Marc    00:20:25    Exactly. Yeah. You’d have, you’d have a lot of cells, so like we observed, so, uh, we talked about what time cells look like. Um, Laplace transform cells, um, which, which you, we tried to call temporal context cells, we’ll see if it catches on, um, have been observed thus far in the entorhinal cortex, uh, and in our study, uh, Ian Bright and Miriam Meister, uh, were the co-first authors and, um, my experimental collaborator, Elizabeth Buffalo, is co-last author. Um, what we observed is, so the monkey is, uh, sitting there in a monkey chair, and there’s a, an image presented at time zero. Uh, and rather than firing sequentially, uh, whole bunches of cells are perturbed, uh, more or less immediately, like within 200 milliseconds, but there’s, there’s no variability in the time of their peak disturbance. Right. So they all, so if I clap, they all kind of go on and then they turn off.  

Marc    00:21:16    Right. Uh, and some of them turn off fast and some of them turn off slow. Um, across cells, there’s different time constants describing the rate at which they’re turning off. Uh, they follow a roughly exponential function. Um, and that’s like exactly what the Laplace transform is, like e^(-st). Right. Um, and so, uh, with a variety of values of s, right, and the, there’s an overrepresentation of fast turning off corresponding to the overrepresentation of fast, uh, cells firing, uh, time cells firing early in the sequence. And there’s fewer and fewer slow ones corresponding to the sequence slowing down. Like you said, clap, clap, clap, clap, clap. Oh, sorry. I did it backwards.  
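A hedged sketch of how exponentially decaying cells could be turned into sequentially firing time cells. The conversation says only that the transform is inverted, not how. This example assumes one standard numerical inverse, Post's approximation of order k, and writes out its closed form for a single impulse. The names `time_cell`, `peak_time`, and `tau_star`, and the choice k = 8, are illustrative assumptions.

```python
import math

def time_cell(tau_star, elapsed, k=8):
    """Firing of a cell tuned to tau_star, `elapsed` seconds after an impulse.

    For an impulse, the Laplace-domain activity is F(s) = exp(-s * elapsed).
    Post's formula with s = k / tau_star gives
        (1 / k!) * s**(k + 1) * elapsed**k * exp(-s * elapsed),
    which rises, peaks at elapsed == tau_star, and falls again.
    """
    s = k / tau_star
    return s ** (k + 1) * elapsed ** k * math.exp(-s * elapsed) / math.factorial(k)

def peak_time(tau_star, k=8, dt=0.001):
    """Find the elapsed time of maximal firing by brute-force search."""
    times = [i * dt for i in range(1, int(4 * tau_star / dt))]
    return max(times, key=lambda t: time_cell(tau_star, t, k))

for tau_star in (1.0, 2.0, 4.0, 8.0):
    print(tau_star, round(peak_time(tau_star), 2))
```

Under this assumption, every cell switches from "on at time zero, then decaying" to "silent, then a bump centered on its own tau_star". The bump for tau_star = 2 is the bump for tau_star = 1 stretched by a factor of two, so cells that fire later have proportionally wider firing fields.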

Paul    00:21:57    So what’s the, um, what’s the advantage of having a log scale in terms of, I mean, so, you know, intuitively, you know, when, when you clap, I can kind of hear it in my echo, right. If I kind of pay attention to it, and then it kind of fades. Right. Um, but in terms of representing things, what’s the advantage of having a log scale,  

Marc    00:22:17    Uh, before I give you my speculation about that, let me note that the noggin does this over and over and over again. Right. So the, so if I take, um, if you look at the density of receptors on your retina, right, as I move from the fovea out towards the periphery, uh, uh, the, the receptors get more and more widely spaced. And, uh, it’s been known, uh, for quite some time that there’s, uh, there’s like a logarithmic distribution of, uh, receptors to, um, you know, along the, along the retina, uh, photoreceptors along the retina. And that, that same, uh, same, uh, compression is respected throughout the early parts of the visual system. Same is true of V1 and V2 and V3. It’s called, um, uh, retinotopic coordinates, right? Mm-hmm <affirmative> They look like this bullseye. Uh, the same is true of the nonverbal number system.  

Marc    00:23:05    Right. So, um, if you are not allowed to count, right, uh, or you’re a monkey or something, right, um, your estimates of numbers, uh, appear to be on a log scale. Uh, you know, which is also probably why, you know, we, we might fight over $10, uh, you know, uh, at a lunch order, but $10 on your purchase of your car is like, for some of us, completely unimportant. Yeah, yeah, yeah, yeah. Um, so, uh, anyway, so the, the brain seems to commit to this over and over and over again. So listeners might reach their own conclusions as to why the brain and the mind choose to do this, but there’s no question that it does in many, many, many circumstances. So my answer would be, consider the possibility that you didn’t have a logarithmically compressed, uh, scale, that you had, you had some particular finite resolution, right?  

Marc    00:23:57    You picked some number, and I’m gonna say, uh, you know, instead of having the time, uh, you know, on a log scale you had, I’m gonna represent time really, really well, uh, you know, out to, uh, you know, uh, some resolution, right? So if you, if the world agreed with you, right, if the world chose something important, evolutionarily important for you to remember, uh, that happened to be that scale. You’re great. You’re, you’re doing wonderfully. Um, but if the brain, uh, if the world chooses, uh, a scale that’s faster or slower than your choice, right. Um, you’re either going to be, if it’s, uh, you’re either going to be wasting a bunch of resources that are not providing any useful information, or you’re gonna be completely blind to that quantity. Right. Um, so choosing a log scale, uh, is equivalent to, uh, in some sense, uh, making an uninformed prior about what you’re gonna find out in the world, right?  

Marc    00:24:50    Mm-hmm <affirmative> So that if the world gives you something that’s important at 10 seconds, well, you can tell the difference between 10 and 11, right? If it gives you something, you know, that’s important at a hundred seconds, well, you can tell the difference between a hundred and 110. Uh, and if it gives you one, you can tell like the difference between one and 1.1, right. Uh, if, um, if you chose some particular number and the world gave you, uh, something else, you’d be in a, you’d have a big problem one way or the other. Um, so I think it’s, I think it’s adaptive, uh, in that sense.  
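Marc's trade-off can be checked with a small numerical sketch. It compares two ways of spending the same number of cells: evenly spaced preferred times versus geometrically spaced ones. The cell count (100), the linear step (0.1 s), and the geometric ratio (1.1) are assumptions picked to match his 10-versus-11 example, and `relative_resolution` is a hypothetical helper, not a measure from the literature.

```python
def linear_grid(step, n):
    """n preferred times spaced evenly: step, 2*step, ..., n*step."""
    return [step * (i + 1) for i in range(n)]

def log_grid(start, ratio, n):
    """n preferred times in a geometric series: start, start*ratio, ..."""
    return [start * ratio ** i for i in range(n)]

def relative_resolution(grid, t):
    """Gap between the two grid points bracketing t, as a fraction of the
    lower one. Returns None when t lies outside the grid (the code is blind)."""
    for lo, hi in zip(grid, grid[1:]):
        if lo <= t < hi:
            return (hi - lo) / lo
    return None

linear = linear_grid(step=0.1, n=100)            # covers 0.1 s to 10 s
logarithmic = log_grid(start=0.1, ratio=1.1, n=100)

for t in (1.05, 10.5, 105.0):
    print(t, relative_resolution(linear, t), relative_resolution(logarithmic, t))
```

With these numbers the geometric grid resolves about 10% at every probed scale and reaches past 1000 seconds, while the evenly spaced grid is finer than needed near its upper end and returns None at 105 seconds, which corresponds to the wasted resources and the blindness Marc describes.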

Paul    00:25:17    Given the world’s statistics and what to expect. We, um, you already mentioned hippocampus and, um, the Mosers and place cells and grid cells. I don’t know if we mentioned that time cells and, uh, place cells are both found in hippocampus. Right.  

Marc    00:25:32    Indeed. And they’re sometimes the, exactly the same neurons. Right.  

Paul    00:25:35    Right. And, and the populations overlap. Right? Yeah. Um, yes. Which is interesting. And that goes back to time and space, but what I was gonna ask about, so, and, and I, I don’t know if your original work was in the hippocampus, but, um, you’ve talked about having, and, and of course the hippocampus is classically important for memory as the, you know, has been the age old story for it. Yeah. Uh, but you have found these kinds of Laplace, um, transforms and inverse transforms, maybe we should just say Laplace functions, uh, in lots of different parts of the brain, like you said. Um, what does that imply about, well, does that mean that there’s memory everywhere in the brain or are these used in different cognitive functions? And we’ll go, you know, we’ll, we’ll go down a little bit more, uh, with different cognitive functions, but just to, like, big picture, what does that imply, that they seem to be everywhere? It’s, you know, it seems like, you know, like grid cells seem to be everywhere. Every new kind of algorithm seems to be everywhere <laugh> in the brain. <laugh>  

Marc    00:26:36    So I think, um, I think so we’ve seen time cells in many, many different brain regions thus far. Um, you know, the, the, the search for Laplace transforms of time, uh, is, is at a much earlier level. I, I’m of the belief that we’ll find it lots and lots of places.  

Paul    00:26:51    I misspoke. I meant to say time.  

Marc    00:26:53    Sorry. Yeah, yeah, yeah, indeed. So, um, uh, so I think the reason why there’s time everywhere is cuz there’s nothing there there’s basically no experience we have where time isn’t important, right. Um, time is important in language. Uh, time is important in, you know, classical conditioning time is important, uh, and playing basketball, right. If you’re in the wrong place, you know, being, you need to intersect, you know, the, the ball, uh, or, uh, you know, a pass or something, uh, you have to not just be in the right spot. You have to be in the right spot at the right  

Paul    00:27:25    Time. Well, that’s anticipating time. That’s, that’s a future oriented. Oh  

Marc    00:27:29    Yeah, indeed. No. And we’re sure, um, we are sure that, uh, we have, um, here’s, so watch this. So I, I did the little clap thought experiments, let’s say, so watch this. So I go A, B, A, B. Um, as I say A again, A sort of recedes into the past, and you’re able to predict B, and B sort of starts, it feels like it sort of starts out a certain distance from you and then gets closer and closer and closer and closer. And in order for, um, in order for us to behave adaptively, we need to be able to anticipate, uh, trajectories in time and space. And we’ve done some experiments, some psychology experiments, uh, that seem to show that, uh, you know, memory for the past, um, uh, uh, our ability to, uh, you know, retrieve information from the recent past has similar properties to our ability to judge the time of, uh, future events that are gonna come closer to us.  

Marc    00:28:25    Um, but yeah, no, I’m, I’m all in on, so, and oh, and by the way, both of them seem to be log compressed mm-hmm <affirmative>. So if you wanted to be sort of poetical about it, uh, you’d say that there’s something like, um, you know, a logarithmic past, uh, pointing to the left, the past is always to the left, and there’s a logarithmic future to the right. And we’re sort of like a time fovea, uh, in between, uh, the <laugh>, uh, the past and the future. And that’s where we sort of do our business, in the present.  

Paul    00:28:50    And the vortex of the present.  

Marc    00:28:52    Yeah. Yeah. If you wanted to be poetical <laugh>. Let me say it this way. Different types of memory, you know, in memory research, since I was, um, right, you know, since I was a graduate student, people talked about implicit memory, people talked about, like, episodic memory, uh, people talked about, you know, semantic memory. Um, and so I’m saying all those different types of memory, um, are using the same form of temporal representation. They’re just doing different computations on them. And motor memory as well, uh, you know, making spatiotemporal trajectories. So I think, yeah, time is built into the, uh, time is really, really important in all those different kinds of computations we might want to do  

Paul    00:29:29    Well. Yeah. So in some sense, this is a unifying principle of memory, which the, uh, you know, the recent past has been dividing memory into thinner and thinner slices of different types of memory, right? In some sense, this Laplacian approach unifies those. But yeah, I was gonna ask this later, I’ll just go ahead and ask it now. It seems, uh, intuitively appealing on a short time scale, in terms of minutes, up to minutes, you know, maybe tens of minutes. But then episodic memory, right? When I’m remembering, um, childhood events and stuff, uh, this simple, um, mechanism, it’s not like I have a neuron that’s just now firing, uh, for something that happened to me when I was six years old. Right? So it’s not along that S-curve, so there must be some different way to access those memories. So how does this relate to episodic memory?  

Marc    00:30:22    Yeah, so following, uh, you know, Tulving, uh, Tulving’s definition of, uh, episodic memory was that you relive and re-experience a specific moment in time, and that would correspond to something like a recovery of spatiotemporal context. Um, and that was actually, um, you know, that was the basic assumption of the temporal context model, that there’s this slowly changing, uh, temporal context that follows us along. Uh, and when we remember something, we go, aha, oh, remember when you and I were, uh, you know, messing around with a microphone earlier on? Oh yes, we both remember that. And, uh, people at home, uh, you know, obviously, uh, also have episodic memories. Oh, I remember, uh, you were on a podcast with Randy Gallistel and you were talking about having, uh, some celery-flavored chili and how you had a really robust episodic memory for that. <laugh>. Um, and so when you relive it, it’s as if you were there again, right? Mm-hmm <affirmative>. You might remember, oh, that person was over there and it tasted icky or whatever, and this person was sad or whatever about their, their chili losing the contest.  

Paul    00:31:23    Wow. You really remember it. Well <laugh>  

Marc    00:31:25    Yeah. It’s as if, well, it was recent, right? Yeah. So, um, it’s as if you’re re-experiencing the world. Okay. So now, as we just said, you know, as I clap or as we move around the world, we have this record of the recent past that gives us something about the feel of, oh, that clap was five seconds ago. And uh, you know, the place code is saying something like, oh, I’m like, uh, you know, three meters from the wall over there. Um, and so an episodic memory would be something like, let’s just pull up that whole collection of things that was available at that one moment in time while you were having the disgusting, uh, celery chili.  

Paul    00:32:01    Right. It wasn’t disgusting, but go ahead.  

Marc    00:32:04    Sorry. Okay. Sorry. Yeah. Yeah. Just  

Paul    00:32:05    Last place,  

Marc    00:32:06    The creative, uh,  

Paul    00:32:09    Okay.  

Marc    00:32:10    Very helpful.  

Paul    00:32:11    Poor joy, poor joy. Her, her chili was fine. He was very healthy. Yeah.  

Marc    00:32:16    <laugh> Nice cover, nice recovery. Um, so anyway, yeah, the idea is that we reinstate that experience that was unique to that set of circumstances, and the time cells and place cells and, you know, whatever else is going on in the hippocampus and elsewhere, uh, gets, uh, reinstated. And then that spreads out and it’s like, ah, we just jumped back in time. Oh, we just remembered, uh, this thing. Um, so the question is how, right? How is it that you manage to, uh, cause this state to, you know, come out of nothing from some partial, uh, imperfect cue? And that’s a really great question. Um, but anyway, that’s the basic idea of how to make episodic memory. Uh, and, oh, interesting thing we know. So in the laboratory we measured, and actually this is like kind of the first experiments I did, uh, as a PhD student, you know, in the laboratory we can measure something like this jump back in time. If I give you a list of words, right, and you remember a particular word from somewhere in the list, say, uh, let’s take the items in the list and map them onto the letters of the alphabet. Uh, if you’ve just recalled, uh, you know, letter H, the next thing that comes to your mind, uh, is gonna tend to be a nearby letter, like I, J, or K, or, uh, G, uh,  

Paul    00:33:36    I had to work for it too. Uh,  

Marc    00:33:38    E, yeah, yeah, yeah. Um, and so you get this sort of characteristic curve peaking up around the center. Okay. So that curve, uh, we call the contiguity effect. Mm-hmm <affirmative>. It appears to, um, happen over a wide variety of time scales. Right. So if I take the words in the list and I space them out by 10 seconds, I still get the same curve. If instead, um, this is work from, uh, Karl Healey’s lab, and, um, uh, Nash Unsworth, uh, did experiments like this, and, um, so did Geoff Ward, if I take experiences and space them out by larger and larger time scales, right, including up to like hours or days, right? Uh, Geoff Ward did this experiment. He like, uh, has a, um, you know, push notification on people’s phones to give them a word as they go through the day. Right. Um, you, uh, still see the same type of, uh, contiguity effect.  

Marc    00:34:29    So there’s not some characteristic scale of things being close in time. Right. Uh, cuz you can get things that are close in time, uh, for people at home, I’m doing air quotes around the, thank you for that. Yeah. Um, yeah, yeah. Audio visual. Um, so, uh, and so you see things are close on some relative scale, as if there was some sort of logarithmic basis. So the difference between 10 and 11 is kind of like the difference between a hundred and 110, uh, or kind of like the difference between 1,000 and 1,100. Um, and so that property seems to be respected in episodic memory as well. So I think we’ve got something pretty important.  

Paul    00:35:11    I’m just gonna go ahead and ask you this as well, since you brought up Randy Gallistel and, uh, celery chili. Um, so I know that you’ve been influenced by Randy’s, uh, work on learning, uh, and memory, but on that podcast episode, it was all about his ideas about intracellular memory, that, you know, it’s essentially, in principle, impossible to store these things over long periods of time within populations of, uh, neurons, among the activity of populations of neurons. And what you’re saying, with a population of neurons all with different time constants, is that it is indeed possible. So I’m curious what you think of that, uh, orthogonal aspect of Randy’s, uh, research, that we actually have to store memories, uh, more stably in the cell, in something like RNA or proteins or something like that. Yeah.  

Marc    00:36:01    So yeah, I need to say Randy’s like one of my heroes, right? Uh, he’s, uh, you know, an intellectual, uh, giant. Uh, he’s been immensely influential. Actually, when I came into the lab that one day and I wrote a bunch of problems on the board that we couldn’t solve, one of the problems was basically Balsam and Gallistel, 2009. Mm-hmm <affirmative>. So he’s been immensely influential, um, to me. Um, and I accept Randy’s critique about, uh, computation, uh, not being solvable at the level of, uh, an individual cell or a synapse or even a bunch of synapses. Um, uh, rather than trying to look for a solution within a cell or within RNA, we’ve zoomed out, right, and tried to look for a solution at this sort of, uh, population level. Right. Um, and actually, you know, about, uh, I guess it was about like six, seven years ago, um, I got myself invited to give a talk. Like I called up my friend at Rutgers and I was like, look, I need to talk to Randy. Uh, you know, um, and so please, please invite me to give a talk. And she was very nice and, uh,  

Paul    00:37:07    Take note, let me do that aspiring scientists. This is how you get it done.  

Marc    00:37:11    Yeah. I invited myself, and Randy was incredibly generous with his time, and he kept saying, you know, what’s the number, what’s the number, what’s the number? If you listened to the Randy Gallistel podcast, he said that like half a dozen times on the podcast. He said that like two dozen times to me personally, uh, you know, in his office. Um, and I took it really seriously, and the answer is a number. So here’s how I would say it: a number is a distributed pattern over the Laplace transform. So if I have a number, right, uh, you know, so I can take the real line, right. A number is some point on the real line. I take a point on the real line. Uh, let me map it onto a delta function centered at that. Uh, for those of you with very little math, imagine a flat, uh, flat function over the real line, except it sticks way up, really super sharp.  

Marc    00:37:55    Uh, at some point, let’s call it a. Um, so now I can take the Laplace transform of that function. It’s just e to the minus s times a. Uh, and now I have some other function for b, and I can now sort of compute, and add and subtract. And so I can build out data-independent operators, um, like the Gallistel and King, um, uh, book, which was also incredibly, incredibly influential, uh, on my thinking about the world. And I could write down neural circuits, and they might be right and they might be wrong, but I can for sure write ’em down. There’s no, uh, there’s no, um, you know, fundamental limit, uh, to that. Yes, I need to be able to read and write numbers. Right. Um, I need to let one population of neurons gate into another population of neurons.  

Marc    00:38:38    I need to be able to, uh, you know, uh, implement the expression for, uh, you know, convolution in the Laplace domain. Uh, so yes, there are things, but they’re not like magic. Uh, there’s no “and then a miracle happens.” It’s just, you know, so the brain could for sure do that stuff. Um, and whatever I have in the Laplace domain, I can then reuse the same mechanism for inverting it. Um, and I can use the same type of representation for time and space and number. Uh, oh, did I mention how influential, uh, you know, Randy and, uh, uh, Gelman’s work on numbers has been on me? So, um, yeah, so we can reuse the same data-independent operators for different types of quantities. And, um, so yeah,  
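[Editor’s illustration] The arithmetic Marc describes can be sketched in a few lines. This is a minimal sketch under stated assumptions, not code from his lab: the grid `S_VALUES` and the helper names `encode`, `add`, and `decode` are hypothetical. It relies on two standard facts: the Laplace transform of a delta function at a is exp(-s·a), and convolution becomes pointwise multiplication in the Laplace domain, so multiplying the codes for a and b yields the code for a + b. The `decode` step here is a simple logarithmic readout chosen for brevity, not the inverse transform used in the framework discussed in the episode.

```python
import math

# Hypothetical population: one unit per value of s (values chosen only for illustration).
S_VALUES = [0.1 * k for k in range(1, 11)]

def encode(a):
    """Laplace transform of a delta function at a: F(s) = exp(-s * a)."""
    return [math.exp(-s * a) for s in S_VALUES]

def add(fa, fb):
    """Convolving delta(a) with delta(b) gives delta(a + b);
    in the Laplace domain that convolution is a pointwise product."""
    return [x * y for x, y in zip(fa, fb)]

def decode(f):
    """Simple readout (an assumption, not the framework's inverse):
    each unit estimates a = -ln F(s) / s, and the estimates are averaged."""
    estimates = [-math.log(v) / s for v, s in zip(f, S_VALUES)]
    return sum(estimates) / len(estimates)

total = decode(add(encode(3.0), encode(4.5)))
print(total)  # approximately 7.5
```

Subtraction would be the corresponding pointwise division, which is one reason the same operators can be reused for time, space, and number.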

Paul    00:39:23    That’s my answer. Have you communicated this to him?  

Marc    00:39:26    Yeah, actually I sent him, uh, I wrote this down as best I could in a paper with Mike Hasselmo that’s on arXiv right now. Mm-hmm <affirmative>. Um, and I sent it to him and I thought he’d be so excited. Um, and I don’t know, maybe he was, but he also, I mean, he made, again, a genius, uh, response. He was like, wow, that was really nice. But, uh, you know what else would be nice is if you, I can’t believe you didn’t, uh, you know, use the, um, and then again, the absolutely genius, uh, comment: uh, why didn’t you take the bit about how you can solve, uh, differential equations really easily in the Laplace domain to build up like an intuitive physics? And I was like, oh God, that’s another genius idea. So basically I think like the last 10 years of my life has been trying to catch up with Randy’s thinking, uh, and he, you know, he’s, uh, you know, he is occasionally frustrated, but I think it’s mostly cuz he’s like 20 years ahead of the rest of us. Right. Um, so, uh, yeah, uh, one of my heroes, and, uh, but no, I don’t accept that. Uh, I don’t think it’s essential that we put memory into RNA. All  

Paul    00:40:29    Right. Uh, I’ll pass this along to, uh, maybe I’ll send that clip to Randy for fun, cuz, uh, he would probably, uh, enjoy it. So you were talking about a recurrent neural network and how the Laplace is very simple to implement in a recurrent network. And of course the brain is highly recurrent, and we talked about turtles all the way down, Laplace, inverse, Laplace, inverse, as these, um, representations get cycled back onto themselves. Right. I mean, I’m trying to imagine what it means. If you have a set of neurons, right, that do a Laplace transform, and then there’s like an overlapping population doing the inverse Laplace, and this is kind of a canonical computation. Right. But then, mm-hmm <affirmative>, the signals coming in are then coming in recurrently as well. So what does it mean to continue transforming and inverse transforming? What would those representations be good for?  

Marc    00:41:22    Yeah. Interesting question. Um, so, okay. So first of all, there’s no reason to invert most of the time. Right? You can just keep computing in the Laplace domain. And in principle, it’s like if we’re building a device, uh, from scratch, like an AI or something like that, mm-hmm <affirmative>, um, you know, my advice would be don’t invert unless you really need to. Right. Okay. Uh, in order to answer a specific question. Um, the place where we’ve found that we need it, um, uh, we’ve started doing like deep networks, uh, work on this, the place where we’ve needed to, uh, uh, do transform, inverse, transform, inverse, um, is in deep learning frameworks where we’re trying to, uh, uh, like decode speech. Right? So, um, we, uh, “we” meaning mostly not me. Uh, again, uh, Zoran Tiganj and, uh, Per Sederberg, uh, my, uh, friends and collaborators at Indiana University and, uh, University of Virginia, and, uh, their students. Um, we’ve been working together on a framework to try and build out, uh, like useful deep networks, uh, using these ideas. Um, and one of the things we did, uh, was to build this deep network that encodes speech, um, that decodes speech. And so if we stay in the Laplace domain the whole way, it doesn’t work. However, if we go into and out of the Laplace domain from one layer to the next, such that basically you’ve built a deep network of, you know, log time cells, um, going along, uh, it turns out to have this really nice property that the entire network, uh, can, um, deal with rescaled speech.  

Paul    00:42:53    Well, does that map onto like, um, Subic, like segments and syllables and, uh, morphemes and stuff. Yeah.  

Marc    00:43:02    Different. So in the model, different layers of the network end up attaching to different types of meaning, right? Mm-hmm <affirmative>. It’s an open question. Uh, there’s people who’ve argued, uh, and I’m not an expert in like auditory cortex. Uh, there’s people, um, there’s a, uh, first author Raman, uh, paper in PNAS, uh, arguing that there ought to be, uh, a log distribution of time constants in auditory cortex. As far as I’m aware, they’re completely unaware of all this other stuff with time cells and whatnot. Okay. Um, and, uh, so they’ve argued that, so there’s at least, uh, you know, some reputable people who make that argument. In the model, uh, having log scale at every layer of the network ends up letting the model, uh, do the following thing. If I’ve trained on speech, uh, such that, you know, we do like auditory MNIST, and, you know, the model’s been trained to decode one or four or eight, um, we can give it, uh, arbitrarily, we can give it slowed speech or sped-up speech. I can go “seven,” uh, and the network goes, oh yeah, that’s a seven and it’s really slow. Right. Hmm. Um, and so, uh, it turns out, uh, you know, there’s only one way to do that, and that’s to have log, uh, log compressed time, uh, at every, uh, layer of the network.  

Paul    00:44:22    Well, yeah, I was thinking of, uh, David Poeppel, who I had on the podcast, and his dual stream hypothesis of speech perception and how he maps it onto, you know, oscillations. Right. Because no matter how hard we try, we speak at three to four syllables per second, I think is what the number is, but I can’t remember. So, uh, I thought he might be at least interested in this. And I wonder, I don’t know if he’s aware of this, uh, work either. I’ll pass it on to him. Yeah. Okay. So interesting. But go ahead.  

Marc    00:44:52    Oh, I was just gonna say that, um, the artificial networks, we can make them rescale as far as we want. Right. You’ve  

Paul    00:44:59    Done this with convolutional networks, because you’ve already mentioned that you can perform convolutions with Laplace. So maybe let’s go ahead and talk about the artificial work, like the DeepSITH, and what you were just talking about as an advancement from DeepSITH, I think, right? But there’s a host of models that you guys have built, building in these, uh, different time-constant Laplace, um, layers. Right. Which enables the networks to, uh, deal with time that way, compression.  

Marc    00:45:26    Yeah. So let’s back up a little. So DeepSITH is, uh, a network, um, that just has log time scales, uh, log time cells coding what cross when, uh, in series, uh, with learnable weights in between the layers. So that’s basically what changes from one layer to the next. Right. Um, and so, uh, that network, uh, does really well. Uh, you know, you can train it on a problem. It has the property that if you train it on a problem at some, uh, time scale, uh, and then you train it on the same problem sped up or slowed down, it doesn’t care, because, uh, it turns out that on a log scale, rescaling time amounts to translation. I’ve said this a couple of times: the difference between 10 and 11 is the same as the difference between a hundred and a hundred and 10.  

Marc    00:46:14    Yeah. Um, and so that network is exactly as able, uh, to learn something whether it’s fast or slow. Um, SITHCon, uh, which is in press at ICML, and it’s on arXiv, uh, if people are interested, um, actually we’re just working on the camera-ready version. Uh, so I was doing that yesterday. Um, so, uh, yeah, it has a CNN and then a max pool over the CNN. Because convolution, uh, is, uh, translation covariant, and rescaling is equivalent to translation, then if we rescale the input, it just moves the peak. Okay. Mm-hmm <affirmative>. And if you have a max pool on the CNN layer, uh, the network is scale invariant, uh, and it identifies the correct thing. Uh, the location of the peak moves around. I can tell whether it’s fast or slow. That information is encoded in the network.  

Marc    00:47:11    Uh, but the thing it spits out for the categorization doesn’t care, uh, whether it’s fast or slow, and without retraining weights, we can make the range of scales over which it generalizes as big as we want, um, without any cost in weights. So this seems like, um, like a good idea. Um, and in any event, uh, you know, if you’re building robots or whatever, to go off to the moon or go off to Mars, or, you know, uh, crawl along the sea floor or whatever, um, and you want them to be able to deal with a bunch of different environments that you can’t anticipate, mm-hmm <affirmative>, you’d have a similar problem to the one that, uh, you know, I said, uh, the brain is presumably solving by choosing log scale for the retina or the, uh, or, or, or, um, so I think this is pretty, uh, pretty beneficial for deep networks  
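[Editor’s illustration] The argument in this answer (rescaling time is a translation on a log axis, convolution carries that translation through, and a max pool discards it) can be checked with a toy example. This is a minimal sketch under stated assumptions, not the DeepSITH or SITHCon code: the Gaussian tuning in log time, the cell count `N_CELLS`, and the fixed `kernel` standing in for learned weights are all hypothetical.

```python
import math

N_CELLS = 12  # assumed number of log-spaced time cells; cell i peaks at t = 2**i

def log_time_pattern(event_times):
    """Population activity over log-spaced cells, with Gaussian tuning in log2(time)."""
    pattern = []
    for i in range(N_CELLS):
        peak = 2.0 ** i
        pattern.append(sum(math.exp(-2.0 * math.log2(t / peak) ** 2) for t in event_times))
    return pattern

def conv_then_max_pool(pattern, kernel):
    """Slide a kernel along the log-time axis, then take the global max and its position."""
    width = len(kernel)
    outputs = [
        sum(kernel[j] * pattern[i + j] for j in range(width))
        for i in range(len(pattern) - width + 1)
    ]
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return outputs[best], best

kernel = [1.0, 1.0, 0.0, 1.0]        # hypothetical stand-in for learned weights
events = [8.0, 16.0, 64.0]           # a temporal pattern (time since each event)
slowed = [4.0 * t for t in events]   # the same pattern, four times slower

v_fast, p_fast = conv_then_max_pool(log_time_pattern(events), kernel)
v_slow, p_slow = conv_then_max_pool(log_time_pattern(slowed), kernel)

print(abs(v_fast - v_slow) < 1e-9)   # pooled response is unchanged by the rescaling
print(p_slow - p_fast)               # the peak moved by log2(4) = 2 cells
```

The pooled value is what a classifier would see, so it does not care about speed, while the peak position still carries the fast-or-slow information. Widening the range of scales only requires more cells, not more weights, as long as the shifted pattern stays inside the array.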

Paul    00:48:01    Dealing with different gravitational forces <laugh>. I’ve had on, you know, just a host of people who are building in, you know, different kinds of biological details to artificial networks, and, you know, kind of smaller scale, but, uh, you know, like neuromodulation. I just had Matthew Larkum, who works on the dendritic properties of feedback and feedforward connections and using those principles to build into artificial networks. Is this something that you see as just an obvious thing to build into state-of-the-art AI, and, you know, you wonder, well, why isn’t this being built in? Or is it more of a, you know, less general purpose and more specific purpose? If you had your druthers, would this be incorporated into all the, uh, modern deep learning networks, et cetera?  

Marc    00:48:51    Oh, yeah. If I had my druthers. Yes. Um, yes, that would, absolutely. I think the reason, I think, um, so I’ve thought a lot, actually, the thing that’s been keeping me up lately is reinforcement learning. Um, and so reinforcement learning, um, you know, in my understanding, at least, it comes from sort of Rescorla-Wagner sort of models of classical conditioning. And the, you know, the classical conditioning results and the dopamine story, uh, was, you know, some of the most profound, uh, results and, you know, one of the really serious triumphs of, uh, computational cognitive neuroscience that we’ve had so far. But the theories of reinforcement learning, they’re really atemporal, right? Um, the whole idea of the Bellman equation is to try and estimate the future without estimating the future. Right. It’s to try and estimate expected future reward with just something that’s time-local.  

Marc    00:49:41    And so that, you know, it steps along this. So how would, how would that algorithm look like if the goal in, you know, estimating expected, future reward included like an estimate of the future mm-hmm <affirmative>, if you could just directly compute that. So a, you know, there’s the future and here it comes, what if we just assumed we had that? So I think, um, I think the, uh, I think if I can, you know, speak really broadly, I think contemporary AI and contemporary deep networks and stuff they’re sort of built on, um, you know, neuroscience that we got mostly from the visual system and mostly through the eighties and nineties, uh, and also the dopamine system, uh, you know, became like a huge deal in like the mid nineties. Yeah. Um, and I, I think, uh, I think the, the, the insights that we’ve had since then have not yet gotten incorporated correctly into AI. And I think time is one of those things, and I would even go more, more broadly and say that representing, um, you know, the world as functions in continuous space, uh, continuous space, continuous time, continuous number, um, is also something that hasn’t really been properly incorporated into contemporary AI. So yeah, I think everybody should do this. Um, and, uh, if you’re listening to this and, you know, <laugh>, I got lots of ideas,  

Paul    00:50:55    <laugh>, it makes a lot of sense, you know, for robotics, obviously where the, where timing is very important. But yeah, I, I mean, aside from the, like, you know, a recurrent neural network implicitly has time, right? Because it has recurrence and sequences and stuff, but time is not explicitly a time. Representation is not explicitly built in. And the modern transformer has essentially no time because it does everything in parallel. What about, what about that? Do you think it would, what would, what effect might it have on, uh, transformer networks? Sorry, I’m just shooting from the hip.  

Marc    00:51:26    So, yeah, Per, actually, I mean, this is not yet published. Um, Per and, uh, Zoran and colleagues are working on transformers that work over scale-invariantly compressed memories, right. Mm-hmm <affirmative>. And that’s work in progress. For me, I think that transformers, um, uh, you know, this might be a little mean, uh, they seem sort of like a hack to me, right? That there’s something more basic that the noggin ought to be doing. And transformers are sort of substituting for that function, uh, in some pretty acceptable way that gives them a tremendous advantage over just simple, you know, uh, feedforward networks. But I think there’s a deeper thing to what the noggin’s doing  

Paul    00:52:11    One way to one way to look at that. If I were an AI researcher working on transformers is to celebrate and say, yeah, it’s a hack that we don’t need to incorporate all your messy brain crap. We just, you know, we found this shortcut that does it just as well or something. I don’t know if you have thoughts on it.  

Marc    00:52:27    Yeah. Well, I mean, I think that, um, looking at RNNs, one might have said the same thing about RNNs, right? So, you know, we can do RNNs and we can eventually learn long-term dependencies and we can eventually backpropagate through time, and we have all these hacks and tricks. Um, but you know, I just said, uh, you know, this is like an RNN, but it’s a diagonal matrix. It’s simpler. Right. So in that case, uh, comparing the RNN to, uh, Laplace transform and inverse, uh, this is immensely simpler and, uh, doesn’t have a problem with backpropagation through time at all, because it’s a diagonal matrix with a set of time constants. Right? So in that case, at least, um, this, uh, proposed solution is much simpler than the situation we engineered ourselves into.  
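[Editor’s illustration] “An RNN, but it’s a diagonal matrix” can be made concrete with a bank of leaky integrators, each obeying dF/dt = -s·F + f(t) with its own rate s. The sketch below is a minimal illustration under stated assumptions, not the lab’s implementation: the function name `laplace_step`, the rates in `s_values`, and the step size `dt` are hypothetical. Each unit only feeds back onto itself, which is what makes the recurrent matrix diagonal, and after an impulse the state should match the analytic transform exp(-s·elapsed).

```python
import math

def laplace_step(state, s_values, f_t, dt):
    """One step of dF/dt = -s*F + f(t) for every unit.
    Each unit decays by exp(-s*dt) and sees only its own previous value,
    so the recurrent weight matrix is diagonal."""
    return [F * math.exp(-s * dt) + f_t * dt for F, s in zip(state, s_values)]

s_values = [0.5, 1.0, 2.0, 4.0]   # assumed decay rates (inverse time constants)
dt = 0.01                         # assumed step size, in seconds
state = [0.0] * len(s_values)

# Present a unit-area impulse (the "clap"), then 3 seconds of silence.
state = laplace_step(state, s_values, 1.0 / dt, dt)
n_steps = 300
for _ in range(n_steps):
    state = laplace_step(state, s_values, 0.0, dt)

elapsed = n_steps * dt
expected = [math.exp(-s * elapsed) for s in s_values]
print(all(abs(a - b) < 1e-9 for a, b in zip(state, expected)))
```

Because the decay factors are fixed rather than learned through a dense recurrent matrix, nothing has to be backpropagated through the recurrence itself in this sketch.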

Marc    00:53:08    And I have to say, I do think, um, you know, contemporary AI is astounding. Um, I feel like, you know, I said sort of dismissively, you know, they’re building on brain science from, you know, the mid nineties, but they’ve also engineered that into just amazing devices, right? Yeah. So I think the thing I like to think about is, if we had sort of a more elegant, uh, simpler brain-inspired, uh, theory, and then we started engineering, uh, on it, like where would we end up, and how much more efficient, how much more energy efficient would those devices be? Uh, how much more flexible would they be? Uh, seven. Right? Um, like they’d have capabilities that, you know, we can’t begin to understand. And right now it’s like three people, it’s like four people engineering this. Right. Mm-hmm <affirmative>. So the fact that we got anything that works at all is, uh, sort of miraculous <laugh>  

Paul    00:54:01    So well, since we’re, uh, on the subject of AI, I was gonna ask about the scaling factors, right? So we talked about how, like, different neurons need to have different, uh, sorry, time constants, um, to, uh, scale out this, uh, Laplace mushed-out, uh, memory trace, right? Mm-hmm <affirmative>. Uh, in the networks, I haven’t done a deep dive on the networks and how they’re trained and stuff. Um, we’ve just been talking about it a little bit, but, um, I’m wondering, so you have to like choose the scale, the factors, right? The time constants of the neurons.  

Marc    00:54:37    Well, once you’ve committed to there being a log scale, you have two choices remaining. Uh, and that is, what is the base of the logarithm, right? Uh, so 1, 2, 4, uh, 8, 16, or, you know, 1, 3, 9, 27. Mm-hmm <affirmative>. Right. Um, and then you have to choose the shortest scale. Uh, what is one, right? Mm-hmm <affirmative>. What’s the fastest thing you care about? Oh, and sorry, there’s a third thing. You then have to choose how many cells you want, what the extent is. Do you wanna go from one to a hundred, or one to a thousand, or one to a million?  
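[Editor’s illustration] The three choices Marc lists (the base of the logarithm, the shortest scale, and the extent) fully determine a geometric series of time constants. A minimal sketch, assuming a hypothetical helper `log_spaced_time_constants` that is not taken from any published code:

```python
def log_spaced_time_constants(base, shortest, longest):
    """Geometric series shortest, shortest*base, ... up to and including longest.
    base: ratio between neighbouring cells; shortest: the fastest scale you care about;
    longest: the extent of the memory."""
    constants = []
    tau = shortest
    while tau <= longest:
        constants.append(tau)
        tau *= base
    return constants

print(log_spaced_time_constants(2, 1, 16))              # [1, 2, 4, 8, 16]
print(log_spaced_time_constants(3, 1, 27))              # [1, 3, 9, 27]
print(len(log_spaced_time_constants(2, 1, 1_000_000)))  # 20
```

The last line shows why the extent is cheap on a log scale: with base 2, going from one to a million takes only 20 cells, because the number of cells grows with the logarithm of the range.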

Paul    00:55:11    Right. So in the networks to, uh, do those numbers get, could you, uh, train, could you learn those numbers is the question, but I, I, I know that you, you think that those are sort of, um, hardwired in the brain, right? Those that they’re like intrinsic factors that,  

Marc    00:55:27    Oh, not necessarily.  

Paul    00:55:28    Oh, okay. I thought I,  

Marc    00:55:29    No, no, no, not necessarily. Yeah. We can rescale them. So the nice thing about this whole thing, so going from time to space basically means that you modulate, so you could, oh, here, let me try and say it this way. So in, um, the differential equation for time, there’s sort of like an intrinsic, um, you know, it’s sort of like that’s being modulated by some number, like, uh, we can make time go faster or slower, mm-hmm <affirmative>, just by taking all of the neurons and changing their gain. Mm-hmm <affirmative>. Right. Sorry, changing the gain of all the neurons changes their time constants all together. That’s basically how you go from time to space. So we could go from fast time, you know, to slow time. Uh, in the equations, at least, we can modulate all of them together, right, uh, online. Uh, and if we wanted to do  

Paul    00:56:19    That, is that what happens when we get a shot of adrenaline or, you know, when, when our subjective sense of time <laugh> goes super slow because we’re experiencing cuz everything we’re experiencing it at a higher gain, our time constants are all shifted.  

Marc    00:56:31    It could be okay. <laugh> that’s that’s possible. Yeah. Yeah. Um, yeah, I I’ve, uh, yeah. Time, time goes, uh, time goes really slow when, uh, yeah. Yeah. It could be, yes.  

Paul    00:56:43    Oh my God. That means as we get older, our time constants are really going lower. Right. Cuz time goes by so much faster.  

Marc    00:56:50    Oh, totally. Yeah. Uh, yeah. Um, oh, actually, the other thing I would say is we can also build an ordinal code. Right. So we can build a code that, uh, computes, uh, say, like, uh, numbered position within a list. Right. So if I let the velocity depend on whether something is happening or not, right, so then if nothing happens, time stands still, but then something happens, yeah, and I advance the clock a little bit, then something else happens, I get a log, uh, I get a log ordinal code. So one could also make sense of “time flies when you’re having fun” by just saying there’s lots of things you’re paying attention to. I see. Yeah. And that pushes stuff along extra fast. Uh, and if things are really boring, there’s nothing to, uh, move time along.  

Marc    00:57:34    You know, these are all really, really, really important, uh, problems. And there’s not a solution given by the equations. I would be ecstatic if lots of people took up these questions within the context of this sort of theoretical framework and asked questions about, you know, this population of neurons in this region, or, you know, time perception in this task, or retrospective timing in, you know, some behavioral experiment. These are all really important questions. Fixing the parameters in a particular region, in a particular task, in a particular, uh, setting, none of that is given by this, uh, theoretical framework. It just sort of constrains the set of problems. And so I would love it if lots of people, uh, answered these questions.  

Paul    00:58:15    What I was gonna ask about with, um, your deep nets is, you know, if you, if you learned the, the time constants, right, or the numbers, the three numbers that you need for the time constant, if that was a learned thing, somehow you train the network to learn that number. If it would, then you could empirically look in a population of neurons in the brain and see if it’s the same distribution, if it learned a brain like distribution of, you know, scaling factors.  

Marc    00:58:42    Yeah. The interesting thing from the brain is we have no idea, in hippocampus, at least we have basically no idea what the upper limit of that is. Uh, we have a pretty good idea what the, the shortest timescale is. Like, it seems like, uh, theta oscillation, something in the order of like a hundred to 200 milliseconds, the hippocampus doesn’t know anything faster than that basically, uh, in sequences of things that we can observe. And that makes sense, you know, given theta oscillations and phase precession and stuff like that, but the upper limit of, um, the upper limit of how long those sequences go on, we have no idea. So if we do an experiment that lasts a second and a half, we see a nice sequence that goes a second and a half. Someone does an experiment that lasts 10 seconds. Oh, we have a nice sequence that goes 10 seconds.  

Marc    00:59:24    Um, you know, I have a student, uh, you know, there’s a paper, uh, I think last year arguing for three minutes, right. Um, I have a student, uh, Yue Liu, uh, that, uh, me and Mike Hasselmo, uh, co-mentored, uh, you know, who went and observed something, he thinks, looks like sequences over 15 minutes, right. In calcium recordings. Um, so we have no idea what the upper limit is. Um, uh, and so we have no empirical constraints. Uh, and if, if this idea of like scale-invariant memory, if it’s really a log scale, you should push it as far as you can. Right. Um, there’s been slice papers, uh, in entorhinal cortex. Um, there’s a Nature paper in like 2002 that had a big impact on me, um, where they, they set the neuron going in a dish and it fires at a steady rate.  

Marc    01:00:09    And then they stimulate it again, uh, you know, and the, the experimenter, Angel Alonso, who suddenly passed away a number of years ago, uh, he told me, uh, he’s like, yeah, you set the thing going and you go have, make a sandwich and you have lunch and you come back and it’s still firing at the same rate. Huh. So that, that cell has a, that cell has a time constant of infinity. Right. It’s integrating and has a time constant of infinity. There doesn’t appear to be a natural, upper limit, uh, at least in that experiment. So  

Paul    01:00:34    That’s in a dish  

Marc    01:00:35    In a dish indeed. Yeah, yeah. Yeah.  

Paul    01:00:37    Okay. So I have a couple guest questions and I, I just wanna make sure that we don’t, uh, continue because I, you know, have my own, all my own questions for you too. So, uh, I’m gonna go ahead and going back, this is going back to, I guess, uh, human or, uh, natural cognition here. Uh, you know, this guy named Brad Wyble.  

Marc    01:00:56    Ah, yeah, I know Brad Wyble super good. Brad Wyble taught me about the hippocampus  

Paul    01:01:02    Actually. Oh, okay. Oh yeah. I was gonna ask how you, how you guys know each other. I knew, I knew that you were friends, but he taught you about the hippocampus.  

Marc    01:01:08    Yeah, so I was, I was in the, um, I, uh, I was a cognitive psychologist, mathematical psychologist in the Kahana lab. Um, Brad had been an undergraduate at Brandeis, uh, and hung out and I got to know him, uh, you know, really well when he was a PhD student, uh, with Michael Hasselmo and they did a, they did a hippocampal inspired model of free recall mm-hmm <affirmative>, uh, which was the thing I was working on. And, and I would, um, I would bribe Brad cuz I couldn’t figure out like, uh, you know, all the anatomy stuff. Um, and so I would bribe Brad by like I had, I had things he liked, uh, and we would socialize those things. No, actually it was, uh, it was um, really high end, uh, rolling tobacco. Oh actually  

Paul    01:01:49    We don’t  

Marc    01:01:49    Have to go there. Yeah, yeah, that’s fine. It was legal. It was legal, legal, appropriate.  

Paul    01:01:54    Don’t bring our hands right now. It’s it’s  

Marc    01:01:56    Legal appropriate. It was okay. It was, it was, it was perhaps a poor health choice, but legal and relatively wholesome. Um, and you know, he’d ask me questions about memory research and behavioral stuff with people. And I’d ask him questions about like, you know, how’s the dentate gyrus, you know, what’s the hilar region of the dentate gyrus and what do I need to know about that? Cool. We have these long conversations and we ended up being, um, colleagues at, uh, uh, Syracuse University for a number of years. And uh, yeah. Okay. So what does Brad, uh, wanna know?  

Paul    01:02:24    Well, first of all, Brad is in Costa Rica right now. And so he couldn’t send me, ah, an audio recording. So I have to read it, uh, in his voice. So anyway, here’s this question. Hi Marc. As you know, I’ve always really admired the work, your work on implementing Laplace transforms as a, as a way to encode and compress information in models of human memory. I was wondering if you have any thoughts on what gets lost in the compression as a given memory gets further back in time. Do you envision it as a random selection process or does our memory system refine itself by progressively removing information that seems irrelevant? If the latter, it seems like a big unresolved question is to understand how such a purposeful editing process is driven.  

Marc    01:03:06    Yeah. Fantastic question. Uh, from my, uh, brilliant colleague, uh, Brad Wyble. I think the, um, recent work we’ve done with DeepSITH says something about this. So the, so in, in that network, um, each layer has log compression, right. Um, and basically the learned weights in between the layers preserve things that are gonna be important later on. Right. So the, the, the, you know, the phoneme level let’s say goes away mm-hmm <affirmative> and it’s gone mm-hmm <affirmative>, but if there’s a phoneme that’s important for decoding the word, that gets passed through. Okay. And then, and then the word, presumably that is important in categorizing the sentence, one would hope, uh, also gets passed along. And so, uh, I think, you know, it’s, it’s not unreasonable to use, uh, I mean, so in, in the context of deep networks, you know, PyTorch, you know, you minimize some objective function to categorize stuff, um, and you know, and that works pretty good. Uh, you could imagine replacing that with something smarter, like, um, you know, information bottleneck or predictive coding and, but in log compressed time. Uh, and that’s sort of, uh, one of the things, uh, we’ve been thinking about, but, uh, fantastic question. Good job, Brad Wyble.  

Paul    01:04:16    This is kind of a somewhat related question and you may have kind of just answered it, you know, I’m trying to imagine. So, so in your vision of this, like when you clap your hand, right, that’s one event, but we live in this continuous ongoing dynamic interacting mm-hmm <affirmative> ever changing world. So is everything coming in getting, uh, has its own set of neurons? Like what, what’s the resolution a, what’s the resolution of our sampling, I guess of quote unquote events. And then a related question is just, you know, how, how we segment those events. Right. Which is what you were just, uh, talking, discussing really  

Marc    01:04:55    Mm-hmm <affirmative> yeah. Again, if, if different layers, uh, are using the same principle but have different definitions, the answer to your, your question might be radically different from say auditory cortex to medial prefrontal cortex mm-hmm <affirmative> mm-hmm <affirmative>, um, and it, you know, it’s unclear and again, uh, you know, problems we should work out, um, how we, to what extent the brain has control over what information gets gated in. And certainly in the case of animal experiments, there’s, you know, animals don’t particularly care about tones or lights, except in so far as they signal something interesting in the future. Um, so there must be some sense in which some of that can be learned and acquired.  

Paul    01:05:38    Okay. Um, <laugh>, I’m sorry. I have so many questions, but like such a fundamental and potentially canonical, uh, function leads me to wonder how it relates to so many other things. So for example, in the, uh, hippocampus, the phenomenon of replay, right, where a, uh, when we’re running through a maze or, you know, doing something classic thing, rats running through a maze, and we talked about place cells earlier. I don’t even know if time cells are replayed, but anyway, the, you know, when a rat has run through maze and learned it and stuff, then they stop to rest or sleep or eat or something. Then you can record these replay sequences during, uh, these ripples. Right. Um, and they can be compressed in time. They can be backwards in time. They can be forward in time. How does the Laplace, um, approach, map onto the idea of replay?  

Marc    01:06:28    Um, yeah, I’m all in, I, I’m sorry. I’m 90% all in on the idea that sharp wave ripples are like a signature of that jump back in time thing we were talking about. Mm-hmm <affirmative> like, I remember when you were talking to Randy Gallistel about, uh, chili, right. Mm-hmm <affirmative> um, so, and that, it, it, it sort of makes sense in the, the, um, the reconstructed positions are discontinuous with the current position. Um, and we’ve actually been working on, um, detailed neural network models where you write down, you assume there’s Laplace, uh, sorry, you assume there’s log compressed time cells. And then you build like a Hebbian, uh, matrix, uh, uh, amongst them. And you let there be attractor dynamics and, uh, you build a line attractor and we get things out that look kind of sorta at least a little bit like, uh, sharp wave, uh, ripple events.  

Marc    01:07:15    Um, so I think that’s my, that’s my sensible hypothesis, uh, for <laugh> sharp wave ripples. Um, and we don’t know there was a paper, uh, from, uh, Howard, uh, Eichenbaum was an author on it that came out, uh, after he tragically passed away, uh, reporting something like, uh, sharp wave ripples it’s on bioRxiv. Um, but I don’t think it’s been widely accepted, um, thus far, uh, I can say with certainty time cells do phase precess, um, uh, there’s been recent work from Mike Hasselmo and his students, uh, showing that quite beautifully, um, which is either out or should be out quite soon.  

Paul    01:07:51    So one of the functions of replay is, uh, memory consolidation, right? One of the proposed functions. Uh, and so the complementary learning systems theory posits two learning systems, right? This fast episodic event based learning in the hippocampus, which then gets slowly, um, consolidated and generalized perhaps in our neocortex, perhaps through partially through these, uh, replay sequences, but the Laplace <laugh> approach. I don’t know why I keep calling it the approach. It’s kind of a one function sort of thing. Right. So how, how do you reconcile a complementary learning systems theory with a Laplace, uh, mechanism?  

Marc    01:08:34    Well, so once you get to the part where there’s something like jump back in time, right then the, the, the slow system is things that I choose to jump back in time to a lot. Right. And after, so like, I, I play this, uh, stupid game. It’s like, um, Mathdoku right. Um, and there’s some configuration, you know, where one time I was thinking about this friend I had in high school while I was doing this one configuration. And, uh, so, you know, and I, I came back to that and thought of it a bunch of times now, I, I have recovered the same episodic memory, like 500 times, uh, playing this, uh, you know, stupid game. It’s, it’s certainly become consolidated into my, you know, experience of, of this silly, silly game. Right? Yeah. So the, the, the, the part I’m interested in is how we manage to jump back in time.  

Marc    01:09:22    I think there’s really, really important questions about, um, and actually Ken Norman’s recent work on this is really gorgeous. And, um, Marcelo Mattar, uh, working with Nathaniel Daw has done really nice stuff on this, about deciding when it’s adaptive to jump back in time and how to, uh, go about consolidating. The part of the problem I’m really interested in and, and actually, uh, uh, complementary learning systems, um, uh, has mostly sidestepped this issue, is how you manage to recover these really highly temporally autocorrelated patterns. Right? Mm-hmm <affirmative> so if you recall, like, you know, um, Hopfield networks, you know, uh, you, you can’t build point attractors out of really, really similar patterns, right? They’re not eigenvectors of the matrix, cuz if they were, you know, they would be orthogonal and then they would no longer be similar. Right? So this observation that things change really, really, really slowly over a wide range of time scales, which, which, which are just positive of in the hippocampus and prefrontal cortex and yeah.  

Marc    01:10:14    Uh, Denise Cai has done work and like lots and lots of people have, have, have now shown this, um, you know, that precludes those patterns that you jump back in time to being eigenstates of a Hopfield network. Right. And so you might do something else. So anyway, we we’ve thought a lot about this and, um, uh, we, we think there’s a, an answer that may or may not be unique, but anyway, that’s the part of that problem. Um, I’ve spent a lot of time thinking about, um, once there is that thing, then the question becomes strategically, how do I choose to, uh, and hopefully, you know, the, the, the me remembering my friend from high school playing this stupid game on my phone is not super adaptive. Hopefully you do something <laugh> or, you know, um, you know, better <laugh> to, you know, navigate the world. But, uh,  
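
The point about Hopfield networks and slowly changing patterns can be illustrated with a toy example. This is a minimal sketch, assuming a standard Hebbian (outer-product) Hopfield network with synchronous sign updates; the pattern sizes, the drift rate, and all function names are illustrative and are not taken from any model mentioned in the conversation.

```python
def make_drifting_patterns(n_bits=100, n_patterns=10, flips_per_step=5):
    """Pattern k equals pattern k-1 with the next few bits flipped: a slowly drifting code."""
    patterns = []
    for k in range(n_patterns):
        n_flipped = k * flips_per_step
        patterns.append([-1] * n_flipped + [1] * (n_bits - n_flipped))
    return patterns

def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights with zero self-connections, left unnormalised."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, cue, max_steps=50):
    """Synchronous sign updates until the state stops changing (ties keep the old value)."""
    state = list(cue)
    for _ in range(max_steps):
        fields = [sum(wij * xj for wij, xj in zip(row, state)) for row in w]
        new_state = [1 if h > 0 else -1 if h < 0 else x for h, x in zip(fields, state)]
        if new_state == state:
            break
        state = new_state
    return state

PATTERNS = make_drifting_patterns()
WEIGHTS = hebbian_weights(PATTERNS)
recalled = recall(WEIGHTS, PATTERNS[0])
overlap_with_cue = sum(a * b for a, b in zip(recalled, PATTERNS[0]))
```

Cueing the network with the first stored pattern does not return that pattern. The state slides toward the middle of the sequence and settles 20 bits away from the cue, because neighbouring patterns overlap so heavily that the early ones are not stable states of the weight matrix.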

Paul    01:10:59    All, all my, all of my behaviors, super adaptive and just, I I’m just like constantly improving all the time.  

Marc    01:11:07    Excellent.  

Paul    01:11:08    What can the Laplace not do in the brain? Hmm.  

Marc    01:11:15    Oh, there, there, there’s no question. There’s a lot of stuff that has nothing, uh, to do with this Laplace. The Laplace transform is a way to represent numbers. Right? Mm-hmm <affirmative> anything that, um, and, and it’s a way to represent, uh, numbers and functions over a number field to the extent there’s things that are not like that, uh, like olfaction, for instance, doesn’t seem at all like that. There’s some incredibly complicated space. There’s no, as far as I’m aware, there’s there’s no, uh, so anyway, yeah, it it’s for representing numbers and for thinking, so there’s parts of the brain that don’t have anything to do with that. Um, yeah,  

Paul    01:11:52    Those, yeah, those are the uninteresting parts of the brain. All right. So you, we were talking about scale free and, and log scale, uh, cognition and neural activity, et cetera earlier. And you gave plenty of examples of that. Of course, I’ve had Buzsaki on the podcast and he, that we didn’t talk about his Rhythms of the Brain book, but he presented a bunch of evidence in terms of oscillations, et cetera. Um, also scale free. And I, I, I almost played this next question, uh, at that point, but I I’ve waited and we’re kind of gonna zoom out. So here’s a question from an awesome Patreon supporter.  

Speaker 3    01:12:27    Hi Marc. This is Howard Goldowsky, a grad student at Tufts University, and a big fan of your work. One of my other favorite neuroscientists is Gyorgy Buzsaki. The jacket blurb to his latest book, The Brain from Inside Out, states: instead of a brain that represents the world, consider that it is initially filled with nonsense patterns, all of which are gibberish until grounded by action based interactions. By matching these nonsense words to the outcomes of action, they acquire meaning. To what extent do you agree with this idea and how does the Laplace model of memory play a role in your thinking about Buzsaki’s work? Thanks  

Paul    01:13:05    All  

Marc    01:13:06    Howard. Hey, Howard. Um, so let, let me greet, uh, Howard, uh, Goldowsky. So, um, yeah, Gyuri’s been, uh, you know, really influential, uh, on me. Um, I think the, um, I mean, I, I, I, I, I, to the extent that, uh, this Laplace, uh, framework approach is really general and, and, and moreover thinking of, um, thinking of cognition as being built out of, uh, functions over number fields, um, there’s certainly cases, uh, for instance, the retina where there doesn’t seem to be any acquired meaning to, you know, uh, the location on the, you know, physical receptor. Um, you know, if we’re taking, uh, the idea that this sort of, um, you know, log compression is important for audition, you know, there’s, there’s nothing in the cochlea that is learned and, but we know the cochlea, you know, takes log frequency. Um, so it’s an open question.  

Marc    01:14:04    Um, so I, I, I don’t feel, uh, I think there’s undoubtedly a lot of things that are learned, but the, the sort of ubiquity of, um, this form of compression, um, and the simplicity of the recurrent networks that give rise to it, it’s just a diagonal matrix, uh, with, you know, it, it seems conceivable to me that that might be something where many of those parameters are sort of built in. So, uh, I think that’s how I’d answer that question. Oh, I thought of a good answer. You were asking, where’s there a place where there isn’t Laplace. Yeah. And the answer is the visual system. There’s no need to, uh, compute, uh, you know, so we use Laplace to get the inverse for time. We use Laplace to get the inverse for numerosity. The, the, the receptors in the retina they’re already there. There’s no, there’s no sense in which one would need to take a Laplace transform to compute a function over the visual  

Paul    01:14:55    Field because the architecture is all the architecture is already logarithmic.  

Marc    01:14:58    Yeah, yeah, yeah. It’s built in there. So there’s a, there’s a nice substantive place where there’s no, uh, definitely no need for Laplace, uh, uh, early visual system.  
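
The remark a moment ago that the recurrent network behind this compression is “just a diagonal matrix” can be made concrete. The sketch below assumes the standard leaky-integrator form dF/dt = -sF + f(t) with one decay rate s per unit; the log spacing, the extra s = 0 unit, the step size, and the function names are illustrative assumptions and not code from the lab.

```python
import math

def log_spaced_rates(n_units=8, s_min=0.01, s_max=10.0):
    """Decay rates evenly spaced on a log axis; the range is an arbitrary choice."""
    ratio = (s_max / s_min) ** (1.0 / (n_units - 1))
    return [s_min * ratio ** i for i in range(n_units)]

def step(state, rates, inp, dt):
    """One step of dF/dt = -s*F + f. The recurrent matrix is diag(-s): no unit talks to another."""
    return [value * math.exp(-s * dt) + inp * dt for value, s in zip(state, rates)]

DT = 0.05
RATES = [0.0] + log_spaced_rates()   # the s = 0 unit is a perfect integrator

state = [0.0] * len(RATES)
state = step(state, RATES, inp=1.0 / DT, dt=DT)   # a unit impulse (the "clap")
for _ in range(20):                                # then one second of silence
    state = step(state, RATES, inp=0.0, dt=DT)
```

After the impulse, each unit holds exp(-s·τ), where τ is the time since the clap: the bank is a sampled real Laplace transform of the past. The unit with s = 0 never decays, which corresponds to the cell with a “time constant of infinity” from the slice experiment mentioned earlier.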

Paul    01:15:09    All right. Also the teeth probably in the teeth as well, but that’s not quite the brain.  

Marc    01:15:13    <laugh>  

Paul    01:15:14    I have, um, two more questions here. One I’m wondering what kind of obstacles you’re facing right now, is there, is, are there problems that you’re working on where there’s just something that’s, you just can’t get out of the damn way, or you know, that you can’t get over the hurdle?  

Marc    01:15:29    I think, um, no, I think the, the science is going gangbusters. I think there’s cultural problems. And I think the incentive structure of science is built in such a way that there’s, you know, uh, there’s a, you know, incentive structure of science is not, uh, built for really radical changes quickly. And there’s, you know, the, the, you know, the idea that psychology has something important to say about neuroscience is really not something a lot of neuroscientists think at all. And certainly AI people seem even less interested in, uh, psychology. So I think there’s, I think there’s really serious cultural obstacles, but  

Paul    01:16:06    Do you, for your science, do you need it to, to be able to change more quickly? Or is that what you were saying? I think if  

Marc    01:16:12    Things were, if things were able to change more quickly, I mean, there’s, there’s look, if this is really a theory of the brain, right. There’s thousands of experiments that need to  

Paul    01:16:21    Be done. Yeah. It seems like there’s a lot just waiting to be. Yeah.  

Marc    01:16:24    How many RNN papers and LSTM papers have there been? The answer is in the thousands. Um, you know, you’d, if, if seriously, the, the, you know, DeepSITH and, you know, SITHCon, if they’re replacements for, um, RNNs and LSTMs, somebody should go check on, at least some of those thousands of papers written about RNNs and LSTMs. And so there’s no way on earth, you know, my lab or Zoran’s lab or Per’s lab, or all three of our labs put together can ever come close to that scale. So yeah, if we’re gonna, if we’re gonna build out a theory of the brain, there needs to be lots and lots and lots and lots of people working on this and that, that I don’t have control over most  

Paul    01:17:02    Of it, but that’s a matter of convincing people as well. Right. What, what I started to think about was, well, how is that gonna help you make people push the, like button more in social media or something, because that’s where all of like the real, uh, a lot of money and very fast progress, cuz there are tons and tons of people working on it. Right. But, but in theory of the brain wise, then you need to do some politicizing and convincing. Correct? To, to, I mean, that’s not my, I guess  

Marc    01:17:29    Yeah. That I don’t, apparently I’m not super great at that.  

Paul    01:17:32    <laugh> I sucked at it, which is one of the reasons why that was always a frictional point for me in academia.  

Marc    01:17:38    Yeah. I’m hoping, I’m hoping the actually that’s, that’s why I’ve been spending time on the deep network stuff. I think the, the, as a theory of the brain, the deep network stuff is sort of silly, uh, you know, gradient descent and back propagation are, are sort of silly. Um, but it’s a way by, by showing, you know, that these assumptions lead to capabilities that are categorically different than like generic RNNs or generic, generic LSTMs, you know, just the LSTM paper’s cited, you know, just the LSTM paper’s cited, like what 25,000 times, right. There’s literally thousands of people that have been working on that for decades. The fact that, you know, this rag tag group of, you know, misfits or whatever, can, you know, do something that is categorically different. Um, I’m hoping gives people an incentive to go dig into this and, and take this more seriously, um, than they might have otherwise from, you know, this rag tag, bunch of misfits giving talks and writing papers no one reads  

Paul    01:18:35    Right. And, and ne’er-do-wells, you forgot the ne’er-do-wells. Yeah. Well Marc, thank you for, uh, your time here. And I have, so now I have all these traces going back all like tons of events. Right? I wonder how many I have. Maybe you could measure that. Yeah. I appreciate you being here. Thanks for the time.  

Marc    01:18:53    Thank you very much.