Brain Inspired
BI 183 Dan Goodman: Neural Reckoning

Support the show to get full episodes and join the Discord community.

You may know my guest as the co-founder of Neuromatch, the excellent online computational neuroscience academy, or as the creator of the Brian spiking neural network simulator, which is freely available. I know him as a spiking neural network practitioner extraordinaire. Dan Goodman runs the Neural Reckoning Group at Imperial College London, where they use spiking neural networks to figure out how biological and artificial brains reckon, or compute.

All of the current AI we use to do all the impressive things we do, essentially all of it, is built on artificial neural networks. Notice the word “neural” there. That word is meant to communicate that these artificial networks do stuff the way our brains do stuff. And indeed, if you take a few steps back, spin around 10 times, take a few shots of whiskey, and squint hard enough, there is a passing resemblance. One thing you’ll probably still notice, in your drunken stupor, among the thousand ways ANNs differ from brains, is that they don’t use action potentials, or spikes. From the perspective of neuroscience, that can seem mighty curious. Because, for decades now, neuroscience has focused on spikes as the things that make our cognition tick.

We count them and compare them in different conditions, and generally put a lot of stock in their usefulness in brains.

So what does it mean that modern neural networks disregard spiking altogether?

Maybe spiking really isn’t important to process and transmit information as well as our brains do. Or maybe spiking is one among many ways for intelligent systems to function well. Dan shares some of what he’s learned and how he thinks about spiking and SNNs and a host of other topics.

0:00 – Intro
3:47 – Why spiking neural networks, and a mathematical background
13:16 – Efficiency
17:36 – Machine learning for neuroscience
19:38 – Why not jump ship from SNNs?
23:35 – Hard and easy tasks
29:20 – How brains and nets learn
32:50 – Exploratory vs. theory-driven science
37:32 – Static vs. dynamic
39:06 – Heterogeneity
46:01 – Unifying principles vs. a hodgepodge
50:37 – Sparsity
58:05 – Specialization and modularity
1:00:51 – Naturalistic experiments
1:03:41 – Projects for SNN research
1:05:09 – The right level of abstraction
1:07:58 – Obstacles to progress
1:12:30 – Levels of explanation
1:14:51 – What has AI taught neuroscience?
1:22:06 – How has neuroscience helped AI?

Transcript
[00:00:00] Dan: But I also think that brain plasticity will continue to surprise us. We’ll find things that we hadn’t even imagined that the brain is doing.

I think I understand how it would feel for, like, you to be working on this really important and exciting topic and be making so much progress, and then some neuroscientist comes along and says, you should be learning from what I do.

But I feel like there’s so many basic things that we just haven’t ever answered. We’ve had a hundred years of neuroscience, and we don’t know whether spike times matter or not.

[00:00:38] Paul: It’s sad. It’s sad.

Hello, Brain Inspired crew.

This is Brain Inspired. I’m Paul. You may know my guest today as the co-founder of Neuromatch, the excellent online computational neuroscience academy. Or you may know him as the creator of the Brian spiking neural network simulator, which is freely available. I know him as a spiking neural network practitioner extraordinaire. Dan Goodman runs the Neural Reckoning Group at Imperial College London, where they use spiking neural networks to figure out how biological and artificial brains reckon, or compute. As you likely know, almost all of the current AI that we use to do all the impressive things that we do is built on artificial neural networks. Notice the word neural there. That word is meant to communicate that these artificial networks do stuff the way our brains do stuff. And indeed, if you take a few steps back and you maybe spin around ten times, take a few shots of whiskey, and then squint really hard, there is a passing resemblance between artificial networks and our brains. One thing you’ll probably still notice in your drunken stupor is that, amongst the thousand ways that artificial networks differ from brains, they don’t use action potentials or spikes. From the perspective of neuroscientists, that can seem mighty curious, because for decades now, neuroscience has focused on spikes as the things that make our cognition tick. We count them, we compare them between different conditions during cognitive tasks, and generally, we put a lot of stock in their usefulness in brains. So what does it mean that modern neural networks disregard spiking altogether? Maybe spiking really isn’t that important to process and transmit information as well as our brains do. Or maybe spiking is one among many ways for intelligent systems to function well. Anyway, on this episode, Dan shares some of what he’s learned and how he thinks about spiking and spiking neural networks and a host of other topics. You can learn more about his work in the show notes at braininspired.co/podcast/183. And of course, I link to a few of the papers that we chat about today. As always, a huge thank you to my Patreon supporters. You can learn how to support the show and get the full episodes and a few other bells and whistles at the website, braininspired.co. Okay, I reckon you’ll enjoy this episode, and I hope you do. Here’s Dan.

In a world where we’ve already reached artificial general intelligence, and we’ve done it without spiking neural networks, but with rate based neural networks, one researcher, actually a lot of other researchers as well, is sticking to their guns and studying spiking neural networks. That researcher today is Daniel. Dan. Dan Goodman, welcome to the podcast. Thanks for being here.

[00:03:44] Dan: Thank you very much for inviting me. It’s very exciting to be on this. We listen to it a lot, so we often talk about it in the lab.

[00:03:53] Paul: Apologies and thank you. So I haven’t had someone on talking about spikes in a long time, and I thought, it’s overdue. And you have been studying spiking neural networks since you got into neuroscience. I believe you have a mathematical background; hyperbolic geometry, I believe, is what you got your PhD in, and maybe we can come back to that. But when you started studying neuroscience, were you immediately interested in spiking neural networks? What’s the background there? Why spiking neural networks?

[00:04:25] Dan: Yes, I got into neuroscience after deciding that I wanted to do something other than maths, and I just started reading around. And basically, I think that the thing that just popped out at me is that why does the brain use this weird mechanism?

Spikes are kind of a bizarre way of communicating information. In a way, these are binary pulses, but they come at very precise, or potentially very precise, times.

Yeah. What’s going on there, and how do we think about modeling that? It really struck me as a question that was sort of interesting, I guess, from a mathematical perspective. I mean, I was coming from maths, so it seemed mathematically interesting, but it also seemed like a good, just general problem. Yeah.

[00:05:11] Paul: What made you want to venture away from math?

[00:05:16] Dan: Yeah, I guess I wanted to do something that was, I wouldn’t say more applied, but more like rooted in the real world.

I really enjoyed doing maths as, like I said, an undergraduate and as a PhD student, but it kind of. Well, my little anecdote about this is that I wrote one paper during my PhD, and I presented at a conference, and about six people were interested. And I think those are the six people that have ever cared about that paper in all time.

It felt like a slightly monastic way of experiencing the world. I wanted to do something a bit more in the world, and I’d actually always been interested in, I guess, intelligence, broadly speaking.

Both my parents are psychologists.

[00:06:07] Paul: Okay.

[00:06:08] Dan: My dad had me reading David Marr at, like, 14 years old.

So I guess it was kind of natural for me to somehow sort of find myself doing this. And then it was really confirmed when a biologist friend of mine said, oh, you should do neuroscience, they need mathematicians. And I was like, okay, I don’t know if that was true or not, but it was enough to get me hooked.

[00:06:31] Paul: So, I mean, it’s interesting, at least to me anyway, that if you ask a physicist, right, a physicist will say, well, what’s the problem? I can solve it. There’s a sort of arrogance there; that’s maybe being unkind, but physicists think they know the right approach to everything. I’m not sure about mathematicians. Do mathematicians carry with them that same sort of arrogance?

[00:06:58] Dan: Yeah, I guess so. Although they don’t venture outside of math so often as physicists, perhaps. So perhaps it’s not quite so flagrant.

[00:07:07] Paul: Yeah, well, I can’t tell you how many conversations I’ve had with fellow neuroscientists ruing the fact that we have not had more of a mathematical background and how it slows us down so frequently. So that’s something that you don’t face.

[00:07:25] Dan: No, although there’s very different types of maths in many ways, because a lot of the math that’s done in neuroscience is done by physicists. It’s a sort of mindset that is actually quite different to what I did as a pure mathematician.

[00:07:41] Paul: Is it closer to applied?

[00:07:42] Dan: Yeah, exactly. It’s a bit closer to applied, I guess. So, for me, as a pure mathematician, I rarely had to deal with solving a really hard differential equation, whereas that’s bread and butter for physicists, and they get really good at it.

I guess pure math is somehow more about how much of this information can we throw away and still say something generally interesting about it. It’s a different sort of mindset.

So, yeah, I’m still quite flummoxed by some of the physicists’ stuff. Like, what’s that equation called that you use to solve diffusion problems? That’s a nightmare.

[00:08:25] Paul: Yeah. Exotic.

Yeah. Well, okay, so you were reading David Marr at 14, and he’s the one that so many people point to in terms of how to approach studying, quote unquote, intelligence. Right. Taking that top-down approach, where you figure out the behavioral computation first, and then search for algorithms, and then finally look at the implementation level in the brain. But like we just said, you’ve been studying spiking neural networks forever now. What is it about spiking, in general, that turns you on scientifically? You already said that it’s just a curious way, you think, to reckon. And by the way, your group, Neural Reckoning. I actually looked up reckoning today. Do people tell you this?

[00:09:11] Dan: No.

[00:09:12] Paul: Okay. Because the way that I think of reckoning is, like, the third definition, which is a comeuppance, right, when you have to finally pay your dues, but it really just means calculate.

[00:09:22] Dan: Yeah, exactly. That’s more the sense that I had in mind for it.

[00:09:26] Paul: Okay.

[00:09:27] Dan: I just didn’t want it to be like the Goodman lab. I try and avoid stuff like that.

[00:09:31] Paul: Yeah, it’s a cool name, especially with the definition that I had in mind. Also because it’s like saying, hey, you’re going to have to reckon with these spikes eventually. Right?

[00:09:39] Dan: Yeah. Well, I like that element to it as well.

[00:09:46] Paul: When I say what’s important, or when I generally ask what’s important about spiking, I get sort of one or two general answers. One, computational efficiency. Right. It’s always up there, and I’m trying to be more and more convinced that that’s a super important reason for intelligence. But the other reason is just, well, that’s how brains do it, so there must be something special about it. So do you fall into either of those camps, or do you have anything to add about why you think spiking itself might be important?

[00:10:18] Dan: I agree with both of those, and I also kind of agree that there’s also something slightly unsatisfactory about that answer. But we can come back to that. But, yeah, no, I agree with both of those. And I think it’s not just this is how the brain does it, so there must be something special about it, but this is how the brain does it. So if we want to understand the brain, we need to understand it. Okay. Obviously, that’s less important if you’re thinking about AI or if you’re thinking about it in some higher level sort of cognitive frame of mind. But if you really want to understand ultimately what the brain is doing, I mean, it is doing it with spikes. So we do kind of need to understand them at some point.

[00:10:53] Paul: But there are levels of understanding. Right. And one could argue, as many of the modern deep learning practitioners and modern neuroscientists who use deep learning to study the brain would likely argue, that, well, maybe spikes are really not that important, because look at all the functions that we can provide with these rate-based models.

[00:11:13] Dan: Yeah, no, I think that’s right. I mean, one starting point that you always have to bear in mind in this is that spiking neural networks and rate-based artificial neural networks are both universal function approximators. So you’re never going to find something that SNNs can do that ANNs can’t do, or vice versa. You can always do it. The question is, and this actually gets to the other point that you mentioned, is there one that, for particular sorts of problems, is much more resource efficient than the other?

Suppose we find some tasks that you can do with SNNs that you’d need 100,000 times more neurons to do with an ANN. Now, in a way, that’s just a resource constraint problem, right, of the sort that you find uninspiring.

[00:11:57] Paul: Well, it’s also a thought experiment. I mean, are there examples that you can think of that, I’m sorry to interrupt.

[00:12:06] Dan: Yeah. So I have to be honest and say I don’t think that there are yet examples of things where SNNs are definitely better. I mean, there’s things, I think, where it’s clear that’s what the brain is using. Like in the sound localization circuit, we use the timing of individual spikes, and that’s important.

And it uses properties of spiking neurons. It uses coincidence detection, for example.

So in terms of understanding the brain, I think there’s definitely cases where spiking is important. And if you do sort of like information theoretic analyses, you see that spike timings are carrying information that isn’t carried by the rate.

[00:12:44] Paul: Right.

[00:12:46] Dan: And even going one further than that, I’d say the question shouldn’t really be like, can you prove that the spikes are important? It’s more the other way around. It’s like, can you prove that the spikes are not important? They’re definitely there. So the burden of proof should really be the other way around.

But, yeah, I don’t think that we have a really clear and obvious example of like, okay, in this computation, the spikes, we know that the spikes are doing something that we couldn’t easily do with ANNs.

But I think a part of the reason for that is that we don’t know how to think about the type of mathematical systems that are involved in spiking. So, with ANNs, it’s linear algebra, it’s calculus. We’ve been thinking about that for hundreds of years. We have amazing mathematical tools for thinking about that.

Spikes are weird. They’re sort of discrete, and discrete systems tend to be more difficult than continuous systems in any case. And they’re sort of continuous as well. So that’s almost like the worst of both worlds in terms of our understanding. We can’t really apply our discrete systems thinking, and we can’t really apply our continuous systems thinking. So what do we do?

Part of what happens is that we end up looking at the problems that our tools can do, right. So, with ANNs, for example, I think it’s not surprising that static image recognition was one of the earliest success stories, because time doesn’t matter.

And those tools are really good for dealing with things where time doesn’t matter. As we start to get more and more interested in things where time does matter, I think maybe this is a hypothesis, right? Like, I don’t have a proof for you here. We don’t yet have the example where spikes do better, but maybe we’ll start to see cases where, if not spikes, something a bit more like spikes, some sort of something with some element of both time and space built into it starts to become interesting.

I don’t know if this is true or not, but maybe we’re already starting to see that. For example, thinking about self driving cars, that’s somehow quite different from a lot of the earlier success in that you’re not presented with an image. And like, here is an answerable question, what is this a picture of?

One feedforward run, bam. You know what the picture is? It’s like, here’s a continuous stream of information. At any moment, some tiny feature in the corner might become present that becomes critically important. Right? Like a child’s football flying out between two cars or something like that.

And I think one of the reasons we’re not making such fast progress on things like self driving cars as was hoped for, is that that tool is not so good for that problem. Obviously, we don’t have a spiking neural network that can do that either.

But, yeah, again, that, I think is because we don’t yet have the same quality of mathematical tools, the same vocabulary or frameworks for thinking about that.

[00:15:41] Paul: So then going back to the efficiency question, I mean, timing and efficiency are sort of wrapped up with each other, I suppose. Is this where you think efficiency becomes important, because of the timing element?

[00:15:54] Dan: I think there’s two different elements. One of the reasons why the spiking neurons are efficient is because they’re not continuously communicating, right? They only have these bursts of information.

And obviously, that’s something that happens in time, but I think that there’s also just an element of processing continuously varying signals, I guess. So we can use ANNs to recognize sounds, for example. But very often those frameworks are not taking it, as we do, as a stream of samples over time, something like that. They’re doing a Fourier analysis, and they’re treating it as an image, essentially. So they’re turning this time varying problem into a static image problem, because we’ve got a tool for doing static image problems, and amazingly, it does really well. But it’s not, I think, what our brains are doing, and it’s not necessarily the right tool for that task. I don’t know if that answers your question.
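For readers who want to see that move concretely, here is a minimal sketch (in Python, with illustrative parameter choices, not anything from Dan’s own code) of how a time-varying sound is typically repackaged as a “static image” for a standard network:

```python
import numpy as np
from scipy.signal import spectrogram

# Stand-in for one second of recorded audio at 16 kHz.
fs = 16_000
t = np.linspace(0, 1, fs, endpoint=False)
waveform = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)

# Short-time Fourier analysis: the time-varying signal becomes a 2-D
# frequency-by-time array that an ordinary image classifier can consume.
freqs, times, Sxx = spectrogram(waveform, fs=fs, nperseg=512, noverlap=256)
log_power = np.log10(Sxx + 1e-10)

print(log_power.shape)  # e.g. (257, 61): the sound is now just an "image"
```

The temporal problem has been turned into a spatial one, which is exactly the shortcut Dan is describing.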

[00:16:51] Paul: Yeah, I want to think of efficiency as an important-for-intelligence computational principle, but it sure feels like, with computing speed and power these days, we can just brute force our way there. And I’m not sure that efficiency is all that important, but I want to be convinced of it.

[00:17:13] Dan: Yeah, I actually kind of think that we are maybe getting close to the point where efficiency is starting to be a limiting factor.

[00:17:21] Paul: Even people have been saying that. I’m sorry to interrupt you.

[00:17:23] Dan: No, I mean, it’s true. It’s true. People have been saying that for a long time. But I saw an analysis. I can’t remember who did this, but basically saying that if you look at the amount of energy that needs to be put in to get an improvement on various test scores in ML benchmarks, you’re getting a doubling of energy for a halving of the improvement over time. Right. So at some point, that does have to cap out. We can’t keep doubling energy to get a quarter of a percent better on the benchmark, and then an 8th of a percent better and then whatever. Right, right.
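To make the shape of that scaling argument concrete (a toy reconstruction of the trend Dan describes, not the specific analysis he cites): if each doubling of energy buys half the previous gain in benchmark score, the energy bill grows without bound while the total improvement converges.

```latex
E_n = 2^{\,n} E_0, \qquad
\Delta_{\text{total}} = \sum_{k=0}^{n} \frac{g_0}{2^{\,k}} \;<\; 2\,g_0
```

However many doublings you pay for, the accumulated gain can never exceed twice the first step’s gain, which is why this curve has to cap out.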

And also, I think that there’s an increasing interest in being able to do stuff that doesn’t require sending your data to a central server, which then runs it on some massive computational thing, and then sends the answer back. So, for example, that’s not ideal for self driving cars because you have latency issues. Right. If you need to make a quick decision, you don’t want to have to send the data to a server and get response back. Obviously, that’s also critical for sort of animals in the wild, but that’s a separate issue.

[00:18:22] Paul: Right.

[00:18:23] Dan: That’s perhaps also one of the things that gets people interested in neuromorphic computing. Right. Like the promise of that. So I think that there might be a limit. I think we haven’t reached it yet, but I think it was last week there was an announcement of.

I think it was Facebook wanting to buy 350,000 of these NVIDIA H100 GPUs, and various analyses of how much carbon those would be dumping into the environment. And I think OpenAI had a similar sort of announcement that they want to sort of solve the energy problem in order to make further progress. I think it is becoming an issue. We may not have quite tapped out the approach yet. Yeah, I think it’s important, but maybe more than that is that there might be, like, a huge leap if instead of just like, a small improvement of energy efficiency, you could get a massive leap in energy efficiency. Maybe you can then bypass that poor scaling, where energy is doubling and performance is halving, and get some sort of, I don’t know, more linear scaling or who knows, right?

There could be all sorts of interesting things going on there.

[00:19:39] Paul: So your bet. What I wanted to ask you is whether you’ve ever been tempted to abandon the study of spiking neural networks in favor of these rate based models that have been performing so well. And I know offline you mentioned to me that you maybe weren’t initially as excited about using machine learning approaches to study brains and cognition, but that you’ve come around to them. So maybe you could elaborate on that as well.

[00:20:06] Dan: Yeah, I might have been thinking. So I went to Europe’s in 2009, something like that.

I didn’t know anything about machine learning at that time, and it was already gearing up. It wasn’t as big as it is now, but it was gearing up. And I remember coming back from that conference and my supervisor said, what do you think of all this machine learning stuff? And I was like, just seems to be something for selling adverts, as far as I can tell, which is true, and I still kind of hold on to that. But I do feel like I really missed a big opportunity there to realize that something important was happening. And definitely something important is happening. Right. Like, the effect of machine learning on neuroscience, I think, is a good thing. I’m not one of the sort of, like, anti machine learning Luddites. I’m also skeptical about certain elements of it.

Sorry, what was the question? I feel like I’m getting.

[00:21:01] Paul: The question originally, and then I immediately switched it because I’m a terrible host, was whether you’ve ever been tempted to jump ship from spiking neural networks.

[00:21:07] Dan: Yeah. So, actually, I think I was kind of on the way to that until I read about this new development in training spiking neural networks, surrogate gradient descent. So this is an idea from Friedemann Zenke, Emre Neftci, and colleagues, which basically takes the algorithms from machine learning but adds a clever trick allowing you to train spiking neural networks, which was really hard. I mean, there was a technical reason why training spiking neural networks using those algorithms is hard, which is that they’re not differentiable, and all of those methods require differentiability.

Anyway, so I read about that, and I tried it out, and it just works amazingly. It’s like magic.

[00:21:48] Paul: Well, there have been multiple approaches to training spiking neural networks in the past. So why this one in particular? It just was a step beyond.

[00:21:56] Dan: Yeah, it just works, right. In the old days, we used to train things with STDP, spike-timing-dependent plasticity. We could kind of get some interesting results from that, but it never got us very far. Right from the beginning of my time in neuroscience, I thought, if we want to study the brain, we have to study it in the environment in which it’s doing its thing. Right?

It deals with high dimensional, complicated, noisy signals.

It’s got a really hard task. That’s why intelligence is interesting, because it’s solving a really hard task. But we study these incredibly simple sort of laboratory stimuli where everything is perfectly controlled, and it was hard enough even to get STDP to work in those highly simplified scenarios. And then machine learning comes along, surrogate gradient descent comes along, and it just like, oh, you want to recognize an image? Okay, just give us some images. Bam, it just learns to do it. Recognize some sounds, bam, it just learns to do it.

And so suddenly we can do the thing that I always wanted to do, which is have these spiking neural networks that are solving interesting real world problems. Of course, now the problem has shifted in my mind to, how do we understand what those networks that we’ve trained are doing and say something interesting about them? And I’m still thinking about that. But at least we can now do the thing that I always wanted to do. So that’s, for me, what was a big change.

[00:23:20] Paul: So that’s one of the areas where it doesn’t bother you that it is not biologically accurate, because surrogate gradient descent, and please correct me if my description is off, basically does a forward pass with spikes and then emulates the backpropagation algorithm, which is not biologically plausible, but it does that by replacing the unit spike with, like, a sigmoid. Right. Or something that’s differentiable.

It doesn’t bother you at all that that’s essentially what all other rate-based networks do in terms of training the network, but it’s not how brains do it?

[00:23:56] Dan: There is one thing that bothers me, which is, I don’t understand why that trick works.

And as far as I know, I keep asking Friedemann about this every time I talk to him. None of us really know. I mean, we have some sort of vague intuitions, but we haven’t a really solid answer as to why this trick should work, which means we also don’t know in what cases it doesn’t work.

If we had a solid answer to why it would work, we could say, okay, well, it’ll train to do these sorts of things, but it won’t be able to train to do these sorts of things. I would feel more satisfied if I had that answer right.

But I don’t feel too upset that it’s not learning in the same way that biology is learning, because what I want to do with it is not come up with a model of plasticity, but find out what functions spiking neural networks are capable of. And for that, I don’t really care. It’s just an optimizer for me.
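For the curious, here is a minimal PyTorch sketch of the surrogate gradient trick as Paul summarizes it: the forward pass emits hard binary spikes, while the backward pass pretends the threshold had been a smooth function so gradients can flow. The fast-sigmoid surrogate and the steepness value below are common illustrative choices (similar in spirit to published tutorials), not the only options.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate gradient in the backward pass."""

    steepness = 25.0  # illustrative value; controls how sharply the surrogate falls off

    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        # Emit a binary spike wherever the membrane potential is above threshold (0 here).
        return (membrane_potential > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Pretend the step had been a "fast sigmoid": the gradient is large near
        # threshold and decays smoothly away from it, so backprop has something to use.
        surrogate = 1.0 / (SurrogateSpike.steepness * membrane_potential.abs() + 1.0) ** 2
        return grad_output * surrogate

spike_fn = SurrogateSpike.apply  # drop-in spiking nonlinearity for a network
```

Nothing about the forward, spiking behaviour changes; only the gradient that flows backwards is replaced.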

[00:24:48] Paul: And so then, going back to the machine learning approach, one of the reasons that you’re more on board with it now is because you can use more complicated tasks. Right. Ask your networks to solve more complicated, more ecologically valid tasks. Would that be a way to say it as well?

[00:25:06] Dan: Definitely, yeah. And I think all sorts of interesting things come out of that. Like, things that are optimal for very simplified problems are not necessarily the same things as are optimal when things get messy.

[00:25:21] Paul: Part of what you do is come up with a more complicated task for your networks to perform. Because when you train them on these fairly simple tasks, you realize, well, they don’t actually need these features. Right. The networks don’t need these particular features to solve the task, and we can do it fairly straightforward. So part of what you do is just make more complicated tasks and figure out, well, when does it kind of break those simpler models, and when do we need the more complicated models? So I guess my question is, originally, what I was going to say is, all of our problems are messy and require these more complicated computational tricks. But maybe that’s not true. Maybe most of what we do is actually kind of simple, and it’s not necessary.

[00:26:05] Dan: I think right from the beginning, we have a very complicated sensory world to decipher. Right. There’s a lot of noise as I’m here recording. I can hear in the background the lift shaft behind me, the people talking out in the corridor. And I hope not too much of that is coming through on this microphone. And I’m filtering all of that out. That’s already quite hard and interesting. Just going back to earlier, I think that’s something that machine learning is still not terribly good at, actually: separating out the background noise and understanding speech.

[00:26:38] Paul: The cocktail problem.

[00:26:39] Dan: Yeah, the cocktail party problem, exactly.

But, yeah, sorry, that was a distraction.

So I think we’re always dealing with quite complicated problems in terms of the inputs that humans are interested in.

Just thinking, again, about the visual field. That’s an awful lot of data, most of which we’re throwing away, but we don’t know which bits to throw away. That’s already quite a hard problem. Right.

[00:27:09] Paul: Right.

[00:27:13] Dan: I think we’re always doing. I guess our brains wouldn’t be so big if we weren’t always solving quite hard problems.

[00:27:21] Paul: So going back to the surrogate gradient descent that got you all excited about sticking with spiking neural networks. First of all, how close were you to jumping ship?

[00:27:31] Dan: It’s not so much I was thinking of jumping ship, it’s just that I was finding myself thinking more and more about solving problems, not using spiking neural networks. I guess.

Yeah, I don’t think I was fully planning to jump ship. It was just that my interests were sort of starting to drift off in various different directions. And then. Yeah, this surrogate gradient descent really has refocused me, I guess, on thinking. That’s really interesting and worth following up on.

[00:28:01] Paul: Do your cohorts also love the surrogate gradient descent? I mean, because there are so many different other solutions that have been proposed?

[00:28:09] Dan: Yeah, there’s a sort of whole family of things that are similar to surrogate gradient descent, but there’s definitely been a sort of leap at around about that time.

So surrogate gradient descent isn’t the only one. There’s other approaches.

For example.

So Timothée Masquelier came up with a really interesting approach to treating the networks as if they had just one spike and then using the time of that spike as a continuous variable. So that was another trick for doing that. And then there’s other groups that have sort of taken a slightly sort of hybrid approach to that.

There’s various similar things that are somehow in the air.

My impression is that surrogate gradient descent, at least from what I’ve seen, works the best. Now, that’s probably going to be controversial, and probably people will tell me that, no, actually, in our tests, ours works the best.

That’s just my feeling. I think it’s good for everyone to do their own thing, but, yeah, no, I mean, I think in the world of people who are interested in spiking neural networks, it’s taken that field by storm quite a lot.

[00:29:20] Paul: Yeah, but you don’t think that having an algorithm like surrogate gradient descent that is really good at training networks, you don’t think that gives us any purchase on how brains are learning, just how they’re performing the tasks once they’ve learned?

[00:29:35] Dan: Yeah, that’s a really interesting question. It doesn’t have to. Logically, it could be entirely separate. I think it is probably quite an interesting starting point if you’re also interested in plasticity. Okay.

You can’t directly apply surrogate gradient descent for a well known reason, which is that it uses global information that wouldn’t be available to an individual neuron. Right. It’s the same for just using any sort of gradient descent as a model of plasticity. It uses information that the neuron can’t know.

And then there’s a whole sort of slew of work that says, okay, well, can we approximate what gradient descent is doing with a local rule?

And actually, already that can generate a lot of the sort of plasticity rules that people have studied in the past in neuroscience. This is a generating idea that can sort of backwardly explain a bunch of stuff that we did previously.

And I think that’s also a really interesting approach going forward as well. I think by studying how we could do gradient descent biologically, it’s likely we’ll come up with good ideas.

But I also think that brain plasticity will surprise us, will continue to surprise us. We’ll find things that we hadn’t even imagined that the brain is doing. Like, there was that paper, I can’t remember, sometime in the last few years that showed that neurons were exchanging little packets of RNA, they were sending messages to each other in little encapsulated packets of RNA.

We don’t really know what that’s doing. We know that that’s happening. Maybe that’s something to do with learning. Who knows, right? And if that’s something to do with learning, then anything is possible, right? Because we’re sending arbitrarily complex messages from neuron to neuron.

And I wouldn’t be surprised if more things like that sort of show up in the future.

I don’t think we should be too constrained by a specific idea of, like, learning must happen at the synapse, and anything that the synapse can’t see must be irrelevant.

[00:31:38] Paul: I repeat myself ad nauseam about this, but the more I learn about brains, the more confounding it seems, because there are just so many different possibilities. Right. So we’ve lived since, let’s say, Donald Hebb and the neurons-that-fire-together-wire-together paradigm of learning, and that might just be one of, I don’t know, 20 different ways, which, I’m not sure, in some sense is exciting and in another sense is daunting. I don’t know how you feel about that.

[00:32:11] Dan: Yeah, both. But I think more exciting, right. I mean, one of the reasons I came into neuroscience, I think, is because I like the idea of this. Everything is still to play for, unlike maths, where we’ve got thousands of years of history and all the big questions are kind of already solved to some extent.

Everything is kind of unknown in neuroscience. We don’t really know how the brain is working very much at all. We know lots of little bits.

[00:32:37] Paul: And.

[00:32:37] Dan: I think neuroscience has actually been a really exciting time for that at the moment. There’s been some really cool discoveries and things you’ve talked about on the show before, like representational drift, for example, or there’s a surprising amount of synaptic turnover that exists in the brain. Right. We have this idea that memories are encoded in sort of static weights that, once learned, basically never change. That’s sort of classical way, but it looks like that doesn’t really happen anymore. So that completely messes with our way of thinking about what neural networks might be doing.

And so I think that’s great. I mean, I love that I don’t have the answers to those questions, but it’s really fun to have things that completely challenge our really basic conceptions of what’s going on in the brain, finding out that maybe astrocytes are much more important than we ever thought.

[00:33:35] Paul: Oh, I was going to ask you about astrocytes.

[00:33:39] Dan: I also don’t really know how to think about that. That’s another blow. So I think it’s really good to be really open minded about what might be important, because if we go in with too fixed an idea like learning must happen here and it must have this sort of shape, we’re not going to find answers. We have to be more open minded if we’re going to get those answers.

[00:34:01] Paul: So this is a two pronged question, then. But maybe now is the right time to ask it or them.

I agree that, it pains me to say this, there is, let’s say it this way, this is more optimistic: there’s still lots to learn about the brain. I was going to say that we know very little. I’ll say there’s lots to learn.

[00:34:18] Dan: You’re more diplomatic than me.

[00:34:25] Paul: Trying. I have a current project where it’s, like, hugely exploratory right now as well, because we don’t know some fundamental assumptions, and it just hasn’t been explored very much, and there’s not much theory to go on.

In that case, how much do you lean on theory, or how much do you think that we need theory, versus how much should we be just exploring? And we’ll talk about some of your work that explores some hyperparameters and stuff, but because the way that the brain can solve things just seems so high dimensional, is it just a matter of exploring, for lack of a better term, stamp collecting, for a long time while our theories get shaped, or do we need to go in with a theory-first approach?

[00:35:11] Dan: That’s a really hard question.

I think we kind of have to explore both ways of doing things, and that’s always going to be my answer. I kind of feel like in all cases in neuroscience, we just need to be trying out a lot of different stuff because we can’t know in advance what is going to work and what isn’t going to work, and everyone just has to make their own bet about that.

I think it’s really the back and forth between those two approaches that generates a lot of interesting ideas. Right? Like, if we’re talking about experiment versus theory, for example, that discovery of synaptic turnover raises a lot of questions about what our theories were before. And you didn’t need to go in with a theory to find that there was all this synaptic turnover. Right? Like, you can just observe that and say it, and then everyone’s like, oh, what do we do about that?

[00:36:05] Paul: Right.

[00:36:07] Dan: But also, I think, and this is something maybe we could do more of, we could say, here’s a thing that would make sense for the brain to be doing, from a theoretical perspective. Can we do better at probing that with experiments? I think we do that direction less well than the other direction, I would say. But when you said theory, do you mean theory versus experiment, or sort of like theory, mathematical theory versus, like, simulations or something like that?

[00:36:37] Paul: Well, sorry, I meant more along the lines of a David Marr kind of computational approach. Right. So the brain is doing x, right. Or let’s say, like plasticity. Right. So we could come in and say, well, the brain has to change synapses so that they’re the perfect weights. And then when we start looking and we see that, oh, these weights are constantly undergoing turnover, they’re never the same, it’s a much more dynamic process. So what does that do to our theory? Do we stick to the theory and say, well, there has to be some level of static connective strength, right? Because that’s what our theory says needs to happen, but that doesn’t exist. So then where do we go from there? Right? So an alternative approach is to measure these things, to stamp collect and say, well, how much is it turning over? And then build your theory from there.

[00:37:27] Dan: Yeah. And again, sorry if this is boring, but I think we just have to do both, right? And you see this in the response to this thing about synaptic turnover, right? Like the first responses from the theory community to that were, okay, but maybe there’s something that’s still invariant, and then we’ll switch to thinking about that thing that’s invariant. And maybe that’s true, but maybe there’s a more interesting answer. But whatever that more interesting answer will be is probably something that’s harder to come by.

Going back to possibly something I said earlier, we kind of have this static way of analyzing things, in that we have these mathematical frameworks that are good for that, and we’re less good at dynamical thinking in some sense. I mean, we have dynamical systems, but that’s not exactly what I mean, because a dynamical system, in a weird way, is also sort of a static thing.

[00:38:21] Paul: It’s all state spaces.

[00:38:23] Dan: Exactly.

[00:38:24] Paul: Which is static.

What will be beyond dynamical systems?

I’m partial to a process philosophy based approach, but I struggle mightily to think about how to apply it practically in experimental settings.

[00:38:44] Dan: Yeah, it’s super hard.

We don’t have the existing mathematical tools to do it. I think.

[00:38:53] Paul: Is. Is that the barrier you think?

[00:38:55] Dan: For me, that’s one of the big barriers. I just don’t know how to theorize about this in a satisfactory way with maths, because I’m originally a mathematician. It always comes back to maths for me, basically. I’ll have understood something once I can turn it into a sort of mathematical way of thinking about it, which isn’t true for everyone. Yeah, I guess some of what I’m doing in my work is trying to come up with cases, simple enough that I can make progress on them, where that slightly more dynamical way of thinking about things makes sense. And then I’m sort of hoping, I guess, that somehow by looking at lots of these sort of examples, something will jump out at me, which I can’t say that it has yet. But, yeah, so in a way, I’m almost doing fishing experiments in the sort of pool of possible computational simulation experiments or something like that.

[00:39:57] Paul: Yeah, maybe this is a good time to talk about heterogeneity in time constants. Right. So is this one of those cases where you, I don’t know how you decided to test that in particular among all the different kinds of hyperparameters that you could have tested? But can you explain kind of what you did and what you learned from it and maybe why you did it?

[00:40:19] Dan: Yeah. So this is a paper that came out a couple of years ago where basically we showed that, well, in a lot of work on spiking neural networks, not all work, but quite a lot, people would say, okay, here’s the neuron model. It’s got these parameters, and we’re going to see what happens when we sort of train the weights to do various tasks or something like that.

And actually, that project just came out of me saying, well, look, with this gradient descent thing, we can just make those neuron parameters trainable as well.

In fact, it was a one line change to the code to make those parameters trainable versus not trainable. It was as easy as that in a way. What happens when we do that?

And what we found was that if we made the time constant... The reason why we picked time constants, actually, is we looked at other stuff as well, is just that for a very simple leaky integrate-and-fire neuron, there’s not so many parameters that are actually in it. Right? The time constant is one of the main ones.

[00:41:16] Paul: Define time constant, just to be thorough.

[00:41:21] Dan: Right. The time constant, for a leaky integrate-and-fire neuron at least, is basically how quickly the neuron forgets its previous inputs. So if it’s got a very short time constant, the only thing that matters to the neuron is the most recent few spikes, or the spikes that have arrived. Let’s say it’s the time window that it remembers over, right? So if you’ve got a one millisecond time constant, only the spikes that have arrived in the last couple of milliseconds will matter. Anything that arrived before that will have been forgotten. If it has a time constant of 100 milliseconds, it’s all of the things that happened in the 100 milliseconds before that that matter.

When I say it like that, you might think, okay, well, why shouldn’t you always have it be longer.

Well, also, the timing doesn’t matter so much within that window. Right. So if your time constant is one millisecond, because it’s only things that happened in the last millisecond or a few milliseconds that matter, timing is now very important because all of those things basically have to happen simultaneously. So it’s doing coincidence detection. If it’s like 100 milliseconds, it’s basically just how many spikes have come in recently. So it’s switched from being a more temporal to a more sort of integrating neuron. Right. So what we found was, basically, we tested it on a whole bunch of different tasks, and the tasks varied in how much temporal structure they had in them. So at one end, they were just like static images, essentially. So there’s no temporal structure to them. We can do that with spiking neurons, but we’re not really, in a sense, getting anything that we couldn’t do with an artificial neural network. At the other end, we were looking at sounds, which have a lot of temporal structure in them. And basically what we found is that the more temporal structure you had, the more you got an improvement from allowing this heterogeneity in the time constants. And the gain could be quite dramatic for those most temporally complex tasks. So that essentially, by adding in that heterogeneity, you got as much increase in performance as you would get from multiplying the number of neurons in the network by ten or 100 or something like that. So if you hadn’t got that heterogeneity, you’d need ten or 100 times as many neurons. So that was one part. And then we also found that after training, if you just took a histogram of the time constants that it found, they had a very sort of characteristic shape, and when you look in real data, you also see the same characteristic shape. So that was also kind of interesting.
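Here is a sketch of the kind of setup Dan is describing, with the membrane time constant made a per-neuron trainable parameter rather than a shared constant (names, initialization, and threshold are illustrative, not taken from the paper’s code; training would route the threshold through a surrogate spike function like the one sketched earlier):

```python
import torch
import torch.nn as nn

class HeterogeneousLIFLayer(nn.Module):
    """Leaky integrate-and-fire layer with one trainable time constant per neuron."""

    def __init__(self, n_in, n_out, dt=1e-3, tau_init=20e-3):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(n_in, n_out) * 0.1)
        # Making tau a Parameter instead of a fixed constant is the "one line change"
        # that lets gradient descent shape the distribution of time constants.
        self.tau = nn.Parameter(torch.full((n_out,), tau_init))
        self.dt = dt

    def forward(self, input_spikes):
        # input_spikes: (time, batch, n_in) binary tensor
        T, batch, _ = input_spikes.shape
        v = torch.zeros(batch, self.weights.shape[1])
        alpha = torch.exp(-self.dt / self.tau.clamp(min=1e-3))  # per-neuron leak factor
        outputs = []
        for step in range(T):
            v = alpha * v + input_spikes[step] @ self.weights  # leak, then integrate inputs
            spikes = (v > 1.0).float()   # hard threshold; use a surrogate spike function for training
            v = v * (1.0 - spikes)       # reset neurons that spiked
            outputs.append(spikes)
        return torch.stack(outputs)

layer = HeterogeneousLIFLayer(n_in=100, n_out=20)
fake_input = (torch.rand(50, 1, 100) < 0.05).float()  # 50 time steps of sparse random spikes
print(layer(fake_input).shape)  # torch.Size([50, 1, 20])
```

A short tau makes a neuron a coincidence detector (only near-simultaneous inputs matter); a long tau makes it an integrator (the recent spike count matters). After training on a temporally rich task, a histogram of `layer.tau` is the kind of distribution Dan compares against the real data.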

[00:43:42] Paul: One of the points that I’ve read or heard you make is that this is essentially a free, energetically free way for brains to be more robust and to learn better, because it doesn’t cost anything to build in that heterogeneity.

[00:43:57] Dan: Yeah. In fact, it’s even further than that, which is that it may actually cost us to not be heterogeneous. Right.

To have everything be exactly the same might be more expensive for the brain than to have everything be variable. And you also see that in neuromorphic computing. I don’t know if you want to talk about that now or later, but in some forms of neuromorphic computing devices, there’s sort of, like, noise in the manufacturing process, so that for the equivalent neural properties, it would be really hard, or expensive, to get them all exactly the same, to have every property be exactly the same. So having them all be a bit noisy and spread out is actually more the default thing.

And therefore, again, in neuromorphic computing, it may actually be energetically cheaper to have heterogeneity than not to.

Or alternatively, it may be that you just have to live with that if you want to implement things in brains or in neuromorphic computing devices.

[00:44:59] Paul: You don’t have any idea, sorry, this is an aside, whether that heterogeneity is species dependent? So, like, maybe lizards don’t have much heterogeneity; they only have really low time constants. I’m sorry if this is a naive question.

[00:45:22] Dan: So I used data from the Allen Institute, which has this unbelievably amazing database that they just make freely available to everyone. And it was in there across several different species.

Okay, so it was there in humans, it was there in primates, it was there in cats, and mice. I can’t remember; there were a number of species in that database, and it was there in all of them. But also, I know that you’ve had Eve Marder on and talked about the STG and the fact that in those circuits, those neuron parameters vary by orders of magnitude from crab to crab. Right.

So it seems to me that it’s likely that heterogeneity is probably everywhere, although, I mean, I’m not an expert. Maybe in C. elegans, things are cleaner.

[00:46:13] Paul: I thought I was going to ask about C. elegans next. But I’m sure it’s actually known in C. elegans. But the reason why.

[00:46:17] Dan: I just don’t know.

[00:46:18] Paul: Yeah, me either. I think the reason why I was asking that is because it dawned on me that evolutionarily, perhaps, it’s not energetically favorable, it’s not free, to build in heterogeneity. Although the way you said it earlier, that it actually costs more to not have the heterogeneity, maybe that’s the better way to look at it evolutionarily as well.

[00:46:43] Dan: Yeah.

I feel like I’m not enough of a sort of geneticist or a developmental biologist to answer that question. I think that would be a really interesting question to know the answer to. Is it more expensive to have all of the neurons be the same, or is it more expensive to have them all be different? I just don’t know. But I think it may be the case that it’s more expensive to have everything be the same. In a way, you need to have some sort of quality control mechanism that forces them to be the same.

[00:47:13] Paul: Right, right. Which is impossible.

[00:47:18] Dan: Well, I think if it mattered that they had some precise value, then evolution would probably have found a mechanism to make sure it had that precise value.

[00:47:28] Paul: But if it doesn’t matter either way, then you would expect that part of the state space to be explored in some species.

[00:47:36] Dan: I suppose that’s true, yeah. So there might be somewhere it’s more heterogeneous than others.

[00:47:43] Paul: Okay, so this is, like, one hyperparameter. We were just talking about how complicated brains are, how messy they are, how there are so many different hyperparameters that you could play with. And you just mentioned Eve Marder and her classic work showing that there’s lots of different ways to do the same thing, and there’s one way to do multiple things.

Going back to the idea of theoretical approaches, part of your tagline on your website is that you’re looking for unifying perspectives, unifying approaches. But then it seems like the more we discover, the less unifying it actually is, because as you fish, you catch a bunch of different kinds of fish. And these fish that we were just talking about are these time constants. Right. But that’s just one of many hyperparameters that’s, like, constantly overturning and constantly changing in this dynamic, highly recurrent, et cetera, et cetera, brain.

Do we need, like, a theoretical approach for sort of each question? Right. Do we need a theoretical approach for time constants? Is that going to be its own thing? Or.

I’m not saying a unifying theory of the brain, but then the alternative to that is to have 10,000 theories of the 10,000 processes happening in brains. So where do you fall in thinking about that?

[00:49:00] Dan: It’s certainly a possibility that there won’t be a unifying idea, but I think it’s always more interesting to look for them. And so, for example, one of the things that we looked at and we didn’t find an answer to this is we wanted to try and find a mathematical explanation for why this heterogeneity had these properties that it did. And I think we had some intuitions. For example, there’s some stuff from sort of, like, random network theory in machine learning that basically says that having some randomness in the structure can often be beneficial. And I think that they also don’t fully understand exactly what’s going on there. But again, it’s been demonstrated in multiple cases. And maybe what was going on is somehow the same. Right. If everything is the same versus there being some random structure to it, something about robustness may fall out of that. I’m being very vague here because we didn’t manage to pin this down exactly, but it feels to me like there could be an explanation of what was going on that could both explain why it is that these random matrices have interesting structure, and also explain why it is that having heterogeneity in neuron properties is valuable. I could imagine a theoretical explanation that does cover all of those cases, even if I don’t quite have it yet.

So I think that there can be those unifying principles, but they’re hard to get out.

[00:50:28] Paul: Would you not be satisfied with a non mathematical explanation?

Because randomness, right? I mean, I guess randomness is mathematical, but yeah.

[00:50:45] Dan: Maybe it’s because I’m a mathematician. But for me, my bias is like what it is to understand. It is to be able to reduce it to maybe not like necessarily a simple equation, right? But something that, if it’s understandable and it’s a quantifiable thing, I feel like that’s what maths is to me, in some sense, it’s putting that understanding on there. So for me, it kind of would be that. But maybe that’s just because I have such a strong mathematician bias that I can’t imagine any other sort of understanding that makes sense.

[00:51:17] Paul: Let’s talk about sparsity. So, sorry if this is kind of a leap, but sparsity is on my mind a lot these days, and you had mentioned to me that you have begun to think of it as an important principle of brain reckoning. I’m going to start using reckoning all the time now, by the way. Thank you for that.

The reason why it’s on my mind is because I’m recording neurons in mouse motor cortex and basal ganglia, and these are really low firing neurons, so they have a sparsity to them. And it has been difficult to get a purchase on how they’re encoding ongoing behaviors because there’s just not nearly as much structure there as there is in, let’s say, non human primate motor cortex, or different areas while they’re performing tasks. Why do you think sparsity?

Why have you come to think, I suppose, over time, that sparsity might be an underlying important principle.

[00:52:15] Dan: Yeah, well, I mean, sparsity is one description of spikes, right, that’s temporal sparsity. They’re just these messages that occur infrequently in time.

And you also have sparsity in space. You have a lot of that in the brain, right? You’ve got n neurons. You certainly don’t have n squared synapses connecting all of those. That would be way too many.

So we have to have a much sparser set of connectivity.
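To put rough numbers on that spatial sparsity (approximate textbook figures, not from the conversation):

```latex
n \approx 8.6 \times 10^{10} \text{ neurons} \;\Rightarrow\; n^2 \approx 7 \times 10^{21} \text{ possible connections},
\qquad \text{actual synapses} \sim 10^{14}\text{--}10^{15}
```

so real connectivity is sparser than all-to-all wiring by a factor of around a million or more.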

I guess the reason I think that there might be something unifying those two is that we’ve just seen a lot of cases where putting things through a bottleneck can have really valuable computational properties. So, like in machine learning, for example, you’ve got auto encoders, right? You take a high dimensional thing, you squeeze it down through a few layers into something low dimensional, and then you squeeze it back out again. And that auto encoder network often has a lot of interesting things going on in it.

Basically, it forces you to throw away information that isn’t so relevant. So it has a sort of compression factor, and it often discovers structure in that much lower dimensional representation. So I think that there are interesting computational properties there, and then there’s also this famous information bottleneck principle, right? Which is a way of sort of analyzing it. So the way I think about it, I don’t know if this is the way everyone thinks about it, is, for me, it somehow defines what a computation is.

I think in neuroscience, we often have this idea of representations of something else, right?

The retinal code is a representation of the visual image, for example. And then quite often, you have further transformations that somehow have this representational quality. You can reconstruct the input from the output. But ultimately, I think what the brain has to do is it has to start throwing away information. It has to say, this is the information that matters, and I want to throw everything else away.

And for me, that’s what makes a computation interesting in some sense, right? That you’ve thrown away irrelevant information, and that’s kind of what the information bottleneck principle tries to encode. It’s like, what is the transformation that maximizes the information about the thing that I care about and minimizes the information about everything else? And I find that a really powerful sort of mathematical framework for thinking about the sorts of computations that I would expect, particularly, I think, the sort of early perceptual systems, like the early visual system, early auditory system, would be doing. They would be like, our main job at the start is to keep the relevant stuff and throw everything else away because there’s too much data. Back to efficiency. Back to efficiency, indeed. Yeah.

Right. Exactly. So that’s why, I guess, I think those two questions of efficiency and what the interesting computation is aren’t entirely disconnected, because you could think about these bottlenecks as being about efficiency, but they also seem to be about being able to do the task well at all. Throwing away irrelevant stuff, ultimately, you’ve got to do that if you want to do the task.

[00:55:28] Paul: You don’t have to. I don’t think that you have to, right? Because, let’s say, an autoencoder, there’s a bottleneck, right? But then in an autoencoder, often you’re reconstructing the original signal. And so that means that there is information gained after the bottleneck that has been learned, right? So the system has learned to take the, quote unquote, low dimensional representation and then fan it back out. But that information is encoded in the connections of that network. So in that sense, you’re not really losing information if you can just regain it.

[00:56:00] Dan: So in an information theoretic perspective, you can only ever lose information. You can’t gain information.

So, no, you’re not gaining information when you reconstruct, although maybe it kind of looks like it. It’s just that you’re, I guess, highlighting information that was already there by reconstructing the original image, for example.

[00:56:25] Paul: Okay, so in Shannon information, and we don’t have to go down this road, yeah, you’re not gaining information. But because you have a structure that does transform the signal back into a different high dimensional state, one could say, maybe non-Shannon informationally, that information is built into the structure. And we don’t have to talk about meaning versus information.

So I guess technically you’re right.

[00:56:54] Dan: Yeah. You could think of the autoencoder as just being a way to train the encode bit; you don’t have to do the decode bit afterwards. Right. You could work with the encoded thing.

So the fact that you decode is there in order to make sure that you’re not throwing away relevant information.

[00:57:13] Paul: Right. And to your point about needing eventually to perform the task, in the past decade or two, 15 years maybe, there has been all this work on motor cortex activity being on this low dimensional manifold. And this is where the dynamical systems approach has really shone, S-H-O-N-E: once you’re performing the behavior, it’s actually quite a low dimensional representation, if you will.

[00:57:45] Dan: Yeah. Although I think that there is still some open questions about to what extent. That’s just because the task that we’re asking them to do is low dimensional. But let’s not get into that.

[00:57:54] Paul: No, that’s like my current research world right now. It’s actually kind of frustrating. I need your theoretical abilities and your mathematical abilities to help me. Yeah. So sparsity, did we wrap up enough on sparsity?

[00:58:08] Dan: Yeah, I guess so. For me, I think it’s tied up in this idea of bottlenecks being important, and it gets at this thing that I would like to get at, which is that there may be something interesting, both in spikes and in sparse connectivity structures more generally, that is more than just about resource efficiency.

[00:58:32] Paul: How do you get at that? How do you design that? I guess not experiment, because you’re not an experimentalist, but how do you move forward with that? I’m genuinely curious.

[00:58:41] Dan: Well, one of the things that we’ve been doing with one of my PhD students recently is trying to understand to what extent just having sparsity in a network causes the different elements of that network to learn different functions.

So that’s like modularity, basically, like sort of functional specialization.

And what we found is it has to be incredibly sparse to automatically learn different functions. Now, there are other ways you can get specialization, right? There are learning rules that can encourage it to be specialized, there are training regimes that can create specialization, there are all sorts of others. And we’re not saying that this is the only route to specialization, but this was, I guess, my first attempt to try and start to think about whether sparsity on its own has interesting implications for the types of computations that networks can learn. And in that case, it looks like sparsity on its own wasn’t quite enough. But what we did find was that sparsity combined with other forms of resource constraint did create quite robust specialization. So if you had tons and tons of neurons, then it didn’t need to learn to separate functions into modules; if you really cut down the number of neurons to the absolute bare minimum, then it did. Or you could put other types of resource constraint on it. So that’s one way that I’m trying, I guess, to get at that.
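One simple way to make connection sparsity the controlled variable, roughly in the spirit of the experiment described here, is to train networks that differ only in a fixed binary mask over their weights. This is a hedged sketch in PyTorch, with made-up sizes, not the actual setup from the paper:

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """A linear layer whose weights are multiplied by a fixed binary mask,
    so only a chosen fraction of connections exists and can be trained."""
    def __init__(self, n_in, n_out, sparsity=0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_out, n_in) * 0.1)
        self.bias = nn.Parameter(torch.zeros(n_out))
        # Fixed random mask: keep (1 - sparsity) of the possible connections.
        mask = (torch.rand(n_out, n_in) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

# Sweep the sparsity level while holding everything else fixed.
for s in [0.0, 0.5, 0.9, 0.99]:
    layer = MaskedLinear(100, 100, sparsity=s)
    print(s, int(layer.mask.sum().item()), "connections kept")
```

Sweeping sparsity like this, with the architecture, task, and training procedure held constant, is one crude way to ask whether sparsity alone induces functional specialization.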

[01:00:08] Paul: Yeah. So that’s an interesting thing because you kind of have to keep everything else still while you change sparsity. Right. Which is not the way that brains work. So it’s almost a cheat.

[01:00:18] Dan: Yeah. No, it was nightmarishly difficult to isolate just sparsity and keep everything else relatively controlled. And when you look at that paper, I think there’s a bunch of slightly odd choices, and those odd choices are there because we were really trying to control everything. Right. But that was really hard.

And I think that there is an open question about how much we learn, given that we had to make so many odd choices and control so many things. But that’s what we were trying to get at: just isolating the effect of this one key variable, where we also introduce resource constraints as well.

[01:00:56] Paul: Right.

[01:00:57] Dan: Vary only a few variables and have them be more precisely controlled.

[01:01:01] Paul: Yeah. So this would be a good time to ask you, then, what you think about the naturalistic turn in neuroscience. The traditional approach is to control everything, to do very controlled, reduced experiments where you try to control everything, like you’re doing in your spiking neural networks, right? And these days, the data set that I’m working with is just a mouse walking around in a box with an electrode in, not tasked with anything, not doing anything cognitively complex or anything.

Under that regime you still control some things. Well, it’s in a box, right, and you control its environment, but you don’t control where it looks, you don’t control how it moves, where it moves, et cetera. And it’s proving difficult.

I want to do an experiment. I’m constantly wanting to do an experiment instead. Which is fascinating, and I’m really interested in it, but it means that things are going really slowly, because we don’t really know how to think about these kinds of less controlled experiments. So what is your take on this ecological turn, if you will?

[01:02:06] Dan: Yeah, I mean, I think that it’s hard, but we have to do it, because that is what the brain is ultimately trying to do, right? It’s not trying to solve two-alternative forced choice in a dark room, right?

So if we really want to understand what it’s doing, we kind of have to do that. We have to deal with more ecological environments. But, yeah, I totally agree, that really leaves us hanging: so what can we do in terms of an experiment? I mean, I think we can certainly learn some things by just letting the animal do its thing and recording what happens. There’s that great paper by Carsen Stringer and others.

I think it was a mouse just noodling around, looking at stuff, and they also recorded its facial muscle movements using cameras. And they did discover something really interesting from that: you could explain as much variance in visual cortex from knowing the facial movements of these mice as you could from knowing what the mouse was looking at. So that’s a discovery, and another one that I think we haven’t really got our heads around what it means. But I guess that doesn’t get you everything you might want to learn. Yeah, the conflict between having control, so that you know what the effect of what you’ve done is, and being in a sort of naturalistic environment, is really challenging.

[01:03:35] Paul: I mean, there’s an argument that when you do control for everything, you’re actually building in the answer by controlling it.

[01:03:41] Dan: Yeah.

[01:03:41] Paul: So I think I’ve read from you that right now there’s a lot of low hanging fruit in terms of spiking neural networks; I’m not sure if that’s the exact phrase that you used.

I know you have your own projects, but what are some things that you think some projects that people should take on that you’re not personally invested in? If you just had to give advice to someone wanting to use spiking neural networks to understand something?

[01:04:09] Dan: Oh, that’s a good question. I think that there’s a lot of mileage in just seeing how different neuron properties can aid different sorts of function.

So, like that heterogeneity paper that I mentioned that we talked about earlier, that was basically like, what is the contribution of time constants to function?

[01:04:30] Paul: And it’s best to always use one line of code if you can.

[01:04:33] Dan: And if you can do it with just one line of code, it’s all the better.
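As a rough illustration of that one-line-of-code point, here is a hedged sketch of a simple leaky integrator layer in PyTorch where the per-neuron time constants can either be shared and fixed or made trainable; the specific values and structure are illustrative assumptions, not the code from the heterogeneity paper being discussed.

```python
import torch
import torch.nn as nn

class LeakyLayer(nn.Module):
    def __init__(self, n_neurons, dt=1e-3, tau=20e-3, heterogeneous=True):
        super().__init__()
        tau0 = torch.full((n_neurons,), tau)
        if heterogeneous:
            # The "one line": per-neuron time constants become trainable.
            self.tau = nn.Parameter(tau0)
        else:
            self.register_buffer("tau", tau0)  # shared, fixed time constant
        self.dt = dt

    def forward(self, inputs):
        # inputs: (time, batch, n_neurons); simple leaky integration over time.
        v = torch.zeros_like(inputs[0])
        alpha = torch.exp(-self.dt / self.tau)
        outputs = []
        for x in inputs:
            v = alpha * v + (1 - alpha) * x
            outputs.append(v)
        return torch.stack(outputs)

layer = LeakyLayer(64)
out = layer(torch.rand(100, 8, 64))  # 100 time steps, batch of 8
```

In practice you would also want to constrain the trainable time constants to stay positive during training; that detail is omitted here to keep the sketch short.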

But I think that you could just generate an enormous number of those. You could be like, okay, well, let’s take that one step further. Now, what about dendrites? Are dendrites useful? We could run the same thing with dendrites. What are they doing? And I know that there are people out there doing that already, but there’s all sorts of: okay, pick an ion channel, right? Look at its dynamics.

What’s that useful for? Or is it just a mechanism that’s there? Is it not really contributing to function, or is it really particularly useful for something? I feel like you can do a lot with training a network with and without some mechanism and then asking, did that help? I think that would be low hanging fruit.

[01:05:24] Paul: But you have to ask the right question in that case. You have to ask the right question, you have to have the right task. I mean, this gets to something I was going to ask you about: the right level of abstraction. Since you mentioned ion channels, for some reason ion channels are the first thing that people scoff at and say, that’s too much detail. And you mentioned the Allen Institute; there are people like Gaute Einevoll, who has been on the podcast, you know, who have these highly detailed simulations to understand how the neural signals that the brain produces arise, and then how better to study them. And then you have people who say, well, ion channels, dendrites, that’s too much detail for what you’re actually asking functionally to understand. And you’re somewhere in between there, I suppose.

Are you at the correct level of abstraction? And how do you know?

[01:06:14] Dan: Well, I’d like to think so, obviously, but yeah, no, I mean, it’s a guess and everyone makes their own guess.

I guess my preference is somehow to, I don’t think that those things are irrelevant. I don’t think dendrites are irrelevant. I don’t think ion channels are irrelevant. But I also don’t think that we can just throw all of those details into a massive box and gain understanding.

We have to approach that in a more simplifying, abstracting way. This is just my own personal preference. Right?

So for example, I really like the approach of dendrify, which is a new piece of software that came out in the last year or so, which basically takes a detailed dendritic model and reduces it sort of automatically to just like one or two compartments. And now, okay, now I can maybe understand one or two compartments. I can train those one or two compartments. I can see what things that I can do with that. And maybe that tells me something about what I could do if I had 1000 compartments.

But I don’t want to start with the thousand compartments, because I feel like I’ll never get anywhere if I try and do that. And similarly for ion channels, I think that there are all sorts of things that are potentially interesting, right? You have different sorts of inhibition. You have, like, shunting inhibition, which is a different sort of inhibition, or where the inhibition falls on the dendritic tree can be important. Those things might all be really important, but I want to study them in an abstract way, not by coming up with a single neuron model that requires 100,000 parameters of which we only know five and have to guess the others, whatever.

But I do think all of those things can be important. And time constants, I mean, in a way they’re just ion channels, right? Just one particular abstract view of ion channels.

[01:08:11] Paul: What’s something that’s holding you back right now? What are you stuck on that you feel you need some breakthrough to make progress on?

[01:08:21] Dan: I think there’s two things, I guess. So the easier thing, which I think is just going to be solved at some point.

Okay, let’s say five to ten years. That’s my prediction always. Exactly.

No, it’s a fair comment. It’s a fair comment even before I’ve said what it is.

And that’s basically how to train spiking neural networks as efficiently as we can train artificial neural networks. Because right now, well, I’ve talked about surrogate gradient descent and how much I love it, but it’s incredibly resource hungry to train with.

[01:08:55] Paul: Oh, okay. I didn’t realize that.

[01:08:58] Dan: Yeah, no, it’s interesting, because the goal there is something very resource efficient, a spiking neural network, but the training is actually much, much less efficient, and that’s why we haven’t done it at scale.

The largest spiking neural network I’ve trained with surrogate gradient descent is maybe a thousand neurons. I think the largest that anyone’s trained is in the small tens of thousands. And it’s just because the memory consumption of surrogate gradient descent grows very rapidly with the number of neurons. So that’s a sort of technical problem, and I suspect it’ll just get solved. I mean, there are lots of people working on solving that, lots of different approaches being tried, and none of them have quite worked perfectly yet, although there’s been progress. So I suspect that’s just going to get solved at some point, because it feels purely technical in some sense. Actually, I think it might involve quite fundamental insight about learning to do that, so it’s not purely technical, but it just feels solvable. It feels like it’s a well enough posed problem that we can solve it.
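For readers who haven’t met surrogate gradient descent: the trick is to keep the hard spiking threshold in the forward pass but substitute a smooth function for its derivative in the backward pass. Here is a minimal sketch in PyTorch, using one common fast-sigmoid-style surrogate; the particular shape and the constant beta are illustrative choices, not necessarily the ones Dan’s group uses.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth 'surrogate' derivative
    (fast-sigmoid shape) in the backward pass so gradients can flow."""
    @staticmethod
    def forward(ctx, v, beta=10.0):
        ctx.save_for_backward(v)
        ctx.beta = beta
        return (v > 0).float()          # spike if membrane potential crosses threshold

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (ctx.beta * v.abs() + 1.0) ** 2
        return grad_output * surrogate, None

v = torch.randn(5, requires_grad=True)
spikes = SurrogateSpike.apply(v)
spikes.sum().backward()                 # gradients exist despite the hard threshold
print(v.grad)
```

The memory cost mentioned above comes from backpropagating through time: the membrane state at every time step and for every neuron has to be stored, so memory grows with both simulation length and network size.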

[01:10:00] Paul: You don’t think it’s just going to be stumbled upon by enough people poking around in different directions?

[01:10:06] Dan: Yeah, it’s possible.

I guess the reason is, what I just said is it’s well posed, and when a problem is well posed, I kind of believe in a solution.

[01:10:21] Paul: Interesting.

[01:10:21] Dan: The less well posed problem that keeps holding me back, and I think we touched on it a few times, is the lack of a good mathematical framework to talk about a lot of the stuff that I’m interested in: to talk about systems that are both discrete and continuous, so spiking, for example, or to approach sparsity. In machine learning, sparsity is a really interesting but slightly niche topic. There definitely are people who are interested in it, but the tools are really much less well developed for sparsity. If you try and use sparse connectivity in one of the big machine learning toolboxes, you’ll find that it’s terribly inefficient, and it’s because that theory development isn’t there yet, at a technical level but also, I think, at a conceptual level. We haven’t got the right mathematical concepts to approach a lot of this stuff.

[01:11:21] Paul: So sparsity is something where its importance is recognized without having a good fundamental concept of why.

[01:11:29] Dan: I think so, yeah.

[01:11:31] Paul: And you’re also interested in modularity and understanding modularity better. Is that something also where we don’t have enough mathematical tools to approach? Does that fall under that umbrella?

[01:11:45] Dan: Yeah.

In that paper, we spent a lot of time, in fact, probably the most time in that project, just coming up with a measure of whether a piece of the network was specialized for a particular function or not.

And that was surprisingly difficult. So near the beginning of that project, I think I asked on Twitter, how do you define whether something is specialized for something? And that created this huge discussion, and there were no really clear answers that came out of it. So we came up in the end with three measures, and we were kind of satisfied in that these three measures, which were kind of different from each other, qualitatively did the same thing in our model. So it felt like maybe they were measuring something meaningful or real. But again, that’s a very vague thing to say, right? Like, we’ve got these three measures, and they’re qualitatively quite similar. You’d like to say something a bit more concrete than that.

[01:12:39] Paul: Ideally. This is where I’m continuing to try to wrap my head around how to even talk about it. But this is where I want to have principles where you can point to something like complexity, right, and say, well, in a complex system this happens, and be satisfied with that answer as its own unifying principle. And that’s a terrible example, because complexity is such a poorly defined term that encompasses so much, but principles of that nature are what I want to be able to confidently state and mean and feel comfortable doing so, and I’m not there yet. Do you think that’s a possibility, though?

[01:13:25] Dan: Yeah, I think so. I think we will find the right concepts. I’m so optimistic about that.

But I do think that we’re still quite far. And, in a way, not everyone agrees with me on this, but I kind of feel like neuroscience is almost pre-paradigmatic.

[01:13:49] Paul: A lot of people do, a lot of people do agree with you.

[01:13:51] Dan: Yeah. Well, it’s a controversial one. Right.

But I feel like there’s so many basic things that we just haven’t ever answered.

In a way, we’ve had 100 years of neuroscience, and we don’t know whether spike times matter or not.

[01:14:06] Paul: It’s sad.

[01:14:08] Dan: Well, it’s sad, but it’s also just.

It’s because we don’t have the right frameworks to answer questions like that, I guess.

But I think that the development of those is just hard, right? Like in physics: the development of the concept of mass, for example, as opposed to weight or whatever, that really unlocked a lot of stuff. But it wasn’t easy to come by. It took a lot of development before we got that concept, and once we got it, it unlocked so much.

I think that we may well have a similar thing here, that we’re still in the sort of stumbling around the dark, just trying out a load of stuff.

But I do think that we will make progress.

[01:15:01] Paul: What do you think is the.

If you had to point to the single most valuable thing that artificial intelligence, let’s say deep learning, has taught neuroscience, what would you say, if anything?

[01:15:17] Dan: I mean, for me, the big thing is it’s the training algorithms.

There are these algorithms that, surprisingly, can do it.

There’s a lot of surprising stuff there, right? Like stochastic gradient descent, for example: compared to just hill climbing, sort of standard gradient descent, it lets you learn really hard tasks, and you throw momentum into that and you can learn even harder tasks. And we can just take those tools and start using them. For me, that’s brilliant.
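The momentum trick mentioned here is usually written as a running velocity added to plain stochastic gradient descent (standard textbook form, with learning rate η and momentum coefficient μ, not anything specific to this conversation):

\[
v_{t+1} = \mu\, v_t - \eta\, \nabla_{\theta} \mathcal{L}(\theta_t), \qquad \theta_{t+1} = \theta_t + v_{t+1}
\]

Setting μ = 0 recovers plain SGD; values like μ ≈ 0.9 let past gradients carry the parameters through flat or noisy regions of the loss.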

So that’s the technical thing. I think, intellectually, there’s a really interesting thing as well, which is that what we think is hard might not be what’s actually hard in some sense. Right? Like, it feels incredibly difficult for us to imagine being able to draw a picture of a tiger in a top hat in the style of Van Gogh or whatever. Right. But it turns out that’s actually easier than catching a ball.

[01:16:30] Paul: This is Moravec’s paradox. Yeah.

[01:16:32] Dan: Who’s that? Sorry, I didn’t.

[01:16:34] Paul: Moravec’s paradox, where, yeah, a lot of things that we take for granted as super easy for us are hard for machines. And vice versa.

[01:16:40] Dan: And vice versa. Right.

So I think that’s also really interesting.

What I’m interested in when I look at these machine learning models is that because we know that they’re not as capable in some sense of general intelligence as us, what they’re doing is in some sense simpler. And it tells us that this thing that we were looking at, that we thought was the source of all understanding of our own intelligence, it wasn’t actually quite that. Like, it helps us, I think, focus our attention on what we don’t know and what we still need to understand better in some way. Which is not to say, sorry, I’ve heard this one a lot. So I’m not saying that anything that machine learning can do is automatically not interesting and not what real intelligence is about. I think at some point we’ll have enough of these pieces that it will just be intelligent. It turns out that is what intelligence is.

It’s more that you can do surprisingly much with tools that are not necessarily perfectly aligned to that goal, I think.

And so for me, I think of large language models like that.

I don’t think, and this is my bias, I don’t think that they’re doing language or reasoning the way we are doing reasoning. But you’ve got to admit that they do an amazing job, and that this thing that isn’t really what we’re doing turns out to be good enough to do 95% of what we do.

But it’s also interesting that they’re so surprisingly bad at some stuff. Like, on a whim, I tried testing whether GPT could match parentheses, and it can’t match parentheses.

[01:18:25] Paul: What do you mean, match parentheses?

[01:18:26] Dan: Oh, right. Like, are there the same number of open parentheses as closed parentheses in a bit of code, right?

[01:18:31] Paul: Yeah.

[01:18:32] Dan: So why is it that it can’t do this incredibly basic task, but if I ask it to write me a program to do x, it can just do that, even if it’s something that it hasn’t seen before?
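For reference, the check being described is the classic balanced-parentheses exercise; here is a minimal sketch in Python of the counter-based version:

```python
def parentheses_balanced(code: str) -> bool:
    """Return True if every '(' is matched by a later ')'."""
    depth = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ')' appeared before its matching '('
                return False
    return depth == 0

print(parentheses_balanced("f(g(x), h(y))"))   # True
print(parentheses_balanced("f(g(x), h(y)"))    # False
```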

[01:18:42] Paul: I would argue humans are not very good at counting open and closed parentheses, at least in my coding, unless it auto fills for me.

[01:18:51] Dan: Fair enough. But if that’s what we were trying to do, I think we could probably do it right.

We could force ourselves to. But it’s really surprising how bad it is, actually.

You can give it examples where it’s obvious that the parentheses are matched and it just guesses at random, basically, and often just tells you completely unrelated stuff.

I did that and I looked into it, and it turns out that there’s all sorts of surprisingly simple things like that, that large language models are bad at. They can’t count. So if you write like, aaa, how many a’s were there? It gets that wrong. Things like that.

[01:19:26] Paul: Yeah.

[01:19:28] Dan: And of course, each time anyone finds one, the people making them change the model so it can do that, so these things never last very long. But it’s amazing that this thing that can’t do something we think of as so simple can write us a program to do really complicated functions, or draw us a picture that’s better than we could draw ourselves, or summarize a paper or a field surprisingly accurately, right? It tells us that what we think is difficult is maybe not quite right. And I think that’s valuable.

[01:20:01] Paul: To me, that’s fairly humbling in terms of thinking of my own intelligence, right? Because we value those sorts of skills over, like you were just saying, catching a ball, for instance. But then I see a machine do something that I would consider an intellectual feat on my own, or something that I can’t do but could just ask a large language model to do. Do you feel the same? I feel that.

I’m not sure if humbling is the right word, but my reaction is a sense of, oh, maybe that’s not very intellectually difficult. Maybe it’s not very difficult.

[01:20:41] Dan: Yeah. And I don’t really know if difficult really makes sense, but I have the same feeling as you. Right.

Maybe I thought being able to write nice prose was important, but it turns out that it can do a much better job at that.

[01:20:56] Paul: Here’s my question. Does it reduce the value of humans doing things like that? Does it reduce the value of poets, let’s say. Right. Not as human beings, but in terms of how we revere certain talents?

[01:21:15] Dan: In a way, we can already answer that, right, because we’ve seen it in things like chess playing. It’s been a while now since we were as good at chess as computers, and I feel like we’re still interested in it. I mean, I don’t play chess myself, but I used to, and for people who are interested, I don’t know, has interest in chess waned since computers got better at it? I’m not sure that it has.

[01:21:40] Paul: I’m not sure either. Yeah. I don’t know.

[01:21:42] Dan: People are still playing it.

[01:21:44] Paul: Yeah. And I think it’s worthwhile. I’ve taught my son how to play chess, and we play occasionally. It’s been a while, but I’m also less interested in it as an intellectual pursuit, I think, because of the success.

[01:21:59] Dan: Anyway. I mean, I think that’s probably going to be an existential question for our species at some point. But we’re not quite there yet, fortunately.

[01:22:08] Paul: No. Five to ten years.

[01:22:09] Dan: Five to ten years, exactly.

[01:22:11] Paul: Let me ask the converse of the question I asked you before, and then I want to talk about your recent sort of metascience stuff, and I won’t keep you all day. So I asked you what you think that AI has done for neuroscience. What do you think that, let’s say, your own work and or just neuroscience in general has done or will do for AI?

[01:22:32] Dan: Yeah, I think historically, I wouldn’t say it as simply as what has neuroscience done for AI? I think if you look at the early history, it was almost that they weren’t separate questions.

[01:22:45] Paul: Right.

[01:22:46] Dan: The people who were doing one were doing the other.

They didn’t think of it as separate questions. And more recently, they have diverged. But I think that’s almost a shame. I feel like it would be nice to have some of that early energy of, like, this is somehow the same question that we’re approaching, and we’ve got different ways of approaching it.

[01:23:07] Paul: Part of that is just the phrase artificial quote intelligence, which I’ve come to despise because it reifies intelligence, which maybe is not a thing. Anyway, sorry to interrupt.

[01:23:17] Dan: Yeah, no, I agree. I try and always talk about machine learning rather than AI for that reason. I think it’s one of those hot buzzwords.

But as to what that interaction could meaningfully be about: I’ve encountered, from machine learning people, a bit of skepticism about the idea that they have anything to learn from neuroscience, and I think I get it.

I think I understand how it would feel for you to be working on this really important and exciting topic and be making so much progress, and then some neuroscientist comes along and says, you should be learning from what I do.

Yeah, that would be pretty annoying. And I’m not sure it’s really as simple as that, but I think what could be useful is for us to think of it as two aspects of the same question. Right.

Okay, this is my mathematician talking, and it’s probably too abstract, but what is the space of intelligent mechanisms out there? And maybe machine learning is going to explore some part of that space, and neuroscience is exploring another part of the space, but they’re somehow exploring the same problem from two different points of view. And I think that it would be good if both sides knew a bit more about the other, even if it’s not a direct matter of taking ideas from one side to the other, or vice versa, but more like: if we have a broader conception of the problem that we’re trying to solve, we might come up with answers that we wouldn’t have done with a more narrow conception. I think that’s how I would put it.

[01:24:53] Paul: And do you feel optimistic for yourself moving forward, that those fruits are there for you to discover?

[01:25:01] Dan: Yeah, I think so. I mean, I feel like there’s a lot of exciting, interesting stuff in that sort of space for me. A lot of what I’m thinking about is tasks that are different, I guess, from the ones we’ve looked at before. Neuroscience needs to look at richer, messier data, right? But on the other hand, machine learning people maybe need to think more about behavior or about generalization. Obviously it’s not that those are unstudied topics in machine learning, but maybe it’s in the study of those sorts of things that knowing a little bit about how the brain does things might inspire something, right? Because it’s already doing those things.

So, yeah, I think there’s some interesting stuff in that space between the two. That’s where my interests are anyway, I guess.

[01:26:10] Paul: Okay. All right, so we’re going to take an orthogonal turn here, because I know that as you age in academia, like any good neuroscientist, you’re becoming more and more curmudgeonly. That’s one way to put it. You’re seeing the cracks, what’s working, what’s not working. And there’s a cottage industry of complaining about the publishing industry.

Yeah, I just used industry twice. And you’re part of that cottage industry. Do you not review papers anymore?

[01:26:42] Dan: Yeah, no, I quit all my editorial positions and stopped reviewing because I think it does damage to science. And I can talk about my sort of personal reason for doing that, which was, I always found reviewing and later editing an uncomfortable experience.

[01:27:05] Paul: All right, well, Dan, I really appreciate your time. I won’t keep you all day here, but continue the good work with the spiking neural networks, and I’m looking forward to seeing what you’re going to be working on in the near future with them. And we didn’t talk about all of the good deeds that you have done outside of science as well, with things like Neuromatch and snafu.

[01:27:27] Dan: Yeah.

[01:27:27] Paul: SNUFA. SNUFA. Yeah.

[01:27:28] Dan: SNUFA is as close as possible to snafu without actually being snafu.

[01:27:32] Paul: Was that intentional?

[01:27:34] Dan: A little bit.

[01:27:35] Paul: It just worked out that way. Anyway, I’ll mention those in the introduction, but thanks for being with me. I appreciate the time.

[01:27:41] Dan: Okay, thank you very much. It’s really a pleasure to be here.

[01:27:59] Paul: I alone produce Brain Inspired. If you value this podcast, consider supporting it through Patreon to access full versions of all the episodes and to join our Discord community. Or if you want to learn more about the intersection of neuroscience and AI, consider signing up for my online course, Neuro-AI: The Quest to Explain Intelligence. Go to Braininspired Co. to learn more. To get in touch with me, email Paul at Braininspired Co. You’re hearing music by The New Year. Find them at thenewyear.net. Thank you. Thank you for your support. See you next time.