Transcriptions

VIDEO TRANSCRIPTION

No Priors: AI, Machine Learning, Tech, & Startups

No Priors Ep. 39 | With OpenAI Co-Founder & Chief Scientist Ilya Sutskever

Transcription generated by Skrybot

OpenAI, a company that we all know now, but only a year ago was 100 people, is changing the world. Their research is leading the charge to AGI. Since ChatGPT captured consumer attention last November, they show no signs of slowing down. This week, Elad and I sit down with Ilya Sutskever, co-founder and chief scientist at OpenAI, to discuss the state of AI research, where we'll hit limits, the future of AGI, and what it's going to take to reach superalignment. Ilya, welcome to No Priors. Thank you. It's good to be here. Let's start at the beginning. Pre-AlexNet, nothing in deep learning was really working. And then given that environment, you guys took a very unique bet.

What motivated you to go in this direction? In those dark ages, AI was not an area where people had hope, and people were not accustomed to any kind of success at all. And because there hadn't been any success, there was a lot of debate, and there were different schools of thought that had different arguments about how machine learning and AI should be. You had people who were into knowledge representation from good old-fashioned AI. You had people who were Bayesians and they liked Bayesian non-parametric methods. You had people who liked graphical models and you had the people who liked neural networks.

Those people were marginalized because neural networks had the property that you can't prove math theorems about them. If you can't prove theorems about something, it means that your research isn't good. That's how it has been. But the reason why I gravitated to neural networks from the beginning is because it felt like those are small little brains, and who cares if you can't prove any theorems about them, because we are training small little brains and maybe they'll do something one day. And the reason that we were able to do AlexNet when we did is because of a combination of two factors, three factors.

The first factor is that this was shortly after GPUs started to be used in machine learning. People kind of had an intuition that that's a good thing to do, but it wasn't like today where people exactly knew what they need GPUs for. It was like, oh, let's play with those cool fast computers and see what you can do with them. It was an especially good fit for neural networks. So that definitely helped us. I was very fortunate in that I was able to realize that the reason neural networks of the time weren't good is because they were too small.

So like if you try to solve a vision task with a neural network which has like a thousand neurons, what can it do? It can't do anything. It doesn't matter how good your learning is and everything else. But if you have a much larger neural network, you'll do something unprecedented. What gave you the intuition to think that that was the case? Because I think at the time it was reasonably contrary to think that despite to your point, you know, a lot of the human brain in some sense works that way or different, you know, biological neural circuits.

But I'm just curious, like, what gave you that intuition early on to think that this was a good direction? I think, yeah, looking at the brain. All those things follow very easily if you allow yourself to accept the idea. Right now, this idea is reasonably well accepted. Back then, people still talked about it, but they hadn't really accepted or internalized the idea that maybe an artificial neuron in some sense is not that different from a biological neuron. So now whatever you imagine animals do with their brains, you could perhaps assemble some artificial neural network of similar size. Maybe if you train it, it will do something similar.

So that leads you to start to imagine, OK, almost imagine the computation being done by the neural network. You can almost think, if you have a high resolution image and you have like one neuron for a large group of pixels, what can the neuron do? There's just not much it can do. But if you have a lot of neurons, then they can actually do something and compute something. So it was considerations like this, plus a technical realization.

The technical realization is that if you have a large training set that specifies the behavior of the neural network, and the training set is large enough that it can constrain the large neural network sufficiently, and furthermore, if you have the algorithm to find that neural network, then you can succeed, because what we do is that we turn the training set into a neural network which satisfies the training set. Neural network training can almost be seen as solving a neural equation. Solving a neural equation where every data point is an equation and every parameter is a variable. And so it was multiple things.

The realization that the bigger neural network could do something unprecedented, the realization that if you have a large data set together with the compute to solve the neural equation, that's where gradient descent comes in. But it's not gradient descent. Gradient descent was around for a long time. It was certain technical insights about how to make it work, because back then the prevailing belief was, well, you can't train those neural nets, it's all hopeless. So it wasn't just about the size. Even if someone did think, gosh, it would be cool to train a big neural net, they didn't have the technical ability to turn this idea into reality.

You needed not only to code the neural net, you needed to do a bunch of things right, and only then would it work. And then another fortunate thing is that the person whom I worked with, Alex Krizhevsky, he just discovered that he really loves GPUs, and he was perhaps one of the first people who really mastered writing really performant code for the GPUs, and that's why we were able to squeeze a lot of performance out of two GPUs and produce something unprecedented. So to sum up, it was multiple things.

The idea that a big neural network, in this case a vision neural network, a convolutional neural network with many layers, one that's much, much bigger than anything that's ever been done before could do something very unprecedented because the brain can see and the brain is a large neural network and we can see quickly so our neurons don't have a lot of time. Then the compute needed, the technical know-how that in fact we can train such neural networks and it was not at all widely distributed. Most people in machine learning would not have been able to train such a neural network even if they wanted to.
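As an illustrative aside, the picture described above ("every data point is an equation and every parameter is a variable") can be sketched with a toy model. This is a minimal sketch under stated assumptions, not the method used for AlexNet: the model is a hypothetical two-parameter line `y = w * x + b`, the data is synthetic, and the function names `make_equations` and `solve_by_gradient_descent` are invented for illustration. Each data point contributes one equation in the unknowns `w` and `b`, and gradient descent searches for parameter values that satisfy all of them at once.

```python
import random


def make_equations(n_points, true_w, true_b, seed=0):
    """Build a synthetic training set: each (x, y) pair is one 'equation'
    w * x + b = y in the unknown parameters w and b."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    return [(x, true_w * x + true_b) for x in xs]


def solve_by_gradient_descent(equations, lr=0.1, steps=2000):
    """Approximately solve all equations at once by minimizing the
    mean squared residual with plain gradient descent."""
    w, b = 0.0, 0.0  # the two 'variables' (parameters), starting at zero
    n = len(equations)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in equations:
            residual = (w * x + b) - y  # how badly this equation is violated
            grad_w += 2.0 * residual * x / n
            grad_b += 2.0 * residual / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# 50 data points (equations) constrain 2 parameters (variables).
equations = make_equations(50, true_w=3.0, true_b=-1.0)
w, b = solve_by_gradient_descent(equations)
print(w, b)
```

Here 50 equations pin down 2 variables, so the system is heavily over-constrained and gradient descent recovers values close to the ones used to generate the data. A real neural network has vastly more parameters and a non-convex loss, which is where the technical know-how mentioned in the conversation comes in.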

Did you guys have any particular goal from a size perspective or was it just as biologically inspired or where that number comes from or just as large as we can go? Definitely as large as we can go. Because keep in mind, I mean, we had a certain amount of compute which we could usefully consume and then what can it do? Maybe if we think about just like the origin of OpenAI and the goals of the organization, what was the original goal and how has that evolved over time? The goal did not evolve over time. The tactic evolved over time.

So the goal of OpenAI from the very beginning has been to make sure that artificial general intelligence, by which we mean autonomous systems, AI that can actually do most of the jobs and activities and tasks that people do, benefits all of humanity. That was the goal from the beginning. The initial thinking has been that maybe the best way to do it is by just open sourcing a lot of technology. We also attempted to do it as a nonprofit. It seemed very sensible. This is the goal. A nonprofit is the way to do it.

What changed? At some point at OpenAI, we realized, and we were perhaps among the earliest to realize, that to make progress in AI for real, you need a lot of compute. Now what does a lot mean? The appetite for compute is truly endless, as is now clearly seen, but we realized that we will need a lot. And a nonprofit wouldn't be the way to get there. We wouldn't be able to build a large cluster with a nonprofit. That's why we converted into this unusual structure called capped profit. And to my knowledge, we are the only capped-profit company in the world.

The idea is that investors put in some money, but even if the company does incredibly well, they don't get more than some multiplier on top of their original investment. And the reason to do this, the reason why that makes sense, you know, there are arguments one could make against it as well. But the argument for it is that if you believe that the technology that we are building, the AI, could potentially be so capable as to do every single task that people do, does it mean that it might unemploy everyone? Well, I don't know, but it's not impossible. And if that's the case, it makes sense.

It will make a lot of sense if the company that builds such a technology would not be able to make infinite, would not be incentivized rather to make infinite profits. I don't know if it will literally play out this way because of competition in AI. So there will be multiple companies and I think that will have some unforeseen implications on the argument which I'm making. But that was the thinking. I remember visiting the offices back when you were, I think, housed at YC or something or, you know, cohabited some space there. And at the time there was a suite of different efforts. There was robotic arms that were being manipulated.

And then there was, you know, some video game related work, which was really cutting edge. How did you think about how the research agenda evolved and what really drove it down this path of transformer based models and other forms of learning? So our thinking has been evolving over the years from when we started OpenAI. In the first year, we indeed did some of the more conventional machine learning work. By conventional machine learning work, I mean, because the world has changed so much, a lot of things which were known to everyone in 2016 or 2017 are completely and utterly forgotten. It's like the Stone Age almost.

So in that Stone Age, the world of machine learning looked very different. It was dramatically more academic. The goals, values and objectives were much more academic. They were about discovering small bits of knowledge and sharing them with the other researchers and getting scientific recognition as a result. And it's a very valid goal and it's very understandable. I've been doing AI for 20 years now. More than half of my time that I spent in AI was in that framework. And so what do you do? You write papers, you share your small discoveries. Two realizations. The first realization is just at a high level. It doesn't seem like it's the way to go for a dramatic impact.

And why is that? Because if you imagine what an AGI should look like, it has to be some kind of a big engineering project that's using a lot of compute. Even if you don't know how to build it, what that should look like, you know that this is the ideal you want to strive towards. So you want to somehow move towards larger projects as opposed to small projects. So we attempted a first large project where we trained a neural network to play a real-time strategy game as well as the best humans. It's the Dota 2 project and it was driven by two people, Jakub Pachocki and Greg Brockman.

They really drove this project and made it a success. And this was our first attempt at a large project. But it wasn't quite the right formula for us because the neural networks were a little bit too small. It was just a narrow domain, just a game. I mean, it's cool to play a game. And we kept looking and at some point we realized that hey, if you train a large neural network, a very, very large transformer to predict text better and better, something very surprising will happen. This realization also arrived a little bit gradually. We were exploring generative models. We were exploring ideas around next word prediction. Those are ideas also related to compression.

We were exploring them. The Transformer came out. We got really excited. We were like, this is the greatest thing. We're going to do transformers now. It's clearly superior to anything else before it. We started doing transformers with GPT-1. GPT-1 started to show very interesting signs of life. And that led us to doing GPT-2. And then ultimately GPT-3. GPT-3 really opened everyone else's eyes as well to, hey, this thing has a lot of traction. There is one specific formula right now that everyone is doing. And this formula is: train a larger and larger transformer on more and more data.

For me, the big wake-up moment, to your point, was the GPT-2 to GPT-3 transition, where you saw such a big step function in capabilities. And then obviously with GPT-4, OpenAI published some really interesting research around some of the different domains of knowledge or domains of expertise or chain of thought or other things that the models can suddenly do in an emergent form. What was the most surprising thing for you in terms of emergent behavior in these models over time? It's very hard to answer that question. It's very hard to answer because I'm too close and I've seen it progress every step of the way. So as much as I'd like, I find it very hard to answer that question.

I think if I had to pick one, I think maybe the most surprising thing for me is the whole thing works at all. It's hard. I'm not sure I know how to convey this, what I have in mind here, because if you see a lot of neural networks do amazing things, well, obviously neural networks is the thing that works. But I have witnessed personally what it's like to be in a world for many years where the neural networks don't work at all.

And then to contrast that to where we are today, just the fact that they work and they do these amazing things, I think maybe the most surprising, if I had to pick one, it would be the fact that when I speak to it, I feel understood. Yeah, there's a really good saying from, I'm trying to remember, maybe it's Arthur C. Clarke or one of the sci-fi authors, which effectively says advanced technology is sometimes indistinguishable from magic. Yeah, I'm fully in this camp. Yeah, yeah, it definitely feels like there's some magical moments with some of these models now.

Is there a way that you guys decide internally, given all of the different capabilities you could pursue, how to continually choose the set of big projects? You've sort of described that centralization and committing to certain research directions at scale is really important to OpenAI's success. Given the breadth of opportunity now, what's the process for deciding what's worth working on? I mean, I think there is some combination of bottom up and top down where we have some top down ideas that we believe should work, but we're not 100% sure. So we still need to have good top down ideas and there is a lot of bottom up exploration guided by those top down ideas as well.

And their combination is what informs us as to what to do next. And if you think about those bottom, I mean, either direction, top down or bottom up ideas, like clearly we have this dominant continue to scale transformers direction. Do you explore additional architectural directions or is that just not relevant? It's certainly possible that various improvements can be found. I think improvements can be found in all kinds of places, both small improvements and large improvements. I think the way to think about it is that while the current thing that's being done keeps getting better as you keep on increasing the amount of compute and data that you put into it. So we have that property.

The bigger you make it, the better it gets. It is also the property that different things get better by different amounts as you keep on improving, as you keep on scaling them up. So not only you want to, of course, scale up what you're doing, we also want to keep scaling up the best thing possible.

What is a, I mean, you probably don't need to predict because you can see internally, what do you think is improving most from a capability perspective in the current generation of scale? The best way for me to answer this question would be to point to the models that are publicly available, and you can see how they compare from this year to last year. And the difference is quite significant. Let's say you can look at the difference between GPT-3 and GPT-3.5 and then ChatGPT, GPT-4 with vision, and you can just see for yourself.

It's easy to forget where things used to be, but certainly the big way in which things are changing is that these models become more and more reliable. Before they were very, they were only very partly there. Right now they are mostly there, but there are still gaps. And in the future, perhaps these models will be there even more. You could trust their answers. They'll be more reliable. They'll be able to do more tasks in general across the board. And then another thing that they will do is that they'll have deeper insight. As we train them, they gain more and more insight into the true nature of the human world and their insight will continue to deepen.

I was just going to ask about how that relates to sort of model scale over time, because a lot of people are really struck by the capabilities of the very large scale models and the emergent behavior in terms of understanding of the world. And then in parallel, as people incorporate some of these things into products, which is a very different type of path, they often start worrying about inference costs going up with the scale of the model and therefore they're looking for smaller models that are fine-tuned. But then of course you may lose some of the capabilities around some of the insights and ability to reason.

And so I was curious about your thinking in terms of how all this evolves over the coming years. I would actually point out that the main thing that's lost when you switch to the smaller models is reliability. I would argue that at this point it is reliability that's the biggest bottleneck to these models being truly useful. How are you defining reliability? So it's like when you ask a question that's not much harder than other questions that the model succeeds at, then you have a very high degree of confidence that it will continue to succeed. So I'll give you an example.

Let's suppose that I want to learn about some historical thing and I can ask, well, tell me what is the prevailing opinion about this and about that? And I can keep asking questions. And let's suppose it answered 20 of my questions correctly. I really don't want the 21st question to have a gross mistake. That's what I mean by reliability. Or like let's suppose I upload some documents, some financial documents. Suppose they say something, I want it to do some analysis and to reach some conclusion, and I want to take action on the basis of this conclusion. And it's like, it's not a super hard task.

And these models clearly succeed on this task most of the time, but because they don't succeed all the time, if it's a consequential decision, I actually can't trust the model any of those times and I have to verify the answer somehow. So that's how I define reliability. It's very similar to the self-driving situation, right? If you have a self-driving car and it does things mostly well, that's not good enough. It's not as extreme as with a self-driving car, but that's what I mean by reliability.
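A small worked calculation can make the "21st question" point above concrete. This is a rough sketch under assumptions that are not in the conversation: it treats every answer as independently correct with the same fixed probability, which real models need not satisfy, and the function names `chance_all_correct` and `accuracy_needed` are hypothetical.

```python
def chance_all_correct(per_answer_accuracy, n_questions):
    """Probability that every one of n answers is correct, assuming
    (hypothetically) independent answers with equal accuracy."""
    return per_answer_accuracy ** n_questions


def accuracy_needed(target, n_questions):
    """Per-answer accuracy required so that all n answers are correct
    with probability `target`, under the same independence assumption."""
    return target ** (1.0 / n_questions)


# A model that is right 95% of the time on each question:
p_session = chance_all_correct(0.95, 21)

# Per-answer accuracy needed for a 99% chance of 21 clean answers:
p_required = accuracy_needed(0.99, 21)

print(round(p_session, 3), round(p_required, 5))
```

Under these assumptions, a model that is right 95% of the time per question gets through 21 questions without a single mistake only about a third of the time, and a 99% chance of an error-free session needs per-answer accuracy above 99.9%. That gap is one way to read the claim that "mostly well" is not good enough for consequential decisions.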

My perception of reliability is that to your point, it goes up with model scale, but also it goes up if you fine tune for specific use cases or instances or data sets. And so there is that trade off in terms of size versus specialized fine tuning versus reliability. So certainly people who care about some specific application have every incentive to get the smallest model working well enough. I think that's true. It's undeniable. I think anyone who cares about a specific application will want the smallest model for it. That's self-evident. I do think though that as models continue to get larger and better, then they will unlock new and unprecedentedly valuable applications.

So yeah, the small models will have their niche for the less interesting applications, which are still very useful. And then the bigger models will be delivering on applications. Okay, let's pick an example. Consider the task of producing good legal advice. It's really valuable if you can really trust the answer. Maybe you need a much bigger model for it, but it justifies the cost. There's been a lot of investment this year at the 7B in particular, but 7B, 13B, 34B sizes. Do you think continued research at those scales is wasted? No, of course not.

I think that in the medium term, medium term by AI time scale anyway, there will be an ecosystem, there will be different uses for different model sizes. There will be plenty of people who are very excited, for whom the best 7B model is good enough. They'll be very happy with it. And then there'll be plenty of very, very exciting and amazing applications for which it won't be enough. I think that's all. I mean, I think the big models will be better than the small models, but not all applications will justify the cost of a large model. What do you think the role of open source is in this ecosystem? Well, open source is complicated.

I'll describe to you my mental picture. I think that in the near term, open source is just helping companies produce useful. . . Let's see. Why would one want to have an open source, to use an open source model instead of a closed source model that's hosted by some other company? I mean, I think it's very valid to want to be the final decider on the exact way in which you want your model to be used and for you to make the decision of exactly how you want the model to be used and which use case you wish to support. And I think there's going to be a lot of demand for open source models.

And I think there will be quite a few companies that will use them. And I'd imagine that will be the case in the near term. I would say in the long run, I think the situation with open source models will become more complicated. And I'm not sure what the right answer is there. Right now, it's a little bit difficult to imagine. So we need to put our future hat, maybe futurist hat. It's not too hard to get into a sci-fi mode when you remember that we are talking to computers and they understand us. But so far, these computers, these models are actually not very competent. They can't do tasks at all.

I do think that there will come a day where the level of capability of models will be very high. Like at the end of the day, intelligence is power. Right now, these models, their main impact, I would say, at least popular impact, is primarily around entertainment and simple questions. So you talk to a model, wow, this is so cool. You produce some images. You had a conversation. Maybe you had some questions. Good answer. But it's very different from completing some large and complicated task. Like, what about if you had a model which could autonomously start and build a large tech company? I think if these models were open source, they would have a difficult to predict consequence.

Like we are quite far from these models right now. And by quite far, I mean by AI time scale, but still, this is not what you're talking about. But the day will come when you have models which can do science autonomously, like build, deliver on big science projects. It becomes more complicated as to whether it is desirable that models of such power should be open sourced. I think the argument there is a lot less clear cut, a lot less straightforward compared to the current level models, which are very useful. And I think it's fantastic that the current level models have been built.

So maybe I answered a slightly bigger question rather than what is the role of open source models. What's the deal with open source? And the deal is, up to a certain capability, it's great, but it's not difficult to imagine models sufficiently powerful, which will be built, where the benefits of their being open source become a lot less obvious. Is there a signal for you that we've reached that level or that we're approaching it? Like what's the boundary? So I think figuring out this boundary very well is an urgent research project. I think one of the things that helps is that the closed source models are more capable than open source models.

So the closed source models could be studied and so on. And so you'd have some experience with a generation of closed source models. And then you know, like, oh, these models' capabilities, it's fine. There's no big deal there. Then in like a couple of years, the open source models catch up, and maybe a day will come when we're going to say, well, these closed source models, they're getting a little too drastic, and then some other approach is needed. If we have our, you know, future hat on, maybe let's think about like a several year timeline.

What are the limits you see, if any, in the near term in scaling? Is it like data, token scarcity, cost of compute, architectural issues? So the most near-term limit to scaling is obviously data. This is well known and some research is required to address it. Without going into the details, I'll just say that the data limit can be overcome and progress will continue. One question I've heard people debate a little bit is the degree to which the transformer-based models can be applied to sort of the full set of areas that you'd need for AGI.

And if you look at the human brain, for example, you do have reasonably specialized systems, all neural networks, but specialized systems for the visual cortex versus, you know, areas of higher thought, areas for empathy, or other sort of aspects of everything from personality to processing. Do you think that the transformer architectures are the main thing that will just keep going and get us there? Or do you think we'll need other architectures over time? So, I understand precisely what you're saying. And I have two answers to this question.

The first is that in my opinion, the best way to think about the question of architecture is not in terms of a binary, is it enough? But how much effort, how much, what will be the cost of using this particular architecture? Like at this point, I don't think anyone doubts that the transformer architecture can do amazing things, but maybe something else, maybe some modification could have some compute efficiency benefits. So it's better to think about it in terms of compute efficiency rather than in terms of can it get there at all? I think at this point, the answer is obviously yes.

To the question about, well, what about the human brain and with its brain regions? I actually think that the situation there is subtle and deceptive for the following reasons. So what I believe you alluded to is the fact that the human brain has known regions. It has like, it has a speech perception region, it has a speech production region, it has an image region, it has a face region, it has all these regions. It looks like it's specialized. But you know what's interesting? Sometimes there are cases where very young children have severe cases of epilepsy at a young age. And the only way they figured out how to treat such children is by removing half of their brain.

Because it happened at such a young age, these children grow up to be pretty functional adults. And they have all the same brain regions, but they are somehow compressed onto one hemisphere. So maybe some information processing efficiency is lost. It's a very traumatic thing to experience, but somehow all these brain regions rearrange themselves. There is another experiment where, which was done maybe 30 or 40 years ago on ferrets. So the ferret is a small animal. It's a pretty mean experiment. They took the optic nerve of the ferret, which comes from its eye, and attached it to its auditory cortex.

So now the input from the eye starts to map to the speech processing area of the brain. And then they recorded different neurons after it had a few days of learning to see, and they found neurons in the auditory cortex which were very similar to those in the visual cortex. Or vice versa. It was either they mapped the eye to the ear, to the auditory cortex, or the ear to the visual cortex. But something like this has happened. These are fairly well known ideas in AI, that the cortex of humans and animals is extremely uniform. And so that further supports the idea that you just need one big uniform architecture. That's all you need. Yeah.

In general, it seems like every biological system is reasonably lazy in terms of taking one system and then reproducing it and then reusing it in different ways. And that's true of everything from DNA encoding. There are 20 amino acids in protein sequences. And so everything is made out of the same 20 amino acids, on through to, to your point, sort of how you think about tissue architecture. So it's remarkable that that carries over into the digital world as well, depending on the architecture you use. I mean, the way I see it is that this is an indication that, from a technological point of view, we are very much on the right track.

Because you have all these interesting analogies between human intelligence and biological intelligence and artificial intelligence. We've got artificial neurons, biological neurons, unified brain architecture for biological intelligence, unified neural network architecture for artificial intelligence. At what point do you think we should start thinking about these systems as digital life? I can answer that question. I think that will happen when those systems become reliable in such a way as to be very autonomous. Right now, those systems are clearly not autonomous. They're inching there, but they're not. And that makes them a lot less useful too, because you can't ask it, hey, do my homework or do my taxes, you see what I mean. So the usefulness is greatly limited.

As usefulness increases, they will indeed become more like artificial life, which also makes it more, I would argue, trepidatious. If you imagine actual artificial life with brains that are smarter than humans, go, gosh, that seems pretty monumental. Why is your definition based on autonomy? If you often look at the definition of biological life, it has to do with reproductive capability plus some form of autonomy. A virus isn't really necessarily considered alive much of the time, but a bacteria is. You could imagine situations where you have symbiotic relationships or other things where something can't really quite function autonomously, but it's still considered a life form.

So I'm a little bit curious about autonomy being the definition versus some of these other aspects. Well, I mean, definitions are chosen for our convenience and it's a matter of debate. In my opinion, technology already has the reproductive function. If you look at, I don't know if you've seen those images of the evolution of cell phones and then smartphones over the past 25 years, you've got this like what almost looks like an evolutionary tree or the evolution of cars over the past century. So technology is already reproducing using the minds of people who copy ideas from previous generation of technology. So I claim that the reproduction is already there. The autonomy piece I claim is not.

And indeed, I also agree that there is no autonomous reproduction. But that would be like, can you imagine if you have like autonomously reproducing AIs? I actually think that that is pretty dramatic and I would say quite a scary thing if you have an autonomously reproducing AI, if it's also very capable. Should we talk about super alignment? Yeah, very much so. Can you just sort of define it? And then we were talking about what the boundary is for when you feel we need to begin to worry about these capabilities being in open source. What is super alignment and why invest in it now? The answer to your question really depends on where you think AI is headed.

If you just try to imagine and look into the future, which is of course a very difficult thing to do, but let's try to do it anyway. Where do we think things will be in five years or in 10 years? Progress has been really stunning over the past few years. Maybe it will be a little bit slower. But still, if you extrapolate this kind of progress, you'll be in a very, very different place in five years, let alone 10 years. It doesn't seem implausible. It doesn't seem at all implausible that people have computers, data centers that are much smarter than people.

And by smarter, I don't mean just have more memory or have more knowledge, but I also mean have deeper insight into the same subjects that we people are studying and looking into. It means learn even faster than people. What could such AI do? I don't know. Certainly, if such an AI were the basis of some artificial life, it would be, well, how do you even think about it if you have some very powerful data center that's also alive in a sense? That's what you're talking about. And when I imagine this world, my reaction is, gosh, this is very unpredictable what's going to happen. Very unpredictable. But the bare minimum, there is a bare minimum which we can articulate.

If such very, very intelligent, super intelligent data centers are being built at all, we want those data centers to hold warm and positive feelings towards people, towards humanity. Because this is going to be non-human life in a sense. Potentially, it could potentially be that. So I would want any instance of such super intelligence to have warm feelings towards humanity. And so this is what we are doing with the Super Alignment Project. We are saying, hey, if you just allow yourself, if you just accept that the progress that you've seen, maybe it will be slower, but it will continue.

If you allow yourself that, then can you start doing productive work today to build the science so that we will be able to handle the problem of controlling such future super intelligence, of imprinting onto them a strong desire to be nice and kind to people? Because those data centers, they'll be really quite powerful. There will probably be many of them. The world will be very complicated. But somehow, to the extent that they are autonomous, to the extent that they are agents, to the extent that they are beings, I want them to be pro-social, pro-human social. That's the goal.

What do you think is the likelihood of that goal? Some of it, it feels like an outcome you can hopefully affect. Are we likely to have pro-social AIs that we are friends with individually or as a species? Well, I mean, friends, I think that part is not necessary. The friendship piece, I think, is optional. But I do think that we want to have very pro-social AI. I think it's possible. I don't think it's guaranteed, but I think it's possible.

I think it's going to be possible and the possibility of that will increase insofar as more and more people allow themselves to look into the future, into the five to ten year future and just ask themselves, what do you expect AI to be able to do then? How capable do you expect it to be then? And I think that with each passing year, if indeed AI continues to improve and as people get to experience it, because right now we're talking, making arguments, but if you actually get to experience, oh gosh, the AI from last year, which was really helpful, this year's puts the previous one to shame and you go, okay.

And then one year later, the AI is starting to do science, the AI software engineer is starting to get really quite good, let's say. I think that will create a lot more desire in people for what you just described, for the future super intelligence to be very pro-social. You know, I think there's going to be a lot of disagreements, going to be a lot of political questions, but I think that as people see AI actually getting better, as people experience it, the desire for the pro-social super intelligence, the humanity loving super intelligence, you know, as much as it can be done, will increase.

And on the scientific problem, you know, I think right now it's still an area that not that many people are working on. Our AIs are getting powerful enough that you can really start studying it productively. We'll have some very exciting research to share soon. But I would say that's the big picture situation here. Really, it boils down to this: look at what you've experienced with AI up until now. Ask yourself, like, is it slowing down? Will it slow down next year? Look, we will see and you'll experience it again and again. And I think what needs to be done will keep becoming clearer.

Do you think we're just on an accelerated path? Because I think fundamentally, if you look at certain technology waves, they tend to inflect and then accelerate versus decelerate. And so it really feels like we're in an acceleration phase right now versus the deceleration phase. Yeah, I mean, we are right now. It is indeed the case that we are in an acceleration phase. You know, it's hard to say, you know, multiple forces will come into play. Some forces are accelerating forces and some forces are decelerating. So for example, the cost and scale are a decelerating force. The fact that our data is finite is a decelerating force, to some degree at least. I don't want to overstate it.

Yeah, it's kind of within an asymptote, right? Like at some point you hit it, but it's the standard S curve, right? Or sigmoidal. Well, with the data in particular, I just think it won't be an issue because we'll figure out something else. But then you might argue, like, the size of the engineering project is a decelerating force, just the complexity of management. On the other hand, the amount of investment is an accelerating force. The amount of interest from people, from engineers, scientists is an accelerating force. And I think there is one other accelerating force, and that is the fact that biological evolution has been able to figure it out.

And the fact that progress in AI has had, up until this point, this weird property that it's kind of been, you know, it's been very hard to execute on. But in some sense, it's also been more straightforward than one would have expected, perhaps. Like in some sense, I don't know much physics, but my understanding is that if you want to make progress in quantum physics or something, you need to be really intelligent and spend many years in grad school studying how these things work. Whereas with AI, you have people come in, get up to speed quickly, start making contributions quickly. So the flavor is somehow different.

Somehow there's a lot of give to this particular area of research. And I think this is also an accelerating force. How it will all play out remains to be seen. Like it may be that somehow the scale required, the engineering complexity will start to make it so that the rate of progress will start to slow down. It will still continue, but maybe not as quickly as it had before. Or maybe the forces which are coming together to push it will be such that it will be as fast for maybe a few more years before it will start to slow down, if at all. That would be my articulation here.

Ilya, this has been a great conversation. Thanks for joining us. Thank you so much for the conversation. I really enjoyed it. Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
