Full Transcript

OpenAI’s Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions

58:47 · 13,171 words · ~66 min read · English · Transcribed Apr 11, 2026
AI Summary

OpenAI's Chief Scientist details the path toward research-intern-level AI systems by September 2026 and fully automated AI researchers by March 2028, driven by advances in reinforcement learning and mathematical reasoning. The focus is shifting from simple pattern matching to models capable of driving scientific discovery and long-horizon task execution.

As OpenAI's Chief Scientist, Jakub Pachocki provides the most direct glimpse into the technical priorities and deployment timelines for the models that will define the next phase of the global economy and scientific research.

Section summaries

0:00-1:53

Intro and Background

optional

Host introduces Jakub and sets the stage; skip if you already know who the Chief Scientist of OpenAI is.

1:53-11:21

Timelines and Benchmarks

watch

Contains crucial info on the September 2026 and March 2028 goals for AI research autonomy.

11:21-24:23

Business and Developer Strategy

watch

Discusses whether to use RL or Context and how the developer role is shifting.

24:23-37:40

Continual Learning and Science

optional

Technical deep dive into math proofs and scientific discovery; valuable for those in STEM.

37:40-47:57

AI Safety and Alignment

watch

Explains why chains of thought are hidden and the technical path to alignment.

47:57-51:53

OpenAI Internal Culture

optional

Historical look at the shifts within the organization from academic to scaling lab.

51:53-58:43

Quickfire and Societal Impact

watch

Important discussion on wealth concentration, robotics timelines, and governance.

Key points

  • The Path to Research Autonomy — Pachocki distinguishes between a 'research intern' (specific technical tasks) and a 'full automated researcher' (long-term autonomous goal-setting). OpenAI remains on track for intern-level capabilities by September 2026, using math and coding as 'North Stars' because they are easily verifiable yet arbitrarily difficult.
  • Chain of Thought Monitoring for Alignment — OpenAI intentionally hides internal 'chains of thought' in products to avoid supervising the reasoning process directly, which preserves the model's private reasoning space as a tool for interpretability. By keeping this space 'unsupervised' during training, researchers can more accurately detect if a model is 'scheming' or hiding objectives.
  • Generalization as the Alignment Frontier — Long-term alignment is essentially a problem of generalization: determining what values a model falls back on when it is much smarter than its training data or faces a completely new distribution. Pachocki argues that alignment is not a nebulous philosophical problem but one solvable through concrete technical insights and scaling.
"I definitely agree that continual learning is really the thing. It's really the thing that we're building." (Jakub Pachocki)
"We are buying a lot of compute because we still believe... more than ever to some degree." (Jakub Pachocki)

AI-generated from the transcript. May contain errors.

0:00

I definitely agree that continual

0:01

learning is really the thing. It's

0:03

really the thing that we're building.

0:04

But I don't really think this is like a

0:05

problem that's ignored and and off the

0:07

path of what we're doing currently. I

0:08

think it is what we're working toward.

0:09

>> What are like the other research areas

0:11

within alignment that you're paying

0:12

attention to or that you think are

0:13

promising?

0:13

>> A lot of the like longer-term challenge

0:15

with alignment is about generalization.

0:18

What are the values that the model falls

0:19

back on?

0:20

>> What are the things that you need to

0:21

figure out to be able to really make

0:23

models work well in some of these other

0:25

spaces?

0:25

>> I come back to this.

0:26

>> Jakub Pachocki is the chief scientist of OpenAI.

0:29

I think literally one of the most

0:30

important people on the planet. And

0:32

today on Unsupervised Learning, I got to

0:34

ask him literally everything that I've

0:36

been thinking about and I know a bunch

0:38

of people in the ecosystem have too. We

0:40

talked a lot about model progress,

0:42

what's required to make long-running

0:43

agents work, as well as the really

0:45

interesting work OpenAI has done in the

0:46

AI for science world and the progress he

0:49

sees in that over the next years. We

0:51

talked a lot about how companies should

0:52

be thinking about model building in this

0:53

moment, when they should be doing

0:54

reinforcement learning, how they should

0:56

be thinking about the evolution of

0:57

harnesses and the impact that will have.

0:59

We hit on a lot of his really

1:00

interesting research, including the work

1:02

he's done around alignment, the work

1:04

that OpenAI broadly has done around math

1:06

competitions. And we also talked about

1:07

this focusing moment in OpenAI and what

1:09

it means for the research organization

1:10

and how he runs his team. Literally just

1:13

such an awesome opportunity to talk to

1:15

someone who is driving so much of the

1:18

change that has revolutionized this

1:20

space in the world. I hope folks enjoy

1:21

this wide-ranging conversation as much

1:23

as I did.

1:26

I feel like you are the perfect person

1:29

to talk to about all the questions

1:30

everyone has in the ecosystem. Uh what's

1:32

you know happening with model progress.

1:34

A lot of companies are thinking about

1:36

how they should be building things based

1:37

on what's happening with the models. A

1:39

lot of people at a societal level are

1:40

thinking about the impact AI is going to

1:42

have on science and broader society. Uh

1:44

and you've been at the forefront of the

1:46

space for pretty much every generation

1:48

of uh of improvement uh these past years

1:50

and so really excited to have you on the

1:52

podcast.

1:52

>> Happy to be here.

1:53

>> I think I'll start with one of

1:54

the juiciest things you said which is

1:56

you know four months ago I think you and

1:58

the OpenAI team talked about aiming for a

2:00

system with research level intern

2:02

capabilities by September of this year.

2:05

So coming up uh I think that's uh what 6

2:07

months from now. and then a more fully

2:09

automated AI researcher by March 2028.

2:12

And so I guess you know checking in four

2:13

months later, how are you feeling about

2:15

those timelines?

2:16

>> Yeah, I think you know over I think over

2:18

over the last months I think like the

2:21

change that really happened is we've

2:22

seen this explosive growth of coding

2:25

tools.

2:26

>> Yeah.

2:26

>> Um

2:27

>> it's an understatement. Yeah, we've

2:29

definitely like really kind of gone um

2:32

to a place uh in OpenAI where we use

2:34

Codex for the um for the majority of um

2:38

you know actual coding. Um and so I

2:42

think I think for most people like the

2:43

kind of the act of programming has has

2:45

has changed quite a bit. Um

2:49

so I definitely see this as a signal

2:50

that like you know something here is on

2:52

track. The other kind of like very

2:54

interesting update over the last few

2:55

months to me has been the progress on

2:57

the math research capabilities. Uh also

3:01

the results we've kind of seen in

3:03

physics in other fields. I think I think

3:05

this kind of level of capability this

3:07

level of like ability to provide insight

3:09

when combined with

3:11

ability to access infrastructure ability

3:13

to use maybe uh more compute at test time

3:16

that's something that Codex is using

3:17

currently uh and very strong improvement

3:21

in general intelligence which I also

3:23

expect over over the next couple of

3:24

months. Yeah, it's something we're still

3:27

very much planning for and very focused

3:29

on.

3:29

>> And how do you like know when you've

3:31

you've gotten there? like what's like a

3:32

a workflow you might look to to say hey

3:34

okay I think we've got these you know

3:36

research intern level capabilities

3:38

>> the the way I would distinguish you know

3:39

a research intern from from full

3:42

automated researcher uh is um the kind

3:45

of span of time that that we would have

3:48

it work um mostly autonomously or the

3:51

kind of like specificity of the task

3:53

that has to be given so I don't expect

3:55

uh you know we'll have systems where you

3:56

kind of just tell them oh like you know

3:58

go improve your model capability go

4:00

solve alignment uh and you know and

4:03

they will do it not this year you know I

4:05

think we might get there at some point

4:07

uh but I think for like more specific

4:09

technical ideas like I I have this

4:11

particular idea how to improve the

4:12

models how to like you know run this

4:13

evaluation differently I think I think

4:16

we have the pieces that we mostly just

4:18

need to put together. >> Karpathy released

4:20

you know a pretty viral version of of uh

4:23

using some of these models to you know

4:24

improve some of his uh you know

4:26

obviously way less complex models than

4:28

what you guys are building here but did

4:30

that feel like generally in this uh you

4:32

know in the spirit of of uh some of what

4:34

these tools might look like.

4:36

>> Yeah, I think it's in the spirit. Yeah,

4:37

I mean I I I expect it to look like a

4:39

pretty continual evolution uh from kind

4:43

of where Codex is now. I think towards a

4:46

bit more autonomy uh running for a

4:48

longer time. Um but yeah, I I think I

4:52

think we'll see a lot of this sort of

4:53

application. I think in general we'll

4:54

see we'll see like more autonomous and

4:57

higher compute use of these models for

4:58

different things. >> You mentioned kind of

5:00

like the math and physics side and

5:01

obviously you've had these really

5:03

impressive breakthroughs uh in math on

5:05

you know uh some interesting like

5:06

different kinds of competition uh you

5:08

know problems maybe you know I think for

5:11

our listeners it like intuitively makes

5:12

sense how progress in coding directly

5:14

translates to something like you know

5:16

helping with AI research how does like

5:18

math and physics progress like also tie

5:20

into this

5:21

>> the the biggest role that like uh you

5:23

know focusing on these math benchmarks

5:25

has played for us as as a general yeah

5:28

like benchmark and and and and a

5:31

north star for like how to improve this

5:32

technology. Like math is very

5:34

measurable, right? It's much easier to

5:36

tell whether you've actually solved the

5:38

math problem than whether you've even

5:39

like produced a good uh you know piece

5:41

of software and also it can get very

5:43

hard right so you can have things where

5:44

like it's very definite whether you've

5:46

solved them but it can be like

5:47

arbitrarily pretty much hard to to

5:48

actually solve them. You know, I would

5:50

say like up until not too long ago like

5:53

um you know, my perspective has been

5:54

like well okay like we you know our

5:56

models are not you know maybe able to

5:58

solve like simple math problems. Okay,

5:59

our models are able to solve simple

6:00

enough problems but are not able to

6:02

solve like IMO level problem. So clearly

6:03

there is just like a gap in just like

6:05

this uh you know intelligence of these

6:07

models that like that that is very

6:09

measurable very

6:11

you know very easy to run at. It's very

6:13

clear what we need to do and you know

6:16

and this has been kind of our north star

6:17

for like reasoning models and so forth.

6:19

Now of course um that is changing quite

6:22

a bit right and we are um you know we

6:24

have kind of reached these milestones

6:26

that we've been working towards of like

6:29

yeah IMO gold level, solving IMO problem

6:31

six and you know and making forays into

6:34

research level mathematics

6:36

um and you know from this point I think

6:38

I think there still is uh you know there

6:41

definitely still is utility like

6:43

continuing to measure progress on this I

6:45

think there's also like you know there's

6:47

definitely like transfer that that you

6:49

can get from like getting better at

6:50

mathematical reasoning to getting better

6:52

at AI research. You know, a lot of our

6:55

uh best researchers uh are uh you know

6:58

mathematicians by training or from

6:59

other kind of theoretical fields. But

7:01

definitely we are uh you know we are

7:03

very much uh changing how we think about

7:09

you know these north stars and we are

7:10

very focused on how the models the next

7:14

models that we're producing are actually

7:16

useful in the real world you know useful

7:19

you know especially for AI research but

7:20

also for other kind of economically

7:22

valuable activities and for other uh

7:24

fields of science uh and especially

7:27

maybe more applied sciences. And the

7:29

reason for this shift is because we

7:30

believe the models are now capable

7:32

enough, not as smart as people in

7:34

all ways, but capable enough to actually

7:37

materially change the economy, change

7:39

how things are done. And so, uh, yeah,

7:41

we feel a lot of urgency about that.

7:43

>> In the early days, uh, picking a domain

7:45

like math that is so, uh, hard to solve,

7:48

but then easy to verify whether you

7:49

did it, like it's kind of the the

7:50

perfect place to get started. And I

7:51

think code obviously shares a lot of

7:53

attributes to that. You know, uh

7:55

possible to check uh and verify and

7:57

great for reinforcement learning. I

7:59

think one question that a lot of people

8:00

are are thinking about is okay, we've

8:02

seen reinforcement learning work

8:04

incredibly well in these domains where

8:05

you can verify it rather easily. A lot

8:07

of, you know, valuable tasks in the

8:09

world, medicine, law, finance, you know,

8:12

there's some level of of the ability to

8:13

do that, but it's certainly not to the

8:14

same extent that math and code are. And

8:17

so I think a lot of people are trying to

8:19

figure out, you know, are we going to

8:20

see similar improvements? You know,

8:22

obviously code and math the the rates of

8:24

improvement have been so astronomical

8:25

and shocking.

8:26

>> Yeah, I definitely expect so. Um I think

8:29

an interesting duality that we think

8:31

about a lot is um you know for these more

8:35

general tasks, for these tasks that are kind of

8:37

harder to evaluate, they share a lot

8:39

of common uh commonalities with um just

8:43

longer horizon tasks, right? Because if

8:45

you think about even like a very well

8:46

specified math or coding problem again

8:48

like if it's it's something that you

8:50

need to work on for like a year then uh

8:52

you know even if it's very clear what the

8:54

criteria of success are in the long term

8:56

like what to do on your first day of

8:58

working on it is a pretty open-ended

9:00

problem. Yeah. And so I I kind of

9:02

believe these difficulties coincide

9:04

and they're very clearly the next the

9:06

next frontier

9:08

uh for for how these systems develop.

9:10

And I think we've definitely seen very

9:12

encouraging signs both on just like our

9:13

ability to scale RL on these more

9:15

general domains. And I I think also like

9:18

we can we can scale um

9:21

efforts that that that that that's a lot

9:23

of promise.

9:24

>> In these other domains, it feels like

9:27

one of the hardest things to know is

9:28

just what was success in a task, right?

9:30

And you can imagine you know there's

9:32

going to be you know whatever the

9:34

problems are that are facing code or

9:35

math that are short-term tasks and then

9:37

longer-term tasks, feels like it will be

9:39

amplified in the space that is you know

9:41

outside of those right where a

9:42

short-term uh legal task or medical task

9:44

may be harder to run thousands of

9:46

iterations on right and figure out you

9:48

know was that done correctly and then

9:50

those longer term tasks like even harder

9:52

I'm curious like how you even

9:53

conceptualize that research challenge

9:55

like what are the things that need to be

9:57

that you need to figure out to be able

9:58

to to really make models work well in

10:01

some of these other spaces.

10:02

>> Yeah, I think I think I I come back to

10:04

this duality of like how do we make the

10:06

models work for a very long time and how

10:09

do we teach them to evaluate kind of

10:11

partial progress.

10:12

>> Yeah. I mean I think if if you look at

10:14

like even outside of RL like like where

10:17

that sort of progress on longer horizons

10:19

is coming from right like I mean as the

10:21

models kind of become more consistent

10:24

from just like pure supervision in

10:25

pre-training

10:27

um they uh they gain some idea of like

10:30

you know oh what what does like a good

10:32

partial artifact here look like and so I

10:33

think I think even if we weren't like

10:36

scaling RL very meaningfully we would

10:38

see an elongation of these horizons over

10:40

time yeah it's definitely um you know a

10:44

research challenge to like to figure out

10:46

how to like leverage these new ideas from

10:48

RL and so forth to to apply this to

10:50

general domains. But I'm quite

10:51

optimistic about that.

10:52

>> Yeah. And it's interesting. It sounds

10:53

like part of your mental model is like

10:55

the models themselves being able to

10:56

check progress at some sort of

10:58

cadence that is, you know, reliable

11:00

enough from the outside at least. It's

11:01

not totally clear if we've seen like

11:03

generalization in RL yet. Feels like we

11:05

yeah clearly you seem to have some

11:06

techniques that really optimize models

11:08

around whatever we choose to focus on

11:10

but it's like almost feels like an older

11:12

school version of of ML of like one one

11:14

thing at a time is that like you know I

11:17

guess would you agree with that

11:18

characterization and like you know how

11:19

do you kind of see this this current

11:21

climate

11:21

>> well we are buying a lot of compute

11:25

right because we we don't I mean we

11:29

still believe the bitter lesson, and we believe

11:31

you know more than ever to some degree

11:33

yeah we've seen you know, new techniques

11:35

and I think new ways to scale, but like

11:38

that that is kind of the the lens

11:39

through which we've been viewing things.

11:41

Yeah, I think there is a certain amount

11:44

of

11:46

complexity that

11:49

we need to grapple with and kind of

11:51

everyone needs to grapple with because,

11:53

you know, we're no longer really like

11:56

purely building like um um you know,

12:00

brain in the sky that's completely isolated

12:02

from the real world, right? Like if you

12:04

actually you know if you want this model

12:06

to do like medical research if you want

12:08

it to cure cancer at some point it needs

12:10

to like learn about the real world in a

12:12

meaningful way you know maybe conduct

12:13

some experiment and learn from its

12:15

results and for that you you need to

12:17

figure out how to actually connect it

12:19

right and that is going to involve

12:20

something that is yeah that that goes in

12:22

the direction you described but I I

12:24

don't think that goes counter to

12:26

actually scaling the the like finding

12:29

and scaling the simple algorithms that

12:31

that we've been developing. >> I feel like

12:32

I talk to a lot of companies and like I

12:34

one of the main questions everyone seems

12:36

to be asking these days is like should

12:38

we be doing you know our own

12:40

reinforcement learning like take an open

12:42

source model and like we have some data

12:44

on a task that people do. um we have

12:46

evals cuz we know our domain pretty well

12:48

like is this something that makes sense

12:50

for us to do or like should we just wait

12:52

for the models to continue to get better

12:53

at some of these things? You know

12:54

what advice I guess would you

12:56

give for like the many builders that

12:57

listen to the podcast as they think

12:59

through you know uh the extent to which

13:00

they invest on the on the reinforcement

13:02

learning side? >> Reinforcement learning

13:04

definitely can be a very data efficient

13:06

way to like really improve the model at

13:08

some sort of task right there is a much

13:10

more data efficient way of learning that

13:12

we know right which is like learning in

13:14

context right and this is maybe the most

13:16

fundamental way that people you know

13:17

teach these models you just prompt them

13:19

with like examples with with with

13:20

instructions for what you want I expect

13:22

that learning is going to get much

13:25

better over time. And so I think it

13:27

definitely really matters that the

13:28

models can adapt to your context. They

13:31

can adapt to kind of the kind

13:33

of tasks you care about. So I think that

13:34

will be very important. I'm not sure if

13:36

like you know replicating the kind of

13:38

current RL pipeline is going to be like

13:41

the right way to go about it. But yeah,

13:43

it's definitely a problem that that

13:45

we're thinking about.

13:46

>> Yeah. So it's almost like yeah you still

13:47

have to do the work like you still

13:48

should you know figure out what the evals

13:49

are that matter gather the data the

13:51

examples but like it may just turn out

13:52

in the future you're far better off just

13:54

feeding that into this context than

13:55

trying to like do anything on on you

13:57

know your own model. >> Yeah, I think I

13:59

think that's quite plausible. >> And I

14:01

think that like you know obviously

14:03

people have seen the success of of tools

14:04

like Codex which I know you know you've

14:06

obviously been a key part of and um and

14:08

wondered like you know hey do we need to

14:10

build like our own kind of you know

14:12

should we build our own harnesses or our

14:13

own ways of of using these things or you

14:16

know uh for for our own domains whether

14:18

it's like you know uh legal or finance

14:21

or or healthcare or do we kind of just

14:22

like take the harnesses that the large

14:24

models do um and and kind of use them

14:27

within you know with with the context

14:29

that we have. uh any any thoughts around

14:32

like that

14:32

>> like the implementation of the harness

14:34

shouldn't really be a limitation for a

14:35

very long time. I think we'll be able to

14:37

get like much more general harnesses

14:39

that people can use for uh for all sorts

14:42

of other domains. I mean I think codex

14:43

is pretty good actually if you try using

14:45

it for things beyond coding.

14:46

>> That's so interesting. Like a much more

14:47

general harness being something that's

14:49

almost like uh adaptive to or like just

14:51

works across whatever the you know

14:53

specific set of tools you have in your

14:55

domain or specific set of things you

14:56

want to expose to the model.

14:59

>> Yeah. I mean I I think and you know I

15:01

think it's also worth thinking about

15:02

like you know why like you know what

15:04

what what is kind of the kind of

15:07

ultimate interface that we want to

15:08

interact to the model with. So, so the

15:09

model gives some the models gives some

15:11

UI hard forensicness, right? They can

15:12

build their own UIs. They can kind of do

15:14

things that uh you know people would

15:17

find very time-consuming. Um but I yeah I

15:20

definitely think there is also just like

15:22

a lot of space to kind of enable the

15:24

models to access like the current

15:26

interfaces that we use for for people

15:27

right. So I think like we want to have

15:30

um

15:32

um you know AIs on Slack for example or

15:35

that that are kind of plugged into our

15:36

our context and uh and yeah and are able

15:40

to learn from it and are able to kind

15:42

of yeah to utilize these existing things

15:44

right so definitely like there is some

15:48

meet in the middle here but definitely I

15:49

believe like long-term like uh you know

15:52

like by default the AI should kind of

15:54

meet you where where you are uh and if

15:57

not, that would be because it kind of it

15:59

has new abilities, not because it has

16:00

limitations.

16:01

>> Yeah, it's an interesting point that

16:02

basically today it feels like these

16:04

harnesses are so bespoke to certain

16:05

environments, but like over time as you

16:07

add more and more skills and tools and

16:09

models can navigate uh across those

16:11

effectively, it's like there'll just be a

16:13

general like you know the way humans

16:14

have uh that that makes a tremendous

16:17

amount of sense. I guess I'm curious

16:19

like you know you uh obviously I'm I'm

16:22

sure like every day you see kind of

16:23

crazy stuff on the research side at this

16:25

point like what are the milestones that

16:27

are like still meaningful to you as you

16:29

think about like it would be pretty

16:30

crazy if I you know uh did a run one day

16:33

and saw like X or Y like what are the

16:35

things you're paying most attention to?

16:38

>> Yeah. Um I mean at this point it really

16:41

is about um

16:44

research right like is it about it is

16:46

about can the model discover new things

16:49

can it execute on like a longer horizon

16:53

um research problem.

16:54

>> It's almost like looking for some sort

16:56

of insight that you're like oh someone

16:57

on my team had come up with that that

16:58

I would've been pretty intrigued by.

17:00

>> Yeah, we've actually had like some

17:02

minor uh um but I think I think quite

17:05

impactful ideas uh come from uh even

17:08

like GPT-5.2 Pro uh that we're

17:11

using internally. But you know, I think

17:13

it's still very very small compared to

17:14

where I expect it to be.

17:16

>> Yeah, I mean it seems like almost

17:17

inevitably like these models are going

17:18

to get better. They will be used in

17:20

research. They'll be used in science

17:21

more generally. You're like one of the

17:23

first people interacting directly with

17:24

these models as like research partners

17:26

almost at this stage. anything like

17:28

you've learned around the right way to

17:29

do that or do you think about like what

17:30

a research organization you know as

17:33

these models continue to get better

17:34

might look like? Yeah, I I I think we're

17:37

definitely kind of at um at a transition

17:39

point where kind of the short-term

17:42

immediate quality of the model uh is

17:46

about to be a quite determining factor

17:48

for the pace of our research progress

17:50

because the models are going to drive a

17:52

lot of that. And so that definitely

17:54

requires um you know rewiring some

17:56

intuitions about how to um run a

17:59

research organization. Uh you know

18:00

normally you kind of try to not be too

18:02

focused on like immediate quality. you

18:04

try to be much more focused on like the

18:06

longer term. I think we have like a lot

18:08

of very exciting uh stuff queued up that

18:11

we are kind of working towards but I

18:13

feel a lot of urgency to kind of yes to

18:16

actually

18:17

>> uh execute on it and to actually use these

18:19

advances in model intelligence to um

18:22

accelerate research on the AI and

18:24

especially AI alignment. >> Yeah, it's such

18:26

a fascinating point because I've heard

18:27

you talk before about running a research

18:28

organization and I feel like in the past

18:30

it was like giving people the space to,

18:32

you know, pursue a lot of things that

18:33

weren't like directly, you know, hey,

18:35

this is for a month or two months of

18:37

progress, but it's like what are the

18:38

ideas that are really going to drive

18:39

things forward, but it makes total sense

18:41

that we're in a time now where uh you're

18:43

like, look, everything we do will be so

18:45

much better if we just focus on this in

18:47

the in the short term and make it

18:48

better. It must be like fascinating to

18:50

navigate uh that and like these maybe

18:53

further off research ideas at the same

18:54

time and like running an organization.

18:56

>> Yeah. Yeah. It's definitely Yeah, it's

18:58

definitely something we we spend a lot

18:59

of time on with Mark nowadays. Yeah.

19:02

>> Right now you have um you know a a ton

19:05

of compute as a company, but you

19:06

obviously you have great scaling laws on

19:07

the pre-training side, you have great

19:09

scaling on the RL side, you have

19:11

probably lots of experiments going on

19:12

that have nothing to do with either of

19:13

those vectors, but are like interesting

19:15

new ways. How do you even think about

19:17

like allocating compute across all of

19:19

this stuff?

19:20

>> Yeah, it can get very complicated,

19:21

right? Because there's so many things

19:23

that we need to do. One thing we've been

19:25

one kind of discipline we've started

19:27

keeping is we um we try to make sure we

19:30

just like explicitly budget like a large

19:32

chunk of our compute to the most

19:33

scalable methods to the things that we

19:35

believe are the most responsible for

19:36

driving general model intelligence. And

19:38

you know even if it's not the most

19:40

efficient allocation of compute at all

19:41

times because you know if you're

19:43

allocating so much compute to like one

19:44

experiment or like one set of

19:45

experiments you know there's so many

19:47

things you can accelerate a little bit

19:48

of that compute elsewhere. Uh but you

19:52

know but I think it's easy to kind of

19:54

like with all the all all the all the

19:56

interesting and important things that

19:57

we're doing I think it'll be very easy

19:58

to kind of partner all of it and like

20:00

not not really end up doing the things

20:02

that we believe are most important. You

20:04

definitely want to like understand the

20:05

kind of empirical evidence. You

20:06

definitely want to make sure your

20:08

evaluations are in order and the kind of

20:10

experimental rigor is there. And then

20:11

you also want to apply some

20:13

regularization based on like okay do we

20:14

understand this method? Do we actually

20:15

expect it will scale? Do we expect this

20:17

is something you can actually build on

20:19

in the future? Is this kind of a

20:20

one-off? Right. And I think and based on

20:22

that uh determine the priority.

20:24

>> Yeah, it's so interesting. probably find

20:25

all the yeah ways that you like know you

20:27

could improve things but they feel maybe

20:28

like uh off off a little bit to the side

20:30

of where you think the overall arc of

20:32

progress is and so you end up leaving

20:34

some of these like low-hanging fruits to

20:36

some extent because really the most

20:37

important thing is finding the future

20:38

direction and then the scaling within

20:40

that and uh devoting compute toward that

20:42

obviously the the place where we talked

20:44

about Codex a lot and and the success

20:46

of coding and it feels like you know

20:47

last year was like the year of just

20:49

incredible hill climbing on on coding

20:51

I'm curious you know obviously Codex has

20:53

been a super successful product in many

20:55

ways like Anthropic was kind of first to

20:57

this market you know Claude Code you

20:59

know was it was a dominant product there

21:01

what do you kind of like you know

21:02

reflecting on that I guess like what do

21:04

you make of the success Anthropic's had

21:05

in this space

21:07

>> yeah I think I think it's a matter of

21:09

you know really focusing your product

21:10

direction or on where where you believe

21:13

the kind of the the next application of

21:15

the technology is right and um you know

21:19

if you look at the kind of prioritization

21:21

we've had on the on our product right I

21:24

mean we have been right like working on

21:26

on coding products but they have kind

21:28

of been like a secondary thing right

21:30

compared to like our main priorities and

21:33

the interesting thing is that is not

21:34

very reflective of like the priorities

21:36

of the research organization within

21:37

OpenAI uh I think you know given that like

21:41

we've kind of had this you know

21:43

explosive success of ChatGPT you know

21:45

ChatGPT as it was you know I I think

21:47

ChatGPT has evolved

21:49

quite a bit and it's going to evolve

21:50

quite a bit but as it was in 23 right is

21:52

this particular you know product that's

21:53

maybe not, you know, I think it's

21:55

definitely quite aligned with our vision

21:58

of like where AI is going, but but like

21:59

it's not really like the like

22:01

representative of like everything that

22:03

that that that it enables. And so the

22:06

majority of like our work in research

22:08

has been focused on like that that

22:09

future thing. And I think increasingly

22:12

it has decoupled from our our our kind

22:14

of like short-term product strategies,

22:16

right? Yeah. I'm very kind of um

22:19

confident about um the things we've been

22:23

building and the things we we we are

22:24

building on on on the research on the

22:26

model intelligence side. You know, a lot

22:28

of our our reprioritization and increased

22:31

focus on the on the product side is

22:33

about actually kind of getting to deploy

22:34

them and the belief that actually they

22:35

are uh the thing that really matters

22:37

now.

22:38

>> Yeah. And now it feels like you know the

22:39

uh clearly the whole company priority

22:42

you know is so locked in and focused on

22:43

this and you've seen just incredible

22:45

improvement in Codex in recent months

22:47

for all the developers that listen to

22:48

the podcast like if again it's almost

22:51

like hard to comprehend like what the

22:53

world looks like as these models keep

22:54

hill climbing on longer and longer tasks

22:55

like what do you think will look

22:57

different in their lives or like how

22:58

will they be using Codex in you know

23:00

three six months. I realize 3 months and

23:03

six months are very different timelines

23:04

in this world, but take whichever uh

23:06

whatever in between point you'd like.

23:08

>> I would expect um just a a gradual

23:12

increase in just the level of autonomy

23:16

uh you feel comfortable uh affording the

23:18

model just the the vagueness of

23:20

description that it can work with you know

23:21

the level of supervision it needs. I

23:23

think we're not very far from models that

23:25

can work autonomously for a couple days.

23:27

Um maybe use quite a bit more compute

23:29

than they're using now and produce much

23:30

higher quality artifacts on their own.

23:32

>> Do you have a gut instinct on like what

23:33

like you know there's always been this

23:34

question of like will the world you know

23:36

do you need that software engineering

23:37

skill set to supervise these models

23:39

running for a few days or like hey does

23:40

it turn out at some point of like being

23:42

able to run for a while you know anybody

23:44

can can use coding agents and supervise

23:46

them to to some sort of output. I mean I

23:49

think definitely for like a lot of

23:50

outputs you already don't need much

23:53

experience right I think I think still

23:54

the distinction I would draw between

23:56

like you know an intern here and like

23:58

really an autonomous researcher software

24:00

engineer would be that like if you want

24:02

to build something bigger like you know

24:05

you probably still want to apply

24:06

supervision you still kind of want to

24:07

have like an overarching thing you want

24:08

to recognize like what what what

24:10

building blocks fit in and what which

24:12

don't but yeah I definitely expect that

24:14

like that desired skill set uh to shift

24:17

quite a bit over

24:18

Yeah,

24:19

>> towards towards this like more general

24:21

uh vision setting.

24:23

>> You know, I guess on on the on the

24:24

research side, I feel like there's been

24:26

uh you know, maybe maybe like a month

24:27

ago, I feel like all anyone could talk

24:29

about was continual learning and there's

24:30

just you know, it was in the Zeitgeist.

24:32

There's all these neolabs starting to go

24:33

focus on continual learning. Some folks

24:35

left OpenAI to go focus on that. Um I'm

24:38

curious like you know I think it part

24:40

maybe part behind that is a belief that

24:42

like you know uh RL alone you know

24:45

either won't get us there or will get us

24:47

to like some level of very inefficient

24:49

scaling and it's kind of different than

24:50

the way you know humans learn. I think

24:51

even I've heard you say before like that

24:53

you know RL is still very different

24:55

today than the way that humans learn.

24:57

What's your take on on like that you

24:59

know that whole movement?

25:02

>> Yeah, I I am a little bit confused by it

25:04

because you know in my mind like

25:08

the whole kind of like excitement that

25:10

like we've had I mean even even if you

25:12

look at the titles of like the GPT uh

25:14

you know three paper right like it is

25:17

that like oh you know this class of

25:19

models is actually capable of continual

25:21

learning right it's capable of like

25:23

learning uh um learning to learn in

25:26

context right that has been really you

25:29

know the driving force behind the kind

25:31

of excitement to like scale these GPT

25:33

models further. That has been like the

25:35

premise for why we really need to teach

25:38

them with RL to like learn in context more

25:40

efficiently. And so I definitely agree

25:42

that continual learning is really the

25:45

thing, right? Like it's really the thing

25:46

that we're building, but I I don't

25:47

really think this is like a problem

25:49

that's like, oh, you know, it's kind of

25:51

ignored and off the path of what we're

25:52

doing currently. I think it is what

25:53

we're working towards.

25:54

>> Yeah. Like in your mind, this is like

25:55

the single best path to get there is to

25:57

continue to kind of scale uh the

25:59

pre-training and RL. I think that is kind

26:01

of how we've made the most progress on

26:02

this problem so far and you know I think

26:04

there are I think that there definitely

26:07

are like more ideas more steps um I

26:09

think also a lot of improvements that

26:11

will just come from scale

26:12

>> yeah and I guess like you know we have a

26:14

lot of folks listening that maybe have

26:16

you know have been able to do a lot of

26:17

simpler things with these models and

26:18

then they try to do like some of these

26:20

more complex you know I don't know call

26:21

it 100 step or longer term tasks and

26:23

they're like oh you know the the models

26:25

don't work for this yet and I think it's

26:26

harder you on the inside constantly feel

26:28

this improvement but for them it feels

26:30

like hey this is like night and day away

26:32

from you know being able to do this much

26:34

longer thing. How do you kind of

26:35

articulate to them I guess the set of

26:38

things that need to be true for these

26:39

like much longer steps to happen. Is it

26:41

around kind of checking in more often as

26:43

you were talking about before or I feel

26:45

like there's just this belief uh among

26:47

the research community of like oh all of

26:49

these tasks will be solved in the next

26:50

year or two and then in the wild a lot

26:52

of people maybe not totally grokking that

26:54

like improvement line that we've been

26:56

seeing.

26:58

>> Yeah. I mean I think a lot of that

27:00

prediction comes from just looking at

27:02

like historical improvement lines,

27:04

right? And but I think increasingly we

27:06

can we can roughly see the the the the

27:10

shape here. I do think a lot of this is

27:11

about just the models becoming

27:12

intelligent enough to recognize like

27:14

whether you know they're making

27:15

progress. Um I think some of this is

27:19

like yeah this very kind of pragmatic

27:20

work of like are the models actually

27:24

you know can they actually access you

27:26

know all the context all the files all

27:27

the infrastructure they need to do the

27:29

work you want them to do which yeah I

27:31

remember like in the past when we were

27:32

discussing you know the kind of the the

27:35

road map uh that we're taking with RL

27:38

you know I definitely view like okay we

27:39

just need to teach the model to kind of

27:41

reason with its own tokens as kind of

27:43

the priority and then of course we'll

27:44

need it to use tools like the

27:46

environment, you know, at some point we

27:48

definitely need to teach it to see,

27:50

right? At some point, we need to teach

27:51

it to use a physical body, right? Like,

27:53

but like uh yeah, I mean, I think we're

27:55

definitely like well into the stage

27:56

where, you know, it really needs to like

27:57

interact with the environment and it

27:58

really needs to see uh and you know,

28:00

someday soon we'll we'll really care

28:01

about robots, but yeah.

28:02

>> Yeah. I mean, it does feel like a lot of

28:03

the times when I hear people complain

28:05

about, oh, a model can't do X or Y, it's

28:06

like literally just because you haven't

28:08

fed, you know, or connected it to

28:09

systems or fed enough context into it.

28:10

Actually, I do wonder if like context

28:12

was universally applicable and able to

28:14

flow into these things. Like I feel like

28:16

a lot of these problems would actually

28:17

just be solved with today's models. You

28:19

know, I want to talk about some of the

28:20

AI for science stuff um that you guys

28:21

have been working on. And one thing in

28:23

particular, you know, I feel like the

28:25

coding stuff is something that everyone

28:26

feels very viscerally um you know, in

28:28

every company they're using these tools

28:30

and getting tons of productivity. You

28:31

know, on the math side, not all of us

28:33

competed in in in IMO competitions and

28:36

uh necessarily have as much of like an

28:37

intuitive feel for some of these

28:38

breakthroughs. And so one of them I know

28:40

that was really interesting that you

28:41

guys did is you did some compelling work

28:43

around like First Proof, right? And I

28:45

think these are like very different

28:46

problems than kind of traditional

28:47

competition math. I wonder if you could

28:49

just speak a little bit to that because

28:50

I think it's just a space that our

28:51

listeners might be less familiar with

28:52

and kind of less familiar with

28:54

understanding the implications of models

28:55

being able to do pretty cool work here.

28:58

>> Yeah, I mean you know I think yeah I I

29:01

was very excited with the First Proof

29:02

challenge and you know again like I I

29:04

kind of, you know, this particular one is kind of a

29:06

benchmark right it's like a couple you

29:08

know respected mathematicians

29:09

theoretical computer scientists

29:10

releasing problems that like they

29:11

believe are like representative of their

29:13

day-to-day work but haven't been

29:15

published anywhere so that we can really

29:16

have our models take a crack. We were so

29:18

excited about this challenge, but you

29:19

know, it was kind of dropped um without

29:22

any any any

29:25

advance warning um with like a

29:28

week-long deadline to actually execute. Um we

29:31

had a we had a very exciting model

29:32

training uh at the time. And so uh um uh

29:38

um one of the people in charge of

29:39

training James Lee kind of started

29:42

prompting the uh that model just um by

29:46

hand and and and and

29:49

uh and yeah and actually kind of seeing

29:51

oh okay it's actually solving these

29:52

problems was really a fascinating thing

29:54

to see. uh you know one of these problems

29:57

actually is from a domain that I I I I

29:59

did my PhD in and yeah seeing the model

30:02

kind of come up with these ideas which I

30:03

would be you know quite proud to come up

30:05

with like in a in a week or or two uh

30:08

seeing it come up with them in like an

30:09

hour or so that was very uh yeah it's a

30:13

very weird feeling right like like yeah

30:15

I think like in the past the when I felt

30:19

like that was like when watching our

30:20

Dota bot like play just like very

30:22

interesting Dota games infinitely right

30:24

and it feels like just there's some sort

30:25

of magic happening because like you know

30:28

interesting things should not be like

30:31

>> indefinite.

30:32

>> Yeah. And so seeing that happened for

30:34

math right for something that I believe

30:35

like you know is actually like quite

30:38

representative of of of our our or you

30:41

know a precursor to a lot of the work

30:42

that we're doing and a lot of the work

30:44

that like really matters in the world.

30:45

Um yeah definitely really increased my

30:48

feeling of urgency. One thing that's

30:49

fascinating too is the idea that you're

30:51

you're training these models and it's

30:52

like you know you pro you throw these

30:54

problems in and it's like nobody knows

30:55

whether you know how good will they be

30:56

at solving them and and I think just

30:58

like it must just be fascinating to see

31:00

uh something that you know so well and

31:02

and a space that you spend so much time

31:03

in and and realizing hey probably the

31:05

previous generation of models wouldn't

31:06

have been able to do that and you

31:08

wouldn't have even thought necessarily that

31:09

this was like the the benchmark to do

31:10

but it's like just generally showing the

31:12

the general purpose capabilities and and

31:14

improvements of the models. I mean it it

31:16

is at a stage where like you know we

31:17

needed to like seek out experts in the

31:20

in the particular domains to be to be

31:22

able to tell us whether these particular

31:23

proofs are correct or not but you know

31:25

it's still much easier to like tell

31:28

whether you've you've actually made

31:29

progress than you know than for

31:31

something like uh even coding right like

31:33

because sure like competitive

31:34

programming you can evaluate but most

31:35

programming is not competitive

31:36

programming and it's you know it's about

31:38

like are the abstractions right are

31:39

handling all the all the cases and yeah

31:41

>> yeah I guess like you know I feel like

31:43

there was this maybe common criticism

31:44

a year ago and I don't know if

31:46

it's as strident now that like okay these

31:48

models are like pattern matchers but

31:49

like you really want AI for science like

31:51

we're not going to get new ideas or like

31:53

you know entirely novel things out of

31:55

out of pattern matching feels like we

31:57

continue to like chip away at that

31:58

narrative are we getting closer to kind

32:00

of fundamentally disproving that

32:02

>> I believe so yeah I mean I think kind of

32:04

on schedule we're starting to see like

32:07

minor advancements right like not huge

32:10

things right like a small idea here or

32:12

there I mean maybe maybe some like

32:13

bigger papers in collaboration with with

32:15

scientists, right? But, you know, was

32:19

AlphaZero a pattern matcher, AlphaGo a

32:21

pattern matcher? You know, our our Dota

32:25

bots, like they did kind of come up with

32:27

new strategies for the respective games.

32:29

>> Yeah.

32:29

>> Um,

32:30

>> it's funny that there's counter examples

32:31

to it all the way back to, you know,

32:32

2016, 2017.

32:33

>> Right. Right. And and, you know, and you

32:35

can say like, well, I guess you can

32:37

always fall to flaws in that which I

32:39

think is interesting like AlphaGo can be

32:40

beaten with some strategy. Our Dota bots

32:43

could have been beaten with some

32:44

with some strategy. I think I think

32:46

there will be a lot of deficiencies for

32:48

a while of of like these models, right?

32:50

But but I think also like they they are

32:53

able to discover new things because they

32:55

have a lot of these capabilities and

32:56

like the way you know yeah I mean it's

33:00

you know taken a couple years to like

33:02

get go from like this like very tiny

33:04

game environments to like this much more

33:07

um general scientific research. it

33:09

required kind of going through um you

33:12

know like a decent approximation of like

33:14

all human knowledge in the meantime and

33:17

you know learning all the human

33:18

languages and so forth but but um but I

33:20

think the basic principle is is is very

33:22

similar.

33:23

>> Yeah. You know, it's funny. I think like

33:24

when you guys had these First Proof

33:26

results, um I remember like the

33:28

organizers said, you know, they were

33:29

commenting on these AI solutions and

33:30

they were like this feels like, you

33:32

know, 19th century mathematics of like

33:34

brute force, you know, computation-heavy

33:36

approaches rather than these like

33:37

elegant modern techniques. Um which I'm

33:39

not sure is a feature or bug of of you

33:41

know, obviously the the way these models

33:42

work, but like you know, hearing that I

33:45

mean does that like does that concern

33:46

you, excite you?

33:47

>> It doesn't concern me. I mean I think

33:50

it's expected that like I I'm sure I I

33:53

thought for at least one of the problems

33:54

like actually actually our model produced

33:56

pretty pretty nice proof that was quite a

33:58

bit shorter than like the intended one

33:59

you know but I think in general you

34:00

would expect like yeah these models kind

34:02

of you know they can produce so much

34:04

more reasoning in a short time than like

34:06

a person can right just like in terms of

34:07

just raw number of like tokens or

34:09

thoughts I don't expect that to be like

34:11

kind of a long-term feature

34:13

>> it feels like there's so much momentum

34:14

behind AI for science right now and you

34:16

mentioned obviously like you know at

34:17

some point you do have to connect these

34:19

these models to the physical world and

34:20

you guys released some cool stuff with

34:22

Ginkgo and like some of these other things

34:23

you've been experimenting with. I'm sure

34:25

you've thought a lot about like AI for a

34:27

bunch of different areas of science. You

34:29

know, as you've kind of dug into some of

34:31

this stuff, have you developed any

34:32

intuition for as you think about like 3

34:34

years from now, the spaces of

34:36

science where you're like, "Oh, that

34:37

there's going to be crazy progress there

34:39

versus the ones that might prove like a

34:40

little more resistant to immediate

34:42

change." You know, a tempting answer

34:44

would be that like oh, you know, it's

34:45

really about like um you know, do you

34:49

uh you know, what are the things that

34:51

kind of require some some you know,

34:53

manual work like where the models are

34:55

not like not not quite plugged in the

34:57

ecosystem or you know like the that the

35:00

the different laboratories will also

35:01

kind of evolve pretty quickly to adapt

35:03

to like these new technologies

35:04

>> within those STEM fields. Obviously, you

35:06

know, I feel like there's a question of

35:07

is it like an LLM with access to the

35:10

physical world or you've obviously had

35:12

companies that are have been started

35:13

specifically around these domains,

35:14

right? Like Isomorphic in biology or

35:16

Periodic in in materials science or

35:19

Physical Intelligence in robotics.

35:21

What's your kind of gut instinct on the

35:22

extent to which it makes sense to pursue

35:24

some of these things like independent

35:26

with different model architectures

35:27

versus like all within the context of

35:29

one place?

35:30

>> Yeah, I think it's kind of similar to

35:32

you know my answer about like the um UI

35:35

for you know for codex which like I I

35:37

would build around the capabilities of a

35:38

technology and not around its limitations

35:40

so much. Um so you know you definitely

35:44

like if you have something that like can

35:46

suddenly design like a huge amount of

35:48

like interesting like chemical or

35:49

biological experiments like yeah I mean

35:51

it makes sense to uh you know build labs

35:54

that enable that. You know, I think if

35:56

we if we did get to a place where like

35:58

the model is like very capable of

35:59

designing high quality experiments. It

36:00

also makes sense to like have it work

36:02

with humans in a loop, right? Like we

36:03

shouldn't think of it as like oh it's

36:04

either you kind of automate it fully and

36:06

you have this like fun thing using some

36:08

tools on the side. Like we will get to a

36:10

world where like it's just very natural

36:12

to be collaborating with um you know AI

36:14

scientists that are that are working

36:16

hard on a problem.

36:16

>> Yeah, it's so interesting. It's almost

36:17

like a different vision. It's like one

36:19

world where this works is like hey you

36:20

just train a model you know to basically

36:22

run these end-to-end tasks and like be

36:24

the automated like you know uh biologist

36:27

or you know chemist or whatever it is

36:29

and there's another one which is like

36:30

well you're building really tools to you

36:32

know both propose run kind of work in

36:35

tandem with a bunch of human researchers

36:37

>> I mean you know I wouldn't necessarily

36:38

categorize it as I mean you know of

36:40

course they are tools in some sense but

36:41

I think like you know we will get to a

36:42

point where they're driving a lot of the

36:44

like design and and ideation for the

36:45

whole process. Yeah, with with like an

36:47

LLM architecture, but just like you know

36:49

being able to figure out the right way,

36:50

the right kinds of experiments to run

36:52

and and then actually design it. And

36:53

yeah, when it comes to like different

36:54

architectures and you know, I mean, you

36:57

know, for sure like you know like

36:59

natural language reasoning like the kind

37:02

of the kind of things u that that we're

37:04

prioritizing that gives you a lot of

37:06

generality like there there are things

37:08

that are that you know you kind of want

37:10

to train it you want to train a

37:11

different model to to model right you

37:13

know I think even like yeah if if you

37:15

want to create a very good you know Go

37:17

model I I don't think like large

37:19

language models are like the most

37:20

efficient way to go about this although

37:22

they might result in the best model

37:23

eventually but uh you know I think it's

37:25

similar for like uh you know protein

37:27

folding or or other tasks of this kind.

37:29

>> Yeah. So you think it makes sense to

37:30

have like some independent efforts

37:31

around that but obviously the like you

37:33

know that will end up being paired with

37:35

like a core really good researcher large

37:37

language model that is you know helping

37:38

drive a bunch of this stuff.

37:40

>> Yeah. I want to also make sure just to

37:41

talk about AI safety because I think

37:42

that's an area that you've done a lot of

37:44

really pioneering work on. Um and you

37:46

know I'm not sure all our listeners will

37:48

be familiar with uh you actually did

37:49

some really interesting work across the

37:50

labs right uh and and were focused on

37:53

you know chain of thought monitoring and

37:55

so maybe to start just talk tell us a

37:57

little bit about that work and and you

37:59

know uh you know what you found.

38:00

>> Yeah so this is um a realization that

38:04

actually we had um around the time we

38:07

actually saw like the first um reasoning

38:11

models of kind of the current crop. We

38:14

realized that like okay like well this

38:15

works right and we were pretty uh you

38:18

know we were thinking a lot about what

38:19

this means we kind of were like okay

38:21

like probably the world really changes

38:22

over the next I don't know year or two

38:24

or three you know we were thinking what

38:26

this means for for safety and for for

38:28

our ability to kind of understand what

38:29

these models are doing and we realized

38:30

that because of the way we train these

38:32

models that because we don't supervise

38:35

the reasoning process directly right

38:36

it's not like you know ChatGPT is trained

38:38

to kind of um you know be be polite and

38:41

nice and like Um, and

38:43

>> it always tells me I have great ideas.

38:45

>> Yeah. Well, you know, that's a separate

38:48

issue, right? Like, but but you know,

38:50

but but like even assuming it's like

38:52

aligned exactly in the way we would want

38:54

it to, which is definitely not, you

38:55

know, uh, sycophantic, like it's still kind

38:57

of not going to be uh, you know, there

39:00

are just still still some things it's

39:02

not going to reveal about its

39:03

motivations and time because, you know,

39:05

maybe it would be unsafe or maybe it

39:07

would be unkind. um um or you know or

39:10

maybe because it's not maybe it's

39:12

actually not aligned the way we think

39:13

but it wants to hide that right and uh

39:17

and the way we train the reasoning

39:19

models like the the the chain of thought

39:21

doesn't have any of that it's not

39:22

optimized to uh to be in any particular

39:26

way because it's just not not directly

39:28

graded, it's only graded in how it relates

39:30

to like producing a high quality output

39:34

um and we realized this is actually a very

39:36

powerful

39:38

paradigm for being able to

39:40

interpret what the model is doing,

39:41

right? It's actually not a very

39:43

different idea from uh um mechanistic

39:45

interpretability, right? Because in

39:46

mechanistic like the idea is again like

39:48

you kind of have this model, you have

39:50

these activations of the model um that

39:53

you know are not directly supervised to

39:55

predict any label. they're they're kind

39:57

of like indirectly supervised but you

39:59

know the model kind of has never been

40:01

trained with like any sort of like uh

40:03

you know inspection of the of these

40:04

activations and so these activations

40:05

might reveal something about its

40:07

inner workings but the big advantage of

40:09

the chains of thought is that you know

40:10

by default they are in English right and

40:12

so it's so much easier to understand

40:13

what is going on especially you know as

40:16

the concepts get more advanced u and the

40:19

other interesting thing is um you know

40:22

we were just talking about how probably

40:25

you know how how we believe in in the

40:26

future where we go uh well these models

40:29

work for a very long time they work

40:30

autonomously right and so there there is

40:32

much more of this reasoning uh and so

40:34

you know if this is a big axis of how

40:37

the capability of these models increases

40:40

um that the sort of our ability to

40:42

supervise them will will scale uh uh

40:45

commensurately. Yeah, this really comes down

40:47

to this

40:49

principle though that like you know

40:51

you're not supposed to supervise the

40:52

chain of thought and so this is actually

40:53

something uh when we originally you know

40:56

were releasing the preview model like

40:58

we made this decision to like hide the

41:00

chains of thought and

41:01

>> yeah I remember

41:01

>> and um you know for me that was the

41:04

primary motivation that was the reason

41:06

like I didn't really even want to

41:09

consider releasing it in different ways

41:11

you know there definitely was a bit of

41:12

internal discussion about this but like

41:14

the reason I felt very strongly like we

41:15

should we should just hide it is because

41:17

of this. Uh then there was this other

41:19

concern that like I didn't initially

41:20

think about but I think was also like

41:21

very valid of like well you know like

41:24

this model is going to be distilled to

41:25

some extent blah blah uh and you know

41:26

and that's definitely also been like a

41:28

big factor here. Uh but but yeah but I

41:31

actually think that like this uh you

41:34

know allowing the models some sort of

41:36

private space uh oh and by the way like

41:38

why do I think it's important that we

41:40

don't like you know show these chains of

41:41

thought in product you know um if if if

41:45

I'm saying like the important thing is

41:46

not to supervise them during training

41:47

well I think if we did show in if we

41:49

like established a paradigm where like

41:51

oh you just show these chains of thought

41:52

in product uh eventually you kind of

41:55

have to train them right like you'll

41:56

have to train them for the same reasons

41:58

you have to train like whatever models

41:59

you ship. Um and I just think that

42:02

>> we might not all want to know what the

42:03

chain of thought our model has that gets

42:04

to a response for

42:05

>> right I mean you know I think I think

42:07

it'll be useful to some extent and we

42:09

are trying to capture most of that value

42:11

you know either with like chain-of-thought

42:12

summaries uh which I think are kind of

42:15

like a little bit of a stop gap. I think

42:16

the longer term solution here is having

42:18

the model actually talk to you in real

42:19

time which you know the later the latest

42:21

version of Codex kind of does, latest

42:23

version of of the reasoning GPT models

42:25

kind of do but I think I think that will

42:26

get much better um

42:30

yeah but but yeah I think there's

42:32

something very exciting here about just

42:34

like not uh not having the training

42:38

signal fight against us right and not

42:42

not Yes because yeah I think if you

42:46

If you want to be able to understand

42:47

what the model does in the long term,

42:49

but you know you're scaling a method

42:51

that is like kind of going directly

42:52

against that, it's you're probably not

42:54

going to have a good time, right? That's

42:55

the other side of the bitter lesson. Uh

42:58

and so this decoupling I think is a very

43:00

it's an idea that gives me a lot of hope

43:02

for our ability to at least understand

43:05

um you know how these models' motivations

43:07

and generalization evolve as they get

43:09

better as they as they work for longer.

43:11

Um yeah, I don't think it's a complete

43:14

solution to AI alignment by a long

43:16

shot. I think it's just another tool in

43:18

our in our toolbox. Uh but I am hopeful

43:21

that building our toolbox with technical

43:23

tools like this, we can actually

43:25

continue chipping away at the

43:26

fundamental problems here.

43:27

>> Yeah, it seems like almost like over

43:28

the, you know, medium term, it's like

43:30

something that's going to be incredibly

43:31

helpful. Probably not the catchall

43:33

solution for for long-term alignment.

43:35

Yeah, I mean I think it's a tool that

43:36

can help us understand like I think it's

43:38

actually very useful to like build

43:40

understanding of long-term alignment,

43:41

right? For example, there has been this

43:43

very exciting work um from um um um

43:48

from Apollo in collaboration with other

43:50

labs uh on uh model scheming where they

43:54

investigate uh you know depending on

43:56

kind of what environment you pro you put

43:58

the model in, how you train it like is

43:59

it is it prone to like start kind of

44:01

like having hidden objectives that it

44:03

pursues and you know what enables that

44:05

that whole line of work is chain of thought

44:07

monitoring right is this notion of like

44:09

oh you can actually inspect what the

44:10

model's motivations are uh so you know and

44:14

I think from that like that might take

44:16

us in a completely different direction in terms of

44:18

mitigations right like maybe the right

44:19

way is like changing the pre-training

44:20

data of the model or maybe it's

44:22

something like uh you know the

44:23

inoculation prompting from Anthropic like

44:25

I think I think those are very

44:26

interesting ideas but I think like

44:27

having this ability to like understand

44:28

is very helpful to to evaluate these

44:30

>> yeah it's almost like foundational for

44:31

any further uh area of research what are

44:34

like the other research areas within

44:35

alignment that you're paying attention

44:36

to or that you think are promising you

44:38

know areas to focus on Um yeah, I think

44:42

I think a lot of the

44:44

a lot of the like longer term challenge

44:46

with alignment is about generalization,

44:49

right? Like we can train our models to

44:51

do well and and and and or you know at

44:54

least mostly to some extent like we we

44:56

can mostly kind of control their

44:58

behavior in the in the things that that

45:01

you know are in distribution that that

45:02

we train for. Um, but you know the

45:05

things that are worrisome is like well

45:07

what happens when a model is asked to do

45:08

something very very different or it

45:10

finds itself in a very different

45:11

situation or it's like much smarter than

45:13

it ever was before and and and you know

45:14

it has all these capabilities. It's like

45:16

we haven't really kind of thought about

45:18

how to train for and so yeah so so I

45:20

think I think you know the study of like

45:22

this kind of longer term value alignment

45:24

is really a study of generalization like

45:26

what are the values that the model falls

45:28

back on. Um like one line of research

45:31

I'm very excited about here and

45:33

something that we're uh investing in

45:36

quite a bit is uh understanding like how

45:39

that um how the generalization falls

45:42

back onto the pre-training data. Um

45:46

um yeah and yeah I I I think there's

45:50

quite a lot there. I guess over like you

45:53

know the last six months have your

45:54

concerns around alignment increased

45:56

decreased like how do you you know where

45:57

are we kind of trending overall uh you

46:00

know with this work

46:01

>> I I I will speak to like the the the

46:03

longer term challenges of like alignment

46:05

right or like what happens when you have

46:06

very smart models the the way my

46:08

thinking about the problem has evolved

46:09

over the past few years is definitely

46:11

kind of gone from

46:13

you know oh is this like very nebulous

46:15

problem that like is just like very hard

46:17

to even grapple with or define uh to

46:19

like oh you know I think we can actually

46:21

make progress at it by very

46:23

concrete technical solutions and

46:24

technical insights. And this is why

46:26

we've really been uh

46:29

viewing alignment as like just a core

46:32

part of of research and really uh you

46:34

know making sure that like we are you

46:36

know designing our reasoning models uh

46:39

thinking about this and we are you know

46:40

and we are kind of like conducting our

46:42

alignment research with like these

46:43

reasoning models in mind and so forth.

46:45

Um

46:46

so I think my general kind of uh belief

46:51

that there's like a research path here

46:53

that actually gets us to an extremely

46:55

happy world uh has increased quite a

46:57

lot. Um,

46:59

at the same time, right, I think

47:02

uh my timelines to very capable models

47:05

have definitely decreased a lot, right?

47:06

I think we're we're not that far, right?

47:08

Again, I don't think these are models

47:09

that are smarter than us in all the ways, but

47:10

I think these are models that are just

47:11

very transformative. And so, I'm quite

47:14

optimistic like we can keep a good grip

47:16

on like how we're doing on the alignment

47:19

problem, how to roughly evaluate the

47:22

risks of of of of

47:25

our models or or the problems with them.

47:26

you know, but I do think we have to be,

47:28

you know, as an industry, really

47:30

prepared to like take trade-offs and,

47:31

you know, and possibly, you know, slow

47:33

down development uh um depending on what

47:36

we see. It

47:37

>> it's already interesting to see a lot of

47:38

this work happening across the major

47:39

labs. You know, the fact that you did

47:40

this in collaboration with I think

47:41

Anthropic and DeepMind and you know, it

47:43

seems like uh has that just come up

47:46

organically or imagine like is there a

47:47

lot of like alignment talk between you

47:49

know, the the major players, you know,

47:51

uh given I guess the three of you are

47:52

really at the forefront of all this?

47:54

There's definitely some I mean there's

47:55

definitely like shared interest in these

47:57

topics. Yeah.

47:57

>> I want to shift a little bit to going

47:59

inside OpenAI. I feel like no no company

48:01

probably in the world has been more

48:03

interested in over the last uh 2 three

48:05

years and you know I think particularly

48:06

what it's like to run a research

48:08

organization. You know we talked a

48:09

little bit about this uh previously but

48:11

you talked before about how it's you

48:13

know important part of your job is

48:15

giving researchers you know uh to to

48:18

kind of have comfort and space to you

48:19

know almost be cave dwellers right and

48:21

think about what the models will look

48:22

like in a few years. Um, you know, we

48:24

were kind of alluding to it earlier.

48:25

We're also in a time where it feels like

48:27

there's just massive competitive race

48:30

and you know, uh, it's it's it's

48:31

certainly, you know, everyone's going

48:33

really gung-ho on these coding models.

48:35

I'm wondering like how do you actually

48:36

operationalize this balance today and

48:38

and you know, anything you've kind of

48:40

changed in your thinking, you know,

48:42

overseeing this organization around the

48:43

right way to do this? you know I focus

48:45

on on just high quality experiments

48:48

recognizing you know are we actually

48:50

making progress being honest with

48:51

ourselves and you know and promoting

48:53

honesty about about the results um I

48:55

don't think that has changed right and

48:57

and uh you know even though our work

48:59

will evolve a lot I believe we still

49:01

have quite a lot of work left to do and

49:04

so I don't think it's like oh you know

49:05

we need to wrap up all our projects uh

49:07

um you know very very quickly so yeah I

49:10

don't think those fundamentals change I

49:11

think what what does change is uh you

49:13

know a level of urgency to really kind

49:15

of bring some of these things that we

49:16

think are most promising uh to fruition

49:19

>> and then obviously you know I feel like

49:20

there's been um you know some very

49:22

public internal moments of OpenAI over

49:24

over the years you've been here for a

49:26

long time as you kind of reflect back

49:28

like what were some of the difficult

49:30

decisions that you guys made that maybe

49:31

were like 51/49 that really you know

49:34

defined the company or any any any as

49:36

you think back of the movie of the last

49:37

you know seven eight years of your life

49:39

um you know the key moments that kind of

49:41

stick out to you. Well, yeah. I mean,

49:42

there's certainly a number of, you know,

49:44

dramatic moments, uh, like this. Um, you

49:47

know, I think the ways the company

49:48

underwent the most change is not really

49:50

this like snap changes, snap decisions,

49:53

but more like just like shifts and and

49:56

how it operates, right? I would say like

49:58

OpenAI has gone through a couple phases.

50:00

you know when I joined at the start of

50:01

2017, it very much kind of uh felt like

50:05

very academic lab pursuing like a lot of

50:08

different ideas not so you know scaling

50:10

pilled in practice uh and I think that was

50:12

like the first like big change with the

50:15

Dota project, with GPT, we've kind of

50:17

moved to okay like we actually are going

50:19

to have to buy big computers we're

50:20

actually going to have to um scale

50:22

things we're going to have to develop the

50:24

science of scaling we'll have to develop

50:25

the infrastructure for it um and so that

50:29

kind of started the second phase of of

50:31

okay now we're scaling right like we're

50:33

we're we're still going to pursue like a

50:35

lot of these basic research ideas but we

50:37

are going to evaluate them like for the

50:38

fact, are they scalable? um um

50:44

then yeah then there was this

50:45

interesting period I talked about

50:46

earlier right where you kind of have

50:48

>> chat GPT is this big thing

50:52

yeah I mean I thought it would look a

50:55

little bit differently right like I

50:56

think I I was actually surprised that

50:57

like text models

51:00

I was pleasantly surprised like text

51:02

models are actually kind of the first

51:03

thing. I thought we would be in a world

51:04

where like it's more the kind of like

51:07

you know video style uh uses of

51:09

generative AI are kind of like the first

51:12

>> uh the first big thing to take off and

51:13

like and we'll have to like trade off

51:15

like pursuing the kind of longer longer

51:17

term text based research. Uh so yeah so

51:21

so so but yeah but I think definitely

51:24

like we anticipated that like this sort

51:25

of tension would arise right where like

51:27

you have a thing that is kind of like

51:29

popular now but it's like you know you

51:30

believe it's going to evolve quite a lot

51:32

before you get to where you're going and

51:34

so I think that's kind of the phase

51:35

we've been in for a while um and yeah I

51:39

think now we're we're like uh

51:43

um well yeah I mean we believe we are

51:46

kind of like starting to be in this

51:47

phase where yeah we're actually

51:48

deploying AGI or you know deploying

51:50

models that are actually very economically

51:52

transformative.

51:53

>> No, it's uh it certainly seems that way.

51:55

Well, I guess we always like to end

51:57

interviews with a standard set of

51:58

quickfire questions which are basically

52:00

me just stuffing all my overly broad

52:01

questions I couldn't fit anywhere else.

52:03

Uh so if you you'll shamelessly indulge

52:04

me uh you know I guess to kick it off

52:07

would love what's one thing you've

52:08

changed your mind on in the AI world in

52:10

the last year? Yeah, I mean I I think I

52:12

think it's really, you know, starting to

52:14

reconcile this tension between, you

52:18

know, the AI that you build ultimately

52:20

is something that affects the world,

52:21

but, you know, until you until you kind

52:23

of get pretty close, it's like a pretty

52:25

theoretical thing that you're just kind

52:27

of, you know, uh training and developing

52:29

algorithms for. And so, you know,

52:32

recognizing that okay, now we actually

52:35

need um we really need to um

52:40

you know make a lot of progress and

52:42

focus on like how actually we're

52:43

deploying this technology and um in a

52:46

while. This is definitely something I've

52:48

been I've been thinking about a lot

52:49

lately.

52:50

>> Yeah, it's so interesting. basically

52:51

like you know uh outside of chat it was

52:54

almost like more in the in the abstract

52:56

or research hill climbing you know with

52:58

some usage in the real world and then in

52:59

this last year we've obviously seen you

53:01

primarily via coding agents just you

53:03

know it it trickle in you know in in a

53:05

pretty massive way.

53:06

>> Yeah I I I I think I I believe is kind

53:09

of going in the same direction as like

53:10

the coding models where like it's

53:12

actually going to be something um you

53:14

know very useful it's going to be

53:16

something that's like a meaningful part

53:18

of of of people's lives. when you say

53:20

going in the same way you mean just like

53:21

executing longer term tasks or more like

53:23

you know the

53:24

>> I feel that's part of it right but also

53:26

just um you know coming to become like a

53:29

dependable trustworthy assistant or

53:31

companion

53:32

>> yeah it's amazing to watch the way

53:33

younger people use ChatGPT. I'd argue it's

53:35

it's already pretty much there for uh

53:37

the way a lot of folks in in high school

53:39

and college and you know uh seem

53:41

increasingly you know comfortable using

53:42

it um you know I wouldn't be a shameless

53:45

podcaster if I didn't ask a top

53:46

researcher you know timelines for a few

53:48

things I think particularly interesting

53:49

is the stuff outside of the core LLM

53:51

world and so think there's a lot of buzz

53:53

around robotics these days. Do you have

53:55

any like in I mean obviously it's hard

53:56

to pinpoint like a moment robotics quote

53:59

works but I think you know whether it's

54:00

finding scaling laws or finding some

54:02

sort of like ChatGPT-esque moment for

54:04

robotics.

54:05

>> Yeah. I mean I definitely think there

54:06

are like very promising algorithmic

54:08

ideas there that I I believe are going

54:10

to work that are you know not too

54:12

dissimilar from the space of ideas. So

54:14

I'm I'm quite optimistic about about

54:17

timelines there. Uh although I do think

54:19

they're longer than like the kind of the

54:21

virtual um AI.

54:23

>> Obviously I'm sure you think a lot about

54:24

you know cuz you're always thinking

54:25

about the next frontier for what these

54:27

models can do. Um you know just the

54:29

impact on on society as a whole as you

54:31

think about this kind of pace of

54:32

continued model improvement. You know

54:33

what's maybe one thing that you think

54:34

we're underthinking right now as a

54:36

society in terms of the impact of these

54:38

models? Yeah, I I I think getting to a

54:42

point where so much intellectual work um

54:45

can be automated I think comes with

54:49

pretty big problems that I don't think

54:51

have obvious solutions. One natural one is a

54:54

question of jobs and you know

54:56

concentration of wealth and I suspect

54:59

this requires like real policy maker

55:02

involvement. Yeah, I've heard some kind

55:04

of optimistic takes on how is this

55:06

resolved, but I think I think at a at

55:09

fundamental level it does seem like you

55:10

know some things that like used to be

55:13

very valuable used to kind of cost a lot

55:15

and used to provide something like now

55:16

can be done pretty cheaply and you know

55:19

in the long term it should be a good

55:20

thing but I think it does lead like I

55:22

think it can happen quite quickly.

55:25

Um

55:27

and there is a related question of

55:30

you know you really can like if you

55:32

actually have you know an automated

55:36

research laboratory an automated company

55:37

that can do so many things like it can

55:39

be controlled by a very small number of

55:40

people right it can be it can do a lot

55:42

right and this gets this gets you know

55:45

even more crazy when you have robots but

55:46

but you don't need to have robots and

55:48

you know I think figuring out like what

55:50

does governance of such things looks

55:51

like look like right like what are these

55:53

like organizations that are like so powerful

55:55

and yet maybe made of like only a couple

55:58

of people like what how to think about

55:59

these things I think is uh it's a new

56:01

question we have to grapple with as a

56:02

society. Speaking of other new

56:03

questions one thing that's very top of

56:05

mind for me I I recently had a kid and

56:06

I've been thinking a lot about like you

56:08

know what is his life going to look like

56:10

in in 10 years um you're really close to

56:12

this stuff how has your work on on on AI

56:15

changed the way you think about like the

56:17

way in in which you know this next

56:19

generation should be raised

56:21

>> a task for all of us right is to build

56:24

the AI right build a world in a way

56:26

where uh you know at the end of the day

56:28

humans have the agency right humans set

56:30

the the direction right and you know

56:32

maybe a lot of the

56:34

the technical challenges that we cherish

56:36

right now will become more of a you know

56:39

pastime than something that we

56:40

really kind of like need to do in order

56:42

to make progress and and the challenges

56:44

will be more in like figuring out like

56:45

what are the things that are important

56:47

what are the things we should go do you

56:48

know I think that that will still be you

56:51

know I think I think you know in that

56:54

world like people can end up with you

56:56

know more things to do and definitely

56:58

more more exciting things to to do and

57:00

you know I think I think you still want

57:01

like to have an understanding of you

57:04

know of like uh you know some

57:06

understanding of like you know

57:07

technology like all all the kind of like

57:10

uh basic you know education however you

57:12

want to acquire it for the sake of being

57:14

able to think about these problems.

57:15

>> Well this has been fascinating man I

57:16

really appreciate you sitting down and

57:17

and talking about so many different

57:19

things. Um, I want to make sure to leave

57:20

the last word to you. Like anything you

57:23

uh want to point our listeners to,

57:24

whether it's research you're doing or

57:26

products you're excited about or really

57:28

anything you'd like to uh to plug uh the

57:30

floor is yours. Um, you know, anything

57:32

I'm sure there's tons of threads people

57:33

want to uh pull out of this

57:35

conversation.

57:35

>> I think the set of problems we just

57:37

discussed, right, and also the questions

57:40

around alignment, monitorability, I I I

57:43

think I think those are growing to be

57:45

very urgent challenges. And I don't

57:47

think they are challenges only for AI

57:49

researchers, right? I think they are

57:50

challenges for policy makers,

57:53

but also also just things we have to

57:55

think through as a society and uh yeah,

58:00

I I'm you know, I'm happy to see some

58:02

discourse starting to arise and I I

58:04

think we need more of it.

58:05

>> Yeah. Well, I thought I could talk to

58:06

you for hours more, but I'd be doing the

58:08

world a great disservice by keeping you

58:09

from your actual work of continuing to

58:10

improve these models. Thank you so much

58:12

for doing this. This was a ton of fun.

58:14

>> Thank you. I'm Jacob Efron and this has

58:15

been Unsupervised Learning, a podcast

58:17

where I get to talk to the smartest

58:19

people in AI and ask them tons of

58:21

questions about what's happening with

58:23

models and what it means for businesses

58:24

in the world. As I hope is clear, I have

58:26

a ton of fun doing this. It's a nights

58:28

and weekends project in addition to my

58:30

day job as an investor at Redpoint. But

58:32

our ability to get these incredible

58:33

guests on really comes from folks like

58:35

you subscribing to the podcast, sharing

58:37

it with friends. It's really what

58:39

ultimately makes this whole thing work.

58:40

And so, please consider doing that. And

58:42

thank you so much for your support and

58:43

listening. We'll see you next episode.
