Full Transcript


The Friction is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro, Earendil

18:23 · 3,657 words · ~18 min read · English · Transcribed Apr 19, 2026
AI Summary

AI agents naturally optimize for 'progress' over 'soundness,' leading to brittle systems that hide critical failures behind automated local fixes. To maintain control, engineers must intentionally re-introduce friction by designing 'agent-legible' codebases and using tools that force human judgment on high-risk changes.

As code production shifts from creation-constrained to review-constrained, understanding the psychological and engineering traps of 'frictionless' development is vital for maintaining long-term system integrity.

Section summaries

0:00-2:00

Introduction and Background

optional

Context on Armin's history (Flask, Sentry) and the company Earendil.

2:00-9:00

The Psychology and Engineering Trap

watch

Crucial explanation of why agents produce technical debt and why human review is failing.

9:00-14:00

Engineering Solutions for Legibility

watch

Practical advice on how to structure a codebase specifically for AI agents.

14:00-18:00

Review Workflows and Conclusion

watch

Demonstrates the Pi agent harness for review and summarizes the need for friction.

Key points

  • The Reinforcement Learning Bias toward 'Running' — Agents are RL-optimized to make code run and pass tests, often by adding 'bare catch' blocks or loading defaults for missing configs. This creates 'hobbling' services that recover locally but cause massive, invisible failures downstream.
  • Agent-Legible Codebases — Software must be designed as infrastructure for the agent to navigate, emphasizing modularity, strictly unique function names to save tokens, and avoiding 'magic' abstractions like ORMs that hide underlying intent from the model.
  • The Responsibility Imbalance — The barrier to entry for shipping code has dropped so low that non-engineers (marketing, former CEOs) are producing PRs, but the legal and technical responsibility still rests solely on the engineering team.
"The friction actually in many ways is what's necessary on a physical level to steer. Like, without friction there's no steering." (Armin Ronacher)
"The agents are writing the kind of code that, when you as a human software engineer start learning how to write code, you wouldn't necessarily write." (Armin Ronacher)

AI-generated from the transcript. May contain errors.

0:15

Morning. Thanks for having us. Um, today

0:17

I want to talk with Christina about

0:19

friction a little bit. Um

0:23

this is um a a social preview that came

0:28

up automatically when someone submitted

0:30

an issue um to

0:34

um basically there was this is a forum

0:36

post that goes with um a security

0:38

incident that was deployed accidentally.

0:40

It was a configuration change that

0:42

caused a problem and the social preview

0:44

post had the marketing tagline of that

0:46

company which said ship without

0:48

friction.

0:49

Um, and we want to encourage you to add a

0:53

little bit of friction to it. Um, and

0:56

I'll tell you why. So, who are we? Um,

0:59

I've been doing software development for

1:01

20 years, most of it in the open source

1:03

space. Um, I have created Flask, which

1:06

is a Python framework, which ironically

1:08

is so much in the weights that a lot of

1:10

people um are learning about it now

1:13

because the machines are producing it.

1:15

Um, and I left my previous company that

1:18

I worked for, Sentry, in April last

1:19

year, which perfectly coincided with um,

1:22

me having time and then obviously Claude

1:24

Code. And so I fell deep into a hole of

1:27

agentic engineering and I started writing

1:29

on my blog and and and a lot of people

1:31

reached out to me over the last year um,

1:33

being all excited about this. Um, and

1:36

then I started with a friend in October,

1:39

a company called Earendil where we are

1:42

trying to make sense of all the AI

1:43

things. Um,

1:46

>> Yeah, and my name is Cristina and I

1:48

work with Armin at this company called

1:50

Earendil. But importantly, I am what I

1:52

like to call a native AI engineer. And

1:55

what that basically means is that these

1:57

tools have been around longer than I

1:59

have. Um, so what this means is like

2:01

they've been super foundational in how

2:03

I've become a software engineer. Not

2:05

just because obviously I use them to

2:06

work, but also because this is the means

2:08

by which I've learned to do what I do.

2:11

And before Earendil I was working at

2:13

Bending Spoons.

2:16

>> So we want to share a little bit from

2:18

practice not just theory but um I will

2:20

readily admit that I don't think we have

2:22

all the solutions. So we have been

2:24

building with or on agents for a good 12

2:26

months. Um we had huge leverage and

2:29

great disappointment and we we really

2:32

keep running into two types of problems.

2:34

Um I I think especially if you listen to

2:37

some earlier talks at at this conference

2:39

you will have learned a lot about um

2:41

that you should keep using your brain.

2:43

Um it's for some reason that's really

2:45

really hard. So there's a psychological

2:46

problem and the other one is the

2:48

engineering challenge is like the they

2:49

seem to be producing worse code for some

2:52

people and better code for some other

2:53

people and like what is it that actually

2:54

makes that work. Um and so this is

2:57

really not a solution as it is our part

2:59

of the journey of how we think so far we

3:01

have managed. Um yeah, so problem number

3:06

one is the psychology part which is like

3:08

why is it even though everybody told you

3:10

many times over that you should be using

3:11

your brain, you should be slowing down,

3:13

it's actually incredibly hard. It's just

3:14

one more prompt and and we don't sleep

3:16

that much. Like what is it that actually

3:18

makes it so hard? And then would it be

3:21

that hard if the machines would actually

3:22

be writing perfect code and we wouldn't

3:23

have to think quite as much and like

3:25

what is it is there something we can do

3:27

to make this a little bit better?

3:29

>> So I'll begin by introducing the first

3:31

part of these problems, the psychology

3:33

problem. And what I want to talk first

3:35

about is the shift. So I'm sure a lot of

3:38

us here who have been playing with these

3:39

tools for a while now experienced this

3:41

at some point. We were prompting

3:43

prompting not so good and then at some

3:45

point suddenly it clicked and they were

3:47

really really useful for us and it was

3:50

fun in the beginning and they gave us a

3:52

lot of extra time right because not

3:53

everyone was using them. They were

3:55

actually tools that made us more

3:56

productive, that made it more fun to do

3:58

our jobs. But very quickly, because they

4:00

were so useful and they got us so

4:01

hooked, everyone was using them. And so

4:03

this kind of had the opposite effect

4:06

where suddenly the baseline expectation

4:08

was just that everyone is now using them

4:10

and you have to use them. And so this

4:12

this fun and free time translated into

4:15

pressure. Now we all have to ship faster

4:17

and produce more code. And it is just

4:20

not sustainable to review and to

4:22

actually have time to think.

4:25

And so this leads us to the trap and I

4:28

actually think there's two parts of this

4:30

problem of this trap and one of them a

4:32

lot of engineers have spoken about and

4:33

it's that these tools are super

4:35

addictive. You never know if that next

4:38

prompt is going to be the one that makes

4:40

your product work and you've added a new

4:41

feature or if it's going to be that last

4:43

drop of slop that brings your product

4:45

crashing down. And so it's very

4:48

addictive. We keep doing what we're

4:49

doing. It's not a great solution. But

4:52

also most importantly, and I don't think

4:53

we realize this as much is that because

4:55

we produce a lot of output very fast, we

4:58

are tricked into thinking that we're

4:59

actually being more efficient doing more

5:01

work. And this is quite the opposite

5:03

because now we don't have as much time

5:05

to actually stop and think and design

5:07

what we're doing. Ask ourselves, is this

5:08

the best way in which I can implement

5:10

this, or could I be doing something

5:12

better? And when you're in this flow,

5:15

it's very difficult for yourself to stop

5:17

and it's definitely very difficult for

5:18

your agent to stop because it's running

5:20

around and it's reading files that it

5:22

should have never even read. So we are

5:24

the ones that need to actually have the

5:26

agency to be in control here.

5:29

>> And one thing that from a if you start

5:32

scaling this from like one person to an

5:34

engineering team that actually took me

5:35

quite a while to realize is that it

5:37

really changes the composition of the

5:39

engineering team. We we were really

5:41

supply constrained by creation of code

5:43

and so like the balance between writing

5:45

code and reviewing code in engineering

5:47

teams was usually quite decent. Now

5:49

every engineer has a multitude of

5:52

producing power compared to their

5:54

reviewing power and so obviously we are

5:56

piling up on pull requests but we are

5:58

also slowly starting to expand the total

6:01

amount of humans in an organization that

6:03

are participating in engineering

6:04

process. I talked to a lot of engineers

6:06

over the last year and increasingly the

6:08

one of the things that came up is like

6:10

now I have marketing people shipping

6:11

code. I have um former CEOs, CEOs that

6:16

used to be like engineers now shipping

6:18

code again. And so the the roles that

6:21

those people have in the companies also

6:23

doesn't give them there's not that much

6:26

um um the responsibility doesn't rest in

6:29

them. The the responsibility still rests

6:31

with the engineering team. And so the

6:34

the total number of entities both humans

6:36

and machines that are participating in

6:37

the code creation process outnumbers the

6:39

ones that can carry responsibility.

6:41

We're not there where the machine can be

6:42

responsible for the code changes. And so

6:44

that has led to more and more code

6:46

reviews being skipped being rubber

6:47

stamped. Um and on the goal to small PRs

6:51

that that we want to see again so that

6:53

this reviewing process goes um this

6:55

amplification is something that at the

6:57

very least we need to recognize.

6:59

And so when you get this pull request

7:02

that looks really daunting and has 5,000

7:04

lines of code in it, this is actually

7:05

when you should be thinking and that's

7:06

exactly when it's the most overwhelming

7:09

and and increasingly we're tapping out

7:10

of this.

7:13

On the engineering side, what we're

7:15

doing is we are creating larger pull

7:18

requests. We're creating these massive

7:20

changes because it is free now, right?

7:23

And the if you think about how the

7:25

agents work, they're really optimized for

7:27

creating code that runs. Like their main

7:29

objective is write some code, run the

7:32

tests, make some progress. The

7:33

reinforcement learning sort of gets this

7:35

in. And so the the agents are writing

7:37

kind of code that is is when you as a

7:41

human, as a software engineer, start

7:43

learning how to write code you wouldn't

7:45

necessarily write. So for instance, you

7:47

see quite a bit of code that tries to

7:49

read a config file and if it doesn't

7:50

read a config file, it loads some

7:51

defaults. And as an engineer, you know,

7:53

that's actually not great because I

7:54

might not notice that I'm reading

7:56

the default config file. And so

7:58

I might only discover that I have a

8:00

massive problem after two hours when I

8:03

already wrote database records with

8:05

wrong data. And so these machines, they

8:08

optimize towards making progress, to

8:10

shipping stuff to like unblocking

8:12

themselves. And as a result, they're

8:14

creating many more failure conditions

8:15

than human written code normally would

8:17

do. In part this is because you as a human

8:19

feel a little bit of a you feel bad when

8:22

you write code like this. There's

8:23

there's something that sort of builds up

8:24

emotionally in yourself, but the agent

8:26

doesn't have a reason for this. It it

8:28

doesn't feel anything. And so if you if

8:31

you create these services that are sort

8:33

of hobbling along and they're actually

8:34

willing to to recover from local

8:36

failures, you actually create very very

8:38

brittle systems. And this also means

8:42

that you're very quickly creating a

8:44

codebase of the size and complexity that

8:45

the agent itself can no longer dig

8:47

itself out from. It's going to start no

8:49

longer reading all the files that it

8:50

should. It's it's creating code in a new

8:52

file that has already been done somewhere

8:54

else. And so this this entire machinery

8:58

over time creates much more entropy in the

9:00

source code than you would normally have

9:03

if if humans were on it. And a big part

9:05

of this is that humans feel bad and

9:07

agents don't really have any emotions

9:09

that they communicate to you.
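Armin's config-file example can be made concrete with a small sketch. It is hypothetical TypeScript (the `AppConfig` shape, its fields, and the default values are invented for illustration, not taken from the talk): the lenient loader quietly substitutes defaults in the way he describes agents tending to write, while the strict loader refuses to continue when the config is missing or malformed.

```typescript
// Hypothetical config shape, invented for illustration.
interface AppConfig {
  databaseUrl: string;
  batchSize: number;
}

class ConfigError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConfigError";
  }
}

// The pattern described in the talk: any failure to read or parse the
// config is swallowed and replaced by defaults, so nobody notices that
// the service is running against the wrong settings.
function loadConfigLenient(readFile: () => string): AppConfig {
  try {
    return JSON.parse(readFile()) as AppConfig;
  } catch {
    return { databaseUrl: "postgres://localhost/dev", batchSize: 100 };
  }
}

// Fail-fast alternative: a missing or invalid config stops the process at
// startup instead of hours later, after bad records have been written.
function loadConfigStrict(readFile: () => string): AppConfig {
  let raw: unknown;
  try {
    raw = JSON.parse(readFile());
  } catch (err) {
    throw new ConfigError(`could not read or parse config: ${String(err)}`);
  }
  if (typeof raw !== "object" || raw === null) {
    throw new ConfigError("config must be a JSON object");
  }
  const { databaseUrl, batchSize } = raw as Record<string, unknown>;
  if (typeof databaseUrl !== "string") {
    throw new ConfigError("databaseUrl must be a string");
  }
  if (typeof batchSize !== "number" || batchSize <= 0) {
    throw new ConfigError("batchSize must be a positive number");
  }
  return { databaseUrl, batchSize };
}
```

Taking `readFile` as a function keeps the sketch free of I/O; in a Node service it could wrap `fs.readFileSync(path, "utf8")`. The point of the strict version is that the failure surfaces at startup rather than two hours later.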

9:11

>> But as Armin likes to say, don't worry,

9:14

not all is lost. We have found some

9:16

correlation between what the agents

9:18

really excel at doing and the types of

9:20

code bases that we actually put them to

9:22

work on. And for example, the main

9:24

example here is libraries versus

9:26

products. What we found is that for

9:28

libraries, they tend to excel a lot

9:30

more. And this makes sense because

9:31

intrinsically when you're building a

9:33

library, you tend to have a very clearly

9:34

defined problem that you're trying to

9:36

solve. And most of the time you can even

9:38

map the set of features that you want to

9:40

build to the API surface and it has very

9:43

tight constraints. And because this is

9:45

something that you probably want to

9:46

build on top of or make accessible to

9:48

other people, it's likely that it's

9:50

going to be a very simple core in which

9:52

you can then plug into. And on the other

9:54

hand, products and perhaps this is a bit

9:56

more unlucky for the rest of us because

9:58

we all probably are more into building

9:59

products. Uh it's much harder because

10:02

there are so many interacting concerns

10:04

and components like for example you have

10:06

your UI, your API response. You have

10:08

different permissions depending on the

10:10

feature flags, the billing and so on.

10:12

And so there's this very heavy

10:14

intertwining between different

10:15

components. And what this means is that

10:17

for the agent itself, it's impossible to

10:19

fit all of this into its context

10:22

window. It has no way to actually

10:24

understand the entire global structure

10:26

and so locally the agent tends to be

10:28

very reasonable but when it gets to the

10:31

global scale it becomes a bit demented.

10:34

So what we're proposing here is that

10:36

just as you would do with any type of

10:38

system design in the past, your codebase

10:40

has now become infrastructure and as

10:43

such you have to design it in the way so

10:45

that it is also legible for the agent

10:47

and it can make the most of it.

10:51

And so this is what we're proposing is

10:53

an agent legible codebase and one of the

10:56

main points that is very clear to all of

10:58

us I'm sure is modularization. So like

11:00

we have different components and this

11:02

makes it easy for the agent to add one

11:04

feature in one spot without corrupting

11:06

everything else. But importantly this

11:07

also means modularizing your code flow

11:09

itself. So for example I've been working

11:12

on some refactoring. We're building

11:13

somewhat of an AI assistant. And for me

11:16

it was super important to understand

11:18

which steps of my code are actually like

11:20

the main points. So say like you get

11:22

a user message, then I pass the message to

11:24

the agent loop and then I have to deal

11:26

with the output. And this is where these

11:30

points are very clearly defined for me.
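The three steps Cristina lists (receive a user message, run the agent loop, handle the output) can be pinned down as explicit typed boundaries, so that extra parsing or state an agent adds between them shows up as a type change rather than hiding in glue code. This is a hypothetical TypeScript sketch; `UserMessage`, `AgentResult`, `handleTurn`, and the echo agent are invented stand-ins, not the assistant she is refactoring.

```typescript
// Hypothetical types for each stage boundary; nothing here comes from the talk.
interface UserMessage {
  readonly userId: string;
  readonly text: string;
}

interface AgentResult {
  readonly reply: string;
  readonly toolCalls: number;
}

// The agent loop is injected so this sketch stays self-contained; a real
// implementation would call a model and tools here.
type AgentLoop = (message: UserMessage) => AgentResult;

// Step 1: turn raw input into a validated UserMessage, or fail loudly.
function parseUserMessage(raw: { userId?: unknown; text?: unknown }): UserMessage {
  const { userId, text } = raw;
  if (typeof userId !== "string" || typeof text !== "string") {
    throw new Error("malformed user message");
  }
  return { userId, text };
}

// Step 3: render the result; it only sees AgentResult, not loop internals.
function formatOutput(result: AgentResult): string {
  return `${result.reply} (tool calls: ${result.toolCalls})`;
}

// The whole flow is three named calls with nothing in between, which keeps
// the boundaries visible to both humans and agents.
function handleTurn(raw: { userId?: unknown; text?: unknown }, runAgentLoop: AgentLoop): string {
  const message = parseUserMessage(raw);
  const result = runAgentLoop(message);
  return formatOutput(result);
}

// Example with a trivial echo agent standing in for the real loop.
const echoAgent: AgentLoop = (m) => ({ reply: `echo: ${m.text}`, toolCalls: 0 });
console.log(handleTurn({ userId: "u1", text: "hi" }, echoAgent)); // prints "echo: hi (tool calls: 0)"
```

If an agent later wants to stash something in state between steps, it has to change one of these interfaces, which is exactly the kind of diff a reviewer can spot.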

11:31

So the code was not as messy. But it

11:34

happens to be that between these points,

11:35

between these steps, that's where the

11:37

agent tends to add the most fuzz. So it

11:39

will be parsing between different types.

11:41

It's adding things to state that

11:43

shouldn't be in state. And so you end up

11:45

with these behaviors that you didn't

11:46

want to support and that are unexpected

11:48

and can be quite dangerous. Another

11:51

point is trying to follow all of the

11:53

known patterns because I think we all

11:55

know by now there's no point in fighting

11:57

the RL the reinforcement learning. The

12:00

more we can lean into it the better that

12:02

our output is going to be and it's also

12:04

more scalable down the line. Then as

12:07

mentioned with libraries like if you

12:08

have a simple core and you push the

12:10

complexity to other abstraction layers

12:12

then it's going to be easier for

12:14

yourself and the agent to be able to

12:15

read your codebase and no hidden magic.

12:18

So for example here, uh, using React

12:21

server actions, or using an ORM instead of

12:23

raw SQL. What this does is that it hides

12:26

intent from the agent and if the agent

12:28

can't see something it can surely not

12:30

respect it.
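A minimal reading of the "no hidden magic" point, combined with the one-query-interface rule mentioned shortly after: keep every SQL statement as a visible string in one module, so an agent grepping for a table name lands on the exact query that runs. This is a hypothetical TypeScript sketch; the `Db` interface, the `users` table, and the query names are assumptions, and a real database driver would take the place of `Db`.

```typescript
// Minimal database interface, assumed for illustration; a real driver
// would execute the SQL against Postgres, SQLite, etc.
interface Db {
  query<T>(sql: string, params: readonly unknown[]): T[];
}

interface UserRow {
  id: number;
  email: string;
}

// All SQL lives here as plain strings: no query builder, no lazy-loaded
// relations, nothing the agent has to infer from an abstraction.
const queries = {
  findUserByEmail: "SELECT id, email FROM users WHERE email = $1",
  listUsers: "SELECT id, email FROM users ORDER BY id",
} as const;

function findUserByEmail(db: Db, email: string): UserRow | undefined {
  return db.query<UserRow>(queries.findUserByEmail, [email])[0];
}

function listUsers(db: Db): UserRow[] {
  return db.query<UserRow>(queries.listUsers, []);
}
```

Because the SQL text is right there, a reviewer (human or agent) can check a change touching `users` without reasoning about what an ORM would generate.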

12:32

And so, to be more precise, these are the

12:35

examples of mechanical enforcement that

12:37

we have been using at the company and

12:40

most of these we actually achieve with

12:42

uh linting rules. So the main example

12:44

would be no bare catch-alls. Great.
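To illustrate what a "no bare catch-alls" rule is aimed at, here is a hypothetical TypeScript pair (the `NotFoundError` type and `lookupUser` function are invented for this sketch, and the lint rule itself is not shown): the first version swallows every error, the second handles only the one failure it expects and rethrows everything else.

```typescript
class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotFoundError";
  }
}

// Hypothetical lookup that can fail in more than one way.
function lookupUser(id: string, users: ReadonlyMap<string, string>): string {
  if (id === "") throw new TypeError("empty id");
  const name = users.get(id);
  if (name === undefined) throw new NotFoundError(`no user ${id}`);
  return name;
}

// Bare catch-all: a bug (the TypeError) and a normal miss look identical.
function displayNameCatchAll(id: string, users: ReadonlyMap<string, string>): string {
  try {
    return lookupUser(id, users);
  } catch {
    return "unknown";
  }
}

// Narrow catch: only the expected failure is handled; anything else
// propagates so it surfaces instead of being papered over locally.
function displayNameNarrow(id: string, users: ReadonlyMap<string, string>): string {
  try {
    return lookupUser(id, users);
  } catch (err) {
    if (err instanceof Error && err.name === "NotFoundError") return "unknown";
    throw err;
  }
}
```

Checking `err.name` rather than `instanceof NotFoundError` is a defensive choice: it still works when the code is compiled to an ES5 target, where subclassing `Error` does not preserve the prototype chain.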

12:48

Imagine that there's an example here.

12:50

The agent found the bare catch-all and

12:51

was like, "Oh no, this is bad. Edited

12:54

it." But yeah, so we also try to have

12:58

our SQL uh always in one query interface

13:01

so that the agent doesn't have to go

13:02

hunting around the codebase finding all

13:04

of the different places because if it

13:06

misses one then you can have breaking

13:07

behaviors and again that's dangerous. We

13:10

try to have one primitive components

13:12

library for the UI and not have any raw

13:14

for example input uh input boxes. Uh so

13:17

that it's we always have one type of

13:19

styling. It's very consistent one kind

13:21

of behavior. We don't have any dynamic

13:23

imports. And this may not sound as

13:26

important but actually we enforce unique

13:28

function names. And the reason for this

13:30

is not just more legibility for you and

13:31

the agent, but it's actually also the

13:33

token efficiency. So if your agent is

13:35

grepping for a specific feature or

13:37

something in your codebase, if it only

13:38

gets one output, it's going to be much

13:40

better at continuing with the loop. And

13:43

we've started exploring something

13:45

recently called erasable syntax only

13:47

TypeScript mode (the erasableSyntaxOnly compiler option). And what this does is

13:49

that your code is basically JavaScript

13:51

and it has the type annotations on top.

13:54

And this means that there's no

13:55

transpiling indirection because there's

13:57

one source of truth between your actual

13:59

code and the compiler. And so when the

14:02

agent is looking for errors, it doesn't

14:03

have to have this like confusion of oh

14:06

my god, where am I looking at? It is

14:08

much better at finding them.
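For reference, `erasableSyntaxOnly` is a TypeScript compiler option (added in TypeScript 5.8 and enabled with `"erasableSyntaxOnly": true` under `compilerOptions`) that rejects constructs which emit runtime code, such as `enum`s, namespaces containing values, and constructor parameter properties. The hypothetical snippet below (all names invented) is written in the style the option allows: every line is plain JavaScript once its type annotations are stripped, with an `as const` object standing in for an enum and explicit fields instead of parameter properties.

```typescript
// Instead of `enum ReviewKind { Mechanical, Human }`, which emits a runtime
// object and is rejected under erasableSyntaxOnly:
const ReviewKind = {
  Mechanical: "mechanical",
  Human: "human",
} as const;
type ReviewKind = (typeof ReviewKind)[keyof typeof ReviewKind];

// Instead of `constructor(private readonly path: string) {}` (a parameter
// property, also rejected), declare the fields explicitly.
class ReviewItem {
  readonly path: string;
  readonly kind: ReviewKind;

  constructor(path: string, kind: ReviewKind) {
    this.path = path;
    this.kind = kind;
  }

  describe(): string {
    return `${this.kind}: ${this.path}`;
  }
}

const item = new ReviewItem("migrations/0042_add_index.sql", ReviewKind.Human);
console.log(item.describe()); // prints "human: migrations/0042_add_index.sql"
```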

14:11

And so the goal really is to get in this

14:15

loop somehow like get the agent to

14:17

produce as good code as it can, but you

14:19

really need to find a way to feel the

14:21

pain that the agent doesn't feel and you

14:24

need to be woken up in a way when you

14:27

should be looking at this. And one of

14:28

the things we have been doing is we

14:29

built a Pi extension for our review

14:31

needs where we are separating out the

14:34

kind of input that normally would go

14:36

back to the agent. So this is mechanical

14:38

bugs. It is where it clearly violated

14:41

the AGENTS.md. Um, but then we

14:44

specifically call out the kind of

14:45

changes where the human's brain should

14:47

reactivate, right? It's like we don't

14:49

think that the database migration should

14:51

ever go in without the human making a

14:52

judgment call on this because it very

14:54

much depends on the locks, the size of

14:55

the data in production. Um if there are

14:58

permissioning changes, you better think

14:59

about this yourself rather than the

15:00

agent because they can be they can be

15:02

underdocumented.
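The split described here can be approximated with a small rule table over the changed file paths. This is a hypothetical TypeScript sketch, not the actual Pi extension: the path patterns for migrations, permission code, and dependency manifests are assumptions about a typical repository layout, and a real reviewer would also look at the diff content, not just file names.

```typescript
type Route = "agent" | "human";

interface Flag {
  readonly path: string;
  readonly route: Route;
  readonly reason: string;
}

// Hypothetical patterns for changes that should wake up a human reviewer.
const humanJudgmentRules: ReadonlyArray<{ pattern: RegExp; reason: string }> = [
  { pattern: /(^|\/)migrations\//, reason: "database migration: check locks and production data size" },
  { pattern: /(^|\/)(permissions?|auth|acl)[\/.]/, reason: "permission change: often underdocumented" },
  { pattern: /(^|\/)(package\.json|package-lock\.json|pnpm-lock\.yaml)$/, reason: "dependency change: do we want this dependency?" },
];

// Everything that matches no rule goes back to the agent as mechanical feedback.
function triageChangedFiles(paths: readonly string[]): Flag[] {
  return paths.map((path): Flag => {
    const rule = humanJudgmentRules.find((r) => r.pattern.test(path));
    return rule
      ? { path, route: "human", reason: rule.reason }
      : { path, route: "agent", reason: "mechanical: lint, tests, AGENTS.md conformance" };
  });
}
```

The regexes deliberately omit the `g` flag, so `test` stays stateless across calls; with `g`, a shared `RegExp` would carry `lastIndex` from one path to the next.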

15:04

Just some examples where we learned if

15:07

we miss it, we regret it. Um and you

15:11

will miss it. But these machines

15:13

can help you find this and then you see

15:15

this and then you actually get a little

15:17

bit of a hit like, oh now now I have to

15:19

kick into gear and do something here. Um

15:22

this is what this looks like in Pi. Um

15:25

you have the um on the bottom you have

15:27

the human call outs on the top you have

15:30

what is go what basically if you were to

15:32

end this review and say like fix the

15:34

issues the the agent would go back and

15:35

automatically act on the first two um

15:38

but but this is the moment where I will

15:40

now go and see like is this a dependency

15:41

I actually want to have in this codebase

15:43

like do I like the maintainers is this

15:45

does this work for me

15:48

and we obviously like the speed like

15:51

this is addictive it is great we feel

15:53

there's a lot of productivity

15:54

But it is so devious if you start

15:57

relying on its speed where you really

15:59

shouldn't. And so I can only encourage

16:02

you to find the areas where you you have

16:04

this feeling that this is actually net

16:05

positive. For me a lot of this is

16:08

reproduction cases like when a customer

16:10

reports an issue I can I can have the

16:11

agent reproduce this perfectly, and I

16:14

have a really good starting point.

16:16

Exploring different types of product

16:17

directions, as long as you commit

16:18

yourself to doing this uh with the code

16:20

that it generates. Um all of this is

16:23

great but on the other hand system

16:24

architecture creating reliability in the

16:26

system, they're just not very good at that,

16:29

so we really still have to go slow.

16:31

There is so much mess that can

16:33

appear in a codebase in so little time.

16:35

Mario was already talking about this

16:36

earlier but like we forget that we

16:37

are producing months and months of technical

16:39

debt in a matter of weeks, in a

16:42

matter of days sometimes, and it becomes so

16:45

much harder to actually understand

16:46

what's going on in the codebase. When

16:48

the understanding of your own code

16:50

drops, it is really really hard and it's

16:53

also psychologically hard. I've found

16:55

some code pieces that actually didn't

16:57

work in production and I was kind of

16:59

frustrated learning that I was the one

17:00

that committed it with the agent and

17:02

just didn't really see that. It's it's a

17:04

very disappointing experience when it

17:06

happens and then you realize that you

17:07

actually were the one that screwed up.

17:09

Um, and so it is it is psychologically

17:13

incredibly hard to to really judge

17:15

objectively the state of the codebase.

17:18

And the only way right now is to really

17:20

slow down a little bit on on that front

17:24

and this this friction. I know that

17:26

friction like every engineering team

17:28

I've ever worked at said like we need to

17:29

get rid of the friction in shipping and

17:31

and that is true. Like there's a lot of

17:33

stuff that's very very annoying and

17:35

shouldn't be there. But if you have

17:36

worked on large enough engineering work,

17:38

SLOs are a great system that is

17:40

intentionally designed to put friction

17:41

into the engineering process to make you

17:43

think, do I need this reliability? Do I

17:45

need this criticality of the service? Am

17:48

I sufficiently staffed to run it? And

17:49

with the agents, we have now gotten this

17:52

idea that we should get rid of all of

17:53

this when in all reality we need it.

17:56

Um because the friction actually in many

17:59

ways is what's necessary on a physical

18:01

level to steer. Like, without friction

18:03

there's no steering and and that is

18:05

really necessary. Um so you should you

18:08

should put a little bit more of a

18:10

positive association to this idea of

18:12

friction. Um because this is really

18:14

where your judgment is. This is where

18:15

your experience is and you should be

18:17

inserting that and start feeling it.

18:19

Thank you.

18:20

>> Thank you.
