Full Transcript

Agentic Engineering: Working With AI, Not Just Using It — Brendan O'Leary

27:04 · 5,126 words · ~26 min read · English · Transcribed Apr 19, 2026

AI Summary

Agentic engineering is a shift from using AI as an autocomplete tool to collaborating with it as a junior developer. Success requires managing the 'Context Window' as a finite resource and following a strict Research-Plan-Implement loop to prevent hallucinated or contextually incorrect code.

As AI agents gain the ability to execute code and manage pull requests, the primary bottleneck in software development shifts from writing lines of code to the 'harness engineering' of context and intent.

Section summaries

0:00-1:00 · Introduction · optional
General stats on AI usage that most practitioners already know.

1:00-4:00 · The Paradigm Shift · watch
Crucial mental model of treating AI as a junior developer rather than a hammer.

4:00-7:00 · Context Engineering · watch
Explains the 'dumb zone' and how bad context poisons results.

7:00-11:00 · Management Tactics & Anecdotes · optional
Contains an anecdote about iPad wireframes; helpful for perspective but can be skimmed.

11:00-17:00 · The Research-Plan-Implement Loop · watch
The most actionable technical section of the talk.

17:00-21:00 · Agent Configuration & Standards · watch
Covers agents.md, skills.md, and MCP servers.

21:00-27:00 · Kilo Product Demo & Outro · optional
Specific features of the Kilo tool; skip if you use Cursor, Windsurf, or other tools.

Key points

The 'Junior Developer' Mental Model — Treat AI agents as energetic, well-read, but ego-less junior developers who possess massive breadth but zero business judgment or architectural context.
The Context 'Dumb Zone' — Model performance degrades significantly once the context window is more than 50% full. Overloading context with irrelevant files or active MCP servers makes the model less capable of reasoning.
Research-Plan-Implement Loop — A structured workflow where agents are first restricted to 'Ask Mode' (research only), then tasked with creating a step-by-step 'Plan' file, and finally allowed to 'Code' in a fresh, isolated session.
Agentic Infrastructure (agents.md & skills.md) — Adopting standardized documentation for agents within repositories to define project conventions (agents.md) and reusable task-specific playbooks (skills.md).

“You kind of have to think about your AI agent as an energetic, enthusiastic, extremely well-read, often confidently wrong junior developer.” — Brendan O'Leary

“A bad line of research can potentially be hundreds of lines of bad code.” — Dex Horthy (quoted by Brendan)

AI-generated from the transcript. May contain errors.

0:00

Let's talk a little bit about what I

0:01

mean by agentic engineering.

0:04

And let's maybe start with a question.

0:07

If I were to ask you right now, how are

0:09

you using AI in your work? Could you

0:11

actually really explain it?

0:13

Not just, you know, it helps me code

0:15

faster. It can write code really fast,

0:17

but like the real workflow.

0:19

What you hand off, what you keep, how

0:22

you decide in between.

0:26

Most engineers can't and that's a little

0:27

wild to me because 90% of engineers are

0:30

already using AI tools or have used

0:32

them. Maybe only half of them are using

0:34

them on a regular basis, but that's a

0:36

number that's definitely growing all the

0:38

time.

0:39

And that's the current state.

0:41

So, the question isn't whether your team

0:43

is using AI, they are. The question is

0:45

whether you're getting the most out of

0:47

it or you're just kind of

0:49

autocompleting your way through the day.

0:52

That gap between using AI and being able

0:56

to articulate how you work with it,

0:59

that's what this talk is all about.

1:01

And really, I think it represents a

1:03

paradigm shift of how we think about AI.

1:07

And you know, the history of AI and

1:09

software engineering is moving

1:11

very fast. It's also very

1:13

surprisingly short, right? In the

1:16

early 2020s, we got tools that could

1:18

finish the lines for you. You type, you

1:20

know, half of a function signature and

1:22

the model would guess the rest of it.

1:24

You know, kind of like autocomplete on

1:26

steroids. It's a neat trick.

1:28

And then in 2022,

1:30

models started to be able to suggest

1:32

entire functions, right? You could

1:34

describe what you wanted and chat with a

1:36

model and maybe get a working

1:38

implementation back. And this is where

1:40

GitHub Copilot first came on the scene

1:42

and broke through and millions of

1:44

developers started using it. And for the

1:46

first time it was starting to seem like

1:47

maybe AI wasn't a novelty, maybe it was

1:50

generally useful.

1:52

But then in 2025, something really

1:55

broke. It's, you know, what we're living

1:57

in now in 2026. The models don't

2:00

just suggest, they can execute. They can

2:03

take a task and break it down and figure

2:06

out which files need to be touched and

2:07

make the changes and run the tests

2:10

themselves and then come back with an

2:11

actual pull request.

2:13

And so, that's not just fancy

2:15

autocomplete. It's not just a faster

2:17

horse. It's a collaborator. It's a

2:19

different way of working.

2:21

And Armin, the creator of Flask for

2:23

those Python folks here, put it, I

2:26

think, perfectly.

2:27

We're no longer just using machines.

2:30

We're now working with them.

2:32

And that framing, I think, captures this

2:34

real shift.

2:36

Right? Tools are things that you pick up

2:37

and put down. You use a hammer. You

2:40

don't work with a hammer.

2:42

But the AI coding agents we have today,

2:44

they're kind of somewhere more in

2:46

between and they're maybe a little bit

2:48

more like working with another engineer.

2:50

Now, it just happens to be an engineer

2:52

who's read every Stack Overflow answer

2:54

ever written.

2:57

And I think that needs a mental model

2:59

shift. And this is the mental model I

3:01

want you to carry through the rest of

3:03

this video and honestly through the rest

3:04

of your, you know, next couple years of

3:06

your career in working with these tools.

3:08

I do think they're still tools, but we

3:10

have to think about them differently.

3:12

You kind of have to think about your AI

3:14

agent as an energetic, enthusiastic,

3:18

extremely well-read, often confidently

3:21

wrong junior developer.

3:25

That junior developer is incredibly

3:27

fast. They don't easily get tired. They

3:30

don't have any ego about their code.

3:32

They'll happily rewrite something six

3:33

times if you ask them to.

3:35

And they have an astonishing breadth of

3:37

knowledge. They've seen lots of

3:38

languages. They've seen lots of

3:40

frameworks. They've seen lots of

3:42

patterns.

3:43

But, and this is critical, what they

3:45

don't have is judgment. They don't know

3:48

your business context. They don't

3:50

understand the reasons why you made that

3:51

very specific architectural decision 3

3:53

months ago.

3:55

And they'll confidently write code that

3:56

is technically correct and contextually

3:59

wrong.

4:01

Armin also said that he's gained more

4:03

than 30% of time in his day because the

4:05

machine is doing a lot of the work.

4:07

That's a real gain.

4:09

But he's getting that 30% because he

4:11

knows what he can hand off and what he

4:13

has to keep for himself.

4:15

He's not just blindly accepting every

4:16

suggestion. He's directing the work.

4:19

And that's the difference between using

4:20

AI and working with AI. And that's what

4:23

agentic engineering actually means.

4:28

And so, let's get tactical. If you're an

4:29

engineer, how do we really

4:31

get good at this?

4:33

I think the number one thing to think

4:35

about is context engineering.

4:37

And here Karpathy says, you know,

4:39

context engineering is a delicate art

4:42

and science of, you know, filling the

4:45

context window with just what needs to

4:47

happen for the agent to have the right

4:50

context for the right iteration for the

4:52

next step.

4:54

I think that's really critical for a

4:55

couple of reasons. First, context is

4:58

expensive, right? Every token you add

5:00

into the context is going to add cost

5:02

because all of those things, that whole

5:05

chat history, is sent back in as input

5:09

tokens every time that you send it.

5:11

And

5:12

that, you know, can add up pretty

5:15

quickly.
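
A rough sketch of why resending the whole history adds up. It assumes, purely for illustration, that every turn adds the same number of tokens; the 2,000-token figure and the function name are not from the talk.

```python
def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent when the full history is resent on every turn.

    Turn k resends everything from turns 1..k, so the total grows
    roughly with the square of the number of turns.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the new message joins the history
        total += history            # the whole history goes in as input
    return total


if __name__ == "__main__":
    # Hypothetical 2,000 tokens added per turn.
    for turns in (1, 10, 50):
        print(turns, cumulative_input_tokens(2_000, turns))
```

Under that assumption, ten turns cost 55 times the input of a single turn rather than 10 times, which is the compounding the speaker is pointing at.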

5:16

And the other key is that more context

5:18

doesn't always mean better results. And

5:20

in fact,

5:21

it can make the model actually

5:23

dumber.

5:25

Right? It's not just about the money.

5:27

The quality can degrade as you get over

5:30

about 50% full.

5:32

And there's lots of things that can trap

5:33

you here. And not the least of which

5:35

is, you know, the fact that

5:36

MCP servers became so popular

5:39

that we have a lot of these enabled all

5:41

the time now. Well, each one of those

5:42

loads more and more context. Uh you

5:45

know, more and more input tokens in

5:47

the context.

5:48

And that can be a real problem if

5:50

you start getting into this dumb zone

5:52

around 50% context.

5:54

And

5:55

that also isn't the only problem because

5:58

not only can more context be a problem,

6:00

but bad context can be a problem and can

6:02

poison everything.

6:04

Right? So, this happens when you're

6:06

maybe mixing two different tasks

6:08

that didn't really overlap. Or you've

6:10

kind of got some outdated comments

6:12

either in the code or that you've made

6:13

to the agent. Or even worse, what I've

6:17

seen a lot of people do is they start

6:18

walking down the road with an agent and

6:20

then realize,

6:22

"Hey,

6:23

we're down the wrong path. We've made a

6:24

lot of wrong decisions." And they try to

6:26

steer the agent back.

6:28

But the problem is again, the agent is

6:29

not doing real reasoning like you and I

6:31

as a human. Right? It's taking all that

6:34

context every time.

6:36

And it may get lost in the middle or

6:38

even see some of those negative things

6:41

that you had before as still part of the

6:43

context.

6:44

And you see those negative patterns

6:46

creeping back in if you're not careful.

6:49

That's why it's better, you know, to not

6:50

let these things kind of compound.

6:53

But also, you know, always start a new

6:55

session once you realize things are kind

6:57

of off the rails.

6:58

Right? Because not only is context

7:00

expensive,

7:01

the more we have doesn't always mean

7:03

better quality. In fact, at a certain

7:05

point there's a tipping point where it

7:06

means worse quality.

7:08

And bad context can corrupt the output.

7:11

So, the real critical thing for

7:12

engineers is to manage the context. And

7:15

what does that mean?

7:16

Well, one, I think it means persisting a

7:18

lot of information outside of the

7:20

context window so that we can bring it

7:22

in, right? So, this is things like

7:24

scratch pads for things we're working

7:25

on, memory files, the agents.md,

7:29

those kinds of files that help the

7:31

agents have context to what you're

7:32

working on.

7:35

We also need to be very selective when

7:36

we're

7:37

selecting that context. So, that means

7:40

only pull in what's relevant for this

7:42

step of the problem, right? Don't just

7:44

pull in everything that might be useful.

7:46

And so, that could mean,

7:48

you know, things like bringing in the

7:49

right at mentions for files that we're

7:51

referencing. That could mean making sure

7:53

we don't have unnecessary MCP servers

7:56

enabled. Uh and it means, you know,

7:58

making sure that the agent has the right

8:00

data and that we as a human have curated

8:02

that data for the agent.

8:05

And then, as it's getting bigger and

8:06

that window gets bigger, we want to

8:08

summarize and trim and compress that

8:11

context, right? If we've gone through a

8:12

whole big deep dive and debugging

8:15

session with the agent and now we think

8:17

we have the problem and the solution,

8:19

well, that's great. It might be time to

8:21

compress that context and just focus the

8:24

agent back in on, "Okay, now we

8:25

understand this problem. We're going to

8:27

go fix it."

8:28

Uh and then the other most important

8:30

thing is to isolate context. And I think

8:32

this is why we've seen this huge rise in

8:34

the past six or eight months of parallel

8:37

agents because splitting work across

8:39

several agents or several sessions can

8:42

help things not accumulate. And really

8:45

drive this kind of task separation.

8:48

And again, if you think about it, aren't

8:50

these all of the same things that I

8:51

would tell a brand new engineering

8:54

manager about managing a junior

8:56

engineer?

8:58

Like the story I tell here is when I

9:00

was early in my career, I spent a lot of

9:02

time as an engineering manager and

9:04

product manager before I

9:06

went into the dark arts of developer

9:08

relations.

9:10

And in my first job ever as an engineering

9:12

manager, I was at a healthcare software

9:15

company.

9:16

And there was this new thing coming out

9:17

called an iPad. And that dates me a

9:19

little bit. But it was

9:21

released in the market and we thought

9:23

this could be a great place to collect

9:25

patient history, you know, that form you

9:26

have to fill out every year at the

9:27

doctor. It's very critical to assessing

9:30

a lot of your, you know, risk of

9:32

disease.

9:33

Um but having to fill it out from

9:35

scratch every time is not fun.

9:38

And so, I designed in this other archaic

9:41

tool that some people may have heard of

9:42

called Balsamiq, basically a wireframing

9:44

tool, a wireframe of what this would

9:46

look like.

9:48

Now, that wireframing tool used things

9:50

like Comic Sans and like silly smiley

9:53

face icons as placeholders.

9:55

And a lot of other stuff like that that

9:57

you'd expect from just a wireframe.

9:59

And I handed that to a set of interns

10:01

that we had working for us that summer

10:03

thinking this is a great green field

10:04

project for them to take some time on.

10:07

And you know, a few weeks later I got

10:09

back a working prototype

10:12

and the font was Comic Sans and there

10:14

were silly emoji placeholders.

10:17

And that's because that's what the spec

10:19

had in it.

10:20

And so whose fault was that?

10:22

Obviously it was not the interns' fault.

10:24

It was my fault as an engineering

10:25

manager not giving the right context to

10:29

those junior engineers as to what's

10:31

important, what's not, and what we

10:33

really need to focus on and what problem

10:35

we're solving.

10:37

And so I think the habits that can tie

10:38

all of that together

10:40

are you don't need to think about all

10:41

four of those things for every task, you

10:43

just need to think about doing one task

10:45

per session,

10:47

keep an eye on your context meter, and

10:49

if you're in doubt and it feels like

10:51

things are off the rails, you're

10:53

probably right.

10:54

So start a new session, ask it to

10:56

summarize the session for a new agent.

10:59

Turns out that AI is really great at

11:01

writing prompts for AI. So if you've

11:03

worked on something with an agent for a

11:04

while,

11:05

have that agent summarize where you're

11:07

at,

11:08

you can now read it, make sure it

11:10

matches with your understanding and then

11:11

start a new

11:13

session with just that right context.

11:15

Again,

11:16

it's a little bit of art and a little

11:17

bit of science.
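
A minimal sketch of those habits: watch the context meter, and when in doubt hand off to a fresh session. The 0.5 threshold is the talk's "about 50%" rule of thumb; the `Session` type, function names, and prompt wording are assumptions for illustration, not any tool's API.

```python
from dataclasses import dataclass

# "About 50% full" is the rule of thumb from the talk, not a hard limit.
DUMB_ZONE_RATIO = 0.5


@dataclass
class Session:
    used_tokens: int    # tokens currently in the context window
    window_tokens: int  # the model's maximum context size

    @property
    def fill_ratio(self) -> float:
        return self.used_tokens / self.window_tokens


def should_start_fresh(session: Session, feels_off: bool = False) -> bool:
    """Start over if the window is past the threshold or things feel off the rails."""
    return feels_off or session.fill_ratio >= DUMB_ZONE_RATIO


def handoff_prompt(task: str) -> str:
    """Ask the current agent to write the summary a new session will start from."""
    return (
        f"Summarize this session for a new agent working on: {task}.\n"
        "Include: what we learned, decisions made, files involved, "
        "and the next step. Leave out dead ends."
    )


if __name__ == "__main__":
    session = Session(used_tokens=120_000, window_tokens=200_000)
    if should_start_fresh(session):
        print(handoff_prompt("fix the flaky login test"))  # hypothetical task
```

The summary is meant to be read and checked by a human before it seeds the new session, as described above.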

11:19

So how do we put this into practice?

11:21

Well, I think there's a lot of

11:22

workflows, there's lots of things

11:23

written out there that you can read.

11:25

I've even compiled a lot of them at

11:27

path.kilo.ai.

11:29

It's where you can find all of

11:31

these kinds of trends and ideas and

11:33

workflow patterns that have been talked

11:35

about.

11:36

But what I think I keep coming back to

11:38

is maybe one of the simpler ones

11:41

and that's the research-plan-implement

11:43

loop.

11:45

Right? And I think this really helps us

11:47

solve for a lot of like classic mistakes

11:49

that people do when they pick up agentic

11:52

engineering for the first time or pick

11:54

up AI to help

11:55

try to do some engineering.

11:57

Um and what most people do is say, "Hey,

11:59

help me implement this feature. I want

12:01

it to do X and Y."

12:02

And you know, these large language

12:05

models are very good at outputting lots

12:07

of code. In fact, when I joined Kilo

12:09

Code over a year ago,

12:11

I made a pronouncement that we would

12:14

never have our website be

12:16

just prompt and a whole lot of code

12:18

flying by.

12:19

Makes for a great demo and you've seen

12:21

lots and lots of coding agents that

12:23

maybe that's how they show it off.

12:25

But I think the reality is jumping

12:28

straight into code like that can cause a

12:30

lot of wrong assumptions, it can waste

12:32

even more time rather than saving time,

12:34

and just create a lot of frustration.

12:37

And it really creates that kind of

12:39

paradigm that we've seen where people

12:40

are kind of anti-AI or think that AI is

12:43

not a useful tool because they've jumped

12:45

right in and gotten, you know, put

12:47

garbage in and gotten garbage out. Uh or

12:50

maybe it's been a while since they've

12:51

used it, right? I mean, if you think of

12:52

the Will Smith eating spaghetti when

12:55

it comes to AI videos, that's come a

12:57

long way in just the past two, three,

12:58

four years.

13:00

You know, the same is true of the AI

13:01

coding models, but you have to do what

13:03

works to give them the best chance at

13:06

getting a great result. And what that is

13:09

is first understanding the problem

13:10

really well and making sure you and the

13:12

AI agent can understand the problem

13:13

really well.

13:15

Then laying out explicit steps for

13:17

implementing

13:18

those changes or fixing that

13:20

problem.

13:21

And only then do we jump to the

13:23

implementation phase where we're writing

13:25

code.

13:26

And Dex Horthy has a great phrase

13:29

that he says here, which is a bad line

13:31

of research can potentially be hundreds

13:33

of lines of bad code.

13:34

And so we're really going to focus in on

13:36

how do we get the research and the plan

13:38

in place

13:39

in order to give ourselves

13:41

the best chance of having great code

13:44

come out.

13:45

So in that first phase, we're going to

13:46

use a tool that is only going to be

13:49

focused in on research. And so for Kilo,

13:51

we call that ask mode.

13:53

And the reason we call it that is

13:55

because the ask mode can't actually do

13:57

anything. It can only chat. It can't

13:59

write files. It can maybe read files if

14:01

you let it,

14:02

but it can't, you know, start trying to

14:04

code a solution.

14:06

And so instead of trying to code a

14:07

solution from the beginning, we're going

14:08

to first try to understand the system.

14:11

You know, how does it actually work

14:12

today? Where are the right files that

14:14

are going to be involved? What are the

14:16

right paradigms that we want to mirror

14:18

or how does this differ from something

14:20

that we have already?

14:22

And you know, just kind of learn where

14:24

in the code base this is going to

14:26

go and you know, how the data is going

14:28

to flow through the system and how it's

14:30

going to change with our change as well

14:33

as any edge cases we need to

14:35

consider, right? AI is really great at

14:37

brainstorming and so it can help you

14:39

kind of brainstorm those things and make

14:40

sure you've really covered all of your

14:42

bases.

14:44

And then once you've done that research,

14:45

what's going to come out of that is an

14:47

actual output document that shows the

14:51

details of that research that you

14:54

can then read and basically agree with

14:56

and understand, "Hey, this matches

14:58

my understanding of the problem.

15:00

I think we're ready to move on to the

15:01

plan."

15:04

And so then once we've reviewed that as

15:06

a human, now we can say, "Okay, let's

15:08

outline the next steps. What kind of you

15:10

know,

15:11

files are we going to create or

15:13

change? Maybe there's some code

15:15

snippets, but not always is it a good

15:16

idea to have a code snippet in the plan.

15:18

We are definitely going to include like

15:20

how are we going to verify and

15:22

know this change is correct? What are

15:23

the test either changes or additions

15:26

that we're going to make to know that?

15:28

And we're also going to be really

15:29

explicit at the planning phase

15:31

about what is in and out of scope, what

15:33

is going to change, what isn't going to

15:34

change.

15:36

And again, the output of that is going

15:37

to be a very clear plan file, right?

15:39

You'll see a lot of repositories

15:40

nowadays have a folder called plans.

15:43

Right? And we want to have that plan

15:45

file be step-by-step instructions with

15:48

specific changes that we're going to

15:49

make that have test commands to verify

15:52

it, that has a strategy for

15:53

understanding how it's going to change

15:54

the system. And it's going to be very

15:56

clear so that we can even use maybe a

15:59

smaller, faster, or cheaper model to

16:01

implement it because we've spent the

16:03

time in the research and plan phase to

16:06

really understand what we're going to be

16:07

doing once we get to

16:09

implementing the change.
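
A minimal sketch of what such a plan file might contain, assuming a simple structure of steps, verification commands, and explicit scope. The `Plan` type, its field names, and the example task are hypothetical; the talk only says the file should be step-by-step, include test commands, and state what is in and out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """A plan file as described in the talk; field names are illustrative."""
    title: str
    steps: list[str]
    test_commands: list[str]
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        def bullets(items: list[str]) -> list[str]:
            return [f"- {item}" for item in items] or ["- (none)"]

        lines = [f"# Plan: {self.title}", "", "## Steps"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += ["", "## Verification"]
        lines += [f"- `{cmd}`" for cmd in self.test_commands]
        lines += ["", "## In scope", *bullets(self.in_scope)]
        lines += ["", "## Out of scope", *bullets(self.out_of_scope)]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    plan = Plan(
        title="Add rate limiting to the login endpoint",  # hypothetical task
        steps=["Add a limiter middleware", "Wire it into the login route"],
        test_commands=["pytest tests/test_login.py"],      # hypothetical command
        in_scope=["login route"],
        out_of_scope=["signup route", "database schema"],
    )
    print(plan.to_markdown())
```

Writing the scope sections even when they are empty keeps the "what isn't going to change" question visible to whoever reviews the plan.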

16:11

And when we come to implementing the

16:12

change, we now can start over a new

16:14

session and give it just the plan

16:17

execution.

16:18

It allows us to keep the context in that

16:20

session very low. It allows us to

16:22

carefully review each change and I think

16:25

commit very frequently. Now, I used to

16:27

work at a company called GitLab for

16:28

many, many years. Uh so maybe I'm a

16:30

little biased towards Git, but I think

16:32

Git can be a huge helper here when it

16:35

comes to helping you slowly iterate and

16:38

understand the changes that the agents

16:39

are making.

16:41

I treat Git on my local machine kind of

16:44

like my own first pull request review

16:47

with my agents before I maybe put up an

16:49

actual pull request for my

16:52

uh you know, for my colleagues to look

16:54

at.

16:55

But I think again, it's critical to

16:57

understand here that

16:59

human time at the

17:01

planning and research phases

17:04

is really the highest leverage

17:06

use of your time.

17:08

By the time you're implementing, you

17:09

want to have all that hard thinking

17:11

done.

17:12

Uh and that's really critical cuz again,

17:13

going back to Dex Horthy who's

17:15

spoken a lot on the subject, and I

17:18

highly recommend you check out the

17:19

videos of him on YouTube

17:21

talking about this.

17:22

He says very aptly that AI can't replace

17:25

thinking. It can only amplify the

17:26

thinking you've done

17:28

or the thinking you haven't done

17:30

or you know,

17:31

the fact that you haven't thought it

17:32

through.

17:34

And so let's talk about how we can

17:36

configure our agents kind of like one more

17:38

step down from this

17:39

paradigm of

17:41

research-plan-implement to really make sure we do

17:43

this.

17:44

So first we talked about modes and

17:46

customizations. We already talked about

17:48

these modes, ask, code, architect. These

17:51

modes that are specialized and focused

17:53

on the thing that we're trying to get

17:54

done. Right? Architect is maybe for

17:56

planning. Ask mode is for research. Code

17:59

mode is for actually implementing.
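
A minimal sketch of modes as tool restrictions, assuming each mode is just an allow-list. The tool names and the exact permissions for architect mode are assumptions for illustration; the talk only states that ask mode cannot write files and can read them if allowed.

```python
# Tool names and the architect permissions are assumptions for illustration.
MODE_TOOLS: dict[str, frozenset[str]] = {
    # Research: chat and read only.
    "ask": frozenset({"read_file"}),
    # Planning: may also write the plan file.
    "architect": frozenset({"read_file", "write_plan"}),
    # Implementation: may edit code and run tests.
    "code": frozenset({"read_file", "write_file", "run_tests"}),
}

# Which mode serves which phase of the research-plan-implement loop.
PHASE_TO_MODE = {"research": "ask", "plan": "architect", "implement": "code"}


def is_allowed(mode: str, tool: str) -> bool:
    """Return True if the given mode may call the given tool."""
    return tool in MODE_TOOLS.get(mode, frozenset())


if __name__ == "__main__":
    for phase, mode in PHASE_TO_MODE.items():
        print(phase, mode, sorted(MODE_TOOLS[mode]))
```

Unknown modes get an empty allow-list, so a typo in a mode name fails closed rather than granting write access.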

18:01

Uh then we also want to have, you know,

18:03

a set of rules that make sense for our

18:06

workspace, right? For the repository

18:09

we're in.

18:10

Uh or maybe globally on our machine so

18:13

that we understand, you know, that we

18:14

have a certain set of rules that we

18:16

always want to adhere to.

18:18

Uh and agents are pretty good at loading

18:20

in and understanding those rules.

18:23

Uh but we have to have them written down

18:25

for them to have those in their context,

18:26

right?

18:28

And so I think a lot of the agent

18:30

behavior then

18:32

are things that we want to tweak as

18:34

we're learning, right? Do we

18:36

want to do multiple agents at a time? Do

18:37

we want those agents to use worktrees

18:40

so that we can then again, merge them

18:42

back into our local repository

18:46

before committing them to a

18:48

pull request?

18:50

Uh how much do we want to auto-approve,

18:52

right? So most agents have the ability

18:54

to tune, you know, what are the things

18:56

that it can do independently? What are

18:58

the tools it can use independently? Can

18:59

it read files? Can it read files inside

19:01

or outside of the workspace? Uh can it

19:04

run tests? You know, what can the agent

19:06

do autonomously without your

19:08

intervention versus what do you need to

19:09

approve?

19:10

Yeah, I think this is something that you

19:11

have to set up to be comfortable with in

19:13

the beginning and then also you need to

19:15

be comfortable changing as you learn how

19:17

to use these tools.
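
A minimal sketch of an auto-approve policy along the lines described: reads inside the workspace run on their own, everything else asks first. The `ApprovalPolicy` type, action names, and defaults are assumptions for illustration and do not correspond to any specific agent's settings.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class ApprovalPolicy:
    """Which actions the agent may take without asking; names are illustrative."""
    workspace: str
    auto_read_inside: bool = True
    auto_read_outside: bool = False
    auto_run_tests: bool = False
    auto_write: bool = False


def _inside(workspace: str, path: str) -> bool:
    # Normalize first so "../" segments cannot escape the workspace.
    # This is a lexical check only; it does not follow symlinks.
    ws = PurePosixPath(posixpath.normpath(workspace))
    target = PurePosixPath(posixpath.normpath(posixpath.join(workspace, path)))
    return target.is_relative_to(ws)  # requires Python 3.9+


def needs_approval(policy: ApprovalPolicy, action: str, path: Optional[str] = None) -> bool:
    """Return True if a human must approve this action first."""
    if action == "read":
        inside = path is not None and _inside(policy.workspace, path)
        auto = policy.auto_read_inside if inside else policy.auto_read_outside
        return not auto
    if action == "run_tests":
        return not policy.auto_run_tests
    if action == "write":
        return not policy.auto_write
    return True  # unknown actions always ask


if __name__ == "__main__":
    policy = ApprovalPolicy(workspace="/repo")
    print(needs_approval(policy, "read", "src/app.py"))      # inside the workspace
    print(needs_approval(policy, "read", "../etc/passwd"))   # escapes the workspace
```

Starting conservative and flipping individual flags later matches the advice to be comfortable changing these settings as you learn the tool.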

19:21

And then I think a good mental model

19:23

for this agent configuration is maybe

19:25

kind of three distinct buckets, right?

19:27

We talked about modes, right? This is

19:29

that role-based configuration, you

19:32

know, a behavior of an agent that we

19:34

want.

19:35

Uh but there's two other really key

19:37

things and that is the agents.md and

19:39

then skills.md that you'll hear about.

19:42

And so what's the

19:43

difference between the two?

19:45

Well, the agents.md is now quickly

19:47

becoming the de facto standard for where

19:50

all agents go kind of for their readme,

19:52

for the like always-on rules and details

19:55

about the project. Uh so I think it's

19:58

critical that your project has an

19:59

agents.md with a minimal amount of

20:01

information that an agent needs to know

20:03

about, you know, what are the

20:04

conventions that we're using, what are

20:06

the commands that we're using to get it

20:08

built or tested, and like what are the

20:10

requirements around testing,

20:12

uh or requirements that we need to be

20:14

sure to check off before committing.

20:17

And then skills are kind of more of a

20:18

specific workflow, right? So they're

20:21

kind of reusable playbooks for agents.

20:24

So if there's something that you're

20:25

doing a lot, you're making motion

20:28

graphics with Remotion often, or

20:30

you're

20:31

um you know, doing some sort of like

20:34

uh daily or weekly or monthly change log

20:37

compiling,

20:38

those kinds of things

20:40

are great to put in as skills that an

20:43

agent can then pick up when it needs it

20:45

to do those specific kinds of workflows.

20:48

And so typically those are on demand and

20:50

you say, "Hey, let's use this skill for

20:52

this task." Versus the agents.md is almost

20:55

always loaded into the context for the

20:56

agent, so it knows what's going on.
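
A minimal sketch of that difference: the agents.md text is always part of the context, while a skill is added only when the task asks for it. The function name, the skill names, and the `[skill: ...]` label format are assumptions for illustration, not a format defined by any tool.

```python
def build_context(agents_md: str, skills: dict[str, str], requested: list[str]) -> str:
    """Assemble the standing context for one task.

    agents_md is always included; a skill is included only when requested.
    """
    parts = [agents_md.strip()]
    for name in requested:
        if name not in skills:
            raise KeyError(f"unknown skill: {name}")
        parts.append(f"[skill: {name}]\n{skills[name].strip()}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical project rules and skills.
    agents_md = "Run the test suite before committing. Use 4-space indentation."
    skills = {
        "changelog": "Collect merged changes since the last release and group them by type.",
        "motion-graphics": "Render the intro animation from the project template.",
    }
    print(build_context(agents_md, skills, requested=["changelog"]))
```

Keeping unused skills out of the context is the same selectivity argued for earlier: only pull in what is relevant to this step.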

20:59

And then of course, I work at

21:01

Kilo Code, and so I've got some power

21:03

user tips there,

21:04

but I think many of

21:06

these apply, you know, regardless of

21:08

which agent you're using, but I think

21:09

they're critical as you kind of get

21:11

comfortable with those first kinds of

21:13

paradigms. How do I now customize this

21:16

and make it work for me? And one is

21:18

at-mentioning for context. So mentioning

21:20

files or commits or, you know,

21:24

things from the terminal that output.

21:27

Those kinds of things and bringing them

21:28

into the context quickly are really

21:29

helpful. Uh using slash commands to do

21:32

things like starting a new task when we

21:33

need to, or condensing the context when

21:36

it's getting too full.

21:38

Uh those kind of quick commands can help

21:39

us move a lot faster.

21:41

We also can, if we're working in

21:43

VS Code with Kilo Code, we can select

21:47

a section of code and right-click

21:49

and say add to Kilo Code, and then that

21:50

context is brought right in there, and I

21:53

can then talk or ask

21:55

questions about that code, or ask

21:57

the agent to change a certain part about

21:59

that code. Uh and then of course, we

22:01

have autocomplete built in as well,

22:03

which I think is still useful,

22:06

especially because we also have it not

22:07

just in code, but as you're prompting.

22:11

And then kind of beyond the IDE, I think

22:13

we're seeing, you know, also this shift

22:15

this year in, you know, where else do I

22:19

want to be able to use this? In the CLI,

22:20

from my mobile phone, in a cloud agent,

22:23

directly in Slack. Right? The ability to

22:25

kind of use these agents wherever you

22:27

are is something that's becoming more

22:30

expected

22:31

of everyone and everyone's agents.

22:34

And I think that's a good thing. I think

22:35

that means that we're starting to learn

22:36

how we can use these agents again

22:40

more like a collaborator that's

22:42

everywhere that we need to be.

22:46

And then one other thing that I want to

22:48

talk about is getting other

22:50

context things in. First of all, Model

22:52

Context Protocol, right? Context is

22:55

right in the name.

22:56

Um

22:57

the idea of this is, you know,

22:59

fundamentally these models originally

23:01

can only

23:02

receive input tokens and create

23:04

output tokens, right?

23:06

Uh and slowly but surely we've been

23:08

enabling them to use tools where they

23:10

can, you know, make tool calls out uh

23:13

and affect things in the environment,

23:14

like running tests.

23:16

Uh the MCP, the concept of MCP basically

23:19

expands this to say, "Hey, I want to

23:21

give other tools." Right? For instance,

23:23

the GitHub MCP gives the agent a lot of

23:26

tools to interact with the GitHub API,

23:29

look up pull requests,

23:30

um look up comments, look up issues, and

23:33

understand a lot more about your

23:35

GitHub environment, right?

23:37

Or Context7 helps it look for

23:41

up-to-date framework documentation,

23:43

because of course, as you know, the LLMs

23:45

kind of have a cutoff date where their

23:46

knowledge cuts off, and then

23:48

anything that's improved since then they

23:49

don't know about.

23:51

Um

23:52

so these MCP servers can be very

23:54

helpful, and there are thousands

23:56

of them out there.

23:57

Uh but the concern is that every one of

23:59

them is going to add at least some

24:00

information, right? Details about those

24:02

tools that it has, to the system prompt

24:04

that gets sent every time uh you're

24:07

having an interaction with an agent. And

24:09

so you want to make sure, if you're not

24:10

actually using that, to disable it,

24:12

right? Let's say I have a Postgres MCP

24:14

that connects to my database, and I'm

24:16

doing a whole bunch of front-end work

24:18

that doesn't involve the database at

24:19

all. Well, that Postgres MCP is just

24:21

going to be wasted tokens, and maybe

24:23

even worse,

24:24

tokens that help, you know, kind of

24:26

confuse the agent, so it doesn't understand

24:28

that it's not supposed to touch the

24:29

database right now.

24:31

Uh so we want to be really careful to

24:32

not overuse MCPs.
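
To make the cost of an idle MCP server concrete, here is a back-of-the-envelope sketch. The server names and tool descriptions are made up for illustration, and `estimate_tokens` is a crude assumption (about four characters per token), not a real tokenizer. Real tool definitions also carry JSON schemas for their parameters, so actual overhead is likely higher than this suggests.

```python
# Hypothetical tool descriptions, roughly as each MCP server might advertise them.
SERVERS = {
    "github": [
        "Look up a pull request by number.",
        "List comments on an issue.",
        "Search issues by label.",
    ],
    "postgres": [
        "Run a read-only SQL query.",
        "List tables in a schema.",
    ],
}

def estimate_tokens(text: str) -> int:
    """Crude heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)

def overhead(enabled: set[str]) -> int:
    """Estimated tokens added to every request by the enabled servers."""
    return sum(estimate_tokens(desc)
               for name in enabled
               for desc in SERVERS[name])

# Front-end work: the database server is enabled but never used.
with_postgres = overhead({"github", "postgres"})
without_postgres = overhead({"github"})
print(f"tokens saved per request: {with_postgres - without_postgres}")
```

The saving is paid back on every single interaction, which is why disabling an unused server is worth the small effort.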

24:36

And then another thing we hear from

24:37

um

24:38

enterprises a lot is how do we work with

24:40

internal platform APIs?

24:42

Uh and I think that, you know,

24:45

there are kind of four different ways of

24:46

doing that. One, if there's already an

24:48

OpenAPI spec for it, or Swagger

24:51

spec, use that.

24:53

If there's not, then convert it to

24:54

markdown so that you can save that

24:55

markdown, you know, in the agents.md or

24:57

somewhere else in the repository to

24:59

reference it.

25:01

Uh and if it's something that changes a

25:02

little bit more frequently, maybe you do

25:04

need to have like a reference URL that

25:06

you can pull in

25:08

uh and have the agent go pull every time

25:10

to see the latest and greatest.

25:12

Uh and then we've seen some customers

25:14

who, you know, have complex multi-step,

25:15

multi-system workflows, where building

25:18

their own MCP server might be the right

25:20

choice.
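
The four options just described can be sketched as a small decision helper. The class, field names, and especially the order in which the checks run are assumptions made for illustration; the talk lists the options but does not rank them.

```python
from dataclasses import dataclass

@dataclass
class InternalApi:
    """Hypothetical description of an internal platform API."""
    has_openapi_spec: bool
    changes_frequently: bool
    multi_system_workflow: bool

def context_strategy(api: InternalApi) -> str:
    """Pick one of the four approaches; the precedence here is an assumption."""
    if api.multi_system_workflow:
        return "build a custom MCP server"
    if api.has_openapi_spec:
        return "use the existing OpenAPI/Swagger spec"
    if api.changes_frequently:
        return "point the agent at a reference URL it fetches each time"
    return "convert the docs to markdown and keep them in the repository"

# Example: a stable API with no machine-readable spec.
print(context_strategy(InternalApi(False, False, False)))
```

A team might reasonably order these checks differently, for example preferring a spec even for complex workflows.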

25:22

But, you know, one way or another, I

25:24

think the key is to, when working

25:26

alongside Kilo or any of these agents,

25:29

you know, isolate your work from the

25:31

agent's work, and then review that

25:33

agent's work as a pull request, right?

25:35

That helps you understand, you know, how

25:38

can I

25:39

um

25:41

best review the code just like I would

25:44

review a junior engineer's code.
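
One common way to isolate an agent's work is a separate git worktree on its own branch, reviewed through a pull request. The talk does not prescribe a mechanism, so treat this as an assumption. The sketch below only builds the commands and does not run them; the `agent/` branch prefix and the sibling-directory layout are hypothetical, and the last step assumes the GitHub CLI (`gh`) is installed.

```python
from pathlib import Path

def isolation_commands(repo: Path, task: str, base: str = "main") -> list[list[str]]:
    """Build (not run) commands that give the agent its own branch and directory."""
    branch = f"agent/{task}"                       # hypothetical naming scheme
    worktree = repo.parent / f"{repo.name}-{task}"  # sibling directory for the agent
    return [
        # Separate checkout so the agent never touches your working tree.
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree), base],
        # After the agent commits, publish its branch.
        ["git", "-C", str(worktree), "push", "-u", "origin", branch],
        # Open a pull request for human review (run from inside the worktree).
        ["gh", "pr", "create", "--base", base, "--head", branch, "--fill"],
    ]

for cmd in isolation_commands(Path("/work/shop"), "fix-login"):
    print(" ".join(cmd))
```

Pass each list to `subprocess.run(cmd, check=True)` only after inspecting it.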

25:48

And so that's really the presentation

25:50

that I have on Kilo. We've got some

25:51

exciting new features coming up. We've

25:53

got, you know,

25:55

expanded across all these surfaces.

25:58

Uh, we also have a big focus on OpenClaw

26:00

and KiloClaw and making a very safe way

26:02

to use, um, OpenClaw agents.

26:06

Uh and so if you haven't taken a look at

26:07

Kilo, I have just a little plug at the end

26:10

here, visit kilo.ai,

26:12

uh and we'd love to get your feedback on

26:14

what we're building.

26:16

And you know, just kind of to give you,

26:18

you know, where do we go from here?

26:19

Again, I think you've kind of got to

26:20

pick a tool and get lots of reps, right?

26:24

We said earlier on that, you know, it's

26:26

part art and part science, and I think

26:28

that just means you need a lot of reps,

26:30

right? To kind of get the feel for what

26:32

can I trust the models to do, and what

26:33

can't I trust the models to do.

26:36

Uh and then try this research, plan,

26:38

implement, feedback loop. See how that

26:40

works for you.

26:41

Um and I think maybe you'll end up like

26:43

some of these other senior engineers who

26:45

have said, "Hey, look, I'm having more

26:47

fun programming now than I've had in

26:49

years and years." Uh as we, you know,

26:53

farm out some of this tedious work to AI

26:56

agents and let our brains work on the

26:58

harder engineering problems.

27:01

Thanks.
