Exploration, exploitation, and thinking
The cognitive cost of easy answers, a lesson from reinforcement learning
What happens to our minds if we never have to struggle our way to an answer?
I’ve been thinking about AI and machine learning for about a decade, and one thing that still surprises me is how often the ideas spill over into everyday life.
In reinforcement learning (RL) there’s a tension between two forces: exploration and exploitation. Exploration means trying new things. Exploitation means sticking with what you already know.
RL agents learn by exploring heavily at first; once they know enough about their environment, they can safely exploit. But if they skip exploration, they get stuck, repeating whatever they stumbled onto first, even if it isn’t very good. An agent that exploits too early never discovers the better strategies it could have found.
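You can see this in the simplest RL setting, a multi-armed bandit. Here’s a minimal sketch (not from the original post; the arm payoffs and epsilon values are made up for illustration) of an epsilon-greedy agent, where epsilon is the probability of exploring on any given step:

```python
import random

def run_bandit(epsilon, steps=5000, seed=0):
    """Epsilon-greedy agent on a 3-armed bandit with hypothetical payoff rates."""
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8]  # arm 2 is best, but the agent doesn't know that
    estimates = [0.0, 0.0, 0.0]   # the agent's running estimate of each arm's value
    counts = [0, 0, 0]
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(3)  # explore: try a random arm
        else:
            arm = max(range(3), key=lambda a: estimates[a])  # exploit: best known arm
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward / steps

# A pure exploiter (epsilon=0) locks onto the first arm it happens to try
# and never learns there was a better one; a modest explorer finds it.
print(f"epsilon=0.0 average reward: {run_bandit(0.0):.2f}")
print(f"epsilon=0.1 average reward: {run_bandit(0.1):.2f}")
```

Run it and the pure exploiter earns roughly the payoff of its first arm forever, while the agent that explores even 10% of the time ends up far ahead. The easy answer wins every individual step and loses the game.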
Humans aren’t so different. We need a period of exploration, where we struggle and figure things out for ourselves. That’s how you build the mental muscles for judgment.
If you outsource that too early, those muscles never really develop.
And it has never been easier to outsource. We have a superpower at our fingertips that can take away the struggle.
But the struggle is the point; that’s how you learn to think.
Skip the struggle, and you only think you’re thinking. People who lean on AI too early remember less, don’t go as deep, and fail to build real judgment. Over time, it becomes a kind of thought atrophy.1
It’s not that this must be the case; AI could help us explore more. It can help us try more experiments, more ideas, harder problems.
But it usually does the opposite. It tempts us into premature exploitation; it’s just so much easier to ask than to struggle.
That’s why so many students and young workers risk getting stuck.2 They trade the long-term reward of learning to think for the short-term reward of an easy answer.
This is the exploitation trap. You get an answer, but at the cost of the skills you need to find answers yourself, maybe even better ones. And the younger you are, the bigger the cost, because you may skip the exploration phase entirely.
An RL agent that skips exploration never learns its environment. A generation that skips exploration may never learn to think.
Originally posted on my old blog in September 2025