AI and Intermittent Reinforcement

I realized last night that my experience with AI agents resembles a variable ratio positive reinforcement schedule. Or perhaps more accurately, it literally is a variable ratio reinforcement (VRR) schedule. Given that LLMs behave in a non-deterministic, stochastic manner, it makes sense that they would create this kind of interaction, where sometimes you get something close to what you desired, and other times you don’t.

I feel rewarded using AI tools, especially at first. But I remain skeptical that they’re actually effective in as many of the situations I try them in. I convince myself, “I’m doing research on the limits and usefulness of these new tools,” but can I actually tell when I have gone beyond research and entered the sunk cost fallacy?

I know from modern game design that VRR is one of the most effective techniques for creating compulsion and addiction. But I still feel foolish, because awareness of what’s happening doesn’t prevent me from falling victim to it. And I feel worse knowing that others who interact with these tools fall prey to them, too; maybe most of us do.

I feel excited by the local possibilities and opportunities AI opens up, but deeply worried about its global detrimental effects: economic, social, environmental, and existential. Yet my brain wants to sidestep these deep concerns because the VRR schedule feels rewarding, and trying to fathom second- and third-order effects does not. I don’t believe I’m alone in this experience. We humans are brilliant, and stupid.