RLHF Yourself

Sep 08, 2025

*Waterhouse’s St. Joan (as in Joan of Arc)*

We’re constantly reacting to reward signals from the world around us: what our friends value, what we get promoted for, what the algorithm favors. This is why people advise choosing friends and partners with “shared values”. You want to be surrounded by reward signals you align with and have intentionally chosen.

RLHF = reinforcement learning from human feedback. It’s used to train AI models on what kinds of outputs human beings prefer, rewarding responses that humans find better and penalizing ones humans find worse. In the same way that machines learn from human feedback, we do too.
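For the curious, here is a minimal sketch of the "reward what humans prefer, penalize what they don't" step, assuming a simple pairwise (Bradley-Terry style) preference model. The response labels, the comparison data, and the `preference_update` helper are all made up for illustration. Real RLHF pipelines train a neural reward model and then optimize a language model against it, which this toy version skips.

```python
import math

def preference_update(rewards, preferred, rejected, lr=0.5):
    """One gradient step on a pairwise preference loss (illustrative helper)."""
    # Probability the current scores assign to the human's actual choice.
    p = 1.0 / (1.0 + math.exp(-(rewards[preferred] - rewards[rejected])))
    # Gradient step on -log(p): raise the preferred score, lower the rejected one.
    rewards[preferred] += lr * (1.0 - p)
    rewards[rejected] -= lr * (1.0 - p)
    return rewards

# Hypothetical responses and human comparisons, not real data.
rewards = {"helpful": 0.0, "vague": 0.0}
comparisons = [("helpful", "vague")] * 20

for preferred, rejected in comparisons:
    preference_update(rewards, preferred, rejected)
```

After enough comparisons the preferred response ends up with the higher score, and the updates shrink as the scores come to agree with the feedback.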

It’s basic, but machine analogies are a great way to understand ourselves. It’s often easier to have awareness about other people (or machines) than about ourselves. There’s less friction. Once we notice behaviors or patterns in others, it’s much easier to apply those realizations back to ourselves.

The adage “you are the 5 people you spend the most time with” holds true because it’s those people who set your reward signals. Humans are creatures of great osmosis. We learn based on what our surroundings aspire to and reward.

If you work at a big company, you’re often rewarded for “getting buy-in”, “navigating hierarchy”, and “writing specs”. If you’re at a startup, you’re rewarded for “taking initiative” and “shipping fast”.

Once you’re aware of these reward signals, though, you can literally RLHF yourself by surrounding yourself with inspiration. To start, you need clear definitions of your values and goals. This is why role models are useful. They make abstract values and goals concrete: an “eval” to aspire to, much like Ayn Rand’s concept of the “upward gaze”.

We are defining ourselves continuously, constantly. It’s critical to consciously choose your own positive and negative signals and to be around people who will help you reinforce those. You might notice you act differently in different environments, either consciously or subconsciously, because different things are rewarded in different environments. The “self” that emerges at work can be different than the “self” that emerges with college friends. Noticing these deltas and choosing which parts of the “self” to cultivate is in many ways what makes us who we are. You’re a byproduct of your environment, but you can control your environment.

When we approach forks, it’s those decisions, influenced by environment, that create our reality. And there are so many of these forks. It’s never been a better time to travel, but it’s also never been a better time to build. It’s the best time to explore, but it’s also the best time to settle down. These contradictions often come down to optionality versus conviction. And when two contradictory options are equally appealing, choosing one is painful. Jony Ive calls this focus.

You can achieve so much when you truly focus.
One of the things Steve would say [to me] because he was worried I wasn’t focused — he would say how many things have you said no to?
I would tell him I said no to this. And I said no to that.
But he knew I wasn’t interested in doing those things.
There was no sacrifice in saying no.
What focus means is saying no to something that with every bone in your body you think is a phenomenal idea, you wake up thinking about it, but you say no to it because you are focusing on something else.

By focusing, by being intentional about reward signals and role models, we can create environments that reinforce our values. And we owe it to ourselves to do so if we truly believe in our personal missions.

Daniel Popescu / ⧉ Pluralisk

Oct 18

Wow, the way you connect RLHF to our own reward systems in daily life truly stood out to me. It makes me wonder if having that self-awareness of our internal 'loss functions' is the first step towards fine-tuning our own human models for desired outcomes.


Incipio
