7oponaut (@7oponaut) / X

7oponaut

5,582 posts

@7oponaut

sharpest knife in the drawer

Joined January 2024

Pinned
7oponaut
@7oponaut
Nov 21, 2024
Consciousness is crazy man. I'm this little guy in the pilot seat, making all the decisions
2.5K
7oponaut
@7oponaut
Aug 5, 2024
Replying to @hourly_shitpost
If nobody has 4 kids except Indians then there's no counterexample
77K
7oponaut
@7oponaut
Jul 24, 2024
Replying to @ShitpostRock
Vibes and fan service
81K
7oponaut
@7oponaut
Jul 3, 2024
Replying to @TheERDoctor
How does one reintroduce a disease into a vaccinated country?
125K
7oponaut
@7oponaut
May 18, 2024
Replying to @Nimbopill
👀
14K
7oponaut
@7oponaut
Apr 10, 2024
New GPT-4 passes the magic elevator test
63K
7oponaut
@7oponaut
Jun 19, 2024
Seems like the MCTSr authors did use ground truth information in the MCTS refinement process. They use the LLM for determining the rewards, but the search terminates when the output is equal to the GT. While a similar method could be used as an RL environment to train agents
nano
@nanulled
Jun 16, 2024
Replying to @teortaxesTex
Nothing surprising tbh, I've tried it 2 days ago before noise. The thing is that it's literally bruteforcing answers from llm. What if we don't know the ground truth? It will fail miserably at those tasks.
178K
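A toy sketch of the concern raised in the two posts above, not the MCTSr authors' code. It is a flat propose-and-score loop rather than a tree search, and every name in it (`refine_search`, `propose`, `self_reward`, `ground_truth`) is hypothetical. It only illustrates how stopping on a ground-truth match can return an answer that the model's own reward would not have picked.

```python
def refine_search(propose, self_reward, max_iters, ground_truth=None):
    """Keep the candidate with the highest self-assigned reward.

    If ground_truth is given, stop as soon as a candidate equals it.
    That early stop uses information an agent would not have at test time.
    """
    best, best_score = None, float("-inf")
    for step in range(1, max_iters + 1):
        candidate = propose()
        # Ground-truth termination: the step that leaks the answer.
        if ground_truth is not None and candidate == ground_truth:
            return candidate, step
        score = self_reward(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, max_iters


def make_proposer(candidates):
    """Return a propose() that yields fixed candidates in order (toy stand-in for an LLM)."""
    it = iter(candidates)
    return lambda: next(it)


def misleading_reward(candidate):
    """Hypothetical self-reward that underrates the correct answer 7."""
    return 0 if candidate == 7 else candidate


# Without ground truth the loop trusts the self-reward and picks 5.
blind = refine_search(make_proposer([3, 7, 5]), misleading_reward, 3)
# With ground truth it stops at 7 on the second proposal.
leaky = refine_search(make_proposer([3, 7, 5]), misleading_reward, 3, ground_truth=7)
```

In this toy setup the two runs differ only in the stopping rule, so any gap in accuracy comes from the ground-truth check, not from the reward signal.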
7oponaut
@7oponaut
Jul 20, 2024
Replying to @bryancsk
The stick person would obviously slip to the bottom and then it looks exponential. Follow me for more math tips.
17K
7oponaut
@7oponaut
Jun 12, 2024
Months of worldbuilding casually destroyed
Tsarathustra
@tsarnick
Jun 12, 2024
Mira Murati says the AI models that OpenAI have in their labs are not much more advanced than those which are publicly available
31K
7oponaut
@7oponaut
Sep 15, 2024
Replying to @s_streichsbier
It converted a natural language description of the code into python. It could do this because the code already existed in the first place.
13K
7oponaut
@7oponaut
Feb 29, 2024
I am told 350°F/175°C is close to the point of discomfort. Only issue: the method is imprecise
15K
7oponaut
@7oponaut
May 31, 2024
I call it the Infinitripper
34K
7oponaut
@7oponaut
May 26, 2024
I have done unspeakable things to get to this point
139K
7oponaut
@7oponaut
Aug 7, 2024
Replying to @karpathy
Another difference is that RLHF doesn't do proper exploration: it mostly learns to exploit a subset of the pretraining trajectories. In contrast, when doing proper RL the discrete action distribution is usually noised by adding an entropy term to the loss function.
43K
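A minimal sketch of the entropy term mentioned in the reply above, assuming a REINFORCE-style loss for a single discrete action. The names `policy_loss` and `entropy_coef`, and the default coefficient value, are illustrative assumptions and are not taken from the thread or from any specific RL library.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def policy_loss(logits, action, advantage, entropy_coef=0.01):
    """Policy-gradient loss for one sampled action, minus an entropy bonus.

    Subtracting entropy_coef * entropy lowers the loss for more uniform
    policies, which pushes the optimizer toward continued exploration.
    """
    probs = softmax(logits)
    pg_loss = -advantage * math.log(probs[action])
    return pg_loss - entropy_coef * entropy(probs)
```

With `entropy_coef=0.0` this reduces to the plain policy-gradient term, so the coefficient controls how strongly a peaked action distribution is penalized.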