
[–]RepostSleuthBot[M] [score hidden] stickied comment | locked comment (0 children)

Looks like a repost. I've seen this image 3 times.

First seen here on 2024-04-19 (82.03% match). Last seen here on 2025-08-27 (84.38% match).

View Search On repostsleuth.com


Scope: This Sub | Target Percent: 75% | Max Age: None | Searched Images: 1,098,813,021 | Search Time: 2.88607s

[–]Glittering_Poem6246 115 points116 points  (11 children)

Programmers in 2030: "Claude, build me a billion-dollar business app."

[–]geldersekifuzuli 75 points76 points  (8 children)

Lead data scientist here. I've trained many small models. You need carefully annotated data to train a small model. If annotation is done by another team, you need to train them on what your classes mean and how they should decide edge cases. After a few iterations, you'll find under-represented classes, so you'll ask the annotators to label more data from those classes.

This process can take up to 6 months depending on the project.

Time is money. Your data scientist's six months of salary is probably more expensive than running an LLM for such a task. And you can adjust an LLM's behavior far more easily with prompting.

Plus, an LLM solution can be production-ready a lot faster. Shipping a working solution faster is a big deal for many organizations: projects have deadlines, and your managers and your team can be under time pressure. Yes, the world is not perfect.

Training a small model and putting it into production is more compute-efficient, for sure. But that doesn't mean it's the best approach in the bigger picture.


[–]InTheEndEntropyWins 4 points5 points  (0 children)

For some small, domain-specific classification tasks, an SVM can give better results and is faster and cheaper than an LLM.
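A minimal sketch of the kind of pipeline this comment describes, using scikit-learn (TF-IDF features plus a linear SVM); the toy texts and labels below are invented for illustration:

```python
# TF-IDF + linear SVM text classifier on a tiny toy dataset.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_texts = [
    "refund my order", "item arrived broken", "you charged me twice",
    "love this product", "works great", "five stars, would buy again",
]
train_labels = ["complaint", "complaint", "complaint",
                "praise", "praise", "praise"]

# Pipeline: raw text -> TF-IDF vectors -> linear SVM decision boundary.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_texts, train_labels)

print(clf.predict(["this works great"])[0])
```

On a narrow domain like this, the whole thing trains in milliseconds on a CPU, which is the cost argument being made.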

[–]Top_Meaning6195 6 points7 points  (3 children)

You're not a real programmer if you use garage collection.

[–]Grandmaster_Caladrel 11 points12 points  (0 children)

Thank goodness I just have the one. It's a small two-car though, so I'm not as serious as those 10x developers who bike to work.

[–]WavingNoBanners 3 points4 points  (1 child)

Upvoting this because I know you meant garbage collection but what you said is far funnier.

[–]ProfBeaker 2 points3 points  (0 children)

Spotted the guy that has 5 garages for some damn reason. :P

[–]extremelySaddening 1 point2 points  (2 children)

"LSTM with BERT embedding model", yeah, the meme-maker does NOT know wtf they're talking about.

[–]not-ekalabya 0 points1 point  (1 child)

Instead of:

Text → BERT → LSTM → Dense → Output

You usually do:

Text → BERT → Dense → Output

It's called HUMOR: in programming, you over-complicate stuff for no reason.
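The usual Text → BERT → Dense → Output setup can be sketched as follows; random vectors stand in for precomputed BERT embeddings so the snippet stays self-contained, and all shapes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for precomputed BERT sentence embeddings: (batch, hidden_size).
hidden_size, num_classes = 768, 3
embeddings = rng.normal(size=(4, hidden_size))

# Single dense (linear) classification head: logits = x @ W + b.
W = rng.normal(scale=0.02, size=(hidden_size, num_classes))
b = np.zeros(num_classes)

logits = embeddings @ W + b
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
preds = probs.argmax(axis=-1)

print(probs.shape, preds.shape)  # one class distribution and one label per input
```

There is nothing between the embeddings and the head, which is the commenter's point: the LSTM in the meme has no job to do.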

[–]extremelySaddening 0 points1 point  (0 children)

"Usually do" is funny lmao, you would never do the first, because you're fitting a square peg into a round hole. If you already have embeddings generated by BERT, then, pray tell, what the fuck do you want the LSTM to do?

It implies the meme-maker doesn't know what an LSTM is, which is funny because the meme acts like they do. Therefore I am making fun of them.

[–]Thick-Protection-458 0 points1 point  (6 children)

Nah, BERT itself can be tuned to do classification.

But to train it you need a big enough dataset, while LLMs (not necessarily OpenAI ones, not even big ones) can be a good few-shot-style starting point.
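The few-shot start mentioned here can be sketched as nothing more than prompt construction; the labels and example texts below are invented, and the resulting string would be sent to whatever chat-completion API you happen to use:

```python
# Build a few-shot classification prompt from a handful of labeled examples.
examples = [
    ("The package never arrived.", "complaint"),
    ("Fantastic service, thank you!", "praise"),
]

def few_shot_prompt(examples, query):
    lines = ["Classify each item as 'complaint' or 'praise'.", ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # Leave the final label blank for the model to fill in.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)

prompt = few_shot_prompt(examples, "My order was broken on arrival.")
print(prompt)
```

No training loop, no annotation pipeline: the handful of in-prompt examples replaces the dataset, which is why it works as a starting point before you have enough data for a small model.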

[–]MissinqLink 4 points5 points  (5 children)

I love that young people seem to be rediscovering BERT like it’s a long lost relic. It was new not very long ago.

[–]Thick-Protection-458 2 points3 points  (4 children)

> I love that young people seem to be rediscovering BERT like it’s a long lost relic. It was new not very long ago.

Well, funnily enough, some parts of NLP-related stuff have changed so much that I can kinda relate. "I was there, Gandalf... 3000 years ago", lol.

[–]x0wl 2 points3 points  (3 children)

BERT literally has almost the same architecture as any transformer-based generative LLM (I mean, it's literally in the name). The only difference is that attention goes in both directions instead of just forward as in decoder-only models.

Also, using an LSTM with BERT doesn't make much sense, since the whole reason transformers exist is to address the training issues of LSTMs, but whatever.
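The bidirectional-vs-forward-only distinction above comes down to the attention mask; a minimal sketch (variable names are mine, not from any library):

```python
import numpy as np

seq_len = 4

# Encoder-style (BERT): every token may attend to every other token.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# Decoder-only style (generative LLMs): token i may attend
# only to positions <= i, i.e. a lower-triangular mask.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(causal_mask.astype(int))
```

Everything else in the two architectures (embeddings, multi-head attention, feed-forward blocks) is essentially shared.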

[–]Thick-Protection-458 0 points1 point  (2 children)

Yeah, technically you can freeze the base encoder (already capable of some language tasks) and put an LSTM head on top of it.

But...

- Why make the head LSTM-based rather than self-attention-based?

- Why not tune BERT itself? (For some cases a frozen encoder makes sense, but in the general case you can just as well tune the encoder plus some linear heads.)

[–]x0wl 0 points1 point  (1 child)

BERT is the encoder with self attention, it's what the E stands for :)

What you typically do is stick a [CLS] token at the beginning of your sentence, attach a single-layer classifier to that token's output embedding, and then fine-tune either the whole thing or the top couple of layers of BERT plus the classifier.

BERT-base is only ~110M parameters, so doing a full fine-tune is super cheap.
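The [CLS]-pooling plus single-layer classifier described above can be sketched like this; a random array stands in for BERT's output hidden states, and all shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden, num_classes = 2, 8, 768, 4

# Stand-in for BERT's output: (batch, seq_len, hidden).
# By convention, position 0 holds the [CLS] token.
hidden_states = rng.normal(size=(batch, seq_len, hidden))

# Pool: take only the [CLS] token's embedding for each sequence.
cls_embedding = hidden_states[:, 0, :]          # (batch, hidden)

# Single-layer classifier on top of the pooled [CLS] vector.
W = rng.normal(scale=0.02, size=(hidden, num_classes))
b = np.zeros(num_classes)
logits = cls_embedding @ W + b                  # (batch, num_classes)

print(logits.shape)
```

During fine-tuning, the gradient flows from these logits back into as many BERT layers as you choose to unfreeze.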

[–]Jonny_dr 0 points1 point  (0 children)

Yeah, and LSTMs sucked ass. There is a reason why the general public knows about LLMs but not LSTMs.

[–]GlitteringLaw3215 0 points1 point  (0 children)

then we fought segfaults with gdb; now we pray Copilot doesn't invent new ones