<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Lazy Programmer on Medium]]></title>
        <description><![CDATA[Stories by Lazy Programmer on Medium]]></description>
        <link>https://medium.com/@lazyprogrammerofficial?source=rss-cd4d818fd98f------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*J_BLlole7mXlogNCJJRVVw.png</url>
            <title>Stories by Lazy Programmer on Medium</title>
            <link>https://medium.com/@lazyprogrammerofficial?source=rss-cd4d818fd98f------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 15 Apr 2026 04:27:21 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@lazyprogrammerofficial/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Is Classic / Traditional Machine Learning Dead?]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@lazyprogrammerofficial/is-classic-traditional-machine-learning-dead-e92aed68045e?source=rss-cd4d818fd98f------2"><img src="https://cdn-images-1.medium.com/max/1536/1*PEMTzVxpNhuPWQuPMog73w.png" width="1536"></a></p><p class="medium-feed-snippet">We have all seen that AI can write code.</p><p class="medium-feed-link"><a href="https://medium.com/@lazyprogrammerofficial/is-classic-traditional-machine-learning-dead-e92aed68045e?source=rss-cd4d818fd98f------2">Continue reading on Medium »</a></p></div>]]></description>
            <link>https://medium.com/@lazyprogrammerofficial/is-classic-traditional-machine-learning-dead-e92aed68045e?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/e92aed68045e</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[job-hunting]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Mon, 02 Mar 2026 15:01:01 GMT</pubDate>
            <atom:updated>2026-03-02T22:29:10.295Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[[NEW COURSE] Cutting-Edge AI: Deep Reinforcement Learning in PyTorch (v2)]]></title>
            <link>https://medium.com/@lazyprogrammerofficial/new-course-cutting-edge-ai-deep-reinforcement-learning-in-pytorch-v2-b8d4b6ca037b?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/b8d4b6ca037b</guid>
            <category><![CDATA[reinforcement-learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[algorithmic-trading]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Wed, 25 Feb 2026 14:31:01 GMT</pubDate>
            <atom:updated>2026-02-25T14:31:01.948Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PLYzstOPwYVbs7Fwa-LUbQ.png" /></figure><p>Hello friends!</p><p>Don’t want to read my little spiel? Just get the course here: <a href="https://deeplearningcourses.com/c/deep-reinforcement-learning-ddpg-td3-in-pytorch?utm_source=medium&amp;utm_medium=blog&amp;utm_campaign=20260225">https://deeplearningcourses.com/c/deep-reinforcement-learning-ddpg-td3-in-pytorch</a></p><p>Deep Reinforcement Learning is one of the most exciting (and most misunderstood) areas of AI.</p><p>It’s the technology behind agents that can <strong>learn to play games</strong>, <strong>control robots</strong>, and <strong>make decisions in complex environments</strong> where there is no obvious “correct answer.”</p><p>And now, I’m excited to officially announce the release of my newest course:</p><h3>Cutting-Edge AI: Deep Reinforcement Learning in PyTorch (v2)</h3><p><strong>Build AI agents using Reinforcement Learning in PyTorch: DDPG, TD3, SAC, + More</strong></p><p>This is <strong>Version 2</strong> — and I’m not exaggerating when I say:</p><h3>This is a complete rebuild from scratch.</h3><p>A full redesign meant to match <em>how Deep RL is done today</em>.</p><h3>Why Deep Reinforcement Learning Matters Right Now</h3><p>Most machine learning courses focus on supervised learning:</p><ul><li>Here’s some labeled data</li><li>Train a model</li><li>Predict outputs</li></ul><p>But the real world doesn’t work like that.</p><p>In the real world, intelligent systems must learn through:</p><ul><li>trial and error</li><li>delayed rewards</li><li>uncertainty</li><li>noisy feedback</li><li>risk and long-term planning</li></ul><p>That’s where Reinforcement Learning dominates.</p><p>Deep RL is the core technology behind:</p><ul><li>robotics simulations</li><li>autonomous agents</li><li>AI strategy systems</li><li>decision-making under uncertainty</li><li>algorithmic trading and portfolio risk 
control</li><li>continuous control environments</li></ul><p>If you’ve ever wanted to build an AI agent that <em>actually learns</em>, this is the skillset.</p><h3>Why This Course Exists (And Why Version 2 Is a Big Deal)</h3><p>The original version of this course was built using TensorFlow 1.</p><p>At the time, it made sense.</p><p>But the world moved on.</p><h3>And so did I.</h3><p>So I rebuilt the entire course using the modern DRL stack:</p><h3>What’s New in Version 2?</h3><h3>1. Fully PyTorch Native</h3><p>TensorFlow 1 is clunky. Graph-based. Awkward.</p><p>PyTorch is what researchers and serious practitioners use today.</p><p>So this course is now written with:</p><ul><li>clean PyTorch code</li><li>modern training workflows</li><li>readable implementations you can actually modify</li></ul><p>No black-box libraries. No magic functions.</p><p>You’ll understand what’s happening at every step.</p><h3>2. MuJoCo Is Now Free (And We Use It)</h3><p>MuJoCo is one of the most powerful physics engines ever built for robotics simulation.</p><p>It used to be expensive.</p><p>Now it’s open-source and free.</p><p>Which means you can now train agents in realistic continuous control environments — without needing expensive proprietary tools.</p><p>This course shows you how.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8jh_DcwoAdxrgk__6VJ0Ig.png" /><figcaption>MuJoCo screenshot</figcaption></figure><h3>3. 
Refined Explanations (Less Pain, More Clarity)</h3><p>Deep RL has a reputation for being:</p><ul><li>math-heavy</li><li>confusing</li><li>unstable</li><li>“impossible to implement”</li></ul><p>That’s because most explanations are terrible.</p><p>This course is designed to make the key ideas intuitive:</p><ul><li>why Bellman equations matter</li><li>why policy gradients work</li><li>what “overestimation bias” really means</li><li>how to debug agent training</li></ul><p>The goal is simple:</p><h3>You should actually understand what you’re building.</h3><h3>What You’ll Build and Master</h3><p>This course bridges the gap between <strong>academic theory</strong> and <strong>production-ready implementation</strong>.</p><p>You will not just run prewritten scripts.</p><p>You will build real deep reinforcement learning agents from scratch.</p><h3>Part 1: The Foundations (The RL Brain)</h3><p>Before we jump into deep networks, we make sure your fundamentals are rock solid.</p><p>You’ll master:</p><ul><li>Markov Decision Processes (MDPs)</li><li>returns and value functions</li><li>Bellman equation (the heart of RL)</li><li>policy improvement concepts</li><li>Monte Carlo vs TD-learning / Q-learning</li></ul><p>If you’ve ever felt like RL was “hand-wavy”… this section fixes that.</p><h3>Part 2: Deep Deterministic Policy Gradient (DDPG)</h3><p>DDPG is where deep reinforcement learning gets <em>serious</em>.</p><p>It’s one of the foundational algorithms that brought RL into:</p><h3>continuous action spaces.</h3><p>Instead of choosing from a small list of actions like DQN…</p><p>DDPG lets an agent output real-valued controls like:</p><ul><li>throttle values</li><li>joint rotations</li><li>torque outputs</li><li>position adjustments</li></ul><p>This is how robots actually move.</p><p>And in this course, we go deep:</p><ul><li>DDPG theory (3 parts)</li><li>DDPG implementation in PyTorch (3 parts)</li><li>actor-critic structure</li><li>replay buffer mechanics</li><li>target 
networks</li><li>exploration noise</li><li>training stability tricks</li></ul><p>You’ll understand it <em>and</em> implement it.</p><h3>Part 3: TD3 (Twin-Delayed DDPG)</h3><p>If you’ve ever tried DDPG before, you know the problem:</p><h3>It can be unstable and unreliable.</h3><p>That’s why TD3 exists.</p><p>TD3 is the “fixed” version of DDPG — and it’s one of the most important modern DRL algorithms.</p><p>In this section you’ll learn:</p><ul><li>clipped double-Q learning</li><li>delayed policy updates</li><li>target policy smoothing</li><li>how TD3 reduces overestimation bias</li></ul><p>And of course…</p><h3>You’ll implement TD3 in PyTorch from scratch.</h3><h3>Part 4: A Preview of SAC (Soft Actor-Critic)</h3><p>SAC is one of the most popular modern DRL algorithms today.</p><p>We include a preview so you understand:</p><ul><li>entropy maximization</li><li>stochastic policies</li><li>why SAC is so stable</li></ul><p>This gives you a pathway toward more advanced RL systems.</p><h3>The VIP Project: Algorithmic Trading with Reinforcement Learning</h3><p>This is where everything comes together.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*M0T3LsmofIp9sZo2sv_3QA.jpeg" /></figure><p>In the VIP project, you will build a complete custom environment for:</p><h3>Position Sizing in Algorithmic Trading</h3><p>This isn’t a toy example.</p><p>You will code the environment from scratch and train an agent to make risk-sensitive decisions like:</p><ul><li>when to increase exposure</li><li>when to reduce risk</li><li>how to handle drawdowns</li><li>how to maximize returns over time</li></ul><p>Then you’ll deploy:</p><ul><li>DDPG</li><li>TD3</li></ul><p>…to solve it.</p><p>This is one of the most valuable real-world applications of RL because it forces you to think about:</p><ul><li>long-term reward</li><li>risk management</li><li>noisy environments</li><li>stochastic dynamics</li><li>non-stationary behavior</li></ul><p>In other words:</p><h3>exactly what 
reinforcement learning was built for.</h3><h3>Key Highlights (What Makes This Course Different)</h3><p>This course doesn’t just dump code on you.</p><p>It teaches you how to think like an RL engineer.</p><p>You’ll learn:</p><ul><li>Gymnasium fundamentals</li><li>vector environments</li><li>autoreset paradigm</li><li>DQN review (for context and foundations)</li><li>DDPG deep dive (theory + implementation)</li><li>TD3 implementation (state-of-the-art continuous control)</li><li>algorithmic trading environment design</li><li>agent training and debugging techniques</li><li>plus a return to evolution strategies</li></ul><p>By the end, you’ll have a skillset that’s actually useful in real AI work.</p><h3>Who This Course Is For</h3><p>This course is designed for:</p><ul><li>programmers who want to build real AI agents</li><li>data scientists who want to expand beyond supervised learning</li><li>machine learning engineers who want to understand modern RL</li><li>robotics enthusiasts who want continuous control skills</li><li>quants and traders who want to explore RL in finance</li></ul><p>And especially:</p><h3>People who are tired of plug-and-play tutorials.</h3><p>If you want to understand what’s happening under the hood…</p><p>This course is for you.</p><h3>What You Need Before Starting</h3><p>This is an intermediate-to-advanced course.</p><p>You should be comfortable with:</p><ul><li>Python</li><li>NumPy</li><li>neural networks</li><li>PyTorch fundamentals</li><li>probability/statistics basics</li><li>calculus fundamentals</li><li>basic RL concepts (MDPs, TD learning)</li></ul><p>If you’ve taken other ML / deep learning courses before, you’ll be in a great position.</p><h3>Why You Should Enroll Now</h3><p>Deep Reinforcement Learning is not a “future skill.”</p><p>It’s happening right now.</p><p>And if you can build RL systems from scratch, you’re entering a category of AI that most engineers never reach.</p><p>This course gives you:</p><ul><li>modern PyTorch-based 
implementations</li><li>practical continuous-control algorithms</li><li>real projects</li><li>a finance-based capstone</li><li>clean explanations designed for self-learners</li></ul><p>It’s the course I wish existed when I first started learning deep RL.</p><h3>Ready to Build Real AI Agents?</h3><p>If you’ve been waiting for the right time to learn deep reinforcement learning…</p><p>This is it.</p><p><strong>Cutting-Edge AI: Deep Reinforcement Learning in PyTorch (v2)</strong> is available now.</p><p>You’ll learn how to build agents that can learn, adapt, and optimize decisions in complex environments — using the same algorithms powering modern AI breakthroughs.</p><h3>Enroll now and start building the next generation of intelligent systems.</h3><p>(And if you’ve taken my previous courses, you already know: this one is going to be a game changer.)</p><figure><a href="https://deeplearningcourses.com/c/deep-reinforcement-learning-ddpg-td3-in-pytorch?utm_source=medium&amp;utm_medium=blog&amp;utm_campaign=20260225"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xBeAM0muAtxaecNZL5zLyg.png" /></a></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b8d4b6ca037b" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[[NEW COURSE] Reddit Comment Score Prediction Using AI]]></title>
            <link>https://medium.com/@lazyprogrammerofficial/new-course-reddit-comment-score-prediction-using-ai-b21fe1cee477?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/b21fe1cee477</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[social-media-marketing]]></category>
            <category><![CDATA[reddit]]></category>
            <category><![CDATA[cheap-reddit-upvotes]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Wed, 18 Feb 2026 14:11:00 GMT</pubDate>
            <atom:updated>2026-02-18T14:11:00.877Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/650/1*yR1fweXREQh0Ag0dnUty1g.jpeg" /></figure><p>Hello friends!</p><p>Don’t want to read my little spiel? Just get the course here: <a href="https://deeplearningcourses.com/c/machine-learning-social-media-marketing?utm_source=medium&amp;utm_medium=blog&amp;utm_campaign=feb2026">https://deeplearningcourses.com/c/machine-learning-social-media-marketing</a></p><p>One useful application area for data science and machine learning is marketing.</p><p>People often think of things like A/B testing landing pages, email subject lines, or YouTube thumbnails. In NLP, a classic example is “article spinning” (which I cover in other courses), although these days that application has mostly been subsumed by large language models.</p><p>Recently, I wanted to explore a different question:</p><p><strong>Can we predict how well a Reddit comment will perform before we post it?</strong></p><p>In other words, given a piece of comment text (which could even be generated by an LLM), can we predict whether Reddit users will upvote or downvote it — and by how much?</p><p>That question turns social media engagement into a concrete, measurable machine learning problem. And that’s exactly what this new course is about.</p><h3>What This Course Actually Does</h3><p>This is a <strong>short, project-based course</strong> where we build real models that predict Reddit comment scores.</p><p>There’s no theory for theory’s sake here. 
Instead, I walk through the same steps I would take for a real ML project:</p><ul><li>Downloading and inspecting a dataset</li><li>Defining the problem properly</li><li>Trying multiple modeling approaches</li><li>Evaluating what actually works (and what doesn’t)</li></ul><p>Along the way, we compare <strong>three different approaches</strong>:</p><ul><li><strong>Classical ML + NLP baselines</strong></li><li><strong>Fine-tuned transformer models (Hugging Face Transformers)</strong></li><li><strong>Zero-shot prediction using a state-of-the-art LLM (“Generative AI”)</strong></li></ul><p>The goal is not just to build a model, but to understand how these approaches compare <em>in practice</em>.</p><h3>What You’ll Build</h3><p>By the end of the course, you’ll have a complete ML pipeline that can:</p><ul><li>Predict whether a Reddit comment will get a <strong>positive or negative score</strong></li><li>Predict the <strong>actual score value</strong> of a comment</li><li>Compare traditional ML, fine-tuned transformers, and modern LLMs side-by-side</li></ul><p>This isn’t a toy demo — you’re solving a real problem with real data.</p><h3>What You’ll Learn Along the Way</h3><p>Some of the practical questions we tackle include:</p><ul><li>How to fine-tune a transformer for <strong>classification and regression</strong></li><li>How to use an LLM for <strong>zero-shot prediction</strong>, without collecting or training on a dataset</li><li>How to build strong <strong>classical baselines</strong>, and why they still matter</li><li>How to think about feature design when using transformers, where everything feels “pre-built”</li></ul><p>We also run into real-world issues that don’t show up in simplified examples, such as:</p><ul><li>Reddit scores following a <strong>power-law distribution</strong> (what does that mean for regression?)</li><li>Using more than just the comment text (parent comment, parent score, subreddit, etc.)</li><li>Choosing modeling approaches that fit into a 
<strong>general ML project workflow</strong>, not just this one task</li></ul><h3>Why This Course Is Different</h3><ul><li>Short and focused — no filler, no unnecessary theory</li><li>End-to-end project — from raw data to predictions</li><li>Modern tools — transformers, Hugging Face, and LLMs</li><li>Real-world relevance — ML meets social media marketing</li></ul><p>You’ll walk away with skills that apply not just to Reddit, but to <strong>any text scoring, engagement prediction, or NLP analytics problem</strong>.</p><h3>Who This Course Is (and Isn’t) For</h3><p>This course is for:</p><ul><li>Beginners to intermediate Python users interested in ML and AI</li><li>Marketers and creators curious about data-driven social media optimization</li><li>Developers and data scientists who want a hands-on NLP project</li></ul><p>It’s <strong>not</strong> a theory-first course. If you don’t already know things like loss functions, fine-tuning, or basic text preprocessing, you may need to look those up as you go. That’s intentional — this course is about <em>doing</em>.</p><h3>Final Thoughts</h3><p>If you’ve ever wondered whether modern AI can actually help you write better-performing content — and how it compares to models you train yourself — this course answers that question in the most direct way possible.</p><p>If you’re looking for a <strong>quick, no-BS walkthrough of an extremely practical ML/AI project</strong>, this one is for you.</p><p>You can check out the course here:</p><figure><a href="https://deeplearningcourses.com/c/machine-learning-social-media-marketing?utm_source=medium&amp;utm_medium=blog&amp;utm_campaign=feb2026"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xBeAM0muAtxaecNZL5zLyg.png" /></a></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b21fe1cee477" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[LLM-as-a-Judge: Goodbye BLEU Scores and ROUGE Metrics]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@lazyprogrammerofficial/llm-as-a-judge-goodbye-bleu-scores-and-rouge-metrics-87a0f0a75095?source=rss-cd4d818fd98f------2"><img src="https://cdn-images-1.medium.com/max/675/1*_bVs5qKGcQDF26Kz8F3bjw.jpeg" width="675"></a></p><p class="medium-feed-snippet">Evaluating machine learning models is pretty simple when your output is just a number or a category (choose 1 of K), or just a yes/no. But&#x2026;</p><p class="medium-feed-link"><a href="https://medium.com/@lazyprogrammerofficial/llm-as-a-judge-goodbye-bleu-scores-and-rouge-metrics-87a0f0a75095?source=rss-cd4d818fd98f------2">Continue reading on Medium »</a></p></div>]]></description>
            <link>https://medium.com/@lazyprogrammerofficial/llm-as-a-judge-goodbye-bleu-scores-and-rouge-metrics-87a0f0a75095?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/87a0f0a75095</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[agentic-ai]]></category>
            <category><![CDATA[llm-evaluation]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Wed, 31 Dec 2025 04:29:05 GMT</pubDate>
            <atom:updated>2025-12-31T04:29:05.045Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[AI Safety is Dead: Game Theory Explains Why]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@lazyprogrammerofficial/ai-safety-is-dead-game-theory-explains-why-24a5903c0670?source=rss-cd4d818fd98f------2"><img src="https://cdn-images-1.medium.com/max/1024/1*OZnBrS3-89nFPNFKYLs81Q.jpeg" width="1024"></a></p><p class="medium-feed-snippet">In this blog post, we take a game-theoretic perspective to explain why prioritizing AI safety&#x200A;&#x2014;&#x200A;while well-meaning&#x200A;&#x2014;&#x200A;is not a rational&#x2026;</p><p class="medium-feed-link"><a href="https://medium.com/@lazyprogrammerofficial/ai-safety-is-dead-game-theory-explains-why-24a5903c0670?source=rss-cd4d818fd98f------2">Continue reading on Medium »</a></p></div>]]></description>
            <link>https://medium.com/@lazyprogrammerofficial/ai-safety-is-dead-game-theory-explains-why-24a5903c0670?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/24a5903c0670</guid>
            <category><![CDATA[game-theory]]></category>
            <category><![CDATA[openai]]></category>
            <category><![CDATA[ai-safety]]></category>
            <category><![CDATA[superintelligence]]></category>
            <category><![CDATA[agi]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Mon, 27 Oct 2025 00:51:11 GMT</pubDate>
            <atom:updated>2025-10-27T18:48:45.302Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[[NEW COURSE] Evolutionary AI: Deep Reinforcement Learning in Python (v2)]]></title>
            <link>https://medium.com/@lazyprogrammerofficial/new-course-evolutionary-ai-deep-reinforcement-learning-in-python-v2-0c041b9c4438?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/0c041b9c4438</guid>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Sat, 04 Oct 2025 06:39:22 GMT</pubDate>
            <atom:updated>2025-10-04T06:39:22.913Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TKH6l809CUxvu0Rk2ASnZw.jpeg" /></figure><p>Deep reinforcement learning (RL) has given us some of the most jaw-dropping breakthroughs in AI — from robots that can walk and run, to AlphaGo defeating world champions. But if you’ve ever tried implementing these algorithms yourself, you’ve probably hit the same roadblocks many others have: exploding gradients, unstable training, and endless hyperparameter tuning.</p><p>That’s where a new paradigm comes in: <strong>Evolutionary AI</strong>.</p><p>Instead of relying on gradients, evolutionary methods treat learning like an optimization problem. Think of it like natural selection for neural networks — simple, elegant, and often far more scalable than traditional RL.</p><p>And now, there’s a course designed to teach you exactly how to harness these ideas.</p><p>Don’t want to read all this? Just get the course here: <a href="https://deeplearningcourses.com/c/evolutionary-deep-reinforcement-learning?utm_source=medium">https://deeplearningcourses.com/c/evolutionary-deep-reinforcement-learning</a></p><p>The Udemy version can be found here (this coupon expires Nov 2, 2025): <a href="https://www.udemy.com/course/evolutionary-deep-reinforcement-learning/?couponCode=EVOVIP">https://www.udemy.com/course/evolutionary-deep-reinforcement-learning/?couponCode=EVOVIP</a></p><h3>Why Evolutionary AI?</h3><p>Most RL courses focus on policy gradients or Q-learning. While powerful, these methods can be brittle in practice. 
Evolution Strategies (ES) and Augmented Random Search (ARS) offer a refreshing alternative:</p><ul><li><strong>Gradient-free learning</strong> — No backprop headaches, vanishing gradients, or unstable losses.</li><li><strong>Simplicity with power</strong> — Fewer moving parts, yet results that rival state-of-the-art deep RL.</li><li><strong>Scalability</strong> — Perfect for parallelization, making them ideal for real-world environments.</li></ul><p>In this course, you’ll master both ES and ARS, coding them <strong>from scratch in Python</strong>, and applying them to problems where they shine.</p><h3>What You’ll Build</h3><p>This isn’t a “theory-only” course. You’ll be building agents that learn, adapt, and perform in some of the most exciting domains for AI today:</p><p><strong>Robotics with MuJoCo</strong><br>Train agents to walk, run, and jump inside realistic physics simulations. Watch your neural network–powered robot figure out how to balance and move — it’s as rewarding as it sounds.</p><p><strong>Algorithmic Trading</strong><br>Apply evolutionary RL to trading strategies, where gradients are difficult to define. See how ES and ARS adapt naturally to the noisy, unpredictable world of financial markets.</p><p>These aren’t toy problems — they’re real-world applications that push the boundaries of what AI can do.</p><h3>Why This Course Stands Out</h3><p>There are plenty of RL courses out there. 
So what makes this one different?</p><ul><li><strong>Version 2 updates</strong>: Streamlined explanations, cleaner code, and updated libraries like Gymnasium.</li><li><strong>Hands-on first</strong>: You won’t just read about algorithms — you’ll implement them step by step.</li><li><strong>Beginner-friendly, expert-ready</strong>: A full review section for newcomers, with advanced deep dives for experienced learners.</li><li><strong>Instructor experience</strong>: Created by a teacher who has guided hundreds of thousands of students into AI and machine learning.</li></ul><p>By the end, you won’t just “know about” evolutionary RL — you’ll have <strong>working implementations</strong> you can extend for research, projects, or your portfolio.</p><h3>Who This Course Is For</h3><p>This course is designed for anyone who wants to explore the frontier of AI:</p><ul><li>Machine learning enthusiasts eager to go beyond supervised learning</li><li>Software developers and engineers building intelligent agents</li><li>Finance professionals applying RL to trading and portfolio optimization</li><li>Game developers training adaptive AI for complex behaviors</li><li>Robotics practitioners exploring sequential decision-making</li><li>Students and researchers looking for real RL experience</li><li>Entrepreneurs and hobbyists experimenting with cutting-edge AI</li><li>Professionals switching careers into AI/ML and building portfolio-ready projects</li></ul><p>If you want to future-proof your skills and work on some of the most exciting challenges in AI, this course will take you there.</p><h3>Ready to Evolve Your AI Skills?</h3><p>The world of AI is moving fast — and evolutionary reinforcement learning is becoming one of the most practical, efficient, and exciting approaches out there.</p><p>Whether you’re building robots, designing trading systems, or just curious about the future of machine intelligence, <strong>Evolutionary AI: Deep Reinforcement Learning in Python (v2)</strong> 
gives you the tools and hands-on experience you need.</p><p>So what are you waiting for?</p><figure><a href="https://deeplearningcourses.com/c/evolutionary-deep-reinforcement-learning?utm_source=medium"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xBeAM0muAtxaecNZL5zLyg.png" /></a></figure><p>I can’t wait to see what you’ll build. If you’ve ever wanted to teach machines to learn — this is your chance.</p><p>– Lazy Programmer</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=0c041b9c4438" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How Does AlphaEvolve Work? An Intuitive Guide]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@lazyprogrammerofficial/how-does-alphaevolve-work-an-intuitive-guide-99aa9f5efe70?source=rss-cd4d818fd98f------2"><img src="https://cdn-images-1.medium.com/max/1392/1*gL4GIxpa_uRBUM6LShRAfA.png" width="1392"></a></p><p class="medium-feed-snippet">After trying to read the white paper, I decided it&#x2019;d be nice to have a short, simple, and intuitive article on how AlphaEvolve actually&#x2026;</p><p class="medium-feed-link"><a href="https://medium.com/@lazyprogrammerofficial/how-does-alphaevolve-work-an-intuitive-guide-99aa9f5efe70?source=rss-cd4d818fd98f------2">Continue reading on Medium »</a></p></div>]]></description>
            <link>https://medium.com/@lazyprogrammerofficial/how-does-alphaevolve-work-an-intuitive-guide-99aa9f5efe70?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/99aa9f5efe70</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[llm-agent]]></category>
            <category><![CDATA[agentic-ai]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[agi]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Thu, 24 Jul 2025 10:02:35 GMT</pubDate>
            <atom:updated>2025-07-24T10:02:35.314Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[[NEW VIP COURSE] Advanced AI: Deep Reinforcement Learning in PyTorch (v2)]]></title>
            <link>https://medium.com/@lazyprogrammerofficial/new-vip-course-advanced-ai-deep-reinforcement-learning-in-pytorch-v2-7db082f9061e?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/7db082f9061e</guid>
            <category><![CDATA[reinforcement-learning]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[algorithmic-trading]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Sat, 17 May 2025 06:14:31 GMT</pubDate>
            <atom:updated>2025-05-17T06:17:42.777Z</atom:updated>
            <content:encoded><![CDATA[<figure><a href="https://deeplearningcourses.com/c/deep-reinforcement-learning-in-pytorch?utm_source=medium"><img alt="" src="https://cdn-images-1.medium.com/max/650/1*mQtM64gE2GSZdcgr9_2iVA.jpeg" /></a></figure><p>I’m excited to announce the release of my latest course<br><strong>🎓 </strong>Advanced AI: Deep Reinforcement Learning in PyTorch (v2)</p><p>Don’t want to read my little spiel? Just click here to get the course: <a href="https://deeplearningcourses.com/c/deep-reinforcement-learning-in-pytorch?utm_source=medium">https://deeplearningcourses.com/c/deep-reinforcement-learning-in-pytorch</a></p><p>If you’ve ever been curious about how AI can teach itself to solve complex tasks — from playing video games to controlling robots — this course will show you exactly how it works, and how <em>you</em> can build it yourself.</p><h3>🧠 What is Reinforcement Learning (RL), and Why Should You Care?</h3><p>Reinforcement Learning is one of the most fascinating fields in AI. It’s the technology behind many high-profile breakthroughs:</p><ul><li><strong>AlphaGo and AlphaZero</strong>: Superhuman performance in board games</li><li><strong>OpenAI’s Dota 2 bot</strong>: Mastering a complex team-based strategy game</li><li><strong>Self-driving cars</strong>, <strong>robotics</strong>, <strong>finance</strong>, <strong>recommendation systems</strong>, and more</li></ul><p>At its core, RL is about agents <strong>learning from experience</strong> — just like humans. 
They take actions, observe outcomes, and adapt to maximize rewards over time.</p><h3>🎓 What You’ll Learn in This Course</h3><p>This course is designed to take you <strong>from theory to practice</strong>, covering both classical and deep reinforcement learning techniques:</p><h3>📘 Preliminaries</h3><ul><li>Learn the core concepts: rewards, value functions, the Bellman equation, and policies</li><li>Master foundational algorithms like Q-Learning, Temporal Difference (TD) learning, and Monte Carlo methods</li></ul><h3>💻 Coding RL from Scratch</h3><ul><li>Get hands-on with Python and Gymnasium (the maintained successor to OpenAI Gym)</li><li>Implement Q-Learning and understand the role of vectorized environments and auto-reset</li></ul><h3>🤖 Deep Reinforcement Learning (DQN)</h3><ul><li>Build Deep Q-Networks with experience replay and target networks</li><li>Implement DQN in Python and understand the role of decreasing epsilon in exploration</li></ul><h3>🧭 Policy Gradient Methods &amp; A2C</h3><ul><li>Learn about policy optimization, actor-critic models, and entropy regularization</li><li>Implement the <strong>Advantage Actor Critic (A2C)</strong> algorithm from scratch</li></ul><h3>🕹️ Real Projects: Training Agents on Atari Games</h3><ul><li>Use <strong>Stable Baselines 3</strong> and special environment wrappers to train agents to play Atari games</li><li>Build both DQN and A2C agents that can <strong>learn to play games from pixels</strong></li></ul><h3>💹 Bonus: Algorithmic Trading with A2C (VIP Section)</h3><p>For those enrolled in the <strong>VIP version</strong> of the course, I’ve included a powerful bonus section:<br> <strong>Multi-Period Portfolio Optimization using A2C (Advantage Actor-Critic)</strong>.</p><p>In many of my other courses (like <strong>TensorFlow 2</strong>, <strong>PyTorch</strong>, <strong>Financial Engineering</strong>, and <strong>Pairs Trading</strong>), we’ve built simple trading agents — often using basic inputs like historical returns, and
typically focusing on single-asset strategies.</p><p>This new section takes things to the next level.</p><p>You’ll learn how to train an agent that:</p><ul><li>Uses <strong>technical indicators</strong> as input features</li><li><strong>Allocates portfolio weights</strong> across multiple assets instead of just buying/selling one</li><li>Makes decisions on a <strong>multi-period basis</strong> (e.g. monthly or quarterly rebalancing)</li><li>Learns entirely from experience — <strong>no assumptions about future returns or covariances</strong></li></ul><p>We go beyond the limitations of traditional finance models. While <strong>Markowitz Portfolio Theory</strong> assumes known return statistics and optimizes for just one period, our A2C-based approach learns in a more realistic environment where future market behavior is <strong>uncertain and evolving</strong>.</p><p>This is practical, modern portfolio management — powered by reinforcement learning.</p><h3>🔄 What’s New in Version 2?</h3><ul><li>Updated and cleaned-up code for better readability and maintainability</li><li>Improved explanations and more structured lessons</li><li>Compatibility with modern libraries like Gymnasium and Stable Baselines 3</li><li>Additional implementation examples and optional deep dives</li></ul><h3>👩‍💻 Who This Course is For</h3><p>This course is designed for:</p><ul><li><strong>Machine learning engineers</strong> looking to dive into RL</li><li><strong>Students and researchers</strong> working on AI projects</li><li><strong>Developers and hobbyists</strong> who want to build agents that learn</li><li>Anyone who wants to understand how AI can learn from its environment</li></ul><p>Whether you’re a complete beginner to RL or looking to sharpen your deep RL skills, this course has something for you.</p><h3>🎉 Ready to Get Started?</h3><p>👉 <strong>Enroll now and start building your own reinforcement learning agents:</strong></p><figure><a 
href="https://deeplearningcourses.com/c/deep-reinforcement-learning-in-pytorch?utm_source=medium"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xBeAM0muAtxaecNZL5zLyg.png" /></a></figure><p>I can’t wait to see what you’ll build. If you’ve ever wanted to teach machines to learn — this is your chance.</p><p>– Lazy Programmer</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Recommender Systems Train-Test Split]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@lazyprogrammerofficial/recommender-systems-train-test-split-67528b3f0b66?source=rss-cd4d818fd98f------2"><img src="https://cdn-images-1.medium.com/max/1513/1*MLTPVTu4HH2eI2dRTooJtQ.png" width="1513"></a></p><p class="medium-feed-snippet">Recently, a student of mine had a concern about how we were doing train-test splitting in my Recommender Systems course.</p><p class="medium-feed-link"><a href="https://medium.com/@lazyprogrammerofficial/recommender-systems-train-test-split-67528b3f0b66?source=rss-cd4d818fd98f------2">Continue reading on Medium »</a></p></div>]]></description>
            <link>https://medium.com/@lazyprogrammerofficial/recommender-systems-train-test-split-67528b3f0b66?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/67528b3f0b66</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[recommendation-system]]></category>
            <category><![CDATA[data-visualization]]></category>
            <category><![CDATA[recommender-systems]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Mon, 12 May 2025 07:10:03 GMT</pubDate>
            <atom:updated>2025-05-12T07:11:21.566Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[[COURSE] Hidden Markov Models in Python (Unsupervised Machine Learning)]]></title>
            <link>https://medium.com/@lazyprogrammerofficial/course-hidden-markov-models-in-python-unsupervised-machine-learning-92453791c96c?source=rss-cd4d818fd98f------2</link>
            <guid isPermaLink="false">https://medium.com/p/92453791c96c</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[data-science-courses]]></category>
            <category><![CDATA[tensorflow]]></category>
            <dc:creator><![CDATA[Lazy Programmer]]></dc:creator>
            <pubDate>Fri, 13 Dec 2024 08:37:28 GMT</pubDate>
            <atom:updated>2024-12-13T08:37:28.143Z</atom:updated>
            <content:encoded><![CDATA[<figure><a href="https://deeplearningcourses.com/c/unsupervised-machine-learning-hidden-markov-models-in-python"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UIYmqjA3LCv7NKnq5zmzcg.jpeg" /></a><figcaption>Course image for HMMs in Python</figcaption></figure><p>Don’t want to read the whole article? Just get the course here: <a href="https://deeplearningcourses.com/c/unsupervised-machine-learning-hidden-markov-models-in-python/?utm_source=medium">https://deeplearningcourses.com/c/unsupervised-machine-learning-hidden-markov-models-in-python</a></p><p>Understanding sequences is at the heart of solving many real-world problems. From stock prices and credit scores to language and user behavior, sequences are everywhere. This course, <em>Unsupervised Machine Learning: Hidden Markov Models in Python</em>, equips you with the tools to analyze and model sequence data effectively using the power of Hidden Markov Models (HMMs).</p><h4>Why Learn Hidden Markov Models?</h4><p>Sequences contain invaluable information. Imagine reading this blog backward — the same words but in a different order. It wouldn’t make sense. That’s the essence of sequence analysis: understanding the importance of order. While today’s deep learning trend leans heavily on recurrent neural networks, Hidden Markov Models have been a cornerstone of sequence modeling for decades. 
This course bridges the gap between classical methods and modern innovations.</p><h4>What You Will Learn</h4><p>Building on the foundational concepts introduced in “Unsupervised Machine Learning for Cluster Analysis,” this course takes you deeper into probabilistic modeling:</p><ol><li><strong>Measure Sequence Probability</strong>: Learn how to calculate the probability distribution of sequences of random variables.</li><li><strong>Optimize HMM Parameters</strong>: Discover how gradient descent — the backbone of deep learning — can be applied to HMMs as an alternative to expectation-maximization.</li><li><strong>Hands-On Libraries</strong>: Work with powerful libraries like TensorFlow to implement HMMs, preparing you for future topics like recurrent neural networks and LSTMs.</li></ol><h4>Real-World Applications of HMMs</h4><p>This course is packed with practical examples that demonstrate the versatility of Hidden Markov Models:</p><ul><li><strong>Health Predictions</strong>: Model sickness and recovery to predict recovery durations.</li><li><strong>SEO Insights</strong>: Analyze user interactions on websites to identify and fix high-bounce-rate pages.</li><li><strong>Language Modeling</strong>: Build models to identify writers, generate text, or even automate writing tasks; such language models are the precursors to today’s powerful LLMs like ChatGPT, Llama, and Claude.</li><li><strong>Google’s PageRank</strong>: Explore how Markov models contribute to search engine algorithms.</li><li><strong>Biological Insights</strong>: Use HMMs to decode how DNA translates into physical or behavioral traits.</li><li><strong>Creative Applications</strong>: Generate images, improve smartphone suggestions, and more.</li></ul><h4>Learn by Doing</h4><p>This course is not about memorizing APIs or reading documentation. It’s about understanding models deeply and experimenting to see how they work internally. 
Through hands-on exercises using NumPy, Matplotlib, and TensorFlow, you’ll develop practical skills and intuition for sequence modeling.</p><p>All course materials are free to download, and you’ll have direct access to the instructor for support along your journey.</p><p>If you’re ready to go beyond the surface and truly understand machine learning models, this course is your next step. Join us and unlock the power of Hidden Markov Models to tackle complex sequence data.</p><h4>Enroll Today</h4><p>Take the leap into advanced sequence analysis. Whether you’re a data scientist, engineer, or machine learning enthusiast, this course will elevate your skillset and open doors to new possibilities. See you in class!</p><figure><a href="https://deeplearningcourses.com/c/unsupervised-machine-learning-hidden-markov-models-in-python"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xBeAM0muAtxaecNZL5zLyg.png" /></a></figure>]]></content:encoded>
        </item>
    </channel>
</rss>