Stories by Ali Nawaz on Medium

Making Financial Rules Portable (Without Losing Your Mind)

Ali Nawaz — Wed, 06 May 2026 18:30:34 GMT

Abstract: Banks waste millions rewriting the same business rules over and over. A trading rule starts in Excel, gets rebuilt in Java, then Python, then SQL. When something changes, teams update each version separately and often miss one. The result? Systems that don’t match, audits that fail, and nobody knowing which version is right.

Morphir fixes this. Write your financial rules once in a simple language called Elm. Morphir then creates versions for every system you need automatically. Same rule, different formats, zero manual copying. Morgan Stanley built it to solve their own mess and shared it with everyone through FINOS. This post explains what it does, why it matters, and whether you should care.

If you work in finance, you’ve probably seen this movie before: a business analyst writes the logic for a new trading rule in Excel. A developer rewrites it in Java. Another team needs the same rule in Python for their risk system. Six months later, nobody’s sure which version is “correct” anymore.

This isn’t just annoying. It’s expensive, error-prone, and sometimes dangerous when millions of dollars ride on calculations that should match but don’t.

Enter Morphir, a FINOS project that’s trying to solve a problem most people don’t even realize exists: making business logic portable across systems, languages, and teams without breaking everything in the process.

The Problem: Lost in Translation

Here’s a scenario that plays out daily in banks and financial institutions:

Your compliance team defines a new regulation rule. They understand the business side perfectly but can’t code. So they write it in plain English, maybe throw it in a Word doc or Excel spreadsheet.

The development team picks it up, interprets it, and codes it in Scala for the trading system. Then another team needs the same rule for reporting, so they rewrite it in SQL. The data science team wants it for modeling, so they build it again in Python.

Now you have four versions of the “same” rule. When the regulation changes, you update all four. Except someone misses the SQL version, and now your reports don’t match your trades. Auditors show up. Everyone panics.

This isn’t a hypothetical. This is Tuesday.

What Morphir Actually Does

Morphir takes a different approach. Instead of writing your business logic directly in Java, Python, or whatever language your system uses, you write it once in a language called Elm. Then Morphir translates it into whatever you need.

Think of it like writing a recipe once, then having it automatically translated into French, Spanish, and Japanese. Same recipe, different languages, no manual rewriting.

But here’s the clever bit: Elm isn’t just any language. It’s designed to be really hard to mess up. No null pointer exceptions. No runtime errors if you do it right. This matters when you’re dealing with financial calculations where a misplaced decimal can cost millions.

So the workflow looks like this:

Business analysts and developers collaborate to write the logic in Elm
Morphir takes that Elm code and generates an intermediate representation (IR)
From that IR, Morphir can generate code in Scala, TypeScript, JSON schemas, or even documentation
Everyone uses the generated code, and there’s only one source of truth

Why This Matters More Than You Think

Let’s talk about what this unlocks.

Same logic, different systems. You define a pricing model once. Morphir generates the Scala version for your backend, the TypeScript version for your web app, and the documentation for your compliance team. They all match because they came from the same source.

Version control for business rules. Since your logic lives in code (Elm), you can use Git. You can see who changed what, when, and why. You can roll back if something breaks. Try doing that with an Excel spreadsheet passed around in email.

Testing becomes possible. When your business logic is in Elm, you can write tests. Real tests. Not “let me manually check this in production” tests. You can verify that your interest calculation works correctly before deploying it to a system managing billions of dollars.

Talking the same language. Ever tried explaining a technical concept to a business analyst, or vice versa? Morphir creates a middle ground. The Elm code is readable enough that business folks can follow along, but rigorous enough that it compiles into production code.

The Real-World Use Case: Morgan Stanley

Morgan Stanley didn’t build Morphir because they thought it would be cool. They built it because they had a genuine problem.

They had business logic scattered across multiple systems and languages. Every time a rule changed, they had to hunt down every implementation and update them all. They missed things. Systems fell out of sync. It was a mess.

Morphir came out of that pain. They open-sourced it through FINOS because they realized other banks had the same problem. And they were right.

How It Actually Works (The Less Boring Technical Part)

Morphir uses something called an Intermediate Representation. You don’t need to understand the deep computer science here, but the idea is simple:

Instead of translating directly from Elm to Java, then Elm to Python, then Elm to whatever else you need (which means building a translator for every combination), Morphir translates Elm into a neutral format first. Then it translates that neutral format into your target language.

It’s like how Google Translate doesn’t have a direct translator for every language pair. It uses an intermediate step to make the whole thing manageable.

This IR is also queryable. You can ask it questions like “which functions use this data type?” or “where is this business rule used?” Try doing that with regular code.
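To make the hub-and-spoke idea concrete, here is a toy sketch in Python. It is not Morphir's actual IR or API: the node classes and the helpers `to_scala`, `to_typescript`, and `uses_variable` are hypothetical names invented for this illustration. It only shows the shape of the idea: one neutral tree, several small emitters, and a query that walks the same tree.

```python
from dataclasses import dataclass
from typing import Union

# A toy intermediate representation. NOT Morphir's real IR, just the idea.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Num:
    value: float

@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Var, Num, Mul]

def render(expr: Expr) -> str:
    """Render an expression; this syntax happens to be valid in both targets."""
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, Num):
        return repr(expr.value)
    return f"({render(expr.left)} * {render(expr.right)})"

def to_scala(name: str, params: list, body: Expr) -> str:
    args = ", ".join(f"{p}: Double" for p in params)
    return f"def {name}({args}): Double = {render(body)}"

def to_typescript(name: str, params: list, body: Expr) -> str:
    args = ", ".join(f"{p}: number" for p in params)
    return f"function {name}({args}): number {{ return {render(body)}; }}"

def uses_variable(expr: Expr, name: str) -> bool:
    """Query the IR: does this rule depend on a given input?"""
    if isinstance(expr, Var):
        return expr.name == name
    if isinstance(expr, Mul):
        return uses_variable(expr.left, name) or uses_variable(expr.right, name)
    return False

# One rule, defined once: fee = notional * rate * 0.5
fee = Mul(Mul(Var("notional"), Var("rate")), Num(0.5))

scala_src = to_scala("fee", ["notional", "rate"], fee)
ts_src = to_typescript("fee", ["notional", "rate"], fee)
print(scala_src)
print(ts_src)
print(uses_variable(fee, "rate"))
```

Adding a third target means writing one more emitter against the tree, not a new translator from every source language.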

The Catch (Because There’s Always a Catch)

Morphir isn’t a magic wand. You can’t just point it at your existing codebase and expect everything to work.

First, you need to rewrite your logic in Elm. That takes time. It takes learning a new language (though Elm is pretty friendly as languages go). For small rules, this might feel like overkill.

Second, Morphir works best for pure business logic — calculations, transformations, rules. If your code is tangled up with database calls, API requests, and UI rendering, you’ll need to separate the business logic first. That’s good practice anyway, but it’s still work.

Third, you’re betting on Elm. It’s not a mainstream language. If your team leaves and you need to hire replacements, finding Elm developers isn’t as easy as finding Java developers. Though the FINOS community helps here.

Where This Is Heading

FINOS is pushing Morphir toward some interesting places:

Regulation as Code. Imagine regulators publishing rules in Morphir format. Banks could directly import those rules instead of interpreting 200-page PDFs and hoping they got it right. The UK’s Financial Conduct Authority is already experimenting with this concept.

Cross-institution collaboration. If multiple banks agree on a standard way to calculate something (say, portfolio risk), they could share the Morphir definition. Everyone implements it the same way, reducing errors and making collaboration easier.

Better tooling. The Morphir team is building visual tools so business analysts can actually work with the models without writing code. Imagine dragging and dropping to build a financial rule, then having production-ready code generated automatically.

Should You Care?

If you’re building financial systems and you’ve ever had to maintain the same business logic in multiple places, yes.

If you’re tired of “wait, which version of this calculation is the right one?” conversations, yes.

If you think business analysts and developers should speak a common language instead of playing telephone, yes.

Morphir won’t solve every problem in financial software. But for the specific problem of keeping business logic consistent across systems, it’s one of the smarter approaches out there.

And unlike a lot of open source projects that feel like academic exercises, Morphir came from real pain at a real bank. That matters.

Resources to Dig Deeper

Official FINOS Morphir Resources:

Morphir GitHub Repository: Main codebase and documentation
Morphir Examples: Sample projects to understand how it works
FINOS Morphir Project Page: Overview and community info

Getting Started:

Morphir Elm SDK: The Elm tooling for building Morphir models
Morphir Documentation: Full docs and guides
Intro to Elm: Since you’ll need Elm basics to use Morphir

Related Reading:

Common Domain Model (CDM): Another FINOS project for standardizing financial data

This is part of a series on FINOS projects.
Previously: Building the Future of Open Source in Finance with FINOS and Connecting the Dots: How FDC3 is Making the Financial Desktop Smarter

jit, grad, and vmap in JAX — One simple idea behind all three: transforming functions

Ali Nawaz — Mon, 26 Jan 2026 12:54:43 GMT

If you’re new to JAX, you’ll keep seeing three words everywhere:

jit, grad, and vmap

People use them casually, like:

“Just jit it”

“Wrap it with vmap”

“Grad takes care of that”

And you’re left thinking:

“I know Python. I know NumPy. Why does JAX feel… different?”

This blog is here to clear that confusion without heavy math and without magic.

The problem JAX is designed to solve

Let’s start simple.

Python is easy to write, but it’s slow at runtime.

Accelerators like GPU and TPU are extremely fast, but they hate Python-style loops and dynamic behavior.

JAX sits in the middle and asks:

“What if we write normal Python functions…

but transform them into something fast, differentiable, and parallel?”

That’s where jit, grad, and vmap come in.

They don’t add features.

They transform your function.

Functions are the main abstraction in JAX

In JAX, you don’t write classes or graphs.

You write plain Python functions:

import jax.numpy as jnp

def loss(w, x):
    return jnp.sum(w * x)

Then JAX says:

Want gradients? Use grad
Want speed? Use jit
Want batching? Use vmap

Each of these takes a function and returns a new function.

That’s it.

No hidden state. No magic objects.
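A minimal sketch of that "function in, function out" idea, using the same loss function. The input values are made up for illustration.

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.sum(w * x)

w = jnp.array([1.0, 2.0, 3.0])
x = jnp.array([4.0, 5.0, 6.0])

dloss_dw = jax.grad(loss)     # new function: gradient with respect to w (the first argument)
fast_loss = jax.jit(loss)     # new function: same result, compiled on first call
mapped_loss = jax.vmap(loss)  # new function: applies loss to each (w[i], x[i]) pair

print(dloss_dw(w, x))         # the gradient of sum(w * x) with respect to w is x
print(fast_loss(w, x))        # 1*4 + 2*5 + 3*6
print(mapped_loss(w, x))      # one value per element
```

In all three cases the original loss is untouched; you simply get a new callable back.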

Understanding grad through simple examples

Think of grad as:

“Given a function, create another function that computes its derivative.”

Example:

from jax import grad

def f(x):
    return x ** 2

df_dx = grad(f)
df_dx(3.0)   # → 6.0

What happened?

The original function stays the same
grad(f) returns a new function
That new function computes how f changes with respect to x

Important thing to understand:

JAX traces your function, builds a computation graph internally, and applies automatic differentiation.

You never write derivative code yourself.
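Two options you will probably reach for early are `argnums` (which argument to differentiate) and `jax.value_and_grad` (value and gradient in one call). A small sketch, with a made-up function so the result can be checked by hand:

```python
import jax
import jax.numpy as jnp

def f(x, y):
    return x ** 2 * y + jnp.sin(y)

# Differentiate with respect to the second argument instead of the first.
df_dy = jax.grad(f, argnums=1)

# value_and_grad returns f(x, y) and df/dx together.
value, df_dx = jax.value_and_grad(f)(3.0, 2.0)

# Hand-derived for comparison: df/dx = 2*x*y and df/dy = x**2 + cos(y)
print(value, df_dx, df_dy(3.0, 2.0))
```

Note that grad expects floating-point inputs; passing a plain integer such as 3 raises an error.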

What actually happens when you use jit

jit stands for Just-In-Time compilation.

But practically, it means:

“Run this function once, understand it fully, then compile it to fast machine code.”

Example:

from jax import jit

@jit
def compute(x):
    return x * x + 2

The first run is slower:

JAX traces the function
Sends it to XLA
Compiles it

After that?

Runs as compiled XLA code on CPU, GPU, or TPU

Key idea:

jit doesn’t change what your function does
It changes how it runs
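One way to see the trace-then-reuse behavior is to put a Python side effect inside the function. This is only a demonstration trick (the counter name is made up); side effects inside jitted functions are normally something to avoid.

```python
import jax
import jax.numpy as jnp

trace_count = 0

@jax.jit
def compute(x):
    global trace_count
    trace_count += 1          # Python side effect: runs only while JAX traces
    return x * x + 2

compute(jnp.ones(4))          # first call: trace + compile
compute(jnp.zeros(4))         # same shape and dtype: reuses the compiled program
traces_same_shape = trace_count

compute(jnp.ones(8))          # new shape: traced and compiled again
traces_new_shape = trace_count

print(traces_same_shape, traces_new_shape)
```

The counter moves only when JAX has to trace, which is also why constantly changing input shapes is expensive.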

How vmap removes the need for manual batching

Normally, you write loops like this:

results = []
for x in batch:
    results.append(f(x))

Loops are slow and messy.

vmap says:

“This function works for one input.

I’ll automatically make it work for a batch.”

Example:

import jax.numpy as jnp
from jax import vmap

def f(x):
    return x ** 2

batched_f = vmap(f)
batched_f(jnp.array([1, 2, 3]))  # → [1, 4, 9]

No loops.
No manual batching.

Internally, JAX turns this into a single vectorized computation.
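Real functions usually have some arguments that should be batched and some that should not. The `in_axes` argument controls that. A short sketch with a made-up `predict` function:

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    # Written for ONE example: w and x are both vectors of shape (3,)
    return jnp.dot(w, x)

w = jnp.array([1.0, 0.0, -1.0])
batch = jnp.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])   # two examples, shape (2, 3)

# in_axes=(None, 0): keep w fixed, map over axis 0 of the batch
batched_predict = jax.vmap(predict, in_axes=(None, 0))
out = batched_predict(w, batch)
print(out)   # one prediction per row of the batch
```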

Using multiple transformations together

Here’s where JAX becomes special.

You can stack transformations:

fast_grad = jit(grad(f))

This means:
Take f, create its gradient, then compile the gradient itself.

Or:

batched_grad = vmap(grad(f))

Now you have gradients for a whole batch without writing a loop.

This composability is the core design win of JAX.
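As a sketch of how far stacking goes, here are per-example gradients for a tiny squared-error loss, compiled as one program. The loss and the data are invented for illustration.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Squared error for ONE example
    return (jnp.dot(w, x) - y) ** 2

w = jnp.array([1.0, 1.0])
xs = jnp.array([[1.0, 2.0],
                [3.0, 4.0]])
ys = jnp.array([1.0, 2.0])

# grad: d(loss)/dw for one example
# vmap: run that gradient function across the batch (w shared, xs and ys mapped)
# jit:  compile the whole pipeline
per_example_grads = jax.jit(jax.vmap(jax.grad(loss), in_axes=(None, 0, 0)))
grads = per_example_grads(w, xs, ys)   # shape (2, 2): one gradient per example
print(grads)
```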

Common mistakes when using jit, grad, and vmap

Many issues people face with JAX come from incorrect expectations rather than bugs in code.

Assuming jit will speed up Python logic
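Under jit, Python if statements and loops run only once, while the function is being traced. Loops are unrolled into the compiled program, and an if that depends on an input value raises an error. For value-dependent branching or looping, use jax.lax.cond, jax.lax.fori_loop, or jnp.where.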

Frequently changing input shapes
When shapes change, JAX has to recompile the function. This removes most of the performance benefit of using jit.

Using randomness without managing PRNG keys
JAX does not hide randomness. You must pass and update PRNG keys explicitly, otherwise results may be incorrect or confusing.

Treating jit, grad, and vmap as simple utilities
These are not helper functions. Each one rewrites how your function behaves internally.
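The first mistake is the one most people hit, so here is a small sketch of it. The function names are made up; `jax.lax.cond` is the real JAX primitive for branching inside compiled code.

```python
import jax
import jax.numpy as jnp

@jax.jit
def bad_abs(x):
    if x < 0:                 # x is a tracer here, so Python cannot branch on it
        return -x
    return x

@jax.jit
def good_abs(x):
    # lax.cond keeps the branch inside the compiled program
    return jax.lax.cond(x < 0, lambda v: -v, lambda v: v, x)

try:
    bad_abs(jnp.array(-3.0))
    failed = False
except jax.errors.ConcretizationTypeError:
    failed = True             # JAX refuses to turn a traced value into a Python bool

result = good_abs(jnp.array(-3.0))
print(failed, result)
```

For simple element-wise choices, `jnp.where(x < 0, -x, x)` is often the shorter option.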

Key ideas to remember

grad gives you automatic differentiation
jit gives you speed through compilation
vmap gives you batching without loops

But the most important idea is this:

JAX is not a traditional framework.

It’s a function transformation system.

Once this clicks, everything else (Flax, Optax, and even TPUs) starts making sense.

What’s next

In the next post, we’ll focus on how randomness works in JAX and why it looks so different from NumPy or Python.

We’ll break down what PRNG keys are, why you have to pass them around explicitly, and how this design helps JAX stay reproducible, parallel-friendly, and pure.

By the end, you’ll understand how JAX generates random numbers without hidden state, and why this approach matters once you start using jit, vmap, and accelerators.

What the Compact API in Flax Really Does: The Design Choice That Keeps Parameter Creation Functional

Ali Nawaz — Sat, 27 Dec 2025 09:38:32 GMT

Why this topic matters

When you first see the compact API in Flax, it feels confusing.

You see layers being created inside the forward function.
You see parameters appearing where computation should happen.

If you care about functional programming, this feels wrong.

JAX teaches us that functions should be pure.
So how can Flax allow parameter creation inside the forward pass and still stay functional?

This blog exists to answer that question slowly and clearly.

The core confusion beginners face

In most deep learning libraries, parameter creation and computation are separated.

You define layers first.
Then you run data through them.

But with the compact API in Flax, you often see code like this.

Inside the forward function, layers appear directly.
Dense layers.
Convolutions.
Normalization.

It looks like parameters are being created every time the function runs.

But they are not.

Understanding why is the key to understanding the compact API.

Step 1: Compact is about permission, not behavior

The most important idea is this.

The compact API does not change what your function does.
It changes when Flax is allowed to create parameters.

By marking a function as compact, you are telling Flax:

You may create parameters here, but only during initialization.

That permission is tightly controlled.

Outside initialization, parameter creation is completely disabled.

Step 2: Two runs of the same function

Every Flax model runs in two very different modes.

Initialization mode
Application mode

During initialization:

The forward function runs once.
Flax watches it carefully.
Whenever it sees a layer that needs parameters, it creates them.
All parameters are stored outside the function in a parameter tree.

During application:

The same function runs again.
But now parameters already exist.
Nothing new is created.
Everything is reused.

The code looks the same.
The behavior is not.

Step 3: Why this does not break functional programming

This is the subtle but powerful design choice.

The function itself never stores parameters.
It never mutates state.
It never remembers anything.

All parameters live outside the function and are passed in explicitly.

So from JAX’s point of view, the model is still just a function.

Give it parameters and inputs.
Get outputs back.

That is why Flax works perfectly with JAX transformations like jit and grad.

Step 4: A simple mental model

Think of a factory inspection.

The first time, inspectors walk through the building.
They write down where machines should exist.
They note their shapes and requirements.

After inspection, the factory is fixed.

Every production run after that uses the same machines.
No new machines are added.

The compact API is that inspection step.

Once it is done, everything becomes stable and predictable.

Step 5: Why Flax introduced the compact API

The original setup style in Flax is explicit and clear.

But it can feel split.

Structure lives in one place.
Computation lives in another.

The compact API solves this by allowing you to describe structure exactly where computation happens.

You read the model top to bottom.
You see data flow clearly.
You understand the model without jumping between methods.

It improves readability without sacrificing correctness.

Step 6: What compact is not doing

It is important to say what the compact API does not do.

It does not recreate parameters every forward pass.
It does not store state inside the module.
It does not make the model object oriented in a traditional sense.

Everything still follows the functional rules strictly.

Common beginner mistakes

Mistake one: Thinking compact layers run differently
They do not.

Mistake two: Mixing setup and compact styles in the same module
This leads to confusion.

Mistake three: Forgetting that init and apply behave differently
Most bugs come from this misunderstanding.

Mistake four: Assuming compact is a shortcut or hack
It is a deliberate design choice.

Final takeaway

The compact API is not magic.

It is a carefully designed permission system.

It allows parameters to be created during initialization.
It forbids parameter creation during execution.
It keeps the model functional at all times.

Once you understand that separation, the compact API stops feeling strange and starts feeling natural.

What’s Next

In the next post, we will look at how Flax tracks parameters across nested modules.

We will explore how names are assigned, how parameter trees are built, and how Flax ensures the right parameters are reused at the right time without relying on hidden state.

This will complete the picture of how Flax manages complexity while staying fully functional.

How Flax Modules Work Internally: The Design Choice That Keeps JAX Functional

Ali Nawaz — Thu, 25 Dec 2025 06:24:08 GMT

Why this topic matters

When people start using Flax with JAX, one thing feels strange very quickly.

You write a neural network class, but there is no state inside it.
There is no self.weights being updated.
There is no hidden magic happening.

Yet everything works.

This can feel confusing if you come from PyTorch or TensorFlow.

Understanding how Flax modules work internally removes this confusion.
It helps you debug better, write cleaner models, and stop treating Flax like black magic.

Let us break it down slowly.

The big idea in one sentence

A Flax module is just a pure function description that knows how to create parameters and how to use them, but it never owns them.

That sentence will make sense by the end.

Step 1: Forget what a class usually means

In most deep learning frameworks, a model class behaves like this:

The class stores weights
The class updates weights
The class remembers state

Flax does not work like that.

In Flax:

The module describes structure
The parameters live outside the module
The data flows in and out

Think of a Flax module as a recipe, not a container.

A recipe explains how to cook.
It does not store the ingredients.

Step 2: What a Flax Module really is

A Flax module answers two questions.

What parameters do I need
How do I compute outputs using those parameters

That is all.

Internally, a module has:
A setup phase where submodules are defined
A call function that describes computation

But no actual parameter values live inside it.

This is the key mental shift.

Step 3: Where do parameters come from then?

Parameters are created during initialization.

When you call something like:

model.init(random_key, input)

Flax does this internally:

Walk through the module tree
Look at each layer
Ask each layer what parameters it needs
Create arrays using the random key
Store everything in a separate dictionary

The result is a parameter tree, not a model object with weights.

You now have two things:

The module which is just structure
The parameters which are just data

They are separate on purpose.

Step 4: What happens during a forward pass

When you call:

model.apply(params, input)

Here is what really happens:

The module code runs like a normal function
Whenever it needs a weight, it looks it up in params
It performs computation
It returns output

Nothing is stored.
Nothing is mutated.

This is why Flax works so well with JAX transformations like jit and grad.

Pure functions are easy to optimize.

Step 5: A simple mental model

Imagine a calculator.

The calculator body has buttons and logic.
The numbers you type are inputs.

Now imagine the calculator does not remember any numbers after the operation.

Each time:

You give it numbers
It computes
It gives output
Then it forgets everything

That is how Flax modules behave.

The calculator is the module.
The numbers are parameters and inputs.

Step 6: Why setup exists

You may wonder why Flax has a setup function.

Setup is just a place to define structure.

For example:

This module contains two dense layers
This module contains a convolution and normalization

No weights are created here.
Only relationships are defined.

During init, Flax reads this structure and creates parameters accordingly.

Step 7: Why this design is powerful

This design gives you three big advantages.

First, clarity.
You always know where state lives.

Second, safety.
No accidental hidden mutations.

Third, compatibility.
JAX transformations work naturally.

Once you accept that modules do not own parameters, everything becomes simpler.

Common beginner mistakes

Mistake 1: Expecting parameters inside self
They are never there.

Mistake 2: Thinking init runs the model normally
Init is about discovering parameters, not training.

Mistake 3: Treating apply as stateful
Apply is just a function call.

Mistake 4: Trying to modify parameters inside the module
Always return new parameters instead.

Final takeaway

Flax modules are not containers.
They are blueprints.

They describe how to create parameters.
They describe how to use parameters.
But they never store them.

Once you see a Flax module as a pure function plus structure, the confusion disappears.

That mental model will save you hours of frustration as you go deeper into JAX and Flax.

What’s Next

In the next post, we will focus on the Compact API in Flax.

We will explain what @compact means, why it exists, and how Flax allows parameter creation inside the forward pass without turning your model into a stateful object.

How JAX Compiles Your Code: The Secret Relationship Between JAX and XLA

Ali Nawaz — Sun, 07 Dec 2025 08:02:15 GMT

In the previous post, we explored why JAX thinks differently, why it separates parameters from computation, and how its functional mindset gives you more control than object-oriented frameworks. But there is still one major question left unanswered.

If JAX is just NumPy with magical transformations, how does it suddenly become so fast on GPUs and TPUs?

To understand that, we need to uncover how JAX compiles your Python code. And the moment you see this clearly, the entire design of JAX makes perfect sense.

Let’s break this down with simple analogies, clear intuition, and real code.

The Big Analogy: JAX as an Architect, XLA as the Construction Team

Imagine you design a building on paper. You draw the blueprint. That blueprint is your Python function.

But the blueprint alone cannot build a skyscraper.
You need a construction team that knows how to pour concrete, lift steel, and assemble everything using heavy machinery.

In JAX, the architect is JAX itself.
The construction team is XLA.

You write the instructions at a high level.
JAX analyzes your blueprint.
XLA takes that blueprint and builds the fastest possible version of it for your hardware.

This partnership is the reason JAX feels different from every other framework.

What Actually Happens When You Use jit

Let’s take a simple function.

import jax
import jax.numpy as jnp

def compute(x):
    return jnp.sin(x) + jnp.cos(x)

If you call the function normally, JAX executes it operation by operation.

But when you wrap it with jit:

fast_compute = jax.jit(compute)

Nothing is compiled yet. But the first time you call fast_compute, JAX does something entirely different.
Instead of executing your code operation by operation, it traces your function.

Tracing is like JAX walking through your function slowly, collecting the mathematical steps you wrote.
It does not run them.
It records them.

Once JAX collects those steps, it hands them over to XLA, and XLA starts building.

It fuses operations together.
It rearranges them for maximum parallelism.
It removes unnecessary steps.
It compiles everything into one optimized accelerator program.
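You can look at what tracing recorded. The sketch below prints the traced program (a jaxpr) and grabs the lowered text that is handed to XLA. The exact text format varies between JAX versions, so treat the output as something to read, not to parse.

```python
import jax
import jax.numpy as jnp

def compute(x):
    return jnp.sin(x) + jnp.cos(x)

x = jnp.ones(4)

# The traced program: a list of primitive operations, with no Python left in it.
jaxpr = jax.make_jaxpr(compute)(x)
print(jaxpr)

# The lowered form that goes to the XLA compiler.
lowered_text = jax.jit(compute).lower(x).as_text()
print(lowered_text[:200])
```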

Then, when you finally call:

y = fast_compute(jnp.ones(1000000))

You are not running Python anymore.
You are running optimized machine code on your CPU, GPU, or TPU.

Why GPUs Love XLA’s Style of Execution

GPUs do not like receiving small, separate tasks.
They want one big package of work to run in parallel.

In many frameworks, every operation becomes a separate GPU call. This causes overhead. It is like asking a construction team to build your house brick by brick instead of giving them full walls.

XLA avoids much of this overhead.
Where it can, it merges your operations into a single fused kernel.

So instead of sending:
Compute sin
Compute cos
Add sin and cos

XLA sends:
Compute sin(x) + cos(x) in one fused operation
This fusion is the secret behind JAX’s speed.

TPU Compilation: Why JAX Fits TPUs Naturally

TPUs cannot interpret Python at all.
Everything must be compiled into a TPU-compatible program before execution.

This is where JAX and XLA shine.
Since JAX functions are pure and stateless, they are perfectly suited for compilation.

A JAX function that works on CPU will work on GPU.
The same function will work on TPU.
Nothing needs to be rewritten.

Let’s test this idea in code.

def forward(x):
    return jnp.tanh(x * 3.0)

compiled = jax.jit(forward)

Whether forward runs on CPU, GPU, or TPU depends only on the device. The function itself never changes.

This is why Google researchers often use JAX on TPUs. The compilation pipeline is stable, predictable, and incredibly efficient.
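A portable sketch of the same idea: ask JAX which backend it found, place the input on the first device of that backend, and run the unchanged function. On a laptop this will most likely report the CPU.

```python
import jax
import jax.numpy as jnp

def forward(x):
    return jnp.tanh(x * 3.0)

compiled = jax.jit(forward)

backend = jax.default_backend()    # "cpu", "gpu", or "tpu", depending on the machine
device = jax.devices()[0]          # first device of that backend

x = jax.device_put(jnp.ones(5), device)
y = compiled(x)                    # same source code regardless of backend
print(backend, y.devices())
```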

Device Placement: Moving Work to a GPU or TPU

JAX makes it easy to send work to a GPU explicitly.

gpu = jax.devices("gpu")[0]  # raises RuntimeError if no GPU is available

def func(x):
    return jnp.sqrt(x)

compiled = jax.jit(func)
x = jax.device_put(jnp.ones((1000, 1000)), gpu)  # place the input on the GPU
y = compiled(x).block_until_ready()

A jitted function runs on the device where its inputs live.
If you do not place the inputs yourself, JAX uses its default device: the first GPU or TPU if one is available, otherwise the CPU.

You can confirm the device by printing:

print(y.devices())

It will show something like this (the exact device name depends on your JAX version):

{CudaDevice(id=0)}

This confirms that the function was compiled for and executed on the GPU.

Why This Compilation Approach Makes JAX Unique

Most frameworks interpret operations eagerly.
JAX compiles entire functions.
Most frameworks keep internal state hidden inside objects.
JAX keeps everything explicit and pure.
Most frameworks send many small ops to accelerators.
XLA fuses them into large kernels.

This is why JAX feels different.
This is why JAX feels fast.
This is why JAX scales to extremely large models with fewer surprises.

The functional mindset is not just a programming style.
It is the key that unlocks compilation.
And compilation is the key that unlocks speed.

What’s Next

In the next post, we will explore how Flax modules work internally, why they look object-oriented even though JAX is functional, how the compact API actually works, and how Flax manages parameters under the hood without breaking the functional model.

Why JAX Thinks Differently: The Functional Mindset Behind its Power

Ali Nawaz — Fri, 05 Dec 2025 17:42:41 GMT

In the previous post, we explored how JAX compares to TensorFlow and PyTorch, and why its speed-focused design feels so different. But one question still remains unanswered:

Why does JAX feel so unusual when you first write code?

The reason is simple. JAX follows a functional programming mindset, not an object-oriented one. Understanding this mindset changes everything. It explains why model parameters must be passed separately, why randomness requires explicit keys, and why JAX feels so clean once it “clicks.”

Let’s break down this mindset with simple analogies, clear intuition, and code.

The Big Analogy: Blueprint vs Machine

Think of building a machine in two different ways.

In the first style, you create one big machine object. It holds its internal parts, remembers its state, and changes itself as it works. This is how PyTorch models behave. The object contains its own weights and updates them.

In the second style, you keep the blueprint separate from the building materials. The blueprint never changes. Instead, you pass different sets of materials whenever you want to build something.

This is how JAX thinks.
The model is the blueprint.
The parameters are separate.
The function is pure.

When these two are separate, something powerful happens. JAX can analyze the function mathematically. It can optimize it. It can differentiate it. It can compile it. And it can parallelize it without worrying about hidden state.

This separation is the heart of the functional mindset.

How This Looks in Real Code

Let’s recreate a tiny model in PyTorch style and JAX style to see the difference.

PyTorch keeps everything inside one object.

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

model = Model()
x = torch.randn(5, 10)
output = model(x)

The model holds its own weights and uses them internally.

Now look at JAX with Flax.

import jax
import jax.numpy as jnp
from flax import linen as nn

class Model(nn.Module):
    @nn.compact
    def __call__(self, x):
        return nn.Dense(1)(x)

model = Model()

key = jax.random.PRNGKey(0)
dummy = jnp.ones((1, 10))
params = model.init(key, dummy)['params']

x = jnp.ones((5, 10))
output = model.apply({'params': params}, x)

JAX forces you to handle parameters explicitly. It may feel unfamiliar at first, but it gives you direct control over where parameters live and how they change.

Why JAX Uses Explicit Random Keys

Another thing beginners find strange is how JAX handles randomness. Instead of simply calling a random function, JAX requires you to pass an explicit PRNGKey every time.

It looks like this:

key = jax.random.PRNGKey(42)
x = jax.random.normal(key, (3,))

Why does JAX do this?

Let’s understand with an analogy.

Imagine you have a robot that generates random numbers. In most frameworks, you ask the robot for a random number and it uses some hidden state internally to give you one.

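But in JAX, the robot does not keep any hidden state. You must hand it a “token” (the key) to use for randomness. The same key always produces the same number, so there are two separate operations:

Sampling: give the robot a key and get a random number back
Splitting: turn one key into fresh keys to use next time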

No hidden state. No surprises.

This is what makes randomness reproducible and traceable, even inside jit, vmap, grad, and across multiple devices.
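A quick sketch of what "reproducible" means here. The `sample` function is a made-up helper; the point is that the key alone determines the output, with or without jit.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(42)

a = jax.random.normal(key, (3,))
b = jax.random.normal(key, (3,))     # same key, same shape: identical numbers

@jax.jit
def sample(key):
    return jax.random.normal(key, (3,))

c = sample(key)                      # the compiled version agrees with the eager one

# A different key gives different numbers.
other_key, _ = jax.random.split(key)
d = jax.random.normal(other_key, (3,))
print(a, d)
```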

Splitting Keys: The Part People Forget

JAX will not stop you from using the same key twice, but you will get the same numbers twice.
Always split your key before using it again.

Example:

key = jax.random.PRNGKey(0)

key, subkey = jax.random.split(key)
a = jax.random.normal(subkey)

key, subkey = jax.random.split(key)
b = jax.random.uniform(subkey)

Each split creates a fresh, independent source of randomness.

This is why JAX code looks so clean inside large distributed systems: all randomness is explicit, controlled, and reproducible.
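When you need many independent draws at once, jax.random.split can also hand back several keys in one call. The sketch below assumes four hypothetical "workers" that each need their own key, and uses jax.vmap to run one draw per key.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)

# Split once into five keys: keep the first for later, give the rest away.
keys = jax.random.split(key, 5)
key, subkeys = keys[0], keys[1:]

# Each subkey drives its own independent draw of 3 numbers.
def draw(k):
    return jax.random.normal(k, (3,))

samples = jax.vmap(draw)(subkeys)
print(samples.shape)  # (4, 3)
```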

Why Functional Style + PRNG Makes JAX So Powerful

Let’s create a simple example that uses both ideas together.

We will write a function that initializes random weights and processes data.

def forward(params, x):
    w, b = params
    return w * x + b

def init_params(key):
    key_w, key_b = jax.random.split(key)
    w = jax.random.normal(key_w)
    b = jax.random.normal(key_b)
    return w, b

key = jax.random.PRNGKey(0)
params = init_params(key)

x = jnp.array(2.0)
output = forward(params, x)

Here is what is happening.

The function forward is pure. It has no hidden state.
All randomness is explicit in init_params.
The parameters are separate and visible.

Now JAX can do amazing things with this code.

It can jit-compile it.
It can differentiate it.
It can vectorize it.
It can run it across multiple GPUs.

All because the function is pure and stateless.

No hidden state means complete freedom to optimize.
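Here is a small sketch of those transformations applied to the same forward function. The parameter values are fixed by hand (instead of random) so the results are easy to check; multi-GPU execution is left out because it depends on your hardware.

```python
import jax
import jax.numpy as jnp

def forward(params, x):
    w, b = params
    return w * x + b

params = (jnp.array(3.0), jnp.array(1.0))  # hand-picked values for illustration
x = jnp.array(2.0)

# jit: compile the function
fast_forward = jax.jit(forward)
y = fast_forward(params, x)            # 3 * 2 + 1

# grad: derivative of the output with respect to params (the first argument)
dw, db = jax.grad(forward)(params, x)  # d/dw = x, d/db = 1

# vmap: run over a batch of inputs while sharing the same params
xs = jnp.array([0.0, 1.0, 2.0])
ys = jax.vmap(forward, in_axes=(None, 0))(params, xs)

print(y, dw, db, ys)
```

None of the three calls needed the function to be rewritten, which is the whole point of keeping it pure.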

How This Helps You in Real Projects

This mindset makes JAX especially good for:

Research, where you need complete control
Physics simulations, where equations must be exact
Distributed training, where hidden state becomes messy
Writing your own custom layers or optimizers
Building very large models where structure matters

JAX may feel strict at first, but once you adapt, the clarity becomes addictive.
You always know where your parameters are.
You always know where your randomness comes from.
You always know how your function behaves.

There is no hidden magic. Everything is explicit.

And this explicitness is what allows JAX to be so powerful.

What’s Next

In the next post, I will explain How JAX Compiles Your Code, explore the secret relationship between JAX and XLA, and show how your Python functions are transformed into highly optimized accelerator programs.

JAX vs. TensorFlow vs. PyTorch: A Deeper Look for Beginners

Ali Nawaz — Sun, 02 Nov 2025 06:08:50 GMT

In the previous post, we walked through “Why JAX Exists? The Need for Speed and Simplicity in ML.”

Now that we know why JAX was created, it leads to the most important question for any beginner: “This is all great, but… which one should I actually learn?”

It’s a confusing choice. You see JAX, TensorFlow, and PyTorch everywhere. But here’s the simple truth: there is no single “best” framework. There is only the best framework for your specific goal.

Let’s ditch the high-level talk and go in-depth to compare them with simple analogies and code.

The Big Analogy: Choosing Your Vehicle

Think of these three frameworks as different types of vehicles, each built for a different job.

TensorFlow is the Industrial Truck:

It’s a heavy-duty, industrial powerhouse. It’s designed to handle massive, end-to-end projects.
It has a giant ecosystem to help you not just build a model, but also check your data, “serve” your model in the cloud, and even run it on tiny mobile phones (using TFLite) or in a web browser (using TF.js). It’s built for production and reliability.

PyTorch is the All-Terrain SUV:

This is the flexible, popular, all-around choice. It’s famous for being “Pythonic,” which means it feels natural and easy to use for Python programmers.
It’s fantastic for “off-roading” in research because it’s so easy to build and change new, creative ideas. It has a huge community and is now also very powerful in production.

JAX is the Formula 1 Engine:

JAX is a specialized machine built for one thing above all else: pure, mind-blowing speed.
It’s not a complete “vehicle” like the others. It’s more like a revolutionary engine. It doesn’t come with all the “comforts” like built-in data loaders or model-serving tools. It gives you the parts: a speed booster (jit), a slope-finder (grad), and a batch-processor (vmap).

It’s up to you, the skilled driver, to build the car around it.

How They “Feel”: A Simple Code Comparison

The best way to understand the difference is to see how they “feel” when you write code. Let’s build the same simple model in all three.

PyTorch: The “Pythonic” Object

PyTorch is famous for being “object-oriented.” This means you build a model by creating a Python class. It feels very intuitive. You create a "blueprint" (the class), define the "parts" (the layers) in the __init__ setup, and then define how data flows through them in the forward function.

import torch
import torch.nn as nn

# 1. Define the model's "blueprint" as a class
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        # 2. Define the parts (layers)
        self.layer1 = nn.Linear(in_features=10, out_features=50)
        self.activation = nn.ReLU()
        self.layer2 = nn.Linear(in_features=50, out_features=1)

    # 3. Define how data flows through the parts
    def forward(self, x):
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        return x

# 4. Create the model and use it!
model = SimpleModel()
data = torch.randn(64, 10) # A batch of 64 samples
output = model(data)

The Feel: This is easy to read. The model is a "thing" that holds its own parts (layers) and its own internal state (the weights).

TensorFlow: The “Lego Stack” (with Keras)

TensorFlow’s high-level API, Keras, is famous for its simplicity.

For most models, you don’t even need to write a class. You can just stack layers together in a list, like building with Legos.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1. Define the model by stacking layers in a list
model = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(units=50),
    layers.Activation('relu'),
    layers.Dense(units=1)
])

# 2. Use the model!
data = tf.random.normal(shape=(64, 10))
output = model(data)

The Feel: This is incredibly fast and simple for building standard models. (For more complex models, you can also write it as a class, just like in PyTorch).

JAX: The “Pure Math Function”

JAX is different. It is “functional.” This is the most important concept to understand.

A JAX model (often built with a library like Flax) is not an “object” that holds its own weights. Instead, the model is just a “pure function,” like a math recipe. You must keep the model’s weights (parameters) separate from the model’s logic (the function).

import jax
import jax.numpy as jnp
from flax import linen as nn  # Flax is the popular neural net library for JAX

# 1. Define the model's "blueprint" (looks similar to PyTorch)
class SimpleModel(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(features=50)(x)
        x = nn.relu(x)
        x = nn.Dense(features=1)(x)
        return x

# THIS is where it gets different!
model = SimpleModel()

# 2. You MUST initialize the model to get the parameters (weights)
key = jax.random.PRNGKey(0)  # JAX needs an explicit "key" for randomness
init_key, data_key = jax.random.split(key)  # one key per use, never reuse a key
dummy_data = jnp.ones(shape=(10,))
params = model.init(init_key, dummy_data)['params']

# 3. To run the model, you pass the `params` AND the `data`
data = jax.random.normal(data_key, shape=(64, 10))
output = model.apply({'params': params}, data)

The Feel: This seems like extra work, right? We had to call model.init to get the params, then pass the params back in with model.apply. But this "functional" design is the secret to JAX's power. Because the function and its weights are separate, JAX can easily jit (speed up), grad (get the slope of), and vmap (batch) it.
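To see that there is nothing magical behind the init-then-apply pattern, here is the same two-layer network written in plain JAX. This is a sketch, not how Flax is implemented: init_mlp and apply_mlp are made-up names, and the scaling of the initial weights is an assumption chosen for illustration.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    # One (weights, bias) pair per layer, each drawn from its own subkey.
    keys = jax.random.split(key, len(sizes) - 1)
    return [
        (jax.random.normal(k, (n_in, n_out)) * jnp.sqrt(2.0 / n_in), jnp.zeros(n_out))
        for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:])
    ]

def apply_mlp(params, x):
    # Pure function: the output depends only on params and x.
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b

init_key, data_key = jax.random.split(jax.random.PRNGKey(0))
params = init_mlp(init_key, [10, 50, 1])
data = jax.random.normal(data_key, (64, 10))

output = jax.jit(apply_mlp)(params, data)
print(output.shape)  # (64, 1)
```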

In-Depth Breakdown: What Matters for a Beginner

Let’s go deeper than just code. What do these differences mean for you?

Community & Ecosystem

This is maybe the most important factor for a beginner.

PyTorch: Has the strongest research community. A large share of new AI papers from top labs release their code in PyTorch. The community is massive, the tutorials are excellent, and libraries like Hugging Face Transformers (the king of NLP) are PyTorch-first.

TensorFlow: Has the strongest production ecosystem. Its community is huge and backed by Google. You will find more tools for the full lifecycle: deploying to a server (TensorFlow Serving), running on a phone (TFLite), and building end-to-end production pipelines (TFX).

JAX: Has a smaller, more advanced community. You’ll find brilliant people and amazing tools (like Flax, Optax, and Haiku), but you are expected to be more independent. You are the “mechanic” who has to find and assemble the parts (a library for data, a library for optimization, etc.).

Debugging (Fixing Your Errors)

This is where beginners get the most frustrated.

PyTorch: Easiest to debug. It runs in “eager mode,” which means it runs just like your normal Python script, one line at a time. If there is an error on line 10, it stops on line 10 and tells you. You can print values anywhere to see what’s happening.

TensorFlow: Also very good. With Keras and Eager Execution, it’s just as easy to debug as PyTorch for most tasks.

JAX: The hardest to debug. This is the trade-off for its speed. When you use jax.jit, JAX "compiles" your function into a super-fast version. If there's an error, it might happen inside that compiled code. The error messages can be very long and confusing until you get used to them.
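Two tools take a lot of the pain out of debugging jitted code: jax.debug.print, which prints real values from inside a compiled function, and jax.disable_jit, which temporarily runs everything step by step. The function normalize below is a made-up example for illustration.

```python
import jax
import jax.numpy as jnp

@jax.jit
def normalize(x):
    total = jnp.sum(x)
    # A plain print() here would only show an abstract tracer during compilation;
    # jax.debug.print shows the runtime value instead.
    jax.debug.print("total = {t}", t=total)
    return x / total

x = jnp.array([1.0, 2.0, 5.0])
result = normalize(x)

# For step-by-step debugging, temporarily switch compilation off.
with jax.disable_jit():
    eager_result = normalize(x)
```

Inside the disable_jit block the function behaves like ordinary eager code, so regular print calls and a debugger work as usual.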

My Advice: Which One Should You Learn?

Here is my simple, direct advice based on your goals.

Learn PyTorch if…

You are an absolute beginner and this is your first ML framework.
You want the easiest learning curve and the most tutorials.
You are interested in Natural Language Processing (NLP) or using the latest models from Hugging Face.
You want to read and understand new research papers.

Bottom Line: For most beginners today, PyTorch is the best and safest place to start.

Learn TensorFlow if…

Your main goal is getting a model into a real-world product.
You are specifically interested in deploying models on mobile phones (iOS/Android) or in a web browser.
You work at a company that already has a large, established TensorFlow codebase.
You want an “all-in-one” toolkit that handles everything from data validation to deployment.

Bottom Line: Learn TensorFlow if your job or project is focused on production and deployment.

Learn JAX if…

You are NOT a beginner. You should already be comfortable with ML concepts and either PyTorch or TensorFlow.
You have a strong love for math and want to build things from scratch.
Your number one need is raw performance (e.g., training huge models on TPUs).
You are a researcher in physics, biology, or advanced AI and need to write custom, high-speed math that isn’t just a standard model.

Bottom Line: Learn JAX after you are comfortable with another framework and you find yourself needing more speed and flexibility than it can offer.

What’s Next:

Up next, I will guide you step by step on how to install and configure JAX on Google Colab and on your own computer. I will share the exact commands, explain CPU and GPU builds, and help you avoid the common setup issues people run into.

By the end, you will be ready to run JAX code smoothly and start experimenting.

Why JAX Exists? The Need for Speed and Simplicity in ML

Ali Nawaz — Sat, 01 Nov 2025 06:14:34 GMT

In the previous post, we walked through “Your First Guide to JAX: NumPy with Superpowers” and got comfortable with JAX basics.

This usually leads to a great question: “But… why? Don’t we already have TensorFlow and PyTorch?”

It’s a smart question. TensorFlow and PyTorch are fantastic, powerful, and used by millions. So why did Google, the creator of TensorFlow, build another library?

The answer is that JAX was built to solve a very specific problem that researchers were facing. It fills a gap that existed between easy, familiar code (like NumPy) and high-performance ML (like TensorFlow).

To understand this, let’s imagine the world of data science is split into two “kingdoms.”

Kingdom 1: The “NumPy” Kingdom (Easy & Familiar)

This is the world of traditional science and data analysis.

Who lives here? Data scientists, financial analysts, physicists, biologists, and most Python programmers.

What’s it like? It’s simple and beautiful! The main tool is NumPy. You write simple, clean Python code. You have an idea, you write a function, and it just works.

Why people love it:

clean and readable code
fast to prototype ideas
easy to understand and teach
feels like regular Python

import numpy as np

def square(x):
    return x * x

print(square(5))

The Problem: This kingdom is not built for modern, large-scale AI.

It’s “Slow”: NumPy code runs on your CPU. It doesn’t know how to use those powerful, expensive GPUs or TPUs that make machine learning possible.
No “Slope Finder”: It’s missing the most important tool for machine learning: automatic differentiation (grad). You can't "train" a NumPy function.

This makes it hard to take a mathematical idea and turn it into a scalable ML experiment.

Kingdom 2: The “Framework” Kingdom (Fast & Powerful)

This is the world of deep learning, ruled by TensorFlow and PyTorch.

Who lives here? Machine Learning Engineers and AI Researchers.

What’s it like? It’s incredibly powerful. These frameworks are built from the ground up to be fast. They can run on massive clusters of GPUs and TPUs, and they have excellent automatic differentiation.

PyTorch Example:

import torch

x = torch.tensor(5.0, requires_grad=True)
y = x * x  # y = x^2
y.backward()  # compute gradient
print(x.grad)  # prints 10

TensorFlow Example:

import tensorflow as tf

x = tf.Variable(5.0)

with tf.GradientTape() as tape:
    y = x * x  # y = x^2

dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # prints 10

The Problem: To live here, you have to learn a whole new set of rules.

It’s “Heavy”: You have to learn about Tensors, Sessions (in older TF), Model.compile(), layers, and model.fit().
It’s “Rigid”: These frameworks are built to do one job very well: train deep learning models. What if you want to do something weird or new? What if you want to find the gradient of a physics simulation that has complex loops and logic? It can be difficult.

The Big Problem: Researchers Were Stuck

For years, AI researchers felt stuck between these two kingdoms.

They loved the simplicity and flexibility of NumPy (Kingdom 1).
But they needed the speed and grad function of a framework (Kingdom 2).

They found themselves spending more time fighting with a heavy framework than testing their new, creative AI ideas. They had a simple wish:

“I just want to write my simple Python/NumPy code, and I want it to run fast on a GPU and give me gradients.”

JAX is the Bridge Between the Kingdoms

This is exactly why JAX was created.

JAX is not a new, heavy framework. JAX is a set of “superpowers” for the NumPy kingdom.

It was designed to give researchers the best of both worlds. Here’s how it solves the two biggest needs: speed and simplicity.

1. The Need for SPEED (Solved by jit)

Python is slow because it reads your code one line at a time. A for loop that runs 1,000,000 times means Python makes 1,000,000 decisions.

A GPU is fast because it’s a “parallel” processor. It’s like having 10,000 workers ready to go. You can’t give them one instruction at a time; you have to give them the whole plan at once.

How JAX solves this: jax.jit (Just-in-Time compilation).

Analogy:

Normal Python: A nervous home cook reading a recipe one line at a time. “1. Get a bowl.” (walks to cupboard). “2. Get a spoon.” (walks to drawer). Very slow.
@jax.jit: A master chef in a giant restaurant kitchen. They read the entire recipe once, optimize it for their 10,000-worker (GPU) kitchen, and shout "Go!" The entire meal is prepared in seconds.

jax.jit uses a powerful compiler called XLA (Accelerated Linear Algebra) to translate your simple Python function into super-fast machine code that runs perfectly on GPUs or TPUs.
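You can watch the "read the whole recipe once" step happen. In the sketch below, a Python side effect (appending to a list) runs only while JAX is tracing the function, so it shows how often the function is traced. The names scaled_sum and trace_log are made up for this example.

```python
import jax
import jax.numpy as jnp

trace_log = []

@jax.jit
def scaled_sum(x):
    # Python side effect: runs only while JAX traces the function.
    trace_log.append(x.shape)
    return jnp.sum(x * 2.0)

a = scaled_sum(jnp.ones(4))   # first call: trace + compile + run
b = scaled_sum(jnp.ones(4))   # same shape and dtype: reuses the compiled program
traces_after_same_shape = len(trace_log)

c = scaled_sum(jnp.ones(8))   # new shape: traced and compiled again
traces_after_new_shape = len(trace_log)

print(traces_after_same_shape, traces_after_new_shape)
```

The takeaway is that compilation is tied to input shapes and dtypes, so calling a jitted function with constantly changing shapes gives up much of the speed benefit.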

2. The Need for SIMPLICITY (Solved by grad and vmap)

Researchers don’t just want to build standard models. They want to get creative. They want to “compose” functions.

How JAX solves this: JAX superpowers are composable. They are just Python functions that transform other Python functions.

This is the most beautiful part of JAX. Want to get the “slope” of a function? Just wrap it in grad.

def my_function(x):
    return x**2

find_slope = jax.grad(my_function)

Want to make that “slope” function run really, really fast on a GPU?
Just wrap it in jit!

# Let's stack our superpowers!
fast_slope_finder = jax.jit(jax.grad(my_function))

This is revolutionary. You are not building a “model.” You are not fighting a framework. You are just stacking Lego bricks. You can jit a grad, vmap a jit, grad a vmap... whatever you want.

This “composability” gives researchers the ultimate simplicity and flexibility to build anything they can imagine.
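As a small sketch of how far the stacking goes, here are three transformations combined in one line, plus a second derivative obtained by applying grad twice.

```python
import jax
import jax.numpy as jnp

def my_function(x):
    return x**2

# grad works on one scalar; vmap lifts it to a whole batch; jit compiles the result.
batched_slopes = jax.jit(jax.vmap(jax.grad(my_function)))

xs = jnp.array([1.0, 2.0, 3.0])
slopes = batched_slopes(xs)          # slope of x**2 is 2x

# grad of a grad: the second derivative of x**2 is the constant 2.
curvature = jax.grad(jax.grad(my_function))(3.0)

print(slopes, curvature)
```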

Key JAX Features

1. jit for Speed:

import jax
import jax.numpy as jnp

def square(x):
    return x * x

fast_square = jax.jit(square)
print(fast_square(5))

2. grad for Differentiation:

def f(x):
    return x**2

df = jax.grad(f)
print(df(3.0))  # 6.0

3. vmap for Batch Operations:

values = jnp.array([1, 2, 3, 4])

vmap_f = jax.vmap(f)
print(vmap_f(values))

4. Composing Transformations:

fast_grad = jax.jit(jax.grad(f))
print(fast_grad(5.0))

Simple functions → compiled + differentiable + vectorized

No new syntax. No framework overhead.

So, Why Does JAX Exist?

JAX exists to make AI research fast, simple, and flexible.

It was built for researchers who wanted to feel like they were writing clean, simple NumPy, but have it execute with the blinding speed of TensorFlow on a GPU.

It gives you the power to write any math function you can think of and instantly be able to:

jit it: Make it run crazy fast.
grad it: Find its slope (train it).
vmap it: Run it on a huge batch of data.

For beginners, this makes JAX a fantastic tool. It’s not a “black box.” It lets you build from the ground up and really understand the math, all while being as simple and familiar as the NumPy you already know.

In short: Write math naturally, run it fast, scale it easily.

What’s Next

In my next post, I’ll put these tools side by side: “JAX vs. TensorFlow vs. PyTorch: When to Use Each One.”

I’ll break down where each framework shines, from fast research prototyping to production-level deployment, and explain when JAX’s simplicity and performance make it the better choice over traditional deep learning libraries.

Your First Guide to JAX: NumPy with Superpowers

Ali Nawaz — Sun, 26 Oct 2025 16:57:32 GMT

Have you ever used NumPy? It’s that amazing Python library everyone uses for working with lists or “arrays” of numbers. It’s fast, powerful, and the foundation of data science.

Now, what if you could take your NumPy code, make it run hundreds of times faster on a GPU or TPU, and give it new “superpowers” for machine learning, all without having to learn a giant new framework?

That is exactly what JAX is.

JAX is a high-performance library from Google for math and machine learning. But don’t let “high-performance” scare you. At its heart, JAX is so simple you already know how to use it.

In short: JAX is NumPy on steroids.

The Secret: JAX is Just NumPy

To get started with JAX, you barely need to learn a new API. For most array code, you just change your import statement.

Instead of this:

import numpy as np

You write this:

import jax.numpy as jnp

That’s it. You can now use jnp for all your familiar NumPy functions (jnp.array(), jnp.dot(), jnp.sum(), etc.), but JAX will be running things behind the scenes, making it possible to...

Run your code on powerful GPUs and TPUs.
Use three incredible “superpowers” that are perfect for modern AI.

Let’s meet those superpowers.

Superpower 1: jax.jit (The Speed Booster)

jit stands for Just-In-Time compilation.

What it is: A “decorator” that you put on top of your Python function to make it incredibly fast.

The Analogy:

Imagine a chef following a recipe (your function) one line at a time.
“1. Get a bowl.” (walks to cupboard). “2. Get a spoon.” (walks to drawer). “3. Get flour.” (walks to pantry). This is slow.
@jit is like a smart chef. They read the entire recipe first, optimize it ("I'll get the bowl, spoon, and flour all in one trip"), and then execute the whole plan as fast as possible.

How to Use jit:

Let’s say you have a simple function that does some math on a big array.

import jax
import jax.numpy as jnp

# This is our normal Python function
def slow_calculation(x):
    # A bunch of math steps
    y = jnp.sin(x)
    z = jnp.cos(y)
    return jnp.sum(z)

# Now, let's create the "fast" version using jit
fast_calculation = jax.jit(slow_calculation)

# You can also just add @jax.jit above the function definition:
# @jax.jit
# def fast_calculation(x):
#     ...

# Let's create some data
big_array = jnp.arange(1_000_000)

# Run it once to "compile" the function
fast_calculation(big_array) 

# Now, let's time it!
# On a GPU, the JIT-compiled version can be 100x+ faster.

By adding jax.jit, you've told JAX to compile this function, the first time it is called, into highly efficient machine code that can run directly on your GPU.
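The code above stops just before the timing, so here is one way to do it. This is a sketch: time_it is a made-up helper, the numbers you see depend entirely on your hardware, and block_until_ready is needed because JAX hands work to the device asynchronously.

```python
import time
import jax
import jax.numpy as jnp

def calculation(x):
    return jnp.sum(jnp.cos(jnp.sin(x)))

fast_calculation = jax.jit(calculation)
big_array = jnp.arange(1_000_000, dtype=jnp.float32)

# Warm-up call: includes compilation, so it is not representative.
fast_calculation(big_array).block_until_ready()

def time_it(fn, x, repeats=10):
    start = time.perf_counter()
    for _ in range(repeats):
        # Wait for the result before stopping the clock.
        fn(x).block_until_ready()
    return (time.perf_counter() - start) / repeats

plain_time = time_it(calculation, big_array)
jit_time = time_it(fast_calculation, big_array)
print(f"without jit: {plain_time:.6f} s per call")
print(f"with jit:    {jit_time:.6f} s per call")
```

Always time the second call onward; the first call pays the one-time compilation cost.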

Superpower 2: jax.grad (The AI "Guide")

grad stands for gradient, which is a core concept in AI.

What it is: A function that automatically calculates the “slope” or “derivative” of any other function. This is the magic behind how all AI models learn.

The Analogy:

Imagine you are on a foggy mountain and you want to get to the bottom of the valley (the “lowest error”). You can’t see, but you can feel the slope of the ground under your feet.
To get down, you just feel which way is “downhill” and take a step. You repeat this until you reach the bottom.
jax.grad is a magic compass that instantly tells you which direction is the steepest "uphill" from any point in your function, so you simply step the opposite way to go downhill.

How to Use grad:

In AI, this steepest “uphill” direction is called the gradient, and training moves against it. Let’s see how grad finds it.

Let’s use a simple “valley” function: y = x². We all know the bottom of this valley is at x = 0.

import jax
import jax.numpy as jnp

# 1. Define our "valley" function
def my_function(x):
    return x**2

# 2. Create the "guide" function using jax.grad
find_slope = jax.grad(my_function)

# 3. Ask the guide for directions at different points
slope_at_x_10 = find_slope(10.0)
print(f"The slope at x=10 is: {slope_at_x_10}")
# Output: 20.0 (points uphill)

slope_at_x_neg_5 = find_slope(-5.0)
print(f"The slope at x=-5 is: {slope_at_x_neg_5}")
# Output: -10.0 (points uphill)

slope_at_x_0 = find_slope(0.0)
print(f"The slope at x=0 is: {slope_at_x_0}")
# Output: 0.0 (we are at the bottom!)

jax.grad automatically figured out the math to find the slope. This is the basis of how machine learning models are trained, and JAX does it for you for any differentiable function built from JAX operations.
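To connect this back to the foggy mountain, here is a minimal sketch of actually walking down the valley. The slope reported by find_slope points uphill, so each step subtracts it. The learning rate of 0.1 and the 50 steps are arbitrary choices for illustration.

```python
import jax

def my_function(x):
    return x**2

find_slope = jax.grad(my_function)

x = 10.0
learning_rate = 0.1
for _ in range(50):
    # The slope points uphill, so subtract it to walk downhill.
    x = x - learning_rate * find_slope(x)

print(f"x after 50 steps: {float(x):.4f}")
```

Each step shrinks x by a constant factor, so it gets closer and closer to the bottom at x = 0 without ever needing to see the whole valley.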

Superpower 3: jax.vmap (The Batch Processor)

vmap stands for vectorizing map.

What it is: A tool that automatically “batches” your code. It lets you run a function designed for one item on thousands of items at the same time, in parallel.

The Analogy:

Imagine you have a function that “squares one number.” Now you have a list of 1,000 numbers. The slow way is a for loop: "take number, square it, take next number, square it..." 1,000 times.
jax.vmap is like building a giant stamp that squares all 1,000 numbers in a single "press." It automatically rewrites your function to handle the whole batch at once, efficiently.

How to Use vmap

import jax
import jax.numpy as jnp

# 1. A simple function that works on ONE number
def square(x):
    return x * x

# 2. A batch of numbers
numbers = jnp.array([1, 2, 3, 4, 5])

# The SLOW way (a Python loop)
# results = []
# for x in numbers:
#     results.append(square(x))

# 3. The FAST way (using vmap)
# Create a new function that knows how to handle batches
batched_square = jax.vmap(square)

# Run it on all numbers at once
results = batched_square(numbers)

print(results)
# Output: [ 1  4  9 16 25]

This is essential for AI, where you are always processing batches of data (e.g., 64 images at a time, not just one).
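Real batches usually mix shared values (like weights) with per-example values (like inputs). The in_axes argument tells vmap which is which. In this sketch, predict_one is a made-up function written for a single example.

```python
import jax
import jax.numpy as jnp

# Written for ONE example: a weight vector and a single feature vector.
def predict_one(w, x):
    return jnp.dot(w, x)

w = jnp.array([1.0, 2.0, 3.0])
batch = jnp.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])

# in_axes=(None, 0): share w across the batch, map over the rows of batch.
predict_batch = jax.vmap(predict_one, in_axes=(None, 0))
predictions = predict_batch(w, batch)
print(predictions)  # [1. 2. 6.]
```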

Putting It All Together

Let’s use our new superpowers to solve a real, simple ML problem: finding the best-fit line for some data.

We have some x and y data, and we want to find the best slope (w) and intercept (b) for the line y = w*x + b.

import jax
import jax.numpy as jnp

# 1. Our Data
# Let's try to find the line for y = 3*x + 2
true_w = 3.0
true_b = 2.0
x_data = jnp.array([1.0, 2.0, 3.0, 4.0])
y_data = x_data * true_w + true_b
print(f"Our data: x={x_data}, y={y_data}")


# 2. Our Model and "Loss"
# Our model's prediction
def predict(params, x):
    return params['w'] * x + params['b']

# A "loss" function tells us how "wrong" our model is.
# We want to make this number as small as possible.
def loss_function(params, x, y):
    prediction = predict(params, x)
    error = prediction - y
    return jnp.mean(error**2) # This is our "foggy valley"

# 3. Use Our Superpowers!

# Use jax.grad to create the "guide" that finds the downhill slope
# (Notice it's a "grad" of the "loss_function")
find_loss_gradient = jax.grad(loss_function)

# Use jax.jit to make our guide super fast!
@jax.jit
def update_step(params, x, y, learning_rate):
    # Get the slope from our guide (the gradient points "uphill")
    gradients = find_loss_gradient(params, x, y)
    
    # Take a small step the opposite way, "downhill"
    new_params = {
        'w': params['w'] - learning_rate * gradients['w'],
        'b': params['b'] - learning_rate * gradients['b']
    }
    return new_params

# 4. "Train" the Model
# Let's start with a bad guess
params = {'w': 0.0, 'b': 0.0} 
learning_rate = 0.01

print(f"Starting guess: w={params['w']}, b={params['b']}")

# Let's take many small steps "downhill"
# (1,000 steps at this learning rate stop slightly short of the true values)
for _ in range(5000):
    params = update_step(params, x_data, y_data, learning_rate)

print(f"Final trained guess: w={params['w']:.2f}, b={params['b']:.2f}")

Output:

Our data: x=[1. 2. 3. 4.], y=[ 5.  8. 11. 14.]
Starting guess: w=0.0, b=0.0
Final trained guess: w=3.00, b=2.00

Look at that! By using jax.grad to find the "downhill" direction and jax.jit to make it fast, we "trained" a model that recovers the true line to two decimal places.
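Two small helpers make this kind of training loop easier to grow. jax.value_and_grad returns the loss together with its gradients, and jax.tree_util.tree_map applies the same update to every entry in params, so you do not have to spell out 'w' and 'b' by hand. The sketch below is a self-contained variation on the example, assuming 5,000 steps at a learning rate of 0.01.

```python
import jax
import jax.numpy as jnp

x_data = jnp.array([1.0, 2.0, 3.0, 4.0])
y_data = 3.0 * x_data + 2.0

def loss_function(params, x, y):
    prediction = params["w"] * x + params["b"]
    return jnp.mean((prediction - y) ** 2)

@jax.jit
def update_step(params, x, y, learning_rate):
    # value_and_grad returns the loss and its gradients in one pass.
    loss, grads = jax.value_and_grad(loss_function)(params, x, y)
    # tree_map applies the same update rule to every entry in params.
    new_params = jax.tree_util.tree_map(
        lambda p, g: p - learning_rate * g, params, grads
    )
    return new_params, loss

params = {"w": 0.0, "b": 0.0}
losses = []
for step in range(5000):
    params, loss = update_step(params, x_data, y_data, 0.01)
    losses.append(float(loss))

print(f"first loss: {losses[0]:.2f}, last loss: {losses[-1]:.8f}")
```

Watching the loss shrink is a handy sanity check: if it is not going down, the learning rate or the gradient code deserves a second look.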

Why Should a Beginner Care About JAX?

It’s Easy to Start: If you know NumPy, you already know JAX. There’s no big new API to learn.
It’s Powerful: It gives you the three most important tools for modern AI (jit, grad, vmap) for free.
It’s Fast: It lets your code use the full power of your GPU/TPU, which is essential for any serious machine learning.
It’s the Future: JAX is used by top research labs like Google DeepMind to build the world’s most advanced AI models.

Learning JAX is a fantastic way to understand how AI really works under the hood, and it gives you a skill that many employers are looking for.

How to Install

You can try it right now in a Google Colab notebook.

# Install JAX
pip install jax jaxlib

Start by importing jax.numpy as jnp and try running your old NumPy code. Then, try adding @jax.jit to a function and see the magic for yourself!
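After installing, a quick sanity check tells you whether JAX imports correctly and which hardware it found. This is a small sketch; on a machine without a GPU or TPU the backend is simply reported as the CPU.

```python
import jax
import jax.numpy as jnp

print("JAX version:", jax.__version__)
print("Backend:", jax.default_backend())   # "cpu", "gpu", or "tpu"
print("Devices:", jax.devices())

# A tiny computation to confirm everything runs end to end.
total = jnp.arange(10).sum()
print(int(total))  # 45
```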

What’s Next

In the next blog, “Why JAX Exists? The Need for Speed and Simplicity in ML,” we’ll step back from using JAX to understanding why it was built.

You’ll learn what gap JAX fills between NumPy and the big deep learning frameworks, and how jit, grad, and vmap were designed to close it.

What Is JAX? A Friendly Introduction for ML Beginners

Ali Nawaz — Fri, 10 Oct 2025 17:49:21 GMT

Have you ever used TensorFlow or PyTorch and wished your code ran much faster, especially when dealing with massive datasets or complex new AI models?

The world of Artificial Intelligence is always moving forward, and right now, many top researchers and companies are shifting towards a powerful, yet surprisingly simple tool called JAX.

JAX is a newer, open-source library developed by Google and is quickly becoming the favorite for advanced AI development. Think of it as Python with Superpowers for deep learning.

Just as TensorFlow helped make AI development easier for everyone, JAX takes it a step further by bringing unmatched speed, flexibility, and power, allowing your code to run seamlessly on CPUs, GPUs, and even TPUs without changing a single line.

In Pakistan, where innovation in fintech, research, and remote work is booming, mastering a cutting-edge tool like JAX can instantly make you stand out.

What Exactly is JAX?

JAX is a numerical computing library that dramatically accelerates your math code, primarily by offering two key capabilities:

Automatic Differentiation (AutoDiff): It can instantly and accurately calculate the slope (gradient) of any function you write. This is the secret sauce behind all AI learning.
XLA (Accelerated Linear Algebra): It automatically compiles your Python code to run efficiently on GPUs (like those used for gaming) and TPUs (Google’s custom AI chips).

Imagine you have a very skilled tailor in Anarkali Bazaar, Lahore. He makes perfect dresses but works slowly, one stitch at a time.

Now imagine that tailor is given a smart, automated laser machine that follows the same cutting plan but finishes the work in seconds.

That is exactly what JAX does for your Python code.

It takes your existing NumPy code and runs it much faster, using advanced technology under the hood.

In short:

You still write the same familiar code.
It runs faster on your machine.
It can even calculate complex gradients automatically, which is a must-have for machine learning.

The Three Core Features of JAX

Once you understand these three main tools, you understand JAX.

1. jax.grad — The Automatic Differentiator

In machine learning, models improve by learning from their mistakes. They do this by finding how much change is needed in each step — something called the gradient.

jax.grad helps you calculate this automatically without writing the math yourself.

import jax
import jax.numpy as jnp

def simple_loss(x):
  return x**2 + 5 * x + 3

slope_fn = jax.grad(simple_loss)

x_value = 2.0
slope_at_2 = slope_fn(x_value)

print(f"Value at x={x_value}: {simple_loss(x_value):.1f}")
print(f"Slope (gradient) at x={x_value}: {slope_at_2:.1f}")

Now the model instantly knows which direction to move to reduce the error.

2. jax.jit — The Speed Booster

Python is simple but sometimes slow, especially for big models.

That’s where jax.jit (Just-In-Time compilation) helps. It takes your normal Python function, optimizes it, and runs it as fast machine code, often on a GPU or TPU.

You write your code once, add one small line, and everything runs much faster.

from jax import jit
import jax.numpy as jnp

@jit
def multiply_and_sum(x):
    return jnp.sum(x * 2)

data = jnp.arange(1_000_000)
print(multiply_and_sum(data))

After the first call, which includes compilation, this can run noticeably faster than the equivalent NumPy code, especially on a GPU or TPU and for functions with more steps than this one.

3. jax.vmap — The Batch Processor

In many AI tasks, you need to repeat the same operation on many samples, for example analyzing a set of images or sensor readings.

Normally, this requires writing loops, but with jax.vmap, you can apply a function to an entire batch automatically.

import jax
import jax.numpy as jnp

def square(x):
    return x ** 2

# Apply the function to all elements at once
batch_square = jax.vmap(square)
numbers = jnp.array([1, 2, 3, 4, 5])
print(batch_square(numbers))

JAX handles the batch processing efficiently and runs all the operations in parallel.

A Simple Real-World Example: Karachi Climate Analysis

Let’s imagine you’re studying temperature changes in Karachi. You have daily readings and want to see which days caused the biggest overall change.

With JAX, you can do this analysis quickly and efficiently.

import jax
import jax.numpy as jnp

historic_temps = jnp.array([28.1, 28.5, 29.2, 30.1, 30.0, 29.8, 30.5])

def calculate_temp_change_metric(temp_array, target_temp=28.0):
    return jnp.sum((temp_array - target_temp)**2)

metric_gradient_fn = jax.grad(calculate_temp_change_metric)
compiled_gradient_fn = jax.jit(metric_gradient_fn)

temp_gradients = compiled_gradient_fn(historic_temps)

print("Karachi Climate Analysis")
for day, gradient in enumerate(temp_gradients):
    print(f"Day {day+1} Contribution: {gradient:.2f}")

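Each gradient value equals 2 × (temperature - 28), so the day furthest from the 28°C baseline gets the largest value. In other words, the gradient shows which day’s temperature affected the overall metric the most, and JAX calculates it in a fraction of a second.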
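Because this metric is a plain sum of squares, you can check JAX's answer by hand: the derivative with respect to each reading is 2 × (reading - target). The sketch below compares the two and picks out the day with the largest contribution. The variable names expected and largest_day are just illustrative.

```python
import jax
import jax.numpy as jnp

historic_temps = jnp.array([28.1, 28.5, 29.2, 30.1, 30.0, 29.8, 30.5])
target_temp = 28.0

def calculate_temp_change_metric(temp_array, target_temp=28.0):
    return jnp.sum((temp_array - target_temp) ** 2)

temp_gradients = jax.grad(calculate_temp_change_metric)(historic_temps)

# Hand-derived gradient for comparison
expected = 2.0 * (historic_temps - target_temp)
matches = bool(jnp.allclose(temp_gradients, expected, atol=1e-4))

# Day numbers start at 1, array indices start at 0
largest_day = int(jnp.argmax(jnp.abs(temp_gradients))) + 1

print(f"Matches hand-derived gradient: {matches}")
print(f"Day with the largest contribution: Day {largest_day}")
```

Here the largest contribution comes from the last reading, 30.5, which sits furthest from the baseline.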

Getting Started with JAX

If you already know NumPy, you’ll find JAX very easy to use.

Installation (in Google Colab or locally):

pip install jax jaxlib
# For NVIDIA GPU users (CUDA 12)
pip install -U "jax[cuda12]"

Once installed, you can usually just replace numpy with jax.numpy (imported as jnp) in your code. The main difference to keep in mind is that JAX arrays are immutable.

Example:

import jax.numpy as jnp

salaries = jnp.array([120000, 85000, 150000, 90000, 210000])
total = jnp.sum(salaries)
print(f"Total Monthly Payroll: {total:,.0f} PKR")

And if you want it to run faster:

import jax.numpy as jnp
from jax import jit

# Reuses the salaries array from the previous example

@jit
def calculate_payroll(salaries):
    tax_rate = 0.15
    net_salary = salaries * (1 - tax_rate)
    return jnp.sum(net_salary)

total_net = calculate_payroll(salaries)
print(f"Total Net Payroll after tax: {total_net:,.0f} PKR")
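One place where swapping numpy for jax.numpy is not quite drop-in: JAX arrays cannot be changed in place. The sketch below shows what happens if you try, and the .at[...].set(...) form that returns an updated copy instead. The new salary figure is made up for the example.

```python
import jax.numpy as jnp

salaries = jnp.array([120000, 85000, 150000, 90000, 210000])

# NumPy-style in-place assignment is not allowed on a JAX array
in_place_failed = False
try:
    salaries[1] = 95000
except TypeError:
    in_place_failed = True
    print("In-place update is not supported on JAX arrays")

# The JAX way: build a new array with the change applied
updated = salaries.at[1].set(95000)

print(f"Original second salary: {int(salaries[1]):,} PKR")
print(f"Updated second salary:  {int(updated[1]):,} PKR")
```

The original array is left untouched, which is part of what lets JAX safely trace and compile functions with jit.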

Why JAX Matters for You

It’s fast: Perfect for large projects, research, and AI models.
It’s simple: The jax.numpy API closely mirrors NumPy.
It’s modern: Used by Google Research, DeepMind, and many AI labs.
It’s valuable: Few people know it well, which makes it a great skill for your career.

If you’re someone curious about how modern AI tools are built and want to explore what’s powering the next wave of machine learning innovation, JAX is a great place to begin your journey.

What’s Next

In the next blog, “Your First Guide to JAX: NumPy with Superpowers,” we’ll take our first practical step into JAX.

You’ll learn how JAX combines the simplicity of NumPy with the power of automatic differentiation and GPU acceleration, making it the perfect starting point for anyone stepping into the world of modern machine learning.