This repository contains toy implementations of the concepts introduced in VortexNet: Neural Computing through Fluid Dynamics. The code demonstrates how PDE-inspired "vortex" updates can be inserted into standard autoencoders.
Note: These are educational prototypes, not optimized or physically precise fluid solvers.
- src/vortexnet/mnist.py: public MNIST autoencoder facade and CLI entry point.
- src/vortexnet/mnist_model.py, src/vortexnet/mnist_data.py, src/vortexnet/mnist_viz.py, and src/vortexnet/mnist_training.py: MNIST model layers, data loading, reconstruction helpers, and the training runner.
- src/vortexnet/image.py: public RGB image autoencoder facade and CLI entry point.
- src/vortexnet/image_model.py, src/vortexnet/image_data.py, src/vortexnet/image_viz.py, and src/vortexnet/image_training.py: image model layers, data loading, visualization/evaluation helpers, and the training runner.
- src/vortexnet/operators.py: reusable fixed PDE, learned stencil, residual conv, and compact spectral/FNO-style operator blocks.
- src/vortexnet/experiments.py: shared experiment variant definitions, device setup, summary statistics, and CSV helpers.
- src/vortexnet/dynamics.py: synthetic heat, wave, advection-diffusion, Burgers-style transport, reaction-diffusion, and vorticity next-state datasets for testing operator blocks on actual spatial dynamics.
- src/vortexnet/pdebench.py: local PDEBench-style HDF5 sequence dataset adapter.
- src/vortexnet/pdebench_catalog.py: PDEBench catalog filtering, guarded download, file-size lookup, and MD5 verification helpers.
- vortexnet_mnist.py and vortexnet_image.py: local compatibility wrappers for direct script execution; installed commands are vortexnet-mnist and vortexnet-image.
- config_image.yaml: longer custom-image training config.
- config_image_smoke.yaml: tiny custom-image smoke config.
Dated experiment writeups and result tables live in
docs/experiments/. The current real-dataset report is
2026-04-24-pdebench-shallow-water.md.
Install dependencies into a local uv-managed environment:
uv sync

The pyproject.toml pins PyTorch and TorchVision to the PyTorch CUDA 13.0 wheel
index on Windows/Linux. If you need CPU-only or a different CUDA wheel family,
change the tool.uv.sources and tool.uv.index entries in pyproject.toml.
Check the environment:
uv run python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.version.cuda)"
uv run pytest

On locked-down Windows workspaces, pytest cache writes can be flaky; this repo
disables pytest's cache provider in pyproject.toml.
If uv run ... reports an access error under AppData\Local\uv\cache, use
uv --no-cache run ... or refresh the environment with
uv sync --no-cache --link-mode=copy.
Useful uv commands:
uv lock # update uv.lock after dependency changes
uv sync # create/update .venv from pyproject.toml and uv.lock
uv run ... # run commands inside the managed environment
uv tree # inspect the resolved dependency tree

MNIST downloads automatically into data/.
uv run vortexnet-mnist --epochs 1 --batch_size 64 --pde_padding_mode constant --max_train_batches 10 --max_test_batches 5 --no_show

For the original short demo:
uv run vortexnet-mnist --epochs 5 --batch_size 64 --no_show

Outputs are written to outputs/mnist/.
Place JPEG or PNG images in my_data/, then run:
uv run vortexnet-image --config config_image_smoke.yaml --skip_tsne --skip_onnx --no_show

For a longer run:
uv run vortexnet-image --config config_image.yaml --no_show

Outputs are written to the configured data.output_dir.
- Add a real train/validation split instead of validating on the training loader.
- Track PSNR/SSIM in addition to MSE reconstruction loss.
- Run ablations against a same-size plain convolutional autoencoder.
- Keep PDE padding set to constant unless periodic boundaries are semantically required; it uses native convolution padding and avoids explicit circular padding.
- Implement true loss-gradient adaptive damping instead of the current activation-magnitude fallback used during ordinary forward passes.
- Evaluate on a task where PDE-style recurrence has a plausible advantage, such as video, time series, or long spatial context, not just static reconstruction.
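For the PSNR item in the list above, PSNR can be derived directly from the MSE that is already being tracked. The following is a minimal, dependency-free sketch; the helper names mse and psnr_from_mse are hypothetical and are not part of this repository, and it assumes pixel values scaled to [0, 1] (data_range=1.0).

```python
import math


def mse(target, recon):
    """Mean squared error between two equally sized flat sequences."""
    if len(target) != len(recon) or not target:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)


def psnr_from_mse(mse_value, data_range=1.0):
    """PSNR in dB. data_range is the span of valid pixel values (assumed 1.0)."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse_value)


# Toy "image" flattened to a list; values are illustrative only.
target = [0.0, 0.5, 1.0, 0.25]
recon = [0.1, 0.5, 0.9, 0.25]
psnr_db = psnr_from_mse(mse(target, recon))
print(f"PSNR: {psnr_db:.2f} dB")
```

SSIM needs local window statistics and is usually taken from an existing metrics library rather than written by hand.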
Static reconstruction is a weak test for PDE-inspired blocks. Use the dynamics benchmark to train next-state predictors on generated fields:
uv run python scripts/compare_dynamics_variants.py --task heat --device cuda
uv run python scripts/compare_dynamics_variants.py --task wave --device cuda
uv run python scripts/compare_dynamics_variants.py --task advection_diffusion --device cuda
uv run python scripts/compare_dynamics_variants.py --task burgers2d --device cuda
uv run python scripts/compare_dynamics_variants.py --task reaction_diffusion --rollout-steps 32 --device cuda
uv run python scripts/compare_dynamics_variants.py --task vorticity --rollout-steps 16 --device cuda

For a more reliable comparison, use multiple seeds, autoregressive evaluation, and an optional rollout loss during training:
uv run python scripts/compare_dynamics_variants.py --task vorticity --rollout-steps 16 --autoregressive-steps 3 --train-rollout-steps 2 --seeds 42 43 --device cuda

--train-rollout-steps 2 trains on two autoregressive model steps instead of
only one target step. This is slower, but it directly optimizes the failure mode
that appears during repeated rollout. Use --detach-train-rollout to avoid
backpropagating through previous predicted states.
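The rollout failure mode described above can be illustrated without any framework. The sketch below is a toy example, not the benchmark's code: rollout_mse, true_step, and model_step are hypothetical names, and the "model" is a scalar decay that is slightly wrong. It feeds each prediction back in as the next input and records the error per step. A multi-step training loss would sum errors like these; because plain Python has no gradients, detaching previous states has no counterpart here.

```python
def rollout_mse(step_fn, initial_state, targets):
    """Feed predictions back in and return the MSE at each rollout step."""
    state = list(initial_state)
    errors = []
    for target in targets:
        state = step_fn(state)  # the next input is the model's own prediction
        step_error = sum((p - t) ** 2 for p, t in zip(state, target)) / len(target)
        errors.append(step_error)
    return errors


def true_step(state):
    """Toy ground-truth dynamics: uniform decay by 0.9 per step."""
    return [0.9 * value for value in state]


def model_step(state):
    """Toy surrogate that decays slightly too fast."""
    return [0.88 * value for value in state]


x0 = [1.0, 2.0]
targets = []
current = x0
for _ in range(3):
    current = true_step(current)
    targets.append(current)

errors = rollout_mse(model_step, x0, targets)
print(errors)  # the error grows with each autoregressive step
```

A small one-step error compounds over the rollout, which is why one-step metrics alone can rank models differently from multi-step evaluation.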
The benchmark compares:
- plain: shared conv predictor with no operator block.
- fixed_pde: fixed finite-difference update with learnable channel diffusion.
- learned_stencil: depthwise learned local stencil plus channel mixing.
- conv_residual: conventional residual convolution baseline.
- spectral: compact FNO-style Fourier operator.
- hybrid_spectral: local residual convolution plus spectral/FNO-style global mixing.
- unet_small: small U-Net baseline.
- fno_small: compact FNO-style full-model baseline.
Bounded 32x32 CUDA runs with 512 train samples, 128 eval samples, and 80 train batches show the current pattern:
- Heat/diffusion is too easy; most variants nearly saturate it.
- Wave and Burgers-style transport favor the conventional residual conv block.
- Pure spectral/FNO-style mixing is not robustly better by itself on these small local benchmarks.
- Vorticity dynamics are where local-global models start to matter, but stronger baselines changed the conclusion: U-Net wins some one-step vorticity metrics, while residual conv has been more stable in one-step-trained 3-step rollout. Adding two-step rollout loss improved rollout stability and made U-Net the best tested vorticity rollout model in the bounded 32x32 two-seed run. The hybrid local-spectral block is competitive, but it does not yet beat those controls.
- The fixed PDE block is cheap and interpretable, but so far it behaves more like a regularizer than a clear accuracy win.
For larger external benchmarks, the most relevant next targets are:
- PDEBench: broad PDE surrogate benchmark with advection, Burgers, reaction-diffusion, shallow water, Navier-Stokes, FNO/U-Net/PINN baselines, and DaRUS downloads: https://github.com/pdebench/PDEBench
- NeuralOperator Navier-Stokes: 2D incompressible Navier-Stokes tensors at 128 and 1024 resolution, but the smaller archive is still about 1.5 GB: https://zenodo.org/records/12825163
- WeatherBench 2: public weather forecasting grids, including low-resolution
64x32 ERA5 files in
gs://weatherbench2/datasets, useful once the local operator stack is credible: https://weatherbench2.readthedocs.io/
PDEBench provides HDF5 datasets through DaRUS and download scripts in its
data_download folder. This repo can now inspect and train against a local
PDEBench-style HDF5 sequence file.
List catalog entries before downloading anything:
uv run python scripts/download_pdebench_file.py --pde SWE --check-size
uv run python scripts/download_pdebench_file.py --pde 2D_ReacDiff --check-size

Download one exact match with MD5 verification:
uv run python scripts/download_pdebench_file.py \
  --pde SWE \
  --filename 2D_rdb_NA_NA.h5 \
  --download \
  --output-dir data/pdebench

The downloader refuses files larger than 8 GB unless --allow-large or a larger
--max-gb is provided.
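The two safeguards described above, checksum verification and a size limit, can be sketched with the standard library. This is an illustration under stated assumptions, not the code in pdebench_catalog.py: md5_of_file and exceeds_size_limit are hypothetical names, and treating 1 GB as 1024**3 bytes is an assumption (the real script may use 10**9).

```python
import hashlib
import os
import tempfile

GIB = 1024 ** 3  # assumption: the real downloader may define GB as 10**9 bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large HDF5 files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exceeds_size_limit(size_bytes, max_gb=8.0, allow_large=False):
    """Return True when a download should be refused by the size guard."""
    return (not allow_large) and size_bytes > max_gb * GIB


# Demo on a throwaway file; a real check would compare against the catalog MD5.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.bin")
    with open(sample_path, "wb") as handle:
        handle.write(b"hello")
    sample_md5 = md5_of_file(sample_path)

print(sample_md5)
print(exceeds_size_limit(9 * GIB))
```

Checking the size before downloading and the checksum afterwards keeps a mistyped filter from silently pulling a very large or corrupted file.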
Inspect a local file without training:
uv run python scripts/compare_pdebench_file.py --file path/to/file.hdf5 --inspect-only

Use --inspect-limit 0 to print every HDF5 dataset in large grouped files.
Run a bounded comparison on a local file:
uv run python scripts/compare_pdebench_file.py \
--file path/to/file.hdf5 \
--normalize \
--variants conv_residual unet_small fno_small hybrid_spectral \
--autoregressive-steps 3 \
--train-rollout-steps 2 \
--seeds 42 43 44 \
  --device cuda

The loader supports common time-sequence layouts:
- ntxyc: samples, time, x, y, channels.
- ntcxy: samples, time, channels, x, y.
- ntxy: samples, time, x, y, with one implicit channel.
- txyc, tcxy, and txy: grouped files where each sample is stored under keys such as 0000/data, 0001/data, and so on.
Use --data-key if the file has more than one large array and --layout if
automatic layout inference is wrong.
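The layout strings above can be read as one letter per array axis. The sketch below shows one way such a string could be mapped onto a single canonical axis order. It is a hypothetical illustration: choosing ntcxy as the canonical order is an assumption, and the helper names permutation and canonical_shape do not come from this repository.

```python
CANONICAL = "ntcxy"  # assumed target order: samples, time, channels, x, y


def _validate(layout):
    if len(set(layout)) != len(layout) or not set(layout) <= set(CANONICAL):
        raise ValueError(f"unsupported layout: {layout!r}")


def permutation(layout):
    """Source-axis indices, reordered to follow CANONICAL for the axes present."""
    _validate(layout)
    return tuple(layout.index(axis) for axis in CANONICAL if axis in layout)


def canonical_shape(shape, layout):
    """Shape after reordering, with size-1 axes for anything the layout omits."""
    _validate(layout)
    if len(shape) != len(layout):
        raise ValueError("shape rank does not match layout")
    sizes = dict(zip(layout, shape))
    return tuple(sizes.get(axis, 1) for axis in CANONICAL)


print(permutation("ntxyc"))                        # channels-last to channels-first
print(canonical_shape((8, 10, 32, 32), "ntxy"))    # implicit channel becomes size 1
print(canonical_shape((10, 32, 32, 2), "txyc"))    # grouped sample gains a sample axis
```

With an array library, the permutation would be passed to a transpose call and the missing axes added with a reshape.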
The PDE hot path has been fused into one grouped convolution that computes Laplacian, x-gradient, and y-gradient together. Run focused benchmarks with:
uv run python scripts/benchmark_pde.py --device cuda --padding-mode constant
uv run python scripts/benchmark_autoencoder.py --model mnist --device cuda --pde-steps 3 --pde-padding-mode constant
uv run python scripts/benchmark_autoencoder.py --model image --device cuda --image-size 64 --pde-channels 8 --pde-steps 1 --pde-padding-mode constant

Current findings on the tested RTX 3080:
- Fused PDE forward is roughly 14x-29x faster than the original Python channel-loop reference on representative bottleneck tensors.
- constant PDE padding is faster than circular and more natural for images.
- Encoder-only PDE with one step is the fastest useful placement measured so far; decoder PDE and multi-step unrolling add cost quickly.
- AMP, channels-last, and fused Adam were not consistent wins on these small synthetic benchmarks.
- torch.compile is not currently usable in this Windows environment because PyTorch Inductor cannot find a working Triton install.
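To make the fused Laplacian/gradient idea concrete, here is a framework-free sketch of computing all three stencils from a single pass over each 3x3 neighbourhood. It is not the repository's implementation: the kernel values (5-point Laplacian, central differences, unit grid spacing) and the name fused_stencil are assumptions, and zero padding stands in for the constant padding mode. In PyTorch the same idea is typically expressed by stacking the three kernels per channel and calling one grouped convolution.

```python
# Kernels are indexed [row offset + 1][column offset + 1]; x is the column axis.
LAPLACIAN = ((0, 1, 0), (1, -4, 1), (0, 1, 0))
GRAD_X = ((0, 0, 0), (-0.5, 0, 0.5), (0, 0, 0))
GRAD_Y = ((0, -0.5, 0), (0, 0, 0), (0, 0.5, 0))
KERNELS = (LAPLACIAN, GRAD_X, GRAD_Y)


def fused_stencil(field):
    """Return (laplacian, grad_x, grad_y) for a 2D list, using zero padding."""
    height, width = len(field), len(field[0])
    outputs = [[[0.0] * width for _ in range(height)] for _ in KERNELS]
    for i in range(height):
        for j in range(width):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if not (0 <= ii < height and 0 <= jj < width):
                        continue  # out-of-bounds neighbours count as zero
                    value = field[ii][jj]  # read once, reused by all three kernels
                    for k, kernel in enumerate(KERNELS):
                        outputs[k][i][j] += kernel[di + 1][dj + 1] * value
    return tuple(outputs)


# f(row, col) = col**2, so the interior Laplacian is 2 and df/dx is 2 * col.
field = [[col ** 2 for col in range(5)] for _ in range(5)]
laplacian, grad_x, grad_y = fused_stencil(field)
print(laplacian[2][2], grad_x[2][2], grad_y[2][2])
```

The saving comes from reading each neighbourhood once instead of once per operator; values near the border reflect the zero padding rather than the true derivative.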
Run placement ablations with:
uv run python scripts/benchmark_ablation.py --model mnist --device cuda --pde-padding-mode constant
uv run python scripts/benchmark_ablation.py --model image --device cuda --image-size 64 --channels-list 4 8 --pde-padding-mode constant