LLMini is a lightweight implementation of a transformer-based language model inspired by GPT architectures. It is designed for educational purposes and small-scale experiments, making it ideal for understanding the inner workings of language models.
- Transformer-based architecture with causal self-attention (a minimal sketch follows this list).
- Configurable hyperparameters for layers, heads, and embedding dimensions.
- Sampling techniques including temperature and top-k sampling.
- Pretrained model checkpoint for quick experimentation.
- Modular design for easy customization and extension.
- Early stopping and checkpointing during training.
- Script to split model weights from full checkpoints.
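
For readers who want to see the core idea behind causal self-attention, here is a minimal, self-contained sketch in PyTorch (which the `.pt` checkpoints suggest the project uses). It is an illustration only, not the code in `llmini/layers.py`, and every name in it is made up for this example.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    """Illustrative single-head attention with a causal mask.

    q, k, v: tensors of shape (batch, seq_len, head_dim).
    Each position may attend only to itself and earlier positions.
    """
    seq_len, head_dim = q.shape[1], q.shape[2]
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5        # (batch, seq, seq)
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))         # hide future tokens
    weights = F.softmax(scores, dim=-1)
    return weights @ v                                         # (batch, seq, head_dim)
```

A real implementation adds learned projections, multiple heads, and dropout, but the lower-triangular mask is what keeps generation strictly left-to-right.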
To get started:

- Clone the repository with submodules:

  ```bash
  git clone --recurse-submodules <repo-url>
  cd llmini
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  Alternatively, use the Conda environment:

  ```bash
  conda env create -f environment.yaml
  conda activate llmini
  ```
- Initialize external dependencies:

  The `setup_external.py` script automatically initializes external dependencies, such as downloading datasets or setting up submodules, during installation.
Train the model on the Tiny Shakespeare dataset:

```bash
python llmini/train.py
```
- Model Selection: Specify the model architecture using the `--model` flag. For example:

  ```bash
  python llmini/train.py --model tiny
  ```
- Dataset Selection: Choose the dataset to train on using the `--dataset` flag. Supported datasets include `tinyshakespeare` and `wikitext`. For example:

  ```bash
  python llmini/train.py --dataset wikitext
  ```
- Checkpoint Path: Resume training from a specific checkpoint using the `--checkpoint` flag. For example:

  ```bash
  python llmini/train.py --checkpoint checkpoints/tinygpt_char.pt
  ```
These options can be combined to customize the training process. For instance:

```bash
python llmini/train.py --model tiny --dataset wikitext --checkpoint checkpoints/tinygpt_char.pt
```
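
The feature list mentions early stopping and checkpointing during training. The loop below sketches that general pattern under assumed names (`train_one_epoch`, `evaluate`, and the checkpoint path are placeholders); it is not the actual `train.py` interface.

```python
import torch

def fit(model, optimizer, train_one_epoch, evaluate,
        max_epochs=50, patience=5, ckpt_path="checkpoints/tinygpt_char.pt"):
    """Generic early-stopping loop: keep the best checkpoint, stop once the
    validation loss has not improved for `patience` consecutive epochs."""
    best_val = float("inf")
    stale_epochs = 0
    for epoch in range(max_epochs):
        train_one_epoch()                 # one pass over the training data
        val_loss = evaluate()             # validation loss for this epoch
        if val_loss < best_val:
            best_val = val_loss
            stale_epochs = 0
            torch.save({"model": model.state_dict(),
                        "optimizer": optimizer.state_dict(),
                        "epoch": epoch}, ckpt_path)
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                print(f"Early stopping after epoch {epoch}")
                break
```

Resuming with `--checkpoint` then amounts, in general, to loading such a file with `torch.load` and restoring the saved state dicts before training continues.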
Generate text using the pretrained model:

```bash
python llmini/sample.py
```

To enable debugging logs:

```bash
python llmini/sample.py --debug
```

To generate text with a specific model:

```bash
python llmini/sample.py --model tiny
```
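
Temperature and top-k sampling appear in the feature list; the snippet below shows what those two steps typically look like when picking the next token from a logits vector. It is a generic sketch, not the code in `sample.py`.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=1.0, top_k=40):
    """Sample one token id from raw logits of shape (vocab_size,).

    Temperature rescales the distribution (lower = more deterministic);
    top-k keeps only the k most likely tokens before sampling.
    """
    logits = logits / temperature                            # temperature must be > 0
    if top_k is not None:
        kth_best = torch.topk(logits, top_k).values[-1]      # k-th largest logit
        logits = logits.masked_fill(logits < kth_best, float("-inf"))
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```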
A dedicated script is provided to split model weights from a full checkpoint. Evaluate a trained model with:

```bash
python scripts/evaluate.py
```

Hyperparameters such as `BLOCK_SIZE`, `BATCH_SIZE`, and `LEARNING_RATE` can be adjusted in `llmini/config.py`.
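
The parameter names above come from this README, but the values below are illustrative only and may not match the defaults shipped in `llmini/config.py`:

```python
# Hypothetical excerpt of llmini/config.py -- example values, not the real defaults.
BLOCK_SIZE = 128       # maximum context length, in tokens
BATCH_SIZE = 64        # sequences per training batch
LEARNING_RATE = 3e-4   # optimizer learning rate
```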
Project layout:

```
llmini/
├── arch.py      # Model architectures
├── config.py    # Centralized configuration
├── data.py      # Data loading and preprocessing
├── layers.py    # Reusable building blocks for models
├── model.py     # Model initialization and utilities
├── sample.py    # Text generation script
├── train.py     # Training script
└── utils.py     # Shared utilities
```

Run the test suite using:
```bash
pytest tests/
```

Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Submit a pull request with a detailed description of your changes.
This project is licensed under the MIT License. See the LICENSE file for details.
If you use this code, please cite it as follows:
```bibtex
@software{terrel2025llmini,
  author  = {Nick Terrel},
  title   = {LLMini: A Tiny LLM Implementation},
  year    = {2025},
  version = {1.0.0},
  doi     = {10.1234/llmini.2025},
  url     = {https://github.com/nterrel/llmini}
}
```

- Parts of the model architecture were inspired by minGPT.
- Dataset: Tiny Shakespeare.