Releases · OpenNMT/CTranslate2

T5Gemma model conversion and inference (#1962) by @jordimas
Support for CUDA 12.8 (#1937, #1940) by @Purfview
Conv1d pure CUDA implementation, making cuDNN an optional dependency (#1949) by @jordimas
Add CUDA implementation for median filter (#1917) by @ja2d8a4v
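
For readers unfamiliar with the operation behind #1917, a median filter replaces each element with the median of a sliding window centered on it. The sketch below is a plain-Python CPU reference for that idea, not the CUDA kernel from the PR. The function name `median_filter_1d` and the choice of reflect padding at the borders are assumptions made for illustration.

```python
from statistics import median
from typing import List, Sequence


def median_filter_1d(values: Sequence[float], width: int) -> List[float]:
    """Return the sliding-window median of `values` (hypothetical helper).

    Assumption: borders use reflect padding that does not repeat the edge element.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    n = len(values)
    if n <= half:
        raise ValueError("input is too short for reflect padding at this width")
    # Mirror `half` elements on each side, excluding the edge element itself.
    left = [values[i] for i in range(half, 0, -1)]
    right = [values[n - 2 - i] for i in range(half)]
    padded = left + list(values) + right
    # One window of `width` elements per original position.
    return [median(padded[i:i + width]) for i in range(n)]


smoothed = median_filter_1d([1.0, 9.0, 2.0, 3.0, 8.0], width=3)
```

With a width of 3, the isolated spikes in the interior are replaced by neighboring values, which is the smoothing effect the filter is used for.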

Fixes and improvements

Fix issue where batches built with the tokens batch_type may exceed max_batch_size (#1948) by @anterart
Fix AWQ support for Qwen 2.5/3 and Phi3 (#1951, #1952, #1953) by @BBC-Esq
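
The batching fix above concerns the tokens batch_type, where max_batch_size is counted in tokens rather than examples. As a rough illustration of the invariant involved (not the library's actual batching code), the pure-Python sketch below groups tokenized examples so that each batch stays within the token budget. The function names `batch_by_tokens` and `padded_tokens`, and the assumption that a batch costs its padded size, are made up for this example.

```python
from typing import List


def padded_tokens(batch: List[List[str]]) -> int:
    """Padded size of a batch: number of examples times the longest example."""
    return len(batch) * max(len(example) for example in batch)


def batch_by_tokens(examples: List[List[str]], max_batch_size: int) -> List[List[List[str]]]:
    """Greedily group examples so no batch exceeds max_batch_size padded tokens."""
    batches: List[List[List[str]]] = []
    current: List[List[str]] = []
    longest = 0
    for example in examples:
        if len(example) > max_batch_size:
            raise ValueError("a single example exceeds max_batch_size tokens")
        new_longest = max(longest, len(example))
        # Close the current batch if adding this example would break the budget.
        if current and new_longest * (len(current) + 1) > max_batch_size:
            batches.append(current)
            current, new_longest = [], len(example)
        current.append(example)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


examples = [["tok"] * n for n in (3, 5, 2, 7, 1)]
batches = batch_by_tokens(examples, max_batch_size=10)
```

Checking `padded_tokens(batch) <= max_batch_size` for every returned batch is the property that a bug of this kind would violate.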

05 Dec 08:32

jordimas

v4.6.2

1251f7c

CTranslate2 4.6.2

New features

Qwen 3 support (#1943) by @jordimas
Gemma 3 text support (#1936) by @jordimas

Fixes and improvements

Fix pkg_resources deprecation warning (#1911) by @thawancomt
Disable INT8 for sm120 - Blackwell GPUs (#1937) by @Purfview
Package libctranslate2.so in the wheel to avoid build failures (#1920) by @yzewei

07 Nov 16:26

jordimas

v4.6.1

785c7e0

CTranslate2 4.6.1

New features

Python 3.14 support (#1926)
Support for CUDA 12.4 (#1925)
Update Intel oneAPI to version 2025.3 (#1931)

08 Apr 15:33

minhthuc2502

v4.6.0

617405f

CTranslate2 4.6.0

Note: The CTranslate2 Python package now supports Python 3.13 and drops support for Python 3.8.

New features

Python 3.13 support (#1858)
Support returning hidden vector in Wav2Vec2 and Wav2Vec2Bert Models (#1867)
Add noexecstack linker flags (#1852 + #1861)
Support Qwen2 (#1820)
Eoleconv (#1832)
Add support for RobertModel (#1864)

Fixes and improvements

Fix GitHub Action (#1871)
Prevent double library def (#1818)

22 Oct 11:23

minhthuc2502

v4.5.0

383d063

CTranslate2 4.5.0

Note: The CTranslate2 Python package now supports cuDNN 9 and is no longer compatible with cuDNN 8.

New features

Support Phi3 (#1800)
Support Mistral Nemo (#1785)
Support Wav2Vec2Bert ASR (#1778)

Fixes and improvements

Upgrade to cuDNN 9 (#1803)
Fix logits vocab (#1786 + #1791)
Update doc AWQ (#1795)

09 Sep 09:21

minhthuc2502

v4.4.0

8f4d134

CTranslate2 4.4.0

Removed: Flash Attention support in the Python package due to significant package size increase with minimal performance gain.
Note: Flash Attention remains supported in the C++ package with the WITH_FLASH_ATTN option.
Flash Attention may be re-added in the future if substantial improvements are made.

New features

Support Llama3 (#1751)
Support Gemma2 (#1772)
Add log probs for all tokens in vocab (#1755)
Grouped conv1d (#1749 + #1758)
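
Log probabilities over the whole vocabulary (#1755) are conventionally obtained by applying a log-softmax to the logits of a decoding step. The following is a minimal sketch of that computation in plain Python, not the library's implementation. The function name `log_softmax` and the toy logits are assumptions for illustration.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    peak = max(logits)
    # Subtract the maximum before exponentiating to avoid overflow.
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


logits = [2.0, 1.0, 0.1]  # toy scores for a 3-token vocabulary
log_probs = log_softmax(logits)
total = sum(math.exp(lp) for lp in log_probs)  # should be close to 1.0
```

Each entry is the log probability of one vocabulary token, so exponentiating and summing the entries recovers a total of approximately 1.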

Fixes and improvements

Fix pipeline (#1723 + #1747)
Some improvements in flash attention (#1732)
Fix crash when using return_alternative on CUDA (#1733)
Quantization AWQ GEMM + GEMV (#1727)

11 Jun 09:16

minhthuc2502

v4.3.1

59c7dda

CTranslate2 4.3.1

Note: The v4.3.0 release could not be pushed to PyPI because the project exceeded its size limit there (> 20 GB).

Fixes and improvements

Improve the compilation (#1706 and #1705)
Fix position bias in tensor parallel mode (#1714)

17 May 08:20

minhthuc2502

v4.3.0

173a0d1

CTranslate2 4.3.0

New features

Support Phi-3 (8k and 128k) (#1700 and #1680)

Fixes and improvements

Fix regression Flash Attention (#1695)

Releases: OpenNMT/CTranslate2

v4.7.1
v4.7.0
v4.6.3 (2026-01-06)
