1.58-bit large language model

A 1.58-bit large language model (1.58-bit LLM, also ternary LLM) is a transformer large language model whose weights take only three values: -1, 0, and +1. In theory, this restriction lets the model replace costly multiplications with additions and reduces the memory needed to store the weights. Because the end-task performance and perplexity of 1.58-bit LLMs are close to those of their "full-precision" (16-bit FP16 or BF16) counterparts, at least for smaller model sizes (up to 3-4B parameters), this design allows reaching the same artificial intelligence goals with much lower hardware requirements, latency, and training effort.^[1]^[2]^[3]
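The replacement of multiplications by additions can be illustrated with a minimal Python sketch. The function name `ternary_matvec` and the sample values are hypothetical and chosen only for illustration; real inference kernels operate on packed weights and are far more elaborate.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix by a vector using only additions and subtractions.

    `weights` is a list of rows whose entries are -1, 0, or +1.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: the activation is skipped entirely
        out.append(acc)
    return out


# Illustrative values only.
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]
y = ternary_matvec(W, x)
```

No multiplication appears in the inner loop: each weight either adds, subtracts, or skips the corresponding activation.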

The name comes from the fact that a single trit, the ternary equivalent of a bit, which can take the values {-1, 0, 1}, carries $\log_2 3 \approx 1.58$ bits of information. 1.58-bit LLMs are also called 1-bit LLMs,^[1]^[4] although true 1-bit models also exist.
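The figure can be checked numerically, and it also suggests how ternary weights might be stored compactly. The sketch below assumes a simple base-3 packing of five trits per byte (possible because 3^5 = 243 ≤ 256, which costs 1.6 bits per weight); the helper names `pack_trits` and `unpack_trits` are hypothetical, and this is not claimed to be the storage format used by any particular implementation.

```python
import math

# Information carried by one three-valued symbol, in bits.
bits_per_trit = math.log2(3)


def pack_trits(trits):
    """Pack five values from {-1, 0, 1} into one integer in the range 0..242."""
    if len(trits) != 5:
        raise ValueError("expected exactly five trits")
    value = 0
    for t in trits:
        value = value * 3 + (t + 1)   # map -1, 0, 1 to base-3 digits 0, 1, 2
    return value


def unpack_trits(value):
    """Inverse of pack_trits: recover the five trits from the packed integer."""
    digits = []
    for _ in range(5):
        value, d = divmod(value, 3)
        digits.append(d - 1)          # map digits 0, 1, 2 back to -1, 0, 1
    return digits[::-1]


packed = pack_trits([1, 0, -1, -1, 1])
restored = unpack_trits(packed)
```

Five trits per byte comes close to the theoretical bound of about 1.58 bits per weight while keeping decoding simple.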

BitNet

In 2024, Ma et al., researchers at Microsoft, declared that their 1.58-bit model, BitNet b1.58, is comparable in performance to the 16-bit Llama 2 and opens the era of 1-bit LLMs.^[5] The BitNet creators did not apply post-training quantization to the weights; instead, they relied on a new BitLinear transform that replaces the nn.Linear layer of the traditional transformer design.^[6]
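The article does not describe how BitLinear maps weights to ternary values. As a rough illustration, the sketch below follows the absmean scheme described in Ma et al. 2024 as understood here: divide the weight matrix by its mean absolute value, round to the nearest integer, and clip to [-1, 1]. The function name `absmean_ternarize`, the `eps` default, and the sample matrix are assumptions made for illustration; activation quantization and the training procedure are omitted.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Map a real-valued weight matrix (list of rows) to {-1, 0, +1}.

    Returns the ternary matrix and the scale gamma (mean absolute weight),
    which a layer would keep to rescale its outputs.
    """
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    scale = gamma + eps               # eps avoids division by zero
    ternary = [
        [max(-1, min(1, round(w / scale))) for w in row]  # round, then clip
        for row in weights
    ]
    return ternary, gamma


# Illustrative weights only.
W = [[0.9, -0.05, -1.2],
     [0.3, 0.0, 0.55]]
W_ternary, gamma = absmean_ternarize(W)
```

Weights that are small relative to the average magnitude collapse to 0, while larger ones keep only their sign.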

In 2025, Microsoft researchers released BitNet b1.58 2B4T, a model with open weights and open inference code, demonstrating performance competitive with full-precision models at 2B parameters and 4T training tokens.^[7]

Critique

Some researchers^[8] point out that the scaling laws^[9] of large language models favor low-bit weights only in the case of undertrained models. As the number of training tokens increases, the deficiencies of low-bit quantization surface.

References

1. Ma et al. 2024, p. 1.
2. Friha et al. 2024, p. 5822.
3. Hutson 2024.
4. Morales 2025.
5. Huyen 2024, p. 330.
6. Wang et al. 2023, p. 1.
7. Ma et al. 2025.
8. Ouyang et al. 2024.
9. Kumar et al. 2024.

Sources

Ma, Shuming; Wang, Hongyu; Ma, Lingxiao; Wang, Lei; Wang, Wenhui; Huang, Shaohan; Dong, Li; Wang, Ruiping; Xue, Jilong; Wei, Furu (2024-02-27). "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits". arXiv:2402.17764.
Ma, Shuming; Wang, Hongyu; Huang, Shaohan; Zhang, Xingxing; Hu, Ying; Song, Ting; Xia, Yan; Wei, Furu (2025), BitNet b1.58 2B4T Technical Report, doi:10.48550/ARXIV.2504.12285, retrieved 2025-04-22
Friha, Othmane; Amine Ferrag, Mohamed; Kantarci, Burak; Cakmak, Burak; Ozgun, Arda; Ghoualmi-Zine, Nassira (2024). "LLM-Based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness". IEEE Open Journal of the Communications Society. 5: 5799–5856. doi:10.1109/OJCOMS.2024.3456549. ISSN 2644-125X.
Hutson, Matthew (2024-05-30). "1-bit LLMs Could Solve AI's Energy Demands". IEEE Spectrum. Retrieved 2025-04-22.
Huyen, Chip (2024-12-04). AI Engineering. O'Reilly Media, Inc. ISBN 978-1-0981-6627-4. Retrieved 2025-04-22.
Kumar, Tanishq; Ankner, Zachary; Spector, Benjamin F.; Bordelon, Blake; Muennighoff, Niklas; Paul, Mansheej; Pehlevan, Cengiz; Ré, Christopher; Raghunathan, Aditi (2024), Scaling Laws for Precision, doi:10.48550/ARXIV.2411.04330, retrieved 2025-04-22
Morales, Jowi (2025-04-17). "Microsoft researchers build 1-bit AI LLM with 2B parameters". Tom's Hardware. Retrieved 2025-04-21.
Ouyang, Xu; Ge, Tao; Hartvigsen, Thomas; Zhang, Zhisong; Mi, Haitao; Yu, Dong (2024), Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens, doi:10.48550/ARXIV.2411.17691, retrieved 2025-04-22
Wang, Hongyu; Ma, Shuming; Dong, Li; Huang, Shaohan; Wang, Huaijie; Ma, Lingxiao; Yang, Fan; Wang, Ruiping; Wu, Yi; Wei, Furu (2023), BitNet: Scaling 1-bit Transformers for Large Language Models, doi:10.48550/ARXIV.2310.11453, retrieved 2025-04-23
