International Journal For Multidisciplinary Research

E-ISSN: 2582-2160     Impact Factor: 9.24



Advancements in Large Language Model Efficiency: A Literature Review on 1-bit Quantization

Author(s) Lalitha Shree C P, Nethravathi B
Country India
Abstract Large Language Models face significant challenges in computational cost, memory footprint, and energy consumption, which makes scaling them difficult. BitNet b1.58 addresses these issues by introducing a novel 1.58-bit quantization scheme with ternary weights {-1, 0, 1}. It achieves performance comparable to FP16 models with far lower resource requirements, providing 2.71x faster inference and 3.55x less memory usage than FP16 baselines. Its ternary weights also allow for efficient feature filtering, making it a versatile choice for many AI applications and a solution that balances high performance with resource efficiency. Because BitNet b1.58 carries the low overhead of 1-bit-class weights, it becomes affordable for edge and mobile devices and can handle longer sequences. These advances mark a further step toward making AI scalable and resource-aware across applications.
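
The ternary quantization mentioned in the abstract can be illustrated with a minimal sketch. The snippet below assumes the per-tensor absmean scaling rule described in the BitNet b1.58 report (scale weights by the mean absolute value, then round and clip to {-1, 0, 1}); the function name absmean_ternary_quantize and the NumPy-based framing are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def absmean_ternary_quantize(W: np.ndarray, eps: float = 1e-8):
    """Sketch of absmean ternary quantization to weights in {-1, 0, 1}.

    Assumed scheme (per the BitNet b1.58 description): scale the weight
    matrix by the mean absolute weight, then round and clip to [-1, 1].
    """
    gamma = np.abs(W).mean()                         # per-tensor scale: mean of |W|
    W_scaled = W / (gamma + eps)                     # normalize by the scale
    W_ternary = np.clip(np.round(W_scaled), -1, 1)   # round, then clip to {-1, 0, 1}
    return W_ternary.astype(np.int8), gamma

# Usage example: quantize a small random weight matrix.
W = np.random.randn(4, 4).astype(np.float32)
W_q, gamma = absmean_ternary_quantize(W)
print(W_q)     # ternary weights in {-1, 0, 1}
print(gamma)   # scale for approximately reconstructing W as gamma * W_q
```

Because each weight takes one of only three values, matrix multiplication can be reduced largely to additions and subtractions, which is the source of the inference speed and memory savings cited above.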
Keywords LLMs, 1-bit Quantization, BitNet, BitNet b1.58
Field Engineering
Published In Volume 6, Issue 6, November-December 2024
Published On 2024-12-10
Cite This Advancements in Large Language Model Efficiency: A Literature Review on 1-bit Quantization - Lalitha Shree C P, Nethravathi B - IJFMR Volume 6, Issue 6, November-December 2024. DOI 10.36948/ijfmr.2024.v06i06.32856
DOI https://doi.org/10.36948/ijfmr.2024.v06i06.32856
Short DOI https://doi.org/g8vgf5
