Quantized Weights (Ternary): These are the discrete {−1, 0, 1} values derived from the Master Weights and used for the actual inference/forward pass.

3.2 The Forward Pass

During the forward pass, the Master Weights are first converted to ternary weights by the operations described above (scaling and rounding). The gradients computed during backpropagation are then applied to the Master Weights, not the Ternary Weights. Because the weights are ternary, BitNet can rely on INT8 arithmetic instead of FP16, which reduces the energy cost of each operation. The authors applied an energy model to estimate the cost of these operations on 7 nm chips.
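The forward-pass quantization and the master-weight update can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the absmean scaling rule and the `eps` constant are assumptions (the exact scaling/rounding scheme is described earlier in the paper), and the dummy gradient stands in for whatever backpropagation would produce.

```python
import numpy as np

def ternarize(w, eps=1e-6):
    """Convert full-precision master weights to ternary {-1, 0, 1} weights.

    Assumption: absmean scaling as in BitNet b1.58 -- divide by the mean
    absolute value, round to the nearest integer, and clip to [-1, 1].
    """
    scale = np.mean(np.abs(w)) + eps  # eps guards against an all-zero matrix
    return np.clip(np.round(w / scale), -1, 1), scale

# Toy master weights (full precision, updated during training).
master = np.array([[0.8, -0.05, -1.2],
                   [0.3,  0.0,  -0.4]])

# Forward pass: derive the ternary weights used for inference.
ternary, scale = ternarize(master)

# Backward pass (sketch): gradients are applied to the master weights,
# not the ternary weights; the ternary weights are re-derived next step.
grad = np.full_like(master, 0.1)  # dummy gradient in place of real backprop
lr = 0.01
master -= lr * grad
```

Because the quantized matrix contains only −1, 0, and 1, the matrix multiply reduces to additions and subtractions over INT8 activations, which is where the energy saving over FP16 multiply-accumulate comes from.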