Quantization

Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

Quantized neural networks typically require smaller memory footprints and lower computation complexity, which is crucial for efficient deployment. However, quantization inevitably leads to a distribution divergence from the original network, which …