Research Article

Quantized Convolutional Neural Networks Robustness under Perturbation

[version 1; peer review: 2 approved]
PUBLISHED 09 Apr 2025

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Contemporary machine learning models are increasingly restricted by their size and the number of operations per forward pass, which drive growing compute requirements. Quantization has emerged as a convenient approach to addressing this, in which weights and activations are mapped from their conventional 32-bit floating-point representations to lower-precision integers. This process significantly reduces inference time and simplifies hardware requirements. It is a well-studied result that the performance of such reduced-precision models is congruent with that of their floating-point counterparts. However, there is a lack of literature that addresses the performance of quantized models in a perturbed input space, as is common when stress testing regular full-precision models, particularly for real-world deployments. We focus on addressing this gap in the context of 8-bit quantized convolutional neural networks (CNNs). We study three state-of-the-art CNNs: ResNet-18, VGG-16, and SqueezeNet1_1, and subject their floating-point and fixed-point forms to various noise regimes with varying intensities. We characterize performance in terms of traditional metrics, including top-1 and top-5 accuracy, as well as the F1 score. We also introduce a new metric, the Kullback-Leibler divergence of the two output distributions for a given floating-point/fixed-point model pair, as a means to examine how the model’s output distribution has changed as a result of quantization, which, we contend, can be interpreted as a proxy for model similarity in decision making. We find that across all three models and under each perturbation scheme, the relative error between the quantized and full-precision model was consistently low. We also find that Kullback-Leibler divergence was on the same order of magnitude as the unperturbed tests across all perturbation regimes except Brownian noise, where significant divergences were observed for VGG-16 and SqueezeNet1_1.

Keywords

Neural network quantization, convolutional neural networks (CNNs), computer vision, model robustness, perturbation modeling, edge AI

1. Introduction

Convolutional neural networks (CNNs) have emerged as an effective means of modeling relationships in spatially expressive data, introducing a new domain of computer vision tasks ranging from image classification and segmentation to object detection and video processing. Although impressive in their abilities, contemporary CNNs suffer from increasingly large parameter counts and sophisticated hardware requirements, where parameters are the model’s filter weights and activations, typically stored as 32-bit floating point numbers (FP32).4,12

In hardware, forward-pass computations are performed by multiply-accumulate (MAC) operations, which incur substantial overhead due to FP32 arithmetic.23 These operations involve manipulating the exponent, mantissa, and special values such as NaN (Not a Number) and infinity, further increasing memory and computational costs.2 Such limitations are critical in real-time systems like autonomous vehicles, where weight, power consumption, and energy efficiency are tightly constrained.9,3 Thus, there is motivation to examine methods to reduce the precision of the numeric representation to speed up inference times and reduce hardware and energy requirements.

Quantization has become a popular approach to precision reduction, mapping FP32 weights and activations onto lower-resolution representations such as 8-bit integers (INT8). This technique, commonly used in other domains like digital signal processing for power efficiency and reduced latency, has been successfully adapted for machine learning.25 Prior studies have demonstrated its potential:

  1. Vector quantization techniques, such as k-means and product quantization, have achieved up to 24× compression with minimal accuracy loss.13

  2. Compression pipelines combining pruning, quantization, and Huffman coding yielded compression ratios as high as 49× with significant energy savings.16

  3. Optimization methods weighted by parameter importance, such as those based on the Hessian matrix, have further improved quantization efficiency.7

More aggressive strategies, such as binary networks, have also shown promise, with reductions of up to 58 × in computation time and 32 × in memory usage.28,29 However, these approaches often involve trade-offs in model accuracy and information-storing capacity.

Because any precision reduction also decreases a network’s information-storing capacity, it is essential to understand and characterize performance prior to deployment. Specifically, understanding performance in real-world conditions is important to derisk the implementation of quantized networks for practical applications. To this end, this paper experimentally examines the performance of quantized networks under simulated environmental perturbation. We test four perturbation regimes: random additive white Gaussian noise (AWGN), spatially correlated Brownian noise, vertical occlusions, and horizontal occlusions. The rationale for these regimes is discussed in Section 2.3.

We also depart from standard benchmark datasets such as MNIST,11 CIFAR,20 and ImageNet10 and study fine-grained visual classification (FGVC). FGVC is relevant to scenarios where it is necessary to predict subclasses within a category, e.g., breeds of dogs or families of aircraft that show high overlap in the representation space. As such, this task generalizes the analysis of datasets such as ImageNet where classes are independent and do not share categorical similarities to such a high degree. Differences between classes in FGVC are comparatively subtle and often are as nuanced as a wing shape or engine mounting, leading to what we contend to be a more substantial assessment of a quantized model’s spatial embedding capacity.

This paper contributes to the quantization literature by addressing the performance of reduced precision networks in FGVC applications through

  1. Experimentally evaluating the performance of quantized networks under various forms of perturbation, relative to their full-precision counterparts,

  2. Experimentally evaluating quantized networks’ ability to perform fine-grained visual classification,

  3. Employing Kullback-Leibler (KL) divergence as a metric to quantify the similarity in decision making between quantized and full-precision models.

These results serve as a transferable contribution to understand robustness of quantized networks to perturbations in the analysis of subclass datasets (e.g. FGVC) where the representation shows high overlap.

2. Methods

2.1 Quantization

In this work we adhere to common quantization schemes for neural networks and map the higher-resolution set of possible FP32 numbers to a lower-resolution subset of INT8 numbers.23,35 In accordance with typical approaches to network quantization, we quantize the weights and outputs (activations) throughout the network.23,35,37 We also consider strictly post-training quantization (PTQ) rather than quantization-aware training (QAT), as it does not require a complete re-training step, only a simple calibration step on unlabeled data. While QAT generally yields higher compression rates and lower error rates,23 as this paper will show, PTQ still performs well and comes with the benefit of faster and less resource-intensive implementation.

PTQ necessitates first defining several key parameters: the bit-width b , the step-size or scale factor, s , and the zero-point z .23 The bit-width defines the number of possible levels in the quantization grid. The scale factor sets the step-size or difference between each level. The zero-point is an integer chosen such that the actual zero is quantized without error and is important to ensure activation functions like ReLU do not introduce additional quantization errors.23

Quantization can be symmetric or affine, and parameters are mapped onto the quantization grid according to the symmetry of the scheme. For the asymmetric (affine) case, we have

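(1)
$$x_{\mathrm{INT8}} = \operatorname{clamp}\left(\left\lfloor \frac{x_{\mathrm{FP32}}}{s} \right\rceil + z,\; 0,\; 2^{b}-1\right).$$

For the symmetric about $z$ case

(2)
$$x_{\mathrm{INT8}} = \operatorname{clamp}\left(\left\lfloor \frac{x_{\mathrm{FP32}}}{s} \right\rceil + z,\; -2^{b-1},\; 2^{b-1}-1\right).$$

The notation $\lfloor \cdot \rceil$ is the round-to-nearest integer operator while the clamping function is defined as

(3)
$$\operatorname{clamp}(x; a, c) = \begin{cases} a & \text{if } x < a, \\ x & \text{if } a \le x \le c, \\ c & \text{if } x > c, \end{cases}$$
and the parameters $a$ and $c$ denote the bounds of the integer grid. The quantization range is bounded by $q_{\min}$ and $q_{\max}$, and these are defined in terms of quantization symmetry. For affine quantization
(4)
$$q_{\min} = -sz$$
(5)
$$q_{\max} = s\,(2^{b}-1-z).$$

For the symmetric case, $z$ is constrained to 0, and the range is bounded by

(6)
$$q_{\min} = -s\,(2^{b-1}),$$
(7)
$$q_{\max} = s\,(2^{b-1}-1).$$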

As mentioned, we select a bit width of 8 bits, the most commonly used scheme to balance compression and error rates.35 Scale and zero point are determined using the optimization algorithms packaged with PyTorch’s26 quantization library. We quantize weights symmetrically and activations in an affine fashion. These choices reflect that weights are approximately symmetrically distributed around 0, whereas activations are positively skewed owing to their activation function, e.g., ReLU.
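As a minimal sketch of the mappings in Eqs. (1)-(3), the following pure-Python functions quantize a single real value. The scale and zero-point values are hypothetical and chosen only for illustration; in the study they are selected by PyTorch's calibration routines, which this sketch does not reproduce.

```python
def clamp(x, lo, hi):
    """Clamp x to the closed interval [lo, hi], as in Eq. (3)."""
    return max(lo, min(x, hi))


def quantize_affine(x, scale, zero_point, bits=8):
    """Eq. (1): map a real value onto the unsigned grid [0, 2^b - 1]."""
    # Python's round() sends exact .5 ties to the nearest even integer.
    return clamp(round(x / scale) + zero_point, 0, 2 ** bits - 1)


def quantize_symmetric(x, scale, bits=8):
    """Eq. (2) with z = 0: map onto the signed grid [-2^(b-1), 2^(b-1) - 1]."""
    return clamp(round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)


def dequantize(q, scale, zero_point=0):
    """Approximate real value represented by the integer code q."""
    return scale * (q - zero_point)


# Hypothetical parameters (assumptions, not values from the study).
w_scale = 1 / 128               # symmetric weight quantization
a_scale, a_zero = 1 / 32, 10    # affine activation quantization

w_q = quantize_symmetric(0.4, w_scale)          # integer code for a weight
a_q = quantize_affine(1.0, a_scale, a_zero)     # integer code for an activation
zero_q = quantize_affine(0.0, a_scale, a_zero)  # real zero maps to the zero-point
```

Values outside the representable range saturate at the grid bounds, and real zero maps exactly onto the zero-point, which is the property the text relies on for ReLU outputs.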

2.2 Dataset

To study FGVC, we use the fine-grained visual classification aircraft (FGVCA) dataset.22 FGVCA was first introduced as part of the 2013 International Conference on Computer Vision (ICCV), and since then has seen frequent use in the literature.34,5 FGVCA comprises 10,000 images of 1-2 megapixels each and provides several classification tasks and labels, including aircraft model (most specific), variant, family, and manufacturer (least specific).22 The family classification task is considered here and comprises 70 family labels. Examples of families include Boeing-737, which includes variants like 737-200 or 737-300.22 Figure 1 illustrates several dataset samples. Dataset images have varying resolutions and aspect ratios and are altered to a consistent input size as part of the pre-processing transformation.


Figure 1. Sample imagery from the FGVC-A dataset.

Other FGVC datasets are available, including natural species,8 birds,18 or flowers.24 FGVCA has important subclass properties where aircraft features are strongly dependent on size (hobbyist aircraft to large transport aircraft), purpose (commercial, pleasure, or military), and technology (turbine propulsion, propeller, glider, etc.). Each of these designs has unique structural features such as the wing shape and size, fuselage style, landing gear/wheels, and engine mounting. Another interesting feature of FGVCA is that different organizations, such as airlines and militaries, often apply slight modifications such as branding and camouflage, meaning classes can have the same rigid structure but distinctly different "looks". Cumulatively, these features result in rigid variability across classes compared to tasks such as birds or flowers, where non-trivial variations can exist within the same class due to mutations, climate, etc. This rigidity is important for our study as it ensures that errors owing to ambiguity in classes are minimized, allowing us to focus exclusively on errors owing to each model’s capacity for spatial embedding.

2.3 Perturbations

To model real-world phenomena, we selected additive white Gaussian noise (AWGN), spatially correlated Brownian noise, and vertical and horizontal occlusions. AWGN is commonly used as a model for thermal noise experienced in circuits and random photo-sensor noise.14,6 Additionally, AWGN is commonly used in adversarial attacks against deep learning-based systems.30,38 Brownian noise approximates correlated noise like smoke, fog, clouds, and underwater distortions. Vertical and horizontal occlusions model structured image corruption owing to losses from sensor malfunctions, data corruption during transmission, or solar radiation resulting in rows or columns of dead pixels.

Given an image, $x$:

(8)
$$x \in \mathbb{R}^{H \times W \times C}$$
with height $H$, width $W$, and channels $C$, our perturbations are defined as follows. In AWGN, each pixel’s noise term is independently and identically sampled from a single Gaussian distribution. The lack of correlation between noise samples yields a grainy image. Given the univariate Gaussian density function with mean $\mu = 0$ and variance $\sigma^2$:
(9)
$$\mathcal{N}(0,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x)^2}{2\sigma^2}}.$$

Pixels are indexed by $i$, $j$, and $k$, respectively denoting row, column, and channel, yielding our $C$-channel additive noise mask $n_g(i,j,k)$:

(10)
$$n_g(i,j,k) \sim \mathcal{N}(0,\sigma^2).$$

The perturbed input follows from a simple addition of the noise mask to the original input:

(11)
$$\tilde{X}(i,j,k) = X(i,j,k) + n_g(i,j,k).$$
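The AWGN perturbation of Eqs. (10)-(11) can be sketched in a few lines of NumPy. The image size, the [0, 1] value range, and the absence of any clipping after the addition are assumptions made for illustration; the text does not specify them.

```python
import numpy as np


def add_awgn(image, sigma, rng=None):
    """Return image + n_g, with n_g(i, j, k) ~ N(0, sigma^2) drawn i.i.d. per pixel."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)  # Eq. (10)
    return image + noise                                        # Eq. (11)


# Hypothetical 224 x 224 RGB input with values in [0, 1].
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
noisy = add_awgn(image, sigma=0.3, rng=rng)
```

Because every pixel and channel receives an independent draw, the empirical standard deviation of `noisy - image` should sit close to the requested `sigma`.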

Brownian noise differs from AWGN in that it is not independent and identically distributed, but rather spatially correlated. Moreover, unlike the flat power spectral density of AWGN, Brownian noise has a power spectral density proportional to the inverse square of the frequency, $1/f^2$, and lower frequencies dominate the noise spectrum. Thus, lower frequencies are amplified while higher frequencies are attenuated, giving long-range, smoothly-varying spatial correlations. Brownian noise adds blotches and blur-like artifacts to an image as illustrated in Figure 2.


Figure 2. Sample imagery from the FGVC-A dataset under various perturbations.

To generate Brownian noise, we start with white noise, $n_g(i,j,k)$, sampled from (9). This noise is transformed with the fast Fourier transform (FFT) for each of the three channels, yielding the frequency-domain representation $W(u,v,z)$ with frequency components $u$, $v$, and $z$:

(12)
$$W(u,v,z) = \mathrm{FFT}[n_g(i,j,k)].$$

We then scale each frequency component by $1/f^2$, where $f$ is the frequency magnitude, $\sqrt{u^2+v^2+z^2}$:

(13)
$$\tilde{W}(u,v,z) = \frac{W(u,v,z)}{u^2+v^2+z^2}.$$

Finally, we apply the inverse FFT (IFFT) to convert to the spatial domain and add the resulting noise to the image:

(14)
$$n_b(i,j,k) = \mathrm{IFFT}[\tilde{W}(u,v,z)],$$
(15)
$$\tilde{X}(i,j,k) = X(i,j,k) + n_b(i,j,k).$$
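The frequency-domain recipe of Eqs. (12)-(14) might be sketched as follows. Several choices here are assumptions rather than details given in the text: a single three-dimensional FFT over height, width, and channel, frequencies measured in cycles per sample, removal of the zero-frequency (DC) term to avoid dividing by zero, and no rescaling of the output amplitude. The `exponent` parameter is hypothetical; its default of 2 follows the stated $1/f^2$ scaling of each component.

```python
import numpy as np


def brownian_noise(shape, sigma, exponent=2.0, rng=None):
    """Spatially correlated noise obtained by low-pass shaping white Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    white = rng.normal(0.0, sigma, size=shape)   # n_g, Eq. (10)
    spectrum = np.fft.fftn(white)                # W(u, v, z), Eq. (12)

    # Frequency magnitude f = sqrt(u^2 + v^2 + z^2) on the FFT grid.
    axes = [np.fft.fftfreq(n) for n in shape]
    u, v, z = np.meshgrid(*axes, indexing="ij")
    f = np.sqrt(u ** 2 + v ** 2 + z ** 2)
    f[0, 0, 0] = 1.0                             # placeholder to avoid 0-division

    shaped = spectrum / f ** exponent            # Eq. (13)
    shaped[0, 0, 0] = 0.0                        # assumption: drop the DC term
    return np.real(np.fft.ifftn(shaped))         # n_b, Eq. (14)


# Hypothetical small example; Eq. (15) is then a plain addition.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
noise = brownian_noise(image.shape, sigma=1.0, rng=rng)
perturbed = image + noise
```

Dividing the amplitude spectrum by $f$ instead (`exponent=1.0`) would give a power spectral density proportional to $1/f^2$, since power is the squared amplitude; the sketch exposes the exponent so either reading can be tried.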

Lastly, we examine highly structured noise in vertical and horizontal black-out occlusions, or streaks, applied within a pre-defined bounding box of the classification target within each image, as shown in Figure 2. We generate these as having constant width such that increasing the number of streaks increases the image coverage. We control the intensity or degree of perturbation induced by such occlusions by adjusting the number of streaks present in the class’ bounding box region. Edges introduced by occlusions are expected to significantly degrade performance since CNNs extract features such as edges from images, and the occlusions may cause irrelevant feature activation and false edge identification. Streaks may also cover important or distinguishing features of a target class, such as its engine or wing structures, and reduce network performance.
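A minimal sketch of the streak occlusions follows. The bounding-box format, the streak width of 8 pixels, the even spacing of streaks, and the use of zero as the black-out value are all assumptions for illustration; the text states only that streaks have constant width and lie within the target's bounding box.

```python
import numpy as np


def add_streaks(image, bbox, num_streaks, width=8, vertical=True):
    """Black out `num_streaks` evenly spaced bars of constant width inside bbox."""
    top, left, bottom, right = bbox   # hypothetical (row, col) box convention
    out = image.copy()
    extent = (right - left) if vertical else (bottom - top)
    # Evenly spaced streak centres, excluding the two box edges.
    centres = np.linspace(0, extent, num_streaks + 2)[1:-1]
    for c in centres:
        start = int(round(float(c) - width / 2))
        lo, hi = max(start, 0), min(start + width, extent)
        if vertical:
            out[top:bottom, left + lo:left + hi, :] = 0.0
        else:
            out[top + lo:top + hi, left:right, :] = 0.0
    return out


# Hypothetical 100 x 100 RGB image with a 60 x 80 bounding box.
image = np.ones((100, 100, 3))
bbox = (20, 10, 80, 90)
three_vertical = add_streaks(image, bbox, num_streaks=3, vertical=True)
six_vertical = add_streaks(image, bbox, num_streaks=6, vertical=True)
two_horizontal = add_streaks(image, bbox, num_streaks=2, vertical=False)
```

With a fixed width, doubling the number of streaks doubles the blacked-out area, which is how the streak count acts as the intensity parameter.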

2.4 Kullback-Leibler divergence

To extend our analysis of the performance of quantized networks to the softmax probability distribution, we study the KL divergence between the output probabilities of each FP32 and INT8 model. KL divergence, introduced in Ref. 1, quantifies the difference between two distributions $P$ (true) and $Q$ (baseline).32

(16)
$$D_{\mathrm{KL}}(P \parallel Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)},$$
where $i$ indexes the possible states. In our context, we wish to compare the divergence of an INT8 quantized model’s output class probabilities to its full-precision counterpart, given $K$ output classes. KL divergence is asymmetric, i.e., $D_{\mathrm{KL}}(P \parallel Q) \neq D_{\mathrm{KL}}(Q \parallel P)$ in general, and always non-negative, reaching zero if and only if $P$ and $Q$ are identical. We denote the INT8 and FP32 models’ class probability distributions as $P_{\mathrm{INT8}}$ and $P_{\mathrm{FP32}}$, and then
(17)
$$D_{\mathrm{KL}}(P_{\mathrm{FP32}} \parallel P_{\mathrm{INT8}}) = \sum_{k=1}^{K} P_{\mathrm{FP32}}(k) \log \frac{P_{\mathrm{FP32}}(k)}{P_{\mathrm{INT8}}(k)}.$$

If we observe similar accuracy with high KL divergence, the quantized model maintains accuracy at the cost of what is termed here confidence. For example, consider an FP32 model that yields the following output probabilities: class 1: 0.7, class 2: 0.2, class 3: 0.1, and an INT8 model that yields class 1: 0.5, class 2: 0.4, class 3: 0.1, where the actual label is class 1. Both models correctly classified class 1 as the actual label. However, they have markedly different degrees of confidence. Of course, KL divergence alone does not grant insight into class-by-class probabilities, but it does quantify a macro-level model similarity, which may serve as a precursor to specific class probability studies. However, it is important to recognize that this metric assumes the FP32 model is the true baseline distribution. We contend that this is a fair characterization as, in reality, it is the theoretical best we can do regarding capacity for information embedding per parameter. In this case, the baseline is indeed dynamic in the sense that the distribution of the FP32 model will change under a perturbation. Given that both the FP32 and INT8 models are tested on identical inputs with identical perturbation, we are then answering the question: under perturbation level X, how does our quantized model deviate in its probabilities from what would otherwise be output given no quantization?
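As a worked instance of Eq. (17) for the three-class example above, and assuming the natural logarithm (the text does not state the base), the divergence evaluates to:

```latex
\begin{align*}
D_{\mathrm{KL}}(P_{\mathrm{FP32}} \parallel P_{\mathrm{INT8}})
  &= 0.7 \ln\frac{0.7}{0.5} + 0.2 \ln\frac{0.2}{0.4} + 0.1 \ln\frac{0.1}{0.1} \\
  &\approx 0.2355 - 0.1386 + 0 \approx 0.0969 \text{ nats}, \\
D_{\mathrm{KL}}(P_{\mathrm{INT8}} \parallel P_{\mathrm{FP32}})
  &= 0.5 \ln\frac{0.5}{0.7} + 0.4 \ln\frac{0.4}{0.2} + 0.1 \ln\frac{0.1}{0.1} \\
  &\approx -0.1682 + 0.2773 + 0 \approx 0.1090 \text{ nats}.
\end{align*}
```

Both models predict class 1, yet the divergence is nonzero, and the two directions differ, which illustrates the asymmetry noted above.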

3. Experimental procedure

3.1 Models and training

We examine three state-of-the-art models: VGG-16,33 ResNet-18,17 and SqueezeNet1_1.19 These models were selected on the basis of their varying size and ubiquity in the current body of computer vision research. We compare the size of each network in terms of parameter count and floating point operations (FLOPs) in Table 1. Models were downloaded from PyTorch’s model zoo,26 with pre-trained weights for ImageNet.10 Each model was adjusted to have 70 output neurons, in accordance with the FGVCA families classification task. Models were trained for 250 epochs on a shuffled training set of 3,333 images, using the cross-entropy loss function to measure the prediction error. The stochastic gradient descent optimizer was employed with a learning rate of 0.001 and a momentum of 0.9 to update the model parameters during training. Additionally, a learning rate scheduler was applied to decrease the learning rate by a factor of 0.1 every 50 epochs.
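The learning-rate schedule described above can be written out explicitly. This is a sketch assuming a step schedule of the kind PyTorch's `StepLR` implements, with the decay taking effect at epochs 50, 100, 150, and 200 (zero-indexed); the function name and the exact epoch boundary are assumptions.

```python
def step_lr(epoch, base_lr=0.001, gamma=0.1, step_size=50):
    """Learning rate at a zero-indexed epoch under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate at the start of each 50-epoch block of the 250-epoch run.
schedule = {epoch: step_lr(epoch) for epoch in range(0, 250, 50)}
final_lr = step_lr(249)
```

Under these assumptions the rate falls by four decades over training, from 1e-3 in the first block to roughly 1e-7 in the last.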

Table 1. Details of studied models.

All data retrieved from PyTorch documentation.26

| Model | Parameters (M) | Size (MB) | GFLOPs |
|---|---|---|---|
| VGG-16 | 138.4 | 527.8 | 15.47 |
| ResNet-18 | 11.7 | 44.7 | 1.81 |
| SqueezeNet1_1 | 1.2 | 4.7 | 0.35 |

3.2 Perturbation intensities

We examine the performance of each model under each of the perturbations discussed above. As noted, each perturbation has its own intensity parameter: the standard deviation for AWGN and Brownian noise, and the number of occlusions for vertical and horizontal occlusions. In each case, the intensity range was experimentally selected to best capture the full spectrum of degradation, as shown in the accuracy/F1 plots in Section 4. For AWGN we study standard deviations ranging from 0 to 1 with a step size of 0.1 from 0.1 to 0.6, and then 0.2 from 0.6 to 1.0, as model performance plateaus beyond 0.6. For Brownian noise we study standard deviations ranging from 10 to 80 with a step size of 10. For vertical and horizontal occlusions we select occlusion counts ranging from 1 to 6 with a step size of 1. Note that the inclusion of even 1 streak yielded significant degradation across all metrics, and as more occlusions were added, performance quickly degraded to near 0 across all metrics. Also, as discussed in Section 2.3, the Brownian perturbation regime required comparatively higher standard deviations than AWGN to yield any noticeable degradation. This is expected, as Brownian noise is smoother and more structured than AWGN, owing to its $1/f^2$ spectrum, making it less disruptive to the image’s overall structure and features when compared to AWGN’s direct pixel-by-pixel variations. For each perturbation scheme, each model’s FP32/INT8 pair was tested on the full test set of 3,333 images.
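For reference, the intensity grids described above can be enumerated as follows. The variable names are hypothetical, and the grids list only the nonzero levels that appear in the result tables.

```python
# AWGN: 0.1 steps from 0.1 to 0.6, then 0.2 steps up to 1.0.
awgn_sigmas = [round(0.1 * i, 1) for i in range(1, 7)] + [0.8, 1.0]

# Brownian noise: standard deviations 10, 20, ..., 80.
brownian_sigmas = list(range(10, 81, 10))

# Vertical and horizontal occlusions: 1 to 6 streaks.
occlusion_counts = list(range(1, 7))

# Perturbed settings evaluated per FP32/INT8 model pair.
num_settings = len(awgn_sigmas) + len(brownian_sigmas) + 2 * len(occlusion_counts)
```

Each of these settings is evaluated on the full test set for both members of a model pair.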

3.3 Metrics computation

We are interested in computing four primary metrics: top-1 accuracy, top-5 accuracy, F1 score, and KL divergence. Top-1 accuracy, top-5 accuracy, and F1 score are computed using Scikit-learn.27 To compute KL divergence, we store the 70 class output probabilities for each model pair and compute the KL divergence between them, on a per-image basis, for each perturbation level. We then compute the average KL divergence across the entire test set of 3,333 images, for each perturbation level, resulting in a single scalar averaged measure for each model pair.
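The per-image averaging described above might look like the following sketch. It is written in plain Python as a stand-in for the Scikit-learn routines the text mentions; the natural logarithm and the small `eps` floor guarding against zero probabilities are assumptions, and the toy inputs use 3 classes rather than 70.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in nats for two discrete distributions over the same classes."""
    return sum(pk * math.log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)


def mean_kl(fp32_probs, int8_probs):
    """Average per-image KL divergence for one model pair at one perturbation level."""
    per_image = [kl_divergence(p, q) for p, q in zip(fp32_probs, int8_probs)]
    return sum(per_image) / len(per_image)


def top_k_accuracy(probs, labels, k):
    """Fraction of images whose true label is among the k most probable classes."""
    hits = 0
    for p, label in zip(probs, labels):
        top_k = sorted(range(len(p)), key=lambda c: p[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example: two images, three classes.
fp32 = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
int8 = [[0.5, 0.4, 0.1], [0.1, 0.6, 0.3]]
labels = [0, 2]

avg_kl = mean_kl(fp32, int8)
top1 = top_k_accuracy(fp32, labels, k=1)
top2 = top_k_accuracy(fp32, labels, k=2)
```

The second image contributes zero divergence because both models output the same distribution, so the average reflects only the first image's disagreement.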

4. Results and Discussion

4.1 Unperturbed results

We begin by examining the unperturbed results to establish a baseline. Table 2 presents F1, top-1, and top-5 accuracies for each model pair, with VGG-16 showing the best performance across categories and SqueezeNet1_1 the worst. This result is expected and matches the general trend presented in Table 1, which supports the notion that increased parameter count driven by layer depth yields higher accuracy. As expected, top-1 accuracy is lower than top-5 accuracy across all models: SqueezeNet1_1 exhibits a top-1 accuracy roughly 16 percentage points below that of VGG-16, but a top-5 accuracy only about 6 percentage points lower. This is indicative of the nature of FGVC, where it is likely that smaller models like SqueezeNet1_1 lack the expressive power to capture the subtleties between similar classes but can make a near-correct prediction. We also see that F1, the harmonic mean of precision (a model’s ability to avoid false positives) and recall (its ability to avoid false negatives), follows the same trend as top-1 accuracy. Most notable, however, is that we observe very little degradation in model performance across all three metrics post-quantization. This result is congruent with prior work23,35,37 and confirms that our quantization scheme is implemented and performing as expected. Table 3 presents KL divergences for each model pair. Interestingly, ResNet-18 exhibits the lowest KL divergence (0.0082) and VGG-16 the highest (0.0182). While all are relatively low, these results may reflect each model’s architecture. For instance, ResNet-18’s skip connections may aid in mitigating quantization error propagation, whereas VGG-16’s depth without such connections may exacerbate it.

Table 2. Baseline unperturbed top-1 and top-5 accuracies.

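| Metric | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet-18 FP32 | ResNet-18 INT8 | VGG-16 FP32 | VGG-16 INT8 |
|---|---|---|---|---|---|---|
| Top-1 Acc | 0.6268 | 0.6250 | 0.7204 | 0.7192 | 0.7843 | 0.7828 |
| Top-5 Acc | 0.8866 | 0.8866 | 0.9241 | 0.9256 | 0.9487 | 0.9475 |
| F1 Score | 0.6287 | 0.6268 | 0.7153 | 0.7145 | 0.7835 | 0.7828 |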
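To make the size of the quantization gap in Table 2 concrete, the sketch below computes a relative error for each FP32/INT8 pair. The definition used, |FP32 - INT8| / FP32, is an assumption, since the text does not define relative error explicitly; the numbers are copied from Table 2.

```python
# (FP32, INT8) values from Table 2.
table2 = {
    "SqueezeNet1_1": {"top1": (0.6268, 0.6250), "top5": (0.8866, 0.8866), "f1": (0.6287, 0.6268)},
    "ResNet-18": {"top1": (0.7204, 0.7192), "top5": (0.9241, 0.9256), "f1": (0.7153, 0.7145)},
    "VGG-16": {"top1": (0.7843, 0.7828), "top5": (0.9487, 0.9475), "f1": (0.7835, 0.7828)},
}


def relative_error(fp32, int8):
    """Absolute FP32/INT8 gap as a fraction of the FP32 value."""
    return abs(fp32 - int8) / fp32


errors = {
    (model, metric): relative_error(*pair)
    for model, metrics in table2.items()
    for metric, pair in metrics.items()
}
worst_case = max(errors, key=errors.get)
```

Under this definition every baseline gap is a fraction of one percent, with the largest belonging to the SqueezeNet1_1 F1 score.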

Table 3. Baseline unperturbed KL divergences.

| Model Pair | KL Divergence |
|---|---|
| SqueezeNet1_1 | 0.0138 |
| ResNet-18 | 0.0082 |
| VGG-16 | 0.0182 |

4.2 Perturbed results

We begin by looking at the impacts of AWGN, presented in Figure 3 and Table 4. Across all models and metrics, we see steep degradation beginning at σ = 0.2 and plateauing at σ = 0.8, indicating a point of potential ill-conditioning in this particular noise regime. As expected, top-5 accuracy is consistently higher than top-1 by a 20%-30% margin, with the gap between them widening as noise levels increase. This result suggests that while the exact classification becomes more challenging, the correct class will likely remain among the top 5 predictions. Moreover, in line with our baseline observations, VGG-16 generally outperforms ResNet-18 and SqueezeNet1_1, and ResNet-18 typically outperforms SqueezeNet1_1, with the gap narrowing at higher noise levels. F1 scores follow similar trends to accuracy metrics but show a steeper decline with respect to noise, suggesting that both precision and recall are possibly more severely impacted by noise than accuracy metrics. Most important to this study, however, is that in all cases we see no significant divergence in performance curves between each FP32/INT8 model pair, directly supporting the notion that quantized models, under AWGN, are just as performant according to accuracy and F1 metrics, irrespective of overall losses. Figure 5 and Table 6 present the KL divergences under AWGN. Here, we witness interesting non-linear behavior: under AWGN, each model’s INT8 representation exhibits peak divergence in its class output probabilities in the σ = 0.4-0.6 range. Also, in line with the trend in baseline KL divergences, VGG-16 demonstrates the highest divergence despite being less sensitive in its accuracy and F1 response curves. This suggests that while VGG-16 in its INT8 form is demonstrably robust according to accuracy and F1, its output distribution shifts from its FP32 form, potentially indicating decreased overall confidence or misplaced confidence. This result is particularly relevant in threshold-based systems where decisions are made based on some output probability threshold. The drop in KL divergence after σ = 0.6 is most likely due to the model pairs reaching a saturation point in error, where, in effect, each model pair is equally weak and unable to make any meaningful predictions. This is evidenced in the accuracy plots, where we can see sub-20% top-1 accuracy and sub-40% top-5 accuracy for each quantized and full-precision model.
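To illustrate why a shift in output distribution matters for threshold-based systems, the sketch below applies a hypothetical acceptance threshold of 0.6 to the illustrative probabilities from Section 2.4. The threshold value and the function name are assumptions, not part of the study.

```python
def accept(probs, threshold):
    """Return the predicted class index if its probability clears the threshold, else None."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


p_fp32 = [0.7, 0.2, 0.1]
p_int8 = [0.5, 0.4, 0.1]

fp32_decision = accept(p_fp32, threshold=0.6)  # class index 0 is accepted
int8_decision = accept(p_int8, threshold=0.6)  # prediction is withheld (None)
```

Both models rank the same class first, so top-1 accuracy is identical, yet only the FP32 model would act on the prediction at this threshold.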


Figure 3. Performance of each model pair under varying levels of AWGN.

Table 4. AWGN performance results.

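Top-1 Accuracy

| σ | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 0.1 | 0.611 | 0.608 | 0.712 | 0.712 | 0.778 | 0.779 |
| 0.2 | 0.521 | 0.526 | 0.627 | 0.619 | 0.702 | 0.703 |
| 0.3 | 0.372 | 0.379 | 0.469 | 0.466 | 0.541 | 0.532 |
| 0.4 | 0.234 | 0.243 | 0.297 | 0.300 | 0.341 | 0.335 |
| 0.5 | 0.139 | 0.146 | 0.178 | 0.176 | 0.181 | 0.173 |
| 0.6 | 0.094 | 0.095 | 0.111 | 0.104 | 0.098 | 0.099 |
| 0.8 | 0.074 | 0.075 | 0.066 | 0.065 | 0.048 | 0.047 |
| 1.0 | 0.075 | 0.072 | 0.055 | 0.059 | 0.040 | 0.041 |

Top-5 Accuracy

| σ | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 0.1 | 0.879 | 0.883 | 0.921 | 0.919 | 0.941 | 0.941 |
| 0.2 | 0.824 | 0.825 | 0.879 | 0.872 | 0.912 | 0.903 |
| 0.3 | 0.701 | 0.711 | 0.762 | 0.754 | 0.818 | 0.814 |
| 0.4 | 0.516 | 0.529 | 0.585 | 0.573 | 0.646 | 0.638 |
| 0.5 | 0.346 | 0.356 | 0.425 | 0.410 | 0.474 | 0.473 |
| 0.6 | 0.242 | 0.246 | 0.310 | 0.293 | 0.346 | 0.343 |
| 0.8 | 0.176 | 0.176 | 0.214 | 0.213 | 0.244 | 0.242 |
| 1.0 | 0.161 | 0.161 | 0.208 | 0.203 | 0.214 | 0.222 |

F1 Score

| σ | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 0.1 | 0.615 | 0.613 | 0.713 | 0.713 | 0.779 | 0.782 |
| 0.2 | 0.531 | 0.533 | 0.638 | 0.631 | 0.711 | 0.713 |
| 0.3 | 0.376 | 0.382 | 0.489 | 0.487 | 0.555 | 0.548 |
| 0.4 | 0.216 | 0.230 | 0.306 | 0.309 | 0.359 | 0.353 |
| 0.5 | 0.105 | 0.115 | 0.167 | 0.161 | 0.188 | 0.180 |
| 0.6 | 0.049 | 0.051 | 0.087 | 0.083 | 0.088 | 0.090 |
| 0.8 | 0.019 | 0.022 | 0.032 | 0.030 | 0.023 | 0.021 |
| 1.0 | 0.016 | 0.017 | 0.016 | 0.017 | 0.007 | 0.007 |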

Figure 4. Performance of each model pair under varying levels of Brownian noise.


Figure 5. KL divergences for each model pair under various levels of AWGN.

Table 5. Brownian Noise Performance Results.

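Top-1 Accuracy

| σ | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 10 | 0.606 | 0.607 | 0.692 | 0.680 | 0.779 | 0.776 |
| 20 | 0.547 | 0.556 | 0.608 | 0.604 | 0.772 | 0.756 |
| 30 | 0.460 | 0.455 | 0.524 | 0.533 | 0.741 | 0.731 |
| 40 | 0.387 | 0.375 | 0.446 | 0.438 | 0.707 | 0.692 |
| 50 | 0.317 | 0.307 | 0.374 | 0.382 | 0.653 | 0.652 |
| 60 | 0.260 | 0.258 | 0.315 | 0.318 | 0.603 | 0.600 |
| 70 | 0.202 | 0.218 | 0.262 | 0.268 | 0.547 | 0.550 |
| 80 | 0.176 | 0.182 | 0.227 | 0.229 | 0.484 | 0.476 |

Top-5 Accuracy

| σ | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 10 | 0.873 | 0.880 | 0.905 | 0.902 | 0.943 | 0.943 |
| 20 | 0.839 | 0.842 | 0.856 | 0.862 | 0.933 | 0.933 |
| 30 | 0.782 | 0.778 | 0.800 | 0.806 | 0.917 | 0.919 |
| 40 | 0.714 | 0.707 | 0.728 | 0.734 | 0.904 | 0.898 |
| 50 | 0.643 | 0.634 | 0.653 | 0.672 | 0.879 | 0.874 |
| 60 | 0.568 | 0.557 | 0.599 | 0.595 | 0.836 | 0.841 |
| 70 | 0.491 | 0.501 | 0.537 | 0.549 | 0.806 | 0.798 |
| 80 | 0.434 | 0.440 | 0.488 | 0.484 | 0.761 | 0.753 |

F1 Score

| σ | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 10 | 0.606 | 0.608 | 0.688 | 0.677 | 0.779 | 0.776 |
| 20 | 0.550 | 0.559 | 0.604 | 0.602 | 0.772 | 0.756 |
| 30 | 0.463 | 0.458 | 0.518 | 0.532 | 0.740 | 0.730 |
| 40 | 0.383 | 0.369 | 0.441 | 0.433 | 0.708 | 0.694 |
| 50 | 0.304 | 0.292 | 0.358 | 0.369 | 0.655 | 0.655 |
| 60 | 0.239 | 0.231 | 0.298 | 0.302 | 0.605 | 0.604 |
| 70 | 0.172 | 0.188 | 0.241 | 0.245 | 0.552 | 0.556 |
| 80 | 0.137 | 0.146 | 0.202 | 0.202 | 0.488 | 0.479 |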

Table 6. AWGN KL Divergences.

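| σ | SqueezeNet1_1 | ResNet18 | VGG16 |
|---|---|---|---|
| 0.1 | 0.041 | 0.032 | 0.050 |
| 0.2 | 0.104 | 0.084 | 0.142 |
| 0.3 | 0.167 | 0.144 | 0.255 |
| 0.4 | 0.193 | 0.203 | 0.338 |
| 0.5 | 0.172 | 0.214 | 0.296 |
| 0.6 | 0.128 | 0.205 | 0.222 |
| 0.8 | 0.064 | 0.153 | 0.108 |
| 1.0 | 0.037 | 0.104 | 0.055 |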

Next, we examine the impacts of varying levels of Brownian noise. Figure 4 and Table 5 present the F1 scores and top-1 and top-5 accuracies, where we can see very different response curves compared to the AWGN results. It is clear that, as expected, the effects of Brownian noise are much more gradual than those of AWGN, as the spatial correlation of Brownian noise yields a much smoother noise pattern. As with AWGN, VGG-16 outperforms all other models, maintaining top-1 accuracy above 50% and top-5 accuracy above 70% until σ = 50. Interestingly, Brownian noise seems to induce a relatively large error in the quantized model relative to the FP32 model compared to AWGN, especially in VGG-16 in the σ = 10-50 range. In terms of F1, we also observe more significant discrepancies between the quantized and full-precision models; for instance, VGG-16 exhibits roughly 2% error in F1 at σ = 20. Also, as expected, the top-5 accuracy is consistently higher than the top-1 accuracy in all three models. KL divergence, as shown in Figure 6 and Table 7, is much higher than under AWGN and does not have the same peaking behavior shown in Figure 5; instead, we see a steady increase with noise across all three models. Like AWGN, VGG-16 exhibits the highest divergence, again suggesting a comparatively higher discrepancy in the model’s confidence among output classes when making predictions. ResNet-18, in contrast, yields the lowest divergence. Overall, we can see that the impacts of Brownian noise are much more gradual; however, they induce significantly higher KL divergences when compared to AWGN.


Figure 6. KL divergences for each model pair under various levels of Brownian noise.

Table 7. Brownian noise KL divergences.

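| σ | SqueezeNet1_1 | ResNet18 | VGG16 |
|---|---|---|---|
| 10 | 0.247 | 0.197 | 0.105 |
| 20 | 0.622 | 0.413 | 0.273 |
| 30 | 1.089 | 0.619 | 0.470 |
| 40 | 1.498 | 0.747 | 0.713 |
| 50 | 1.843 | 0.902 | 1.035 |
| 60 | 2.111 | 1.007 | 1.273 |
| 70 | 2.282 | 1.142 | 1.565 |
| 80 | 2.327 | 1.222 | 1.663 |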

We now look at the results under vertical occlusions, given in Figure 7 and Table 8. Here, we see performance in accuracy and F1 dropping off gradually, similar to Brownian noise, which is an expected result as occlusions are another form of highly correlated noise. However, while the decrease is gradual, the immediate drop-off at just 1 occlusion is significant, and the inclusion of 2 vertical occlusions yielded sub-50% top-1 accuracy across all models. With respect to quantization, we see the same general trend as in the other perturbation regimes, in that there is minimal divergence between each FP32/INT8 model pair, again suggesting that performance degradation is a macroscopic phenomenon: a function of model architecture and overall size, not parameter-wise information capacity. Indeed, each model exhibits relatively low KL divergence, with the maximum being 0.0636 for VGG-16 at three vertical occlusions, as shown in Figure 9 and Table 10. Moreover, the KL divergence is relatively constant, with the most significant delta being just 0.038 for VGG-16 between 1 and 3 occlusions. However, it is notable that ResNet-18 exhibits the lowest KL divergence, which, paired with its performance in top-1, top-5, and F1, indicates it is comparatively robust against highly structured perturbation.


Figure 7. Performance of each model pair under various amounts of vertical occlusion.

Table 8. Vertical occlusions performance results.

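Top-1 Accuracy

| Num. | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 1 | 0.496 | 0.500 | 0.617 | 0.622 | 0.649 | 0.644 |
| 2 | 0.370 | 0.373 | 0.481 | 0.475 | 0.489 | 0.483 |
| 3 | 0.227 | 0.234 | 0.389 | 0.376 | 0.323 | 0.321 |
| 4 | 0.199 | 0.198 | 0.299 | 0.290 | 0.227 | 0.224 |
| 5 | 0.112 | 0.113 | 0.216 | 0.212 | 0.167 | 0.164 |
| 6 | 0.065 | 0.063 | 0.176 | 0.173 | 0.132 | 0.126 |

Top-5 Accuracy

| Num. | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 1 | 0.833 | 0.834 | 0.885 | 0.882 | 0.893 | 0.892 |
| 2 | 0.727 | 0.732 | 0.783 | 0.781 | 0.779 | 0.780 |
| 3 | 0.552 | 0.548 | 0.702 | 0.697 | 0.637 | 0.636 |
| 4 | 0.475 | 0.474 | 0.602 | 0.589 | 0.530 | 0.523 |
| 5 | 0.324 | 0.325 | 0.487 | 0.479 | 0.413 | 0.411 |
| 6 | 0.195 | 0.195 | 0.418 | 0.411 | 0.344 | 0.333 |

F1 Score

| Num. | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 1 | 0.498 | 0.501 | 0.606 | 0.613 | 0.644 | 0.639 |
| 2 | 0.369 | 0.370 | 0.473 | 0.466 | 0.481 | 0.474 |
| 3 | 0.213 | 0.218 | 0.384 | 0.368 | 0.312 | 0.308 |
| 4 | 0.189 | 0.188 | 0.281 | 0.271 | 0.200 | 0.195 |
| 5 | 0.094 | 0.096 | 0.192 | 0.187 | 0.124 | 0.120 |
| 6 | 0.038 | 0.037 | 0.162 | 0.158 | 0.091 | 0.087 |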
11a13f80-8f50-408d-a2c1-88656bed9987_figure8.gif

Figure 8. Performance of each model pair under various amounts of horizontal occlusion.

Figure 9. KL divergences for each model pair under various amounts of vertical occlusion.

Table 9. Horizontal occlusions performance results.

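Top-1 Accuracy

| Num. | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 1 | 0.556 | 0.561 | 0.656 | 0.657 | 0.718 | 0.713 |
| 2 | 0.490 | 0.485 | 0.617 | 0.616 | 0.656 | 0.647 |
| 3 | 0.449 | 0.451 | 0.549 | 0.551 | 0.590 | 0.581 |
| 4 | 0.356 | 0.348 | 0.464 | 0.461 | 0.458 | 0.455 |
| 5 | 0.323 | 0.322 | 0.409 | 0.411 | 0.395 | 0.389 |
| 6 | 0.275 | 0.269 | 0.338 | 0.341 | 0.313 | 0.308 |

Top-5 Accuracy

| Num. | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 1 | 0.851 | 0.849 | 0.894 | 0.894 | 0.921 | 0.920 |
| 2 | 0.817 | 0.816 | 0.881 | 0.881 | 0.899 | 0.896 |
| 3 | 0.786 | 0.785 | 0.830 | 0.831 | 0.858 | 0.856 |
| 4 | 0.708 | 0.710 | 0.775 | 0.780 | 0.770 | 0.767 |
| 5 | 0.683 | 0.681 | 0.716 | 0.719 | 0.719 | 0.710 |
| 6 | 0.624 | 0.619 | 0.645 | 0.647 | 0.639 | 0.634 |

F1 Score

| Num. | SqueezeNet1_1 FP32 | SqueezeNet1_1 INT8 | ResNet18 FP32 | ResNet18 INT8 | VGG16 FP32 | VGG16 INT8 |
|---|---|---|---|---|---|---|
| 1 | 0.562 | 0.567 | 0.652 | 0.653 | 0.717 | 0.714 |
| 2 | 0.506 | 0.501 | 0.614 | 0.614 | 0.662 | 0.655 |
| 3 | 0.462 | 0.466 | 0.549 | 0.552 | 0.595 | 0.587 |
| 4 | 0.374 | 0.367 | 0.470 | 0.465 | 0.458 | 0.456 |
| 5 | 0.346 | 0.345 | 0.413 | 0.413 | 0.397 | 0.391 |
| 6 | 0.284 | 0.278 | 0.343 | 0.347 | 0.308 | 0.300 |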

Table 10. Vertical occlusions KL divergences.

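| Num. | SqueezeNet1_1 | ResNet18 | VGG16 |
|---|---|---|---|
| 1 | 0.015 | 0.009 | 0.026 |
| 2 | 0.028 | 0.014 | 0.060 |
| 3 | 0.031 | 0.014 | 0.064 |
| 4 | 0.026 | 0.013 | 0.058 |
| 5 | 0.023 | 0.011 | 0.048 |
| 6 | 0.022 | 0.012 | 0.041 |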

Lastly, we look at performance under horizontal occlusions, with results shown in Figure 8 and Table 9, while Figure 10 and Table 11 show the KL divergence. Similar to the results under vertical occlusion, we see an immediate drop-off at just one occlusion, again showing that, generally, models are much more sensitive to highly structured occlusions than to AWGN and Brownian noise. Also of note is that VGG-16 performs the best of the three models in accuracy and F1 for up to three occlusions, holding its lead longer than under vertical occlusion. F1 behaves similarly to the vertical occlusion results, with a gradual decay congruent with the top-1 accuracy curve. Another interesting note is that we generally observe less degradation than with the same number of vertical streaks. This may be because the distinguishing features learned during training are dominated by horizontal edges such as the plane’s fuselage, so vertical streaks, which introduce points of high contrast that are guaranteed to intersect the plane’s structure, are much more disruptive. Further, we see a wider gap between top-1 and top-5 accuracies with horizontal occlusions than with vertical ones, suggesting that models are generally more likely to have the correct class in their top five predictions but struggle with exact classification. This result is likely exacerbated by the dataset’s nature, in that such occlusion severely impairs the model’s ability to distinguish between subtleties in the different classes. With respect to quantization, we see the same behavior as for all other studied perturbations: there is minimal divergence between the INT8 and FP32 models across all metrics. Unique to horizontal occlusions is that the KL divergence for all model pairs is near zero and shows no clear trend, indicating that the output distributions are similar under such perturbation and lack any notable discrepancy.
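
The gap between top-1 and top-5 accuracy described above can be made concrete with a short sketch. The scores and labels below are made-up toy values, not data from the study, and the function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy class scores for four samples over six classes (illustrative only).
scores = [
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # correct class ranked first
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # correct class ranked second
    [0.05, 0.10, 0.15, 0.20, 0.20, 0.30],  # correct class ranked last
    [0.10, 0.10, 0.60, 0.10, 0.05, 0.05],  # correct class ranked first
]
labels = [0, 0, 0, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
gap = top5 - top1  # a large gap means near misses rather than outright failures
```

A sample such as the second one, where the correct class is ranked just below the winner, counts toward top-5 but not top-1, which is the pattern the wider gap points to.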

Table 11. Horizontal occlusions KL divergences.

| Num. | SqueezeNet1_1 | ResNet18 | VGG16 |
|---|---|---|---|
| 1 | 0.014 | 0.009 | 0.021 |
| 2 | 0.019 | 0.011 | 0.029 |
| 3 | 0.017 | 0.010 | 0.028 |
| 4 | 0.019 | 0.011 | 0.032 |
| 5 | 0.016 | 0.011 | 0.027 |
| 6 | 0.016 | 0.012 | 0.029 |

Figure 10. KL divergences for each model pair under various amounts of horizontal occlusion.

5. Conclusion

We have compared the robustness of FP32 and post-training INT8 quantized CNNs under input perturbations, ranging from independent and identically distributed AWGN to spatially correlated Brownian noise to vertical and horizontal occlusions. We considered three state-of-the-art models spanning a range of parameter counts, SqueezeNet1_1, ResNet-18, and VGG-16, and measured raw performance according to F1 score and top-1 and top-5 accuracy. To measure changes in class output confidence pre- and post-quantization, we employed KL divergence. We found that while degradation was observed in all models, substantial degradation in the quantized model relative to the full-precision model was not. Indeed, the largest top-1 and top-5 errors between the models occur in VGG-16 under Brownian noise at σ=20 and in ResNet-18 under AWGN at σ=0.6, respectively. We observe the most significant discrepancy in class output probabilities under the Brownian noise regime, followed by AWGN. It is also worth noting that the methodology presented here, examining KL divergence under perturbation, can be applied to evaluate the robustness of other model compression techniques, such as approximate multipliers.36,15,31
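
The KL divergence comparison summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the divergence is taken as KL(FP32 || INT8) over softmax outputs, measured in nats and averaged over samples. The function names, the toy logits, and the epsilon guard are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_pairwise_kl(fp32_logits, int8_logits):
    """Average KL(FP32 || INT8) over paired per-sample outputs (assumed direction)."""
    divs = [
        kl_divergence(softmax(a), softmax(b))
        for a, b in zip(fp32_logits, int8_logits)
    ]
    return sum(divs) / len(divs)


# Toy logits for two samples from a hypothetical FP32/INT8 model pair.
fp32_out = [[2.0, 1.0, 0.1], [0.3, 2.2, 0.5]]
int8_out = [[1.9, 1.1, 0.1], [0.4, 2.0, 0.6]]
divergence = mean_pairwise_kl(fp32_out, int8_out)
```

The same function can be pointed at any pair of compressed and reference models, which is how the methodology would carry over to other compression techniques.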

As stated in the introduction, this work sought to address deficiencies in the current body of research surrounding neural network quantization. The goal is to de-risk the deployment of neural networks in real-world scenarios and provide experimental data to understand their performance under various perturbation regimes. Based on these experimental findings, we conclude that INT8 quantized networks do not exhibit ill-conditioning or exacerbated sensitivity under perturbation relative to their FP32 counterparts and, thus, are robust in these noise regimes.
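
The size of the FP32/INT8 gap behind this conclusion can be illustrated with a short calculation. The sketch assumes relative error is defined as |FP32 - INT8| / FP32, which may differ from the exact definition used in the study, and the function name `relative_error` is hypothetical. The accuracy values are the top-1 results at six vertical occlusions from Table 8.

```python
def relative_error(fp32, int8):
    """Relative error of the INT8 metric with respect to FP32 (assumed definition)."""
    return abs(fp32 - int8) / fp32


# Top-1 accuracy as (FP32, INT8) at six vertical occlusions, taken from Table 8.
top1_six_vertical = {
    "SqueezeNet1_1": (0.065, 0.063),
    "ResNet18": (0.176, 0.173),
    "VGG16": (0.132, 0.126),
}

errors = {
    name: relative_error(fp32, int8)
    for name, (fp32, int8) in top1_six_vertical.items()
}
for name, err in errors.items():
    print(f"{name}: {err:.1%}")
```

Even at the heaviest vertical occlusion, where absolute accuracy is lowest, each pair stays within a few percent under this definition.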

We have also identified a balance between traditional metrics, like top-1 and top-5 accuracy, and model similarity. For example, in the AWGN regime, we see that VGG-16 outperforms ResNet-18 but also has a higher KL divergence, consistent with lower confidence. Models like VGG-16 also carry significantly higher energy consumption and increased complexity as measured by MACs and GFLOPs (Table 1). These results can guide designers in identifying candidate models based on expected deployment scenarios; for instance, if thermal noise is expected, one may consider the AWGN results, and if similarity in probabilities is more important than top-1 accuracy, ResNet-18, with 8.5x fewer GFLOPs than VGG-16, may be preferred. That is, we have highlighted a crucial trade-off between raw accuracy and KL divergence. We sought not to prove that, for example, VGG-16 is the best performer under perturbation, but rather that there are demonstrable trade-offs in accuracy and model similarity when considering quantization under perturbative threats.

Code availability

Source code available from: https://github.com/jacklangille/CNN-Quantization-Experiment-Code/tree/main

Archived source code at time of publication: https://doi.org/10.5281/zenodo.15097737

License: MIT license

how to cite this article
Langille J, Hammad I and Kember G. Quantized Convolutional Neural Networks Robustness under Perturbation [version 1; peer review: 2 approved]. F1000Research 2025, 14:419 (https://doi.org/10.12688/f1000research.163144.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 29 Apr 2025
Venkata Mohit Tamanampudi, JP Morgan Chase, New York City, USA 
Approved
This is a well-executed and clearly written study that fills an important gap in understanding the real-world reliability of quantized CNNs under perturbations. The experiments are thorough, and the conclusions are sound.

If the authors want to ...
HOW TO CITE THIS REPORT
Tamanampudi VM. Reviewer Report For: Quantized Convolutional Neural Networks Robustness under Perturbation [version 1; peer review: 2 approved]. F1000Research 2025, 14:419 (https://doi.org/10.5256/f1000research.179444.r377971)
Reviewer Report 24 Apr 2025
Joseph Chukwunweike, Gist Limited, Bristol, UK 
Approved
The article is technically sound; however, there are some corrections that need to be made.

1. References included in the conclusion section should not be included (Removed)
2. The referencing of the Programme used to run ...
HOW TO CITE THIS REPORT
Chukwunweike J. Reviewer Report For: Quantized Convolutional Neural Networks Robustness under Perturbation [version 1; peer review: 2 approved]. F1000Research 2025, 14:419 (https://doi.org/10.5256/f1000research.179444.r380048)