Research Article

Convolutional neural networks for real-time wood plank detection and defect segmentation

[version 1; peer review: 1 approved with reservations]
PUBLISHED 23 Mar 2023

Abstract

Background: Defect detection and segmentation on product surfaces has become one of the most important steps in industrial quality control, and many sophisticated hardware and software tools are used for this purpose. Real-time classification and detection of defects is now a crucial requirement, yet most algorithms and deep neural network architectures require expensive hardware to perform inference in real time. This necessitates the design of architectures that are lightweight and suitable for deployment in industrial environments.
Methods: In this study, we introduce a novel method for detecting wood planks on a fast-moving conveyor and segmenting their surface defects in real time using a convolutional neural network (CNN). A backbone network is trained on a large-scale image dataset, and a dataset of 5000 images is created with proper annotation of wood planks and defects. In addition, a data augmentation technique is employed to enhance the accuracy of the model. Furthermore, we examine both statistical and deep learning-based approaches for identifying and separating defects.
Results: Our plank detection method achieved a mean average precision of 97%, and defect segmentation reached a global pixel accuracy of 96%. This performance is achieved in real time: the system runs at 30 frames per second (FPS) without sacrificing accuracy.
Conclusions: The results of our study demonstrate the potential of our method not only in industrial wood processing applications but also in other industries where materials undergo similar processes of defect detection and segmentation. By utilizing our method, these industries can expect to see improved efficiency, accuracy, and overall productivity.

Keywords

Artificial Intelligence, Convolutional Neural Networks, Data Augmentation, Defect Detection, Deep Learning, Neural Networks.

Introduction

Defect detection and segmentation are crucial steps for quality control in automated production lines in various industries such as wood, textiles, and medicine.13 These processes involve the use of machine-learning algorithms to identify and classify defects, which can be surface imperfections, structural issues, or other abnormalities that affect product quality. Although it is easy for humans to detect and recognize defects on product surfaces, machines are not always accurate at performing this task. Therefore, industries require automatic defect detection systems in their quality control processes, ensuring that only high-quality products are released for sale. These systems take input images and produce segmentation of areas containing defects.4 The location of the defect is crucial in these applications, and real-time processing is important in industrial environments.

There have been significant developments in deep learning-based defect detection in recent years, with attempts to create generic datasets5 that can be used for all types of defect detection systems. However, every application requires datasets from a specific industry domain, such as wood or metal.1,6 If a model is trained on a specific defect dataset, it may not produce equally accurate results on other defect datasets, as the product surface may have a different background color or different types of defects.7 Methods8 have been developed that are trained on different datasets and use knowledge transfer to perform defect detection on other datasets; however, generic defect detection does not seem to work well for all types of defects.

Our paper introduces a new approach for the automated detection and segmentation of defects on rapidly moving wood plank surfaces. Our method first detects the wood plank itself, and the extracted plank image is then passed to another module, where defect segmentation is performed in real time. The input frame is divided into two parts, each with a region of interest (ROI). When a plank enters the first ROI and its front side is captured, it is assigned an identification number (ID). The machine then flips the plank, and it enters the second ROI, where the other side is captured under the same ID. Defects are detected and segmented on both sides, and the final classification of the plank is determined based on the severity of the detected defects. We classify wood planks into six categories based on their quality, with 1 indicating the highest quality and 6 the lowest. Table 3 shows the defect types and the corresponding severity of the defects. Our approach uses both sides of the wood plank to achieve better results than previous research, which typically used only one side of the wood surface or employed multiple cameras, resulting in increased hardware costs.9,10 Figure 1 provides a visual representation of the industrial environment in which a wood plank moves on a conveyor, making it easy to understand how our approach is implemented.


Figure 1. Industrial scenario: A wood plank is being conveyed on a conveyor belt.
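Below is a simplified, Python-style sketch of the two-ROI workflow just described. The ROI coordinates, the detector, segmenter and tracker interfaces, and the severity table are hypothetical placeholders introduced only for illustration; this is not the authors' implementation.

```python
# Hypothetical sketch of the two-ROI workflow; all interfaces are placeholders.
def process_frame(frame, detector, segmenter, tracker, rois, severity_of):
    grades = {}
    for side, roi in enumerate(rois):                  # ROI 0: front, ROI 1: back
        crop = frame[roi.top:roi.bottom, roi.left:roi.right]
        for box in detector(crop):                     # planks found in this ROI
            plank_id = tracker.assign_id(box, side)    # same ID for both sides
            plank = crop[box.top:box.bottom, box.left:box.right]
            defects = segmenter(plank)                 # per-pixel defect classes
            worst = max((severity_of[d] for d in defects.classes), default=0)
            # final grade = highest severity seen on either side of the plank
            grades[plank_id] = max(grades.get(plank_id, 0), worst)
    return grades                                      # plank ID -> class 1 (best) ... 6 (worst)
```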

Our contribution in this paper is threefold:

  • We propose a novel CNN-based approach that can accurately detect wood planks in real-time and perform segmentation of any surface defects present on the planks.

  • We created a dataset of 5000 labeled images for wood defect segmentation.

  • We trained our novel model, deployed it on an edge device, and optimized it for use in industrial settings.

In the section “Literature review”, we review the most recent methods for defect detection and segmentation, with a focus on the wood industry. In the section “Methods”, we describe our novel method, and in the section “Dataset”, we explain the details of our dataset. The training of our model is described in the section “Training”, and we evaluate our method in the section “Evaluation”. Finally, we conclude the paper and sketch future research ideas in the section “Conclusion and future work”.

Literature review

Defect detection and segmentation

The objective of defect detection and segmentation is to automatically identify defect patterns on product surfaces during quality control. This is an important task in industrial quality control and manufacturing, and numerous methods have been developed to address it. These include k-means clustering,11 active contours,12 region growing,13 and graph cuts,14 as well as deep learning techniques such as CNNs,15 encoder-decoder models,16 R-CNNs,17 recurrent neural network models,18 and generative adversarial network models.19 However, most of these approaches operate on a single image and are executed offline, and they are not efficient enough for real-time industrial applications, where efficiency is a key consideration.

Traditional methods

Several techniques based on statistical methods have been suggested for identifying defects on various materials and surfaces. For example, the method described in20 employed fuzzy connected components to identify defects on strip steel surfaces by calculating the maximum and sum of fuzzy connected areas, resulting in a detection rate of 96.8%. In,10 an artificial vision-based system for evaluating the quality of slate slabs is presented, using 3D and 2D color data that are processed and analyzed to detect six specific traits. An unsupervised approach to identifying defects in images was proposed in.21 The method focuses on surface texture and utilizes low-rank representation combined with a texture prior. However, its effectiveness is partially contingent on the quality of the prior map, and it assumes that the defects are in the foreground, implying that if the background is more prominent than the defect, the method may not be able to detect it. The method in22 employs random decision forests for defect detection, combining feature extraction and classification techniques to detect defects in fabrics. The advantage of random decision forests is their ability to handle both continuous and discrete variables, resist overfitting as a classifier, and perform efficiently on large datasets. In,23 traditional classification methods such as local binary patterns and gradient local binary patterns were used to classify defects on the surface of birch veneer. However, the proposed method classifies only two types of defects, cracks and mineral lines, and fails to classify other forms of defects, which is a key limitation. To determine the location of defects in an image, the authors of24 proposed the use of gradient local binary patterns. This method leverages the non-continuity of pixels within a local area, reducing the potential area in which a defect can exist, improving accuracy, and saving time for further defect detection. The detection of defects on complex patterned surfaces, such as fabrics, has also been explored using traditional statistical methods.25

Deep learning based methods

In addition to traditional methods, deep learning-based techniques have been used to address the problem of defect detection. Many deep learning models were originally trained to detect a variety of objects, such as people, animals, and cars, in real-world scenes.26 These models are typically trained on large-scale image datasets, such as Microsoft Common Objects in Context (MS-COCO)27 and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).28 For defect detection and other industrial quality assurance tasks, these pre-trained models can be re-trained using transfer learning to adapt them to the target task of defect detection and classification.26 Many recent methods for defect detection and classification in industrial settings use transfer learning and train CNN-based deep learning models, such as ResNet,29 RetinaNet,30 AlexNet,31 DenseNet,32 VGG1633 and GoogleNet,34 to detect, classify, and segment defects. The results show that CNN-based deep learning methods significantly improve the prediction accuracy of defect detection compared with traditional methods.

A wood knot detection and classification method based on a residual network, called TL-ResNet34,1 was proposed. The authors report that TL-ResNet34 is far more accurate than other methods for detection and classification. A weakly supervised CNN-based method for detection and classification was proposed in.35 The model was trained using a small number of labeled images. One limitation of this method is underfitting, in which the model fails to detect and classify different types of defects. An automatic visual inspection system9 has been proposed that detects and classifies defects on wood surfaces; its main contribution is the speed optimization of the defect identification task. The results showed that data augmentation and transfer learning techniques can be combined to achieve good results, with a pre-trained ResNet152 neural network model achieving an average accuracy of 80.6%.

A deep regression and classification-based framework for defect detection was developed in,36 which has four modules: detection, false positive reduction, connected component analysis, and classification. The proposed method has good accuracy, but it is too computationally intensive even for small input images and is thus limited to offline use. Another method37 combines neural architecture search with one of the best-known instance segmentation methods, Mask R-CNN,38 for the detection and segmentation of defects on the surfaces of wood veneers. For detecting defects on wood surfaces, this method is more accurate and faster than other available techniques; however, the segmentation task requires a significant amount of time, making it unsuitable for real-time industrial inspection systems. Most of the proposed methods focus only on accuracy, and very few have improved efficiency. An improved single shot detector39 based method was proposed to improve detection efficiency, but the trained model detected only a few types of defects and was therefore limited. A mixed-FCN method was proposed in,15 an improved fully convolutional network (FCN) for the detection and recognition of wood defects that outperforms existing methods while requiring little or no image preprocessing for feature extraction; however, the model was trained to identify only six types of wood defects. An FCN and region-based convolutional neural network (R-CNN) method40 was proposed to detect and segment building cracks. Like other detection and segmentation methods, it is limited by its low performance in real-time applications. An improved CNN-based method for weld classification was proposed in.41 This method uses image convolution to enhance edge features and combines them with integral images to create a more accurate segmentation. The algorithm can extract the weld edge and divide the region quickly and accurately while keeping the processing time within real-time requirements.

In,42 a CNN-based method named U-Net was modified by replacing its softmax layer with a random forest to detect small surface defects with high accuracy. This method is slow and limited to an offline setup. A CNN-based method was proposed in43 to segment defects on standing trees using LIDAR (light detection and ranging) data. The input to this method is point cloud data, from which a mesh is reconstructed; the reconstructed mesh is then used to create a relief map, which is taken as input to the U-Net for segmentation. This method is computationally expensive and suitable only for offline applications. To reduce the computation time of CNNs, a method was proposed in44 that uses a non-subsampled shearlet transform (NSST) to preprocess images before passing them to the CNN for detection and classification. This method has the advantage of faster training; however, its inference speed is not suitable for industrial applications.

In addition to CNN-based deep learning methods, auto-encoders and generative adversarial networks have also been used for defect detection. A dual auto-encoder generative adversarial network (GAN) method45 was proposed for defect detection in different types of products. The GAN has the benefit of generating a large amount of data that can be used for training, which makes the model more accurate at predicting defects in unseen data.

In the literature, most researchers have used classical image processing methods or deep learning methods to extract features to detect and classify defect locations. Most of these methods are offline and unsuitable for real-time industrial use. These methods have many limitations, such as inference speed and detection accuracy. In all these methods, the focus is only on detecting and classifying defects on the plank/wood surface, and none have focused on detecting the wood plank itself. Therefore, we propose a novel CNN method that can detect and classify wood planks and then detect, classify, and segment defects on wood plank surfaces, which can be deployed in industrial environments and outperforms all these methods.

Methods

Automatically detecting wood planks and separating them from the background on a fast-moving conveyor in real time is a challenging task, and segmenting defects on the surface of wood planks in real time adds further complexity. We propose a novel method consisting of a backbone network that performs feature extraction, a detection algorithm for wood plank detection, and a segmentation module that performs defect segmentation in real time. Finally, each plank is classified into one of six categories, from level 1 to 6, depending on the severity of the defects on its surface. Figure 2 shows the architecture of the proposed method. Details of these networks and modules are described in the following sub-sections.


Figure 2. The overall system architecture.

Backbone network

High prediction performance in CNN training requires a substantial amount of annotated data, but acquiring such a large quantity of data can be challenging and expensive, especially for image labeling.46 To address this issue, transfer learning is often used when only a limited amount of labeled data is available, and it has proven to be an effective solution. In this approach, the backbone network is first trained on a large dataset.

The backbone network, also known as the baseline network, is responsible for extracting features from input images. There are several state-of-the-art deep CNNs, such as VGG,33 GoogleNet,34 AlexNet,31 and ResNet,29 that can be used as feature extractors or backbone networks for object detection. These networks are known for their accuracy, but they may not be as efficient in terms of inference speed.

We use MobileNetV347 as the backbone network for feature extraction. The network employs depthwise separable convolutions instead of traditional convolutions to reduce computational complexity. A standard convolution mixes spatial and channel information in a single operation and requires a large number of multiplications. By contrast, a depthwise separable convolution splits the standard convolution into two separate operations: a depthwise convolution that applies a filter to each input channel and a 1x1 pointwise convolution that combines the outputs of the depthwise step. This approach results in a smaller model size, fewer parameters, and faster computation.
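The following minimal PyTorch sketch contrasts a standard convolution with a depthwise separable one; the channel counts and input size are illustrative only.

```python
import torch
import torch.nn as nn

in_ch, out_ch = 40, 112  # illustrative channel counts

# Standard 3x3 convolution: mixes space and channels in one operation.
standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

# Depthwise separable convolution: per-channel 3x3 depthwise convolution
# followed by a 1x1 pointwise convolution that mixes the channels.
separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depthwise
    nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # pointwise
)

params = lambda m: sum(p.numel() for p in m.parameters())
x = torch.randn(1, in_ch, 64, 64)
assert standard(x).shape == separable(x).shape
print(params(standard), "vs", params(separable))  # ~40k vs ~5k parameters
```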

To improve the classification accuracy, we pre-trained MobileNetV3 on the large-scale MS-COCO image dataset.27 The training and validation subsets of the COCO 2019 object detection dataset, comprising over 200,000 images, were downloaded; the dataset is freely available on the COCO website. The last three layers, together with the fully connected layers, are then fine-tuned on the wood plank and defect dataset. Figure 3 shows the MobileNetV3 block, which contains an input, an expansion convolution, depthwise convolutions, a projection layer, and an output layer. A residual connection is established if the input and output have the same number of channels.


Figure 3. MobileNetV3 building block (image from47).

Wood plank detection

We use the Faster R-CNN48 network with a MobileNetV3 backbone to detect wood planks. The detector extracts features from different layers of the pre-trained backbone network and sends them to a region proposal network (RPN) and a region of interest (ROI) pooling module. The RPN is responsible for identifying the locations of potential wood planks in the input image, whereas the ROI pooling module extracts fixed-size window features and passes them to the final two fully connected layers for class and bounding box predictions. The network takes an input image of size H×W with three color channels. The first layer is a 3x3 convolutional layer with a stride of two, which reduces the spatial dimensions of the image by half and outputs 40 feature maps. Hardswish49 (Equation 1) is used as the nonlinear activation function. The subsequent layers are a series of mobile inverted residual bottleneck (MBConv) blocks. Each block consists of a depthwise convolutional layer followed by a pointwise convolutional layer, with skip connections between the input and output of the block. The MBConv2 block has a stride of two, while the rest have a stride of one. The dilation for all convolutional layers is one, meaning that no dilation is applied. After the MBConv blocks, a feature pyramid network (FPN) is used to generate feature maps of different spatial resolutions. The FPN addresses the problem of scale variance in plank detection, where planks of different sizes may require different spatial resolutions for detection. The FPN in this network produces 256 feature maps with a spatial resolution of H/16×W/16. The RPN generates object proposals at different locations and scales based on the feature maps from the FPN. The RoIAlign layer then extracts features from each proposal and feeds them into two parallel fully connected layers: a classification layer (Cls) that outputs a probability distribution over the classes (including background) for each proposal, and a bounding box regression layer (BBx) that predicts the offsets to the default bounding boxes for each proposal.

The final output of the network is a set of class probabilities and bounding box offsets for each anchor box, which is used to generate the final plank detections. The configuration details of the detection module are shown in Table 1.

Table 1. Details of the layers in the detection module with a MobileNetV3 backbone.

Layer | Output size | Kernel/stride/padding | Dilation | Activation
Input | H×W×3 | - | - | -
Convolution 1 | H/2×W/2×40 | 3×3/2 | 1 | Hardswish
Bottleneck 1 | H/2×W/2×24 | 3×3/1 | 1 | Hardswish
Mobile inverted residual bottleneck convolution 2 | H/4×W/4×40 | 3×3/2 | 1 | Hardswish
Mobile inverted residual bottleneck convolution 3 | H/4×W/4×40 | 3×3/1 | 1 | Hardswish
Mobile inverted residual bottleneck convolution 4 | H/4×W/4×40 | 3×3/1 | 1 | Hardswish
Mobile inverted residual bottleneck convolution 5 | H/8×W/8×112 | 3×3/2 | 1 | Hardswish
Mobile inverted residual bottleneck convolution 6 | H/8×W/8×112 | 3×3/1 | 1 | Hardswish
Mobile inverted residual bottleneck convolution 7 | H/8×W/8×160 | 3×3/1 | 1 | Hardswish
Feature pyramid network | H/16×W/16×256 | - | - | -
Region proposal network | - | - | - | -
Region of interest align | - | - | - | -
Classification layer | N_anchors×(C+1) | - | - | Softmax
Bounding box regressor | N_anchors×4 | - | - | Linear
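For illustration, a close off-the-shelf analogue of this detector is available in torchvision, which ships a Faster R-CNN with a MobileNetV3-Large + FPN backbone pre-trained on MS-COCO. The sketch below (assuming torchvision ≥ 0.13) re-heads it for two classes, background and plank; it approximates, but is not identical to, the configuration in Table 1.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# COCO-pretrained Faster R-CNN with a MobileNetV3-Large + FPN backbone.
model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights="DEFAULT")

# Replace the box predictor head for the plank task: background + plank.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

model.eval()
with torch.no_grad():
    # Detection models take a list of 3xHxW tensors and return, per image,
    # a dict with 'boxes', 'labels' and 'scores'.
    detections = model([torch.rand(3, 480, 640)])
```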

Defect segmentation

In addition to wood plank detection, defect segmentation involves identifying and separating each pixel of the wood plank that belongs to a defect. Each defect type is labeled with a specific color. To extract features at multiple scales and to reduce the computational complexity of the model at the inference stage, we use atrous convolution operations and the atrous spatial pyramid pooling (ASPP) module in the segmentation part. Atrous convolution is a type of convolution in which the filter kernel is dilated with zeros before being applied to the input signal. This gives the operation a larger receptive field without increasing the number of parameters. The ASPP module is a multi-scale pooling operation that utilizes atrous convolutions at different dilation rates to capture information at multiple scales; it is used in semantic segmentation to improve the network's ability to capture objects of varying sizes. The ASPP module consists of parallel atrous convolution branches with different dilation rates, followed by global average pooling, which aggregates information across all spatial positions. The output of each branch is then concatenated to form the final feature representation.
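A minimal PyTorch sketch of such an ASPP block is shown below. The dilation rates follow Table 2 (1, 2, 3, 6), but the channel widths and the exact branch composition are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel atrous branches at several dilation rates plus a global
    average pooling branch, concatenated and projected back down."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch,
                      kernel_size=1 if r == 1 else 3,
                      padding=0 if r == 1 else r,
                      dilation=r, bias=False)
            for r in rates
        ])
        # Image-level branch: global average pooling + 1x1 convolution.
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

aspp = ASPP(in_ch=160, out_ch=256)
out = aspp(torch.randn(1, 160, 32, 32))   # same spatial size, 256 channels
```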

The first layer of the network is a 3x3 convolutional layer with a stride of two, which reduces the size of the image and increases the number of channels; this layer helps to extract features from the input image. The MobileNetV3 blocks are then used to extract further features. These blocks use depthwise convolutions to reduce the number of parameters in the network while maintaining good performance. Hardswish49 (Equation 1) is used as the activation function. By using MobileNetV3 blocks, the network is able to learn features that are specifically relevant for identifying defects in images. After the backbone, the ASPP module is used to capture information at multiple scales. This module applies atrous convolutions with different dilation rates to the feature map in order to capture information at different scales, which helps the network identify defects of different sizes and shapes. The decoder module is then used to upsample the feature map back to the original resolution of the input image, creating a segmentation map that matches the size and shape of the input. Finally, the logits layer outputs a probability distribution over the classes of interest, namely the defect and non-defect regions of the image. The Softmax activation function50 then creates a probability distribution that identifies the locations of defects in the image. Table 2 shows the complete configuration of the segmentation module along with the backbone network.

(1)
\mathrm{HardSwish}(x) = \begin{cases} 0 & \text{if } x \le -3 \\ x & \text{if } x \ge 3 \\ x(x+3)/6 & \text{otherwise} \end{cases}

Table 2. Details of the layers in the segmentation module with a MobileNetV3 backbone.

Layer | Output size | Kernel/stride/padding | Dilation | Activation
Input | H×W×3 | - | - | -
Convolution | H/2×W/2×40 | 3×3/2 | 1 | Hardswish
Bottleneck 1 | H/2×W/2×24 | 3×3/1 | 1 | Hardswish
Mobile inverted residual bottleneck convolution 2 | H/4×W/4×40 | 3×3/2 | 1 | Hardswish
Mobile inverted residual bottleneck convolution 3 | H/4×W/4×40 | 3×3/1 | 2 | Hardswish
Mobile inverted residual bottleneck convolution 4 | H/8×W/8×40 | 3×3/2 | 2 | Hardswish
Mobile inverted residual bottleneck convolution 5 | H/8×W/8×40 | 3×3/1 | 4 | Hardswish
Mobile inverted residual bottleneck convolution 6 | H/8×W/8×40 | 3×3/1 | 4 | Hardswish
Mobile inverted residual bottleneck convolution 7 | H/8×W/8×112 | 3×3/1 | 4 | Hardswish
Mobile inverted residual bottleneck convolution 8 | H/8×W/8×112 | 3×3/1 | 8 | Hardswish
Mobile inverted residual bottleneck convolution 9 | H/8×W/8×160 | 3×3/1 | 8 | Hardswish
Atrous spatial pyramid pooling | H/8×W/8×960 | - | 1, 2, 3, 6 | -
Decoder | H/4×W/4×40 | - | - | -
Logits | H×W×C | - | - | Softmax
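As with the detector, torchvision provides a close off-the-shelf analogue of this segmentation module, DeepLabV3 with a MobileNetV3-Large backbone and an ASPP head. The sketch below uses it purely for illustration; it is not the exact architecture of Table 2, and the choice of seven output classes (six defect types plus background) is an assumption.

```python
import torch
import torchvision

# MobileNetV3-Large backbone + ASPP head (DeepLabV3), assuming torchvision >= 0.13.
model = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(
    weights_backbone="DEFAULT", num_classes=7)

model.eval()
with torch.no_grad():
    logits = model(torch.rand(1, 3, 256, 512))["out"]  # shape: (1, 7, 256, 512)

probs = logits.softmax(dim=1)   # per-pixel class probabilities
mask = probs.argmax(dim=1)      # per-pixel defect label map
```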

Dataset

Our dataset consists of labeled images of planks and defects. For a period of one month, a machine vision camera was installed at a wood-processing facility in Finland to capture images. The planks move very fast on the conveyor; therefore, we used multi-threading and hardware acceleration to handle the high frame rates and capture sharp images. A total of 5000 images were collected with this equipment. The images were visually inspected and discarded if they were blurry or contained no plank. The dataset contains six classes of defects (Table 3) for defect segmentation and two classes for plank detection (plank or background), and it is split into training, validation, and testing subsets (Table 4). To segment the defects accurately and reduce the number of false positives, we labeled the wood planks themselves and created a dataset for the detection method. We then created a dataset for the defect segmentation model, which takes the extracted plank image as input and draws a segmentation mask for each defect. We are currently unable to share the full dataset; therefore, an example dataset has been created so that it is possible to test the proposed methods.51

Table 3. Defect types, severity and properties.

Defect type | Severity | Properties
Knot | 1 | Small defects with the least severity.
Stain | 2 | Medium-size defects of variable size.
Branch | 3 | Medium-size defect; can be visible on both sides of the plank.
Lines | 4 | Narrow and long defect.
Edge | 5 | Always appears on the edges of the plank.
Area | 6 | The most severe defect; covers a large area of the plank.

Table 4. The quantity of images in each subset of the dataset for various types of defects.

Defect type | Training | Validation | Testing
Knot | 1580 | 316 | 237
Branch | 1200 | 240 | 180
Area | 1270 | 254 | 190
Lines | 1340 | 268 | 201
Edge | 1150 | 230 | 172
Stain | 1350 | 270 | 202

Defect classes

We categorized each plank according to different defect levels. A plank can contain multiple defects of different classes, and each defect type is assigned a severity level. The final classification of a plank is determined by the highest severity among its detected defects. Figure 4 shows the different defect types, and Table 3 lists the defect types, their severity, and their properties.
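As a concrete illustration of this grading rule, the snippet below maps the defect types detected on a plank to its final class using the severities from Table 3; returning 0 for a defect-free plank is an assumption, as the text does not specify that case.

```python
# Severities from Table 3.
SEVERITY = {"knot": 1, "stain": 2, "branch": 3, "lines": 4, "edge": 5, "area": 6}

def plank_class(detected_defects):
    """detected_defects: iterable of defect type names found on both sides."""
    severities = [SEVERITY[d] for d in detected_defects]
    return max(severities) if severities else 0   # 0 = no defect detected (assumed)

print(plank_class(["knot", "stain", "edge"]))      # -> 5
```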


Figure 4. Defect types: from left to right (1) Area (2) Branch (3) Edge (4) Knots (5) Line (6) Stain.

Data annotation and labeling

We used the CVAT (Computer Vision Annotation Tool)52 to label the images in our dataset. This open-source tool was deployed on an Amazon Web Services53 virtual machine. The complete dataset was uploaded to storage and then loaded into CVAT. Each frame was labeled individually by drawing a rectangle around the plank for the wood plank dataset and a polygon around the defects for the defect segmentation dataset, and assigning a class name. After annotating the training set, the JSON (JavaScript Object Notation) annotation labels and images were downloaded for model training. The dataset is then passed through an augmentation process.

Data augmentation

To make the model generalize better at inference time, we used data augmentation, which included multiple image operations such as random flip, shear, translation, and rotation. The PyTorch54 library has built-in functions to perform these operations. After this process, the dataset contained an increased number of images; Table 4 shows the number of images in each subset after the data augmentation process. The model is then trained on the enlarged training set.
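One way to realize these operations with torchvision transforms is sketched below. The specific ranges for rotation, translation, and shear are assumptions (the paper does not report them), and for detection and segmentation the same geometric transform must also be applied to the boxes or masks, which is omitted here for brevity.

```python
from torchvision import transforms

# Illustrative augmentation pipeline: random flip, rotation, translation, shear.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=10,            # rotation range (assumed)
                            translate=(0.1, 0.1),  # translation range (assumed)
                            shear=5),              # shear range (assumed)
    transforms.ToTensor(),
])
```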

Training

We used the PyTorch54 (RRID:SCR_018536) distributed data-parallel training module to train the model on multiple graphical processing units (GPUs). The model was trained on two NVIDIA Quadro RTX 8000 48GB GPUs. The initial learning rate was set to 0.1, with a batch size of 128 images per GPU, a learning-rate decay of 0.01 every 5 epochs, and RMSprop as the optimizer with eps=0.0316. We initially configured the number of epochs to 300; however, after observing the loss on the validation set, we implemented early stopping, and training ended after 200 epochs.
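The optimizer and schedule could be configured as follows; interpreting the decay of 0.01 every 5 epochs as a StepLR with gamma=0.01 is an assumption, and `model` stands for whichever network is being trained, wrapped in DistributedDataParallel for the two-GPU setup.

```python
import torch

# Hyperparameters as reported above; `model` is a placeholder for the network.
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.1, eps=0.0316)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.01)

# scheduler.step() is called once per epoch after the training loop;
# early stopping on the validation loss ended training at ~200 of 300 epochs.
```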

Quantization and pruning

Network quantization is a method of reducing the number of bits per weight of the network, and some hardware supports faster inference with a quantized network. We deployed our model on an Nvidia Jetson Xavier device for real-time inference; the device supports quantized models by default. We tested the model with 8-bit integer (INT8) precision. The number of parameters was reduced by 40%, and the inference speed increased by approximately 3x. Quantization-aware training was performed by refining the non-quantized model pre-trained on the MS-COCO dataset, with activations and weights quantized to lower precision, preserving the overall accuracy of the model while making it faster for real-time inference. To make the model more efficient, we also pruned it to remove redundant elements in the network; this did not affect the accuracy of detection and segmentation in our results.
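An illustrative PyTorch recipe combining magnitude pruning with quantization-aware training is shown below. The actual deployment path on the Jetson (e.g. TensorRT) is not described in the text, the 40% pruning amount simply mirrors the reported parameter reduction, and eager-mode quantization normally also requires QuantStub/DeQuantStub wrappers around the model, so treat this strictly as a sketch.

```python
import torch
import torch.nn.utils.prune as prune
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat, convert

# 1) Prune 40% of the weights in every convolutional layer by L1 magnitude.
for module in model.modules():                       # `model` is a placeholder
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")               # bake the zeros in permanently

# 2) Quantization-aware training: insert fake-quantization observers,
#    fine-tune, then convert to an INT8 model.
model.train()
model.qconfig = get_default_qat_qconfig("fbgemm")
qat_model = prepare_qat(model)
# ... fine-tune qat_model on the wood-plank dataset for a few epochs ...
int8_model = convert(qat_model.eval())
```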

Evaluation

To evaluate the performance of the model for plank detection, we used the mean average precision (mAP) metric (Equation 2) on the plank test set; for defect segmentation, we used the mean intersection over union (mIoU) and global pixel accuracy on the defect test set. mIoU evaluates the similarity between the predicted segmentation mask and the ground truth segmentation mask, while global pixel accuracy evaluates the overall performance of a semantic segmentation model by measuring the proportion of pixels in an image that are correctly classified. Given a predicted segmentation mask and the ground truth segmentation mask for an image, the global pixel accuracy is the ratio of correctly classified pixels to the total number of pixels in the image. The plank detection mAP was 97% and the defect segmentation mIoU was 76%, with a global pixel accuracy of 96%.

(2)
\mathrm{mAP} = \frac{1}{|\mathrm{classes}|}\sum_{c \,\in\, \mathrm{classes}} \frac{TP_c}{FP_c + TP_c}
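The segmentation metrics above can be computed from a per-pixel confusion matrix, and Equation 2 is an average of per-class precision; a small NumPy sketch (using random masks as stand-in data) is given below.

```python
import numpy as np

def pixel_confusion(gt, pred, num_classes):
    """Per-pixel confusion matrix: rows = ground truth, columns = prediction."""
    idx = num_classes * gt.reshape(-1) + pred.reshape(-1)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def global_pixel_accuracy(cm):
    return np.diag(cm).sum() / cm.sum()               # correct pixels / all pixels

def mean_iou(cm):
    tp = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - tp
    return np.nanmean(tp / np.maximum(union, 1))      # averaged over classes

def map_equation_2(tp, fp):
    """Equation 2: mean over classes of TP_c / (FP_c + TP_c)."""
    return np.mean(tp / (fp + tp))

gt = np.random.randint(0, 7, (256, 512))              # stand-in ground truth mask
pred = np.random.randint(0, 7, (256, 512))            # stand-in predicted mask
cm = pixel_confusion(gt, pred, num_classes=7)
print(global_pixel_accuracy(cm), mean_iou(cm))
```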

Results and discussion

The network takes an input image and utilizes a region proposal network (RPN) to identify possible regions of interest, in this case regions containing wood planks, which are then classified and their bounding box coordinates refined. A classifier and a bounding box regressor are then used to further refine the identified planks and classify them into their respective classes. The output of this process is a set of detected wood planks with class labels and bounding box coordinates. The bounding box surrounding each plank is then cropped from the output image and passed to the segmentation module. The segmentation module receives an image containing a wood plank as input and focuses on identifying and isolating any defects present on the plank.

After processing the input image, the segmentation module produces an image in which each defect is highlighted with a segmentation mask and labeled with its respective class color. Using this information, each wood plank is then classified according to the categories listed in Table 3.

Figure 5 shows an input image with two defined ROIs. Figure 6 and Figure 7 show the outputs from the detection and segmentation modules, respectively. The final output after plank classification is shown in Figure 8.


Figure 5. Input image with two regions of interest highlighted in green.


Figure 6. Detection module output from second ROI.


Figure 7. Output from segmentation module.


Figure 8. Final output showing the plank ID on the top left, the severity or final classification, and defect segmentation with unique colors and labels.

Deployment and testing

To evaluate the real-time performance of the trained model, we deployed it in a wood-processing facility where wood planks move rapidly on a conveyor. We used the Nvidia Jetson AGX Xavier as the processing unit for real-time video encoding, model inference, decoding, and producing output to an external monitor or a remote sink. For video capture, we used an Allied Vision camera system with an adjustable lens. The capture frame rate was tested with settings ranging from 15 to 50 FPS; the ideal frame rate for capturing the fast-moving wood planks was 25-30 FPS, which avoids blurriness and produces sharp frames. The model performed very well without any latency issues at 30 FPS and carried out real-time wood plank detection, defect segmentation, and final plank classification.
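A rough throughput check of the kind used to verify such real-time operation is sketched below; `model` is a placeholder for the deployed module (torchvision detection models expect a list of images rather than a batched tensor), and the resolution, warm-up and repeat counts are arbitrary choices.

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.eval().to(device)                      # `model` is a placeholder
frame = torch.rand(1, 3, 480, 640, device=device)    # dummy input frame

with torch.no_grad():
    for _ in range(10):                              # warm-up iterations
        model(frame)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        model(frame)
    if device == "cuda":
        torch.cuda.synchronize()

print("FPS:", 100 / (time.time() - start))
```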

The hardware specifications for the industrial setup are shown in Table 5.

Table 5. Hardware specifications.

Name | Technical specification
CPU | 8-core NVIDIA Carmel Armv8.2 64-bit CPU
GPU | 512 NVIDIA CUDA cores
Memory | 64GB 256-bit LPDDR
Storage | 32GB eMMC 5.1
Camera | 1800 U-511c Sony IMX547 color 1/1.8" 2472x2064
Lens | Allied Vision C-6-F2.8-6MP-T1-1.8

Conclusion and future work

In this study, we proposed a novel method for the automatic detection of wood planks and the segmentation of their defects. Our method employs several techniques to improve the accuracy of the results. First, data augmentation was used to increase the number of images in the dataset; this involved applying image operations such as random flip, shear, translation, and rotation to each image. Second, we utilized transfer learning to improve the ability of our model to detect planks and segment defects on plank surfaces. The results showed that these techniques were effective in achieving high accuracy in the detection and segmentation tasks: a mean average precision of 97% and a global pixel accuracy of 96% were achieved. Moreover, our method demonstrated real-time performance at 30 frames per second, making it suitable for industrial wood processing applications. These results also suggest that our method has the potential for application to other industrially processed materials.

As future work, further research could investigate the transferability of our method to other materials and industries, which requires creating new, properly annotated datasets and training the model on them. Moreover, our approach is specific to detecting and segmenting surface defects; future work could explore the detection and segmentation of other types of defects, such as internal defects. Future research could also focus on the robustness and adaptability of our approach to changes in lighting conditions, camera angles, and other environmental factors; one potential approach is to incorporate online learning techniques, which would allow the model to adapt to environmental changes over time. Finally, our method relies on a convolutional neural network for detecting and segmenting surface defects. Although this has proven effective, other deep learning architectures could be explored in future work: for example, recurrent neural networks could be used to analyze temporal data from the conveyor belt, and attention mechanisms could be employed to selectively focus on regions of an image that are more likely to contain defects.
