Texture analysis of the developing human brain using customization of a knowledge-based system

Background: Pattern recognition software originally designed for geospatial and other technical applications could be trained by physicians and used as a texture-analysis tool for evidence-based practice, in order to improve diagnostic imaging examination during pregnancy. Methods: Various machine-learning techniques and customized datasets were assessed for training of an integrable knowledge-based system (KBS), to determine a hypothetical methodology for texture classification of closely related anatomical structures in fetal brain magnetic resonance (MR) images. Samples were manually categorized according to the magnetic field strength of the MRI scanner (i.e. 1.5-tesla (1.5T), 3-tesla (3T)), rotational planes (i.e. coronal, sagittal and axial), and signal weighting (i.e. spin-lattice relaxation (T1), spin-spin relaxation (T2), proton density). In the machine-learning sessions, the operator manually selected relevant regions of interest (ROI) in 1.5/3T MR images. Semi-automatic procedures in MaZda/B11 were performed to determine optimal parameter sets for ROI classification. Four classes were defined: ventricles, thalamus, grey matter, and white matter. Various texture analysis methods were tested. The KBS performed automatic data pre-processing and semi-automatic classification of ROIs. Results: After testing 3456 ROIs, statistical binary classification revealed that the combination of reduction techniques with linear discriminant analysis (LDA) or nonlinear discriminant analysis (NDA) yielded the best scores in terms of sensitivity (both 100%, 95% CI: 99.79-100), specificity (both 100%, 95% CI: 99.79-100) and Fisher coefficient (of the order of 10^4 and 10^5, respectively). Conclusions: LDA and NDA in MaZda can be useful data mining tools for screening a population of interest subjected to a clinical test.


Introduction
Medicine is not an exact science but an applied, interdisciplinary field 1 . Consequently, training physicians who specialize in radiology takes a very long time 2, 3 . Moreover, medicine is still evolving and will continue to evolve in the years to come 4-6 . As cures are discovered or invented, new diseases become known and mutations surface along with new variants [7][8][9][10] . Trainable knowledge-based systems (KBS) could be an answer to the global shortage of radiologists 1,[11][12][13][14] .
There are also other major obstacles that prevent a KBS from being fully functional out of the box. The body of medical knowledge known to date is gathered and transferred through theoretical and clinical intuition as well as experience 1,2,12 . Physicians then continue to expand their acquired knowledge through years of evidence-based practice 15,16 . Adding to the challenge, some conditions may not be symptomatic during medical examination 17,18 . Hence, medicine is indeed a continuing learning process 15,16 . That is why we proposed having the observer (in this case the physicians working in the field) act as the direct trainer (the programmer) of a KBS designed as a customizable, conceptual framework. In computer science, a framework is a system that implements the process of abstraction, i.e. a technique with which developers make computer modeling/programming simpler to understand, use and apply. Such a KBS should not be designed as "a hammer to drive a nail" but as an abstraction system with generic functionality, which can be changed with user-written code and customized for a wide range of applications. The purpose is to enhance human-computer interaction (HCI) in medicine.
The points above are raised especially to show the need for customization in computer-aided diagnosis (CAD). Pre-programmed CAD systems certainly help radiologists and obstetricians 19-21 , but they could be better and more useful with some room for customization, and could even improve the sharing and preservation of diagnostic innovations.

Research background & goals
The aim of this research was to find a customizable software framework (KBS) to mathematically determine an optimal combination of texture analysis methods for differentiating anatomical structures in the developing fetal brain (i.e. regions of interest (ROI)). Why did this experiment focus on the normal development of the central nervous system (CNS) rather than on common conditions affecting fetal organogenesis? When fetal intervention to correct an anomaly in utero is considered, the first ethical priority is maternal safety. Fetal well-being is clinically second, after evaluation of the benefit-risk ratio 22-26 . Fetal surgery was conceived and introduced into clinical practice in the 1960s 27 . Despite improvements in surgical technology, the number of successfully treated cases of congenital defects and the life expectancy of survivors are still limited 28-31 . These procedures are troublesome and have not been investigated enough 32 ; hence, they are considered experimental treatments 32 . Whether invasive or minimally invasive, fetal intervention is not the ultimate solution to this societal conundrum, and it is usually reserved for cases of severe fetal anomalies 28-32 . In recent years, prenatal therapy has been gaining popularity in religiously conservative territories, particularly where abortion is prohibited 33,34 . There, fetal intervention is recommended even for fetuses with mild and non-lethal defects, in order to discourage abortion and encourage continuation of the pregnancy.
In the country where this research was carried out (i.e. Poland), abortion is, once again, nationally banned, except in cases of rape, life-threatening pregnancy and childbirth, and grave malformations 33-36 . The pressure comes from a growing number of pro-life supporters demanding a total ban on abortion, childbearing at all costs, and capital punishment for illegal termination of pregnancy 33-36 . Nevertheless, Poland neither sanctions nor carries out capital punishment for illegal abortion. Despite the absence of such a penalty, abortion is performed only as permitted by the local authorities. This dilemma justified the use of fetal MRI (fMRI) in this research. It has previously been hypothesized and documented that the high electromagnetic fields (EMF) used for MR procedures can, in theory, disrupt the early stage of organogenesis. To date, the embryotoxic, fetotoxic, and teratogenic effects of MRI are not well known. Normally, fMRI is not recommended during the first and second trimesters, unless it is absolutely necessary to confirm and/or supplement the diagnosis of fetal anomalies. The legal decision to permit abortion sometimes requires advanced clinical investigation, accompanied by psychological counseling before and after the procedure 37-41 . To date, ultrasound (US) devices are preferred for obstetrical examinations. With 2D, 3D, and 4D US scans, physicians can effectively and efficiently diagnose the majority of life-threatening conditions affecting mother and fetus 42-46 . Therefore, ultrasonography (USG) is sufficient for the diagnosis of severe malformations affecting abdominal organs. Why, then, was magnetic resonance imaging (MRI) prescribed? The fetal brain is where US devices struggle to produce satisfactory results. MRI was therefore performed to rule out severe abnormalities in brain development that are not visible on sonograms. The magnetic resonance (MR) samples came from fetuses with suspected heart and kidney defects.
Depending on its severity, disruption of organogenesis in these organs might affect normal development of the brain. The MR samples used in this experiment were visually investigated by specialists, structure by structure. No severe malformations were observed. In these cases, MRI studies did not provide any further indication that would legally fulfill the criteria for terminating the pregnancy. The human visual apparatus has its limits, and missed diagnoses do occur. Malformations may not be apparent prior to birth. If fetal defects are suspected, bureaucracy may also restrict access to more advanced testing and healthcare. Consequently, pregnant women may be forced to carry and deliver malformed babies. At the end of the day, physicians may still face legal liability for failure to terminate a pregnancy 41 . Unwanted children may be neglected, and child abandonment is also a growing problem in society 47-49 . Congenital brain defects and their impact on physical and cognitive development may not be detectable until after birth. Fetal outcome and mental retardation can be difficult for a physician to predict. Improving this requires access to better medical examinations, the development of more advanced tools, and further investigation. That is why the ultimate goal of this feasibility study was to gather knowledge on the practicality of a proposed project seeking to improve diagnostic accuracy and precision by extending HCI usage in medicine. There are many computer-aided diagnostic tools on the commercial shelves (e.g. Radiomics 50-52 , Definiens Tissue Phenomics® 53 , CAD4TB Diagnostic Software 54,55 ). Sadly, they are primarily designed for pre-loaded applications but not much else.

Clinical trial registration
Though this research shares similarities with a clinical trial, it is "virtual", i.e. non-interventional. Such a study does not meet the criteria for clinical trial registration 58-62 . Furthermore, the investigational tools were used merely for technical exploration and to measure their feasibility in medical practice, using simulated settings. Lastly, the results were not used to alter patients' therapeutic care and outcome 58-62 .
Advance notice to readers. A full introduction to artificial neural networks (ANN) (feedforward neural networks, recurrent neural networks, etc.) is beyond the scope of a single manuscript. Readers unfamiliar with the topic are encouraged to consult introductory material on the basic principles and terminology of ANNs. By analogy with a human brain, an ANN can store memories (what it has learned during training) and can make judgements based on these stored memories and logical rules. For a naïve trial run, the ANN ideally had to be in a condition akin to 'permanent global amnesia', i.e. a state of total blackout in which it cannot judge based on prior memories. The KBS used in this experiment lacked an automatic memory cleaner and optimizer. Hence, stored memories were manually deleted in the ANN for every trial run. Our trial-and-error experiments and texture-analysis software development spanned more than five years of research. All the findings shared common denominators: quantitative brain-tissue segmentation was affected by several factors, such as the characteristics of the fetuses (e.g. gestational age, shape, normal/abnormal development) and the quality of the fetal MR images (e.g. 1.5/3T, resolution, slice thickness, rotational planes, artifacts, etc.). Automatic segmentation of newborn brain MRI has been documented in the literature 65 .

Computer vision.
The algorithmic contributions reported so far have achieved limited success. Automatic segmentation of the prenatal brain is even more challenging and time-consuming.

Unsupervised segmentation.
Once an image is acquired in a readable format (bitmap, BMP), the first step is texture segmentation, i.e. partitioning the image into ROIs. B11 can perform unsupervised segmentation and cluster analysis. In some instances, B11 achieved accuracy close to that of clinicians. However, unsupervised segmentation was not reliable enough for therapeutic use, and fetal brain segmentation with B11 still required extensive expert interaction. In our observations, the key problems were maternal factors, environmental effects, growth variability, and the randomness of fetal movements with its detrimental effects on image quality. Therefore, automatic segmentation was used only to gain insight into how supervised segmentation could be improved. The steps of unsupervised segmentation are relatively simple: image acquisition and running the analysis. ROIs and the number of segments can be manually adjusted. The drawback of B11 segmentation is that it is limited to 8-bit greyscale BMP images, so 16-bit DICOM images were converted to visually lossless 8-bit BMP images by dropping the least significant bits. Note that B11 identified textures, not anatomical structures. The information collected from the unsupervised trials was later used as guidance to manually estimate the boundaries of anatomical structures (ROIs) for the supervised trials. The preliminary trials were single-blinded, i.e. the user knew the characteristics of the ROIs, and the KBS received no hints (no ROI selection).
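A minimal Python/NumPy sketch of the bit-dropping conversion described above follows. It assumes the pixel data are already loaded as a 16-bit array (for instance via pydicom's `Dataset.pixel_array`) and that the number of bits actually carrying signal is known (the DICOM `BitsStored` attribute; the default of 12 is an assumption, not a value reported in this study). The function name is hypothetical, and this is not the conversion routine used by MaZda or B11.

```python
import numpy as np

def drop_lsbs_to_uint8(pixels: np.ndarray, bits_stored: int = 12) -> np.ndarray:
    """Reduce a 16-bit greyscale image to 8 bits by discarding least significant bits.

    bits_stored (hypothetical default of 12) should match the number of bits the
    scanner actually uses, e.g. the DICOM BitsStored attribute.
    """
    if not 8 <= bits_stored <= 16:
        raise ValueError("bits_stored must be between 8 and 16")
    shift = bits_stored - 8
    # Clip to the declared range first so stray high values cannot overflow 8 bits.
    clipped = np.clip(pixels.astype(np.uint16), 0, (1 << bits_stored) - 1)
    # Right-shifting keeps the 8 most significant of the stored bits.
    return (clipped >> shift).astype(np.uint8)
```

The 8-bit result could then be saved as a greyscale BMP, e.g. with Pillow's `Image.fromarray(arr8).save("slice.bmp")`. Shifting by `bits_stored - 8` rather than a fixed 8 avoids producing a nearly black image when only 12 of the 16 bits carry signal.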
Further information was gathered with a semi-automatic (unblinded) segmentation by defining 4 classes (ROIs): thalamus, ventricles, grey matter and white matter. In unsupervised mode, the KBS performs quite well when brain images come from MR examinations of the same subjects with the same sequences, but performs poorly when they come from different subjects. The same was true for identical sequences of the same patient acquired at a different time or with different MR scanner settings. The challenge was: how can macroscopic characteristics be matched with electronic recognition, regardless of MR image shading? MaZda and B11 do not yet allow the user to precisely define semantic rules and/or import plugins for fully electronic recognition of anatomical structures. Object-based image analysis tools such as eCognition work consistently well for geospatial applications (e.g. identification of a river in an image) 66,67 . In fetal radiology, it is still a challenge to achieve consistent results with automated pattern recognition of prenatal anatomy. Programming a reliably effective system for such a highly sensitive application is feasible but time-consuming: it would require taking into account all known variations due to pregnancy chronology and fetal development, in order to minimize segmentation errors. MaZda and B11 were originally built for HCI rather than fully automated applications. Therefore, the best practical methodology in this research was for the operator to have at least prior knowledge of human embryogenesis, in order to manually and correctly identify and select fetal ROIs. Many brain structures are often not macroscopically well differentiated in the first trimester. Hence, the selected samples were at least 20 weeks of gestational age.
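To make the unsupervised step more concrete, here is a plain k-means clustering of texture feature vectors in NumPy. It is a stand-in chosen for illustration only: the clustering algorithm used internally by B11 is not described in this manuscript, and the input is assumed to be a feature matrix with one row per image patch or ROI.

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means on feature vectors; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct, randomly chosen samples.
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

The cluster indices returned are arbitrary: like B11, such a method groups similar textures rather than naming anatomical structures, so mapping clusters to ventricles, thalamus, grey or white matter still needs an operator. Because raw texture features depend on grey-level shading, clusters found for one subject or scanner setting need not line up with those found for another.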

Supervised segmentation.
3-tesla (3T) and 1.5-tesla (1.5T) magnetic resonance (MR) sequences of fetal brain were manually segmented into 3456 ROIs. The categories were predefined as follows: ventricles (class 1), thalamus (class 2), white matter (class 3) and grey matter (class 4). The selected samples did not have any brain malformations; as previously mentioned, the anomalies were in the cardiovascular and/or renal systems. The focus of this research was on the normal anatomy of the fetal brain. The forward processing method, also known as "supervised segmentation" 68 , was performed as follows: (1) image acquisition from the MR scanner, (2) selection of ROIs with MaZda, (3) image normalization with MaZda, (4) feature extraction with MaZda, (5) data preprocessing with B11, and (6) texture classification with B11 63,64,68 . The first four steps were done with MaZda and the last two with B11 (Figure 1). After the preliminary trials, we sought to learn what needed to be adjusted in order to reduce misclassification of MR images. The unsupervised segmentation revealed that the KBS was very sensitive to greyscale shading, artifacts, and slice thickness as well as resolution quality. Thus, we trained the KBS accordingly. The pitfall of the ANN algorithm is its sensitivity to overtraining (too strong memorization) 69,70 . ANN training time is shorter with standardization. Nevertheless, training without standardization was also carried out, in spite of the long processing time. ANN (one-class/n-class) and 1-NN training runs were conducted with different sequences of MR images: T2-weighted (T2), T1-weighted (T1), and proton-density (PD) sequences. N-class training was discontinued due to repeated problems with overtraining and lack of reproducibility in F values and misclassification errors.
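The sketch below illustrates the 1-NN step with z-score standardization of the extracted texture features. It is a generic NumPy implementation written for illustration; it is not B11's classifier, and the class name `OneNearestNeighbour` is hypothetical.

```python
import numpy as np

class OneNearestNeighbour:
    """1-NN classifier on z-score standardised texture features (illustrative sketch)."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "OneNearestNeighbour":
        # Standardisation statistics are learned from the training ROIs only.
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Constant features would cause a division by zero; leave them unscaled.
        self.std_ = np.where(std > 0, std, 1.0)
        self.X_ = (X - self.mean_) / self.std_
        self.y_ = np.asarray(y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        Z = (X - self.mean_) / self.std_
        # Squared Euclidean distance from each query to every stored training sample.
        d2 = ((Z[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        return self.y_[d2.argmin(axis=1)]
```

Standardizing keeps features with large numeric ranges from dominating the Euclidean distance. 1-NN has no iterative training phase (it only stores the samples), which makes it a convenient baseline next to an ANN.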

Customization
Despite the use of multi-level, automated selection/reduction techniques, some extracted values still did not match the controlled ROI values. Differentiating the thalamus from other thalamic nuclei and grey matter was the key problem. At that point, we intervened manually to customize and improve the extracted data. First, ROI surface areas were manually increased, in order to limit the number of parameters reporting zero and infinite values. Parameters that could not be correctly computed were manually omitted from the report file, because some pre-processing procedures in both MaZda and B11 could not be performed when the report file contained erroneous values. We accessed the MaZda-generated report files by changing their file extension from SEL to CSV and then imported them into Excel 2013 for adjustment. Parameters measured with other CAD tools can also be entered in the report files using Microsoft Excel. The edited file can then be imported into B11 to perform texture classification.
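The manual clean-up described above (omitting parameters that could not be computed) can also be scripted. The following sketch uses only the Python standard library and assumes a hypothetical layout for the CSV-renamed report: a header row of parameter names, one row per ROI, and a class-label column called `Class`. The real MaZda report layout may differ, and the rule of dropping columns that are non-finite anywhere or zero everywhere is likewise an assumption.

```python
import csv
import io
import math

def clean_report(text: str, label_column: str = "Class"):
    """Drop parameter columns with empty, non-numeric, infinite/NaN or all-zero values.

    Returns (kept_header, rows) where rows are dicts restricted to kept columns.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    params = [name for name in reader.fieldnames if name != label_column]

    def usable(name: str) -> bool:
        values = []
        for row in rows:
            try:
                values.append(float(row[name]))
            except (TypeError, ValueError):
                return False  # empty, missing or non-numeric cell
        # Keep only columns that are finite everywhere and not identically zero.
        return all(math.isfinite(v) for v in values) and any(v != 0 for v in values)

    kept = [label_column] + [p for p in params if usable(p)]
    return kept, [{name: row[name] for name in kept} for row in rows]

def to_csv(header, rows) -> str:
    """Serialise the cleaned table back to CSV text for re-import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Scripting this step makes the exclusions reproducible across trial runs, whereas edits made by hand in a spreadsheet are harder to audit.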

Regions of interest
Additional tests were carried out with the same ROIs (i.e. thalamus, ventricle, grey matter and white matter) to substantially improve the accuracy and precision of the KBS, using a customized dataset derived from MaZda algorithms with semi-manual reduction and nearest-neighbor feature selection (see Data availability: Dataset 1-Dataset 2). The training data were used to orient the KBS to recognize which ROIs had the same tissue characteristics, despite originating from different patients or from different sequences of the same patient. The training was conducted with combinations of two built-in classification tools (i.e. nearest neighbor (NN) and artificial neural network (ANN)) and four data processing techniques (i.e. RAW: read as written; PCA: principal component analysis; LDA: linear discriminant analysis; NDA: nonlinear discriminant analysis). To measure the KBS sensitivity and specificity, we defined "normal" as ROIs with identical tissue characteristics and "abnormal" as those with different tissue characteristics (Figure 2-Figure 5). Apart from noise and artifacts, we found that the preliminary results were also affected by the planes (axial, coronal, sagittal), which refer to the imaging (rotational) planes of the MR scanner in relation to the mother, not the fetus. This is why the samples were classified by rotational plane. In learning mode, we observed consistent scores for all the ROIs, suggesting that the logical and semantic information provided to the KBS was effective. Statistical binary tests (also known as classification function tests) were computed in STATISTICA version 10 to assess the performance of each procedure (combination of preprocessing technique and classifier). In medicine, binary scores (true positives (TP), false positives (FP), true negatives (TN), false negatives (FN), etc.) are used to determine not just normal and abnormal characteristics but also the classification properties of an examination.
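For reference, the binary scores mentioned above reduce to simple ratios: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below assumes that "abnormal" (ROIs with different tissue) is treated as the positive class; the function name is hypothetical, and the study's own scores were computed in STATISTICA, not with this code.

```python
def binary_scores(truth, predicted, positive="abnormal"):
    """Count TP/FP/TN/FN and derive sensitivity and specificity.

    Assumes 'abnormal' is the positive class (an assumption for this sketch).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    # Return NaN rather than dividing by zero when a group is absent.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "sensitivity": sensitivity, "specificity": specificity}
```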

Results
With the Fisher coefficient (F), we tested for differences between ROIs. F was nearly zero for ROIs that were alike, indicating that tissue characteristics were consistently the same within the normal ROI group. In testing mode, misclassification values as low as 0% were recorded in some trials (Table 1). RAW and PCA did not respond to the training, while LDA and NDA did. We obtained high F values, 100% sensitivity and 100% specificity for LDA and NDA (Table 1-Table 2), which means that there was likely a real difference between the normal and the abnormal ROIs. NORMAL was defined as ROIs with identical tissue, e.g. white matter in the occipital region vs white matter in the frontal region of the brain. ABNORMAL was defined as ROIs with different tissue, e.g. white matter in the temporal region vs grey matter in the cerebral hemispheres of the brain.
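As an illustration of why F is near zero for alike ROIs, the classical two-class Fisher ratio of a single feature is F = (mu_a - mu_b)^2 / (var_a + var_b): identical tissue gives nearly equal means and hence F close to 0. The NumPy sketch below computes this per feature. MaZda's Fisher coefficient is defined for the multi-class case and may weight classes differently, so this is an approximation for illustration, not MaZda's exact formula; features with zero variance in both groups are assumed to have been removed beforehand.

```python
import numpy as np

def fisher_ratio(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-feature two-class Fisher ratio (mu_a - mu_b)^2 / (var_a + var_b).

    a, b: arrays of shape (n_samples, n_features) for the two ROI groups.
    """
    between = (a.mean(axis=0) - b.mean(axis=0)) ** 2  # separation of group means
    within = a.var(axis=0) + b.var(axis=0)            # spread inside each group
    return between / within
```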

Discussion
To our knowledge, no such research has been documented in the literature to date. This may be explained by the difficulty of obtaining fetal MRI samples for medical research and by the common obstacles to their availability, i.e. continuing systematic concerns over the theoretical risks of MRI during pregnancy, alongside the lack of clinical studies and trials assessing such risks 73-76 , the high cost of MRI examinations 56-57 , and the scarcity of customizable CAD tools among freeware, to name just a few.

Selecting KBS tools
The majority of the KBS we came across were designed for technical use and were not easily customizable. Such programs required paying the vendor for maintenance and for in-house-developed customization services, on top of an annual license fee. This option was therefore not feasible in real-world settings, where resources are often scarce (e.g. eCognition 66,67 , Media Cybernetics 77-78 , Radiomics 50-52 , Definiens Tissue Phenomics® 53 , CAD4TB Diagnostic Software 54,55 , etc.).

Logic and reasoning behind the research design
Previous medical studies with MaZda have addressed inflammation, brain cancer detection, multiple sclerosis, electrophoresis, etc. 79-82 . Herein, we defined some test samples as "abnormal" ROIs, although they were in reality normal tissue. Not testing directly for a common anomaly does not necessarily mean that there is no real medical application. Though the tests were simulated, the research design was conceived for real-world medical applications 83-85 . For example, this research design could be used to detect ectopic tissue migration, neurogenesis and neuronal migration (brain function migration as a result of a natural process or after injury), metaplasia, and interference with brain development. Last but not least, this simulation research followed standards used in clinical trials 86 .

Statistical test and interpretation
Binary classification (sensitivity/specificity) was favored over significance testing (p-values) because it provides more information relevant to medical diagnosis, prognosis and disease prevalence 87-89 . One key difference between the Fisher, POE+ACC, and MI+PA+F feature-selection methods is the number of parameters they require. To perform MI+PA+F, the dataset must contain at least 30 parameters strongly matching its selection-reduction criteria 69 ; otherwise, the KBS reported an error. In our study, this occurred frequently when the surface area of an ROI was too small to extract 30 parameters meeting the MI+PA+F criteria. RAW and PCA were largely unaffected by the training process and thus remained very sensitive to minute differences in greyscale shading.
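The confidence intervals reported in the abstract (99.79-100 for 100% sensitivity and specificity) can be checked against an exact (Clopper-Pearson) interval. When all n cases are classified correctly, the Clopper-Pearson bounds have a closed form: the upper bound is 1 and the lower bound is (alpha/2)^(1/n). The standard-library sketch below computes them; the group size in the usage note (an even split of the 3456 ROIs) is an assumption, since the manuscript does not state how its intervals were computed.

```python
def exact_ci_all_correct(n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson (exact) two-sided CI for a proportion when x == n.

    With every case classified correctly, the lower bound reduces to
    (alpha / 2) ** (1 / n) and the upper bound is exactly 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return (alpha / 2) ** (1 / n), 1.0
```

For example, `exact_ci_all_correct(1728)` gives a lower bound of about 0.9979, which would match the reported 99.79% if each group indeed contained 1728 ROIs.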

Recommendations
One way to reduce high misclassification (M) is to exclude parameters that are very sensitive to post-editing sharpening. In this research, however, the images were processed without post-editing sharpening, because high M was not regarded as a problem. Instead, we used RAW and PCA as reference tests (results before training of the KBS). LDA and NDA, on the other hand, responded well to the training, and M was consistently zero.
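To show what the LDA step does, here is a minimal two-class Fisher linear discriminant in NumPy, together with the misclassification rate M obtained by thresholding the projection midway between the projected class means. This is a textbook formulation for illustration; B11's LDA implementation, its multi-class handling, and its nonlinear (NDA) counterpart are not reproduced here, and the small ridge term is an assumption added for numerical stability.

```python
import numpy as np

def fisher_lda_direction(X0: np.ndarray, X1: np.ndarray, ridge: float = 1e-6) -> np.ndarray:
    """Two-class Fisher LDA direction w, proportional to Sw^-1 (mu1 - mu0)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw = Sw + ridge * np.eye(Sw.shape[0])  # small ridge keeps Sw invertible
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

def lda_misclassification(X0: np.ndarray, X1: np.ndarray) -> float:
    """Fraction of samples on the wrong side of the midpoint threshold (M)."""
    w = fisher_lda_direction(X0, X1)
    p0, p1 = X0 @ w, X1 @ w
    # w points towards class 1, so class 1 projects to larger values.
    threshold = (p0.mean() + p1.mean()) / 2
    errors = np.sum(p0 >= threshold) + np.sum(p1 < threshold)
    return float(errors / (len(X0) + len(X1)))
```

Because LDA projects the data onto the direction that best separates the training classes, well-separated tissue groups yield M = 0, whereas RAW features keep every source of grey-level variation in the distance computation.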

KBS memory clearing
During the study, we had to clear the KBS memory several times for every trial run. We hope that the software developers will soon implement a more effective and efficient way (e.g. one click) to clear specific random-access memory (RAM) without closing module(s), manually dumping the entire RAM, or restarting the computer.
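Outside B11, the equivalent of "clearing the memory" between naïve trial runs is simply starting each run from an untrained model. The sketch below (NumPy, with a hypothetical minimal perceptron `TinyPerceptron` standing in for the ANN, and a hypothetical helper `run_naive_trials`) keeps an untrained template and deep-copies it for every trial, so no learned weights carry over from one run to the next. It illustrates the general pattern, not how B11 manages its memory.

```python
import copy
import numpy as np

class TinyPerceptron:
    """Hypothetical single-layer network; its weights are its only 'memory'."""

    def __init__(self, n_features: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.01, size=n_features)
        self.b = 0.0

    def train(self, X: np.ndarray, y, epochs: int = 20, lr: float = 0.1) -> "TinyPerceptron":
        # Classic perceptron update rule; labels are 0 or 1.
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = 1 if xi @ self.w + self.b > 0 else 0
                self.w += lr * (yi - pred) * xi
                self.b += lr * (yi - pred)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (X @ self.w + self.b > 0).astype(int)

def run_naive_trials(template: TinyPerceptron, datasets):
    """Train an independent copy of an untrained template for each trial."""
    models = []
    for X, y in datasets:
        model = copy.deepcopy(template)  # fresh weights: nothing learned is carried over
        models.append(model.train(X, y))
    return models
```

Fixing the template's seed also makes repeated trials start from identical weights, which helps when comparing runs for reproducibility.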

Constraints, limitations, and assumptions
Patients gave consent for the MRI examination and for the use of the images in research and in the manuscript publication. Nevertheless, this authorization was not sufficient, as ownership and copyright of medical records are not always exclusively attributed to patients and such rights may not be assignable [90][91][92] . For the sake of prudence, we also had to seek clearance and approval from the institutions and research hospitals, which in turn were subject to different administrative and logistic factors and regulations. Consequently, it took nearly five years to collect sufficient MRI samples to make this research possible.

Conclusion
In brief, the findings show that better results were obtained with LDA and NDA. The observed difference between the 1.5T and 3T images is consistent with previous reports that 3T MRI has higher resolution and captures more detail 93-95 . Lastly, LDA and NDA could be useful tests for pre-screening, provided that ruling-in/ruling-out semantics are well defined and the KBS is well trained.

We do not agree with some of your comments and suggestions from a practical (clinical) point of view, for the reasons below.
This article briefly considers existing legislation relevant to protecting patient privacy and clinical data.
You stated that the "exact objectives and aims are not clear" to you. However, you did not explain why, and we feel that some of your comments are a matter of personal preference. For example, you asked us to publish details about 'patient demographics', but we feel that this is unnecessary in this case.
We strongly recommend that you read:

Michael Hanke
Psychoinformatics Lab, Department of Psychology, University of Magdeburg, Magdeburg, 39106, Germany
The article describes a predominantly explorative analysis of the capabilities of a machine-learning-based texture analysis of fetal MR images for the purpose of extracting information on brain structure (ROI labeling/segmentation) with a (semi-)automatic procedure.
My background is in neuroimaging data analysis, including the application of machine-learning algorithms on such data. Consequently, I cannot provide an expert opinion on the suitability of the proposed analysis strategies for diagnosing brain development abnormalities, and I will focus on the technical aspects of the procedure and its description in the article.
In my opinion, the present structure of the manuscript, and the chosen balance between the level of detail with which the research is motivated and that with which its methods are described, are suboptimal for communicating the implications of these findings. In the following, I summarize aspects that I consider critical:

Objective and conclusions
It is not clear to me what the exact objective of this research was. How good does ROI classification have to be in order to improve the status quo? Are the developed methods feasible enough (computational demands, ability to obtain suitable raw data, ...) to be employed in clinical applications? What exactly is not possible with available solutions (quote "Sadly, they are primarily designed for pre-loaded applications but not much else")?
It would be very helpful if the authors provided a concrete example of the segmentation problem they are trying to solve. This could be a figure showing actual data. Figures 2-5 do not provide this information. It is unclear whether they show a schematic depiction of the problem or actual empirical results; this uncertainty is compounded by the very short figure captions.

Provided information on methods is insufficient
I would like to refer to this report for guidelines on what to report for MRI studies in general: http://www.humanbrainmapping.org/COBIDASreport
In particular, no information is provided on how the MR images were obtained; this includes missing information on the type of MR sequence, its parameters, the vendor of the equipment, etc.
There is no information on the nature of the MR image preprocessing. One of the issues with fetal MRI is the impact of unavoidable motion of the fetus during the scan. This aspect is not touched upon in the manuscript.
The analysis description assumes familiarity with the MaZda package. Here is an example: "360 parameters were extracted with MaZda (histogram: 9; co-occurrence matrix: 220; run-length matrix: 20; gradient matrix: 5; auto-regression: 5; Haar wavelet: 28; geometry: 73). Parameters' names are provided in the appendix at the end of this manuscript." Dataset 7 contains a plain list of names such as "GeoUg" that are uninterpretable without familiarity with the MaZda package (which in turn only runs on outdated Windows machines (98, 2000, XP, according to the website), and whose source code is not available).

Structure of manuscript
The methods section in particular does not contain typical sections, such as "MRI acquisition", "Participants", etc. Instead, it has "Advance notice to readers", which states that it is impossible to provide an introduction to machine learning. While that may or may not be true, I consider it problematic that the KBS is only described at a conceptual level, while extensive space is devoted to the development history of MaZda, which seems irrelevant in the context of this study. (Note that the statement "ANN is a self-organizing algorithm" is not true in general.) The heading levels seem to be off at times. "Computer vision" is a subsection of "Clinical trial registration".
In general, section headings should be more indicative of their content. There is a section "Customization" which reports on adjustments to the original procedure, but also on how files were renamed. The discussion has a section "Constraints, limitations, and assumptions" which essentially restates that data acquisition took several years.

Competing Interests: No competing interests