The Integration of Artificial Intelligence in Analytical Chemistry: A Chemometrics Approach

1. Introduction to Chemometrics and Artificial Intelligence in Analytical Chemistry

Chemometrics refers to the application of mathematical and statistical methods to design experiments and analyze chemical data. Within analytical chemistry, chemometrics is applied across a range of platforms, including spectroscopy, chromatography, capillary electromigration separation, nuclear magnetic resonance, voltammetry, and microscopy. Because the equipment is costly and chemical data are intricate, chemometrics is critical for enhancing the quality, reliability, and usefulness of the data obtained from these techniques. Artificial intelligence (AI), of which artificial neural networks (ANNs) are a subset, is finding increasing application in chemometrics. The main strength of AI is its capability to model data that contain complex inputs and patterns. In practice and research, AI is used in chemometrics for exploratory data analysis, regression problems, classification problems, and the development of intelligent decision-support systems. Chemometric approaches based on AI are non-analytical and often follow a ‘black box’ approach, in which the complex relationship between inputs and outputs is difficult or impossible to interpret.

Chapter 1 provides an introduction to the integration of AI in chemometrics. The basic principles of chemometrics and AI in the context of analytical chemistry are outlined. The need for chemometrics and AI in analytical chemistry is highlighted with examples from the author’s research group. The limitations of conventional mathematical solutions and the strengths of AI models are discussed. Various types of AI approaches are described, and recent trends and future prospects are also provided. Several standard books and comprehensive publications give a detailed account of chemometrics for analytical chemists. To this effect, ‘Chemometrics – A Practical Guide’ and ‘Practical Chemometrics’ by V. S. B. P. S. Gupta and J. A. de G. H. Gomes, respectively, are highly recommended. However, these resources lack an adequate introduction to the application of AI in chemometrics. The need for such a publication is reflected in the comments received from several researchers worldwide. Some of the comments on the author’s earlier publications on chemometrics and AI are included at the end of this chapter. [1][2][3][4][5][6][7]

2. Fundamental Concepts of Chemometrics

The term “chemometrics” refers to a scientific discipline that uses mathematical or statistical tools, concepts, or methods to obtain the maximum amount of information from chemical data. Chemometrics started as an approach in analytical chemistry during the late 1960s and early 1970s, when the mathematical models used to evaluate data obtained from chemical analysis started to be automated and programmed on computers. Nowadays, chemometric tools have spread to other disciplines such as food science, drug design, biotechnology, bioinformatics, and pharmacokinetics. Chemometric methods can be broadly divided into two categories depending on the chosen model: supervised techniques and unsupervised techniques. Applications of chemometrics can be found in pharmaceutical, environmental, clinical, food, and other laboratories, where chemometric methodologies are used to extract the maximum information from chemical data.

The term “data preprocessing” refers to a collection of techniques required to make raw data suitable for analysis before chemometric approaches are applied. Data preprocessing is an integral step of chemometrics, and it serves several purposes. First, raw data usually contain noise that hinders analysis, so the noise must be reduced. Second, data sets may include artifacts that distort or change chemical signals, so these artifacts must be removed as well. Third, analytical data sets may be complex and span different intensity ranges; signals recorded on different scales are not directly comparable and must therefore be brought to the same scale. Fourth, calibration models are built on the information in the data set, so excluding outliers or incorrectly measured chemical signals is important. Finally, some mathematical transformations are applied to make the subsequent data analysis more efficient.

Principal Component Analysis (PCA) is one of the most widely used chemometric techniques, and it is usually the first technique applied for exploratory data analysis. PCA is an unsupervised method, so no previously known information about the classes in the data is required. It is a mathematical transformation that searches for the most important or informative dimensions of a chemical data set while removing irrelevant or redundant information. Some considerations apply to the data set itself: data matrices should be centered and normalized before applying PCA. PCA projects the data onto a new orthogonal space defined by a smaller number of dimensions and uses these new dimensions to capture most of the information and variability contained in the initial data set. [8][9][10][11][12][13][14]

2.1. Data Preprocessing and Cleaning Techniques

Data preprocessing, the vital initial stage of any chemometric analysis, encompasses the cleaning and transformation of raw data into its usable form. This operation is especially significant when AI is incorporated since it requires a sizable dataset of high quality. The role of data preprocessing in maintaining a high-quality database cannot be overstated. No analysis can compensate for a flawed database since the outcome of an analysis is always tied to the quality and consistency of its composing data. Thus, it is prudent to spend time ensuring a dataset is free of outliers and considered high quality prior to running any analytical operations.

The first step in data preprocessing is the evaluation of data quality taking into account known facts about the dataset as well as previously performed analyses. A complete initial check should include examining the format and density of a dataset, checking sample labels for duplicates and consistency, examining retention times for shifts and instabilities, and checking spectral baselines for shifts and noise.

Most deviations in data can be treated through mathematical functions. Removal of background spectra, suppression of random noise fluctuations, simple spectral rebinning, classical scalings employing normalization and standardization, and linear filtering are all techniques well known from signal processing. Posterior conditioning of data is another way of addressing misalignment caused by instabilities in tracking spectral peaks, for instance, through algorithms reliant on wavelet transforms.
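The signal-processing operations listed above can be sketched in a few lines. The following is a minimal illustration, assuming NumPy, a synthetic single-peak spectrum, and a background that is known exactly; the function names are hypothetical and are not taken from any chemometrics package.

```python
import numpy as np

def subtract_background(spectrum, background):
    """Remove a separately measured (here: assumed known) background spectrum."""
    return spectrum - background

def moving_average(spectrum, window=5):
    """Linear filter: smooth a 1-D spectrum with a centered moving average."""
    kernel = np.ones(window) / window
    # mode="same" keeps the original length; the edges are attenuated by zero padding
    return np.convolve(spectrum, kernel, mode="same")

def rebin(spectrum, factor=2):
    """Average adjacent channels; trailing channels that do not fill a bin are dropped."""
    n = (spectrum.size // factor) * factor
    return spectrum[:n].reshape(-1, factor).mean(axis=1)

def standardize(spectrum):
    """Scale a spectrum to zero mean and unit standard deviation."""
    return (spectrum - spectrum.mean()) / spectrum.std()

# Synthetic example: one Gaussian peak on a sloping background plus noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
background = 0.5 + 0.2 * x
raw = np.exp(-((x - 0.5) ** 2) / 0.002) + background + rng.normal(0.0, 0.02, x.size)

corrected = subtract_background(raw, background)
smoothed = moving_average(corrected, window=5)
binned = rebin(smoothed, factor=2)
scaled = standardize(binned)
```

The order of the steps is a design choice in this sketch; in practice it depends on the instrument and on which artifacts dominate the raw data.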

In summary, data preprocessing is fundamental to ensuring the desired quality and consistency of any data before analysis. The identification of flaws in data, which can normally be done entirely through the examination of plots, can be addressed through mathematical functions modifying the raw data prior to further analysis. [15][1][16][17][18][19]

2.2. Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a powerful statistical technique used to analyze datasets with a large number of variables by reducing them to a smaller set of variables, known as principal components. These principal components are uncorrelated and account for the maximum variability in the dataset. PCA is widely applied in chemometrics to process and interpret complex data from various analytical techniques. In this section, the mathematical foundations and implementation of PCA in chemometrics, including preprocessing procedures and variable selection strategies, are discussed.

In PCA, a dataset is mathematically transformed into a new coordinate system, where the directions of the new axes are given by the eigenvectors of the covariance matrix of the dataset. The new coordinates are linear combinations of the original variables, and the corresponding variances are given by the eigenvalues of the covariance matrix. PCA aims to find a few principal components that explain most of the variance in the data. The principal components are orthogonal, meaning that their scores are uncorrelated (though not necessarily statistically independent), which is an advantage over the original variables, which are often strongly correlated. The first principal component explains the greatest variance, the second principal component explains the second greatest variance subject to being orthogonal to the first, and so on.
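The eigenvector formulation described above can be illustrated with a short sketch. It assumes NumPy and a synthetic data matrix with samples in rows and variables in columns; the function name `pca_eig` is hypothetical.

```python
import numpy as np

def pca_eig(X, n_components=2):
    """PCA via eigendecomposition of the covariance matrix (rows = samples)."""
    Xc = X - X.mean(axis=0)                  # mean centering
    cov = np.cov(Xc, rowvar=False)           # covariance between variables
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]     # directions of the new axes
    scores = Xc @ loadings                   # coordinates in the new system
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained

# Synthetic data: six measured variables driven by two latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(50, 6))

scores, loadings, explained = pca_eig(X, n_components=2)
print("Explained variance ratio:", explained)
```

In this sketch the covariance of the score columns is diagonal, which is the numerical counterpart of the statement that the components are uncorrelated.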

An important point in PCA implementation is the preprocessing of the dataset prior to matrix factorization. In most cases, datasets contain noise, unwanted variability, and outliers. Preprocessing procedures are applied to standardize the influence of each variable in the model. The most common procedure is mean centering, where the mean value of each variable is subtracted from that variable. Scaling is also applied to prevent variables with large ranges from dominating the model; this can be done by dividing each variable by its standard deviation (Unit Variance Scaling) or by its range (Range Scaling). Outliers should also be considered in the PCA model, as they can introduce undesirable information and make the model unsuitable for other samples. A model should then be selected and used to investigate the influence of outliers in the dataset.
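As a minimal sketch of these three operations, assuming NumPy and a data matrix with samples in rows and variables in columns (the function names are illustrative only):

```python
import numpy as np

def mean_center(X):
    """Subtract the mean of each variable (column)."""
    return X - X.mean(axis=0)

def unit_variance_scale(X):
    """Mean center, then divide each variable by its sample standard deviation."""
    return mean_center(X) / X.std(axis=0, ddof=1)

def range_scale(X):
    """Mean center, then divide each variable by its range (max - min)."""
    return mean_center(X) / (X.max(axis=0) - X.min(axis=0))

# Two variables on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

print(unit_variance_scale(X))  # both columns become [-1, 0, 1]
print(range_scale(X))          # both columns become [-0.5, 0, 0.5]
```

After either scaling, the two variables contribute equally to the model even though the second one was originally a hundred times larger.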

Another important factor to take into account when applying PCA to a dataset is the use of variable selection strategies. Variable selection reduces redundancy and provides a better interpretation of the results. It can be useful to identify highly correlated variables, as too many variables can mask the interpretation and visualization of the models, and in most cases a high correlation coefficient between two variables indicates that one of them can be discarded from the dataset. [20][21][22][23][15][12][24]
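One simple way to act on this idea is a greedy correlation filter. The sketch below assumes NumPy; the threshold of 0.95 and the function name `drop_correlated` are arbitrary illustrative choices, not a recommendation from the cited literature.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Return indices of variables to keep, skipping any variable whose
    absolute correlation with an already kept variable reaches the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return keep

# Synthetic data: variable 1 is almost a copy of variable 0
rng = np.random.default_rng(1)
v0 = rng.normal(size=100)
v1 = 2.0 * v0 + 0.01 * rng.normal(size=100)
v2 = rng.normal(size=100)
X = np.column_stack([v0, v1, v2])

kept = drop_correlated(X, threshold=0.95)
print("Variables kept:", kept)
```

The filter keeps the first variable of each correlated group, so the result depends on column order; a domain-specific criterion for which variable to retain may be preferable.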

3. Artificial Intelligence Techniques in Analytical Chemistry

Artificial intelligence (AI) techniques are an emerging alternative for the treatment of complex analytical data affected by noise and signal overlap. The terms chemometrics and digital signal processing are often used for the manifold applications of artificial intelligence in analytical chemistry, where the focus is on the underlying data rather than algorithmic details. Even very simple algorithms can entail intensive data processing, and more elaborate AI methods are likely to be applied to analytical chemistry problems in the foreseeable future.

Analytical chemistry and industrial quality control have been at the forefront of the development of new data processing methods because of the need to analyze complex multicomponent samples, which is a major challenge for conventional data evaluation strategies. In the beginning, simpler methods of data smoothing and signal deconvolution were applied, and these methods are still widely used. Subsequently, multivariate methods derived from principal component analysis (PCA) were developed for the analysis of multidimensional spectroscopic data. Since each analytical signal can be treated as a multivariate data set, these approaches lend themselves easily to the treatment of data from advanced analytical instrumentation. Today, software based on model decomposition, such as multivariate curve resolution (MCR), is quite common in areas such as UV, IR, NMR, and MS spectroscopy.

One of the most striking features of modern artificial intelligence developments is that they have become mainstream technologies through unprecedented applications in the public and private sectors. Since AI algorithms have been developed for the manipulation and evaluation of data with properties similar to analytical data, numerous and diverse developments of AI methods in analytical chemistry, and in chemometrics in particular, can be expected in the foreseeable future. However, the potential and drawbacks of AI techniques in analytical chemistry applications are still not well known. The intention of this study is to provide a brief overview of AI techniques and their implementation in the field of analytical chemistry, with a particular focus on multidimensional analytical applications. State-of-the-art descriptions of new chemometric techniques, including those that probe the deeply compressed information underlying complex data sets, build upon knowledge established in the pioneering eras of linear chemometrics (or model decomposition formulations). Understanding these earlier developments helps to identify potential misuse of, or inappropriate expectations about, new data treatment methods with regard to their noise filtering and signal recovery properties. Hence, an overview of AI algorithms is provided first, followed by a summary of applications and expected future developments in the field of analytical chemistry. [25][26][27][28][29][30][31][32]

3.1. Machine Learning Algorithms

The term machine learning (ML) refers to a computer system’s ability to learn from experience and adapt its responses accordingly. The array of ML techniques that can analyze and interpret data patterns of interest has been increasing consistently. These methods can be broadly grouped into supervised methods, which include classification and regression, and unsupervised methods. ML algorithms can also be categorized as shallow learning, which comprises conventional ML methods, and deep learning.

Each category is complemented with well-known methods in analytical chemistry. Typically associated with supervised approaches, regression is framed under methods like Random Forests (RF), Support Vector Regression (SVR), and M5 tree methods. On the other hand, Principal Component Analysis (PCA), k-means clustering, and Self-Organizing Maps (SOM) are grouped under unsupervised approaches. Gradient boosting and deep neural networks are typically framed under advanced methods.
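To make the unsupervised category concrete, the following is a minimal sketch of k-means clustering written with NumPy only. The synthetic two-cluster data and the function name `kmeans` are illustrative assumptions; established libraries provide more robust implementations.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic k-means (Lloyd's algorithm): alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Distance of every sample to every center, shape (n_samples, k)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster became empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic data: two well-separated groups of samples, no class labels supplied
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
group_b = rng.normal(loc=10.0, scale=0.5, size=(20, 2))
X = np.vstack([group_a, group_b])

labels, centers = kmeans(X, k=2)
```

No class information is passed to the algorithm; the grouping emerges from the distances alone, which is what distinguishes it from the supervised methods named above.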

The variety of datasets often leads to different types of chemometric problems, such as classification, regression, and multiclass classification. Each class of problems requires different algorithms to extract the knowledge of interest. It is important to highlight that RF, SVR, and M5 tree methods are implemented in the free R language through the following packages: “randomForest,” “e1071,” and “RWeka” (which provides the M5P model tree), respectively. Free software and standalone applications include PALM (LSL, with PL publicly available), CMOS (MIRAI), and PLS_suite developed by R. Brereton. [33][4][34][2][35][36][37]

3.2. Deep Learning Methods

Deep learning (DL) is an emerging field of artificial intelligence that focuses on neural networks with more than three layers. Techniques such as convolutional neural networks (CNN), recurrent neural networks (RNN), deep belief networks (DBN), and variational autoencoders (VAE) are examples of DL methods. A CNN treats time or spectral data from one or multiple channels as an image and can be used to enhance or extract spectral features or to classify images. Unlike back-propagation ANNs (BP-ANNs), which rely on the training set to extract data features, CNNs learn features through filtering during training, retaining spatial or temporal correlation. Stacking multiple convolutional layers reduces the spectral dimension and yields more representative features. Spectral features extracted by a CNN, or by a pre-trained CNN, are then used at the classification or regression stage. This sensitivity to local patterns enables detection and classification in a manner similar to the human eye.
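The filtering and dimension reduction performed by a convolutional layer can be sketched as a single forward pass over a one-dimensional spectrum. The example assumes NumPy and uses a fixed, hand-picked kernel purely for illustration; in an actual CNN the kernel weights are learned during training, and the function names here are hypothetical.

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid cross-correlation, the operation a CNN layer applies along the spectrum."""
    return np.correlate(signal, kernel, mode="valid")

def relu(x):
    """Rectified linear activation."""
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Keep the maximum of each non-overlapping window, reducing the dimension."""
    n = (x.size // size) * size
    return x[:n].reshape(-1, size).max(axis=1)

# Synthetic spectrum: a single Gaussian band over 100 channels
channels = np.arange(100)
spectrum = np.exp(-((channels - 40) ** 2) / 50.0)

# Assumed kernel: a simple slope detector that responds to rising edges
kernel = np.array([-1.0, 0.0, 1.0])

feature_map = relu(conv1d(spectrum, kernel))
pooled = max_pool1d(feature_map, size=2)
print(spectrum.size, "->", feature_map.size, "->", pooled.size)
```

Because the same kernel slides over neighboring channels, the local ordering of the spectrum is preserved in the feature map, which is the sense in which spatial correlation is retained.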

Beyond molecular structures, patterns in spectrogram images are exploited by another DL architecture, the RNN. RNNs are designed to learn from sequential or temporal characteristics, according to whether long-term or short-term correlations exist in the data or system. The recurrent connections in RNNs allow their states to evolve over time. Depending on the time span over which the state retains memory, RNNs fall into two categories: simple RNNs with short-term memory and long short-term memory (LSTM) networks. Potential applications in analytical chemistry include batches of SERS spectra recorded as a function of illumination time and complex time-resolved spectral data sets.
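How a recurrent state evolves over a sequence can be shown with the forward pass of a simple (short-term memory) RNN. This is a sketch assuming NumPy, random untrained weights, and a synthetic sequence of ten spectra with three channels each; an LSTM adds gating mechanisms that are not shown here.

```python
import numpy as np

def rnn_forward(X, W_x, W_h, b):
    """Simple RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in X:                      # iterate over time steps
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)            # shape (n_steps, n_hidden)

rng = np.random.default_rng(0)
n_steps, n_channels, n_hidden = 10, 3, 4

# Hypothetical time-resolved data: one short spectrum per time step
X = rng.normal(size=(n_steps, n_channels))

# Untrained weights, for illustration only
W_x = rng.normal(scale=0.5, size=(n_hidden, n_channels))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

states = rnn_forward(X, W_x, W_h, b)
```

Each state depends on the current input and on the previous state, so the final state summarizes the whole sequence rather than only the last spectrum.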

Deep belief networks (DBN) have the advantage of being capable of learning feature hierarchies in an unsupervised way. A DBN is a composition of several stacked Restricted Boltzmann Machines (RBM), where the hidden layer of each RBM is the visible layer of the next RBM. The pre-training of a DBN consists of training the stacked RBMs one after another in an unsupervised way, while back-propagation (BP) fine-tuning is performed on the network as a whole. In analytical chemistry, a DBN is usually combined with other classifiers, such as SVM, KNN, and naïve Bayes, to extract hidden information from experimental data on different time scales. [38][39][40][41][42]

4. Applications of Chemometrics in Analytical Chemistry

Several chemometric methods that have been characterized and successfully applied for the quantitative and qualitative analysis of enormous analytical datasets, including spectra and images, are presented. Chemometrics has been acknowledged as a cornerstone of analytical chemistry due to its powerful capabilities in extracting meaningful information from complex datasets. Chemometrics relates to the application of statistical/mathematical methods to design or evaluate experiments and analyze chemical data. Chemometrics, underscoring its role as a pillar of modern science and technology, is in the process of continuous development, such as image- and big-data-based chemometrics. Spectroscopic methods are also acknowledged as an ideal pillar technology of a robust and sustainable analytical chemistry platform.

Chemometric methods can efficiently analyze enormous datasets from analytical chemistry. Chemometric methods are based on the technologies of sensors, signal processors, and computers, which, by combining software and hardware elements, increase the selectivity, sensitivity, and precision of chemical sensors. A broad range of chemometric tools is available, such as classic linear methods, principal component analysis (PCA), partial least squares (PLS), and independent component analysis (ICA). These chemometric methods can be used for varied applications, such as constructing analyte calibration curves, spectral assignment, and identifying the number of sources of interference or overlapping signals.

The chemometric approaches can be subdivided into two major subcategories based on the characteristics of the data utilized: (1) traditional chemometrics based on spectral signals on timescales of nanoseconds to milliseconds, and (2) imaging- or big-data-based chemometrics using high-dimensional data obtained from image retrieval on timescales of milliseconds and longer. To better understand these subcategories of chemometrics, the past decades of chemometric research accomplishments applied to quantitative and qualitative analyses in analytical chemistry are briefly summarized. On-chip image retrieval and big-data-driven imaging, combined with simple measurements and rapid chemometrics performed remotely on a smartphone, add value to analytical platforms and indicate where the frontier of modern chemometrics in analytical chemistry is anticipated. [43][44][15][2][17][45][46]

4.1. Quantitative Analysis

Quantification of an analyte in a mixture of known and unknown components is of great importance in analytical chemistry. A major goal of chemometrics is to use mathematical and statistical methods to extract qualitative and quantitative information concerning the measured concentrations of the components in the mixture based on the analytical signals (responses) recorded from it. Several common chemometric methods for quantitative analysis, with simple experimental arrangements, data acquisition, and preprocessing steps are considered in this section. These methods can handle complicated problems that include both linear and nonlinear multivariate calibration models.

Conventional Quantitative Analysis Methods: Traditional quantitative analysis methods are considered in this subsection. These methods are typically simple to apply to common problems in laboratory analysis. Widely used multivariate calibration methods include Classical Least Squares, Principal Component Regression, Partial Least Squares Regression, and Multiple Linear Regression. Besides determining pure spectra and molar absorptivities, or band shapes and widths, they serve as models that predict the analyte signal from the analyte concentration, interference signals, and background noise. Most of these conventional chemometric models are purely linear. Artificial Neural Network models are typically used to handle nonlinear problems (e.g., overlapping spectra).
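As an illustration of the simplest of these methods, the sketch below implements Classical Least Squares under the linear mixture model A = C K, where A holds the measured spectra, C the concentrations, and K the pure-component spectra. It assumes NumPy and noise-free synthetic data with two components; the symbols and function names are choices made for this example.

```python
import numpy as np

def cls_fit(C, A):
    """Calibration: estimate pure-component spectra K from A = C @ K."""
    K, *_ = np.linalg.lstsq(C, A, rcond=None)
    return K

def cls_predict(K, A_new):
    """Prediction: estimate concentrations of new samples from their spectra."""
    C_hat_T, *_ = np.linalg.lstsq(K.T, A_new.T, rcond=None)
    return C_hat_T.T

# Synthetic pure spectra: two Gaussian bands over 50 wavelengths
w = np.linspace(0.0, 1.0, 50)
K_true = np.vstack([np.exp(-((w - 0.3) / 0.1) ** 2),
                    np.exp(-((w - 0.6) / 0.1) ** 2)])

# Calibration mixtures with known concentrations (rows = samples)
C_cal = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                  [2.0, 1.0], [1.0, 2.0], [0.5, 1.5]])
A_cal = C_cal @ K_true

K_hat = cls_fit(C_cal, A_cal)

# "Unknown" sample whose concentrations should be recovered
c_unknown = np.array([[0.7, 1.3]])
c_hat = cls_predict(K_hat, c_unknown @ K_true)
print("Estimated concentrations:", c_hat)
```

The model is linear in both steps, which is why it fails when interferents are missing from the calibration set or when the response is nonlinear.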

Artificial Neural Networks: Artificial Neural Networks are built from a large number of nonlinear equations relating the inputs to the outputs via a known network structure and unknown coefficients. They are trained by applying optimization methods that infer the unknown network parameters. Unlike classical models, ANNs can estimate the output vector (nonlinearly) from the input vector even when the data do not fulfill the usual statistical assumptions (e.g., Gaussian distribution, independence of observations). ANNs do not usually require complicated analytical drug characterization or sophisticated chemometric data analysis. After the neural network training is complete, the LSSP-QN can be applied for rapid analysis of unknown samples. Recent trends point to the use of ANNs in chemistry for the analysis of raw data with no or minimal preprocessing. An ANN methodology has been applied for the quantitative analysis of pyrazolones in paracetamol formulations based on diffuse reflectance UV–Vis spectra. The best network topology was a four-layer ANN architecture with 34 hidden neurons, developed using 22 training-set samples (about 80%) and 5 validation-set samples (about 20%).
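The idea of inferring unknown network coefficients by optimization can be sketched with a very small network. The example assumes NumPy, a synthetic nonlinear response (a sine curve), one hidden layer of eight tanh neurons, plain gradient descent, and a random 80/20 split; it is not the four-layer topology described above, and all names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a nonlinear response sampled at 50 points
X = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)
y = np.sin(X)

# Random 80/20 split into training and validation samples
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train, val = idx[:n_train], idx[n_train:]

# Unknown coefficients of a 1-8-1 network, initialized randomly
n_hidden = 8
W1 = rng.normal(0.0, 0.5, (1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
b2 = np.zeros(1)

def forward(X_in):
    """Return hidden activations and network output."""
    H = np.tanh(X_in @ W1 + b1)
    return H, H @ W2 + b2

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

initial_loss = mse(forward(X[train])[1], y[train])

lr = 0.02
for epoch in range(3000):
    H, pred = forward(X[train])
    err = pred - y[train]
    scale = 2.0 / len(train)
    # Gradients of the mean squared error (back-propagation)
    grad_W2 = scale * (H.T @ err)
    grad_b2 = scale * err.sum(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)
    grad_W1 = scale * (X[train].T @ dH)
    grad_b1 = scale * dH.sum(axis=0)
    # Gradient descent update of all coefficients
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

final_loss = mse(forward(X[train])[1], y[train])
val_loss = mse(forward(X[val])[1], y[val])
print(f"train MSE: {initial_loss:.4f} -> {final_loss:.4f}, validation MSE: {val_loss:.4f}")
```

The validation samples play no part in the weight updates, so their error gives a rough indication of how the trained network would behave on unknown samples.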

Bio-inspired Optimization Algorithms: The use of biological systems as inspiration for mathematical and computational tools has drawn attention from researchers in recent years. Genetic Algorithms, for example, are based on a mathematical model simulating the evolution of living beings and have been implemented to optimize processes involving experimental design, efficient spectral band selection, and chemometric predictive models. Moreover, the selective prediction of the concentrations of Group A analytes, using ANNs trained with Group A and B spectra and outputs, has been studied. Several ANN architectures were tested, with a wide variation of hidden neurons, transfer functions, and training algorithms. It was concluded that honeybee optimization algorithms are efficient tools for problems involving poor spectral resolution and overlapping or variable spectra. [2][47][48][49][50][4]
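Spectral band selection with a Genetic Algorithm can be sketched as follows. The example assumes NumPy, synthetic data in which only two of twelve bands carry information, a least-squares model as the fitness function, and arbitrary settings for population size, mutation rate, and size penalty; none of these choices come from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the response depends only on bands 2 and 7
n_samples, n_bands = 40, 12
X = rng.normal(size=(n_samples, n_bands))
y = 2.0 * X[:, 2] - 1.5 * X[:, 7] + rng.normal(0.0, 0.05, n_samples)

def fitness(mask):
    """Negative RMSE of a least-squares fit on the selected bands,
    minus a small penalty per selected band (assumed value: 0.01)."""
    if not mask.any():
        return -np.inf
    coef, *_ = np.linalg.lstsq(X[:, mask], y, rcond=None)
    rmse = np.sqrt(np.mean((X[:, mask] @ coef - y) ** 2))
    return -rmse - 0.01 * mask.sum()

def tournament(pop, scores, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[idx[np.argmax(scores[idx])]]

# Each individual is a boolean mask over the bands
pop = rng.random((30, n_bands)) < 0.5
history = []
for generation in range(40):
    scores = np.array([fitness(ind) for ind in pop])
    best = pop[np.argmax(scores)].copy()
    history.append(scores.max())
    children = [best]                                   # elitism: keep the best mask
    while len(children) < len(pop):
        p1, p2 = tournament(pop, scores), tournament(pop, scores)
        cut = rng.integers(1, n_bands)                  # one-point crossover
        child = np.concatenate([p1[:cut], p2[cut:]])
        flip = rng.random(n_bands) < 0.05               # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.array(children)

print("Selected bands:", np.flatnonzero(best))
```

Because the best individual is copied unchanged into the next generation, the best fitness never decreases from one generation to the next.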

4.2. Qualitative Analysis

In analytical chemistry, qualitative analysis is concerned chiefly with identifying which species are present in a sample. In this situation, the concentration of constituents is of minor importance. Qualitative analysis, as designed, indicates the identity of components or class of components in a sample. Sample constituents must be represented in the model used for analysis. Therefore, unique features used for identification of individual substances have to be extracted from the data in qualitative chemometrics. Non-targeted (untargeted) analysis is often referred to as a qualitative interpretation.

Due to the multivariate nature of the data, chemometrics is generally considered as an applicable tool for qualitative analysis. Qualitative chemometrics can be subdivided into two groups: target analysis and non-target analysis (generally considered as qualitative). A target approach means that the analyst or system must know a priori what is to be detected, class of compounds, or a specific compound. Then, a model can be developed using the predicted targets and a chemometric approach can be used. Generally speaking, targeted analyses are statistical models that rely on the extraction of one or more analytical signals, such as spectral wavelengths or chromatographic elution times and areas.

Variability in analytical data may arise naturally from sample preparation, changes in instruments, and environmental conditions, and it may influence the outcomes. The art of qualitative analytical chemistry is to make sense of the collected data, which are often multivariate and subject to uncertain effects. A chemometric approach can help to unravel the desired information. For qualitative analysis, chemometrics is used in scenarios ranging from simple univariate tests with one factor to more complex models involving several factors. Recent advances in artificial intelligence have blurred the line between ‘real’ qualitative analysis in chemometrics and chemometrics-based exploratory data analysis, which relies heavily on visualization and clustering techniques. [51][46][52][53][54]

5. Challenges and Future Directions in the Field of Chemometrics

The discipline of chemometrics, intertwined with artificial intelligence (AI), advances steadily but confronts notable hurdles. Herein, challenges to AI application are critically assessed alongside potential further explorations of these techniques within chemometrics.

While AI holds great promise for extending chemometrics so that it remains relevant to vast and complex experimental data collections, specific challenges remain. Foremost among these is the computational expense of building AI models from experimental data, which requires significant financial resources, time, and expertise. However, the advent of cloud-based computational technologies can mitigate such challenges, democratizing access to complex AI applications. Concerns also exist regarding the interpretability and transparency of AI techniques employed in chemometrics, such as neural networks, boosting, or multinomial regression. These techniques are often criticized as “black boxes,” or poorly interpretable algorithms, by scientists in other fields, including medicine. Efforts are underway to develop methods to better understand AI behavior, but these need to be explored further within chemometrics. Owing to the excitement and hype surrounding AI applications, there is also the underlying issue of poor AI model architecture, resulting in models that either cannot or do not work. This area likewise requires exploration.

In addition to the challenges confronting AI use within chemometrics, there are many areas in which these techniques could advance the discipline. Present chemometric methods primarily focus on reducing data dimensionality, perhaps failing to take full advantage of the information contained in the data. A future approach could integrate AI with manifold and wavelet transforms to extract more chemometric information and potentially the underlying physics of the studied systems. Other areas of exploration include integrating chemometrics with unexplored spectrometric or chromatographic techniques, or with advanced AI techniques such as deep or generative models. Recent advances in fundamental chemometrics, such as chemical information theory, spatial data mining, or chemometric knowledge databases, could also be furthered with AI techniques.

Despite notable challenges, vast potential areas for exploration exist between AI and chemometrics, which, if well executed, hold promise to extend the usefulness, efficacy, and impact of chemometrics in the vast universe of experimental data. [55][56][30][57][58][59][60]

References:

[1] R. Houhou and T. Bocklitz, “Trends in artificial intelligence, machine learning, and chemometrics applied to chemical data,” Analytical Science Advances, 2021. wiley.com

[2] L.B. Ayres, F.J.V. Gomez, J.R. Linton, M.F. Silva, et al., “Taking the leap between analytical chemistry and artificial intelligence: A tutorial review,” Analytica Chimica Acta, vol. 2021, Elsevier. [HTML]

[3] M. Otto, “Chemometrics: statistics and computer application in analytical chemistry,” 2023. psu.edu

[4] P. Puthongkham, S. Wirojsaengthong, and A. Suea-Ngam, “Machine learning and chemometrics for electrochemical sensors: moving forward to the future of analytical chemistry,” Analyst, 2021. [HTML]

[5] DP Dos Santos, MM Sena, MR Almeida, “Unraveling surface-enhanced Raman spectroscopy results through chemometrics and machine learning: Principles, progress, and trends,” Analytical and …, Springer, 2023. springer.com

[6] P. B. Joshi, “Navigating with chemometrics and machine learning in chemistry,” Artificial Intelligence Review, 2023. springer.com

[7] R. C. Rial, “AI in analytical chemistry: Advancements, challenges, and future directions,” Talanta, 2024. [HTML]

[8] Y. Huang, “Chemometric methods in analytical spectroscopy technology,” in Chemometric Methods in Analytical Spectroscopy, Springer, 2022. [HTML]

[9] F. Adams and M. Adriaens, “The metamorphosis of analytical chemistry,” Analytical and Bioanalytical Chemistry, 2020. springer.com

[10] TS Bos, WC Knol, SRA Molenaar, “Recent applications of chemometrics in one‐and two‐dimensional chromatography,” Journal of separation science, vol. 2020, Wiley Online Library. wiley.com

[11] VN Ataide, LA Pradela Filho, BGS Guinati, “Combining chemometrics and paper-based analytical devices for sensing: An overview,” Trends in Analytical Chemistry, vol. 2023, Elsevier, 2023. [HTML]

[12] BJ Pollo, CA Teixeira, JR Belinato, MF Furlan, “Chemometrics, comprehensive two-dimensional gas chromatography and “omics” sciences: Basic tools and recent applications,” in Trends in Analytical Chemistry, vol. 2021, Elsevier. [HTML]

[13] C. Nantasenamat, “An Introduction to Chemometrics and Cheminformatics,” in Chemometrics and Cheminformatics, Wiley Online Library, 2021. [HTML]

[14] R.G. Brereton, “Chemometrics: Multivariate Statistical Analysis of Analytical Chemical and Biomolecular Data,” in Chemometrics and Cheminformatics in Aquatic …, Wiley Online Library, 2021. [HTML]

[15] HP Wang, P Chen, JW Dai, D Liu, JY Li, YP Xu, et al., “Recent advances of chemometric calibration methods in modern spectroscopy: Algorithms, strategy, and related issues,” TrAC Trends in Analytical Chemistry, vol. 2022, Elsevier, 2022. [HTML]

[16] B. Dayananda, S. Owen, A. Kolobaric, et al., “Pre-processing applied to instrumental data in analytical chemistry: A brief review of the methods and examples,” Critical Reviews in Analytical Chemistry, vol. 2023, Taylor & Francis, 2023. [HTML]

[17] M. Kharbach, M. Alaoui Mansouri, M. Taabouz, and H. Yu, “Current application of advancing spectroscopy techniques in food analysis: data handling with chemometric approaches,” Foods, 2023. mdpi.com

[18] X. Zhang and J. Yang, “Advanced chemometrics toward robust spectral analysis for fruit quality evaluation,” Trends in Food Science & Technology, 2024. [HTML]

[19] L. Iannucci, “Chemometrics for data interpretation: Application of principal components analysis (pca) to multivariate spectroscopic measurements,” IEEE Instrumentation & Measurement Magazine, 2021. [HTML]

[20] J. R. Beattie and F. W. L. Esmonde-White, “Exploration of principal component analysis: deriving principal component analysis visually using spectra,” Applied Spectroscopy, 2021. sagepub.com

[21] L. C. Lee and A. A. Jemain, “On overview of PCA application strategy in processing high dimensionality forensic data,” Microchemical Journal, 2021. [HTML]

[22] M. Greenacre, P. J. F. Groenen, T. Hastie, et al., “Principal component analysis,” Nature Reviews, 2022, nature.com. unina.it

[23] M. F. Dupont, A. Elbourne, D. Cozzolino, J. Chapman, “Chemometrics for environmental monitoring: a review,” Analytical …, 2020.

[24] K. Kucharska-Ambrożej and J. Karpinska, “The application of spectroscopic techniques in combination with chemometrics for detection adulteration of some herbs and spices,” Microchemical Journal, 2020.

[25] Y. Xu, X. Liu, X. Cao, C. Huang, E. Liu, S. Qian, X. Liu, et al., “Artificial intelligence: A powerful paradigm for scientific research,” The Innovation, vol. 2, no. 1, pp. 100123, 2021.

[26] J. Li, M. S. Herdem, J. Nathwani, and J. Z. Wen, “Methods and applications for Artificial Intelligence, Big Data, Internet of Things, and Blockchain in smart energy management,” Energy and AI, 2023.

[27] I. E. Agbehadji, B. O. Awuzie, A. B. Ngowi, et al., “Review of big data analytics, artificial intelligence and nature-inspired computing models towards accurate detection of COVID-19 pandemic cases and contact tracing,” International Journal of Environmental Research and Public Health, 2020.

[28] F. Lussier, V. Thibault, B. Charron, G. Q. Wallace, et al., “Deep learning and artificial intelligence methods for Raman and surface-enhanced Raman scattering,” TrAC Trends in Analytical Chemistry, Elsevier, 2020.

[29] D. Y. Pimenov, A. Bustillo, S. Wojciechowski, “Artificial intelligence systems for tool condition monitoring in machining: Analysis and critical review,” Journal of Intelligent Manufacturing, vol. 34, no. 1, pp. 1-25, Springer, 2023.

[30] J. M. Górriz, J. Ramírez, A. Ortiz, F. J. Martinez-Murcia, et al., “Artificial intelligence within the interplay between natural and artificial computation: Advances in data science, trends and applications,” Neurocomputing, Elsevier, 2020.

[31] Z. Ahmed, K. Mohamed, S. Zeeshan, and X. Q. Dong, “Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine,” Database, 2020.

[32] Z. Shi, W. Yao, Z. Li, L. Zeng, Y. Zhao, R. Zhang, Y. Tang, et al., “Artificial intelligence techniques for stability analysis and control in smart grids: Methodologies, applications, challenges and future directions,” Applied Energy, Elsevier, 2020.

[33] C. A. Meza Ramirez, M. Greenop, L. Ashton, et al., “Applications of machine learning in spectroscopy,” Applied Spectroscopy Reviews, Taylor & Francis, 2021.

[34] K. Mansouri, K. Taylor, S. Auerbach, et al., “Unlocking the potential of clustering and classification approaches: navigating supervised and unsupervised chemical similarity,” Environmental …, 2024.

[35] M. Alloghani, D. Al-Jumeily, J. Mustafina, et al., “A systematic review on supervised and unsupervised machine learning algorithms for data science,” in Supervised and Unsupervised Learning, Springer, 2020.

[36] N. Verbeeck, R. M. Caprioli, et al., “Unsupervised machine learning for exploratory data analysis in imaging mass spectrometry,” Mass Spectrometry Reviews, Wiley Online Library, 2020.

[37] M. Mousavizadegan, A. Firoozbakhtian, et al., “Machine learning in analytical chemistry: From synthesis of nanostructures to their applications in luminescence sensing,” TrAC Trends in Analytical Chemistry, Elsevier, 2023.

[38] B. Debus, H. Parastar, P. Harrington, et al., “Deep learning in analytical chemistry,” TrAC Trends in Analytical Chemistry, Elsevier, 2021.

[39] S. Yu, X. Li, W. Lu, H. Li, et al., “Analysis of Raman spectra by using deep learning methods in the identification of marine pathogens,” Analytical Chemistry, 2021.

[40] Z. Jiao, P. Hu, H. Xu, and Q. Wang, “Machine learning and deep learning in chemical health and safety: a systematic review of techniques and applications,” ACS Chemical Health & Safety, 2020.

[41] K. Seddiki, F. Precioso, M. Sanabria, M. Salzet, “Early Diagnosis: End-to-End CNN–LSTM Models for Mass Spectrometry Data Classification,” Analytical Chemistry, ACS Publications, 2023.

[42] S. Weng, H. Yuan, X. Zhang, P. Li, L. Zheng, J. Zhao, et al., “Deep learning networks for the recognition and quantitation of surface-enhanced Raman spectroscopy,” Analyst, 2020.

[43] H. Parastar and R. Tauler, “Big (bio)chemical data mining using chemometric methods: a need for chemists,” Angewandte Chemie, 2022.

[44] A. Paul and P. de Boves Harrington, “Chemometric applications in metabolomic studies using chromatography-mass spectrometry,” TrAC Trends in Analytical Chemistry, 2021.

[45] B. L. Milman and I. K. Zhurkovich, “Big data in modern chemical analysis,” Journal of Analytical Chemistry, 2020.

[46] M. D. Peris-Díaz and A. Krężel, “A guide to good practice in chemometric methods for vibrational spectroscopy, electrochemistry, and hyphenated mass spectrometry,” TrAC Trends in Analytical Chemistry, 2021.

[47] S. Kern, S. Liehr, L. Wander, et al., “Artificial neural networks for quantitative online NMR spectroscopy,” Analytical and …, Springer, 2020.

[48] W. F. C. Rocha, C. B. Prado, and N. Blonder, “Comparison of chemometric problems in food analysis using non-linear methods,” Molecules, 2020.

[49] F. A. Chiappini, C. M. Teglia, G. Forno, and H. C. Goicoechea, “Modelling of bioprocess non-linear fluorescence data for at-line prediction of etanercept based on artificial neural networks optimized by response surface …,” Talanta, 2020.

[50] L. N. Li, X. F. Liu, F. Yang, W. M. Xu, J. Y. Wang, et al., “A review of artificial neural network based chemometrics applied in laser-induced breakdown spectroscopy analysis,” Spectrochimica Acta Part B: Atomic Spectroscopy, vol. 176, Elsevier, 2021.

[51] Y. Li, Y. Shen, C. Yao, D. Guo, “Quality assessment of herbal medicines based on chemical fingerprints combined with chemometrics approach: A review,” Journal of Pharmaceutical and Biomedical Analysis, vol. 177, Elsevier, 2020.

[52] P. H. Stefanuto, A. Smolinska, J. F. Focant, “Advanced chemometric and data handling tools for GC×GC-TOF-MS: Application of chemometrics and related advanced data handling in chemical separations,” TrAC Trends in Analytical Chemistry, Elsevier, 2021.

[53] P. Oliveri, C. Malegori, and M. Casale, “Chemometrics: Multivariate analysis of chemical data,” Chemical Analysis of Food, 2020.

[54] R. González-Domínguez, A. Sayago, et al., “An overview on the application of chemometrics tools in food authenticity and traceability,” Foods, 2022.

[55] H. Hua, Y. Li, T. Wang, N. Dong, et al., “Edge computing with artificial intelligence: A machine learning perspective,” ACM Computing Surveys, 2023.

[56] S. Saha, Z. Gan, L. Cheng, J. Gao, O. L. Kafka, X. Xie, et al., “Hierarchical deep learning neural network (HiDeNN): an artificial intelligence (AI) framework for computational science and engineering,” Computer Methods in Applied Mechanics and Engineering, vol. 373, Elsevier, 2021.

[57] H. Wang, T. Fu, Y. Du, W. Gao, K. Huang, Z. Liu, et al., “Scientific discovery in the age of artificial intelligence,” Nature, 2023.

[58] W. Liang, G. A. Tadesse, D. Ho, L. Fei-Fei, et al., “Advances, challenges and opportunities in creating data for trustworthy AI,” Nature Machine Intelligence, 2022.

[59] M. Frank, D. Drikakis, and V. Charissis, “Machine-learning methods for computational science and engineering,” Computation, 2020.

[60] G. Alam, I. Ihsanullah, M. Naushad, et al., “Applications of artificial intelligence in water treatment for optimization and automation of adsorption processes: Recent advances and prospects,” Chemical Engineering Journal, Elsevier, 2022.