The Application of Machine Learning in Materials Science and Chemistry

1. Introduction to Machine Learning

 

Machine learning (ML) is a subfield of artificial intelligence that allows computer systems to improve their performance on tasks through the acquisition and application of knowledge without explicit instructions. This process is driven by data: computational techniques discover patterns in the data and extract relevant information from it. In recent years, machine learning has had a significant impact on materials science and chemistry. The development of powerful algorithms and the growing availability of data have made it possible to build predictive models for materials design, reaction discovery, molecular simulation, the analysis of experimental measurements, and a myriad of other applications.

 

ML techniques can be broadly divided into two categories: supervised and unsupervised learning. In its simplest form, a supervised learning algorithm is fed input/output pairs, and a predictive relationship between the two is learned. The goal of unsupervised learning, on the other hand, is to discover patterns in the data that can be used to understand or further process the system. Semi-supervised learning sits between the two: the majority of the data is unlabeled, but a small number of labeled examples is available, and many algorithms inherit characteristics from both paradigms. Reinforcement learning is a distinct paradigm in which an algorithm learns by interacting with a system and receiving feedback on its actions. (Hiran et al., 2021; Udousoro, 2020; Vercio et al., 2020; Rajoub, 2020)
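As a minimal sketch of the two paradigms (toy 1-D data, pure Python, illustrative function names): supervised learning fits a known input/output mapping, while unsupervised learning groups unlabeled values.

```python
# Supervised: fit y = a*x + b by least squares from labeled (x, y) pairs.
# Unsupervised: split unlabeled values into two clusters with 2-means.

def fit_line(xs, ys):
    """Closed-form least-squares fit for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def two_means(vals, iters=20):
    """Cluster scalar values into two groups (Lloyd's algorithm, toy version)."""
    c1, c2 = min(vals), max(vals)
    for _ in range(iters):
        g1 = [v for v in vals if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in vals if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# Supervised: labeled data generated from the relationship y = 2x + 1.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
# Unsupervised: unlabeled measurements that form two obvious groups.
low, high = two_means([1.0, 1.2, 0.9, 8.0, 8.3, 7.9])
```

The supervised fit recovers the slope and intercept from the labels, whereas the clustering step recovers structure with no labels at all.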

 

2. Fundamentals of Materials Science and Chemistry

 

Two fundamental fields of scientific research directly intersect with machine learning: materials science and chemistry. Materials science is primarily concerned with the relationships between the properties of matter and its atomic and molecular structure. Chemistry focuses on the interactions of matter at the molecular level and incorporates several disciplines, such as organic, inorganic, physical, and analytical chemistry.

 

From a theoretical point of view, several methodologies support these two scientific fields. The most widely used frameworks in organic and inorganic chemistry are valence bond theory and molecular orbital theory; databases and exercises in stereochemistry and organic or inorganic chemistry are often constructed on top of them. More generally, quantum chemistry describes a molecule through its wavefunction, including the delocalized cloud of valence electrons. In computational chemistry, the molecule is treated approximately via the many-body Schrödinger equation, or via time-dependent density functional theory when the evolution of the electronic structure in time is of interest. These methods use pseudopotentials and basis sets to make the molecular problem tractable. Once it is solved, one can calculate the state of the molecule, including electronic properties such as the energy, free energy, orbitals, and density of states, as well as atomic properties. Such calculations are routine for small- to medium-sized molecules but become difficult for very large ones, because their cost grows polynomially with the number of basis functions. (Galbraith et al., 2021; Shaik et al., 2021; Dunning et al., 2021; Dunning & Hay, 2021; DiMucci et al., 2023)
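For reference, the central object these methods approximate is the time-independent electronic Schrödinger equation, written here in a standard textbook form for N electrons and nuclei A:

```latex
\hat{H}\,\Psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = E\,\Psi(\mathbf{r}_1,\dots,\mathbf{r}_N),
\qquad
\hat{H} = -\sum_{i=1}^{N}\frac{\hbar^2}{2m_e}\nabla_i^2
          \;-\;\sum_{i,A}\frac{Z_A e^2}{4\pi\varepsilon_0\,|\mathbf{r}_i-\mathbf{R}_A|}
          \;+\;\sum_{i<j}\frac{e^2}{4\pi\varepsilon_0\,|\mathbf{r}_i-\mathbf{r}_j|}
```

Solving this exactly is intractable for all but the smallest systems; approximate methods trade accuracy for polynomial cost in the number of basis functions N (for example, Hartree-Fock scales roughly as O(N^4), and high-accuracy coupled-cluster methods scale even more steeply), which is the scaling bottleneck the paragraph above refers to.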

 

3. Data Collection and Preprocessing

 

To effectively apply data-driven machine learning to materials science and chemistry, it is often necessary to collect a sizable amount of relevant data and to prepare it properly for the algorithms. Several considerations are essential here, including the characteristics of the available data, the inherent connections between datasets, and the assumptions built into the algorithms. Overcoming the key challenges related to data quality and diversity, the reduction of noise from experiments and simulations, and the requirements of the chosen algorithm is directly tied to the quality of the resulting model. Because of these considerations, the field's physical and chemical knowledge can and should play a valuable role in the development of machine learning techniques, with physical insight informing data preprocessing and regularization. Such effort can make the algorithms more interpretable and reliable and can bridge the gap between statistical forecasting and physical interpretability.

 

A user of machine learning in these fields faces the task of collecting and cleaning data. For algorithm designers, one of the challenges is designing methods that can accommodate and preprocess large datasets in the presence of censoring, measurement error, and other noise. Furthermore, the relevant data come with their own peculiarities, such as the failure of additivity and closure relations in experimental and ab initio datasets and the intrinsically sparse sampling of variables. Focusing on the algorithm itself, the combination of these field-specific peculiarities and data characteristics can motivate new algorithm formulations as well. (Moosavi et al., 2020; Zhong et al., 2021; von Lilienfeld et al., 2020; Deng et al., 2020)
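The cleaning steps discussed above can be sketched as follows. The records and the drop-incomplete policy are hypothetical (real pipelines would typically use a library such as pandas, and dropping incomplete records is only one of several possible policies):

```python
# Minimal preprocessing sketch: deduplicate records, drop entries with
# missing measurements, then z-score standardize each feature.

def preprocess(records):
    """records: list of dicts mapping feature name -> numeric value or None."""
    # 1. Remove exact duplicates (e.g., the same experiment reported twice).
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 2. Drop records with missing values (one simple policy among many).
    complete = [r for r in unique if all(v is not None for v in r.values())]
    # 3. Standardize each feature to zero mean and unit variance.
    keys = list(complete[0].keys())
    stats = {}
    for k in keys:
        vals = [r[k] for r in complete]
        mean = sum(vals) / len(vals)
        std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
        stats[k] = (mean, std if std > 0 else 1.0)  # guard constant columns
    scaled = [{k: (r[k] - stats[k][0]) / stats[k][1] for k in keys}
              for r in complete]
    return scaled, stats
```

Keeping the per-feature means and standard deviations (`stats`) matters in practice: the same transformation must later be applied to any new data the model sees.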

 

4. Feature Engineering and Selection

 

The ability to systematically encode descriptors of materials is essential for the successful application of supervised machine learning algorithms. The most prominent features for materials science applications are physical, chemical, and geometric descriptors. While some areas of materials research have a clear choice of representation for a given material, so that only the prediction algorithm needs to be trained, other areas require extensive feature analysis. A common trait of well-chosen features is that they derive from first principles, on top of which a data-driven supervised model can be built.

 

The focus of feature engineering lies in finding the appropriate input definition for a machine learning model. The workflow of identifying suitable features generally consists of feature extraction, exploratory feature analysis, and feature selection. This process can be executed iteratively, resulting in a self-refining feature set that substantially improves model performance. As described above, feature selection is one part of the feature engineering process and aims to select relevant, non-redundant, low-noise features. Redundant or unimportant features can slow down training, and the resulting model will perform worse. While feature selection is a vital step for speeding up and improving the accuracy of predictive models, overly aggressive feature selection can have negative consequences. It is therefore important to find a good balance when designing the descriptors for a specific machine learning task. (Himanen et al., 2020; Cheng et al., 2020; Oviedo et al., 2022; Yang & Gao, 2022; Jablonka et al., 2020)
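A minimal pure-Python sketch of the selection step, assuming a simple variance filter followed by a greedy correlation filter (the feature names are invented; libraries such as scikit-learn provide production versions of both filters):

```python
# Drop near-constant features, then drop one of each highly correlated pair.

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

def select_features(table, range_tol=1e-8, corr_tol=0.95):
    """table: dict feature-name -> list of values. Returns kept names."""
    # Near-constant features carry no signal for the model.
    keep = [k for k, v in table.items() if max(v) - min(v) > range_tol]
    # Greedily drop the later feature of each highly correlated pair.
    selected = []
    for k in keep:
        if all(abs(pearson(table[k], table[s])) < corr_tol for s in selected):
            selected.append(k)
    return selected
```

For example, a radius reported in two different units is perfectly correlated with itself, so only one copy survives the filter.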

 

5. Supervised Learning Techniques

 

Supervised learning refers to algorithmic strategies that are trained on labeled data to produce a given output, whether a physical property such as oxidation state or a more abstract property such as a sequence of atoms. In the context of materials science and chemistry, labeled data can be thought of as inputs whose desired outputs (labels) are known in advance. The output can take many forms, but typically it consists of predicting a continuous material property (regression) or assigning a discrete category such as a structure type (classification).

 

Supervised learning techniques are used to build predictive models from labeled training data. These techniques are also powerful pattern-recognition tools, making them well suited to a range of problems within materials science and chemistry. Given that an algorithm is trained with domain-specific knowledge, its predictions can aid intuitive understanding of complex quantum phenomena. For example, in chemistry, characterizing carbon-hydrogen bonds is notoriously difficult experimentally. Even so, supervised learning models exhibit significant predictive power, accurately differentiating these bonds in small molecules and polymers based on the local environment. Notably, these models can also be interpreted, meaning that their predictions can be explained. Through interpretable models, one can attribute a prediction to specific chemical features and even determine which atoms or descriptors are most important. These findings can guide future experimentation by offering intuitive guidance on how one might modify a material to elicit desired properties. (Sahu et al., 2022; Dobbelaere et al., 2021; Gao et al., 2022; Armeli et al., 2023; Jia et al., 2021)
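A toy supervised regressor in this spirit, here a k-nearest-neighbour model over invented two-component descriptor vectors (not any specific published model):

```python
# k-nearest-neighbour regression: predict a property for a new descriptor
# vector as the mean property of the k closest labeled training points.

def knn_predict(X_train, y_train, x, k=3):
    """Mean label over the k training points nearest to descriptor x."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    ranked = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], x))
    return sum(y for _, y in ranked[:k]) / k

# Toy training set: 2-D descriptors -> property, forming two clusters.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
pred = knn_predict(X, y, (0.05, 0.05))   # query near the first cluster
```

The query point sits inside the low-property cluster, so the prediction averages that cluster's labels; no functional form is assumed, only local similarity in descriptor space.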

 

6. Unsupervised Learning Techniques

 

Unsupervised learning focuses on learning from data without assigned labels. Its goal is to explore and understand the underlying structure and patterns of the data, including pattern recognition and latent-variable estimation. Various methodologies are used for this, including dimensionality reduction, clustering, and manifold learning. Given the complex and high-dimensional nature of data in materials science and chemistry, unsupervised learning methods hold significant value for these fields, with applications including the design of complex materials, modeling of synthetic pathways and processes, anomaly detection and quality control, model validation and uncertainty quantification, drug design, model interpretability, quantum chemistry, protein-structure prediction, and water modeling. As materials databases and computational capabilities grow, we expect these applications to continue to expand.
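As one concrete example of dimensionality reduction, the first principal component of a 2-D dataset can be found by power iteration on the covariance matrix. This is a pedagogical pure-Python sketch with toy values; real workflows would use a numerical library:

```python
# First principal component of centred 2-D data via power iteration.

def first_pc(points, iters=100):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in centred) / n
    cyy = sum(y * y for _, y in centred) / n
    cxy = sum(x * y for x, y in centred) / n
    # Power iteration converges to the dominant eigenvector (unit length).
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v

# Points spread mainly along the diagonal -> PC1 close to (1/sqrt(2), 1/sqrt(2)).
pc = first_pc([(0, 0), (1, 1.1), (2, 1.9), (3, 3.05), (4, 4.0)])
```

Projecting high-dimensional descriptors onto a few such components is a common first step before clustering or visualization.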

 

This section is divided into two primary subsections that describe various unsupervised learning models and techniques and their applications to materials science and chemistry. We focus on both established use cases and potential applications of pertinent methods in the literature. For example, the organization of data is a common theme across a wide array of materials science and chemistry datasets. Each of these applications is presented, in large part, as case studies across a range of subfields. We also provide tips for practitioners who wish to apply these techniques to their own problems. (Choudhary et al., 2022; Morgan & Jacobs, 2020; Cai et al., 2020; Pilania, 2021)

 

7. Deep Learning and Neural Networks

 

Deep learning is a form of machine learning that harnesses more complex models called neural networks. For modeling materials and chemistry, the advantages of deep learning over shallower, single-layer models can be considerable. In the language of neural networks, a structure consisting of one layer of fully connected inputs wired directly to an output layer is called a perceptron. By stacking such layers into a multilayer model, deep learning can address data with increasingly complicated features, making them more tractable for exploration. These representational refinements enable the capture of both local and global statistical features within data, offering the full potential of automated feature learning. Crucially, a human operator need not write code for each analysis and classification step, as the machine can learn the data features by itself.

 

Neural networks rest on three key principles. They feed data forward through the model, transforming inputs layer by layer into outputs. They internally optimize network weights against training and test data sets, with the primary goal of minimizing the difference between the model's output for a given input and the outcome determined through real experimentation. They also leverage a full range of model evaluations during training, since evaluating a wide range of inputs helps control errors, such as overfitting, that would otherwise occur. Many novel tasks in materials and chemistry systems are enabled by neural networks. For a start, well-trained neural networks can estimate complex structure-property relationships across a great variety of data without necessitating additional costly experiments or bulk ab initio calculations. (Westermayr et al., 2021; Morgan & Jacobs, 2020; Fiedler et al., 2022; Rodrigues et al., 2021)
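The three principles above (forward propagation, weight optimization against a loss, repeated evaluation over the data) can be condensed into a one-hidden-layer network trained by stochastic gradient descent. This is a pedagogical pure-Python sketch fitting a toy curve, not a production architecture:

```python
import math, random

random.seed(0)
H = 8                                            # hidden units
w1 = [random.uniform(-1, 1) for _ in range(H)]   # input -> hidden weights
b1 = [0.0] * H                                   # hidden biases
w2 = [random.uniform(-1, 1) for _ in range(H)]   # hidden -> output weights
b2 = 0.0                                         # output bias

def forward(x):
    """Forward pass: input -> tanh hidden layer -> linear output."""
    h = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    return sum(w2[i] * h[i] for i in range(H)) + b2, h

# Toy "structure-property" data: y = x^2 on [-1, 1].
data = [(x / 10.0, (x / 10.0) ** 2) for x in range(-10, 11)]
lr = 0.05
for epoch in range(2000):
    for x, y in data:
        out, h = forward(x)
        err = out - y                    # dLoss/dout for loss 0.5*(out-y)^2
        for i in range(H):               # backpropagate one SGD step
            grad_h = err * w2[i] * (1 - h[i] ** 2)   # uses pre-update w2
            w2[i] -= lr * err * h[i]
            b1[i] -= lr * grad_h
            w1[i] -= lr * grad_h * x
        b2 -= lr * err

mse = sum((forward(x)[0] - y) ** 2 for x, y in data) / len(data)
```

After training, the network reproduces the curve with a small mean-squared error, illustrating automated feature learning: nobody hand-coded a quadratic basis function.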

 

8. Applications in Materials Discovery

 

Machine learning (ML), deep learning (DL), and data-mining techniques are now finding practical application in materials science and chemistry. The Journal of Chemical and Engineering Data and other American Chemical Society publications maintain data repositories containing a large amount of published chemical information. Two growing databases of particular interest to materials chemists are AFLOW, the automated workflow engine for materials design, and the OQMD, the Open Quantum Materials Database; both are public, freely available resources being exploited for new materials discovery using machine learning. The integration of these approaches into materials science is having broader impacts as well. Absent from the following review are many interesting analyses that use data-mining techniques to examine established, older materials technologies. Many machine learning review papers are instead technology-focused, covering models that use data-driven computational methods to predict novel materials, chemistries, and structures.

 

There is a notably strong emphasis on high-throughput density functional theory calculations as well as other simulation techniques, reflecting the state-of-the-art materials data being collected for machine learning by many chemists. There are excellent overviews of machine learning strategies that repurpose high-throughput molecular computational techniques. Winkler et al. used this approach to screen for new superconductor candidates. Kauwe et al. provide an excellent review of combining ab initio-derived parameters with machine learning to accelerate property prediction. More generally, Jain et al. have published a review article focused on materials property prediction using first-principles calculations in conjunction with machine learning. That review emphasizes the potential of machine learning to significantly reduce the exploration space of a research area via methods such as Bayesian optimization and active learning. (Kearnes et al., 2021; Irwin et al., 2020; Hammer et al., 2021; Williams et al., 2021; Anstine & Isayev, 2023)
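A schematic active-learning loop of the kind referenced above, using distance to the already-measured points as a crude stand-in for model uncertainty (the grid "design space" and the acquisition rule are invented for illustration; real campaigns would use a surrogate model such as a Gaussian process):

```python
# Each round, "measure" the candidate farthest from the labeled set,
# spreading experiments across the design space instead of sampling randomly.

def diversity_pick(labeled, pool):
    """Return the pool point with the largest distance to its nearest label."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    return max(pool, key=lambda p: min(dist(p, q) for q in labeled))

# A 5x5 grid of candidate compositions (toy design space).
pool = [(float(i), float(j)) for i in range(5) for j in range(5)]
labeled = [pool.pop(0)]            # start from one measured point, (0, 0)
for _ in range(3):                 # three rounds of acquisition
    pick = diversity_pick(labeled, pool)
    pool.remove(pick)
    labeled.append(pick)           # pretend the measurement was performed
```

Starting from one corner, the rule successively selects the remaining corners: the most informative points under this uncertainty proxy.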

 

9. Applications in Chemical Property Prediction

 

In the past half century, a variety of computational techniques have been developed to predict and analyze the properties of chemical compounds. The mathematical and physical theory underpinning these methods has produced remarkable predictive successes in the understanding of molecular behavior. The model inputs, often termed descriptors, are generally assumed to encode the relevant features of the chemical compounds. Key chemical properties that can be modeled include, but are not limited to, free-energy changes, dipole moments, atomic charges, pKa values, partition coefficients, vibrational frequencies, transition states and reaction pathways, absorption spectra, and various quantum chemical descriptors.
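A minimal illustration of descriptor construction: turning a molecular formula into simple numeric features (atom counts and molecular weight). The mass table is truncated and the feature choice is illustrative only, not a recommended descriptor set:

```python
import re

# Rounded standard atomic weights for a handful of common elements.
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

def formula_features(formula):
    """Parse e.g. 'C2H6O' into atom counts and a molecular weight."""
    counts = {}
    for sym, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[sym] = counts.get(sym, 0) + (int(num) if num else 1)
    mw = sum(MASS[s] * n for s, n in counts.items())
    return counts, mw

counts, mw = formula_features("C2H6O")   # ethanol: 2 C, 6 H, 1 O
```

Vectors of such counts (optionally extended with electronic or geometric quantities) are the kind of descriptor a property-prediction model consumes.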

 

The application of machine learning models built on carefully chosen and tuned descriptors has demonstrated that substantial advances in property prediction can be achieved. As seen previously, hybrid and computational chemistry approaches that use descriptors can likewise benefit from the predictive power of machine learning models, and developing such models has become increasingly popular in chemistry. The molecular design phase of modern computational chemistry often requires the rapid generation of structures whose activity or property is unknown, entirely in silico. By using experimental data from known structures, accurate estimates can be made that are normally well within the design objectives. (Morgan & Jacobs, 2020; Choudhary et al., 2022; Moosavi et al., 2020; Guo et al., 2021; Liu et al., 2021)

 

10. Materials Informatics and Databases

 

Databases, regarded as an enabling capability for materials informatics, are essential components for the management and retrieval of data. They assist in the extraction of useful knowledge through browsing and intelligent search, and in managing feedback from the results returned by queries.

 

Although informatics is clearly not based on brute-force searches alone, machine learning (ML) can be envisioned as a critical tool for testing more hypotheses and material configurations than any human could alone. When coupled with a vast data resource, ML can be an enabling tool for scientific discovery in materials science. This opportunity has been recognized by more than 3,000 researchers who have recently published or submitted manuscripts on "machine learning" and "materials", as well as by researchers in other fields, as is apparent from the 28,587 mentions of ML in a bibliometric analysis for a highly regarded pharmaceutical journal. The numerous databases used in conjunction with ML take a variety of approaches to managing and automating the collection of pertinent data from the available global data resources, spanning experimental, purely data-driven, and computational databases. Related fields such as cheminformatics and bioinformatics offer instructive parallels; here we offer perspectives from both database curators and researchers who use databases, while the use of machine learning in cheminformatics is covered separately. (Axelrod et al., 2022; Gomes Souza Jr et al., 2024; Duan et al., 2021; Lapointe et al., 2020)

 

11. Challenges and Limitations in the Field

 

The growing number of machine learning models in materials science and chemistry over the past few years has brought exciting research outcomes and the discovery of promising new candidate systems for a wide range of applications. However, given the extraordinary range of subjects to which these models are applied, from optimization algorithms to high-throughput computational screening, machine learning approaches have naturally met with a variety of challenges.

 

Due to the nature of scientific and experimental work, data is inherently both limited and noisy, a reality that machine learning methods are not necessarily designed to handle. Traditional machine learning strategies produce a mean expectation or regression value, overlooking the uncertainty in the data, which may itself change over time.

Such models are difficult to validate, generating large quantities of training data for them is computationally expensive, and their performance can vary from model to model depending on molecular similarity. In a scientific context, however, the ability to generalize is crucial, particularly when the density of data points in a given region of mechanistic space is inadequate.
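One common mitigation for the uncertainty problem described above is ensemble-based error estimation. The sketch below uses a bootstrap ensemble of deliberately trivial "models" (sample means) over invented measurements; the spread across members serves as a crude error bar:

```python
import random

random.seed(1)

def bootstrap_means(values, n_models=200):
    """Train n trivial 'models' (sample means) on resampled copies of the data."""
    preds = []
    for _ in range(n_models):
        sample = [random.choice(values) for _ in values]  # resample with replacement
        preds.append(sum(sample) / len(sample))
    return preds

measurements = [2.0, 2.2, 1.9, 2.1, 2.0, 5.5]   # note one suspicious outlier
preds = bootstrap_means(measurements)
mean = sum(preds) / len(preds)
spread = (sum((p - mean) ** 2 for p in preds) / len(preds)) ** 0.5
```

A large `spread` flags a prediction that should not be trusted at face value; the same idea applies with real regressors in place of the sample mean.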

 

A further limitation of machine learning in the field is that purely predictive models do not by themselves yield new mechanisms or insightful observations; extracting such insight would require gathering many additional data points. Furthermore, since all machine learning models require periodic retraining, extrapolation beyond the training domain is frequently unreliable, as the physical principles or relationships underlying the models may change or become inconsistent.

These issues illuminate the main limitation in the application of machine learning in materials science and chemistry: the difficulty of establishing confidence in such models and their predictions. Moreover, should preliminary findings fail to match the assumptions of the governing physical, mechanistic theory, such models will not invite further experimental investigation. (Morgan & Jacobs, 2020; Westermayr et al., 2021; Rodrigues et al., 2021; Pilania, 2021; Juan et al., 2021)

 

12. Future Directions and Emerging Trends

 

Materials science and chemistry are closely related and play a crucial role in numerous technologies and industries. Materials science experiments collect large amounts of data, and machine learning excels at data-driven tasks such as regression, classification, and clustering. In recent years, the workflow of materials science and chemistry has evolved with the rapid development of data-intensive, computation-intensive, and artificial intelligence (AI)-assisted research. Machine learning can help solve persistent problems in physics, chemistry, and materials science, such as materials discovery, the identification of structure-property relationships, quantum mechanical (QM) calculations, and active learning. At the heart of these tasks is the integration of machine learning.

 

The purpose of this section is to outline future directions and emerging trends. Although the predictive and explanatory capabilities of machine learning models have been integrated well with computational processes such as quantum mechanical (QM) calculations and structural optimization tasks, accurate and physically interpretable machine learning models can additionally provide unique, insightful explanations and guide experimental design policy. Given large amounts of high-quality data, large-scale pretraining yields higher-performance models. Automated machine learning (AutoML) is a successful and widely applicable method for simple tasks, especially low-complexity data analysis and cheaply scaled computing tasks. Moreover, several software communities are organizing materials-specific grand challenges to automate and supercharge workflows. Active learning, a process of curating data, also brings desirable features to materials science: it accelerates joint experimental and theoretical workflows by selecting the most informative samples. Semi-supervised learning, also called weak supervision, addresses the shortage of labeled data and maximizes the value of unannotated datasets. Self-supervised learning, in the form of proxy or pretext tasks, can learn rich representations from large datasets that would otherwise go unused. Through these technologies, the marriage of machines and materials is on the horizon. (Pollice et al., 2021; Wang et al., 2022; Rodrigues et al., 2021; Butler et al., 2022; Zhou et al., 2020)

13. Ethical Considerations in Machine Learning for Materials Science and Chemistry

 

Ethical Considerations: Possible or actual ethical issues that may arise for cheminformatics or materials science should be considered in every paper. Often these will concern the resulting consequences for society rather than any ethical issues arising from the use of machine learning itself.

 

Consequences of Application: If machine learning is applied, and in particular if the conformity of new materials to known materials can be established automatically, computation will become faster and easier, and the need for human involvement in structure-related innovation within materials science and materials chemistry will decrease; this applies especially to complex, emergent materials. Either taking the human out of the loop, or restricting humans to confirmatory checking of machine-proposed solutions, could lower the standard of thinking about materials, owing to complacency, and could encourage the propagation of shallow papers that substitute statistical volume for physical insight. Academia may stop training enough people who can work skillfully with computational structural proposals, or with the mathematics behind models that should serve explorative imagination, and this could have adverse effects. These are much wider issues.

 

A sub-category here concerns the development, and use by third parties, of robots and automated procedures to make and test materials. What responsibilities do we have when robots do the dirty work for us? How do we ensure that such work is done safely, and that the trade-off between benefits and risks being pursued is the best achievable rather than merely 'good enough'?


References:

Hiran, K. K., Jain, R. K., Lakhwani, K., & Doshi, R. (2021). Machine Learning: Master Supervised and Unsupervised Learning Algorithms with Real Examples (English Edition). [HTML]

Udousoro, I. C. (2020). Machine learning: a review. Semiconductor Science and Information Devices, 2(2), 5-14. bilpubgroup.com

Vercio, L. L., Amador, K., Bannister, J. J., Crites, S., Gutierrez, A., MacDonald, M. E., … & Forkert, N. D. (2020). Supervised machine learning tools: a tutorial for clinicians. Journal of Neural Engineering, 17(6), 062001. [HTML]

Rajoub, B. (2020). Supervised and unsupervised learning. In Biomedical signal processing and artificial intelligence in healthcare (pp. 51-89). Academic Press. [HTML]

Galbraith, J. M., Shaik, S., Danovich, D., Braïda, B., Wu, W., Hiberty, P., … & Dunning Jr, T. H. (2021). Valence bond and molecular orbital: Two powerful theories that nicely complement one another. Journal of Chemical Education, 98(12), 3617-3620. acs.org

Shaik, S., Danovich, D., & Hiberty, P. C. (2021). Valence bond theory—its birth, struggles with molecular orbital theory, its present state and future prospects. Molecules. mdpi.com

Dunning Jr, T. H., Xu, L. T., Cooper, D. L., & Karadakov, P. B. (2021). Spin-Coupled Generalized Valence Bond Theory: New Perspectives on the Electronic Structure of Molecules and Chemical Bonds. The Journal of Physical Chemistry A, 125(10), 2021-2050. whiterose.ac.uk

Dunning, T. H., & Jeffrey Hay, P. (2021). Beyond Molecular Orbital Theory: The Impact of Generalized Valence Bond Theory in Molecular Science. Computational Materials, Chemistry, and Biochemistry: From Bold Initiatives to the Last Mile: In Honor of William A. Goddard’s Contributions to Science and Engineering, 55-87. [HTML]

DiMucci, I. M., Titus, C. J., Nordlund, D., Bour, J. R., Chong, E., Grigas, D. P., … & Lancaster, K. M. (2023). Scrutinizing formally Ni IV centers through the lenses of core spectroscopy, molecular orbital theory, and valence bond theory. Chemical Science, 14(25), 6915-6929. rsc.org

Moosavi, S. M., Jablonka, K. M., & Smit, B. (2020). The role of machine learning in the understanding and design of materials. Journal of the American Chemical Society, 142(48), 20273-20287. acs.org

Zhong, S., Zhang, K., Bagheri, M., Burken, J. G., Gu, A., Li, B., … & Zhang, H. (2021). Machine learning: new ideas and tools in environmental science and engineering. Environmental science & technology, 55(19), 12741-12754. nsf.gov

von Lilienfeld, O. A., Müller, K. R., & Tkatchenko, A. (2020). Exploring chemical compound space with quantum-based machine learning. Nature Reviews Chemistry, 4(7), 347-358. [PDF]

Deng, C., Ji, X., Rainey, C., Zhang, J., & Lu, W. (2020). Integrating machine learning with human knowledge. Iscience. cell.com

Himanen, L., Jäger, M. O., Morooka, E. V., Canova, F. F., Ranawat, Y. S., Gao, D. Z., … & Foster, A. S. (2020). DScribe: Library of descriptors for machine learning in materials science. Computer Physics Communications, 247, 106949. sciencedirect.com

Cheng, B., Griffiths, R. R., Wengert, S., Kunkel, C., Stenczel, T., Zhu, B., … & Csanyi, G. (2020). Mapping materials and molecules. Accounts of Chemical Research, 53(9), 1981-1991. cam.ac.uk

Oviedo, F., Ferres, J. L., Buonassisi, T., & Butler, K. T. (2022). Interpretable and explainable machine learning for materials science and chemistry. Accounts of Materials Research, 3(6), 597-607. acs.org

Yang, Z. & Gao, W. (2022). Applications of machine learning in alloy catalysts: rational selection and future development of descriptors. Advanced Science. wiley.com

Jablonka, K. M., Ongari, D., Moosavi, S. M., & Smit, B. (2020). Big-data science in porous materials: materials genomics and machine learning. Chemical reviews. acs.org

Sahu, H., Shen, K. H., Montoya, J. H., Tran, H., & Ramprasad, R. (2022). Polymer structure predictor (PSP): a python toolkit for predicting atomic-level structural models for a range of polymer geometries. Journal of Chemical Theory and Computation, 18(4), 2737-2748. [HTML]

Dobbelaere, M. R., Plehiers, P. P., Van de Vijver, R., Stevens, C. V., & Van Geem, K. M. (2021). Learning molecular representations for thermochemistry prediction of cyclic hydrocarbons and oxygenates. The Journal of Physical Chemistry A, 125(23), 5166-5179. ugent.be

Gao, P., Liu, Z., Zhang, J., Wang, J. A., & Henkelman, G. (2022). A Fast, Low-Cost and Simple Method for Predicting Atomic/Inter-Atomic Properties by Combining a Low Dimensional Deep Learning Model with a Fragment Based …. Crystals. mdpi.com

Armeli, G., Peters, J. H., & Koop, T. (2023). Machine-learning-based prediction of the glass transition temperature of organic compounds using experimental data. ACS omega. acs.org

Jia, Y., Hou, X., Wang, Z., & Hu, X. (2021). Machine learning boosts the design and discovery of nanomaterials. ACS Sustainable Chemistry & Engineering, 9(18), 6130-6147. [HTML]

Choudhary, K., DeCost, B., Chen, C., Jain, A., Tavazza, F., Cohn, R., … & Wolverton, C. (2022). Recent advances and applications of deep learning methods in materials science. npj Computational Materials, 8(1), 59. nature.com

Morgan, D. & Jacobs, R. (2020). Opportunities and challenges for machine learning in materials science. Annual Review of Materials Research. annualreviews.org

Cai, J., Chu, X., Xu, K., Li, H., & Wei, J. (2020). Machine learning-driven new material discovery. Nanoscale Advances. rsc.org

Pilania, G. (2021). Machine learning in materials science: From explainable predictions to autonomous design. Computational Materials Science. sciencedirect.com

Westermayr, J., Gastegger, M., Schütt, K. T., & Maurer, R. J. (2021). Perspective on integrating machine learning into computational chemistry and materials science. The Journal of Chemical Physics, 154(23). aip.org

Fiedler, L., Shah, K., Bussmann, M., & Cangi, A. (2022). Deep dive into machine learning density functional theory for materials science and chemistry. Physical Review Materials. aps.org

Rodrigues, J. F., Florea, L., de Oliveira, M. C., Diamond, D., & Oliveira, O. N. (2021). Big data and machine learning for materials science. Discover Materials, 1, 1-27. springer.com

Kearnes, S. M., Maser, M. R., Wleklinski, M., Kast, A., Doyle, A. G., Dreher, S. D., … & Coley, C. W. (2021). The open reaction database. Journal of the American Chemical Society, 143(45), 18820-18826. acs.org

Irwin, J. J., Tang, K. G., Young, J., Dandarchuluun, C., Wong, B. R., Khurelbaatar, M., … & Sayle, R. A. (2020). ZINC20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling, 60(12), 6065-6073. acs.org

Hammer, A. J. S., Leonov, A. I., Bell, N. L., & Cronin, L. (2021). Chemputation and the standardization of chemical informatics. JACS Au. acs.org

Williams, W. L., Zeng, L., Gensch, T., Sigman, M. S., Doyle, A. G., & Anslyn, E. V. (2021). The evolution of data-driven modeling in organic chemistry. ACS central science, 7(10), 1622-1637. acs.org

Anstine, D. M. & Isayev, O. (2023). Generative models as an emerging paradigm in the chemical sciences. Journal of the American Chemical Society. acs.org

Guo, K., Yang, Z., Yu, C. H., & Buehler, M. J. (2021). Artificial intelligence and machine learning in design of mechanical materials. Materials Horizons. rsc.org

Liu, Y., Esan, O. C., Pan, Z., & An, L. (2021). Machine learning for advanced energy materials. Energy and AI. sciencedirect.com

Axelrod, S., Schwalbe-Koda, D., Mohapatra, S., Damewood, J., Greenman, K. P., & Gómez-Bombarelli, R. (2022). Learning matter: Materials design with machine learning and atomistic simulations. Accounts of Materials Research, 3(3), 343-357. acs.org

Gomes Souza Jr, F., Bhansali, S., Pal, K., Silveira Maranhão, F. D., Santos Oliveira, M., Valladão, V. S., … & Silva, G. B. (2024). A 30-Year Review on Nanocomposites: Comprehensive Bibliometric Insights into Microstructural, Electrical, and Mechanical Properties Assisted by Artificial Intelligence. Materials, 17(5), 1088. mdpi.com

Duan, C., Liu, F., Nandy, A., & Kulik, H. J. (2021). Putting density functional theory to the test in machine-learning-accelerated materials discovery. The Journal of Physical Chemistry Letters, 12(19), 4628-4637. [PDF]

Lapointe, C., Swinburne, T. D., Thiry, L., Mallat, S., Proville, L., Becquart, C. S., & Marinica, M. C. (2020). Machine learning surrogate models for prediction of point defect vibrational entropy. Physical Review Materials, 4(6), 063802. hal.science

Juan, Y., Dai, Y., Yang, Y., & Zhang, J. (2021). Accelerating materials discovery using machine learning. Journal of Materials Science & Technology, 79, 178-190. [HTML]

Pollice, R., dos Passos Gomes, G., Aldeghi, M., Hickman, R. J., Krenn, M., Lavigne, C., … & Aspuru-Guzik, A. (2021). Data-driven strategies for accelerated materials design. Accounts of Chemical Research, 54(4), 849-860. acs.org

Wang, Z., Sun, Z., Yin, H., Liu, X., Wang, J., Zhao, H., … & Yu, X. F. (2022). Data‐Driven Materials Innovation and Applications. Advanced Materials, 34(36), 2104113. ntu.edu.sg

Butler, K. T., Oviedo, F., & Canepa, P. (2022). Machine learning in materials science. [HTML]

Zhou, Q., Lu, S., Wu, Y., & Wang, J. (2020). Property-oriented material design based on a data-driven machine learning technique. The journal of physical chemistry letters, 11(10), 3920-3927. [HTML]
