The Integration of Hybrid AI and Open-Source Tools for Molecular Design

1. Introduction to Molecular Design and the Role of AI

Molecular design consists of creating new molecules on the basis of quantum mechanics. A molecule is specified by its geometry (where the atoms are), its bond orders (which atoms are connected, and how), and its charges. Properties that follow from this specification, such as energies, vibrational frequencies, multipole moments, and more complex interactions including many-body potential terms, are known functions of it. These functions, or parameters fitted to them, can serve as inputs to another quantum mechanics package in which the molecule has a functional purpose, for example in simulating interaction energies between individual molecules or in lattice-scale models that predict a composite material's property.

In this sense, molecular design refers to the creation of molecules with a specific geometry and energy so that their possible interactions can be calculated. Among the increasingly sophisticated tools for molecular design, capsule networks appear promising for quantum-inspired molecular design. Such approaches often involve generative models, such as generative adversarial networks, that can propose new compounds with a specified function after learning from sometimes massive amounts of data. AI imaging technologies can, for instance, learn the relationship between elements and the crystalline compositions of oxide materials. Although these technologies promise to speed up the development of new materials significantly, raw computing power alone does not guarantee quality results. At present, developing a robust material requires combining human intuition and innovation with simulation and incremental AI advances. Each element of such a system can transfer at most the information permitted by the fundamental limits set by the energy landscape of the medium. (Simm et al., 2020; Alshehri et al., 2020; Meyers et al., 2021; von et al., 2020)

1.1. Overview of Molecular Design

Molecular design is the rational design of molecules that exhibit certain desired properties. For most scientists, this means designing molecules for a specific purpose, such as pharmaceuticals, materials, or research tools. To achieve this, we must be able to tune more than one molecular property at a time to obtain the required output. For example, a chemical probe used to investigate a cancer-related protein identified as a potential therapeutic target should elicit a biological response by modulating that protein and reverse the disease pathology. Such probes therefore need to be suitable for use in a range of molecular, cellular, and in vivo experiments. At the same time, they must be selective: they should not exhibit cytotoxicity or similar off-target effects in other relevant samples. Computational models help predict the effect of changing molecular properties. Errors in determining a relevant property can be mitigated by also measuring an orthogonal property and then optimizing to reduce unwanted changes in the output.
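One common way to balance several properties at once, as the probe example requires, is to map each predicted property onto a desirability score between 0 and 1 and combine the scores with a geometric mean, so that a single unacceptable property rejects the molecule. The Python sketch below assumes this Derringer-style scheme; the property names (`potency_pIC50`, `cytotoxicity_pCC50`), the thresholds, and the candidate values are hypothetical and chosen only for illustration.

```python
import math

def desirability_larger_is_better(x, low, high):
    """Map x to [0, 1]: 0 at or below `low`, 1 at or above `high`, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def desirability_smaller_is_better(x, low, high):
    """Map x to [0, 1]: 1 at or below `low`, 0 at or above `high`, linear in between."""
    return 1.0 - desirability_larger_is_better(x, low, high)

def overall_desirability(scores):
    """Geometric mean: any single unacceptable property (score 0) rejects the molecule."""
    if any(s == 0.0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Hypothetical predicted properties for one candidate probe (illustrative numbers only).
# Higher pIC50 means more potent; higher pCC50 means more cytotoxic.
candidate = {"potency_pIC50": 7.5, "cytotoxicity_pCC50": 4.5}

scores = [
    desirability_larger_is_better(candidate["potency_pIC50"], low=5.0, high=8.0),
    desirability_smaller_is_better(candidate["cytotoxicity_pCC50"], low=4.0, high=6.0),
]
print(round(overall_desirability(scores), 3))
```

The geometric mean is a deliberate choice: unlike an arithmetic mean, it cannot let excellent potency compensate for unacceptable toxicity.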

Traditionally, molecular design proceeds up a property ladder toward the required application, from physicochemical properties to in vivo effects. However, a sectional approach is also needed, incorporating imaging data where necessary. Many groups have developed methods that use AI for molecular design, particularly to achieve a desirable property. Simply put, molecular design can be described as a cycle in which measured outputs are used to design new molecules with improved properties. The objectives, the methods used to generate candidate molecules, and the tools incorporated in the design model can differ depending on the available computational facilities, the available data, and the parameters that can be explored. Accommodating these variations requires increasing vigilance. This is important because it increases collaboration among interdisciplinary researchers, allows detailed discussion, and helps produce data that are as complete as possible, with coherent experimental and computational plans. Incorporating AI into molecular design raises additional considerations, one of which is the black-box nature of some AI models. Such models can restrict new designs to molecules that have the specified properties but do not lend themselves to chemical synthesis, or that have a more detrimental, quantifiable effect on activity. The results of these models can also lack chemical intuition: newly designed series of molecules may afford no increase in predicted activity, or a vast number of diverse chemical groups to be synthesized without any truly relevant design scaffolds. These limitations deserve careful consideration. (Jun et al., 2020; Chatzigoulas and Cournia, 2021; Woolfson, 2021; Jongruja, 2020)
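The design cycle described above can be sketched as a simple loop: propose variants of the current best design, evaluate them, and keep whichever scores best. In this minimal Python sketch, a design is a hypothetical binary feature vector and `measure` is a hypothetical stand-in for an assay or simulation that compares the design with a hidden target; a real workflow would replace both with molecular representations and actual measurements.

```python
import random

def measure(design):
    """Hypothetical stand-in for an assay: rewards designs close to a hidden target.
    In practice this would be an experiment or an expensive simulation."""
    target = (1, 0, 1, 1, 0, 1, 0, 1)
    return sum(a == b for a, b in zip(design, target))

def propose(parent, n, rng):
    """Design step: generate n variants of the current best design by flipping one feature."""
    children = []
    for _ in range(n):
        child = list(parent)
        i = rng.randrange(len(child))
        child[i] = 1 - child[i]
        children.append(tuple(child))
    return children

def design_cycle(start, rounds=20, batch=4, seed=0):
    """Repeat design -> make/test -> analyze, keeping the best measured design so far."""
    rng = random.Random(seed)
    best, best_score = start, measure(start)
    history = [best_score]
    for _ in range(rounds):
        for cand in propose(best, batch, rng):
            score = measure(cand)      # "make and test"
            if score > best_score:     # "analyze": update the design basis
                best, best_score = cand, score
        history.append(best_score)
    return best, best_score, history

best, score, history = design_cycle(start=(0,) * 8)
print(best, score)
```

Because only improvements are accepted, the best score can never decrease from one round to the next; real campaigns usually also keep some diverse, non-optimal candidates to avoid getting stuck.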

1.2. Importance of AI in Molecular Design

The integration of artificial intelligence (AI) tools has significantly altered how researchers approach molecular design. In particular, AI allows researchers to leverage extremely large datasets to analyze existing compounds, either to identify the structures most likely to have the lowest energy (and therefore most likely to exist) or to estimate the probability of specific properties given structural characteristics. With the help of such ‘big data’, AI techniques, especially those based on machine learning algorithms, can be used effectively to predict molecular and materials properties.
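As a minimal illustration of learning a property from structural characteristics, the sketch below fits a closed-form ridge regression from descriptor vectors to a property value. The descriptors and property values are synthetic, generated from a known linear relation so the fit can be checked; real applications would use computed molecular descriptors and typically more flexible models.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression on centered data, w = (Xc^T Xc + alpha I)^-1 Xc^T yc,
    with an unpenalized intercept recovered from the means."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - X_mean @ w
    return w, b

def predict(X, w, b):
    return X @ w + b

rng = np.random.default_rng(0)
# Synthetic descriptor matrix standing in for, e.g., size, polarity, and ring count
# of 200 known compounds.
X_train = rng.normal(size=(200, 3))
true_w = np.array([0.8, -1.2, 0.3])
y_train = X_train @ true_w + 2.0 + rng.normal(scale=0.05, size=200)

w, b = fit_ridge(X_train, y_train)
X_new = rng.normal(size=(5, 3))   # unseen candidates
print(predict(X_new, w, b))
```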

Machine learning algorithms, some of which evolved from computer science and statistics, are making the molecular design process more efficient in several ways. They automate simulations and the distillation of complex data, freeing researchers to focus on the creative decisions that ultimately steer computational experiments. The tools are generally designed to interact with humans, providing predictive power while still allowing the operator to apply knowledge or intuition that lies outside the capacity of the AI package. Used together, humans and AI can exploit information from data streams in ways that play to the strengths each brings. Furthermore, one of the most exciting outcomes of AI-driven molecular design is its capacity to automate the exploration of chemical space on its own terms, leading to non-intuitive and potentially novel regions of molecular design space. Despite these strengths, the use of AI faces several challenges. In particular, the robust, relevant, and varied data required to train these algorithms can be difficult to access. (Selvaraj et al., 2021; Schneider et al., 2020; Gupta et al., 2021; Staszak et al., 2022)

2. Open-Source Tools in Molecular Design

Open-source tools have become a cornerstone of advanced workflow development. In addition to lowering the barrier to entry into the field and disseminating results to a broader audience, open-source platforms have become venues for methodological development, facilitating the testing and improvement of user-submitted algorithms and models. Open-source tools are therefore responsible for some of the cutting-edge advances in the field, because researchers can build on one another's methods and customize tools for their own purposes. Users can also give developers feedback that improves the tools over time. Because most of the open-source tools we present in this review have open codebases, they are ripe for modification to suit a user's specific needs.

For molecular design specifically, open-source tools and databases have made it easier for scientists to develop and test innovative approaches to modern challenges, such as the scarcity of data in antimalarial drug discovery. Many fields use the tools highlighted in this section because of their sophisticated or state-of-the-art capabilities, which offer newer and broader options for application and advancement across research areas. Moreover, these tools reduce the computational resources and technical expertise needed to participate in molecular design, making the field more accessible to researchers in resource-limited settings. Open-source platforms are community-driven, and their sustainability depends on user interest and commitment. In practice, the platforms rely mainly on the community of users who contribute code, suggest improvements, and cite the platforms in their research. (da et al., 2020; Coleman et al., 2022; Ziegler et al., 2020; Poinet et al., 2020)

2.1. Advantages of Open-Source Software

In addition to advances in desktop tools, a large collection of open-source software written in Python has been developed for molecular design. There are several appealing reasons to embrace open-source tools for molecular design and to build them into hybrid AI toolkits. First, maintenance and feature improvements happen in public, so the user community continuously benefits from enhanced tools. Second, open-source software is transparent: users can inspect exactly how molecular generation is performed. This is particularly instructive for scientists who want not only to find a good compound but also to understand the underlying algorithms. Third, open-source approaches make it possible for other researchers to validate the tools and reproduce published findings. Fourth, an open-source approach fosters innovation while reducing duplicated effort, because any researcher can access advanced software without paying a licensing fee. Finally, the source code can be adapted to meet specific research objectives.

To build a fully integrated, domain-agnostic hybrid AI tool and its applications, all of these advantages of open-source software need to be harnessed. Specifically, hybrid AI toolkits need to make transparent how their subsystems communicate with each other, which increases the interpretability of the overall system. In addition, to encourage wider involvement in biochemistry research, such toolkits should be openly available and accurate, and should provide an efficient, informative community forum and up-to-date user manuals. In summary, open-source software has made great progress in desktop tools and could strengthen hybrid AI toolkits. (Astropy et al., 2022; Bhandari et al., 2021; Price-Whelan et al., 2022; Ohm et al., 2020)
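One way to make the communication between subsystems transparent is to route every exchange through a component that records it. The Python sketch below assumes a hypothetical generator and scorer (the scorer uses string length as a placeholder for a model prediction) and an `Orchestrator` class invented for illustration; it is not the API of any existing toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Message:
    sender: str
    receiver: str
    payload: dict

@dataclass
class Orchestrator:
    """Routes every call between subsystems and records it, so the data flow can be inspected."""
    log: List[Message] = field(default_factory=list)

    def send(self, sender: str, receiver: str, payload: dict,
             handler: Callable[[dict], dict]) -> dict:
        self.log.append(Message(sender, receiver, payload))   # request
        reply = handler(payload)
        self.log.append(Message(receiver, sender, reply))     # response
        return reply

# Hypothetical subsystems: a generator that proposes SMILES strings and a scorer.
def generator(request: dict) -> dict:
    return {"candidates": ["CCO", "CCN", "c1ccccc1"][: request["n"]]}

def scorer(request: dict) -> dict:
    # Placeholder score: string length stands in for a real model prediction.
    return {"scores": [len(s) for s in request["candidates"]]}

orch = Orchestrator()
cands = orch.send("user", "generator", {"n": 2}, generator)
scores = orch.send("generator", "scorer", cands, scorer)
for m in orch.log:
    print(f"{m.sender} -> {m.receiver}: {m.payload}")
```

Keeping the log outside the subsystems means any component can be swapped without losing the audit trail of what was exchanged.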

3. Hybrid AI Systems

Artificial intelligence (AI) denotes a range of theories, methodologies, and techniques aimed at improving the decision-making and problem-solving capabilities of complex systems, especially for identifying objects or patterns and making classifications. Hybrid AI combines at least two methodologies, each drawn from a branch of artificial intelligence such as machine learning, knowledge-based systems, expert systems, fuzzy systems, genetic algorithms, or neural networks. The methodologies are integrated with the goal of achieving more powerful results than any individual system. Ensemble or hybrid systems are attractive because they combine different approaches that can be expected to compensate for one another's weaknesses: the shortcomings of one component can be offset by the knowledge or experience of another, yielding a more robust answer to the problem at hand. Hybrid AI approaches from computational chemistry are also gaining acceptance in molecular design. Molecular design is a subdomain of cheminformatics that is being developed to strengthen computer-aided drug discovery in both the short and long term. In essence, hybrid AI approaches exploit the ability of machine learning to learn adaptively from data and to model trends, correlations, and non-linear interactions in chemical datasets, even when the exploitable relationships are not known in advance. A major advantage of hybrid AI comes from combining the strengths of knowledge-based systems and machine learning; the combination can be applied more successfully to molecular design tasks because it incorporates domain-specific knowledge.

Traditionally, researchers and practitioners in various fields have solved such problems by combining the expert knowledge of domain specialists with programming techniques from computer science to build an expert system. Hybrid systems are favored when a problem is complex and no single technique can address all of its aspects, as is the case for automated molecular design. Hybrid systems can also handle many challenging real-world problems in research and industry. Although hybrid systems have repeatedly been applied to solvent selection, design support, and the estimation of solubility values, several problems remain in developing and maintaining such hybrid expert systems. (Selvaraj et al., 2021; Jiménez-Luna et al., 2021; Grisoni et al., 2021; Pasrija et al., 2022)
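A minimal sketch of the knowledge-plus-learning combination: expert knowledge, here Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 acceptors), filters candidates, and a data-driven score ranks the survivors. The candidates, their descriptor values, and the `ml_score` values are hypothetical, and allowing at most one violation follows a common reading of the rule rather than anything specified in this document.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    mol_weight: float   # Da
    logp: float
    h_donors: int
    h_acceptors: int
    ml_score: float     # hypothetical output of a trained activity model, in [0, 1]

def rule_of_five_violations(c: Candidate) -> int:
    """Knowledge-based component: count violations of Lipinski's rule of five."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def hybrid_rank(candidates, max_violations=1):
    """Filter with expert rules first, then rank the survivors by the data-driven score."""
    allowed = [c for c in candidates if rule_of_five_violations(c) <= max_violations]
    return sorted(allowed, key=lambda c: c.ml_score, reverse=True)

pool = [
    Candidate("A", 350.0, 2.1, 2, 5, ml_score=0.62),
    Candidate("B", 720.0, 6.3, 4, 12, ml_score=0.91),  # high score, but fails three rules
    Candidate("C", 480.0, 5.4, 1, 8, ml_score=0.74),   # one violation (logP)
]
print([c.name for c in hybrid_rank(pool)])
```

The rule-based step also acts as an explanation: a rejected candidate can be reported together with the specific rules it violated.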

3.1. Definition and Components of Hybrid AI

The term “hybrid AI” encompasses the integration of mature AI techniques, ranging from machine learning to expert systems, that have been developed independently and have grown considerably. The hybrid AI framework in this work, however, combines a variety of recent methodologies, such as machine learning, deep learning, symbolic AI, and graph/network machine learning. In contrast to traditional hybrid AI systems, our system therefore integrates AI techniques that have each been developed to a significant extent independently or that already have broad, standalone ecosystems. Consequently, the framework has the potential to be both state-of-the-art and explainable. Closely associated areas that can be used in building hybrid AI include graph machine learning, differentiable programming, predictive accuracy maximization, and automated ML. This research integrates these methodologies in a complementary fashion to solve key problems in molecular design. This definition implies that a hybrid AI system can draw directly on many input-derived simulations or images, rather than on a small number of molecular calculations or simulations. In other words, a hybrid AI system can use multi-fidelity input data and thereby reach better and more robust solutions. A hybrid AI system can also perform continuous learning, which a simulator cannot, so the quality of its solutions can improve over time. This improvement comes from a repair mechanism common to learning systems: they use data to assess the quality of their existing protocols and then correct any inadequacies. In the exhibit discussed here, the hybrid AI system is a semiconductor lithography design system that offers insight into AIV and manual design; it can also be applied to the outputs of physical and statistical simulations.

The simulator itself can be data-heavy or data-light and can be coupled to symbolic and non-symbolic simulations. Learning that operates on simulators is generally referred to as an “enhancement” system, in the sense that it directly refines or updates the simulator rather than drawing conclusions or actions directly from trends in the raw data. Finally, the AI system is guided by expert examiners with deep knowledge of the domain, who can apply reasoning while taking advantage of readily available simulator data. In the same vein, the system can be used to modify the simulator and to learn from nearby data in order to refine itself. The AIV operation draws on physical insight, ordinary mathematical models, and the simulator itself. The exhibit condenses the meaningful events of the hybrid AI system into a single figure, emphasizing the revision step, and it reflects a typical hybrid scheme for molecular design or intelligent machine learning. The focus of the system is the learning scheme, but it also has a “Connection to Simulator.” Its main parts are a Symbolic Intelligence, an interfacing layer, a Data Manipulation Layer, and a mathematical simulator, with information flowing through a fine-tuning layer. Note that the chart speaks of a “connection” rather than an “update”: the term “connection” refers to the main handling tool of the “AIV operation” device in the exhibit. (Pianykh et al., 2020; Zhou et al., 2021; Zheng et al., 2020; Neunert et al., 2020)
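One widely used way to exploit multi-fidelity data, and to refine a simulator's output rather than replace it, is Δ-learning: a model is trained on the difference between a few expensive high-fidelity results and a cheap low-fidelity simulator, and its prediction is added back to the cheap result. In the sketch below, both "simulators" are hypothetical analytic functions and the correction is a straight-line fit, chosen so that the effect is easy to verify; it is not the architecture of the system described above.

```python
import numpy as np

# Hypothetical stand-ins: a cheap, approximate simulator and an expensive, accurate one.
def low_fidelity(x):
    return np.sin(x)

def high_fidelity(x):
    return np.sin(x) + 0.3 * x + 0.1   # systematic error the cheap model misses

# Only a handful of expensive high-fidelity evaluations are affordable.
x_train = np.linspace(0.0, 3.0, 6)
delta = high_fidelity(x_train) - low_fidelity(x_train)

# Learn the correction (here a degree-1 polynomial) instead of the full property.
coeffs = np.polyfit(x_train, delta, deg=1)

def corrected(x):
    """Low-fidelity prediction refined by the learned correction."""
    return low_fidelity(x) + np.polyval(coeffs, x)

x_test = np.linspace(0.0, 3.0, 50)
err_low = np.max(np.abs(low_fidelity(x_test) - high_fidelity(x_test)))
err_corr = np.max(np.abs(corrected(x_test) - high_fidelity(x_test)))
print(f"max error, low fidelity only: {err_low:.3f}; with learned correction: {err_corr:.3f}")
```

The correction is usually smoother and simpler than the property itself, which is why a small number of expensive data points can suffice.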

4. Applications of Hybrid AI in Molecular Design

Knowledge-based hybrid AI techniques, which incorporate domain-specific knowledge to support the quantitative, data-driven part of an ML model, are powerful tools for accelerating molecular design. They are used in applications such as drug discovery, although the AI systems used for that purpose are usually regarded as black-box models that human researchers cannot interpret. Further domain-specific applications of hybrid AI models for molecular design can be found in materials science. For example, such models can help discover materials with desired properties through high-throughput computation or data mining, and can optimize the magnetic anisotropy of transition metal complexes. Several case studies validate the practical effectiveness of hybrid models for these purposes. In general, exploring data-driven and knowledge-driven information together can yield accurate and interpretable predictions. In such systems, AI collaborates with human researchers, forming dual learning systems and creating a synergistic loop of improvement.

An example that highlights the dual learning idea is the prediction of biopolymer activity as it affects hydric swelling and reaction kinetics. The hybrid model used fuzzy logic to capture the knowledge of domain experts, who had developed a set of rules for inferring activity from the feed ratio of two reactants. As a result, the AI system is not a pure black box: the knowledge it uses is held in easily understandable rules that human experts can inspect separately. The computational analysis was nonetheless essential, because the model combined qualitative and numerical results from the two underlying experimental techniques, including the mass balance, and had to account for possible contraction to satisfy it. Dual learning systems are therefore of great added value for scientific databases in molecular design that contain both qualitative and numerical data, or whose inputs and outputs can be transformed so that both kinds of data become usable. In computational chemistry, for instance, outputs of quantum chemical calculations are usually used after some transformation, such as conversion into descriptors; using them qualitatively as well can provide a broader picture and improve predictions. Although the AI community has embraced this idea for years, applications of hybrid techniques in molecular design, including pharmaceuticals and advanced synthesis, have so far been limited. (Wang et al., 2022; Jin et al., 2022; Erge and van Oort, 2022; Chakraborty et al., 2022)
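The fuzzy-logic component of such a model can be sketched as a zero-order Takagi-Sugeno inference system: each expert rule maps a fuzzy range of the feed ratio to an activity level, and the output is the membership-weighted average of the rule outputs. The membership functions, the three rules, and the activity values below are hypothetical; the actual rule base of the study described above is not given here.

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a to a peak of 1 at b and falling to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical expert rules: (description, membership parameters (a, b, c), activity output).
RULES = [
    ("feed ratio is low", (-1.0, 0.0, 1.0), 0.2),
    ("feed ratio is medium", (0.0, 1.0, 2.0), 0.9),
    ("feed ratio is high", (1.0, 2.0, 3.0), 0.5),
]

def infer_activity(feed_ratio):
    """Zero-order Takagi-Sugeno inference: membership-weighted average of rule outputs."""
    weights = [triangular(feed_ratio, *params) for _, params, _ in RULES]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("feed ratio lies outside the range covered by the rules")
    return sum(w * out for w, (_, _, out) in zip(weights, RULES)) / total

for ratio in (0.5, 1.0, 1.5):
    print(ratio, round(infer_activity(ratio), 3))
```

Each rule stays readable on its own, which is what keeps the knowledge-based part of the hybrid model inspectable by experts.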

4.1. Drug Discovery

Molecular design, the design of new molecules and the invention of new products at the molecular level, spans applications in materials science, agroscience, biophysics, and beyond. This section focuses on drug discovery, one of the most important applications of molecular design. AI-based methods are reshaping drug discovery, largely because they can integrate information from thousands or millions of experimental data points to make predictions. Since the late 20th century, techniques such as NMR, mass spectrometry, and aptamer-based assays have been used to study protein-ligand interactions. Combinations of robotics, virtual screening to predict interactions, and computational chemistry packages have been used to find lead compounds, as have physicochemical descriptors, statistical methods, and machine learning.

Various AI-based technologies have been used to predict the biological parameters relevant to potential drugs, evaluating billions of combinations to find new applications for existing drugs. In the early stages of discovery, including in silico ligand discovery, the main goal of newly developed AI-based systems is to reduce time and cost further. To date, pharmacokinetic and ADMET modeling, together with machine learning based on structure-activity relationship (SAR) and quantitative structure-activity relationship (QSAR) principles, have been used to triage high-throughput screening (HTS) data, either by separating actives from inactives or by defining criteria for in silico screening. Machine learning techniques can also be combined to aggregate biomedical data and improve the predictive power of new AI systems. Many APIs are available for computational chemistry, and new hybrid AI systems have begun to incorporate quantum computing. Combining a medicinal chemist's expertise with computational power can accelerate the search for new drugs; parts of the development timeline can thus be shortened from years to months or even weeks. (Tiwari et al., 2023; Chen et al., 2023; Mak et al., 2023; Sarkar et al., 2023)
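Separating likely actives from inactives is often done with similarity-based methods. The sketch below is a k-nearest-neighbour classifier using Tanimoto similarity on binary fingerprints, one standard technique among many rather than the method of any cited work. Fingerprints are represented as sets of on-bit indices, and the reference compounds, labels, and library entries are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def knn_predict(query, reference, k=3):
    """Label a query as active if a strict majority of its k most similar references are active."""
    ranked = sorted(reference, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = sum(label for _, label in ranked[:k])
    return votes * 2 > k

# Hypothetical reference set: (fingerprint on-bits, 1 = active / 0 = inactive).
reference = [
    ({1, 4, 7, 9}, 1),
    ({1, 4, 7, 12}, 1),
    ({1, 4, 8, 9}, 1),
    ({2, 3, 5, 6}, 0),
    ({2, 3, 5, 11}, 0),
]
screening_library = {"cpd-1": {1, 4, 7, 10}, "cpd-2": {2, 3, 6, 11}}
hits = [name for name, fp in screening_library.items() if knn_predict(fp, reference)]
print(hits)
```

In practice the fingerprints would come from a cheminformatics toolkit, and k and the similarity cutoff would be tuned on held-out data.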

4.2. Materials Science

Materials science offers a broad range of targets for property prediction, including semiconductors and other materials for electronic technologies, conductors, insulators, biodegradable or implantable biomaterials, and drugs. Thanks to the exponential growth of computing power, the newest computational techniques can integrate data from high-throughput measurements and predict the properties of highly complex systems. Machine learning models can help overcome the limitations and parameterization issues of quantum models and can be used in a hybrid approach to make predictions in the most challenging regimes: phase transitions, surface properties, functional changes, and thermal transport. Integrating AI into molecular property prediction allows rapid identification of new candidates with desired properties. AI can also be used to optimize processes or to target specific material properties, in combination with predictions from more advanced methods. AI and hybrid-approach predictions have, for example, been used to design ultra-light biomaterials.

Obtaining sufficiently diverse data across conditions is a real-world challenge when gathering the many chemical properties needed to develop predictive models. A major limitation of data-centric design with a hybrid AI-quantum approach is the availability and quality of data, which sometimes have limited diversity and can be biased toward one experimental method. Real materials data, particularly inorganic and mineralogical data, are often of poor quality and difficult to integrate into databases, because the understanding of chemical identity evolves over time. A hybrid approach can also be used for process optimization in materials, for example to reduce the annealing time needed to obtain target microstructure features in Al alloys. A key bottleneck in actually integrating AI into molecular design is the need for interdisciplinary knowledge spanning computer science, theoretical science, and experimental science, since much of the data used for quantitative AI predictions comes from these different communities. (Selvaraj et al., 2021; Allal-Chérif et al., 2021; Melo et al., 2021; Vatansever et al., 2021; Jiménez-Luna et al., 2021)
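For process optimization such as shortening an annealing schedule, a simple surrogate is a quadratic response surface fitted to a few measurements, whose vertex suggests the next condition to test. The sketch assumes hypothetical measurements of a microstructure figure of merit against annealing time; real studies would use actual data and usually richer surrogates, such as Gaussian processes.

```python
import numpy as np

times = np.array([1.0, 2.0, 4.0, 5.0, 6.0])        # annealing time, h (hypothetical)
merit = np.array([92.0, 98.0, 98.0, 92.0, 82.0])   # figure of merit (hypothetical units)

# Fit merit ~ a*t**2 + b*t + c; np.polyfit returns the highest power first.
a, b, c = np.polyfit(times, merit, deg=2)
if a >= 0:
    raise ValueError("fitted surface has no interior maximum; collect more data")

t_opt = -b / (2.0 * a)                                   # vertex of the parabola
t_opt = float(np.clip(t_opt, times.min(), times.max()))  # do not extrapolate
predicted = float(np.polyval([a, b, c], t_opt))
print(f"suggested annealing time: {t_opt:.2f} h (predicted merit {predicted:.1f})")
```

The suggested condition would then be tested experimentally and the new measurement added before refitting, closing the same design loop used for molecules.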

5. Challenges and Limitations

Integrating hybrid AI with open-source tools for molecular design still faces several limitations. First, the availability and accuracy of data strongly affect both AI models and the open-source tools built on those data. In practice, if the data are insufficient or biased, the models will not meaningfully reduce costs and may produce incorrect predictions or results. In molecular AI in particular, if data for a class of molecules are missing, the model cannot account for the behavior of those molecules. Moreover, because AI models and open software are trained on the experimentally sampled distribution of chemical space, diverse and well-represented training data are critical for accurate predictions, a point that practitioners sometimes overlook or disregard. With a skewed dataset, the model will produce biased and more uncertain predictions for underrepresented examples.
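A common safeguard against predicting for molecules unlike anything in the training data is an applicability-domain check. The sketch below flags a query as in-domain when its nearest training neighbour in descriptor space is no farther than the 95th percentile of nearest-neighbour distances within the training set; the descriptor vectors are random stand-ins, and the percentile threshold is one choice among several.

```python
import numpy as np

def pairwise_distances(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def fit_domain(X_train, percentile=95.0):
    """Threshold = chosen percentile of within-training nearest-neighbour distances."""
    d = pairwise_distances(X_train, X_train)
    np.fill_diagonal(d, np.inf)   # ignore each point's zero distance to itself
    return float(np.percentile(d.min(axis=1), percentile))

def in_domain(X_query, X_train, threshold):
    """True where the closest training point lies within the threshold."""
    return pairwise_distances(X_query, X_train).min(axis=1) <= threshold

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 4))   # hypothetical descriptor vectors
threshold = fit_domain(X_train)

X_query = np.vstack([
    X_train[0] + 0.01,     # a slight variation of a known compound
    np.full(4, 8.0),       # far from everything seen in training
])
print(in_domain(X_query, X_train, threshold))
```

Predictions for out-of-domain queries are not necessarily wrong, but they deserve lower confidence or experimental confirmation.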

Second is interpretability and explainability. For models offered as a service, understanding how decisions are generated is critical to building trust in the results and their credibility. Interpreting different techniques and explaining a model's decisions effectively remains a challenge in AI, especially for complex models or hybrids of different AI/ML methods. It is therefore important to combine open AI modeling software with other tools that explain AI, and new software tools have been created specifically to interpret and explain AI models. Privacy concerns, algorithmic bias due to imbalanced training data, and other ethical challenges also arise. Finally, the complexity of integrating multiple techniques presents its own difficulties: the integration may produce unforeseen interactions and unexpected outcomes that lead to incorrect or incomplete execution of the software. (Pachouly et al., 2022; Agbehadji et al., 2020; Shams et al., 2021; Albahri et al., 2023)

5.1. Data Quality and Quantity

The availability of high-quality data is crucial to the success of AI and hybrid AI applications. A great deal of time and effort has therefore gone into constructing, validating, and cataloging large datasets of physically relevant molecular data, such as solubilities, reaction yields, and other chemical properties across functional moieties and chemical reactions. Guiding principles for data management advocate metadata standards, robust data stewardship, performant data storage, and seamless data sharing to foster widespread data use. This section considers the quality and quantity of data more closely in the context of hybrid AI for molecular design. In practice, scaling AI drug discovery models requires analyzing tens of millions of compounds, which is unattainable with high-quality experimentally measured or theoretically calculated data alone. To overcome this scarcity, synthetic datasets can be generated, for example through de novo design with chemical reaction rules, a step that deep generative models can typically carry out very efficiently. Case studies have shown that inattention to data quality has led to multi-parametric models with significant prediction errors. Researchers who specialize in machine learning for de novo drug discovery therefore emphasize continuous monitoring of input data, so that models stay up to date and their predictions remain deployable. This suggests that molecular design applications benefit from researchers who focus on strategies for long-term data management, storage, and quality. (Aldoseri et al., 2023; Saravi et al., 2022; Sollini et al., 2020; Albahri et al., 2023)
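Continuous monitoring of input data can start with simple automated checks on each incoming batch. The sketch below flags missing values, values outside a physically plausible range, and a shift of the batch mean relative to the reference data (measured in standard-error units). The field name `logS`, the valid range, the shift threshold of 3, and all values are hypothetical.

```python
import math
import statistics

def validate_batch(batch, reference_values, field, valid_range, max_shift=3.0):
    """Hypothetical checks on one numeric field of a new batch of records:
    missing values, implausible values, and a shift of the batch mean relative
    to the reference (training) distribution, in standard-error units."""
    issues = []
    values = []
    for i, record in enumerate(batch):
        v = record.get(field)
        if v is None or (isinstance(v, float) and math.isnan(v)):
            issues.append(f"record {i}: missing {field}")
        elif not (valid_range[0] <= v <= valid_range[1]):
            issues.append(f"record {i}: {field}={v} outside {valid_range}")
        else:
            values.append(v)
    if len(values) >= 2:
        mu = statistics.mean(reference_values)
        sd = statistics.stdev(reference_values)
        z = (statistics.mean(values) - mu) / (sd / math.sqrt(len(values)))
        if abs(z) > max_shift:
            issues.append(f"batch mean of {field} shifted (z = {z:.1f})")
    return issues

reference = [-3.0, -2.5, -3.5, -2.8, -3.2, -3.1, -2.9, -3.0]   # hypothetical training logS
batch = [{"logS": -1.0}, {"logS": None}, {"logS": 5.0}, {"logS": -1.2}, {"logS": -0.9}]
issues = validate_batch(batch, reference, "logS", valid_range=(-15.0, 2.0))
for issue in issues:
    print(issue)
```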

5.2. Interpretability and Explainability

An important challenge for building trust in hybrid AI systems for molecular design is interpretability and explainability. Researchers need to understand how AI models make decisions in order to have confidence in them, especially when moving beyond known chemistry. Scientists also need to be able to examine the conclusions of AI models to judge their usefulness. Visualizations such as saliency maps are prototypical ways to communicate model inferences. Explanatory models can also be developed that replicate the primary model's predictions using understandable and tunable units. Researchers transitioning to hybrid AI could progress more rapidly by drawing on the literature on interpretable machine learning and deep learning.
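An explanatory model that replicates a primary model's predictions is often called a global surrogate. In the sketch below, a hypothetical nonlinear function stands in for the opaque model, a linear model is fitted to its outputs (not to experimental data), and the surrogate's fidelity is reported as R² against the black-box predictions. The feature names are placeholders.

```python
import numpy as np

def black_box(X):
    """Stand-in for an opaque model (e.g. a deep network); its internals are treated as unknown."""
    return np.tanh(1.5 * X[:, 0]) - 0.8 * X[:, 1] + 0.1 * X[:, 0] * X[:, 2]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))   # hypothetical descriptor samples
y_bb = black_box(X)                          # the surrogate learns the model, not the data

# Linear surrogate with intercept, fitted by least squares.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y_bb, rcond=None)
y_sur = A @ coef

# Fidelity: how much of the black-box output the surrogate reproduces.
fidelity = 1.0 - np.sum((y_bb - y_sur) ** 2) / np.sum((y_bb - y_bb.mean()) ** 2)
for name, w in zip(["descriptor_0", "descriptor_1", "descriptor_2"], coef[:3]):
    print(f"{name}: {w:+.2f}")
print(f"surrogate fidelity (R^2 to black-box output): {fidelity:.2f}")
```

A surrogate's explanation is only as trustworthy as its fidelity, so the R² should always be reported alongside the coefficients.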

Black-box models are prevalent in molecular design because complex models can achieve high predictive accuracy. With the interpretable technologies available to date, including decision trees, decision sets, rule lists, and linear models, interpretability and accuracy may not be readily reconcilable. Rule lists, for example, can lose predictive accuracy because they are built as nested, suboptimal sequences of rules. Moreover, rules are not causal statements of the kind used in scientific discourse; they are decision-support classifiers that provide a different type of information, chiefly an indication of the size of the resulting uncertainty and its consistency across the trees. One way out of this impasse is a hybrid AI whose explanatory layer incorporates interpretable models. Another way to build trust in hybrid AI is to develop user-friendly front ends with which modelers can interact during chemical design. Whether rule lists built for the complexity and particular resolution of chemical data, such as fragmentary bioactivity data, will be scientifically meaningful depends largely on the thought and data the modeler puts in, as well as on the precision of the AI software.
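The traceability that makes rule lists attractive can be seen in a small example: rules are checked in order, the first match determines the prediction, and the matching rule is returned as the explanation. The rules, thresholds, and descriptor names below are hypothetical and carry no claim about real structure-activity relationships.

```python
from typing import Callable, List, Tuple

Rule = Tuple[str, Callable[[dict], bool], str]

# Hypothetical, human-readable rule list; rules are checked in order and the first match wins.
RULES: List[Rule] = [
    ("logP > 5", lambda m: m["logp"] > 5, "inactive"),
    ("contains sulfonamide and MW < 400",
     lambda m: m["sulfonamide"] and m["mol_weight"] < 400, "active"),
    ("aromatic rings >= 3", lambda m: m["aromatic_rings"] >= 3, "active"),
]
DEFAULT = "inactive"

def classify(mol: dict) -> Tuple[str, str]:
    """Return the prediction together with the rule that produced it, so every decision is traceable."""
    for description, condition, label in RULES:
        if condition(mol):
            return label, description
    return DEFAULT, "default (no rule matched)"

mol = {"logp": 2.3, "sulfonamide": True, "mol_weight": 310.0, "aromatic_rings": 1}
print(classify(mol))
```

Because order matters, the first rule acts as a veto: a very lipophilic molecule is labeled inactive even if later rules would call it active, and the returned description makes that visible.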

Public institutions have developed regulations on the interpretability of decisions, requiring that humans always be able to make the final decision. In essence, AI models that bypass human guidance or confirm critical judgments will be held to higher standards of transparency. Ethical and broader societal considerations favor interpretable decisions, particularly in scientific knowledge gathering, where rapid access to decisions based on transparent rules, whether produced by humans or machines, is essential. Regulatory and ethics agencies might be well served by requiring that machine learning products be validated and explained, following the example of transparent AI decision-support tools. Concerns about the transparency of AI systems are growing and are beginning to be written into law in several jurisdictions, for example as a right to explanation. A precursor to this is the general advance of machine learning. The research MDs, as well as other machine learning tools, are advancing at great speed. Given this pace, the more readily understandable the scientific conclusions are, the more they will be used by people beyond the scientific community in generating support suggestions. Understanding the basis of any decision is therefore beneficial, and it can also foster understanding of the AI system used in a given study. In turn, this may influence investment decisions and, thus, market acceptance. (Krishnan, 2020; Dunnington et al., 2020; Rudin et al., 2022; Marcinkevičs and Vogt, 2020)

6. Future Directions in Hybrid AI and Open-Source for Molecular Design

Researchers are already racing to increase the predictive power of rational molecular design with ever more biased and finely tuned machine learning algorithms, yet further improvements could be gained by building ethics into the algorithms or by framing the main goal as integrated predictive systems with built-in safety. Although potentially expensive in the short term, collaborative approaches could be adopted more quickly and lead to a better shared future. Developing new platforms and workflows that allow AI models to be deployed even without sharing endpoint data is a possible next step in predictive toxicological assessment.

The impact of the above discussion lies not only in enhancing the existing predictive power of AI applied to molecular design for safer-by-design CP/CB. It has also enabled catalytic collaborations that feed the tools developed back into data collection, further expanding the open materials shared for the AI community to explore. A thoughtful, iterative process involving mixed AI and multidisciplinary teams within open-access academic settings is vital to shaping the future landscape of data-driven molecular design. Sharing global data within the AI community can help modelers test current AI tools on real big data and can pave the way for collecting regulatory questions and comments on AI results, ensuring that the tools are also interpretable and usable in realistic settings. Ethical AI requires such shared discussions to avoid unnecessary repetition and wasted data and resources. Regulatory testing regimes for real-world safety unfortunately still require large, expensive, and ethically challenging studies. Artificial intelligence should help change this and ensure efficient use of existing big data.

6.1. Emerging Technologies

The continuing emergence of new and radically different technologies could significantly change how we integrate hybrid AI and open-source tools. Powerful new computational paradigms are emerging from rapid advances in quantum computing and scalable cloud-based AI. The parallel processing capabilities and potential competitive advantages made possible by these platforms are only starting to be appreciated and harnessed. New, more efficient methods for data processing, visualization, tackling novel molecular design problems, and decision support are also under development and worth exploring.

Further, the key underpinning of the emerging technology trend is also shared by traditional practice in molecular design: the increasing importance of data and the need for efficient, automated processing, analysis, and insight. Open-source tools often have data-centric capabilities that support these needs, providing data management, biological annotation enrichment, and AI-based functionality. These tools are evolving from relatively simple, easy-to-use plug-ins supporting computational chemistry toward more sophisticated platforms that enable big-data approaches to drug design across a wide range of use cases. We believe it is important to monitor these emerging technologies, since our rapidly expanding ontologies may open up relevant applications for them. Quantum computing providers actively seek research collaborations to identify novel use cases in which their technologies can address major, complex challenges. Cloud-based AI services also offer toolkits for drug discovery. A few examples are noted throughout this section to illustrate the kinds of collaborations we might pursue.

6.2. Ethical Considerations

Data privacy and security are the main ethical considerations involved in integrating hybrid AI and open-source tools in molecular design. Any such project must take into account the sensitivity of biological information that will be publicly accessible through its website and datasets. This scrutiny is intensifying as countries around the world release their own guidelines and reports on the responsible use of digital tools.

Because recent research documents bias in AI systems, this issue requires careful consideration. Given the predictive nature of the AI systems to be employed in molecular design, ensuring fairness and equity in their predictions is of prime concern. Documented and anecdotal evidence indicates that predatory commercial strategies are employed in biological and chemical markets with little regard for public health, and this approach runs counter to the intention behind open-source AI for molecular design. We advise applying AI in contexts that protect human rights and human dignity. Finally, as in any development process, end users should be able to fully understand and support the designs and discoveries coming out of the molecular AI pipeline. For such transparency to be achieved, stakeholders should not only comprehend and trust the resulting outputs but also understand the tools and processes used to generate them.

While estimates of AI's impact on job creation and destruction vary by study, projections suggest an overall displacement of labor. The deployment of new healthcare AI tools could enlarge or reshape the required workforce. Skills and talent are the fastest-growing resource cost in AI, and one estimate projected that, without changes in government training and regulatory practices, there would be a shortage of 250,000 data scientists by 2024. Recent reports recognize the need for proactive workforce development. Finally, regulatory and ethical oversight to mitigate the risks of AI use must be developed before those risks reach a critical level. Should a lack of guidelines in molecular design lead to an unsafe state of affairs, ethical responsibilities may have to be addressed through legal means. To ensure the responsible development and deployment of AI systems, we recommend: (1) Design sociotechnical systems to be robust, resilient, and properly aligned with human values through safety mechanisms; (2) Take due care in fixing errors made by AI so as not to undermine human welfare; (3) Develop better governance protocols for the ethical use of AI technologies; (4) Press for broad societal and state investment in the responsible research and development of AI. (Trump et al.2023)(Niazi and Mariam, 2023)(Thapa and Camtepe, 2021)(Boldt and Orrù, 2022)

References:

Simm, Gregor, Robert Pinsler, and José Miguel Hernández-Lobato. “Reinforcement learning for molecular design guided by quantum mechanics.” In International Conference on Machine Learning, pp. 8959-8969. PMLR, 2020.

Alshehri, A. S., Gani, R., and You, F. “Deep learning and knowledge-based methods for computer-aided molecular design—toward a unified approach: State-of-the-art and future directions.” Computers & Chemical Engineering (2020).

Meyers, J., Fabian, B., and Brown, N. “De novo molecular design and generative models.” Drug Discovery Today (2021).

von Lilienfeld, O. Anatole, Klaus-Robert Müller, and Alexandre Tkatchenko. “Exploring chemical compound space with quantum-based machine learning.” Nature Reviews Chemistry 4, no. 7 (2020): 347-358.

Jun, Joomyung V., David M. Chenoweth, and E. James Petersson. “Rational design of small molecule fluorescent probes for biological applications.” Organic & Biomolecular Chemistry 18, no. 30 (2020): 5747-5763.

Chatzigoulas, Alexios, and Zoe Cournia. “Rational design of allosteric modulators: Challenges and successes.” Wiley Interdisciplinary Reviews: Computational Molecular Science 11, no. 6 (2021): e1529.

Woolfson, D. N. “A brief history of de novo protein design: minimal, rational, and computational.” Journal of Molecular Biology (2021).

Jongruja, N. “Antimicrobial peptide engineering: rational design, synthesis, and synergistic effect.” Russian Journal of Bioorganic Chemistry (2020).

Selvaraj, C., Chandra, I., and Singh, S. K. “Artificial intelligence and machine learning approaches for drug design: challenges and opportunities for the pharmaceutical industries.” Molecular Diversity (2021).

Schneider, Petra, W. Patrick Walters, Alleyn T. Plowright, Norman Sieroka, Jennifer Listgarten, Robert A. Goodnow Jr, Jasmin Fisher et al. “Rethinking drug design in the artificial intelligence era.” Nature Reviews Drug Discovery 19, no. 5 (2020): 353-364.

Gupta, Rohan, Devesh Srivastava, Mehar Sahu, Swati Tiwari, Rashmi K. Ambasta, and Pravir Kumar. “Artificial intelligence to deep learning: machine intelligence approach for drug discovery.” Molecular Diversity 25 (2021): 1315-1360.

Staszak, Maciej, Katarzyna Staszak, Karolina Wieszczycka, Anna Bajek, Krzysztof Roszkowski, and Bartosz Tylkowski. “Machine learning in drug design: Use of artificial intelligence to explore the chemical structure–biological activity relationship.” Wiley Interdisciplinary Reviews: Computational Molecular Science 12, no. 2 (2022): e1568.

da Silva, Rafael Ferreira, Loïc Pottier, Taina Coleman, Ewa Deelman, and Henri Casanova. “Workflowhub: Community framework for enabling scientific workflow research and development.” In 2020 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS), pp. 49-56. IEEE, 2020.

Coleman, Tainã, Henri Casanova, Loïc Pottier, Manav Kaushik, Ewa Deelman, and Rafael Ferreira da Silva. “Wfcommons: A framework for enabling scientific workflow research and development.” Future Generation Computer Systems 128 (2022): 16-27.

Ziegler, Erik, Trinity Urban, Danny Brown, James Petts, Steve D. Pieper, Rob Lewis, Chris Hafey, and Gordon J. Harris. “Open health imaging foundation viewer: an extensible open-source framework for building web-based imaging applications to support cancer research.” JCO Clinical Cancer Informatics 4 (2020): 336-345.

Poinet, Paul, Dimitrie Stefanescu, and Eleni Papadonikolaki. “Collaborative workflows and version control through open-source and distributed common data environment.” In International Conference on Computing in Civil and Building Engineering, pp. 228-247. Cham: Springer International Publishing, 2020.

Astropy Collaboration, Adrian M. Price-Whelan, Pey Lian Lim, Nicholas Earl, Nathaniel Starkman, and Attila Bódi. “The Astropy Project: sustaining and growing a community-oriented open-source project and the latest major release (v5.0) of the core package.” The Astrophysical Journal 935, no. 2 (2022).

Bhandari, Guru, Amara Naseer, and Leon Moonen. “CVEfixes: automated collection of vulnerabilities and their fixes from open-source software.” In Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering, pp. 30-39. 2021.

Price-Whelan, Adrian M., Pey Lian Lim, Nicholas Earl, Nathaniel Starkman, Larry Bradley, David L. Shupe, Aarya A. Patil et al. “The astropy project: sustaining and growing a community-oriented open-source project and the latest major release (v5.0) of the core package.” The Astrophysical Journal 935, no. 2 (2022): 167.

Ohm, Marc, Henrik Plate, Arnold Sykosch, and Michael Meier. “Backstabber’s knife collection: A review of open source software supply chain attacks.” In Detection of Intrusions and Malware, and Vulnerability Assessment: 17th International Conference, DIMVA 2020, Lisbon, Portugal, June 24–26, 2020, Proceedings 17, pp. 23-43. Springer International Publishing, 2020.

Jiménez-Luna, José, Francesca Grisoni, Nils Weskamp, and Gisbert Schneider. “Artificial intelligence in drug discovery: recent advances and future perspectives.” Expert Opinion on Drug Discovery 16, no. 9 (2021): 949-959.

Grisoni, Francesca, Berend JH Huisman, Alexander L. Button, Michael Moret, Kenneth Atz, Daniel Merk, and Gisbert Schneider. “Combining generative artificial intelligence and on-chip synthesis for de novo drug design.” Science Advances 7, no. 24 (2021): eabg3338.

Pasrija, Purvashi, Prakash Jha, Pruthvi Upadhyaya, Mohd Khan, and Madhu Chopra. “Machine learning and artificial intelligence: a paradigm shift in big data-driven drug design and discovery.” Current Topics in Medicinal Chemistry 22, no. 20 (2022): 1692-1727.

Pianykh, Oleg S., Georg Langs, Marc Dewey, Dieter R. Enzmann, Christian J. Herold, Stefan O. Schoenberg, and James A. Brink. “Continuous learning AI in radiology: implementation principles and early applications.” Radiology 297, no. 1 (2020): 6-14.

Zhou, Quan, Dezong Zhao, Bin Shuai, Yanfei Li, Huw Williams, and Hongming Xu. “Knowledge implementation and transfer with an adaptive learning network for real-time power management of the plug-in hybrid vehicle.” IEEE Transactions on Neural Networks and Learning Systems 32, no. 12 (2021): 5298-5308.

Zheng, Nanning, Shaoyi Du, Jianji Wang, He Zhang, Wenting Cui, Zijian Kang, Tao Yang et al. “Predicting COVID-19 in China using hybrid AI model.” IEEE Transactions on Cybernetics 50, no. 7 (2020): 2891-2904.

Neunert, Michael, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, and Martin Riedmiller. “Continuous-discrete reinforcement learning for hybrid control in robotics.” In Conference on Robot Learning, pp. 735-751. PMLR, 2020.

Wang, J., Li, Y., Gao, R. X., and Zhang, F. “Hybrid physics-based and data-driven models for smart manufacturing: Modelling, simulation, and explainability.” Journal of Manufacturing Systems (2022).

Jin, W., Dong, S., Yu, C., and Luo, Q. “A data-driven hybrid ensemble AI model for COVID-19 infection forecast using multiple neural networks and reinforced learning.” Computers in Biology and Medicine (2022).

Erge, O. and van Oort, E. “Combining physics-based and data-driven modeling in well construction: Hybrid fluid dynamics modeling.” Journal of Natural Gas Science and Engineering (2022).

Chakraborty, Arijit, Sven Serneels, Heiko Claussen, and Venkat Venkatasubramanian. “Hybrid AI models in chemical engineering–a purpose-driven perspective.” Computer Aided Chemical Engineering 51 (2022): 1507-1512.

Tiwari, Prafulla C., Rishi Pal, Manju J. Chaudhary, and Rajendra Nath. “Artificial intelligence revolutionizing drug development: Exploring opportunities and challenges.” Drug Development Research 84, no. 8 (2023): 1652-1663.

Chen, W., Liu, X., Zhang, S., and Chen, S. “Artificial intelligence for drug discovery: Resources, methods, and applications.” Molecular Therapy-Nucleic Acids (2023).

Mak, Kit-Kay, Yi-Hang Wong, and Mallikarjuna Rao Pichika. “Artificial intelligence in drug discovery and development.” Drug Discovery and Evaluation: Safety and Pharmacokinetic Assays (2023): 1-38.

Sarkar, Chayna, Biswadeep Das, Vikram Singh Rawat, Julie Birdie Wahlang, Arvind Nongpiur, Iadarilang Tiewsoh, Nari M. Lyngdoh, Debasmita Das, Manjunath Bidarolli, and Hannah Theresa Sony. “Artificial intelligence and machine learning technology driven modern drug discovery and development.” International Journal of Molecular Sciences 24, no. 3 (2023): 2026.

Allal-Chérif, Oihab, Alba Yela Aránega, and Rafael Castaño Sánchez. “Intelligent recruitment: How to identify, select, and retain talents from around the world using artificial intelligence.” Technological Forecasting and Social Change 169 (2021): 120822.

Melo, Marcelo CR, Jacqueline RMA Maasch, and Cesar de la Fuente-Nunez. “Accelerating antibiotic discovery through artificial intelligence.” Communications Biology 4, no. 1 (2021): 1050.

Vatansever, Sezen, Avner Schlessinger, Daniel Wacker, H. Ümit Kaniskan, Jian Jin, Ming‐Ming Zhou, and Bin Zhang. “Artificial intelligence and machine learning‐aided drug discovery in central nervous system diseases: State‐of‐the‐arts and future directions.” Medicinal Research Reviews 41, no. 3 (2021): 1427-1473.

Pachouly, Jalaj, Swati Ahirrao, Ketan Kotecha, Ganeshsree Selvachandran, and Ajith Abraham. “A systematic literature review on software defect prediction using artificial intelligence: Datasets, Data Validation Methods, Approaches, and Tools.” Engineering Applications of Artificial Intelligence 111 (2022): 104773.

Agbehadji, Israel Edem, Bankole Osita Awuzie, Alfred Beati Ngowi, and Richard C. Millham. “Review of big data analytics, artificial intelligence and nature-inspired computing models towards accurate detection of COVID-19 pandemic cases and contact tracing.” International Journal of Environmental Research and Public Health 17, no. 15 (2020): 5330.

Shams, Seyedeh Reyhaneh, Ali Jahani, Saba Kalantary, Mazaher Moeinaddini, and Nematollah Khorasani. “Artificial intelligence accuracy assessment in NO2 concentration forecasting of metropolises air.” Scientific Reports 11, no. 1 (2021): 1805.

Albahri, Ahmed Shihab, Ali M. Duhaim, Mohammed A. Fadhel, Alhamzah Alnoor, Noor S. Baqer, Laith Alzubaidi, Osamah Shihab Albahri et al. “A systematic review of trustworthy and explainable artificial intelligence in healthcare: Assessment of quality, bias risk, and data fusion.” Information Fusion 96 (2023): 156-191.

Aldoseri, A., Al-Khalifa, K. N., and Hamouda, A. M. “Re-thinking data strategy and integration for artificial intelligence: concepts, opportunities, and challenges.” Applied Sciences (2023).

Saravi, Babak, Frank Hassel, Sara Ülkümen, Alisia Zink, Veronika Shavlokhova, Sebastien Couillard-Despres, Martin Boeker, Peter Obid, and Gernot Michael Lang. “Artificial intelligence-driven prediction modeling and decision making in spine surgery using hybrid machine learning models.” Journal of Personalized Medicine 12, no. 4 (2022): 509.

Sollini, Martina, Francesco Bartoli, Andrea Marciano, Roberta Zanca, Riemer HJA Slart, and Paola A. Erba. “Artificial intelligence and hybrid imaging: the best match for personalized medicine in oncology.” European Journal of Hybrid Imaging 4 (2020): 1-22.

Krishnan, M. “Against interpretability: a critical examination of the interpretability problem in machine learning.” Philosophy & Technology (2020).

Dunnington, Dewey W., Benjamin F. Trueman, William J. Raseman, Lindsay E. Anderson, and Graham A. Gagnon. “Comparing the Predictive performance, interpretability, and accessibility of machine learning and physically based models for water treatment.” ACS ES&T Engineering 1, no. 3 (2020): 348-356.

Rudin, Cynthia, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong. “Interpretable machine learning: Fundamental principles and 10 grand challenges.” Statistics Surveys 16 (2022): 1-85.

Marcinkevičs, R. and Vogt, J. E. “Interpretability and explainability: A machine learning zoo mini-tour.” arXiv preprint arXiv:2012.01805 (2020).

Trump, Benjamin, Christopher Cummings, Kasia Klasa, Stephanie Galaitsi, and Igor Linkov. “Governing biotechnology to provide safety and security and address ethical, legal, and social implications.” Frontiers in Genetics 13 (2023): 1052371.

Niazi, S. K. and Mariam, Z. “Computer-aided drug design and drug discovery: a prospective analysis.” Pharmaceuticals (2023).

Thapa, C. and Camtepe, S. “Precision health data: Requirements, challenges and existing techniques for data security and privacy.” Computers in Biology and Medicine (2021).

Boldt, J. and Orrù, E. “Towards a unified list of ethical principles for emerging technologies. An analysis of four European reports on molecular biotechnology and artificial ….” Sustainable Futures (2022).