Postdoc in AI for Chemistry at INSA Rouen Normandie
22 July 2022
Category: Postdoctoral researcher
1-year Post-doc position in Artificial Intelligence / Optimization / Chemistry
at INSA Rouen Normandy, France:
Optimization of the asymmetric synthesis of functionalized cyclopropanes using artificial intelligence
Samia Ainouz (Full Professor) and Gilles Gasso (Full Professor), members of the Intelligent Transportation Systems team (STI) and Apprentissage team (App) at LITIS lab (Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes), INSA Rouen Normandy, France.
Salary, starting date, travel and support
This 1-year (12-month) Post-doc position is funded by the Normandy Region through the RIN project 2022/2023.
Salary: 2700 €/month (gross salary)
Expected starting date: no later than November 1st, 2022.
The successful candidate will receive support for occasional national and international travel (participation in conferences).
Applications should include:
- a full resume, including a comprehensive list of publications;
- a cover letter;
- contact details of up to 2 references;
and be sent to:
by no later than October 15th, 2022.
This research project builds on the expertise of two research laboratories, COBRA and LITIS, the former specialized in organic synthesis and the latter in artificial intelligence. The ambition of the CYCLIA project is to use artificial intelligence to optimize the asymmetric synthesis of cyclopropane compounds, which are present in the structure of many drugs. Starting from an experimental database provided by the COBRA laboratory and from data extracted from the chemical literature, the LITIS team will develop an AI-based software tool to predict the outcomes of experiments.
Artificial Intelligence (AI) has recently revolutionized several research and application fields, ranging from medicine to intelligent mobility. It contributes to impressive achievements in those fields by providing fast and efficient algorithms that are learned from available data and prior domain knowledge. Among AI methods, one branch of particular interest is machine learning, and especially deep neural networks that learn on graphs by taking their shape and connections into account. This is the case of architectures such as PointNet or DeepComNet. By leveraging graph-based features and neighborhood connectivity properties, these architectures are able to solve problems such as road traffic management, molecule shape classification or, very recently, accurate protein structure prediction in an end-to-end manner. The CYCLIA project therefore aims to push the boundaries of knowledge on the asymmetric synthesis of elaborated SO2R-, CF2H- and CF3-substituted cyclopropanes by combining experimental approaches in organic synthesis with machine learning. CYCLIA benefits from a strong partnership between organic chemists (COBRA Laboratory) and experts in machine learning (LITIS Laboratory). To that end, we will study challenging reactions with the help of machine learning, which is expected to bring new transformations and new insights to the community. Ultimately, we aim to propose new approaches for organic chemists.
In a nutshell, the goal is to design efficient ML methods that predict the proportions of the cyclopropane stereoisomers from the input reactant molecules and the experimental conditions (solvent, concentration, temperature, structure of the chiral catalyst...). The main challenges are the design of the numerical representation of the molecules involved, the learning of an adequate ML model, and its practical assessment.
The first round of machine learning algorithms will initially be tested on blind data, that is, reactions whose known experimental results were not included in the initial database. The COBRA team will validate the computer scientists' predictions and, at the same time, provide new data and experimental evidence through experimentation, which will allow us to build a model tailored to our problem. The first ML tool may then be optimized to deliver more precise results, while the development of a representative set of experiments and its optimization using AI will be carried out.
Main Objective: Design of a predictive model for cyclopropanation
From a Machine Learning (ML) standpoint, the goal is to learn a statistical model that takes as inputs the reactants (diazo compound, alkene), the chiral metallic catalyst and the experimental conditions, and correctly predicts the relative distribution of the output stereoisomers. Figure 4 illustrates the general principle of the framework. To learn the prediction model, we will rely on experimental cyclopropanation results collected from COBRA, from other research groups worldwide, for instance those of Prof. A. B. Charette (Montreal University, Canada), Prof. M. P. Doyle (University of Texas at San Antonio, USA) or Prof. H. M. L. Davies (Emory University, USA), or from new experiments conducted during the project. Formally, our experimental database will include N pairs of (input, output), N being the number of reactions. Each input is heterogeneous and consists of:
- reactants described as graphs, with a priori information on the 3D volume covered by the molecules,
- chiral catalysts (also represented as graphs),
- reaction conditions (scalar values), namely the temperature, reactant concentrations, solvent, rate of addition, electronic effects of the reactant substituents (Hammett constant), the van der Waals volume of the substituents...
The related output, which is the ground truth to be predicted by the model, is composed of the relative proportions of the four cyclopropane stereoisomers resulting from the cyclopropanation reaction, along with the overall reaction yield.
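As a rough illustration of the (input, output) pairs described above, the following Python sketch stores one reaction record and checks that the four stereoisomer proportions form a valid distribution. The class name, the field names and the example values are hypothetical and are not taken from the project's database; the molecular graphs are reduced here to placeholder SMILES strings, which is only one possible encoding.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CyclopropanationRecord:
    """One (input, output) pair. All field names are illustrative assumptions."""
    diazo_smiles: str              # reactant graph, reduced here to a SMILES string
    alkene_smiles: str             # reactant graph, reduced here to a SMILES string
    catalyst_id: str               # placeholder standing in for the catalyst graph
    conditions: Dict[str, float]   # scalar reaction conditions
    stereoisomer_fractions: Tuple[float, float, float, float]
    yield_fraction: float          # overall reaction yield, between 0 and 1

    def __post_init__(self) -> None:
        if len(self.stereoisomer_fractions) != 4:
            raise ValueError("expected exactly four stereoisomer fractions")
        if any(p < 0.0 for p in self.stereoisomer_fractions):
            raise ValueError("stereoisomer fractions must be non-negative")
        if not math.isclose(sum(self.stereoisomer_fractions), 1.0, abs_tol=1e-6):
            raise ValueError("stereoisomer fractions must sum to 1")
        if not 0.0 <= self.yield_fraction <= 1.0:
            raise ValueError("yield must lie in [0, 1]")


def target_vector(record: CyclopropanationRecord) -> List[float]:
    """Ground-truth vector to predict: four fractions followed by the yield."""
    return [*record.stereoisomer_fractions, record.yield_fraction]


# Made-up example values, not taken from any real experiment.
example = CyclopropanationRecord(
    diazo_smiles="CCOC(=O)C=[N+]=[N-]",
    alkene_smiles="C=Cc1ccccc1",
    catalyst_id="catalyst_A",
    conditions={"temperature_celsius": 25.0, "concentration_mol_per_l": 0.1},
    stereoisomer_fractions=(0.70, 0.10, 0.15, 0.05),
    yield_fraction=0.82,
)
print(target_vector(example))
```

Validating the proportions at construction time is a design choice of this sketch: it catches inconsistent entries when data from several sources are merged, before any model is trained.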
Learning a statistical model that maps heterogeneous inputs onto numerical outputs is called supervised learning (as the model learning is guided by the knowledge of the ground-truth outputs) and is common in applications combining ML and chemistry. Recent research works investigate supervised ML for modeling a catalyst system [10, 13] or reaction yields. To apply the supervised ML framework, one needs to represent the reaction inputs with numerical (quantitative) information (hereafter denoted features) amenable to statistical models. The main challenge lies in describing the molecules involved in the reactions (diazo compound, alkene and chiral catalyst in our case) as appropriate features correlated with the target outputs (the relative proportions of stereoisomers). For instance, Gallarati [13] and Doyle [15] used predefined density functional theory (DFT)-derived features to represent the molecules. Other predefined representations exist and, given a reaction-modeling task, one needs to select the representations suited to predicting the desired output. Alternative and more flexible approaches based on deep learning applied to molecular graphs have recently emerged and make it possible to learn the relevant representations automatically. However, those AI-based methods require a large amount of data.
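To make the supervised setting concrete, here is a minimal sketch of a k-nearest-neighbour baseline that maps a numerical feature vector to the four stereoisomer proportions. It is not the method of the project or of the cited works; the function names, the two toy features and all numerical values are assumptions chosen for illustration. Averaging the neighbours' proportions has the convenient property that the prediction still sums to one.

```python
import math
from typing import List, Sequence, Tuple

# (features, stereoisomer proportions) for one reaction
Example = Tuple[Sequence[float], Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> List[float]:
    """Average the proportions of the k training reactions closest to the query."""
    if not train:
        raise ValueError("training set is empty")
    neighbours = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    n_outputs = len(neighbours[0][1])
    return [
        sum(ex[1][j] for ex in neighbours) / len(neighbours)
        for j in range(n_outputs)
    ]


# Toy data: two already-scaled features per reaction (made-up values).
train_set: List[Example] = [
    ([0.0, 0.1], [0.70, 0.10, 0.15, 0.05]),
    ([0.1, 0.1], [0.60, 0.20, 0.15, 0.05]),
    ([1.0, 0.9], [0.25, 0.25, 0.25, 0.25]),
]

prediction = knn_predict(train_set, query=[0.05, 0.1], k=2)
print(prediction)       # average of the two closest training reactions
print(sum(prediction))  # stays (numerically) equal to 1
```

In practice the quality of such a baseline depends almost entirely on the chosen features, which is the representation problem discussed above.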
In the current proposal, inspired by the aforementioned works, we will investigate dedicated statistical models for predicting the outcomes of the cyclopropanation process. The main original features of the project are:
- collection, cleaning and release of a large dataset of labeled cyclopropanation reactions,
- design of a procedure able to automatically select the best subset of molecular representations, in contrast to the Gallarati [13] or Doyle [15] approaches,
- selection of an adequate statistical model for cyclopropanation yield prediction, through either a thorough evaluation of several machine learning models or AutoML (Automatic Machine Learning),
- assessment of the confidence score of the model predictions, especially when predicting isolated high yields.
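One simple way to picture automatic selection of a subset of representations is greedy forward selection scored by leave-one-out error. The sketch below is only an assumption about how such a procedure could look, not the procedure the project will design: it uses a 1-nearest-neighbour predictor, a single scalar target, and made-up data in which column 0 is informative and column 1 is noise.

```python
from typing import List, Sequence, Tuple


def loo_error(X: Sequence[Sequence[float]], y: Sequence[float], cols: List[int]) -> float:
    """Leave-one-out mean absolute error of a 1-nearest-neighbour predictor
    that only looks at the feature columns listed in `cols`."""
    total = 0.0
    for i in range(len(X)):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((X[i][c] - X[j][c]) ** 2 for c in cols),
        )
        total += abs(y[i] - y[nearest])
    return total / len(X)


def forward_select(
    X: Sequence[Sequence[float]], y: Sequence[float], max_features: int
) -> Tuple[List[int], float]:
    """Greedily add the column that lowers the leave-one-out error the most,
    stopping as soon as no remaining column improves it."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best_error = float("inf")
    while remaining and len(selected) < max_features:
        error, col = min((loo_error(X, y, selected + [c]), c) for c in remaining)
        if error >= best_error:
            break
        selected.append(col)
        remaining.remove(col)
        best_error = error
    return selected, best_error


# Made-up data: column 0 tracks the target, column 1 does not.
X_toy = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.9], [1.1, 1.1]]
y_toy = [0.1, 0.1, 0.9, 0.9]

chosen, error = forward_select(X_toy, y_toy, max_features=2)
print(chosen, error)
```

Greedy selection is cheap but can miss features that are only useful in combination; it is shown here because it is short, not because it is the best option.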
The work plan is as follows:
Action 1: preprocess the cyclopropanation reactions dataset
Action 2: implement and evaluate state of the art ML methods
Action 3: design our own ML models dedicated to cyclopropane synthesis (which molecular representation? which ML approach? which a priori information on the reactants and catalysts should be enforced in the ML model?)
Action 4: evaluate the models adequately (how to efficiently and reliably evaluate the models with regard to the operating conditions of the cyclopropanation reactions?)
Action 5: release the labeled dataset and the Python code
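Regarding the evaluation question raised in Action 4, one possible protocol (an assumption, not something stated in the work plan) is to hold out all reactions that share a given operating condition, for example the same catalyst, so that the test error reflects behaviour on conditions never seen during training. The sketch below generates such leave-one-group-out splits; the function name and the catalyst labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held-out group, training indices, test indices), one split per group.

    `groups[i]` is the group label (e.g. catalyst) of reaction i.
    """
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(groups):
        by_group[label].append(index)
    for label, test_indices in by_group.items():
        train_indices = [i for i in range(len(groups)) if groups[i] != label]
        yield label, train_indices, test_indices


# Hypothetical catalyst label for each of five reactions.
catalysts = ["cat_A", "cat_A", "cat_B", "cat_C", "cat_B"]

splits = list(leave_one_group_out(catalysts))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Compared with a random split, this protocol usually gives a more pessimistic but more realistic estimate when the aim is to predict outcomes for new catalysts or substrates.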
Keywords: experiments, AI, transfer learning, cyclopropane, supervised learning
Qualification and skills
The successful candidate should:
- have completed a PhD in Computer Science with a specialization or interest in AI- and/or machine learning-based techniques;
- have demonstrated research experience and a relevant publication record;
- have a desire to apply AI to various real-life domains (medical, chemistry, ...);
- have strong English and/or French writing and oral communication skills.
Knowledge of and/or experience in the following fields would be greatly appreciated:
- Python programming with libraries suited to our data
- AI techniques such as deep learning, transfer learning, supervised learning, ML
About LITIS lab
The research conducted at LITIS lab covers 3 major fields: information access, bio-medical information processing and ambient intelligence with applications in health, automotive and smart territories. The expertise of LITIS members is recognized internationally and includes: machine learning, multi-agent systems, intelligent vehicles, medical imaging, bio-informatics.
The candidate will have access to several experimental platforms to carry out the work:
- An experimental chemistry platform with a large amount of cyclopropane data
- An intensive computing center (CRIANN: Centre Régional Informatique et d'Applications Numériques de Normandie).
Żurański, A. M.; Martinez Alvarado, J. I.; Shields, B. J.; Doyle, A. G. Acc. Chem. Res. 2021, 54, 1856.
Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
(a) Anderson, Brandon, Truong Son Hy, and Risi Kondor. Advances in Neural Information Processing Systems 32 (2019): 14537-14546. (b) Jiang, Dejun, et al. Journal of Cheminformatics 13.1 (2021): 1-23.
[7] Qi, Charles R., et al. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[8] Geyer, Fabien. Performance Evaluation 130 (2019): 1-16.
[9] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Hassabis, D. Nature 2021, 596 (7873), 583-589.
[10] Lexa, K. W.; Belyk, K. M.; Henle, J.; Xiang, B.; Sheridan, R. P.; Denmark, S. E.; Ruck, R. T.; Sherer, E. C. Org. Process Res. Dev. 2021, doi.org/10.1021/acs.oprd.1c00155.
[11] Werth, J.; Sigman, M. S. J. Am. Chem. Soc. 2020, 142, 16382.
[12] Metsänen, T. T.; Lexa, K. W.; Santiago, C. B.; Chung, C. K.; Xu, Y.; Liu, Z.; Humphrey, G. R.; Ruck, R. T.; Sherer, E. C.; Sigman, M. S. Chem. Sci. 2018, 9, 6922.
[13] Gallarati, S.; Fabregat, R.; Laplaza, R.; Bhattacharjee, S.; Wodrich, M. D.; Corminboeuf, C. Chem. Sci. 2021, 12, 6879.
[14] Hueffel, J. A.; Sperger, T.; Funes-Ardoiz, I.; Ward, J. S.; Rissanen, K.; Schoenebeck, F. Science 2021, 374, 1134.