PhD position at IMT Atlantique: channel coding for data storage on DNA molecules
9 June 2022
Category: PhD student
PhD position starting in fall 2022 at IMT Atlantique (Brest), a leading engineering school in France. The PhD topic is channel coding for the emerging application of data storage on DNA molecules (see https://www.youtube.com/watch?v=r8qWc9X4f6k for a brief introduction).
The PhD will be carried out within MolecularXiv, a French PEPR project that aims to develop innovative and competitive solutions for massive DNA data storage.
Context of the PhD
Data storage on DNA molecules is seen as an emerging and promising technology that should offer much higher density and durability than conventional storage techniques (HDD, SSD, etc.). The disruptive idea behind this technology, long considered a distant prospect, is to synthesize DNA sequences that encode the information to be stored.
Recent major improvements in both DNA synthesis and sequencing techniques have made DNA data storage affordable, although these techniques still introduce a large number of errors in the read sequences. While conventional storage systems are mostly concerned with substitution errors, DNA storage also introduces insertions and deletions, which standard error-correction (EC) solutions (LDPC, Turbo, Polar codes, etc.) cannot handle.
DNA sequencers output many copies of the same input sequence, each with a different error realisation. Recently developed EC solutions for DNA storage efficiently exploit these multiple reads to correct substitution, insertion, and deletion errors. However, these solutions assume independent and identically distributed (i.i.d.) error models, which are unrealistic, and may therefore perform poorly in practice.
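To make the i.i.d. assumption concrete, here is a minimal Python sketch of an independent insertion/deletion/substitution channel that produces several noisy reads of one strand. The function names and the probabilities `p_ins`, `p_del`, `p_sub` are hypothetical illustrations. This is not the team's channel model, which precisely drops the independence assumption.

```python
import random

BASES = "ACGT"


def iid_ids_channel(seq, p_ins, p_del, p_sub, rng):
    """Pass one DNA strand through a hypothetical i.i.d. insertion/deletion/substitution channel.

    Every base is processed independently with the same probabilities:
    a random base may be inserted before it, then the base is deleted,
    substituted, or copied unchanged.
    """
    assert p_del + p_sub <= 1.0, "deletion and substitution probabilities must sum to at most 1"
    out = []
    for base in seq:
        # Insertion: a uniformly random base is inserted before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        u = rng.random()
        if u < p_del:
            continue  # deletion: the base is dropped
        elif u < p_del + p_sub:
            # Substitution: replace the base with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # correct transmission
    return "".join(out)


def sequence_reads(seq, n_reads, p_ins=0.01, p_del=0.01, p_sub=0.01, seed=0):
    """Return n_reads independent noisy copies of seq, mimicking multiple sequencer reads."""
    rng = random.Random(seed)
    return [iid_ids_channel(seq, p_ins, p_del, p_sub, rng) for _ in range(n_reads)]


if __name__ == "__main__":
    strand = "ACGTTGCAACGTAGCTAGGCTA"
    for read in sequence_reads(strand, n_reads=5, p_ins=0.05, p_del=0.05, p_sub=0.05):
        print(read)
```

Because each insertion or deletion shifts every subsequent base, the reads generally differ in length and lose positional alignment with the original strand. This is why codes designed for substitution errors alone are not sufficient. In this sketch, every base and every read is corrupted with the same fixed probabilities, independently of its neighbours, which is exactly the simplification the PhD aims to move beyond.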
Challenges of the PhD
In previous work, we developed a channel model for DNA storage that accurately captures the dependence of errors on the successive bases of the read sequences, as well as the memory in the errors introduced by the sequencer. This model can be used in a first design step, before the EC solutions are tested in costly wet-lab experiments.
The three main challenges of the PhD will therefore be:
- Benchmark existing EC solutions for DNA storage under our channel model, to evaluate their efficiency under more realistic error models.
- Develop efficient EC solutions for DNA storage that exploit our statistical channel model as fully as possible, while keeping complexity reasonable.
- Experimentally validate the developed EC solutions in a full end-to-end DNA storage system, drawing on the expertise and resources of the MolecularXiv project.
Context of the work
The PhD will be carried out within MolecularXiv, an exploratory PEPR (Programmes et Équipements Prioritaires de Recherche) project (see https://www.cnrs.fr/index.php/fr/cnrsinfo/stockage-de-donnees-du-data-center-la-capsule-adn). This French project will involve many researchers working in various areas: biology, bioinformatics, signal processing, etc. Although the PhD will focus on coding and information theory, the candidate should expect some interaction with researchers working in the other fields.
How to Apply
The candidate should hold an MSc degree, or equivalent, in one of the following fields: telecommunications, information theory, applied mathematics, or signal processing.
To apply, please contact Elsa Dupraz (firstname.lastname@example.org) and Emmanuel Boutillon (email@example.com), and attach the following:
- Full CV with a list of projects and courses related to the subject of the PhD
- Complete academic record (from bachelor's degree to MSc)
- 1 or 2 reference contacts (former or current internship advisor, teacher, etc.)