
It is now generally accepted that the amount of digital data is doubling at least every two years. Seagate estimates that this quantity, called the Global Datasphere, will grow from 33 ZB (zettabytes) in 2018 to 175 ZB by 2025 [2]. Furthermore, the dominant storage media are traditional, with 59% of the storage capacity expected to come from hard disk drives and 26% from flash technologies.

The use of DNA to facilitate computation is an active area of research, dating back to 1994, when Leonard Adleman solved a seven-node instance of the Hamiltonian path problem using the toolbox of (DNA) molecular biology [1]. This pioneering work opened the gate to many interesting questions: can molecular machines be used to solve intractable problems? Is DNA suitable for long-term storage of digital information? More recently, are such methods scalable in the era of Big Data? The focus has gradually changed from solving difficult computational problems to exploiting desirable properties of DNA, leading to the development of DNA digital data storage.

We present the first comprehensive study of machine learning methods applied to the task of predicting DNA hybridisation. For this purpose, we introduce an in silico-generated hybridisation dataset of over 2.5 million data points, enabling the use of deep learning. Depending on hardware, we achieve a reduction in inference time ranging from one to over two orders of magnitude compared to the state of the art, while retaining high fidelity. We then discuss the integration of our methods in modern, scalable workflows.

Deoxyribonucleic acid (DNA) has shown great promise in enabling computational applications, most notably in the fields of DNA digital data storage and DNA computing. Information is encoded as DNA strands, which will naturally bind in solution, thus enabling search and pattern-matching capabilities. Being able to control and predict the process of DNA hybridisation is crucial for the ambitious future of Hybrid Molecular-Electronic Computing. Current tools are, however, limited in terms of throughput and applicability to large-scale problems.
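As a rough illustration of the strand binding that underlies search and pattern matching, the sketch below checks Watson-Crick complementarity between two strands. It is a toy model and not the method studied in this work: the function names `reverse_complement` and `complementary_fraction` are hypothetical, both strands are assumed to be written 5' to 3' and to have equal length, and the thermodynamic factors that govern real hybridisation (temperature, salt concentration, partial and shifted matches) are ignored.

```python
# Toy model: Watson-Crick base pairing only, no thermodynamics.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def complementary_fraction(query: str, target: str) -> float:
    """Fraction of positions at which `query` pairs with `target`.

    Both strands are written 5' to 3' and must have equal length
    (an assumption of this sketch). A value of 1.0 means a perfect duplex.
    """
    if len(query) != len(target):
        raise ValueError("this sketch only compares equal-length strands")
    if not query:
        return 0.0
    # A perfectly binding query equals the reverse complement of the target.
    expected = reverse_complement(target)
    matches = sum(q == e for q, e in zip(query.upper(), expected))
    return matches / len(query)


if __name__ == "__main__":
    target = "ATGCGTAC"
    probe = reverse_complement(target)  # perfectly complementary probe
    print(probe, complementary_fraction(probe, target))
```

A score like this only captures exact base pairing at aligned positions. Predicting whether two strands actually hybridise in solution is a harder problem, which is why dedicated prediction tools are needed.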
