Michela Taufer leads team advancing data analytics for molecular dynamics

Michela Taufer has been fascinated by the behavior of molecules since she was a graduate student. For more than two decades, she has used computers to simulate this behavior, a practice known as molecular dynamics.

“I was born as a scientist with molecular dynamics in mind,” said Taufer, a professor of computer and information sciences at the University of Delaware.

As computers become more powerful, molecular dynamics researchers need increasingly streamlined methods to collect and analyze their data. Taufer has assembled a team of researchers to work on these problems. They recently received a grant through the National Science Foundation’s Big Data Science & Engineering program for just shy of $2 million, with nearly $1 million going to Taufer and UD alumna Trilce Estrada, who is now an assistant professor of computer science at the University of New Mexico.

Molecular dynamics

When you have an imaging test, such as an MRI, your doctor can see what’s happening in your body at that particular moment. Imagine that you could visualize that same part of your body as time passes and at a smaller scale, molecule by molecule. Molecular dynamics simulations can be used to do this. For example, powerful supercomputers are being used to simulate how proteins fold or misfold, which can give rise to disease, and how peptides move through the membranes of our cells.

“Molecular dynamics is a fantastic tool to see how molecules evolve in our bodies,” said Taufer.

Molecular dynamics simulations on powerful supercomputers help scientists learn new information about how diseases develop, which can then enhance research on new treatments.

However, the massive volumes of data generated by supercomputers are difficult to store and analyze. After the molecular dynamics simulation data is generated and stored on large disk storage systems, a bottleneck arises when it must be retrieved and analyzed.

“We need to put together generation and analysis at runtime,” said Taufer.

Under this new grant, Taufer and her collaborators are integrating simulation and analysis so that they can interpret the results while the simulations are still running.

Taufer is leading the project because she has experience and collaborations reaching into all three fields necessary for this project: molecular dynamics, machine learning, and workflows. She brought the team together, “gluing together the expertise” of all the collaborators.

The molecular data is being generated by Harel Weinstein, director of the Institute for Computational Biomedicine at Weill Cornell Medicine in New York City, and Michel Cuendet, an instructor of computational biomedicine there. These researchers use molecular dynamics to probe how biological molecules work in various organs, including the brain and the heart.

Trilce Estrada is contributing expertise in machine learning, which will be used to extract information and build a repository of molecular properties.

The project also includes workflow experts from the University of Southern California Information Sciences Institute: research director and professor Ewa Deelman and research assistant professor Rafael Ferreira da Silva. Simulation and analytics will be integrated into workflows so that changes in molecular properties can be detected quickly.

Taufer and her collaborators hope that the team’s findings will impact not just the molecular dynamics community, but a much broader range of scientists. The methods, tools, and machine learning algorithms the team develops may apply to other fields in which researchers generate, retrieve, and analyze massive amounts of data from supercomputers. The team also aims to create training materials, such as online courses, to teach other data scientists their methods.