My research focuses on algorithms and data structures for “big data,” with applications to computational biology and, more recently, astronomy, among other areas. I am particularly interested in the “manifold hypothesis” and in how the geometric and topological properties of data can enable more efficient algorithms for search and analysis. I am also interested in computational topology and in approaches for making topological data analysis tractable on large data sets.

Cover of Cell Systems, Issue 2

Here are some selected publications

Recent advances in technology have led to exponential growth in data, sometimes outpacing available computing power. This explosion in data promises new discoveries, if only we can mine it. I am interested in developing algorithms to help data scientists from fields as diverse as molecular biology, astronomy, chemistry, the social sciences, global trade, and finance make discoveries based upon data. In the past, my research has focused on protein structure prediction, remote homology detection in proteins, and function prediction in protein-protein interaction networks. More recently, I have developed compressive algorithms for speeding up approximate search in biological systems, and I am interested in extending these ideas to domains outside of biology.

I also have a strong interest in functional programming, and I have enjoyed implementing some of my research software in Haskell. I am interested in how more powerful programming language features, such as algebraic type systems, higher-order functions, and ease of parallelization, can make implementing data-science algorithms faster and less error-prone, and make the programmer more productive.
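
As a minimal, purely illustrative sketch (not drawn from any of the research software mentioned above), the Haskell fragment below shows how two of these features can fit together. An algebraic data type describes a hierarchy of clusters, and a higher-order function takes the distance metric as an argument, so the same range search works for any metric space. The names `Cluster`, `Metric`, `rangeSearch`, `absDist`, and `exampleTree` are hypothetical and chosen only for this example. The pruning step assumes that the metric satisfies the triangle inequality and that every point stored beneath a `Ball` lies within its radius of the center.

```haskell
-- A hypothetical cluster hierarchy: either a leaf holding points, or a
-- ball (center, radius) that covers everything stored beneath it.
data Cluster a
  = Leaf [a]
  | Ball a Double [Cluster a]

-- A metric is just a function, so it can be passed in like any other value.
type Metric a = a -> a -> Double

-- Return every stored point within distance r of the query q.
-- A ball is skipped entirely when the triangle inequality shows that
-- none of its points can be close enough to the query.
rangeSearch :: Metric a -> a -> Double -> Cluster a -> [a]
rangeSearch dist q r (Leaf ps) = filter (\p -> dist q p <= r) ps
rangeSearch dist q r (Ball c rad kids)
  | dist q c > r + rad = []
  | otherwise          = concatMap (rangeSearch dist q r) kids

-- Toy metric on the real line (an assumption made for the example).
absDist :: Metric Double
absDist x y = abs (x - y)

-- Toy data: two small clusters of points on the real line.
exampleTree :: Cluster Double
exampleTree =
  Ball 5 5
    [ Ball 1 1 [Leaf [0, 1, 2]]
    , Ball 8 2 [Leaf [6, 8, 10]]
    ]
```

Evaluating `rangeSearch absDist 1.5 1.0 exampleTree` in GHCi yields `[1.0,2.0]`. The second cluster is pruned without any of its points being examined, because its center lies farther from the query than the search radius plus the cluster radius. Swapping in a different metric requires no change to `rangeSearch` itself.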