Daniel Sabanés Bové is a senior principal data scientist at Roche. In our conversation, we discuss the need for better software in biotech and his career in data science.
Daniel studied statistics at Ludwig Maximilian University of Munich in Germany and earned his PhD in 2013 from the University of Zurich in Switzerland. His doctoral research focused on Bayesian model selection. After completing his PhD, Daniel began his career at Roche as a biostatistician. There, he applied statistical principles to clinical trials and research in areas like oncology, immunology, and neuroscience.
In 2018, Daniel joined Google as a data scientist. While there, he worked on ranking systems, developing models to optimize search results. Then in 2020, Daniel returned to Roche to lead a specialized team focused on statistical engineering.
Throughout his career, Daniel has co-authored multiple R packages published on CRAN and Bioconductor. He also co-wrote the book 'Likelihood and Bayesian Inference: With Applications in Biology and Medicine'. Currently, he serves as co-chair of an ASA working group called openstatsware that promotes software engineering in biostatistics.
According to Daniel, software engineering principles are often neglected in biostatistics. Most biostatisticians know a programming language like R, but lack formal training in writing reusable, reliable code. Daniel argues this is problematic for several reasons.
First, without code reviews, we risk making erroneous analytical decisions based on buggy statistical software. Second, code passed from statistician to statistician without documentation makes reproducibility impossible. Third, in regulated fields like pharmaceuticals, validation protocols are needed to verify analyses, and these protocols require well-engineered code. Finally, without sufficient testing, even modifying poorly written software can introduce unexpected behaviors.
To address these problems, Daniel calls on the biostatistics community to prioritize software engineering skills. Change starts with awareness: we must recognize the value of good engineering. Next, software engineering concepts need to be integrated across statistics curricula, in both academia and industry.
Dedicated software engineering teams play a key role. They can catalyze adoption of engineering best practices within research teams and provide training. Providing attractive career growth for software-oriented roles aids retention of technical talent.
Cross-organizational collaboration also helps. By sharing insights and contributing to open source tools, we make better use of resources. Following modern engineering practices facilitates building reusable components. Daniel points to projects like Mediana (for clinical trial simulations) as examples of successful collaborative open source biostatistics software.
What could improved software engineering mean for biostatistical analyses? Daniel foresees greater efficiency and integrity. With robust code review protocols, analyses have higher accuracy. Well-documented software enhances reproducibility. A strong testing culture provides safety nets against inadvertent bugs. Modular, reusable code makes implementing new analyses faster. Validation frameworks give regulators necessary confidence in results.
Daniel also notes how high-quality software enables faster innovation. By encapsulating complex methods in packages, researchers can build on previous work rather than recoding from scratch. Reliable software tools empower statisticians to operate at higher levels of abstraction.
Ultimately, Daniel argues that pursuing excellence in software engineering serves both ethical and practical ends. Ethically, biostatisticians have an obligation to provide sound statistical guidance, and engineering excellence helps fulfill this duty. Practically, improved software engineering makes biostatisticians more effective in their work, accelerating discoveries and powering data-driven decisions.