Cutting-edge ideas: the appeal of the Cambridge Bioinformatics Hackathon

“I have two projects and I can’t decide which to bring”, says Kevin Dialdestoro, a software engineer at Genestack, the Cambridge-based biodata management company. He is excited about participating in the first Cambridge Bioinformatics Hackathon, organised by the Babraham Institute Bioinformatics Department. We asked him what he is bringing to the challenge and why Genestack is sponsoring the hackathon.

Q. Have you participated in a Hackathon or something similar before? Was it very different to your normal way of working?
The last hackathon I joined was actually also held in Cambridge. It was organised by DNAdigest in 2015, and we brainstormed ideas for a data recommendation service: a Netflix equivalent for genomics data, in which two datasets are considered related if they are co-cited in published papers or share similar metadata.
I found it a refreshing learning experience. A hackathon pulls you out of your daily comfort zone and puts you in with people who are there both to challenge your mad ideas and to push them into reality.
 
Q. Is there a particular challenge that you would like to see this hackathon tackle?
Applying deep learning in bioinformatics is a hot topic nowadays, so I am interested in seeing new demonstrations of its effectiveness, particularly for single-cell RNA-Seq.
We should be prepared to deal with the super-exponential growth of single-cell RNA-Seq data. For example, we should be thinking about how to make the best use of the millions of samples coming out of the Human Cell Atlas Project. Deep learning is a promising technique to fully exploit such vast data.
 
Q. Do you think that the role of a hackathon is to develop skills, to create a better understanding of industry problems, or are there other benefits?
A hackathon encourages us to investigate cutting-edge ideas. These ideas naturally reflect the state of the industry and point us towards the imminent problems we should focus our efforts on. Genestack is keen to encourage innovation, and we are supporting the Babraham Institute with this hackathon in the hope that it will be the first of many.
 
Q. What do you plan to work on in this hackathon?

I have two projects and I can’t decide on which one to drop, so I’ll see if I can do a bit of both!
The first one is a continuation of something we started this summer: predicting the cell types of new single-cell RNA-Seq samples based on a collection of previously annotated, gold-standard single-cell RNA-Seq experiments. We now want to put together an R package for this, to make it easy to prepare a collection of annotated samples and use it to predict the cell types of new samples with a basic PCA-based model. The package will be open-sourced, and we hope that when the community takes the project up it will incorporate more advanced methods such as deep learning.
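(Editor's illustration.) To make the idea of a basic PCA-based model concrete, here is a minimal sketch. It is not the Genestack R package: it is written in Python with NumPy, and the function names (fit_pca_centroids, predict_cell_types), the nearest-centroid rule and the toy expression values are all assumptions made for this example.

```python
import numpy as np

def fit_pca_centroids(ref_expr, ref_labels, n_components=2):
    """Fit PCA on annotated reference samples and keep one centroid per cell type.

    ref_expr: (samples x genes) expression matrix; ref_labels: one label per sample.
    Hypothetical helper for illustration only.
    """
    ref_expr = np.asarray(ref_expr, dtype=float)
    ref_labels = np.asarray(ref_labels)
    mean = ref_expr.mean(axis=0)
    # SVD of the centred matrix: rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(ref_expr - mean, full_matrices=False)
    components = vt[:n_components]
    scores = (ref_expr - mean) @ components.T
    cell_types = sorted(set(ref_labels.tolist()))
    centroids = np.array([scores[ref_labels == t].mean(axis=0) for t in cell_types])
    return {"mean": mean, "components": components,
            "cell_types": cell_types, "centroids": centroids}

def predict_cell_types(model, new_expr):
    """Project new samples into the reference PCA space and pick the nearest centroid."""
    new_expr = np.asarray(new_expr, dtype=float)
    scores = (new_expr - model["mean"]) @ model["components"].T
    # Euclidean distance from every new sample to every cell-type centroid.
    dists = np.linalg.norm(scores[:, None, :] - model["centroids"][None, :, :], axis=2)
    return [model["cell_types"][i] for i in dists.argmin(axis=1)]

# Toy reference collection: two made-up cell types, five made-up genes.
rng = np.random.default_rng(0)
reference = np.vstack([
    rng.normal(loc=[10, 10, 0, 0, 5], scale=0.5, size=(20, 5)),
    rng.normal(loc=[0, 0, 10, 10, 5], scale=0.5, size=(20, 5)),
])
labels = ["T cell"] * 20 + ["B cell"] * 20

model = fit_pca_centroids(reference, labels)
new_samples = [[9.5, 10.2, 0.3, 0.1, 5.0],
               [0.2, 0.4, 9.8, 10.1, 4.9]]
predictions = predict_cell_types(model, new_samples)
print(predictions)
```

A nearest-centroid rule is only one simple way to classify in PCA space; a real package would also need normalisation and gene matching between the reference collection and the new samples.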
The second project I have is about automatically finding an optimal P-value threshold. In bioinformatics, we do multiple hypothesis testing a lot, e.g. in GWAS and differential expression analysis, and this introduces many false positives. Currently, we perform a multiple-testing correction and choose an arbitrary threshold to limit them. But this manual thresholding is sub-optimal: the results called significant may include unnecessarily many false positives, or exclude too many true positives. I will present an idea for an automatic, data-driven alternative, and try to implement it and demonstrate its effectiveness.
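(Editor's illustration.) The current practice described here, a multiple-testing correction followed by a hand-picked cutoff, can be sketched as follows. The example uses the Benjamini-Hochberg procedure as one common correction; the P-values are invented, and the sketch does not attempt the automatic, data-driven alternative, which the interview does not describe.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest P-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working back from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Invented P-values standing in for ten tests (e.g. ten genes).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.07, 0.074, 0.205, 0.212, 0.216]
adjusted = benjamini_hochberg(p_values)

# Two equally conventional cutoffs give different sets of "significant" results.
thresholds = (0.05, 0.10)
counts = [int((adjusted <= alpha).sum()) for alpha in thresholds]
for alpha, n in zip(thresholds, counts):
    print(f"threshold {alpha}: {n} significant tests")
```

On the same data, the number of tests called significant depends on which cutoff is chosen by hand, which is the arbitrariness the project aims to remove.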
 
I am looking forward to seeing what the others come up with!

To find out more about Kevin’s work at Genestack: www.genestack.com