Profile
Keiran Raine
My CV
-
Education:
Whitemere Primary School (1985-1991, Gateshead)
Usworth Comprehensive (1991-1996, Washington – closed 2007)
Park View 6th form (1996-1998, Chester-le-Street)
University of Durham (1998-2001)
University of Manchester (2001-2002)
-
Qualifications:
GCSEs – 6As (inc. science, English and maths), 2Bs, 1C
A-levels – 2Bs, C, D
BSc Biomedical Sciences (2:1 Hons)
MSc Bioinformatics
-
Work History:
Age – Job
16-18 – McDonald’s (holiday/weekend)
18-21 – Summer schools, factory, bar and restaurant work.
22-27 – Sanger Institute (Bioinformatician -> Senior Bioinformatician)
27-29 – Roche Pharmaceuticals (Clinical Programmer)
29-present – Sanger Institute (Senior Bioinformatician -> Principal Software Developer)
-
Current Job:
Principal Software Developer
-
About Me:
Husband, father, dog owner and programmer. Mildly LEGO obsessed!
-
I am originally from Gateshead in the north-east of England. I attended secondary school in Washington and 6th form in Chester-le-Street. I lived in Canada for a year when I was 7.
While in school I was heavily involved in music, playing the trombone and piano and singing in the choir. I’ve performed in school musicals, both on stage and as part of the orchestra.
I studied Biomedical Sciences at university, followed by a postgraduate degree in Bioinformatics (using computers to answer genetic and biological questions). It was at university that I met my wife.
During the summers I worked at McDonald’s, at the Durham University summer school (inner-city school outreach), at a maths summer school (helping underachievers prepare for secondary school) and even on the production line at the Nissan car plant.
Below is a summary of my work since I completed my studies.
I originally joined the Cancer Genome Project (CGP) at the Sanger Institute (now CASM) in 2002 on completing my MSc in Bioinformatics. At that time I was responsible for developing an integrated Laboratory Information Management System (LIMS) to support PCR heteroduplex analysis. This involved database design, Perl-CGI web development and integration of TECAN lab robots. The system is still used, largely unchanged, for sample tracking in the CASM laboratory. During this time I also contributed to very early versions of the COSMIC database.
During 2004 the CGP moved heavily into capillary sequencing (also known as Sanger sequencing). Once the LIMS was adapted, I moved on to developing a high-throughput pipeline to run the autoCSA analysis software. The control software was written in Java, and a public desktop application, StandaloneCSA, was created for external use and released in early 2007. This was a large project with many developers working together on various aspects, including database design, the analysis pipeline itself and a web application used by scientific staff to visually inspect the called variants.
In 2007 I left the CGP and joined Roche Pharmaceuticals as a Clinical Programmer. Here I gained experience of good clinical practice while building data capture and cleaning tools using Oracle Clinical.
After 2 years at Roche I returned to the CGP in 2009. Although this may seem an unusual career choice, the advent of next-generation sequencing had significantly changed the challenges and work being undertaken. One of the primary differences was the sheer volume of data being generated. The initial project was to create a pipeline to map Illumina paired-end sequencing data, supporting multiple species and genome builds.
Over 6 months I extended an existing database and developed a new web application and file-tracking system to control the scheduling and flow of data into and out of a Platform LSF compute farm. After those initial 6 months, the success of the project led to it being extended to support the downstream analysis tools, with more of the team moving in to help with both support and development. During this phase the web interfaces were extended significantly to allow scientific staff to manage their data and analysis.
The system has proved to be a workhorse for the group, with its core structure largely unchanged for 6 years. The job submission mechanism has been reworked by the group in the last few years, but the core database and much of the web application remain as they were.
As part of the analysis pipeline work I brought GBrowse into the group as a centrally managed genome browser. GBrowse relies on server-side compute and a traditional database backend to render its images. In 2012 JBrowse matured to the point where BAM data could be handled directly in the browser. Because of this, and because GBrowse is no longer being developed, we have been working actively with the GMOD community to ensure that the GBrowse features our scientific staff consider important are replicated in JBrowse.
During 2016 I developed a new JBrowse plugin, proportionalMultiBw (prototyped in my own time), to aid visualisation of allele fraction in regions of very high sequencing depth. I’ve also produced a general-purpose tool for automating the generation of screenshots, cgpJBrowseToolkit.
In 2013 the CGP became involved in the ICGC PanCancer Analysis of Whole Genomes project. As part of this, a significant portion of the IT team spent 6 months preparing all of the core algorithms we use in-house for public release. In addition, I was a member of the technical working group, advising on tools and mapping strategy and creating tools to aid the submission of raw sequencing data from sequencing sites around the world. Working closely with the IT group at the Ontario Institute for Cancer Research, I integrated the ‘Sanger’ pipeline into the framework used in multiple data centres around the world. This pipeline ran successfully in 13 locations on a variety of base infrastructures, including AWS, Azure, OpenStack and traditional HPC.
In 2016 I developed cgpbox as an initial method of providing our whole-genome analysis platform as an easy-to-use Docker image. This was well received, and we continue to build on it with the Dockstore framework. cgpbox has since been deprecated in favour of more easily supportable Docker images.
Above I have detailed projects that I have been heavily involved in. Although I may have worked alone on some of these, I value the advice and expertise of those around me. The ‘CASM-IT’ team is an excellent group of people from varied backgrounds who draw on each other’s strengths. The team is also supported by the core informatics and infrastructure groups at the Sanger Institute.
-
I work in a multidisciplinary IT team where a background in biology is a strong advantage. Our work ranges from developing novel software that identifies genetic changes to building tools for collating and cleaning large datasets.
Much of the data we work with is large: a single individual’s dataset is often 200GB (about the size of two 4K movies).
Recently, a main focus of my work has been ensuring that our tools are portable and generate reproducible results. To do this we use a range of software development concepts, which I am happy to discuss.
-
My Typical Day:
I get up before 7am, drop my son at primary school and then travel to work. Once there I set up my computer, and early in the day the whole team stands in a circle to discuss how they are getting on with their work.
The remainder of the day is taken up by meetings and working at the computer… remembering to take breaks and have a walk in the fresh air when I can.
-
Morning
8:50
Arrive at work and get that all-important morning coffee on the way to my office, which I share with 4 other people.
9:00
Check emails and action anything outstanding from the previous day. Collect anything urgent or unusual to be discussed in our daily team “stand-up”. Continue work on my current project (coding, design, documentation etc.).
9:45-10:00
Daily stand-up meeting – All members of the team meet in a room and discuss progress on allocated work: what was completed in the last 24h, what needs review or testing and anything new that will be started today. Any urgent issues are discussed too.
Rest of the day
Apart from gathering people for coffee (10:30, 15:00) and lunch, the rest of the day is a mixture of:
- Kickoff meetings with scientific staff to determine requirements for new projects and how IT can help.
- Update meetings for existing projects – feedback between IT and scientists.
- Design meetings – discussions on how to approach a project, breaking it down into smaller, more manageable items that can potentially be worked on in parallel by more than one person.
- Prioritisation and planning meetings – what should be next, who can work on it, do we have enough developers to work on something?
- Actual development work: coding, websites, database design, optimisation of systems.
Not every day is full of meetings; most of our meetings are held only every other week.
-
What I'd do with the prize money:
Monthly webcast sessions with upper primary pupils, explaining scientific advances they may have heard about in the news and why they are important, focusing on global warming, renewable energy and health.
-
My Interview
-
How would you describe yourself in 3 words?
Driven, resourceful, dependable.
What did you want to be after you left school?
Vet
Were you ever in trouble at school?
Occasionally, nothing serious
Who is your favourite singer or band?
Muse
What's your favourite food?
Pizza
If you had 3 wishes for yourself what would they be? - be honest!
Unlimited space for displaying LEGO, unlimited LEGO, give a wish to my wife
Tell us a joke.
Q: What did the biologist wear on their first date? A: Designer genes
-