Biyoinformatik Yaz Kampı '15: First Bioinformatics Hackathon in Turkey, English Summary 27-29th of July 2015

These 3 days were taken up by our 48-hour bioinformatics hackathon, aptly named Biyo Hackathon, in which two teams, the Chickpeas and the Beans, were given increasingly challenging tasks relating to de novo sequencing. The Biyo Hackathon is the first bioinformatics hackathon to take place in Turkey, and as the field grows we hope it will help draw more interest to bioinformatics. Our hackathon started at ~9 AM on Monday and ended at ~9 AM on Wednesday. During this time we had little chance to rest, as everyone wanted to beat the other team and so chose to stay awake most of the time. Because of the workload, the following notes were taken in a bullet-point style to save as much time as possible. The results of our coding can be found on our wiki at http://193.255.88.41/wiki/staj/BiyoHackaton2015.

The participants will be posting blog updates with their thoughts on the Biyo Hackathon in the coming days, so keep your eyes peeled. In the meantime, let's take a look at what we did over the course of ~50 hours spanning 3 Gregorian days...

27/7/2015:

brought in the survival supplies from Mr. Ahmet’s car

got ready for the coding start with our two teams, the Beans and the Chickpeas

first task was to write code to analyze a given nucleotide/protein sequence

s for sequence, p for protein, to get separate results

must put all of the tasks into separate functions so we can call any function we desire

we, the Chickpeas, lost because Mr. Ahmet of the Beans wrecked us with his speed

translated the descriptive information from Turkish to English

eating break at 14:15

use sorted for alphabetical sorting and Counter.most_common for sorting by count

try to use -1 for all errors
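A minimal sketch of how the first task could look, not the code either team actually wrote. It assumes the analysis means counting residues; the function names and the allowed alphabets are made up for illustration. Each step lives in its own function, the s/p switch picks the alphabet, and every error returns -1:

```python
from collections import Counter

# Assumed alphabets for this sketch; the real task may have allowed other symbols.
NUCLEOTIDES = set("ACGTN")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def count_residues(sequence, mode):
    """Count each residue; mode is 's' (sequence) or 'p' (protein). Returns -1 on error."""
    if mode == "s":
        alphabet = NUCLEOTIDES
    elif mode == "p":
        alphabet = AMINO_ACIDS
    else:
        return -1
    sequence = sequence.upper()
    if not sequence or set(sequence) - alphabet:
        return -1  # empty input or a character outside the alphabet
    return Counter(sequence)


def alphabetical(counts):
    """Residue counts sorted alphabetically by residue."""
    if counts == -1:
        return -1
    return sorted(counts.items())


def by_frequency(counts):
    """Residue counts sorted from most to least common."""
    if counts == -1:
        return -1
    return counts.most_common()


counts = count_residues("GATTACA", "s")
print(alphabetical(counts))
print(by_frequency(counts))
```

Keeping the counting and the two sort orders in separate functions means any one of them can be called on its own, which is what the task asked for.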

losing team makes the sandwiches for lunch

moving on to second task with a FASTQ sequence analysis using Phred scoring system
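For readers new to Phred scores, here is a rough sketch of how a FASTQ quality line turns into numbers. It assumes the common Phred+33 encoding (older Illumina files used an offset of 64) and plain four-line records; the function names are hypothetical and this is not the teams' actual code:

```python
import io

PHRED_OFFSET = 33  # assumption: Phred+33 encoding


def phred_scores(quality_line, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file (or a blank line) stops the reader
        sequence = handle.readline().rstrip()
        handle.readline()  # the '+' separator line
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def mean_quality(quality_line):
    """Average Phred score of one read, or -1 for an empty quality line."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores) if scores else -1


example = io.StringIO("@read1\nACG\n+\nI!5\n")
for read_id, sequence, quality in read_fastq(example):
    print(read_id, sequence, phred_scores(quality), mean_quality(quality))
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so a score of 40 means roughly one wrong base call in 10,000.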

troubles with IPython in getting our code to run (such as not registering tabs correctly)

speed test between the groups to see who can analyze a large FASTQ file (AR2_S3_L001_R1_001) the fastest

second task complete at 22:10, Chickpeas lost again by a 2-second difference in how fast the code printed its results

22:45 and our pizza is late, so delivery should be free

23:10, we cancelled the pizza that never arrived so now we are eating our prepared sandwiches

28/7/2015:

2:53, woke up after a nap and the code is going along nicely, we only have to get the histograms/boxplots/surprise graphs set up

5:03, third job is halfway through and we are taking a 4 hour break before we tackle the remainder of the job

7:23, people are slowly starting to wake up early to get more coding time

9:25, coding fully engaged and people are trying their hardest to get the code working

11:18, Chickpeas are on a fantastic losing streak thanks to the last-minute heroics of Mr. Ahmet, who became the MVP for the Beans

11:57, eating breakfast/lunch is mostly done

task 4 is to write detailed pseudocode for the part of our sequence analysis that assembles Illumina reads

short warm up practice with a paper cutout sequence where we tried to match the overlapping bases to obtain the original sequence
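The paper cutout exercise boils down to finding where the end of one fragment matches the start of another. A minimal sketch of that idea, with made-up function names and an assumed minimum overlap of 3 bases (this is not the assembler the teams wrote):

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0  # no overlap of at least min_overlap bases


def merge(left, right, min_overlap=3):
    """Join two fragments on their overlap, or return -1 if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return -1
    return left + right[size:]


print(merge("ATGCGT", "CGTTAC"))
```

Trying the longest candidate overlap first means the first match found is the best one. A real assembler also has to cope with sequencing errors and repeats, which this sketch ignores.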

after warm up we moved on to code the beginning of exercise 4 in IPython, which consisted of many small functions

[Photo: struggling with the 4th task]

we took a break where we waited for our code to be modified by the IPython server

our code turned out to be too slow and we spent some time trying to fix the code by replacing re.search, removing some unused elements, and condensing some functions
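The notes do not record what replaced re.search, so the following is only an illustration of one common swap: when the pattern is a fixed piece of sequence rather than a real regular expression, str.find gives the same answer with less overhead. The function names and test data are invented for this sketch:

```python
import re
import timeit


def find_with_regex(read, fragment):
    """Position of `fragment` in `read` using re.search, or -1 if absent."""
    match = re.search(re.escape(fragment), read)
    return match.start() if match else -1


def find_with_str(read, fragment):
    """Same lookup with the built-in str.find, which also returns -1 if absent."""
    return read.find(fragment)


if __name__ == "__main__":
    read = "ACGT" * 2500 + "TTAGGC"
    fragment = "TTAGGC"
    assert find_with_regex(read, fragment) == find_with_str(read, fragment)
    # Timings depend on the machine, so none are quoted here.
    print("regex:", timeit.timeit(lambda: find_with_regex(read, fragment), number=1000))
    print("str:  ", timeit.timeit(lambda: find_with_str(read, fragment), number=1000))
```

Measuring both versions on real data before and after a change like this is the only way to know whether it actually helps.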

while we waited for the code to run we watched three TED Talks: What's invisible? More than you think by John Lloyd (https://www.youtube.com/watch?v=8EUy_82IChY), How to start a movement by Derek Sivers (http://www.ted.com/talks/derek_sivers_how_to_start_a_movement?language=en#t-21384), and What if 3D printing was 100x faster? by Joseph DeSimone (http://www.ted.com/talks/joe_desimone_what_if_3d_printing_was_25x_faster?language=en)

at around 8 PM we started visualizing our data for the 4th task in Cytoscape, mapping out the relationships of the sequences to each other

21:18, we are dealing with a problem in the code that replaces repeat sequences and prints out the box plot of the result

around 10:30 PM we finished our dinner and moved on to finish task #4

[Photo: finishing up the 4th task]

00:15, short break while IPython handles the computations

2:05, task 4 ended in a stalemate because it was too difficult to reach a conclusion even with our combined efforts

we were given our next and 5th task: take the joined sequences from the 4th task and tag their locations in their genes, and also mark the locations of the non-joined sequences in their genes

but before that we took a quick peek at D3 (Data Driven Documents) at http://d3js.org/

examples given were from http://ahmetrasit.com/secim/, http://ahmetrasit.com/pisa/, http://ahmetrasit.com/pisa/pisa2.html, and https://github.com/mbostock/d3/wiki/Gallery

29/7/2015:

8:39, the cabin is waking up and getting ready to work for some hours

started the day by plotting the joined sequences into a histogram and boxplot

copying our joined sequences into BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) for task 5

[Photo: matching one of our joined sequences with a gene]

code summary by Mr. Ahmet, where he talked about the functions in the code and how we improved its speed by 50%. He also covered some alternative ways to verify our code, and how de novo assembly code is usually ported to C for much faster processing (it took us over 6 hours to fully process our data). Writing the code took longer than expected because we faced many unknowns, such as the Phred quality scoring system and how to convert it to code. There is not much that can be done when alleles are involved in the code. Another issue is the lack of coverage when the data is used for clinical purposes. Writing code as a group is always more difficult than coding solo.

11:19, the hackathon is completed and the winning team is... Friendship! (both teams decided to work together to achieve the end goals)

Again, for the interested among you, all of the code we worked on can be found on our wiki at http://193.255.88.41/wiki/staj/BiyoHackaton2015.


Posted Wednesday, 29 July 2015
