These 3 days consisted of our 48-hour bioinformatics hackathon, aptly named Biyo Hackathon, in which our two teams, the Chickpeas and the Beans, were given progressively more challenging tasks relating to de novo sequencing. The Biyo Hackathon is the first bioinformatics hackathon to take place in Turkey, and as the field of bioinformatics grows we hope it will help spark more interest in it. Our hackathon started at ~9 AM on Monday and ended at ~9 AM on Wednesday. During this time we had little chance to rest, as everyone wanted to beat the other team and so chose to stay awake most of the time. Because of the workload we faced in these 3 days, the following notes were taken in a bullet point style to save as much time as possible. The results of our coding can be found on our wiki at http://193.255.88.41/wiki/staj/BiyoHackaton2015.
The participants will be posting blog updates to voice their opinions on the Biyo Hackathon in the coming days, so keep your eyes peeled. In the meantime let's take a look at what we did over the course of ~50 hours spanning 3 Gregorian days...
27/7/2015:
brought in the survival supplies from Mr. Ahmet’s car
got ready for the coding start with our two teams, the Beans and the Chickpeas
first task was to write code to analyze a given nucleotide/protein sequence
s for sequence, p for protein to get separate results
must put all of the tasks into separate functions so we can call any function we desire
we, the Chickpeas, lost because Mr. Ahmet of the Beans wrecked us with his speed
translated the descriptive information from Turkish to English
eating break at 14:15
sorted() for alphabetical sorting, Counter.most_common() for sorting by count
try to use -1 for all errors
losing team makes the sandwiches for lunch
moving on to the second task: FASTQ sequence analysis using the Phred scoring system
troubles with IPython in getting our code to run (such as not registering tabs correctly)
speed test between the groups to see who will have the fastest time to analyze a large FASTQ file (AR2_S3_L001_R1_001)
second task complete at 22:10, Chickpeas lost again by a 2-second difference in how fast the code printed its results
22:45 and our pizza is late, so delivery should be free
23:10, we cancelled the pizza that never arrived so now we are eating our prepared sandwiches
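The first task noted above can be sketched roughly as follows. This is only a minimal illustration, not the code either team wrote (that is on the wiki): the function name `analyze_sequence`, the two alphabets, and the layout of the returned dictionary are all assumptions. It follows the notes by taking `s` or `p` as the mode, using `sorted` and `Counter.most_common` for the two orderings, and returning -1 for every error.

```python
from collections import Counter

# Assumed alphabets: standard bases plus U and N, and the 20 standard amino acids.
NUCLEOTIDES = set("ACGTUN")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def analyze_sequence(sequence, mode="s"):
    """Count the residues of a sequence.

    mode "s" treats the input as a nucleotide sequence, mode "p" as a protein.
    Returns -1 for any error (unknown mode, empty input, invalid characters).
    """
    if mode == "s":
        alphabet = NUCLEOTIDES
    elif mode == "p":
        alphabet = AMINO_ACIDS
    else:
        return -1

    sequence = sequence.strip().upper()
    if not sequence or not set(sequence) <= alphabet:
        return -1

    counts = Counter(sequence)
    return {
        "length": len(sequence),
        "alphabetical": sorted(counts.items()),    # ordered by residue letter
        "by_frequency": counts.most_common(),      # ordered by count, highest first
    }
```

Calling `analyze_sequence("AACGT")` would report a length of 5 with `A` as the most common base, while `analyze_sequence("AAXG")` returns -1 because `X` is not in the assumed nucleotide alphabet.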
28/7/2015:
2:53, woke up after a nap and the code is going along nicely, we only have to get the histograms/boxplots/surprise graphs set up
5:03, third job is halfway through and we are taking a 4 hour break before we tackle the remainder of the job
7:23, people are slowly starting to wake up early to get more coding time
9:25, coding fully engaged and people are trying their hardest to get the code working
11:18, Chickpeas on a fantastic losing streak thanks to the last-minute actions of the savior Mr. Ahmet, who became the MVP for the Beans team
11:57, eating breakfast/lunch is mostly done
task 4 consists of writing extensive pseudocode for our sequence analysis, which involves assembling Illumina reads
short warm-up practice with a paper cutout sequence where we tried to match the overlapping bases to obtain the original sequence
after the warm-up we moved on to code the beginning of task 4 in IPython, which consisted of many small functions
above: struggling with the 4th task
we took a break where we waited for our code to be modified by the IPython server
our code turned out to be too slow and we spent some time trying to fix the code by replacing re.search, removing some unused elements, and condensing some functions
at around 8 PM we started visualizing our data for the 4th task in Cytoscape, mapping out the relationships of the sequences to each other
21:18, we are dealing with a problem in the code where we need to replace repeat sequences and print out the box plot of the result
around 10:30 PM we finished our dinner and moved on to finish task #4
above: finishing up the 4th task
00:15, short break while IPython handles the computations
2:05, task 4 was settled in a stalemate due to it being too difficult to come to a conclusion even with our combined efforts
we were given our next and 5th task: take our joined sequences from the 4th task and tag their locations in their genes, and also mark the locations of the non-joined sequences in their genes
but before that we took a quick peek at D3 (Data Driven Documents) at http://d3js.org/
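The paper cutout warm-up for task 4, matching overlapping bases to recover the original sequence, can be sketched in a few lines. This is a hedged toy version of greedy overlap merging, not our actual task 4 code: the names `overlap_length` and `greedy_assemble` and the `min_overlap` default are assumptions, and it ignores sequencing errors, reverse complements and repeats, all of which matter with real Illumina reads. It uses plain string methods rather than regular expressions.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the list of sequences left when no pair overlaps by at
    least `min_overlap` bases.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

With the three toy reads `"ATGCGT"`, `"CGTACC"` and `"ACCTTG"`, `greedy_assemble` joins them into the single sequence `"ATGCGTACCTTG"`. The all-pairs comparison is quadratic in the number of reads, which hints at why a pure Python approach gets slow on a full FASTQ file.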
29/7/2015:
8:39, the cabin is waking up and getting ready to work for some hours
started the day by plotting the joined sequences into a histogram and boxplot
above: matching one of our joined sequences with a gene
code summary by Mr. Ahmet, where he talked about: the functions in the code and how we improved the speed of the code by 50%; some alternatives for verifying our code; how de novo assembly code is usually ported over to C to gain a much faster processing time (it took us over 6 hours to fully process our data); why the code took longer than expected to write (we faced many unknown factors, such as the Phred quality scoring system and how to convert it to code); how there is not much to be done when alleles are involved in the code; the lack of coverage as another issue when the data is used for clinical purposes; and how writing code as a group is always more difficult than solo coding
11:19, the hackathon is completed and the winning team is... Friendship! (both teams decided to work together to achieve the end goals)
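Since converting the Phred quality scoring system to code was one of the unknowns mentioned in the summary, here is a minimal sketch of that conversion. It is not the code from the hackathon. It assumes the Phred+33 encoding (each quality character's ASCII code minus 33 gives the score Q), which should be checked against the actual file, and the function names are hypothetical. The error probability follows the standard definition P = 10^(-Q/10).

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    Assumes Phred+33 encoding by default; older files may use offset 64.
    """
    return [ord(char) - offset for char in quality_line]


def error_probability(score):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


def mean_quality(quality_line, offset=33):
    """Average Phred score of one read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


def parse_fastq(text):
    """Yield (name, sequence, quality) for each complete 4-line FASTQ record."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, sequence, _plus, quality = lines[i:i + 4]
        yield header.lstrip("@"), sequence, quality
```

For example, the quality character `I` decodes to a Phred score of 40 under this encoding, meaning roughly a 1 in 10,000 chance that the base call is wrong.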