
s-aligner: a greedy algorithm for non-greedy de novo genome assembly

Today marks a small milestone in the development of this project. A technical description of s-aligner, including a further analysis of its capabilities, is finally available to the public. I hope it will be peer-reviewed in the coming weeks or months. In the meantime, it is already being scrutinized by the community, and I invite you to read it and join the discussion.

"s-aligner: a greedy algorithm for non-greedy denovo genome assembly"

If you are short on time, here is a summary of the main ideas and results.

S-aligner is a simple idea. Developing it required no breakthrough discovery, just strong software development skills and careful selection and rejection of technologies and ideas.

  1. First, it finds overlaps between the reads.
  2. Then it finds a position for each read in a contig.
  3. Then it deletes inconsistent reads.
  4. It repeats these steps until all reads are processed or the assembly is already good enough.
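
To make the loop above more concrete, here is a toy Python sketch. It is not the s-aligner implementation: the function names (`overlap`, `toy_assemble`), the `min_overlap` parameter, the exact suffix/prefix matching, and the rule "a read with no overlap is inconsistent" are all assumptions chosen to keep the example short. It only illustrates the general shape of a greedy overlap, place, discard, repeat cycle.

```python
def overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def toy_assemble(reads, min_overlap=3):
    """Greedy toy assembler (hypothetical, for illustration only).

    Returns the contig, the start position of every placed read,
    and the reads that could not be placed.
    """
    contig = reads[0]              # seed the contig with the first read
    positions = {reads[0]: 0}
    pending = list(reads[1:])

    while pending:
        # Step 1: find overlaps between the contig end and each pending read.
        sizes = [overlap(contig, read, min_overlap) for read in pending]
        best = max(range(len(pending)), key=lambda i: sizes[i])

        # Step 3 (toy rule): reads that overlap nothing are treated as
        # inconsistent and left out of the assembly.
        if sizes[best] == 0:
            break

        # Step 2: give the best read a position and extend the contig.
        read = pending.pop(best)
        positions[read] = len(contig) - sizes[best]
        contig += read[sizes[best]:]
        # Step 4: loop until every read is placed or nothing else fits.

    return contig, positions, pending


contig, positions, discarded = toy_assemble(
    ["ATGCGT", "CGTACC", "ACCTGA", "GGGGGG"]
)
print(contig)     # ATGCGTACCTGA
print(discarded)  # ['GGGGGG']
```

A real assembler has to tolerate sequencing errors, extend contigs in both directions, and handle reverse complements, none of which this sketch attempts.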

Some characteristics of s-aligner are:

  1. It is interactive.
  2. You can adjust quality/speed.
  3. Results have the same quality with or without paired-end information.
  4. It can generate quality metrics for the results by writing the output as FASTQ.
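
For readers unfamiliar with FASTQ, the sketch below shows how per-base quality values can travel together with an assembled sequence. How s-aligner actually derives its quality metrics is not described in this post, so the per-base error probabilities here are made-up inputs, and the helper names (`phred_from_error`, `fastq_record`) and the cap of Q40 are assumptions. Only the record layout (four lines, Phred+33 quality characters) follows the common FASTQ convention.

```python
import math


def phred_from_error(p_error, max_q=40):
    """Convert an error probability into a Phred score, Q = -10 * log10(p).

    The score is clamped to the range [0, max_q]; the cap is an assumption.
    """
    if p_error <= 0:
        return max_q
    q = round(-10 * math.log10(p_error))
    return max(0, min(max_q, q))


def fastq_record(name, sequence, error_probs):
    """Build one FASTQ record with Phred+33 encoded qualities."""
    if len(sequence) != len(error_probs):
        raise ValueError("need exactly one error probability per base")
    quals = "".join(chr(phred_from_error(p) + 33) for p in error_probs)
    return f"@{name}\n{sequence}\n+\n{quals}\n"


# Hypothetical contig with invented per-base error probabilities.
record = fastq_record("contig_1", "ACGT", [0.1, 0.01, 0.001, 0.0])
print(record, end="")
```

With these inputs the quality line is `+5?I`, which encodes Phred scores 10, 20, 30 and 40.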

And it outperforms every program it was tested against for viral de novo genome assembly.

Overall, s-aligner performs on average 110% better than the second-best tool on the viral benchmark sets analyzed, and 64% better on a benchmark set containing samples with extraordinarily large viruses (~250 kbp).

All these advantages mean that in a crisis like the one caused by COVID-19, the hundreds of thousands of sequencing runs being performed around the world could use cheaper resources to obtain results of equal or better quality. That could have a significant impact on the management of the crisis.
