Scaffolding Algorithm Based on Short Reads or Long Reads
Luo Junwei (罗军伟)
Tencent ID: 815 277 628
In genome assembly, scaffolding algorithms help produce a more complete and contiguous reference genome, which is a cornerstone of genomic research. Scaffolding algorithms typically use the alignments between contigs and sequencing data (reads) to determine the order and orientation of contigs and to produce longer scaffolds, which benefit downstream genomic analysis. We present a novel scaffolding algorithm, BOSS, which uses short paired reads for scaffolding. To construct a scaffold graph, BOSS uses the insert-size distribution to decide whether an edge between two vertices (contigs) should be added and how that edge should be weighted. BOSS also adopts an iterative strategy to detect spurious edges, whose removal guarantees that the scaffold graph contains no contradictions. We also present SLR, a scaffolding algorithm based on long reads and contig classification. To address the problem of repetitive regions, SLR classifies contigs into unique contigs and ambiguous contigs.
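As a rough illustration of how an insert-size distribution can inform both the decision to add an edge and its weight, the sketch below scores a candidate edge between two contigs from the read pairs that link them. It is not the BOSS implementation. The `Link` layout, the function `score_edge`, the 3-sigma consistency check, and the thresholds `min_links` and `max_overlap` are all assumptions made for this example, and the insert size is assumed to be roughly normal with mean `mu` and standard deviation `sigma`.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Link:
    """One read pair whose mates align to two different contigs (hypothetical layout)."""
    pos_a: int  # 0-based start of the fragment on contig A
    pos_b: int  # distance from the start of contig B to the end of the fragment


def score_edge(links, len_a, mu, sigma, min_links=3, max_overlap=50):
    """Return (gap_estimate, weight) for the edge A -> B, or None to reject it."""
    if len(links) < min_links:
        return None
    # Portion of each fragment covered by the two contigs: insert size minus gap.
    covered = [(len_a - link.pos_a) + link.pos_b for link in links]
    centre = mean(covered)
    # Keep only pairs whose covered length agrees with the others within 3 sigma.
    consistent = [c for c in covered if abs(c - centre) <= 3 * sigma]
    if len(consistent) < min_links:
        return None
    # Expected insert size minus the covered length gives the implied gap.
    gap = mu - mean(consistent)
    # Reject edges that would imply an implausibly large overlap between contigs.
    if gap < -max_overlap:
        return None
    # Assumed weighting: the number of mutually consistent supporting pairs.
    weight = len(consistent)
    return gap, weight


links = [Link(800, 90), Link(850, 150), Link(750, 60)]
result = score_edge(links, len_a=1000, mu=500, sigma=50)
print(result)
```

With a 1,000 bp contig A and a mean insert size of 500, the three example pairs cover 290, 300, and 310 bp of the two contigs, so the implied gap is about 200 bp and the edge is kept with weight 3. A real scaffolder would also have to handle read orientation, multiple libraries, and conflicting edges, which this sketch ignores.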
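Luo Junwei, Ph.D., is an associate professor and doctoral supervisor. Luo's recent research has focused on machine learning, deep learning, applications of big data technology, and bioinformatics. Luo currently leads one General Program project of the National Natural Science Foundation of China and one provincial-level project, has led and completed one National Natural Science Foundation Young Scientists Fund project, has participated as a key member in four completed National Natural Science Foundation projects, and has taken part in four completed provincial and ministerial research projects. Luo has received one Second Prize of the Henan Province Natural Science Award and has published more than 20 papers in major international journals, including Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, and BMC Bioinformatics, and at major international conferences such as BIBM and ISBRA. Honors include the ACM SIGBIO Outstanding Doctoral Dissertation Award and the title of Henan Province Young Backbone Teacher.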