To facilitate comparisons between CHB-derived consensus sequences and to identify the consensus most genetically similar to the HBV quasispecies of each sample, we estimated the Mash distance between each consensus and the subsampled HBV sequencing reads that aligned to the best-performing reference (linear or graph-based) for that sample. We also performed de novo HBV strain-level assembly using SAVAGE and VG-Flow to identify the viral haplotypes comprising each CHB infection.69,70 For each sample, the best-performing linear reference was added to the SAVAGE output used as input for VG-Flow to improve strain-level contiguity and assembly. Sample-specific viral haplotypes with estimated frequencies >1% were included in all pairwise genetic distance comparisons. The consensus sequence with the lowest estimated genetic distance to the HBV-specific high-throughput sequencing data can be inferred to be the most accurate and genetically representative consensus sequence for that sample.
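The selection step can be illustrated with a from-scratch sketch of the Mash distance, D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index estimated from bottom-s MinHash sketches of canonical k-mers. This is not the Mash tool itself. The defaults k = 21 and s = 1,000 match Mash's defaults, but the parameters actually used in this study are not stated here. blake2b stands in for Mash's MurmurHash3. The helper names (`sketch`, `mash_distance`, `closest_consensus`) and the way the >1% haplotypes enter the comparison are hypothetical illustrations.

```python
import hashlib
import math

def kmers(seq, k):
    """Yield canonical k-mers (min of k-mer and its reverse complement), skipping non-ACGT."""
    comp = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield min(kmer, kmer.translate(comp)[::-1])

def kmer_hash(kmer):
    # 64-bit hash; a stand-in for the MurmurHash3 used by Mash (assumption for illustration)
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def sketch(seqs, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes across all sequences."""
    return sorted({kmer_hash(km) for seq in seqs for km in kmers(seq, k)})[:s]

def mash_distance(sk_a, sk_b, k=21, s=1000):
    """Estimate Jaccard j from the bottom-s of the union, then apply the Mash formula."""
    a, b = set(sk_a), set(sk_b)
    union_bottom = sorted(a | b)[:s]
    if not union_bottom:
        return 1.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    j = shared / len(union_bottom)
    if j == 0:
        return 1.0  # no shared k-mers: maximal distance, as Mash reports
    return max(0.0, -math.log(2 * j / (1 + j)) / k)

def closest_consensus(consensuses, haplotypes, reads, k=21, s=1000, min_freq=0.01):
    """Hypothetical helper: distance of every consensus and every haplotype (freq > min_freq)
    to the aligned reads; returns the consensus with the lowest distance plus all distances."""
    candidates = dict(consensuses)
    candidates.update({name: seq for name, (seq, freq) in haplotypes.items() if freq > min_freq})
    read_sketch = sketch(reads, k, s)
    distances = {name: mash_distance(sketch([seq], k, s), read_sketch, k, s)
                 for name, seq in candidates.items()}
    best = min(consensuses, key=distances.get)
    return best, distances
```

With real data, `consensuses` would hold the linear and graph-derived consensus sequences, `haplotypes` would hold the VG-Flow haplotypes with their estimated frequencies, and `reads` would hold the subsampled reads that aligned to the best-performing reference. Sketching keeps each comparison cheap, because only s hashes per sequence are compared rather than full k-mer sets.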