FASTX-Toolkit

Developed by the Hannon Lab (Cold Spring Harbor Laboratory)

The FASTX-Toolkit is a collection of command-line tools for preprocessing and manipulation of FASTA/FASTQ files. It provides utilities for sequence quality control, trimming, and format conversion.

Automatically generated - may contain errors

Lab products found in correlation

RNeasy Mini Kit, by Qiagen (1 mention)
HiSeq 2000 sequencing machine, by Illumina (1 mention)
HiSeq 1000, by Illumina (1 mention)
TruSeq Stranded mRNA Library Prep Kit, by Illumina (1 mention)
NextSeq 500 platform, by Illumina (1 mention)

12 protocols using fastx toolkit

Pacbio-Illumina Hybrid Genome Sequencing

Cited 3 times

The genome was sequenced using PacBio RS, which can generate continuous long reads (CLRs) of up to 10 kb in length, and can be used to upgrade draft genomes containing gaps using PBJelly (Ver. 12.9.14) [22]. However, CLRs show only 82.1% to 84.4% base accuracy [55]. Thus, error correction was performed using the command pacBioToCA [56] with the parameters -length 500, -partitions 200, -shortReads, -l NC, -t 20, and -s pacbio.spec. Illumina reads (50× read coverage of the genome) were used for correction. Illumina reads were trimmed using FASTX-Toolkit [56] with the parameters -t 20, -l 50, and -Q 33. The pacbio.spec file specified the parameters for overlapping Illumina and PacBio data for correction: utgErrorRate = 0.25, utgErrorLimit [...], cnsErrorRate = 0.25, cgwErrorRate = 0.25, ovlErrorRate = 0.25, and merSize = 10. After correction, PacBio-corrected reads were analyzed using FastQC [57]. A total of 2,640,379 CLRs (7.6× read coverage of genome) were used for error correction, which generated 2,415,333 error-corrected reads (2.3× read coverage of genome) (Additional file 1: Table S1). The average CLR length decreased from 1,819 to 969 bp. The resulting error-corrected CLRs were used for gap filling.
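The trimming parameters quoted above (-t 20, -l 50, -Q 33) can be read as a quality threshold of 20, a minimum retained length of 50, and a Phred+33 quality encoding. The following Python lines are a minimal sketch of that idea only, assuming trimming from the 3' end; the function names are hypothetical and this is not the FASTX-Toolkit implementation.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 quality string into integer scores (ASCII code minus 33)."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, threshold=20, min_length=50):
    """Cut bases below `threshold` off the 3' end.

    Returns the trimmed (sequence, quality) pair, or None when fewer than
    `min_length` bases remain. Assumption: only the 3' end is trimmed.
    """
    scores = phred33_scores(quality_string)
    end = len(scores)
    # Walk back from the 3' end while the last kept base is below the threshold.
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_length:
        return None
    return sequence[:end], quality_string[:end]
```

In this sketch a read whose usable prefix is shorter than 50 bases is dropped entirely rather than kept in shortened form.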

Shin S.C., Ahn D.H., Kim S.J., Pyo C.W., Lee H., Kim M.K., Lee J., Lee J.E., Detrich HW I.I.I., Postlethwait J.H., Edwards D., Lee S.G., Lee J.H, & Park H. (2014). The genome sequence of the Antarctic bullhead notothen reveals evolutionary adaptations to a cold environment. Genome Biology, 15(9), 468.

RNA-seq Transcriptome Assembly of Insects

Cited 1 time

Insects used for RNA-seq were collected from the same laboratory population described above and reared on ~70% humidity coffee parchment. Total RNA was isolated from pooled whole-body female and male adults (30 and 50 individuals, respectively), separately, using RNeasy Mini Kit (Qiagen) and including a DNase I step to remove genomic DNA contamination. Illumina RNA-seq single-end library construction using TruSeq RNA Library Prep Kit v2 and sequencing through a HiSeq2500 platform were performed by BGI (Hong Kong). Raw Illumina reads were adaptor-removed, trimmed and filtered according to quality using default parameters of the Fastx-Toolkit v.0.014. Transcript assembly was performed using rnaSPAdes v.3.14.0 [25] with default parameters. Transcript redundancy was reduced by clustering sequences with CD-HIT v.4.8.1 [26] at default options. Removal of sequence contamination was performed using BLASTn search as described above.

Navarro-Escalante L., Hernandez-Hernandez E.M., Nuñez J., Acevedo F.E., Berrio A., Constantino L.M., Padilla-Hurtado B.E., Molina D., Gongora C., Acuña R., Stuart J, & Benavides P. (2021). A coffee berry borer (Hypothenemus hampei) genome assembly reveals a reduced chemosensory receptor gene repertoire and male-specific genome sequences. Scientific Reports, 11, 4900.

Whole-Genome Sequencing for Listeria monocytogenes

Whole-genome sequencing data for the 180 L. monocytogenes isolates were processed using Haplo-ST (S1 Fig, [26]) for allelic profiling of 2554 genes per isolate. Haplo-ST first cleaned raw Illumina whole-genome sequencing reads obtained as previously described (S1 File) using the FASTX-Toolkit [27]. Next, reads were trimmed to remove all bases with a Phred quality score of < 20 from both ends and filtered such that 90% of bases in the clean reads had a quality of at least 20. After trimming and filtering, all remaining reads with lengths of < 50 bp were filtered out. Next, Haplo-ST used YASRA [28] to assemble the cleaned reads into allele sequences and provided wgMLST profiles to the assembled allele sequences with BIGSdb-Lm (available at http://bigsdb.pasteur.fr/listeria).
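The cleaning rules described above (trim bases below Q20 from both ends, require 90% of the remaining bases to be at least Q20, drop reads shorter than 50 bp) can be sketched in Python as follows. This is an illustration under stated assumptions rather than Haplo-ST's code: the function name is hypothetical, scores are assumed to be already decoded to integers, and the length check is applied before the fraction check, which gives the same keep/discard decision.

```python
def clean_read(scores, min_q=20, min_fraction=0.9, min_length=50):
    """Return the (start, end) slice to keep, or None if the read is discarded.

    `scores` is a list of integer Phred scores for one read.
    """
    start, end = 0, len(scores)
    # Trim low-quality bases from the 5' end.
    while start < end and scores[start] < min_q:
        start += 1
    # Trim low-quality bases from the 3' end.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    kept = scores[start:end]
    # Discard reads that are too short after trimming.
    if len(kept) < min_length:
        return None
    # Require that at least `min_fraction` of the kept bases reach `min_q`.
    good = sum(1 for q in kept if q >= min_q)
    if good / len(kept) < min_fraction:
        return None
    return start, end
```

Returning slice coordinates instead of a new read lets the caller apply the same cut to both the sequence and its quality string.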

Louha S., Meinersmann R.J, & Glenn T.C. (2021). Whole genome genetic variation and linkage disequilibrium in a diverse collection of Listeria monocytogenes isolates. PLoS ONE, 16(2), e0242297.

Full-length genomic read extraction

After quality filtering and Illumina sequencing adaptor trimming with FASTX-Toolkit (v0.0.13), the raw paired-end reads were merged to single-end reads by using FLASh software (v1.2.11). The correlated 5′-end and 3′-end sequences were extracted by the custom script (fasta_to_paired.sh) using the SeqKit (v2.4.0) and Cutadapt (v4.1) packages. The inferred full-length reads were generated by Bedtools (v2.31.0) and Samtools (v1.17) after mapping to the reference genome (NC_000913.3 for Eco, NC_008596.1 for Msm and NC_018143.2 for Mtb) with Bowtie 2 (v2.5.1). The full-length reads with an insert length greater than 10,000 nt were discarded. The mapping results were visualized using the IGV genome viewer (v2.4.10). Data analysis and visualization scripts used Python packages including Matplotlib (v3.7.1), Numpy (v1.24.3), Scipy (v1.10.1), bioinfokit (v0.3), and pyCircos (v0.3.0).

Ju X., Li S., Froom R., Wang L., Lilic M., Delbeau M., Campbell E.A., Rock J.M, & Liu S. (2024). Incomplete transcripts dominate the Mycobacterium tuberculosis transcriptome. Nature, 627(8003), 424-430.

Genome Assembly of Fungal Pathogens

Cited 2 times

Illumina paired-end reads were quality filtered using the FASTX-Toolkit (version 0.0.13.2). Adapter sequences were clipped using Cutadapt version 1.2.1 [29]. Then paired reads having at least 80% of bases with a quality score greater than Q30 were chosen for further analysis (the Q score is the quality score specified by Illumina, which indicates the probability of an error in base calling; Q30 means the probability of an incorrect base call is 1 in 1000). We attempted both de novo and reference-based assembly of genomes using Velvet 1.2.09; however, reference-based assembly was used for further analysis since it yielded a better assembly [30]. M. oryzae 70-15 was used as the reference strain for reference-based assembly. The whole genome assembly is available at NCBI/DDBJ/EMBL with the accession AXDJ01000000 for B157 and AYPX01000000 for MG01.
Contig ordering, gap filling and re-scaffolding were performed using various integrated tools in order to improve assembly quality. We used the ABACAS tool for contig ordering with the reference [31]. The Iterative Mapping and Assembly for Gap Elimination (IMAGE) [32] method was used to fill the gaps in the assembly. The pre-assembled contigs were merged back to scaffolds after successful completion of iterative assembly using SSPACE (SSAKE-based Scaffolding of Pre-Assembled Contigs after Extension) [33].
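The Q30 definition quoted in this protocol follows the standard Phred relation Q = -10 log10(P), so Q30 corresponds to an error probability of 1 in 1000. The short Python sketch below shows that conversion and one possible reading of the 80% rule. The function names are hypothetical, and two points are assumptions: that both mates of a pair must pass, and that a base at exactly Q30 counts (the text says "greater than Q30").

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def fraction_at_least(scores, q_min=30):
    """Fraction of bases in one read whose score is at least `q_min`."""
    return sum(1 for q in scores if q >= q_min) / len(scores)


def pair_passes(scores_r1, scores_r2, q_min=30, min_fraction=0.8):
    """Keep a read pair only if both mates meet the fraction criterion.

    Assumption: the criterion is applied to each mate separately.
    """
    return (fraction_at_least(scores_r1, q_min) >= min_fraction
            and fraction_at_least(scores_r2, q_min) >= min_fraction)
```

For example, phred_to_error_probability(20) is 0.01, which is why a threshold of 20 is often described as 99% base-call accuracy.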

Gowda M., Shirke M.D., Mahesh H.B., Chandarana P., Rajamani A, & Chattoo B.B. (2015). Genome analysis of rice-blast fungus Magnaporthe oryzae field isolates from southern India. Genomics Data, 5, 284-291.

High-throughput sequencing of Hth-Exd complexes

Cited 1 time

Libraries for Hth^FL-Exd and Hth^FL-Exd^R2A,R5A (Lib-16) were sequenced using a v2 75-cycle high-output kit on an Illumina NextSeq Series desktop sequencer at the Genome Center at Columbia University. Libraries Lib-Hth-F and Lib-Hth-R with either Hth or Exd shape-readout mutant in complex with the respective other wild-type protein and Dfd, as well as the Lib-30 Hth^FL-Exd-Dfd experiment were all sequenced at the New York Genome Center using separate lanes on an Illumina HiSeq 2000 sequencing machine. Libraries Lib-Hth-F and Lib-Hth-R with wild-type proteins were also sequenced on a HiSeq instrument at a different facility. Libraries were trimmed to remove Illumina- and library-internal adapter sequences using the FASTX toolkit (Hannon lab) and loaded into the R environment using the R package named SELEX (http://bioconductor.org/packages/SELEX) (Riley, 2014).

Kribelbauer J.F., Loker R.E., Feng S., Rastogi C., Abe N., Rube H.T., Bussemaker H.J, & Mann R.S. (2020). Context-dependent gene regulation by homeodomain transcription factor complexes revealed by shape-readout deficient proteins. Molecular cell, 78(1), 152-167.e11.

Illumina Sequence Reads Quality Control

Illumina sequence reads were analyzed for their quality and adjusted using the FASTX-Toolkit. The FASTX Artifacts Filter was used to eliminate reads containing artifacts such as poly-A regions. Most of the reads containing artifacts had already been eliminated by Illumina itself. The FASTQ Quality Filter, set to a minimum quality score threshold of 20 and a minimum read length of 47, was used to eliminate low-quality reads. The FASTX Trimmer served to eliminate single bases showing very low quality in all reads.
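To make the notion of an artifact read such as a poly-A stretch concrete, the sketch below flags reads that consist almost entirely of one nucleotide. The rule used here (all but at most three bases identical) and the function name are assumptions chosen for illustration; the protocol does not state the criterion applied by the FASTX Artifacts Filter.

```python
from collections import Counter


def looks_like_artifact(sequence, max_other_bases=3):
    """Flag a read in which one nucleotide accounts for nearly every base.

    Assumption: a read is an artifact when at most `max_other_bases`
    positions differ from the most frequent nucleotide.
    """
    if not sequence:
        return False
    # Count of the single most frequent nucleotide in the read.
    most_common_count = Counter(sequence.upper()).most_common(1)[0][1]
    return len(sequence) - most_common_count <= max_other_bases
```

Under this rule a 47-base read of only A is flagged, while a read with a mixed composition is kept.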

Reininger V, & Schlegel M. (2016). Analysis of the Phialocephala subalpina Transcriptome during Colonization of Its Host Plant Picea abies. PLoS ONE, 11(3), e0150591.

Breast Cancer miRNA Sequencing Protocol

Cited 1 time

Small RNA sequencing was performed on a single lane of an Illumina HiSeq 1000 with eight multiplexed libraries from the four breast cancer cell lines. The reads obtained from deep sequencing of small RNAs were subjected to Illumina adaptor trimming using the FASTX-Toolkit and were size filtered to select candidate miRNAs (14 to 24 bases) from the pool of small RNA sequences using an in-house Perl script. The size-separated reads were then mapped onto human miRNA reads obtained from miRBase (version 21) using Bowtie2 (version 2.1.0)¹⁶ with 0 mismatches in the first 8 bases. MicroRNAs were quantified and then normalised to reads per million using an in-house script. Deregulated miRNAs with a ≥ 3-fold change were retained for further analysis. To search for microRNAs targeting the PR 3′UTR, differentially expressed microRNAs in response to progesterone were compared to microRNAs predicted to target PR using 6 algorithms (TargetScan, miRanda, miRWalk, miRMap, RNA22 and RNAhybrid).
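The size selection, reads-per-million normalisation and fold-change cut-off described in this protocol can be sketched as below. The in-house scripts are not available, so every function name here is hypothetical, and two choices are assumptions: the normalisation denominator is the sum of the supplied counts, and a pseudocount of 1 is added to avoid division by zero.

```python
def size_filter(reads, min_len=14, max_len=24):
    """Keep reads whose length falls in the candidate miRNA range."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def reads_per_million(counts):
    """Scale raw counts so that they sum to one million."""
    total = sum(counts.values())
    return {name: c * 1_000_000 / total for name, c in counts.items()}


def deregulated(rpm_control, rpm_treated, min_fold=3.0, pseudocount=1.0):
    """Return miRNAs changed at least `min_fold` in either direction.

    Assumption: a pseudocount is added to both conditions before the ratio.
    """
    result = {}
    for name in rpm_control:
        ratio = ((rpm_treated.get(name, 0.0) + pseudocount)
                 / (rpm_control[name] + pseudocount))
        fold = max(ratio, 1 / ratio)  # treat up and down changes alike
        if fold >= min_fold:
            result[name] = ratio
    return result
```

Keeping the signed ratio in the result lets the caller tell up-regulated miRNAs (ratio above 1) from down-regulated ones (ratio below 1).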

Godbole M., Chandrani P., Gardi N., Dhamne H., Patel K., Yadav N., Gupta S., Badwe R, & Dutt A. (2017). miR-129-2 mediates down-regulation of progesterone receptor in response to progesterone in breast cancer cells. Cancer Biology & Therapy, 18(10), 801-805.

High-throughput Sequencing with Purified Amplicons

For targeted high-throughput sequencing, PAGE-purified primers containing the sequencing adapter and target sequence were used to produce amplicons. After amplification, libraries were PAGE separated and target fragments gel purified. Libraries were sequenced using the Illumina NextSeq500 platform. Reads were preprocessed using the FASTX Toolkit (Hannon laboratory) and reads less abundant than 0.01% of the most abundant read were excluded. Insertions and deletions were quantified using custom scripts and manually verified. For whole-genome sequencing, libraries were prepared using the TruSeq Stranded mRNA Library Prep Kit (Illumina), and reads were processed using the FASTX Toolkit and mapped to the TuMV genome with BWA.
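The abundance cut-off mentioned above (excluding reads less abundant than 0.01% of the most abundant read) can be sketched as follows. This is an illustrative reading rather than the authors' script: the function name is hypothetical, and identical sequences are assumed to be collapsed and counted first. Integer arithmetic is used so the comparison at the boundary is exact.

```python
from collections import Counter


def filter_rare_reads(reads, denominator=10_000):
    """Drop sequences rarer than 1/denominator of the most abundant sequence.

    With the default denominator of 10,000 the cut-off is 0.01%.
    Returns a dict mapping each kept sequence to its count.
    """
    counts = Counter(reads)
    if not counts:
        return {}
    max_count = max(counts.values())
    # n / max_count >= 1 / denominator, rewritten without floating point.
    return {seq: n for seq, n in counts.items() if n * denominator >= max_count}
```

A sequence seen exactly at the cut-off is kept here, since the text excludes only reads that are less abundant than the threshold.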

Olspert A., Chung B.Y., Atkins J.F., Carr J.P, & Firth A.E. (2015). Transcriptional slippage in the positive-sense RNA virus family Potyviridae. EMBO Reports, 16(8), 995-1004.

Ribosome Profiling Analysis Pipeline

Cited 1 time

To process RPF sequencing reads, Illumina adapters were removed using fastx_clipper from the FASTX-Toolkit. Ribosomal RNA and tRNA were removed using Bowtie version 1.0.0⁵. Remaining reads were aligned to the genome (hg19 / GRCh37) and transcriptome using STAR version 2.5.3a⁶ (--alignIntronMin 20 --alignIntronMax 100000 --outFilterMismatchNmax 1 --outFilterType BySJout --outFilterMismatchNoverLmax 0.04 --twopassMode Basic). For the transcriptome annotation, the GENCODE v26lift37 transcriptome annotation was combined with transcripts annotated as tstatus “unannotated” from the MiTranscriptome annotation⁷. To determine the RPF library quality, trinucleotide codon periodicity was plotted using the RibORF readDist script⁸ against annotated protein-coding ORFs (GENCODE v26lift37). Only samples and read lengths that showed clear trinucleotide periodicity were used for subsequent ORF predictions.
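Trinucleotide periodicity means that ribosome-protected fragments fall predominantly into one of the three reading frames of an ORF. The sketch below shows one simple way to summarise that; it is not the RibORF readDist script, the function name is hypothetical, and read positions are assumed to be already assigned (for example as offset-corrected P-site coordinates on the transcript).

```python
def frame_fractions(read_positions, cds_start):
    """Fraction of reads falling in frame 0, 1 and 2 relative to the ORF start.

    `read_positions` are transcript coordinates of reads (assumed already
    offset-corrected); `cds_start` is the coordinate of the first base of
    the start codon.
    """
    counts = [0, 0, 0]
    for pos in read_positions:
        # Python's % always returns a non-negative result for a positive
        # modulus, so positions upstream of the start codon also map to 0..2.
        counts[(pos - cds_start) % 3] += 1
    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [c / total for c in counts]
```

A library with clear periodicity would show one frame holding a large majority of reads, while a roughly even split across the three frames suggests poor periodicity.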

de Miranda Santos I.K., Costa C.H., Krieger H., Feitosa M.F., Zurakowski D., Fardin B., Gomes R.B., Weiner D.L., Harn D.A., Ezekowitz R.A, & Epstein J.E. (2001). Mannan-Binding Lectin Enhances Susceptibility to Visceral Leishmaniasis. Infection and Immunity, 69(8), 5212-5215.
