labtools.adtools.seqlib
Module Contents
Functions
- read_fasta – Generator for reading entries in a fasta file.
- read_fastq – Generator for reading entries in a fastq file.
- read_fastq_big – Generator for reading entries in a fastq file without loading it into memory.
- get_numreads – Returns the number of reads in a fastq or fastq.gz file.
- get_numreads_old – Returns the number of reads in a fastq file.
- Writes bc_dict to a csv.
- Reads bc_dict from a csv.
- labtools.adtools.seqlib.read_fasta(filename)
Generator for reading entries in a fasta file.
Reads 2 lines of the fasta file at a time (header and sequence) and yields each entry as a (name, seq) tuple.
- Parameters:
filename (str) – Path to fasta or fasta.gz file.
- Yields:
(name, seq) ((str, str)) – Name of sequence, biological sequence.
Examples
>>> for name, seq in read_fasta("example.fasta"):
...     print(name, seq)
Geraldine ACGTGCTGAGGCTGCGCTAGCAT
Gustavo CTGATGCTAGATGCTGATA
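The generator pattern described above can be illustrated with a minimal sketch. This is not the library's implementation: read_fasta_sketch and _open_text are hypothetical names, detecting gzip input by the .gz extension is an assumption, and stripping the leading ">" from the name is assumed to match the example output. Unlike the documented 2-lines-per-entry behaviour, this sketch also joins sequences that wrap over several lines.

```python
import gzip
from typing import Iterator, Tuple


def _open_text(filename: str):
    # Hypothetical helper: .gz files are decompressed in text mode, others opened as plain text.
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    return open(filename)


def read_fasta_sketch(filename: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, seq) per record; wrapped sequence lines are joined (assumption)."""
    name, chunks = None, []
    with _open_text(filename) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)  # emit the previous record
                name, chunks = line[1:], []  # drop the leading '>'
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)  # emit the final record
```

Usage mirrors the documented generator: for name, seq in read_fasta_sketch("example.fasta"): print(name, seq).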
- labtools.adtools.seqlib.read_fastq(filename, subset=None)
Generator for reading entries in a fastq file.
Reads 4 lines of the fastq file at a time (name, seq, +, quality) and yields the name, sequence, and quality of each entry.
- Parameters:
filename (str) – Path to fastq or fastq.gz file.
subset (int, optional) – Number of reads to randomly sample from the fastq file.
- Yields:
(name, seq, qual) ((str, str, str)) – Tuple of str containing the name, sequence, and quality of each entry.
Examples
>>> for name, seq, qual in read_fastq("example.fastq"):
...     print(name, seq)
Geraldine ACGTGCTGAGGCTGCGCTAGCAT
Gustavo CTGATGCTAGATGCTGATA
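Reading fastq records 4 lines at a time with an optional random subset can be sketched as follows. This is an illustrative sketch, not the documented implementation: read_fastq_sketch, _open_text, and the seed parameter are hypothetical; it holds the whole file in memory; it uses the standard library's random.sample as the sampler; and dropping the leading "@" from names is an assumption.

```python
import gzip
import random
from typing import Iterator, Optional, Tuple


def _open_text(filename: str):
    # Hypothetical helper: transparently decompress .gz files in text mode.
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    return open(filename)


def read_fastq_sketch(
    filename: str, subset: Optional[int] = None, seed: Optional[int] = None
) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, seq, qual); if subset is given, yield a random sample of reads."""
    with _open_text(filename) as fh:
        lines = [line.rstrip("\n") for line in fh]  # whole file held in memory
    # Each record is 4 lines: @name, sequence, '+' separator, quality string.
    records = [
        (lines[i][1:], lines[i + 1], lines[i + 3])
        for i in range(0, len(lines) - 3, 4)
    ]
    if subset is not None:
        rng = random.Random(seed)
        k = min(subset, len(records))
        picked = sorted(rng.sample(range(len(records)), k))  # keep file order
        records = [records[i] for i in picked]
    yield from records
```

Loading everything first keeps the sampling logic trivial, at the cost of memory proportional to the file size.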
- labtools.adtools.seqlib.read_fastq_big(filename, subset=None, progress=True, **kwargs)
Generator for reading entries in a fastq file without loading it into memory.
Reads 4 lines of the fastq file at a time (name, seq, +, quality) and yields the name, sequence, and quality of each entry. Useful when the fastq file is too large to load into RAM. Supports random subsetting with sklearn.utils.random.sample_without_replacement().
- Parameters:
filename (str) – Path to fastq or fastq.gz file.
subset (int, optional) – Number of reads to randomly subsample from the file.
- Yields:
(name, seq, qual) ((str, str, str)) – Tuple of str containing the name, sequence, and quality of each entry.
Examples
>>> for name, seq, qual in read_fastq_big("example.fastq"):
...     print(name, seq)
Geraldine ACGTGCTGAGGCTGCGCTAGCAT
Gustavo CTGATGCTAGATGCTGATA
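Streaming with subsetting can be sketched as two passes: count the reads, pick the record indices to keep, then stream the file 4 lines at a time and yield only those records. This is an assumption about the approach, not the documented code: read_fastq_stream_sketch, _open_text, and seed are hypothetical, and the standard library's random.sample over a range stands in for sklearn.utils.random.sample_without_replacement so the sketch has no third-party dependency. The progress and **kwargs parameters are not modelled.

```python
import gzip
import random
from typing import Iterator, Optional, Tuple


def _open_text(filename: str):
    # Hypothetical helper: transparently decompress .gz files in text mode.
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    return open(filename)


def read_fastq_stream_sketch(
    filename: str, subset: Optional[int] = None, seed: Optional[int] = None
) -> Iterator[Tuple[str, str, str]]:
    """Stream (name, seq, qual) one record at a time instead of loading the file."""
    keep = None
    if subset is not None:
        # First pass: count reads (4 lines per record) without storing any lines.
        with _open_text(filename) as fh:
            numreads = sum(1 for _ in fh) // 4
        rng = random.Random(seed)
        # Stand-in for sklearn.utils.random.sample_without_replacement.
        keep = set(rng.sample(range(numreads), min(subset, numreads)))
    # Second pass: read one 4-line record at a time and yield the chosen ones.
    with _open_text(filename) as fh:
        index = 0
        while True:
            header = fh.readline()
            if not header:
                break  # end of file
            seq, _plus, qual = fh.readline(), fh.readline(), fh.readline()
            if keep is None or index in keep:
                yield header.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")
            index += 1
```

Memory use is bounded by the size of the index set rather than the file, which is the point of the streaming variant; the trade-off is reading the file twice when subsetting.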
- labtools.adtools.seqlib.get_numreads(filename)
Returns the number of reads in a fastq or fastq.gz file.
- Parameters:
filename (str) – Path to fastq or fastq.gz file.
- Returns:
numreads – Number of reads in the fastq file.
- Return type:
int
Examples
>>> get_numreads("example.fastq")
124
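Because each fastq record occupies 4 lines, the read count can be computed as the line count divided by 4. A minimal sketch under that assumption (no wrapped sequence or quality lines), where get_numreads_sketch is a hypothetical name and gzip input is detected by the .gz extension:

```python
import gzip


def get_numreads_sketch(filename: str) -> int:
    """Count reads in a fastq or fastq.gz file as (number of lines) // 4."""
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rb") as fh:  # binary mode skips text decoding
        numlines = sum(1 for _ in fh)
    return numlines // 4
```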
- labtools.adtools.seqlib.get_numreads_old(filename)
Returns the number of reads in a fastq file.
- Parameters:
filename (str) – Path to fastq file.
- Returns:
numreads – Number of reads in the fastq file.
- Return type:
int
Examples
>>> get_numreads_old("example.fastq")
124
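For uncompressed fastq files, one alternative way to obtain the same count is to read the file in fixed-size binary chunks and count newline bytes, which avoids creating a Python object per line. This is an illustrative sketch, not the documented get_numreads_old implementation; get_numreads_chunked_sketch and its chunk_size parameter are hypothetical, and the 4-lines-per-read layout is assumed.

```python
def get_numreads_chunked_sketch(filename: str, chunk_size: int = 1 << 20) -> int:
    """Count reads in an uncompressed fastq by counting newline bytes in chunks."""
    newlines = 0
    last_byte = b"\n"  # an empty file then counts as zero lines
    with open(filename, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte != b"\n":
        newlines += 1
    return newlines // 4
```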