`labtools.adtools.seqlib`

Module Contents

Functions

`read_fasta`(filename)	Generator for reading entries in a fasta file.
`read_fastq`(filename[, subset])	Generator for reading entries in a fastq file.
`read_fastq_big`(filename[, subset, progress])	Generator for reading a fastq file without loading it into memory.
`get_numreads`(filename)	Returns number of reads in a fastq or fastq.gz file.
`get_numreads_old`(filename)	Returns number of reads in a fastq file.
`write_bc_dict`(bc_dict, name)	Writes bc_dict to a csv.
`read_bc_dict`(filename)	Reads bc_dict from a csv.

labtools.adtools.seqlib.read_fasta(filename)[source]

Generator for reading entries in a fasta file.

Yields 2 lines of a fasta file at a time (name, seq).

Parameters: filename (str) – Path to fasta or fasta.gz file.
Yields: (name, seq) ((str, str)) – Name of sequence, biological sequence.

Examples

>>> for line in read_fasta("example.fasta"):
...     name = line[0]
...     seq = line[1]
...     print(name)
...     print(seq)
Geraldine
ACGTGCTGAGGCTGCGCTAGCAT
Gustavo
CTGATGCTAGATGCTGATA
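
As an illustration of the two-lines-at-a-time behaviour described above, here is a minimal sketch of such a generator. It is not the labtools implementation: `read_fasta_sketch` is a hypothetical name, and the code assumes that every entry occupies exactly two lines (header, sequence), that a ".gz" suffix means gzip compression, and that the leading ">" is dropped from the name.

```python
import gzip


def read_fasta_sketch(filename):
    """Yield (name, seq) pairs from a fasta file with two lines per entry.

    Hypothetical stand-in for illustration, not the labtools code.
    """
    # Assumption: a ".gz" suffix means the file is gzip-compressed text.
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt") as handle:
        while True:
            name = handle.readline().strip()
            seq = handle.readline().strip()
            if not name:  # an empty header line means end of file
                break
            # Assumption: strip the ">" that marks a fasta header.
            yield name.lstrip(">"), seq
```

Multi-line sequences (one record wrapped over several lines) are outside the scope of this sketch, since the description above mentions only two lines per entry.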

labtools.adtools.seqlib.read_fastq(filename, subset=None)[source]

Generator for reading entries in a fastq file.

Reads 4 lines of a fastq file at a time (name, seq, +, quality) and yields each entry as (name, seq, qual).

Parameters:

filename (str) – Path to fastq or fastq.gz file.
subset (int, optional) – Number of reads to randomly sample from the fastq file.

Yields:

(name, seq, qual) ((str, str, str)) – tuple of str containing name, seq and quality for entry.

Examples

>>> for line in read_fastq("example.fastq"):
...     name = line[0]
...     seq = line[1]
...     qual = line[2]
...     print(name)
...     print(seq)
Geraldine
ACGTGCTGAGGCTGCGCTAGCAT
Gustavo
CTGATGCTAGATGCTGATA
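
The 4-line grouping and the optional random subset can be sketched as follows. This is an assumption-laden illustration rather than the library's code: `read_fastq_sketch` and its `seed` parameter are hypothetical, the whole file is loaded into memory, and sampling is done without replacement using the standard library.

```python
import gzip
import random


def read_fastq_sketch(filename, subset=None, seed=None):
    """Yield (name, seq, qual) tuples from a fastq or fastq.gz file.

    Hypothetical stand-in for illustration, not the labtools code.
    """
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt") as handle:
        lines = [line.rstrip("\n") for line in handle]
    # Group the file into 4-line records: name, seq, "+", quality.
    # The "+" separator line (offset 2) is skipped.
    records = [
        (lines[i], lines[i + 1], lines[i + 3])
        for i in range(0, len(lines) - 3, 4)
    ]
    if subset is not None:
        # Assumption: sample without replacement, capped at the read count.
        rng = random.Random(seed)
        records = rng.sample(records, min(subset, len(records)))
    yield from records
```

Passing a fixed `seed` makes the sampled subset reproducible between runs, which is convenient when comparing analyses.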

labtools.adtools.seqlib.read_fastq_big(filename, subset=None, progress=True, **kwargs)[source]

Generator for reading a fastq file without loading it into memory.

Reads 4 lines of a fastq file at a time (name, seq, +, quality) and yields each entry as (name, seq, qual). Useful when the fastq file is too large to load into RAM. Supports subsetting with sklearn.utils.random.sample_without_replacement().

Parameters:

filename (str) – Path to fastq or fastq.gz file.
subset (int, optional) – Number of reads to randomly subsample from the file.

Yields:

(name, seq, qual) ((str, str, str)) – tuple of str containing name, seq and quality for entry.

Examples

>>> for line in read_fastq_big("example.fastq"):
...     name = line[0]
...     seq = line[1]
...     qual = line[2]
...     print(name)
...     print(seq)
Geraldine
ACGTGCTGAGGCTGCGCTAGCAT
Gustavo
CTGATGCTAGATGCTGATA
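
One plausible way to stream a large fastq file while still supporting a random subset is a two-pass approach: count the reads, pick the indices to keep, then stream the file again. The sketch below is an assumption about how this could work, not the labtools implementation. `stream_fastq_sketch`, `_open_text`, and `seed` are hypothetical names, and the standard-library `random.sample` is used in place of scikit-learn's `sample_without_replacement` so the example has no third-party dependencies.

```python
import gzip
import random
from itertools import islice


def _open_text(filename):
    """Open a plain or gzip-compressed file in text mode (hypothetical helper)."""
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    return open(filename, "rt")


def stream_fastq_sketch(filename, subset=None, seed=None):
    """Yield (name, seq, qual) one entry at a time without loading the file."""
    keep = None
    if subset is not None:
        # First pass: count reads without storing any of them.
        with _open_text(filename) as handle:
            n_reads = sum(1 for _ in handle) // 4
        # Choose which read indices to keep, without replacement.
        rng = random.Random(seed)
        keep = set(rng.sample(range(n_reads), min(subset, n_reads)))
    # Second pass: stream the file 4 lines at a time.
    with _open_text(filename) as handle:
        index = 0
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:  # end of file or truncated final entry
                break
            if keep is None or index in keep:
                yield record[0], record[1], record[3]
            index += 1
```

Only the set of selected indices and one 4-line record are held in memory at any time, at the cost of reading the file twice when `subset` is given.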

labtools.adtools.seqlib.get_numreads(filename)[source]

Returns number of reads in a fastq or fastq.gz file.

Parameters: filename (str) – Path to fastq or fastq.gz file.
Returns: numreads – Number of reads in the fastq file.
Return type: int

Examples

>>> get_numreads("example.fastq")
124
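
Because each fastq entry spans four lines, a read count can be obtained by counting lines and dividing by four. The following is a minimal sketch under that assumption. `count_reads_sketch` is a hypothetical name, and this is not necessarily how `get_numreads` is implemented.

```python
import gzip


def count_reads_sketch(filename):
    """Return the number of 4-line entries in a fastq or fastq.gz file."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed text.
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt") as handle:
        # Iterate lazily so the file is never held in memory at once.
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```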

labtools.adtools.seqlib.get_numreads_old(filename)[source]

Returns number of reads in a fastq file.

Parameters: filename (str) – Path to fastq file.
Returns: numreads – Number of reads in the fastq file.
Return type: int

Examples

>>> get_numreads_old("example.fastq")
124

labtools.adtools.seqlib.write_bc_dict(bc_dict, name)[source]

Writes bc_dict to a csv.

Parameters:

bc_dict (dict) – Dictionary output from counter.create_map().
name (str) – Filename for the output csv, e.g. "Library1_dictionary".

labtools.adtools.seqlib.read_bc_dict(filename)[source]

Reads bc_dict from a csv.

Parameters: filename (str) – Path to csv containing a single dictionary.
Returns: bc_dict – Dictionary read from the csv.
Return type: dict
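
To illustrate the write and read round trip, here is a minimal sketch using the standard `csv` module. The actual layout of `bc_dict` and of the csv written by `write_bc_dict` is not specified above, so this example assumes a flat dictionary of string keys and string values stored as two-column rows, and assumes that ".csv" is appended to the given name. `write_dict_csv_sketch` and `read_dict_csv_sketch` are hypothetical names.

```python
import csv


def write_dict_csv_sketch(mapping, name):
    """Write a flat str-to-str dict as two-column csv rows; return the path."""
    path = name + ".csv"  # assumption: the extension is appended to the name
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for key, value in mapping.items():
            writer.writerow([key, value])
    return path


def read_dict_csv_sketch(filename):
    """Read two-column csv rows back into a dict, skipping empty rows."""
    with open(filename, newline="") as handle:
        return {row[0]: row[1] for row in csv.reader(handle) if row}
```

Values come back as strings, so a dictionary with non-string values (for example lists or counts) would need an explicit conversion step after reading.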
