:py:mod:`labtools.adtools.seqlib` ================================= .. py:module:: labtools.adtools.seqlib Module Contents --------------- Functions ~~~~~~~~~ .. autoapisummary:: labtools.adtools.seqlib.read_fasta labtools.adtools.seqlib.read_fastq labtools.adtools.seqlib.read_fastq_big labtools.adtools.seqlib.get_numreads labtools.adtools.seqlib.get_numreads_old labtools.adtools.seqlib.write_bc_dict labtools.adtools.seqlib.read_bc_dict .. py:function:: read_fasta(filename) Generator for reading entries in a fasta file. Yields 2 lines of a fasta file at a time (name, seq). :param filename: Path to fasta or fasta.gz file. :type filename: str :Yields: **(name, seq)** (*(str, str)*) -- Name of sequence, biological sequence. .. rubric:: Examples >>> for line in read_fasta("example.fasta"): ... name = line[0] ... seq = line[1] ... print(name, seq) Geraldine ACGTGCTGAGGCTGCGCTAGCAT Gustavo CTGATGCTAGATGCTGATA .. py:function:: read_fastq(filename, subset=None) Generator for reading entries in a fastq file. Yields 4 lines of a fastq file at a time (name, seq, +, error). :param filename: Path to fastq or fastq.gz file. :type filename: str :param subset: Number of reads to randomly sample from the fastq file. :type subset: int, optional :Yields: **(name, seq, qual)** (*(str, str, str)*) -- tuple of str containing name, seq and quality for entry. .. rubric:: Examples >>> for line in read_fastq("example.fasta"): ... name = line[0] ... seq = line[1] ... qual = line[2] ... print(name, seq) Geraldine ACGTGCTGAGGCTGCGCTAGCAT Gustavo CTGATGCTAGATGCTGATA .. py:function:: read_fastq_big(filename, subset=None, progress=True, **kwargs) Generator for fastq file without opening into memory. Yields 4 lines of a fastq file at a time (name, seq, +, error). Useful in situations where the fastq file is large and opening into RAM would crash computer. Supports subsetting with sklearn.sample_without_replacement(). :param filename: Path to fastq or fastq.gz file. :type filename: str :param subset: Number of reads to randomly subsample from file. :type subset: int :Yields: **(name, seq, qual)** (*(str, str, str)*) -- tuple of str containing name, seq and quality for entry. .. rubric:: Examples >>> for line in read_fastq_big("example.fasta"): ... name = line[0] ... seq = line[1] ... qual = line[2] ... print(name, seq) Geraldine ACGTGCTGAGGCTGCGCTAGCAT Gustavo CTGATGCTAGATGCTGATA .. py:function:: get_numreads(filename) Returns number of reads in a fastq or fastq.gz file. :param filename: Path to fastq or fastq.gz file. :type filename: str :returns: **numreads** -- Number of reads in the fastq file. :rtype: int .. rubric:: Examples >>> get_numreads("example.fastq") 124 .. py:function:: get_numreads_old(filename) Returns number of reads in a fastq file. :param filename: Path to fastq file. :type filename: str :returns: **numreads** -- Number of reads in the fastq file. :rtype: int .. rubric:: Examples >>> get_numreads("example.fastq") 124 .. py:function:: write_bc_dict(bc_dict, name) Writes bc_dict to a csv. :param bc_dict: Dictionary output from counter.create_map(). :type bc_dict: dict :param name: Filename for output csv. Ex "Library1_dictionary" :type name: str .. py:function:: read_bc_dict(filename) Reads bc_dict from a csv. :param filename: Path to csv containing a single dictionary. :type filename: str :returns: **bc_dict** -- Dictionary. :rtype: dict