labtools.adtools.counter

Module Contents

Functions

seq_counter(fastq[, design_to_use, barcoded, only_bcs])

Counts occurrences of ADs or AD-barcode pairs in a fastq file.

create_map(ad_bcs[, filter])

Converts seq_counter output containing AD-barcode pairs into a barcode-to-AD dictionary.

convert_bcs_from_map(bcs, bc_dict)

Converts barcode-only counts into AD counts using a barcode dictionary.

sort_normalizer(pair_counts, bin_counts[, thresh])

Normalize by reads per sample, reads per tile and reads per bin.

calculate_activity(df_in, bin_values[, min_max])

Calculate the activity of a normalized sort df.

main()

labtools.adtools.counter.seq_counter(fastq, design_to_use=None, barcoded=False, only_bcs=False, **kwargs)

Counts occurrences of ADs or AD-barcode pairs in a fastq file.

Parameters:
  • fastq (str) – Path to fastq or fastq.gz file.

  • design_to_use (str, default None) – Path to csv file containing ArrayDNA column.

  • barcoded (bool, default False) – Whether to count ADs with different barcodes separately.

  • only_bcs (bool or barcode map, default False) – Whether to count barcodes only. Pass True to count barcodes without a map, or pass the barcode map to use.

  • **kwargs (dict) – Additional arguments passed to pull_AD or pull_barcode.

Returns:

counts – Pandas series where indices are AD or AD/barcode sequences and values are counts.

Return type:

pandas.core.series.Series

Examples

>>> seq_counter("../exampledata/mini.fastq")
GGTTCTTCTAAATTGAGATGTGATAATAATGCTGCTGCTCATGTTAAATTGGATTCATTTCCAGCTGGTGTTAGATTTGATACATCTGATGAAGAATTGTTGGAACATTTGGCTGCTAAA    1
GAAGAATTGTTTTTACATTTGTCTGCTAAGATTGGTAGATCTTCTAGGAAACCACATCCATTCTTGGATGAATTTATTCATACTTTGGTTGAAGAAGATGGTATTTGTAGAACTCATCCA    3
dtype: int64
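The counting idea can be illustrated without the library. The following is only a minimal sketch, assuming a plain four-line FASTQ layout and counting whole read sequences; it does not reproduce the AD or barcode extraction that seq_counter delegates to pull_AD and pull_barcode. The names count_sequences and count_fastq are hypothetical and are not part of labtools.

```python
import gzip
from collections import Counter


def count_sequences(lines):
    """Count identical read sequences from an iterable of FASTQ lines.

    Assumption: every record spans exactly four lines and the
    sequence is the second line of each record.
    """
    counts = Counter()
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:
            counts[line.strip()] += 1
    return counts


def count_fastq(path):
    """Hypothetical helper: open a .fastq or .fastq.gz file and count its reads."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_sequences(handle)


# Small in-memory example with three records, two of them identical.
records = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "IIII",
    "@read3", "TTGA", "+", "IIII",
]
example_counts = count_sequences(records)
```

Here example_counts maps each distinct sequence to the number of reads carrying it, which mirrors the shape of the series shown in the example above.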

labtools.adtools.counter.create_map(ad_bcs, filter=False)

Converts seq_counter output containing AD-barcode pairs into a barcode-to-AD dictionary.

If the barcode is found with two different ADs, it is not included in the dictionary.

Parameters:
  • ad_bcs (pd.Series) – Output counts from seq_counter with barcoded=True.

  • filter (int, default False) – Number of reads below which to ignore the barcode.

Returns:

bc_dict – Dictionary with barcodes as keys and a single AD as each value.

Return type:

dict
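The mapping rule described above (keep a barcode only if it pairs with exactly one AD) can be sketched in plain Python. This is an illustration under assumptions, not the library's code: the input is assumed to be a dict keyed by (AD, barcode) tuples, the read filter is assumed to apply to each pair individually, and build_barcode_map and min_reads are hypothetical names.

```python
from collections import defaultdict


def build_barcode_map(pair_counts, min_reads=0):
    """Map each barcode to its AD, dropping ambiguous barcodes.

    pair_counts: dict of {(ad, barcode): reads}. This key layout is an
    assumption; the real seq_counter index format may differ.
    min_reads: pairs with fewer reads than this are ignored (assumption).
    """
    ads_by_barcode = defaultdict(set)
    for (ad, barcode), reads in pair_counts.items():
        if reads < min_reads:
            continue
        ads_by_barcode[barcode].add(ad)
    # Keep only barcodes that were seen with exactly one AD.
    return {
        barcode: next(iter(ads))
        for barcode, ads in ads_by_barcode.items()
        if len(ads) == 1
    }


example_pairs = {
    ("AD1", "bcA"): 12,
    ("AD2", "bcB"): 7,
    ("AD1", "bcC"): 5,  # bcC is also seen with AD2, so it is ambiguous
    ("AD2", "bcC"): 9,
    ("AD1", "bcD"): 1,  # below the read filter used in this example
}
example_map = build_barcode_map(example_pairs, min_reads=2)
```

In this example bcC is dropped because it appears with two ADs, and bcD is dropped by the read filter, leaving only bcA and bcB in example_map.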

labtools.adtools.counter.convert_bcs_from_map(bcs, bc_dict)

Converts barcode-only counts into AD counts using a barcode dictionary.

Because create_map() excludes any barcode found with two different ADs, such barcodes are absent from the dictionary.

Parameters:
  • bcs (pd.Series) – Output counts from seq_counter with only_bcs=True.

  • bc_dict (dict) – Dictionary from create_map() with barcodes as keys and a single AD as each value.

Returns:

converted – Pandas series where indices are AD sequences and values are counts.

Return type:

pd.Series
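A rough sketch of the conversion step, using plain dicts rather than pandas: barcode counts are summed under the AD that each barcode maps to. The function name barcodes_to_ad_counts is hypothetical, and skipping barcodes that are missing from the map is an assumption, since the documentation does not state how unmapped barcodes are handled.

```python
from collections import Counter


def barcodes_to_ad_counts(barcode_counts, bc_dict):
    """Sum barcode read counts per AD using a barcode-to-AD map."""
    ad_counts = Counter()
    for barcode, reads in barcode_counts.items():
        ad = bc_dict.get(barcode)
        if ad is None:
            # Assumption: barcodes absent from the map are skipped.
            continue
        ad_counts[ad] += reads
    return dict(ad_counts)


example_bc_counts = {"bcA": 10, "bcB": 4, "bcE": 3, "bcX": 8}
example_bc_dict = {"bcA": "AD1", "bcB": "AD2", "bcE": "AD1"}
example_ad_counts = barcodes_to_ad_counts(example_bc_counts, example_bc_dict)
```

Here bcA and bcE both map to AD1, so their reads are added together, while bcX has no entry in the map and contributes nothing.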

labtools.adtools.counter.sort_normalizer(pair_counts, bin_counts, thresh=10)

Normalize by reads per sample, reads per tile and reads per bin.

Parameters:
  • pair_counts (list of pandas.core.series.Series) – List of pandas series where indices are AD or AD/barcode sequences and values are counts.

  • bin_counts (list) – Number of cells in each bin, in the same order as pair_counts.

  • thresh (int, default 10) – Number of reads above which to count the unique sequence.

Returns:

  • df (pandas.DataFrame) – Pandas dataframe containing the normalized read counts.

  • numreads (pandas.DataFrame) – Total read counts for each unique sequence.

  • reads (pandas.DataFrame) – Read counts per bin for each unique sequence.

Examples

>>> sort_normalizer([count1, count2], [1000,1000])
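The exact formula used by sort_normalizer is not given here, so the following is only one plausible reading of "reads per sample, reads per tile and reads per bin", written with plain dicts. It assumes that each bin's counts are divided by that bin's total reads, weighted by the bin's share of sorted cells, and then rescaled so that each sequence sums to 1 across bins. The name normalize_sort and every step of the arithmetic are assumptions, not the library's method.

```python
def normalize_sort(pair_counts, bin_counts, thresh=10):
    """Return {sequence: [fraction per bin]} under the assumptions above.

    pair_counts: list of {sequence: reads} dicts, one per bin.
    bin_counts: number of cells sorted into each bin, same order.
    thresh: sequences need more than this many total reads to be kept.
    """
    total_cells = sum(bin_counts)
    # Total reads per bin; "or 1" avoids dividing by zero for an empty bin.
    sample_totals = [sum(counts.values()) or 1 for counts in pair_counts]
    sequences = set().union(*pair_counts)

    normalized = {}
    for seq in sequences:
        reads = [counts.get(seq, 0) for counts in pair_counts]
        if sum(reads) <= thresh:
            continue
        # Read fraction within each bin, weighted by the bin's cell share.
        weighted = [
            (r / total) * (cells / total_cells)
            for r, total, cells in zip(reads, sample_totals, bin_counts)
        ]
        row_total = sum(weighted)
        if row_total == 0:
            continue
        # Rescale so each sequence's values sum to 1 across bins.
        normalized[seq] = [w / row_total for w in weighted]
    return normalized


bin1 = {"AD1": 90, "AD2": 10, "AD3": 2}
bin2 = {"AD1": 30, "AD2": 70, "AD3": 1}
example_normalized = normalize_sort([bin1, bin2], [1000, 1000], thresh=10)
```

With these toy counts, AD3 falls below the read threshold and is dropped, and each remaining sequence ends up with a per-bin distribution that sums to 1.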

labtools.adtools.counter.calculate_activity(df_in, bin_values, min_max=False)

Calculate the activity of a normalized sort df.

Parameters:
  • df_in (pandas.DataFrame) – DataFrame output of sort_normalizer().

  • bin_values (list) – List of mean or median fluorescence values per bin in the same order as the pair counts.

  • min_max (bool, default False) – Whether to min-max scale the activity so that the minimum is 0 and the maximum is 1.

Returns:

df – Pandas dataframe containing the activity values per sequence or sequence-barcode pair.

Return type:

pandas.DataFrame
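How the activity is derived from the bin values is not spelled out, so the sketch below shows one common approach as an assumption: the activity of a sequence is the average of the bin fluorescence values, weighted by the sequence's normalized distribution across bins, with optional min-max scaling. The name activity_from_bins and the dict-based input are hypothetical and stand in for the DataFrame that the real function accepts.

```python
def activity_from_bins(normalized, bin_values, min_max=False):
    """Weighted-mean activity per sequence (assumed formula).

    normalized: {sequence: [fraction per bin]}, fractions summing to 1.
    bin_values: mean or median fluorescence per bin, same order.
    """
    activity = {
        seq: sum(f * v for f, v in zip(fractions, bin_values))
        for seq, fractions in normalized.items()
    }
    if min_max and activity:
        low = min(activity.values())
        span = max(activity.values()) - low
        if span > 0:
            # Rescale so the lowest activity is 0 and the highest is 1.
            activity = {seq: (a - low) / span for seq, a in activity.items()}
    return activity


example_fractions = {"AD1": [0.75, 0.25], "AD2": [0.125, 0.875]}
example_bin_values = [100.0, 1000.0]
example_activity = activity_from_bins(example_fractions, example_bin_values)
example_scaled = activity_from_bins(
    example_fractions, example_bin_values, min_max=True
)
```

In this toy case AD2 sits mostly in the brighter bin and so receives the higher activity; with min_max=True the two sequences are mapped to the ends of the 0 to 1 range.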

labtools.adtools.counter.main()