labtools.adtools.counter
Module Contents
Functions
seq_counter | Counts occurrences of ADs or AD-barcode pairs in a fastq file.
create_map | Converts output of seq_counter with AD,bc pairs to a dict map.
convert_bcs_from_map | Takes bc only data and uses a barcode dictionary to return AD counts.
sort_normalizer | Normalize by reads per sample, reads per tile and reads per bin.
calculate_activity | Calculate the activity of a normalized sort df.
- labtools.adtools.counter.seq_counter(fastq, design_to_use=None, barcoded=False, only_bcs=False, **kwargs)[source]
Counts occurrences of ADs or AD-barcode pairs in a fastq file.
- Parameters:
fastq (str) – Path to fastq or fastq.gz file.
design_to_use (str, default None) – Path to csv file containing ArrayDNA column.
barcoded (bool, default False) – Whether to count ADs with different barcodes separately.
only_bcs (default False) – True, False, or the barcode map to use. If True, only barcodes are counted and no map is applied.
**kwargs (dict) – Add additional arguments to pass to pull_AD or pull_barcode.
- Returns:
counts – Pandas series where indices are AD or AD/barcode sequences and values are counts.
- Return type:
pandas.core.series.Series
Examples
>>> seq_counter("../exampledata/mini.fastq")
GGTTCTTCTAAATTGAGATGTGATAATAATGCTGCTGCTCATGTTAAATTGGATTCATTTCCAGCTGGTGTTAGATTTGATACATCTGATGAAGAATTGTTGGAACATTTGGCTGCTAAA    1
GAAGAATTGTTTTTACATTTGTCTGCTAAGATTGGTAGATCTTCTAGGAAACCACATCCATTCTTGGATGAATTTATTCATACTTTGGTTGAAGAAGATGGTATTTGTAGAACTCATCCA    3
dtype: int64
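The counting step can be pictured with a minimal sketch that does not call labtools. It assumes a standard four-line FASTQ record and counts each raw read sequence as-is; the AD and barcode extraction done by pull_AD and pull_barcode is not reproduced here. The helper name count_reads is hypothetical.

```python
import io
from collections import Counter

import pandas as pd


def count_reads(handle):
    """Count identical read sequences in an open FASTQ text handle.

    Hypothetical helper for illustration; not part of labtools.
    """
    counts = Counter()
    for i, line in enumerate(handle):
        # In a four-line FASTQ record the sequence is the second line.
        if i % 4 == 1:
            counts[line.strip()] += 1
    # Return a Series indexed by sequence, like the documented return type.
    return pd.Series(counts, dtype="int64")


# Three tiny made-up records, two of which share the same sequence.
fastq_text = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTGA\n+\nIIII\n"
counts = count_reads(io.StringIO(fastq_text))
print(counts)
```

For a real file, the same function could be given a handle from open() or gzip.open(path, "rt").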
- labtools.adtools.counter.create_map(ad_bcs, filter=False)[source]
Converts output of seq_counter with AD,bc pairs to a dict map.
If the barcode is found with two different ADs, it is not included in the dictionary.
- Parameters:
ad_bcs (pd.Series) – Output counts from seq_counter with barcoded=True.
filter (int, default False) – Number of reads below which to ignore the barcode.
- Returns:
bc_dict – Dictionary with barcodes as keys and a single AD as each value.
- Return type:
dict
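The rule that a barcode seen with two different ADs is left out of the map can be sketched as follows. This is an illustration under assumptions, not the labtools implementation: the input is assumed to be a Series with a two-level MultiIndex named ("AD", "bc"), the read filter is assumed to apply before the ambiguity check, and build_bc_map and min_reads are hypothetical names.

```python
import pandas as pd


def build_bc_map(ad_bcs, min_reads=0):
    """Map each barcode to its AD, skipping barcodes seen with several ADs.

    Hypothetical helper; assumes a MultiIndex with levels named "AD" and "bc".
    """
    # Drop pairs with fewer than min_reads reads (assumed filter behaviour).
    pairs = ad_bcs[ad_bcs >= min_reads].reset_index(name="count")
    # Count how many distinct ADs each barcode was observed with.
    n_ads = pairs.groupby("bc")["AD"].nunique()
    unambiguous = n_ads[n_ads == 1].index
    kept = pairs[pairs["bc"].isin(unambiguous)]
    return dict(zip(kept["bc"], kept["AD"]))


# Made-up counts: bcB appears with both AD1 and AD2, so it is ambiguous.
index = pd.MultiIndex.from_tuples(
    [("AD1", "bcA"), ("AD1", "bcB"), ("AD2", "bcB"), ("AD2", "bcC")],
    names=["AD", "bc"],
)
ad_bcs = pd.Series([12, 7, 5, 9], index=index)
bc_map = build_bc_map(ad_bcs)
print(bc_map)  # {'bcA': 'AD1', 'bcC': 'AD2'}
```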
- labtools.adtools.counter.convert_bcs_from_map(bcs, bc_dict)[source]
Takes bc only data and uses a barcode dictionary to return AD counts.
If the barcode is found with two different ADs, it is not included in the dictionary.
- Parameters:
bcs (pd.Series) – Output counts from seq_counter with only_bcs=True.
bc_dict (dict) – Dictionary from create_map() with barcodes as keys and a single AD as each value.
- Returns:
converted – Pandas series where indices are AD sequences and values are counts.
- Return type:
pd.Series
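Converting barcode counts to AD counts amounts to relabelling each barcode with its AD and summing. The sketch below shows one way to do this with pandas; it assumes that barcodes missing from the dictionary are simply dropped, which the documentation above does not state, and bcs_to_ads is a hypothetical name.

```python
import pandas as pd


def bcs_to_ads(bcs, bc_dict):
    """Sum barcode counts per AD using a barcode-to-AD dictionary.

    Hypothetical helper; barcodes absent from bc_dict are dropped (assumption).
    """
    known = bcs[bcs.index.isin(list(bc_dict))]
    # Relabel every barcode with its AD, then add up counts sharing an AD.
    return known.rename(index=bc_dict).groupby(level=0).sum()


# Made-up data: bcA and bcE both belong to AD1, bcD is not in the map.
bcs = pd.Series({"bcA": 4, "bcC": 6, "bcD": 1, "bcE": 3})
bc_dict = {"bcA": "AD1", "bcC": "AD2", "bcE": "AD1"}
converted = bcs_to_ads(bcs, bc_dict)
print(converted)
```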
- labtools.adtools.counter.sort_normalizer(pair_counts, bin_counts, thresh=10)[source]
Normalize by reads per sample, reads per tile and reads per bin.
- Parameters:
pair_counts (list of pandas.core.series.Series) – List of pandas series where indices are AD or AD/barcode sequences and values are counts.
bin_counts (list) – List of number of cells per bin in the same order as the pair counts.
thresh (int, default 10) – Number of reads above which to count the unique sequence.
- Returns:
df (pandas.DataFrame) – Pandas dataframe containing the normalized read counts.
numreads (pandas.DataFrame) – Total read counts for each unique sequence.
reads (pandas.DataFrame) – Read counts per bin for each unique sequence.
Examples
>>> sort_normalizer([count1, count2], [1000,1000])
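The documentation names three normalizations without giving formulas, so the sketch below is only one plausible reading, not the labtools implementation. It assumes: counts become fractions of each bin's reads (per sample), each bin is weighted by its share of sorted cells (per bin), and each sequence's row is rescaled to sum to 1 (per tile). The threshold is assumed to apply to total reads across bins, and normalize_sort is a hypothetical name.

```python
import pandas as pd


def normalize_sort(pair_counts, bin_counts, thresh=10):
    """Illustrative normalization of per-bin read counts (assumed scheme)."""
    # One column per bin; sequences absent from a bin get 0 reads.
    reads = pd.concat(pair_counts, axis=1).fillna(0)
    reads.columns = range(len(pair_counts))
    numreads = reads.sum(axis=1)
    # Keep sequences with more than `thresh` total reads (assumption).
    keep = numreads > thresh
    reads, numreads = reads[keep], numreads[keep]
    # Per sample: fraction of each bin's reads taken by each sequence.
    df = reads / reads.sum(axis=0)
    # Per bin: weight each bin by its share of the sorted cells.
    weights = pd.Series(bin_counts, index=reads.columns) / sum(bin_counts)
    df = df * weights
    # Per tile: rescale each sequence's row so it sums to 1.
    df = df.div(df.sum(axis=1), axis=0)
    return df, numreads, reads


# Made-up counts for two bins; AD3 falls below the read threshold.
count1 = pd.Series({"AD1": 30, "AD2": 10, "AD3": 2})
count2 = pd.Series({"AD1": 10, "AD2": 30})
df, numreads, reads = normalize_sort([count1, count2], [1000, 1000])
print(df)
```

With this scheme, each row of df is the estimated distribution of one sequence across the bins.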
- labtools.adtools.counter.calculate_activity(df_in, bin_values, min_max=False)[source]
Calculate the activity of a normalized sort df.
- Parameters:
df_in (pandas.DataFrame) – Dataframe output of sort_normalizer()
bin_values (list) – List of mean or median fluorescence values per bin in the same order as the pair counts.
min_max (bool, default False) – Whether to normalize the activity using min 0 max 1.
- Returns:
df – Pandas dataframe containing the activity values per sequence or sequence-barcode pair.
- Return type:
pandas.DataFrame
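One common way to turn a normalized sort table into an activity score is a weighted mean: each sequence's fraction in a bin is multiplied by that bin's fluorescence value and the products are summed. The sketch below assumes this formula and a standard min-max rescaling; neither is confirmed by the documentation above, and activity_from_bins and the "Activity" column name are hypothetical.

```python
import pandas as pd


def activity_from_bins(df_in, bin_values, min_max=False):
    """Weighted mean of bin fluorescence values per sequence (assumed formula)."""
    values = pd.Series(bin_values, index=df_in.columns)
    # Multiply each bin column by its fluorescence value, then sum across bins.
    activity = df_in.mul(values, axis=1).sum(axis=1)
    if min_max:
        # Rescale so the lowest activity is 0 and the highest is 1.
        activity = (activity - activity.min()) / (activity.max() - activity.min())
    return activity.to_frame("Activity")


# Made-up normalized table: rows are sequences, columns are bins.
norm = pd.DataFrame({0: [0.75, 0.25], 1: [0.25, 0.75]}, index=["AD1", "AD2"])
act = activity_from_bins(norm, [100, 1000])
scaled = activity_from_bins(norm, [100, 1000], min_max=True)
print(act)
```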