montebarcode package

Submodules

montebarcode.checks module

Functions to check that barcodes conform.

montebarcode.checks.Distance(min_distance: int = 2, use_levenshtein: bool = True) -> Callable[[str, Iterable], bool]

Create a distance checking function.

Uses the parameters to produce a function which takes as arguments a candidate barcode and a working list of barcodes, and returns True if the candidate is closer than min_distance to any barcode in the working list.

Parameters:
  • min_distance (int) – The minimum distance allowed among all barcodes.

  • use_levenshtein (bool, optional) – Whether to use the Levenshtein distance. Default: True.

Returns:

Checking function.

Return type:

function

Examples

>>> Distance(1)('ATA', ['TCG', 'AAT'])
False
>>> Distance(2)('AAA', ['TCG', 'AAT'])
True
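
The checker-factory pattern described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the names levenshtein and distance_check are hypothetical, and only the Levenshtein case is covered.

```python
from typing import Callable, Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-wise DP."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def distance_check(min_distance: int = 2) -> Callable[[str, Iterable[str]], bool]:
    """Hypothetical factory: the returned checker is True when rejecting."""
    def check(candidate: str, working: Iterable[str]) -> bool:
        # Reject if any barcode already accepted is closer than min_distance.
        return any(levenshtein(candidate, bc) < min_distance for bc in working)
    return check


print(distance_check(1)("ATA", ["TCG", "AAT"]))
print(distance_check(2)("AAA", ["TCG", "AAT"]))
```

Returning True for "reject" mirrors the convention used by every checker in this module.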
montebarcode.checks.GCcontent(min: float = 0.35, max: float = 0.65) -> Callable[[str, Iterable], bool]

Create GC content checking function.

Produces a function which takes as arguments a candidate barcode and a working list of barcodes, and returns True if the candidate's GC content is outside the bounds. (Working list is ignored.)

Parameters:
  • min (float) – Minimum acceptable proportion of GC content.

  • max (float) – Maximum acceptable proportion of GC content.

Returns:

Checking function.

Return type:

function

Examples

>>> GCcontent()('AATT', [])
True
>>> GCcontent()('AACG', [])
False
>>> GCcontent()('GGCG', [])
True
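
A GC content check of this kind might look like the sketch below. The names gc_fraction and gc_check are hypothetical, and treating the bounds as inclusive is an assumption; the library may handle the edges differently.

```python
from typing import Callable, Iterable


def gc_fraction(seq: str) -> float:
    """Proportion of G and C letters in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def gc_check(lo: float = 0.35, hi: float = 0.65) -> Callable[[str, Iterable[str]], bool]:
    """Hypothetical factory: the returned checker is True when rejecting."""
    def check(candidate: str, working: Iterable[str]) -> bool:
        # Assumption: bounds are inclusive, so exactly lo or hi is accepted.
        return not (lo <= gc_fraction(candidate) <= hi)
    return check


for barcode in ("AATT", "AACG", "GGCG"):
    print(barcode, gc_fraction(barcode), gc_check()(barcode, []))
```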
montebarcode.checks.Homopolymer(length: int = 4) -> Callable[[str, Iterable], bool]

Create a homopolymer checking function.

Produces a function which takes as arguments a candidate barcode and a working list of barcodes, and returns True if the candidate contains a homopolymer of the given length or longer. (Working list is ignored.)

Parameters:

length (int) – Minimum length of homopolymer to detect.

Returns:

Checking function.

Return type:

function

Examples

>>> Homopolymer(3)('AAAT', [])
True
>>> Homopolymer(4)('AAAT', [])
False
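
One way to detect such runs is a regular expression with a backreference. The sketch below is illustrative only; homopolymer_check is a hypothetical name and the library may detect runs differently.

```python
import re
from typing import Callable, Iterable


def homopolymer_check(length: int = 4) -> Callable[[str, Iterable[str]], bool]:
    """Hypothetical factory: the returned checker is True when rejecting."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # (.) captures one letter; \1{k,} requires at least k further copies of it,
    # so k = length - 1 matches a run of `length` or more identical letters.
    pattern = re.compile(r"(.)\1{%d,}" % (length - 1))

    def check(candidate: str, working: Iterable[str]) -> bool:
        return pattern.search(candidate) is not None
    return check


print(homopolymer_check(3)("AAAT", []))
print(homopolymer_check(4)("AAAT", []))
```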
montebarcode.checks.Identities() -> Callable[[str, Iterable], bool]

Create an identity checking function.

Produces a function which takes as arguments a candidate barcode and a working list of barcodes, and returns True if the candidate or its reverse complement is in the working list.

Returns:

Checking function.

Return type:

function

Examples

>>> Identities()('AAA', ['TCG', 'AAT'])
False
>>> Identities()('AAA', ['TCG', 'AAA'])
True
>>> Identities()('AAA', ['TCG', 'TTT'])
True
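
The reverse-complement comparison can be sketched as below. The names reverse_complement and identity_check are hypothetical, and the sketch assumes an uppercase ATCG alphabet.

```python
from typing import Callable, Iterable

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def identity_check() -> Callable[[str, Iterable[str]], bool]:
    """Hypothetical factory: the returned checker is True when rejecting."""
    def check(candidate: str, working: Iterable[str]) -> bool:
        seen = set(working)
        return candidate in seen or reverse_complement(candidate) in seen
    return check


print(reverse_complement("AAA"))
print(identity_check()("AAA", ["TCG", "TTT"]))
```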
montebarcode.checks.IlluminaColorBalance(green_4ch='GT', red_4ch='AC', green_2ch='AT', red_blue_2ch='AC', image1_1ch='AT', image2_1ch='C', alphabet='ATCG') -> Callable[[str, Iterable], bool]

Create a color balance checking function.

Produces a function which takes as arguments a candidate barcode and a working list of barcodes, and returns True if adding the barcode to the working set would give suboptimal color balance across a range of Illumina SBS platforms.

This is a very stringent check in order to ensure barcodes will work on multiple platforms.

If there are 4 or fewer barcodes in total, the checking function will only check for two dark bases (for 1 channel chemistry) at the 5’ end.

If there are at least 5 barcodes in total, the checking function will test for color balance among the channels and images.

Parameters:
  • green_4ch (str, optional) – Bases detected by the green channel in 4 channel chemistry. Default: ‘GT’.

  • red_4ch (str, optional) – Bases detected by the red channel in 4 channel chemistry. Default: ‘AC’.

  • green_2ch (str, optional) – Bases detected by the green channel in 2 channel chemistry. Default: ‘AT’.

  • red_blue_2ch (str, optional) – Bases detected by the red/blue channel in 2 channel chemistry. Default: ‘AC’.

  • image1_1ch (str, optional) – Bases detected by the first image in 1 channel chemistry. Default: ‘AT’.

  • image2_1ch (str, optional) – Bases detected by the second image in 1 channel chemistry. Default: ‘C’.

  • alphabet (str, optional) – Letters comprising the alphabet. Used to infer the dark bases for each chemistry. Default: ‘ATCG’.

Returns:

Checking function.

Return type:

function

Examples

>>> IlluminaColorBalance()('GGAT', ['TCGC', 'AAAG'])
True
>>> IlluminaColorBalance()('AAAT', ['TCGC', 'AAAG'])
False
>>> IlluminaColorBalance()('AAAT', ['TCGC', 'ACAG', 'TGGC', 'ATCG'])
True
>>> IlluminaColorBalance()('AAAT', ['TCGC', 'CCAG', 'TGGC', 'ATCG'])
False
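
The alphabet parameter is described as being used to infer dark bases. One plausible reading, which is an assumption rather than the library's confirmed logic, is that a dark base is any letter of the alphabet that no channel or image detects. The helper dark_bases below is hypothetical.

```python
def dark_bases(alphabet: str, *detected: str) -> str:
    """Letters of the alphabet that none of the detected groups contain."""
    lit = set("".join(detected))
    return "".join(base for base in alphabet if base not in lit)


alphabet = "ATCG"
# The arguments below are the documented defaults for each chemistry.
print(repr(dark_bases(alphabet, "GT", "AC")))  # 4 channel: green, red
print(repr(dark_bases(alphabet, "AT", "AC")))  # 2 channel: green, red/blue
print(repr(dark_bases(alphabet, "AT", "C")))   # 1 channel: image 1, image 2
```

Under this reading the defaults give no dark base for 4 channel chemistry and G as the dark base for 2 channel and 1 channel chemistry.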
montebarcode.checks.Palindrome() -> Callable[[str, Iterable], bool]

Create a palindrome checking function.

Produces a function which takes as arguments a candidate barcode and a working list of barcodes, and returns True if the candidate is palindromic, that is, identical to its own reverse complement. (Working list is ignored.)

Returns:

Checking function.

Return type:

function

Examples

>>> Palindrome()('AAA', [])
False
>>> Palindrome()('AATT', [])
True
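
The examples above are consistent with the DNA sense of "palindrome" (a sequence equal to its reverse complement) rather than the plain-text sense. A minimal sketch, with the hypothetical name is_revcomp_palindrome and an assumed uppercase ATCG alphabet:

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def is_revcomp_palindrome(seq: str) -> bool:
    """True if seq reads the same as its reverse complement."""
    return seq == seq.translate(COMPLEMENT)[::-1]


print(is_revcomp_palindrome("AAA"))   # reverse complement is TTT
print(is_revcomp_palindrome("AATT"))  # reverse complement is AATT
```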
montebarcode.checks.RestrictionSites(n: int = 0) -> Callable[[str, Iterable], bool]

Create a Type IIS restriction site checking function.

Produces a function which takes as arguments a candidate barcode and a working list of barcodes, and returns True if the candidate contains a Type IIS restriction site commonly used in Golden Gate cloning. (Working list is ignored.)

Parameters:

n (int) – Maximum number of restriction sites to tolerate.

Returns:

Checking function.

Return type:

function

Examples

>>> RestrictionSites()('AAATGGTCTC', [])
True
>>> RestrictionSites(1)('AAATGGTCTC', [])
False
>>> RestrictionSites()('AAATGCTCTC', [])
False
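
A site-counting check could be sketched as follows. The site list is an assumption: it contains only the BsaI (GGTCTC) and BsmBI (CGTCTC) recognition sequences, and the library's actual list is not documented here. Scanning both strands is also an assumption, and the names count_sites and restriction_check are hypothetical.

```python
from typing import Callable, Iterable, Sequence

# Assumed site list: BsaI and BsmBI recognition sequences.
SITES = ("GGTCTC", "CGTCTC")
_COMPLEMENT = str.maketrans("ATCG", "TAGC")


def count_sites(seq: str, sites: Sequence[str] = SITES) -> int:
    """Count occurrences of each site on the given strand and its reverse complement."""
    total = 0
    for site in sites:
        reverse = site.translate(_COMPLEMENT)[::-1]
        total += seq.count(site)
        if reverse != site:  # avoid double counting self-complementary sites
            total += seq.count(reverse)
    return total


def restriction_check(n: int = 0) -> Callable[[str, Iterable[str]], bool]:
    """Hypothetical factory: reject when more than n sites are present."""
    def check(candidate: str, working: Iterable[str]) -> bool:
        return count_sites(candidate) > n
    return check


print(restriction_check()("AAATGGTCTC", []))
print(restriction_check(1)("AAATGGTCTC", []))
```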
montebarcode.checks.base_usage(x: Iterable[str]) -> Mapping

Calculate the proportional base usage for a set of barcodes.

Returns a dictionary mapping position along sequence to the distribution among the bases.

Does not check for legitimate DNA alphabet.

Parameters:

x (Iterable) – Set of barcodes to check.

Returns:

Base usage per position.

Return type:

dict

Examples

>>> base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[0]['A']
0.25
>>> base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[1]['G']
0
>>> base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[2]['A']
0.5
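
Per-position base usage can be computed by transposing the barcodes into columns. The sketch below uses the hypothetical name base_usage_sketch; truncating to the shortest barcode and pre-filling an ATCG alphabet are assumptions, not documented library behaviour.

```python
from collections import Counter
from typing import Dict, Iterable


def base_usage_sketch(barcodes: Iterable[str], alphabet: str = "ATCG") -> Dict[int, Dict[str, float]]:
    """Map each position to the proportion of barcodes carrying each letter."""
    usage: Dict[int, Dict[str, float]] = {}
    # zip(*...) yields one column per position and stops at the shortest barcode.
    for position, column in enumerate(zip(*barcodes)):
        counts = Counter(column)
        letters = sorted(set(alphabet) | set(column))
        usage[position] = {base: counts[base] / len(column) for base in letters}
    return usage


usage = base_usage_sketch(["AAA", "TTT", "GCT", "CCA"])
print(usage[0]["A"], usage[1]["G"], usage[2]["A"])
```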
montebarcode.checks.make_checks(barcodes: Iterable[str], n: int, checks: Iterable[Callable[[str, Sequence], bool]], max_rejection_rate: float = 1.0, max_tries: int = 10000, initial: Sequence[str] | None = None, quiet: bool = False) -> Sequence[dict, int, Sequence[str]]

Check barcode list conforms to the checks.

Keeps a tally of rejection rate and rejection reasons.

Parameters:
  • barcodes (Iterable[str]) – List or generator of barcodes to check.

  • n (int) – Minimum number of barcodes to accept. Stops when this is reached or when all barcodes have been checked.

  • checks (Iterable) – List of functions which take a candidate barcode and previous barcodes as arguments and return True if the candidate should be rejected.

  • max_rejection_rate (float, optional) – Rejection rate above which to terminate. Default: 1.

  • max_tries (int, optional) – Number of barcodes to try before enforcing max_rejection_rate. Default: 10000.

  • initial (list, optional) – An initial list to which new barcodes are appended.

  • quiet (bool, optional) – Whether to suppress progress reporting. Default: False.

Returns:

  • dict – Counts of rejections based on failing each check.

  • int – Number of barcode candidates tried.

  • list – List of barcodes passing the checks, preceded by any barcodes supplied in initial.

Raises:

ValueError – When rejection rate goes above the threshold.

Examples

>>> make_checks(['AAAT', 'TCGC', 'ACAG', 'TGGC', 'ATCG'], 5, checks=[IlluminaColorBalance()], quiet=True)  
(Counter({'color_balance': 1}), 5, ['AAAT', 'TCGC', 'ACAG', 'TGGC'])
>>> make_checks(['AAAT', 'TCGC', 'CCAG', 'TGGC', 'ATCG'], 5, checks=[IlluminaColorBalance()], quiet=True)  
(Counter(), 5, ['AAAT', 'TCGC', 'CCAG', 'TGGC', 'ATCG'])
>>> make_checks(['AAAT', 'TCGC', 'ACAG', 'TGGC', 'ATCG'], 4, checks=[IlluminaColorBalance()], quiet=True)  
(Counter(), 4, ['AAAT', 'TCGC', 'ACAG', 'TGGC'])
>>> #
>>> checks = [Homopolymer(), Palindrome()]
>>> make_checks(['AAAAT', 'CCCGGG', 'ATCGCG', 'GCCGAT'], 4, checks=checks, quiet=True)  
(Counter({'homopolymer': 1, 'palindrome': 1}), 4, ['ATCGCG', 'GCCGAT'])
>>> make_checks(['AAAAT', 'CCCGGG', 'ATCGCG', 'GCCGAT'], 1, checks=checks, quiet=True)  
(Counter({'homopolymer': 1, 'palindrome': 1}), 3, ['ATCGCG'])
>>> #
>>> make_checks(['AAAAT', 'CCCGGG', 'ATCGCG', 'GCCGAT'], 1, checks=checks, initial=['ATCGCG'], quiet=True)   
(Counter({'homopolymer': 1, 'palindrome': 1}), 3, ['ATCGCG', 'ATCGCG'])
>>> checks = [Homopolymer(), Palindrome(), Identities()]
>>> make_checks(['AAAAT', 'CCCGGG', 'ATCGCG', 'GCCGAT'], 1, checks=checks, initial=['ATCGCG'], quiet=True)   
(Counter({'homopolymer': 1, 'palindrome': 1, 'identity': 1}), 4, ['ATCGCG', 'GCCGAT'])
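
The accept-or-reject loop with a rejection tally can be sketched as below. This is not the library's code: run_checks is a hypothetical name, checks are passed as a label-to-function mapping for simplicity, and counting only the first failing check per candidate is an assumption.

```python
from collections import Counter
from typing import Callable, Iterable, List, Mapping, Optional, Sequence, Tuple

Checker = Callable[[str, Sequence[str]], bool]


def run_checks(
    barcodes: Iterable[str],
    n: int,
    checks: Mapping[str, Checker],
    max_rejection_rate: float = 1.0,
    max_tries: int = 10000,
    initial: Optional[Sequence[str]] = None,
) -> Tuple[Counter, int, List[str]]:
    accepted = list(initial or [])
    target = len(accepted) + n  # n counts new barcodes beyond the initial list
    rejections: Counter = Counter()
    tried = 0
    for candidate in barcodes:
        if len(accepted) >= target:
            break
        tried += 1
        for label, check in checks.items():
            if check(candidate, accepted):  # True means reject
                rejections[label] += 1
                break
        else:
            accepted.append(candidate)
        rate = sum(rejections.values()) / tried
        if tried >= max_tries and rate > max_rejection_rate:
            raise ValueError(f"Rejection rate {rate:.2f} exceeds {max_rejection_rate}")
    return rejections, tried, accepted


# Two toy checkers, defined here only for the demonstration.
toy_checks = {
    "homopolymer": lambda bc, acc: any(base * 4 in bc for base in "ATCG"),
    "duplicate": lambda bc, acc: bc in acc,
}
print(run_checks(["AAAAT", "ATCG", "ATCG", "GGCA"], 2, toy_checks))
```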
montebarcode.checks.minmax_distance(x: Sequence[str], use_levenshtein: bool = True) -> Sequence[int]

Get minimum and maximum distances among a set of barcodes.

This compares all the pairwise distances except for self-self (diagonal of the distance matrix). If there are repeated barcodes in the list, then the minimum distance will be zero.

Parameters:
  • x (list | tuple | set) – Set of barcodes to check.

  • use_levenshtein (bool, optional) – Whether to use the Levenshtein distance. Default: True.

Returns:

Two integers: the first is the minimum distance, the second is the maximum distance.

Return type:

tuple

Examples

>>> minmax_distance(['AAA', 'AAA'])
(0, 0)
>>> minmax_distance(['AAA', 'TCG', 'AAT'])
(1, 3)
>>> minmax_distance(['AAA', 'TCG', 'AAAT'], use_levenshtein=False)
(0, 3)
>>> minmax_distance(['AAA', 'TCG', 'AAAT'])
(1, 4)
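
The pairwise comparison described above (every pair except self-self) maps directly onto itertools.combinations. The sketch below covers only the non-Levenshtein case, with hypothetical names; comparing unequal-length strings over their shared prefix is an assumption chosen because it agrees with the (0, 3) example above.

```python
from itertools import combinations
from typing import Sequence, Tuple


def mismatches(a: str, b: str) -> int:
    """Count differing positions; zip stops at the shorter string."""
    return sum(x != y for x, y in zip(a, b))


def minmax_mismatches(barcodes: Sequence[str]) -> Tuple[int, int]:
    """Smallest and largest distance over all unordered pairs."""
    distances = [mismatches(a, b) for a, b in combinations(barcodes, 2)]
    if not distances:
        raise ValueError("need at least two barcodes")
    return min(distances), max(distances)


print(minmax_mismatches(["AAA", "TCG", "AAT"]))
print(minmax_mismatches(["AAA", "TCG", "AAAT"]))
```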

montebarcode.cli module

Command-line interface for monte-barcode.

montebarcode.cli.check_barcodes(args: Namespace) -> None
montebarcode.cli.generate(args: Namespace) -> None
montebarcode.cli.main() -> None
montebarcode.cli.sort_barcodes(args: Namespace) -> None

montebarcode.generate module

Functions for generating random barcodes.

montebarcode.generate.codon_barcodes(seq: str, ordered: bool = False) -> Generator[str]

Generate a stream of barcodes encoding an amino acid sequence.

Makes no consideration of codon usage preferences. The ordered flag is ignored when the number of possible combinations exceeds 100,000.

Parameters:
  • seq (str) – Amino acid sequence to encode, in one-letter code.

  • ordered (bool) – Whether to produce barcodes in sorted order. Default: False.

Yields:

sequence (str) – DNA sequence encoding amino acid sequence.

Examples

>>> list(codon_barcodes("L", ordered=True))  
['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG']
>>> list(codon_barcodes("L"))  
['TTA', 'CTT', 'CTA', 'CTG', 'CTC', 'TTG']
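
Enumerating every DNA sequence that encodes a peptide is a Cartesian product over the synonymous codons of each residue. The sketch below uses a deliberately partial codon table (four amino acids only) and the hypothetical name encode; it illustrates the ordered case and is not the library's implementation.

```python
from itertools import product
from typing import Dict, Iterator, List

# Partial standard codon table, for illustration only.
CODONS: Dict[str, List[str]] = {
    "M": ["ATG"],
    "W": ["TGG"],
    "K": ["AAA", "AAG"],
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
}


def encode(seq: str) -> Iterator[str]:
    """Yield every DNA sequence encoding seq, in table order."""
    for combo in product(*(CODONS[residue] for residue in seq)):
        yield "".join(combo)


print(list(encode("MK")))
print(sum(1 for _ in encode("LLK")))  # 6 * 6 * 2 combinations
```

The number of combinations grows multiplicatively with sequence length, which is presumably why ordering is skipped for large outputs.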
montebarcode.generate.infinite_barcodes(length: int = 12, alphabet: Iterable[str] | Iterable[Mapping] = 'ATCG', check_used: bool = True) -> Generator[str]

Generate a stream of random barcodes by randomly sampling from an alphabet.

Not actually infinite by default: with check_used=True only unique sequences are produced, so the stream ends once every possible sequence has been generated. Set check_used=False to produce barcodes forever, and make sure your loop has an end condition.

Parameters:
  • length (int) – Length of barcode to generate.

  • alphabet (Iterable, optional) – Set of letters from which to sample.

  • check_used (bool) – Only produce unique sequences. Default: True.

Yields:

sequence (str) – Sequence with desired length.

Examples

>>> sorted(infinite_barcodes(2))  
['AA', 'AC', 'AG', 'AT', 'CA', 'CC', 'CG', 'CT', 'GA', 'GC', 'GG', 'GT', 'TA', 'TC', 'TG', 'TT']
>>> sorted(infinite_barcodes(2, alphabet='cats'))  
['aa', 'ac', 'as', 'at', 'ca', 'cc', 'cs', 'ct', 'sa', 'sc', 'ss', 'st', 'ta', 'tc', 'ts', 'tt']
>>> sorted(infinite_barcodes(2, alphabet=transition_matrix(['ATCG', 'ATTT'])))  
['AT']
>>> for bc in infinite_barcodes(20, check_used=False):  
...     print(bc)
...     break
...
ATCAGTCGTCACACTAGTTA
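
The difference between the two modes can be sketched with a small generator. The name random_barcodes and the seed parameter are hypothetical, only a plain string alphabet is handled (not a transition matrix), and the simple rejection sampling used here becomes slow as the set of unused sequences shrinks.

```python
import random
from itertools import islice
from typing import Iterator, Optional


def random_barcodes(length: int = 12, alphabet: str = "ATCG",
                    check_used: bool = True, seed: Optional[int] = None) -> Iterator[str]:
    rng = random.Random(seed)
    letters = sorted(set(alphabet))
    total = len(letters) ** length  # number of distinct sequences available
    used = set()
    # With check_used the loop ends once every sequence has been produced.
    while not check_used or len(used) < total:
        barcode = "".join(rng.choice(letters) for _ in range(length))
        if check_used:
            if barcode in used:
                continue
            used.add(barcode)
        yield barcode


print(len(list(random_barcodes(2))))
# Without check_used the stream never ends, so bound it explicitly.
print(list(islice(random_barcodes(20, check_used=False, seed=0), 2)))
```

Using islice (or a break, as in the example above) is the end condition the description asks for.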
montebarcode.generate.transition_matrix(x: Sequence[str]) -> Sequence[Mapping]

Generate transition frequencies from one item to the next in a sequence.

Counts the occurrence of the next letter conditioned on the preceding letter.

Parameters:

x (Sequence[str]) – List of strings to take transition frequencies from.

Returns:

A length-n tuple, where n is the length of the shortest string in x. Each item is a dict mapping the preceding letter (None at the first position) to a 2-tuple containing the possible next letters and their counts.

Return type:

tuple

Examples

>>> transition_matrix(['ATC', 'ATG'])  
({None: (('A',), (2,))}, {'A': (('T',), (2,))}, {'T': (('C', 'G'), (1, 1))})
>>> transition_matrix(['ATC', 'CTG'])  
({None: (('A', 'C'), (1, 1))}, {'A': (('T',), (1,)), 'C': (('T',), (1,))}, {'T': (('C', 'G'), (1, 1))})
>>> transition_matrix(['ATC', 'CAG'])  
({None: (('A', 'C'), (1, 1))}, {'A': (('T',), (1,)), 'C': (('A',), (1,))}, {'A': (('G',), (1,)), 'T': (('C',), (1,))})
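
The structure shown in the examples can be reproduced with nested counters, one per position. This is a sketch under the hypothetical name transitions; the ordering of keys within each dict is an implementation detail and may differ from the library's output.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence, Tuple

Step = Dict[Optional[str], Tuple[Tuple[str, ...], Tuple[int, ...]]]


def transitions(seqs: Sequence[str]) -> Tuple[Step, ...]:
    """Per position, count each letter conditioned on the letter before it."""
    n = min(len(s) for s in seqs)  # only positions shared by every string
    steps = []
    for i in range(n):
        counts = defaultdict(Counter)
        for s in seqs:
            previous = s[i - 1] if i > 0 else None  # no predecessor at position 0
            counts[previous][s[i]] += 1
        steps.append({
            previous: (tuple(c.keys()), tuple(c.values()))
            for previous, c in counts.items()
        })
    return tuple(steps)


print(transitions(["ATC", "ATG"]))
```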

montebarcode.utils module

Utilities used by monte-barcode.

montebarcode.utils.pprint_dict(x: Mapping, message: str) -> None

Module contents