polychrom.contactmaps module

Building contact maps

This module is the main workhorse of tools to calculate contactmaps, both from polymer simulations and from other simulations (e.g. 1D simulations of loop extrusion). All of the functions here are designed to be parallelized, and lots of efforts were put into making this possible.

The reasons we need parallel contactmap code is the following:

  • Calculating contact maps is slow, esp. at large contact radii, and benefits greatly from parallelizing

  • Doing regular multiprocesing.map has limitations

  • It can only handle heataps up to some size, and transferring gigabyte-sized heatmaps between processes takes minutes

  • It can only do as many heatmaps as fits in RAM, which on 20-core 128GB machine is no more than 5GB/heatmap

The structure of this class is as follows.

On the outer level, it provides three methods to average contactmaps:

The first two create contact map from an entire file: either monomer-resolution or binned. The last one creates contact maps from sub-chains in a file, starting at a given set of starting points. It is useful when doing contact maps from several copies of a system in one simulation.

The first two methods have a legacy implementation from the old library that is still here to do the tests.

On the middle level, it provides a method “averageContacts”. This method accepts a “contact iterator”, and can be used to average contacts from both a set of filenames and from a simulation of some kind (e.g. averaging positions of loop extruding factors from a 1D loop extrusion simulation). All of the outer level functions ( monomerResolutionContactMap for example) are implemented using this method.

On the lower level, there are internals of the “averageContacts” method and an associated “worker” function. There is generally no need to understand the code of those functions. There exists a reference implementation of both the worker and the averageContacts() function, simpleWorker and averageContactsSimple(). They do all the things that “averageContacts” do, but on only one core. In fact, “averageContacts” defaults to “averageContactsSimple” if requested to run on one core because it is a little bit faster.

polychrom.contactmaps.averageContacts(contactIterator, inValues, N, **kwargs)[source]

A main workhorse for averaging contacts on multiple cores into one shared contact map. It mostly does managing the arguments, and initializing the variables. All of the logic of how contacts are actually put in shared memory buckets is in the worker defined above.

Parameters
  • contactIterator (iterator) – an iterator. See descriptions of “filenameContactMap” class below for example and explanations

  • inValues (iterable) – an array of values to pass to contactIterator. Would be an array of arrays of filenames or something like that.

  • N (int) – Size of one side of the resulting contactmap

  • arrayDtype (ctypes dtype (default c_int32) for the contact map) –

  • classInitArgs (args to pass to the constructor of contact iterator) –

  • classInitKwargs (dict of keyword args to pass to the constructor) –

  • contactProcessing (function f(contacts), should return processed contacts) –

  • nproc (int, number of processors(default 4)) –

  • bucketNum (int (default = nproc) Number of memory buckets to use) –

  • contactBlock (int (default 500k) Number of contacts to aggregate before writing) –

  • useFmap (True, False, or callable) –

    If True, uses mirnylib.systemutils.fmap If False, uses multiprocessing.Pool.map Otherwise, uses provided function, assuming it of a fork-map type (different initializations are needed for forkmap and multiprocessing-style map)

    Sorry, no outside multiprocessing-style maps for now, it’s easy to fix Let me know if it is needed.

Code that calcualtes a contactmap from a set of polymer conformation is in the methods below (averageMonomerResolutionContactMap, etc.)

An example code that would run a contactmap from a simulation is below:

..code-block:: python

class simContactMap(object):

“contactmap ‘finder’ for a simulation” def __init__(self, ind): # accept a parameter (e.g. random number generator seed)

self.model = initModel(ind) # pass parameter to the functon that returns me a model object self.count = 10000000 # how many times to run a step of the model self.model.steps(10000) # initial steps of the model to equilibrate it

def next(self): # actual realization of the self.next method
if self.count == 0: # terminate the simulation if we did self.count iterations

raise StopIteration

self.count -= 1 #decrement the counter self.model.steps(30) # advance model by 30 steps return np.array(self.model.getSMCs()).T # return current LEF positions

mymap = polychrom.contactmaps.averageContacts(simContactMap, range(20), 30000, nproc=20 )

polychrom.contactmaps.averageContactsSimple(contactIterator, inValues, N, **kwargs)[source]

This is a reference one-core implementation

Parameters
  • contactIterator – an iterator. See descriptions of “filenameContactMap” class below for example and explanations

  • inValues – an array of values to pass to contactIterator. Would be an array of arrays of filenames etc.

  • N – Size of the resulting contactmap

  • **kwargs – arrayDtype: ctypes dtype (default c_int32) for the contact map classInitArgs: args to pass to the constructor of contact iterator as second+ args (first is the file list) classInitKwargs: dict of keyword args to pass to the coonstructor uniqueContacts: whether contact iterator outputs unique contacts (true) or contacts can be duplicate (False) contactProcessing: function f(contacts), should return processed contacts

Returns

contactmap

polychrom.contactmaps.binnedContactMap(filenames, chains=None, binSize=5, cutoff=5, n=8, contactFinder=<function calculate_contacts>, loadFunction=<function load>, exceptionsToIgnore=None, useFmap=False)[source]
polychrom.contactmaps.chunk(mylist, chunksize)[source]
Parameters
  • mylist – array

  • chunksize – int

Returns

list of chunks of an array len chunksize (last chunk is less)

class polychrom.contactmaps.filenameContactMap(filenames, cutoff=1.7, loadFunction=None, exceptionsToIgnore=[], contactFunction=None)[source]

Bases: object

This is the iterator for the contact map finder

__init__(filenames, cutoff=1.7, loadFunction=None, exceptionsToIgnore=[], contactFunction=None)[source]

Init accepts arguments to initialize the iterator. filenames will be one of the items in the inValues list of the “averageContacts” function cutoff and loadFunction should be provided either in classInitArgs or classInitKwargs of averageContacts

When initialized, the iterator should store these args properly and create all necessary constructs

next()[source]

This is the method which gets called by the worker asking for contacts. This method should return new set of contacts each time it is called When there are no more contacts to return (all filenames are gone, or simulation is over), then this method should raise StopIteration

class polychrom.contactmaps.filenameContactMapRepeat(filenames, mapStarts, mapN, cutoff=1.7, loadFunction=None, exceptionsToIgnore=[], contactFunction=None)[source]

Bases: object

This is a interator for the repeated contact map finder

__init__(filenames, mapStarts, mapN, cutoff=1.7, loadFunction=None, exceptionsToIgnore=[], contactFunction=None)[source]

Init accepts arguments to initialize the iterator. filenames will be one of the items in the inValues list of the “averageContacts” function cutoff and loadFunction should be provided either in classInitArgs or classInitKwargs of averageContacts

When initialized, the iterator should store these args properly and create all necessary constructs

next()[source]

This is the method which gets called by the worker asking for contacts. This method should return new set of contacts each time it is called When there are no more contacts to return (all filenames are gone, or simulation is over), then this method should raise StopIteration

polychrom.contactmaps.findN(filenames, loadFunction, exceptions)[source]

Finds length of data in filenames, handling the fact that files could be not loadable

polychrom.contactmaps.indexing(smaller, larger, M)[source]

converts x-y indexes to index in the upper triangular matrix

polychrom.contactmaps.init(*args)[source]

Initializes global arguments for the worker

polychrom.contactmaps.monomerResolutionContactMap(filenames, cutoff=5, n=8, contactFinder=<function calculate_contacts>, loadFunction=<function load>, exceptionsToIgnore=[], useFmap=False)[source]
polychrom.contactmaps.monomerResolutionContactMapSubchains(filenames, mapStarts, mapN, cutoff=5, n=8, method=<function calculate_contacts>, loadFunction=<function load>, exceptionsToIgnore=[], useFmap=False)[source]
polychrom.contactmaps.simple_worker(x, uniqueContacts)[source]

A “reference” version of “worker” function below that runs on only one core. Unlike the actual multicore worker, it can write contacts to the matrix directly without sorting. This is useful when your contact finding is faster than sorting a 1D array fo contacts

If uniqueContacts True, assume that contactFinder outputs only unique contacts ( like pure contact map) if False, do not assume that (like in binned contact map). Using False is always safe, but True will add a minor speed up, especially for very large contact radius.

polychrom.contactmaps.tonumpyarray(mp_arr)[source]

Converts mp.array to numpy array

polychrom.contactmaps.triagToNormal(triag, M)[source]

Convert triangular matrix to a regular matrix

polychrom.contactmaps.worker(x)[source]

This is a parallel implementation of the worker using shared memory buckets This worker is being called by the averageContact method It receives contacts from the contactIterator by calling .next() And puts contacts into the shared memory buckets

All the locks etc. for shared memory objects are handeled here as well