polychrom.contactmaps module¶
Building contact maps¶
This module is the main workhorse of tools to calculate contactmaps, both from polymer simulations and from other simulations (e.g. 1D simulations of loop extrusion). All of the functions here are designed to be parallelized, and lots of efforts were put into making this possible.
The reasons we need parallel contactmap code is the following:
Calculating contact maps is slow, esp. at large contact radii, and benefits greatly from parallelizing
Doing regular multiprocesing.map has limitations
It can only handle heataps up to some size, and transferring gigabyte-sized heatmaps between processes takes minutes
It can only do as many heatmaps as fits in RAM, which on 20-core 128GB machine is no more than 5GB/heatmap
The structure of this class is as follows.
On the outer level, it provides three methods to average contactmaps:
The first two create contact map from an entire file: either monomer-resolution or binned. The last one creates contact maps from sub-chains in a file, starting at a given set of starting points. It is useful when doing contact maps from several copies of a system in one simulation.
The first two methods have a legacy implementation from the old library that is still here to do the tests.
On the middle level, it provides a method “averageContacts”. This method accepts a “contact iterator”, and can be used to average contacts from both a set of filenames and from a simulation of some kind (e.g. averaging positions of loop extruding factors from a 1D loop extrusion simulation). All of the outer level functions ( monomerResolutionContactMap for example) are implemented using this method.
On the lower level, there are internals of the “averageContacts” method and an associated “worker” function. There is
generally no need to understand the code of those functions. There exists a reference implementation of both the
worker and the averageContacts()
function, simpleWorker
and averageContactsSimple()
. They do
all the things that “averageContacts” do, but on only one core. In fact, “averageContacts” defaults to
“averageContactsSimple” if requested to run on one core because it is a little bit faster.
- polychrom.contactmaps.averageContacts(contactIterator, inValues, N, **kwargs)[source]¶
A main workhorse for averaging contacts on multiple cores into one shared contact map. It mostly does managing the arguments, and initializing the variables. All of the logic of how contacts are actually put in shared memory buckets is in the worker defined above.
- Parameters
contactIterator (iterator) – an iterator. See descriptions of “filenameContactMap” class below for example and explanations
inValues (iterable) – an array of values to pass to contactIterator. Would be an array of arrays of filenames or something like that.
N (int) – Size of one side of the resulting contactmap
arrayDtype (ctypes dtype (default c_int32) for the contact map) –
classInitArgs (args to pass to the constructor of contact iterator) –
classInitKwargs (dict of keyword args to pass to the constructor) –
contactProcessing (function f(contacts), should return processed contacts) –
nproc (int, number of processors(default 4)) –
bucketNum (int (default = nproc) Number of memory buckets to use) –
contactBlock (int (default 500k) Number of contacts to aggregate before writing) –
useFmap (True, False, or callable) –
If True, uses mirnylib.systemutils.fmap If False, uses multiprocessing.Pool.map Otherwise, uses provided function, assuming it of a fork-map type (different initializations are needed for forkmap and multiprocessing-style map)
Sorry, no outside multiprocessing-style maps for now, it’s easy to fix Let me know if it is needed.
Code that calcualtes a contactmap from a set of polymer conformation is in the methods below (averageMonomerResolutionContactMap, etc.)
An example code that would run a contactmap from a simulation is below:
..code-block:: python
- class simContactMap(object):
“contactmap ‘finder’ for a simulation” def __init__(self, ind): # accept a parameter (e.g. random number generator seed)
self.model = initModel(ind) # pass parameter to the functon that returns me a model object self.count = 10000000 # how many times to run a step of the model self.model.steps(10000) # initial steps of the model to equilibrate it
- def next(self): # actual realization of the self.next method
- if self.count == 0: # terminate the simulation if we did self.count iterations
raise StopIteration
self.count -= 1 #decrement the counter self.model.steps(30) # advance model by 30 steps return np.array(self.model.getSMCs()).T # return current LEF positions
mymap = polychrom.contactmaps.averageContacts(simContactMap, range(20), 30000, nproc=20 )
- polychrom.contactmaps.averageContactsSimple(contactIterator, inValues, N, **kwargs)[source]¶
This is a reference one-core implementation
- Parameters
contactIterator – an iterator. See descriptions of “filenameContactMap” class below for example and explanations
inValues – an array of values to pass to contactIterator. Would be an array of arrays of filenames etc.
N – Size of the resulting contactmap
**kwargs – arrayDtype: ctypes dtype (default c_int32) for the contact map classInitArgs: args to pass to the constructor of contact iterator as second+ args (first is the file list) classInitKwargs: dict of keyword args to pass to the coonstructor uniqueContacts: whether contact iterator outputs unique contacts (true) or contacts can be duplicate (False) contactProcessing: function f(contacts), should return processed contacts
- Returns
contactmap
- polychrom.contactmaps.binnedContactMap(filenames, chains=None, binSize=5, cutoff=5, n=8, contactFinder=<function calculate_contacts>, loadFunction=<function load>, exceptionsToIgnore=None, useFmap=False)[source]¶
- polychrom.contactmaps.chunk(mylist, chunksize)[source]¶
- Parameters
mylist – array
chunksize – int
- Returns
list of chunks of an array len chunksize (last chunk is less)
- class polychrom.contactmaps.filenameContactMap(filenames, cutoff=1.7, loadFunction=None, exceptionsToIgnore=[], contactFunction=None)[source]¶
Bases:
object
This is the iterator for the contact map finder
- __init__(filenames, cutoff=1.7, loadFunction=None, exceptionsToIgnore=[], contactFunction=None)[source]¶
Init accepts arguments to initialize the iterator. filenames will be one of the items in the inValues list of the “averageContacts” function cutoff and loadFunction should be provided either in classInitArgs or classInitKwargs of averageContacts
When initialized, the iterator should store these args properly and create all necessary constructs
- class polychrom.contactmaps.filenameContactMapRepeat(filenames, mapStarts, mapN, cutoff=1.7, loadFunction=None, exceptionsToIgnore=[], contactFunction=None)[source]¶
Bases:
object
This is a interator for the repeated contact map finder
- __init__(filenames, mapStarts, mapN, cutoff=1.7, loadFunction=None, exceptionsToIgnore=[], contactFunction=None)[source]¶
Init accepts arguments to initialize the iterator. filenames will be one of the items in the inValues list of the “averageContacts” function cutoff and loadFunction should be provided either in classInitArgs or classInitKwargs of averageContacts
When initialized, the iterator should store these args properly and create all necessary constructs
- polychrom.contactmaps.findN(filenames, loadFunction, exceptions)[source]¶
Finds length of data in filenames, handling the fact that files could be not loadable
- polychrom.contactmaps.indexing(smaller, larger, M)[source]¶
converts x-y indexes to index in the upper triangular matrix
- polychrom.contactmaps.monomerResolutionContactMap(filenames, cutoff=5, n=8, contactFinder=<function calculate_contacts>, loadFunction=<function load>, exceptionsToIgnore=[], useFmap=False)[source]¶
- polychrom.contactmaps.monomerResolutionContactMapSubchains(filenames, mapStarts, mapN, cutoff=5, n=8, method=<function calculate_contacts>, loadFunction=<function load>, exceptionsToIgnore=[], useFmap=False)[source]¶
- polychrom.contactmaps.simple_worker(x, uniqueContacts)[source]¶
A “reference” version of “worker” function below that runs on only one core. Unlike the actual multicore worker, it can write contacts to the matrix directly without sorting. This is useful when your contact finding is faster than sorting a 1D array fo contacts
If uniqueContacts True, assume that contactFinder outputs only unique contacts ( like pure contact map) if False, do not assume that (like in binned contact map). Using False is always safe, but True will add a minor speed up, especially for very large contact radius.
- polychrom.contactmaps.triagToNormal(triag, M)[source]¶
Convert triangular matrix to a regular matrix
- polychrom.contactmaps.worker(x)[source]¶
This is a parallel implementation of the worker using shared memory buckets This worker is being called by the averageContact method It receives contacts from the contactIterator by calling .next() And puts contacts into the shared memory buckets
All the locks etc. for shared memory objects are handeled here as well