polychrom.polymer_analyses module

Analyses of polymer conformations

This module presents a collection of utils to work with polymer conformations.

Tools for calculating contacts

The main function for calculating contacts is polychrom.polymer_analyses.calculate_contacts(). Right now it is a simple wrapper around scipy.spatial.cKDTree.

Another function, polychrom.polymer_analyses.smart_contacts(), was added recently to help build contact maps with a large contact radius. It randomly sub-samples the monomers, by default selecting N/cutoff of them, and then calculates contacts from the sub-sampled monomers only. It is especially helpful when the same code needs to calculate contacts at both large and small contact radii. For ordinary contacts, the number of contacts scales as contact_radius^3; with smart_contacts, thanks to the sub-sampling, it scales only linearly with the contact radius, which avoids the problem of having way too many contacts at a large contact radius and leads to significant speedups.
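
A minimal usage sketch of the two functions side by side (the random-walk conformation below is a made-up toy example, not part of the library):

    import numpy as np
    from polychrom import polymer_analyses

    # toy conformation: a 3D random walk of 10,000 monomers
    data = np.cumsum(np.random.normal(size=(10000, 3)), axis=0)

    # exact contacts at a small contact radius
    contacts_small = polymer_analyses.calculate_contacts(data, cutoff=2)

    # sub-sampled ("smart") contacts at a large contact radius; the number of
    # contacts grows roughly linearly, not cubically, with the cutoff
    contacts_large = polymer_analyses.smart_contacts(data, cutoff=10)

    print(contacts_small.shape, contacts_large.shape)  # each is (k, 2)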

Tools to calculate P(s) and R(s)

We provide functions to calculate P(s), Rg^2(s), and R^2(s) for polymers. By default, they use log-spaced bins on the X axis, with about 10 bins per order of magnitude, aligned so that the last bin ends exactly at (N-1). They return (bins, scaling) for Rg^2 and R^2, and (bin_mids, scaling) for contacts; in either case, the returned values are ready to plot. The difference is that Rg^2 and R^2 are evaluated at a given value of s, while contacts are aggregated over (bins[0]..bins[1]), (bins[1]..bins[2]), etc., which is why bin midpoints are returned for contacts.
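
As an illustration, a sketch of computing and plotting these quantities for a single toy conformation (matplotlib and the random-walk data are assumptions made for this example, not part of the module):

    import matplotlib.pyplot as plt
    import numpy as np
    from polychrom import polymer_analyses

    data = np.cumsum(np.random.normal(size=(20000, 3)), axis=0)  # toy random walk

    mids, cp = polymer_analyses.contact_scaling(data, cutoff=2)  # (bin mids, P(s))
    bins, rg2 = polymer_analyses.Rg2_scaling(data)               # (bins, Rg^2(s))

    plt.loglog(mids, cp, label="P(s)")
    plt.loglog(bins, rg2, label="Rg^2(s)")
    plt.xlabel("s (monomers)")
    plt.legend()
    plt.show()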

polychrom.polymer_analyses.R2_scaling(data, bins=None, ring=False)[source]

Returns the end-to-end distance scaling of a given polymer conformation.

Warning: This method averages the end-to-end scaling over all possible subchains of a given length.

Parameters
  • data (Nx3 array) –

  • bins (the same as in contact_scaling) –

  • ring (is the polymer a ring?) –
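
A small usage sketch (the roughly linear growth of R^2 with s is expected for the ideal-chain-like toy walk below; it is stated here as an illustration, not a documented guarantee):

    import numpy as np
    from polychrom import polymer_analyses

    data = np.cumsum(np.random.normal(size=(10000, 3)), axis=0)
    bins, r2 = polymer_analyses.R2_scaling(data)
    print(r2 / bins)  # roughly constant (~3 for unit-variance steps per coordinate)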

polychrom.polymer_analyses.Rg2(data)[source]

Simply calculates the squared gyration radius (Rg^2) of a polymer chain.

polychrom.polymer_analyses.Rg2_matrix(data)[source]

Uses dynamic programming and vectorization to calculate Rg^2 for each subchain of the polymer. Returns a matrix whose element [i, j] is Rg^2 of the subchain from i to j, including both i and j.
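
A quick consistency sketch (the relation below follows from the definitions above and is shown purely as an illustration):

    import numpy as np
    from polychrom import polymer_analyses

    data = np.random.normal(size=(200, 3))
    mat = polymer_analyses.Rg2_matrix(data)
    # the [0, N-1] element covers the whole chain, so it should match Rg2(data)
    print(np.allclose(mat[0, -1], polymer_analyses.Rg2(data)))  # expected: True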

polychrom.polymer_analyses.Rg2_scaling(data, bins=None, ring=False)[source]

Calculates the average squared gyration radius of subchains as a function of subchain length s.

Parameters
  • data (Nx3 array) –

  • bins (subchain lengths at which to calculate Rg) –

  • ring (treat polymer as a ring (default: False)) –
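
A short sketch of passing explicit bins (it assumes the log-spaced bins produced by generate_bins below can be used directly as subchain lengths here):

    import numpy as np
    from polychrom import polymer_analyses

    N = 10000
    data = np.cumsum(np.random.normal(size=(N, 3)), axis=0)
    bins = polymer_analyses.generate_bins(N)  # ~10 log-spaced bins per decade
    s, rg2 = polymer_analyses.Rg2_scaling(data, bins=bins)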

polychrom.polymer_analyses.calculate_cistrans(data, chains, chain_id=0, cutoff=5, pbc_box=False, box_size=None)[source]

Analysis of the territoriality of polymer chains from simulations, using the cis/trans ratio. The cis signal is computed for the marked chain (‘chain_id’) as the number of contacts of the chain with itself. The trans signal is the total number of trans contacts of the marked chain with the other chains from ‘chains’ (and with all the replicas if ‘pbc_box’=True).

polychrom.polymer_analyses.calculate_contacts(data, cutoff=1.7)[source]

Calculates contacts between points given the contact radius (cutoff)

Parameters
  • data (Nx3 array) – Coordinates of points

  • cutoff (float , optional) – Cutoff distance (contact radius)

Returns

k by 2 array of contacts. Each row corresponds to a contact.
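
A sketch of turning the contact list into a dense contact map with plain numpy (purely illustrative; for aggregating maps over many conformations polychrom provides a separate contactmaps module):

    import numpy as np
    from polychrom import polymer_analyses

    data = np.cumsum(np.random.normal(size=(2000, 3)), axis=0)
    contacts = polymer_analyses.calculate_contacts(data, cutoff=2)  # (k, 2) array

    N = len(data)
    cmap = np.zeros((N, N))
    np.add.at(cmap, (contacts[:, 0], contacts[:, 1]), 1)  # accumulate contacts
    cmap += cmap.T                                         # symmetrize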

polychrom.polymer_analyses.contact_scaling(data, bins0=None, cutoff=1.1, *, ring=False)[source]

Returns the contact probability scaling for a given polymer conformation. A contact between monomers X and X+1 is counted as s=1.

Parameters
  • data (Nx3 array of ints/floats) – Input polymer conformation

  • bins0 (list or None) – Bins in which to calculate scaling. Bins should probably be log-spaced; log-spaced bins can be generated with generate_bins() from this module or with mirnylib.numutils.logbinsnew. If None, bins will be calculated automatically

  • cutoff (float, optional) – Cutoff to calculate scaling

  • ring (bool, optional) – If True, will calculate contacts for the ring

Returns

(mids, contact probabilities), where “mids” contains the geometric means of the bin starts/ends.

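One possible follow-up, sketched here under the assumption that slope_contact_scaling (listed below with its signature only) returns the smoothed log-log slope of P(s):

    import numpy as np
    from polychrom import polymer_analyses

    data = np.cumsum(np.random.normal(size=(20000, 3)), axis=0)
    mids, cp = polymer_analyses.contact_scaling(data, cutoff=2)
    slope = polymer_analyses.slope_contact_scaling(mids, cp, sigma=2)
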
polychrom.polymer_analyses.generate_bins(N, start=4, bins_per_order_magn=10)[source]
polychrom.polymer_analyses.getLinkingNumber(data1, data2, simplify=True, randomOffset=True, verbose=False)[source]

Ported here from openmmlib as well.

polychrom.polymer_analyses.kabsch_msd(P, Q)[source]

Calculates the MSD between two sets of points using the Kabsch algorithm. Adapted from https://github.com/charnley/rmsd with some changes.

rmsd is licenced with a 2-clause BSD licence

Copyright (c) 2013, Jimmy Charnley Kromann <jimmy@charnley.dk> & Lars Bratholm All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
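
A quick sanity-check sketch (the rotated-and-translated copy is a made-up example): after Kabsch superposition, rigid-body motion should not contribute to the MSD.

    import numpy as np
    from polychrom import polymer_analyses

    P = np.random.normal(size=(100, 3))
    theta = 0.3
    R = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta),  np.cos(theta), 0],
                  [0, 0, 1]])
    Q = P @ R.T + np.array([5.0, -2.0, 1.0])  # rotate and translate P
    print(polymer_analyses.kabsch_msd(P, Q))  # expected to be ~0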

polychrom.polymer_analyses.kabsch_rmsd(P, Q)

Calculates the MSD between two sets of points using the Kabsch algorithm. Adapted from https://github.com/charnley/rmsd with some changes.

rmsd is licenced with a 2-clause BSD licence; the full licence text is reproduced under kabsch_msd above.

polychrom.polymer_analyses.mutualSimplify(a, b, verbose=False)[source]

Ported here from openmmlib.

Given two polymer rings, it attempts to reduce the number of monomers in each of them while preserving the linking between them. It tries to remove monomers one by one: removal of a monomer is accepted if no other bonds pass through the triangle formed by its two old bonds and the one new bond. This continues until no monomer in either of the rings can be removed.
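
An illustrative sketch with two circles forming a Hopf link (it assumes both functions accept Nx3 arrays like the rest of the module, and that mutualSimplify returns the pair of simplified rings):

    import numpy as np
    from polychrom import polymer_analyses

    t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
    ring1 = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])       # circle in the xy-plane
    ring2 = np.column_stack([1 + np.cos(t), np.zeros_like(t), np.sin(t)])   # circle threaded through ring1

    a, b = polymer_analyses.mutualSimplify(ring1, ring2)
    print(polymer_analyses.getLinkingNumber(a, b))  # expected: +/-1 for a Hopf link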

polychrom.polymer_analyses.ndarray_groupby_aggregate(df, ndarray_cols, aggregate_cols, value_cols=[], sample_cols=[], preset='sum', ndarray_agg=<function <lambda>>, value_agg=<function <lambda>>)[source]

A version of pd.groupby that is aware of numpy arrays as values of columns

  • aggregates columns ndarray_cols using ndarray_agg aggregator,

  • aggregates value_cols using value_agg aggregator,

  • takes the first element in sample_cols,

  • groups rows by aggregate_cols

It has presets for sum, mean and nanmean.
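
A hypothetical example of what such a dataframe might look like (the column names are invented for illustration):

    import numpy as np
    import pandas as pd
    from polychrom import polymer_analyses

    df = pd.DataFrame({
        "param": [1, 1, 2],                                   # groupby key
        "ps": [np.ones(5), 2 * np.ones(5), 3 * np.ones(5)],   # ndarray-valued column
        "n_conf": [10, 10, 10],                               # ordinary numeric column
        "label": ["run_a", "run_b", "run_c"],                 # only the first one per group is kept
    })

    agg = polymer_analyses.ndarray_groupby_aggregate(
        df,
        ndarray_cols=["ps"],
        aggregate_cols=["param"],
        value_cols=["n_conf"],
        sample_cols=["label"],
        preset="sum",
    )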

polychrom.polymer_analyses.slope_contact_scaling(mids, cp, sigma=2)[source]
polychrom.polymer_analyses.smart_contacts(data, cutoff=1.7, min_cutoff=2.1, percent_func=<function <lambda>>)[source]

Calculates contacts for a polymer, given the contact radius (cutoff). This method takes a random fraction of the monomers equal to (1/cutoff).

This is done to make contact finding faster, and because if the cutoff radius is R and monomers (i, j) are in contact, then monomers (i+a) and (j+b) are likely also in contact as long as |a| + |b| <~ R (the polymer cannot run away by more than R in R steps).

With this method, the number of contacts grows approximately linearly with the contact radius, not cubically, which should drastically speed up contact calculations for large (5+) contact radii. It allows using the same code for both small and large contact radii, without having to reduce the number of conformations, subsample the data, or both at very large contact radii.

Parameters
  • data (Nx3 array) – Polymer coordinates

  • cutoff (float , optional) – Cutoff distance that defines contact

  • min_cutoff (float, optional) – Apply the “smart” reduction of contacts only when cutoff is less than this value

  • percent_func (callable, optional) – Function that calculates the fraction of monomers to use, as a function of cutoff. Default is 1/cutoff

Returns

k by 2 array of contacts. Each row corresponds to a contact.

polychrom.polymer_analyses.streaming_ndarray_agg(in_stream, ndarray_cols, aggregate_cols, value_cols=[], sample_cols=[], chunksize=30000, add_count_col=False, divide_by_count=False)[source]

Takes a stream (iterable) of dataframes (in_stream).

Applies an ndarray-aware groupby-sum or groupby-mean: treats ndarray_cols as numpy arrays, value_cols as ordinary values, and takes the first element of sample_cols.

Groups by aggregate_cols.

If add_count_col is True, adds a column “count”; if it is a string, adds a column with that name.

If divide_by_count is True, divides the result by the “count” column; if it is a string, divides by the column with that name.

This function can be used to automatically aggregate P(s), R(s), etc. over a set of conformations so large that all the individual P(s) curves would not fit in RAM, or when averaging has to be done over so many parameter combinations that explicit for-loops become impractical. Examples include simulations in which a sweep over many parameters has been performed.
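
A hypothetical sketch of streaming aggregation over batches of conformations (the generator and column names are invented; the flags follow the description above):

    import numpy as np
    import pandas as pd
    from polychrom import polymer_analyses

    def chunks():
        # e.g. one dataframe per batch of conformations, each row one parameter set
        for _ in range(100):
            yield pd.DataFrame({
                "param": [1, 2],
                "ps": [np.random.random(50), np.random.random(50)],
            })

    result = polymer_analyses.streaming_ndarray_agg(
        chunks(),
        ndarray_cols=["ps"],
        aggregate_cols=["param"],
        add_count_col=True,
        divide_by_count=True,  # turn the groupby-sum into a mean
    )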