lXtractor.variables package
lXtractor.variables.base module
Base classes, common types and functions for the variables module.
- class lXtractor.variables.base.AbstractCalculator[source]
Bases:
Generic[OT]
Class defining variables’ calculation strategy.
- abstract __call__(o: OT, v: VT, m: Mapping[int, int | None] | None) -> tuple[bool, RT][source]
- abstract __call__(o: Iterable[OT], v: Iterable[VT] | Iterable[Iterable[VT]], m: Iterable[Mapping[int, int | None] | None] | None) -> Iterable[Iterable[tuple[bool, RT]]]
- Parameters:
o – Object to calculate on.
v – Some variable whose calculate method accepts o-type instances.
m – Optional mapping between object and some reference object numbering schemes.
- Returns:
Calculation result.
- abstract map(o, v, m)[source]
Map variables to a single object.
- Parameters:
o (OT) – Object to calculate on.
v (Iterable[VT]) – An iterable over variables whose calculate method accepts o-type instances.
m (Mapping[int, int | None] | None) – Optional mapping between object and some reference object numbering schemes.
- Returns:
An iterator (generator) over calculation result.
- Return type:
Iterable[tuple[bool, RT]]
- abstract vmap(o, v, m)[source]
Map objects to a single variable.
- Parameters:
o (Iterable[OT]) – An iterable over objects to calculate on.
v (VT) – Some variable whose calculate method accepts o-type instances.
m (Iterable[Mapping[int, int | None] | None]) – Optional mapping between object and some reference object numbering schemes.
- Returns:
An iterator (generator) over calculation result.
- Return type:
Iterable[tuple[bool, RT]]
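The difference between map (many variables, one object) and vmap (one variable, many objects) can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `toy_map`, `toy_vmap`, and the two toy variables are hypothetical names, and a variable is modelled as any callable taking `(obj, mapping)`. Each result is a `(bool, result)` pair, as in the documented return type.

```python
def first(seq, mapping=None):
    """Toy variable: the first element of a sequence."""
    return seq[0]


def tenth(seq, mapping=None):
    """Toy variable: the tenth element (fails on short sequences)."""
    return seq[9]


def _safe_call(v, o, m):
    """Return (True, result) on success or (False, error message) on failure."""
    try:
        return True, v(o, m)
    except Exception as e:  # a real calculator would catch a narrower set
        return False, str(e)


def toy_map(o, vs, m=None):
    """Many variables, one object."""
    for v in vs:
        yield _safe_call(v, o, m)


def toy_vmap(os, v, ms=None):
    """One variable, many objects (one optional mapping per object)."""
    os = list(os)
    ms = [None] * len(os) if ms is None else list(ms)
    for o, m in zip(os, ms):
        yield _safe_call(v, o, m)


map_results = list(toy_map("ABCD", [first, tenth]))
vmap_results = list(toy_vmap(["ABCD", "XYZ"], first))
```

Both functions are generators, so results are produced one at a time as the caller iterates.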
- class lXtractor.variables.base.AbstractVariable[source]
Bases:
Generic[OT, RT]
Abstract base class for variables.
- abstract calculate(obj, mapping=None)[source]
Calculate variable. Each variable defines its own calculation strategy.
- Parameters:
obj (OT) – An object used for variable’s calculation.
mapping (Mapping[int, int | None] | None) – Mapping from generalizable positions of MSA/reference/etc. to the obj’s positions.
- Returns:
Calculation result.
- Raises:
FailedCalculation – if the calculation fails.
- Return type:
RT
- property id: str
Variable identifier such that eval(x.id) produces another instance.
- abstract property rtype: Type[RT]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
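The two properties above imply a round trip: `eval(x.id)` rebuilds the variable, and `rtype(...)` converts a stored textual result back to its type. A minimal sketch, assuming a hypothetical `ToyEl` dataclass (not a library class) whose `repr` doubles as its identifier:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEl:
    """Hypothetical variable: the element at 1-based position ``p``."""
    p: int

    @property
    def id(self) -> str:
        # The dataclass repr has the form "ToyEl(p=2)", which is valid Python.
        return repr(self)

    @property
    def rtype(self):
        return str

    def calculate(self, obj, mapping=None):
        return obj[self.p - 1]


v = ToyEl(2)
clone = eval(v.id, {"ToyEl": ToyEl})  # rebuilds an equal instance from the ID
restored = v.rtype("B")               # converts a stored result back
```

A frozen dataclass is used here so that equal variables also hash equally, which matters when variables serve as dictionary keys.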
- class lXtractor.variables.base.AggFn(*args, **kwargs)[source]
Bases:
Protocol
- __init__(*args, **kwargs)
- class lXtractor.variables.base.LigandVariable[source]
Bases:
AbstractVariable[Ligand, RT], Generic[T, RT]
A type of variable whose calculate() method requires a ligand.
- abstract calculate(obj, mapping=None)[source]
- Parameters:
obj (Ligand) – Some ligand.
mapping (Mapping[int, int | None] | None) – Optional mapping between the object and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
RT
- class lXtractor.variables.base.ProtFP(path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/lxtractor/checkouts/latest/lXtractor/resources/PFP.csv'))[source]
Bases:
object
ProtFP embeddings for amino acid residues.
ProtFP is a coding scheme derived from the PCA analysis of the AAIndex database [Westen et al., 2013, Westen et al., 2013].
>>> pfp = ProtFP()
>>> pfp[('G', 1)]
-5.7
>>> list(pfp['G'])
[-5.7, -8.72, 4.18, -1.35, -0.31]
>>> comp1 = pfp[1]
>>> assert len(comp1) == 20
>>> comp1[0]
-5.7
>>> comp1.index[0]
'G'
[1] Gerard JP van Westen, Remco F Swier, Jörg K Wegner, Adriaan P IJzerman, Herman WT van Vlijmen, and Andreas Bender. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets. Journal of Cheminformatics, 5(1):41–41, 2013. doi:10.1186/1758-2946-5-41.
[2] Gerard JP van Westen, Remco F Swier, Isidro Cortes-Ciriano, Jörg K Wegner, John P Overington, Adriaan P IJzerman, Herman WT van Vlijmen, and Andreas Bender. Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets. Journal of Cheminformatics, 5(1):42, 2013. doi:10.1186/1758-2946-5-42.
- class lXtractor.variables.base.SequenceVariable[source]
Bases:
AbstractVariable[Sequence[T], RT], Generic[T, RT]
A type of variable whose calculate() method requires protein sequence.
- abstract calculate(obj, mapping=None)[source]
- Parameters:
obj (Sequence[T]) – Some sequence.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
RT
- class lXtractor.variables.base.StructureVariable[source]
Bases:
AbstractVariable[GenericStructure, RT], Generic[RT]
A type of variable whose calculate() method requires protein structure.
- abstract calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (Mapping[int, int | None] | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
RT
- class lXtractor.variables.base.Variables(dict=None, /, **kwargs)[source]
Bases:
UserDict
A subclass of dict holding variables (AbstractVariable subclasses).
The keys are the AbstractVariable subclasses’ instances (hashed by id), and values are calculation results.
- as_df()[source]
- Returns:
A table with two columns: VariableID and VariableResult.
- Return type:
DataFrame
- classmethod read(path)[source]
Read and initialize variables.
- Parameters:
path (Path) – Path to a two-column .tsv file holding pairs (var_id, var_value). Will use var_id to initialize variable, importing dynamically a relevant class from variables.
- Returns:
A dict mapping variable object to its value.
- Return type:
- write(path)[source]
- Parameters:
path (Path) – Path to a file.
skip_if_contains – Skip if a variable ID contains any of the provided strings.
- property sequence: Variables
- Returns:
values that are SequenceVariable instances.
- property structure: Variables
- Returns:
values that are StructureVariable instances.
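The read/write pair describes a two-column .tsv round trip keyed by variable IDs. The sketch below mimics that idea with a `UserDict` subclass; `ToyVariables` and `ToyEl` are hypothetical stand-ins, and the real class imports variable classes dynamically rather than relying on an explicit namespace as done here.

```python
import tempfile
from collections import UserDict
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ToyEl:
    """Hypothetical variable identified by its repr, e.g. "ToyEl(p=1)"."""
    p: int

    @property
    def id(self) -> str:
        return repr(self)


class ToyVariables(UserDict):
    def write(self, path: Path) -> None:
        # One "var_id<TAB>var_value" pair per line.
        lines = (f"{v.id}\t{r}" for v, r in self.items())
        path.write_text("\n".join(lines) + "\n")

    @classmethod
    def read(cls, path: Path) -> "ToyVariables":
        out = cls()
        for line in path.read_text().splitlines():
            var_id, value = line.split("\t", 1)
            # eval is restricted to a known class and assumes a trusted file.
            out[eval(var_id, {"ToyEl": ToyEl})] = value
        return out


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "variables.tsv"
    ToyVariables({ToyEl(1): "A", ToyEl(3): "C"}).write(path)
    loaded = ToyVariables.read(path)
```

Values come back as strings in this sketch; converting them with each variable's rtype would be the natural next step.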
lXtractor.variables.calculator module
Module defining variable calculators managing the exact calculation process of variables on objects.
- class lXtractor.variables.calculator.GenericCalculator(num_proc=1, valid_exceptions=(<class 'lXtractor.core.exceptions.FailedCalculation'>, ), apply_kwargs=None, verbose=False)[source]
Bases:
AbstractCalculator
Parallel calculator, calculating variables in parallel. Duh.
- __call__(o: OT, v: VT, m: Mapping[int, int | None] | None) -> tuple[bool, RT][source]
- __call__(o: Iterable[OT], v: Iterable[VT] | Iterable[Iterable[VT]], m: Iterable[Mapping[int, int | None] | None] | None) -> Iterable[Iterable[tuple[bool, RT]]]
- Parameters:
o – Object to calculate on.
v – Some variable whose calculate method accepts o-type instances.
m – Optional mapping between object and some reference object numbering schemes.
- Returns:
Calculation result.
- __init__(num_proc=1, valid_exceptions=(<class 'lXtractor.core.exceptions.FailedCalculation'>, ), apply_kwargs=None, verbose=False)[source]
- map(o, v, m)[source]
Map variables to a single object.
- Parameters:
o (OT) – Object to calculate on.
v (Iterable[VT]) – An iterable over variables whose calculate method accepts o-type instances.
m (Mapping[int, int | None] | None) – Optional mapping between object and some reference object numbering schemes.
- Returns:
An iterator (generator) over calculation result.
- Return type:
Generator[tuple[bool, RT], None, None]
- vmap(o, v, m)[source]
Map objects to a single variable.
- Parameters:
o (Iterable[OT]) – An iterable over objects to calculate on.
v (VT) – Some variable whose calculate method accepts o-type instances.
m (Iterable[Mapping[int, int | None] | None] | Mapping[int, int | None] | None) – Optional mapping between object and some reference object numbering schemes.
- Returns:
An iterator (generator) over calculation result.
- Return type:
Generator[tuple[bool, RT], None, None]
- apply_kwargs
- num_proc
- valid_exceptions
- verbose
lXtractor.variables.manager module
Manager handles variable calculations, such as:
- Variable manipulations (assignment, deletions, and resetting).
- Calculation of variables. Simply manages the calculation process, whereas calculators (lXtractor.variables.calculator.GenericCalculator, for instance) do the heavy lifting.
- Aggregation of the calculation results (see aggregate_from_chains() and aggregate_from_it()).
- class lXtractor.variables.manager.Manager(verbose=False)[source]
Bases:
object
Manager of variable calculations, handling assignment, aggregation, and, of course, the calculations themselves.
- aggregate_from_chains(chains)[source]
Aggregate calculation results from the variables container of the provided chains.
>>> from lXtractor.variables.sequential import SeqEl
>>> s = lxc.ChainSequence.from_string('abcd', name='_seq')
>>> manager = Manager()
>>> manager.assign([SeqEl(1)], [s])
>>> df = manager.aggregate_from_chains([s])
>>> len(df) == 1
True
>>> list(df.columns)
['VariableID', 'VariableResult', 'ObjectID', 'ObjectType']
- Parameters:
chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
- Returns:
A dataframe with ObjectID, ObjectType, and calculation results.
- Return type:
DataFrame
- aggregate_from_it(results, vs_to_cols=True, replace_errors=True, replace_errors_with=nan, num_vs=None)[source]
Aggregate calculation results directly from calculate() output.
- Parameters:
results (Iterable[tuple[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand], SequenceVariable | StructureVariable | LigandVariable, bool, Any]]) – An iterable over calculation results.
vs_to_cols (bool) – If True, will attempt to use the wide format for the final results with variables as columns. Otherwise, will use the long format with fixed columns: “ObjectID”, “VariableID”, “VariableCalculated”, and “VariableResult”. Note that for the wide format to work, all objects and their variables must have unique IDs.
replace_errors (bool) – When a calculation fails, replace its result with a certain value.
replace_errors_with (Any) – Use this value to replace erroneous calculation results.
num_vs (int | None) – The number of variables per object. Providing this will significantly increase the aggregation speed.
- Returns:
A table with results in long or wide format.
- Return type:
DataFrame | dict[str, list]
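The long and wide layouts described for vs_to_cols can be illustrated with plain dictionaries. This is a sketch under assumptions: results are simplified to tuples of (object ID, variable ID, success flag, result), and `to_long` and `to_wide` are hypothetical helpers rather than library functions.

```python
from collections import defaultdict

# Hypothetical results: (object ID, variable ID, success flag, result).
results = [
    ("seq_a", "SeqEl(p=1)", True, "A"),
    ("seq_a", "SeqEl(p=9)", False, "Missing index"),
    ("seq_b", "SeqEl(p=1)", True, "X"),
    ("seq_b", "SeqEl(p=9)", True, "Z"),
]


def to_long(results):
    """One row per (object, variable) pair with four fixed columns."""
    cols = ("ObjectID", "VariableID", "VariableCalculated", "VariableResult")
    table = {c: [] for c in cols}
    for row in results:
        for c, value in zip(cols, row):
            table[c].append(value)
    return table


def to_wide(results, replace_errors=True, replace_errors_with=float("nan")):
    """One row per object with variables as columns."""
    rows = defaultdict(dict)
    for obj_id, var_id, ok, res in results:
        if not ok and replace_errors:
            res = replace_errors_with
        rows[obj_id][var_id] = res
    return dict(rows)


long_table = to_long(results)
wide_table = to_wide(results, replace_errors_with=None)
```

The wide layout silently overwrites entries when object or variable IDs repeat, which is why unique IDs are required for it to work.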
- assign(vs, chains)[source]
Assign variables to chains sequences/structures.
- Parameters:
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable]) – A sequence of variables.
chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
- Returns:
No return. Will store assigned variables within the variables attribute.
- calculate(objs, vs, calculator, *, save=False, **kwargs)[source]
Handles variable calculations:
1. Stage calculations (see stage()).
2. Calculate variables using the provided calculator.
3. (Optional) save the calculation results to variables container.
4. Output (stream) calculation results.
Note that 3 and 4 are done lazily as calculation results from the calculator become available.
>>> from lXtractor.variables.calculator import GenericCalculator
>>> from lXtractor.variables.sequential import SeqEl
>>> s = lxc.ChainSequence.from_string('ABCD', name='_seq')
>>> m = Manager()
>>> c = GenericCalculator()
>>> list(m.calculate([s],[SeqEl(1)],c))
[(_seq|1-4, SeqEl(p=1,_rtype='str',seq_name='seq1'), True, 'A')]
>>> list(m.calculate([s],[SeqEl(5)],c))[0][-2:]
(False, 'Missing index 4 in sequence')
- Parameters:
objs (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable] | None) – A sequence of variables. If not provided, will use assigned variables (see assign()).
calculator (AbstractCalculator) – A calculator object – some callable with the right signature handling the calculations.
save (bool) – Save calculation results to variables. Will overwrite any existing matching variables.
kwargs – Passed to stage().
- Returns:
A generator over tuples: 1. Original object. 2. Variable. 3. Flag indicating whether the calculation was successful. 4. The calculation result (or the error message).
- Return type:
Generator[tuple[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand], SequenceVariable | StructureVariable | LigandVariable, bool, Any], None, None]
- remove(chains, vs=None)[source]
Remove variables from the variables container.
- Parameters:
chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable] | None) – A sequence of variables to remove. If not provided, will remove all variables.
- Returns:
No return.
- reset(chains, vs=None)[source]
Similar to remove(), but instead of deleting, resets variable calculation results.
- Parameters:
chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable] | None) – A sequence of variables to reset. If not provided, will reset all variables.
- Returns:
No return.
- stage(chains, vs, **kwargs)[source]
Stage objects for calculations (e.g., using calculate()). It’s a useful method if using a different calculation method and/or parallelization strategy within a Calculator class.
>>> from lXtractor.variables.sequential import SeqEl
>>> s = lxc.ChainSequence.from_string('ABCD', name='_seq')
>>> m = Manager()
>>> staged = list(m.stage([s], [SeqEl(1)]))
>>> len(staged) == 1
True
>>> staged[0]
(_seq|1-4, 'ABCD', [SeqEl(p=1,_rtype='str',seq_name='seq1')], None)
- Parameters:
chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable] | None) – A sequence of variables. If not provided, will use assigned variables (see assign()).
kwargs – Passed to stage().
- Returns:
An iterable over tuples holding data for variables’ calculation.
- Return type:
Generator[tuple[ChainSequence, Sequence[Any], Sequence[SequenceVariable], Mapping[int, int] | None] | tuple[ChainStructure, GenericStructure, Sequence[StructureVariable], Mapping[int, int] | None], None, None]
- verbose
- lXtractor.variables.manager.find_structure(s)[source]
Recursively search for structure up the ancestral tree.
- Parameters:
s (ChainStructure) – An arbitrary chain structure.
- Returns:
The first non-empty atom array up the parent chain.
- Return type:
GenericStructure | None
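The recursive search up the ancestral tree can be sketched with a toy parent-linked object. `ToyChain` and `find_atoms` are hypothetical names used only for illustration; the real function operates on ChainStructure objects and returns a GenericStructure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyChain:
    """Hypothetical chain: ``atoms`` may be empty, ``parent`` may be absent."""
    atoms: list
    parent: Optional["ToyChain"] = None


def find_atoms(chain: ToyChain) -> Optional[list]:
    if chain.atoms:              # non-empty: stop here
        return chain.atoms
    if chain.parent is None:     # reached the root without finding anything
        return None
    return find_atoms(chain.parent)


root = ToyChain(atoms=["N", "CA", "C"])
child = ToyChain(atoms=[], parent=root)
grandchild = ToyChain(atoms=[], parent=child)
orphan = ToyChain(atoms=[])
```

Here `find_atoms(grandchild)` walks two levels up to the root's atoms, while `find_atoms(orphan)` returns `None`.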
- lXtractor.variables.manager.get_mapping(obj, map_name, map_to)[source]
Obtain mapping from a Chain*-type object.
>>> s = lxc.ChainSequence.from_string('ABCD', name='_seq')
>>> s.add_seq('some_map', [5, 6, 7, 8])
>>> s.add_seq('another_map', ['D', 'B', 'C', 'A'])
>>> get_mapping(s, 'some_map', None)
{5: 1, 6: 2, 7: 3, 8: 4}
>>> get_mapping(s, 'another_map', 'some_map')
{'D': 5, 'B': 6, 'C': 7, 'A': 8}
- Parameters:
obj (Any) – Chain*-type object. If not a Chain*-type object, raises AttributeError.
map_name (str | None) – The name of a map to create the mapping from. If None, the resulting mapping is None.
map_to (str | None) – The name of a map to create a mapping to. If None, will default to the real sequence indices (1-based) for a ChainSequence object and to the structure’s actual numbering for the ChainStructure.
- Returns:
A dictionary mapping from the map_name sequence to map_to sequence.
- Return type:
dict | None
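Conceptually, the mapping pairs up two equally long named sequences position by position. The sketch below reproduces the two results from the example above using a plain dictionary of sequences; `toy_mapping` and the `seqs` container are hypothetical and cover only the ChainSequence-like case.

```python
def toy_mapping(seqs, map_name, map_to=None):
    """``seqs`` is a hypothetical dict of equally long named sequences."""
    if map_name is None:
        return None
    keys = seqs[map_name]
    # Without ``map_to``, fall back to 1-based positions.
    values = range(1, len(keys) + 1) if map_to is None else seqs[map_to]
    return dict(zip(keys, values))


seqs = {
    "seq1": "ABCD",
    "some_map": [5, 6, 7, 8],
    "another_map": ["D", "B", "C", "A"],
}
to_index = toy_mapping(seqs, "some_map")
between_maps = toy_mapping(seqs, "another_map", "some_map")
```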
- lXtractor.variables.manager.stage(obj: ChainStructure, vs, *, missing, seq_name, map_name, map_to) -> tuple[ChainStructure, GenericStructure, Sequence[StructureVariable], Mapping[int, int] | None][source]
- lXtractor.variables.manager.stage(obj: ChainSequence, vs, *, missing, seq_name, map_name, map_to) -> tuple[ChainSequence, Sequence[Any], Sequence[SequenceVariable], Mapping[int, int] | None]
- lXtractor.variables.manager.stage(obj: tuple[ChainStructure, Ligand], vs, *, missing, seq_name, map_name, map_to) -> tuple[tuple[ChainStructure, Ligand], Ligand, Sequence[LigandVariable], Mapping[int, int] | None]
Stage object for calculation. If it’s a chain sequence, will stage some sequence/mapping within it. If it’s a chain structure, will stage the atom array.
- Parameters:
obj – A chain sequence or structure or structure-ligand pair to calculate the variables on.
vs – A sequence of variables to calculate.
missing – If True, calculate only those assigned variables that are missing.
seq_name – If obj is the chain sequence, the sequence name is used to obtain an actual sequence (obj[seq_name]).
map_name – The mapping name to obtain the mapping keys. If None, the resulting mapping will be None.
map_to – The mapping name to obtain the mapping values. See get_mapping() for details.
- Returns:
A tuple with four elements: 1. Original object. 2. Staged target passed to a variable for calculation. 3. A sequence of sequence or structural variables. 4. An optional mapping.
lXtractor.variables.parser module
- lXtractor.variables.parser.init_var(var)[source]
Convert a textual representation of a single variable into a concrete and initialized variable.
>>> assert isinstance(init_var('123'), SeqEl)
>>> assert isinstance(init_var('1-2'), Dist)
>>> assert isinstance(init_var('1-2-3-4'), PseudoDihedral)
- Parameters:
var (str) – textual representation of a variable.
- Returns:
initialized variable, a concrete subclass of an AbstractVariable
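The three examples above suggest that the variable type is chosen by the number of "-"-separated positions. The sketch below captures only that dispatch rule and returns a (kind, positions) pair instead of a real variable; `toy_init_var` is hypothetical, and the actual parser likely handles more syntax (atom names, for instance) than is shown here.

```python
def toy_init_var(var: str):
    """Return a (kind, positions) pair instead of a real variable object."""
    positions = tuple(int(p) for p in var.split("-"))
    kinds = {1: "SeqEl", 2: "Dist", 4: "PseudoDihedral"}
    try:
        return kinds[len(positions)], positions
    except KeyError:
        raise ValueError(f"Unsupported variable spec: {var!r}") from None


single = toy_init_var("123")
pair = toy_init_var("1-2")
quad = toy_init_var("1-2-3-4")
```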
- lXtractor.variables.parser.parse_var(inp)[source]
Parse raw input into a collection of variables, structures, and levels at which they should be calculated.
- Parameters:
inp (str) – "[variable_specs]--[protein_specs]::[domains]" format, where:
- variable_specs define the variable type (e.g., 1:CA-2:CA for CA-CA distance between positions 1 and 2)
- protein_specs define proteins for which to calculate variables
- domains list the domain names for the given protein collection
- Returns:
a namedtuple with (1) variables, (2) list of proteins (or [None]), and (3) a list of domains (or [None]).
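Splitting the input format into its three parts might look roughly like the following. This is a sketch with several assumptions: `toy_parse` and `ToyParsed` are hypothetical, variable specs are left as raw strings rather than initialized variables, and the comma used to separate multiple items within a part is a guess that the text above does not confirm.

```python
from typing import NamedTuple


class ToyParsed(NamedTuple):
    variables: list
    proteins: list
    domains: list


def _split(part: str) -> list:
    # Comma-separated items are an assumption; an empty part becomes [None].
    return part.split(",") if part else [None]


def toy_parse(inp: str) -> ToyParsed:
    # Split off the optional "::[domains]" suffix first, then "--[proteins]".
    head, _, domains = inp.partition("::")
    variables, _, proteins = head.partition("--")
    return ToyParsed(_split(variables), _split(proteins), _split(domains))


full = toy_parse("1:CA-2:CA--P1,P2::SH2")
bare = toy_parse("1-2")
```

Using `str.partition` keeps missing parts as empty strings, which map directly onto the `[None]` defaults described in the return value.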
lXtractor.variables.sequential module
Module defining variables calculated on sequences.
- class lXtractor.variables.sequential.PFP(p, i)[source]
Bases:
SequenceVariable
A ProtFP embedding variable.
- __init__(p, i)[source]
- Parameters:
p (int) – Position, starting from 1.
i (int) – A PCA component index starting from 1.
- calculate(obj, mapping=None)[source]
- Parameters:
obj (Sequence[str]) – Some sequence.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
float
- i
A PCA component index starting from 1.
- p
Position, starting from 1.
- property rtype: Type[float]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
- class lXtractor.variables.sequential.SeqEl(p, _rtype='str', seq_name='seq1')[source]
Bases:
SequenceVariable[T, T]
A sequence element variable. It doesn’t encompass any calculation. Rather, it simply accesses the sequence at a certain position.
>>> v1, v2 = SeqEl(1), SeqEl(1, 'X')
>>> s1, s2 = 'XYZ', [1, 2, 3]
>>> v1.calculate(s1)
'X'
>>> v2.calculate(s2)
1
- __init__(p, _rtype='str', seq_name='seq1')[source]
- Parameters:
p (int) – Position, starting from 1.
seq_name (str) – The name of the sequence used to distinguish variables pointing to the same position.
- calculate(obj, mapping=None)[source]
- Parameters:
obj (Sequence[T]) – Some sequence.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
T
- p
Position, starting from 1.
- property rtype: Type[T]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
- seq_name
Sequence name for which the element is accessed.
- class lXtractor.variables.sequential.SliceTransformReduce(start=None, stop=None, step=None, seq_name='seq1')[source]
Bases:
SequenceVariable, Generic[T, V, K]
A composite variable with three sequential operations:
1. Slice – subset the sequence (optional).
2. Transform – transform the sequence (optional).
3. Reduce – reduce to a final variable.
This is an abstract class. It requires defining at least two members: the reduce() static method and the rtype property.
See also
make_str() – a factory function to quickly make child classes.
- __init__(start=None, stop=None, step=None, seq_name='seq1')[source]
Note
start and stop have inclusive boundaries.
- Parameters:
start (int | None) – Start position.
stop (int | None) – Stop position.
step (int | None) – Slicing step.
seq_name (str) – Sequence name. Please use it in case a resulting variable will be applied to seqs other than the primary sequence.
- calculate(obj, mapping=None)[source]
- Parameters:
obj (Iterable[K]) – Some sequence.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
V
- abstract static reduce(seq)[source]
Reduce the input iterable into the variable result.
- Parameters:
seq (Iterable[T] | Iterable[K]) – Some sort of iterable – the results of the transform (or slicing, if no transformation is used)
- Returns:
An aggregated value (e.g., float, string, etc.).
- Return type:
V
- static transform(seq)[source]
Optionally transform the slicing result. If not used, it is the identity operation.
- Parameters:
seq (Iterable[K]) – The result of slicing operation. If no slicing is used, it is just an
iter(input_seq)
.- Returns:
Iterable over transformed elements (can have another type than the input ones).
- Return type:
Iterable[T] | Iterable[K]
- seq_name
Sequence name.
- start
Start position.
- step
Slicing step.
- stop
End position.
- lXtractor.variables.sequential.make_str(reduce, rtype, transform=None, reduce_name=None, transform_name=None)[source]
Makes a non-abstract subclass of SliceTransformReduce with specific transform and reduce operations.
To make things clearer, transform and reduce operations will have certain names that will be incorporated into a created class name.
Example 1: no transformation:
>>> v_type = make_str(sum, float)
>>> v_type.__name__
'SliceSum'
To instantiate it, we provide additional slicing parameters:
>>> v = v_type(1, 2, seq_name='X')
>>> v.id
"SliceSum(start=1,stop=2,step=None,seq_name='X')"
>>> v.calculate([1, 2, 3, 4, 5])
3
Example 2: with transformation:
Note that the first operation – slicing – inevitably produces an iterator over the input sequence. Hence, even if we aren’t slicing, i.e., provide None for all SliceTransformReduce.__init__() arguments, we still obtain an iterator over characters. Therefore, we convert it to string and then apply the necessary operation. Note that this feature makes transform map-friendly.
>>> count_x = lambda x: sum(1 for c in x if c == 'X')
>>> upper = lambda x: "".join(x).upper()
>>> v = make_str(count_x, int, transform=upper, transform_name='upper',
...              reduce_name='countX')()
>>> v.calculate('XoXoxo')
3
>>> v.id
"SliceUpperCountx(start=None,stop=None,step=None,seq_name='seq1')"
See also
SliceTransformReduce – a base abstract class from which this function generates variables.
- Parameters:
reduce (Callable[[Iterable[T]], V]) – Reduce operation preferably producing a single output.
rtype (Type) – Return type of the reduce operation and, since this is the last operation, of a variable itself.
transform (Callable[[Iterator[K]], Iterable[T]] | None) – Optional transformation operation. It accepts an iterator over (optionally) sliced input elements and returns an iterable over elements of potentially another type, as long as they are supported by the reduce.
reduce_name (str | None) – The name of the reduce operation. Please provide it in case using lambda.
transform_name (str | None) – The name of the transform operation. Please provide it in case using lambda.
- Returns:
An uninitialized subclass of SliceTransformReduce encapsulating the provided operations within the SliceTransformReduce.calculate().
- Return type:
Type[SliceTransformReduce]
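The slice, transform, reduce pipeline can be approximated by a single function. This is a simplified sketch, not the generated class: `slice_transform_reduce` is a hypothetical name, mappings are ignored, and the conversion of inclusive 1-based boundaries into a Python slice is an assumption consistent with the SliceSum example (positions 1 to 2 of `[1, 2, 3, 4, 5]` sum to 3).

```python
def slice_transform_reduce(seq, reduce, transform=None,
                           start=None, stop=None, step=None):
    # Inclusive 1-based [start, stop] becomes the 0-based slice [start-1:stop].
    lo = None if start is None else start - 1
    sliced = iter(seq[lo:stop:step])
    # Without a transform, the sliced iterator goes straight to reduce.
    transformed = sliced if transform is None else transform(sliced)
    return reduce(transformed)


def count_x(chars):
    return sum(1 for c in chars if c == "X")


def upper(chars):
    # The slice is an iterator, so join it before calling str methods.
    return "".join(chars).upper()


slice_sum = slice_transform_reduce([1, 2, 3, 4, 5], sum, start=1, stop=2)
upper_count = slice_transform_reduce("XoXoxo", count_x, transform=upper)
```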
lXtractor.variables.structural module
Module defining variables calculated on structures.
- class lXtractor.variables.structural.AggDist(p1, p2, key='min')[source]
Bases:
StructureVariable
Aggregated distance between two residues.
It will return agg_fn(pdist) where pdist is an array of all pairwise distances between atoms of p1 and p2.
- __init__(p1, p2, key='min')[source]
- Parameters:
p1 (int) – Position 1.
p2 (int) – Position 2.
key (str) – Agg function name.
Available aggregator functions are:
>>> print(list(AggFns))
['min', 'max', 'mean', 'median']
- calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
float
- key
Agg function name.
- p1
Position 1.
- p2
Position 2.
- property rtype: Type[float]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
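The agg_fn(pdist) idea can be shown with plain coordinate tuples. In this sketch `agg_dist` and `AGG_FNS` are hypothetical names, residues are given directly as lists of atom coordinates, and the standard library replaces whatever array routines the actual implementation uses.

```python
import math
import statistics
from itertools import product

# Hypothetical registry mirroring the four aggregator names listed above.
AGG_FNS = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
}


def agg_dist(atoms1, atoms2, key="min"):
    # All pairwise distances between atoms of the two residues.
    pdist = [math.dist(a, b) for a, b in product(atoms1, atoms2)]
    return AGG_FNS[key](pdist)


res1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
res2 = [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
closest = agg_dist(res1, res2)            # pairwise distances: 4, 6, 3, 5
farthest = agg_dist(res1, res2, "max")
```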
- class lXtractor.variables.structural.Chi1(p)[source]
Bases:
CompositeDihedral
Chi1-dihedral angle.
- static get_dihedrals(pos)[source]
Implemented by child classes.
- Parameters:
pos – Position to create Dihedral instances.
- Returns:
An iterable over Dihedral instances. calculate() will try calculating dihedrals in the provided order until the first successful calculation. If no calculation succeeds, it raises a FailedCalculation error.
- Return type:
list[Dihedral]
- class lXtractor.variables.structural.Chi2(p)[source]
Bases:
CompositeDihedral
Chi2-dihedral angle.
- static get_dihedrals(pos)[source]
Implemented by child classes.
- Parameters:
pos – Position to create Dihedral instances.
- Returns:
An iterable over Dihedral instances. calculate() will try calculating dihedrals in the provided order until the first successful calculation. If no calculation succeeds, it raises a FailedCalculation error.
- Return type:
list[Dihedral]
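The "try in order until the first success" behaviour described for composite dihedrals can be sketched as a small fallback loop. `ToyFailedCalculation` and `first_successful` are hypothetical stand-ins, and each candidate dihedral is modelled as a zero-argument callable.

```python
class ToyFailedCalculation(Exception):
    """Hypothetical stand-in for the library's FailedCalculation."""


def first_successful(calculations):
    errors = []
    for calc in calculations:
        try:
            return calc()
        except ToyFailedCalculation as e:
            errors.append(str(e))  # remember why this candidate failed
    raise ToyFailedCalculation("; ".join(errors) or "nothing to calculate")


def missing_atom():
    raise ToyFailedCalculation("missing atom CG")


def found():
    return -60.0


angle = first_successful([missing_atom, found])
```

Collecting the individual error messages makes the final failure easier to diagnose when every candidate is rejected.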
- class lXtractor.variables.structural.ClosestLigandContactsCount(p, a=None)[source]
Bases:
StructureVariable
The number of atoms involved in contacting ligands.
- calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
float
- a
Atom name. If not provided, sum contacts across all residue atoms.
- p
Residue position.
- property rtype: Type[int]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
- class lXtractor.variables.structural.ClosestLigandDist(p, a=None, agg_lig='min', agg_res='min')[source]
Bases:
StructureVariable
A distance from the selected residue or a residue’s atom to a connected ligand.
Each ligand provides lXtractor.core.ligand.Ligand.dist array. These arrays are stacked and aggregated atom-wise using agg_lig. Then, agg_res aggregates the obtained vector of values into a single number.
For instance, to obtain max distance for the closest ligand of a residue 1, use ClosestLigandDist(1, agg_res='max').
If structure has no ligands (lXtractor.core.structure.GenericStructure.ligands), this variable defaults to -1.0.
Note
The lXtractor.core.ligand.dist attribute provides distances from an atom to the closest ligand atom.
- calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
float
- a
Atom name. If not provided, aggregate across residue atoms.
- agg_lig
Aggregator function for ligands.
- agg_res
Aggregator function for residue atoms.
- p
Residue position.
- property rtype: Type[float]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
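The two-stage aggregation (first across ligands per atom, then across the residue's atoms) can be sketched with nested lists. `closest_ligand_dist` and `AGG` are hypothetical names, only two aggregators are included, and each inner list stands in for one ligand's dist array restricted to the atoms of interest.

```python
AGG = {"min": min, "max": max}


def closest_ligand_dist(ligand_dists, agg_lig="min", agg_res="min"):
    """``ligand_dists``: one list of per-atom distances for each ligand."""
    if not ligand_dists:
        return -1.0  # documented default when there are no ligands
    # zip(*...) walks the stacked arrays atom by atom.
    per_atom = [AGG[agg_lig](column) for column in zip(*ligand_dists)]
    return AGG[agg_res](per_atom)


dists = [
    [3.0, 5.0, 9.0],  # ligand 1 to three residue atoms
    [4.0, 2.0, 8.0],  # ligand 2 to the same atoms
]
nearest = closest_ligand_dist(dists)                 # per-atom minima: 3, 2, 8
max_of_closest = closest_ligand_dist(dists, agg_res="max")
```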
- class lXtractor.variables.structural.ClosestLigandNames(p, a=None)[source]
Bases:
StructureVariable
","-separated contacting ligand (residue) names.
- calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
str
- a
Atom name. If not provided, merge across all residue atoms.
- p
Residue position.
- property rtype: Type[str]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
- class lXtractor.variables.structural.Contacts(p, r=5.0)[source]
Bases:
StructureVariable
Uses KDTree to find atoms within the r distance threshold of those defined by target position p. Positions these atoms correspond to are returned as a “,”-separated string.
If mapping is provided, contact positions will be filtered to those covered by this mapping.
Note
The default value of r is provided by DefaultConfig["contacts"]["non-covalent"][1].
- calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
str
- p
Target position.
- r
Contact upper bound in angstroms.
- property rtype: Type[str]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
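A brute-force version of the contact search illustrates the result format. Note that this sketch replaces the KDTree with a plain double loop, which is fine for tiny inputs only. `contacts` is a hypothetical function, atoms are given as (position, coordinates) pairs, excluding the target position from its own contacts is an assumption, and so is treating the mapping's values as structure positions.

```python
import math


def contacts(atoms, p, r=5.0, mapping=None):
    """``atoms``: hypothetical list of (position, (x, y, z)) pairs."""
    target = [xyz for pos, xyz in atoms if pos == p]
    found = {
        pos for pos, xyz in atoms
        if pos != p and any(math.dist(xyz, t) <= r for t in target)
    }
    if mapping is not None:
        # Assumption: mapping values are positions in this structure.
        found &= set(mapping.values())
    return ",".join(map(str, sorted(found)))


atoms = [
    (1, (0.0, 0.0, 0.0)),
    (1, (1.0, 0.0, 0.0)),
    (2, (4.0, 0.0, 0.0)),
    (3, (10.0, 0.0, 0.0)),
    (4, (0.0, 5.5, 0.0)),
]
default_contacts = contacts(atoms, 1)
wider_contacts = contacts(atoms, 1, r=6.0)
mapped_contacts = contacts(atoms, 1, r=6.0, mapping={10: 4})
```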
- class lXtractor.variables.structural.Dihedral(p1, p2, p3, p4, a1, a2, a3, a4, name='GenericDihedral')[source]
Bases:
StructureVariable
Dihedral angle involving four different atoms.
- calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
float
- a1
Atom name.
- a2
Atom name.
- a3
Atom name.
- a4
Atom name.
- property atoms: list[str]
- Returns:
A list of atoms a1-a4.
- name: str
Used to designate special kinds of dihedrals.
- p1
Position.
- p2
Position.
- p3
Position.
- p4
Position.
- property positions: list[int]
- Returns:
A list of positions p1-p4.
- property rtype: Type[float]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
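For reference, a dihedral angle over four points can be computed with the standard projection formula. This is a generic sketch in pure Python, not the library's routine: `dihedral` and its helpers are hypothetical, it returns radians, and the unit and sign convention used by the Dihedral variable are not stated above.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle (radians) around the p2-p3 axis."""
    b0, b1, b2 = sub(p1, p2), sub(p3, p2), sub(p4, p3)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the axis.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.atan2(dot(cross(b1, v), w), dot(v, w))


cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
perpendicular = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Using `atan2` rather than `acos` keeps the sign of the angle and avoids numerical trouble near 0 and 180 degrees.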
- class lXtractor.variables.structural.Dist(p1, p2, a1=None, a2=None, com=False)[source]
Bases:
StructureVariable
A distance between two atoms.
- calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
float
- a1: str | None
Atom name 1.
- a2: str | None
Atom name 2.
- com: bool
Use center of mass instead of concrete atoms.
- p1: int
Position 1.
- p2: int
Position 2.
- property rtype: Type[float]
Variable’s return type, such that rtype(“result”) converts to the relevant type.
- class lXtractor.variables.structural.PseudoDihedral(p1, p2, p3, p4)[source]
Bases:
Dihedral
Pseudo-dihedral angle - “the torsion angle between planes defined by 4 consecutive alpha-carbon atoms.”
- class lXtractor.variables.structural.SASA(p, a=None)[source]
Bases:
StructureVariable
Solvent-accessible surface area of a residue or a specific atom.
The SASA is calculated for the whole array and then subset to all atoms or a single atom of a residue (so all atoms of the structure are taken into account during the calculation).
- calculate(obj, mapping=None)[source]
- Parameters:
obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.
- Returns:
A calculation result of some sensible non-sequence type, such as string, float, int, etc.
- Return type:
float | None
- a
- p
- property rtype: Type[float]
Variable’s return type, such that rtype(“result”) converts to the relevant type.