lXtractor.variables package

lXtractor.variables.base module

Base classes, common types and functions for the variables module.

class lXtractor.variables.base.AbstractCalculator[source]

Bases: Generic[OT]

Class defining variables’ calculation strategy.

abstract __call__(o: OT, v: VT, m: Mapping[int, int | None] | None) → tuple[bool, RT][source]

abstract __call__(o: Iterable[OT], v: Iterable[VT] | Iterable[Iterable[VT]], m: Iterable[Mapping[int, int | None] | None] | None) → Iterable[Iterable[tuple[bool, RT]]]

Parameters:

o – Object to calculate on.
v – Some variable whose calculate method accepts o-type instances.
m – Optional mapping between object and some reference object numbering schemes.

Returns:

Calculation result.

abstract map(o, v, m)[source]

Map variables to a single object.

Parameters:

o (OT) – Object to calculate on.
v (Iterable[VT]) – An iterable over variables whose calculate method accepts o-type instances.
m (Mapping[int, int | None] | None) – Optional mapping between object and some reference object numbering schemes.

Returns:

An iterator (generator) over calculation result.

Return type:

Iterable[tuple[bool, RT]]

abstract vmap(o, v, m)[source]

Map objects to a single variable.

Parameters:

o (Iterable[OT]) – An iterable over objects to calculate on.
v (VT) – Some variable whose calculate method accepts o-type instances.
m (Iterable[Mapping[int, int | None] | None]) – Optional mapping between object and some reference object numbering schemes.

Returns:

An iterator (generator) over calculation result.

Return type:

Iterable[tuple[bool, RT]]
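The (bool, result) contract shared by __call__, map, and vmap can be sketched without lXtractor. This is a minimal illustration, not the library's implementation: FailedCalculation below is a local stand-in for lXtractor.core.exceptions.FailedCalculation, and try_calculate, map_variables, vmap_variable, and first_element are hypothetical names. It assumes a failed calculation is reported as (False, error message).

```python
class FailedCalculation(Exception):
    """Local stand-in for lXtractor.core.exceptions.FailedCalculation."""


def try_calculate(calc, obj, mapping, valid_exceptions=(FailedCalculation,)):
    """Return (True, result) on success or (False, error message) on failure."""
    try:
        return True, calc(obj, mapping)
    except valid_exceptions as e:
        return False, str(e)


def map_variables(obj, calcs, mapping=None):
    """Many variables, one object (the shape of ``map``)."""
    for calc in calcs:
        yield try_calculate(calc, obj, mapping)


def vmap_variable(objs, calc, mappings=None):
    """One variable, many objects (the shape of ``vmap``)."""
    objs = list(objs)
    mappings = [None] * len(objs) if mappings is None else list(mappings)
    for obj, m in zip(objs, mappings):
        yield try_calculate(calc, obj, m)


def first_element(seq, mapping=None):
    """Hypothetical variable: the first element of a sequence."""
    if not seq:
        raise FailedCalculation("empty sequence")
    return seq[0]


results_map = list(map_variables("ABC", [first_element, lambda s, m: len(s)]))
results_vmap = list(vmap_variable(["ABC", ""], first_element))
```

Both helpers are generators, so results are produced one at a time as the caller iterates.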

class lXtractor.variables.base.AbstractVariable[source]

Bases: Generic[OT, RT]

Abstract base class for variables.

abstract calculate(obj, mapping=None)[source]

Calculate variable. Each variable defines its own calculation strategy.

Parameters:

obj (OT) – An object used for variable’s calculation.
mapping (Mapping[int, int | None] | None) – Mapping from generalizable positions of MSA/reference/etc. to the obj’s positions.

Returns:

Calculation result.

Raises:

FailedCalculation if the calculation fails.

Return type:

RT

property id: str: Variable identifier such that eval(x.id) produces another instance.

abstract property rtype: Type[RT]: Variable’s return type, such that rtype(“result”) converts to the relevant type.
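The two properties above can be illustrated with a toy class. ToyVar is hypothetical and is not an lXtractor class. The sketch assumes the identifier is a constructor-like string, as in the SeqEl(p=1,...) identifiers shown in the doctests of this package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyVar:
    """Hypothetical variable with an eval-able identifier."""

    p: int
    seq_name: str = "seq1"

    @property
    def id(self) -> str:
        # A constructor-like string: evaluating it rebuilds an equal instance.
        return f"{type(self).__name__}(p={self.p},seq_name={self.seq_name!r})"

    @property
    def rtype(self):
        # The type used to convert a textual result back to a value.
        return int


v = ToyVar(3)
# eval() should only ever be applied to trusted identifiers.
clone = eval(v.id, {"ToyVar": ToyVar})
restored = v.rtype("42")
```

Because the instance is frozen and hashable, it can serve as a dictionary key, which is how a container keyed by variables would use it.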

class lXtractor.variables.base.AggFn(*args, **kwargs)[source]

Bases: Protocol

__call__(a, **kwargs)[source]

Call self as a function.

Return type:: ndarray | float

__init__(*args, **kwargs)

class lXtractor.variables.base.LigandVariable[source]

Bases: AbstractVariable[Ligand, RT], Generic[T, RT]

A type of variable whose calculate() method requires a ligand.

abstract calculate(obj, mapping=None)[source]

Parameters:

obj (Ligand) – Some ligand.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

RT

class lXtractor.variables.base.ProtFP(path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/lxtractor/checkouts/latest/lXtractor/resources/PFP.csv'))[source]

Bases: object

ProtFP embeddings for amino acid residues.

ProtFP is a coding scheme derived from a PCA of the AAIndex database (van Westen et al., 2013 [1, 2]).

>>> pfp = ProtFP()
>>> pfp[('G', 1)]
-5.7
>>> list(pfp['G'])
[-5.7, -8.72, 4.18, -1.35, -0.31]
>>> comp1 = pfp[1]
>>> assert len(comp1) == 20
>>> comp1[0]
-5.7
>>> comp1.index[0]
'G'

[1]

Gerard JP van Westen, Remco F Swier, Jörg K Wegner, Adriaan P IJzerman, Herman WT van Vlijmen, and Andreas Bender. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets. Journal of Cheminformatics, 5(1):41, 2013. doi:10.1186/1758-2946-5-41.

[2]

Gerard JP van Westen, Remco F Swier, Isidro Cortes-Ciriano, Jörg K Wegner, John P Overington, Adriaan P IJzerman, Herman WT van Vlijmen, and Andreas Bender. Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets. Journal of Cheminformatics, 5(1):42, 2013. doi:10.1186/1758-2946-5-42.

__init__(path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/lxtractor/checkouts/latest/lXtractor/resources/PFP.csv'))[source]

class lXtractor.variables.base.SequenceVariable[source]

Bases: AbstractVariable[Sequence[T], RT], Generic[T, RT]

A type of variable whose calculate() method requires protein sequence.

abstract calculate(obj, mapping=None)[source]

Parameters:

obj (Sequence[T]) – Some sequence.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

RT

class lXtractor.variables.base.StructureVariable[source]

Bases: AbstractVariable[GenericStructure, RT], Generic[RT]

A type of variable whose calculate() method requires protein structure.

abstract calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (Mapping[int, int | None] | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

RT

class lXtractor.variables.base.Variables(dict=None, /, **kwargs)[source]

Bases: UserDict

A dict-like container (a UserDict subclass) holding variables (AbstractVariable subclasses).

The keys are instances of AbstractVariable subclasses (hashed by id), and the values are calculation results.

as_df()[source]

Returns:: A table with two columns: VariableID and VariableResult.
Return type:: DataFrame

classmethod read(path)[source]

Read and initialize variables.

Parameters:: path (Path) – Path to a two-column .tsv file holding pairs (var_id, var_value). Will use var_id to initialize variable, importing dynamically a relevant class from variables.
Returns:: A dict mapping variable object to its value.
Return type:: Variables

write(path)[source]

Parameters:

path (Path) – Path to a file.
skip_if_contains – Skip if a variable ID contains any of the provided strings.

property sequence: Variables

Returns:: values that are SequenceVariable instances.

property structure: Variables

Returns:: values that are StructureVariable instances.

lXtractor.variables.calculator module

Module defining variable calculators managing the exact calculation process of variables on objects.

class lXtractor.variables.calculator.GenericCalculator(num_proc=1, valid_exceptions=(<class 'lXtractor.core.exceptions.FailedCalculation'>, ), apply_kwargs=None, verbose=False)[source]

Bases: AbstractCalculator

A calculator capable of calculating variables in parallel.

__call__(o: OT, v: VT, m: Mapping[int, int | None] | None) → tuple[bool, RT][source]

__call__(o: Iterable[OT], v: Iterable[VT] | Iterable[Iterable[VT]], m: Iterable[Mapping[int, int | None] | None] | None) → Iterable[Iterable[tuple[bool, RT]]]

Parameters:

o – Object to calculate on.
v – Some variable whose calculate method accepts o-type instances.
m – Optional mapping between object and some reference object numbering schemes.

Returns:

Calculation result.

__init__(num_proc=1, valid_exceptions=(<class 'lXtractor.core.exceptions.FailedCalculation'>, ), apply_kwargs=None, verbose=False)[source]

map(o, v, m)[source]

Map variables to a single object.

Parameters:

o (OT) – Object to calculate on.
v (Iterable[VT]) – An iterable over variables whose calculate method accepts o-type instances.
m (Mapping[int, int | None] | None) – Optional mapping between object and some reference object numbering schemes.

Returns:

An iterator (generator) over calculation result.

Return type:

Generator[tuple[bool, RT], None, None]

vmap(o, v, m)[source]

Map objects to a single variable.

Parameters:

o (Iterable[OT]) – An iterable over objects to calculate on.
v (VT) – Some variable whose calculate method accepts o-type instances.
m (Iterable[Mapping[int, int | None] | None] | Mapping[int, int | None] | None) – Optional mapping between object and some reference object numbering schemes.

Returns:

An iterator (generator) over calculation result.

Return type:

Generator[tuple[bool, RT], None, None]

apply_kwargs

num_proc

valid_exceptions

verbose

lXtractor.variables.calculator.calculate(o, v, m, valid_exceptions, num_proc, verbose=False, **kwargs)[source]

Return type:: Generator[Iterator[tuple[bool, RT]], None, None]

lXtractor.variables.manager module

Manager handles variable calculations, such as:

Variable manipulations (assignment, deletions, and resetting).
Calculation of variables. The manager only orchestrates the calculation process, whereas calculators (lXtractor.variables.calculator.GenericCalculator, for instance) do the heavy lifting.
Aggregation of the calculation results, either via aggregate_from_chains() or aggregate_from_it().

class lXtractor.variables.manager.Manager(verbose=False)[source]

Bases: object

Manager of variable calculations, handling assignment, aggregation, and, of course, the calculations themselves.

__init__(verbose=False)[source]

Parameters:: verbose (bool) – Display progress bar.

aggregate_from_chains(chains)[source]

Aggregate calculation results from the variables container of the provided chains.

>>> from lXtractor.variables.sequential import SeqEl
>>> s = lxc.ChainSequence.from_string('abcd', name='_seq')
>>> manager = Manager()
>>> manager.assign([SeqEl(1)], [s])
>>> df = manager.aggregate_from_chains([s])
>>> len(df) == 1
True
>>> list(df.columns)
['VariableID', 'VariableResult', 'ObjectID', 'ObjectType']

Parameters:: chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
Returns:: A dataframe with ObjectID, ObjectType, and calculation results.
Return type:: DataFrame

aggregate_from_it(results, vs_to_cols=True, replace_errors=True, replace_errors_with=nan, num_vs=None)[source]

Aggregate calculation results directly from calculate() output.

Parameters:

results (Iterable[tuple[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand], SequenceVariable | StructureVariable | LigandVariable, bool, Any]]) – An iterable over calculation results.
vs_to_cols (bool) – If True, will attempt to use the wide format for the final results with variables as columns. Otherwise, will use the long format with fixed columns: “ObjectID”, “VariableID”, “VariableCalculated”, and “VariableResult”. Note that for the wide format to work, all objects and their variables must have unique IDs.
replace_errors (bool) – If a calculation failed, replace its result with a fixed value (see replace_errors_with).
replace_errors_with (Any) – Use this value to replace erroneous calculation results.
num_vs (int | None) – The number of variables per object. Providing this will significantly increase the aggregation speed.

Returns:

A table with results in long or wide format.

Return type:

DataFrame | dict[str, list]
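The difference between the long and wide layouts can be sketched with plain dictionaries of columns. This is an illustration only: to_long and to_wide are hypothetical helpers, and the result tuples hold string IDs rather than actual chain and variable objects. Failed results are replaced by replace_errors_with in the wide layout, mirroring the replace_errors option.

```python
def to_long(results):
    """Fixed columns, one row per (object, variable) pair."""
    cols = {"ObjectID": [], "VariableID": [],
            "VariableCalculated": [], "VariableResult": []}
    for obj_id, var_id, ok, res in results:
        cols["ObjectID"].append(obj_id)
        cols["VariableID"].append(var_id)
        cols["VariableCalculated"].append(ok)
        cols["VariableResult"].append(res)
    return cols


def to_wide(results, replace_errors_with=float("nan")):
    """One row per object, one column per variable ID."""
    rows = {}
    for obj_id, var_id, ok, res in results:
        rows.setdefault(obj_id, {})[var_id] = res if ok else replace_errors_with
    # Preserve the order in which variable IDs were first seen.
    var_ids = list(dict.fromkeys(v for row in rows.values() for v in row))
    wide = {"ObjectID": list(rows)}
    for v in var_ids:
        wide[v] = [rows[o].get(v, replace_errors_with) for o in rows]
    return wide


results = [
    ("s1", "SeqEl(1)", True, "A"),
    ("s1", "SeqEl(5)", False, "Missing index"),
    ("s2", "SeqEl(1)", True, "B"),
    ("s2", "SeqEl(5)", True, "E"),
]
long_table = to_long(results)
wide_table = to_wide(results, replace_errors_with=None)
```

The wide layout only works when object and variable IDs are unique: a repeated (object, variable) pair would silently overwrite an earlier cell.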

assign(vs, chains)[source]

Assign variables to chains sequences/structures.

Parameters:

vs (Sequence[SequenceVariable | StructureVariable | LigandVariable]) – A sequence of variables.
chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.

Returns:

No return. Will store assigned variables within the variables attribute.

calculate(objs, vs, calculator, *, save=False, **kwargs)[source]

Handles variable calculations:

1. Stage calculations (see stage()).

2. Calculate variables using the provided calculator.

3. (Optional) save the calculation results to variables container.

4. Output (stream) calculation results.

Note that 3 and 4 are done lazily as calculation results from the calculator become available.

>>> from lXtractor.variables.calculator import GenericCalculator
>>> from lXtractor.variables.sequential import SeqEl
>>> s = lxc.ChainSequence.from_string('ABCD', name='_seq')
>>> m = Manager()
>>> c = GenericCalculator()
>>> list(m.calculate([s],[SeqEl(1)],c))
[(_seq|1-4, SeqEl(p=1,_rtype='str',seq_name='seq1'), True, 'A')]
>>> list(m.calculate([s],[SeqEl(5)],c))[0][-2:]
(False, 'Missing index 4 in sequence')

Parameters:

objs (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable] | None) – A sequence of variables. If not provided, will use assigned variables (see assign()).
calculator (AbstractCalculator) – A calculator object – some callable with the right signature handling the calculations.
save (bool) – Save calculation results to variables. Will overwrite any existing matching variables.
kwargs – Passed to stage().

Returns:

A generator over tuples: 1. Original object. 2. Variable. 3. Flag indicating whether the calculation was successful. 4. The calculation result (or the error message).

Return type:

Generator[tuple[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand], SequenceVariable | StructureVariable | LigandVariable, bool, Any], None, None]
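The lazy stage, calculate, save, and stream pipeline can be sketched with chained generators. Everything here is hypothetical: toy_stage, toy_run, and toy_calculate are not lXtractor functions, chains are plain dictionaries, and variables are plain callables. The sketch also assumes that only successful results are saved, which the documentation above does not state.

```python
def toy_stage(objs, variables):
    """Yield (object, staged target, variables, mapping) tuples."""
    for obj in objs:
        yield obj, obj["seq"], variables, None


def toy_run(staged):
    """Calculate each variable, reporting failures instead of raising."""
    for obj, target, variables, _mapping in staged:
        for var in variables:
            try:
                ok, res = True, var(target)
            except IndexError as e:
                ok, res = False, str(e)
            yield obj, var, ok, res


def toy_calculate(objs, variables, save=False):
    """Save (optionally) and stream results as they become available."""
    for obj, var, ok, res in toy_run(toy_stage(objs, variables)):
        if save and ok:  # assumption: failures are not saved
            obj["variables"][var.__name__] = res
        yield obj, var, ok, res


def first(seq):
    return seq[0]


def tenth(seq):
    return seq[9]


chain = {"id": "_seq", "seq": "ABCD", "variables": {}}
stream = toy_calculate([chain], [first, tenth], save=True)
saved_before = dict(chain["variables"])  # nothing has run yet
flags = [(var.__name__, ok) for _, var, ok, _ in stream]
saved_after = dict(chain["variables"])
```

Nothing is calculated or saved until the returned generator is consumed, which is why saved_before is still empty.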

remove(chains, vs=None)[source]

Remove variables from the variables container.

Parameters:

chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable] | None) – A sequence of variables to remove. If not provided, will remove all variables.

Returns:

No return.

reset(chains, vs=None)[source]

Similar to remove(), but instead of deleting, resets variable calculation results.

Parameters:

chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable] | None) – A sequence of variables to reset. If not provided, will reset all variables.

Returns:

No return.

stage(chains, vs, **kwargs)[source]

Stage objects for calculations (e.g., using calculate()). It’s a useful method if using a different calculation method and/or parallelization strategy within a Calculator class.

See also

stage() calculate()

>>> from lXtractor.variables.sequential import SeqEl
>>> s = lxc.ChainSequence.from_string('ABCD', name='_seq')
>>> m = Manager()
>>> staged = list(m.stage([s], [SeqEl(1)]))
>>> len(staged) == 1
True
>>> staged[0]
(_seq|1-4, 'ABCD', [SeqEl(p=1,_rtype='str',seq_name='seq1')], None)

Parameters:

chains (Iterable[ChainSequence | ChainStructure | tuple[ChainStructure, Ligand]]) – An iterable over chain sequences/structures.
vs (Sequence[SequenceVariable | StructureVariable | LigandVariable] | None) – A sequence of variables. If not provided, will use assigned variables (see assign()).
kwargs – Passed to stage().

Returns:

An iterable over tuples holding data for variables’ calculation.

Return type:

Generator[tuple[ChainSequence, Sequence[Any], Sequence[SequenceVariable], Mapping[int, int] | None] | tuple[ChainStructure, GenericStructure, Sequence[StructureVariable], Mapping[int, int] | None], None, None]

verbose

lXtractor.variables.manager.find_structure(s)[source]

Recursively search for structure up the ancestral tree.

Parameters:: s (ChainStructure) – An arbitrary chain structure.
Returns:: The first non-empty atom array up the parent chain.
Return type:: GenericStructure | None

lXtractor.variables.manager.get_mapping(obj, map_name, map_to)[source]

Obtain mapping from a Chain*-type object.

>>> s = lxc.ChainSequence.from_string('ABCD', name='_seq')
>>> s.add_seq('some_map', [5, 6, 7, 8])
>>> s.add_seq('another_map', ['D', 'B', 'C', 'A'])
>>> get_mapping(s, 'some_map', None)
{5: 1, 6: 2, 7: 3, 8: 4}
>>> get_mapping(s, 'another_map', 'some_map')
{'D': 5, 'B': 6, 'C': 7, 'A': 8}

Parameters:

obj (Any) – Chain*-type object. If not a Chain*-type object, raises AttributeError.
map_name (str | None) – The name of a map to create the mapping from. If None, the resulting mapping is None.
map_to (str | None) – The name of a map to create a mapping to. If None, will default to the real sequence indices (1-based) for a ChainSequence object and to the structure actual numbering for the ChainStructure.

Returns:

A dictionary mapping from the map_name sequence to map_to sequence.

Return type:

dict | None
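For the ChainSequence case, the behavior shown in the doctest above amounts to zipping two equally long sequences. The sketch below is a hypothetical stand-in, not the library function: toy_get_mapping takes a plain dictionary of named sequences instead of a Chain*-type object.

```python
def toy_get_mapping(seqs, map_name, map_to=None):
    """Map elements of seqs[map_name] to seqs[map_to] (or to 1-based indices)."""
    if map_name is None:
        return None
    keys = seqs[map_name]
    # With no target map, fall back to 1-based sequence indices.
    values = range(1, len(keys) + 1) if map_to is None else seqs[map_to]
    return dict(zip(keys, values))


seqs = {
    "seq1": "ABCD",
    "some_map": [5, 6, 7, 8],
    "another_map": ["D", "B", "C", "A"],
}
to_index = toy_get_mapping(seqs, "some_map")
to_other = toy_get_mapping(seqs, "another_map", "some_map")
```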

lXtractor.variables.manager.stage(obj: ChainStructure, vs, *, missing, seq_name, map_name, map_to) → tuple[ChainStructure, GenericStructure, Sequence[StructureVariable], Mapping[int, int] | None][source]

lXtractor.variables.manager.stage(obj: ChainSequence, vs, *, missing, seq_name, map_name, map_to) → tuple[ChainSequence, Sequence[Any], Sequence[SequenceVariable], Mapping[int, int] | None]

lXtractor.variables.manager.stage(obj: tuple[ChainStructure, Ligand], vs, *, missing, seq_name, map_name, map_to) → tuple[tuple[ChainStructure, Ligand], Ligand, Sequence[LigandVariable], Mapping[int, int] | None]

Stage object for calculation. If it’s a chain sequence, will stage some sequence/mapping within it. If it’s a chain structure, will stage the atom array.

Parameters:

obj – A chain sequence or structure or structure-ligand pair to calculate the variables on.
vs – A sequence of variables to calculate.
missing – If True, calculate only those assigned variables that are missing.
seq_name – If obj is the chain sequence, the sequence name is used to obtain an actual sequence (obj[seq_name]).
map_name – The mapping name to obtain the mapping keys. If None, the resulting mapping will be None.
map_to – The mapping name to obtain the mapping values. See get_mapping() for details.

Returns:

A tuple with four elements: 1. Original object. 2. Staged target passed to a variable for calculation. 3. A sequence of sequence or structural variables. 4. An optional mapping.

lXtractor.variables.parser module

lXtractor.variables.parser.init_var(var)[source]

Convert a textual representation of a single variable into a concrete and initialized variable.

>>> assert isinstance(init_var('123'), SeqEl)
>>> assert isinstance(init_var('1-2'), Dist)
>>> assert isinstance(init_var('1-2-3-4'), PseudoDihedral)

Parameters:: var (str) – textual representation of a variable.
Returns:: initialized variable, a concrete subclass of an AbstractVariable

lXtractor.variables.parser.parse_var(inp)[source]

Parse raw input into a collection of variables, structures, and levels at which they should be calculated.

Parameters:

inp (str) –

"[variable_specs]--[protein_specs]::[domains]" format, where:

variable_specs define the variable type (e.g., 1:CA-2:CA for CA-CA distance between positions 1 and 2)
protein_specs define proteins for which to calculate variables
domains list the domain names for the given protein collection

Returns:

a namedtuple with (1) variables, (2) list of proteins (or [None]), and (3) a list of domains (or [None]).
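The top-level structure of the input format can be illustrated by splitting on its two delimiters. This is a rough sketch, not the library's parser: split_spec and ParsedSpec are hypothetical names, the comma separator within each part is an assumption, and variable specs are left as raw strings rather than converted with init_var().

```python
from collections import namedtuple

ParsedSpec = namedtuple("ParsedSpec", ["variables", "proteins", "domains"])


def split_spec(inp, sep=","):
    """Split "[variable_specs]--[protein_specs]::[domains]" into three lists."""
    rest, _, domains = inp.partition("::")
    variables, _, proteins = rest.partition("--")

    def items(part):
        # Missing parts become [None], as in the documented return value.
        return [x for x in part.split(sep) if x] or [None]

    return ParsedSpec(
        [v for v in variables.split(sep) if v], items(proteins), items(domains)
    )


full = split_spec("1:CA-2:CA--PROT1::SH2,SH3")
bare = split_spec("1:CA-2:CA")
```

Splitting on "::" before "--" keeps the single ":" inside a variable spec such as 1:CA from being mistaken for the domain delimiter.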

lXtractor.variables.sequential module

Module defining variables calculated on sequences.

class lXtractor.variables.sequential.PFP(p, i)[source]

Bases: SequenceVariable

A ProtFP embedding variable.

See also

lXtractor.variables.base.ProtFP

__init__(p, i)[source]

Parameters:

p (int) – Position, starting from 1.
i (int) – A PCA component index starting from 1.

calculate(obj, mapping=None)[source]

Parameters:

obj (Sequence[str]) – Some sequence.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

float

i: A PCA component index starting from 1.

p: Position, starting from 1

property rtype: Type[float]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

class lXtractor.variables.sequential.SeqEl(p, _rtype='str', seq_name='seq1')[source]

Bases: SequenceVariable[T, T]

A sequence element variable. It doesn’t involve any calculation; it simply accesses the sequence element at a given position.

>>> v1, v2 = SeqEl(1), SeqEl(1, 'X')
>>> s1, s2 = 'XYZ', [1, 2, 3]
>>> v1.calculate(s1)
'X'
>>> v2.calculate(s2)
1

__init__(p, _rtype='str', seq_name='seq1')[source]

Parameters:

p (int) – Position, starting from 1.
seq_name (str) – The name of the sequence used to distinguish variables pointing to the same position.

calculate(obj, mapping=None)[source]

Parameters:

obj (Sequence[T]) – Some sequence.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

T

p: Position, starting from 1.

property rtype: Type[T]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

seq_name: Sequence name for which the element is accessed

class lXtractor.variables.sequential.SliceTransformReduce(start=None, stop=None, step=None, seq_name='seq1')[source]

Bases: SequenceVariable, Generic[T, V, K]

A composite variable with three sequential operations:

Slice – subset the sequence (optional).

Transform – transform the sequence (optional).

Reduce – reduce to a final variable.

This is an abstract class. Subclasses must define at least:

reduce().
rtype property.

See also

make_str() – a factory function to quickly make child classes.

__init__(start=None, stop=None, step=None, seq_name='seq1')[source]

Note

start and stop have inclusive boundaries.

Parameters:

start (int | None) – Start position
stop (int | None) – Stop position.
step (int | None) – Slicing step.
seq_name (str) – Sequence name. Please use it in case a resulting variable will be applied to seqs other than the primary sequence.
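The slice, transform, and reduce steps with inclusive boundaries can be sketched as a single function. This is an illustration, not the library's implementation: slice_transform_reduce is a hypothetical helper, and positions are assumed to be 1-based, which is consistent with the make_str() doctest where start=1, stop=2 over [1, 2, 3, 4, 5] sums to 3.

```python
from itertools import islice


def slice_transform_reduce(seq, reduce, transform=None,
                           start=None, stop=None, step=None):
    """Slice with 1-based inclusive bounds, optionally transform, then reduce."""
    # A 1-based inclusive start becomes a 0-based start; a 1-based inclusive
    # stop equals the 0-based exclusive stop, so it is passed through as-is.
    lo = None if start is None else start - 1
    sliced = islice(seq, lo, stop, step)
    transformed = sliced if transform is None else transform(sliced)
    return reduce(transformed)


total = slice_transform_reduce([1, 2, 3, 4, 5], sum, start=1, stop=2)

count_x = lambda x: sum(1 for c in x if c == "X")
upper = lambda x: "".join(x).upper()
n_x = slice_transform_reduce("XoXoxo", count_x, transform=upper)
```

Slicing always yields an iterator, so a transform that needs a string has to join the characters first, as upper does here.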

calculate(obj, mapping=None)[source]

Parameters:

obj (Iterable[K]) – Some sequence.
mapping (Mapping[int, int | None] | None) – Optional mapping between sequence and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

V

abstract static reduce(seq)[source]

Reduce the input iterable into the variable result.

Parameters:: seq (Iterable[T] | Iterable[K]) – Some sort of iterable – the results of the transform (or slicing, if no transformation is used)
Returns:: An aggregated value (e.g., float, string, etc.).
Return type:: V

static transform(seq)[source]

Optionally transform the slicing result. If not used, it is the identity operation.

Parameters:: seq (Iterable[K]) – The result of slicing operation. If no slicing is used, it is just an iter(input_seq).
Returns:: Iterable over transformed elements (can have another type than the input ones).
Return type:: Iterable[T] | Iterable[K]

seq_name: Sequence name.

start: Start position.

step: Slicing step.

stop: End position.

lXtractor.variables.sequential.make_str(reduce, rtype, transform=None, reduce_name=None, transform_name=None)[source]

Makes a non-abstract subclass of SliceTransformReduce with specific transform and reduce operations.

To make things clearer, transform and reduce operations will have certain names that will be incorporated into a created class name.

Example 1: no transformation:

>>> v_type = make_str(sum, float)
>>> v_type.__name__
'SliceSum'

To instantiate it, we provide additional slicing parameters:

>>> v = v_type(1, 2, seq_name='X')
>>> v.id
"SliceSum(start=1,stop=2,step=None,seq_name='X')"

>>> v.calculate([1, 2, 3, 4, 5])
3

Example 2: with transformation:

Note that the first operation – slicing – inevitably produces an iterator over the input sequence. Hence, even if we aren’t slicing, i.e., provide None for all SliceTransformReduce.__init__() arguments, we still obtain an iterator over characters. Therefore, we convert it to string and then apply the necessary operation. Note that this feature makes transform map-friendly.

>>> count_x = lambda x: sum(1 for c in x if c == 'X')
>>> upper = lambda x: "".join(x).upper()
>>> v = make_str(count_x, int, transform=upper, transform_name='upper',
...              reduce_name='countX')()
>>> v.calculate('XoXoxo')
3
>>> v.id
"SliceUpperCountx(start=None,stop=None,step=None,seq_name='seq1')"

See also

SliceTransformReduce – a base abstract class from which this function generates variables.

Parameters:

reduce (Callable[[Iterable[T]], V]) – Reduce operation preferably producing a single output.
rtype (Type) – Return type of the reduce operation and, since this is the last operation, of a variable itself.
transform (Callable[[Iterator[K]], Iterable[T]] | None) – Optional transformation operation. It accepts an iterator over (optionally) sliced input elements and returns an iterable over elements of potentially another type, as long as they are supported by the reduce.
reduce_name (str | None) – The name of the reduce operation. Please provide it in case using lambda.
transform_name (str | None) – The name of the transform operation. Please provide it in case using lambda.

Returns:

An uninitialized subclass of SliceTransformReduce encapsulating the provided operations within the SliceTransformReduce.calculate().

Return type:

Type[SliceTransformReduce]
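The factory pattern behind such a function can be sketched with the built-in type(). This is a hypothetical illustration, not the library's code: make_reducer_class builds a plain class rather than a SliceTransformReduce subclass, omits slicing, and assumes the class name is assembled from capitalized operation names, which matches the names shown in the examples above.

```python
def make_reducer_class(reduce, rtype, transform=None,
                       reduce_name=None, transform_name=None):
    """Build a class named after the provided operations."""
    r_name = (reduce_name or reduce.__name__).capitalize()
    t_name = ""
    if transform is not None:
        t_name = (transform_name or transform.__name__).capitalize()

    def calculate(self, seq):
        it = iter(seq)  # stands in for the (omitted) slicing step
        if transform is not None:
            it = transform(it)
        return reduce(it)

    return type(f"Slice{t_name}{r_name}", (), {"calculate": calculate, "rtype": rtype})


sum_cls = make_reducer_class(sum, float)

count_x = lambda x: sum(1 for c in x if c == "X")
upper = lambda x: "".join(x).upper()
count_cls = make_reducer_class(count_x, int, transform=upper,
                               transform_name="upper", reduce_name="countX")
```

Explicit names matter for lambdas because every lambda reports the same __name__, so two different operations would otherwise produce identically named classes.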

lXtractor.variables.structural module

Module defining variables calculated on structures.

class lXtractor.variables.structural.AggDist(p1, p2, key='min')[source]

Bases: StructureVariable

Aggregated distance between two residues.

It will return agg_fn(pdist) where pdist is an array of all pairwise distances between atoms of p1 and p2.
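The agg_fn(pdist) idea can be sketched with the standard library. This is an illustration under assumptions: agg_dist and AGG_FNS are hypothetical names, residues are given as lists of (x, y, z) tuples rather than atom arrays, and the aggregators come from the statistics module rather than from the library's AggFns.

```python
import math
from itertools import product
from statistics import mean, median

# Hypothetical counterpart of the available aggregator functions.
AGG_FNS = {"min": min, "max": max, "mean": mean, "median": median}


def agg_dist(atoms1, atoms2, key="min"):
    """Aggregate all pairwise distances between two sets of atom coordinates."""
    pdist = [math.dist(a, b) for a, b in product(atoms1, atoms2)]
    return AGG_FNS[key](pdist)


res1 = [(0, 0, 0), (1, 0, 0)]
res2 = [(4, 0, 0), (0, 3, 0)]
closest = agg_dist(res1, res2, "min")
farthest = agg_dist(res1, res2, "max")
```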

__init__(p1, p2, key='min')[source]

Parameters:

p1 (int) – Position 1.
p2 (int) – Position 2.
key (str) – Agg function name.

Available aggregator functions are:

>>> print(list(AggFns))
['min', 'max', 'mean', 'median']

calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

float

key: Agg function name.

p1: Position 1.

p2: Position 2.

property rtype: Type[float]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

class lXtractor.variables.structural.Chi1(p)[source]

Bases: CompositeDihedral

Chi1-dihedral angle.

static get_dihedrals(pos)[source]

Implemented by child classes.

Parameters:: pos – Position to create Dihedral instances.
Returns:: An iterable over Dihedral’s. The calculate() will try calculating dihedrals in the provided order until the first successful calculation. If no calculations were successful, will raise FailedCalculation error.
Return type:: list[Dihedral]
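The "try in order until the first success" strategy described for calculate() can be sketched as follows. The names are hypothetical, FailedCalculation is a local stand-in for the library exception, and each candidate simply checks that its atoms are present instead of computing an angle. The chi1 atom quadruplets listed are the conventional side-chain definitions and are not necessarily the ones the library uses.

```python
class FailedCalculation(Exception):
    """Local stand-in for lXtractor.core.exceptions.FailedCalculation."""


def make_candidate(names):
    """Create a toy 'dihedral' that succeeds only if all atoms are present."""
    def calc(residue_atoms):
        missing = [n for n in names if n not in residue_atoms]
        if missing:
            raise FailedCalculation(f"missing atoms: {missing}")
        return names
    return calc


def first_successful(candidates, residue_atoms):
    """Return the first successful result; raise if every candidate fails."""
    errors = []
    for calc in candidates:
        try:
            return calc(residue_atoms)
        except FailedCalculation as e:
            errors.append(str(e))
    raise FailedCalculation("; ".join(errors) or "no candidates provided")


chi1_candidates = [
    make_candidate(("N", "CA", "CB", last))
    for last in ("CG", "CG1", "OG", "OG1", "SG")
]
serine = {"N", "CA", "CB", "OG", "C", "O"}
glycine = {"N", "CA", "C", "O"}

used = first_successful(chi1_candidates, serine)
try:
    first_successful(chi1_candidates, glycine)
    glycine_failed = False
except FailedCalculation:
    glycine_failed = True
```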

class lXtractor.variables.structural.Chi2(p)[source]

Bases: CompositeDihedral

Chi2-dihedral angle.

static get_dihedrals(pos)[source]

Implemented by child classes.

Parameters:: pos – Position to create Dihedral instances.
Returns:: An iterable over Dihedral’s. The calculate() will try calculating dihedrals in the provided order until the first successful calculation. If no calculations were successful, will raise FailedCalculation error.
Return type:: list[Dihedral]

class lXtractor.variables.structural.ClosestLigandContactsCount(p, a=None)[source]

Bases: StructureVariable

The number of atoms involved in contacting ligands.

__init__(p, a=None)[source]

calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

float

a: Atom name. If not provided, sum contacts across all residue atoms.

p: Residue position.

property rtype: Type[int]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

class lXtractor.variables.structural.ClosestLigandDist(p, a=None, agg_lig='min', agg_res='min')[source]

Bases: StructureVariable

A distance from the selected residue or a residue’s atom to a connected ligand.

Each ligand provides lXtractor.core.ligand.Ligand.dist array. These arrays are stacked and aggregated atom-wise using agg_lig. Then, agg_res aggregates the obtained vector of values into a single number.

For instance, to obtain max distance for the closest ligand of a residue 1, use ClosestLigandDist(1, agg_res='max').

If the structure has no ligands (lXtractor.core.structure.GenericStructure.ligands), this variable defaults to -1.0.

Note

The attribute lXtractor.core.ligand.Ligand.dist provides distances from an atom to the closest ligand atom.
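The two-stage aggregation can be sketched with plain lists. This is a hypothetical illustration: closest_ligand_dist is not the library method, and each ligand is represented by a list holding one distance per residue atom (the role played by the dist arrays).

```python
AGG = {"min": min, "max": max}


def closest_ligand_dist(ligand_dists, agg_lig="min", agg_res="min"):
    """Aggregate across ligands atom-wise, then across the residue's atoms."""
    if not ligand_dists:
        return -1.0  # documented default when there are no ligands
    # zip(*...) walks the stacked arrays column by column, i.e., atom-wise.
    per_atom = [AGG[agg_lig](col) for col in zip(*ligand_dists)]
    return float(AGG[agg_res](per_atom))


lig_a = [2.0, 5.0, 3.5]  # distances from three residue atoms to ligand A
lig_b = [4.0, 1.5, 6.0]  # distances from the same atoms to ligand B

nearest = closest_ligand_dist([lig_a, lig_b])
max_of_closest = closest_ligand_dist([lig_a, lig_b], agg_res="max")
```

With agg_lig='min', the per-atom vector is [2.0, 1.5, 3.5], so agg_res='max' picks the residue atom that is farthest from its closest ligand.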

__init__(p, a=None, agg_lig='min', agg_res='min')[source]

calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

float

a: Atom name. If not provided, aggregate across residue atoms.

agg_lig: Aggregator function for ligands.

agg_res: Aggregator function for a residue atoms.

p: Residue position

property rtype: Type[float]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

class lXtractor.variables.structural.ClosestLigandNames(p, a=None)[source]

Bases: StructureVariable

","-separated contacting ligand (residue) names.

__init__(p, a=None)[source]

calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

str

a: Atom name. If not provided, merge across all residue atoms.

p: Residue position.

property rtype: Type[str]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

class lXtractor.variables.structural.Contacts(p, r=5.0)[source]

Bases: StructureVariable

Uses KDTree to find atoms within the r distance threshold of those defined by target position p. The positions of the found atoms are returned as a “,”-separated string.

If mapping is provided, contact positions will be filtered to those covered by this mapping.

Note

The default value of r is provided by DefaultConfig["contacts"]["non-covalent"][1].
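The neighbor search can be illustrated with a brute-force scan, which returns the same neighbors as a KDTree query but scales quadratically. This sketch is hypothetical: contacts is not the library method, atoms are (position, (x, y, z)) pairs, and excluding the target position itself as well as sorting the output are assumptions. Filtering by mapping is omitted.

```python
import math


def contacts(atoms, p, r=5.0):
    """Positions having at least one atom within r of any atom at position p."""
    target = [xyz for pos, xyz in atoms if pos == p]
    near = {
        pos
        for pos, xyz in atoms
        if pos != p and any(math.dist(xyz, t) <= r for t in target)
    }
    return ",".join(str(pos) for pos in sorted(near))


atoms = [
    (1, (0.0, 0.0, 0.0)),
    (1, (1.0, 0.0, 0.0)),
    (2, (3.0, 0.0, 0.0)),
    (3, (9.0, 0.0, 0.0)),
    (4, (0.0, 4.5, 0.0)),
]
found = contacts(atoms, p=1, r=5.0)
```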

__init__(p, r=5.0)[source]

calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

str

p: Target position.

r: Contact upper bound in angstroms.

property rtype: Type[str]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

class lXtractor.variables.structural.Dihedral(p1, p2, p3, p4, a1, a2, a3, a4, name='GenericDihedral')[source]

Bases: StructureVariable

Dihedral angle involving four different atoms.
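For reference, a dihedral angle defined by four points can be computed with the standard projection formula. The sketch below is self-contained and is not the library's implementation. It returns radians, and neither the units nor the sign convention used by lXtractor are stated here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _scale(a, k):
    return tuple(x * k for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Angle (radians) between planes (p0, p1, p2) and (p1, p2, p3)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector of the axis
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
perpendicular = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Using atan2 on the projected vectors avoids the numerical instability of acos near 0 and 180 degrees.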

__init__(p1, p2, p3, p4, a1, a2, a3, a4, name='GenericDihedral')[source]

calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

float

a1: Atom name.

a2: Atom name.

a3: Atom name.

a4: Atom name.

property atoms: list[str]

Returns:: A list of atoms a1-a4.

name: str: Used to designate special kinds of dihedrals.

p1: Position.

p2: Position.

p3: Position.

p4: Position.

property positions: list[int]

Returns:: A list of positions p1-p4.

property rtype: Type[float]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

class lXtractor.variables.structural.Dist(p1, p2, a1=None, a2=None, com=False)[source]

Bases: StructureVariable

A distance between two atoms.

__init__(p1, p2, a1=None, a2=None, com=False)[source]

calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

float

a1: str | None: Atom name 1.

a2: str | None: Atom name 2.

com: bool: Use center of mass instead of concrete atoms.

p1: int: Position 1.

p2: int: Position 2.

property rtype: Type[float]: Variable’s return type, such that rtype(“result”) converts to the relevant type.

class lXtractor.variables.structural.Omega(p)[source]

Bases: Dihedral

Omega dihedral angle.

__init__(p)[source]

p

class lXtractor.variables.structural.Phi(p)[source]

Bases: Dihedral

Phi dihedral angle.

__init__(p)[source]

p

class lXtractor.variables.structural.PseudoDihedral(p1, p2, p3, p4)[source]

Bases: Dihedral

Pseudo-dihedral angle - “the torsion angle between planes defined by 4 consecutive alpha-carbon atoms.”

__init__(p1, p2, p3, p4)[source]

class lXtractor.variables.structural.Psi(p)[source]

Bases: Dihedral

Psi dihedral angle.

__init__(p)[source]

p

class lXtractor.variables.structural.SASA(p, a=None)[source]

Bases: StructureVariable

Solvent-accessible surface area of a residue or a specific atom.

The SASA is calculated for the whole atom array and then subset to either all atoms or a single atom of a residue, so the surrounding atoms are taken into account in the calculation.

__init__(p, a=None)[source]

calculate(obj, mapping=None)[source]

Parameters:

obj (GenericStructure) – Some atom array.
mapping (MappingT | None) – Optional mapping between structure and some reference object numbering schemes.

Returns:

A calculation result of some sensible non-sequence type, such as string, float, int, etc.

Return type:

float | None

a

p

property rtype: Type[float]: Variable’s return type, such that rtype(“result”) converts to the relevant type.