lXtractor.util package
lXtractor.util.io module
Various utilities for IO.
- lXtractor.util.io.fetch_chunks(it, fetcher, chunk_size=100, **kwargs)[source]
A wrapper for fetching multiple links with ThreadPoolExecutor.
- Parameters:
it (Iterable[V]) – Iterable over some objects accepted by the fetcher, e.g., links.
fetcher (Callable[[list[V]], T]) – A callable accepting a chunk of objects from it, fetching and returning the result.
chunk_size (int) – Split the iterable into chunks of this size for the executor.
kwargs – Passed to fetch_iterable().
- Returns:
A generator of (chunk, result) pairs, where the result is a Future if blocking=False is passed to fetch_iterable().
- Return type:
Generator[tuple[list[V], T | Future], None, None]
- lXtractor.util.io.fetch_iterable(it, fetcher, num_threads=None, verbose=False, blocking=True, allow_failure=True)[source]
- Parameters:
it (Iterable[V]) – Iterable over some objects accepted by the fetcher, e.g., links.
fetcher (Callable[[V], T]) – A callable accepting a single object from it, fetching and returning the result.
num_threads (int | None) – The number of threads for ThreadPoolExecutor.
verbose (bool) – Enable progress bar and warnings/exceptions on fetching failures.
blocking (bool) – If True, will wait for each result. Otherwise, will return Future objects instead of fetched data.
allow_failure (bool) – If True, a failure to fetch is logged as a warning instead of raising an exception, and the results won’t contain inputs that failed to fetch. Otherwise, the exception is raised.
- Returns:
A generator of tuples where the first object is the input and the second object is the fetched data (or a Future if blocking is False).
- Return type:
Generator[tuple[V, T], None, None] | Generator[tuple[V, Future[T]], None, None]
- lXtractor.util.io.fetch_text(url, decode=False, chunk_size=8192, **kwargs)[source]
Fetch the content as a single string. This will use requests.get with stream=True by default to split the download into chunks and thus avoid taking too much memory at once.
- Parameters:
url (str) – Link to fetch from.
decode (bool) – Decode the received bytes to utf-8.
chunk_size (int) – The number of bytes to use when splitting the fetched result into chunks.
kwargs – Passed to requests.get().
- Returns:
Fetched content as a single string if decode is True, otherwise as bytes.
- Return type:
str | bytes
- lXtractor.util.io.fetch_to_file(url, fpath=None, fname=None, root_dir=None, decode=False)[source]
- Parameters:
url (str) – Link to a file.
fpath (Path | None) – Path to a file for saving. If provided, fname and root_dir are ignored. Otherwise, will use .../{this} from the link for the file name and save into the current dir.
fname (str | None) – Name of the file to save.
root_dir (Path | None) – Dir where to save the file.
decode (bool) – If True, try decoding the raw request’s content.
- Returns:
Local path to the file.
- Return type:
Path
- lXtractor.util.io.fetch_urls(url_getter, url_getter_args, fmt, dir_, *, fname_idx=0, args_applier=None, callback=None, overwrite=False, decode=False, max_trials=1, num_threads=None, verbose=False)[source]
A general-purpose function for fetching URLs. Each URL is dynamically produced via URL getters supplied with positional arguments.
See also ApiBase or PDB for more information on URL getters.
It has two modes: fetching to text and fetching to files. The former is the default, whereas the latter can be turned on by providing the dir_ argument. If dir_ is provided, each URL is considered a separate file to fetch. Thus, the function will also check dir_ (if it exists) for files that were already fetched to avoid useless work. This can be turned off via overwrite=True. For this functionality to work, each argument in url_getter_args must be converted to a single (file)name. If an argument is a sequence, fname_idx should point to an index such that arg[fname_idx] is the filename.
- Parameters:
url_getter (UrlGetter) – A callable accepting two or more strings and returning a valid url to fetch. The last argument is reserved for fmt.
url_getter_args (Iterable[_U]) – An iterable over strings or tuples of strings supplied to the url_getter. Each element must be sufficient for the url_getter to return a valid URL.
dir_ – Dir to save files to. If None, will return either a raw string or a JSON-derived dictionary if the fmt is “json”.
fmt (str) – File format. It is used to construct a full file name “{filename}.{fmt}”.
fname_idx (int) – If an element in url_getter_args is a tuple, this argument is used to index this tuple to construct a file name that is used to save file / check if such file exists.
args_applier (Callable[[UrlGetter, _U], str] | None) – A callable accepting a URL getter and its args and applying the arguments to the URL getter to obtain the URL. If None, will apply arguments as positional arguments.
callback (Callable[[_U, str | bytes], T] | None) – A callable to parse content right after fetching, e.g., json.loads. It’s only used if dir_ is not provided.
overwrite (bool) – Overwrite existing files if dir_ is provided.
decode (bool) – Decode the fetched content (bytes to utf-8). Should be True if expecting text content.
max_trials (int) – Max number of fetching attempts for a given id.
num_threads (int | None) – The number of threads to use for parallel requests. If None, will send requests sequentially.
verbose (bool) – Display progress bar.
- Returns:
A tuple with fetched results and the remaining file names. The former is a list of tuples, where the first element is the original name, and the second element is either the path to a downloaded file or downloaded data as a string. The order may differ from the input order. The latter is a list of names that failed to fetch.
- Return type:
tuple[list[tuple[_U, _F] | tuple[_U, T]], list[_U]]
- lXtractor.util.io.get_dirs(path)[source]
- Parameters:
path (Path) – Path to a directory.
- Returns:
Mapping {dir name => dir path} for each dir in path.
- Return type:
dict[str, Path]
- lXtractor.util.io.get_files(path)[source]
- Parameters:
path (Path) – Path to a directory.
- Returns:
Mapping {file name => file path} for each file in path.
- Return type:
dict[str, Path]
- lXtractor.util.io.parse_suffix(path)[source]
Parse a file suffix.
If there are no suffixes: raise an error.
If there is one suffix, return it.
If there is more than one suffix, join the last two and return the result.
- Parameters:
path (Path) – Input path.
- Returns:
Parsed suffix.
- Raises:
FormatError – If no suffix is present.
- Return type:
str
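The suffix rules translate almost directly to pathlib. This is a hypothetical re-implementation: it keeps the leading dot that `Path.suffixes` produces (lXtractor's exact output format is not confirmed here) and raises `ValueError` as a stand-in for FormatError.

```python
from pathlib import Path


def parse_suffix_sketch(path: Path) -> str:
    """Hypothetical re-implementation of the documented suffix rules."""
    suffixes = path.suffixes
    if not suffixes:
        # lXtractor raises FormatError; ValueError stands in here.
        raise ValueError(f"No suffix in {path}")
    # A single suffix is returned as is; otherwise, the last two are joined.
    return "".join(suffixes[-2:])


print(parse_suffix_sketch(Path("1abc.cif")))        # .cif
print(parse_suffix_sketch(Path("1abc.cif.gz")))     # .cif.gz
print(parse_suffix_sketch(Path("data.v2.cif.gz")))  # .cif.gz
```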
- lXtractor.util.io.path_tree(path)[source]
Create a tree graph from Chain*-type objects saved to the filesystem.
The function will recursively walk starting from the provided path, connecting parent and children paths (residing within “segments” directory). If it meets a path containing “structures” directory, it will save valid structure paths under a node’s “structures” attribute. In that case, such structures are assumed to be nested under a chain, and they do not form nodes in this graph.
A path to a Chain*-type object is valid if it contains “sequence.tsv” and “meta.tsv” files. A valid structure path must contain “sequence.tsv”, “meta.tsv”, and “structure.*” files.
- Parameters:
path (Path) – A root path to start with.
- Returns:
A directed graph with paths as nodes and edges representing parent-child relationships.
- Return type:
DiGraph
- lXtractor.util.io.read_n_col_table(path, n, sep='\t')[source]
Read a table from a file and ensure it has exactly n columns.
- Return type:
DataFrame | None
- lXtractor.util.io.run_sp(cmd, split=True)[source]
Attempt to run the command as a subprocess returning text. If running the command raises CalledProcessError, it is rerun with check=False to capture all the outputs into the result.
- Parameters:
cmd (str) – A single string of a command.
split (bool) – Split cmd before running. If False, will pass shell=True.
- Returns:
Result of a subprocess with captured output.
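A sketch of the described retry behavior with `subprocess.run`, assuming that splitting means `shlex.split`. `run_sp_sketch` is hypothetical.

```python
import shlex
import subprocess
import sys


def run_sp_sketch(cmd: str, split: bool = True) -> subprocess.CompletedProcess:
    """Hypothetical sketch: run a command, rerunning with check=False on failure."""
    args = shlex.split(cmd) if split else cmd
    # shell=True is only used when the command is passed as a single string.
    kwargs = dict(shell=not split, capture_output=True, text=True)
    try:
        return subprocess.run(args, check=True, **kwargs)
    except subprocess.CalledProcessError:
        # Rerun without raising so the failing command's outputs are kept.
        return subprocess.run(args, check=False, **kwargs)


python = shlex.quote(sys.executable)
ok = run_sp_sketch(f"{python} -c 'print(42)'")
print(ok.returncode, ok.stdout.strip())  # 0 42
failed = run_sp_sketch(f"{python} -c 'import sys; sys.exit(3)'")
print(failed.returncode)  # 3
```

Rerunning executes a failing command twice, which matters for commands with side effects. Since `CalledProcessError` already carries the captured `stdout` and `stderr`, a variant could return those without the second run.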
lXtractor.util.misc module
Miscellaneous utilities that couldn’t be properly categorized.
- lXtractor.util.misc.all_logging_disabled(highest_level=50)[source]
A context manager that will prevent any logging messages triggered during the body from being processed.
The function was borrowed from this gist.
- Parameters:
highest_level – the maximum logging level in use. This would only need to be changed if a custom level greater than CRITICAL is defined.
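The common recipe behind such a context manager relies on `logging.disable`, which globally suppresses all records at or below a given level. The sketch below follows that recipe and is not necessarily lXtractor's exact code; `all_logging_disabled_sketch` is a hypothetical name, and the default 50 equals `logging.CRITICAL`.

```python
import logging
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def all_logging_disabled_sketch(highest_level: int = logging.CRITICAL) -> Iterator[None]:
    """Temporarily disable all logging calls at or below `highest_level`."""
    previous_level = logging.root.manager.disable
    logging.disable(highest_level)
    try:
        yield
    finally:
        # Restore whatever global disable level was active before.
        logging.disable(previous_level)


logger = logging.getLogger("demo")
with all_logging_disabled_sketch():
    logger.critical("suppressed")
    print(logger.isEnabledFor(logging.CRITICAL))  # False
print(logger.isEnabledFor(logging.CRITICAL))      # True
```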
- lXtractor.util.misc.apply(fn, it, verbose, desc, num_proc, total=None, use_joblib=False, **kwargs)[source]
- Parameters:
fn (Callable[[T], R]) – A one-argument function.
it (Iterable[T]) – An iterable over some objects.
verbose (bool) – Display progress bar.
desc (str) – Progress bar description.
num_proc (int) – The number of processes to use. Anything below 1 indicates sequential processing. Otherwise, will apply fn in parallel using ProcessPoolExecutor.
total (int | None) – The total number of elements. Used for the progress bar.
use_joblib (bool) – Use joblib.Parallel for parallel application.
kwargs – Passed to ProcessPoolExecutor.map() or joblib.Parallel.
- Returns:
An iterator over the results of applying fn to each element of it.
- Return type:
Iterator[R]
- lXtractor.util.misc.col2col(df, col_fr, col_to)[source]
- Parameters:
df (DataFrame) – Some DataFrame.
col_fr (str) – A column name to map from.
col_to (str) – A column name to map to.
- Returns:
Mapping between values of a pair of columns.
- lXtractor.util.misc.graph_reindex_nodes(g)[source]
Reindex the graph nodes so that node data equals node indices.
- Parameters:
g (PyGraph) – An arbitrary PyGraph.
- Returns:
A PyGraph of the same size and having the same edges but with reindexed nodes.
- Return type:
PyGraph
- lXtractor.util.misc.is_valid_field_name(s)[source]
- Parameters:
s (str) – Some string.
- Returns:
True if s is a valid field name for __getattr__ operations, else False.
- Return type:
bool
- lXtractor.util.misc.json_to_molgraph(inp)[source]
Converts a JSON-formatted molecular graph into a PyGraph object. This graph is a dictionary with two keys: “num_nodes” and “edges”. The former indicates the number of atoms in a structure, whereas the latter is a list of edge tuples.
- Parameters:
inp (dict | PathLike) – A dictionary or a path to a JSON file produced using rustworkx.node_link_json.
- Returns:
A graph with nodes and edges initialized in order given in inp. Any associated data will be omitted.
- Return type:
PyGraph
- lXtractor.util.misc.valgroup(m, sep=':')[source]
Reformat a mapping from the format:
X => [Y{sep}Z, ...]
To a format:
X => [(Y, [Z, ...]), ...]
>>> mapping = {'X': ['C:A', 'C:B', 'Y:Z']}
>>> valgroup(mapping)
{'X': [('C', ['A', 'B']), ('Y', ['Z'])]}
Hint
This function is useful for converting the sequence-to-structure mapping outputted by lXtractor.ext.sifts.SIFTS to a format accepted by lXtractor.core.chain.initializer.ChainInitializer.from_mapping() to initialize lXtractor.core.chain.Chain objects.
- Parameters:
m (Mapping[str, list[str]]) – A mapping from strings to a list of strings.
sep (str) – A separator of each mapped string in the list.
- Returns:
A reformatted mapping.
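A sketch of the documented transformation, assuming each value is split on the first occurrence of sep and that groups keep first-appearance order. `valgroup_sketch` is hypothetical; values without sep raise `ValueError` here.

```python
from collections.abc import Mapping


def valgroup_sketch(
    m: Mapping[str, list[str]], sep: str = ":"
) -> dict[str, list[tuple[str, list[str]]]]:
    """Group "Y{sep}Z" values of each key by their Y prefix."""
    result: dict[str, list[tuple[str, list[str]]]] = {}
    for key, values in m.items():
        groups: dict[str, list[str]] = {}  # dicts keep insertion order
        for value in values:
            # Split only on the first separator: "Y:Z" -> ("Y", "Z").
            head, tail = value.split(sep, 1)
            groups.setdefault(head, []).append(tail)
        result[key] = list(groups.items())
    return result


print(valgroup_sketch({"X": ["C:A", "C:B", "Y:Z"]}))
# {'X': [('C', ['A', 'B']), ('Y', ['Z'])]}
```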
lXtractor.util.seq module
Low-level utilities to work with sequences (as strings) or sequence files.
- lXtractor.util.seq.biotite_align(seqs, **kwargs)[source]
Align two sequences using biotite's align_optimal function.
- Parameters:
seqs (Iterable[tuple[str, str]]) – An iterable with exactly two sequences.
kwargs – Additional arguments to align_optimal.
- Returns:
A pair of aligned sequences.
- Return type:
tuple[tuple[str, str], tuple[str, str]]
- lXtractor.util.seq.mafft_add(msa, seqs, *, mafft='mafft', thread=1, keeplength=True)[source]
Add sequences to an existing MSA using mafft.
This is a curried function: an incomplete argument set yields a partially evaluated function (e.g., mafft_add(thread=10)).
- Parameters:
msa (Iterable[tuple[str, str]] | Path) – an iterable over sequences with the same length.
seqs (Iterable[tuple[str, str]]) – an iterable over sequences comprising the addition.
thread (int) – how many threads to dedicate for mafft.
keeplength (bool) – force to preserve the MSA’s length.
mafft (str) – mafft executable.
- Returns:
A tuple of two lists of SeqRecord objects: with (1) alignment sequences with addition, and (2) aligned addition, separately.
- Return type:
Iterator[tuple[str, str]]
- lXtractor.util.seq.mafft_align(seqs, *, mafft='mafft-linsi', thread=1)[source]
Align an arbitrary number of sequences using mafft.
- Parameters:
seqs (Iterable[tuple[str, str]] | Path) – An iterable over (header, seq) pairs or a path to a file with sequences to align.
thread (int) – How many threads to dedicate for mafft.
mafft (str) – mafft executable (path or env variable).
- Returns:
An iterator over aligned (header, seq) pairs.
- Return type:
Iterator[tuple[str, str]]
- lXtractor.util.seq.map_pairs_numbering(s1, s1_numbering, s2, s2_numbering, align=True, align_method=<function mafft_align>, empty=None, **kwargs)[source]
Map numbering between a pair of sequences.
- Parameters:
s1 (str) – The first sequence.
s1_numbering (Iterable[int]) – The first sequence’s numbering.
s2 (str) – The second sequence.
s2_numbering (Iterable[int]) – The second sequence’s numbering.
align (bool) – Align before calculating. If False, sequences are assumed to be aligned.
align_method (AlignMethod) – Align method to use. Must be a callable accepting and returning a list of sequences.
empty (Any | None) – Empty numeration element in place of a gap.
kwargs – Passed to align_method.
- Returns:
Iterator over numbering pairs (a, b), where a and b are elements of the original sequences’ numberings. One of a or b in a pair can be empty to represent a gap.
- Return type:
Generator[tuple[int | None, int | None], None, None]
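For align=False, the mapping reduces to walking two aligned strings in lockstep. The sketch below assumes "-" is the gap character and skips gap-only columns; `map_aligned_numbering` is hypothetical and does not perform the alignment step (align_method, mafft_align by default).

```python
from collections.abc import Iterable, Iterator
from typing import Any


def map_aligned_numbering(
    s1: str,
    s1_numbering: Iterable[int],
    s2: str,
    s2_numbering: Iterable[int],
    empty: Any = None,
    gap: str = "-",
) -> Iterator[tuple[Any, Any]]:
    """Pair up numberings of two already-aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("Aligned sequences must have equal length")
    num1, num2 = iter(s1_numbering), iter(s2_numbering)
    for c1, c2 in zip(s1, s2):
        if c1 == gap and c2 == gap:
            continue  # a gap-only column carries no numbering
        # A gap consumes no number; `empty` is emitted in its place.
        a = empty if c1 == gap else next(num1)
        b = empty if c2 == gap else next(num2)
        yield a, b


print(list(map_aligned_numbering("AB-C", [1, 2, 3], "A-DC", [10, 11, 12])))
# [(1, 10), (2, None), (None, 11), (3, 12)]
```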
- lXtractor.util.seq.partition_gap_sequences(seqs, max_fraction_of_gaps=1.0)[source]
Removes sequences having a fraction of gaps above the given threshold.
- Parameters:
seqs (Iterable[tuple[str, str]]) – a collection of arbitrary sequences.
max_fraction_of_gaps (float) – a threshold specifying an upper bound on allowed fraction of gap characters within a sequence.
- Returns:
a filtered list of sequences.
- Return type:
tuple[Iterator[str], Iterator[str]]
- lXtractor.util.seq.read_fasta(inp, strip_id=True)[source]
Simple lazy fasta reader.
- Parameters:
inp (str | PathLike | TextIOBase | Iterable[str]) – A path-like object compatible with open, an opened file, an iterable over lines, or raw text as str.
strip_id (bool) – Strip ID to the first consecutive (spaceless) string.
- Returns:
An iterator of (header, seq) pairs.
- Return type:
Iterator[tuple[str, str]]
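A minimal lazy FASTA parser in the same spirit, accepting any iterable over lines (an opened file works too). `read_fasta_sketch` is hypothetical and, unlike read_fasta, does not accept paths or raw text directly.

```python
from collections.abc import Iterable, Iterator


def read_fasta_sketch(lines: Iterable[str], strip_id: bool = True) -> Iterator[tuple[str, str]]:
    """Lazily yield (header, seq) pairs from an iterable over FASTA lines."""
    header: str | None = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            if strip_id:
                # Keep only the first whitespace-free token of the header.
                parts = header.split(maxsplit=1)
                header = parts[0] if parts else ""
            chunks = []
        else:
            # Sequence lines before the first header are discarded.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


text = ">seq1 some description\nACG\nTT\n>seq2\nGG\n"
print(list(read_fasta_sketch(text.splitlines())))
# [('seq1', 'ACGTT'), ('seq2', 'GG')]
```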
- lXtractor.util.seq.remove_gap_columns(seqs, max_gaps=1.0)[source]
Remove gap columns from a collection of sequences.
- Parameters:
seqs (Iterable[str]) – A collection of equal length sequences.
max_gaps (float) – Max fraction of gaps allowed per column.
- Returns:
Initial seqs with gap columns removed and removed columns’ indices.
- Return type:
tuple[Iterator[str], ndarray]
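A pure-Python sketch of gap-column removal. The comparison is an assumption: a column is dropped when its gap fraction is at least max_gaps, so the default 1.0 removes all-gap columns; lXtractor may use a strict comparison. `remove_gap_columns_sketch` is hypothetical and returns lists rather than an iterator and an ndarray.

```python
from collections.abc import Iterable


def remove_gap_columns_sketch(
    seqs: Iterable[str], max_gaps: float = 1.0, gap: str = "-"
) -> tuple[list[str], list[int]]:
    """Drop alignment columns whose gap fraction reaches `max_gaps`."""
    seqs = list(seqs)
    if not seqs:
        return [], []
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("Sequences must have equal length")
    n = len(seqs)
    removed = [
        i
        for i, column in enumerate(zip(*seqs))
        # Assumption: a column is removed when its gap fraction is >= max_gaps.
        if sum(c == gap for c in column) / n >= max_gaps
    ]
    removed_set = set(removed)
    kept = ["".join(c for i, c in enumerate(s) if i not in removed_set) for s in seqs]
    return kept, removed


print(remove_gap_columns_sketch(["A-C-", "A-DE"]))
# (['AC-', 'ADE'], [1])
```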
- lXtractor.util.seq.write_fasta(inp, out)[source]
Simple fasta writer.
- Parameters:
inp (Iterable[tuple[str, str]]) – Iterable over (header, seq) pairs.
out (Path | SupportsWrite) – A path or an object supporting the .write method.
- Returns:
Nothing.
- Return type:
None
lXtractor.util.structure module
Low-level utilities to work with structures.
- lXtractor.util.structure.calculate_dihedral(atom1, atom2, atom3, atom4)[source]
Calculate the angle between the planes formed by [atom1, atom2, atom3] and [atom2, atom3, atom4].
Each atom is an array of shape (3, ) with XYZ coordinates.
Calculation method inspired by https://math.stackexchange.com/questions/47059/how-do-i-calculate-a-dihedral-angle-given-cartesian-coordinates
- Return type:
float
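The sketch below uses the standard atan2 formulation of the dihedral angle (IUPAC sign convention: positive when the front bond rotates clockwise onto the back bond, viewed from atom2 toward atom3), written with plain tuples to stay dependency-free. `dihedral_sketch` is hypothetical; whether calculate_dihedral returns radians or degrees, and which sign convention it follows, are not confirmed here.

```python
import math
from collections.abc import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> tuple[float, float, float]:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_sketch(atom1: Vec, atom2: Vec, atom3: Vec, atom4: Vec) -> float:
    """Signed dihedral angle in radians, within [-pi, pi]."""
    b1, b2, b3 = _sub(atom2, atom1), _sub(atom3, atom2), _sub(atom4, atom3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    # atan2 of sine- and cosine-proportional terms keeps the sign and
    # avoids the numerical instability of acos near 0 and pi.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


p1, p2, p3 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
for p4 in [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0)]:
    print(round(math.degrees(dihedral_sketch(p1, p2, p3, p4)), 6))
# 0.0
# 90.0
```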
- lXtractor.util.structure.compare_arrays(a, b, eps=0.001)[source]
Compare two numerical arrays.
- Parameters:
a (ndarray[Any, dtype[float | int]]) – The first array.
b (ndarray[Any, dtype[float | int]]) – The second array.
eps (float) – Comparison tolerance.
- Returns:
True if the absolute difference between the two arrays is within eps.
- Raises:
LengthMismatch – If the two arrays are not of the same shape.
- lXtractor.util.structure.compare_coord(a, b, eps=0.001)[source]
Compare coordinates between atoms of two atom arrays.
- Parameters:
a (AtomArray) – The first atom array.
b (AtomArray) – The second atom array.
eps (float) – Comparison tolerance.
- Returns:
True if the two arrays are of the same length and the absolute difference between coordinates of the corresponding atom pairs is within eps.
- lXtractor.util.structure.extend_residue_mask(a, idx)[source]
Extend a residue mask for given atoms.
- Parameters:
a (AtomArray) – An arbitrary atom array.
idx (list[int]) – Indices pointing to atoms at which to extend the mask.
- Returns:
The extended mask, where True indicates that the atom belongs to the same residue as one of the atoms indicated by idx.
- Return type:
ndarray[Any, dtype[bool_]]
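The mask extension amounts to a residue-membership test. The sketch below replaces the AtomArray with a hypothetical list of per-atom (chain_id, res_id) keys; real residue identity may also involve insertion codes or residue names. `extend_residue_mask_sketch` is hypothetical.

```python
from collections.abc import Sequence


def extend_residue_mask_sketch(
    residue_keys: Sequence[tuple[str, int]], idx: Sequence[int]
) -> list[bool]:
    """Mark every atom that shares a residue with any atom in `idx`."""
    # Residues are identified by hypothetical (chain_id, res_id) keys per atom.
    selected = {residue_keys[i] for i in idx}
    return [key in selected for key in residue_keys]


keys = [("A", 1), ("A", 1), ("A", 2), ("A", 2), ("A", 3)]
print(extend_residue_mask_sketch(keys, [2]))
# [False, False, True, True, False]
```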
- lXtractor.util.structure.filter_any_polymer(a, min_size=2)[source]
Get a mask indicating atoms being a part of a macromolecular polymer: peptide, nucleotide, or carbohydrate.
- Parameters:
a (AtomArray) – Array of atoms.
min_size (int) – Min number of polymer monomers.
- Returns:
A boolean mask True for polymers’ atoms.
- Return type:
ndarray
- lXtractor.util.structure.filter_ligand(a)[source]
Filter for ligand atoms – non-polymer and non-solvent hetero atoms.
Note
No contact-based verification is performed here.
- Parameters:
a (AtomArray) – Atom array.
- Returns:
A boolean mask True for ligand atoms.
- Return type:
ndarray
- lXtractor.util.structure.filter_polymer(a, min_size=2, pol_type='peptide')[source]
Filter for atoms that are a part of a consecutive standard macromolecular polymer entity.
- Parameters:
a (AtomArray) – The array to filter.
min_size (int) – The minimum number of monomers.
pol_type (str) – The polymer type, either "peptide", "nucleotide", or "carbohydrate". Abbreviations are supported: "p", "pep", "n", etc.
- Returns:
A boolean mask that is True for all atoms in a belonging to a consecutive polymer entity having at least min_size monomers.
- Return type:
ndarray[Any, dtype[bool_]]
- lXtractor.util.structure.filter_selection(array, res_id, atom_names=None)[source]
Filter AtomArray by residue numbers and atom names.
- Parameters:
array (AtomArray) – Arbitrary structure.
res_id (Sequence[int] | None) – A sequence of residue numbers.
atom_names (Sequence[Sequence[str]] | Sequence[str] | None) – A sequence of atom names (broadcasted to each position in res_id) or an iterable over such sequences for each position in res_id.
- Returns:
A binary mask that is True for filtered atoms.
- Return type:
ndarray
- lXtractor.util.structure.filter_solvent_extended(a)[source]
Filter for solvent atoms using a curated solvent list that includes non-water molecules typically found in crystallization solutions.
- Parameters:
a (AtomArray) – Atom array.
- Returns:
A boolean mask True for solvent atoms.
- Return type:
ndarray
- lXtractor.util.structure.filter_to_common_atoms(a1, a2, allow_residue_mismatch=False)[source]
Filter to atoms common between residues of atom arrays a1 and a2.
- Parameters:
a1 (AtomArray) – Arbitrary atom array.
a2 (AtomArray) – Arbitrary atom array.
allow_residue_mismatch (bool) – If True, when residue names mismatch, the common atoms are derived from the intersection a1.atoms & a2.atoms & {"C", "N", "CA", "CB"}.
- Returns:
A pair of masks for a1 and a2, True for matching atoms.
- Raises:
ValueError – If a1 and a2 have different numbers of residues, or if the selection for some residue produces different numbers of atoms.
- Return type:
tuple[ndarray, ndarray]
- lXtractor.util.structure.find_contacts(a, mask)[source]
Find contacts between a subset of atoms within the structure and the rest of the structure. An atom is considered to be in contact with another atom if the distance between them is below the threshold for the non-covalent bond specified in config (DefaultConfig["bonds"]["NC-NC"][1]).
- Parameters:
a (AtomArray) – Atom array.
mask (ndarray) – A boolean mask True for atoms for which to find contacts.
- Returns:
A tuple with three arrays of size equal to the number of atoms in a:
1. Contact mask: True for a[~mask] atoms in contact with a[mask].
2. Distances: from a[~mask] atoms to the closest a[mask] atom.
3. Indices: of these closest a[mask] atoms within the mask.
Suppose that mask specifies a ligand. Then, for the i-th atom in a, contacts[i], distances[i], and indices[i] indicate whether a[i] has a contact, the precise distance from a[i] to the closest ligand atom, and the index of this ligand atom, respectively.
- Return type:
tuple[ndarray, ndarray, ndarray]
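A brute-force sketch of the three returned arrays, following the ligand example above. The contact threshold is passed explicitly as a stand-in for the configured DefaultConfig["bonds"]["NC-NC"][1]; the filler values for masked atoms (False, inf, -1) and the use of indices into the full array are assumptions. `find_contacts_sketch` is hypothetical.

```python
import math
from collections.abc import Sequence

Coord = tuple[float, float, float]


def find_contacts_sketch(
    coords: Sequence[Coord], mask: Sequence[bool], threshold: float
) -> tuple[list[bool], list[float], list[int]]:
    """For each non-masked atom, find the closest masked atom and flag contacts."""
    masked = [i for i, m in enumerate(mask) if m]
    n = len(coords)
    # Masked atoms themselves keep the filler values: no contact, inf, -1.
    contacts, distances, indices = [False] * n, [math.inf] * n, [-1] * n
    if not masked:
        return contacts, distances, indices
    for i, (xyz, m) in enumerate(zip(coords, mask)):
        if m:
            continue
        j = min(masked, key=lambda k: math.dist(xyz, coords[k]))
        distances[i] = math.dist(xyz, coords[j])
        indices[i] = j
        contacts[i] = distances[i] < threshold
    return contacts, distances, indices


coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
mask = [False, False, True]  # atom 2 plays the "ligand"
contacts, distances, indices = find_contacts_sketch(coords, mask, threshold=4.0)
print(contacts, distances, indices)
# [True, False, False] [3.0, 7.0, inf] [2, 2, -1]
```

The all-pairs loop is O(n·m); a spatial index such as a cell list would be the natural choice for large structures.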
- lXtractor.util.structure.find_first_polymer_type(a, min_size=2, order=('p', 'n', 'c'))[source]
Determine the polymer type of the supplied atom array or an array of atom marks.
Probe polymer types in the given order. If a polymer with at least min_size monomers of the probed type is found, its type is returned.
Hint
The function serves as a good quick-check when a single polymer type is expected, which should always be true when a is an array of atom marks.
- Parameters:
a (AtomArray | ndarray[Any, dtype[int]]) – An arbitrary array of atoms.
min_size (int) – A minimum number of monomers in a polymer.
order (tuple[str, str, str]) – An order of the polymers to probe.
- Returns:
The first polymer type satisfying the min_size requirement.
- Return type:
str
- lXtractor.util.structure.find_primary_polymer_type(a, min_size=2, residues=False)[source]
Find the major polymer type, i.e., the one with the largest number of atoms or monomers.
- Parameters:
a (AtomArray) – An arbitrary atom array.
min_size (int) – Minimum number of monomers for a polymer.
residues (bool) – True if the dominant polymer should be picked according to the number of residues. Otherwise, the number of atoms will be used.
- Returns:
A binary mask pointing at the polymer atoms in a and the polymer type – “c” (carbohydrate), “n” (nucleotide), or “p” (peptide). If no polymer atoms were found, polymer type will be designated as “x”.
- Return type:
tuple[ndarray, str]
- lXtractor.util.structure.get_missing_atoms(a, excluding_names=('OXT',), excluding_elements=('H',))[source]
For each residue, compare with the one stored in CCD, and find missing atoms.
- Parameters:
a (AtomArray) – Non-empty atom array.
excluding_names (Sequence[str] | None) – A sequence of atom names to exclude for calculation.
excluding_elements (Sequence[str] | None) – A sequence of element names to exclude for calculation.
- Returns:
A generator of lists of missing atoms (excluding hydrogens) per residue in a, or None if no such residue was found in CCD.
- Return type:
Generator[list[str | None] | None, None, None]
- lXtractor.util.structure.get_observed_atoms_frac(a, excluding_names=('OXT',), excluding_elements=('H',))[source]
Find fractions of observed atoms compared to canonical residue versions stored in CCD.
- Parameters:
a (AtomArray) – Non-empty atom array.
excluding_names (Sequence[str] | None) – A sequence of atom names to exclude for calculation.
excluding_elements (Sequence[str] | None) – A sequence of element names to exclude for calculation.
- Returns:
A generator of observed atom fractions per residue in a, or None if a residue was not found in CCD.
- Return type:
Generator[list[str | None] | None, None, None]
- lXtractor.util.structure.iter_canonical(a)[source]
- Parameters:
a (AtomArray) – Arbitrary atom array.
- Returns:
Generator of canonical versions of residues in a, or None if no such residue is found in CCD.
- Return type:
Generator[AtomArray | None, None, None]
- lXtractor.util.structure.iter_residue_masks(a)[source]
Iterate over residue masks.
- Parameters:
a (AtomArray) – Atom array.
- Returns:
A generator over boolean masks for each residue in a.
- Return type:
Generator[ndarray[Any, dtype[bool_]], None, None]
- lXtractor.util.structure.load_structure(inp, fmt='', *, gz=False, **kwargs)[source]
This is a simplified version of biotite.io.general.load_structure extending the supported input types. Namely, it allows using paths, strings, bytes, or gzipped files. On the other hand, fewer formats are supported: pdb, cif, and mmtf.
- Parameters:
inp (IOBase | Path | str | bytes) – Input to load from. It can be a path to a file, an opened file handle, a string or bytes of file contents. Gzipped bytes and files are supported.
fmt (str) – If inp is a Path-like object, it must be of the form “name.fmt” or “name.fmt.gz”. In this case, fmt is ignored. Otherwise, it is used to determine the parser type and must be provided.
gz (bool) – If inp is gzipped bytes, this flag must be True.
kwargs – Passed to get_structure: either a method or a separate function used by biotite to convert the input into an AtomArray.
- Returns:
The loaded structure.
- Return type:
AtomArray
- lXtractor.util.structure.mark_polymer_type(a, min_size=2)[source]
Denote polymer type in an atom array.
It will find the breakpoints in a and split it into segments. Each segment will be checked separately to determine its polymer type. The results are then concatenated into a single array and returned.
- Parameters:
a (AtomArray) – Any atom array.
min_size (int) – Minimum number of consecutive monomers in a polymer.
- Returns:
An array where each atom of a is marked by a character: "n", "p", or "c" for nucleotide, peptide, and carbohydrate. Non-polymer atoms are marked by “x”.
- Return type:
ndarray[Any, dtype[str_]]
- lXtractor.util.structure.save_structure(array, path, **kwargs)[source]
This is a simplified version of biotite.io.general.save_structure. On the one hand, it can conveniently compress the data using gzip. On the other hand, fewer formats are supported: pdb, cif, and mmtf.
- Parameters:
array (AtomArray) – An AtomArray to write.
path (Path) – A path with the correct extension, e.g., Path("data/structure.pdb") or Path("data/structure.pdb.gz").
kwargs – If compression is not required, the original save_structure from biotite is used with these kwargs. Otherwise, kwargs are ignored.
- Returns:
If the file was successfully written, returns the original path.
- lXtractor.util.structure.to_graph(a, split_chains=False)[source]
Create a molecular connectivity graph from an atom array.
A molecular graph is an undirected graph without multiedges, where nodes are indices to atoms. Thus, node indices point directly to atoms in the provided atom array, and the number of nodes equals the number of atoms. A pair of nodes has an edge between them if they form a covalent bond. The edges are constructed according to atom-dependent bond thresholds defined by the global config. These distances are stored as edge values. See the docs of rustworkx on how to manipulate the resulting graph object.
- Parameters:
a (AtomArray) – Atom array to build a graph from.
split_chains (bool) – If True, edges between atoms from different chains are forbidden.
- Returns:
A graph object where nodes are atom indices and edges represent covalent bonds.
- Return type:
PyGraph
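The distance-threshold construction can be sketched with a plain edge dictionary instead of a rustworkx PyGraph. This is a hypothetical, simplified version: `to_graph_sketch` uses a single uniform threshold, whereas to_graph uses atom-dependent thresholds from the config, and it checks all atom pairs in O(n²).

```python
import math
from collections.abc import Sequence


def to_graph_sketch(
    coords: Sequence[tuple[float, float, float]],
    chain_ids: Sequence[str],
    threshold: float,
    split_chains: bool = False,
) -> dict[tuple[int, int], float]:
    """Edges (i, j) with i < j for atom pairs closer than `threshold`."""
    edges: dict[tuple[int, int], float] = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if split_chains and chain_ids[i] != chain_ids[j]:
                continue  # inter-chain edges are forbidden
            d = math.dist(coords[i], coords[j])
            if d < threshold:
                edges[(i, j)] = d  # the distance is stored as the edge value
    return edges


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
chains = ["A", "A", "B"]
print(to_graph_sketch(coords, chains, threshold=1.9))
# {(0, 1): 1.5, (1, 2): 1.5}
print(to_graph_sketch(coords, chains, threshold=1.9, split_chains=True))
# {(0, 1): 1.5}
```

Node indices here coincide with atom indices, mirroring the property described above; keeping i < j is what makes the edge set undirected and free of multiedges.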