lXtractor.util package

lXtractor.util.io module

Various utilities for IO.

lXtractor.util.io.fetch_chunks(it, fetcher, chunk_size=100, **kwargs)[source]

A wrapper for fetching multiple links with ThreadPoolExecutor.

Parameters:
  • it (Iterable[V]) – Iterable over some objects accepted by the fetcher, e.g., links.

  • fetcher (Callable[[list[V]], T]) – A callable accepting a chunk of objects from it, fetching and returning the result.

  • chunk_size (int) – Split the iterable into chunks of this size for the executor.

  • kwargs – Passed to fetch_iterable().

Returns:

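A generator of tuples, where the first element is a chunk of inputs and the second element is the fetched result (or a Future).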

Return type:

Generator[tuple[list[V], T | Future], None, None]
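
The chunk-then-fetch pattern described above can be illustrated with a minimal standalone sketch. It does not call lXtractor: the helper names split_into_chunks and fetch_in_chunks and the toy fetcher (len) are hypothetical, and the actual function may differ in ordering and error handling.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def split_into_chunks(it, chunk_size):
    """Yield consecutive lists of at most `chunk_size` items from `it`."""
    iterator = iter(it)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def fetch_in_chunks(it, fetcher, chunk_size=100, num_threads=None):
    """Apply `fetcher` to each chunk in a thread pool; yield (chunk, result)."""
    chunks = list(split_into_chunks(it, chunk_size))
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # executor.map preserves the input order of the chunks.
        yield from zip(chunks, executor.map(fetcher, chunks))


# Toy fetcher standing in for a network call: it returns the chunk size.
results = list(fetch_in_chunks(range(5), len, chunk_size=2))
```

Here each chunk is a list, so the fetcher receives a list of inputs, which matches the Callable[[list[V]], T] annotation above.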

lXtractor.util.io.fetch_iterable(it, fetcher, num_threads=None, verbose=False, blocking=True, allow_failure=True)[source]
Parameters:
  • it (Iterable[V]) – Iterable over some objects accepted by the fetcher, e.g., links.

  • fetcher (Callable[[V], T]) – A callable accepting a single object from it, fetching and returning the result.

  • num_threads (int | None) – The number of threads for ThreadPoolExecutor.

  • verbose (bool) – Enable progress bar and warnings/exceptions on fetching failures.

  • blocking (bool) – If True, will wait for each result. Otherwise, will return Future objects instead of fetched data.

  • allow_failure (bool) – If True, failure to fetch will raise a warning instead of an exception. Otherwise, the warning is logged, and the results won’t contain inputs that failed to fetch.

Returns:

A generator of tuples where the first object is the input and the second object is the fetched data (or a Future if blocking is False).

Return type:

Generator[tuple[V, T], None, None] | Generator[tuple[V, Future[T]], None, None]

lXtractor.util.io.fetch_text(url, decode=False, chunk_size=8192, **kwargs)[source]

Fetch the content as a single string. This will use the requests.get with stream=True by default to split the download into chunks and thus avoid taking too much memory at once.

Parameters:
  • url (str) – Link to fetch from.

  • decode (bool) – Decode the received bytes to utf-8.

  • chunk_size (int) – The number of bytes to use when splitting the fetched result into chunks.

  • kwargs – Passed to requests.get().

Returns:

Fetched text as a single string.

Return type:

str | bytes

lXtractor.util.io.fetch_to_file(url, fpath=None, fname=None, root_dir=None, decode=False)[source]
Parameters:
  • url (str) – Link to a file.

  • fpath (Path | None) – Path to a file for saving. If provided, fname and root_dir are ignored. Otherwise, will use .../{this} from the link for the file name and save into the current dir.

  • fname (str | None) – Name of the file to save.

  • root_dir (Path | None) – Dir where to save the file.

  • decode (bool) – If True, try decoding the raw request’s content.

Returns:

Local path to the file.

Return type:

Path

lXtractor.util.io.fetch_urls(url_getter, url_getter_args, fmt, dir_, *, fname_idx=0, args_applier=None, callback=None, overwrite=False, decode=False, max_trials=1, num_threads=None, verbose=False)[source]

A general-purpose function for fetching URLs. Each URL is dynamically produced via URL getters supplied with positional arguments.

See also

ApiBase or PDB for more information on URL getters.

It has two modes: fetching to text and fetching to files. The former is the default, whereas the latter can be turned on by providing dir_ argument. If provided, each url is considered a separate file to fetch. Thus, the function will also check dir_ (if it exists) for files that were already fetched to avoid useless work. This can be turned off via overwrite=True. For this functionality to work, each argument in url_getter_args must be converted to a single (file)name. If an argument is a sequence, fname_idx should point to an index, such that arg[fname_idx] is the filename.

Parameters:
  • url_getter (UrlGetter) – A callable accepting two or more strings and returning a valid url to fetch. The last argument is reserved for fmt.

  • url_getter_args (Iterable[_U]) – An iterable over strings or tuple of strings supplied to the url_getter. Each element must be sufficient for the url_getter to return a valid URL.

  • dir_ – Dir to save files to. If None, will return either a raw string or a JSON-derived dictionary if fmt is “json”.

  • fmt (str) – File format. It is used to construct a full file name “{filename}.{fmt}”.

  • fname_idx (int) – If an element in url_getter_args is a tuple, this argument is used to index this tuple to construct a file name that is used to save file / check if such file exists.

  • args_applier (Callable[[UrlGetter, _U], str] | None) – A callable accepting a URL getter and its args and applying the arguments to the URL getter to obtain the URL. If none, will apply arguments as positional arguments.

  • callback (Callable[[_U, str | bytes], T] | None) – A callable to parse content right after fetching, e.g., json.loads. It’s only used if dir_ is not provided.

  • overwrite (bool) – Overwrite existing files if dir_ is provided.

  • decode (bool) – Decode the fetched content (bytes to utf-8). Should be True if expecting text content.

  • max_trials (int) – Max number of fetching attempts for a given id.

  • num_threads (int | None) – The number of threads to use for parallel requests. If None, will send requests sequentially.

  • verbose (bool) – Display progress bar.

Returns:

A tuple with fetched results and the remaining file names. The former is a list of tuples, where the first element is the original name, and the second element is either the path to a downloaded file or downloaded data as string. The order may differ. The latter is a list of names that failed to fetch.

Return type:

tuple[list[tuple[_U, _F] | tuple[_U, T]], list[_U]]
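
The check for already fetched files described above might look roughly like the following sketch. It is not the library's code: split_fetched is a hypothetical helper, and the file names and the "cif" format are made up for illustration.

```python
from pathlib import Path
import tempfile


def split_fetched(args, dir_, fmt, fname_idx=0, overwrite=False):
    """Partition arguments into (already on disk, still to fetch)."""
    done, todo = [], []
    for arg in args:
        # A plain string is the file name; a sequence is indexed by fname_idx.
        name = arg if isinstance(arg, str) else arg[fname_idx]
        path = Path(dir_) / f"{name}.{fmt}"
        if path.exists() and not overwrite:
            done.append((arg, path))
        else:
            todo.append(arg)
    return done, todo


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1ABC.cif").write_text("data_1ABC")
    done, todo = split_fetched(["1ABC", ("2XYZ", "extra")], tmp, "cif")
    done_names = [arg for arg, _ in done]
    _, todo_overwrite = split_fetched(["1ABC"], tmp, "cif", overwrite=True)
```

With overwrite=True every argument ends up in the list of items to fetch, regardless of what is on disk.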

lXtractor.util.io.get_dirs(path)[source]
Parameters:

path (Path) – Path to a directory.

Returns:

Mapping {dir name => dir path} for each dir in path.

Return type:

dict[str, Path]

lXtractor.util.io.get_files(path)[source]
Parameters:

path (Path) – Path to a directory.

Returns:

Mapping {file name => file path} for each file in path.

Return type:

dict[str, Path]
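
The mappings returned by get_dirs and get_files can be approximated with pathlib alone. This is a sketch under the assumption that only the immediate children of path are listed; list_dirs and list_files are hypothetical names, not the library's functions.

```python
from pathlib import Path
import tempfile


def list_dirs(path):
    """Map {dir name => dir path} for immediate subdirectories of `path`."""
    return {p.name: p for p in Path(path).iterdir() if p.is_dir()}


def list_files(path):
    """Map {file name => file path} for immediate files in `path`."""
    return {p.name: p for p in Path(path).iterdir() if p.is_file()}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "segments").mkdir()
    (root / "meta.tsv").write_text("")
    dir_names = sorted(list_dirs(root))
    file_names = sorted(list_files(root))
```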

lXtractor.util.io.parse_suffix(path)[source]

Parse a file suffix.

  1. If there are no suffixes: raise an error.

  2. If there is one suffix, return it.

  3. If there is more than one suffix, join the last two and return.

Parameters:

path (Path) – Input path.

Returns:

Parsed suffix.

Raises:

FormatError – If no suffix is present.

Return type:

str
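
The three rules above can be sketched with pathlib. This is an illustration rather than the library's implementation: parse_suffix_sketch is a hypothetical name, ValueError stands in for FormatError, and keeping the leading dot in the result is an assumption.

```python
from pathlib import Path


def parse_suffix_sketch(path):
    """Return the suffix, joining the last two if there are several."""
    suffixes = Path(path).suffixes
    if not suffixes:
        # The documented function raises FormatError; ValueError is a stand-in.
        raise ValueError(f"No suffix found in {path}")
    # Slicing covers both cases: one suffix, or the last two of many.
    return "".join(suffixes[-2:])


one = parse_suffix_sketch(Path("structure.pdb"))
two = parse_suffix_sketch(Path("structure.pdb.gz"))
three = parse_suffix_sketch(Path("a.b.pdb.gz"))
```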

lXtractor.util.io.path_tree(path)[source]

Create a tree graph from Chain*-type objects saved to the filesystem.

The function will recursively walk starting from the provided path, connecting parent and children paths (residing within “segments” directory). If it meets a path containing “structures” directory, it will save valid structure paths under a node’s “structures” attribute. In that case, such structures are assumed to be nested under a chain, and they do not form nodes in this graph.

A path to a Chain*-type object is valid if it contains “sequence.tsv” and “meta.tsv” files. A valid structure path must contain “sequence.tsv”, “meta.tsv”, and “structure.*” files.

Parameters:

path (Path) – A root path to start with.

Returns:

A directed graph with paths as nodes and edges representing parent-child relationships.

Return type:

DiGraph

lXtractor.util.io.read_n_col_table(path, n, sep='\t')[source]

Read table from file and ensure it has exactly n columns.

Return type:

DataFrame | None

lXtractor.util.io.run_sp(cmd, split=True)[source]

It will attempt to run the command as a subprocess returning text. If the command fails with CalledProcessError, it will rerun the command with check=False to capture all the outputs into the result.

Parameters:
  • cmd (str) – A single string of a command.

  • split (bool) – Split cmd before running. If False, will pass shell=True.

Returns:

Result of a subprocess with captured output.
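
The run-then-rerun behavior described above can be sketched with the standard subprocess module. run_sp_sketch is a hypothetical stand-in, not the library's code, and the commands invoking the current Python interpreter are only for demonstration.

```python
import shlex
import subprocess
import sys


def run_sp_sketch(cmd, split=True):
    """Run `cmd`, capturing text output even if the command fails."""
    args = shlex.split(cmd) if split else cmd
    options = dict(capture_output=True, text=True, shell=not split)
    try:
        return subprocess.run(args, check=True, **options)
    except subprocess.CalledProcessError:
        # Rerun without raising so that stdout, stderr, and the return code
        # of the failed command end up in the returned object.
        return subprocess.run(args, check=False, **options)


python = shlex.quote(sys.executable)
ok = run_sp_sketch(f"{python} -c \"print('hi')\"")
failed = run_sp_sketch(f"{python} -c \"import sys; sys.exit(3)\"")
```

Note that a failing command is executed twice in this scheme, which matters for commands with side effects.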

lXtractor.util.misc module

Miscellaneous utilities that couldn’t be properly categorized.

lXtractor.util.misc.all_logging_disabled(highest_level=50)[source]

A context manager that will prevent any logging messages triggered during the body from being processed.

The function was borrowed from this gist

Parameters:

highest_level – the maximum logging level in use. This would only need to be changed if a custom level greater than CRITICAL is defined.
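
A context manager of this kind is commonly written around logging.disable. The sketch below shows that general pattern under the assumption that the borrowed gist works the same way; logging_disabled_sketch is a hypothetical name.

```python
import logging
from contextlib import contextmanager


@contextmanager
def logging_disabled_sketch(highest_level=logging.CRITICAL):
    """Suppress all messages up to `highest_level` inside the with-block."""
    previous_level = logging.root.manager.disable
    logging.disable(highest_level)
    try:
        yield
    finally:
        # Restore whatever was configured before entering the block.
        logging.disable(previous_level)


logger = logging.getLogger("sketch")
before = logger.isEnabledFor(logging.ERROR)
with logging_disabled_sketch():
    inside = logger.isEnabledFor(logging.ERROR)
after = logger.isEnabledFor(logging.ERROR)
```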

lXtractor.util.misc.apply(fn, it, verbose, desc, num_proc, total=None, use_joblib=False, **kwargs)[source]
Parameters:
  • fn (Callable[[T], R]) – A one-argument function.

  • it (Iterable[T]) – An iterable over some objects.

  • verbose (bool) – Display progress bar.

  • desc (str) – Progress bar description.

  • num_proc (int) – The number of processes to use. Anything below 1 indicates sequential processing. Otherwise, will apply fn in parallel using ProcessPoolExecutor.

  • total (int | None) – The total number of elements. Used for the progress bar.

  • use_joblib (bool) – Use joblib.Parallel for parallel application.

  • kwargs – Passed to ProcessPoolExecutor.map() or joblib.Parallel.

Returns:

An iterator over the results of applying fn to the elements of it.

Return type:

Iterator[R]

lXtractor.util.misc.col2col(df, col_fr, col_to)[source]
Parameters:
  • df (DataFrame) – Some DataFrame.

  • col_fr (str) – A column name to map from.

  • col_to (str) – A column name to map to.

Returns:

Mapping between values of a pair of columns.

lXtractor.util.misc.get_cpu_count(c)[source]
lXtractor.util.misc.graph_reindex_nodes(g)[source]

Reindex the graph nodes so that node data equals node indices.

Parameters:

g (PyGraph) – An arbitrary PyGraph.

Returns:

A PyGraph of the same size and having the same edges but with reindexed nodes.

Return type:

PyGraph

lXtractor.util.misc.is_empty(x)[source]
Return type:

bool

lXtractor.util.misc.is_valid_field_name(s)[source]
Parameters:

s (str) – Some string.

Returns:

True if s is a valid field name for __getattr__ operations, else False.

Return type:

bool

lXtractor.util.misc.json_to_molgraph(inp)[source]

Converts a JSON-formatted molecular graph into a PyGraph object. This graph is a dictionary with two keys: “num_nodes” and “edges”. The former indicates the number of atoms in a structure, whereas the latter is a list of edge tuples.

Parameters:

inp (dict | PathLike) – A dictionary or a path to a JSON file produced using rustworkx.node_link_json.

Returns:

A graph with nodes and edges initialized in order given in inp. Any associated data will be omitted.

Return type:

PyGraph
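
To make the input format concrete, the sketch below builds plain adjacency lists from a dictionary with the two keys described above. The actual function returns a rustworkx PyGraph; the dict-of-lists output, the name molgraph_adjacency, and the assumption that each edge is a pair of node indices are illustrative only.

```python
import json


def molgraph_adjacency(inp):
    """Build adjacency lists from a {"num_nodes", "edges"} dictionary."""
    # Nodes are created first, in index order, so isolated atoms are kept.
    adjacency = {i: [] for i in range(inp["num_nodes"])}
    for u, v in inp["edges"]:
        # The graph is undirected: record the edge in both directions.
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


raw = json.loads('{"num_nodes": 4, "edges": [[0, 1], [1, 2]]}')
adjacency = molgraph_adjacency(raw)
```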

lXtractor.util.misc.valgroup(m, sep=':')[source]

Reformat a mapping from the format:

X => [Y{sep}Z, ...]

To a format:

X => [(Y, [Z, ...]), ...]

>>> mapping = {'X': ['C:A', 'C:B', 'Y:Z']}
>>> valgroup(mapping)
{'X': [('C', ['A', 'B']), ('Y', ['Z'])]}

Hint

This method is useful for converting the sequence-to-structure mapping produced by lXtractor.ext.sifts.SIFTS to a format accepted by lXtractor.core.chain.initializer.ChainInitializer.from_mapping() to initialize lXtractor.core.chain.Chain objects.

Parameters:
  • m (Mapping[str, list[str]]) – A mapping from strings to a list of strings.

  • sep (str) – A separator of each mapped string in the list.

Returns:

A reformatted mapping.
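
The regrouping can be sketched in a few lines of plain Python. This is an illustration, not the library's implementation: valgroup_sketch is a hypothetical name, and splitting each value on the first occurrence of sep is an assumption.

```python
def valgroup_sketch(m, sep=":"):
    """Regroup X => [Y{sep}Z, ...] into X => [(Y, [Z, ...]), ...]."""
    result = {}
    for key, values in m.items():
        groups = {}
        for value in values:
            prefix, suffix = value.split(sep, 1)
            # dict preserves insertion order, so groups appear in the order
            # in which each prefix was first seen.
            groups.setdefault(prefix, []).append(suffix)
        result[key] = list(groups.items())
    return result


grouped = valgroup_sketch({"X": ["C:A", "C:B", "Y:Z"]})
```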

lXtractor.util.seq module

Low-level utilities to work with sequences (as strings) or sequence files.

lXtractor.util.seq.biotite_align(seqs, **kwargs)[source]

Align two sequences using biotite’s align_optimal function.

Parameters:
  • seqs (Iterable[tuple[str, str]]) – An iterable with exactly two sequences.

  • kwargs – Additional arguments to align_optimal.

Returns:

A pair of aligned sequences.

Return type:

tuple[tuple[str, str], tuple[str, str]]

lXtractor.util.seq.mafft_add(msa, seqs, *, mafft='mafft', thread=1, keeplength=True)[source]

Add sequences to existing MSA using mafft.

This is a curried function: an incomplete argument set yields a partially evaluated function (e.g., mafft_add(thread=10)).

Parameters:
  • msa (Iterable[tuple[str, str]] | Path) – an iterable over sequences with the same length.

  • seqs (Iterable[tuple[str, str]]) – an iterable over sequences comprising the addition.

  • thread (int) – how many threads to dedicate for mafft.

  • keeplength (bool) – force to preserve the MSA’s length.

  • mafft (str) – mafft executable.

Returns:

A tuple of two lists of SeqRecord objects: with (1) alignment sequences with addition, and (2) aligned addition, separately.

Return type:

Iterator[tuple[str, str]]

lXtractor.util.seq.mafft_align(seqs, *, mafft='mafft-linsi', thread=1)[source]

Align an arbitrary number of sequences using mafft.

Parameters:
  • seqs (Iterable[tuple[str, str]] | Path) – An iterable over (header, _seq) pairs or path to file with sequences to align.

  • thread (int) – How many threads to dedicate for mafft.

  • mafft (str) – mafft executable (path or env variable).

Returns:

An Iterator over aligned (header, _seq) pairs.

Return type:

Iterator[tuple[str, str]]

lXtractor.util.seq.map_pairs_numbering(s1, s1_numbering, s2, s2_numbering, align=True, align_method=<function mafft_align>, empty=None, **kwargs)[source]

Map numbering between a pair of sequences.

Parameters:
  • s1 (str) – The first sequence.

  • s1_numbering (Iterable[int]) – The first sequence’s numbering.

  • s2 (str) – The second sequence.

  • s2_numbering (Iterable[int]) – The second sequence’s numbering.

  • align (bool) – Align before calculating. If False, sequences are assumed to be aligned.

  • align_method (AlignMethod) – Align method to use. Must be a callable accepting and returning a list of sequences.

  • empty (Any | None) – Empty numeration element in place of a gap.

  • kwargs – Passed to align_method.

Returns:

Iterator over character pairs (a, b), where a and b are the original sequences’ numberings. One of a or b in a pair can be empty to represent a gap.

Return type:

Generator[tuple[int | None, int | None], None, None]
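
For the align=False case, the mapping reduces to walking two pre-aligned sequences in parallel. The sketch below illustrates that idea only; map_aligned_numbering is a hypothetical name, and the gap character "-" is an assumption.

```python
def map_aligned_numbering(s1, s1_numbering, s2, s2_numbering,
                          gap="-", empty=None):
    """Yield numbering pairs for two sequences that are already aligned."""
    it1, it2 = iter(s1_numbering), iter(s2_numbering)
    for c1, c2 in zip(s1, s2):
        # A gap consumes no numbering element and is reported as `empty`.
        n1 = empty if c1 == gap else next(it1)
        n2 = empty if c2 == gap else next(it2)
        yield n1, n2


pairs = list(map_aligned_numbering("AB-D", [1, 2, 3], "A-CD", [10, 11, 12]))
```

Each numbering must contain exactly as many elements as there are non-gap characters in the corresponding aligned sequence.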

lXtractor.util.seq.partition_gap_sequences(seqs, max_fraction_of_gaps=1.0)[source]

Removes sequences having fraction of gaps above the given threshold.

Parameters:
  • seqs (Iterable[tuple[str, str]]) – a collection of arbitrary sequences.

  • max_fraction_of_gaps (float) – a threshold specifying an upper bound on allowed fraction of gap characters within a sequence.

Returns:

a filtered list of sequences.

Return type:

tuple[Iterator[str], Iterator[str]]

lXtractor.util.seq.read_fasta(inp, strip_id=True)[source]

Simple lazy fasta reader.

Parameters:
  • inp (str | PathLike | TextIOBase | Iterable[str]) – Pathlike object compatible with open or opened file or an iterable over lines or raw text as str.

  • strip_id (bool) – Strip ID to the first consecutive (spaceless) string.

Returns:

An iterator of (header, seq) pairs.

Return type:

Iterator[tuple[str, str]]
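
A lazy reader of this kind can be sketched as a generator over lines. This is not the library's code: read_fasta_sketch is a hypothetical name, and it handles only an iterable of lines, not paths or file handles.

```python
def read_fasta_sketch(lines, strip_id=True):
    """Lazily yield (header, seq) pairs from an iterable over FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            if strip_id and header:
                header = header.split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


text = ">seq1 first record\nMKT\nAYI\n>seq2\nGG\n"
records = list(read_fasta_sketch(text.splitlines()))
full_ids = list(read_fasta_sketch(text.splitlines(), strip_id=False))
```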

lXtractor.util.seq.remove_gap_columns(seqs, max_gaps=1.0)[source]

Remove gap columns from a collection of sequences.

Parameters:
  • seqs (Iterable[str]) – A collection of equal length sequences.

  • max_gaps (float) – Max fraction of gaps allowed per column.

Returns:

Initial seqs with gap columns removed and removed columns’ indices.

Return type:

tuple[Iterator[str], ndarray]
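
The column filtering can be sketched in pure Python. Several details are assumptions rather than documented behavior: the gap character "-", removing a column only when its gap fraction strictly exceeds max_gaps, and returning lists instead of an iterator and an ndarray. remove_gap_columns_sketch is a hypothetical name.

```python
def remove_gap_columns_sketch(seqs, max_gaps=1.0, gap="-"):
    """Drop columns whose fraction of gaps exceeds `max_gaps`."""
    seqs = list(seqs)
    if not seqs:
        return [], []
    removed = [
        i for i in range(len(seqs[0]))
        if sum(s[i] == gap for s in seqs) / len(seqs) > max_gaps
    ]
    removed_set = set(removed)
    filtered = [
        "".join(c for i, c in enumerate(s) if i not in removed_set)
        for s in seqs
    ]
    return filtered, removed


msa = ["A-C-", "A--T", "AG-T"]
filtered, removed = remove_gap_columns_sketch(msa, max_gaps=0.5)
unchanged, nothing = remove_gap_columns_sketch(msa)
```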

lXtractor.util.seq.write_fasta(inp, out)[source]

Simple fasta writer.

Parameters:
  • inp (Iterable[tuple[str, str]]) – Iterable over (header, _seq) pairs.

  • out (Path | SupportsWrite) – Something that supports .write method.

Returns:

Nothing.

Return type:

None

lXtractor.util.structure module

Low-level utilities to work with structures.

lXtractor.util.structure.calculate_dihedral(atom1, atom2, atom3, atom4)[source]

Calculate the angle between planes formed by [atom1, atom2, atom3] and [atom2, atom3, atom4].

Each atom is an array of shape (3, ) with XYZ coordinates.

Calculation method inspired by https://math.stackexchange.com/questions/47059/how-do-i-calculate-a-dihedral-angle-given-cartesian-coordinates

Return type:

float
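
One common way to compute such an angle uses the normals of the two planes and atan2. The pure-Python sketch below follows that approach; it is not necessarily the library's method, dihedral_sketch is a hypothetical name, and the sign convention and the use of radians are assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_sketch(p1, p2, p3, p4):
    """Signed angle (radians) between planes (p1, p2, p3) and (p2, p3, p4)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(x / b2_len for x in b2))
    # atan2 gives a numerically stable signed angle in (-pi, pi].
    return math.atan2(_dot(m1, n2), _dot(n1, n2))


cis = dihedral_sketch((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_sketch((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
perp = dihedral_sketch((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

The three test geometries place the fourth atom in the same plane on the same side (cis), on the opposite side (trans), and perpendicular to the first plane.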

lXtractor.util.structure.compare_arrays(a, b, eps=0.001)[source]

Compare two numerical arrays.

Parameters:
  • a (ndarray[Any, dtype[float | int]]) – The first array.

  • b (ndarray[Any, dtype[float | int]]) – The second array.

  • eps (float) – Comparison tolerance.

Returns:

True if the absolute difference between the two arrays is within eps.

Raises:

LengthMismatch – If the two arrays are not of the same shape.

lXtractor.util.structure.compare_coord(a, b, eps=0.001)[source]

Compare coordinates between atoms of two atom arrays.

Parameters:
  • a (AtomArray) – The first atom array.

  • b (AtomArray) – The second atom array.

  • eps (float) – Comparison tolerance.

Returns:

True if the two arrays are of the same length and the absolute difference between coordinates of the corresponding atom pairs is within eps.

lXtractor.util.structure.extend_residue_mask(a, idx)[source]

Extend a residue mask for given atoms.

Parameters:
  • a (AtomArray) – An arbitrary atom array.

  • idx (list[int]) – Indices pointing to atoms at which to extend the mask.

Returns:

The extended mask, where True indicates that the atom belongs to the same residue as indicated by idx.

Return type:

ndarray[Any, dtype[bool_]]

lXtractor.util.structure.filter_any_polymer(a, min_size=2)[source]

Get a mask indicating atoms being a part of a macromolecular polymer: peptide, nucleotide, or carbohydrate.

Parameters:
  • a (AtomArray) – Array of atoms.

  • min_size (int) – Min number of polymer monomers.

Returns:

A boolean mask True for polymers’ atoms.

Return type:

ndarray

lXtractor.util.structure.filter_ligand(a)[source]

Filter for ligand atoms – non-polymer and non-solvent hetero atoms.

Note

No contact-based verification is performed here.

Parameters:

a (AtomArray) – Atom array.

Returns:

A boolean mask True for ligand atoms.

Return type:

ndarray

lXtractor.util.structure.filter_polymer(a, min_size=2, pol_type='peptide')[source]

Filter for atoms that are a part of a consecutive standard macromolecular polymer entity.

Parameters:
  • a (AtomArray) – The array to filter.

  • min_size – The minimum number of monomers.

  • pol_type – The polymer type, either "peptide", "nucleotide", or "carbohydrate". Abbreviations are supported: "p", "pep", "n", etc.

Returns:

This array is True for all indices in array, where atoms belong to consecutive polymer entity having at least min_size monomers.

Return type:

ndarray[Any, dtype[bool_]]

lXtractor.util.structure.filter_selection(array, res_id, atom_names=None)[source]

Filter AtomArray by residue numbers and atom names.

Parameters:
  • array (AtomArray) – Arbitrary structure.

  • res_id (Sequence[int] | None) – A sequence of residue numbers.

  • atom_names (Sequence[Sequence[str]] | Sequence[str] | None) – A sequence of atom names (broadcasted to each position in res_id) or an iterable over such sequences for each position in res_id.

Returns:

A binary mask that is True for filtered atoms.

Return type:

ndarray

lXtractor.util.structure.filter_solvent_extended(a)[source]

Filter for solvent atoms using a curated solvent list including non-water molecules typically being a part of a crystallization solution.

Parameters:

a (AtomArray) – Atom array.

Returns:

A boolean mask True for solvent atoms.

Return type:

ndarray

lXtractor.util.structure.filter_to_common_atoms(a1, a2, allow_residue_mismatch=False)[source]

Filter to atoms common between residues of atom arrays a1 and a2.

Parameters:
  • a1 (AtomArray) – Arbitrary atom array.

  • a2 (AtomArray) – Arbitrary atom array.

  • allow_residue_mismatch (bool) – If True, when residue names mismatch, the common atoms are derived from the intersection a1.atoms & a2.atoms & {"C", "N", "CA", "CB"}.

Returns:

A pair of masks for a1 and a2, True for matching atoms.

Raises:

ValueError

  1. If a1 and a2 have a different number of residues.

  2. If the selection for some residue produces a different number of atoms.

Return type:

tuple[ndarray, ndarray]

lXtractor.util.structure.find_contacts(a, mask)[source]

Find contacts between a subset of atoms within the structure and the rest of the structure. An atom is considered to be in contact with another atom if the distance between them is below the threshold for the non-covalent bond specified in config (DefaultConfig["bonds"]["NC-NC"][1]).

Parameters:
  • a (AtomArray) – Atom array.

  • mask (ndarray) – A boolean mask True for atoms for which to find contacts.

Returns:

A tuple with three arrays of size equal to the a’s number of atoms:

  1. Contact mask: True for a[~mask] atoms in contact with a[mask].

  2. Distances: for a[mask] atoms to the closest a[~mask] atom.

  3. Indices: of these closest a[~mask] atoms within the mask.

Suppose that mask specifies a ligand. Then, for i-th atom in a, contacts[i], distances[i], indices[i] indicate whether a[i] has a contact, the precise distance from a[i] atom to the closest ligand atom, and an index of this ligand atom, respectively.

Return type:

tuple[ndarray, ndarray, ndarray]

lXtractor.util.structure.find_first_polymer_type(a, min_size=2, order=('p', 'n', 'c'))[source]

Determines polymer type of the supplied atom array or an array of atom marks.

Probe polymer types in the given order. If a polymer with at least min_size monomers of the probed type is found, it will be returned.

Hint

The function serves as a good quick-check when a single polymer type is expected, which should always be true when a is an array of atom marks.

Parameters:
  • a (AtomArray | ndarray[Any, dtype[int]]) – An arbitrary array of atoms.

  • min_size (int) – A minimum number of monomers in a polymer.

  • order (tuple[str, str, str]) – An order of the polymers to probe.

Returns:

The first polymer type to accommodate min_size requirement.

Return type:

str

lXtractor.util.structure.find_primary_polymer_type(a, min_size=2, residues=False)[source]

Find the major polymer type, i.e., the one with the largest number of atoms or monomers.

Parameters:
  • a (AtomArray) – An arbitrary atom array.

  • min_size (int) – Minimum number of monomers for a polymer.

  • residues (bool) – True if the dominant polymer should be picked according to the number of residues. Otherwise, the number of atoms will be used.

Returns:

A binary mask pointing at the polymer atoms in a and the polymer type – “c” (carbohydrate), “n” (nucleotide), or “p” (peptide). If no polymer atoms were found, polymer type will be designated as “x”.

Return type:

tuple[ndarray, str]

lXtractor.util.structure.get_missing_atoms(a, excluding_names=('OXT',), excluding_elements=('H',))[source]

For each residue, compare with the one stored in CCD, and find missing atoms.

Parameters:
  • a (AtomArray) – Non-empty atom array.

  • excluding_names (Sequence[str] | None) – A sequence of atom names to exclude for calculation.

  • excluding_elements (Sequence[str] | None) – A sequence of element names to exclude for calculation.

Returns:

A generator of lists of missing atoms (excluding hydrogens) per residue in a or None if no such residue was found in CCD.

Return type:

Generator[list[str | None] | None, None, None]

lXtractor.util.structure.get_observed_atoms_frac(a, excluding_names=('OXT',), excluding_elements=('H',))[source]

Find fractions of observed atoms compared to canonical residue versions stored in CCD.

Parameters:
  • a (AtomArray) – Non-empty atom array.

  • excluding_names (Sequence[str] | None) – A sequence of atom names to exclude for calculation.

  • excluding_elements (Sequence[str] | None) – A sequence of element names to exclude for calculation.

Returns:

A generator of observed atom fractions per residue in a or None if a residue was not found in CCD.

Return type:

Generator[list[str | None] | None, None, None]

lXtractor.util.structure.iter_canonical(a)[source]
Parameters:

a (AtomArray) – Arbitrary atom array.

Returns:

Generator of canonical versions of residues in a or None if no such residue found in CCD.

Return type:

Generator[AtomArray | None, None, None]

lXtractor.util.structure.iter_residue_masks(a)[source]

Iterate over residue masks.

Parameters:

a (AtomArray) – Atom array.

Returns:

A generator over boolean masks for each residue in a.

Return type:

Generator[ndarray[Any, dtype[bool_]], None, None]

lXtractor.util.structure.load_structure(inp, fmt='', *, gz=False, **kwargs)[source]

This is a simplified version of biotite.io.general.load_structure that extends the supported input types: it accepts paths, strings, bytes, or gzipped files. On the other hand, fewer formats are supported: pdb, cif, and mmtf.

Parameters:
  • inp (IOBase | Path | str | bytes) – Input to load from. It can be a path to a file, an opened file handle, a string or bytes of file contents. Gzipped bytes and files are supported.

  • fmt (str) – If inp is a Path-like object, it must be of the form “name.fmt” or “name.fmt.gz”. In this case, fmt is ignored. Otherwise, it is used to determine the parser type and must be provided.

  • gz (bool) – If inp is gzipped bytes, this flag must be True.

  • kwargs – Passed to get_structure: either a method or a separate function used by biotite to convert the input into an AtomArray.

Returns:

Return type:

AtomArray

lXtractor.util.structure.mark_polymer_type(a, min_size=2)[source]

Denote polymer type in an atom array.

It will find the breakpoints in a and split it into segments. Each segment will be checked separately to determine its polymer type. The results are then concatenated into a single array and returned.

Parameters:
  • a (AtomArray) – Any atom array.

  • min_size (int) – Minimum number of consecutive monomers in a polymer.

Returns:

An array where each atom of a is marked by a character: "n", "p", or "c" for nucleotide, peptide, and carbohydrate. Non-polymer atoms are marked by “x”.

Return type:

ndarray[Any, dtype[str_]]

lXtractor.util.structure.save_structure(array, path, **kwargs)[source]

This is a simplified version of biotite.io.general.save_structure. On the one hand, it can conveniently compress the data using gzip. On the other hand, fewer formats are supported: pdb, cif, and mmtf.

Parameters:
  • array (AtomArray) – An AtomArray to write.

  • path (Path) – A path with correct extension, e.g., Path("data/structure.pdb"), or Path("data/structure.pdb.gz").

  • kwargs – If compressing is not required, the original save_structure from biotite is used with these kwargs. Otherwise, kwargs are ignored.

Returns:

If the file was successfully written, returns the original path.

lXtractor.util.structure.to_graph(a, split_chains=False)[source]

Create a molecular connectivity graph from an atom array.

A molecular graph is an undirected graph without multiedges, where nodes are indices to atoms. Thus, node indices point directly to atoms in the provided atom array, and the number of nodes equals the number of atoms. A pair of nodes has an edge between them if they form a covalent bond. The edges are constructed according to atom-dependent bond thresholds defined by the global config. The corresponding distances are stored as edge values. See the docs of rustworkx on how to manipulate the resulting graph object.

Parameters:
  • a (AtomArray) – Atom array to build a graph from.

  • split_chains (bool) – If True, edges between atoms from different chains are forbidden.

Returns:

A graph object where nodes are atom indices and edges represent covalent bonds.

Return type:

PyGraph