chemistry_tools.pubchem.properties

Attention

This package has the following additional requirements:

cawdrey>=0.1.7
mathematical>=0.1.13
pillow>=7.0.0
pyparsing>=2.4.6
tabulate>=0.8.9

These can be installed as follows:

python -m pip install chemistry-tools[pubchem]

Functions and classes to access properties of compounds in the PubChem database.

Data:

PROPERTY_MAP

Allows properties to optionally be specified as underscore_separated, consistent with Compound attributes

valid_properties

Properties for PubChem REST API

Classes:

PropData(name, description, type, attr_name)

Metadata about a property.

PubChemProperty(label[, name, value, dtype, …])

Represents a property parsed from the full PubChem record.

Functions:

force_valid_properties(properties)

Coerce properties into a list of strings and exclude any invalid properties, or raise a ValueError if that is not possible.

get_properties(identifier[, properties, …])

Returns the requested properties for the compound with the given identifier.

get_property(identifier[, property, namespace])

Returns the requested property for the compound with the given identifier.

parse_properties(property_data)

Parse raw data from the property endpoint of the REST API.

rest_get_properties(identifier[, namespace, …])

Returns the properties for the compound with the given identifier in the desired format.

rest_get_properties_json(identifier[, …])

Returns the properties for the compound with the given identifier as a dictionary.

PROPERTY_MAP = {'atom_stereo_count': 'AtomStereoCount', 'bond_stereo_count': 'BondStereoCount', 'canonical_smiles': 'CanonicalSMILES', 'charge': 'Charge', 'complexity': 'Complexity', 'conformer_count_3d': 'ConformerCount3D', 'conformer_model_rmsd_3d': 'ConformerModelRMSD3D', 'covalent_unit_count': 'CovalentUnitCount', 'defined_atom_stereo_count': 'DefinedAtomStereoCount', 'defined_bond_stereo_count': 'DefinedBondStereoCount', 'effective_rotor_count_3d': 'EffectiveRotorCount3D', 'exact_mass': 'ExactMass', 'feature_acceptor_count_3d': 'FeatureAcceptorCount3D', 'feature_anion_count_3d': 'FeatureAnionCount3D', 'feature_cation_count_3d': 'FeatureCationCount3D', 'feature_count_3d': 'FeatureCount3D', 'feature_donor_count_3d': 'FeatureDonorCount3D', 'feature_hydrophobe_count_3d': 'FeatureHydrophobeCount3D', 'feature_ring_count_3d': 'FeatureRingCount3D', 'fingerprint_2d': 'Fingerprint2D', 'h_bond_acceptor_count': 'HBondAcceptorCount', 'h_bond_donor_count': 'HBondDonorCount', 'heavy_atom_count': 'HeavyAtomCount', 'inchi': 'InChI', 'inchikey': 'InChIKey', 'isomeric_smiles': 'IsomericSMILES', 'isotope_atom_count': 'IsotopeAtomCount', 'iupac_name': 'IUPACName', 'molecular_formula': 'MolecularFormula', 'molecular_weight': 'MolecularWeight', 'monoisotopic_mass': 'MonoisotopicMass', 'rotatable_bond_count': 'RotatableBondCount', 'tpsa': 'TPSA', 'undefined_atom_stereo_count': 'UndefinedAtomStereoCount', 'undefined_bond_stereo_count': 'UndefinedBondStereoCount', 'volume3d': 'Volume3D', 'volume_3d': 'XStericQuadrupole3D', 'x_steric_quadrupole_3d': 'YStericQuadrupole3D', 'xlogp': 'XLogP', 'y_steric_quadrupole_3d': 'ZStericQuadrupole3D'}

Type:    Dict[str, str]

Allows properties to optionally be specified as underscore_separated, consistent with Compound attributes
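As a minimal sketch of how such a mapping can be used, the snippet below translates an underscore_separated name into the REST API spelling. The dictionary is a three-entry excerpt copied from the listing above, and to_api_name is a hypothetical helper for illustration, not a function of this module.

```python
# Excerpt copied from the PROPERTY_MAP listing; the full dictionary lives in the library.
PROPERTY_MAP_EXCERPT = {
    "molecular_weight": "MolecularWeight",
    "iupac_name": "IUPACName",
    "h_bond_donor_count": "HBondDonorCount",
}


def to_api_name(name: str) -> str:
    """Return the REST API spelling for ``name`` (hypothetical helper)."""
    # Names without an entry (e.g. ones already in API form) pass through unchanged.
    return PROPERTY_MAP_EXCERPT.get(name, name)


print(to_api_name("iupac_name"))       # IUPACName
print(to_api_name("MolecularWeight"))  # MolecularWeight
```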

namedtuple PropData(name, description, type, attr_name)

Bases: NamedTuple

Metadata about a property.

Fields
  1.  name (str) – The name of the property.

  2.  description (str) – The description of the property.

  3.  type (Callable) – The type of the property.

  4.  attr_name (str) – The Python attribute name of the property in a chemistry_tools.pubchem.compound.Compound.

__repr__()

Return a nicely formatted representation string
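To illustrate the four fields, here is a small stand-in built with typing.NamedTuple. PropDataSketch is not the library class, and the description text is made up for the example; the point is only that the type field holds a callable that can convert a raw value.

```python
from typing import Callable, NamedTuple


class PropDataSketch(NamedTuple):
    """Stand-in with the same four fields as PropData (not the library class)."""

    name: str
    description: str
    type: Callable
    attr_name: str


# The description below is illustrative, not quoted from the library.
heavy_atoms = PropDataSketch(
    name="HeavyAtomCount",
    description="Number of non-hydrogen atoms.",
    type=int,
    attr_name="heavy_atom_count",
)

# ``type`` is a callable, so it can be applied to a raw string value.
converted = heavy_atoms.type("13")
print(heavy_atoms.name, converted)
```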

namedtuple PubChemProperty(label, name=None, value=None, dtype=None, source=None)

Bases: NamedTuple

Represents a property parsed from the full PubChem record.

Fields
  1.  label (str) – The label of the property.

  2.  name (Optional[str]) – The name of the property.

  3.  value (Any) – The property’s value.

  4.  dtype (Callable) – The data type of the property’s value.

  5.  source (Dict) – Dictionary of property sources.

static __new__(cls, label, name=None, value=None, dtype=None, source=None)

Create new instance of __BasePubChemProperty(label, name, value, dtype, source)
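The signature above makes only label mandatory; the other fields default to None. The stand-in below mirrors that with typing.NamedTuple. PubChemPropertySketch and the "Molecular Weight" label are assumptions for illustration, not the library's class or a value taken from a PubChem record.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional


class PubChemPropertySketch(NamedTuple):
    """Stand-in mirroring the documented fields and defaults (not the library class)."""

    label: str
    name: Optional[str] = None
    value: Any = None
    dtype: Optional[Callable] = None
    source: Optional[Dict] = None


# Only the label is required; every other field falls back to None.
prop = PubChemPropertySketch("Molecular Weight")

# Named tuples are immutable, so _replace returns a new instance with the given fields set.
filled = prop._replace(value=180.16, dtype=float)
print(prop)
print(filled)
```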

force_valid_properties(properties)

Coerce properties into a list of strings and exclude any invalid properties, or raise a ValueError if that is not possible.

Parameters

properties (Union[str, Iterable[str]])

Return type

List[str]
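The behaviour described above could be sketched roughly as follows. This is not the library's implementation: coerce_properties is a hypothetical name, the two lookup tables are short excerpts of PROPERTY_MAP and valid_properties, and both the translation of underscore_separated names and the exact conditions that trigger ValueError are assumptions.

```python
from typing import Iterable, List, Union

# Short excerpts of the module-level dictionaries documented on this page.
PROPERTY_MAP_EXCERPT = {"molecular_weight": "MolecularWeight", "xlogp": "XLogP"}
VALID_EXCERPT = {"MolecularWeight", "XLogP", "InChIKey"}


def coerce_properties(properties: Union[str, Iterable[str]]) -> List[str]:
    """Return the valid property names in ``properties`` as a list of strings."""
    if isinstance(properties, str):
        # A single string is treated as a comma-separated list.
        candidates = properties.split(",")
    else:
        try:
            candidates = list(properties)
        except TypeError:
            raise ValueError(f"Cannot coerce {properties!r} to a list of names") from None

    result = []
    for item in candidates:
        if not isinstance(item, str):
            raise ValueError(f"Property names must be strings, got {item!r}")
        cleaned = item.strip()
        # Assumption: underscore_separated names are translated to API names.
        api_name = PROPERTY_MAP_EXCERPT.get(cleaned, cleaned)
        # Invalid names are dropped rather than reported.
        if api_name in VALID_EXCERPT:
            result.append(api_name)
    return result


print(coerce_properties("molecular_weight, Bogus,InChIKey"))
```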

get_properties(identifier, properties='', namespace=<PubChemNamespace.Name: 'name'>, as_dataframe=False)

Returns the requested properties for the compound with the given identifier. As more than one compound may be identified, the results are returned in a list.

Parameters
  • identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.

  • properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.

  • namespace (Union[PubChemNamespace, str]) – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.

  • as_dataframe (bool) – Automatically extract the properties into a pandas DataFrame. Default False.

Raises
  • ValueError – If the response body does not contain valid JSON.

  • NotFoundError – If the compound with the requested identifier was not found in PubChem.

Return type

Union[List[Dict[str, Any]], DataFrame]

Returns

List of dictionaries mapping properties to values
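Because the CID namespace accepts a comma-separated string, several compounds can be covered by one call. The helper below is a hypothetical convenience for building that string from mixed integer and string CIDs; the call itself is left as a comment because it needs network access, and it assumes 'cid' is the string value of the CID namespace.

```python
from typing import Iterable, Union


def join_identifiers(identifiers: Iterable[Union[str, int]]) -> str:
    """Join CIDs into the comma-separated form (hypothetical helper)."""
    return ",".join(str(identifier).strip() for identifier in identifiers)


batch = join_identifiers([2244, 962, "702"])
print(batch)  # 2244,962,702

# One request for all three compounds, assuming 'cid' is a valid namespace string:
# from chemistry_tools.pubchem.properties import get_properties
# records = get_properties(batch, properties="molecular_weight", namespace="cid")
```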

get_property(identifier, property='', namespace=<PubChemNamespace.Name: 'name'>)

Returns the requested property for the compound with the given identifier.

This convenience function only allows a single property to be accessed at once, and only for a single compound. If you require multiple properties and/or properties for multiple compounds, use chemistry_tools.pubchem.properties.get_properties, which helps reduce the burden on the PubChem servers.

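Parameters
  • identifier – The identifier (e.g. name, CID) of the compound to look up.

  • property – The property to retrieve for the compound. Default ''.

  • namespace – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.

Raises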
  • ValueError – If the response body does not contain valid JSON.

  • NotFoundError – If the compound with the requested identifier was not found in PubChem.

Return type

Any

Returns

The requested property. Type depends on the property requested.

parse_properties(property_data)

Parse raw data from the property endpoint of the REST API.

Parameters

property_data (Dict)

Return type

List[Dict]

Returns

A list of dictionaries mapping the properties to values for each compound
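A rough sketch of this kind of parsing is shown below. It assumes the raw data follows the PUG REST property-table layout ({"PropertyTable": {"Properties": [...]}}) and applies converters taken from a short excerpt of valid_properties. parse_property_table is a hypothetical name and the input is hand-written; the library's own function may handle more cases.

```python
from typing import Any, Callable, Dict, List

# Excerpt of valid_properties from this page: API name -> converter.
CONVERTERS: Dict[str, Callable] = {
    "MolecularWeight": float,
    "HeavyAtomCount": int,
    "InChIKey": str,
}


def parse_property_table(property_data: Dict) -> List[Dict[str, Any]]:
    """Convert each compound record into a dict of typed values (sketch)."""
    parsed = []
    for record in property_data["PropertyTable"]["Properties"]:
        row = {}
        for key, raw in record.items():
            convert = CONVERTERS.get(key)
            # Keys without a known converter (e.g. "CID") are kept as they are.
            row[key] = convert(raw) if convert is not None else raw
        parsed.append(row)
    return parsed


# Hand-written input in the assumed layout; one record per compound.
raw = {
    "PropertyTable": {
        "Properties": [
            {"CID": 2244, "MolecularWeight": "180.16", "HeavyAtomCount": 13},
        ]
    }
}
rows = parse_property_table(raw)
print(rows)
```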

rest_get_properties(identifier, namespace=<PubChemNamespace.Name: 'name'>, properties='', format_=<PubChemFormats.CSV: 'CSV'>)

Returns the properties for the compound with the given identifier in the desired format.

Parameters
  • identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.

  • namespace – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.

  • properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.

  • format_ (Union[PubChemFormats, str]) – The format to obtain the data in. Default <PubChemFormats.CSV: 'CSV'>.

Return type

str
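For orientation, the sketch below shows how the four arguments line up with the path segments of a PubChem PUG REST property URL. property_url is a hypothetical helper that only builds the string and makes no request; whether the library assembles its URL in exactly this way is an assumption.

```python
from typing import Sequence, Union
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(
    identifier: Union[str, int],
    properties: Union[Sequence[str], str],
    namespace: str = "name",
    format_: str = "CSV",
) -> str:
    """Build a PUG REST property URL (hypothetical helper, no request is made)."""
    if not isinstance(properties, str):
        properties = ",".join(properties)
    # Percent-encode the identifier but keep commas, which separate multiple CIDs.
    encoded = quote(str(identifier), safe=",")
    return f"{PUG_REST_BASE}/compound/{namespace}/{encoded}/property/{properties}/{format_}"


print(property_url("aspirin", ["MolecularWeight", "InChIKey"]))
```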

rest_get_properties_json(identifier, namespace=<PubChemNamespace.Name: 'name'>, properties='', **kwargs)

Returns the properties for the compound with the given identifier as a dictionary.

Parameters
  • identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.

  • namespace (Union[str, PubChemNamespace]) – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.

  • properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.

  • kwargs – Optional arguments that json.loads takes.

Raises

ValueError – If the response body does not contain valid JSON.

Return type

Dict

Returns

Parsed JSON data
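Since kwargs are described as arguments that json.loads takes, the standard-library behaviour can be shown without any network access. The sketch below parses a hand-written response body with parse_float=Decimal, and shows that invalid JSON raises json.JSONDecodeError, which is a subclass of ValueError. The body string is an assumption about the response layout, not captured output.

```python
import json
from decimal import Decimal

# Hand-written body in the assumed PUG REST layout.
body = '{"PropertyTable": {"Properties": [{"CID": 2244, "XLogP": 1.2}]}}'

# parse_float is a json.loads keyword; here floats are read as Decimal.
data = json.loads(body, parse_float=Decimal)
xlogp = data["PropertyTable"]["Properties"][0]["XLogP"]
print(type(xlogp).__name__, xlogp)

# Invalid JSON raises json.JSONDecodeError, which can be caught as ValueError.
try:
    json.loads("not json")
except ValueError as error:
    caught = type(error).__name__
print(caught)
```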

valid_properties = {'AtomStereoCount': <class 'int'>, 'BondStereoCount': <class 'int'>, 'CanonicalSMILES': <class 'str'>, 'Charge': <class 'int'>, 'Complexity': <class 'float'>, 'ConformerCount3D': <class 'int'>, 'ConformerModelRMSD3D': <class 'float'>, 'CovalentUnitCount': <class 'int'>, 'DefinedAtomStereoCount': <class 'int'>, 'DefinedBondStereoCount': <class 'int'>, 'EffectiveRotorCount3D': <class 'int'>, 'ExactMass': <class 'float'>, 'FeatureAcceptorCount3D': <class 'int'>, 'FeatureAnionCount3D': <class 'int'>, 'FeatureCationCount3D': <class 'int'>, 'FeatureCount3D': <class 'int'>, 'FeatureDonorCount3D': <class 'int'>, 'FeatureHydrophobeCount3D': <class 'int'>, 'FeatureRingCount3D': <class 'int'>, 'Fingerprint2D': <class 'str'>, 'HBondAcceptorCount': <class 'int'>, 'HBondDonorCount': <class 'int'>, 'HeavyAtomCount': <class 'int'>, 'IUPACName': <class 'str'>, 'InChI': <class 'str'>, 'InChIKey': <class 'str'>, 'IsomericSMILES': <class 'str'>, 'IsotopeAtomCount': <class 'int'>, 'MolecularFormula': <bound method Formula.from_string of <class 'chemistry_tools.formulae.formula.Formula'>>, 'MolecularWeight': <class 'float'>, 'MonoisotopicMass': <class 'float'>, 'RotatableBondCount': <class 'int'>, 'TPSA': <class 'float'>, 'UndefinedAtomStereoCount': <class 'int'>, 'UndefinedBondStereoCount': <class 'int'>, 'Volume3D': <class 'str'>, 'XLogP': <class 'float'>, 'XStericQuadrupole3D': <class 'float'>, 'YStericQuadrupole3D': <class 'float'>, 'ZStericQuadrupole3D': <class 'float'>}

Type:    Dict[str, Callable]

Properties for PubChem REST API