chemistry_tools.pubchem.properties
Attention
This package has the following additional requirements:
cawdrey>=0.1.7
mathematical>=0.1.13
pillow>=7.0.0
pyparsing>=2.4.6
tabulate>=0.8.9
These can be installed as follows:
python -m pip install chemistry-tools[pubchem]
Functions and classes to access properties of compounds in the PubChem database.
Data:
PROPERTY_MAP |
Allows properties to optionally be specified as underscore_separated, consistent with Compound attributes. |
valid_properties |
Properties for PubChem REST API. |
Classes:
PropData |
Metadata about a property. |
PubChemProperty |
Represents a property parsed from the full PubChem record. |
Functions:
force_valid_properties |
Coerce properties into a list of strings and exclude any invalid properties. |
get_properties |
Returns the requested properties for the compound with the given identifier. |
get_property |
Returns the requested property for the compound with the given identifier. |
|
Parse raw data from the |
rest_get_properties |
Returns the properties for the compound with the given identifier in the desired format. |
rest_get_properties_json |
Returns the properties for the compound with the given identifier as a dictionary. |
-
PROPERTY_MAP
= {'atom_stereo_count': 'AtomStereoCount', 'bond_stereo_count': 'BondStereoCount', 'canonical_smiles': 'CanonicalSMILES', 'charge': 'Charge', 'complexity': 'Complexity', 'conformer_count_3d': 'ConformerCount3D', 'conformer_model_rmsd_3d': 'ConformerModelRMSD3D', 'covalent_unit_count': 'CovalentUnitCount', 'defined_atom_stereo_count': 'DefinedAtomStereoCount', 'defined_bond_stereo_count': 'DefinedBondStereoCount', 'effective_rotor_count_3d': 'EffectiveRotorCount3D', 'exact_mass': 'ExactMass', 'feature_acceptor_count_3d': 'FeatureAcceptorCount3D', 'feature_anion_count_3d': 'FeatureAnionCount3D', 'feature_cation_count_3d': 'FeatureCationCount3D', 'feature_count_3d': 'FeatureCount3D', 'feature_donor_count_3d': 'FeatureDonorCount3D', 'feature_hydrophobe_count_3d': 'FeatureHydrophobeCount3D', 'feature_ring_count_3d': 'FeatureRingCount3D', 'fingerprint_2d': 'Fingerprint2D', 'h_bond_acceptor_count': 'HBondAcceptorCount', 'h_bond_donor_count': 'HBondDonorCount', 'heavy_atom_count': 'HeavyAtomCount', 'inchi': 'InChI', 'inchikey': 'InChIKey', 'isomeric_smiles': 'IsomericSMILES', 'isotope_atom_count': 'IsotopeAtomCount', 'iupac_name': 'IUPACName', 'molecular_formula': 'MolecularFormula', 'molecular_weight': 'MolecularWeight', 'monoisotopic_mass': 'MonoisotopicMass', 'rotatable_bond_count': 'RotatableBondCount', 'tpsa': 'TPSA', 'undefined_atom_stereo_count': 'UndefinedAtomStereoCount', 'undefined_bond_stereo_count': 'UndefinedBondStereoCount', 'volume3d': 'Volume3D', 'volume_3d': 'XStericQuadrupole3D', 'x_steric_quadrupole_3d': 'YStericQuadrupole3D', 'xlogp': 'XLogP', 'y_steric_quadrupole_3d': 'ZStericQuadrupole3D'} -
Allows properties to optionally be specified as underscore_separated, consistent with Compound attributes
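As a minimal sketch of how such a mapping can be used, the snippet below translates underscore_separated names into PubChem property names and passes other names through unchanged. The dictionary is a short excerpt of the mapping listed above, and `to_pubchem_name` is a hypothetical helper written for this illustration, not a function of the package.

```python
# Excerpt of PROPERTY_MAP as listed above; the full mapping has more entries.
PROPERTY_MAP_EXCERPT = {
    "canonical_smiles": "CanonicalSMILES",
    "h_bond_donor_count": "HBondDonorCount",
    "iupac_name": "IUPACName",
    "molecular_weight": "MolecularWeight",
}


def to_pubchem_name(name: str) -> str:
    """Hypothetical helper: translate an underscore_separated name.

    Names that are not keys of the mapping are returned unchanged,
    so PubChem-style names can be mixed in freely.
    """
    return PROPERTY_MAP_EXCERPT.get(name, name)


requested = ["molecular_weight", "IUPACName", "h_bond_donor_count"]
translated = [to_pubchem_name(name) for name in requested]
print(translated)
```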
-
namedtuple PropData(name, description, type, attr_name)
Bases: NamedTuple
Metadata about a property.
- Fields
name (str) – The name of the property.
description (str) – The description of the property.
type (Callable) – The type of the property.
attr_name (str) – The Python attribute name of the property in a chemistry_tools.pubchem.compound.Compound.
-
__repr__()
Return a nicely formatted representation string.
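The field layout of PropData can be illustrated with a stand-in NamedTuple. `PropDataSketch` below only mirrors the four documented fields; it is not the library class, and the field values are invented for the example.

```python
from typing import Callable, NamedTuple


class PropDataSketch(NamedTuple):
    """Stand-in with the same four fields as PropData; not the library class."""

    name: str
    description: str
    type: Callable
    attr_name: str


# Field values are illustrative, not copied from the library.
entry = PropDataSketch(
    name="MolecularWeight",
    description="Molecular weight of the compound.",
    type=float,
    attr_name="molecular_weight",
)

# Fields are available by name and by position, as with any NamedTuple.
print(entry.name, entry.attr_name)

# The ``type`` field is a callable that converts a raw value.
converted = entry.type("180.16")
print(converted)
```

Storing the conversion callable alongside the name keeps the metadata for a property in one place, which is the role the Fields list above describes.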
-
namedtuple PubChemProperty(label, name=None, value=None, dtype=None, source=None)
Bases: NamedTuple
Represents a property parsed from the full PubChem record.
- Fields
-
force_valid_properties(properties)
Coerce properties into a list of strings and exclude any invalid properties, or raise a ValueError if that is not possible.
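The behaviour described above can be sketched roughly as follows. This is not the package's implementation: `coerce_properties` and the two excerpt constants are hypothetical, and raising ValueError when no valid property remains is an assumption about when coercion is "not possible".

```python
from typing import List, Sequence, Union

# Small excerpts standing in for valid_properties and PROPERTY_MAP.
VALID_EXCERPT = {"MolecularWeight", "IUPACName", "XLogP"}
MAP_EXCERPT = {
    "molecular_weight": "MolecularWeight",
    "iupac_name": "IUPACName",
    "xlogp": "XLogP",
}


def coerce_properties(properties: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical sketch: return the valid property names as a list of strings."""
    if isinstance(properties, str):
        # A comma-separated string is split into individual names.
        candidates = [part.strip() for part in properties.split(",")]
    else:
        candidates = [str(part).strip() for part in properties]

    # Translate underscore_separated names, then drop anything invalid.
    translated = [MAP_EXCERPT.get(name, name) for name in candidates if name]
    valid = [name for name in translated if name in VALID_EXCERPT]

    if not valid:
        # Assumption: nothing usable left counts as "not possible".
        raise ValueError(f"No valid properties found in {properties!r}")
    return valid


print(coerce_properties("molecular_weight, Bogus,XLogP"))
print(coerce_properties(["iupac_name", "XLogP"]))
```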
-
get_properties(identifier, properties='', namespace=<PubChemNamespace.Name: 'name'>, as_dataframe=False)
Returns the requested properties for the compound with the given identifier. As more than one compound may be identified, the results are returned in a list.
- Parameters
identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.
properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.
namespace (Union[PubChemNamespace, str]) – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.
as_dataframe (bool) – Automatically extract the properties into a pandas DataFrame. Default False.
- Raises
ValueError – If the response body does not contain valid JSON.
NotFoundError – If the compound with the requested identifier was not found in PubChem.
- Return type
- Returns
List of dictionaries mapping properties to values
-
get_property(identifier, property='', namespace=<PubChemNamespace.Name: 'name'>)
Returns the requested property for the compound with the given identifier.
This convenience function only allows a single property to be accessed at once, and only for a single compound. If you require multiple properties and/or properties for multiple compounds, use chemistry_tools.pubchem.properties.get_properties, which helps reduce the burden on the PubChem servers.
- Parameters
identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifier (e.g. name, CID) for the compound to look up.
property – The property to retrieve for the compound. See the table at the start of this chapter for a list of valid properties. Default ''.
namespace (Union[PubChemNamespace, str]) – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.
- Raises
ValueError – If the response body does not contain valid JSON.
NotFoundError – If the compound with the requested identifier was not found in PubChem.
- Return type
- Returns
The requested property. Type depends on the property requested.
-
rest_get_properties(identifier, namespace=<PubChemNamespace.Name: 'name'>, properties='', format_=<PubChemFormats.CSV: 'CSV'>)
Returns the properties for the compound with the given identifier in the desired format.
- Parameters
identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.
namespace – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.
properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.
format_ (Union[PubChemFormats, str]) – The format to obtain the data in. Default <PubChemFormats.CSV: 'CSV'>.
- Return type
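For orientation, the sketch below assembles a property request URL in the layout used by the PubChem PUG REST service (`compound/<namespace>/<identifiers>/property/<properties>/<format>`). Whether rest_get_properties builds its request exactly this way is an assumption; `build_property_url` and `API_BASE` are names made up for this example, and no network request is made.

```python
from typing import Sequence, Union
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

Identifier = Union[str, int, Sequence[Union[str, int]]]


def build_property_url(
    identifier: Identifier,
    namespace: str = "name",
    properties: Union[Sequence[str], str] = "",
    format_: str = "CSV",
) -> str:
    """Hypothetical helper: build a PUG REST property URL without sending it."""
    if isinstance(identifier, (str, int)):
        ident = str(identifier)
    else:
        # A list of identifiers becomes a comma-separated string.
        ident = ",".join(str(item) for item in identifier)

    if not isinstance(properties, str):
        properties = ",".join(properties)

    # Percent-encode the identifier but keep commas between identifiers.
    ident = quote(ident, safe=",")
    return f"{API_BASE}/compound/{namespace}/{ident}/property/{properties}/{format_}"


url = build_property_url([2244, 3672], "cid", ["MolecularWeight", "XLogP"], "JSON")
print(url)
```

Batching several CIDs into one URL, as in the usage line, is what allows data for multiple compounds to be retrieved in a single request.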
-
rest_get_properties_json(identifier, namespace=<PubChemNamespace.Name: 'name'>, properties='', **kwargs)
Returns the properties for the compound with the given identifier as a dictionary.
- Parameters
identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.
namespace (Union[str, PubChemNamespace]) – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.
properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.
kwargs – Optional arguments that json.loads takes.
- Raises
ValueError – If the response body does not contain valid JSON.
- Return type
- Returns
Parsed JSON data
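The role of the keyword arguments can be shown offline with json.loads alone. In this sketch `parse_body` is a hypothetical stand-in for the parsing step, and `sample_body` is an illustrative payload in the PropertyTable shape returned by PUG REST, not a captured response.

```python
import json
from decimal import Decimal

# Illustrative response body; the values are examples only.
sample_body = (
    '{"PropertyTable": {"Properties": '
    '[{"CID": 2244, "MolecularFormula": "C9H8O4", "XLogP": 1.2}]}}'
)


def parse_body(body: str, **kwargs):
    """Hypothetical stand-in: forward keyword arguments to json.loads."""
    return json.loads(body, **kwargs)


# parse_float is a standard json.loads argument; here floats become Decimal.
data = parse_body(sample_body, parse_float=Decimal)
record = data["PropertyTable"]["Properties"][0]
print(record)

# json.JSONDecodeError is a subclass of ValueError, which matches the
# documented behaviour for bodies that are not valid JSON.
try:
    parse_body("not json")
except ValueError as error:
    failure = type(error).__name__
    print(failure)
```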
-
valid_properties
= {'AtomStereoCount': <class 'int'>, 'BondStereoCount': <class 'int'>, 'CanonicalSMILES': <class 'str'>, 'Charge': <class 'int'>, 'Complexity': <class 'float'>, 'ConformerCount3D': <class 'int'>, 'ConformerModelRMSD3D': <class 'float'>, 'CovalentUnitCount': <class 'int'>, 'DefinedAtomStereoCount': <class 'int'>, 'DefinedBondStereoCount': <class 'int'>, 'EffectiveRotorCount3D': <class 'int'>, 'ExactMass': <class 'float'>, 'FeatureAcceptorCount3D': <class 'int'>, 'FeatureAnionCount3D': <class 'int'>, 'FeatureCationCount3D': <class 'int'>, 'FeatureCount3D': <class 'int'>, 'FeatureDonorCount3D': <class 'int'>, 'FeatureHydrophobeCount3D': <class 'int'>, 'FeatureRingCount3D': <class 'int'>, 'Fingerprint2D': <class 'str'>, 'HBondAcceptorCount': <class 'int'>, 'HBondDonorCount': <class 'int'>, 'HeavyAtomCount': <class 'int'>, 'IUPACName': <class 'str'>, 'InChI': <class 'str'>, 'InChIKey': <class 'str'>, 'IsomericSMILES': <class 'str'>, 'IsotopeAtomCount': <class 'int'>, 'MolecularFormula': <bound method Formula.from_string of <class 'chemistry_tools.formulae.formula.Formula'>>, 'MolecularWeight': <class 'float'>, 'MonoisotopicMass': <class 'float'>, 'RotatableBondCount': <class 'int'>, 'TPSA': <class 'float'>, 'UndefinedAtomStereoCount': <class 'int'>, 'UndefinedBondStereoCount': <class 'int'>, 'Volume3D': <class 'str'>, 'XLogP': <class 'float'>, 'XStericQuadrupole3D': <class 'float'>, 'YStericQuadrupole3D': <class 'float'>, 'ZStericQuadrupole3D': <class 'float'>} -
Properties for PubChem REST API
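Because each value in this mapping is a callable, it can be used to convert raw text values into Python objects. The sketch below shows the idea with a three-entry excerpt and invented raw values; `coerce_record` is a hypothetical helper, and the package may apply these types differently.

```python
from typing import Any, Callable, Dict

# Excerpt of valid_properties as listed above.
VALID_PROPERTIES_EXCERPT: Dict[str, Callable[[Any], Any]] = {
    "Charge": int,
    "ExactMass": float,
    "IUPACName": str,
}


def coerce_record(raw: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: convert each raw value with its property's type."""
    return {
        name: VALID_PROPERTIES_EXCERPT[name](value)
        for name, value in raw.items()
    }


# Raw values as they might appear in a text format such as CSV (illustrative).
raw_record = {
    "Charge": "0",
    "ExactMass": "180.042",
    "IUPACName": "2-acetyloxybenzoic acid",
}
typed_record = coerce_record(raw_record)
print(typed_record)
```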