
This package has the following additional requirements:


These can be installed as follows:

python -m pip install "chemistry-tools[pubchem]"

Functions and classes to access properties of compounds in the PubChem database.



Allows properties to optionally be specified in underscore_separated form, consistent with Compound attribute names.


Valid properties for the PubChem REST API.


PropData(name, description, type, attr_name)

Metadata about a property.

PubChemProperty(label[, name, value, dtype, …])

Represents a property parsed from the full PubChem record.



Coerce properties into a list of strings and exclude any invalid properties, or raise a ValueError if that is not possible.

get_properties(identifier[, properties, …])

Returns the requested properties for the compound with the given identifier.

get_property(identifier[, property, namespace])

Returns the requested property for the compound with the given identifier.


Parse raw data from the property endpoint of the REST API.

rest_get_properties(identifier[, namespace, …])

Returns the properties for the compound with the given identifier in the desired format.

rest_get_properties_json(identifier[, …])

Returns the properties for the compound with the given identifier as a dictionary.

PROPERTY_MAP = {
    'atom_stereo_count': 'AtomStereoCount',
    'bond_stereo_count': 'BondStereoCount',
    'canonical_smiles': 'CanonicalSMILES',
    'charge': 'Charge',
    'complexity': 'Complexity',
    'conformer_count_3d': 'ConformerCount3D',
    'conformer_model_rmsd_3d': 'ConformerModelRMSD3D',
    'covalent_unit_count': 'CovalentUnitCount',
    'defined_atom_stereo_count': 'DefinedAtomStereoCount',
    'defined_bond_stereo_count': 'DefinedBondStereoCount',
    'effective_rotor_count_3d': 'EffectiveRotorCount3D',
    'exact_mass': 'ExactMass',
    'feature_acceptor_count_3d': 'FeatureAcceptorCount3D',
    'feature_anion_count_3d': 'FeatureAnionCount3D',
    'feature_cation_count_3d': 'FeatureCationCount3D',
    'feature_count_3d': 'FeatureCount3D',
    'feature_donor_count_3d': 'FeatureDonorCount3D',
    'feature_hydrophobe_count_3d': 'FeatureHydrophobeCount3D',
    'feature_ring_count_3d': 'FeatureRingCount3D',
    'fingerprint_2d': 'Fingerprint2D',
    'h_bond_acceptor_count': 'HBondAcceptorCount',
    'h_bond_donor_count': 'HBondDonorCount',
    'heavy_atom_count': 'HeavyAtomCount',
    'inchi': 'InChI',
    'inchikey': 'InChIKey',
    'isomeric_smiles': 'IsomericSMILES',
    'isotope_atom_count': 'IsotopeAtomCount',
    'iupac_name': 'IUPACName',
    'molecular_formula': 'MolecularFormula',
    'molecular_weight': 'MolecularWeight',
    'monoisotopic_mass': 'MonoisotopicMass',
    'rotatable_bond_count': 'RotatableBondCount',
    'tpsa': 'TPSA',
    'undefined_atom_stereo_count': 'UndefinedAtomStereoCount',
    'undefined_bond_stereo_count': 'UndefinedBondStereoCount',
    'volume3d': 'Volume3D',
    'volume_3d': 'XStericQuadrupole3D',
    'x_steric_quadrupole_3d': 'YStericQuadrupole3D',
    'xlogp': 'XLogP',
    'y_steric_quadrupole_3d': 'ZStericQuadrupole3D'
}

Type:    Dict[str, str]

Allows properties to optionally be specified in underscore_separated form, consistent with Compound attribute names.

namedtuple PropData(name, description, type, attr_name)

Bases: NamedTuple

Metadata about a property.

  1.  name (str) – The name of the property.

  2.  description (str) – The description of the property.

  3.  type (Callable) – The type of the property.

  4.  attr_name (str) – The Python attribute name of the property in a chemistry_tools.pubchem.compound.Compound.


Return a nicely formatted representation string

namedtuple PubChemProperty(label, name=None, value=None, dtype=None, source=None)

Bases: NamedTuple

Represents a property parsed from the full PubChem record.

  1.  label (str) – The label of the property.

  2.  name (str) – The name of the property.

  3.  value (Any) – The property’s value.

  4.  dtype (Callable) – The data type of the property’s value.

  5.  source (Dict) – Dictionary of property sources.


Coerce properties into a list of strings and exclude any invalid properties, or raise a ValueError if that is not possible.


properties (Union[str, Iterable[str]])

Return type


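The coercion described above can be sketched as follows. This is a minimal, self-contained illustration rather than the module's implementation: the function name coerce_properties is hypothetical, only a small excerpt of PROPERTY_MAP and valid_properties is reproduced, and it assumes that unknown property names are silently dropped while input that is not a string or an iterable of strings raises ValueError.

```python
from typing import Iterable, List, Union

# Hypothetical excerpts; the full PROPERTY_MAP and valid_properties are listed on this page.
PROPERTY_MAP = {
    "molecular_formula": "MolecularFormula",
    "molecular_weight": "MolecularWeight",
    "xlogp": "XLogP",
}
VALID = {"MolecularFormula", "MolecularWeight", "XLogP"}


def coerce_properties(properties: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical stand-in for the module's coercion helper."""
    if isinstance(properties, str):
        # Accept a comma-separated string such as "xlogp, MolecularWeight".
        candidates = properties.split(",")
    else:
        try:
            candidates = list(properties)
        except TypeError:
            raise ValueError(f"Cannot coerce {properties!r} into a list of properties") from None

    result: List[str] = []
    for candidate in candidates:
        if not isinstance(candidate, str):
            raise ValueError(f"Property names must be strings, not {type(candidate).__name__}")
        name = candidate.strip()
        # Translate underscore_separated names into PubChem's CamelCase names.
        name = PROPERTY_MAP.get(name, name)
        if name in VALID:
            result.append(name)  # names not in VALID are dropped without error
    return result
```

With these excerpts, coerce_properties("xlogp, MolecularWeight, bogus") keeps the two recognised names in their CamelCase form and discards "bogus".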
get_properties(identifier, properties='', namespace=<PubChemNamespace.Name: 'name'>, as_dataframe=False)

Returns the requested properties for the compound with the given identifier. Because an identifier may match more than one compound, the results are returned as a list.

  • identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.

  • properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.

  • namespace (Union[PubChemNamespace, str]) – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.

  • as_dataframe (bool) – Automatically extract the properties into a pandas DataFrame. Default False.

Raises

  • ValueError – If the response body does not contain valid JSON.

  • NotFoundError – If the compound with the requested identifier was not found in PubChem.

Return type

Union[List[Dict[str, Any]], DataFrame]


A list of dictionaries mapping properties to values, or a pandas DataFrame if as_dataframe is True

get_property(identifier, property='', namespace=<PubChemNamespace.Name: 'name'>)

Returns the requested property for the compound with the given identifier.

This convenience function only retrieves a single property for a single compound at a time. If you require multiple properties and/or properties for multiple compounds, request them together in a single call instead, which helps reduce the burden on the PubChem servers.

Raises

  • ValueError – If the response body does not contain valid JSON.

  • NotFoundError – If the compound with the requested identifier was not found in PubChem.

Return type



The requested property. Type depends on the property requested.


Parse raw data from the property endpoint of the REST API.


property_data (Dict)

Return type



A list of dictionaries mapping the properties to values for each compound
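A sketch of this parsing step, assuming the raw data has the PUG REST property-table shape {"PropertyTable": {"Properties": [...]}} and that each value is converted with the callable listed for it in valid_properties. The CONVERTERS excerpt and the function name parse_property_table are hypothetical, and str stands in for Formula.from_string so the example has no dependencies.

```python
from typing import Any, Callable, Dict, List

# Hypothetical excerpt of valid_properties (property name -> converter).
# str stands in for Formula.from_string to keep the sketch dependency-free.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "MolecularFormula": str,
    "MolecularWeight": float,
    "XLogP": float,
}


def _unchanged(value: Any) -> Any:
    # Keys without a converter (e.g. "CID") are passed through as-is.
    return value


def parse_property_table(property_data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one dictionary of converted property values per compound."""
    return [
        {key: CONVERTERS.get(key, _unchanged)(value) for key, value in raw.items()}
        for raw in property_data["PropertyTable"]["Properties"]
    ]


# Illustrative input in the PUG REST property-table shape.
sample = {
    "PropertyTable": {
        "Properties": [
            {"CID": 2244, "MolecularFormula": "C9H8O4", "MolecularWeight": "180.16"},
        ]
    }
}
parsed = parse_property_table(sample)
```

Looking up the converter per key keeps the table of types in one place, so supporting a new property only means adding an entry to the mapping.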

rest_get_properties(identifier, namespace=<PubChemNamespace.Name: 'name'>, properties='', format_=<PubChemFormats.CSV: 'CSV'>)

Returns the properties for the compound with the given identifier in the desired format.

  • identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.

  • namespace – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.

  • properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.

  • format_ (Union[PubChemFormats, str]) – The format to obtain the data in. Default <PubChemFormats.CSV: 'CSV'>.
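For orientation, the PUG REST property endpoint takes the namespace, the identifier(s), a comma-separated property list and the output format as path segments. The sketch below only builds such a URL; property_url is a hypothetical helper, and the request that rest_get_properties actually sends may differ in detail.

```python
from typing import Sequence, Union
from urllib.parse import quote

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"


def property_url(
    identifier: Union[str, int, Sequence[Union[str, int]]],
    properties: Sequence[str],
    namespace: str = "name",
    format_: str = "CSV",
) -> str:
    """Hypothetical helper: build a PUG REST property URL (no request is sent)."""
    if isinstance(identifier, (str, int)):
        ids = str(identifier)
    else:
        # Several CIDs can be joined into one comma-separated request.
        ids = ",".join(str(i) for i in identifier)
    props = ",".join(properties)
    # Percent-encode names such as "acetic acid", but keep the commas between CIDs.
    return f"{BASE}/{namespace}/{quote(ids, safe=',')}/property/{props}/{format_}"


batch_url = property_url([2244, 3672], ["MolecularFormula", "XLogP"], namespace="cid")
name_url = property_url("acetic acid", ["MolecularWeight"], format_="JSON")
```

Batching several CIDs into one URL is what lets a single request return data for multiple compounds.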

rest_get_properties_json(identifier, namespace=<PubChemNamespace.Name: 'name'>, properties='', **kwargs)

Returns the properties for the compound with the given identifier as a dictionary.

  • identifier (Union[str, int, Sequence[Union[str, int]]]) – Identifiers (e.g. name, CID) for the compound to look up. When using the CID namespace, data for multiple compounds can be retrieved at once by supplying either a comma-separated string or a list.

  • namespace – The type of identifier to look up. Valid values are in PubChemNamespace. Default <PubChemNamespace.Name: 'name'>.

  • properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. Can be either a comma-separated string or a list. See the table at the start of this chapter for a list of valid properties. Default ''.

  • kwargs – Optional keyword arguments, passed on to json.loads.


Raises

ValueError – If the response body does not contain valid JSON.

Return type



Parsed JSON data
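Because kwargs are handed to json.loads, options such as parse_float change how numbers in the response are decoded. The sketch below applies json.loads directly to an illustrative response body (not real PubChem output) to show the effect of parse_float=Decimal; passing the same keyword to rest_get_properties_json should behave the same way, assuming the arguments are forwarded unchanged.

```python
import json
from decimal import Decimal

# Illustrative response body in the PUG REST property-table shape.
body = '{"PropertyTable": {"Properties": [{"CID": 2244, "XLogP": 1.2}]}}'

# parse_float=Decimal decodes JSON numbers with a fractional part exactly,
# instead of as binary floats; integers such as the CID are unaffected.
data = json.loads(body, parse_float=Decimal)
xlogp = data["PropertyTable"]["Properties"][0]["XLogP"]
cid = data["PropertyTable"]["Properties"][0]["CID"]
```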

valid_properties = {
    'AtomStereoCount': <class 'int'>,
    'BondStereoCount': <class 'int'>,
    'CanonicalSMILES': <class 'str'>,
    'Charge': <class 'int'>,
    'Complexity': <class 'float'>,
    'ConformerCount3D': <class 'int'>,
    'ConformerModelRMSD3D': <class 'float'>,
    'CovalentUnitCount': <class 'int'>,
    'DefinedAtomStereoCount': <class 'int'>,
    'DefinedBondStereoCount': <class 'int'>,
    'EffectiveRotorCount3D': <class 'int'>,
    'ExactMass': <class 'float'>,
    'FeatureAcceptorCount3D': <class 'int'>,
    'FeatureAnionCount3D': <class 'int'>,
    'FeatureCationCount3D': <class 'int'>,
    'FeatureCount3D': <class 'int'>,
    'FeatureDonorCount3D': <class 'int'>,
    'FeatureHydrophobeCount3D': <class 'int'>,
    'FeatureRingCount3D': <class 'int'>,
    'Fingerprint2D': <class 'str'>,
    'HBondAcceptorCount': <class 'int'>,
    'HBondDonorCount': <class 'int'>,
    'HeavyAtomCount': <class 'int'>,
    'IUPACName': <class 'str'>,
    'InChI': <class 'str'>,
    'InChIKey': <class 'str'>,
    'IsomericSMILES': <class 'str'>,
    'IsotopeAtomCount': <class 'int'>,
    'MolecularFormula': <bound method Formula.from_string of <class 'chemistry_tools.formulae.formula.Formula'>>,
    'MolecularWeight': <class 'float'>,
    'MonoisotopicMass': <class 'float'>,
    'RotatableBondCount': <class 'int'>,
    'TPSA': <class 'float'>,
    'UndefinedAtomStereoCount': <class 'int'>,
    'UndefinedBondStereoCount': <class 'int'>,
    'Volume3D': <class 'str'>,
    'XLogP': <class 'float'>,
    'XStericQuadrupole3D': <class 'float'>,
    'YStericQuadrupole3D': <class 'float'>,
    'ZStericQuadrupole3D': <class 'float'>
}

Type:    Dict[str, Callable]

Valid properties for the PubChem REST API, mapped to the callables used to convert their values.